[ 
https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549366#comment-14549366
 ] 

Christian Kadner edited comment on SPARK-6785 at 5/18/15 11:21 PM:
-------------------------------------------------------------------

{panel:borderStyle=dashed|borderColor=#ccc|bgColor=#FFFFCE}
Please review only my second Pull Request 
+[6242|https://github.com/apache/spark/pull/6242]+ and ignore my first Pull 
Request -[6236|https://github.com/apache/spark/pull/6236]-
Thank you!
{panel}
\\
Before my fix, the round-trip conversion of Java dates before 1970 only works 
for {{java.sql.Date}} objects that represent a date and time exactly at 
midnight in the system's local time zone. 
If the Date's time is even one millisecond before or after midnight, the 
result of the round-trip conversion is off by one day for dates before 1970 
because of a rounding (truncation) flaw in the function 
{{DateUtils.millisToDays(Long):Int}}

\\

{code}
  scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss

  scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime)
  d1: java.sql.Date = 1969-01-01
        
  scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime)
  d2: java.sql.Date = 1969-01-01

  scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1))
  res1: java.sql.Date = 1969-01-01
        
  scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d2))
  res2: java.sql.Date = 1969-01-02
{code}

\\

What the code is doing and how to fix it:

\\

 - A {{java.util.Date}} is represented by milliseconds ({{Long}}) since the 
Epoch (1970/01/01 0:00:00 GMT) with positive numbers for dates after and 
negative numbers for dates before 1970
 
 - The function {{DateUtils.fromJavaDate(java.util.Date):Int}} calculates the 
number of full days passed since 1970/01/01 00:00:00 (local time, not UTC), but 
by using the data type {{Long}} (as opposed to {{Double}}) when converting 
milliseconds to days, it truncates the fractional part of the days passed 
(disregarding the impact of hours, minutes, seconds)
 
 - The function {{DateUtils.toJavaDate(Int):Date}} converts the given number of 
days into milliseconds and adds it to 1970/01/01 00:00:00 (local time, not UTC)

 - _Side note: The time-zone offset from UTC is factored in when converting a 
Date to days and removed when converting days to Date, so the time-zone 
shifting is neutralized in the round-trip conversion 
{{toJavaDate(fromJavaDate(java.util.Date))}}._
 
 - The truncation of partial days is not a problem for dates after 1970, since 
truncating drops a positive fraction of a day and so never flips the calendar 
to the next day (all our Dates start at 0:00:00 AM)
 
 - That truncation of partial days is a problem, however, for dates before 
1970: subtracting even a second from a {{Date}} with time at 0:00:00 AM should 
turn the calendar back one day to the previous date, but truncation drops the 
negative fraction and stays on the same day
 
 - Ideally the date conversion should be done using milliseconds, but since 
using days has been established already, the fix is to work with {{Double}} to 
preserve fractions of days and use {{floor()}} instead of the implicit truncate 
to round to a full number of days ({{Int}})
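
The difference between the two rounding modes can be sketched as follows. This 
is only an illustration and not the actual {{DateUtils}} source: the object 
name, the function names and the {{localMillis}} parameter are hypothetical, 
and the sketch assumes the time-zone offset has already been added to the 
milliseconds.

{code}
object DayConversionSketch {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Long division truncates toward zero: -0.04 days becomes 0 (the bug).
  def millisToDaysTruncating(localMillis: Long): Int =
    (localMillis / MillisPerDay).toInt

  // Dividing as Double and flooring rounds toward negative infinity:
  // -0.04 days becomes -1 (the proposed fix).
  def millisToDaysFlooring(localMillis: Long): Int =
    math.floor(localMillis.toDouble / MillisPerDay).toInt

  // One hour before 1970/01/01 00:00:00, in milliseconds.
  val oneHourBeforeEpoch: Long = -60L * 60 * 1000
}
{code}

With these definitions, {{millisToDaysTruncating(oneHourBeforeEpoch)}} yields 
0 while {{millisToDaysFlooring(oneHourBeforeEpoch)}} yields -1. For 
non-negative inputs both functions return the same value.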

\\

Pseudo-code example, adding or subtracting 1 hour to Date "1970/01/01 0:00:00" 
using milliseconds...

{code}
"1970-01-01 0:00:00" + 1 hr = "1970-01-01  1:00:00"
"1970-01-01 0:00:00" - 1 hr = "1969-12-31 23:00:00"
{code}
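
The pseudo-code above could be reproduced with real millisecond arithmetic 
roughly like this. It is a sketch under two assumptions that are not part of 
the patch: it uses {{java.time.Instant}} (Java 8 or later) and works in UTC 
rather than the local time zone, so that the result does not depend on the 
machine it runs on. The object and value names are made up for illustration.

{code}
import java.time.Instant

object EpochHourSketch {
  val OneHourMillis: Long = 60L * 60 * 1000

  // Epoch (0 ms) plus and minus one hour, expressed in milliseconds.
  val plusOneHour: Instant  = Instant.ofEpochMilli(0L + OneHourMillis)
  val minusOneHour: Instant = Instant.ofEpochMilli(0L - OneHourMillis)
}
{code}

Here {{plusOneHour.toString}} is {{1970-01-01T01:00:00Z}} and 
{{minusOneHour.toString}} is {{1969-12-31T23:00:00Z}}, so subtracting one hour 
does move the calendar back to the previous day.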

\\

Same example, using full days. One hour is about 0.04 days. Using {{trunc()}} 
versus {{floor()}} we get ...  

{code}
trunc(+0.04) = +0  -->  "1970-01-01" + 0 days = "1970-01-01"    (correct)
floor(+0.04) = +0  -->  "1970-01-01" + 0 days = "1970-01-01"    (correct)

trunc(-0.04) = -0  -->  "1970-01-01" + -0 days = "1970-01-01"   (incorrect, bug)
floor(-0.04) = -1  -->  "1970-01-01" + -1 day  = "1969-12-31"   (correct, fix)
{code}

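{code}
def trunc(d: Double): Int = d.toInt              // rounds toward zero
def floor(d: Double): Int = math.floor(d).toInt  // rounds toward negative infinity
{code}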



> DateUtils can not handle date before 1970/01/01 correctly
> ---------------------------------------------------------
>
>                 Key: SPARK-6785
>                 URL: https://issues.apache.org/jira/browse/SPARK-6785
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Davies Liu
>
> {code}
> scala> val d = new Date(100)
> d: java.sql.Date = 1969-12-31
> scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
> res1: java.sql.Date = 1970-01-01
> {code}



