[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549366#comment-14549366 ]
Christian Kadner edited comment on SPARK-6785 at 5/18/15 11:21 PM:
-------------------------------------------------------------------

{panel:borderStyle=dashed|borderColor=#ccc|bgColor=#FFFFCE}
Please review only my second Pull Request +[6242|https://github.com/apache/spark/pull/6242]+ and ignore my first Pull Request -[6236|https://github.com/apache/spark/pull/6236]-
Thank you!
{panel}
\\
Before my fix, the round-trip conversion {{toJavaDate(fromJavaDate(date))}} works for dates before 1970 only if the {{java.sql.Date}} object represents a date and time exactly at midnight in the system's local time zone. If the Date's time is even one millisecond before or after midnight, the result of the conversion is off by one day for dates before 1970, because of a rounding (truncation) flaw in the function {{DateUtils.millisToDays(Long):Int}}.
\\
{code}
scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss

scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime)
d1: java.sql.Date = 1969-01-01

scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime)
d2: java.sql.Date = 1969-01-01

scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1))
res1: java.sql.Date = 1969-01-01

scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d2))
res2: java.sql.Date = 1969-01-02
{code}
\\
What the code is doing and how to fix it:
\\
- A {{java.util.Date}} is represented by milliseconds ({{Long}}) since the Epoch (1970/01/01 0:00:00 GMT), with positive numbers for dates after 1970 and negative numbers for dates before 1970.
- The function {{DateUtils.fromJavaDate(java.util.Date):Int}} calculates the number of full days passed since 1970/01/01 00:00:00 (local time, not UTC). Because it uses the data type {{Long}} (as opposed to {{Double}}) when converting milliseconds to days, the division truncates the fractional part of the days passed toward zero, disregarding the impact of hours, minutes and seconds.
- The function {{DateUtils.toJavaDate(Int):Date}} converts the given number of days into milliseconds and adds it to 1970/01/01 00:00:00 (local time, not UTC).
- _Side note: The time-zone offset from UTC is factored in when converting a Date to days and removed when converting days to a Date, so the time-zone shift is neutralized in the round-trip conversion {{toJavaDate(fromJavaDate(java.util.Date))}}._
- The truncation of partial days is not a problem for dates after 1970, since adding a fraction of a day to any date will not flip the calendar to the next day (all our Dates start at 0:00:00 AM).
- The truncation of partial days is a problem, however, when subtracting even a second from a {{Date}} with its time at 0:00:00 AM, which should turn the calendar back one day to the previous date.
- Ideally the date conversion should be done using milliseconds, but since using days has been established already, the fix is to work with {{Double}} to preserve fractions of days and to use {{floor()}} instead of the implicit truncation when rounding to a full number of days ({{Int}}).
\\
Pseudo-code example, adding or subtracting 1 hour to the Date "1970/01/01 0:00:00" using milliseconds...
{code}
"1970-01-01 0:00:00" + 1 hr = "1970-01-01 1:00:00"
"1970-01-01 0:00:00" - 1 hr = "1969-12-31 23:00:00"
{code}
\\
Same example, using full days. One hour is about 0.04 days. Using {{trunc()}} versus {{floor()}} we get ...
{code}
trunc(+0.04) = +0 --> "1970-01-01" + 0 days = "1970-01-01" (correct)
floor(+0.04) = +0 --> "1970-01-01" + 0 days = "1970-01-01" (correct)
trunc(-0.04) = -0 --> "1970-01-01" + -0 days = "1970-01-01" (incorrect, bug)
floor(-0.04) = -1 --> "1970-01-01" + -1 day = "1969-12-31" (correct, fix)
{code}
{code}
def trunc(d: Double): Int = d.toInt
{code}

> DateUtils can not handle date before 1970/01/01 correctly
> ---------------------------------------------------------
>
>                 Key: SPARK-6785
>                 URL: https://issues.apache.org/jira/browse/SPARK-6785
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Davies Liu
>
> {code}
> scala> val d = new Date(100)
> d: java.sql.Date = 1969-12-31
> scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
> res1: java.sql.Date = 1970-01-01
> {code}
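The trunc-versus-floor comparison in the comment above can be sketched as a small standalone Scala program. This is only an illustration under simplifying assumptions: the object {{DayRounding}} and its two helper functions are hypothetical names, not the actual {{DateUtils}} code, and the time-zone offset that {{DateUtils}} factors in is left out so that the result does not depend on the machine's local time zone.

```scala
// Hypothetical helper, not part of Spark: contrasts the two rounding modes
// when converting milliseconds since the Epoch into whole days.
object DayRounding {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Long division truncates toward zero (the behaviour described as the bug).
  def millisToDaysTrunc(millis: Long): Int =
    (millis / MillisPerDay).toInt

  // Dividing as Double and applying floor rounds toward negative infinity
  // (the behaviour described as the fix).
  def millisToDaysFloor(millis: Long): Int =
    math.floor(millis.toDouble / MillisPerDay).toInt

  def main(args: Array[String]): Unit = {
    val oneHour = 60L * 60 * 1000

    println(millisToDaysTrunc(+oneHour)) // 0  -> 1970-01-01 (correct)
    println(millisToDaysFloor(+oneHour)) // 0  -> 1970-01-01 (correct)
    println(millisToDaysTrunc(-oneHour)) // 0  -> 1970-01-01 (incorrect, bug)
    println(millisToDaysFloor(-oneHour)) // -1 -> 1969-12-31 (correct, fix)
  }
}
```

On Java 8 or later, {{java.lang.Math.floorDiv(millis, MillisPerDay)}} should give the same floored result without going through {{Double}}; whether that is an option depends on the Java version the build has to support.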