[
https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549366#comment-14549366
]
Christian Kadner edited comment on SPARK-6785 at 6/30/15 9:50 PM:
--
{panel:borderStyle=dashed|borderColor=#ccc|bgColor=#CE}
Pull Request +[6242|https://github.com/apache/spark/pull/6242]+
{panel}
\\
Before my fix, the round-trip (from and to) Java date conversion of dates before 1970
only works for {{java.sql.Date}} objects that represent a date and time of exactly
midnight in the system's local time zone.
Otherwise, if the Date's time is even one millisecond before or after midnight,
the result of the round-trip conversion is off by one day for dates before
1970, because of a rounding (truncation) flaw in the function
{{DateUtils.millisToDays(Long):Int}}
\\
{code}
scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss
scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime)
d1: java.sql.Date = 1969-01-01
scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime)
d2: java.sql.Date = 1969-01-01
scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1))
res1: java.sql.Date = 1969-01-01
scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d2))
res2: java.sql.Date = 1969-01-02
{code}
\\
What the code is doing, and how to fix it:
\\
- A {{java.util.Date}} is represented by the number of milliseconds ({{Long}})
since the Epoch (1970/01/01 0:00:00 GMT), with positive numbers for dates after
and negative numbers for dates before 1970
- The function {{DateUtils.fromJavaDate(java.util.Date):Int}} calculates the
number of full days passed since 1970/01/01 00:00:00 (local time, not UTC), but
because it converts milliseconds to days using the data type {{Long}} (as
opposed to {{Double}}), the integer division truncates the fractional part of
the days passed toward zero (disregarding the impact of hours, minutes, and
seconds)
- The function {{DateUtils.toJavaDate(Int):Date}} converts the given number of
days into milliseconds and adds it to 1970/01/01 00:00:00 (local time, not UTC)
- _Side note: The time-zone offset from UTC is factored in when converting a
Date to days and removed when converting days to Date, so the time-zone
shifting is neutralized in the round-trip conversion
{{toJavaDate(fromJavaDate(java.util.Date))}}._
- The truncation of partial days is not a problem for dates after 1970: for a
positive number of days, truncating toward zero is the same as rounding down, so
adding a fraction of a day to any date does not flip the calendar to the next
day (since all our Dates start at 0:00:00 AM)
- That truncation of partial days, however, is a problem for dates before 1970:
subtracting even one second from a {{Date}} at 0:00:00 AM should turn the
calendar back to the previous date, but truncating the resulting negative number
of days toward zero rounds it up instead, which lands one day too late
- Ideally, the date conversion would be done using milliseconds, but since using
days has already been established, the fix is to work with {{Double}} to
preserve fractions of days and to use {{floor()}} instead of the implicit
truncation to round down to a full number of days ({{Int}})
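A minimal Scala sketch of the day conversion described above, comparing the truncating {{Long}} division with the {{Double}}/{{floor()}} fix. The names {{DayConversionSketch}}, {{millisToDaysTrunc}}, {{millisToDaysFloor}} and {{daysToMillis}} are hypothetical stand-ins, not the actual Spark {{DateUtils}} code. The time-zone shift is deliberately left out (per the side note, it cancels in the round trip), so inputs are assumed to be local-time epoch milliseconds.

{code}
// Hypothetical, simplified stand-ins for the conversion described above;
// NOT the actual Spark DateUtils source. The time-zone shift is omitted,
// so every input is assumed to already be local-time epoch milliseconds.
object DayConversionSketch {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Before the fix: Long division truncates toward zero.
  def millisToDaysTrunc(millisLocal: Long): Int =
    (millisLocal / MillisPerDay).toInt

  // After the fix: keep the fraction of a day as a Double, then round down.
  def millisToDaysFloor(millisLocal: Long): Int =
    math.floor(millisLocal.toDouble / MillisPerDay).toInt

  // Days back to millis: always lands on midnight of the given day.
  def daysToMillis(days: Int): Long = days.toLong * MillisPerDay

  def main(args: Array[String]): Unit = {
    // 1969-01-01 00:00:01 local time: 365 days before the Epoch, plus one second
    val d2 = -365L * MillisPerDay + 1000L
    println(millisToDaysTrunc(d2)) // -364, i.e. 1969-01-02 (off by one day)
    println(millisToDaysFloor(d2)) // -365, i.e. 1969-01-01 (correct)
    // Midnight itself converts to the same day with either variant
    println(millisToDaysFloor(daysToMillis(-365)) == -365)
  }
}
{code}

As a design note: on Java 8 and later, {{Math.floorDiv(millisLocal, MillisPerDay)}} also rounds toward negative infinity while staying in pure {{Long}} arithmetic; the {{Double}}/{{floor()}} form above simply mirrors the approach described in this comment.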
\\
Pseudo-code example: adding or subtracting 1 hour to/from the Date 1970/01/01
0:00:00 using milliseconds:
{code}
1970-01-01 0:00:00 + 1 hr = 1970-01-01 1:00:00
1970-01-01 0:00:00 - 1 hr = 1969-12-31 23:00:00
{code}
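The millisecond arithmetic in the pseudo-code can be checked with {{java.time}}. This is a small sketch under the assumption that results are rendered in UTC, so the output does not depend on the machine's local time zone; {{HourArithmeticSketch}} and {{render}} are hypothetical names.

{code}
import java.time.{Instant, ZoneOffset}

// Hypothetical helper: add/subtract one hour on raw epoch milliseconds and
// render the result as a UTC date-time (UTC is an assumption for stable output).
object HourArithmeticSketch {
  val MillisPerHour: Long = 60L * 60 * 1000

  def render(millis: Long): String =
    Instant.ofEpochMilli(millis).atOffset(ZoneOffset.UTC).toLocalDateTime.toString

  def main(args: Array[String]): Unit = {
    val epoch = 0L // 1970-01-01 0:00:00
    println(render(epoch + MillisPerHour)) // 1970-01-01T01:00
    println(render(epoch - MillisPerHour)) // 1969-12-31T23:00
  }
}
{code}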
\\
Same example, using full days. One hour is 1/24 of a day, or about 0.04 days.
Using {{trunc()}} versus {{floor()}}, we get:
{code}
trunc(+0.04) = +0  -- 1970-01-01 + 0 days  = 1970-01-01 (correct)
floor(+0.04) = +0  -- 1970-01-01 + 0 days  = 1970-01-01 (correct)
trunc(-0.04) = -0  -- 1970-01-01 + -0 days = 1970-01-01 (incorrect, bug)
floor(-0.04) = -1  -- 1970-01-01 + -1 day  = 1969-12-31 (correct, fix)
{code}
{code}
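def trunc(d: Double): Int = d.toInt             // rounds toward zero
def floor(d: Double): Int = math.floor(d).toInt // rounds toward negative infinity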
{code}
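A small Scala program that recomputes the table above for one hour (1/24 of a day) in both directions. {{TruncVsFloorSketch}} and its helpers are illustrative only, and {{LocalDate.ofEpochDay}} is used here as a convenient way to add a whole number of days to 1970-01-01.

{code}
import java.time.LocalDate

// Illustrative helpers (not Spark code) reproducing the trunc-vs-floor table.
object TruncVsFloorSketch {
  def trunc(d: Double): Int = d.toInt             // rounds toward zero
  def floor(d: Double): Int = math.floor(d).toInt // rounds toward negative infinity

  // Calendar date reached by adding a whole number of days to 1970-01-01
  def dateAfter(days: Int): LocalDate = LocalDate.ofEpochDay(days.toLong)

  def main(args: Array[String]): Unit = {
    val oneHour = 1.0 / 24
    for (fraction <- Seq(oneHour, -oneHour)) {
      println(f"trunc($fraction%+.2f) -> ${dateAfter(trunc(fraction))}")
      println(f"floor($fraction%+.2f) -> ${dateAfter(floor(fraction))}")
    }
  }
}
{code}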
was (Author: ckadner):
{panel:borderStyle=dashed|borderColor=#ccc|bgColor=#CE}
Please review only my second Pull Request
+[6242|https://github.com/apache/spark/pull/6242]+ and ignore my first Pull
Request -[6236|https://github.com/apache/spark/pull/6236]-
Thank you!
{panel}
\\
Before my fix, the round-trip (from and to) Java date conversion of dates before 1970
only works for {{java.sql.Date}} objects that represent a date and time of exactly
midnight in the system's local time zone.
Otherwise, if the Date's time is even one millisecond before or after midnight,
the result of the round-trip conversion is off by one day for dates before
1970, because of a rounding (truncation) flaw in the function
{{DateUtils.millisToDays(Long):Int}}
\\
{code}
scala> val df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
df: java.text.SimpleDateFormat = yyyy-MM-dd HH:mm:ss
scala> val d1 = new Date(df.parse("1969-01-01 00:00:00").getTime)
d1: java.sql.Date = 1969-01-01
scala> val d2 = new Date(df.parse("1969-01-01 00:00:01").getTime)
d2: java.sql.Date = 1969-01-01
scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d1))
res1: