[ https://issues.apache.org/jira/browse/SPARK-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326758#comment-15326758 ]
Josh Rosen commented on SPARK-11415:
------------------------------------

Note that {{Date.valueOf}} appears to be impacted by the system timezone:

{code}
[joshrosen Spark (timezone-bugs-2)]$ scala -Duser.timezone=Europe/Moscow -e 'println(java.sql.Date.valueOf("1970-01-01").getTime)'
-10800000
[joshrosen Spark (timezone-bugs-2)]$ scala -Duser.timezone=America/Los_Angeles -e 'println(java.sql.Date.valueOf("1970-01-01").getTime)'
28800000
{code}

However:

{code}
[joshrosen ~]$ scala -Duser.timezone=Europe/Moscow -e 'println(new java.sql.Date(1451606400000L).getTime)'
1451606400000
[joshrosen ~]$ scala -Duser.timezone=America/Los_Angeles -e 'println(new java.sql.Date(1451606400000L).getTime)'
1451606400000
{code}

> Catalyst DateType Shifts Input Data by Local Timezone
> -----------------------------------------------------
>
>                 Key: SPARK-11415
>                 URL: https://issues.apache.org/jira/browse/SPARK-11415
>             Project: Spark
>          Issue Type: Sub-task
>      Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Russell Alexander Spitzer
>
> I've been running type tests for the Spark Cassandra Connector and couldn't
> get a consistent result for java.sql.Date. I investigated and noticed that the
> following code is used to create Catalyst DateTypes:
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L139-L144
> {code}
> /**
>  * Returns the number of days since epoch from java.sql.Date.
>  */
> def fromJavaDate(date: Date): SQLDate = {
>   millisToDays(date.getTime)
> }
> {code}
> But millisToDays does not abide by this contract; it shifts the underlying
> timestamp to the local timezone before calculating the days from epoch. This
> causes the invocation to move the actual date around.
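The shell output above can be reproduced in plain Java, since {{Date.valueOf}} parses the string as midnight in the JVM's default timezone while the {{new java.sql.Date(millis)}} constructor just stores the UTC instant. A minimal sketch (class and helper names are illustrative; the expected values assume Europe/Moscow was UTC+3 and America/Los_Angeles UTC-8 on 1970-01-01, matching the output quoted above):

```java
import java.sql.Date;
import java.util.TimeZone;

public class DateValueOfDemo {
    // Epoch millis that Date.valueOf("1970-01-01") yields under the given
    // default timezone: valueOf interprets the string as *local* midnight.
    static long epochDateMillis(String tzId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(tzId));
            return Date.valueOf("1970-01-01").getTime();
        } finally {
            TimeZone.setDefault(saved); // restore so other code is unaffected
        }
    }

    public static void main(String[] args) {
        // Moscow midnight 1970-01-01 is 3h before the UTC epoch.
        System.out.println(epochDateMillis("Europe/Moscow"));       // -10800000
        // Los Angeles midnight 1970-01-01 is 8h after the UTC epoch.
        System.out.println(epochDateMillis("America/Los_Angeles")); // 28800000
        // By contrast, the millis constructor stores the instant unchanged
        // regardless of the default timezone.
        System.out.println(new Date(1451606400000L).getTime());     // 1451606400000
    }
}
```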
> {code}
> // we should use the exact day as Int, for example, (year, month, day) -> day
> def millisToDays(millisUtc: Long): SQLDate = {
>   // SPARK-6785: use Math.floor so negative number of days (dates before 1970)
>   // will correctly work as input for function toJavaDate(Int)
>   val millisLocal = millisUtc + threadLocalLocalTimeZone.get().getOffset(millisUtc)
>   Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
> }
> {code}
> The inverse function also incorrectly shifts the timezone:
> {code}
> // reverse of millisToDays
> def daysToMillis(days: SQLDate): Long = {
>   val millisUtc = days.toLong * MILLIS_PER_DAY
>   millisUtc - threadLocalLocalTimeZone.get().getOffset(millisUtc)
> }
> {code}
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93
> This will cause off-by-one-day errors and could cause significant shifts in the
> data if the underlying data is worked on in timezones other than UTC.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org