[ https://issues.apache.org/jira/browse/SPARK-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981968#comment-14981968 ]
Russell Alexander Spitzer edited comment on SPARK-11415 at 10/30/15 6:20 AM:
-----------------------------------------------------------------------------
Some tests are now broken; investigating. The fix in SPARK-6785 seems off to me. In it, 1 second before epoch and 1 second after epoch are 1 day apart. This should not be true: they should both be equivalently far (in days) from epoch 0. Actually, I'm not sure about this now...

was (Author: rspitzer):
Some tests are now broken; investigating. The fix in SPARK-6785 seems off to me. In it, 1 second before epoch and 1 second after epoch are 1 day apart. This should not be true: they should both be equivalently far (in days) from epoch 0.

> Catalyst DateType Shifts Input Data by Local Timezone
> -----------------------------------------------------
>
>                 Key: SPARK-11415
>                 URL: https://issues.apache.org/jira/browse/SPARK-11415
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Russell Alexander Spitzer
>
> I've been running type tests for the Spark Cassandra Connector and couldn't
> get a consistent result for java.sql.Date. I investigated and noticed the
> following code is used to create Catalyst DateTypes:
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L139-L144
> {code}
> /**
>  * Returns the number of days since epoch from java.sql.Date.
>  */
> def fromJavaDate(date: Date): SQLDate = {
>   millisToDays(date.getTime)
> }
> {code}
> But millisToDays does not abide by this contract, shifting the underlying
> timestamp to the local timezone before calculating the days from epoch. This
> causes the invocation to move the actual date around.
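The "1 second before epoch vs. 1 second after epoch" observation in the edited comment above can be checked with plain floor arithmetic. This is a hypothetical standalone sketch (EpochFloorCheck and millisToDaysUtc are illustrative names, not Spark code) that divides UTC millis by the day length the way millisToDays does, but with no timezone offset at all:

```java
// Hypothetical sketch (not Spark code): days-from-epoch via Math.floor,
// in pure UTC, to check the "1 day apart" claim from the comment above.
public class EpochFloorCheck {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Math.floor-based days-from-epoch, as in millisToDays but without
    // any timezone offset applied.
    static int millisToDaysUtc(long millisUtc) {
        return (int) Math.floor((double) millisUtc / MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        // 1969-12-31T23:59:59Z vs 1970-01-01T00:00:01Z
        System.out.println(millisToDaysUtc(-1000L)); // -1
        System.out.println(millisToDaysUtc(1000L));  // 0
    }
}
```

With floor semantics, -1000 ms lands on day -1 and +1000 ms on day 0, so the two instants are one day apart even before any timezone handling enters the picture; they do fall on different calendar days, which may be why the commenter backed off the claim.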
> {code}
> // we should use the exact day as Int, for example, (year, month, day) -> day
> def millisToDays(millisUtc: Long): SQLDate = {
>   // SPARK-6785: use Math.floor so negative number of days (dates before 1970)
>   // will correctly work as input for function toJavaDate(Int)
>   val millisLocal = millisUtc + threadLocalLocalTimeZone.get().getOffset(millisUtc)
>   Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
> }
> {code}
> The inverse function also incorrectly shifts the timezone:
> {code}
> // reverse of millisToDays
> def daysToMillis(days: SQLDate): Long = {
>   val millisUtc = days.toLong * MILLIS_PER_DAY
>   millisUtc - threadLocalLocalTimeZone.get().getOffset(millisUtc)
> }
> {code}
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93
> This will cause off-by-one errors and could cause significant shifts in data
> if the underlying data is worked on in different timezones than UTC.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
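The shift described in the issue can be reproduced outside Spark with a minimal replica. DateShiftDemo below is a hypothetical sketch, not the Spark source: it mirrors millisToDays/daysToMillis but takes the TimeZone as a parameter so the offset's effect is explicit:

```java
import java.util.TimeZone;

// Hypothetical standalone replica (not the Spark source) of the
// millisToDays / daysToMillis logic, with the timezone passed explicitly.
public class DateShiftDemo {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Adds the zone's offset before flooring, as the Spark snippet does.
    static int millisToDays(long millisUtc, TimeZone tz) {
        long millisLocal = millisUtc + tz.getOffset(millisUtc);
        return (int) Math.floor((double) millisLocal / MILLIS_PER_DAY);
    }

    // Subtracts the zone's offset on the way back, as the Spark snippet does.
    static long daysToMillis(int days, TimeZone tz) {
        long millisUtc = (long) days * MILLIS_PER_DAY;
        return millisUtc - tz.getOffset(millisUtc);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        TimeZone la  = TimeZone.getTimeZone("America/Los_Angeles"); // UTC-8 at epoch
        long midnightUtc = 0L; // 1970-01-01T00:00:00Z
        System.out.println(millisToDays(midnightUtc, utc)); // 0
        System.out.println(millisToDays(midnightUtc, la));  // -1
    }
}
```

Epoch midnight UTC maps to day 0, but viewed through a UTC-8 zone the very same instant maps to day -1: this is the off-by-one-day shift the issue describes when data is written and read in different timezones.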