[ https://issues.apache.org/jira/browse/SPARK-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davies Liu reassigned SPARK-15613:
----------------------------------

    Assignee: Davies Liu

> Incorrect days to millis conversion
> ------------------------------------
>
>                 Key: SPARK-15613
>                 URL: https://issues.apache.org/jira/browse/SPARK-15613
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.6.0, 2.0.0
>         Environment: java version "1.8.0_91"
>            Reporter: Dmitry Bushev
>            Assignee: Davies Liu
>            Priority: Critical
>
> There is an issue with the {{DateTimeUtils.daysToMillis}} implementation. It
> affects {{DateTimeUtils.toJavaDate}} and ultimately CatalystTypeConverter,
> i.e. the conversion of a date stored as {{Int}} days from epoch in an
> InternalRow to the {{java.sql.Date}} of the Row returned to the user.
> The issue can be reproduced with this test (all the following tests are in
> my default timezone, Europe/Moscow):
> {code}
> $ sbt -Duser.timezone=Europe/Moscow catalyst/console
> scala> java.util.Calendar.getInstance().getTimeZone
> res0: java.util.TimeZone = sun.util.calendar.ZoneInfo[id="Europe/Moscow",offset=10800000,dstSavings=0,useDaylight=false,transitions=79,lastRule=null]
> scala> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
> scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
> res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, 14332, 14696, 15060)
> {code}
> For example, for day {{4108}} of the epoch, the correct date should be
> {{1981-04-01}}:
> {code}
> scala> DateTimeUtils.toJavaDate(4107)
> res25: java.sql.Date = 1981-03-31
> scala> DateTimeUtils.toJavaDate(4108)
> res26: java.sql.Date = 1981-03-31
> scala> DateTimeUtils.toJavaDate(4109)
> res27: java.sql.Date = 1981-04-02
> {code}
> There was a previous unsuccessful attempt to work around the problem in
> SPARK-11415. It seems that the issue involves flaws in the Java date
> implementation, and I don't see how it can be fixed without third-party
> libraries.
> I was not able to identify the library of choice for Spark. The following
> implementation uses [JSR-310|http://www.threeten.org/]:
> {code}
> def millisToDays(millisUtc: Long): SQLDate = {
>   val instant = Instant.ofEpochMilli(millisUtc)
>   val zonedDateTime = instant.atZone(ZoneId.systemDefault)
>   zonedDateTime.toLocalDate.toEpochDay.toInt
> }
>
> def daysToMillis(days: SQLDate): Long = {
>   val localDate = LocalDate.ofEpochDay(days)
>   val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault)
>   zonedDateTime.toInstant.toEpochMilli
> }
> {code}
> It produces correct results:
> {code}
> scala> for (days <- 0 to 20000 if millisToDays(daysToMillis(days)) != days) yield days
> res37: scala.collection.immutable.IndexedSeq[Int] = Vector()
> scala> new java.sql.Date(daysToMillis(4108))
> res36: java.sql.Date = 1981-04-01
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
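[Editor's note, not part of the original thread] The report's proposed fix translates directly to plain Java, and a standalone check helps show *why* day 4108 is broken: it falls on 1981-04-01, a date on which Europe/Moscow moved its clocks forward at midnight, so local midnight of that day never existed. The sketch below pins the zone to Europe/Moscow (rather than {{ZoneId.systemDefault}}, as in the report) purely to make the demonstration reproducible; the class and method names are illustrative, not Spark's.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DaysToMillisCheck {
    // Pinned zone so the demo does not depend on -Duser.timezone.
    static final ZoneId MOSCOW = ZoneId.of("Europe/Moscow");

    // Java equivalent of the Scala daysToMillis proposed in the report.
    static long daysToMillis(int days) {
        return LocalDate.ofEpochDay(days)
                .atStartOfDay(MOSCOW)
                .toInstant()
                .toEpochMilli();
    }

    // Java equivalent of the Scala millisToDays proposed in the report.
    static int millisToDays(long millisUtc) {
        return (int) Instant.ofEpochMilli(millisUtc)
                .atZone(MOSCOW)
                .toLocalDate()
                .toEpochDay();
    }

    public static void main(String[] args) {
        LocalDate day4108 = LocalDate.ofEpochDay(4108);
        System.out.println(day4108); // 1981-04-01

        // Clocks in Moscow jumped from 00:00 to 01:00 on this date, so
        // java.time resolves "start of day" to 01:00 instead of the
        // nonexistent midnight. That gap is what trips up the old
        // Calendar-based arithmetic in DateTimeUtils.
        ZonedDateTime startOfDay = day4108.atStartOfDay(MOSCOW);
        System.out.println(startOfDay.toLocalTime());

        // With gap-aware resolution, the round trip is stable for the
        // first few days the report flagged as broken.
        int[] flagged = {4108, 4473, 4838, 5204, 5568, 5932};
        for (int days : flagged) {
            if (millisToDays(daysToMillis(days)) != days) {
                throw new AssertionError("round trip failed for day " + days);
            }
        }
        System.out.println("round trip OK");
    }
}
```

The key design point is that {{LocalDate.atStartOfDay(ZoneId)}} is specified to move forward past a DST gap, so the resulting instant always lands inside the intended local day, whereas naive `days * 86_400_000 + offset` arithmetic can land in the previous one.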