[ https://issues.apache.org/jira/browse/SPARK-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478438#comment-15478438 ]
Dean Wampler edited comment on SPARK-16239 at 9/9/16 10:20 PM:
---------------------------------------------------------------

I investigated this a bit today for a customer. I could not reproduce the bug on Mac OS X, Ubuntu, or RedHat releases with kernels 3.10.0-327.el7.x86_64 and 2.6.32-504.8.1.el6.x86_64, using Amazon AMIs. My customer has a private cloud environment with kernel 2.6.32-504.50.1.el6.x86_64 where he does see the bug, so I suspect it is something very specific to his cloud VM configuration, such as a buggy library. In all cases we used this JVM:

{code}
$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
{code}

My point is that we should narrow down whether this is really a Spark bug or a bug in the underlying platform. For reference, here's the code example we used in his environment and in my test environments (some output suppressed):

{code}
scala> sqlContext.udf.register("to_date", (s: String) =>
     |   new java.sql.Date(new java.text.SimpleDateFormat("yyyy-MM-dd").parse(s).getTime()))

scala> val dates = (0 to 5).map(i => s"1949-11-${25+i}")

scala> val df = sc.parallelize(dates).toDF("date")

scala> df.show
+----------+
|      date|
+----------+
|1949-11-25|
|1949-11-26|
|1949-11-27|
|1949-11-28|
|1949-11-29|
|1949-11-30|
+----------+

scala> val df2 = df.select(to_date($"date"))

scala> df2.show
+------------+
|todate(date)|
+------------+
|  1949-11-25|
|  1949-11-26|
|  1949-11-27|   // <--- my customer sees 1949-11-26 here
|  1949-11-28|
|  1949-11-29|
|  1949-11-30|
+------------+
{code}

If I'm right that this isn't really a Spark bug, then the following should be sufficient to demonstrate it in the Spark shell or a Scala interpreter of the same version:

{code}
scala> val f = (s: String) =>
     |   new java.sql.Date(new java.text.SimpleDateFormat("yyyy-MM-dd").parse(s).getTime())

scala> val d = f("1949-11-27")
d: java.sql.Date = 1949-11-27
{code}
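To help narrow this down with Spark out of the loop entirely, here is a small diagnostic sketch of my own (not something we have run on his failing VM) that could be pasted into a bare Scala REPL on both a good host and the bad host. If the printed millis or the rendered java.sql.Date differ between hosts for the same input and the same default time zone, the problem is below Spark:

{code}
// Diagnostic sketch (mine, untested on the failing environment):
// show how the bare JVM resolves these dates in the default time zone.
import java.text.SimpleDateFormat
import java.util.TimeZone

val tz = TimeZone.getDefault
println(s"default tz=${tz.getID}, rawOffset=${tz.getRawOffset} ms, dstSavings=${tz.getDSTSavings} ms")

val fmt = new SimpleDateFormat("yyyy-MM-dd")   // uses the default time zone, like the UDF above
for (s <- Seq("1949-11-25", "1949-11-26", "1949-11-27", "1949-11-28")) {
  val millis = fmt.parse(s).getTime            // epoch millis of local midnight for that date
  val back   = new java.sql.Date(millis)       // Date.toString renders it back in the default tz
  println(s"$s -> millis=$millis -> java.sql.Date=$back")
}
{code}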
> SQL issues with cast from date to string around daylight savings time
> ----------------------------------------------------------------------
>
>                  Key: SPARK-16239
>                  URL: https://issues.apache.org/jira/browse/SPARK-16239
>              Project: Spark
>           Issue Type: Bug
>           Components: SQL
>     Affects Versions: 1.6.1
>             Reporter: Glen Maisey
>             Priority: Critical
>
> Hi all,
>
> I have a dataframe with a date column. When I cast it to a string using the Spark SQL cast function, it converts to the wrong date on certain days. Looking into it, this happens once a year, when summer daylight savings starts. I've tried to show the issue in the code below: the toString() function works correctly, whereas the cast does not.
>
> Unfortunately my users write SQL rather than Scala dataframes, so that workaround does not apply. This was actually picked up where a user wrote something like "SELECT date1 UNION ALL SELECT date2", where date1 was a string and date2 was a date type. It must be implicitly converting the date to a string, which triggers the error.
> I'm in the Australia/Sydney time zone (see the time changes here: http://www.timeanddate.com/time/zone/australia/sydney).
>
> {code}
> val dates = Array("2014-10-03", "2014-10-04", "2014-10-05", "2014-10-06",
>                   "2015-10-02", "2015-10-03", "2015-10-04", "2015-10-05")
>
> val df = sc.parallelize(dates)
>   .toDF("txn_date")
>   .select(col("txn_date").cast("Date"))
>
> df.select(
>     col("txn_date"),
>     col("txn_date").cast("Timestamp").alias("txn_date_timestamp"),
>     col("txn_date").cast("String").alias("txn_date_str_cast"),
>     col("txn_date".toString()).alias("txn_date_str_toString")
>   )
>   .show()
>
> +----------+--------------------+-----------------+---------------------+
> |  txn_date|  txn_date_timestamp|txn_date_str_cast|txn_date_str_toString|
> +----------+--------------------+-----------------+---------------------+
> |2014-10-03|2014-10-02 14:00:...|       2014-10-03|           2014-10-03|
> |2014-10-04|2014-10-03 14:00:...|       2014-10-04|           2014-10-04|
> |2014-10-05|2014-10-04 13:00:...|       2014-10-04|           2014-10-05|
> |2014-10-06|2014-10-05 13:00:...|       2014-10-06|           2014-10-06|
> |2015-10-02|2015-10-01 14:00:...|       2015-10-02|           2015-10-02|
> |2015-10-03|2015-10-02 14:00:...|       2015-10-03|           2015-10-03|
> |2015-10-04|2015-10-03 13:00:...|       2015-10-03|           2015-10-04|
> |2015-10-05|2015-10-04 13:00:...|       2015-10-05|           2015-10-05|
> +----------+--------------------+-----------------+---------------------+
> {code}
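For what it's worth, here is a small, Spark-free illustration (my own sketch, not taken from the report and not Spark's actual cast code path) of why the Sydney DST-start day is awkward for date/timestamp round-trips: the local day 2014-10-05 is only 23 hours long, so any conversion that assumes one fixed UTC offset, or a 24-hour day, for the whole day can land an hour early and, once truncated back to a date, on 2014-10-04:

{code}
// Illustration only (my sketch): the UTC offset for Australia/Sydney changes during 2014-10-05.
import java.text.SimpleDateFormat
import java.util.TimeZone

val sydney = TimeZone.getTimeZone("Australia/Sydney")
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm")
fmt.setTimeZone(sydney)

// DST starts at 02:00 local on 2014-10-05, so midnight and noon carry different offsets.
val midnight = fmt.parse("2014-10-05 00:00").getTime   // still AEST, UTC+10
val noon     = fmt.parse("2014-10-05 12:00").getTime   // already AEDT, UTC+11
println(s"offset at local midnight: ${sydney.getOffset(midnight) / 3600000.0} h")          // 10.0
println(s"offset at local noon:     ${sydney.getOffset(noon) / 3600000.0} h")              // 11.0
println(s"elapsed hours, local midnight to local noon: ${(noon - midnight) / 3600000.0}")  // 11.0, not 12

// Stepping back 12 clock-hours from local noon (i.e. assuming a 24-hour day) lands on the
// previous local date -- the same off-by-one the cast shows above.
println(fmt.format(new java.util.Date(noon - 12L * 60 * 60 * 1000)))   // 2014-10-04 23:00
{code}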