[ https://issues.apache.org/jira/browse/SPARK-30730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030742#comment-17030742 ]
Maxim Gekk commented on SPARK-30730:
------------------------------------

[~srowen] Since Spark 2.2, CAST uses the session time zone. In Spark 2.1, and maybe earlier, Cast invoked DateTimeUtils.stringToTimestamp without a time zone argument, which means the function used the default JVM time zone when the input string did not contain time zone info: [https://github.com/apache/spark/blob/branch-2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L353]. So, in Spark 2.1, the assumption of convertTz() was correct. It seems this is a long-standing regression.

> Wrong results of `convertTz` for different session and system time zones
> ------------------------------------------------------------------------
>
>                 Key: SPARK-30730
>                 URL: https://issues.apache.org/jira/browse/SPARK-30730
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> Currently, DateTimeUtils.convertTz() assumes that timestamp strings are
> cast to TimestampType using the JVM system time zone, but in fact the session
> time zone defined by the SQL config *spark.sql.session.timeZone* is used in
> the casting. This leads to wrong results of from_utc_timestamp and
> to_utc_timestamp when the session time zone differs from the JVM time zone.
> The issue can be reproduced by the code:
> {code:java}
> import java.time.LocalDateTime
> import java.util.TimeZone
>
> test("to_utc_timestamp in various system and session time zones") {
>   val localTs = "2020-02-04T22:42:10"
>   val defaultTz = TimeZone.getDefault
>   try {
>     // Vary the JVM (system) time zone.
>     DateTimeTestUtils.outstandingTimezonesIds.foreach { systemTz =>
>       TimeZone.setDefault(DateTimeUtils.getTimeZone(systemTz))
>       // Vary the session time zone independently of the system one.
>       DateTimeTestUtils.outstandingTimezonesIds.foreach { sessionTz =>
>         withSQLConf(
>           SQLConf.DATETIME_JAVA8API_ENABLED.key -> "true",
>           SQLConf.SESSION_LOCAL_TIMEZONE.key -> sessionTz) {
>           DateTimeTestUtils.outstandingTimezonesIds.foreach { toTz =>
>             // Expected result: the local timestamp interpreted in toTz.
>             val instant = LocalDateTime
>               .parse(localTs)
>               .atZone(DateTimeUtils.getZoneId(toTz))
>               .toInstant
>             val df = Seq(localTs).toDF("localTs")
>             val res = df.select(to_utc_timestamp(col("localTs"), toTz)).first().apply(0)
>             if (instant != res) {
>               println(s"system = $systemTz session = $sessionTz to = $toTz")
>             }
>           }
>         }
>       }
>     }
>   } finally {
>     // Always restore the JVM default time zone, on success or failure.
>     TimeZone.setDefault(defaultTz)
>   }
> }
> {code}
> {code:java}
> system = UTC session = PST to = UTC
> system = UTC session = PST to = PST
> system = UTC session = PST to = CET
> system = UTC session = PST to = Africa/Dakar
> system = UTC session = PST to = America/Los_Angeles
> system = UTC session = PST to = Antarctica/Vostok
> system = UTC session = PST to = Asia/Hong_Kong
> system = UTC session = PST to = Europe/Amsterdam
> ...
> {code}
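
To make the Cast point above concrete, here is a minimal sketch in plain java.time, not Spark code. It only shows that the same zone-less timestamp string resolves to different instants depending on which zone is assumed, which is why convertTz() goes wrong when it assumes the JVM zone while Cast uses the session zone. The object name `SessionVsJvmTimeZone`, the helper `toInstant`, and the two zones are hypothetical and chosen only for illustration.

```scala
import java.time.{Duration, Instant, LocalDateTime, ZoneId}

object SessionVsJvmTimeZone {
  // Hypothetical helper: resolve a zone-less local timestamp string
  // to an instant by interpreting it in the given zone.
  def toInstant(localTs: String, zone: ZoneId): Instant =
    LocalDateTime.parse(localTs).atZone(zone).toInstant

  def main(args: Array[String]): Unit = {
    val localTs = "2020-02-04T22:42:10"
    // Assumption: UTC stands in for the JVM default time zone.
    val jvmZone = ZoneId.of("UTC")
    // Assumption: this zone stands in for spark.sql.session.timeZone.
    val sessionZone = ZoneId.of("America/Los_Angeles")

    val inJvmZone = toInstant(localTs, jvmZone)
    val inSessionZone = toInstant(localTs, sessionZone)

    println(s"interpreted in $jvmZone: $inJvmZone")
    println(s"interpreted in $sessionZone: $inSessionZone")
    // The gap between the two readings is the size of the error that appears
    // when one side assumes the JVM zone and the other uses the session zone.
    println(s"difference: ${Duration.between(inJvmZone, inSessionZone)}")
  }
}
```

With these two zones the readings are 8 hours apart, so any conversion that mixes the two assumptions is off by that amount.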