[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027885#comment-17027885 ]
Dongjoon Hyun commented on SPARK-30696:
---------------------------------------

I got the following numbers from the example in the JIRA. I'm wondering why it's 280 in the JIRA description.

{code:java}
scala> sc.version
res5: String = 2.0.2

scala> diff.count
res6: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.1.3

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.2.3

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.3.4

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.4.4

scala> diff.count
res2: Long = 144
{code}

> Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30696
>                 URL: https://issues.apache.org/jira/browse/SPARK-30696
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> Applying to_utc_timestamp() to results of from_utc_timestamp() should return
> the original timestamp in the same time zone.
> In the range of 100 years, the
> combination of functions returns wrong results 280 times out of 1753200:
> {code:java}
> scala> val SECS_PER_YEAR = (36525L * 24 * 60 * 60) / 100
> SECS_PER_YEAR: Long = 31557600
>
> scala> val SECS_PER_MINUTE = 60L
> SECS_PER_MINUTE: Long = 60
>
> scala> val tz = "America/Los_Angeles"
> tz: String = America/Los_Angeles
>
> scala> val df = spark.range(-50 * SECS_PER_YEAR, 50 * SECS_PER_YEAR, 30 * SECS_PER_MINUTE)
> df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>
> scala> val diff = df.select((to_utc_timestamp(from_utc_timestamp($"id".cast("timestamp"), tz), tz).cast("long") - $"id").as("diff")).filter($"diff" !== 0)
> warning: there was one deprecation warning; re-run with -deprecation for details
> diff: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [diff: bigint]
>
> scala> diff.count
> res14: Long = 280
>
> scala> df.count
> res15: Long = 1753200
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
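For context on why a UTC -> local -> UTC roundtrip can fail at all in America/Los_Angeles: at a DST fall-back transition, one hour of local wall-clock time occurs twice, so two distinct UTC instants map to the same local time and the reverse conversion has to pick one offset. The following standalone java.time sketch illustrates the ambiguity; the specific transition date (2019-11-03) is an illustrative assumption, not taken from the ticket, and this does not by itself explain the 280-vs-144 discrepancy.

```scala
import java.time.{LocalDateTime, ZoneId}

object DstAmbiguity {
  def main(args: Array[String]): Unit = {
    val tz = ZoneId.of("America/Los_Angeles")

    // At the 2019 fall-back, clocks go from 02:00 PDT back to 01:00 PST,
    // so the local time 01:30 on 2019-11-03 occurs twice (UTC-7 and UTC-8).
    val repeated = LocalDateTime.of(2019, 11, 3, 1, 30)
    val ambiguous = tz.getRules.getValidOffsets(repeated)
    println(s"valid offsets for $repeated: $ambiguous") // two offsets: ambiguous

    // A local time away from the transition has exactly one valid offset.
    val normal = LocalDateTime.of(2019, 11, 3, 12, 0)
    val unique = tz.getRules.getValidOffsets(normal)
    println(s"valid offsets for $normal: $unique") // one offset: unambiguous
  }
}
```

Any timestamp in the repeated hour that was produced from the "other" offset cannot roundtrip to its original UTC instant, which is consistent with the small, periodic error counts reported above.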