Maxim Gekk created SPARK-30696:
----------------------------------

             Summary: Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
                 Key: SPARK-30696
                 URL: https://issues.apache.org/jira/browse/SPARK-30696
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.4, 3.0.0
            Reporter: Maxim Gekk
Applying to_utc_timestamp() to the result of from_utc_timestamp() should return the original timestamp in the same time zone. Over a range of 100 years, the combination of the two functions returns wrong results 280 times out of 1753200:

{code:java}
scala> val SECS_PER_YEAR = (36525L * 24 * 60 * 60) / 100
SECS_PER_YEAR: Long = 31557600

scala> val SECS_PER_MINUTE = 60L
SECS_PER_MINUTE: Long = 60

scala> val tz = "America/Los_Angeles"
tz: String = America/Los_Angeles

scala> val df = spark.range(-50 * SECS_PER_YEAR, 50 * SECS_PER_YEAR, 30 * SECS_PER_MINUTE)
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> val diff = df.select((to_utc_timestamp(from_utc_timestamp($"id".cast("timestamp"), tz), tz).cast("long") - $"id").as("diff")).filter($"diff" !== 0)
warning: there was one deprecation warning; re-run with -deprecation for details
diff: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [diff: bigint]

scala> diff.count
res14: Long = 280

scala> df.count
res15: Long = 1753200
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
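A plausible source of these mismatches is daylight-saving transitions: from_utc_timestamp produces a local wall-clock value, and in America/Los_Angeles some wall-clock times never occur (spring-forward gap) while others occur twice (fall-back overlap), so the mapping back to UTC is not invertible at those points. The sketch below uses plain java.time rather than Spark itself, with the 2019 transition dates chosen only for illustration:

```java
import java.time.*;

public class DstRoundTrip {
    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");

        // Spring forward, 2019-03-10: local clocks jump from 02:00 to 03:00,
        // so 02:30 never occurs on a Los Angeles wall clock. java.time
        // resolves the nonexistent time by shifting it forward by the gap.
        LocalDateTime gap = LocalDateTime.of(2019, 3, 10, 2, 30);
        System.out.println(gap.atZone(la));
        // -> 2019-03-10T03:30-07:00[America/Los_Angeles]

        // Fall back, 2019-11-03: local clocks repeat 01:00-02:00, so the
        // wall-clock time 01:30 corresponds to two distinct UTC instants.
        LocalDateTime overlap = LocalDateTime.of(2019, 11, 3, 1, 30);
        Instant first = overlap.atZone(la).withEarlierOffsetAtOverlap().toInstant();
        Instant second = overlap.atZone(la).withLaterOffsetAtOverlap().toInstant();
        System.out.println(first + " vs " + second);
    }
}
```

A round trip through local wall-clock time must pick one resolution in each of these cases, so for the other UTC preimage (or for an instant whose shifted representation lands in a gap) the reconstructed value differs from the input, which would match the small number of failures seen in the 100-year scan above.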