[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557794#comment-17557794 ] Manu Zhang commented on SPARK-30696:
[~maxgekk], any update on this? I found another issue in 3.1.1. DST started at 2:00 am on March 13, yet the following query already treats the timestamp as DST:
{code:java}
select from_utc_timestamp(timestamp'2022-03-13 05:18:29.581', "US/Pacific")
>> 2022-03-12 22:18:29.581
{code}
> Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
> --
>
> Key: SPARK-30696
> URL: https://issues.apache.org/jira/browse/SPARK-30696
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Max Gekk
> Priority: Major
>
> Applying to_utc_timestamp() to results of from_utc_timestamp() should return
> the original timestamp in the same time zone. In the range of 100 years, the
> combination of functions returns wrong results 280 times out of 1753200:
> {code:java}
> scala> val SECS_PER_YEAR = (36525L * 24 * 60 * 60) / 100
> SECS_PER_YEAR: Long = 31557600
>
> scala> val SECS_PER_MINUTE = 60L
> SECS_PER_MINUTE: Long = 60
>
> scala> val tz = "America/Los_Angeles"
> tz: String = America/Los_Angeles
>
> scala> val df = spark.range(-50 * SECS_PER_YEAR, 50 * SECS_PER_YEAR, 30 * SECS_PER_MINUTE)
> df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>
> scala> val diff = df.select((to_utc_timestamp(from_utc_timestamp($"id".cast("timestamp"), tz), tz).cast("long") - $"id").as("diff")).filter($"diff" !== 0)
> warning: there was one deprecation warning; re-run with -deprecation for details
> diff: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [diff: bigint]
>
> scala> diff.count
> res14: Long = 280
>
> scala> df.count
> res15: Long = 1753200
> {code}
--
This message was sent by Atlassian Jira (v8.20.7#820007)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
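For comparison, a quick check with Python's zoneinfo (an independent implementation, not Spark) gives the expected conversion: DST in US/Pacific began at 02:00 local time on 2022-03-13, i.e. 10:00 UTC, so 05:18 UTC should still fall under standard time (UTC-8), not UTC-7 as the query above returns:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The instant from the query above, as a timezone-aware UTC timestamp
utc_ts = datetime(2022, 3, 13, 5, 18, 29, 581000, tzinfo=timezone.utc)

# Convert to US/Pacific; DST only starts at 10:00 UTC that day,
# so the standard offset UTC-8 still applies at 05:18 UTC
local = utc_ts.astimezone(ZoneInfo("US/Pacific"))
print(local)  # 2022-03-12 21:18:29.581000-08:00
```

That is, the correct local time is 21:18, one hour earlier than the 22:18 that Spark 3.1.1 reports.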
[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351464#comment-17351464 ] dc-heros commented on SPARK-30696:
--
fromUTCTime and toUTCTime produce wrong results on days when Daylight Saving Time changes.
For example, in Los Angeles in 1960 the time zone switched from UTC-7 to UTC-8 at 2 AM local time on 1960-09-25, but the previous implementation placed the cutoff at 8 AM. Because of this fall-back, the local time 1960-09-25 01:30:00 in Los Angeles corresponds to both 1960-09-25 08:30:00 UTC and 1960-09-25 09:30:00 UTC, so these functions are only wrong around the cutoff time.
Could you edit the description, [~maxgekk]?
[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351463#comment-17351463 ] Apache Spark commented on SPARK-30696:
--
User 'dgd-contributor' has created a pull request for this issue: https://github.com/apache/spark/pull/32666
[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027975#comment-17027975 ] Maxim Gekk commented on SPARK-30696:
[~dongjoon] We have different default time zones; maybe the result depends on that.
[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027885#comment-17027885 ] Dongjoon Hyun commented on SPARK-30696:
---
I got the following numbers from the example in the JIRA. I'm wondering why it's 280 in the JIRA description.
{code:java}
scala> sc.version
res5: String = 2.0.2

scala> diff.count
res6: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.1.3

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.2.3

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.3.4

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.4.4

scala> diff.count
res2: Long = 144
{code}
[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027883#comment-17027883 ] Dongjoon Hyun commented on SPARK-30696:
---
Thank you for pinging me.
[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027834#comment-17027834 ] Maxim Gekk commented on SPARK-30696:
The issue can be reproduced when DateTimeUtils functions are invoked directly w/o casting:
{code}
var ts = -50 * MICROS_PER_YEAR
val maxTs = 50 * MICROS_PER_YEAR
val step = 30 * MICROS_PER_MINUTE
val tz = "America/Los_Angeles"
var incorrectCount = 0
while (ts <= maxTs) {
  if (toUTCTime(fromUTCTime(ts, tz), tz) != ts) {
    incorrectCount += 1
  }
  ts += step
}
println(incorrectCount)
{code}
[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027830#comment-17027830 ] Maxim Gekk commented on SPARK-30696:
[~dongjoon] FYI