[ 
https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351464#comment-17351464
 ] 

dc-heros edited comment on SPARK-30696 at 5/26/21, 3:54 AM:
------------------------------------------------------------

fromUTCTime and toUTCTime produce wrong results on days when Daylight Saving Time changes.
For example, in Los Angeles in 1960, the timezone switched from UTC-7 to UTC-8 at 2 AM local time on 1960-09-25, but the previous implementation placed the cutoff at 8 AM.

Because of this, 1960-09-25 01:30:00 in LA corresponds to both 1960-09-25 08:30:00 and 1960-09-25 09:30:00 in UTC, and fromUTCTime just picks one of them, so those functions are wrong only at the cutoff times.
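The overlap can be reproduced outside Spark with plain java.time (a minimal sketch of the timezone rules, not Spark's implementation): on 1960-09-25 the wall-clock time 01:30 in America/Los_Angeles has two valid offsets, and each maps to a different UTC instant.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;
import java.util.List;

public class DstOverlap {
    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        ZoneRules rules = la.getRules();

        // Wall-clock time inside the repeated hour of the 1960 fall-back in LA.
        LocalDateTime ambiguous = LocalDateTime.of(1960, 9, 25, 1, 30);

        // During an overlap there are two valid offsets: -07:00 (PDT) and -08:00 (PST).
        List<ZoneOffset> offsets = rules.getValidOffsets(ambiguous);
        System.out.println(offsets);

        // The same wall-clock time therefore maps to two distinct UTC instants.
        for (ZoneOffset offset : offsets) {
            System.out.println(ambiguous.toInstant(offset));
        }
    }
}
```

Any function that maps this wall-clock time back to UTC has to pick one of the two instants, which is exactly the ambiguity described above.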

Could you edit the description, [~maxgekk]?


was (Author: dc-heros):
fromUTCtime and toUTCtime produced wrong result on Daylight Saving Time changes 
days
For example, in LA in 1960, timezone switch from UTC-7h to UTC-8h at 2AM in 
1960-09-25 but previous version have the cutoff at 8AM

Because of this, for example 1960-09-25 1:30:00 in LA can be equal to both 
1960-09-25 08:30:00 and 1960-09-25 09:30:00, so there just wrong on the cutoff 
time from those function

Could you edit the description [~maxgekk]

> Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30696
>                 URL: https://issues.apache.org/jira/browse/SPARK-30696
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Max Gekk
>            Priority: Major
>
> Applying to_utc_timestamp() to results of from_utc_timestamp() should return 
> the original timestamp in the same time zone. In the range of 100 years, the 
> combination of functions returns wrong results 280 times out of 1753200:
> {code:java}
> scala> val SECS_PER_YEAR = (36525L * 24 * 60 * 60)/100
> SECS_PER_YEAR: Long = 31557600
> scala> val SECS_PER_MINUTE = 60L
> SECS_PER_MINUTE: Long = 60
> scala>  val tz = "America/Los_Angeles"
> tz: String = America/Los_Angeles
> scala> val df = spark.range(-50 * SECS_PER_YEAR, 50 * SECS_PER_YEAR, 30 * 
> SECS_PER_MINUTE)
> df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
> scala> val diff = 
> df.select((to_utc_timestamp(from_utc_timestamp($"id".cast("timestamp"), tz), 
> tz).cast("long") - $"id").as("diff")).filter($"diff" !== 0)
> warning: there was one deprecation warning; re-run with -deprecation for 
> details
> diff: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [diff: bigint]
> scala> diff.count
> res14: Long = 280
> scala> df.count
> res15: Long = 1753200
> {code}
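The failing round trip in the description can be mimicked with java.time (a sketch of the semantics, not Spark's code): from_utc_timestamp renders a UTC instant as local wall-clock time, and to_utc_timestamp reinterprets that wall clock as local time. For an instant inside the DST overlap the wall clock is ambiguous, so the reverse step can land on the other instant.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class RoundTrip {
    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");

        // An instant in the repeated hour (the PST side of the 1960 fall-back).
        Instant original = Instant.parse("1960-09-25T09:30:00Z");

        // from_utc_timestamp-like step: render the instant as LA wall-clock time.
        LocalDateTime wall = LocalDateTime.ofInstant(original, la); // 1960-09-25T01:30

        // to_utc_timestamp-like step: reinterpret the wall clock as LA time.
        // The wall time is ambiguous; java.time resolves it to the earlier
        // offset (-07:00), which is the *other* instant of the overlap.
        Instant roundTripped = wall.atZone(la).toInstant(); // 1960-09-25T08:30:00Z

        System.out.println(original.equals(roundTripped)); // false: one hour lost
    }
}
```

This is the same shape of error the transcript counts: only instants inside such overlaps (280 out of 1753200 half-hour samples) fail the round trip.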



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
