[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027885#comment-17027885 ]
Dongjoon Hyun commented on SPARK-30696:
---------------------------------------

I got the following numbers from the example in the JIRA. I'm wondering why it's 280 in the JIRA description.

{code:java}
scala> sc.version
res5: String = 2.0.2

scala> diff.count
res6: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.1.3

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.2.3

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.3.4

scala> diff.count
res2: Long = 144
{code}
{code}
scala> sc.version
res1: String = 2.4.4

scala> diff.count
res2: Long = 144
{code}

> Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30696
>                 URL: https://issues.apache.org/jira/browse/SPARK-30696
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> Applying to_utc_timestamp() to results of from_utc_timestamp() should return
> the original timestamp in the same time zone.
> In the range of 100 years, the
> combination of functions returns wrong results 280 times out of 1753200:
> {code:java}
> scala> val SECS_PER_YEAR = (36525L * 24 * 60 * 60) / 100
> SECS_PER_YEAR: Long = 31557600
>
> scala> val SECS_PER_MINUTE = 60L
> SECS_PER_MINUTE: Long = 60
>
> scala> val tz = "America/Los_Angeles"
> tz: String = America/Los_Angeles
>
> scala> val df = spark.range(-50 * SECS_PER_YEAR, 50 * SECS_PER_YEAR, 30 * SECS_PER_MINUTE)
> df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>
> scala> val diff = df.select((to_utc_timestamp(from_utc_timestamp($"id".cast("timestamp"), tz), tz).cast("long") - $"id").as("diff")).filter($"diff" !== 0)
> warning: there was one deprecation warning; re-run with -deprecation for details
> diff: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [diff: bigint]
>
> scala> diff.count
> res14: Long = 280
>
> scala> df.count
> res15: Long = 1753200
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
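For context on why a UTC -> local -> UTC roundtrip can fail at all in America/Los_Angeles: at a DST fall-back transition, one hour of local wall-clock time occurs twice, so two distinct UTC instants map to the same local time and the reverse conversion has to pick one offset. The following standalone java.time sketch illustrates the ambiguity; the specific transition date (2019-11-03) is an illustrative assumption, not taken from the ticket, and this does not by itself explain the 280-vs-144 discrepancy.

```scala
import java.time.{LocalDateTime, ZoneId}

object DstAmbiguity {
  def main(args: Array[String]): Unit = {
    val tz = ZoneId.of("America/Los_Angeles")

    // At the 2019 fall-back, clocks go from 02:00 PDT back to 01:00 PST,
    // so the local time 01:30 on 2019-11-03 occurs twice (UTC-7 and UTC-8).
    val repeated = LocalDateTime.of(2019, 11, 3, 1, 30)
    val ambiguous = tz.getRules.getValidOffsets(repeated)
    println(s"valid offsets for $repeated: $ambiguous") // two offsets: ambiguous

    // A local time away from the transition has exactly one valid offset.
    val normal = LocalDateTime.of(2019, 11, 3, 12, 0)
    val unique = tz.getRules.getValidOffsets(normal)
    println(s"valid offsets for $normal: $unique") // one offset: unambiguous
  }
}
```

Any timestamp in the repeated hour that was produced from the "other" offset cannot roundtrip to its original UTC instant, which is consistent with the small, periodic error counts reported above.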