[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

2022-06-22 Thread Manu Zhang (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557794#comment-17557794 ]

Manu Zhang commented on SPARK-30696:


[~maxgekk], any update on this?

I found another issue in 3.1.1. DST started at 2:00 AM local time on March 13, 2022 (10:00 UTC), yet the following query already treats this timestamp as DST even though 05:18 UTC falls before the transition; the expected result is 2022-03-12 21:18:29.581 (PST, UTC-8):
{code:java}
select from_utc_timestamp(timestamp'2022-03-13 05:18:29.581', "US/Pacific")
>> 2022-03-12 22:18:29.581 {code}
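This can be cross-checked against the JDK's tz rules directly. The following is a minimal java.time sketch (java.time is the API Spark 3.x's date-time handling is built on, though this is not Spark's code path): the instant precedes the 10:00 UTC transition, so the rules still give standard time.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class DstCheck {
    public static void main(String[] args) {
        // The UTC instant from the query above
        Instant utc = Instant.parse("2022-03-13T05:18:29.581Z");
        ZoneId la = ZoneId.of("US/Pacific");

        // DST in 2022 began at 2:00 AM PST on March 13, i.e. 10:00 UTC;
        // 05:18 UTC precedes it, so the zone rules still yield UTC-8
        ZoneOffset offset = la.getRules().getOffset(utc);
        System.out.println(offset);                            // -08:00
        System.out.println(utc.atZone(la).toLocalDateTime());  // 2022-03-12T21:18:29.581
    }
}
```

In other words, the query above should have returned 2022-03-12 21:18:29.581 rather than 22:18:29.581.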

> Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
> --
>
> Key: SPARK-30696
> URL: https://issues.apache.org/jira/browse/SPARK-30696
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Max Gekk
>Priority: Major
>
> Applying to_utc_timestamp() to the results of from_utc_timestamp() should return
> the original timestamp in the same time zone. Over a range of 100 years, the
> combination of these functions returns wrong results 280 times out of 1753200:
> {code:java}
> scala> val SECS_PER_YEAR = (36525L * 24 * 60 * 60)/100
> SECS_PER_YEAR: Long = 31557600
> scala> val SECS_PER_MINUTE = 60L
> SECS_PER_MINUTE: Long = 60
> scala> val tz = "America/Los_Angeles"
> tz: String = America/Los_Angeles
> scala> val df = spark.range(-50 * SECS_PER_YEAR, 50 * SECS_PER_YEAR, 30 * SECS_PER_MINUTE)
> df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
> scala> val diff = df.select((to_utc_timestamp(from_utc_timestamp($"id".cast("timestamp"), tz), tz).cast("long") - $"id").as("diff")).filter($"diff" !== 0)
> warning: there was one deprecation warning; re-run with -deprecation for details
> diff: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [diff: bigint]
> scala> diff.count
> res14: Long = 280
> scala> df.count
> res15: Long = 1753200
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

2021-05-25 Thread dc-heros (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351464#comment-17351464 ]

dc-heros commented on SPARK-30696:
--

from_utc_timestamp and to_utc_timestamp produce wrong results on days when Daylight Saving Time changes.
For example, in Los Angeles in 1960 the offset switched from UTC-7 to UTC-8 at 2:00 AM on 1960-09-25, but the previous implementation placed the cutoff at 8:00 AM.
Moreover, a local time such as 1960-09-25 01:30:00 in Los Angeles is ambiguous: it corresponds to both 1960-09-25 08:30:00 UTC and 1960-09-25 09:30:00 UTC, so these functions are necessarily wrong around the cutoff time.

Could you edit the description, [~maxgekk]?
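The overlap can be seen directly from the JDK's tz database rules, independent of Spark. A minimal java.time sketch:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class OverlapDemo {
    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        // 1:30 AM on the 1960 fall-back day in Los Angeles
        LocalDateTime ambiguous = LocalDateTime.of(1960, 9, 25, 1, 30);

        // During the overlap, two offsets are valid for the same wall-clock time
        System.out.println(la.getRules().getValidOffsets(ambiguous)); // [-07:00, -08:00]

        // Each offset maps the same local time to a different UTC instant
        Instant first  = ambiguous.toInstant(ZoneOffset.ofHours(-7)); // 1960-09-25T08:30:00Z
        Instant second = ambiguous.toInstant(ZoneOffset.ofHours(-8)); // 1960-09-25T09:30:00Z
        System.out.println(first + " / " + second);
    }
}
```

Any function mapping the local time 01:30 back to UTC has to pick one of the two offsets, so one of the two original instants cannot round-trip.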




[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

2021-05-25 Thread Apache Spark (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351463#comment-17351463 ]

Apache Spark commented on SPARK-30696:
--

User 'dgd-contributor' has created a pull request for this issue:
https://github.com/apache/spark/pull/32666




[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

2020-01-31 Thread Maxim Gekk (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027975#comment-17027975 ]

Maxim Gekk commented on SPARK-30696:


[~dongjoon] We have different default session time zones; the count may depend on that.




[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

2020-01-31 Thread Dongjoon Hyun (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027885#comment-17027885 ]

Dongjoon Hyun commented on SPARK-30696:
---

I got the following numbers from the example in the JIRA. I'm wondering why 
it's 280 in the JIRA description.
{code:java}
scala> sc.version
res5: String = 2.0.2

scala> diff.count
res6: Long = 144
{code}

{code}
scala> sc.version
res1: String = 2.1.3

scala> diff.count
res2: Long = 144
{code} 

{code}
scala> sc.version
res1: String = 2.2.3

scala> diff.count
res2: Long = 144
{code}

{code}
scala> sc.version
res1: String = 2.3.4

scala> diff.count
res2: Long = 144
{code}

{code}
scala> sc.version
res1: String = 2.4.4

scala> diff.count
res2: Long = 144
{code}
 




[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

2020-01-31 Thread Dongjoon Hyun (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027883#comment-17027883 ]

Dongjoon Hyun commented on SPARK-30696:
---

Thank you for pinging me.




[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

2020-01-31 Thread Maxim Gekk (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027834#comment-17027834 ]

Maxim Gekk commented on SPARK-30696:


The issue can also be reproduced when the DateTimeUtils functions are invoked directly, without casts:
{code}
var ts = -50 * MICROS_PER_YEAR
val maxTs = 50 * MICROS_PER_YEAR
val step = 30 * MICROS_PER_MINUTE
val tz = "America/Los_Angeles"
var incorrectCount = 0
while (ts <= maxTs) {
  if (toUTCTime(fromUTCTime(ts, tz), tz) != ts) {
incorrectCount += 1
  }
  ts += step
}
println(incorrectCount)
{code}
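For readers without a Spark build at hand, here is a rough java.time analogue of a single failing round trip (not Spark's exact code path: from_utc_timestamp/to_utc_timestamp are approximated by reinterpreting wall-clock times, and java.time resolves ambiguous local times to the earlier offset by default). Every instant that lands in the second hour of a fall-back overlap comes back one hour off:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class RoundTrip {
    public static void main(String[] args) {
        ZoneId tz = ZoneId.of("America/Los_Angeles");

        // An instant in the second pass through 1:30 AM on 1960-09-25 (PST, UTC-8)
        Instant original = Instant.parse("1960-09-25T09:30:00Z");

        // "from UTC": render the instant as a local wall-clock time in tz
        LocalDateTime local = LocalDateTime.ofInstant(original, tz);

        // "to UTC": map the wall-clock time back to an instant; the ambiguous
        // 01:30 resolves to the earlier (PDT, -07:00) offset, losing an hour
        Instant roundTripped = local.atZone(tz).toInstant();

        System.out.println(original);      // 1960-09-25T09:30:00Z
        System.out.println(roundTripped);  // 1960-09-25T08:30:00Z
    }
}
```

Stepping such a round trip over a century in 30-minute increments, as the loop above does, counts exactly these overlap instants.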




[jira] [Commented] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

2020-01-31 Thread Maxim Gekk (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027830#comment-17027830 ]

Maxim Gekk commented on SPARK-30696:


[~dongjoon] FYI
