[ https://issues.apache.org/jira/browse/SPARK-30632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023263#comment-17023263 ]

Maxim Gekk edited comment on SPARK-30632 at 1/24/20 9:12 PM:
-------------------------------------------------------------

Spark 2.4 and earlier versions use SimpleDateFormat to parse timestamp strings. 
Unfortunately, that class doesn't support time zones given as region-based IDs 
like "America/Los_Angeles", see 
[https://stackoverflow.com/questions/23242211/java-simpledateformat-parse-timezone-like-america-los-angeles]. 
Spark 3.0 has migrated to DateTimeFormatter, which doesn't have this issue. 
Porting the changes back to Spark 2.4 is risky and would destabilize it, IMHO. 
One of the reasons is that it would require switching the calendar system to the 
Proleptic Gregorian calendar, see https://issues.apache.org/jira/browse/SPARK-26651
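Both points can be reproduced directly against the JDK classes involved, outside of Spark. A minimal Java sketch (the class name CalendarAndZoneDemo is made up for illustration): it shows that SimpleDateFormat's "z" rejects region-based zone IDs while DateTimeFormatter accepts them with the same pattern string, and that the two APIs disagree on the instant of pre-1582 dates because java.util uses a hybrid Julian/Gregorian calendar while java.time uses the proleptic Gregorian one:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;

public class CalendarAndZoneDemo {
    public static void main(String[] args) throws Exception {
        String input = "2019-01-24 11:30:00.123 America/Los_Angeles";

        // 1) SimpleDateFormat's 'z' matches zone *names* (e.g. "PST",
        //    "Pacific Standard Time") but not region-based zone IDs like
        //    "America/Los_Angeles", so parsing fails.
        boolean legacyZoneFailed = false;
        SimpleDateFormat legacy = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS z");
        try {
            legacy.parse(input);
        } catch (ParseException e) {
            legacyZoneFailed = true;
        }
        System.out.println("SimpleDateFormat failed on zone ID: " + legacyZoneFailed);

        // 2) DateTimeFormatter's 'z' accepts zone IDs as well, so the very
        //    same pattern string parses successfully.
        DateTimeFormatter modernFmt =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS z");
        ZonedDateTime zdt = ZonedDateTime.parse(input, modernFmt);
        System.out.println("DateTimeFormatter zone: " + zdt.getZone());

        // 3) Calendar systems: java.text/java.util apply the Julian calendar
        //    before the 1582 cutover, java.time applies the proleptic
        //    Gregorian calendar everywhere, so the two resolve an ancient
        //    date string like "1000-01-01" to different instants.
        SimpleDateFormat legacyDate = new SimpleDateFormat("yyyy-MM-dd");
        legacyDate.setTimeZone(TimeZone.getTimeZone("UTC"));
        long legacyMillis = legacyDate.parse("1000-01-01").getTime();
        long modernMillis = LocalDate.parse("1000-01-01")
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        System.out.println("Millis differ for 1000-01-01: "
                + (legacyMillis != modernMillis));
    }
}
```

This is why the same pattern "yyyy-MM-dd HH:mm:ss.SSS z" behaves differently in Spark 2.4 (SimpleDateFormat-based) and Spark 3.0 (DateTimeFormatter-based), and why a backport would also drag in the calendar change tracked in SPARK-26651.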


> to_timestamp() doesn't work with certain timezones
> --------------------------------------------------
>
>                 Key: SPARK-30632
>                 URL: https://issues.apache.org/jira/browse/SPARK-30632
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0, 2.4.4
>            Reporter: Anton Daitche
>            Priority: Major
>
> It seems that to_timestamp() doesn't work with timezones of the type 
> <Country>/<City>, e.g. America/Los_Angeles.
> The code
> {code:scala}
> import org.apache.spark.sql.functions.{concat_ws, to_timestamp}
> import spark.implicits._  // toDF and $"col"; assumes an active SparkSession named spark
> 
> val df = Seq(
>     ("2019-01-24 11:30:00.123", "America/Los_Angeles"), 
>     ("2020-01-01 01:30:00.123", "PST")
> ).toDF("ts_str", "tz_name")
> val ts_parsed = to_timestamp(
>     concat_ws(" ", $"ts_str", $"tz_name"), "yyyy-MM-dd HH:mm:ss.SSS z"
> ).as("timestamp")
> df.select(ts_parsed).show(false)
> {code}
> prints
> {code}
> +-------------------+
> |timestamp          |
> +-------------------+
> |null               |
> |2020-01-01 10:30:00|
> +-------------------+
> {code}
> So, the datetime string with timezone PST is properly parsed, whereas the one 
> with America/Los_Angeles is converted to null. According to 
> [this|https://github.com/apache/spark/pull/24195#issuecomment-578055146] 
> response on GitHub, this code works when run on a recent master build. 
> See also the discussion in 
> [this|https://github.com/apache/spark/pull/24195#issue] issue for more 
> context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
