[jira] [Commented] (SPARK-30668) to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031860#comment-17031860 ]

Xiao Li commented on SPARK-30668:

I think this is still not resolved. Spark 3.0 should not silently return a wrong result for a query whose pattern was correct in previous versions. I did not see the fallback mentioned by [~cloud_fan].

> to_timestamp failed to parse 2020-01-27T20:06:11.847-0800 using pattern "yyyy-MM-dd'T'HH:mm:ss.SSSz"
>
>                 Key: SPARK-30668
>                 URL: https://issues.apache.org/jira/browse/SPARK-30668
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Assignee: Maxim Gekk
>            Priority: Blocker
>             Fix For: 3.0.0
>
> {code:java}
> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", "yyyy-MM-dd'T'HH:mm:ss.SSSz")
> {code}
> This can return a valid value in Spark 2.4 but returns NULL in the latest master.
> **2.4.5 RC2**
> {code}
> scala> sql("""SELECT to_timestamp("2020-01-27T20:06:11.847-0800", "yyyy-MM-dd'T'HH:mm:ss.SSSz")""").show
> +----------------------------------------------------------------------------+
> |to_timestamp('2020-01-27T20:06:11.847-0800', 'yyyy-MM-dd\'T\'HH:mm:ss.SSSz')|
> +----------------------------------------------------------------------------+
> |                                                         2020-01-27 20:06:11|
> +----------------------------------------------------------------------------+
> {code}
> **2.2.3 ~ 2.4.4** (2.0.2 ~ 2.1.3 doesn't have `to_timestamp`).
> {code}
> spark-sql> SELECT to_timestamp("2020-01-27T20:06:11.847-0800", "yyyy-MM-dd'T'HH:mm:ss.SSSz");
> 2020-01-27 20:06:11
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028748#comment-17028748 ]

Wenchen Fan commented on SPARK-30668:

[~maxgekk] can we do them one by one? The SimpleDateFormat fallback is well justified by the examples in this ticket. We should look at the other 2 fallbacks closely as well.
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028731#comment-17028731 ]

Maxim Gekk commented on SPARK-30668:

Behind the removed config _spark.sql.legacy.timeParser.enabled_ there are, besides SimpleDateFormat, two more fallbacks to behaviors dating back to Spark 1.5 (see LegacyFallbackDateFormatter):
1. *s.toInt* - in Spark 1.5.0, we stored dates as the number of days since the epoch, written as a string, so we just convert it to Int.
2. *DateTimeUtils.millisToDays(DateTimeUtils.stringToTime(s).getTime)* - the approach used in 2.0 and 1.x.
3. FastDateFormat or *SimpleDateFormat*.

Should we allow users to switch to SimpleDateFormat only, or to the other legacy approaches too?
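A minimal Scala sketch of how such a chain of legacy fallbacks could be wired together for dates. The object `LegacyDateChain`, the order of the attempts, and the use of `LocalDate.parse` as a stand-in for `DateTimeUtils.stringToTime` are all assumptions made for illustration. This is not the code of LegacyFallbackDateFormatter.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.util.TimeZone
import scala.util.Try

// Hypothetical chain of legacy date parsers (not Spark's LegacyFallbackDateFormatter).
// The order of attempts and the ISO stand-in for DateTimeUtils.stringToTime are assumptions.
object LegacyDateChain {
  // Returns days since 1970-01-01, or None if every attempt fails.
  def parseDays(s: String, pattern: String): Option[Int] = {
    // SimpleDateFormat with the user-supplied pattern, evaluated in UTC.
    def viaSimpleDateFormat: Try[Int] = Try {
      val fmt = new SimpleDateFormat(pattern)
      fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
      Math.floorDiv(fmt.parse(s).getTime, 86400000L).toInt
    }
    // Stand-in for DateTimeUtils.stringToTime: plain ISO-8601 dates only.
    def viaIsoDate: Try[Int] = Try(LocalDate.parse(s).toEpochDay.toInt)
    // Spark 1.5.0 style: the string already holds a day count.
    def viaDayCount: Try[Int] = Try(s.trim.toInt)

    viaSimpleDateFormat.orElse(viaIsoDate).orElse(viaDayCount).toOption
  }
}
```

For example, `parseDays("18288", "yyyy-MM-dd")` fails the first two attempts and falls through to the day count, which corresponds to 2020-01-27.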
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028709#comment-17028709 ]

Wenchen Fan commented on SPARK-30668:

I checked the doc in [Spark 2.4|https://github.com/apache/spark/blob/branch-2.4/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L2967], and it says the pattern string follows java.text.SimpleDateFormat, so I think this is a breaking change. AFAIK we fixed several bugs by switching to java.time.format.DateTimeFormatter, so it should be OK to do it in 3.0. We can make the migration smoother by:
1. providing a legacy config to restore the old behavior;
2. when using the new formatter, falling back to the old formatter if the new one fails to parse. This can at least fix the problem reported in this ticket.

Thoughts?
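A sketch of option 2 in Scala, assuming a hypothetical helper `FallbackTimestampParser.parseMillis` (not a Spark API). It tries java.time first and only falls back to SimpleDateFormat when the first attempt throws. It is simplified to patterns that give java.time a full date, time and offset.

```scala
import java.text.SimpleDateFormat
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper illustrating "new formatter first, old formatter as a fallback".
// It is not Spark's implementation.
object FallbackTimestampParser {
  // Returns epoch milliseconds, or None if neither formatter accepts the input.
  def parseMillis(s: String, pattern: String): Option[Long] = {
    // java.time: simplified to patterns that yield a full date, time and offset.
    val viaJavaTime: Try[Long] = Try {
      OffsetDateTime.parse(s, DateTimeFormatter.ofPattern(pattern)).toInstant.toEpochMilli
    }
    // Legacy path, evaluated only if the java.time attempt failed (Try.orElse takes
    // its argument by name). It uses the JVM default time zone for inputs without an offset.
    viaJavaTime.orElse(Try(new SimpleDateFormat(pattern).parse(s).getTime)).toOption
  }
}
```

With the pattern from this ticket, the java.time attempt rejects `-0800` for `z`, and the SimpleDateFormat attempt then returns the instant 2020-01-28T04:06:11.847Z.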
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028586#comment-17028586 ]

Dongjoon Hyun commented on SPARK-30668:

> The behavior of to_timestamp exists from the beginning.

[~maxgekk]. What I meant was that the scope of this `BUG` issue (SPARK-30668) is only 3.0.0, because only `3.0.0` returns `NULL`. In the 2.4.x world, we don't return `NULL` for that kind of case.
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028545#comment-17028545 ]

Maxim Gekk commented on SPARK-30668:

> as a bug introduced by a new improvement patch.

I don't understand this. The behavior of to_timestamp has existed from the beginning. I am not sure that we can classify it as a bug. We could at least improve the description of the function: at the moment, in 2.4.x it says nothing about precision, and the example of a supported pattern, yyyy-MM-dd HH:mm:ss.SSSS, doesn't make sense.
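The 2.4 example pattern is assumed here to be `yyyy-MM-dd HH:mm:ss.SSSS`. One likely reason it is confusing is that in SimpleDateFormat the letter `S` counts milliseconds, while in java.time's DateTimeFormatter it is a fraction of a second. This sketch parses the same string both ways in UTC. `FractionDemo` is an illustrative name only.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.TimeZone

// Illustrative comparison of the 'S' pattern letter in the two JDK APIs.
object FractionDemo {
  val pattern = "yyyy-MM-dd HH:mm:ss.SSSS"
  val input = "2020-01-27 20:06:11.1234"

  // SimpleDateFormat: 'S' is a millisecond count, so "1234" means 1234 ms and the
  // lenient calendar carries the extra second over into the seconds field.
  def legacy: Instant = {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.parse(input).toInstant
  }

  // DateTimeFormatter: 'S' is fraction-of-second, so ".1234" means 0.1234 s.
  def modern: Instant =
    LocalDateTime.parse(input, DateTimeFormatter.ofPattern(pattern)).toInstant(ZoneOffset.UTC)
}
```

`legacy` lands on 20:06:12.234, a second later than the text suggests, while `modern` keeps 20:06:11.1234.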
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028537#comment-17028537 ]

Dongjoon Hyun commented on SPARK-30668:

Given the current circumstances, this seems to be a bug introduced by a new improvement patch.
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028535#comment-17028535 ]

Dongjoon Hyun commented on SPARK-30668:

[~maxgekk]. We don't backport improvements. You registered that issue as an `Improvement`, and I think so too.
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028533#comment-17028533 ]

Dongjoon Hyun commented on SPARK-30668:

[~maxgekk]. You should be the last person to be surprised. It's SPARK-27438, isn't it?
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028532#comment-17028532 ]

Maxim Gekk commented on SPARK-30668:

[~dongjoon] I have found the reason: [https://github.com/apache/spark/pull/24420]. Theoretically, it could be backported to 2.4.x.
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028515#comment-17028515 ]

Maxim Gekk commented on SPARK-30668:

[~dongjoon] The output of your example is surprising: it doesn't contain the fractional part of the seconds. On master, 847 is printed if we change the pattern letter z -> Z.
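One plausible explanation for the missing fraction assumes (as we understand the 2.4 code, not verified here) that `to_timestamp(str, fmt)` goes through `unix_timestamp`, which returns whole seconds. The sketch models that seconds-only path with SimpleDateFormat. `SecondsTruncationDemo` is hypothetical and is not Spark code.

```scala
import java.text.SimpleDateFormat
import java.time.Instant

// Hypothetical model of a seconds-only code path such as unix_timestamp:
// parse with millisecond precision, then keep whole seconds only.
object SecondsTruncationDemo {
  val input = "2020-01-27T20:06:11.847-0800"
  val pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSz"

  // Epoch milliseconds as SimpleDateFormat parses them (the .847 is still present).
  def parsedMillis: Long = new SimpleDateFormat(pattern).parse(input).getTime

  // Round down to whole seconds, then convert back to an instant.
  def viaUnixSeconds: Instant = Instant.ofEpochSecond(Math.floorDiv(parsedMillis, 1000L))
}
```

The parsed value still carries 847 ms, but the seconds-based round trip yields 2020-01-28T04:06:11Z, which matches the fraction-less output seen in 2.4.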
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025729#comment-17025729 ]

Maxim Gekk commented on SPARK-30668:

We can try to revert this [https://github.com/apache/spark/pull/23495]
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025698#comment-17025698 ]

Herman van Hövell commented on SPARK-30668:

I don't think we should revert the Proleptic Gregorian patch; the previous behavior was kind of broken. [~maxgekk] can we move back to the previous behavior by using the old parser? And perhaps put that bit behind a feature flag, or make it configurable.
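The calendar issue behind the Proleptic Gregorian switch is easy to see for old dates. SimpleDateFormat relies on the hybrid Julian/Gregorian `GregorianCalendar`, while java.time uses the proleptic Gregorian calendar throughout. This sketch parses the same string with both APIs in UTC and compares the resulting days. `CalendarDemo` is an illustrative name only.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.TimeZone

// Parse an ancient date with both JDK APIs and express both results as
// proleptic Gregorian LocalDate values for comparison.
object CalendarDemo {
  val input = "1000-01-01"
  val pattern = "yyyy-MM-dd"

  // SimpleDateFormat uses GregorianCalendar, which applies Julian rules
  // to dates before the 1582-10-15 cutover.
  def legacy: LocalDate = {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    Instant.ofEpochMilli(fmt.parse(input).getTime).atZone(ZoneOffset.UTC).toLocalDate
  }

  // java.time reads the same text on the proleptic Gregorian calendar.
  def modern: LocalDate = LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern))
}
```

For this input the two APIs disagree by five days: the Julian 1000-01-01 corresponds to the proleptic Gregorian 1000-01-06.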
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025681#comment-17025681 ]

Maxim Gekk commented on SPARK-30668:

If [~marmbrus] develops something new, he could use the correct pattern for zone offsets, as pointed out in the Java docs: https://github.com/apache/spark/blob/d69ed9afdf2bd8d03aaf835292b92692ec8189e9/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L2964

/cc [~srowen] [~dongjoon]
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025673#comment-17025673 ]

Xiao Li commented on SPARK-30668:

[~hvanhovell] Making it configurable looks necessary. Today, Michael hit this when they tried the master branch.
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025651#comment-17025651 ]

Maxim Gekk commented on SPARK-30668:

> This is not mentioned in the migration guide.

It is mentioned:
{code}
- The `unix_timestamp`, `date_format`, `to_unix_timestamp`, `from_unixtime`, `to_date`, `to_timestamp` functions. New implementation supports pattern formats as described here https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html and performs strict checking of its input. For example, the `2015-07-22 10:00:00` timestamp cannot be parsed if pattern is `yyyy-MM-dd` because the parser does not consume whole input. Another example is the `31/01/2015 00:00` input cannot be parsed by the `dd/MM/yyyy hh:mm` pattern because `hh` supposes hours in the range `1-12`.
{code}

> Do we have a simple way to remove such a behavior change?

The change is related to the migration to the Proleptic Gregorian calendar. To remove the behavior, you would need to revert most of https://issues.apache.org/jira/browse/SPARK-26651 and maybe more.

> For example, converting the pattern for users?

Even if it is possible to convert patterns, the result can be different for old dates due to the calendar system.

> Can we let users choose different parsing mechanisms between SimpleDateFormat and DateTimeFormat?

No, a flag was removed a year ago, see https://issues.apache.org/jira/browse/SPARK-26503 and https://github.com/apache/spark/pull/23391#discussion_r244414750
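The whole-input check described in the migration guide can be reproduced with the JDK classes alone. A minimal Scala sketch follows, where `WholeInputDemo` is an illustrative name. `DateFormat.parse(String)` may stop before the end of the text, while `DateTimeFormatter` rejects any unparsed trailing characters.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Compare how the two JDK APIs treat trailing text that the pattern does not cover.
object WholeInputDemo {
  val input = "2015-07-22 10:00:00"
  val pattern = "yyyy-MM-dd"

  // SimpleDateFormat.parse(String) succeeds once a prefix matches; " 10:00:00" is ignored.
  def legacy: Option[java.util.Date] = Try(new SimpleDateFormat(pattern).parse(input)).toOption

  // DateTimeFormatter throws DateTimeParseException because text remains unparsed.
  def modern: Option[LocalDate] =
    Try(LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern))).toOption
}
```

Under this comparison `legacy` is defined and `modern` is empty. Only an input that matches the pattern exactly, such as `2015-07-22`, parses with the new formatter.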
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025644#comment-17025644 ]

Xiao Li commented on SPARK-30668:

Can we let users choose different parsing mechanisms between SimpleDateFormat and DateTimeFormat?
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025643#comment-17025643 ]

Xiao Li commented on SPARK-30668:

This will make the migration very painful. This is not mentioned in the migration guide. It will also generate different query results. Do we have a simple way to remove such a behavior change? For example, converting the pattern for users?
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025549#comment-17025549 ]

Maxim Gekk commented on SPARK-30668:

In Spark 3.0, date/timestamp parsing is based on the Java 8 DateTimeFormatter, which may have a different notion of pattern letters (see [https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]):
{code}
V       time-zone ID                zone-id           America/Los_Angeles; Z; -08:30
z       time-zone name              zone-name         Pacific Standard Time; PST
O       localized zone-offset       offset-O          GMT+8; GMT+08:00; UTC-08:00;
X       zone-offset 'Z' for zero    offset-X          Z; -08; -0830; -08:30; -083015; -08:30:15;
x       zone-offset                 offset-x          +0000; -08; -0830; -08:30; -083015; -08:30:15;
Z       zone-offset                 offset-Z          +0000; -0800; -08:00;
{code}
As you can see, 'z' is for the time-zone name, but you are trying to parse zone offsets. You can use 'x' or 'Z' in the pattern instead of 'z':
{code}
scala> spark.sql("""SELECT to_timestamp("2020-01-27T20:06:11.847-0800", "yyyy-MM-dd'T'HH:mm:ss.SSSZ")""").show(false)
+----------------------------------------------------------------------------+
|to_timestamp('2020-01-27T20:06:11.847-0800', 'yyyy-MM-dd\'T\'HH:mm:ss.SSSZ')|
+----------------------------------------------------------------------------+
|2020-01-28 07:06:11.847                                                     |
+----------------------------------------------------------------------------+
{code}
Parsing in Spark 2.4 is based on SimpleDateFormat (see https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html), where 'z' has a slightly different meaning.
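The table can be checked directly against the JDK, without Spark. This Scala sketch uses a hypothetical `ZoneLetterDemo` object. It parses the input from this ticket with several zone letters, once with java.time and once with SimpleDateFormat, and returns None when parsing fails.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, OffsetDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Parse the input from this ticket with different zone letters.
// ZoneLetterDemo is an illustrative name, not part of any library.
object ZoneLetterDemo {
  val input = "2020-01-27T20:06:11.847-0800"

  // java.time (the basis of parsing in Spark 3.0); None if parsing fails.
  def javaTime(pattern: String): Option[Instant] =
    Try(OffsetDateTime.parse(input, DateTimeFormatter.ofPattern(pattern)).toInstant).toOption

  // java.text.SimpleDateFormat (the basis of parsing in Spark 2.4); None if parsing fails.
  def simpleDateFormat(pattern: String): Option[Instant] =
    Try(new SimpleDateFormat(pattern).parse(input).toInstant).toOption
}
```

With java.time, the `...SSSz` pattern gives None because `z` expects a zone name. The `...SSSZ` and `...SSSxx` patterns both give 2020-01-28T04:06:11.847Z. SimpleDateFormat with the original `z` pattern returns the same instant, because its Javadoc states that RFC 822 offsets such as `-0800` are also accepted when parsing `z`.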
[ https://issues.apache.org/jira/browse/SPARK-30668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025467#comment-17025467 ]

Xiao Li commented on SPARK-30668:

cc [~maxgekk]