Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23196#discussion_r239010321
  
    --- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
 ---
    @@ -49,8 +49,8 @@ class HiveCompatibilitySuite extends HiveQueryFileTest 
with BeforeAndAfter {
       override def beforeAll() {
         super.beforeAll()
         TestHive.setCacheTables(true)
    -    // Timezone is fixed to America/Los_Angeles for those timezone 
sensitive tests (timestamp_*)
    -    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
    +    // Timezone is fixed to GMT for those timezone sensitive tests 
(timestamp_*)
    --- End diff --
    
    Our current approach for converting dates is inconsistent in a few places, 
for example:
    - `UTF8String` -> `num days` uses a hardcoded `GMT` and ignores the SQL config: 
https://github.com/apache/spark/blob/f982ca07e80074bdc1e3b742c5e21cf368e4ede2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L493
    - `String` -> `java.util.Date` ignores Spark's time zone settings and uses the 
system time zone: 
https://github.com/apache/spark/blob/f982ca07e80074bdc1e3b742c5e21cf368e4ede2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L186
    - In many places, even when a function accepts a timeZone parameter, it is not 
passed, so the default time zone is used (**not from the config but from 
TimeZone.getDefault()**). For example: 
https://github.com/apache/spark/blob/36edbac1c8337a4719f90e4abd58d38738b2e1fb/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala#L187
 .
    - Casting to the date type depends on the type of the argument: if it is 
`TimestampType`, the expression's time zone is used; otherwise `GMT`: 
https://github.com/apache/spark/blob/d03e0af80d7659f12821cc2442efaeaee94d3985/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L403-L410
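    To illustrate the kind of inconsistency above, here is a minimal plain-Java-API sketch (not Spark code; the object and method names are hypothetical): parsing the same date string under two different time zones yields epoch instants that differ by the zone offset, which is exactly what can shift a derived "days since epoch" value.

    ```scala
    import java.text.SimpleDateFormat
    import java.util.TimeZone

    // Hypothetical illustration, not Spark code: the same date string maps to
    // different epoch instants depending on which time zone the parser assumes.
    object TimeZoneInconsistency {
      def parseMillis(s: String, tz: TimeZone): Long = {
        val fmt = new SimpleDateFormat("yyyy-MM-dd")
        fmt.setTimeZone(tz)
        fmt.parse(s).getTime
      }

      def main(args: Array[String]): Unit = {
        val gmt = parseMillis("2018-12-04", TimeZone.getTimeZone("GMT"))
        val la  = parseMillis("2018-12-04", TimeZone.getTimeZone("America/Los_Angeles"))
        // The two interpretations differ by the zone offset (8 hours on this
        // PST date), so day boundaries can land on different days.
        println(s"diff in hours: ${(la - gmt) / 3600000}")
      }
    }
    ```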
    
    I seriously think we should disable the new parser/formatter outside of the 
CSV/JSON datasources because it is hard to guarantee consistent behavior in 
combination with other date/timestamp functions. @srowen @gatorsmile @HyukjinKwon WDYT?
     

