This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new b6e8f64  [SPARK-31284][SQL][TESTS] Check rebasing of timestamps in ORC datasource
b6e8f64 is described below

commit b6e8f64d49caf1f0a1f1b910d603e8e000270d01
Author: Maxim Gekk <max.g...@gmail.com>
AuthorDate: Fri Mar 27 09:06:59 2020 -0700

    [SPARK-31284][SQL][TESTS] Check rebasing of timestamps in ORC datasource
    
    ### What changes were proposed in this pull request?
    In this PR, I propose two tests that check that timestamps are rebased correctly between the hybrid calendar (Julian + Gregorian) and the Proleptic Gregorian calendar, in both directions.
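    To illustrate what rebasing compensates for, here is a minimal sketch in plain JDK code (no Spark; the object and method names are hypothetical) that reads the same UTC wall-clock time in both calendars. It assumes that `java.util.GregorianCalendar` with its default 1582-10-15 cutover stands in for the hybrid calendar and that `java.time` stands in for the Proleptic Gregorian calendar:

    ```scala
    import java.time.{LocalDateTime, ZoneOffset}
    import java.util.{GregorianCalendar, TimeZone}

    object CalendarDiffSketch {
      // Millis since the epoch for a UTC wall-clock time read in the hybrid
      // calendar: GregorianCalendar applies Julian rules before its default
      // cutover (1582-10-15) and Gregorian rules after it.
      def hybridMillis(y: Int, mo: Int, d: Int, h: Int, mi: Int, s: Int): Long = {
        val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
        cal.clear() // reset every field, including milliseconds
        cal.set(y, mo - 1, d, h, mi, s) // Calendar months are 0-based
        cal.getTimeInMillis
      }

      // The same wall-clock time read in the Proleptic Gregorian calendar (java.time).
      def prolepticMillis(y: Int, mo: Int, d: Int, h: Int, mi: Int, s: Int): Long =
        LocalDateTime.of(y, mo, d, h, mi, s).toInstant(ZoneOffset.UTC).toEpochMilli

      def main(args: Array[String]): Unit = {
        val diff = hybridMillis(1001, 1, 1, 1, 2, 3) - prolepticMillis(1001, 1, 1, 1, 2, 3)
        println(s"1001-01-01 01:02:03 (UTC): calendars differ by ${diff / 86400000L} days")
      }
    }
    ```

    For 1001-01-01 the hybrid reading lies six days after the Proleptic Gregorian one, while for dates after the cutover the two readings coincide; rebasing shifts the stored value by that difference so that the wall-clock label is preserved.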
    1. The test `compatibility with Spark 2.4 in reading timestamps` loads an ORC file saved by Spark 2.4.5 via:
    ```shell
    $ export TZ="America/Los_Angeles"
    ```
    ```scala
    scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    
    scala> val df = Seq("1001-01-01 01:02:03.123456").toDF("tsS").select($"tsS".cast("timestamp").as("ts"))
    df: org.apache.spark.sql.DataFrame = [ts: timestamp]
    
    scala> df.write.orc("/Users/maxim/tmp/before_1582/2_4_5_ts_orc")
    
    scala> spark.read.orc("/Users/maxim/tmp/before_1582/2_4_5_ts_orc").show(false)
    +--------------------------+
    |ts                        |
    +--------------------------+
    |1001-01-01 01:02:03.123456|
    +--------------------------+
    ```
    2. The test `rebasing timestamps in write` is a round-trip test. Since the previous test confirms that timestamps are rebased correctly on read, this test can pass only if rebasing also works correctly on write.
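    The round-trip argument can be illustrated outside Spark. Below is a minimal sketch, assuming UTC, AD years only, and `java.util.GregorianCalendar` (Julian rules before 1582-10-15) as the hybrid calendar; the object and method names are hypothetical and are not Spark's rebasing API. The write direction keeps the wall-clock label of Proleptic Gregorian micros and re-encodes it in the hybrid calendar, and the read direction reverses that:

    ```scala
    import java.time.{LocalDateTime, ZoneOffset}
    import java.util.{Calendar, GregorianCalendar, TimeZone}

    object RebaseRoundTripSketch {
      private val MicrosPerSecond = 1000000L

      private def utcCalendar(): GregorianCalendar =
        new GregorianCalendar(TimeZone.getTimeZone("UTC"))

      // Write direction (sketch): Proleptic Gregorian micros -> hybrid micros,
      // keeping the UTC wall-clock label. Assumes AD years only.
      def prolepticToHybrid(micros: Long): Long = {
        val ldt = LocalDateTime.ofEpochSecond(
          Math.floorDiv(micros, MicrosPerSecond),
          (Math.floorMod(micros, MicrosPerSecond) * 1000L).toInt,
          ZoneOffset.UTC)
        val cal = utcCalendar()
        cal.clear()
        cal.set(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth,
          ldt.getHour, ldt.getMinute, ldt.getSecond)
        // The calendars differ by whole days, so the micro-of-second carries over.
        cal.getTimeInMillis * 1000L + ldt.getNano / 1000
      }

      // Read direction (sketch): hybrid micros -> wall-clock label ->
      // Proleptic Gregorian micros.
      def hybridToProleptic(micros: Long): Long = {
        val cal = utcCalendar()
        cal.setTimeInMillis(Math.floorDiv(micros, 1000L))
        val ldt = LocalDateTime.of(
          cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1,
          cal.get(Calendar.DAY_OF_MONTH), cal.get(Calendar.HOUR_OF_DAY),
          cal.get(Calendar.MINUTE), cal.get(Calendar.SECOND))
        ldt.toEpochSecond(ZoneOffset.UTC) * MicrosPerSecond +
          Math.floorMod(micros, MicrosPerSecond)
      }
    }
    ```

    For `1001-01-01 01:02:03.123456` the hybrid encoding comes out six days later than the Proleptic Gregorian one, and `hybridToProleptic(prolepticToHybrid(p))` returns `p`. A bug in either direction would break this round trip, which is the property the second test relies on.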
    
    ### Why are the changes needed?
    To guarantee that rebasing works correctly for timestamps in ORC datasource.
    
    ### Does this PR introduce any user-facing change?
    No
    
    ### How was this patch tested?
    By running `OrcSourceSuite` for Hive 1.2 and 2.3 via the commands:
    ```
    $ build/sbt -Phive-2.3 "test:testOnly *OrcSourceSuite"
    ```
    and
    ```
    $ build/sbt -Phive-1.2 "test:testOnly *OrcSourceSuite"
    ```
    
    Closes #28047 from MaxGekk/rebase-ts-orc-test.
    
    Authored-by: Maxim Gekk <max.g...@gmail.com>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
    (cherry picked from commit fc2a974e030c82bf500a81c3908f853c3eeb761d)
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 .../test-data/before_1582_ts_v2_4.snappy.orc       | Bin 0 -> 251 bytes
 .../execution/datasources/orc/OrcSourceSuite.scala |  28 +++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/sql/core/src/test/resources/test-data/before_1582_ts_v2_4.snappy.orc b/sql/core/src/test/resources/test-data/before_1582_ts_v2_4.snappy.orc
new file mode 100644
index 0000000..af9ef04
Binary files /dev/null and b/sql/core/src/test/resources/test-data/before_1582_ts_v2_4.snappy.orc differ
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
index b5e002f..0b7500c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
@@ -508,6 +508,34 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
       }
     }
   }
+
+  test("SPARK-31284: compatibility with Spark 2.4 in reading timestamps") {
+    Seq(false, true).foreach { vectorized =>
+      withSQLConf(SQLConf.ORC_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
+        checkAnswer(
+          readResourceOrcFile("test-data/before_1582_ts_v2_4.snappy.orc"),
+          Row(java.sql.Timestamp.valueOf("1001-01-01 01:02:03.123456")))
+      }
+    }
+  }
+
+  test("SPARK-31284: rebasing timestamps in write") {
+    withTempPath { dir =>
+      val path = dir.getAbsolutePath
+      Seq("1001-01-01 01:02:03.123456").toDF("tsS")
+        .select($"tsS".cast("timestamp").as("ts"))
+        .write
+        .orc(path)
+
+      Seq(false, true).foreach { vectorized =>
+        withSQLConf(SQLConf.ORC_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
+          checkAnswer(
+            spark.read.orc(path),
+            Row(java.sql.Timestamp.valueOf("1001-01-01 01:02:03.123456")))
+        }
+      }
+    }
+  }
 }
 
 class OrcSourceSuite extends OrcSuite with SharedSparkSession {


