This is an automated email from the ASF dual-hosted git repository.

awong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

The following commit(s) were added to refs/heads/master by this push:
     new a0db990  [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
a0db990 is described below

commit a0db990e08173293e42a7490322f08681abaa5d3
Author: Andrew Wong <aw...@cloudera.com>
AuthorDate: Sat Mar 20 21:04:43 2021 -0700

    [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite

    After the bump to Spark 3.1.1, TestKuduBackup.testRandomBackupAndRestore
    started failing with errors like the following:

    02:04:37.919 [ERROR - Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] (Logging.scala:94) Aborting task
    org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet INT96 files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.int96RebaseModeInWrite [...]
      at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInWrite(DataSourceUtils.scala:165) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
    ...

    Per their instructions, this sets the int96RebaseModeInWrite option.

    Change-Id: Ib9ca4d9e69785dd9d056fa8e62c944d56cf219ed
    Reviewed-on: http://gerrit.cloudera.org:8080/17213
    Reviewed-by: Grant Henke <granthe...@apache.org>
    Tested-by: Andrew Wong <aw...@cloudera.com>
---
 java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala b/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
index c02f5de..13dcc5f 100644
--- a/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
+++ b/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
@@ -86,6 +86,7 @@ object KuduBackup {
     // 1900-01-01T00:00:00Z in Parquet. Otherwise incorrect values may be read by
     // Spark 2 or legacy version of Hive. See more details in SPARK-31404.
     session.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
+    session.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")
 
     // Write the data to the backup path.
     // The backup path contains the timestampMs and should not already exist.
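
For readers who want to try the setting outside of Kudu, the following is a minimal standalone sketch, not code from this commit. It assumes Spark 3.1.x on the classpath; the object name `Int96RebaseSketch`, the local session, and the default output path are hypothetical and chosen only for illustration. It sets the same two options as the diff above and then writes a timestamp earlier than 1900-01-01T00:00:00Z to Parquet, which is the kind of write the quoted SparkUpgradeException is about.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession

// Hypothetical standalone example; not part of KuduBackup.scala.
object Int96RebaseSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val session = SparkSession.builder()
      .appName("int96-rebase-sketch")
      .master("local[1]")
      .getOrCreate()

    // The same two settings that appear in the diff above.
    session.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
    session.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")

    import session.implicits._

    // A single timestamp earlier than 1900-01-01T00:00:00Z.
    val df = Seq(Timestamp.valueOf("1850-01-01 00:00:00")).toDF("ts")

    // Assumed output location; pass a different path as the first argument.
    val outputPath = args.headOption.getOrElse("/tmp/int96-rebase-sketch")
    df.write.mode("overwrite").parquet(outputPath)

    session.stop()
  }
}
```

Both options are set on the session's runtime configuration before the write, mirroring where the commit places them relative to the backup write.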