Attila Jeges has uploaded a new patch set (#6).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts the values using the
time zone from a table property (parquet.mr.int96.write.zone) if the
property is set, and does not adjust them if the property is absent.
No adjustment is applied to data written by Impala.

New tables created by Impala set the table property to UTC if the
global flag --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is
set to true. Tables created using CREATE TABLE and CREATE TABLE LIKE
FILE do not set the table property unless the global flag is set to
true. Tables created using CREATE TABLE LIKE <OTHER TABLE> copy the
property from the source table.

This change also affects the way Impala handles the
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced in
IMPALA-1658). The flag is taken into account only if the
parquet.mr.int96.write.zone table property is not set, and is ignored
otherwise.

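The precedence rules described above can be summarized in a short Python sketch. This is only an illustration of the behavior stated in the commit message, not code from the patch: the function names, the `written_by_parquet_mr` parameter, the statement-kind strings and the returned labels are all hypothetical. It also assumes that the legacy flag only matters for files written by Parquet MR, and that CREATE TABLE LIKE <OTHER TABLE> sets nothing when the source table has no property.

```python
from typing import Dict, Optional, Tuple

# Table property name taken from the commit message.
WRITE_ZONE_PROP = "parquet.mr.int96.write.zone"


def choose_adjustment(
    written_by_parquet_mr: bool,          # hypothetical: derived from file metadata
    tbl_props: Dict[str, str],            # table properties of the scanned table
    convert_legacy_flag: bool,            # --convert_legacy_hive_parquet_utc_timestamps
) -> Tuple[str, Optional[str]]:
    """Return a (hypothetical) label for the adjustment applied on read."""
    if not written_by_parquet_mr:
        # Data written by Impala is never adjusted.
        return ("none", None)
    zone = tbl_props.get(WRITE_ZONE_PROP)
    if zone is not None:
        # The table property wins; the legacy flag is ignored.
        return ("table_zone", zone)
    if convert_legacy_flag:
        # Property absent: fall back to the IMPALA-1658 behavior.
        return ("legacy_flag", None)
    return ("none", None)


def props_for_new_table(
    stmt_kind: str,                       # hypothetical statement-kind label
    set_utc_flag: bool,                   # --set_parquet_mr_int96_write_zone_to_utc_on_new_tables
    source_props: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the write-zone property (if any) set on a newly created table."""
    if stmt_kind == "CREATE_TABLE_LIKE_TABLE":
        # Copy the property from the source table, if it has one (assumption:
        # nothing is set when the source table lacks the property).
        src = source_props or {}
        if WRITE_ZONE_PROP in src:
            return {WRITE_ZONE_PROP: src[WRITE_ZONE_PROP]}
        return {}
    # CREATE TABLE and CREATE TABLE LIKE FILE: only set when the flag is true.
    return {WRITE_ZONE_PROP: "UTC"} if set_utc_flag else {}
```

For example, `choose_adjustment(True, {WRITE_ZONE_PROP: "UTC"}, True)` selects the table's zone even though the legacy flag is on, which is the interaction the last paragraph of the commit message describes.
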
Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
26 files changed, 657 insertions(+), 63 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/6
--
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com>