Tim Armstrong created IMPALA-10491:
--------------------------------------

Summary: Impala parquet scanner should use writer.time.zone when converting Hive timestamps
Key: IMPALA-10491
URL: https://issues.apache.org/jira/browse/IMPALA-10491
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Tim Armstrong
IMPALA-8721 reports several issues with time zone conversion of timestamps written by Hive 3. HIVE-21290 fixed some of them, and it also records writer.time.zone in the Parquet file's key-value metadata, which gives a more reliable way to determine which time zone the timestamps were written in. For example, parquet-tools shows the following footer for such a file:

{noformat}
tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/asdfgh/000000_0
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers
21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file: hdfs://localhost:20500/test-warehouse/asdfgh/000000_0
creator: parquet-mr version 1.10.99.7.2.7.0-44 (build 27344fd5fdaa371e364c604f471b340f8bcf8936)
extra: writer.date.proleptic = false
extra: writer.time.zone = America/Los_Angeles
extra: writer.model.name = 3.1.3000.7.2.7.0-44
{noformat}

We should use this time zone when converting timestamps. I think we should do so either always or only when convert_legacy_hive_parquet_utc_timestamps=true.

CC [~boroknagyz] [~csringhofer]
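A minimal C++ sketch of the time zone selection proposed above, assuming the footer's key-value metadata has already been read into a std::map. The function ChooseConversionTimeZone and the always_use_writer_tz flag are hypothetical illustrations, not Impala's actual API: always_use_writer_tz models the "always" option, and the fallback to the reader's local time zone assumes that is what convert_legacy_hive_parquet_utc_timestamps does today when no writer.time.zone is present.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Key that Hive 3 (HIVE-21290) writes into the Parquet footer's key-value metadata.
const std::string kWriterTimeZoneKey = "writer.time.zone";

// Hypothetical helper (not Impala's real code): returns the time zone that a
// Hive-written UTC timestamp should be converted into, or std::nullopt to read
// the stored value unchanged.
//   always_use_writer_tz: assumed switch for the "always" option; when false,
//   writer.time.zone is honoured only if
//   convert_legacy_hive_parquet_utc_timestamps is set.
std::optional<std::string> ChooseConversionTimeZone(
    const std::map<std::string, std::string>& key_value_metadata,
    bool convert_legacy_hive_parquet_utc_timestamps,
    bool always_use_writer_tz,
    const std::string& local_time_zone) {
  auto it = key_value_metadata.find(kWriterTimeZoneKey);
  bool has_writer_tz = it != key_value_metadata.end() && !it->second.empty();

  if (has_writer_tz &&
      (always_use_writer_tz || convert_legacy_hive_parquet_utc_timestamps)) {
    return it->second;  // Prefer the zone recorded by the writer.
  }
  if (convert_legacy_hive_parquet_utc_timestamps) {
    return local_time_zone;  // Assumed legacy fallback: the reader's local zone.
  }
  return std::nullopt;  // No conversion: keep the stored UTC value.
}
```

Returning std::nullopt rather than "UTC" keeps the "no conversion" case distinct, so files without writer.time.zone are read exactly as before unless the query option asks for conversion.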