[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 10: Thanks for the review! -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Apr 2022 12:49:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

IcebergScanNode interprets timestamp literals as UTC timestamps during predicate pushdown to Iceberg. This causes problems when the Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH LOCAL TIME ZONE in SQL), because the scanners assume that the timestamp literals in a query are in the local timezone. Hence, if the Iceberg table is partitioned by HOUR(ts) and Impala is running in a timezone other than UTC, the following query doesn't return any rows:

  SELECT * from t WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp literal is interpreted as a UTC timestamp (no conversion from local to UTC), but during query execution the timestamp data in the files are converted to the local timezone and then compared to the literal. I.e. the scanner assumes that the literal is in the local timezone.

On the other hand, when the Iceberg type TIMESTAMP (which corresponds to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, we should just push down the timestamp values without any conversion. In this case there is no conversion in the scanners either.

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Reviewed-on: http://gerrit.cloudera.org:8080/18399
Tested-by: Impala Public Jenkins
Reviewed-by: Csaba Ringhofer
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=1969-01-01-01/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-3.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-4.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-5.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a366370e-6b9a-4698-82d0-95fb69b19afb-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-1967339514069250436-1-a366370e-6b9a-4698-82d0-95fb69b19afb.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=1969-01-01-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-4.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-3.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-5.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-6.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/94003077-eabb-4dab-95ec-52a1727ef853-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-2778998487482282437-1-94003077-eabb-4dab-95ec-52a1727ef853.avro
A testdata/data/iceb
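
The mismatch described in the commit message can be illustrated with a small standalone sketch. It is not the code from IcebergScanNode.java; the class and method names are hypothetical, and Europe/Budapest is only an assumed session time zone (it is consistent with the one- and two-hour offsets between the iceberg_timestamp_part and iceberg_timestamptz_part partition directories listed above). The sketch converts the same wall-clock literal to microseconds since the epoch twice, once read as UTC and once read in the local zone, and shows that the two readings fall into different HOUR buckets, which is why a predicate pushed down with the wrong reading can prune the partition that actually holds the row.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration; not part of the Impala code base.
public class TimestampPushdownSketch {
  private static final long MICROS_PER_HOUR = 3_600_000_000L;

  // Microseconds since the epoch when the wall-clock literal is read in 'zone'.
  public static long toMicros(LocalDateTime literal, ZoneId zone) {
    Instant instant = literal.atZone(zone).toInstant();
    return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
  }

  // Hours since the epoch, floored so that pre-1970 values also bucket correctly.
  public static long hourBucket(long micros) {
    return Math.floorDiv(micros, MICROS_PER_HOUR);
  }

  public static void main(String[] args) {
    LocalDateTime literal = LocalDateTime.of(2021, 1, 10, 12, 0, 0);
    ZoneId local = ZoneId.of("Europe/Budapest"); // assumed session time zone

    long asUtc = toMicros(literal, ZoneOffset.UTC); // reading before the fix
    long asLocal = toMicros(literal, local);        // reading the scanners assume

    System.out.println("hour bucket, literal read as UTC:   " + hourBucket(asUtc));
    System.out.println("hour bucket, literal read as local: " + hourBucket(asLocal));
  }
}
```

For a TIMESTAMP (without time zone) column the first reading is the appropriate one, since neither pushdown nor the scanner converts the value.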
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Apr 2022 12:46:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Apr 2022 20:47:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10474/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Apr 2022 16:38:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8058/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Apr 2022 16:22:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 9: PS9 only adds tests to iceberg-query.test. Other changes are due to a rebase. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Apr 2022 16:17:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18399 to look at the new patch set (#9). Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8056/ -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Apr 2022 15:34:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10470/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Apr 2022 12:03:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

Patch Set 8:

(3 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@615
PS6, Line 615: TYPES
> NumFileMetadataRead: 0 sounds weird, but I think that it is out of the scop
I agree, it's a different issue.

http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@524
PS7, Line 524:
> nit: extra
Done

http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@543
PS7, Line 543: RESULT
> Why do we need the order by?
We don't *need* it, but I wanted to get the results in order when using bin/impala-py.test --update_results

--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Wed, 20 Apr 2022 11:43:40 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18399 to look at the new patch set (#8). Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

Patch Set 7: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@615
PS6, Line 615: 3,2022-04-10 22:04:00
> In my case the stats are:
NumFileMetadataRead: 0 sounds weird, but I think that it is out of the scope of this Jira to investigate it.

http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@524
PS7, Line 524:
nit: extra

http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@543
PS7, Line 543: order by i;
Why do we need the order by?

--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Wed, 20 Apr 2022 11:32:43 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8056/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Apr 2022 11:10:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10466/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Apr 2022 17:20:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

Patch Set 6:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/18399/6/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18399/6/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@31
PS6, Line 31: import org.apache.iceberg.expressions.UnboundPredicate;
            : import org.apache.hadoop.fs.Path;
> nit: order of imports looks wrong
Done

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/data/README@722
PS6, Line 722: insert into iceberg_timestamp_part values (1, '2021-10-31 02:15:00'), (2, '2021-01-10 12:00:00'), (3, '2022-04-11 00:04:00'), (4, '2022-04-11 12:04:55');
> It would be nice to add 1-2 <1970-01-01 timestamps to give coverage for ne
Done

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@551
PS6, Line 551: WHERE ts = '2021-10-31 00:15:00';
> Can you also add some tests with < or > predicates?
Done

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@615
PS6, Line 615: aggregation(SUM, NumRowGroups): 5
> Hmm, I wonder why we don't drop some of these during stat filtering. Maybe
In my case the stats are:
 - NumFileMetadataRead: 0 (0)
 - NumRowGroups: 5 (5)
 - NumStatsFilteredRowGroups: 3 (3)

NumFileMetadataRead: 0 is strange since we definitely read the metadata to filter out row groups.

Iceberg tables are treated as non-partitioned in most cases. We let Iceberg filter out partitions during planning. When predicate pushdown doesn't work (like in this test) it'd make sense to evaluate the stats for identity-partitioned tables (using the template tuple), but these tables are partitioned via the HOUR partition transform.

--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Tue, 19 Apr 2022 16:59:58 +
Gerrit-HasComments: Yes
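
The review request for pre-1970 timestamps in the test data touches a classic edge case for hour bucketing, sketched below. This is only an illustration under stated assumptions: the class and method names are hypothetical, the sample value 1969-01-01 01:30:00 is made up (it is not taken from testdata/data/README), and the sketch assumes the hour bucket is the floor of hours since the epoch. It contrasts floor division with Java's default truncating division, which rounds negative values toward zero and would place a pre-epoch timestamp one hour too late.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration; not part of the Impala or Iceberg code base.
public class PreEpochHourSketch {
  private static final long MICROS_PER_HOUR = 3_600_000_000L;

  // Microseconds since the epoch for a wall-clock value read as UTC.
  public static long microsUtc(LocalDateTime ts) {
    // toEpochSecond floors, and getNano is the non-negative remainder.
    return ts.toEpochSecond(ZoneOffset.UTC) * 1_000_000L + ts.getNano() / 1_000;
  }

  // Floor division: rounds toward negative infinity.
  public static long hourFloor(long micros) {
    return Math.floorDiv(micros, MICROS_PER_HOUR);
  }

  // Plain '/' truncates toward zero, which is wrong for negative inputs.
  public static long hourTruncated(long micros) {
    return micros / MICROS_PER_HOUR;
  }

  public static void main(String[] args) {
    // Made-up sample value inside the 1969-01-01 01:00 hour.
    long micros = microsUtc(LocalDateTime.of(1969, 1, 1, 1, 30, 0));
    System.out.println("floor:     " + hourFloor(micros));
    System.out.println("truncated: " + hourTruncated(micros));
  }
}
```

With the sample value, the floored result corresponds to the 01:00 hour of 1969-01-01, while the truncated result corresponds to 02:00, so a test row before the epoch exercises exactly this difference.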
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18399 to look at the new patch set (#7). Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode IcebergScanNode interprets the timestamp literals as UTC timestamps during predicate pushdown to Iceberg. It causes problems when the Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH LOCAL TIME ZONE in SQL) because in the scanners we assume that the timestamp literals in a query are in local timezone. Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is running in a different timezone than UTC, then the following query doesn't return any rows: SELECT * from t WHERE ts = ; Because during predicate pushdown the timestamp is interpreted as a UTC timestamp (no conversion from local to UTC), but during query execution the timestamp data in the files are converted to local timezone, then compared to . I.e. in the scanner the assumption is that is in local timezone. On the other hand, when Iceberg type TIMESTAMP (which correcponds to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just push down the timestamp values without any conversion. In this case there is no conversion in the scanners either. 
Testing: * added e2e test with TIMESTAMPTZ * added e2e test with TIMESTAMP Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=1969-01-01-01/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-3.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-5.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a366370e-6b9a-4698-82d0-95fb69b19afb-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-1967339514069250436-1-a366370e-6b9a-4698-82d0-95fb69b19afb.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=1969-01-01-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-3.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-5.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-6.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/94003077-eabb-4dab-95ec-52a1727ef853-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-2778998487482282437-1-94003077-eabb-4dab-95ec-52a1727ef853.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceber
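The conversion rule described in the commit message above can be sketched in plain Java. This is a minimal illustration built on `java.time`, not the code in `IcebergScanNode.java`; the class name, the method names, and the `Europe/Budapest` session timezone are assumptions made for the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Impala: shows which value would be pushed down.
final class TimestampPushdownSketch {
  private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

  // Iceberg TIMESTAMP (without time zone): the wall-clock value is used unchanged.
  static long microsWithoutZone(LocalDateTime literal) {
    return ChronoUnit.MICROS.between(EPOCH, literal);
  }

  // Iceberg TIMESTAMPTZ: the literal is read as local time in sessionZone and
  // converted to UTC before it is turned into microseconds since the epoch.
  static long microsWithLocalZone(LocalDateTime literal, ZoneId sessionZone) {
    LocalDateTime utc = literal.atZone(sessionZone)
        .withZoneSameInstant(ZoneOffset.UTC)
        .toLocalDateTime();
    return ChronoUnit.MICROS.between(EPOCH, utc);
  }

  public static void main(String[] args) {
    LocalDateTime literal = LocalDateTime.of(2021, 10, 31, 0, 15, 0);
    ZoneId zone = ZoneId.of("Europe/Budapest"); // assumed session timezone
    System.out.println("TIMESTAMP:   " + microsWithoutZone(literal));
    System.out.println("TIMESTAMPTZ: " + microsWithLocalZone(literal, zone));
  }
}
```

With the assumed zone the two values differ by that zone's UTC offset at the given instant. With a UTC session they coincide, which is consistent with the problem only appearing when Impala runs in a non-UTC timezone.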
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/data/README@722
PS6, Line 722: insert into iceberg_timestamp_part values (1, '2021-10-31 02:15:00'), (2, '2021-01-10 12:00:00'), (3, '2022-04-11 00:04:00'), (4, '2022-04-11 12:04:55');
It would be nice to add 1-2 timestamps before 1970-01-01 to give coverage for negative hour values, e.g. one at an exact hour and one that only has a minutes component.

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@551
PS6, Line 551: WHERE ts = '2021-10-31 00:15:00';
Can you also add some tests with < or > predicates?

-- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Apr 2022 14:12:44 + Gerrit-HasComments: Yes
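Timestamps before 1970-01-01 are worth covering because Java integer division truncates toward zero, while an hour ordinal has to round toward negative infinity for values before the epoch. The sketch below is illustrative only: it assumes the HOUR transform yields whole hours since 1970-01-01 00:00:00, and the class and method names are hypothetical. It is not Iceberg's or Impala's implementation.

```java
// Hypothetical illustration of hour ordinals for pre-epoch timestamps.
final class HourTransformSketch {
  static final long MICROS_PER_HOUR = 3_600_000_000L;

  // Rounds toward negative infinity, so 1969-12-31 23:45:00 lands in hour -1.
  static long hoursFloor(long microsSinceEpoch) {
    return Math.floorDiv(microsSinceEpoch, MICROS_PER_HOUR);
  }

  // Plain division truncates toward zero, so the same timestamp lands in hour 0.
  static long hoursTruncated(long microsSinceEpoch) {
    return microsSinceEpoch / MICROS_PER_HOUR;
  }

  public static void main(String[] args) {
    long quarterHourBeforeEpoch = -900_000_000L;   // 1969-12-31 23:45:00
    long exactHourBeforeEpoch = -MICROS_PER_HOUR;  // 1969-12-31 23:00:00
    System.out.println(hoursFloor(quarterHourBeforeEpoch));
    System.out.println(hoursTruncated(quarterHourBeforeEpoch));
    System.out.println(hoursFloor(exactHourBeforeEpoch));
  }
}
```

The two suggested test values map onto the two cases here: an exact hour gives the same ordinal either way, while a value with a minutes component exposes the difference between flooring and truncating.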
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

Patch Set 6: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18399/6/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18399/6/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@31
PS6, Line 31: import org.apache.iceberg.expressions.UnboundPredicate;
: import org.apache.hadoop.fs.Path;
nit: order of imports looks wrong

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@615
PS6, Line 615: aggregation(SUM, NumRowGroups): 5
Hmm, I wonder why we don't drop some of these during stat filtering. Maybe Hive doesn't add stats for timestamp columns?

Note that if we want to differentiate between partition-level and scanner-level pruning, a recently introduced counter could be used: NumFileMetadataRead tracks the number of ORC/Parquet files whose metadata we have read. It should be equal to NumRowGroups in the case of partition pruning, and larger in the case of file/row group level stat filtering.

An example: https://gerrit.cloudera.org/#/c/18327/17/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test

-- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Apr 2022 13:37:00 + Gerrit-HasComments: Yes
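The counter comparison suggested in the comment above can be written down as a small check. This is only a sketch of the reasoning, with hypothetical class, enum, and method names; it assumes one row group per data file, and it does not read real Impala runtime profiles.

```java
// Hypothetical classifier for the two counters named in the review comment.
final class PruningCounterSketch {
  enum Pruning { PARTITION_LEVEL_ONLY, SCANNER_STATS_FILTERING }

  // Assumes one row group per file: if metadata was read for more files than
  // row groups were scanned, some row groups were dropped inside the scanner.
  static Pruning classify(long numFileMetadataRead, long numRowGroups) {
    if (numFileMetadataRead < 0 || numRowGroups < 0) {
      throw new IllegalArgumentException("counters must be non-negative");
    }
    return numFileMetadataRead > numRowGroups
        ? Pruning.SCANNER_STATS_FILTERING
        : Pruning.PARTITION_LEVEL_ONLY;
  }

  public static void main(String[] args) {
    System.out.println(classify(5, 5)); // all opened files were scanned
    System.out.println(classify(5, 2)); // three files were opened, then skipped
  }
}
```

In a test file the same idea would be expressed by asserting both counters for a query, so that a regression from planning-time pruning to scanner-time filtering shows up as a changed NumFileMetadataRead value.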
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8053/ -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Apr 2022 13:03:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8053/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Apr 2022 08:39:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 6: Code-Review+1 Thanks Zoltan, LGTM! -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 19 Apr 2022 07:54:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10453/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Apr 2022 16:06:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10452/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Apr 2022 16:02:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 6: PS6 is a rebase. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Apr 2022 15:46:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18399 to look at the new patch set (#6).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps during predicate pushdown to Iceberg. This causes problems when the Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH LOCAL TIME ZONE in SQL), because in the scanners we assume that the timestamp literals in a query are in the local timezone. Hence, if the Iceberg table is partitioned by HOUR(ts) and Impala is running in a timezone other than UTC, the following query doesn't return any rows:

SELECT * from t WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is interpreted as a UTC timestamp (no conversion from local to UTC), but during query execution the timestamp data in the files is converted to the local timezone and then compared to <timestamp literal>. I.e. the scanner assumes that <timestamp literal> is in the local timezone.

On the other hand, when the Iceberg type TIMESTAMP (which corresponds to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, we should just push down the timestamp values without any conversion. In this case there is no conversion in the scanners either.
Testing: * added e2e test with TIMESTAMPTZ * added e2e test with TIMESTAMP Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-3.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a2d413b6-0539-4ef9-8c70-df065ef63402-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-7021219619084712613-1-a2d413b6-0539-4ef9-8c70-df065ef63402.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-3.parquet A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-5.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/f522d668-4bd4-4ffa-b617-e1523753691a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-6467287209495292139-1-f522d668-4bd4-4ffa-b617-e1523753691a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

Patch Set 5:

(2 comments)

Thanks for the comments.

http://gerrit.cloudera.org:8080/#/c/18399/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18399/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@61
PS4, Line 61:
: import com.google.common.base.Preconditions;
:
> nit: imports got mixed here
Thanks, done.

http://gerrit.cloudera.org:8080/#/c/18399/4/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/18399/4/testdata/datasets/functional/functional_schema_template.sql@3310
PS4, Line 3310:
: DATASET
: functional
: BASE_TABLE_NAME
: iceberg_timestamptz_part
: CREATE
: CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name}
: STORED AS ICEBERG
: TBLPROPERTIES('write.format.default'='parquet', 'iceberg.catalog'='hadoop.catalog',
: 'iceberg.catalog_location'='/test-warehouse/iceberg_test/hadoop_catalog',
: 'iceberg.table_identifier'='ice.iceberg_timestamptz_part');
: DEPENDENT_LOAD
: `hadoop fs -mkdir -p /test-warehouse/iceberg_test/hadoop_catalog/ice && \
: hadoop fs -put -f ${IMPALA_HOME}/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part /test-warehouse/iceberg_test/hadoop_catalog/
> This table is created twice.
Ah, thanks for catching this! Done.

-- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Apr 2022 15:43:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18399 to look at the new patch set (#5).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps during predicate pushdown to Iceberg. This causes problems when the Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH LOCAL TIME ZONE in SQL), because in the scanners we assume that the timestamp literals in a query are in the local timezone. Hence, if the Iceberg table is partitioned by HOUR(ts) and Impala is running in a timezone other than UTC, the following query doesn't return any rows:

SELECT * from t WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is interpreted as a UTC timestamp (no conversion from local to UTC), but during query execution the timestamp data in the files is converted to the local timezone and then compared to <timestamp literal>. I.e. the scanner assumes that <timestamp literal> is in the local timezone.

On the other hand, when the Iceberg type TIMESTAMP (which corresponds to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, we should just push down the timestamp values without any conversion. In this case there is no conversion in the scanners either.
Testing: * added e2e test with TIMESTAMPTZ * added e2e test with TIMESTAMP Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-3.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a2d413b6-0539-4ef9-8c70-df065ef63402-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-7021219619084712613-1-a2d413b6-0539-4ef9-8c70-df065ef63402.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-3.parquet A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-5.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/f522d668-4bd4-4ffa-b617-e1523753691a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-6467287209495292139-1-f522d668-4bd4-4ffa-b617-e1523753691a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

Patch Set 4: Code-Review+1

(2 comments)

Thanks for the fix Zoltan, LGTM!

http://gerrit.cloudera.org:8080/#/c/18399/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18399/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@61
PS4, Line 61: import com.google.common.base.Preconditions;
:
: import org.apache.impala.util.ExprUtil;
nit: imports got mixed here

http://gerrit.cloudera.org:8080/#/c/18399/4/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/18399/4/testdata/datasets/functional/functional_schema_template.sql@3310
PS4, Line 3310:
: DATASET
: functional
: BASE_TABLE_NAME
: iceberg_uppercase_col
: CREATE
: CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name}
: STORED AS ICEBERG
: TBLPROPERTIES('write.format.default'='parquet', 'iceberg.catalog'='hadoop.catalog',
: 'iceberg.catalog_location'='/test-warehouse/iceberg_test/hadoop_catalog',
: 'iceberg.table_identifier'='ice.iceberg_uppercase_col');
: DEPENDENT_LOAD
: `hadoop fs -mkdir -p /test-warehouse/iceberg_test/hadoop_catalog/ice && \
: hadoop fs -put -f ${IMPALA_HOME}/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_uppercase_col /test-warehouse/iceberg_test/hadoop_catalog/ice
This table is created twice.

-- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Apr 2022 13:18:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10442/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Apr 2022 12:51:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10441/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Apr 2022 12:46:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 4: PS4 is a rebase. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Apr 2022 12:31:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18399 to look at the new patch set (#4).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode ..

IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps during predicate pushdown to Iceberg. This causes problems when the Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH LOCAL TIME ZONE in SQL), because in the scanners we assume that the timestamp literals in a query are in the local timezone. Hence, if the Iceberg table is partitioned by HOUR(ts) and Impala is running in a timezone other than UTC, the following query doesn't return any rows:

SELECT * from t WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is interpreted as a UTC timestamp (no conversion from local to UTC), but during query execution the timestamp data in the files is converted to the local timezone and then compared to <timestamp literal>. I.e. the scanner assumes that <timestamp literal> is in the local timezone.

On the other hand, when the Iceberg type TIMESTAMP (which corresponds to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, we should just push down the timestamp values without any conversion. In this case there is no conversion in the scanners either.
Testing: * added e2e test with TIMESTAMPTZ * added e2e test with TIMESTAMP Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-3.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a2d413b6-0539-4ef9-8c70-df065ef63402-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-7021219619084712613-1-a2d413b6-0539-4ef9-8c70-df065ef63402.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-3.parquet A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-5.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/f522d668-4bd4-4ffa-b617-e1523753691a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-6467287209495292139-1-f522d668-4bd4-4ffa-b617-e1523753691a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/
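A note on the test data listed above: the TIMESTAMP table has a single ts_hour=2021-10-31-02 directory, while the TIMESTAMPTZ table has both ts_hour=2021-10-31-00 and ts_hour=2021-10-31-01. One possible reading is that the data was written in a zone where daylight saving time ended on 2021-10-31, so that the local hour 02 occurred twice and maps to two different UTC hours. The sketch below illustrates that effect with java.time. The zone Europe/Budapest, the class DstOverlapSketch and the method utcHoursFor are assumptions made for illustration; the messages do not name the zone, and this is not code from the patch.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Illustrative only: shows how one local wall-clock time can map to two UTC hours
// during a DST overlap. The zone used in main() is an assumption, not from the patch.
class DstOverlapSketch {
  // Returns the UTC hour-of-day for the earlier and the later interpretation of
  // 'local' in 'zone'. Outside a DST overlap both entries are equal.
  static int[] utcHoursFor(LocalDateTime local, ZoneId zone) {
    ZonedDateTime zoned = local.atZone(zone);
    int earlier = zoned.withEarlierOffsetAtOverlap()
        .withZoneSameInstant(ZoneOffset.UTC).getHour();
    int later = zoned.withLaterOffsetAtOverlap()
        .withZoneSameInstant(ZoneOffset.UTC).getHour();
    return new int[] {earlier, later};
  }

  public static void main(String[] args) {
    ZoneId zone = ZoneId.of("Europe/Budapest");  // assumed zone
    int[] hours = utcHoursFor(LocalDateTime.of(2021, 10, 31, 2, 30), zone);
    System.out.println("UTC hours for local 02:30: " + hours[0] + " and " + hours[1]);
  }
}
```

If this reading is right, an equality predicate on such a local time is ambiguous for TIMESTAMPTZ, and whichever offset the conversion picks decides which HOUR(ts) partition is selected.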
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18399 to look at the new patch set (#3). Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode IcebergScanNode interprets the timestamp literals as UTC timestamps during predicate pushdown to Iceberg. This causes problems when the Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH LOCAL TIME ZONE in SQL) because in the scanners we assume that the timestamp literals in a query are in the local timezone. Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is running in a timezone other than UTC, then the following query doesn't return any rows: SELECT * from t WHERE ts = ; This is because during predicate pushdown the timestamp is interpreted as a UTC timestamp (no conversion from local to UTC), but during query execution the timestamp data in the files is converted to the local timezone and then compared to the literal. I.e. in the scanner the assumption is that the literal is in the local timezone. On the other hand, when the Iceberg type TIMESTAMP (which corresponds to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, we should just push down the timestamp values without any conversion. In this case there is no conversion in the scanners either.
Testing: * added e2e test with TIMESTAMPTZ * added e2e test with TIMESTAMP Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-3.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a2d413b6-0539-4ef9-8c70-df065ef63402-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-7021219619084712613-1-a2d413b6-0539-4ef9-8c70-df065ef63402.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-3.parquet A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-4.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-5.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/f522d668-4bd4-4ffa-b617-e1523753691a-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-6467287209495292139-1-f522d668-4bd4-4ffa-b617-e1523753691a.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10425/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Mon, 11 Apr 2022 16:53:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18399 to look at the new patch set (#2). Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode IcebergScanNode interprets the timestamp literals as UTC timestamps during predicate pushdown to Iceberg. This causes problems when the Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH LOCAL TIME ZONE in SQL) because in the scanners we assume that the timestamp literals in a query are in the local timezone. Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is running in a timezone other than UTC, then the following query doesn't return any rows: SELECT * from t WHERE ts = ; This is because during predicate pushdown the timestamp is interpreted as a UTC timestamp (no conversion from local to UTC), but during query execution the timestamp data in the files is converted to the local timezone and then compared to the literal. I.e. in the scanner the assumption is that the literal is in the local timezone. On the other hand, when the Iceberg type TIMESTAMP (which corresponds to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, we should just push down the timestamp values without any conversion. In this case there is no conversion in the scanners either.
Testing: * added e2e test with TIMESTAMPTZ * added e2e test with TIMESTAMP Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/data/README M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test 6 files changed, 191 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18399/2 -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10424/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Comment-Date: Mon, 11 Apr 2022 16:17:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18399 ) Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode .. IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode IcebergScanNode interprets the timestamp literals as UTC timestamps during predicate pushdown to Iceberg. This causes problems when the Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH LOCAL TIME ZONE in SQL) because in the scanners we assume that the timestamp literals in a query are in the local timezone. Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is running in a timezone other than UTC, then the following query doesn't return any rows: SELECT * from t WHERE ts = ; This is because during predicate pushdown the timestamp is interpreted as a UTC timestamp (no conversion from local to UTC), but during query execution the timestamp data in the files is converted to the local timezone and then compared to the literal. I.e. in the scanner the assumption is that the literal is in the local timezone. On the other hand, when the Iceberg type TIMESTAMP (which corresponds to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, we should just push down the timestamp values without any conversion. In this case there is no conversion in the scanners either.
Testing: * added e2e test with TIMESTAMPTZ * added e2e test with TIMESTAMP Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa --- M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test 5 files changed, 184 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18399/1 -- To view, visit http://gerrit.cloudera.org:8080/18399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa Gerrit-Change-Number: 18399 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
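For readers following the thread, the distinction described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code in IcebergScanNode.java: the class TimestampPushdownSketch, the methods toPushdownMicros and hourOrdinal, and the Europe/Budapest zone are all hypothetical names and values chosen for the example. It assumes that Iceberg stores timestamps as microseconds since the epoch and that the hour transform counts hours since 1970-01-01 00:00:00.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of the two pushdown behaviours; not the actual Impala code.
class TimestampPushdownSketch {
  // Converts a timestamp literal to microseconds since the epoch.
  // TIMESTAMPTZ: the literal is read as local time and converted to UTC.
  // TIMESTAMP:   the literal is used as-is, with no timezone conversion.
  static long toPushdownMicros(LocalDateTime literal, boolean isTimestampTz,
      ZoneId localZone) {
    Instant instant = isTimestampTz
        ? literal.atZone(localZone).toInstant()
        : literal.toInstant(ZoneOffset.UTC);
    return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
  }

  // Hours since 1970-01-01 00:00:00, as used by an HOUR(ts) partition transform.
  static long hourOrdinal(long micros) {
    return Math.floorDiv(micros, 3_600_000_000L);
  }

  public static void main(String[] args) {
    ZoneId zone = ZoneId.of("Europe/Budapest");  // assumed session timezone
    LocalDateTime literal = LocalDateTime.of(2022, 4, 11, 12, 0);
    long tz = toPushdownMicros(literal, true, zone);
    long plain = toPushdownMicros(literal, false, zone);
    System.out.println("TIMESTAMPTZ hour ordinal: " + hourOrdinal(tz));
    System.out.println("TIMESTAMP hour ordinal:   " + hourOrdinal(plain));
  }
}
```

With a non-UTC zone the two hour ordinals differ, which is why pushing down an unconverted literal against an HOUR(ts)-partitioned TIMESTAMPTZ table can select the wrong partition and return no rows.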