[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 10:

Thanks for the review!


--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Apr 2022 12:49:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * FROM t
 WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is
interpreted as a UTC timestamp (no conversion from local to UTC), but
during query execution the timestamp data in the files are converted
to local timezone, then compared to <timestamp literal>. I.e. in the
scanner the assumption is that <timestamp literal> is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.
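
A minimal sketch of the distinction described above, not the actual
IcebergScanNode code. The class and method names are hypothetical, and
Europe/Budapest is only an assumed example of a non-UTC local timezone.
For TIMESTAMPTZ the literal is read as local time and converted to a
UTC instant before pushdown; for TIMESTAMP the wall-clock value is
passed through unchanged.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration; none of these names come from Impala.
final class TimestampPushdownSketch {
  // Returns microseconds since the Unix epoch for a timestamp literal,
  // in the form a pushed-down predicate value could take.
  static long toPushdownMicros(LocalDateTime literal, boolean isTimestampTz,
      ZoneId localZone) {
    Instant instant = isTimestampTz
        // TIMESTAMPTZ: the literal is in local time, so convert local -> UTC.
        ? literal.atZone(localZone).toInstant()
        // TIMESTAMP: no conversion, the wall-clock value is used as-is.
        : literal.toInstant(ZoneOffset.UTC);
    return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
  }

  public static void main(String[] args) {
    ZoneId zone = ZoneId.of("Europe/Budapest");  // assumed local timezone
    LocalDateTime literal = LocalDateTime.of(2022, 4, 11, 12, 4, 55);
    long tz = toPushdownMicros(literal, true, zone);
    long plain = toPushdownMicros(literal, false, zone);
    System.out.println("TIMESTAMPTZ micros: " + tz);
    System.out.println("TIMESTAMP micros:   " + plain);
    System.out.println("difference (hours): " + (plain - tz) / 3_600_000_000L);
  }
}
```

In the assumed zone, which is at UTC+2 on that date, the TIMESTAMPTZ
value lands two hours earlier than the TIMESTAMP value, so the two
column types select different HOUR(ts) partitions for the same literal.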

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Reviewed-on: http://gerrit.cloudera.org:8080/18399
Tested-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=1969-01-01-01/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a366370e-6b9a-4698-82d0-95fb69b19afb-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-1967339514069250436-1-a366370e-6b9a-4698-82d0-95fb69b19afb.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=1969-01-01-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-6.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/94003077-eabb-4dab-95ec-52a1727ef853-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-2778998487482282437-1-94003077-eabb-4dab-95ec-52a1727ef853.avro
A 
testdata/data/iceb

[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-21 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 9: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Apr 2022 12:46:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 9: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 20:47:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10474/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 16:38:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8058/ 
DRY_RUN=true


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 16:22:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 9:

PS9 only adds tests to iceberg-query.test. Other changes are due to a rebase.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 16:17:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18399

to look at the new patch set (#9).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * FROM t
 WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is
interpreted as a UTC timestamp (no conversion from local to UTC), but
during query execution the timestamp data in the files are converted
to local timezone, then compared to <timestamp literal>. I.e. in the
scanner the assumption is that <timestamp literal> is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=1969-01-01-01/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a366370e-6b9a-4698-82d0-95fb69b19afb-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-1967339514069250436-1-a366370e-6b9a-4698-82d0-95fb69b19afb.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=1969-01-01-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-6.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/94003077-eabb-4dab-95ec-52a1727ef853-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-2778998487482282437-1-94003077-eabb-4dab-95ec-52a1727ef853.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceber

[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8056/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 15:34:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10470/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 12:03:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 8:

(3 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@615
PS6, Line 615:  TYPES
> NumFileMetadataRead: 0 sounds weird, but I think that it is out of the scop
I agree, it's a different issue.


http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@524
PS7, Line 524: 
> nit: extra 
Done


http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@543
PS7, Line 543:  RESULT
> Why do we need the order by?
We don't *need* it, but I wanted to get the results in order when using 
bin/impala-py.test --update_results 



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 11:43:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18399

to look at the new patch set (#8).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * FROM t
 WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is
interpreted as a UTC timestamp (no conversion from local to UTC), but
during query execution the timestamp data in the files are converted
to local timezone, then compared to <timestamp literal>. I.e. in the
scanner the assumption is that <timestamp literal> is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=1969-01-01-01/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a366370e-6b9a-4698-82d0-95fb69b19afb-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-1967339514069250436-1-a366370e-6b9a-4698-82d0-95fb69b19afb.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=1969-01-01-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-6.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/94003077-eabb-4dab-95ec-52a1727ef853-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-2778998487482282437-1-94003077-eabb-4dab-95ec-52a1727ef853.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceber

[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 7: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@615
PS6, Line 615: 3,2022-04-10 22:04:00
> In my case the stats are:
NumFileMetadataRead: 0 sounds weird, but I think that it is out of the scope of 
this Jira to investigate it.


http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@524
PS7, Line 524: 
nit: extra 


http://gerrit.cloudera.org:8080/#/c/18399/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@543
PS7, Line 543: order by i;
Why do we need the order by?



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 11:32:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8056/ 
DRY_RUN=true


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Apr 2022 11:10:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10466/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Apr 2022 17:20:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 6:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/18399/6/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18399/6/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@31
PS6, Line 31: import org.apache.iceberg.expressions.UnboundPredicate;
: import org.apache.hadoop.fs.Path;
> nit: order of imports looks wrong
Done


http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/data/README@722
PS6, Line 722: insert into iceberg_timestamp_part values (1, '2021-10-31 
02:15:00'), (2, '2021-01-10 12:00:00'), (3, '2022-04-11 00:04:00'), (4, 
'2022-04-11 12:04:55');
> It would be nice to add 1-2  <1970-01-01 timestamps to give coverage for ne
Done


http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@551
PS6, Line 551: WHERE ts = '2021-10-31 00:15:00';
> Can you also add some tests with < or > predicates?
Done


http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@615
PS6, Line 615: aggregation(SUM, NumRowGroups): 5
> Hmm, I wonder why we don't drop some of these during stat filtering. Maybe
In my case the stats are:

 - NumFileMetadataRead: 0 (0)
 - NumRowGroups: 5 (5)
 - NumStatsFilteredRowGroups: 3 (3)

NumFileMetadataRead: 0 is strange since we definitely read the metadata to 
filter out row groups.

Iceberg tables are treated as non-partitioned in most cases. We let Iceberg 
filter out partitions during planning.

When predicate pushdown doesn't work (like in this test) it'd make sense to 
evaluate the stats for identity-partitioned tables (using the template tuple), 
but these tables are partitioned via the HOUR partition transform.
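
For reference, a small sketch of what an HOUR partition transform
computes, assuming the Iceberg definition of "hours since 1970-01-01
00:00:00". The class and method names are hypothetical and this is not
Iceberg's or Impala's implementation. Floor division matters for the
pre-1970 timestamps added to the test data, since truncating division
would put the last hour before the epoch into hour 0.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration of an hour-granularity partition transform.
final class HourTransformSketch {
  // Hours since the Unix epoch for a UTC wall-clock value.
  // Math.floorDiv rounds toward negative infinity, so pre-epoch
  // timestamps map to negative hour ordinals.
  static long hoursSinceEpoch(LocalDateTime utc) {
    return Math.floorDiv(utc.toEpochSecond(ZoneOffset.UTC), 3600L);
  }

  public static void main(String[] args) {
    System.out.println(hoursSinceEpoch(LocalDateTime.of(1970, 1, 1, 0, 0)));
    System.out.println(hoursSinceEpoch(LocalDateTime.of(1969, 12, 31, 23, 30)));
    System.out.println(hoursSinceEpoch(LocalDateTime.of(2021, 10, 31, 2, 15)));
  }
}
```

Because the partition value is derived from the timestamp rather than
stored as an identity column, a predicate on ts has to be converted
correctly before Iceberg can map it to the matching hour partitions.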



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Apr 2022 16:59:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-19 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18399

to look at the new patch set (#7).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * from t
 WHERE ts = ;

Because during predicate pushdown the timestamp is interpreted as a
UTC timestamp (no conversion from local to UTC), but during query
execution the timestamp data in the files are converted to local
timezone, then compared to the timestamp literal. I.e. in the scanner
the assumption is that the literal is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.
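
The rule described in the paragraphs above can be illustrated with a small, self-contained sketch. This is not the code from IcebergScanNode.java; the class name, the method name toPushdownMicros, and the Europe/Budapest zone are assumptions made purely for illustration. It only shows the idea: a TIMESTAMPTZ literal is read in the local time zone and converted to UTC before pushdown, while a TIMESTAMP literal is pushed down unchanged.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class for illustration; not part of Impala or Iceberg.
class TimestampPushdownSketch {
  // Returns microseconds since the Unix epoch for a timestamp literal.
  // TIMESTAMPTZ: the literal is interpreted in the local zone, then converted to UTC.
  // TIMESTAMP:   the literal is taken as-is (treated as UTC, i.e. no conversion).
  static long toPushdownMicros(LocalDateTime literal, boolean isTimestampTz,
      ZoneId localZone) {
    ZoneId zone = isTimestampTz ? localZone : ZoneOffset.UTC;
    Instant instant = literal.atZone(zone).toInstant();
    return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
  }

  public static void main(String[] args) {
    LocalDateTime literal = LocalDateTime.of(2022, 4, 11, 12, 4, 55);
    ZoneId localZone = ZoneId.of("Europe/Budapest");  // assumed local zone
    long tzMicros = toPushdownMicros(literal, true, localZone);
    long plainMicros = toPushdownMicros(literal, false, localZone);
    // The two values differ by the UTC offset of the local zone at that moment.
    System.out.println("offset hours: " + (plainMicros - tzMicros) / 3_600_000_000L);
  }
}
```

Under the assumed zone the two results differ by two hours for this literal, so an HOUR(ts) partition filter built from the unconverted value would point at a different hour than the one the row was written to.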

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=1969-01-01-01/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220419181820_3b0f79ee-1aff-4983-98cf-7d01647fa77a-job_16493406300920_0023-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a366370e-6b9a-4698-82d0-95fb69b19afb-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-1967339514069250436-1-a366370e-6b9a-4698-82d0-95fb69b19afb.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=1969-01-01-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220419182502_45a45ed8-85ff-4046-b834-648c5a039891-job_16493406300920_0024-6.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/94003077-eabb-4dab-95ec-52a1727ef853-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-2778998487482282437-1-94003077-eabb-4dab-95ec-52a1727ef853.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceber

[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/data/README@722
PS6, Line 722: insert into iceberg_timestamp_part values (1, '2021-10-31 
02:15:00'), (2, '2021-01-10 12:00:00'), (3, '2022-04-11 00:04:00'), (4, 
'2022-04-11 12:04:55');
It would be nice to add 1-2  <1970-01-01 timestamps to give coverage for 
negative hour values, e.g. one at an exact hour and one that only has a minutes 
component.
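
To make the concern about negative hour values concrete, here is a minimal sketch, assuming the hour partition value is the number of whole hours since 1970-01-01 00:00 UTC. The class and method names are hypothetical and this is not Iceberg's or Impala's implementation; it only contrasts floor division with truncating division for pre-1970 timestamps.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration of hour bucketing around the Unix epoch.
class HourTransformSketch {
  static final long SECONDS_PER_HOUR = 3600L;

  // Floor division: every timestamp maps to the hour bucket that starts at or before it.
  static long flooredHours(LocalDateTime utcTs) {
    return Math.floorDiv(utcTs.toEpochSecond(ZoneOffset.UTC), SECONDS_PER_HOUR);
  }

  // Truncating division rounds toward zero, which is wrong for negative epoch values
  // that are not on an exact hour boundary.
  static long truncatedHours(LocalDateTime utcTs) {
    return utcTs.toEpochSecond(ZoneOffset.UTC) / SECONDS_PER_HOUR;
  }

  public static void main(String[] args) {
    LocalDateTime exactHour = LocalDateTime.of(1969, 12, 31, 23, 0, 0);
    LocalDateTime withMinutes = LocalDateTime.of(1969, 12, 31, 23, 30, 0);
    System.out.println(flooredHours(exactHour) + " vs " + truncatedHours(exactHour));
    System.out.println(flooredHours(withMinutes) + " vs " + truncatedHours(withMinutes));
  }
}
```

A timestamp at an exact hour gives the same result either way, while one with only a minutes component exposes the difference, which is why the two kinds of test rows cover different cases.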


http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@551
PS6, Line 551: WHERE ts = '2021-10-31 00:15:00';
Can you also add some tests with < or > predicates?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Apr 2022 14:12:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 6: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18399/6/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18399/6/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@31
PS6, Line 31: import org.apache.iceberg.expressions.UnboundPredicate;
: import org.apache.hadoop.fs.Path;
nit: order of imports looks wrong


http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/18399/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@615
PS6, Line 615: aggregation(SUM, NumRowGroups): 5
Hmm, I wonder why we don't drop some of these during stat filtering. Maybe Hive 
doesn't add stats for timestamp columns?

Note that if we want to differentiate between partition and scanner level 
pruning, a recently introduced counter could be used: NumFileMetadataRead 
tracks the number of ORC/Parquet files in which we have read the metadata - 
this should be equal to NumRowGroups in case of partition pruning, while it 
should be larger in case of file/row group level stat filtering.
An example:
https://gerrit.cloudera.org/#/c/18327/17/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Apr 2022 13:37:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8053/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Apr 2022 13:03:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8053/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Apr 2022 08:39:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-19 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 6: Code-Review+1

Thanks Zoltan, LGTM!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 19 Apr 2022 07:54:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10453/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Apr 2022 16:06:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10452/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Apr 2022 16:02:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 6:

PS6 is a rebase.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Apr 2022 15:46:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-14 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18399

to look at the new patch set (#6).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * from t
 WHERE ts = ;

Because during predicate pushdown the timestamp is interpreted as a
UTC timestamp (no conversion from local to UTC), but during query
execution the timestamp data in the files are converted to local
timezone, then compared to the timestamp literal. I.e. in the scanner
the assumption is that the literal is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a2d413b6-0539-4ef9-8c70-df065ef63402-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-7021219619084712613-1-a2d413b6-0539-4ef9-8c70-df065ef63402.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/f522d668-4bd4-4ffa-b617-e1523753691a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-6467287209495292139-1-f522d668-4bd4-4ffa-b617-e1523753691a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/

[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 5:

(2 comments)

Thanks for the comments.

http://gerrit.cloudera.org:8080/#/c/18399/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18399/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@61
PS4, Line 61:
: import com.google.common.base.Preconditions;
:
> nit: imports got mixed here
Thanks, done.


http://gerrit.cloudera.org:8080/#/c/18399/4/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/18399/4/testdata/datasets/functional/functional_schema_template.sql@3310
PS4, Line 3310: 
  :  DATASET
  : functional
  :  BASE_TABLE_NAME
  : iceberg_timestamptz_part
  :  CREATE
  : CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name}
  : STORED AS ICEBERG
  : TBLPROPERTIES('write.format.default'='parquet', 'iceberg.catalog'='hadoop.catalog',
  :   'iceberg.catalog_location'='/test-warehouse/iceberg_test/hadoop_catalog',
  :   'iceberg.table_identifier'='ice.iceberg_timestamptz_part');
  :  DEPENDENT_LOAD
  : `hadoop fs -mkdir -p /test-warehouse/iceberg_test/hadoop_catalog/ice && \
  : hadoop fs -put -f ${IMPALA_HOME}/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part /test-warehouse/iceberg_test/hadoop_catalog/
> This table is created twice.
Ah, thanks for catching this! Done.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Apr 2022 15:43:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-14 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18399

to look at the new patch set (#5).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * from t
 WHERE ts = ;

Because during predicate pushdown the timestamp is interpreted as a
UTC timestamp (no conversion from local to UTC), but during query
execution the timestamp data in the files are converted to local
timezone, then compared to the timestamp literal. I.e. in the scanner
the assumption is that the literal is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a2d413b6-0539-4ef9-8c70-df065ef63402-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-7021219619084712613-1-a2d413b6-0539-4ef9-8c70-df065ef63402.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/f522d668-4bd4-4ffa-b617-e1523753691a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-6467287209495292139-1-f522d668-4bd4-4ffa-b617-e1523753691a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/

[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-14 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 4: Code-Review+1

(2 comments)

Thanks for the fix Zoltan, LGTM!

http://gerrit.cloudera.org:8080/#/c/18399/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18399/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@61
PS4, Line 61: import com.google.common.base.Preconditions;
:
: import org.apache.impala.util.ExprUtil;
nit: imports got mixed here


http://gerrit.cloudera.org:8080/#/c/18399/4/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/18399/4/testdata/datasets/functional/functional_schema_template.sql@3310
PS4, Line 3310: 
  :  DATASET
  : functional
  :  BASE_TABLE_NAME
  : iceberg_uppercase_col
  :  CREATE
  : CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name}
  : STORED AS ICEBERG
  : TBLPROPERTIES('write.format.default'='parquet', 'iceberg.catalog'='hadoop.catalog',
  :   'iceberg.catalog_location'='/test-warehouse/iceberg_test/hadoop_catalog',
  :   'iceberg.table_identifier'='ice.iceberg_uppercase_col');
  :  DEPENDENT_LOAD
  : `hadoop fs -mkdir -p /test-warehouse/iceberg_test/hadoop_catalog/ice && \
  : hadoop fs -put -f ${IMPALA_HOME}/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_uppercase_col /test-warehouse/iceberg_test/hadoop_catalog/ice
This table is created twice.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Apr 2022 13:18:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10442/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Apr 2022 12:51:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10441/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Apr 2022 12:46:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 4:

PS4 is a rebase.


--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Apr 2022 12:31:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-13 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18399

to look at the new patch set (#4).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * from t
 WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is
interpreted as a UTC timestamp (no conversion from local to UTC), but
during query execution the timestamp data in the files are converted to
local timezone, then compared to <timestamp literal>. I.e. in the scanner
the assumption is that <timestamp literal> is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.
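
The two cases above can be illustrated with a minimal, self-contained
sketch. It is not the code from IcebergScanNode.java: the class name,
the helper toPushdownMicros, and the Europe/Budapest zone are all
hypothetical and chosen only for illustration. It assumes the pushed-down
value is expressed as microseconds since the epoch.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Illustrative only: not Impala's actual implementation.
final class TimestampPushdownSketch {

  // Hypothetical helper: turns a timestamp literal from the query into the
  // microsecond value that would be pushed down to Iceberg.
  static long toPushdownMicros(
      LocalDateTime literal, boolean isTimestampTz, ZoneId localZone) {
    Instant instant = isTimestampTz
        // TIMESTAMPTZ: the literal is in the local timezone, so convert it
        // to UTC before pushing it down.
        ? literal.atZone(localZone).toInstant()
        // TIMESTAMP: no conversion, the wall-clock value is used as-is.
        : literal.toInstant(ZoneOffset.UTC);
    return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
  }

  public static void main(String[] args) {
    LocalDateTime literal = LocalDateTime.of(2021, 1, 10, 12, 0);
    ZoneId zone = ZoneId.of("Europe/Budapest");  // assumed local timezone

    long tz = toPushdownMicros(literal, true, zone);
    long plain = toPushdownMicros(literal, false, zone);

    // In January this zone is one hour ahead of UTC, so the TIMESTAMPTZ
    // value lands one hour earlier than the unconverted TIMESTAMP value.
    System.out.println((plain - tz) / 3_600_000_000L);  // prints 1
  }
}
```

With an HOUR(ts) partition transform, a one-hour shift like this is enough
to select a different partition, which is why the unconverted literal can
match no rows for a TIMESTAMPTZ column.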

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a2d413b6-0539-4ef9-8c70-df065ef63402-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-7021219619084712613-1-a2d413b6-0539-4ef9-8c70-df065ef63402.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/f522d668-4bd4-4ffa-b617-e1523753691a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-6467287209495292139-1-f522d668-4bd4-4ffa-b617-e1523753691a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/

[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-13 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18399

to look at the new patch set (#3).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * from t
 WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is
interpreted as a UTC timestamp (no conversion from local to UTC), but
during query execution the timestamp data in the files are converted to
local timezone, then compared to <timestamp literal>. I.e. in the scanner
the assumption is that <timestamp literal> is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-01-10-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2021-10-31-02/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-00/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/data/ts_hour=2022-04-11-12/0-0-boroknagyz_20220413134343_7000310a-aecc-4e44-8c41-f8885675a9cb-job_16493406300920_0019-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/a2d413b6-0539-4ef9-8c70-df065ef63402-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/snap-7021219619084712613-1-a2d413b6-0539-4ef9-8c70-df065ef63402.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamp_part/metadata/version-hint.text
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-01-10-11/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-3.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-00/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-1.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2021-10-31-01/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-2.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-10-22/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-4.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/data/ts_hour=2022-04-11-10/0-0-boroknagyz_20220413131021_733ffe4f-ca07-441a-a714-b4bfe314ad19-job_16493406300920_0017-5.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/f522d668-4bd4-4ffa-b617-e1523753691a-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/snap-6467287209495292139-1-f522d668-4bd4-4ffa-b617-e1523753691a.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_timestamptz_part/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/

[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10425/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Mon, 11 Apr 2022 16:53:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-11 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18399

to look at the new patch set (#2).

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * from t
 WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is
interpreted as a UTC timestamp (no conversion from local to UTC), but
during query execution the timestamp data in the files are converted to
local timezone, then compared to <timestamp literal>. I.e. in the scanner
the assumption is that <timestamp literal> is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/data/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
6 files changed, 191 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18399/2
--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18399 )

Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10424/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Mon, 11 Apr 2022 16:17:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10850: Interpret timestamp predicates in local timezone in IcebergScanNode

2022-04-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18399 )


Change subject: IMPALA-10850: Interpret timestamp predicates in local timezone 
in IcebergScanNode
..

IMPALA-10850: Interpret timestamp predicates in local timezone in 
IcebergScanNode

IcebergScanNode interprets the timestamp literals as UTC timestamps
during predicate pushdown to Iceberg. It causes problems when the
Iceberg table uses TIMESTAMPTZ (which corresponds to TIMESTAMP WITH
LOCAL TIME ZONE in SQL) because in the scanners we assume that the
timestamp literals in a query are in local timezone.

Hence, if the Iceberg table is partitioned by HOUR(ts), and Impala is
running in a different timezone than UTC, then the following query
doesn't return any rows:

 SELECT * from t
 WHERE ts = <timestamp literal>;

This happens because during predicate pushdown the timestamp is
interpreted as a UTC timestamp (no conversion from local to UTC), but
during query execution the timestamp data in the files are converted to
local timezone, then compared to <timestamp literal>. I.e. in the scanner
the assumption is that <timestamp literal> is in local timezone.

On the other hand, when Iceberg type TIMESTAMP (which corresponds
to TIMESTAMP WITHOUT TIME ZONE in SQL) is used, then we should just
push down the timestamp values without any conversion. In this case
there is no conversion in the scanners either.

Testing:
 * added e2e test with TIMESTAMPTZ
 * added e2e test with TIMESTAMP

Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
5 files changed, 184 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18399/1
--
To view, visit http://gerrit.cloudera.org:8080/18399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I181be5d2fa004f69b457f69ff82dc2f9877f46fa
Gerrit-Change-Number: 18399
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy