[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-12 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20491

to look at the new patch set (#8).

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..

IMPALA-12443: Add catalog timeline for all ddl profiles

This is a follow-up work of IMPALA-12024 where we add the catalog
timeline for CreateTable statements. Using the same mechanism, this
patch adds catalog timeline for all DDL/DML profiles, including
REFRESH and INSERT.

The goal is to add timeline markers after each step that could be
blocked, e.g. acquiring locks, external RPCs. So we can better debug
slow DDLs with the catalog timeline in profiles.

Tried to add some constant strings for widely used events, e.g. "Fetched
table from Metastore". Didn't do so for events that only occurs once.

Most of the catalog methods now have a new argument for tracking the
execution timeline. To avoid adding null checks everywhere, for code
paths that don't need a catalog profile, e.g. EventProcessor, creates an
unused catalogTimeline as the argument. We can use them in future works,
e.g. expose execution timeline of a slow processing on an HMS event.

This patch also removes some unused overloads of HdfsTable#load() and
HdfsTable#reloadPartitionsFromNames().

Example timeline for "REFRESH tpcds.store_sales" when the table is
unloaded:
Catalog Server Operation: 2s300ms
   - Got catalog version read lock: 26.407us (26.407us)
   - Start loading table: 314.663us (288.256us)
   - Got Metastore client: 629.599us (314.936us)
   - Fetched table from Metastore: 7.248ms (6.618ms)
   - Loaded table schema: 27.947ms (20.699ms)
   - Preloaded permissions cache for 1824 partitions: 1s514ms (1s486ms)
   - Got access level: 1s514ms (588.314us)
   - Created partition builders: 2s103ms (588.270ms)
   - Start loading file metadata: 2s103ms (49.760us)
   - Loaded file metadata for 1824 partitions: 2s282ms (179.839ms)
   - Async loaded table: 2s289ms (6.931ms)
   - Loaded table from scratch: 2s289ms (72.038us)
   - Got table read lock: 2s289ms (2.289us)
   - Finished resetMetadata request: 2s300ms (10.188ms)

Example timeline for an INSERT statement:
Catalog Server Operation: 178.120ms
   - Got catalog version read lock: 4.238us (4.238us)
   - Got catalog version write lock and table write lock: 52.768us (48.530us)
   - Got Metastore client: 15.768ms (15.715ms)
   - Fired Metastore events: 156.650ms (140.882ms)
   - Got Metastore client: 163.317ms (6.666ms)
   - Fetched table from Metastore: 166.561ms (3.244ms)
   - Start refreshing file metadata: 167.961ms (1.399ms)
   - Loaded file metadata for 24 partitions: 177.679ms (9.717ms)
   - Reloaded table metadata: 178.021ms (342.261us)
   - Finished updateCatalog request: 178.120ms (98.929us)

Example timeline for a "COMPUTE STATS tpcds_parquet.store_sales":
Catalog Server Operation: 6s737ms
   - Got catalog version read lock: 19.971us (19.971us)
   - Got catalog version write lock and table write lock: 50.255us (30.284us)
   - Got Metastore client: 171.819us (121.564us)
   - Updated column stats: 25.560ms (25.388ms)
   - Got Metastore client: 69.298ms (43.738ms)
   - Altered 500 partitions in Metastore: 1s894ms (1s825ms)
   - Altered 1000 partitions in Metastore: 3s558ms (1s664ms)
   - Altered 1500 partitions in Metastore: 5s144ms (1s586ms)
   - Altered 1824 partitions in Metastore: 6s205ms (1s060ms)
   - Got Metastore client: 6s205ms (329.481us)
   - Altered table in Metastore: 6s216ms (11.073ms)
   - Got Metastore client: 6s216ms (13.377us)
   - Fetched table from Metastore: 6s219ms (2.419ms)
   - Loaded table schema: 6s223ms (4.130ms)
   - Got current Metastore event id 19017: 6s639ms (415.690ms)
   - Start loading file metadata: 6s639ms (9.591us)
   - Loaded file metadata for 1824 partitions: 6s729ms (90.196ms)
   - Reloaded table metadata: 6s735ms (5.865ms)
   - DDL finished: 6s737ms (2.255ms)

Tests:
 - Add e2e test to verify the catalog timeline in some DDLs.

Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogdTableInvalidator.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/MaterializedViewHdfsTable.java
M fe/src/main/java/org/apache/impala/cat

[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14659/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Dec 2023 08:09:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14660/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Dec 2023 08:15:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-12 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20491

to look at the new patch set (#9).

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..

IMPALA-12443: Add catalog timeline for all ddl profiles

This is a follow-up work of IMPALA-12024 where we add the catalog
timeline for CreateTable statements. Using the same mechanism, this
patch adds catalog timeline for all DDL/DML profiles, including
REFRESH and INSERT.

The goal is to add timeline markers after each step that could be
blocked, e.g. acquiring locks, external RPCs. So we can better debug
slow DDLs with the catalog timeline in profiles.

Tried to add some constant strings for widely used events, e.g. "Fetched
table from Metastore". Didn't do so for events that only occurs once.

Most of the catalog methods now have a new argument for tracking the
execution timeline. To avoid adding null checks everywhere, for code
paths that don't need a catalog profile, e.g. EventProcessor, creates an
unused catalogTimeline as the argument. We can use them in future works,
e.g. expose execution timeline of a slow processing on an HMS event.

This patch also removes some unused overloads of HdfsTable#load() and
HdfsTable#reloadPartitionsFromNames().

Example timeline for "REFRESH tpcds.store_sales" when the table is
unloaded:
Catalog Server Operation: 2s300ms
   - Got catalog version read lock: 26.407us (26.407us)
   - Start loading table: 314.663us (288.256us)
   - Got Metastore client: 629.599us (314.936us)
   - Fetched table from Metastore: 7.248ms (6.618ms)
   - Loaded table schema: 27.947ms (20.699ms)
   - Preloaded permissions cache for 1824 partitions: 1s514ms (1s486ms)
   - Got access level: 1s514ms (588.314us)
   - Created partition builders: 2s103ms (588.270ms)
   - Start loading file metadata: 2s103ms (49.760us)
   - Loaded file metadata for 1824 partitions: 2s282ms (179.839ms)
   - Async loaded table: 2s289ms (6.931ms)
   - Loaded table from scratch: 2s289ms (72.038us)
   - Got table read lock: 2s289ms (2.289us)
   - Finished resetMetadata request: 2s300ms (10.188ms)

Example timeline for an INSERT statement:
Catalog Server Operation: 178.120ms
   - Got catalog version read lock: 4.238us (4.238us)
   - Got catalog version write lock and table write lock: 52.768us (48.530us)
   - Got Metastore client: 15.768ms (15.715ms)
   - Fired Metastore events: 156.650ms (140.882ms)
   - Got Metastore client: 163.317ms (6.666ms)
   - Fetched table from Metastore: 166.561ms (3.244ms)
   - Start refreshing file metadata: 167.961ms (1.399ms)
   - Loaded file metadata for 24 partitions: 177.679ms (9.717ms)
   - Reloaded table metadata: 178.021ms (342.261us)
   - Finished updateCatalog request: 178.120ms (98.929us)

Example timeline for a "COMPUTE STATS tpcds_parquet.store_sales":
Catalog Server Operation: 6s737ms
   - Got catalog version read lock: 19.971us (19.971us)
   - Got catalog version write lock and table write lock: 50.255us (30.284us)
   - Got Metastore client: 171.819us (121.564us)
   - Updated column stats: 25.560ms (25.388ms)
   - Got Metastore client: 69.298ms (43.738ms)
   - Altered 500 partitions in Metastore: 1s894ms (1s825ms)
   - Altered 1000 partitions in Metastore: 3s558ms (1s664ms)
   - Altered 1500 partitions in Metastore: 5s144ms (1s586ms)
   - Altered 1824 partitions in Metastore: 6s205ms (1s060ms)
   - Got Metastore client: 6s205ms (329.481us)
   - Altered table in Metastore: 6s216ms (11.073ms)
   - Got Metastore client: 6s216ms (13.377us)
   - Fetched table from Metastore: 6s219ms (2.419ms)
   - Loaded table schema: 6s223ms (4.130ms)
   - Got current Metastore event id 19017: 6s639ms (415.690ms)
   - Start loading file metadata: 6s639ms (9.591us)
   - Loaded file metadata for 1824 partitions: 6s729ms (90.196ms)
   - Reloaded table metadata: 6s735ms (5.865ms)
   - DDL finished: 6s737ms (2.255ms)

Tests:
 - Add e2e test to verify the catalog timeline in some DDLs.

Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogdTableInvalidator.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/MaterializedViewHdfsTable.java
M fe/src/main/java/org/apache/impala/cat

[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14661/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Dec 2023 08:23:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-12 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 6:

> Patch Set 6: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun//

There is a FE test failure: 
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/1079

java.lang.AssertionError: expected:<1> but was:<10>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessorTest.runEventBatchingTest(MetastoreEventsProcessorTest.java:2466)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessorTest.testEventBatching(MetastoreEventsProcessorTest.java:2452)


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 12 Dec 2023 08:33:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14662/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Dec 2023 08:39:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Peter Rozsa (Code Review)
Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 3: Code-Review+1

(2 comments)

Thank you, Riza for fixing it!

http://gerrit.cloudera.org:8080/#/c/20769/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20769/3//COMMIT_MSG@10
PS3, Line 10: fix
nit: fixes


http://gerrit.cloudera.org:8080/#/c/20769/3/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/20769/3/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@35
PS3, Line 35: import org.apache.commons.lang.StringUtils;
nit: unnecessary import



--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 08:42:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-12 Thread Zihao Ye (Code Review)
Hello Wenzhe Zhou, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20681

to look at the new patch set (#13).

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..

IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

This patch adds a query option 'convert_kudu_utc_timestamps' similar to
'convert_legacy_hive_parquet_utc_timestamps'. When enabled, it converts
UTC timestamps read from Kudu to local timestamps.

The corresponding modification also include predicate pushdown and
runtime filter. Due to the ambiguity of timestamps caused by daylight
saving time changes, it is difficult to resolve in the bloom filter.
This patch additionally introduces a query option
'disable_kudu_local_timestamp_bloom_filter' to default disable the Kudu
timestamp bloom filter after enabling time zone conversion in order to
avoid erroneously filtering out data. However, for regions that do not
observe daylight saving time, it can be set to false to re-enable the
Kudu local timestamp bloom filter.

Testing:
- Add TestKuduTimestampConvert in query_test/test_kudu.py
Perform end-to-end testing in a custom cluster, including basic Kudu UTC
timestamp conversion testing, as well as checking if related predicate
pushdown and runtime filters are working correctly (even with timestamps
involving daylight saving time conversions).

Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
---
M be/src/exec/kudu/kudu-scanner.cc
M be/src/exec/kudu/kudu-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M bin/rat_exclude_files.txt
M common/function-registry/impala_functions.py
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/util/ExprUtil.java
A testdata/data/timestamp_at_dst_changes.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test
A 
testdata/workloads/functional-query/queries/QueryTest/kudu_runtime_filter_with_timestamp_conversion.test
A 
testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test
M tests/query_test/test_kudu.py
25 files changed, 591 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/20681/13
--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 13
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 13:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14663/ : Initial code 
review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 13
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 12 Dec 2023 09:08:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-12 Thread Zihao Ye (Code Review)
Hello Wenzhe Zhou, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20681

to look at the new patch set (#14).

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..

IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

This patch adds a query option 'convert_kudu_utc_timestamps' similar to
'convert_legacy_hive_parquet_utc_timestamps'. When enabled, it converts
UTC timestamps read from Kudu to local timestamps.

The corresponding modification also include predicate pushdown and
runtime filter. Due to the ambiguity of timestamps caused by daylight
saving time changes, it is difficult to resolve in the bloom filter.
This patch additionally introduces a query option
'disable_kudu_local_timestamp_bloom_filter' to default disable the Kudu
timestamp bloom filter after enabling time zone conversion in order to
avoid erroneously filtering out data. However, for regions that do not
observe daylight saving time, it can be set to false to re-enable the
Kudu local timestamp bloom filter.

Testing:
- Add TestKuduTimestampConvert in query_test/test_kudu.py
Perform end-to-end testing in a custom cluster, including basic Kudu UTC
timestamp conversion testing, as well as checking if related predicate
pushdown and runtime filters are working correctly (even with timestamps
involving daylight saving time conversions).

Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
---
M be/src/exec/kudu/kudu-scanner.cc
M be/src/exec/kudu/kudu-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M bin/rat_exclude_files.txt
M common/function-registry/impala_functions.py
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/util/ExprUtil.java
A testdata/data/timestamp_at_dst_changes.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test
A 
testdata/workloads/functional-query/queries/QueryTest/kudu_runtime_filter_with_timestamp_conversion.test
A 
testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test
M tests/query_test/test_kudu.py
25 files changed, 591 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/20681/14
--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14664/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 12 Dec 2023 09:53:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-12 Thread Zihao Ye (Code Review)
Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20681/12/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/20681/12/common/function-registry/impala_functions.py@257
PS12, Line 257:   [['to_utc_timestamp'], 'TIMESTAMP', ['TIMESTAMP', 'STRING', 
'BOOLEAN'],
  :"impala::TimestampFunctions::ToUtcUnambiguous"],
> Maybe this should rather go to the invisible functions list?
 > I am not sure here - while I would prefer to not make this a
 > supported functions, it can be useful for users in some cases.

I've moved it to the invisible functions list. I feel it might not be intuitive 
as a function provided to users. If needed, we can introduce functions with 
clearer naming and parameters in a separate Jira ticket.


http://gerrit.cloudera.org:8080/#/c/20681/12/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20681/12/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@744
PS12, Line 744:*/
> Can you mention that now this can return a list for timestamps?
Done


http://gerrit.cloudera.org:8080/#/c/20681/12/testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test:

http://gerrit.cloudera.org:8080/#/c/20681/12/testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test@18
PS12, Line 18:  order by id
> here and at other queries: "order by id" is not necessary - the
 > rows in the results section + the actual results are ordered before
 > comparison by default in EE tests
 >
 > otherwise most tests would need order by, as Impala has no
 > guarantee for the order of returned rows if there are multiple
 > files in a table

Thank you for the reminder, all those "order by id" have been removed.



--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 12
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 12 Dec 2023 09:59:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11553: Add event specific metrics in the table metrics

2023-12-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20473 )

Change subject: IMPALA-11553: Add event specific metrics in the table metrics
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20473/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20473/5//COMMIT_MSG@11
PS5, Line 11: extended
Where are the durations set? Are there also larger durations tracked, e.g. 1 
hour, 1 day?


http://gerrit.cloudera.org:8080/#/c/20473/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20473/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1534
PS5, Line 1534:   Timer.Context context = null;
It would be nice to reduce code duplication related to the timer. We seem to do 
the same for each MetastoreTableEvent, so  a shared function could have the 
timer block. E.g. MetastoreTableEvent could implement process() that calls 
processTableEvent(), and existing process() logicof subclasses could be done in 
processTableEvent()


http://gerrit.cloudera.org:8080/#/c/20473/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1536
PS5, Line 1536: getTable
getTableNoThrow() would let us skip the try/ catch block


http://gerrit.cloudera.org:8080/#/c/20473/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1745
PS5, Line 1745: DropTableEvent
What will happen with the timer after drop/rename events? Will we able to see 
in the webui that some event processing time was spent on these tables after 
the table no longer exists?



--
To view, visit http://gerrit.cloudera.org:8080/20473
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2428029361e610a0fcd8ed11be2ab771f03b00dd
Gerrit-Change-Number: 20473
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 12 Dec 2023 10:04:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 14: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 12 Dec 2023 10:33:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 5:

(7 comments)

Follow-up for my first partial review.

http://gerrit.cloudera.org:8080/#/c/20760/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20760/5//COMMIT_MSG@26
PS5, Line 26: FlushFinal
Can it spill to disk if the data it has received can't be stored in memory?


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h@72
PS5, Line 72:   DCHECK(!file_pos.empty());
We could also DCHECK that none of the vectors in the map are empty either.


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h@91
PS5, Line 91:   if (pos_level_it_ != file_level_it_->second.end()) {
This condition should always be true, we've just dereferenced 'pos_level_it_' 
on L84 before calling this function. We should DCHECK it instead.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@45
PS3, Line 45: partition_descriptor_map_
Does 'partition_descriptor_map_' exist?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@48
PS3, Line 48: to the temporary Hdfs files
> Sorry, this was copy-pasted, updated the comments.
I can't see that it has changed in this file, though it did change in 
iceberg-delete-sink.h. Maybe those changes were intended here?
Also, the other function descriptions that were copied here may contain legacy 
comments?


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@69
PS3, Line 69:   class FilePositionsIterator {
> This way I can keep the typedefs private.
The typedefs could stay here, the class could be forward-declared here and 
defined in the .cc file.


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h
File be/src/exec/iceberg-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h@66
PS5, Line 66:   /// and the delete files as well.
We could mention the case when the different deletes go to different sinks 
because of the differing new values and that we give an error in this case 
because we can't check it.



--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 10:34:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20648 )

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1/


--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 12 Dec 2023 10:50:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10002/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 11:29:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20760

to look at the new patch set (#6).

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..

IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

Part 2 had some limitations, most importantly it could not update
Iceberg tables if any of the following were true:
 * UPDATE value of partitioning column
 * UPDATE table that went through partition evolution
 * Table has SORT BY properties

The problem with partitions is that the delete record and new
data record might belong to different partitions and records
are shuffled across based on the partitions of the delete
records, hence the data files might not get written efficiently.

The problem with SORT BY properties, is that we need to write
the position delete files ordered by (file_path, position).

To address the above problems, this patch introduces a new
backend operator: IcebergBufferedDeleteSink. This new operator
extracts and aggregates the delete record information from
the incoming row batches, then in FlushFinal it orders the
position delete records and writes them out to files. This
mechanism is similar to Hive's approach:
https://github.com/apache/hive/pull/3251

IcebergBufferedDeleteSink cannot spill to disk, so it can only
run if there's enough memory to store the delete records. Paths
are stored only once, and the int64_t positions are stored in
a vector, so updating 100 Million records per node should require
around 800MBs + (100K) filepaths ~= 820 MBs of memory per node.
Spilling could be added later, but currently the need for it is not
too realistic.

Now records can get shuffled around based on the new data records'
partition values, and the SORT operator sorts the records based on
the SORT BY properties.

There's only one case we don't allow the UPDATE statement:
 * UPDATE partition column AND
 * Right-hand side of assignment is non-constant expression AND
 * UPDATE statement has a JOIN

When all of the above conditions meet, it would be possible to
have an incorrect JOIN condition that has multiple matches for the
data records, then the duplicated records would be shuffled
independently (based on the new partition value) to different
backend SINKs, and the different backend SINK would not be able
to detect the duplicates.

If any of the above conditions was false, then the duplicated records
would be shuffled together to the same SINK, that could do the
duplicate check.

This patch also moves some code from IcebergDeleteSink to the
newly introduced IcebergDeleteSinkBase.

Testing:
 * planner tests
 * e2e tests
 * Impala/Hive interop tests

Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
A be/src/exec/iceberg-buffered-delete-sink.cc
A be/src/exec/iceberg-buffered-delete-sink.h
A be/src/exec/iceberg-delete-sink-base.cc
A be/src/exec/iceberg-delete-sink-base.h
A be/src/exec/iceberg-delete-sink-config.cc
A be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/iceberg-delete-sink.cc
M be/src/exec/iceberg-delete-sink.h
M be/src/exec/table-sink-base.cc
M be/src/exec/table-sink-base.h
M be/src/exprs/slot-ref.h
M be/src/runtime/dml-exec-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-stress.test
M tests/query_test/test_iceberg.py
M tests/stress/test_update_stress.py
34 files changed, 1,881 insertions(+), 334 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/6
--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9

[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 6:

(21 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20760/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20760/5//COMMIT_MSG@26
PS5, Line 26: FlushFinal
> Can it spill to disk if the data it has received can't be stored in memory?
It cannot spill, though it's not realistic to have that many delete records 
buffered. I've added a paragraph about it.


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h@72
PS5, Line 72:   /// Nested iterator class to conveniently iterate over a 
FilePositions object.
> We could also DCHECK that none of the vectors in the map are empty either.
Here this would need another iteration over the map (though only in debug 
mode). Instead of this I added a DCHECK to VerifyBufferedRecords().


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h@91
PS5, Line 91:   /// that SortBufferedRecords() has been called already.
> This condition should always be true, we've just dereferenced 'pos_level_it
Done


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.h@35
PS4, Line 35: /// IcebergBufferedDeleteSink buffers the Iceberg position delete 
records in its
> There could be a comment describing when this class is/should be used, e.g.
Added comment.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@45
PS3, Line 45:
> Does 'partition_descriptor_map_' exist?
Updated the comments.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@48
PS3, Line 48: state, MemTracker* parent_m
> I can't see that it has changed in this file, though it did change in icebe
Yeah, sorry, updated the wrong comments.
Apart from Prepare/Open/Send/FlushFinal/Close I didn't copy legacy comments.


http://gerrit.cloudera.org:8080/#/c/20760/3/be/src/exec/iceberg-buffered-delete-sink.h@69
PS3, Line 69:   /// It's necessary to use a std::map so we can get back the 
file paths in order.
> The typedefs could stay here, the class could be forward-declared here and
You're right, I didn't think of that.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.cc
File be/src/exec/iceberg-buffered-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-buffered-delete-sink.cc@345
PS4, Line 345:   int capacity = batch->capacity();
> See comment at iceberg-delete-sink-base.cc:66.
Done


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h
File be/src/exec/iceberg-delete-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@30
PS4, Line 30: IcebergDeleteSinkBase
> Could we make this class pure virtual?
I'm not sure what do you mean by that. It is currently an abstract class as it 
doesn't override everything from DataSink(), e.g. Send().


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@63
PS4, Line 63:   TIcebergPartitionTransformType::type transform_type, const 
std::string& value,
> An example would be useful.
Added comment.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@70
PS4, Line 70:
> Does the unbuffered IcebergDeleteSink also need a separate DmlExecState? Is
Moved it to IcebergBufferedDeleteSink.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc
File be/src/exec/iceberg-delete-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@66
PS4, Line 66: TIcebergPartitionTransformType::type transform_type, const 
std::string& value,
> The 'closed_' field is a bit strange, AFAICS it comes from DataSink and the
I don't really find the comment misleading. Subclasses need to check it and set 
it to false either directly or indirectly via SuperClass::Close(state);
I rather not modify that comment in this CR.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@88
PS4, Line 88:   ScalarExprEvaluator* spec_id_eval = 
partition_key_expr_evals_[0];
> Nit: doesn't it fit on the previous line?
Done


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@100
PS4, Line 100: _
> Why is it plural? Maybe it could be 'parti

[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10003/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 12 Dec 2023 11:39:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-12 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..

IMPALA-12465: Unicode column name support

Impala depends on Hive functions for column name validation and uses
validateName() function for the same. Since Hive already supports
unicode column names, the patch just updates the column name validation
function to validateColumnName(). validateName() checks for a certain
conformance based on pattern matching standards while
validateColumnName() places no restrictions on column names at the
Metadata level.

Testing: The support is tested and cross-checked with Hive. The tests
can be found in unicode-column-name.test.

Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
---
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
A tests/metadata/test_column_unicode.py
M tests/shell/test_shell_interactive.py
6 files changed, 366 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/18
--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-12 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 18:

(3 comments)

> Uploaded patch set 18.

http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
File fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java:

http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java@826
PS14, Line 826:
  :   /*
> nit: Use two stars in the first line of method comment, i.e. /**
Done


http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java@830
PS14, Line 830:   pu
> nit: use 2 spaces indention
Done


http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@465
PS14, Line 465:
> Instead of deleting these, we should change them into AnalyzesOk().
It leads to further errors, as other tests are not expecting those columns eg. 
`???`. That's why I didn't write them into AnalyzeOk().



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 12 Dec 2023 11:38:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14665/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 11:55:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 12:05:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14666/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 12 Dec 2023 12:06:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@465
PS14, Line 465:
> It leads to further errors, as other tests are not expecting those columns
What other tests? Probably we should change those too. How many are they?



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 12 Dec 2023 12:13:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20773/1/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20773/1/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@332
PS1, Line 332:   boolean ifExists, int kudu_table_reserve_seconds)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/20773/1/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/20773/1/tests/query_test/test_kudu.py@1351
PS1, Line 1351: t
flake8: F841 local variable 'table' is assigned to but never used



--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Dec 2023 12:36:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20773


Change subject: IMPALA-12229: Support soft-delete Kudu table
..

IMPALA-12229: Support soft-delete Kudu table

Adds 'kudu_table_reserve_seconds' query option to set reserved time
for deleted Impala managed Kudu tables. The default value is 0.
This option can prevent users deleting important Kudu tables
by mistake.

Testing:
- Added an e2e test.

Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CatalogService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M infra/python/deps/kudu-requirements.txt
M tests/query_test/test_kudu.py
10 files changed, 73 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/1
--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Yifan Zhang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20773

to look at the new patch set (#2).

Change subject: IMPALA-12229: Support soft-delete Kudu table
..

IMPALA-12229: Support soft-delete Kudu table

Adds 'kudu_table_reserve_seconds' query option to set reserved time
for deleted Impala managed Kudu tables. The default value is 0.
This option can prevent users deleting important Kudu tables
by mistake.

Testing:
- Added an e2e test.

Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CatalogService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M infra/python/deps/kudu-requirements.txt
M tests/query_test/test_kudu.py
10 files changed, 72 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20773/2
--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20760

to look at the new patch set (#7).

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..

IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

Part 2 had some limitations, most importantly it could not update
Iceberg tables if any of the following were true:
 * UPDATE value of partitioning column
 * UPDATE table that went through partition evolution
 * Table has SORT BY properties

The problem with partitions is that the delete record and new
data record might belong to different partitions and records
are shuffled across based on the partitions of the delete
records, hence the data files might not get written efficiently.

The problem with SORT BY properties, is that we need to write
the position delete files ordered by (file_path, position).

To address the above problems, this patch introduces a new
backend operator: IcebergBufferedDeleteSink. This new operator
extracts and aggregates the delete record information from
the incoming row batches, then in FlushFinal it orders the
position delete records and writes them out to files. This
mechanism is similar to Hive's approach:
https://github.com/apache/hive/pull/3251

IcebergBufferedDeleteSink cannot spill to disk, so it can only
run if there's enough memory to store the delete records. Paths
are stored only once, and the int64_t positions are stored in
a vector, so updating 100 Million records per node should require
around 800MBs + (100K) filepaths ~= 820 MBs of memory per node.
Spilling could be added later, but currently the need for it is not
too realistic.

Now records can get shuffled around based on the new data records'
partition values, and the SORT operator sorts the records based on
the SORT BY properties.

There's only one case we don't allow the UPDATE statement:
 * UPDATE partition column AND
 * Right-hand side of assignment is non-constant expression AND
 * UPDATE statement has a JOIN

When all of the above conditions meet, it would be possible to
have an incorrect JOIN condition that has multiple matches for the
data records, then the duplicated records would be shuffled
independently (based on the new partition value) to different
backend SINKs, and the different backend SINK would not be able
to detect the duplicates.

If any of the above conditions was false, then the duplicated records
would be shuffled together to the same SINK, that could do the
duplicate check.

This patch also moves some code from IcebergDeleteSink to the
newly introduced IcebergDeleteSinkBase.

Testing:
 * planner tests
 * e2e tests
 * Impala/Hive interop tests

Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
A be/src/exec/iceberg-buffered-delete-sink.cc
A be/src/exec/iceberg-buffered-delete-sink.h
A be/src/exec/iceberg-delete-sink-base.cc
A be/src/exec/iceberg-delete-sink-base.h
A be/src/exec/iceberg-delete-sink-config.cc
A be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/iceberg-delete-sink.cc
M be/src/exec/iceberg-delete-sink.h
M be/src/exec/table-sink-base.cc
M be/src/exec/table-sink-base.h
M be/src/exprs/slot-ref.h
M be/src/runtime/dml-exec-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-stress.test
M tests/query_test/test_iceberg.py
M tests/stress/test_update_stress.py
34 files changed, 1,882 insertions(+), 334 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/7
--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9

[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 7:

(1 comment)

PS7 is a rebase (apart from fixing a long line).

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-delete-sink-base.cc
File be/src/exec/iceberg-delete-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-delete-sink-base.cc@167
PS6, Line 167: if (i < partition_values_decoded.size() - 1) {
> line too long (91 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 12:43:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14667/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Dec 2023 13:02:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14668/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Dec 2023 13:06:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14669/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 13:16:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 6:

(8 comments)

With this partial review I'm clear from the beginning up to 
iceberg-delete-sink.cc.

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-buffered-delete-sink.h@56
PS6, Line 56:   /// Writes the buffered records to position delete files.
We could add "after sorting" or "in the correct order".


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h@72
PS5, Line 72:   /// Nested iterator class to conveniently iterate over a 
FilePositions object.
> Here this would need another iteration over the map (though only in debug m
I feel that it is cleaner to check it here - it is a precondition of this class 
that there should be no empty vectors.

An alternative to the extra loop would be to check that the vector is not empty 
whenever we call file_level_it_->second.begin(), i.e. on L75 and L103. We could 
either add DCHECK(!file_level_it_.second.empty()) before these calls or 
DCHECK(pos_level_it != file_level_it_.second.end()) afterwards.

Or, even simpler, we could add DCHECK(pos_level_it != 
file_level_it_.second.end()) before L84 in Next(), i.e. before dereferencing 
'pos_level_it_', with an error message about the emty vector.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h
File be/src/exec/iceberg-delete-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.h@30
PS4, Line 30: IcebergDeleteSinkBase
> I'm not sure what do you mean by that. It is currently an abstract class as
Right, I missed it.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc
File be/src/exec/iceberg-delete-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@66
PS4, Line 66: TIcebergPartitionTransformType::type transform_type, const 
std::string& value,
> I don't really find the comment misleading. Subclasses need to check it and
Right, I somehow assumed that superclass methods are automatically called like 
in the case of destructors.


http://gerrit.cloudera.org:8080/#/c/20760/4/be/src/exec/iceberg-delete-sink-base.cc@112
PS4, Line 112: // non void partition names and transforms.
> Yeah, currently it is checked twice when the first overload is being used,
Ok, it's indeed a cheap check.


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h
File be/src/exec/iceberg-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h@66
PS5, Line 66:   /// having the same filepath + position), because that would 
corrupt the table data
> That case is handled in IcebergUpdateImpl.java, so I added the comment ther
I'd still consider adding a reference to the comment in IcebergUpdateImpl.java, 
for example
  "For a case where deduplication is not possible at the sink level, see the 
comment in IcebergUpdateImpl::buildAndValidateSelectExprs()".


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc
File be/src/exec/iceberg-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@79
PS5, Line 79: VerifyRowsNotDuplicated
Can we add DCHECKs for (prev_file_path_, filepath) and (prev_position_, 
position) to check that the data is already sorted?


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@121
PS5, Line 121:   RETURN_IF_ERROR(ConstructPartitionInfo(row, 
current_partition_.first.get()));
What is the reason for taking this out of InitOutputPartition()?



--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 14:15:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 5: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@310
PS5, Line 310: while
optional: Not sure if this worth the effort, but it would be nice to make this 
better than O(N) - it doesn't seem impossible that alter database or 
create/drop database pairs are common in some workload. It could also make 
sense to increase batch sizes in the future, making this search quite costly.

e.g. pendingTableEventsMap could be Map>, 
with db/table keys, and the first time a db is met could add the db entries


http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2610
PS5, Line 2610: anways
typo



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 14:26:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Riza Suminto (Code Review)
Hello Zoltan Borok-Nagy, Peter Rozsa, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20769

to look at the new patch set (#4).

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..

IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

IMPALA-11507 missed a case where both FileDescriptor.getRelativePath()
and FbFileDesc.absolutePath() are null/empty. This patch fixes the bug
so that FileDescriptor.getPath() and FileDescriptor.getAbsolutePath() do
not fallback to FbFileDesc.absolutePath() if it is null/empty.

Testing:
Add test_scanner.py::TestSingleFileTable

Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M tests/query_test/test_scanners.py
2 files changed, 61 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/20769/4
--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20769/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/20769/4/tests/query_test/test_scanners.py@1995
PS4, Line 1995:
flake8: W391 blank line at end of file



--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 15:34:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10002/


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 15:58:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14670/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 16:01:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 18: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 12 Dec 2023 16:08:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 7:

(14 comments)

Finally gone through all of the files.

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/table-sink-base.h
File be/src/exec/table-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/table-sink-base.h@90
PS6, Line 90: The partition key is derived from 'row'.
We don't have 'row' anymore. We could add that the partition key is now 
supposed to be available in 'output_partition' (if this is correct).


http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/table-sink-base.cc
File be/src/exec/table-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/table-sink-base.cc@365
PS6, Line 365: OutputPartition
Is there a reason for using separate OutputPartition and vector parameters 
instead of PartitionPair? And not clearing the vector at the end of this 
function?


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java:

http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@115
PS6, Line 115: If there are duplicates in the JOIN operator
Strictly speaking, we cannot check for duplicates even if there are none, so 
this clause is not needed, though it is useful to mention that the situation 
described below happens if there is a JOIN.


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@126
PS6, Line 126: via UPDATE FROM
'modifyStmt_.fromClause_.size() > 1' suggests if there is only one table (or 
view) in the FROM clause then it should be ok. Is that viable?


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@101
PS6, Line 101:   public TSortingOrder getSortingOrder() {
Wouldn't it be better to not have 'getSortingOrder()' in DmlStatementBase but 
at a lower lever in the class hierarchy? Or would it make sense for 
OptimizeStmt too, it's just not implemented yet?


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
File fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java:

http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@34
PS6, Line 34: TableSink
Would it make sense to have an IcebergDeleteSinkBase in the Java code too? Some 
functions and fields seem to be the same or very similar here and in 
IcebergDeleteSink.


http://gerrit.cloudera.org:8080/#/c/20760/7/fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java:

http://gerrit.cloudera.org:8080/#/c/20760/7/fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java@76
PS7, Line 76: 1024L * 1024L
We could use ByteUnits.MEGABYTE, but see comment at 
IcebergBufferedDeleteSink.java:34.


http://gerrit.cloudera.org:8080/#/c/20760/7/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/20760/7/fe/src/main/java/org/apache/impala/planner/Planner.java@926
PS7, Line 926: get
'partitionKeyExprs'?


http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/datasets/functional/functional_schema_template.sql@3407
PS7, Line 3407: INTO
Shouldn't this be an INSERT OVERWRITE TABLE, like for example on L3469? If we 
re-load the table without first deleting it then it will add extra rows.


http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/datasets/functional/functional_schema_template.sql@3407
PS7, Line 3407: iceberg_partition_transforms_zorder
Shouldn't this be '{table_name}'?


http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test:

http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test@340
PS7, Line 340: from iceberg_partition_transforms_zorder ice_zorder, 
iceberg_partitioned source
Just curious: can we also use INNER JOIN syntax here?


http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
File 
test

[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Kurt Deschler (Code Review)
Kurt Deschler has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20776


Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..

IMPALA-12614: Use atomics for 64-bit host metrics

This patch changes 64-bit host metrics to use atomics to avoid potential
partial load/store data races. There are no functional changes. This
issue was exposed by TSAN tests when IMPALA-12385 enabled periodic
metrics by default.

Testing: TSAN tests cover this case.
Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
---
M be/src/runtime/query-state.cc
M be/src/util/system-state-info.cc
M be/src/util/system-state-info.h
3 files changed, 13 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/20776/1
--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 


[Impala-ASF-CR] IMPALA-12617: Fix DCHECK failure for Statestore

2023-12-12 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20775


Change subject: IMPALA-12617: Fix DCHECK failure for Statestore
..

IMPALA-12617: Fix DCHECK failure for Statestore

Statestore uses thread pools to periodically send catalog topic update
and cluster membership. It adds sending tasks to the queues of thread
pools when receiving registration requests from subscribers so the
thread pools have to be ready before the Thrift server of Statestore
is started to accept registration request.
Current code call ThreadPool::Init() after the Thrift server is started.
This could cause Statestore to hit DCHECK failure when calling
ThreadPool::Offer(). It's more likely to happen when Statestore HA is
enabled since Statestore takes more time for initialization.
This patch changes the order to call ThreadPool::Init() before starting
Thrift server of the Statestore server.

Testing:
 - Repeatedly ran custom_cluster/test_statestored_ha.py on local machine
   and Jenkins over night without failure.
 - Passed core tests.

Change-Id: I91423f3de2d64cb617a06ea7adbe5ee2937bde66
---
M be/src/statestore/statestore.cc
1 file changed, 4 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/20775/1
--
To view, visit http://gerrit.cloudera.org:8080/20775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I91423f3de2d64cb617a06ea7adbe5ee2937bde66
Gerrit-Change-Number: 20775
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/20506/14/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@465
PS14, Line 465: // Invalid column name.
> What other tests? Probably we should change those too. How many are they?
If it's really a huge amount of work we can do it in a separate patch but 
preferably it should be done in this one.



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 12 Dec 2023 17:01:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Revert "IMPALA-9923: Load ORC serially to hack around flakiness"

2023-12-12 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20755 )

Change subject: Revert "IMPALA-9923: Load ORC serially to hack around flakiness"
..


Patch Set 2:

Filed Jira IMPALA-12617 for failure of test_statestored_manual_failover


--
To view, visit http://gerrit.cloudera.org:8080/20755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67f4051dd07273f2b51843cb5c1ec2cf185c5924
Gerrit-Change-Number: 20755
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 17:07:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20776 )

Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..


Patch Set 1: Code-Review+2

LGTM, thanks for the quick fix


--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 17:09:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20776 )

Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14671/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 17:18:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12617: Fix DCHECK failure for Statestore

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20775 )

Change subject: IMPALA-12617: Fix DCHECK failure for Statestore
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14672/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91423f3de2d64cb617a06ea7adbe5ee2937bde66
Gerrit-Change-Number: 20775
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 12 Dec 2023 17:26:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12617: Fix DCHECK failure for Statestore

2023-12-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20775 )

Change subject: IMPALA-12617: Fix DCHECK failure for Statestore
..


Patch Set 1: Code-Review+1

Looks OK to me. Thanks for fixing this!


-- 
To view, visit http://gerrit.cloudera.org:8080/20775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91423f3de2d64cb617a06ea7adbe5ee2937bde66
Gerrit-Change-Number: 20775
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 17:26:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Riza Suminto (Code Review)
Hello Zoltan Borok-Nagy, Peter Rozsa, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20769

to look at the new patch set (#5).

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..

IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

IMPALA-11507 missed a case where both FileDescriptor.getRelativePath()
and FbFileDesc.absolutePath() are null/empty. This patch fixes the bug
so that FileDescriptor.getPath() and FileDescriptor.getAbsolutePath() do
not fallback to FbFileDesc.absolutePath() if it is null/empty.

Testing:
- Add test_scanner.py::TestSingleFileTable
- Pass core tests

Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M tests/query_test/test_scanners.py
2 files changed, 61 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/20769/5
--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 5:

Ran core tests overnight and it pass.


--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 17:32:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-12 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20648

to look at the new patch set (#9).

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..

IMPALA-10987: Changing impala.disableHmsSync in
Hive should not break event processing

Currently we require a global invalidate to reset the events processor
if the events sync is re-enabled on a table from HMS. This patch
eliminates the need to reset the catalog cache when events sync is
re-enabled.

Implementation details: when events sync is re-enabled on table via HMS
1) If the table exists in Impala,
  a) We can just invalidate the table, if the current event is greater
than the create event id of the table, so that it is reloaded the first
time query accesses it.
  b) Otherwise we can just ignore the event.
2) If the table doesn't exist in Impala, create a Incomplete table, if
there is no entry in the event delete log for this table.

Note: If the eventSync is disabled on a table, for all subsequent table
events, ideally we should mark the table as stale if the table object
is loaded, so that it is reloaded the next time query accesses it. But,
since this approach has performance impact, the events will be ignored.

Testing:
1) manually verified few scenarios.
2) Added test case for the above scenarios.

Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 162 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/20648/9
--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20648 )

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14673/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 12 Dec 2023 18:57:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20760

to look at the new patch set (#8).

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..

IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

Part 2 had some limitations, most importantly it could not update
Iceberg tables if any of the following were true:
 * UPDATE value of partitioning column
 * UPDATE table that went through partition evolution
 * Table has SORT BY properties

The problem with partitions is that the delete record and new
data record might belong to different partitions and records
are shuffled across based on the partitions of the delete
records, hence the data files might not get written efficiently.

The problem with SORT BY properties, is that we need to write
the position delete files ordered by (file_path, position).

To address the above problems, this patch introduces a new
backend operator: IcebergBufferedDeleteSink. This new operator
extracts and aggregates the delete record information from
the incoming row batches, then in FlushFinal it orders the
position delete records and writes them out to files. This
mechanism is similar to Hive's approach:
https://github.com/apache/hive/pull/3251

IcebergBufferedDeleteSink cannot spill to disk, so it can only
run if there's enough memory to store the delete records. Paths
are stored only once, and the int64_t positions are stored in
a vector, so updating 100 Million records per node should require
around 800MBs + (100K) filepaths ~= 820 MBs of memory per node.
Spilling could be added later, but currently the need for it is not
too realistic.

Now records can get shuffled around based on the new data records'
partition values, and the SORT operator sorts the records based on
the SORT BY properties.

There's only one case we don't allow the UPDATE statement:
 * UPDATE partition column AND
 * Right-hand side of assignment is non-constant expression AND
 * UPDATE statement has a JOIN

When all of the above conditions meet, it would be possible to
have an incorrect JOIN condition that has multiple matches for the
data records, then the duplicated records would be shuffled
independently (based on the new partition value) to different
backend SINKs, and the different backend SINK would not be able
to detect the duplicates.

If any of the above conditions was false, then the duplicated records
would be shuffled together to the same SINK, that could do the
duplicate check.

This patch also moves some code from IcebergDeleteSink to the
newly introduced IcebergDeleteSinkBase.

Testing:
 * planner tests
 * e2e tests
 * Impala/Hive interop tests

Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
A be/src/exec/iceberg-buffered-delete-sink.cc
A be/src/exec/iceberg-buffered-delete-sink.h
A be/src/exec/iceberg-delete-sink-base.cc
A be/src/exec/iceberg-delete-sink-base.h
A be/src/exec/iceberg-delete-sink-config.cc
A be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/iceberg-delete-sink.cc
M be/src/exec/iceberg-delete-sink.h
M be/src/exec/table-sink-base.cc
M be/src/exec/table-sink-base.h
M be/src/exprs/slot-ref.h
M be/src/runtime/dml-exec-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-stress.test
M tests/query_test/test_iceberg.py
M tests/stress/test_update_stress.py
35 files changed, 1,956 insertions(+), 344 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/8
--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Bran

[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20760

to look at the new patch set (#9).

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..

IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

Part 2 had some limitations, most importantly it could not update
Iceberg tables if any of the following were true:
 * UPDATE value of partitioning column
 * UPDATE table that went through partition evolution
 * Table has SORT BY properties

The problem with partitions is that the delete record and new
data record might belong to different partitions and records
are shuffled across based on the partitions of the delete
records, hence the data files might not get written efficiently.

The problem with SORT BY properties, is that we need to write
the position delete files ordered by (file_path, position).

To address the above problems, this patch introduces a new
backend operator: IcebergBufferedDeleteSink. This new operator
extracts and aggregates the delete record information from
the incoming row batches, then in FlushFinal it orders the
position delete records and writes them out to files. This
mechanism is similar to Hive's approach:
https://github.com/apache/hive/pull/3251

IcebergBufferedDeleteSink cannot spill to disk, so it can only
run if there's enough memory to store the delete records. Paths
are stored only once, and the int64_t positions are stored in
a vector, so updating 100 Million records per node should require
around 800MBs + (100K) filepaths ~= 820 MBs of memory per node.
Spilling could be added later, but currently the need for it is not
too realistic.

Now records can get shuffled around based on the new data records'
partition values, and the SORT operator sorts the records based on
the SORT BY properties.

There's only one case we don't allow the UPDATE statement:
 * UPDATE partition column AND
 * Right-hand side of assignment is non-constant expression AND
 * UPDATE statement has a JOIN

When all of the above conditions meet, it would be possible to
have an incorrect JOIN condition that has multiple matches for the
data records, then the duplicated records would be shuffled
independently (based on the new partition value) to different
backend SINKs, and the different backend SINK would not be able
to detect the duplicates.

If any of the above conditions was false, then the duplicated records
would be shuffled together to the same SINK, that could do the
duplicate check.

This patch also moves some code from IcebergDeleteSink to the
newly introduced IcebergDeleteSinkBase.

Testing:
 * planner tests
 * e2e tests
 * Impala/Hive interop tests

Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
A be/src/exec/iceberg-buffered-delete-sink.cc
A be/src/exec/iceberg-buffered-delete-sink.h
A be/src/exec/iceberg-delete-sink-base.cc
A be/src/exec/iceberg-delete-sink-base.h
A be/src/exec/iceberg-delete-sink-config.cc
A be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/iceberg-delete-sink.cc
M be/src/exec/iceberg-delete-sink.h
M be/src/exec/table-sink-base.cc
M be/src/exec/table-sink-base.h
M be/src/exprs/slot-ref.h
M be/src/runtime/dml-exec-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-stress.test
M tests/query_test/test_iceberg.py
M tests/stress/test_update_stress.py
35 files changed, 1,956 insertions(+), 344 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/9
--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Bran

[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 8:

(19 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/iceberg-buffered-delete-sink.h@56
PS6, Line 56:   /// Writes the buffered records to position delete files in the 
correct oreder.
> We could add "after sorting" or "in the correct order".
Done


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h
File be/src/exec/iceberg-buffered-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-buffered-delete-sink.h@72
PS5, Line 72:   /// Nested iterator class to conveniently iterate over a 
FilePositions object.
> I feel that it is cleaner to check it here - it is a precondition of this c
Added DCHECK(!file_level_it_.second.empty()) checks as these are the most 
straightforward about what we want to ensure.

Also added DCHECK(pos_level_it != file_level_it_.second.end()) to check that we 
haven't messed up anyting in NextPos()/NextFile().


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h
File be/src/exec/iceberg-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.h@66
PS5, Line 66:   /// having the same filepath + position), because that would 
corrupt the table data
> I'd still consider adding a reference to the comment in IcebergUpdateImpl.j
Added reference


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc
File be/src/exec/iceberg-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@79
PS5, Line 79: VerifyRowsNotDuplicated
> Can we add DCHECKs for (prev_file_path_, filepath) and (prev_position_, pos
file paths and positions are not sorted across partitions. So we would need to 
reset them whenever we create new partitions. Planner tests check the presence 
of the SORT node and sort expressions.
I rather not modify code that is working now, also, I think we might retire 
IcebergDeleteSink anyway and always use IcebergBufferedDeleteWriter because I 
feel it should be much faster (need to run some measurements before), plus we 
wouldn't need to maintain two classes for the same functionality.
We would lose spilling capabilities, but that might not needed anyway.


http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@121
PS5, Line 121:   RETURN_IF_ERROR(ConstructPartitionInfo(row, 
current_partition_.first.get()));
> What is the reason for taking this out of InitOutputPartition()?
This method has different signatures in the child classes 
(IcebergBufferedDeleteSink), so we cannot do a virtual call in 
InitOutputPartition() anymore.
Child classes are now responsible creating the current partition object and 
pass it to InitOutputPartition.


http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/table-sink-base.h
File be/src/exec/table-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/table-sink-base.h@90
PS6, Line 90: The caller of this function must already
> We don't have 'row' anymore. We could add that the partition key is now sup
Updated the comment.


http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/table-sink-base.cc
File be/src/exec/table-sink-base.cc:

http://gerrit.cloudera.org:8080/#/c/20760/6/be/src/exec/table-sink-base.cc@365
PS6, Line 365: OutputPartition
> Is there a reason for using separate OutputPartition and vector parameters
The vector only makes sense for the clustered writers where the vector denotes 
the values that belong to the current partition in the row batch.
In IcebergBufferedDeleteSink the vector is not needed as the created row 
batches always have rows of a single partition.
I felt that PartitionPair* is a bit awkward data structure, and it was a bit 
strange that WriteRowsToPartition() modifies its vector.


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java:

http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@115
PS6, Line 115: If there are duplicates in the JOIN operator
> Strictly speaking, we cannot check for duplicates even if there are none, s
I'm not sure what is the point here. Duplicates are only possible in the 
context of JOINs.
Otherwise a source statement like SELECT * FROM target_table WHERE ; 
should never duplicate rows AFAICT.
But I happily rephrase the comment if you have a suggestion.


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdate

[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-12 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20485

to look at the new patch set (#7).

Change subject: IMPALA-10949: Improve batching logic of partition events
..

IMPALA-10949: Improve batching logic of partition events

Currently the batching logic for partitions evaluates self-event check
by looking at the event id of the last event in the batch. This patch
improves the batching logic by evaluating table's lastSyncEventId to
the current event's event id and decide whether to batch the event or
not. This way we can have fewer batch sizes and avoid self events if
any, beforehand.

Testing: Added an end-to-end test to verfiy the batch size for
ALTER_PARTITION batch events.

Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 58 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/20485/7
--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 5: Code-Review+1

(2 comments)

Had minor comments, but LGTM!

http://gerrit.cloudera.org:8080/#/c/20769/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20769/5//COMMIT_MSG@13
PS5, Line 13:
Could you please elaborate on how and why this happens?


http://gerrit.cloudera.org:8080/#/c/20769/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/20769/5/tests/query_test/test_scanners.py@1993
PS5, Line 1993: select_stmt = "select count(*) from 
{db}.{tbl}".format(**params)
Maybe we could check the result of the count(*) at least.



--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:07:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-12 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 2:

(2 comments)

Thanks for your contribution. It looks good to me.

http://gerrit.cloudera.org:8080/#/c/20773/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20773/2//COMMIT_MSG@11
PS2, Line 11: prevent users deleting
nit: prevent users from deleting


http://gerrit.cloudera.org:8080/#/c/20773/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20773/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2683
PS2, Line 2683: dropTablesFromKudu(db, kudu_table_reserve_seconds)
What's behavior of Kudu engine for soft table deletion when the database is 
dropped? Can we recover the tables after database is deleted?



--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:11:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10004/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:12:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10005/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:14:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20648 )

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10006/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:14:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 8:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14674/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:19:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 9:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14675/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:19:48 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-11157: Rename hadoop to hadoop-client

2023-12-12 Thread Michael Smith (Code Review)
Michael Smith has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20777


Change subject: IMPALA-11157: Rename hadoop to hadoop-client
..

IMPALA-11157: Rename hadoop to hadoop-client

The hadoop build only produces client binaries, not a full hadoop build.
The name was therefore misleading, and could not replace the full build
of hadoop required by Impala. Impala's toolchain bootstrap process would
then fail if we tried to include two packages named "hadoop" when
overriding the download URL via IMPALA_HADOOP_URL.

Renames hadoop to hadoop-client to clarify its contents and avoid
conflicts with a full hadoop build.

Change-Id: I9049efb6fa83a692a6c242ebdf1b7180ee396469
---
M buildall.sh
R source/hadoop-client/build.sh
2 files changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/77/20777/1
--
To view, visit http://gerrit.cloudera.org:8080/20777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9049efb6fa83a692a6c242ebdf1b7180ee396469
Gerrit-Change-Number: 20777
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 


[native-toolchain-CR] IMPALA-3192: Add fetch.sh to download prebuilt artifacts

2023-12-12 Thread Michael Smith (Code Review)
Michael Smith has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20778


Change subject: IMPALA-3192: Add fetch.sh to download prebuilt artifacts
..

IMPALA-3192: Add fetch.sh to download prebuilt artifacts

Adds fetch.sh to download prebuilt artifacts from an existing toolchain
build. Requires a TOOLCHAIN_BUILD_ID (can be found in Impala's
impala-config.sh).

To get compilation tools, run

./fetch.sh $TOOLCHAIN_BUILD_ID

then download specific packages with

./fetch.sh $TOOLCHAIN_BUILD_ID zlib 1.2.13 zstd 1.5.2

Change-Id: I2914bd7d5e0589200d114218fbe107bec0b1f8a2
---
M README.md
A fetch.sh
M init-compiler.sh
M init.sh
4 files changed, 109 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/78/20778/1
--
To view, visit http://gerrit.cloudera.org:8080/20778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2914bd7d5e0589200d114218fbe107bec0b1f8a2
Gerrit-Change-Number: 20778
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14676/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:30:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-12 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:26:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10004/


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:33:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20776 )

Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14677/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:35:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12617: Fix DCHECK failure for Statestore

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20775 )

Change subject: IMPALA-12617: Fix DCHECK failure for Statestore
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10007/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91423f3de2d64cb617a06ea7adbe5ee2937bde66
Gerrit-Change-Number: 20775
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:35:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20769/6/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/20769/6/tests/query_test/test_scanners.py@1997
PS6, Line 1997:
flake8: W391 blank line at end of file



--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:36:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20769/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20769/5//COMMIT_MSG@13
PS5, Line 13: set with null, while absolute_path may not be set and left as 
null.
> Could you please elaborate on how and why this happens?
Done


http://gerrit.cloudera.org:8080/#/c/20769/5/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/20769/5/tests/query_test/test_scanners.py@1993
PS5, Line 1993: assert res.data[0].split("\t")[0] == (hdfs_file_path + '/')
> Maybe we could check the result of the count(*) at least.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:36:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20776 )

Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 2
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:33:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12617: Fix DCHECK failure for Statestore

2023-12-12 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20775 )

Change subject: IMPALA-12617: Fix DCHECK failure for Statestore
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91423f3de2d64cb617a06ea7adbe5ee2937bde66
Gerrit-Change-Number: 20775
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 19:35:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Riza Suminto (Code Review)
Hello Zoltan Borok-Nagy, Peter Rozsa, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20769

to look at the new patch set (#6).

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..

IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

IMPALA-11507 add absolute_path field to FbFileDesc in
CatalogObjects.fbs. This field is used as a fallback for the
relative_path field when the relative path of an Iceberg data file
cannot be obtained by the table location path. relative_path is never
set with null, while absolute_path may not be set and left as null.

IMPALA-11507 missed a case where both FileDescriptor.getRelativePath()
and FbFileDesc.absolutePath() are null/empty. This patch fixes the bug
so that FileDescriptor.getPath() and FileDescriptor.getAbsolutePath() do
not fall back to FbFileDesc.absolutePath() if it is null/empty. With
this fix, FileDescriptor.getPath() and FileDescriptor.getAbsolutePath()
will not return null.

Testing:
- Add test_scanner.py::TestSingleFileTable
- Pass core tests

Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M tests/query_test/test_scanners.py
2 files changed, 63 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/20769/6
--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14678/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:02:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 6: Code-Review+2

Thanks for fixing this so quickly, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:05:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10008/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:10:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20760

to look at the new patch set (#10).

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..

IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

Part 2 had some limitations, most importantly it could not update
Iceberg tables if any of the following were true:
 * UPDATE value of partitioning column
 * UPDATE table that went through partition evolution
 * Table has SORT BY properties

The problem with partitions is that the delete record and new
data record might belong to different partitions and records
are shuffled across based on the partitions of the delete
records, hence the data files might not get written efficiently.

The problem with SORT BY properties, is that we need to write
the position delete files ordered by (file_path, position).

To address the above problems, this patch introduces a new
backend operator: IcebergBufferedDeleteSink. This new operator
extracts and aggregates the delete record information from
the incoming row batches, then in FlushFinal it orders the
position delete records and writes them out to files. This
mechanism is similar to Hive's approach:
https://github.com/apache/hive/pull/3251

IcebergBufferedDeleteSink cannot spill to disk, so it can only
run if there's enough memory to store the delete records. Paths
are stored only once, and the int64_t positions are stored in
a vector, so updating 100 Million records per node should require
around 800MBs + (100K) filepaths ~= 820 MBs of memory per node.
Spilling could be added later, but currently the need for it is not
too realistic.

Now records can get shuffled around based on the new data records'
partition values, and the SORT operator sorts the records based on
the SORT BY properties.

There's only one case we don't allow the UPDATE statement:
 * UPDATE partition column AND
 * Right-hand side of assignment is non-constant expression AND
 * UPDATE statement has a JOIN

When all of the above conditions meet, it would be possible to
have an incorrect JOIN condition that has multiple matches for the
data records, then the duplicated records would be shuffled
independently (based on the new partition value) to different
backend SINKs, and the different backend SINK would not be able
to detect the duplicates.

If any of the above conditions was false, then the duplicated records
would be shuffled together to the same SINK, that could do the
duplicate check.

This patch also moves some code from IcebergDeleteSink to the
newly introduced IcebergDeleteSinkBase.

Testing:
 * planner tests
 * e2e tests
 * Impala/Hive interop tests

Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
A be/src/exec/iceberg-buffered-delete-sink.cc
A be/src/exec/iceberg-buffered-delete-sink.h
A be/src/exec/iceberg-delete-sink-base.cc
A be/src/exec/iceberg-delete-sink-base.h
A be/src/exec/iceberg-delete-sink-config.cc
A be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/iceberg-delete-sink.cc
M be/src/exec/iceberg-delete-sink.h
M be/src/exec/table-sink-base.cc
M be/src/exec/table-sink-base.h
M be/src/exprs/slot-ref.h
M be/src/runtime/dml-exec-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-stress.test
M tests/query_test/test_iceberg.py
M tests/stress/test_update_stress.py
35 files changed, 1,956 insertions(+), 344 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/10
--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Br

[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:25:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is null/empty

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20769 )

Change subject: IMPALA-12589: Ignore FbFileDesc.absolutePath() if it is 
null/empty
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10009/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d811430871cd1319d9560901318c1c3b89403c
Gerrit-Change-Number: 20769
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:25:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@310
PS5, Line 310: while
> optional: Not sure if this worth the effort, but it would be nice to make t
Good idea, I switched to make this a two-layer map.


http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/20533/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2610
PS5, Line 2610: anways
> typo
Done



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:22:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#6).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 366 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/6
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..


Patch Set 9:

(2 comments)

Thanks for working on this, it will be extremely useful.
Looked at the Iceberg-related parts.

http://gerrit.cloudera.org:8080/#/c/20491/9/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/20491/9/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@405
PS9, Line 405: GroupedContentFiles icebergFiles = 
IcebergUtil.getIcebergFiles(this,
 : new ArrayList<>(), /*timeTravelSpec=*/null);
I think it'd be worth to mark an event after this since the planFiles() call in 
getIcebergFiles() can take a significant time.


http://gerrit.cloudera.org:8080/#/c/20491/9/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20491/9/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7109
PS9, Line 7109: catalogTimeline.markEvent("Executed Iceberg operation");
Can we include update.getIceberg_operation() in the event?



--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:26:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14679/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:37:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12617: Fix DCHECK failure for Statestore

2023-12-12 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20775 )

Change subject: IMPALA-12617: Fix DCHECK failure for Statestore
..


Patch Set 1:

Thanks Riza and Michael for your quick review.


--
To view, visit http://gerrit.cloudera.org:8080/20775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91423f3de2d64cb617a06ea7adbe5ee2937bde66
Gerrit-Change-Number: 20775
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:46:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14680/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 20:51:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10010/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 12 Dec 2023 21:31:53 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-11157: Rename hadoop to hadoop-client

2023-12-12 Thread Michael Smith (Code Review)
Hello Laszlo Gaal, Joe McDonnell,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20777

to look at the new patch set (#2).

Change subject: IMPALA-11157: Rename hadoop to hadoop-client
..

IMPALA-11157: Rename hadoop to hadoop-client

The hadoop build only produces client binaries, not a full hadoop build.
The name was therefore misleading, and could not replace the full build
of hadoop required by Impala. Impala's toolchain bootstrap process would
then fail if we tried to include two packages named "hadoop" when
overriding the download URL via IMPALA_HADOOP_URL.

Renames hadoop to hadoop-client to clarify its contents and avoid
conflicts with a full hadoop build.

Change-Id: I9049efb6fa83a692a6c242ebdf1b7180ee396469
---
M buildall.sh
R source/hadoop-client/build.sh
2 files changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/77/20777/2
--
To view, visit http://gerrit.cloudera.org:8080/20777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9049efb6fa83a692a6c242ebdf1b7180ee396469
Gerrit-Change-Number: 20777
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-12 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 12 Dec 2023 21:30:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Kurt Deschler (Code Review)
Hello Andrew Sherman, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20776

to look at the new patch set (#3).

Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..

IMPALA-12614: Use atomics for 64-bit host metrics

This patch changes 64-bit host metrics to use atomics to avoid potential
partial load/store data races. There are no functional changes. This
issue was exposed by TSAN tests when IMPALA-12385 enabled periodic
metrics by default.

Testing: TSAN tests cover this case.
Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
---
M be/src/runtime/query-state.cc
M be/src/util/system-state-info-test.cc
M be/src/util/system-state-info.cc
M be/src/util/system-state-info.h
4 files changed, 18 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/20776/3
--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 3
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20776 )

Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 3
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 21:42:38 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-3192: Add fetch.sh to download prebuilt artifacts

2023-12-12 Thread Michael Smith (Code Review)
Hello Laszlo Gaal, Joe McDonnell,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20778

to look at the new patch set (#2).

Change subject: IMPALA-3192: Add fetch.sh to download prebuilt artifacts
..

IMPALA-3192: Add fetch.sh to download prebuilt artifacts

Adds fetch.sh to download prebuilt artifacts from an existing toolchain
build. Requires a TOOLCHAIN_BUILD_ID (can be found in Impala's
impala-config.sh).

To get compilation tools, run

./fetch.sh $TOOLCHAIN_BUILD_ID

then download specific packages with

./fetch.sh $TOOLCHAIN_BUILD_ID zlib 1.2.13 zstd 1.5.2

Change-Id: I2914bd7d5e0589200d114218fbe107bec0b1f8a2
---
M README.md
A fetch.sh
M init-compiler.sh
M init.sh
4 files changed, 110 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/78/20778/2
--
To view, visit http://gerrit.cloudera.org:8080/20778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2914bd7d5e0589200d114218fbe107bec0b1f8a2
Gerrit-Change-Number: 20778
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 6: Code-Review+1

(1 comment)

Thanks for working on this.

http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@347
PS6, Line 347: must
nit: duplicated



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 22:05:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20612 )

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..


Patch Set 11:

ps11 is a rebase.


--
To view, visit http://gerrit.cloudera.org:8080/20612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
Gerrit-Change-Number: 20612
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Dec 2023 22:15:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@347
PS6, Line 347: must
> nit: duplicated
Thanks for pointing that out, fixed



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 12 Dec 2023 22:22:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-12 Thread Joe McDonnell (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, Csaba Ringhofer, 
Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#7).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 366 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/7
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


  1   2   >