[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 19:

> Patch Set 19:
>
> I'm confused why these changes could affect test_orc_stats, but I noticed 
> another patch that encountered the same situation: 
> https://jenkins.impala.io/job/gerrit-verify-dryrun/10031/
> Could it be due to some changes causing this test to become unstable?

I think it's unrelated to this patch. Just filed IMPALA-12630.


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 19
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 14 Dec 2023 07:19:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10033/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20681

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 19
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 14 Dec 2023 07:19:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Zihao Ye (Code Review)
Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 19:

I'm confused why these changes could affect test_orc_stats, but I noticed 
another patch that encountered the same situation: 
https://jenkins.impala.io/job/gerrit-verify-dryrun/10031/
Could it be due to some changes causing this test to become unstable?


--
To view, visit http://gerrit.cloudera.org:8080/20681

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 19
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 14 Dec 2023 07:15:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10032/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20485

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 11
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 06:57:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20485

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 11
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 06:57:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10031/


--
To view, visit http://gerrit.cloudera.org:8080/20485

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 10
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 04:54:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20742 )

Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on 
masked tables
..

IMPALA-11501: Add flag to allow catalog-cache operations on masked tables

Table-level REFRESH/INVALIDATE METADATA statements are catalog-cache
operations. In the Hive-Ranger plugin, when a table is masked (either by
a column-masking or a row-filtering policy) for a user, the user is unable
to perform any modification (insert/delete/update) on the table, i.e.
the user is considered read-only (RANGER-1087, RANGER-1100). However,
Hive doesn't have these catalog-cache operations, so it's a grey area
whether they should be blocked.

Before this patch, these catalog-cache operations were considered
modifications on the table, so they were also blocked for masked users.
Table metadata had to be loaded to get the column names needed to
fetch Ranger column-masking policies. This caused a performance
regression on table-level INVALIDATE METADATA commands, since in older
versions (e.g. CDH), IM commands don't need to load the table metadata
and run quickly.

This patch adds a flag, allow_catalog_cache_op_from_masked_users, for
coordinators to skip checking masking policies for such statements. When
this flag is on, coordinators don't need to load the table metadata,
which also fixes the performance regression.
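A minimal sketch of the decision the flag controls, assuming a simplified model where the coordinator only decides whether to load table metadata and whether to check masking policies. The names `StmtKind`, `AuthzPlan` and `plan_authz` are hypothetical and do not mirror the real BaseAuthorizationChecker code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class StmtKind(Enum):
    INSERT = auto()
    REFRESH = auto()
    INVALIDATE_METADATA = auto()


# Table-level catalog-cache operations covered by the flag.
CATALOG_CACHE_OPS = {StmtKind.REFRESH, StmtKind.INVALIDATE_METADATA}


@dataclass
class AuthzPlan:
    load_table_metadata: bool  # column names are needed to look up masking policies
    check_masking: bool        # masked users are rejected when this is True


def plan_authz(kind: StmtKind,
               allow_catalog_cache_op_from_masked_users: bool) -> AuthzPlan:
    """Hypothetical model of the coordinator-side decision; not Impala code."""
    if kind in CATALOG_CACHE_OPS and allow_catalog_cache_op_from_masked_users:
        # Flag on: skip the masking check, so the table need not be loaded.
        return AuthzPlan(load_table_metadata=False, check_masking=False)
    # Flag off (pre-patch behavior), or a real modification such as INSERT:
    # load the table to get column names, then check masking policies.
    return AuthzPlan(load_table_metadata=True, check_masking=True)
```

With the flag on, `plan_authz(StmtKind.INVALIDATE_METADATA, True)` avoids the metadata load entirely, which is where the performance regression came from.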

Note that Ranger ownership-based policies can't be applied correctly
when the table is unloaded (so the owner is unknown). Table-level
REFRESH/INVALIDATE METADATA commands could be denied for owners even if
there are Ranger policies allowing the owner's operations. This is a
known issue since IMPALA-8228. To ensure a user can always perform these
operations, grant them the REFRESH privilege to work around the
unloaded-table issue.

This patch also fixes a bug in local catalog mode that only surfaces
after adding the new flag: LocalDb#getTableIfCached() doesn't make good
use of the cache. If the table metadata is cached but LocalDb#getTable()
hasn't been invoked on the table, getTableIfCached() always returns a
LocalIncompleteTable, which is missing some table info, e.g. ownership.
As a result, REFRESH/INVALIDATE statements can't pass the ownership
context to RangerAccessResourceImpl, so ownership policies can't be
applied correctly.

Ideally, LocalDb#getTableIfCached() should return a LocalTable instance
if the table is cached. However, in local catalog mode, we don't cache
everything needed to construct a LocalTable instance, so constructing
one might still trigger external RPCs, which should be avoided. As an
alternative, this patch checks whether the msTable is cached. If it is,
it's added to the LocalIncompleteTable instance so most of the table
info, including the ownership string, can be retrieved.
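The fallback described above can be sketched roughly as follows, assuming a plain dict as the msTable cache. `MsTable`, `IncompleteTable` and `LocalDbSketch` are hypothetical stand-ins, not Impala's LocalDb/LocalIncompleteTable classes; the point is only that a cached msTable is attached without building a full table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MsTable:
    name: str
    owner: str


@dataclass
class IncompleteTable:
    name: str
    ms_table: Optional[MsTable] = None  # attached only when already cached

    @property
    def owner(self) -> Optional[str]:
        # Ownership is only known if the metastore table object is available.
        return self.ms_table.owner if self.ms_table else None


class LocalDbSketch:
    """Hypothetical stand-in for LocalDb; not Impala's implementation."""

    def __init__(self, ms_table_cache: Dict[str, MsTable]):
        # Cache of metastore table objects, keyed by table name.
        self._ms_table_cache = ms_table_cache

    def get_table_if_cached(self, name: str) -> IncompleteTable:
        # Never construct a full table here (that could trigger remote RPCs);
        # only reuse a cached msTable, if one is present.
        return IncompleteTable(name, self._ms_table_cache.get(name))
```

An owner lookup then succeeds for cached tables and returns None for uncached ones, which matches the remaining IMPALA-8228 limitation.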

Tests:
 - Add e2e tests on both the legacy and local catalog modes.

Change-Id: I45935654cbf05a55d740f1b04781022c271f7678
Reviewed-on: http://gerrit.cloudera.org:8080/20742
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/service/frontend.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java
M fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java
M fe/src/main/java/org/apache/impala/authorization/Privilege.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/authorization/test_ranger.py
15 files changed, 188 insertions(+), 12 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20742

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678
Gerrit-Change-Number: 20742
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20742 )

Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on 
masked tables
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20742

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678
Gerrit-Change-Number: 20742
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 14 Dec 2023 03:58:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20783 )

Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop
..

IMPALA-12618: Update README.md to reduce emphasis on Hadoop

The README.md file is displayed on the GitHub home page at
https://github.com/apache/impala. Change it so that the opening line
mentions “data stored in open data and table formats” instead of “data
stored in Apache Hadoop clusters”. Also add Iceberg as the first
mentioned place where data can be stored.

Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce
Reviewed-on: http://gerrit.cloudera.org:8080/20783
Reviewed-by: Quanlong Huang 
Tested-by: Michael Smith 
---
M README.md
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Quanlong Huang: Looks good to me, approved
  Michael Smith: Verified

--
To view, visit http://gerrit.cloudera.org:8080/20783

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce
Gerrit-Change-Number: 20783
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20783 )

Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20783

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce
Gerrit-Change-Number: 20783
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 14 Dec 2023 03:28:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20779 )

Change subject: IMPALA-11157: Switch to hadoop-client build
..

IMPALA-11157: Switch to hadoop-client build

The hadoop build only produces client binaries, not a full hadoop build.
The name was therefore misleading, and the package could not replace the
full build of hadoop required by Impala. Impala's toolchain bootstrap
process would then fail if we tried to include two packages named
"hadoop" when overriding the download URL via IMPALA_HADOOP_URL.

Renames hadoop to hadoop-client to clarify its contents and avoid
conflicts with a full hadoop build.
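A toy illustration of why two packages sharing the name "hadoop" collide, assuming a download plan keyed by package name. `build_download_plan` is hypothetical and is not code from bin/bootstrap_toolchain.py.

```python
from typing import Dict, List, Tuple


def build_download_plan(packages: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map package name -> version, rejecting duplicate names.

    Hypothetical model: a plan keyed by name cannot hold both a full
    hadoop build and a client-only build if both are called "hadoop".
    """
    plan: Dict[str, str] = {}
    for name, version in packages:
        if name in plan:
            raise ValueError(f"duplicate package name: {name}")
        plan[name] = version
    return plan
```

After the rename, the two entries have distinct keys ("hadoop" and "hadoop-client") and can coexist.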

Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f
Reviewed-on: http://gerrit.cloudera.org:8080/20779
Tested-by: Impala Public Jenkins 
Reviewed-by: Joe McDonnell 
Reviewed-by: Quanlong Huang 
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
3 files changed, 7 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Joe McDonnell: Looks good to me, approved
  Quanlong Huang: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/20779

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f
Gerrit-Change-Number: 20779
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14714/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20367

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 15
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 03:12:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20367/15/tests/custom_cluster/test_sync_to_latest_hms_events.py
File tests/custom_cluster/test_sync_to_latest_hms_events.py:

http://gerrit.cloudera.org:8080/#/c/20367/15/tests/custom_cluster/test_sync_to_latest_hms_events.py@581
PS15, Line 581:
flake8: W292 no newline at end of file



--
To view, visit http://gerrit.cloudera.org:8080/20367

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 15
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 02:45:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2023-12-13 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#15).

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..

IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

The idea is that when any DDL/DML operation is performed by Impala, it
also syncs the db/table to its latest event ID as per HMS. This way,
updates to a db/table are applied in the same order as they appear in
the notification log table in HMS, which ensures consistency. Currently,
catalogD applies any updates received from Impala clients in place.
Instead, it should perform the HMS operation first and then replay all
the HMS events since the last synced event ID.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we do the DDL/DML operation first, i.e., perform the HMS
operation, and then sync the db/table in catalogD's cache to the
latest event in HMS for the corresponding db/table. Currently, we fetch
all events greater than the db/table's lastSyncEventId and filter them
in the events processor to sync only the current db/table's events. Once
HIVE-27499 is implemented, we can directly fetch only the events for
the respective db/table and process them. Currently, there is no
efficient way to identify whether there are pending events for a db/table.
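The sync step described above can be sketched as follows, assuming `fetch_events_after(id)` stands in for reading the HMS notification log after a given event ID. `Event`, `CachedTable`, `sync_to_latest_event` and both callbacks are hypothetical names, not CatalogOpExecutor or MetastoreEventsProcessor APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    event_id: int
    db: str
    table: str
    kind: str


@dataclass
class CachedTable:
    db: str
    name: str
    last_sync_event_id: int


def sync_to_latest_event(table: CachedTable,
                         fetch_events_after: Callable[[int], List[Event]],
                         apply_event: Callable[[CachedTable, Event], None]) -> int:
    """Replay pending HMS events for one table in notification-log order.

    Without server-side per-table filtering, every event after
    last_sync_event_id is fetched and events for other tables are
    filtered out locally. Returns the number of events applied.
    """
    applied = 0
    for ev in sorted(fetch_events_after(table.last_sync_event_id),
                     key=lambda e: e.event_id):
        if (ev.db, ev.table) != (table.db, table.name):
            continue  # left for the regular events processor
        apply_event(table, ev)
        # Advancing this ID lets the events processor skip already-synced events.
        table.last_sync_event_id = ev.event_id
        applied += 1
    return applied
```

Once per-table fetching is available (HIVE-27499), the local filter in the loop would become unnecessary.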

Set 'enable_sync_to_latest_event_on_ddls' to true. Also, set
'file_metadata_reload_properties' to an empty string to avoid data
inconsistencies.

Note: We don't modify the cache using MetastoreEventsProcessor for the
alter table rename operation, as this is a complex operation with regard
to cache modification (IMPALA-12553 has more details). We also don't
modify the cache using the above process for 'refresh table' or
'invalidate metadata table' commands.

Testing:
1) Added a few tests in the MetaStoreEventProcessorForTest to verify this
feature, simulating the metadata sync between HMS and Impala.
2) Added a few tests in the CatalogHmsSyncToLatestEventIdTest class to
verify the metadata sync between the HMS end point, Catalog Metastore
Server and Impala. The HMS end point serves as a common interface for
metadata changes made outside the current Impala service, such as by
Hive, Spark or another Impala service. Also verified that the table's
lastSyncEventId is updated after the events are synced and confirmed
that the metastore event processor ignores these synced events.

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M be/src/catalog/catalog-server.cc
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
A tests/custom_cluster/test_sync_to_latest_hms_events.py
M tests/metadata/test_recover_partitions.py
13 files changed, 1,134 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/15
--
To view, visit http://gerrit.cloudera.org:8080/20367

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 15
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2023-12-13 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py
File tests/custom_cluster/test_sync_to_latest_hms_events.py:

http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py@37
PS14, Line 37: --file_metadata_reload_properties=''
> I'm still understanding why we need this in some tests. Do those tests depe
This is a real problem with queries involving the 'Insert or Insert overwrite'
command. This command generates an alter table event followed by an insert
event. If numRows doesn't change, we cannot detect whether we need to reload
file metadata. We need to detect that an alter table event was generated
because of an insert query and reload file metadata accordingly.
Below is an example where we cannot detect whether to reload file metadata or 
not:
create table tb1(i int); (Query run in Impala)
insert into tb1 values (1); (Query run in Hive)
Insert overwrite table tb1 values (2); (Query run in Hive)
Select * from tb1; (Query run in Impala) -- The output comes out as '1' instead 
of '2'.

Reason:
-> For the first insert query, we get 2 events, an alter table event and an
insert event. The alter table event has the numRows property changed, so we
reload file metadata and update the lastSyncEventId on the table; then the
insert event gets skipped.
-> For the second insert overwrite query, we also get 2 events, an alter table
event and an insert event. Since numRows is unchanged (even though the
underlying data changed), we cannot detect that file metadata needs to be
reloaded, so we process this event without reloading file metadata and update
the lastSyncEventId on the table; then the insert event gets skipped. As a
result, we get data correctness issues.

I believe the solution to this issue is to fix the alter table event in the
metastore so that it indicates the event was triggered by an insert; then
we can simply reload file metadata.
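The failure mode and the proposed fix can be modeled with a toy heuristic, assuming file metadata is reloaded only when numRows changes. `AlterTableEvent`, its `caused_by_insert` field and `should_reload_file_metadata` are hypothetical; HMS does not currently expose such a marker, which is exactly what the comment proposes adding.

```python
from dataclasses import dataclass


@dataclass
class AlterTableEvent:
    old_num_rows: int
    new_num_rows: int
    # Hypothetical marker proposed above: set when HMS emits the ALTER
    # as part of an INSERT / INSERT OVERWRITE.
    caused_by_insert: bool = False


def should_reload_file_metadata(ev: AlterTableEvent) -> bool:
    if ev.caused_by_insert:
        # With the proposed marker, any insert-driven ALTER forces a reload.
        return True
    # Current heuristic: only a numRows change signals that the files changed.
    return ev.old_num_rows != ev.new_num_rows
```

In the example above, the first insert goes from 0 to 1 row and is detected, while the insert overwrite goes from 1 to 1 row and is missed unless the marker is set.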



--
To view, visit http://gerrit.cloudera.org:8080/20367

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 15
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 02:44:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Dec 2023 02:15:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build

2023-12-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20779 )

Change subject: IMPALA-11157: Switch to hadoop-client build
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20779

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f
Gerrit-Change-Number: 20779
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 14 Dec 2023 01:51:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build

2023-12-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20779 )

Change subject: IMPALA-11157: Switch to hadoop-client build
..


Patch Set 3: Code-Review+2

This looks good to me


--
To view, visit http://gerrit.cloudera.org:8080/20779

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f
Gerrit-Change-Number: 20779
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 14 Dec 2023 01:50:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop

2023-12-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20783 )

Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20783

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce
Gerrit-Change-Number: 20783
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 14 Dec 2023 01:38:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20783 )

Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14713/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20783

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce
Gerrit-Change-Number: 20783
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 14 Dec 2023 01:39:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2023-12-13 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@400
PS7, Line 400: } else if (next instanceof AlterTableEvent) {
Shouldn't we consider create/drop table events as batch-breaking events?


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@405
PS7, Line 405: flushBatchForTable(pendingTableEventsMap, sortedFinalBatches, beforeTable);
IMO, we should also consider the scenario where a table is renamed across
databases, and flush the corresponding events.
E.g.: ALTER TABLE db1.tb1 RENAME TO db2.tb2;


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@408
PS7, Line 408: // an invalid scenario, because the destination must not exist at the time
This is a possible scenario. For example, the current queue could contain table
events for t1, table events for t2, a drop event for t1, and a rename event from
t2 to t1.


http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@426
PS7, Line 426:   dbMap = pendingTableEventsMap.get(dbName);
Shouldn't we just assign a new HashMap<>() directly to the dbMap variable?
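The ordering concerns in these comments can be illustrated with a toy batcher that treats create/drop/rename as batch-breaking and flushes both sides of a rename, even across databases. `Ev`, `BARRIER_KINDS` and `batch_events` are hypothetical; this is not the algorithm in MetastoreEvents.java, only a sketch under those assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TableKey = Tuple[str, str]  # (db, table)


@dataclass(frozen=True)
class Ev:
    event_id: int
    kind: str                              # e.g. "ALTER", "DROP_TABLE", "RENAME"
    table: TableKey
    new_table: Optional[TableKey] = None   # only set for RENAME


BARRIER_KINDS = {"CREATE_TABLE", "DROP_TABLE", "RENAME"}


def batch_events(events: List[Ev]) -> List[List[Ev]]:
    pending: Dict[TableKey, List[Ev]] = {}
    out: List[List[Ev]] = []

    def flush(key: TableKey) -> None:
        batch = pending.pop(key, None)
        if batch:
            out.append(batch)

    for ev in events:
        if ev.kind in BARRIER_KINDS:
            # Flush every table the event touches: the source and the
            # destination of a rename, even in different databases.
            flush(ev.table)
            if ev.new_table is not None:
                flush(ev.new_table)
            out.append([ev])  # barrier events are never batched
        else:
            pending.setdefault(ev.table, []).append(ev)
    for key in list(pending):
        flush(key)
    return out
```

For the queue described in the PS7 Line 408 comment (t1 events, t2 events, drop t1, rename t2 to t1), the pending t2 events are flushed before the rename, so they are never merged with events for the new t1.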



--
To view, visit http://gerrit.cloudera.org:8080/20533

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Dec 2023 01:15:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) IMPALA-12619: Update Impala website to reduce emphasis on Hadoop

2023-12-13 Thread Andrew Sherman (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20782

to look at the new patch set (#3).

Change subject: IMPALA-12619: Update Impala website to reduce emphasis on Hadoop
..

IMPALA-12619: Update Impala website to reduce emphasis on Hadoop

The Impala website at ASF, https://impala.apache.org/, is the first hit
returned for “Apache Impala”. Update the first line of the description
to say "Apache Impala is a modern, open source, distributed SQL query
engine for open data and table formats." instead of "Apache Impala is a
modern, open source, distributed SQL query engine for Apache Hadoop."
Also mention Ranger instead of Sentry, and add references to Iceberg.

Change-Id: I2d63bbbc87375345eaf58989a59f704dbb9559fd
---
M index.html
1 file changed, 5 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/20782/3
--
To view, visit http://gerrit.cloudera.org:8080/20782
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2d63bbbc87375345eaf58989a59f704dbb9559fd
Gerrit-Change-Number: 20782
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop

2023-12-13 Thread Andrew Sherman (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20783

to look at the new patch set (#2).

Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop
..

IMPALA-12618: Update README.md to reduce emphasis on Hadoop

The README.md file is displayed on the GitHub home page
(https://github.com/apache/impala). Change it so that the opening line
mentions “data stored in open data and table formats” instead of “data
stored in Apache Hadoop clusters”. Also add Iceberg as the first
mentioned place where data can be stored.

Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce
---
M README.md
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/20783/2
--
To view, visit http://gerrit.cloudera.org:8080/20783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce
Gerrit-Change-Number: 20783
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20779 )

Change subject: IMPALA-11157: Switch to hadoop-client build
..


Patch Set 3:

Passed an ARM test run as well: 
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch-ARM/66/


--
To view, visit http://gerrit.cloudera.org:8080/20779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f
Gerrit-Change-Number: 20779
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 14 Dec 2023 00:34:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 9:

> Patch Set 9: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10023/

This seems to be a flaky test tracked by IMPALA-12416:

custom_cluster/test_events_custom_configs.py:375: in test_skipping_older_events
verify_skipping_older_events(test_old_table, False, False)
custom_cluster/test_events_custom_configs.py:355: in 
verify_skipping_older_events
query.format(unique_database, table_name), table_name)
custom_cluster/test_events_custom_configs.py:342: in 
verify_skipping_hive_stmt_events
assert tbl_events_skipped_after > tbl_events_skipped_before
E   assert 1 > 1


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 00:21:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10031/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 10
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 00:21:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 10
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 14 Dec 2023 00:21:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..

IMPALA-12229: Support soft-delete Kudu table

Adds the 'kudu_table_reserve_seconds' query option to set the reserved
time for deleted Impala-managed Kudu tables. The default value is 0.
This option can prevent users from deleting important Kudu tables
by mistake.

Testing:
- Added e2e tests.

Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Reviewed-on: http://gerrit.cloudera.org:8080/20773
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CatalogService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M infra/python/deps/kudu-requirements.txt
M tests/query_test/test_kudu.py
10 files changed, 112 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 6
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Thu, 14 Dec 2023 00:12:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20742 )

Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on 
masked tables
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10030/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20742
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678
Gerrit-Change-Number: 20742
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 23:32:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20742 )

Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on 
masked tables
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20742
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678
Gerrit-Change-Number: 20742
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 23:32:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables

2023-12-13 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20742 )

Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on 
masked tables
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20742/2/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/20742/2/tests/authorization/test_ranger.py@1615
PS2, Line 1615: ida
> Yeah, just wanted to use a short grant statement. I can change it to the mi
Ack



--
To view, visit http://gerrit.cloudera.org:8080/20742
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678
Gerrit-Change-Number: 20742
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 23:28:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 9:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10027/


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 13 Dec 2023 23:22:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20612 )

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14712/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
Gerrit-Change-Number: 20612
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 13 Dec 2023 22:49:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20612 )

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..


Patch Set 15: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
Gerrit-Change-Number: 20612
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 13 Dec 2023 22:28:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20612 )

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..


Patch Set 15:

(5 comments)

Thank you, Michael.

http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/query-state.cc@271
PS14, Line 271:   // Making a copy of the "filepath to hosts" mapping into std 
library types.
> This comment doesn't really explain why this is necessary.
I'm not sure either. This was added by IMPALA-12308
(https://gerrit.cloudera.org/c/20548/) and is not part of this patch.


http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@107
PS14, Line 107: /// selected as intermediate filter aggregator to help 
coordinator. Besides doing
> nit: remove "of", so it says "Besides doing"
Done


http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@108
PS14, Line 108: /// local aggregation, each intermediate aggregator will also 
listen and aggregate
> grammar: "each intermediate aggregator will"
Done


http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@109
PS14, Line 109: /// filter updates from at most 
MAX_NUM_FILTERS_AGGREGATED_PER_HOST-1 other executors.
> "filter updates from"
Done


http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@110
PS14, Line 110: /// Intermediate aggregator then sends the aggregated filter 
update to coordinator for
> "then sends the"
Done



--
To view, visit http://gerrit.cloudera.org:8080/20612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
Gerrit-Change-Number: 20612
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 13 Dec 2023 22:21:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-13 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Abhishek Rawat, Csaba Ringhofer, Michael Smith, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20612

to look at the new patch set (#15).

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..

IMPALA-3825: Delegate runtime filter aggregation to some executors

IMPALA-4400 improved runtime filters by aggregating them locally before
sending filter updates to the coordinator, and by sharing a single
RuntimeFilterBank among all fragment instances of a query. However,
local filter aggregation is still insufficient when the number of nodes
in an Impala cluster is large. For example, in a cluster of around 700
impalad backends, aggregating 1 MB bloom filter updates in the
coordinator can take more than 1 second.

This patch aims to reduce coordinator load and speed up runtime filter
aggregation by doing intermediate aggregation in a few designated impala
backends before doing final aggregation and publishing in the
coordinator. Query option MAX_NUM_FILTERS_AGGREGATED_PER_HOST is added
to control this feature. Given N as the number of backend executors
excluding the coordinator, the number of selected intermediate
aggregators is M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST).
Setting MAX_NUM_FILTERS_AGGREGATED_PER_HOST <= 1 disables the
intermediate aggregator feature. In the backend scheduler, M impalads
are selected randomly as intermediate aggregators for a given runtime
filter. Information about these M selected impalads is then passed from
the scheduler to the coordinator as a RuntimeFilterAggregatorInfoPB.
The coordinator then converts the RuntimeFilterAggregatorInfoPB into
filter routing information, TRuntimeFilterAggDesc, which is
piggy-backed in TRuntimeFilterSource.
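As a rough illustration of the sizing rule above, the Python sketch below computes M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST) and picks M aggregators at random. The helper names (`num_intermediate_aggregators`, `assign_aggregators`) and the round-robin assignment of the remaining executors are assumptions for illustration only, not Impala's scheduler code. Because M >= N / k, round-robin never gives an aggregator more than k - 1 other executors, which matches the "at most MAX_NUM_FILTERS_AGGREGATED_PER_HOST-1 other executors" wording in runtime-filter-bank.h.

```python
import math
import random
from typing import Dict, List


def num_intermediate_aggregators(num_executors: int, max_per_host: int) -> int:
    """M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST); 0 means the feature is off."""
    if max_per_host <= 1 or num_executors <= 0:
        return 0
    return math.ceil(num_executors / max_per_host)


def assign_aggregators(executors: List[str], max_per_host: int,
                       rng: random.Random) -> Dict[str, List[str]]:
    """Hypothetical routing: map each randomly chosen aggregator to the
    executors whose filter updates it merges (itself included)."""
    m = num_intermediate_aggregators(len(executors), max_per_host)
    if m == 0:
        return {}  # every executor reports straight to the coordinator
    aggregators = rng.sample(executors, m)
    routing = {agg: [agg] for agg in aggregators}
    others = [e for e in executors if e not in routing]
    # Assumption: spread the remaining executors round-robin over the aggregators.
    for i, host in enumerate(others):
        routing[aggregators[i % m]].append(host)
    return routing
```

With 20 executors and MAX_NUM_FILTERS_AGGREGATED_PER_HOST=3, this yields 7 aggregators, each merging updates from itself plus at most two other executors.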

A new RPC endpoint named UpdateFilterFromRemote is added in
data_stream_service.proto to handle filter updates sent from fellow
impalad executors to the designated aggregator impalad. This RPC merges
filter updates into 'pending_remote_filter'. The intermediate aggregator
will then combine 'pending_remote_filter' with
'pending_merge_filter' (from local aggregation) into 'result_filter',
which is then sent to the coordinator. The RuntimeFilterBank of the
intermediate aggregator will wait for all remote filter updates for at
least RUNTIME_FILTER_WAIT_TIME_MS. If the RuntimeFilterBank is closing
and RUNTIME_FILTER_WAIT_TIME_MS has passed, any incomplete filter will
be marked as ALWAYS_TRUE and sent to the coordinator.
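A toy Python model of the intermediate aggregation step described above may help. It assumes all bloom filters share the same size and hash functions, so merging two of them is a bitwise OR; filters are represented as plain int bitmasks. The class and method names are hypothetical and only mirror the field names mentioned in this commit message; the actual implementation is in the C++ backend.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IntermediateAggregator:
    """Hypothetical model of one intermediate aggregator for one bloom filter.

    Bloom filters are modelled as int bitmasks; two filters built with the same
    size and hash functions are merged with a bitwise OR.
    """
    expected_remote_updates: int
    pending_merge_filter: int = 0   # result of local aggregation on this host
    pending_remote_filter: int = 0  # OR of updates received from other executors
    received_remote_updates: int = 0

    def update_filter_from_remote(self, bits: int) -> None:
        # Stand-in for the UpdateFilterFromRemote RPC handler.
        self.pending_remote_filter |= bits
        self.received_remote_updates += 1

    def try_finalize(self, closing: bool,
                     wait_time_elapsed: bool) -> Tuple[str, Optional[int]]:
        """Return ("READY", result_filter), ("ALWAYS_TRUE", None) or ("PENDING", None)."""
        if self.received_remote_updates >= self.expected_remote_updates:
            return "READY", self.pending_merge_filter | self.pending_remote_filter
        if closing and wait_time_elapsed:
            # Give up on missing updates: a filter that rejects nothing is still correct.
            return "ALWAYS_TRUE", None
        return "PENDING", None
```

Falling back to ALWAYS_TRUE is safe because a runtime filter is only an optimization: passing every row can never produce wrong query results, it just prunes less.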

This patch currently targets only bloom filters produced by partitioned
join builds. Other kinds of runtime filters are still efficient to
aggregate in the coordinator alone, while a bloom filter from a
broadcast join requires only one valid filter update to be published.

test_runtime_filters.py is modified to clarify the exec_options
dimension and test matrix constraints, and to reduce pytest.skip() calls
in each test. runtime_filters.test is also changed to use counter
aggregation and to assert on the ExecSummary table so that the tests
stay valid irrespective of the number of fragment instances.

We benchmarked the aggregation speed of a 1 MB runtime filter on a
20-executor-node cluster with MT_DOP=36, instrumented to disable local
aggregation and thereby simulate 720 runtime filter updates. The speed
is approximated as the duration between the earliest time a filter
update is made and the time that the coordinator publishes the complete
filter. The results are as follows:

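+----------------------+-----------------------+
| Num aggregator nodes | Aggregation time (ms) |
+----------------------+-----------------------+
|                    0 |                  1296 |
|                    1 |                  1229 |
|                    2 |                   608 |
|                    4 |                   329 |
|                    8 |                   205 |
+----------------------+-----------------------+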
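The benchmark table can be reduced to speedups relative to the coordinator-only baseline (0 aggregator nodes) with a few lines of Python. The numbers are copied directly from the table; `AGG_TIME_MS` and `speedups` are illustrative names only.

```python
# Aggregation time in ms, keyed by number of intermediate aggregator nodes,
# copied from the benchmark table above.
AGG_TIME_MS = {0: 1296, 1: 1229, 2: 608, 4: 329, 8: 205}


def speedups(times_ms: dict) -> dict:
    """Speedup of each configuration relative to coordinator-only aggregation (0 nodes)."""
    baseline = times_ms[0]
    return {n: round(baseline / t, 2) for n, t in times_ms.items()}


print(speedups(AGG_TIME_MS))
# {0: 1.0, 1: 1.05, 2: 2.13, 4: 3.94, 8: 6.32}
```

A single aggregator barely helps, while eight aggregators cut the aggregation time by roughly 6.3x.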

Testing:
- Exercise MAX_NUM_FILTERS_AGGREGATED_PER_HOST in
  test_runtime_filters.py and query-options-test.cc
- Add custom_cluster/test_runtime_filter_aggregation.py.
- Pass exhaustive end-to-end and custom-cluster tests.

Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
---
M be/src/common/logging.h
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/network-util.h
M be/src/util/runtime-profile-counters.h
M 

[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 22: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 22:19:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20612 )

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..


Patch Set 14:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/query-state.cc@271
PS14, Line 271:   // Making a copy of the "filepath to hosts" mapping into std 
library types.
This comment doesn't really explain why this is necessary.


http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@107
PS14, Line 107: /// selected as intermediate filter aggregator to help 
coordinator. Besides of doing
nit: remove "of", so it says "Besides doing"


http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@108
PS14, Line 108: /// local aggregation, each intermediate aggregators will also 
listen and aggregate
grammar: "each intermediate aggregator will"


http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@109
PS14, Line 109: /// filter update from at most 
MAX_NUM_FILTERS_AGGREGATED_PER_HOST-1 other executors.
"filter updates from"


http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@110
PS14, Line 110: /// Intermediate aggregator then send the aggregated filter 
update to coordinator for
"then sends the"



--
To view, visit http://gerrit.cloudera.org:8080/20612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
Gerrit-Change-Number: 20612
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 13 Dec 2023 21:49:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10029/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 21:44:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14711/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 21:42:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14710/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 21:40:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12375: Make DataSource Object persistent

2023-12-13 Thread Anonymous Coward (Code Review)
gsi...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20768 )

Change subject: IMPALA-12375: Make DataSource Object persistent
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I500a99142bb62ce873e693d573064ad4ffa153ab
Gerrit-Change-Number: 20768
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 21:18:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Gabor Kaszab (Code Review)
Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20753

to look at the new patch set (#8).

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..

IMPALA-12597: Basic Equality delete read support for Iceberg tables

In general, applying equality deletes is similar to how position
deletes are applied to data files: using a LEFT ANTI JOIN where the
SCAN for the data rows is on the left side while the SCAN for the
delete rows is on the right side of the JOIN. The difference is the
virtual columns and the conjuncts being used.
For equality deletes the data sequence number of a delete file has to
be greater than the data sequence number of the data file being
investigated. This information is added as a virtual column to the
scans and a conjunct is created in the JOIN node to check the relation.
The equality delete fields from the delete files are checked against the
respective columns of the data SCANs.
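The join semantics described above can be sketched as a hash-based LEFT ANTI JOIN. This is a minimal Python illustration of the semantics only, not Impala's planner or backend code: rows are plain dicts, and `data_seq` / `delete_seq` are hypothetical stand-ins for the data sequence number virtual column.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def apply_equality_deletes(data_rows: List[Row], delete_rows: List[Row],
                           eq_fields: Sequence[str]) -> List[Row]:
    # Build side: hash the delete rows on their equality-field values and keep
    # the highest delete sequence number seen for each key. "Some matching
    # delete has a greater sequence number" is equivalent to "the max does".
    build: Dict[Tuple, int] = {}
    for e in delete_rows:
        key = tuple(e[f] for f in eq_fields)
        build[key] = max(build.get(key, e["delete_seq"]), e["delete_seq"])

    # Probe side: the LEFT ANTI JOIN keeps a data row unless a matching key
    # exists whose delete sequence number is greater than the row's own
    # data sequence number (the extra conjunct in the JOIN node).
    result = []
    for d in data_rows:
        key = tuple(d[f] for f in eq_fields)
        max_seq = build.get(key)
        if max_seq is None or max_seq <= d["data_seq"]:
            result.append(d)
    return result


data = [{"id": 1, "data_seq": 1}, {"id": 2, "data_seq": 1},
        {"id": 1, "data_seq": 3}]
deletes = [{"id": 1, "delete_seq": 2}]
print(apply_equality_deletes(data, deletes, ["id"]))
```

Here the first `id=1` row is removed because the delete's sequence number (2) is greater than its own (1), while the later `id=1` row (sequence number 3) survives.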

This patch makes it possible for Impala to read Iceberg tables with
basic equality delete files. The Iceberg spec gives engines great
flexibility in how they write equality deletes; however, in practice
Flink, one of the engines that write EQ-deletes, supports only a subset
of the use cases. This patch focuses on reading the EQ-deletes written
by Flink.

The restrictions are the following:
- All equality delete files in a table should have the same equality
  field ID list.
- For partitioned Iceberg tables it is expected that the partition
  values are also written into the equality delete files.
- Tables with equality deletes shouldn't have partition or schema
  evolution.
- Floating point equality columns aren't supported.
- If a malformed equality delete file doesn't have some of the equality
  field IDs, then the Parquet reader will fill those missing fields with
  NULLs. As a side effect, rows whose corresponding data columns have a
  null value will be dropped from the result.
See IMPALA-11388 epic Jira for more details.

Testing:
- Checked that the existing functional_parquet.iceberg_v2_delete_equality
  table can be read successfully.
- Added a new test table so that E2E tests can validate correctness.

Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
---
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.h
M common/thrift/CatalogObjects.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-8985205515767142888-1-0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25.avro

[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Gabor Kaszab (Code Review)
Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20753

to look at the new patch set (#7).

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..

IMPALA-12597: Basic Equality delete read support for Iceberg tables

In general, applying equality deletes is similar to how position
deletes are applied to data files: using a LEFT ANTI JOIN where the
SCAN for the data rows is on the left side while the SCAN for the
delete rows is on the right side of the JOIN. The differences are the
virtual columns and the conjuncts being used.
For equality deletes the data sequence number of a delete file has to
be greater than the data sequence number of the data file being
investigated. This information is added as a virtual column to the
scans and a conjunct is created in the JOIN node to check the relation.
The equality delete fields from the delete files are checked against the
respective columns of the data SCANs.

This patch makes it possible for Impala to read Iceberg tables with
basic equality delete files. The Iceberg spec gives engines great
flexibility in how they write equality deletes; however, in practice
Flink, one of the engines that write EQ-deletes, supports only a subset
of the use cases. This patch focuses on reading the EQ-deletes written
by Flink.

The restrictions are the following:
- All equality delete files in a table should have the same equality
  field ID list.
- For partitioned Iceberg tables it is expected that the partition
  values are also written into the equality delete files.
- Tables with equality deletes shouldn't have partition or schema
  evolution.
- Floating point equality columns aren't supported.
- If a malformed equality delete file doesn't have some of the equality
  field IDs, then the Parquet reader will fill those missing fields with
  NULLs. As a side effect, rows whose corresponding data columns have a
  null value will be dropped from the result.
See IMPALA-11388 epic Jira for more details.
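The NULL side effect described in the last restriction can be illustrated with a small Python sketch. It assumes null-safe equality when matching equality fields (inferred from the restriction text, not taken from the patch), and `fill_missing_fields` / `is_deleted` are hypothetical helpers.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fill_missing_fields(delete_row: Row, eq_fields: Sequence[str]) -> Row:
    # Mimic the described reader behaviour: an equality field that is absent
    # from the delete file is read back as NULL (None here).
    return {f: delete_row.get(f) for f in eq_fields}


def is_deleted(data_row: Row, delete_row: Row, eq_fields: Sequence[str]) -> bool:
    # Null-safe comparison: in Python, None == None evaluates to True.
    return all(data_row[f] == delete_row[f] for f in eq_fields)


eq_fields = ["id", "name"]
# The delete file lists both field IDs but only contains the "id" column.
malformed = fill_missing_fields({"id": 5}, eq_fields)
rows: List[Row] = [{"id": 5, "name": "a"}, {"id": 5, "name": None}]
survivors = [r for r in rows if not is_deleted(r, malformed, eq_fields)]
print(survivors)
```

The malformed delete row now matches only the data row whose `name` is NULL, so that row is dropped from the result.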

Testing:
- Checked that the existing functional_parquet.iceberg_v2_delete_equality
  table can be read successfully.
- Added a new test table so that E2E tests can validate correctness.

Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
---
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.h
M common/thrift/CatalogObjects.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-8985205515767142888-1-0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25.avro

[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20612 )

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14709/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
Gerrit-Change-Number: 20612
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 13 Dec 2023 20:55:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20612 )

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20612/10/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/20612/10/be/src/runtime/runtime-filter-bank.cc@139
PS10, Line 139: esc.filter_id);
> If the reservation is claimed, then it is considered a fatal error if alloc
I think what you mean is, it is ok to allocate later as long as the whole 
total_bloom_filter_mem_required_ is already claimed. Is that correct?

PS14 moves the initialization to UpdateFilterFromRemote().


http://gerrit.cloudera.org:8080/#/c/20612/10/be/src/runtime/runtime-filter-bank.cc@722
PS10, Line 722: HECK_EQ(0, produced_filter
> Few things about this:
If this is any reassurance, note that SendIncompleteFilters is only called
when RuntimeFilterBank is closing.
The lifetime of RuntimeFilterBank is equal to the query lifetime in that
executor node. It is closing only if the query is completed or canceled. In
both cases, the plan root sink is basically done, and runtime filter values
do not matter anymore. The coordinator can just drop runtime filter updates
by then.

CombinePeerAndLocalUpdates() is done here for correctness. It cleans up
'pending_merge_filter' and 'pending_remote_filter' of 'produced_filter'.

This feature should be exercised in TestRuntimeFilters, TestBloomFilters, 
TestBloomFiltersOnParquet, and TestRuntimeRowFilters. And 
test_wait_time_cancellation is within TestRuntimeFilters.



--
To view, visit http://gerrit.cloudera.org:8080/20612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
Gerrit-Change-Number: 20612
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 13 Dec 2023 20:31:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors

2023-12-13 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Abhishek Rawat, Csaba Ringhofer, Michael Smith, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20612

to look at the new patch set (#14).

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
..

IMPALA-3825: Delegate runtime filter aggregation to some executors

IMPALA-4400 improved the runtime filter by aggregating runtime filters
locally before sending filter updates to the coordinator and by sharing a
single RuntimeFilterBank for all fragment instances in a query. However,
local filter aggregation is still insufficient if the number of nodes in
an Impala cluster is large. For example, in a cluster of around 700
impalad backends, aggregating 1 MB bloom filter updates in the
coordinator can take more than 1 second.

This patch aims to reduce coordinator load and speed up runtime filter
aggregation by doing intermediate aggregation in a few designated impala
backends before doing final aggregation and publishing in the
coordinator. Query option MAX_NUM_FILTERS_AGGREGATED_PER_HOST is added
to control this feature. Given N as the number of backend executors
excluding the coordinator, the selected number of intermediate
aggregators is M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST). Setting
MAX_NUM_FILTERS_AGGREGATED_PER_HOST <= 1 will disable the intermediate
aggregator feature. In the backend scheduler, M impalads will be selected
randomly as the intermediate aggregators for a given runtime filter.
Information about these M selected impalads is then passed from the
scheduler to the coordinator as a RuntimeFilterAggregatorInfoPB. The
coordinator then converts the RuntimeFilterAggregatorInfoPB into filter
routing information, TRuntimeFilterAggDesc, which is piggybacked in
TRuntimeFilterSource.
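A minimal Python sketch of the aggregator-count formula and the random selection described above. The function names are hypothetical and this is not the actual scheduler code; integer ceiling division is used so the result does not depend on floating point.

```python
import random
from typing import List, Optional


def num_intermediate_aggregators(num_executors: int, max_per_host: int) -> int:
    """M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST); 0 means disabled."""
    # Per the description, a setting <= 1 disables intermediate aggregation.
    if max_per_host <= 1 or num_executors <= 0:
        return 0
    # Integer ceiling division: ceil(a / b) == (a + b - 1) // b for b > 0.
    return (num_executors + max_per_host - 1) // max_per_host


def pick_aggregators(executors: List[str], max_per_host: int,
                     seed: Optional[int] = None) -> List[str]:
    # Choose M distinct executors uniformly at random (coordinator excluded).
    m = num_intermediate_aggregators(len(executors), max_per_host)
    return random.Random(seed).sample(executors, m)


hosts = [f"impalad-{i}" for i in range(20)]
print(num_intermediate_aggregators(20, 8))  # ceil(20 / 8)
print(pick_aggregators(hosts, 8, seed=0))
```

Because M <= N whenever the setting is at least 2, `random.Random.sample` can always draw M distinct hosts.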

A new RPC endpoint named UpdateFilterFromRemote is added in
data_stream_service.proto to handle filter updates from fellow impalad
executors to the designated aggregator impalad. This RPC will merge
filter updates into 'pending_remote_filter'. The intermediate aggregator
will then combine 'pending_remote_filter' with
'pending_merge_filter' (from local aggregation) into 'result_filter'
which is then sent to the coordinator. RuntimeFilterBank of the
intermediate aggregator will wait for all remote filter updates for at
least RUNTIME_FILTER_WAIT_TIME_MS. If RuntimeFilterBank is closing and
RUNTIME_FILTER_WAIT_TIME_MS has passed, any incomplete filter will be
marked as ALWAYS_TRUE and sent to the coordinator.
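The merge-and-timeout behaviour of an intermediate aggregator can be modelled roughly as follows. This is a hypothetical Python model, not the RuntimeFilterBank implementation: bloom filters are represented as int bitsets, and merging two equally sized bloom filters is assumed to be a bitwise OR of their bits (the standard bloom filter union).

```python
from dataclasses import dataclass
from typing import Optional, Union

ALWAYS_TRUE = "ALWAYS_TRUE"  # hypothetical marker for a filter that rejects nothing


@dataclass
class IntermediateAggregator:
    """Hypothetical model of one runtime filter on an aggregator impalad."""
    expected_remote_updates: int
    pending_merge_filter: int = 0   # result of local aggregation
    pending_remote_filter: int = 0  # OR of updates received via RPC
    received: int = 0

    def update_from_remote(self, bloom_bits: int) -> None:
        # Union of two equally sized bloom filters is a bitwise OR.
        self.pending_remote_filter |= bloom_bits
        self.received += 1

    def try_produce(self, closing: bool, waited_ms: int,
                    wait_time_ms: int) -> Optional[Union[int, str]]:
        """Return the filter to send to the coordinator, or None to keep waiting."""
        if self.received >= self.expected_remote_updates:
            # Complete: combine remote and local results into result_filter.
            return self.pending_remote_filter | self.pending_merge_filter
        if closing and waited_ms >= wait_time_ms:
            # Incomplete at close after the wait time: give up safely.
            return ALWAYS_TRUE
        return None


agg = IntermediateAggregator(expected_remote_updates=2, pending_merge_filter=0b0001)
agg.update_from_remote(0b0100)
agg.update_from_remote(0b1000)
print(bin(agg.try_produce(closing=False, waited_ms=0, wait_time_ms=1000)))
```

Falling back to an always-true filter is safe because it can only let extra rows through; it never drops a row that a complete filter would have kept.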

This patch currently targets only bloom filters produced by partitioned
join builds. Other kinds of runtime filters are still efficient to
aggregate in the coordinator alone, while a bloom filter from a broadcast
join requires only one valid filter update to be published.

test_runtime_filters.py is modified to clarify the exec_options
dimension, test matrix constraints, and reduce pytest.skip() calls on
each test. runtime_filters.test is also changed to use counter
aggregation and assert on ExecSummary table so that they stay valid
irrespective of the number of fragment instances.

We benchmarked 1 MB runtime filter aggregation on a cluster of 20
executor nodes with MT_DOP=36, instrumented to disable local aggregation,
which simulates 720 (20 x 36) runtime filter updates. Aggregation time is
approximated as the duration between the earliest filter update and the
time the coordinator publishes the complete filter. The results are as
follows:

+----------------------+-----------------------+
| num aggregator nodes | Aggregation time (ms) |
+----------------------+-----------------------+
|                    0 |                  1296 |
|                    1 |                  1229 |
|                    2 |                   608 |
|                    4 |                   329 |
|                    8 |                   205 |
+----------------------+-----------------------+

Testing:
- Exercise MAX_NUM_FILTERS_AGGREGATED_PER_HOST in
  test_runtime_filters.py and query-options-test.cc
- Add custom_cluster/test_runtime_filter_aggregation.py.
- Pass exhaustive end-to-end and custom-cluster tests.

Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
---
M be/src/common/logging.h
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/network-util.h
M be/src/util/runtime-profile-counters.h
M 

[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 20:22:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12581: Fix issue of ILIKE and IREGEXP don't work correctly with non-const pattern

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20785 )

Change subject: IMPALA-12581: Fix issue of ILIKE and IREGEXP don't work 
correctly with non-const pattern
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20785/2/tests/query_test/test_exprs.py
File tests/query_test/test_exprs.py:

http://gerrit.cloudera.org:8080/#/c/20785/2/tests/query_test/test_exprs.py@316
PS2, Line 316: "SELECT count(*) FROM {0} WHERE 'ABC' ILIKE pattern_str".format(tbl_name))
Is there a transposed version we should test where the literal is on the 
right-hand side?



--
To view, visit http://gerrit.cloudera.org:8080/20785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d66680f5a7660e6a41859754c4230f276e66712
Gerrit-Change-Number: 20785
Gerrit-PatchSet: 2
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 13 Dec 2023 20:19:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 19:

There was one test failure in TestOrcStats.test_orc_stats


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 19
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 19:57:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10028/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 19:45:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 5
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 19:45:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10024/


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 19
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 19:25:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 22: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20506/22/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/20506/22/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@710
PS22, Line 710: AnalyzesOk("alter table functional.alltypes change column int_col `汉字` int");
nit: move up to line 699

Similar feedback applies to the other AnalysisError calls that were changed to AnalyzesOk.



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 19:14:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12581: Fix issue of ILIKE and IREGEXP don't work correctly with non-const pattern

2023-12-13 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20785 )

Change subject: IMPALA-12581: Fix issue of ILIKE and IREGEXP don't work 
correctly with non-const pattern
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20785/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20785/2//COMMIT_MSG@7
PS2, Line 7: don't work
nit: not working


http://gerrit.cloudera.org:8080/#/c/20785/2//COMMIT_MSG@16
PS2, Line 16: fix
nit: fixing


http://gerrit.cloudera.org:8080/#/c/20785/2/be/src/exprs/like-predicate.cc
File be/src/exprs/like-predicate.cc:

http://gerrit.cloudera.org:8080/#/c/20785/2/be/src/exprs/like-predicate.cc@186
PS2, Line 186: state
Should we set the default value for state->case_sensitive_ in this case?



--
To view, visit http://gerrit.cloudera.org:8080/20785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d66680f5a7660e6a41859754c4230f276e66712
Gerrit-Change-Number: 20785
Gerrit-PatchSet: 2
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 13 Dec 2023 19:03:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10027/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 13 Dec 2023 18:54:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table

2023-12-13 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20773 )

Change subject: IMPALA-12229: Support soft-delete Kudu table
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20773
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7
Gerrit-Change-Number: 20773
Gerrit-PatchSet: 4
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 18:38:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10023/


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 13 Dec 2023 18:18:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 18:07:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 19: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 19
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 17:29:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build

2023-12-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20779 )

Change subject: IMPALA-11157: Switch to hadoop-client build
..


Patch Set 3:

Initial code review checks now include an ARM build.


--
To view, visit http://gerrit.cloudera.org:8080/20779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f
Gerrit-Change-Number: 20779
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Wed, 13 Dec 2023 16:19:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14708/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 16:11:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 19: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 16:11:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14707/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:55:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14706/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:36:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20506/22/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/20506/22/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@602
PS22, Line 602: AnalyzesOk("alter table functional.alltypes replace columns 
(`?최종हिंदी` int)");
line too long (97 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:26:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10026/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:26:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#22). ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..

IMPALA-12465: Unicode column name support

Impala relies on Hive functions for column name validation and
previously used validateName() for this. Since Hive already supports
Unicode column names, this patch simply switches the column name
validation function to validateColumnName(). validateName() requires
names to match a fixed character pattern, while validateColumnName()
places no restrictions on column names at the metadata level.
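
To illustrate the difference, here is a minimal Python sketch. It assumes, for illustration only, that validateName() behaves like a full match against `[\w_]+` with Java's default ASCII-only `\w` (emulated here with `re.ASCII`). validate_name and validate_column_name are hypothetical stand-ins, not the Hive API.

```python
import re

# Assumed approximation of validateName(): ASCII word characters only.
# re.ASCII emulates Java's default \w, which does not match Unicode letters.
_NAME_PATTERN = re.compile(r"[\w_]+", re.ASCII)


def validate_name(name: str) -> bool:
    """Hypothetical stand-in for the pattern-based validateName() check."""
    return _NAME_PATTERN.fullmatch(name) is not None


def validate_column_name(name: str) -> bool:
    """Hypothetical stand-in for validateColumnName(): no metadata-level limits."""
    return True


for col in ("col_1", "최종", "हिंदी"):
    print(col, validate_name(col), validate_column_name(col))
```

Under these assumptions only the ASCII name passes the pattern check, while every name passes the metadata-level column check.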

Testing: The support is tested and cross-checked with Hive. The tests
can be found in unicode-column-name.test.

Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
---
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
A tests/metadata/test_column_unicode.py
M tests/shell/test_shell_interactive.py
6 files changed, 379 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/22
--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20506/21/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/20506/21/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@602
PS21, Line 602: AnalyzesOk("alter table functional.alltypes replace columns 
(`?최종हिंदी` int)");
line too long (97 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:21:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..

IMPALA-12465: Unicode column name support

Impala relies on Hive functions for column name validation and
previously used validateName() for this. Since Hive already supports
Unicode column names, this patch simply switches the column name
validation function to validateColumnName(). validateName() requires
names to match a fixed character pattern, while validateColumnName()
places no restrictions on column names at the metadata level.

Testing: The support is tested and cross-checked with Hive. The tests
can be found in unicode-column-name.test.

Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
---
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
A tests/metadata/test_column_unicode.py
M tests/shell/test_shell_interactive.py
6 files changed, 379 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/21
--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 11:

(5 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/20760/10/be/src/exec/table-sink-base.h
File be/src/exec/table-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/10/be/src/exec/table-sink-base.h@90
PS10, Line 90: must already have
> Nit: "must already have filled".
Done


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java:

http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@115
PS6, Line 115: In case of a JOIN, and if duplicated rows ar
> It is a bit nit-picky, I meant that in the sentence "If there are duplicate
Updated the comment.


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@126
PS6, Line 126: se_.size() > 1)
> I wanted to ask if it is possible that modifyStmt_.fromClause_.size() == 1.
Even 'UPDATE tbl SET val = 3;' has a fromClause_ (maybe the null checking is 
redundant, but I think it should be fine), and it has a single tableRef, which 
is for the target table 'tbl'.
Updated the error message.


http://gerrit.cloudera.org:8080/#/c/20760/10/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test:

http://gerrit.cloudera.org:8080/#/c/20760/10/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test@400
PS10, Line 400: 1
> Are these changes compared to PS7 because of a rebase?
No, this is because of the new INSERT INTO in functional_schema_template.sql.


http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test:

http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test@252
PS7, Line 252: FROM clause
> I asked because I'm unsure whether we should add "multiple tables" to the c
Done



--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:14:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10025/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:14:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-13 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20760

to look at the new patch set (#11).

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..

IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

Part 2 had some limitations; most importantly, it could not update
Iceberg tables if any of the following were true:
 * the UPDATE modifies the value of a partitioning column
 * the table went through partition evolution
 * the table has SORT BY properties

The problem with partitions is that the delete record and the new
data record might belong to different partitions, and records are
shuffled based on the partitions of the delete records, hence the
data files might not get written efficiently.

The problem with SORT BY properties is that we need to write the
position delete files ordered by (file_path, position).

To address the above problems, this patch introduces a new
backend operator: IcebergBufferedDeleteSink. This new operator
extracts and aggregates the delete record information from
the incoming row batches, then in FlushFinal it orders the
position delete records and writes them out to files. This
mechanism is similar to Hive's approach:
https://github.com/apache/hive/pull/3251
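
The buffer-then-sort idea can be sketched in Python as follows. This is a simplified model, not the actual C++ operator: BufferedDeleteRecords, add_batch and flush_final are hypothetical names, and delete records are modelled as plain (file_path, position) tuples.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

DeleteRecord = Tuple[str, int]  # (file_path, position)


class BufferedDeleteRecords:
    """Hypothetical in-memory buffer; not Impala's IcebergBufferedDeleteSink."""

    def __init__(self) -> None:
        # Each data file path is stored once (as a dict key); the deleted row
        # positions for that file accumulate in a plain list of ints.
        self._positions: Dict[str, List[int]] = defaultdict(list)

    def add_batch(self, batch: Iterable[DeleteRecord]) -> None:
        """Aggregate the delete records of one incoming row batch."""
        for file_path, position in batch:
            self._positions[file_path].append(position)

    def flush_final(self) -> Iterator[DeleteRecord]:
        """Emit all buffered records ordered by (file_path, position)."""
        for file_path in sorted(self._positions):
            for position in sorted(self._positions[file_path]):
                yield file_path, position


buf = BufferedDeleteRecords()
buf.add_batch([("data/b.parquet", 7), ("data/a.parquet", 3)])
buf.add_batch([("data/a.parquet", 1)])
print(list(buf.flush_final()))
```

Because ordering happens only at flush time, the incoming rows can arrive in any order, for example after being shuffled and sorted for the new data records.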

IcebergBufferedDeleteSink cannot spill to disk, so it can only
run if there's enough memory to store the delete records. Paths
are stored only once, and the int64_t positions are stored in
a vector, so updating 100 million records per node should require
around 800 MB for the positions plus the storage for ~100K file
paths, i.e. roughly 820 MB of memory per node. Spilling could be
added later, but currently the need for it seems unlikely.
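
The estimate above works out as follows. The average path length of about 200 bytes is an assumption (it is not stated in the message); it is the value that makes 100K paths add up to roughly 20 MB. estimate_buffer_bytes is a hypothetical helper.

```python
def estimate_buffer_bytes(num_records: int, num_files: int,
                          avg_path_bytes: int = 200) -> int:
    """Rough estimate of buffered delete-record memory for one node.

    avg_path_bytes is an assumed average file path length; container
    overhead (e.g. vector growth, hash table buckets) is ignored.
    """
    position_bytes = num_records * 8          # one int64_t per deleted row
    path_bytes = num_files * avg_path_bytes   # each path is stored only once
    return position_bytes + path_bytes


total = estimate_buffer_bytes(100_000_000, 100_000)
print(f"{total / 1_000_000:.0f} MB")  # 820 MB
```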

Now records can get shuffled around based on the new data records'
partition values, and the SORT operator sorts the records based on
the SORT BY properties.

There's only one case in which we don't allow the UPDATE statement:
 * UPDATE partition column AND
 * Right-hand side of assignment is non-constant expression AND
 * UPDATE statement has a JOIN

When all of the above conditions are met, an incorrect JOIN
condition could have multiple matches for the same data record.
The duplicated records would then be shuffled independently (based
on the new partition value) to different backend SINKs, and none of
these SINKs would be able to detect the duplicates.

If any of the above conditions is false, then the duplicated records
are shuffled together to the same SINK, which can do the duplicate
check.
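
The reasoning above can be simulated with a toy model. It assumes, for illustration, that rows are routed to sinks by the new partition value modulo the number of sinks, and that a delete record is identified by (file_path, position); route_to_sinks and any_sink_sees_duplicate are hypothetical helpers, not Impala code.

```python
from typing import List, Tuple

DeleteKey = Tuple[str, int]      # (file_path, position) of the updated row
Row = Tuple[DeleteKey, int]      # (delete key, new partition value)


def route_to_sinks(rows: List[Row], num_sinks: int) -> List[List[DeleteKey]]:
    """Shuffle rows to sinks by the new partition value (modulo routing)."""
    sinks: List[List[DeleteKey]] = [[] for _ in range(num_sinks)]
    for delete_key, new_partition in rows:
        sinks[new_partition % num_sinks].append(delete_key)
    return sinks


def any_sink_sees_duplicate(sinks: List[List[DeleteKey]]) -> bool:
    """A sink can only detect a duplicate if both copies reach it."""
    return any(len(keys) != len(set(keys)) for keys in sinks)


# A bad JOIN matches the same target row twice. With a non-constant
# right-hand side, the two copies can get different new partition values.
non_constant = [(("data/f1.parquet", 7), 0), (("data/f1.parquet", 7), 1)]
# With a constant right-hand side, both copies share the same partition value.
constant = [(("data/f1.parquet", 7), 1), (("data/f1.parquet", 7), 1)]

print(any_sink_sees_duplicate(route_to_sinks(non_constant, 2)))  # False
print(any_sink_sees_duplicate(route_to_sinks(constant, 2)))      # True
```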

This patch also moves some code from IcebergDeleteSink to the
newly introduced IcebergDeleteSinkBase.

Testing:
 * planner tests
 * e2e tests
 * Impala/Hive interop tests

Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
A be/src/exec/iceberg-buffered-delete-sink.cc
A be/src/exec/iceberg-buffered-delete-sink.h
A be/src/exec/iceberg-delete-sink-base.cc
A be/src/exec/iceberg-delete-sink-base.h
A be/src/exec/iceberg-delete-sink-config.cc
A be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/iceberg-delete-sink.cc
M be/src/exec/iceberg-delete-sink.h
M be/src/exec/table-sink-base.cc
M be/src/exec/table-sink-base.h
M be/src/exprs/slot-ref.h
M be/src/runtime/dml-exec-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M testdata/datasets/functional/functional_schema_template.sql
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-stress.test
M tests/query_test/test_iceberg.py
M tests/stress/test_update_stress.py
35 files changed, 1,960 insertions(+), 345 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/11
--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF

[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14705/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 5
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:11:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14704/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 18
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 15:09:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 19: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 19
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 14:57:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10024/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 19
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 14:57:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 18: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 18
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 14:56:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12431: Support reading compressed JSON file

2023-12-13 Thread Zihao Ye (Code Review)
Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20482 )

Change subject: IMPALA-12431: Support reading compressed JSON file
..


Patch Set 8:

The previous reply was meant for another patch. I accidentally replied in the 
wrong place. Please ignore it.


--
To view, visit http://gerrit.cloudera.org:8080/20482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2471855d97d4cdd51363b321055e6b06aa6d81e8
Gerrit-Change-Number: 20482
Gerrit-PatchSet: 8
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 14:47:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

2023-12-13 Thread Tamas Mate (Code Review)
Tamas Mate has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
..

IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

As the slots have already been created on the frontend, this change
focuses on populating them on the backend side. There are two major
parts of this commit: obtaining the right Accessors for the slot, and
recursively filling the tuples with data.

The field ids are present in the struct slot's ColumnType field as a
list of integers. This list can be indexed with the correct element of
the SchemaPath to obtain the field id for a struct member, and with
that, the Accessor.

Once the Accessors are available, the IcebergRowReader's
MaterializeTuple method can be called recursively to write the
primitive slots of a struct slot.
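
A rough Python model of the two parts (field-id lookup and recursive materialization) is sketched below. Everything here is a hypothetical simplification: SlotDesc, materialize_tuple and the dict-based "accessor" are illustrative names, and the column names in the usage are made up; the real code works on Impala tuples and Iceberg Accessors.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SlotDesc:
    """Hypothetical, simplified slot descriptor (not Impala's SlotDescriptor)."""
    name: str
    schema_path: List[int]                  # last element: position in parent
    field_id: Optional[int] = None          # used for top-level columns only
    member_field_ids: List[int] = field(default_factory=list)  # struct only
    children: List["SlotDesc"] = field(default_factory=list)   # struct only


Record = Dict[int, Any]  # a row's values keyed by Iceberg field id


def materialize_tuple(slots: List[SlotDesc], record: Record,
                      parent: Optional[SlotDesc] = None) -> Dict[str, Any]:
    """Fill primitive slots, recursing into struct slots."""
    out: Dict[str, Any] = {}
    for slot in slots:
        if parent is None:
            fid = slot.field_id
        else:
            # Index the parent's field-id list with this member's schema path
            # element to find the member's field id (the "accessor" here).
            fid = parent.member_field_ids[slot.schema_path[-1]]
        value = record.get(fid)
        if slot.children and value is not None:
            out[slot.name] = materialize_tuple(slot.children, value, slot)
        else:
            out[slot.name] = value
    return out


summary = SlotDesc("summary", [1], field_id=2, member_field_ids=[10, 11],
                   children=[SlotDesc("added", [1, 0]),
                             SlotDesc("deleted", [1, 1])])
slots = [SlotDesc("snapshot_id", [0], field_id=1), summary]
row = {1: 42, 2: {10: 3, 11: 0}}
print(materialize_tuple(slots, row))
# {'snapshot_id': 42, 'summary': {'added': 3, 'deleted': 0}}
```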

Testing:
 - Added E2E tests

Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
8 files changed, 241 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/20759/5
--
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 5
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Zihao Ye (Code Review)
Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 18:

Due to the lack of an 'only' constraint, the load of 
'functional_kudu.timestamp_at_dst_changes' was skipped. This has been fixed.


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 18
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 14:44:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12431: Support reading compressed JSON file

2023-12-13 Thread Zihao Ye (Code Review)
Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20482 )

Change subject: IMPALA-12431: Support reading compressed JSON file
..


Patch Set 8:

Due to the lack of an 'only' constraint, the load of 
'functional_kudu.timestamp_at_dst_changes' was skipped. This has been fixed.


--
To view, visit http://gerrit.cloudera.org:8080/20482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2471855d97d4cdd51363b321055e6b06aa6d81e8
Gerrit-Change-Number: 20482
Gerrit-PatchSet: 8
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 14:43:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Zihao Ye (Code Review)
Hello Wenzhe Zhou, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20681

to look at the new patch set (#18).

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..

IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

This patch adds a query option 'convert_kudu_utc_timestamps' similar to
'convert_legacy_hive_parquet_utc_timestamps'. When enabled, it converts
UTC timestamps read from Kudu to local timestamps.

The corresponding modifications also cover predicate pushdown and
runtime filters. Because daylight saving time changes make some local
timestamps ambiguous, this is difficult to handle in the bloom filter.
This patch additionally introduces a query option
'disable_kudu_local_timestamp_bloom_filter', which by default disables
the Kudu timestamp bloom filter when time zone conversion is enabled,
in order to avoid erroneously filtering out data. However, for regions
that do not observe daylight saving time, it can be set to false to
re-enable the Kudu local timestamp bloom filter.
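
To see why the bloom filter is tricky here: converting UTC to local time is a function, but going back from local time to UTC is not, because around a fall-back transition one local wall-clock time corresponds to two UTC instants. The Python sketch below models this with a single hard-coded transition instead of a real time zone database; the zone and the helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

# Hypothetical zone modelled on the US Pacific fall-back of 2023-11-05:
# UTC-7 (daylight time) before 09:00 UTC, UTC-8 (standard time) afterwards.
DAYLIGHT = timedelta(hours=-7)
STANDARD = timedelta(hours=-8)
FALL_BACK_UTC = datetime(2023, 11, 5, 9, 0)


def utc_to_local(utc: datetime) -> datetime:
    """Always well defined: each UTC instant has exactly one local time."""
    return utc + (DAYLIGHT if utc < FALL_BACK_UTC else STANDARD)


def local_to_utc_candidates(local: datetime) -> List[datetime]:
    """All UTC instants that display as `local`; there may be more than one."""
    candidates = []
    for offset in (DAYLIGHT, STANDARD):
        utc = local - offset
        if utc_to_local(utc) == local:
            candidates.append(utc)
    return candidates


print(local_to_utc_candidates(datetime(2023, 11, 5, 1, 30)))  # two instants
print(local_to_utc_candidates(datetime(2023, 11, 5, 3, 0)))   # one instant
```

A bloom filter checks exact hashed values, so when a single local value maps to two UTC instants there is no single key that stands for it; this is one way to read the ambiguity described above.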

Testing:
- Add TestKuduTimestampConvert in query_test/test_kudu.py, which
  performs end-to-end testing in a custom cluster, covering basic Kudu
  UTC timestamp conversion and checking that the related predicate
  pushdown and runtime filters work correctly (even with timestamps
  involving daylight saving time transitions).

Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
---
M be/src/exec/kudu/kudu-scanner.cc
M be/src/exec/kudu/kudu-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M bin/rat_exclude_files.txt
M common/function-registry/impala_functions.py
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/util/ExprUtil.java
A testdata/data/timestamp_at_dst_changes.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_runtime_filter_with_timestamp_conversion.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test
M tests/query_test/test_kudu.py
25 files changed, 592 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/20681/18
--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 18
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2023-12-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py
File tests/custom_cluster/test_sync_to_latest_hms_events.py:

http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py@37
PS14, Line 37: --file_metadata_reload_properties=''
I'm still trying to understand why we need this in some tests. Do those tests
depend on schema-only AlterTable commands (e.g. add column) also loading the
file metadata?



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:43:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14703/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:47:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10023/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:44:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:44:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10022/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:43:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14702/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:41:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-13 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20506/20/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/20506/20/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@534
PS20, Line 534: (
Compile time error: can't break a string like this. See also L536-537.



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:29:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-13 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc
File be/src/exec/iceberg-delete-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@79
PS5, Line 79: VerifyRowsNotDuplicated
> file paths and positions are not sorted across partitions. So we would need
Ok, it can stay as it is.


http://gerrit.cloudera.org:8080/#/c/20760/10/be/src/exec/table-sink-base.h
File be/src/exec/table-sink-base.h:

http://gerrit.cloudera.org:8080/#/c/20760/10/be/src/exec/table-sink-base.h@90
PS10, Line 90: must already fill
Nit: "must already have filled".


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java:

http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@115
PS6, Line 115: If there are duplicates in the JOIN operator
> I'm not sure what is the point here. Duplicates are only possible in the co
It is a bit nit-picky; I meant that in the sentence "If there are duplicates 
[...] then we cannot do duplicate checking in the SINK if ..." the condition at 
the beginning is not necessary - if it happens that there are actually no 
duplicates we still can't check for them if the rows are shuffled 
independently. I'd suggest something like this:

"""
In case of a JOIN, if duplicated rows can be shuffled independently, we cannot 
do duplicate checking in the SINK.
This is the case when ...
"""


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@126
PS6, Line 126: via UPDATE FROM
> There will be always at least one tableRef because of the target table.
I wanted to ask if it is possible that modifyStmt_.fromClause_.size() == 1.

1. If it is possible, then in that case the exception (currently) won't be 
thrown.
 1a) If it should be thrown we should remove that condition.
 1b) Otherwise, the error message lists the conditions that were needed to 
trigger the error:
  - partition column,
  - non-constant RHS
  -> in this case we should include "more than one table ref in the FROM 
clause" as well

2. If modifyStmt_.fromClause_.size() == 1 is not possible, we should remove the 
relevant part of the condition on L123 and add a precondition check instead.


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@101
PS6, Line 101:   public TSortingOrder getSortingOrder() {
> There's good chance we will need it later, e.g. optimizing a table that has
Ok, it should stay then.


http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
File fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java:

http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@34
PS6, Line 34: TableSink
> It may have some value now, as there are some common fields/methods, but I'
Ok, if IcebergDeleteSink will probably be deleted we can leave it as it is now. 
But we should open a Jira about it then.


http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/datasets/functional/functional_schema_template.sql@3407
PS7, Line 3407: E TA
> Makes sense, I never really thought about this as I usually re-load my tabl
I agree, let's not make this patch even bigger.


http://gerrit.cloudera.org:8080/#/c/20760/10/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test:

http://gerrit.cloudera.org:8080/#/c/20760/10/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test@400
PS10, Line 400: 1
Are these changes compared to PS7 because of a rebase?


http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test:

http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test@252
PS7, Line 252: FROM clause
> I think yes, otherwise you cannot have a join that produces duplicates.
I asked because I'm unsure whether we should add 

[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Gabor Kaszab (Code Review)
Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20753

to look at the new patch set (#6).

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..

IMPALA-12597: Basic Equality delete read support for Iceberg tables

In general, applying equality deletes is similar to how position
deletes are applied to data files: using a LEFT ANTI JOIN where the
SCAN for the data rows is on the left side while the SCAN for the
delete rows is on the right side of the JOIN. The difference lies in
the virtual columns and the conjuncts being used.
For equality deletes, the data sequence number of a delete file has to
be greater than the data sequence number of the data file being
investigated. This information is added as a virtual column to the
scans, and a conjunct is created in the JOIN node to check the relation.
The equality delete fields from the delete files are checked against the
respective columns of the data SCANS.
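The join semantics described above can be sketched in a few lines of Python. This is a simplified model, not Impala's planner or hash join code: rows are plain dicts, "seq" is a hypothetical name for the data sequence number virtual column, and apply_equality_deletes is a hypothetical helper. Because a delete applies only when its sequence number is greater than the data row's, the build side only needs the highest delete sequence number per equality key. Note that Python's tuple equality treats None == None as true, i.e. it behaves like a null-safe comparison; whether Impala's join uses exactly this comparison is an assumption here, though it is consistent with the NULL side effect listed in the restrictions.

```python
def apply_equality_deletes(data_rows, delete_rows, eq_fields):
    """Return the data rows that survive a LEFT ANTI JOIN against deletes.

    Hypothetical model: each row is a dict, "seq" holds its data sequence
    number, and eq_fields lists the equality delete columns.
    """
    def key_of(row):
        return tuple(row[f] for f in eq_fields)

    # Build side (right): highest delete sequence number per equality key.
    max_delete_seq = {}
    for d in delete_rows:
        k = key_of(d)
        max_delete_seq[k] = max(d["seq"], max_delete_seq.get(k, d["seq"]))

    # Probe side (left): keep a row unless a newer delete matches its key.
    return [r for r in data_rows
            if max_delete_seq.get(key_of(r), r["seq"]) <= r["seq"]]


data = [{"id": 1, "seq": 1}, {"id": 2, "seq": 1}, {"id": 3, "seq": 3}]
deletes = [{"id": 1, "seq": 2}, {"id": 3, "seq": 2}]
print([r["id"] for r in apply_equality_deletes(data, deletes, ["id"])])  # → [2, 3]
```

Row 3 survives because its delete (sequence number 2) is older than the data row (sequence number 3).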

This patch makes it possible for Impala to read Iceberg tables with
basic equality delete files. The Iceberg spec gives engines great
flexibility in writing equality deletes; however, in practice Flink,
one of the engines that write EQ-deletes, supports only a subset of the
use cases. This patch focuses on reading the EQ-deletes written by
Flink.

The restrictions are the following:
- All equality delete files in a table should have the same equality
  field ID list.
- For partitioned Iceberg tables it is expected that the partition
  values are also written into the equality delete files.
- Tables with equality deletes shouldn't have partition or schema
  evolution.
- Floating point equality columns aren't supported.
- If a malformed equality delete file doesn't have some of the equality
  field IDs, then the Parquet reader will fill those missing fields with
  NULLs. As a side effect, this will drop the rows from the result where
  the corresponding data columns have a null value.
See the IMPALA-11388 epic Jira for more details.

Testing:
- Checked if the existing functional_parquet.iceberg_v2_delete_equality
  table can be read successfully.

TODO: Add some test tables created by Flink to the test suite:
- Partitioned table that has equality deletes.
- Table with partition evolution.
- Table with schema evolution.

Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
---
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.h
M common/thrift/CatalogObjects.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro
A 

[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 5:

(2 comments)

I added the most important test tables with this patch. Will add the rest soon.

http://gerrit.cloudera.org:8080/#/c/20753/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/20753/4/common/thrift/PlanNodes.thrift@403
PS4, Line 403: this case.
> nit: this case
Done


http://gerrit.cloudera.org:8080/#/c/20753/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20753/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@289
PS4, Line 289: tblR
> nit: too much indent
Done



--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:14:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Gabor Kaszab (Code Review)
Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20753

to look at the new patch set (#5).

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..

IMPALA-12597: Basic Equality delete read support for Iceberg tables

In general, applying equality deletes is similar to how position
deletes are applied to data files: using a LEFT ANTI JOIN where the
SCAN for the data rows is on the left side while the SCAN for the
delete rows is on the right side of the JOIN. The difference lies in
the virtual columns and the conjuncts being used.
For equality deletes, the data sequence number of a delete file has to
be greater than the data sequence number of the data file being
investigated. This information is added as a virtual column to the
scans, and a conjunct is created in the JOIN node to check the relation.
The equality delete fields from the delete files are checked against the
respective columns of the data SCANS.

This patch makes it possible for Impala to read Iceberg tables with
basic equality delete files. The Iceberg spec gives engines great
flexibility in writing equality deletes; however, in practice Flink,
one of the engines that write EQ-deletes, supports only a subset of the
use cases. This patch focuses on reading the EQ-deletes written by
Flink.

The restrictions are the following:
- All equality delete files in a table should have the same equality
  field ID list.
- For partitioned Iceberg tables it is expected that the partition
  values are also written into the equality delete files.
- Tables with equality deletes shouldn't have partition or schema
  evolution.
- Floating point equality columns aren't supported.
- If a malformed equality delete file doesn't have some of the equality
  field IDs, then the Parquet reader will fill those missing fields with
  NULLs. As a side effect, this will drop the rows from the result where
  the corresponding data columns have a null value.
See the IMPALA-11388 epic Jira for more details.

Testing:
- Checked that the existing functional_parquet.iceberg_v2_delete_equality
  table can be read successfully.

TODO: Add some test tables created by Flink to the test suite:
- Partitioned table that has equality deletes.
- Table with partition evolution.
- Table with schema evolution.

Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
---
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.h
M common/thrift/CatalogObjects.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro
A 

[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1278
PS5, Line 1278: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1279
PS5, Line 1279: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1280
PS5, Line 1280: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1281
PS5, Line 1281: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1282
PS5, Line 1282: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1283
PS5, Line 1283: \
flake8: E502 the backslash is redundant between brackets
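For reference, E502 fires when a line inside (), [] or {} ends with a backslash: Python already continues the statement implicitly within brackets, so the backslash adds nothing. The flagged lines of test_iceberg.py are not shown here, so the following is a generic, hypothetical Python illustration (the query text and table name tbl are made up), not the actual test code.

```python
# Flagged style: the backslash is redundant because we are inside parentheses
# (pycodestyle/flake8 reports this as E502).
flagged = ("SELECT count(*) " \
           "FROM tbl")

# Preferred style: rely on the implicit line continuation inside the brackets.
preferred = ("SELECT count(*) "
             "FROM tbl")

print(flagged == preferred)
```

Both forms build the same string; dropping the backslashes only silences the lint warning.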



--
To view, visit http://gerrit.cloudera.org:8080/20753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
Gerrit-Change-Number: 20753
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:14:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 17: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10019/


--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 17
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:07:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10020/


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 13 Dec 2023 13:06:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20648 )

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..

IMPALA-10987: Changing impala.disableHmsSync in
Hive should not break event processing

Currently we require a global invalidate to reset the events processor
if the events sync is re-enabled on a table from HMS. This patch
eliminates the need to reset the catalog cache when events sync is
re-enabled.

Implementation details: when events sync is re-enabled on a table via HMS:
1) If the table exists in Impala,
  a) We can just invalidate the table if the current event id is greater
     than the create event id of the table, so that it is reloaded the
     first time a query accesses it.
  b) Otherwise, we can just ignore the event.
2) If the table doesn't exist in Impala, create an incomplete table if
   there is no entry in the event delete log for this table.
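The decision above can be summarized as a small Python sketch. It is a hypothetical model written from the commit message only, not the MetastoreEvents.java code; the names Action and on_events_sync_reenabled are made up. The message does not say what happens when the table is missing and the event delete log does have an entry, so ignoring the event in that case is an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    INVALIDATE = "invalidate the table; reload on first access"
    IGNORE = "ignore the event"
    ADD_INCOMPLETE = "add an incomplete table"


def on_events_sync_reenabled(event_id: int,
                             table_exists: bool,
                             create_event_id: Optional[int] = None,
                             in_delete_log: bool = False) -> Action:
    """Hypothetical model of the re-enable handling described above."""
    if table_exists:
        assert create_event_id is not None, "needed when the table exists"
        # 1a) The event is newer than the table's create event: invalidate.
        if event_id > create_event_id:
            return Action.INVALIDATE
        # 1b) Otherwise the event is stale for this table: ignore it.
        return Action.IGNORE
    # 2) Unknown table: add an incomplete table unless the event delete log
    #    has an entry for it (assumption: the event is ignored in that case).
    if not in_delete_log:
        return Action.ADD_INCOMPLETE
    return Action.IGNORE


print(on_events_sync_reenabled(10, True, create_event_id=5))
```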

Note: If event sync is disabled on a table, ideally we should mark the
table as stale for all subsequent table events if the table object is
loaded, so that it is reloaded the next time a query accesses it. But,
since this approach has a performance impact, the events will be ignored.

Testing:
1) Manually verified a few scenarios.
2) Added test cases for the above scenarios.

Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Reviewed-on: http://gerrit.cloudera.org:8080/20648
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 162 insertions(+), 59 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 12
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20648 )

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 11
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 13 Dec 2023 12:34:42 +
Gerrit-HasComments: No

