[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 19: > Patch Set 19: > > I'm confused why these changes could affect test_orc_stats, but I noticed > another patch that encountered the same situation: > https://jenkins.impala.io/job/gerrit-verify-dryrun/10031/ > Could it be due to some changes causing this test to become unstable? I think it's unrelated to this patch. Just filed IMPALA-12630. -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 19 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 14 Dec 2023 07:19:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 19: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10033/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 19 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 14 Dec 2023 07:19:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 19: I'm confused why these changes could affect test_orc_stats, but I noticed another patch that encountered the same situation: https://jenkins.impala.io/job/gerrit-verify-dryrun/10031/ Could it be due to some changes causing this test to become unstable? -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 19 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 14 Dec 2023 07:15:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10032/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 11 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 06:57:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 11 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 06:57:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10031/ -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 10 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 04:54:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20742 ) Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on masked tables .. IMPALA-11501: Add flag to allow catalog-cache operations on masked tables REFRESH/INVALIDATE METADATA are the table-level catalog-cache operations. In the Hive-Ranger plugin, when a table is masked (by either a column-masking or a row-filtering policy) for a user, the user is unable to perform any modification (insert/delete/update) on the table, i.e. they are considered a read-only user (RANGER-1087, RANGER-1100). However, Hive doesn't have these catalog-cache operations, so it's a grey area whether they should be blocked. Before this patch, these catalog-cache operations were considered modifications of the table, so they were also blocked for masked users. The table metadata must be loaded so that we have the column names needed to fetch Ranger column-masking policies. This causes a performance regression on INVALIDATE METADATA commands, since in older versions (e.g. CDH) IM commands don't need to load the table metadata and run pretty fast. This patch adds a flag, allow_catalog_cache_op_from_masked_users, for coordinators to skip checking masking policies for such statements. When this flag is on, coordinators don't need to load the table metadata, which also fixes the performance regression. Note that Ranger ownership-based policies can't be applied correctly when the table is unloaded (so the owner is unknown). REFRESH/INVALIDATE METADATA commands could be denied for owners even if there are Ranger policies allowing the owner's operations. This has been a known issue since IMPALA-8228. To ensure a user can always perform these operations, grant them the REFRESH privilege to get rid of the unloaded-table issue. This patch also fixes a bug in local catalog mode which only occurs after adding the new flag. The bug is that LocalDb#getTableIfCached() doesn't make good use of the cache. 
If the table metadata is cached but LocalDb#getTable() hasn't been invoked on the table, getTableIfCached() will always return a LocalIncompleteTable which is missing some table info, e.g. ownership. This prevents REFRESH/INVALIDATE statements from passing the ownership context to RangerAccessResourceImpl, so ownership policies can't be applied correctly. Ideally, LocalDb#getTableIfCached() should return a LocalTable instance if the table is cached. However, in local catalog mode, we don't cache everything needed to construct a LocalTable instance. Constructing a LocalTable instance might still trigger external RPCs, which should be avoided. As an alternative, this patch checks whether the msTable is cached. If it is, it is added to the LocalIncompleteTable instance so most of the table info can be retrieved, including the ownership string. Tests: - Add e2e tests in both the legacy and local catalog modes. Change-Id: I45935654cbf05a55d740f1b04781022c271f7678 Reviewed-on: http://gerrit.cloudera.org:8080/20742 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/service/frontend.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java M fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java M fe/src/main/java/org/apache/impala/authorization/Privilege.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M tests/authorization/test_ranger.py 15 files 
changed, 188 insertions(+), 12 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678 Gerrit-Change-Number: 20742 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
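The flag check and the getTableIfCached() fallback described in this commit message can be sketched roughly as follows. This is a hypothetical Python model, not the Java code in BaseAuthorizationChecker or LocalDb: the names CATALOG_CACHE_OPS, needs_masking_check, IncompleteTable and get_table_if_cached, and the dict-based caches, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical model of the logic described for IMPALA-11501; not Impala's Java code.
CATALOG_CACHE_OPS = {"REFRESH", "INVALIDATE METADATA"}


def needs_masking_check(stmt_type: str,
                        allow_catalog_cache_op_from_masked_users: bool) -> bool:
    """True if the coordinator must load the table to look up Ranger
    column-masking / row-filtering policies for this statement."""
    if stmt_type in CATALOG_CACHE_OPS and allow_catalog_cache_op_from_masked_users:
        # Flag on: skip the masking check, so no table metadata load is needed.
        return False
    return True


@dataclass
class IncompleteTable:
    db: str
    name: str
    ms_table: Optional[dict] = None  # cached metastore table, if available

    @property
    def owner(self) -> Optional[str]:
        # Ownership is only known if the msTable was found in the cache.
        return self.ms_table.get("owner") if self.ms_table else None


def get_table_if_cached(db: str, name: str,
                        loaded: Dict[Tuple[str, str], object],
                        ms_table_cache: Dict[Tuple[str, str], dict]):
    key = (db, name)
    if key in loaded:
        return loaded[key]
    # Fallback: attach the cached msTable (if any) instead of returning a bare
    # incomplete table, so ownership-based policies can still be evaluated.
    return IncompleteTable(db, name, ms_table_cache.get(key))
```

The design point mirrored here is that the fallback never triggers a load: it only reuses what is already cached, which matches the commit message's goal of avoiding external RPCs.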
[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20742 ) Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on masked tables .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678 Gerrit-Change-Number: 20742 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 14 Dec 2023 03:58:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20783 ) Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop .. IMPALA-12618: Update README.md to reduce emphasis on Hadoop The README.md file is displayed on the GitHub home page (https://github.com/apache/impala). Change this so that the opening line mentions “data stored in open data and table formats” instead of “data stored in Apache Hadoop clusters”. Also add Iceberg as the first mentioned place where data can be stored. Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce Reviewed-on: http://gerrit.cloudera.org:8080/20783 Reviewed-by: Quanlong Huang Tested-by: Michael Smith --- M README.md 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Quanlong Huang: Looks good to me, approved Michael Smith: Verified -- To view, visit http://gerrit.cloudera.org:8080/20783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce Gerrit-Change-Number: 20783 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20783 ) Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce Gerrit-Change-Number: 20783 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 14 Dec 2023 03:28:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20779 ) Change subject: IMPALA-11157: Switch to hadoop-client build .. IMPALA-11157: Switch to hadoop-client build The hadoop build only produces client binaries, not a full hadoop build. The name was therefore misleading, and could not replace the full build of hadoop required by Impala. Impala's toolchain bootstrap process would then fail if we tried to include two packages named "hadoop" when overriding the download URL via IMPALA_HADOOP_URL. Renames hadoop to hadoop-client to clarify its contents and avoid conflicts with a full hadoop build. Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f Reviewed-on: http://gerrit.cloudera.org:8080/20779 Tested-by: Impala Public Jenkins Reviewed-by: Joe McDonnell Reviewed-by: Quanlong Huang --- M bin/bootstrap_toolchain.py M bin/impala-config.sh M buildall.sh 3 files changed, 7 insertions(+), 8 deletions(-) Approvals: Impala Public Jenkins: Verified Joe McDonnell: Looks good to me, approved Quanlong Huang: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/20779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f Gerrit-Change-Number: 20779 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang
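A minimal sketch of why two toolchain packages sharing the name "hadoop" collide when the download URL is overridden through an environment variable. It assumes, purely for illustration, that each package's override variable is derived from its name (IMPALA_<NAME>_URL, which yields IMPALA_HADOOP_URL for "hadoop"); resolve_download_urls and that naming rule are hypothetical and are not taken from bin/bootstrap_toolchain.py.

```python
import os
from typing import Callable, Dict, Iterable, Mapping, Optional


def resolve_download_urls(package_names: Iterable[str],
                          default_url: Callable[[str], str],
                          environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map package name -> download URL, honoring a per-package override.

    Hypothetical: the override variable is assumed to be IMPALA_<NAME>_URL.
    """
    environ = os.environ if environ is None else environ
    urls: Dict[str, str] = {}
    for name in package_names:
        if name in urls:
            # Two packages with the same name cannot be told apart: they would
            # share one download slot and one override variable.
            raise ValueError(f"duplicate package name: {name}")
        env_var = "IMPALA_" + name.upper().replace("-", "_") + "_URL"
        urls[name] = environ.get(env_var, default_url(name))
    return urls
```

Under this assumed naming rule, renaming the client-only package to hadoop-client gives it its own key and its own override variable, so it no longer competes with a full hadoop build.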
[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14714/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 15 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 03:12:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/20367/15/tests/custom_cluster/test_sync_to_latest_hms_events.py File tests/custom_cluster/test_sync_to_latest_hms_events.py: http://gerrit.cloudera.org:8080/#/c/20367/15/tests/custom_cluster/test_sync_to_latest_hms_events.py@581 PS15, Line 581: flake8: W292 no newline at end of file -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 15 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 02:45:50 + Gerrit-HasComments: Yes
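flake8's W292 warning (from pycodestyle) flags a file whose last line is not terminated by a newline. A rough Python approximation of the check and of the fix follows; missing_final_newline and fix_final_newline are illustrative helpers, not flake8 or pycodestyle APIs.

```python
def missing_final_newline(source: str) -> bool:
    """Approximates W292: a non-empty file whose last character is not '\\n'."""
    return bool(source) and not source.endswith("\n")


def fix_final_newline(source: str) -> str:
    # Append exactly one newline when it is missing; otherwise leave the text alone.
    return source + "\n" if missing_final_newline(source) else source
```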
[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20367 to look at the new patch set (#15). Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs .. IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs The idea is that when any DDL/DML operation is performed by Impala, it also syncs the db/table to its latest event ID as per HMS. This way, updates to a db/table are applied in the same order as they appear in the notification log table in HMS, which ensures consistency. Currently, catalogD applies any updates received from Impala clients in place. Instead, it should perform the HMS operation first and then replay all HMS events since the last synced event ID. Implementation: when the enable_sync_to_latest_event_on_ddls flag is set to true, we do the DDL/DML operation first, i.e., perform the HMS operation, and then sync the db/table in catalogD's cache to the latest event in HMS for the corresponding db/table. Currently, we fetch all events with IDs greater than the db/table's lastSyncEventId and filter them in the events processor to sync only the current db/table's events. Once HIVE-27499 is implemented, we can directly fetch only the events for the respective db/table and process them. Currently, there is no efficient way to identify whether there are pending events for a db/table. To enable this feature, set 'enable_sync_to_latest_event_on_ddls' to true. Also, set 'file_metadata_reload_properties' to an empty string to avoid data inconsistencies. Note: We don't modify the cache using MetastoreEventsProcessor for the alter table rename operation, as its cache modification is complex (IMPALA-12553 has more details about this). Nor do we modify the cache using the above process for 'refresh table' or 'invalidate metadata table' commands. 
Testing: 1) Added a few tests in MetaStoreEventProcessorForTest that simulate the metadata sync between HMS and Impala to verify this feature. 2) Added a few tests in the CatalogHmsSyncToLatestEventIdTest class to verify the metadata sync between the HMS endpoint, the Catalog Metastore Server and Impala. The HMS endpoint serves as a common interface for metadata changes made outside the current Impala service, such as by Hive, Spark or another Impala service. Also verified that the table's lastSyncEventId is updated after the events are synced and confirmed that the metastore event processor ignores these synced events. Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf --- M be/src/catalog/catalog-server.cc M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java A tests/custom_cluster/test_sync_to_latest_hms_events.py M tests/metadata/test_recover_partitions.py 13 files changed, 1,134 insertions(+), 125 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/15 -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 15 Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
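The sync step described in the Implementation paragraph (fetch every event newer than the table's lastSyncEventId, keep only this db/table's events, apply them in order) can be sketched as below. This is a simplified Python model under stated assumptions: Event, CachedTable, sync_table_to_latest_event and the fetch_events_after callback are hypothetical stand-ins for the HMS notification-log API and catalogd's cache, and in this sketch lastSyncEventId only advances on events that were actually applied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    event_id: int
    db: str
    table: str
    apply: Callable[[dict], None]  # mutation this event makes to the cached table


@dataclass
class CachedTable:
    db: str
    name: str
    last_sync_event_id: int
    state: dict


def sync_table_to_latest_event(table: CachedTable,
                               fetch_events_after: Callable[[int], List[Event]]) -> CachedTable:
    """fetch_events_after(event_id) is a hypothetical stand-in for reading the
    HMS notification log: all events with id > event_id, in ascending order."""
    for event in fetch_events_after(table.last_sync_event_id):
        # No server-side per-table filter yet (see HIVE-27499), so filter here.
        if (event.db, event.table) != (table.db, table.name):
            continue
        event.apply(table.state)
        table.last_sync_event_id = event.event_id
    return table
```

In the described flow, a DDL/DML would first perform its HMS operation and then call a sync like this, so catalogd's cache reflects HMS events in notification-log order rather than an in-place update.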
[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py File tests/custom_cluster/test_sync_to_latest_hms_events.py: http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py@37 PS14, Line 37: --file_metadata_reload_properties='' > I'm still understanding why we need this in some tests. Do those tests depe This is a real problem with 'Insert' or 'Insert overwrite' queries. Such a query generates an alter table event followed by an insert event. If numRows doesn't change, we cannot detect whether file metadata needs to be reloaded. We need to detect that an alter table event was generated by an insert query and reload file metadata accordingly. Below is an example where we cannot detect whether to reload file metadata: create table tb1(i int); (Query run in Impala) insert into tb1 values (1); (Query run in Hive) Insert overwrite table tb1 values (2); (Query run in Hive) Select * from tb1; (Query run in Impala) -- The output is '1' instead of '2'. Reason: -> For the first insert query, we get 2 events: an alter table event and an insert event. The alter table event has a changed numRows property, so we reload file metadata and update the table's lastSyncEventId; then the insert event gets skipped. -> For the second insert overwrite query, we again get an alter table event and an insert event. Since numRows is unchanged (even though the underlying data changed), we cannot detect that file metadata needs to be reloaded, so we process this event without reloading file metadata and update the table's lastSyncEventId; then the insert event gets skipped. As a result, we get data correctness issues. 
I believe the solution to this issue is to fix the alter table event in the metastore so that it indicates the event was triggered by an insert; then we can simply reload file metadata. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 15 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 02:44:52 + Gerrit-HasComments: Yes
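The failure mode in this comment comes from a heuristic that reloads file metadata only when numRows changes. A hypothetical Python model of just that heuristic reproduces the stale result. CachedTable, process_alter_table_event and the event IDs are illustrative assumptions, and, following the comment, the insert event that follows each alter table event is treated as skipped and is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CachedTable:
    rows: List[int] = field(default_factory=list)  # stand-in for file metadata
    num_rows: int = 0                               # cached numRows property
    last_sync_event_id: int = 0


def process_alter_table_event(table: CachedTable, event_id: int,
                              new_num_rows: int, rows_on_storage: List[int]) -> None:
    """Hypothetical handler: reload file metadata only if numRows changed."""
    if new_num_rows != table.num_rows:
        table.rows = list(rows_on_storage)  # reload file metadata
    table.num_rows = new_num_rows
    table.last_sync_event_id = event_id
    # The insert event that follows is then skipped, so it is not modelled.


tb1 = CachedTable()
# Hive: insert into tb1 values (1); numRows 0 -> 1, so metadata is reloaded.
process_alter_table_event(tb1, 2, 1, [1])
# Hive: insert overwrite table tb1 values (2); numRows stays 1, no reload.
process_alter_table_event(tb1, 4, 1, [2])
print(tb1.rows)  # prints [1] (stale), while storage now holds [2]
```

This is why the comment proposes marking alter table events that were caused by an insert: the handler could then reload unconditionally for those events instead of relying on numRows.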
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Dec 2023 02:15:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20779 ) Change subject: IMPALA-11157: Switch to hadoop-client build .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f Gerrit-Change-Number: 20779 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 14 Dec 2023 01:51:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/20779 ) Change subject: IMPALA-11157: Switch to hadoop-client build .. Patch Set 3: Code-Review+2 This looks good to me -- To view, visit http://gerrit.cloudera.org:8080/20779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f Gerrit-Change-Number: 20779 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 14 Dec 2023 01:50:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20783 ) Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce Gerrit-Change-Number: 20783 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 14 Dec 2023 01:38:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20783 ) Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14713/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce Gerrit-Change-Number: 20783 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 14 Dec 2023 01:39:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/20533 ) Change subject: IMPALA-12463: Batch non-consecutive table events in the event processor .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@400 PS7, Line 400: } else if (next instanceof AlterTableEvent) { Shouldn't we consider create/drop table events as batch-breaking events? http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@405 PS7, Line 405: flushBatchForTable(pendingTableEventsMap, sortedFinalBatches, beforeTable); IMO, we should also consider the scenario where a table is renamed across databases and flush the corresponding events. E.g.: ALTER TABLE db1.tb1 RENAME TO db2.tb2; http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@408 PS7, Line 408: // an invalid scenario, because the destination must not exist at the time This is a possible scenario. In the current queue, there are table events for t1, table events for t2, a drop event for t1, and a rename event from t2 to t1. http://gerrit.cloudera.org:8080/#/c/20533/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@426 PS7, Line 426: dbMap = pendingTableEventsMap.get(dbName); Shouldn't we just assign a new HashMap<>() directly to the dbMap variable? 
-- To view, visit http://gerrit.cloudera.org:8080/20533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905 Gerrit-Change-Number: 20533 Gerrit-PatchSet: 7 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Dec 2023 01:15:49 + Gerrit-HasComments: Yes
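The batching rules discussed in this review can be illustrated with a small Python model: events for the same table are grouped even when they are not consecutive, create/drop events break the batch for their table (the reviewer's suggestion), and a rename flushes pending events for both the source and destination tables, including across databases. Event, batch_events and the event-kind strings are hypothetical; this is a sketch, not the logic in MetastoreEvents.java.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TableKey = Tuple[str, str]  # (db, table)
BATCH_BREAKING = {"CREATE", "DROP"}  # hypothetical kinds that end a table's batch


@dataclass
class Event:
    event_id: int
    kind: str                             # e.g. "ALTER", "INSERT", "DROP", "RENAME"
    table: TableKey
    new_table: Optional[TableKey] = None  # rename destination


def batch_events(events: List[Event]) -> List[List[Event]]:
    pending: Dict[TableKey, List[Event]] = {}
    batches: List[List[Event]] = []

    def flush(key: TableKey) -> None:
        batch = pending.pop(key, None)
        if batch:
            batches.append(batch)

    def flush_in_order(keys) -> None:
        # Flush pending batches ordered by the id of their first event.
        present = [k for k in keys if k in pending]
        for key in sorted(present, key=lambda k: pending[k][0].event_id):
            flush(key)

    for ev in events:
        if ev.kind == "RENAME":
            # A rename touches two tables, possibly in different databases:
            # flush both, then emit the rename on its own.
            flush_in_order([ev.table, ev.new_table])
            batches.append([ev])
        elif ev.kind in BATCH_BREAKING:
            flush(ev.table)
            batches.append([ev])
        else:
            # Non-consecutive events for the same table join one batch.
            pending.setdefault(ev.table, []).append(ev)
    flush_in_order(list(pending))
    return batches


# The reviewer's scenario: t1 events, t2 events, drop t1, rename t2 -> t1.
events = [
    Event(1, "ALTER", ("db", "t1")),
    Event(2, "ALTER", ("db", "t2")),
    Event(3, "ALTER", ("db", "t1")),
    Event(4, "DROP", ("db", "t1")),
    Event(5, "RENAME", ("db", "t2"), ("db", "t1")),
]
print([[e.event_id for e in b] for b in batch_events(events)])
# [[1, 3], [4], [2], [5]]
```

Every event for a given table still runs in its original order; only batches for different, unrelated tables may be interleaved differently, which is the premise of batching non-consecutive events.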
[Impala-ASF-CR](asf-site) IMPALA-12619: Update Impala website to reduce emphasis on Hadoop
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20782 to look at the new patch set (#3). Change subject: IMPALA-12619: Update Impala website to reduce emphasis on Hadoop .. IMPALA-12619: Update Impala website to reduce emphasis on Hadoop The Impala website at ASF (https://impala.apache.org/) is the first hit returned for “Apache Impala”. Update the first line of the description to say "Apache Impala is a modern, open source, distributed SQL query engine for open data and table formats." instead of "Apache Impala is a modern, open source, distributed SQL query engine for Apache Hadoop." Also mention Ranger instead of Sentry, and add references to Iceberg. Change-Id: I2d63bbbc87375345eaf58989a59f704dbb9559fd --- M index.html 1 file changed, 5 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/20782/3 -- To view, visit http://gerrit.cloudera.org:8080/20782 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2d63bbbc87375345eaf58989a59f704dbb9559fd Gerrit-Change-Number: 20782 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12618: Update README.md to reduce emphasis on Hadoop
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20783 to look at the new patch set (#2). Change subject: IMPALA-12618: Update README.md to reduce emphasis on Hadoop .. IMPALA-12618: Update README.md to reduce emphasis on Hadoop The README.md file is displayed on the github home page https://github.com/apache/impala Change this so that the opening line mentions “data stored in open data and table formats” instead of “data stored in Apache Hadoop clusters”. Also add Iceberg as the first mentioned place where data can be stored. Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce --- M README.md 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/20783/2 -- To view, visit http://gerrit.cloudera.org:8080/20783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I35e91611374f20ec6a540d4c3bf5ae375ac38bce Gerrit-Change-Number: 20783 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20779 ) Change subject: IMPALA-11157: Switch to hadoop-client build .. Patch Set 3: Passed an ARM test run as well: https://jenkins.impala.io/job/ubuntu-20.04-from-scratch-ARM/66/ -- To view, visit http://gerrit.cloudera.org:8080/20779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f Gerrit-Change-Number: 20779 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 14 Dec 2023 00:34:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 9: > Patch Set 9: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10023/ This seems to be a flaky test tracked by IMPALA-12416: custom_cluster/test_events_custom_configs.py:375: in test_skipping_older_events verify_skipping_older_events(test_old_table, False, False) custom_cluster/test_events_custom_configs.py:355: in verify_skipping_older_events query.format(unique_database, table_name), table_name) custom_cluster/test_events_custom_configs.py:342: in verify_skipping_hive_stmt_events assert tbl_events_skipped_after > tbl_events_skipped_before E assert 1 > 1 -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 00:21:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10031/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 10 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 00:21:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 10 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 14 Dec 2023 00:21:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20773 ) Change subject: IMPALA-12229: Support soft-delete Kudu table .. IMPALA-12229: Support soft-delete Kudu table Adds the 'kudu_table_reserve_seconds' query option to set the reservation time for deleted Impala-managed Kudu tables. The default value is 0. This option can prevent users from deleting important Kudu tables by mistake. Testing: - Added e2e tests. Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Reviewed-on: http://gerrit.cloudera.org:8080/20773 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/CatalogService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M infra/python/deps/kudu-requirements.txt M tests/query_test/test_kudu.py 10 files changed, 112 insertions(+), 17 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 6 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang
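To make the idea of a reservation window concrete, here is a toy Python model of soft deletion. It is a sketch under stated assumptions, not Kudu's or Impala's implementation: it assumes a dropped table with a positive reserve period stays recoverable until the period elapses, and that the default of 0 deletes the table immediately. The class and its methods (`SoftDeleteCatalog`, `drop`, `recall`, `purge_expired`) are hypothetical names chosen for illustration.

```python
import time


class SoftDeleteCatalog:
    """Hypothetical toy model of soft-deleting tables with a reservation window.

    Not Kudu's or Impala's implementation: it only illustrates the assumed
    semantics of a 'kudu_table_reserve_seconds'-style setting.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.live = set()
        self.trash = {}  # table name -> time after which it is no longer recoverable

    def drop(self, name, reserve_seconds=0):
        self.live.remove(name)
        if reserve_seconds > 0:
            # Assumed: keep the table recoverable until the reservation expires.
            self.trash[name] = self._clock() + reserve_seconds
        # Assumed: reserve_seconds == 0 (the default) deletes the table immediately.

    def recall(self, name):
        deadline = self.trash.get(name)
        if deadline is None or self._clock() >= deadline:
            raise KeyError(f"{name} is not recoverable")
        del self.trash[name]
        self.live.add(name)

    def purge_expired(self):
        now = self._clock()
        for name in [n for n, deadline in self.trash.items() if deadline <= now]:
            del self.trash[name]
```

The clock is injectable so the reservation window can be exercised deterministically without sleeping.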
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20773 ) Change subject: IMPALA-12229: Support soft-delete Kudu table .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Thu, 14 Dec 2023 00:12:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20742 ) Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on masked tables .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10030/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678 Gerrit-Change-Number: 20742 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 23:32:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20742 ) Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on masked tables .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678 Gerrit-Change-Number: 20742 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 23:32:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/20742 ) Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on masked tables .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/20742/2/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/20742/2/tests/authorization/test_ranger.py@1615 PS2, Line 1615: ida > Yeah, just wanted to use a short grant statement. I can change it to the mi Ack -- To view, visit http://gerrit.cloudera.org:8080/20742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678 Gerrit-Change-Number: 20742 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 23:28:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 9: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10027/ -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 13 Dec 2023 23:22:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20612 ) Change subject: IMPALA-3825: Delegate runtime filter aggregation to some executors .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14712/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0 Gerrit-Change-Number: 20612 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 13 Dec 2023 22:49:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20612 ) Change subject: IMPALA-3825: Delegate runtime filter aggregation to some executors .. Patch Set 15: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0 Gerrit-Change-Number: 20612 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 13 Dec 2023 22:28:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20612 ) Change subject: IMPALA-3825: Delegate runtime filter aggregation to some executors .. Patch Set 15: (5 comments) Thank you, Michael. http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/query-state.cc@271 PS14, Line 271: // Making a copy of the "filepath to hosts" mapping into std library types. > This comment doesn't really explain why this is necessary. I'm not sure either. This was added by IMPALA-12308 https://gerrit.cloudera.org/c/20548/ and is not part of this patch. http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@107 PS14, Line 107: /// selected as intermediate filter aggregator to help coordinator. Besides doing > nit: remove "of", so it says "Besides doing" Done http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@108 PS14, Line 108: /// local aggregation, each intermediate aggregator will also listen and aggregate > grammar: "each intermediate aggregator will" Done http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@109 PS14, Line 109: /// filter updates from at most MAX_NUM_FILTERS_AGGREGATED_PER_HOST-1 other executors.
> "filter updates from" Done http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@110 PS14, Line 110: /// Intermediate aggregator then sends the aggregated filter update to coordinator for > "then sends the" Done -- To view, visit http://gerrit.cloudera.org:8080/20612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0 Gerrit-Change-Number: 20612 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 13 Dec 2023 22:21:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors
Hello Kurt Deschler, Abhishek Rawat, Csaba Ringhofer, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20612 to look at the new patch set (#15). Change subject: IMPALA-3825: Delegate runtime filter aggregation to some executors .. IMPALA-3825: Delegate runtime filter aggregation to some executors IMPALA-4400 improved runtime filters by aggregating them locally before sending filter updates to the coordinator and by sharing a single RuntimeFilterBank among all fragment instances of a query. However, local filter aggregation is still insufficient when the number of nodes in an Impala cluster is large. For example, in a cluster of around 700 impalad backends, aggregating 1 MB bloom filter updates in the coordinator can take more than 1 second. This patch aims to reduce coordinator load and speed up runtime filter aggregation by doing intermediate aggregation in a few designated impala backends before the final aggregation and publishing in the coordinator. Query option MAX_NUM_FILTERS_AGGREGATED_PER_HOST is added to control this feature. Given N, the number of backend executors excluding the coordinator, the number of selected intermediate aggregators is M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST). Setting MAX_NUM_FILTERS_AGGREGATED_PER_HOST <= 1 disables the intermediate aggregator feature. In the backend scheduler, M impalads are selected randomly as the intermediate aggregators for a given runtime filter. Information about these M selected impalads is then passed from the scheduler to the coordinator as a RuntimeFilterAggregatorInfoPB. The coordinator then converts the RuntimeFilterAggregatorInfoPB into filter routing information, TRuntimeFilterAggDesc, which is piggy-backed in TRuntimeFilterSource. A new RPC endpoint named UpdateFilterFromRemote is added in data_stream_service.proto to handle filter updates sent from fellow impalad executors to the designated aggregator impalad.
This RPC merges filter updates into 'pending_remote_filter'. The intermediate aggregator then combines 'pending_remote_filter' with 'pending_merge_filter' (from local aggregation) into 'result_filter', which is then sent to the coordinator. The RuntimeFilterBank of the intermediate aggregator waits for all remote filter updates for at least RUNTIME_FILTER_WAIT_TIME_MS. If the RuntimeFilterBank is closing and RUNTIME_FILTER_WAIT_TIME_MS has passed, any incomplete filter is marked as ALWAYS_TRUE and sent to the coordinator. This patch currently targets only the bloom filters produced by partitioned join builds. Other kinds of runtime filters are still efficient to aggregate in the coordinator alone, while a bloom filter from a broadcast join requires only 1 valid filter update for publishing. test_runtime_filters.py is modified to clarify the exec_options dimension and the test matrix constraints, and to reduce pytest.skip() calls in each test. runtime_filters.test is also changed to use counter aggregation and to assert on the ExecSummary table so that the tests stay valid irrespective of the number of fragment instances. We benchmarked the aggregation time of a 1 MB runtime filter on a 20-executor-node cluster with MT_DOP=36, instrumented to disable local aggregation and thereby simulate 720 runtime filter updates. The aggregation time is approximated as the duration between the earliest time a filter update is made and the time the coordinator publishes the complete filter. The results are as follows:
+----------------------+-----------------------+
| Num aggregator nodes | Aggregation time (ms) |
+----------------------+-----------------------+
|                    0 |                  1296 |
|                    1 |                  1229 |
|                    2 |                   608 |
|                    4 |                   329 |
|                    8 |                   205 |
+----------------------+-----------------------+
Testing:
- Exercise MAX_NUM_FILTERS_AGGREGATED_PER_HOST in test_runtime_filters.py and query-options-test.cc
- Add custom_cluster/test_runtime_filter_aggregation.py.
- Pass exhaustive end-to-end and custom-cluster tests.
Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0 --- M be/src/common/logging.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/data-stream-service.cc M be/src/service/data-stream-service.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h M be/src/util/network-util.h M be/src/util/runtime-profile-counters.h M
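The aggregator count and the two-level merge described in IMPALA-3825 can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Impala's scheduler or RuntimeFilterBank code. It assumes hosts are shuffled and cut into groups of at most MAX_NUM_FILTERS_AGGREGATED_PER_HOST, with the first host of each group acting as the aggregator (the exact grouping strategy is an assumption). It models equal-width bloom filters as Python ints, so combining local and remote updates is a bitwise OR, and it reduces the ALWAYS_TRUE fallback to a simple count check. The names `num_aggregators`, `assign_aggregators`, `merge_bloom_updates`, and `ALWAYS_TRUE` are hypothetical.

```python
import math
import random

# Hypothetical sentinel: a filter that lets every row through (always safe to publish).
ALWAYS_TRUE = "ALWAYS_TRUE"


def num_aggregators(num_executors, max_per_host):
    """M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST); a limit <= 1 disables the feature."""
    if max_per_host <= 1:
        return 0
    return math.ceil(num_executors / max_per_host)


def assign_aggregators(executors, max_per_host, rng=random):
    """Map each randomly chosen aggregator to the executors that send it updates.

    Assumed grouping: shuffle the executors and cut them into groups of at most
    max_per_host; the first host of each group aggregates for the other
    (at most max_per_host - 1) hosts. This yields exactly num_aggregators() groups.
    """
    if num_aggregators(len(executors), max_per_host) == 0:
        return {}
    hosts = list(executors)
    rng.shuffle(hosts)
    groups = {}
    for start in range(0, len(hosts), max_per_host):
        group = hosts[start:start + max_per_host]
        groups[group[0]] = group[1:]
    return groups


def merge_bloom_updates(updates, expected):
    """Combine equal-width bloom filter bitsets (modeled as ints) with bitwise OR.

    If fewer than `expected` updates arrived before the wait time ran out, fall
    back to ALWAYS_TRUE, mirroring the fallback described in the commit message.
    """
    if len(updates) < expected:
        return ALWAYS_TRUE
    merged = 0
    for bits in updates:
        merged |= bits
    return merged
```

For example, with N = 20 executors and MAX_NUM_FILTERS_AGGREGATED_PER_HOST = 8, `num_aggregators` returns ceil(20 / 8) = 3, and each aggregator receives updates from at most 7 other executors. Publishing ALWAYS_TRUE for an incomplete filter is safe because such a filter never rejects a row; it only gives up the pruning benefit.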
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 22: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 22:19:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20612 ) Change subject: IMPALA-3825: Delegate runtime filter aggregation to some executors .. Patch Set 14: (5 comments) http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/query-state.cc@271 PS14, Line 271: // Making a copy of the "filepath to hosts" mapping into std library types. This comment doesn't really explain why this is necessary. http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@107 PS14, Line 107: /// selected as intermediate filter aggregator to help coordinator. Besides of doing nit: remove "of", so it says "Besides doing" http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@108 PS14, Line 108: /// local aggregation, each intermediate aggregators will also listen and aggregate grammar: "each intermediate aggregator will" http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@109 PS14, Line 109: /// filter update from at most MAX_NUM_FILTERS_AGGREGATED_PER_HOST-1 other executors. 
"filter updates from" http://gerrit.cloudera.org:8080/#/c/20612/14/be/src/runtime/runtime-filter-bank.h@110 PS14, Line 110: /// Intermediate aggregator then send the aggregated filter update to coordinator for "then sends the" -- To view, visit http://gerrit.cloudera.org:8080/20612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0 Gerrit-Change-Number: 20612 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 13 Dec 2023 21:49:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10029/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 21:44:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14711/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 21:42:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14710/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 21:40:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12375: Make DataSource Object persistent
gsi...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/20768 ) Change subject: IMPALA-12375: Make DataSource Object persistent .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I500a99142bb62ce873e693d573064ad4ffa153ab Gerrit-Change-Number: 20768 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 21:18:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20753 to look at the new patch set (#8). Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. IMPALA-12597: Basic Equality delete read support for Iceberg tables In general, applying equality deletes is similar to how position deletes are applied to data files: using a LEFT ANTI JOIN where the SCAN for the data rows is on the left side and the SCAN for the delete rows is on the right side of the JOIN. The difference lies in the virtual columns and the conjuncts being used. For equality deletes, the data sequence number of a delete file has to be greater than the data sequence number of the data file being investigated. This information is added as a virtual column to the scans, and a conjunct is created in the JOIN node to check the relation. The equality delete fields from the delete files are checked against the respective columns of the data SCANs. This patch makes it possible for Impala to read Iceberg tables with basic equality delete files. The Iceberg spec gives engines great flexibility in writing equality deletes; however, in practice Flink, one of the engines that write EQ-deletes, supports only a subset of the use cases. This patch focuses on reading the EQ-deletes written by Flink. The restrictions are the following: - All equality delete files in a table should have the same equality field ID list. - For partitioned Iceberg tables, the partition values are expected to also be written into the equality delete files. - Tables with equality deletes shouldn't have partition or schema evolution. - Floating point equality columns aren't supported. - If a malformed equality delete file doesn't have some of the equality field IDs, then the Parquet reader fills those missing fields with NULLs.
As a side effect, this drops rows from the result where the corresponding data columns have a null value. See the IMPALA-11388 epic Jira for more details. Testing: - Checked that the existing functional_parquet.iceberg_v2_delete_equality table can be read successfully. - Added a new test table so that E2E tests can validate correctness. Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad --- M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.h M common/thrift/CatalogObjects.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq A
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-8985205515767142888-1-0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25.avro
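The LEFT ANTI JOIN rule for equality deletes in IMPALA-12597 can be illustrated with a small in-memory Python sketch. It is not Impala's planner or hash-join code. The `DataRow`/`EqDelete` types and the `apply_equality_deletes` function are hypothetical. Rows carry a tuple of equality-column values plus the data sequence number of their file. The JOIN conjunct is modeled as "same key and a delete sequence number strictly greater than the data file's".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRow:
    key: tuple     # values of the equality columns in the data row
    payload: str   # remaining columns, collapsed into one field for brevity
    data_seq: int  # data sequence number of the data file holding the row


@dataclass(frozen=True)
class EqDelete:
    key: tuple       # equality field values stored in the delete file
    delete_seq: int  # data sequence number of the equality delete file


def apply_equality_deletes(rows, deletes):
    """Hypothetical LEFT ANTI JOIN of data rows (left) against equality deletes (right).

    A row is dropped only if some delete has the same key AND a strictly greater
    data sequence number than the row's data file.
    """
    # Build side: for each key keep only the newest delete sequence number, since
    # "some delete is newer than the row" is equivalent to "max delete seq > data seq".
    newest_delete = {}
    for d in deletes:
        newest_delete[d.key] = max(newest_delete.get(d.key, d.delete_seq), d.delete_seq)
    # Probe side: emit rows that have no matching, newer delete.
    return [r for r in rows
            if not (r.key in newest_delete and newest_delete[r.key] > r.data_seq)]
```

With rows keyed (1,) at sequence 1, (2,) at sequence 1, and (1,) at sequence 3, and deletes on (1,) at sequence 2 and (2,) at sequence 1, only the first row is removed. The delete on (2,) shares the data file's sequence number, so it does not apply, and the row at sequence 3 was written after the delete.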
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20753 to look at the new patch set (#7).

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables
..

IMPALA-12597: Basic Equality delete read support for Iceberg tables

In general, applying equality deletes is similar to how position deletes are applied to data files: using a LEFT ANTI JOIN where the SCAN for the data rows is on the left side and the SCAN for the delete rows is on the right side of the JOIN. The difference lies in the virtual columns and the conjuncts being used.

For equality deletes, the data sequence number of a delete file has to be greater than the data sequence number of the data file being investigated. This information is added as a virtual column to the scans, and a conjunct is created in the JOIN node to check the relation. The equality delete fields from the delete files are checked against the respective columns of the data SCANs.

This patch makes it possible for Impala to read Iceberg tables with basic equality delete files. The Iceberg spec gives engines great flexibility in how they write equality deletes; in practice, however, Flink, one of the engines that write EQ-deletes, supports only a subset of the use cases. This patch focuses on reading the EQ-deletes written by Flink. The restrictions are the following:
 - All equality delete files in a table should have the same equality field ID list.
 - For partitioned Iceberg tables, the partition values are expected to also be written into the equality delete files.
 - Tables with equality deletes shouldn't have partition or schema evolution.
 - Floating point equality columns aren't supported.
 - If a malformed equality delete file doesn't have some of the equality field IDs, then the Parquet reader will fill those missing fields with NULLs. As a side effect, this will drop the rows from the result where the corresponding data columns have a null value.

See IMPALA-11388 epic Jira for more details.

Testing:
 - Checked that the existing functional_parquet.iceberg_v2_delete_equality table can be read successfully.
 - Added a new test table so that E2E tests can validate correctness.

Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad
---
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.h
M common/thrift/CatalogObjects.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-1.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-2.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-8985205515767142888-1-0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25.avro
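To make the join semantics in this commit message concrete, here is a minimal Python sketch (not Impala's planner or backend code) of a LEFT ANTI JOIN with the extra sequence-number conjunct. The dict-based rows and the `seq` key, which stands in for the data sequence number virtual column, are hypothetical. The sketch also assumes the equality fields are compared null-safely (like IS NOT DISTINCT FROM); that is one reading consistent with the NULL side effect listed in the restrictions, not something the message states outright.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def apply_equality_deletes(data_rows: List[Row], delete_rows: List[Row],
                           eq_fields: Sequence[str]) -> List[Row]:
    """Keep data rows that no applicable delete row matches (LEFT ANTI JOIN).

    Assumption: every row has a hypothetical 'seq' key holding its data
    sequence number. A delete row applies to a data row only when its
    sequence number is strictly greater than the data row's.
    """
    # Build side: group delete sequence numbers by equality-field values.
    # Python tuple equality treats None == None as True, which mimics a
    # null-safe comparison (IS NOT DISTINCT FROM).
    deletes: Dict[Tuple, List[int]] = {}
    for d in delete_rows:
        key = tuple(d[f] for f in eq_fields)
        deletes.setdefault(key, []).append(d["seq"])

    result = []
    for row in data_rows:
        key = tuple(row[f] for f in eq_fields)
        # Extra join conjunct: delete seq > data seq.
        deleted = any(s > row["seq"] for s in deletes.get(key, []))
        if not deleted:
            result.append(row)
    return result


if __name__ == "__main__":
    data = [{"id": 1, "seq": 1}, {"id": 2, "seq": 1},
            {"id": 2, "seq": 3},  # re-inserted after the delete was written
            {"id": None, "seq": 1}]
    deletes = [{"id": 2, "seq": 2}, {"id": None, "seq": 2}]
    # Prints [{'id': 1, 'seq': 1}, {'id': 2, 'seq': 3}]
    print(apply_equality_deletes(data, deletes, ["id"]))
```

The row with `id = 2` and `seq = 3` survives because the delete (sequence number 2) is older than it, while the `id = None` row is dropped by the NULL-filled delete row, mirroring the side effect described for malformed delete files.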
[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20612 ) Change subject: IMPALA-3825: Delegate runtime filter aggregation to some executors .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14709/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0 Gerrit-Change-Number: 20612 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 13 Dec 2023 20:55:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20612 ) Change subject: IMPALA-3825: Delegate runtime filter aggregation to some executors .. Patch Set 14: (2 comments)

http://gerrit.cloudera.org:8080/#/c/20612/10/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/20612/10/be/src/runtime/runtime-filter-bank.cc@139
PS10, Line 139: esc.filter_id);
> If the reservation is claimed, then it is considered a fatal error if alloc
I think what you mean is that it is OK to allocate later as long as the whole total_bloom_filter_mem_required_ is already claimed. Is that correct? PS14 moves the initialization to UpdateFilterFromRemote().

http://gerrit.cloudera.org:8080/#/c/20612/10/be/src/runtime/runtime-filter-bank.cc@722
PS10, Line 722: HECK_EQ(0, produced_filter
> Few things about this:
If it is any reassurance, note that SendIncompleteFilters is only called when RuntimeFilterBank is closing. RuntimeFilterBank's lifetime equals the query's lifetime on that executor node, and it closes only when the query is completed or canceled. In both cases, the plan root sink is basically done and the runtime filter value no longer matters, so the coordinator can simply drop the runtime filter update by then. CombinePeerAndLocalUpdates() is done here for correctness: it cleans up 'pending_merge_filter' and 'pending_remote_filter' of 'produced_filter'. This feature should be exercised in TestRuntimeFilters, TestBloomFilters, TestBloomFiltersOnParquet, and TestRuntimeRowFilters, and test_wait_time_cancellation is within TestRuntimeFilters.
-- To view, visit http://gerrit.cloudera.org:8080/20612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0 Gerrit-Change-Number: 20612 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 13 Dec 2023 20:31:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3825: Delegate runtime filter aggregation to some executors
Hello Kurt Deschler, Abhishek Rawat, Csaba Ringhofer, Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20612 to look at the new patch set (#14).

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some executors
..

IMPALA-3825: Delegate runtime filter aggregation to some executors

IMPALA-4400 improved runtime filters by aggregating them locally before sending a filter update to the coordinator, and by sharing a single RuntimeFilterBank among all fragment instances of a query. However, local filter aggregation is still insufficient if the number of nodes in an Impala cluster is large. For example, in a cluster of around 700 impalad backends, aggregating 1 MB bloom filter updates in the coordinator can take more than 1 second.

This patch aims to reduce coordinator load and speed up runtime filter aggregation by doing intermediate aggregation in a few designated impala backends before doing the final aggregation and publishing in the coordinator. Query option MAX_NUM_FILTERS_AGGREGATED_PER_HOST is added to control this feature. Given N, the number of backend executors excluding the coordinator, the number of selected intermediate aggregators is M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST). Setting MAX_NUM_FILTERS_AGGREGATED_PER_HOST <= 1 disables the intermediate aggregator feature.

In the backend scheduler, M impalads are selected randomly as intermediate aggregators for a given runtime filter. Information about these M selected impalads is then passed from the scheduler to the coordinator as a RuntimeFilterAggregatorInfoPB. The coordinator then converts the RuntimeFilterAggregatorInfoPB into filter routing information, TRuntimeFilterAggDesc, which is piggybacked on TRuntimeFilterSource.

A new RPC endpoint named UpdateFilterFromRemote is added in data_stream_service.proto to handle filter updates sent from fellow impalad executors to the designated aggregator impalad. This RPC merges filter updates into 'pending_remote_filter'. The intermediate aggregator then combines 'pending_remote_filter' with 'pending_merge_filter' (from local aggregation) into 'result_filter', which is then sent to the coordinator.

The RuntimeFilterBank of the intermediate aggregator waits for all remote filter updates for at least RUNTIME_FILTER_WAIT_TIME_MS. If RuntimeFilterBank is closing and RUNTIME_FILTER_WAIT_TIME_MS has passed, any incomplete filter is marked as ALWAYS_TRUE and sent to the coordinator.

This patch currently targets only the bloom filters produced by partitioned join builds. Other kinds of runtime filters are still efficient to aggregate in the coordinator alone, while bloom filters from broadcast joins require only one valid filter update for publishing.

test_runtime_filters.py is modified to clarify the exec_options dimension and the test matrix constraints, and to reduce pytest.skip() calls in each test. runtime_filters.test is also changed to use counter aggregation and to assert on the ExecSummary table so that the tests stay valid irrespective of the number of fragment instances.

We benchmarked 1 MB runtime filter aggregation on a cluster of 20 executor nodes with MT_DOP=36, instrumented to disable local aggregation, thus simulating 720 runtime filter updates. Aggregation time is approximated as the duration between the earliest time a filter update is made and the time the coordinator publishes the complete filter. The results are as follows:

+----------------------+-----------------------+
| num aggregator nodes | Aggregation time (ms) |
+----------------------+-----------------------+
|                    0 |                  1296 |
|                    1 |                  1229 |
|                    2 |                   608 |
|                    4 |                   329 |
|                    8 |                   205 |
+----------------------+-----------------------+

Testing:
 - Exercise MAX_NUM_FILTERS_AGGREGATED_PER_HOST in test_runtime_filters.py and query-options-test.cc.
 - Add custom_cluster/test_runtime_filter_aggregation.py.
 - Pass exhaustive end-to-end and custom-cluster tests.

Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
---
M be/src/common/logging.h
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/network-util.h
M be/src/util/runtime-profile-counters.h
M
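The aggregator count and merge steps described in this commit message can be sketched in Python as below. This is an illustration under stated assumptions, not the scheduler or RuntimeFilterBank code: the function names are hypothetical, Bloom filters are modelled as integer bitmasks of equal size (merging two such filters is a bitwise OR), `None` stands in for ALWAYS_TRUE, and the RUNTIME_FILTER_WAIT_TIME_MS logic is reduced to a simple "not all expected updates arrived" check.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical sentinel meaning "this filter lets every row through".
ALWAYS_TRUE = None


def num_intermediate_aggregators(num_executors: int, max_per_host: int) -> int:
    """M = ceil(N / MAX_NUM_FILTERS_AGGREGATED_PER_HOST); 0 means disabled."""
    if max_per_host <= 1:
        return 0
    # Integer ceiling division avoids floating-point rounding.
    return (num_executors + max_per_host - 1) // max_per_host


def pick_aggregators(executors: List[str], max_per_host: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Randomly choose M distinct executors as intermediate aggregators."""
    m = num_intermediate_aggregators(len(executors), max_per_host)
    rng = rng or random.Random()
    return rng.sample(executors, m)


def combine_filter_updates(local_merged: int, remote_updates: Sequence[int],
                           expected_remote: int) -> Optional[int]:
    """OR the locally merged filter with remote updates (same-size bitmasks).

    Simplification: if fewer remote updates than expected have arrived,
    return ALWAYS_TRUE, as the patch does for incomplete filters on close.
    """
    if len(remote_updates) < expected_remote:
        return ALWAYS_TRUE
    result = local_merged
    for bits in remote_updates:
        result |= bits
    return result
```

With N = 20 executors and MAX_NUM_FILTERS_AGGREGATED_PER_HOST = 8, the sketch selects ceil(20 / 8) = 3 aggregators; with the option set to 1 or less it selects none, matching the disable rule above.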
[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 ) Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 20:22:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12581: Fix issue of ILIKE and IREGEXP don't work correctly with non-const pattern
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20785 ) Change subject: IMPALA-12581: Fix issue of ILIKE and IREGEXP don't work correctly with non-const pattern .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/20785/2/tests/query_test/test_exprs.py File tests/query_test/test_exprs.py: http://gerrit.cloudera.org:8080/#/c/20785/2/tests/query_test/test_exprs.py@316 PS2, Line 316: "SELECT count(*) FROM {0} WHERE 'ABC' ILIKE pattern_str".format(tbl_name)) Is there a transposed version we should test where the literal is on the right-hand side? -- To view, visit http://gerrit.cloudera.org:8080/20785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d66680f5a7660e6a41859754c4230f276e66712 Gerrit-Change-Number: 20785 Gerrit-PatchSet: 2 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 13 Dec 2023 20:19:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 19: There was one test failure in TestOrcStats.test_orc_stats -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 19 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 19:57:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20773 ) Change subject: IMPALA-12229: Support soft-delete Kudu table .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10028/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 13 Dec 2023 19:45:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20773 ) Change subject: IMPALA-12229: Support soft-delete Kudu table .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 5 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 13 Dec 2023 19:45:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 19: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10024/ -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 19 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 19:25:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 22: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/20506/22/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/20506/22/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@710 PS22, Line 710: AnalyzesOk("alter table functional.alltypes change column int_col `汉字` int"); nit: move up to line 699. Similar feedback applies to the other AnalysisError checks that were changed to AnalyzesOk. -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 19:14:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12581: Fix issue of ILIKE and IREGEXP don't work correctly with non-const pattern
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20785 ) Change subject: IMPALA-12581: Fix issue of ILIKE and IREGEXP don't work correctly with non-const pattern .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/20785/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20785/2//COMMIT_MSG@7 PS2, Line 7: don't work nit: not working http://gerrit.cloudera.org:8080/#/c/20785/2//COMMIT_MSG@16 PS2, Line 16: fix nit: fixing http://gerrit.cloudera.org:8080/#/c/20785/2/be/src/exprs/like-predicate.cc File be/src/exprs/like-predicate.cc: http://gerrit.cloudera.org:8080/#/c/20785/2/be/src/exprs/like-predicate.cc@186 PS2, Line 186: state Should we set the default value for state->case_sensitive_ in this case? -- To view, visit http://gerrit.cloudera.org:8080/20785 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d66680f5a7660e6a41859754c4230f276e66712 Gerrit-Change-Number: 20785 Gerrit-PatchSet: 2 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 13 Dec 2023 19:03:29 + Gerrit-HasComments: Yes
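Both review threads on IMPALA-12581 concern the path where the pattern is not a constant and therefore has to be handled row by row. The Python sketch below models only the intended SQL semantics of LIKE versus ILIKE with per-row patterns; it is not the code in be/src/exprs/like-predicate.cc, the helper names are hypothetical, and escape sequences in patterns are deliberately not handled.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern ('%' = any run, '_' = one char) to a regex.

    Simplification: escape sequences such as a backslash before '%' are not
    supported in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like(value: str, pattern: str, case_sensitive: bool = True) -> bool:
    # The case-sensitivity flag is applied on every call, including when the
    # pattern changes from row to row (the non-constant pattern case).
    flags = re.DOTALL if case_sensitive else re.DOTALL | re.IGNORECASE
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


def ilike(value: str, pattern: str) -> bool:
    """Case-insensitive LIKE."""
    return like(value, pattern, case_sensitive=False)
```

Evaluating `ilike("ABC", p)` over a column of per-row patterns corresponds to the `'ABC' ILIKE pattern_str` test query; the transposed case Michael asks about would keep a literal as the pattern and vary the value instead, e.g. `ilike(v, "abc")` for each row value `v`.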
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10027/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 13 Dec 2023 18:54:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12229: Support soft-delete Kudu table
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20773 ) Change subject: IMPALA-12229: Support soft-delete Kudu table .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3020567bb6cfe4dd48ef17906f8de674f37217e7 Gerrit-Change-Number: 20773 Gerrit-PatchSet: 4 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Wed, 13 Dec 2023 18:38:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10023/ -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 13 Dec 2023 18:18:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 18:07:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 19: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 19 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 17:29:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11157: Switch to hadoop-client build
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/20779 ) Change subject: IMPALA-11157: Switch to hadoop-client build .. Patch Set 3: Initial code review checks now include an ARM build. -- To view, visit http://gerrit.cloudera.org:8080/20779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f Gerrit-Change-Number: 20779 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Wed, 13 Dec 2023 16:19:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14708/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 16:11:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 19: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 16:11:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14707/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 15:55:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 ) Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14706/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 15:36:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/20506/22/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/20506/22/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@602 PS22, Line 602: AnalyzesOk("alter table functional.alltypes replace columns (`?최종हिंदी` int)"); line too long (97 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 15:26:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 22: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10026/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 15:26:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
pranav.lo...@cloudera.com has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. IMPALA-12465: Unicode column name support Impala relies on Hive functions for column name validation and used the validateName() function for this. Since Hive already supports Unicode column names, this patch simply switches the column name validation function to validateColumnName(). validateName() checks that a name matches a fixed pattern, while validateColumnName() places no restrictions on column names at the metadata level. Testing: Unicode column name support is tested and cross-checked with Hive. The tests can be found in unicode-column-name.test. Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 --- M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test A tests/metadata/test_column_unicode.py M tests/shell/test_shell_interactive.py 6 files changed, 379 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/22 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang
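To illustrate the difference between the two validators described in this commit message, here is a minimal Python sketch. It is not Hive's code. The ASCII word-character pattern is an assumption chosen only to show how a pattern-based check rejects Unicode names, and both function names are hypothetical.

```python
import re


def pattern_validate_name(name: str) -> bool:
    """Hypothetical pattern-based check: ASCII letters, digits and '_' only."""
    return re.fullmatch(r"\w+", name, flags=re.ASCII) is not None


def unrestricted_validate_column_name(name: str) -> bool:
    """Hypothetical check that, like validateColumnName() as described above,
    places no restrictions on the column name."""
    return True


for col in ["int_col", "최종हिंदी"]:
    print(col, pattern_validate_name(col), unrestricted_validate_column_name(col))
```

With the pattern-based check, the Korean/Hindi name is rejected and the ASCII name is accepted. The unrestricted check accepts both, which is what allows the Unicode names used in AnalyzeDDLTest.java to pass.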
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/20506/21/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/20506/21/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@602 PS21, Line 602: AnalyzesOk("alter table functional.alltypes replace columns (`?최종हिंदी` int)"); line too long (97 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 15:21:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
pranav.lo...@cloudera.com has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. IMPALA-12465: Unicode column name support Impala relies on Hive functions for column name validation and used the validateName() function for this. Since Hive already supports Unicode column names, this patch simply switches the column name validation function to validateColumnName(). validateName() checks that a name matches a fixed pattern, while validateColumnName() places no restrictions on column names at the metadata level. Testing: Unicode column name support is tested and cross-checked with Hive. The tests can be found in unicode-column-name.test. Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 --- M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test A tests/metadata/test_column_unicode.py M tests/shell/test_shell_interactive.py 6 files changed, 379 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/21 -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 ) Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. .. Patch Set 11: (5 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/20760/10/be/src/exec/table-sink-base.h File be/src/exec/table-sink-base.h: http://gerrit.cloudera.org:8080/#/c/20760/10/be/src/exec/table-sink-base.h@90 PS10, Line 90: must already have > Nit: "must already have filled". Done http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java File fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java: http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@115 PS6, Line 115: In case of a JOIN, and if duplicated rows ar > It is a bit nit-picky, I meant that in the sentence "If there are duplicate Updated the comment. http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@126 PS6, Line 126: se_.size() > 1) > I wanted to ask if it is possible that modifyStmt_.fromClause_.size() == 1. Even 'UPDATE tbl SET val = 3;' has a fromClause_ (maybe the null checking is redundant, but I think it should be fine), and have a single tableRef which is for the target table 'tbl'. Updated the error message. http://gerrit.cloudera.org:8080/#/c/20760/10/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test: http://gerrit.cloudera.org:8080/#/c/20760/10/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test@400 PS10, Line 400: 1 > Are these changes compared to PS7 because of a rebase? No, this is because of the new INSERT INTO in functional_schema_template.sql. 
http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test: http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test@252 PS7, Line 252: FROM clause > I asked because I'm unsure whether we should add "multiple tables" to the c Done -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 15:14:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 ) Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10025/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 Gerrit-Change-Number: 20760 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 15:14:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
Hello Tamas Mate, Daniel Becker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20760 to look at the new patch set (#11). Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. .. IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. Part 2 had some limitations. Most importantly, it could not update Iceberg tables if any of the following were true: * UPDATE sets the value of a partitioning column * UPDATE targets a table that went through partition evolution * Table has SORT BY properties The problem with partitions is that the delete record and the new data record might belong to different partitions. Because records are shuffled based on the partitions of the delete records, the data files might not be written efficiently. The problem with SORT BY properties is that we need to write the position delete files ordered by (file_path, position). To address these problems, this patch introduces a new backend operator: IcebergBufferedDeleteSink. This operator extracts and aggregates the delete record information from the incoming row batches. Then, in FlushFinal, it orders the position delete records and writes them out to files. This mechanism is similar to Hive's approach: https://github.com/apache/hive/pull/3251 IcebergBufferedDeleteSink cannot spill to disk, so it can only run if there is enough memory to store the delete records. Paths are stored only once, and the int64_t positions are stored in a vector. Updating 100 million records per node should therefore require around 800 MB for the positions (8 bytes each), plus memory for about 100K file paths, for a total of roughly 820 MB per node. Spilling could be added later, but currently there is little realistic need for it. Records can now be shuffled based on the new data records' partition values, and the SORT operator sorts the records based on the SORT BY properties.
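The buffering scheme and memory estimate in the paragraph above can be sketched in Python as follows. This is a simplified illustration, not the C++ IcebergBufferedDeleteSink. The class and function names are hypothetical. The duplicate check mirrors the SINK-side duplicate detection the commit message mentions, but its exact behaviour here is an assumption. The 200-byte average path length is also an assumed value, picked so that the result matches the ~820 MB estimate.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class BufferedPositionDeletes:
    """Hypothetical sketch of buffering position-delete records in memory.

    Each distinct data file path is stored once (as a dict key), and the
    deleted row positions for that file are appended to a list, mirroring the
    "paths stored once, int64 positions in a vector" layout described above.
    """

    def __init__(self) -> None:
        self._positions: Dict[str, List[int]] = defaultdict(list)

    def add(self, file_path: str, position: int) -> None:
        self._positions[file_path].append(position)

    def flush_final(self) -> List[Tuple[str, int]]:
        """Return all delete records ordered by (file_path, position).

        Raises ValueError if the same row is deleted twice, e.g. because a
        JOIN matched one target row more than once.
        """
        out: List[Tuple[str, int]] = []
        for path in sorted(self._positions):
            positions = sorted(self._positions[path])
            for prev, cur in zip(positions, positions[1:]):
                if prev == cur:
                    raise ValueError(f"duplicate delete of row {cur} in {path}")
            out.extend((path, pos) for pos in positions)
        return out


def estimate_bytes(num_positions: int, num_paths: int,
                   avg_path_bytes: int = 200) -> int:
    """Rough estimate: 8 bytes per int64 position plus the stored paths.

    avg_path_bytes is an assumed average path length, not a measured value.
    """
    return num_positions * 8 + num_paths * avg_path_bytes


sink = BufferedPositionDeletes()
for path, pos in [("data/b.parquet", 5), ("data/a.parquet", 7), ("data/a.parquet", 2)]:
    sink.add(path, pos)
print(sink.flush_final())
print(estimate_bytes(100_000_000, 100_000))  # 820000000 bytes, i.e. ~820 MB
```

Sorting only at flush time lets rows arrive in any shuffle order while still producing delete files ordered by (file_path, position).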
There is only one case in which we don't allow the UPDATE statement: * UPDATE sets a partition column AND * the right-hand side of the assignment is a non-constant expression AND * the UPDATE statement has a JOIN When all of the above conditions are met, an incorrect JOIN condition could produce multiple matches for the data records. The duplicated records would then be shuffled independently (based on the new partition value) to different backend SINKs, and no single SINK would be able to detect the duplicates. If any of the above conditions is false, the duplicated records are shuffled together to the same SINK, which can do the duplicate check. This patch also moves some code from IcebergDeleteSink to the newly introduced IcebergDeleteSinkBase. Testing: * planner tests * e2e tests * Impala/Hive interop tests Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315 --- M be/src/exec/CMakeLists.txt M be/src/exec/data-sink.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h A be/src/exec/iceberg-buffered-delete-sink.cc A be/src/exec/iceberg-buffered-delete-sink.h A be/src/exec/iceberg-delete-sink-base.cc A be/src/exec/iceberg-delete-sink-base.h A be/src/exec/iceberg-delete-sink-config.cc A be/src/exec/iceberg-delete-sink-config.h M be/src/exec/iceberg-delete-sink.cc M be/src/exec/iceberg-delete-sink.h M be/src/exec/table-sink-base.cc M be/src/exec/table-sink-base.h M be/src/exprs/slot-ref.h M be/src/runtime/dml-exec-state.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java M
fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-update-stress.test M tests/query_test/test_iceberg.py M tests/stress/test_update_stress.py 35 files changed, 1,960 insertions(+), 345 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/11 -- To view, visit http://gerrit.cloudera.org:8080/20760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF
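The rule for rejecting an UPDATE, and the reason behind it, can be sketched in Python as follows. This is a simplified illustration, not Impala's analyzer or planner code. The function names are hypothetical. Routing by `new_partition % num_sinks` is a stand-in for the real hash-based shuffle, which is assumed here only to show that different new partition values can land on different SINKs.

```python
from collections import defaultdict
from typing import List, Tuple

# (file_path, position, new_partition_value) for one delete/insert pair.
Record = Tuple[str, int, int]


def update_is_rejected(sets_partition_column: bool,
                       rhs_is_constant: bool,
                       has_join: bool) -> bool:
    """True only when all three conditions from the commit message hold."""
    return sets_partition_column and not rhs_is_constant and has_join


def sinks_detect_duplicates(records: List[Record], num_sinks: int) -> bool:
    """Route records to sinks by new partition value, then report whether any
    single sink sees the same (file_path, position) twice."""
    sinks = defaultdict(list)
    for file_path, position, new_partition in records:
        sinks[new_partition % num_sinks].append((file_path, position))
    return any(len(recs) != len(set(recs)) for recs in sinks.values())


# A faulty JOIN matched target row ("f1.parquet", 0) twice.
non_constant_rhs = [("f1.parquet", 0, 1), ("f1.parquet", 0, 2)]  # different new values
constant_rhs = [("f1.parquet", 0, 1), ("f1.parquet", 0, 1)]      # same new value
print(sinks_detect_duplicates(non_constant_rhs, num_sinks=2))  # False: split across sinks
print(sinks_detect_duplicates(constant_rhs, num_sinks=2))      # True: one sink sees both
```

When the right-hand side is constant, every copy of a duplicated row carries the same new partition value. All copies therefore reach the same SINK, which is why only the non-constant case combined with a JOIN needs to be rejected.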
[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20759 ) Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14705/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 5 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 15:11:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14704/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 18 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 15:09:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 19: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 19 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 14:57:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 19: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10024/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 19 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 14:57:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 18: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 18 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 14:56:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12431: Support reading compressed JSON file
Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/20482 ) Change subject: IMPALA-12431: Support reading compressed JSON file .. Patch Set 8: The previous reply was meant for another patch. I accidentally replied in the wrong place. Please ignore it. -- To view, visit http://gerrit.cloudera.org:8080/20482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2471855d97d4cdd51363b321055e6b06aa6d81e8 Gerrit-Change-Number: 20482 Gerrit-PatchSet: 8 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 14:47:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns
Tamas Mate has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/20759 ) Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns .. IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns As the slots have already been created on the frontend, this change focuses on populating them on the backend side. There are two major parts to this commit: obtaining the right Accessors for the slot, and recursively filling the tuples with data. The field ids are present in the struct slot's ColumnType field as a list of integers. This list can be indexed with the correct element of the SchemaPath to obtain the field id of a struct member, and with that, its Accessor. Once the Accessors are available, the IcebergRowReader's MaterializeTuple method can be called recursively to write the primitive slots of a struct slot. Testing: - Added E2E tests Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 --- M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h M be/src/exec/iceberg-metadata/iceberg-row-reader.cc M be/src/exec/iceberg-metadata/iceberg-row-reader.h M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test 8 files changed, 241 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/20759/5 -- To view, visit http://gerrit.cloudera.org:8080/20759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273 Gerrit-Change-Number: 20759 Gerrit-PatchSet: 5 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public
Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
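The recursive struct materialization described in the IMPALA-12205 commit message can be sketched in Python as follows. This is a simplified model under stated assumptions, not the C++ IcebergRowReader. `ColumnType`, `materialize`, the row-as-dict representation and the field ids used below are all hypothetical. The child's position in the struct plays the role of the SchemaPath element that indexes the field-id list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-ins; these are not Impala's or Iceberg's actual classes.
Row = Dict[int, Any]                 # row values keyed by Iceberg field id
Accessor = Callable[[Row], Any]      # pulls one field's value out of a row


@dataclass
class ColumnType:
    name: str
    children: Optional[List["ColumnType"]] = None  # None for primitive types
    field_ids: Optional[List[int]] = None          # one field id per child


def materialize(col_type: ColumnType, field_id: int,
                accessors: Dict[int, Accessor], row: Row) -> Any:
    """Recursively fill a (possibly nested) struct value from a row."""
    if col_type.children is None:
        # Primitive slot: look up its Accessor by field id and read the value.
        return accessors[field_id](row)
    result = {}
    for idx, child in enumerate(col_type.children):
        # The child's position indexes the struct's field-id list,
        # yielding the member's field id (and, through it, its Accessor).
        child_id = col_type.field_ids[idx]
        result[child.name] = materialize(child, child_id, accessors, row)
    return result


data_file = ColumnType("data_file",
                       children=[ColumnType("file_path"), ColumnType("record_count")],
                       field_ids=[100, 103])
accessors = {fid: (lambda row, fid=fid: row[fid]) for fid in (100, 103)}
print(materialize(data_file, 2, accessors, {100: "/warehouse/t/data/a.parquet", 103: 10}))
```

Only primitive members need an Accessor. A nested struct is handled by the same recursive call, which matches the commit's description of calling MaterializeTuple recursively until primitive slots are reached.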
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 18: Due to the lack of an 'only' constraint, the load of 'functional_kudu.timestamp_at_dst_changes' was skipped. This has been fixed. -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 18 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 14:44:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12431: Support reading compressed JSON file
Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/20482 ) Change subject: IMPALA-12431: Support reading compressed JSON file .. Patch Set 8: Due to the lack of an 'only' constraint, the load of 'functional_kudu.timestamp_at_dst_changes' was skipped. This has been fixed. -- To view, visit http://gerrit.cloudera.org:8080/20482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2471855d97d4cdd51363b321055e6b06aa6d81e8 Gerrit-Change-Number: 20482 Gerrit-PatchSet: 8 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 14:43:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Hello Wenzhe Zhou, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20681 to look at the new patch set (#18). Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. IMPALA-12322: Support converting UTC timestamps read from Kudu to local time This patch adds a query option 'convert_kudu_utc_timestamps', similar to 'convert_legacy_hive_parquet_utc_timestamps'. When enabled, it converts UTC timestamps read from Kudu to local timestamps. The corresponding changes also cover predicate pushdown and runtime filters. Local timestamps can be ambiguous around daylight saving time changes, and this is difficult to handle correctly in the bloom filter. This patch therefore also introduces a query option 'disable_kudu_local_timestamp_bloom_filter', which by default disables the Kudu timestamp bloom filter when time zone conversion is enabled, to avoid erroneously filtering out data. For regions that do not observe daylight saving time, it can be set to false to re-enable the Kudu local timestamp bloom filter. Testing: - Add TestKuduTimestampConvert in query_test/test_kudu.py, which performs end-to-end testing in a custom cluster. This covers basic Kudu UTC timestamp conversion and checks that the related predicate pushdown and runtime filters work correctly (even for timestamps around daylight saving time changes).
Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 --- M be/src/exec/kudu/kudu-scanner.cc M be/src/exec/kudu/kudu-scanner.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timestamp-functions.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M bin/rat_exclude_files.txt M common/function-registry/impala_functions.py M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/util/ExprUtil.java A testdata/data/timestamp_at_dst_changes.txt M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test A testdata/workloads/functional-query/queries/QueryTest/kudu_runtime_filter_with_timestamp_conversion.test A testdata/workloads/functional-query/queries/QueryTest/kudu_timestamp_conversion.test M tests/query_test/test_kudu.py 25 files changed, 592 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/20681/18 -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 18 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye
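The DST ambiguity mentioned in the IMPALA-12322 commit message can be illustrated with the following Python sketch. It models one time zone with a single hard-coded "fall back" transition, based on US Pacific time in November 2023. The model is a simplifying assumption, and the function names are hypothetical (neither Impala's nor a time zone library's). It shows that two distinct UTC instants map to the same local timestamp. A filter built on a local value would therefore have to match two different UTC values in Kudu. This is one plausible reading of why the patch disables that bloom filter by default.

```python
from datetime import datetime, timedelta
from typing import List

# Simplified, hypothetical model of one time zone with a single DST "fall back"
# transition (modelled on US Pacific time in 2023): UTC-7 before, UTC-8 after.
TRANSITION_UTC = datetime(2023, 11, 5, 9, 0)  # 02:00 PDT == 09:00 UTC
PDT = timedelta(hours=-7)
PST = timedelta(hours=-8)


def utc_to_local(utc: datetime) -> datetime:
    """Convert a naive UTC timestamp to naive local time under the model above."""
    offset = PDT if utc < TRANSITION_UTC else PST
    return utc + offset


def local_to_utc_candidates(local: datetime) -> List[datetime]:
    """Return every UTC instant whose local representation equals 'local'."""
    candidates = []
    for offset in (PDT, PST):
        utc = local - offset
        if utc_to_local(utc) == local:
            candidates.append(utc)
    return sorted(candidates)


a = datetime(2023, 11, 5, 8, 30)  # 01:30 PDT
b = datetime(2023, 11, 5, 9, 30)  # 01:30 PST
print(utc_to_local(a), utc_to_local(b))  # both print 2023-11-05 01:30:00
print(local_to_utc_candidates(datetime(2023, 11, 5, 1, 30)))
```

Outside the repeated hour, each local timestamp maps back to exactly one UTC instant. For time zones without DST, the mapping is always one-to-one, which is consistent with the commit's note that the bloom filter can be re-enabled there.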
[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py File tests/custom_cluster/test_sync_to_latest_hms_events.py: http://gerrit.cloudera.org:8080/#/c/20367/14/tests/custom_cluster/test_sync_to_latest_hms_events.py@37 PS14, Line 37: --file_metadata_reload_properties='' I'm still trying to understand why we need this in some tests. Do those tests depend on schema-only AlterTable commands (e.g. add column) also loading the file metadata? -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 13 Dec 2023 13:43:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14703/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 13:47:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10023/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 13 Dec 2023 13:44:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 13 Dec 2023 13:44:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10022/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 13:43:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14702/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 13:41:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12465: Unicode column name support
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20506 ) Change subject: IMPALA-12465: Unicode column name support .. Patch Set 20: (1 comment) http://gerrit.cloudera.org:8080/#/c/20506/20/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/20506/20/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@534 PS20, Line 534: ( Compile time error: can't break a string like this. See also L536-537. -- To view, visit http://gerrit.cloudera.org:8080/20506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718 Gerrit-Change-Number: 20506 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 13 Dec 2023 13:29:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20760 ) Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables. .. Patch Set 10: (9 comments) http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc File be/src/exec/iceberg-delete-sink.cc: http://gerrit.cloudera.org:8080/#/c/20760/5/be/src/exec/iceberg-delete-sink.cc@79 PS5, Line 79: VerifyRowsNotDuplicated > file paths and positions are not sorted across partitions. So we would need Ok, it can stay as it is. http://gerrit.cloudera.org:8080/#/c/20760/10/be/src/exec/table-sink-base.h File be/src/exec/table-sink-base.h: http://gerrit.cloudera.org:8080/#/c/20760/10/be/src/exec/table-sink-base.h@90 PS10, Line 90: must already fill Nit: "must already have filled". http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java File fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java: http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@115 PS6, Line 115: If there are duplicates in the JOIN operator > I'm not sure what is the point here. Duplicates are only possible in the co It is a bit nit-picky, I meant that in the sentence "If there are duplicates [...] then we cannot do duplicate checking in the SINK if ..." the condition at the beginning is not necessary - if it happens that there are actually no duplicates we still can't check for them if the rows are shuffled independently. I'd suggest something like this: """ In case of a JOIN, if duplicated rows can be shuffled independently, we cannot do duplicate checking in the SINK. This is the case when ... """ http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java@126 PS6, Line 126: via UPDATE FROM > There will be always at least one tableRef because of the target table. 
I wanted to ask if it is possible that modifyStmt_.fromClause_.size() == 1. 1. If it is possible, then in that case the exception (currently) won't be thrown. 1a) If it should be thrown we should remove that condition. 1b) Otherwise, the error message lists the conditions that were needed to trigger the error: - partition column, - non-constant RHS -> in this case we should include "more than one table ref in the FROM clause" as well 2. If modifyStmt_.fromClause_.size() == 1 is not possible, we should remove the relevant part of the condition on L123 and add a precondition check instead. http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@101 PS6, Line 101: public TSortingOrder getSortingOrder() { > There's good chance we will need it later, e.g. optimizing a table that has Ok, it should stay then. http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java File fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java: http://gerrit.cloudera.org:8080/#/c/20760/6/fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java@34 PS6, Line 34: TableSink > It may have some value now, as there are some common fields/methods, but I' Ok, if IcebergDeleteSink will probably be deleted we can leave it as it is now. But we should open a Jira about it then. 
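Daniel's point on IcebergUpdateImpl.java (PS6, line 115) can be shown with a small model: if duplicated rows can be shuffled independently, the sink cannot check for duplicates. This is a hypothetical Python sketch, not the backend's VerifyRowsNotDuplicated in be/src/exec/iceberg-delete-sink.cc. It assumes the check only compares each (file path, position) record with the previous one in a sorted stream. The helpers `round_robin` and `hash_partition` are illustrative stand-ins for two exchange strategies.

```python
from typing import Iterable, List, Tuple

# Hypothetical record type: (data file path, row position) of a deleted row.
DeleteRecord = Tuple[str, int]


def verify_rows_not_duplicated(records: Iterable[DeleteRecord]) -> bool:
    """Return False as soon as two adjacent records are equal.

    Duplicates are only detected if every copy of a row reaches this sink
    instance and the input is sorted, so that the copies end up adjacent.
    """
    prev = None
    for rec in records:
        if rec == prev:
            return False
        prev = rec
    return True


def round_robin(records: List[DeleteRecord], n: int) -> List[List[DeleteRecord]]:
    """Distribute rows independently: copies of one row may reach different sinks."""
    buckets: List[List[DeleteRecord]] = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        buckets[i % n].append(rec)
    return [sorted(b) for b in buckets]


def hash_partition(records: List[DeleteRecord], n: int) -> List[List[DeleteRecord]]:
    """Partition on the whole record: all copies of one row reach the same sink."""
    buckets: List[List[DeleteRecord]] = [[] for _ in range(n)]
    for rec in records:
        buckets[hash(rec) % n].append(rec)
    return [sorted(b) for b in buckets]


rows = [("f1.parq", 3), ("f1.parq", 3), ("f2.parq", 0)]
# Independent shuffling: each sink sees at most one copy, so nothing is flagged.
print(all(verify_rows_not_duplicated(b) for b in round_robin(rows, 2)))     # True
# Partitioning on the record itself: one sink sees both copies and flags them.
print(all(verify_rows_not_duplicated(b) for b in hash_partition(rows, 2)))  # False
```

The round-robin case also shows that the check fails in this setup whether or not duplicates actually exist. That matches the suggested rewording, which drops the leading "If there are duplicates" condition.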
http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/datasets/functional/functional_schema_template.sql@3407 PS7, Line 3407: E TA > Makes sense, I never really thought about this as I usually re-load my tabl I agree, let's not make this patch even bigger. http://gerrit.cloudera.org:8080/#/c/20760/10/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test: http://gerrit.cloudera.org:8080/#/c/20760/10/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test@400 PS10, Line 400: 1 Are these changes compared to PS7 because of a rebase? http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test: http://gerrit.cloudera.org:8080/#/c/20760/7/testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test@252 PS7, Line 252: FROM clause > I think yes, otherwise you cannot have a join that produces duplicates. I asked because I'm unsure whether we should add
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20753 to look at the new patch set (#6). Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. IMPALA-12597: Basic Equality delete read support for Iceberg tables In general, applying equality deletes is similar to how position deletes are applied to data files: using a LEFT ANTI JOIN where the SCAN for the data rows is on the left side while the SCAN for the delete rows is on the right side of the JOIN. The difference lies in the virtual columns and the conjuncts being used. For equality deletes, the data sequence number of a delete file has to be greater than the data sequence number of the data file being investigated. This information is added as a virtual column to the scans, and a conjunct is created in the JOIN node to check the relation. The equality delete fields from the delete files are checked against the respective columns of the data SCANS. This patch makes it possible for Impala to read Iceberg tables with basic equality delete files. The Iceberg spec gives engines great flexibility in writing equality deletes; however, in practice Flink, one of the engines that write EQ-deletes, supports only a subset of the use cases. This patch focuses on reading the EQ-deletes written by Flink. The restrictions are the following: - All equality delete files in a table should have the same equality field ID list. - For partitioned Iceberg tables it is expected that the partition values are also written into the equality delete files. - Tables with equality deletes shouldn't have partition or schema evolution. - Floating point equality columns aren't supported. - If a malformed equality delete file doesn't have some of the equality field IDs, then the Parquet reader will fill those missing fields with NULLs.
As a side effect, this will drop the rows from the result where the corresponding data columns have a null value. See the IMPALA-11388 epic Jira for more details. Testing: - Checked that the existing functional_parquet.iceberg_v2_delete_equality table can be read successfully. TODO: Add some test tables created by Flink to the test suite: - Partitioned table that has equality deletes. - Table with partition evolution. - Table with schema evolution. Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad --- M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.h M common/thrift/CatalogObjects.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-2.parquet A
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro A
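The LEFT ANTI JOIN that the commit message describes can be modelled in a few lines. This is a hypothetical Python sketch, not Impala's planner or hash-join code. It assumes that rows are plain dicts tagged with the data sequence number of their file, and that the table uses a single equality field list (the first restriction above). Partitions are ignored. Keeping the maximum delete sequence number per key on the build side is a choice made for this sketch. It evaluates the "delete sequence number > data sequence number" conjunct with a single lookup.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class Row:
    values: Dict[str, Any]  # column name -> value
    data_seq_num: int       # data sequence number of the file the row came from


def apply_equality_deletes(data_rows: List[Row],
                           delete_rows: List[Row],
                           eq_fields: Sequence[str]) -> List[Row]:
    """Model a LEFT ANTI JOIN: data SCAN on the left, delete SCAN on the right."""
    # Build side (delete rows): for each equality key keep the highest data
    # sequence number. A data row is deleted iff some delete row with the same
    # key has a greater sequence number, i.e. iff this maximum is greater.
    build: Dict[Tuple[Any, ...], int] = {}
    for d in delete_rows:
        key = tuple(d.values[f] for f in eq_fields)
        build[key] = max(build.get(key, d.data_seq_num), d.data_seq_num)

    # Probe side (data rows): keep only rows without a qualifying match.
    survivors = []
    for r in data_rows:
        key = tuple(r.values[f] for f in eq_fields)
        max_delete_seq = build.get(key)
        if max_delete_seq is not None and max_delete_seq > r.data_seq_num:
            continue  # equality fields match and the delete file is newer
        survivors.append(r)
    return survivors


data = [Row({"id": 1, "v": "a"}, 1), Row({"id": 2, "v": "b"}, 1), Row({"id": 3, "v": "c"}, 3)]
deletes = [Row({"id": 1}, 2), Row({"id": 3}, 2)]
print([r.values["id"] for r in apply_equality_deletes(data, deletes, ["id"])])  # [2, 3]
```

Row 3 survives even though its key matches. Its data file (sequence number 3) is newer than the delete file (2), so the delete does not apply to it.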
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 5: (2 comments) I added the most important test tables with this patch. Will add the rest soon. http://gerrit.cloudera.org:8080/#/c/20753/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/20753/4/common/thrift/PlanNodes.thrift@403 PS4, Line 403: this case. > nit: this case Done http://gerrit.cloudera.org:8080/#/c/20753/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20753/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@289 PS4, Line 289: tblR > nit: too much indent Done -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 13:14:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Hello Andrew Sherman, Tamas Mate, Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20753 to look at the new patch set (#5). Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. IMPALA-12597: Basic Equality delete read support for Iceberg tables In general, applying equality deletes is similar to how position deletes are applied to data files: using a LEFT ANTI JOIN where the SCAN for the data rows is on the left side while the SCAN for the delete rows is on the right side of the JOIN. The difference lies in the virtual columns and the conjuncts being used. For equality deletes, the data sequence number of a delete file has to be greater than the data sequence number of the data file being investigated. This information is added as a virtual column to the scans, and a conjunct is created in the JOIN node to check the relation. The equality delete fields from the delete files are checked against the respective columns of the data SCANS. This patch makes it possible for Impala to read Iceberg tables with basic equality delete files. The Iceberg spec gives engines great flexibility in writing equality deletes; however, in practice Flink, one of the engines that write EQ-deletes, supports only a subset of the use cases. This patch focuses on reading the EQ-deletes written by Flink. The restrictions are the following: - All equality delete files in a table should have the same equality field ID list. - For partitioned Iceberg tables it is expected that the partition values are also written into the equality delete files. - Tables with equality deletes shouldn't have partition or schema evolution. - Floating point equality columns aren't supported. - If a malformed equality delete file doesn't have some of the equality field IDs, then the Parquet reader will fill those missing fields with NULLs.
As a side effect, this will drop the rows from the result where the corresponding data columns have a null value. See the IMPALA-11388 epic Jira for more details. Testing: - Checked that the existing functional_parquet.iceberg_v2_delete_equality table can be read successfully. TODO: Add some test tables created by Flink to the test suite: - Partitioned table that has equality deletes. - Table with partition evolution. - Table with schema evolution. Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad --- M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.h M common/thrift/CatalogObjects.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java A fe/src/main/java/org/apache/impala/catalog/IcebergDeleteTable.java A fe/src/main/java/org/apache/impala/catalog/IcebergEqualityDeleteTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-38a471ff-46f4-4350-85cc-2e7ba946b34c-2.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-1.parquet A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/0-0-72709aba-fb15-4bd6-9758-5f39eb9bdcb7-2.parquet A
testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/data/delete-074a9e19e61b766e-652a169e0001_800513971_data.0.parq A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/0cf1a310-d39c-4c6a-bfef-c3fe33cd0c25-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/3d36bf90-2625-4625-b09b-d4359b979df9-m1.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/bb4b8c07-84e1-421a-bb6c-594f297d118e-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_both_eq_and_pos/metadata/snap-3802179086205335895-1-3d36bf90-2625-4625-b09b-d4359b979df9.avro A
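The last restriction in the commit message has a side effect: a NULL-filled field in a malformed delete file removes data rows whose matching column is NULL. This only happens if equality fields are compared null-safely (NULL matches NULL). The hypothetical Python sketch below models that behaviour. The null-safe semantics are inferred from the described side effect rather than taken from the patch. Sequence numbers are left out for brevity, and the column names are made up.

```python
from typing import Any, Dict, List, Sequence, Tuple


def read_delete_key(raw: Dict[str, Any], eq_fields: Sequence[str]) -> Tuple[Any, ...]:
    # A malformed delete file may lack some equality fields. They are read as
    # None here, modelling the NULL-filling described in the commit message.
    return tuple(raw.get(f) for f in eq_fields)


def surviving_rows(data_rows: List[Dict[str, Any]],
                   raw_deletes: List[Dict[str, Any]],
                   eq_fields: Sequence[str]) -> List[Dict[str, Any]]:
    delete_keys = {read_delete_key(d, eq_fields) for d in raw_deletes}
    # Python tuple comparison treats None == None as equal, i.e. null-safe
    # matching (IS NOT DISTINCT FROM), so NULL data values can match.
    return [r for r in data_rows
            if tuple(r.get(f) for f in eq_fields) not in delete_keys]


data = [{"id": 1, "name": "a"}, {"id": 1, "name": None}, {"id": 2, "name": None}]
# The delete row is missing "name", so it becomes the key (1, None).
print(surviving_rows(data, [{"id": 1}], ["id", "name"]))
# [{'id': 1, 'name': 'a'}, {'id': 2, 'name': None}]
```

The delete was presumably meant for every row with id 1. Because of the NULL-filled field, it removes only the row whose `name` is NULL.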
[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20753 ) Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg tables .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1278 PS5, Line 1278: \ flake8: E502 the backslash is redundant between brackets http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1279 PS5, Line 1279: \ flake8: E502 the backslash is redundant between brackets http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1280 PS5, Line 1280: \ flake8: E502 the backslash is redundant between brackets http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1281 PS5, Line 1281: \ flake8: E502 the backslash is redundant between brackets http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1282 PS5, Line 1282: \ flake8: E502 the backslash is redundant between brackets http://gerrit.cloudera.org:8080/#/c/20753/5/tests/query_test/test_iceberg.py@1283 PS5, Line 1283: \ flake8: E502 the backslash is redundant between brackets -- To view, visit http://gerrit.cloudera.org:8080/20753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2053e6f321c69f1c82059a84a5d99aeaa9814cad Gerrit-Change-Number: 20753 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2023 13:14:24 + Gerrit-HasComments: Yes
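The six flake8 E502 findings on tests/query_test/test_iceberg.py all report the same thing: a line-continuation backslash inside brackets. The actual lines 1278-1283 are not shown here, so the strings below are made up. This minimal Python example shows that removing the backslash does not change the value, because Python already joins lines implicitly inside (), [] and {}.

```python
# Inside (), [] or {} Python joins continuation lines implicitly, so a
# trailing backslash there is redundant; flake8 reports it as E502.
flagged = ("select count(*) " \
           "from functional_parquet.iceberg_v2_delete_equality")

# Preferred form: drop the backslash and rely on the enclosing parentheses.
fixed = ("select count(*) "
         "from functional_parquet.iceberg_v2_delete_equality")

print(flagged == fixed)  # True
```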
[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20681 ) Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu to local time .. Patch Set 17: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10019/ -- To view, visit http://gerrit.cloudera.org:8080/20681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3 Gerrit-Change-Number: 20681 Gerrit-PatchSet: 17 Gerrit-Owner: Zihao Ye Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 13 Dec 2023 13:07:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20485 ) Change subject: IMPALA-10949: Improve batching logic of partition events .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10020/ -- To view, visit http://gerrit.cloudera.org:8080/20485 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3 Gerrit-Change-Number: 20485 Gerrit-PatchSet: 8 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 13 Dec 2023 13:06:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20648 ) Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing .. IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing Currently we require a global invalidate to reset the events processor if events sync is re-enabled on a table from HMS. This patch eliminates the need to reset the catalog cache when events sync is re-enabled. Implementation details: when events sync is re-enabled on a table via HMS: 1) If the table exists in Impala, a) we can just invalidate the table if the current event id is greater than the create event id of the table, so that it is reloaded the first time a query accesses it. b) Otherwise, we can just ignore the event. 2) If the table doesn't exist in Impala, create an Incomplete table if there is no entry in the event delete log for this table. Note: If events sync is disabled on a table, then for all subsequent table events we should ideally mark the table as stale if the table object is loaded, so that it is reloaded the next time a query accesses it. However, since this approach has a performance impact, these events will be ignored. Testing: 1) Manually verified a few scenarios. 2) Added test cases for the above scenarios.
Change-Id: I37055990be49e91462ebc98aa97009ca768a0072 Reviewed-on: http://gerrit.cloudera.org:8080/20648 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 162 insertions(+), 59 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072 Gerrit-Change-Number: 20648 Gerrit-PatchSet: 12 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
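The implementation details of IMPALA-10987 describe a small decision procedure. The hypothetical Python sketch below restates cases 1a, 1b and 2 as a function. It is not the code in MetastoreEvents.java, and the names `Action` and `on_sync_reenabled` are illustrative. The commit message does not say what happens when the table is missing but the event delete log has an entry for it. The sketch assumes the event is ignored in that case.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    INVALIDATE = "invalidate the table so the next query reloads it"
    CREATE_INCOMPLETE = "add an Incomplete table to the catalog"
    IGNORE = "ignore the event"


def on_sync_reenabled(event_id: int,
                      create_event_id: Optional[int],
                      in_delete_log: bool) -> Action:
    """Decide how to handle an event that re-enables events sync on a table.

    create_event_id is None when the table does not exist in Impala's catalog.
    """
    if create_event_id is not None:
        # Case 1a: the event is newer than the table's create event.
        if event_id > create_event_id:
            return Action.INVALIDATE
        # Case 1b: otherwise the event can be skipped.
        return Action.IGNORE
    # Case 2: unknown table; create it unless the event delete log has an entry.
    if not in_delete_log:
        return Action.CREATE_INCOMPLETE
    # Assumption: this branch is not spelled out in the commit message.
    return Action.IGNORE


print(on_sync_reenabled(10, 5, False).name)     # INVALIDATE
print(on_sync_reenabled(10, None, False).name)  # CREATE_INCOMPLETE
```

Invalidating rather than reloading in case 1a defers the metadata load cost to the first query that touches the table. This is consistent with the note that eagerly marking tables stale was avoided for performance reasons.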
[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20648 ) Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072 Gerrit-Change-Number: 20648 Gerrit-PatchSet: 11 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 13 Dec 2023 12:34:42 + Gerrit-HasComments: No