[Impala-ASF-CR] IMPALA-13159: Fix query cancellation caused by statestore failover
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21520 ) Change subject: IMPALA-13159: Fix query cancellation caused by statestore failover .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16345/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I720bec5199df46475b954558abb0637ca7e6298b Gerrit-Change-Number: 21520 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 14 Jun 2024 04:22:25 + Gerrit-HasComments: No
[Impala-ASF-CR] [tools] Add .gitignore for new files
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21503 ) Change subject: [tools] Add .gitignore for new files .. [tools] Add .gitignore for new files Adds .gitignore for test.jceks - added with IMPALA-12920 - and hive-site-housekeeping-on (presumably added via a Hive update). Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade Reviewed-on: http://gerrit.cloudera.org:8080/21503 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/test/resources/.gitignore A testdata/jceks/.gitignore D testdata/jceks/.gitkeep 3 files changed, 3 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/21503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade Gerrit-Change-Number: 21503 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Yida Wu
[Impala-ASF-CR] [tools] Add .gitignore for new files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21503 ) Change subject: [tools] Add .gitignore for new files .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade Gerrit-Change-Number: 21503 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 14 Jun 2024 04:14:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13159: Fix query cancellation caused by statestore failover
Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21520 Change subject: IMPALA-13159: Fix query cancellation caused by statestore failover .. IMPALA-13159: Fix query cancellation caused by statestore failover A momentary inconsistent cluster membership state after statestore failover results in query cancellation. We already have code to handle inconsistent cluster membership after statestore restarting. This patch adds code to handle inconsistent cluster membership state after statestore failover in the same way. Testing: - Added a new test case to verify that inconsistent cluster membership after statestore failover will not result in query cancellation. - Fixed closing client issue for Catalogd HA test case test_catalogd_failover_with_sync_ddl when the test fails. - Passed core test. Change-Id: I720bec5199df46475b954558abb0637ca7e6298b --- M be/src/scheduling/cluster-membership-mgr.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M tests/custom_cluster/test_catalogd_ha.py M tests/custom_cluster/test_statestored_ha.py 5 files changed, 111 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/21520/1 -- To view, visit http://gerrit.cloudera.org:8080/21520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I720bec5199df46475b954558abb0637ca7e6298b Gerrit-Change-Number: 21520 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21516 ) Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16344/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 14 Jun 2024 00:30:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21516 to look at the new patch set (#4). Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. IMPALA-13077: Fix selectivity estimation for SEMI JOIN JoinNode.getSemiJoinCardinality() will skip an equality expression if either NDV or Cardinality of equality expression is unknown (-1). If NDV is unknown, but Cardinality is known, getSemiJoinCardinality() should assume NDV = Cardinality. On the other hand, assume NDV = 1 if that NDV is being used as a divisor in the selectivity formula to prevent underestimation. Testing: - Add test case where LEFT SEMI JOIN from subquery can reduce cardinality estimate of leftmost ScanNode in the query plan. - Add new pattern at CARDINALITY_FILTER to ignore reduction by runtime filter. - Pass core tests. Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 --- M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test 4 files changed, 58 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/21516/4 -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21516 ) Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16343/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 23:50:30 + Gerrit-HasComments: No
[native-toolchain-CR] PROTOTYPE: Build shared libs for googletest
Joe McDonnell has abandoned this change. ( http://gerrit.cloudera.org:8080/21177 ) Change subject: PROTOTYPE: Build shared libs for googletest .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/21177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I817a7721f12053540e06669402d97a69588ac0e9 Gerrit-Change-Number: 21177 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 14: (4 comments) Will take another look at this, but here is my preliminary comment. http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG@17 PS12, Line 17: r lagged behind. This patch : address this issue by always loading the verifying the partitions from : HMS without loading the file metadata from HMS regardless > Will change this. Done http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG@18 PS14, Line 18: always loading the verifying the partitions nit: "always verify the partitions" ? http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1265 PS14, Line 1265: public void load(boolean reuseMetadata, IMetaStoreClient client, : org.apache.hadoop.hive.metastore.api.Table msTbl, boolean loadPartitionFileMetadata, : boolean loadTableSchema, boolean refreshUpdatedPartitions, : @Nullable Set partitionsToUpdate, @Nullable String debugAction, : @Nullable Map partitionToEventId, String reason, : EventSequence catalogTimeline, boolean isPreLoadForInsert) This method has 12 parameter now. That is quite long. Is it possible to wrap them into one Class? And maybe fluent style Builder for that class. LoadParam param = LoadParamBuilder.reuseMetadata(true).metastoreClient(client). ... .build(); See example at ResourceProfile.java and ResourceProfileBuilder.java. http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1270 PS14, Line 1270: isPreLoadForInsert Add documentation for isPreLoadForInsert parameter. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 23:42:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21516 ) Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. Patch Set 3: Patch set 3 move the new test case to PlannerTest.testImplicitJoins() since that test validate cardinality. -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 23:26:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21516 to look at the new patch set (#3). Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. IMPALA-13077: Fix selectivity estimation for SEMI JOIN JoinNode.getSemiJoinCardinality() will skip an equality expression if either NDV or Cardinality of equality expression is unknown (-1). If NDV is unknown, but Cardinality is known, getSemiJoinCardinality() should assume NDV = Cardinality. On the other hand, assume NDV = 1 if that NDV is being used as a divisor in the selectivity formula to prevent underestimation. Testing: - Add test case where LEFT SEMI JOIN from subquery can reduce cardinality estimate of leftmost ScanNode in the query plan. - Add new pattern at CARDINALITY_FILTER to ignore reduction by runtime filter. - Pass core tests. Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 --- M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test 4 files changed, 57 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/21516/3 -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21516 ) Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16342/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 23:16:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16341/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 23:11:26 + Gerrit-HasComments: No
[Impala-ASF-CR] [tools] Add .gitignore for new files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21503 ) Change subject: [tools] Add .gitignore for new files .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10711/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade Gerrit-Change-Number: 21503 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 13 Jun 2024 23:05:08 + Gerrit-HasComments: No
[Impala-ASF-CR] [tools] Add .gitignore for new files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21503 ) Change subject: [tools] Add .gitignore for new files .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade Gerrit-Change-Number: 21503 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 13 Jun 2024 23:05:07 + Gerrit-HasComments: No
[Impala-ASF-CR] [tools] Add .gitignore for new files
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21503 ) Change subject: [tools] Add .gitignore for new files .. Patch Set 1: Code-Review+2 Simple git housekeeping, so adding +2. -- To view, visit http://gerrit.cloudera.org:8080/21503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade Gerrit-Change-Number: 21503 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 13 Jun 2024 23:04:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20367 ) Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs .. Patch Set 44: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16340/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 44 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 23:03:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21516 to look at the new patch set (#2). Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. IMPALA-13077: Fix selectivity estimation for SEMI JOIN JoinNode.getSemiJoinCardinality() will skip an equality expression if either NDV or Cardinality of equality expression is unknown (-1). If NDV is unknown, but Cardinality is known, getSemiJoinCardinality() should assume NDV = Cardinality. On the other hand, assume NDV = 1 if that NDV is being used as a divisor in the selectivity formula to prevent underestimation. Testing: - Add test case where LEFT SEMI JOIN from subquery can reduce cardinality estimate of leftmost ScanNode in the query plan. - Pass core tests. Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 --- M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test 3 files changed, 55 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/21516/2 -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] [tools] Add .gitignore for new files
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21503 ) Change subject: [tools] Add .gitignore for new files .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade Gerrit-Change-Number: 21503 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 13 Jun 2024 22:52:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#14). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always load the partitions from cached table. This can lead to data inconsistency issue especially in the case of event processor being turned off or lagged behind. This patch address this issue by always loading the verifying the partitions from HMS without loading the file metadata from HMS regardless of state of event processor. This approach will ensure that partition metadata is always consistent with metastore. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 241 insertions(+), 159 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/14 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20367 to look at the new patch set (#44). Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs .. IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs The idea is that when any DDL/DML operation is performed by Impala, it also syncs the db/table to its latest event ID as per HMS. This way updates to a db/table's are applied in the same order as they appear in the Notification log table in HMS which ensures consistency. Currently catalogD applies any updates received from Impala clients in-place. Instead it should perform an HMS operation first and then replay all the HMS events since the last synced event id. Implementation: when the enable_sync_to_latest_event_on_ddls flag is set to true, we do the DDL/DML operation first, i.e., perform HMS operation and then sync the db/table in the catalogD's cache to the latest event in HMS for the corresponding db/table. By leveraging HIVE-27499 we are directly fetching the events only for the respective db/table and process them. Set 'enable_sync_to_latest_event_on_ddls'to true to enable this feature. Performance impact: DDL/DML might need more time to execute due to fetching and applying other events for corresponding metadata object. Note: We don't modify the cache using MetastoreEventsProcessor for alter table rename operation as this is a complex operation regarding cache modification (IMPALA-12553 has more details about this). We also don't modify the cache this way for the truncate table operation, unless the table is replicated or an Iceberg table. The same applies to insert operation if the table is in Iceberg format. We don't modify cache using above process for 'refresh table'/'invalidate metadata table' commands. Testing: 1) Added few tests in the MetaStoreEventProcessorForTest to verify this feature that simulates the metadata sync between HMS and Impala. 2) Added few tests in the CatalogHmsSyncToLatestEventIdTest class to the metadata sync between HMS end point, Catalog Metastore Server and Impala. The HMS end point serves as common interface to metadata changes outside the current Impala service such as Hive, Spark or other Impala service. Also verified the table lastSyncEventId is updated after the events are sync and confirmed that metastore event processor ignored these synced events. 3) Added some end-to-end tests in test_sync_to_latest_hms_events.py Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf --- M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java M fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java A tests/custom_cluster/test_sync_to_latest_hms_events.py A tests/metadata/test_common_ddl.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py M tests/metadata/test_event_processing.py M tests/metadata/test_recover_partitions.py 20 files changed, 1,302 insertions(+), 536 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/44 -- To view, visit http://gerrit.cloudera.org:8080/20367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf Gerrit-Change-Number: 20367 Gerrit-PatchSet: 44 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala
[native-toolchain-CR] IMPALA-13158: Skip building Thrift for rust/dotnet
Hello Laszlo Gaal, Joe McDonnell, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21515 to look at the new patch set (#2). Change subject: IMPALA-13158: Skip building Thrift for rust/dotnet .. IMPALA-13158: Skip building Thrift for rust/dotnet Skips building Thrift on rust (with-rs) and .NET core (dotnet in Thrift 11, netstd in Thrift 16). Avoids errors if rust compiler is old or dotnet missing an SDK. Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d --- M source/thrift/build.sh 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/15/21515/2 -- To view, visit http://gerrit.cloudera.org:8080/21515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d Gerrit-Change-Number: 21515 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith
[native-toolchain-CR] IMPALA-13158: Skip building Thrift for rust/dotnet
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21515 ) Change subject: IMPALA-13158: Skip building Thrift for rust/dotnet .. Patch Set 1: Code-Review+2 This makes sense to me -- To view, visit http://gerrit.cloudera.org:8080/21515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d Gerrit-Change-Number: 21515 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 13 Jun 2024 22:30:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21516 ) Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16339/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 22:11:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21402 ) Change subject: IMPALA-12712: Invalidate metadata on table should set better createEventId .. IMPALA-12712: Invalidate metadata on table should set better createEventId "INVALIDATE METADATA " can be used to bring up a table in Impala's catalog cache if the table exists in HMS. Currently, createEventId for such tables are always set as -1 which will lead to always removing the table. Sequence of drop table + create table + invalidate table can lead to flaky test failures like IMPALA-12266. Solution: When Invalidate metadata is fired, fetch the latest eventId from HMS and set it as createEventId for the table, so that drop table event that happend before invalidate query will be ignored without removing the table from cache. Note: Also removed an unnecessary RPC call to HMS to get table object since we alrady have required info in table metadata rpc call. Testing: - Added an end-to-end test to verify that drop table event happened before time shouldn't remove the metadata object from cache. Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Reviewed-on: http://gerrit.cloudera.org:8080/21402 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 4 files changed, 34 insertions(+), 11 deletions(-) Approvals: Csaba Ringhofer: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/21402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Gerrit-Change-Number: 21402 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21402 ) Change subject: IMPALA-12712: Invalidate metadata on table should set better createEventId .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Gerrit-Change-Number: 21402 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 22:08:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21516 Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN .. IMPALA-13077: Fix selectivity estimation for SEMI JOIN JoinNode.getSemiJoinCardinality() will skip an equality expression if either NDV or Cardinality of equality expression is unknown (-1). If NDV is unknown, but Cardinality is known, getSemiJoinCardinality() should assume NDV = Cardinality. On the other hand, assume NDV = 1 if that NDV is being used as a divisor in the selectivity formula to prevent underestimation. Testing: - Add test case where LEFT SEMI JOIN from subquery can reduce cardinality estimate of leftmost ScanNode in the query plan. - Pass core tests. Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 --- M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test 3 files changed, 59 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/21516/1 -- To view, visit http://gerrit.cloudera.org:8080/21516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0 Gerrit-Change-Number: 21516 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[native-toolchain-CR] IMPALA-13158: Skip building Thrift for rust/dotnet
Michael Smith has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21515 Change subject: IMPALA-13158: Skip building Thrift for rust/dotnet .. IMPALA-13158: Skip building Thrift for rust/dotnet Skips building Thrift on rust (with-rs) and .NET core (dotnet in Thrift 11, netstd in Thrift 17). Avoids errors if rust compiler is old or dotnet missing an SDK. Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d --- M source/thrift/build.sh 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/15/21515/1 -- To view, visit http://gerrit.cloudera.org:8080/21515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d Gerrit-Change-Number: 21515 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith
[native-toolchain-CR] WIP: Add OpenTelemetry C++
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21514 ) Change subject: WIP: Add OpenTelemetry C++ .. Patch Set 1: Started on this for my own experiments. Builds on Ubuntu 22, still need to do a test build on other platforms and try it out in Impala. -- To view, visit http://gerrit.cloudera.org:8080/21514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1657e2d344e90654b5d009d65973d762c0081a05 Gerrit-Change-Number: 21514 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 13 Jun 2024 21:42:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
Hello Quanlong Huang, Abhishek Rawat, David Rorke, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21504 to look at the new patch set (#3). Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost .. IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost TOP-N cost will turn into NaN if inputCardinality is equal to 0 due to Math.log(inputCardinality). This patch fix the issue by avoiding Math.log(0) and replace it with 0 instead. After this patch, Instantiating BaseProcessingCost with NaN, infinite, or negative totalCost will throw IllegalArgumentException. In BaseProcessingCost.getDetails(), "total-cost" is renamed to "raw-cost" to avoid confusion with "cost-total" in ProcessingCost.getDetails(). Testing: - Add testcase that run TOP-N query over empty table. - Compute ProcessingCost in most FE and EE test even when COMPUTE_PROCESSING_COST option is not enabled by checking if RuntimeEnv.INSTANCE.isTestEnv() is True or TEST_REPLAN option is enabled. - Pass core test. Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 --- M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java M fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScaledProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43-verbose.test 30 files changed, 218 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/21504/3 -- To view, visit http://gerrit.cloudera.org:8080/21504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 Gerrit-Change-Number: 21504 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[native-toolchain-CR] WIP: Add OpenTelemetry C++
Michael Smith has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21514 Change subject: WIP: Add OpenTelemetry C++ .. WIP: Add OpenTelemetry C++ Change-Id: I1657e2d344e90654b5d009d65973d762c0081a05 --- M buildall.sh A source/opentelemetry-cpp/build.sh 2 files changed, 38 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/14/21514/1 -- To view, visit http://gerrit.cloudera.org:8080/21514 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1657e2d344e90654b5d009d65973d762c0081a05 Gerrit-Change-Number: 21514 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith
[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21455 ) Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16336/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0 Gerrit-Change-Number: 21455 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 19:12:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 3: (6 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@95 PS2, Line 95: // Collects the processed data files' file paths and fills 'intermediate_delete_rows_' > The function comment is not valid now. Would be nice to mention that this p Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@124 PS2, Line 124: // Keys are data file paths processed by the join operator. Values are the file > Can you add a comment what the keys and values are for in this map? Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@146 PS2, Line 146: : /// Inserts the conte > nit: seems a weird place to break the line Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@184 PS2, Line 184: auto& delete_vector = > To increase readability, do you think it'd make sense to introduce a variab Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@259 PS2, Line 259: rrent delet > nit: I'd break the line before the first param, and then each param could f Done http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@325 PS2, Line 325: state->LogError( > KrpcDataStreamSender already filters out NULL file_paths. Is there a need t We can see NULLs when NUM_NODES=1 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 16:31:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21452 to look at the new patch set (#3). Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder .. IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder With this patch IcebergDeleteBuilder checks how many probe threads are actually blocked on the builder. Let's assume the following plan: UNION ALL / \ / \ / \ SCAN allANTI JOIN datafiles / \ without / \ deletes SCAN SCAN datafilesdeletes with deletes In that case UNION ALL, and the two "SCAN datafiles" operators are in the same fragment, while the builder of the ANTI JOIN is in a different fragment. This means that "SCAN datafiles without deletes" can run in parallel with the builder. But once that SCAN is exhausted, the UNION ALL will drain rows from "SCAN datafiles with deletes" via the ANTI JOIN operator, but that operator depends on the join builder output. This means in some cases the SCAN fragments are busy, while in other cases the SCAN fragments are blocked. It depends on how much work they need to do, and how much work the build-side needs to do. So to handle all cases, we dynamically check how many probe fragments are blocked on the builder, then spin up as many threads to parellelize the final sort. This also works well when we have the following plan: ANTI JOIN / \ / \ SCAN SCAN datafilesdeletes with deletes The above plan is created when all data files have corresponding deletes, or when we are running a simple count(*) query. In that case all "SCAN datafiles" fragments are blocked on the builder, so we can use that many threads to sort the build results. A new field "ThreadCountInFinalBuild" was added, so we can check the query profile about how many threads were used for the final sorting in the builders. Measurements: In a table with 1 Trillion data records and 68.5 Billion delete records it reduced "IcebergDeletePositionSortTimer" from ~1 minute to 8-10 seconds, in an environment with 40 executors and MT_DOP=12. TODO: * e2e tests that check counter "ThreadCountInFinalBuild" Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807 --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h M be/src/exec/join-builder.cc M be/src/exec/join-builder.h 4 files changed, 105 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/21452/3 -- To view, visit http://gerrit.cloudera.org:8080/21452 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807 Gerrit-Change-Number: 21452 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21402 to look at the new patch set (#4). Change subject: IMPALA-12712: Invalidate metadata on table should set better createEventId .. IMPALA-12712: Invalidate metadata on table should set better createEventId "INVALIDATE METADATA " can be used to bring up a table in Impala's catalog cache if the table exists in HMS. Currently, createEventId for such tables are always set as -1 which will lead to always removing the table. Sequence of drop table + create table + invalidate table can lead to flaky test failures like IMPALA-12266. Solution: When Invalidate metadata is fired, fetch the latest eventId from HMS and set it as createEventId for the table, so that drop table event that happend before invalidate query will be ignored without removing the table from cache. Note: Also removed an unnecessary RPC call to HMS to get table object since we alrady have required info in table metadata rpc call. Testing: - Added an end-to-end test to verify that drop table event happened before time shouldn't remove the metadata object from cache. Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 4 files changed, 34 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/21402/4 -- To view, visit http://gerrit.cloudera.org:8080/21402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Gerrit-Change-Number: 21402 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21482 ) Change subject: IMPALA-13137: Add additional client fetch metrics columns to the queries page .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/21482/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21482/4//COMMIT_MSG@11 PS4, Line 11: - First row fetched time : - Client fetch wait time Match this with chosen label in queries.tmpl http://gerrit.cloudera.org:8080/#/c/21482/4//COMMIT_MSG@18 PS4, Line 18: fetch all rows "fetch all rows since the first row" http://gerrit.cloudera.org:8080/#/c/21482/4/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/21482/4/be/src/service/impala-http-handler.cc@463 PS4, Line 463: " fetch all rows." " fetch all rows since the first row." http://gerrit.cloudera.org:8080/#/c/21482/4/be/src/service/impala-http-handler.cc@548 PS4, Line 548: "First row fetched" Make this a constant and share it with the same occurrence in coordinator.cc runtime/coordinator.cc:1051:query_events_->MarkEvent("First row fetched"); http://gerrit.cloudera.org:8080/#/c/21482/3/www/queries.tmpl File www/queries.tmpl: http://gerrit.cloudera.org:8080/#/c/21482/3/www/queries.tmpl@84 PS3, Line 84: {{default_db}} : {{stmt_type}} > I agree. Done. Done http://gerrit.cloudera.org:8080/#/c/21482/4/www/queries.tmpl File www/queries.tmpl: http://gerrit.cloudera.org:8080/#/c/21482/4/www/queries.tmpl@117 PS4, Line 117: First row fetched : : Client fetch wait time nit: Since this column is all about Client time, I'd suggest shorter label "FirstFetch" and "FetchDuration". Does that make sense? -- To view, visit http://gerrit.cloudera.org:8080/21482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5 Gerrit-Change-Number: 21482 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 13 Jun 2024 20:59:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21504 ) Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/DataSink.java File fe/src/main/java/org/apache/impala/planner/DataSink.java: http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/DataSink.java@141 PS2, Line 141: "Processing cost of DataSink " + fragment_.getId() + ":" + getLabel() nit: could improve this string construction using a format string with Preconditions.checkState: https://guava.dev/releases/19.0/api/docs/com/google/common/base/Preconditions.html#checkState(boolean,%20java.lang.String,%20java.lang.Object...) http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@342 PS2, Line 342: long outputCardinality = Math.max(0, getCardinality()); Is there some reason we need cardinality_ to default to -1 rather than 0? -- To view, visit http://gerrit.cloudera.org:8080/21504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 Gerrit-Change-Number: 21504 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 18:35:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21455 ) Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py File tests/query_test/test_join_queries.py: http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py@218 PS7, Line 218: parquet/snap/bloc > Is there a reason for using this instead of default parquet/none? With parquet_none, the catalog_sales table is not partitioned, and the minimum MEM_LIMIT to pass admission control is 183mb. That is high enough for this test to pass with and without the improvement code. Meanwhile, in tpcds_partitioned_snap_block, both store_sales and catalog_sales are partitioned, allowing query to pass admission control with lower 149mb MEM_LIMIT. However, without the improvement code, it will fail execution in backend due to memory limit exceeded. https://issues.apache.org/jira/browse/IMPALA-13075?focusedCommentId=17848720=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17848720 I will mention this in test comment. -- To view, visit http://gerrit.cloudera.org:8080/21455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0 Gerrit-Change-Number: 21455 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 18:16:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13076 Add pstack and jstack to Impala Redhat docker images
Andrew Sherman has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/21433 ) Change subject: IMPALA-13076 Add pstack and jstack to Impala Redhat docker images .. IMPALA-13076 Add pstack and jstack to Impala Redhat docker images When the Impala docker images are deployed in production environments, it can be hard to add debugging tools at runtime. Two of the most useful diagnostic tools are jstack and pstack, which can be used to print Java and native stack traces. Install these tools into Redhat images which are the most commonly used in production. To install pstack we install gdb To install jstack we install a development jdk on top of the headless jdk. Extend the install_os_packages.sh script to add an argument to --install-debug-tools to set the level of diagnostic tools to install. The possible arguments are: none - install no extra tools basic - install pstack and jstack full - install more debugging tools. In a Centos 8.5 build, the size of a impalad_coord_exec image increased from 1.74GB to 1.85GB, as reported by ‘docker image list’. What other tools might be added? - Installing perf is tricky as in a container perf requires an installation specific to the underlying linux kernel image, which is hard to predict at build time. - Installing pprof is hard as installation seems to require compiling from sources. Clearly there are many options and we cannot install everything. TESTING Built release and debug docker images, and used jstack and pstack in a running container to print Impala's stacks. Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9 --- M docker/CMakeLists.txt M docker/impala_base/Dockerfile M docker/impala_profile_tool/Dockerfile M docker/install_os_packages.sh 4 files changed, 44 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/21433/4 -- To view, visit http://gerrit.cloudera.org:8080/21433 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9 Gerrit-Change-Number: 21433 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21452 ) Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder .. Patch Set 2: (7 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21452/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21452/2//COMMIT_MSG@33 PS2, Line 33: build fragments > aren't these 'probe' fragments that are blocked on the builder? Thanks for catching this, done. http://gerrit.cloudera.org:8080/#/c/21452/2//COMMIT_MSG@37 PS2, Line 37: The > nit: typo Done http://gerrit.cloudera.org:8080/#/c/21452/2//COMMIT_MSG@57 PS2, Line 57: lowered > reduced? Done http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.h File be/src/exec/iceberg-delete-builder.h: http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.h@147 PS2, Line 147: Status FinalizeBuildImplParallel(int num_threads); > Since this one can return error codes too, could you add a comment in what Done http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.cc@314 PS2, Line 314: IcebergDeleteBuilder::DeleteRowVector IcebergDeleteBuilder::GetFinalDeleteRowVector( > This and the other PR made me think about whther it makes sense to create a That's a good idea. However, I have another Jira ticket to throw out all of these, or at least the final delete vectors, and use RoaringBitmap: IMPALA-13109 I'm afraid I'd have to throw out half of the work with that change. So probably it would be better to do the encapsulation and refactoring for the RoaringBitmap. http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.cc@320 PS2, Line 320: int64_t capacity = 0; > indentation Done http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/join-builder.h File be/src/exec/join-builder.h: http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/join-builder.h@197 PS2, Line 197: int probes_waiting_on_builder_ = 0; > The variable name is self-explanatory, but some additional comment that is Done -- To view, visit http://gerrit.cloudera.org:8080/21452 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807 Gerrit-Change-Number: 21452 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 16:59:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21504 ) Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 Gerrit-Change-Number: 21504 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 19:55:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13076 Add pstack and jstack to Impala Redhat docker images
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21433 ) Change subject: IMPALA-13076 Add pstack and jstack to Impala Redhat docker images .. Patch Set 4: Code-Review+1 This is looking good to me, thanks for putting this together. I'll bump to +2 when it goes through precommit. -- To view, visit http://gerrit.cloudera.org:8080/21433 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9 Gerrit-Change-Number: 21433 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Thu, 13 Jun 2024 18:55:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21504 ) Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16338/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 Gerrit-Change-Number: 21504 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 19:44:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21504 ) Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/DataSink.java File fe/src/main/java/org/apache/impala/planner/DataSink.java: http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/DataSink.java@141 PS2, Line 141: "Processing cost of DataSink %s:%s is invalid! %s", fragment_.getId(), getLabel(), > nit: could improve this string construction using a format string with Prec Done http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@342 PS2, Line 342: long outputCardinality = Math.max(0, getCardinality()); > Is there some reason we need cardinality_ to default to -1 rather than 0? PlanNode starts with unknown cardinality (-1) until stats are analyzed. Unknown is represented as -1, and some planner code handle unknown cardinality with best guess estimate like in some ScanNode subclasses when stats are unavailable. https://gerrit.cloudera.org/c/21504/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#2256 -- To view, visit http://gerrit.cloudera.org:8080/21504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 Gerrit-Change-Number: 21504 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 19:23:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21435 ) Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16333/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 16:54:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21402 ) Change subject: IMPALA-12712: Invalidate metadata on table should set better createEventId .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16334/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Gerrit-Change-Number: 21402 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 17:23:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13076 Add pstack and jstack to Impala Redhat docker images
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21433 ) Change subject: IMPALA-13076 Add pstack and jstack to Impala Redhat docker images .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16337/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21433 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9 Gerrit-Change-Number: 21433 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Thu, 13 Jun 2024 19:15:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21452 ) Change subject: IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16335/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21452 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807 Gerrit-Change-Number: 21452 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 17:26:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21455 ) Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/21455/7/tests/common/test_dimensions.py File tests/common/test_dimensions.py: http://gerrit.cloudera.org:8080/#/c/21455/7/tests/common/test_dimensions.py@119 PS7, Line 119: > the create...dimension functions below could be shorter by calling this fun Done http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py File tests/query_test/test_join_queries.py: http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py@218 PS7, Line 218: > With parquet_none, the catalog_sales table is not partitioned, and the mini Done -- To view, visit http://gerrit.cloudera.org:8080/21455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0 Gerrit-Change-Number: 21455 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 18:48:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
Hello Yida Wu, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21455 to look at the new patch set (#8). Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB .. IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB ExprValuesCache uses BATCH_SIZE as a deciding factor to set its capacity. It bounds the capacity such that expr_values_array_ memory usage stays below 256KB. This patch tightens that limit to include all memory usage from ExprValuesCache::MemUsage() instead of expr_values_array_ only. Therefore, setting a very high BATCH_SIZE will not push the total memory usage of ExprValuesCache beyond 256KB. Simplify table dimension creation methods and fix few flake8 warnings in test_dimensions.py. Testing: - Add test_join_queries.py::TestExprValueCache. - Pass core tests. Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0 --- M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M bin/rat_exclude_files.txt A testdata/workloads/tpcds_partitioned/queries M tests/common/test_dimensions.py M tests/query_test/test_join_queries.py 6 files changed, 70 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/21455/8 -- To view, visit http://gerrit.cloudera.org:8080/21455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0 Gerrit-Change-Number: 21455 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21402 ) Change subject: IMPALA-12712: Invalidate metadata on table should set better createEventId .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21402/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21402/3//COMMIT_MSG@19 PS3, Line 19: event happend nit: "event that happened" -- To view, visit http://gerrit.cloudera.org:8080/21402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Gerrit-Change-Number: 21402 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 16:38:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21402 ) Change subject: IMPALA-12712: Invalidate metadata on table should set better createEventId .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21402/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21402/3//COMMIT_MSG@19 PS3, Line 19: event that hap > nit: "event that happened" Done -- To view, visit http://gerrit.cloudera.org:8080/21402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Gerrit-Change-Number: 21402 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 16:58:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21402 ) Change subject: IMPALA-12712: Invalidate metadata on table should set better createEventId .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Gerrit-Change-Number: 21402 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 16:59:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21402 ) Change subject: IMPALA-12712: Invalidate metadata on table should set better createEventId .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10710/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Gerrit-Change-Number: 21402 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 17:00:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21435 to look at the new patch set (#3). Change subject: IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder .. IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder When there are lots of delete records the IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN is blocked on the build side any improvement we make here significantly improves Iceberg V2 table scanning. Improvements of this patch: * Use a vector of vectors to collect the position delete records. This way we can avoid large re-allocations and copyings. * Insert large ranges from the build batches into the collected delete records instead of doing it one-by-one. Measurements Local measurement with 824 Million position delete records: JOIN BUILD: ~32s -> ~14s (6s is the final sorting) 40-node cluster with 68.5 Billion position delete records: JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting) Parallelization of the final sort will be added in a follow-up CR. Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf --- M be/src/exec/iceberg-delete-builder.cc M be/src/exec/iceberg-delete-builder.h 2 files changed, 108 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/3 -- To view, visit http://gerrit.cloudera.org:8080/21435 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf Gerrit-Change-Number: 21435 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21504 ) Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16332/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 Gerrit-Change-Number: 21504 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 13 Jun 2024 15:13:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12732: Add support for MERGE statements for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/21423 ) Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg tables .. Patch Set 7: (25 comments) Let me flush another batch of comments now. I'm somewhere halfway through the analyzer changes. http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup@1027 PS6, Line 1027: List cases = Lists.newArrayList(merge_case); nit: I think here we can merge these 2 rows http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java File fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java: http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@60 PS7, Line 60: public class IcebergMergeImpl extends MergeImpl { As I wrote in MergeStmt, I'm not sure I'm convinced that we need a separate abstraction for the implementation. One reason is that we don't plan to add any other implementations and even if we did, the coding should hppen the other way around: when we add the other Merge impl then as a separate commit we could do the extra level of abstracion then as a refactor. The other thing I find myself uncomfortable with is that this IcebergMergeImpl now uses all the inner members of MergeStmt, even the MergeStmt itself is passed to the constructor. For me this asks for Inheritance rather than an impl member. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@93 PS7, Line 93: if (mergeStmt_.isOnlyMatchedCases()) { : sourceTableRef_.setJoinOp(JoinOperator.INNER_JOIN); : } else { : sourceTableRef_.setJoinOp(JoinOperator.FULL_OUTER_JOIN); : } : sourceTableRef_.setOnClause(on_); : sourceTableRef_.setLeftTblRef(targetTableRef_); This could go into a setJoinParams() function http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@120 PS7, Line 120: icebergPositionalDeleteTable_ = new IcebergPositionDeleteTable( If you have only a WHEN NOT NATCHED THEN INSERT case do you still have to create a pos del table? http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@127 PS7, Line 127: for (MergeCase mergeCase : mergeStmt_.getCases()) { Since all these merge cases live inside MergeStmt, shouldn't this code live in MergeStmt.analyze()? http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@131 PS7, Line 131: analyzer.registerPrivReq( Here we require ALL privs on the target table. Shouldn't we also expect SELECT privs on the source table? http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@138 PS7, Line 138: private void addMergeActionTuple(Analyzer analyzer) { Function comment pls http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@140 PS7, Line 140: analyzer.getDescTbl().createTupleDescriptor(MERGE_ACTION_TUPLE_NAME); What is a merge action and why is a tuple needed for it? Could you comment it on the class? http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@153 PS7, Line 153: public void substituteResultExprs(ExprSubstitutionMap smap, It stinks for me that MergeStmt also has a fn with similar name but here we inherit this not from there but from another interface, MergeImpl. For me this also indicates that this class should be a child of MergeStmt http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@186 PS7, Line 186: public PlanNode getPlanNode(PlannerContext ctx, PlanNode parent, Creating new PlanNodes should be the responsibility of the Planner, in my opinion. An Analysis class in the Analysis package shouldn't know how a corresponding PlanNode is created. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@204 PS7, Line 204: public DataSink createDataSink() { I think this should be an override. Also true to some other functions in this class. http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java File fe/src/main/java/org/apache/impala/analysis/MergeImpl.java: http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java@20 PS7, Line 20: public abstract QueryStmt prepareQuery(); Since this serves as an interface for the Merge implementations
[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
Hello Quanlong Huang, Abhishek Rawat, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21504 to look at the new patch set (#2). Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost .. IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost TOP-N cost will turn into NaN if inputCardinality is equal to 0 due to Math.log(inputCardinality). This patch fix the issue by avoiding Math.log(0) and replace it with 0 instead. After this patch, Instantiating BaseProcessingCost with NaN, infinite, or negative totalCost will throw IllegalArgumentException. In BaseProcessingCost.getDetails(), "total-cost" is renamed to "raw-cost" to avoid confusion with "cost-total" in ProcessingCost.getDetails(). Testing: - Add testcase that run TOP-N query over empty table. - Compute ProcessingCost in most FE and EE test even when COMPUTE_PROCESSING_COST option is not enabled by checking if RuntimeEnv.INSTANCE.isTestEnv() is True or TEST_REPLAN option is enabled. - Pass core test. Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 --- M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43-verbose.test 25 files changed, 207 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/21504/2 -- To view, visit http://gerrit.cloudera.org:8080/21504 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 Gerrit-Change-Number: 21504 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21502 ) Change subject: IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16331/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99 Gerrit-Change-Number: 21502 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 13:26:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21502 ) Change subject: IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy .. Patch Set 4: (4 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc File be/src/runtime/string-value-test.cc: http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc@81 PS3, Line 81: , cons > Could take const ref. Done http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc@81 PS3, Line 81: Small > Nit: "...Smaller..." would be better. Done http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value.h File be/src/runtime/string-value.h: http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value.h@73 PS3, Line 73: DCHECK_LE(source.Len(), SmallableString::SMALL_LIMIT); > Is there a reason why we DCHECK it instead of simply returning a non-smalli To ensure that the user of this function can do anything with the returned object. We could only ensure this for non-small string values if we deep copied them, and that would require a mem pool parameter as well. http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/tuple.cc File be/src/runtime/tuple.cc: http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/tuple.cc@112 PS3, Line 112: , > Nit: comma instead of period. Done -- To view, visit http://gerrit.cloudera.org:8080/21502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99 Gerrit-Change-Number: 21502 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 13:14:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21501 ) Change subject: IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom() .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16330/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21501 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067 Gerrit-Change-Number: 21501 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 13 Jun 2024 13:16:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21502 ) Change subject: IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10709/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99 Gerrit-Change-Number: 21502 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 13:14:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()
Daniel Becker has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/21501 ) Change subject: IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom() .. IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom() In StringVal::CopyFrom(), we take the 'len' parameter as a size_t, which is usually a 64-bit unsigned integer. We pass it to the constructor of StringVal, which takes it as an int, which is usually a 32-bit signed integer. The constructor then allocates memory for the length using the int value, but afterwards in CopyFrom(), we copy the buffer with the size_t length. If size_t is indeed 64 bits and int is 32 bits, and the value is truncated, we may copy more bytes that what we have allocated for the destination. Note that in the constructor of StringVal it is checked whether the length is greater than 1GB, but if the value is truncated because of the type conversion, the check doesn't necessarily catch it as the truncated value may be small. This change fixes the problem by doing the length check with 64 bit integers in StringVal::CopyFrom(). Testing: - added unit tests for StringVal::CopyFrom() in udf-test.cc. Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067 --- M be/src/udf/udf-test.cc M be/src/udf/udf.cc M be/src/udf/udf.h 3 files changed, 90 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/21501/2 -- To view, visit http://gerrit.cloudera.org:8080/21501 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067 Gerrit-Change-Number: 21501 Gerrit-PatchSet: 2 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12732: Add support for MERGE statements for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21423 ) Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg tables .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/16329/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/21423 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268 Gerrit-Change-Number: 21423 Gerrit-PatchSet: 7 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 12:40:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21455 ) Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB .. Patch Set 7: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/21455/7/tests/common/test_dimensions.py File tests/common/test_dimensions.py: http://gerrit.cloudera.org:8080/#/c/21455/7/tests/common/test_dimensions.py@119 PS7, Line 119: create_table_format_dimension the create...dimension functions below could be shorter by calling this functions http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py File tests/query_test/test_join_queries.py: http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py@218 PS7, Line 218: parquet/snap/bloc Is there a reason for using this instead of default parquet/none? -- To view, visit http://gerrit.cloudera.org:8080/21455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0 Gerrit-Change-Number: 21455 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Yida Wu Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 12:31:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12732: Add support for MERGE statements for Iceberg tables
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/21423 ) Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg tables .. Patch Set 7: (63 comments) Thank you Zoltan and Gabor! I covered most of the comments, some of them are still open. If we could resolve those, I'll split up this change into multiple pieces to make reviewing easier. http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@7 PS6, Line 7: IMPALA-12732: Add support for MERGE statements for Iceberg tables > I know MERGE is a known SQL functionality but I'd find it useful to add a f Done http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@11 PS6, Line 11: ource table. This change adds MERGE > Would you mind adding some examples? Done http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@24 PS6, Line 24: > nit: maybe 'phase' is a better word here Done http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@47 PS6, Line 47: > We should also have authorization tests. Also for fine-grained authz, e.g. Done http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h File be/src/exec/iceberg-merge-node.h: http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h@18 PS6, Line 18: #pragma once : > nit: we prefer #pragma once Done http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h@43 PS6, Line 43: /// for merge action and for the target table. > Please add class comment Done http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h@101 PS6, Line 101: /// added to the > We prefer std::unique_ptr Done http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h@158 PS6, Line 158: se(con > no need for 'inline', member-functions defined inside class definitions are Done http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc File be/src/exec/iceberg-merge-node.cc: http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@165 PS6, Line 165: os > nit: too much indent Done http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@178 PS6, Line 178: auto row = iter.Ge > I'm not sure if we want to reset last_row_ each time EvaluateCases() is inv Removed http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@186 PS6, Line 186: last_row_ = row; : > Can we also output the duplicated row with PrintTuple()? It's a tricky situation; it's easy to print the target table's tuple, but it also contains the virtual columns, so for small tables, the path will be the dominant part of the error message; I think it's acceptable. To print the source tuples, we have to consider cases where the source row consists of multiple tuples, so first we have to distinguish the source tuples or the tuple that is used for the join, and then print them. There's another quite simple option; using PrintRow. What do you think? Examples: 1. ERROR: Duplicate row found: one target table row matched more than one source row. Target row: (3 3 3 hdfs://localhost:20500/test-warehouse/target_part/data/514806418e9554cc-654b33170005_831375517_data.0.parq 3) 2. ERROR: Duplicate row found: one target table row matched more than one source row. Target row: (4 4 4 hdfs://localhost:20500/test-warehouse/target_part/data/514806418e9554cc-654b33170005_831375517_data.0.parq 4), Source row: (4 4 4) (4 4 4) (4 4 4) (50) 3. ERROR: Duplicate row found: one target table row matched more than one source row. Affected row: [(4 4 4 hdfs://localhost:20500/test-warehouse/target_part/data/514806418e9554cc-654b33170005_831375517_data.0.parq 0) (4 4 4) (4 4 4) (4 4 4) (50)] // The source table ref here are 4 joined tables http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@209 PS6, Line 209: if (ReachedLimit() | > nit: to reduce nesting, we could have Done http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@210 PS6, Line 210: } : return Status::OK(); : } : : void IcebergMergeNode::AddRow( : RowBatch* output_batch, IcebergMergeCase* merge_case, TupleRow* row) { : int dst_row_idx = output_batch->AddRow(); : TupleRow* dst_row = output_batch->GetRow(dst_row_idx); : : auto* target_tuple = : Tuple::Create(row_descriptor_.tuple_descriptors()[target_tuple_idx_]->byte_size(), : output_batch->tuple_data_pool()); : auto* merge_action_tuple = Tuple::Create( : row_descriptor_.tuple_descriptors()[merge_action_tuple_idx_]->byte_size(), : output_batch->tuple_data_pool()); :
[Impala-ASF-CR] IMPALA-12732: Add support for MERGE statements for Iceberg tables
Peter Rozsa has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/21423 ) Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg tables .. IMPALA-12732: Add support for MERGE statements for Iceberg tables MERGE statement is a DML command that allows users to perform conditional insert, update, or delete operations on a target table based on the results of a join with a source table. This change adds MERGE statement parsing and an Iceberg-specific semantic analysis, planning, and execution. The parsing grammar follows the SQL standard, it accepts the same syntax as Hive, Spark, and Trino by supporting arbitrary number of WHEN clauses, with conditions or without and accepting inline views as source. Example: 'MERGE INTO target t USING source s ON t.id = s.id WHEN MATCHED AND t.id < 100 THEN UPDATE SET column1 = s.column1 WHEN MATCHED AND t.id > 100 THEN DELETE WHEN MATCHED THEN UPDATE SET column1 = "value" WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.column1);' The Iceberg-specific analysis, planning, and execution are based on a concept that was previously used for UPDATE: The analyzer creates a SELECT statement with all target and source columns (including Iceberg's virtual columns) and a 'row_present' column that defines whether the source, the target, or both rows are present in the result set after joining the two table references by the ON clause. The join condition should be an equi-join, as it is a FULL OUTER JOIN, and Impala currently supports only equi-joins in this case. The joining order is forced by a query hint, this guarantees that the target table is always on the left side. A new, IcebergMergeNode is added at planning level, this node does the row-level filtering for each MATCHED/ NOT MATCHED cases. The 'row_present' column decides which case group will be evaluated; if both sides are available, the matched cases, if only the source side matches then the not matched cases and their filter expressions will be evaluated over the row. If one of the cases match, then the execution evaluates the result expressions into the output row batch, and an auxiliary tuple will store the merge action. The merge action is a flag for the newly added IcebergMergeSink; this sink will route each incoming row from IcebergMergeNode to their respective destination. Each row could go to the delete sink, insert sink, or to both sinks. Target-side duplicate records are filtered during IcebergMergeNode's execution, if one target table-side duplicate is detected, the whole statement's execution is stopped and the error is reported back to the user. Added tests: - Parser tests - Analyzer tests - Unit test for WHEN NOT MATCHED INSERT column collation - Planner tests for partitioned/sorted cases - Authorization tests - E2E tests Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268 --- M be/src/exec/CMakeLists.txt M be/src/exec/data-sink.cc M be/src/exec/exec-node.cc A be/src/exec/iceberg-merge-node.cc A be/src/exec/iceberg-merge-node.h A be/src/exec/iceberg-merge-sink.cc A be/src/exec/iceberg-merge-sink.h M be/src/service/client-request-state.cc M common/thrift/DataSinks.thrift M common/thrift/PlanNodes.thrift M common/thrift/Types.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java A fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/KuduModifyImpl.java A fe/src/main/java/org/apache/impala/analysis/MergeCase.java A fe/src/main/java/org/apache/impala/analysis/MergeDelete.java A fe/src/main/java/org/apache/impala/analysis/MergeImpl.java A fe/src/main/java/org/apache/impala/analysis/MergeInsert.java A fe/src/main/java/org/apache/impala/analysis/MergeStmt.java A fe/src/main/java/org/apache/impala/analysis/MergeUpdate.java M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java A fe/src/main/java/org/apache/impala/planner/IcebergMergeNode.java A fe/src/main/java/org/apache/impala/planner/IcebergMergeSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeModifyStmtsTest.java A fe/src/test/java/org/apache/impala/analysis/MergeInsertTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M
[Impala-ASF-CR] IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21502 ) Change subject: IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy .. Patch Set 3: (4 comments) Thanks Zoltán! http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc File be/src/runtime/string-value-test.cc: http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc@81 PS3, Line 81: Small Nit: "...Smaller..." would be better. http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc@81 PS3, Line 81: string Could take const ref. http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value.h File be/src/runtime/string-value.h: http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value.h@73 PS3, Line 73: DCHECK_LE(source.Len(), SmallableString::SMALL_LIMIT); Is there a reason why we DCHECK it instead of simply returning a non-smallified string if the length is too large? http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/tuple.cc File be/src/runtime/tuple.cc: http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/tuple.cc@112 PS3, Line 112: . Nit: comma instead of period. -- To view, visit http://gerrit.cloudera.org:8080/21502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99 Gerrit-Change-Number: 21502 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 13 Jun 2024 11:21:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/21501 ) Change subject: IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom() .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21501/1/be/src/udf/udf-test.cc File be/src/udf/udf-test.cc: http://gerrit.cloudera.org:8080/#/c/21501/1/be/src/udf/udf-test.cc@340 PS1, Line 340: dummpy typo: dummy http://gerrit.cloudera.org:8080/#/c/21501/1/be/src/udf/udf-test.cc@344 PS1, Line 344: } nit: could be one line -- To view, visit http://gerrit.cloudera.org:8080/21501 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067 Gerrit-Change-Number: 21501 Gerrit-PatchSet: 1 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 13 Jun 2024 11:04:27 + Gerrit-HasComments: Yes