[Impala-ASF-CR] IMPALA-11992: Support setting query options in Hive JDBC's connection URL
Xiang Yang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19612 )

Change subject: IMPALA-11992: Support setting query options in Hive JDBC's connection URL
..

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19612/7/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/19612/7/be/src/service/impala-hs2-server.cc@370
PS7, Line 370: rfind
> Why don't we just use 'find'? As we expect it at the beginning of the strin
Yes, it must be a lowercase prefix, judging by this Hive JDBC code:
https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1110

--
To view, visit http://gerrit.cloudera.org:8080/19612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie184a0c2404f36a3ee28296336f6545615a5c6ca
Gerrit-Change-Number: 19612
Gerrit-PatchSet: 7
Gerrit-Owner: Xiang Yang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Xiang Yang
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 30 Mar 2023 02:53:29 +
Gerrit-HasComments: Yes
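For readers following the `rfind` versus `find` exchange: in C++, `s.rfind(prefix, 0) == 0` examines only position 0, whereas `s.find(prefix) == 0` gives the same answer but may scan the rest of the string on a miss. The sketch below shows the same anchored, case-sensitive prefix test in Java using `String.startsWith`. The class name, method name, and the prefix value `set:hiveconf:` are assumptions made for illustration; the thread only states that the key carries a lowercase prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PrefixedOptionsSketch {
  // Assumed prefix for illustration; the thread only says it is a lowercase prefix.
  static final String PREFIX = "set:hiveconf:";

  /** Returns the entries whose key starts with PREFIX, with the prefix stripped. */
  static Map<String, String> extractQueryOptions(Map<String, String> sessionConf) {
    Map<String, String> options = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : sessionConf.entrySet()) {
      String key = entry.getKey();
      // Case-sensitive test anchored at index 0; a match elsewhere in the key is ignored.
      if (key.startsWith(PREFIX)) {
        options.put(key.substring(PREFIX.length()), entry.getValue());
      }
    }
    return options;
  }
}
```

Keys that contain the prefix in the middle, or that spell it in upper case, are skipped, which matches the "must be a lowercase prefix" requirement in the reply.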
[Impala-ASF-CR] IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19651 )

Change subject: IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction
..

Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19651/3/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

http://gerrit.cloudera.org:8080/#/c/19651/3/docs/topics/impala_kudu.xml@1438
PS3, Line 1438: This capability provides the ability to
With multi-row transactions, you can atomically ingest a large number of rows into a Kudu table with an INSERT-SELECT or CTAS statement.

http://gerrit.cloudera.org:8080/#/c/19651/3/docs/topics/impala_kudu.xml@1439
PS3, Line 1439:
: Atomically do a bulk ingest. This allows to atomically ingest large number of rows
: into a Kudu table with INSERT-SELECT or CTAS statement.
:
Remove this section.

--
To view, visit http://gerrit.cloudera.org:8080/19651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic226679d83d7221f843994ead11cb2bc9e971882
Gerrit-Change-Number: 19651
Gerrit-PatchSet: 3
Gerrit-Owner: Shajini Thayasingh
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Shajini Thayasingh
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Comment-Date: Thu, 30 Mar 2023 02:18:09 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@9
PS4, Line 9: In multiple executor group set setup
nit: In a setup with multiple executor group sets

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@11
PS4, Line 11: kind
nit: kinds

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@18
PS4, Line 18: relax
nit: relaxes

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@21
PS4, Line 21: we replan once again with that executor group set but with
: scan fragment parallelism returned back to MT_DOP.
Does this mean we need one more replan for any query plan with a scan fragment? What is the overhead for small queries?

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@27
PS4, Line 27: fragment
nit: fragments

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1061
PS4, Line 1061: fixLeafNodes
Here and in other places, should we name it fixedLeafNodes?

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java@361
PS4, Line 361: < 0
This should be <= 0. Otherwise, if getInputCardinality() returns 0, syntheticCardinality is 0, which leads to a division by zero.

-- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 02:06:30 + Gerrit-HasComments: Yes
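The last comment in the review above (`< 0` versus `<= 0` in ScanNode.java) can be illustrated with a minimal sketch. It is not the code under review: the class, the method `costPerRow`, and the fallback value of 1 are hypothetical, and it assumes a negative cardinality signals "unknown". Only the variable name `syntheticCardinality` and the comparison come from the comment.

```java
final class ScanCardinalityGuardSketch {
  /**
   * Hypothetical helper: average processing cost per input row.
   * inputCardinality may be negative (assumed to mean "unknown") or 0 (empty input).
   */
  static long costPerRow(long totalCost, long inputCardinality) {
    // With "< 0" a cardinality of exactly 0 would pass through and the
    // division below would throw ArithmeticException. "<= 0" covers both cases.
    long syntheticCardinality = inputCardinality <= 0 ? 1 : inputCardinality;
    return totalCost / syntheticCardinality;
  }
}
```

With the stricter `<= 0` test, an empty input and an unknown cardinality both fall back to the same synthetic value instead of reaching the division with a zero divisor.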
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19663 ) Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/19663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9 Gerrit-Change-Number: 19663 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 00:07:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12714/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 23:11:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19663 ) Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12713/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9 Gerrit-Change-Number: 19663 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 23:11:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19663

to look at the new patch set (#3).

Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment
..

IMPALA-12032: Fix min parallelism bug in PlanFragment

PROCESSING_COST_MIN_THREADS is supposed to be a lower bound on per-node parallelism when the CPU costing algorithm adjusts fragment parallelism across executor group sets. But PlanFragment.adjustToMaxParallelism() did not take it into account during adjustment. This patch fixes that bug by raising per-node fragment parallelism to PROCESSING_COST_MIN_THREADS if cost-based parallelism comes up with a lower number.

Testing:
- Set PROCESSING_COST_MIN_THREADS in PlannerTest.testProcessingCost.
- Pass test_executor_groups.py.
- Add test cases in query-options-test.cc.

Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9
---
M be/src/service/query-options-test.cc
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
5 files changed, 451 insertions(+), 430 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/19663/3

--
To view, visit http://gerrit.cloudera.org:8080/19663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9
Gerrit-Change-Number: 19663
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Andrew Sherman
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
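The lower-bound behaviour described in this commit message amounts to clamping a value into a range. The following is a minimal sketch of that idea only, not the body of PlanFragment.adjustToMaxParallelism(); the class, method, and parameter names are hypothetical.

```java
final class MinParallelismSketch {
  /**
   * Hypothetical per-node adjustment: the cost-based value is raised to
   * minThreads when it is lower, and never exceeds maxThreads.
   * If minThreads > maxThreads, the upper bound wins in this sketch.
   */
  static int adjustPerNodeParallelism(int costBasedParallelism, int minThreads, int maxThreads) {
    int atLeastMin = Math.max(minThreads, costBasedParallelism);
    return Math.min(maxThreads, atLeastMin);
  }
}
```

For example, with a minimum of 2 and a maximum of 8, a cost-based result of 1 becomes 2, a result of 5 is left alone, and a result of 20 is reduced to 8.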
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#4).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In multiple executor group set setup, Frontend will try to match a query with the smallest executor group set that can fit the memory and cpu requirement of the compiled query. There are kind of query where the compiled plan will fit to any executor group set but not necessarily deliver the best performance. An example for this is Impala's COMPUTE STATS query. It does full table scan and aggregate the stats, have fairly simple query plan shape, but can benefit from higher scan parallelism.

This patch relax the scan fragment parallelism on first round of query planning. This allows scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fit in an executor group set, we replan once again with that executor group set but with scan fragment parallelism returned back to MT_DOP. We can remove the replanning in the future once we can fully manage scan node parallelism without MT_DOP.

This patch also tune computeScanProcessingCost() to guard against scheduling too many scan fragment by comparing with the actual scan range count that Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
10 files changed, 177 insertions(+), 49 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/4

--
To view, visit http://gerrit.cloudera.org:8080/19656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Andrew Sherman
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
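The two-pass flow in this commit message (plan with relaxed scan parallelism, pick an executor group set, then replan with scan parallelism back at MT_DOP) can be modelled roughly as below. This is a heavily simplified sketch under stated assumptions, not Impala's planner: every name is hypothetical, the CPU requirement is reduced to the relaxed scan parallelism alone, group sets are represented only by an ascending array of core counts, and falling back to the largest set when nothing fits is an assumption.

```java
final class RelaxedScanPlanningSketch {
  /**
   * First pass: cost-based scan parallelism, bounded by the number of scan
   * ranges (no point in more instances than ranges) and by maxThreads.
   * minCostPerThread must be positive.
   */
  static int relaxedScanParallelism(long scanCost, long minCostPerThread,
      int scanRangeCount, int maxThreads) {
    long byCost = (scanCost + minCostPerThread - 1) / minCostPerThread; // ceiling division
    long bounded = Math.min(Math.min(byCost, scanRangeCount), maxThreads);
    return (int) Math.max(1, bounded);
  }

  /** Index of the smallest group set (ascending core counts) that fits, else the largest. */
  static int pickGroupSet(int[] coresPerGroupSet, int requiredCores) {
    for (int i = 0; i < coresPerGroupSet.length; i++) {
      if (requiredCores <= coresPerGroupSet[i]) return i;
    }
    return coresPerGroupSet.length - 1;
  }

  /** Returns {chosenGroupSetIndex, finalScanParallelism}. */
  static int[] plan(long scanCost, long minCostPerThread, int scanRangeCount,
      int maxThreads, int mtDop, int[] coresPerGroupSet) {
    int relaxed = relaxedScanParallelism(scanCost, minCostPerThread, scanRangeCount, maxThreads);
    int chosen = pickGroupSet(coresPerGroupSet, relaxed);
    // Second pass: replan on the chosen group set with scan parallelism back at MT_DOP.
    return new int[] {chosen, mtDop};
  }
}
```

In this model the relaxed first pass only influences which group set is selected; the plan that finally runs still uses MT_DOP for scan fragments, which is why the message describes the replan as removable once scan parallelism no longer depends on MT_DOP.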
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19663 ) Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12711/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9 Gerrit-Change-Number: 19663 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 21:29:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/12712/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 21:20:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1078
PS2, Line 1078: {
: // TODO: Fragment with UnionNode but without ScanNode should have it
> Comment here and other places should say leaf node instead of scan node to
Done

http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py@955
PS2, Line 955: "Test processing cost with min_processing_per_thread s
> Misplaced comment.
Done

--
To view, visit http://gerrit.cloudera.org:8080/19656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Andrew Sherman
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Comment-Date: Wed, 29 Mar 2023 21:10:28 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19663 ) Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/19663/2/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/19663/2/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@203 PS2, Line 203: + "maxParallelism={} newParallelism={} consumerCost={} consumerInstCount={} " line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9 Gerrit-Change-Number: 19663 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 21:08:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19663

to look at the new patch set (#2).

Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment
..

IMPALA-12032: Fix min parallelism bug in PlanFragment

PROCESSING_COST_MIN_THREADS is supposed to be a lower bound on per-node parallelism when the CPU costing algorithm adjusts fragment parallelism across executor group sets. But PlanFragment.adjustToMaxParallelism() did not take it into account during adjustment. This patch fixes that bug by raising per-node fragment parallelism to PROCESSING_COST_MIN_THREADS if cost-based parallelism comes up with a lower number.

Testing:
- Set PROCESSING_COST_MIN_THREADS in PlannerTest.testProcessingCost.
- Pass test_executor_groups.py.
- Add test cases in query-options-test.cc.

Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9
---
M be/src/service/query-options-test.cc
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
5 files changed, 451 insertions(+), 430 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/19663/2

--
To view, visit http://gerrit.cloudera.org:8080/19663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9
Gerrit-Change-Number: 19663
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Andrew Sherman
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#3).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In multiple executor group set setup, Frontend will try to match a query with the smallest executor group set that can fit the memory and cpu requirement of the compiled query. There are kind of query where the compiled plan will fit to any executor group set but not necessarily deliver the best performance. An example for this is Impala's COMPUTE STATS query. It does full table scan and aggregate the stats, have fairly simple query plan shape, but can benefit from higher scan parallelism.

This patch relax the scan fragment parallelism on first round of query planning. This allows scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fit in an executor group set, we replan once again with that executor group set but with scan fragment parallelism returned back to MT_DOP. We can remove the replanning in the future once we can fully manage scan node parallelism without MT_DOP.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
6 files changed, 125 insertions(+), 31 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/3

--
To view, visit http://gerrit.cloudera.org:8080/19656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Andrew Sherman
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19483 )

Change subject: IMPALA-11908: Parser change for Iceberg metadata querying
..

IMPALA-11908: Parser change for Iceberg metadata querying

This change extends parsing table references with Iceberg metadata tables. The TableName class has been extended with an extra vTbl field which is filled when a virtual table reference is suspected. This additional field helps to keep the real table in the statement table cache next to the virtual table, which should be loaded so Iceberg metadata tables can be created.

Iceberg provides a rich API to query metadata, these Iceberg API tables are accessible through the MetadataTableUtils class. Using these table schemas it is possible to create an Impala table that can be queried later on.

Querying a metadata table at this point is expected to throw a NotImplementedException.

Testing:
- Added E2E test to test it for some tables.

Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b
Reviewed-on: http://gerrit.cloudera.org:8080/19483
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
A fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M tests/query_test/test_iceberg.py
14 files changed, 423 insertions(+), 36 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/19483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b
Gerrit-Change-Number: 19483
Gerrit-PatchSet: 13
Gerrit-Owner: Tamas Mate
Gerrit-Reviewer: Anonymous Coward
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Gergely Fürnstáhl
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Noemi Pap-Takacs
Gerrit-Reviewer: Tamas Mate
Gerrit-Reviewer: Zoltan Borok-Nagy
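To make the "extra vTbl field" in this commit message concrete, here is a small sketch of a table name that optionally carries a virtual (metadata) table part. It is an illustration only and not Impala's TableName class: the class, the `parse` method, the dotted `db.tbl.metadata_table` spelling, and the short list of metadata table names are assumptions. Real name resolution is more involved, since a third path element can also be something other than a metadata table, which is presumably why the message says a virtual table reference is "suspected".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class VirtualTableNameSketch {
  // Illustrative subset of Iceberg metadata table names; not an exhaustive list.
  static final Set<String> METADATA_TABLES =
      new HashSet<>(Arrays.asList("snapshots", "history", "files", "manifests"));

  final String db;
  final String tbl;
  final String vTbl; // null for an ordinary table reference

  VirtualTableNameSketch(String db, String tbl, String vTbl) {
    this.db = db;
    this.tbl = tbl;
    this.vTbl = vTbl;
  }

  /** Parses "db.tbl" or "db.tbl.metadata_table" (hypothetical syntax for this sketch). */
  static VirtualTableNameSketch parse(String path) {
    String[] parts = path.split("\\.");
    if (parts.length == 2) {
      return new VirtualTableNameSketch(parts[0], parts[1], null);
    }
    if (parts.length == 3) {
      String last = parts[2].toLowerCase(Locale.ROOT);
      if (METADATA_TABLES.contains(last)) {
        // Keep the real table (db, tbl) alongside the virtual table name.
        return new VirtualTableNameSketch(parts[0], parts[1], last);
      }
    }
    throw new IllegalArgumentException("Unsupported table path in this sketch: " + path);
  }
}
```

Keeping `db` and `tbl` next to `vTbl` mirrors the stated motivation: the real table still has to be loaded before a metadata table can be built on top of it.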
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 12: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 12 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 20:53:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19651 ) Change subject: IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction .. Patch Set 3: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/718/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/19651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic226679d83d7221f843994ead11cb2bc9e971882 Gerrit-Change-Number: 19651 Gerrit-PatchSet: 3 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shajini Thayasingh Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 20:50:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction
Hello Alexey Serbin, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19651 to look at the new patch set (#3). Change subject: IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction .. IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction Clarified some questions that were raised as comments. Incorporated some minor comments. Documented the support for Kudu's multi-rows transaction. Change-Id: Ic226679d83d7221f843994ead11cb2bc9e971882 --- M docs/topics/impala_kudu.xml 1 file changed, 86 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/19651/3 -- To view, visit http://gerrit.cloudera.org:8080/19651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic226679d83d7221f843994ead11cb2bc9e971882 Gerrit-Change-Number: 19651 Gerrit-PatchSet: 3 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shajini Thayasingh Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19651 ) Change subject: IMPALA-11985: [DOCS] Support for Kudu's multi-rows transaction .. Patch Set 3: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/718/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/19651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic226679d83d7221f843994ead11cb2bc9e971882 Gerrit-Change-Number: 19651 Gerrit-PatchSet: 3 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shajini Thayasingh Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 20:41:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19663 ) Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12710/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9 Gerrit-Change-Number: 19663 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 20:14:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19663 ) Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/19663/1/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/19663/1/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@203 PS1, Line 203: + "maxParallelism={} newParallelism={} consumerCost={} consumerInstCount={} " line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/19663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9 Gerrit-Change-Number: 19663 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 19:55:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12032: Fix min parallelism bug in PlanFragment
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19663 ) Change subject: IMPALA-12032: Fix min parallelism bug in PlanFragment .. IMPALA-12032: Fix min parallelism bug in PlanFragment PROCESSING_COST_MIN_THREADS is supposed to be a lower bound on per-node parallelism when the CPU costing algorithm adjusts fragment parallelism across executor group sets. But PlanFragment.adjustToMaxParallelism() did not take it into account during adjustment. This patch fixes that bug by raising per-node fragment parallelism to PROCESSING_COST_MIN_THREADS if cost-based parallelism comes up with a lower number. Testing: - Set PROCESSING_COST_MIN_THREADS in PlannerTest.testProcessingCost. - Pass test_executor_groups.py Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9 --- M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test 4 files changed, 450 insertions(+), 430 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/19663/1 -- To view, visit http://gerrit.cloudera.org:8080/19663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6e58d5d54f60818c003f488b1681b8660552f1e9 Gerrit-Change-Number: 19663 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
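The lower-bound rule described in the commit message above can be sketched roughly as follows. This is only an illustration, not the code in PlanFragment.adjustToMaxParallelism(): the function name AdjustPerNodeParallelism and its parameters are hypothetical, and the real planner logic (which is Java) takes more inputs than shown here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of Impala: returns the per-node parallelism to
// use for a fragment. 'cost_based' is the value proposed by the cost model and
// 'min_threads' stands in for the PROCESSING_COST_MIN_THREADS query option.
int AdjustPerNodeParallelism(int cost_based, int min_threads) {
  assert(min_threads >= 1);  // Assumption: the option is a positive thread count.
  // Never go below the configured floor, even if the cost model asks for less.
  return std::max(cost_based, min_threads);
}
```

With a floor of 2, a cost-based proposal of 1 would be raised to 2, while a proposal of 4 would be left unchanged.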
[Impala-ASF-CR] IMPALA-12031: Add security-related HTTP headers
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19661 ) Change subject: IMPALA-12031: Add security-related HTTP headers .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12709/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19661 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I58e12961e7faa31f42bc2e6bd4de23b56e3dfd5f Gerrit-Change-Number: 19661 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Wed, 29 Mar 2023 19:25:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12031: Add security-related HTTP headers
Michael Smith has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19661 ) Change subject: IMPALA-12031: Add security-related HTTP headers .. IMPALA-12031: Add security-related HTTP headers These are primarily no-ops as Impala never serves HTTP and HTTPS at the same time, and does not provide any way to upload files. Testing: - manually interacted with web UI over HTTP and HTTPS, verified headers in responses - ran webserver-test Change-Id: I58e12961e7faa31f42bc2e6bd4de23b56e3dfd5f --- M be/src/util/webserver-test.cc M be/src/util/webserver.cc 2 files changed, 32 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/19661/1 -- To view, visit http://gerrit.cloudera.org:8080/19661 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I58e12961e7faa31f42bc2e6bd4de23b56e3dfd5f Gerrit-Change-Number: 19661 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 12: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9184/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 12 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 15:35:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 12 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 15:35:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19654 ) Change subject: IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12708/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19654 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67edc3d04738306fed0d4ebc5312f3d8d4f14254 Gerrit-Change-Number: 19654 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 15:15:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19654 ) Change subject: IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/19654/3/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/19654/3/be/src/exec/hdfs-table-sink.cc@516 PS3, Line 516: url_encoded_partition_name != nullptr); : DCHECK(external_partition_name != null > nit: can you add separate DCHECKs, so if we hit the DCHECK we'll know what Done http://gerrit.cloudera.org:8080/#/c/19654/3/be/src/exec/output-partition.h File be/src/exec/output-partition.h: http://gerrit.cloudera.org:8080/#/c/19654/3/be/src/exec/output-partition.h@70 PS3, Line 70: keys > nit: keys and values Done http://gerrit.cloudera.org:8080/#/c/19654/3/common/fbs/IcebergObjects.fbs File common/fbs/IcebergObjects.fbs: http://gerrit.cloudera.org:8080/#/c/19654/3/common/fbs/IcebergObjects.fbs@68 PS3, Line 68: raw_partition_f > nit: raw_partition_fields? or raw_partition_key_values? Done http://gerrit.cloudera.org:8080/#/c/19654/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test: http://gerrit.cloudera.org:8080/#/c/19654/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test@694 PS3, Line 694: RUNTIME_PROFILE > Could you please add RUNTIME_PROFILE to check partition pruning? 
Done -- To view, visit http://gerrit.cloudera.org:8080/19654 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67edc3d04738306fed0d4ebc5312f3d8d4f14254 Gerrit-Change-Number: 19654 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 14:55:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables
Hello Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19654 to look at the new patch set (#4). Change subject: IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables .. IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables There is a bug when an Iceberg table has a string partition column and Impala inserts special characters into this column that need to be URL encoded. In this case the partition name is URL encoded so as not to confuse the file paths for that partition. E.g. the 'b=1/2' value is converted to 'b=1%2F2'. This is fine for path creation; however, for Iceberg tables the same URL-encoded partition name is saved into the catalog as the partition name that is also used for Iceberg column stats. This leads to incorrect results when querying the table, as the URL-encoded values are returned in a SELECT * query instead of what the user inserted. Additionally, when adding a filter to the query, Iceberg will filter out all the rows because it compares the non-encoded values to the URL-encoded values. Testing: - Added new tests to iceberg-partitioned-insert.test to cover this scenario. - Re-ran the existing test suite. 
Change-Id: I67edc3d04738306fed0d4ebc5312f3d8d4f14254 --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/output-partition.h M be/src/runtime/dml-exec-state.cc M common/fbs/IcebergObjects.fbs M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test 8 files changed, 204 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/19654/4 -- To view, visit http://gerrit.cloudera.org:8080/19654 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I67edc3d04738306fed0d4ebc5312f3d8d4f14254 Gerrit-Change-Number: 19654 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
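To make the encoding issue in the commit message above concrete, here is a small sketch of percent-encoding a partition value while keeping the raw value alongside it. Everything in it is an assumption made for illustration: PercentEncode, PartitionNames and MakePartitionNames are hypothetical names, and the set of characters left unencoded (RFC 3986 unreserved characters) may well differ from what Impala actually uses for partition paths.

```cpp
#include <cctype>
#include <string>

// Hypothetical encoder: percent-encodes every byte except RFC 3986 unreserved
// characters. Impala's real escaping rules may use a different character set.
std::string PercentEncode(const std::string& raw) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : raw) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0x0F]);
    }
  }
  return out;
}

// Hypothetical pair of names for one partition: the encoded form is safe to use
// in a file path, the raw form is what the user inserted.
struct PartitionNames {
  std::string path_name;  // e.g. "b=1%2F2", for directory creation
  std::string raw_value;  // e.g. "1/2", for catalog metadata and filtering
};

PartitionNames MakePartitionNames(const std::string& key, const std::string& value) {
  return PartitionNames{key + "=" + PercentEncode(value), value};
}
```

Carrying both forms is the general idea the fix relies on: the encoded name is only for the file system, and comparing it against the user's unencoded filter value ('1/2' versus '1%2F2') is what made every row get filtered out.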
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12707/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 11 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 14:19:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 11: Code-Review+2 Great work! Thanks, Tamas! -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 11 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 14:07:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 11: Code-Review+1 Thanks Tamas, let's wait for the +2 from Gábor. -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 11 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 14:04:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Hello Daniel Becker, Gabor Kaszab, Zoltan Borok-Nagy, lipeng...@apache.org, Gergely Fürnstáhl, Noemi Pap-Takacs, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19483 to look at the new patch set (#11). Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. IMPALA-11908: Parser change for Iceberg metadata querying This change extends parsing table references with Iceberg metadata tables. The TableName class has been extended with an extra vTbl field which is filled when a virtual table reference is suspected. This additional field helps to keep the real table in the statement table cache next to the virtual table, which should be loaded so Iceberg metadata tables can be created. Iceberg provides a rich API to query metadata, these Iceberg API tables are accessible through the MetadataTableUtils class. Using these table schemas it is possible to create an Impala table that can be queried later on. Querying a metadata table at this point is expected to throw a NotImplementedException. Testing: - Added E2E test to test it for some tables. 
Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java A fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableName.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M tests/query_test/test_iceberg.py 14 files changed, 423 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/19483/11 -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 11 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/19483/10/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/19483/10/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3340 PS10, Line 3340: paramter > Nit: typo. Done -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 10 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 13:59:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11992: Support setting query options in Hive JDBC's connection URL
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19612 ) Change subject: IMPALA-11992: Support setting query options in Hive JDBC's connection URL .. Patch Set 7: Code-Review+1 (1 comment) Thanks for working on this! http://gerrit.cloudera.org:8080/#/c/19612/7/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/19612/7/be/src/service/impala-hs2-server.cc@370 PS7, Line 370: rfind Why don't we just use 'find'? As we expect it at the beginning of the string anyway? -- To view, visit http://gerrit.cloudera.org:8080/19612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie184a0c2404f36a3ee28296336f6545615a5c6ca Gerrit-Change-Number: 19612 Gerrit-PatchSet: 7 Gerrit-Owner: Xiang Yang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Xiang Yang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 13:47:48 + Gerrit-HasComments: Yes
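For context on the rfind versus find question above, the two common C++ ways of testing for a prefix can be sketched as below. This is not the code at line 370 of impala-hs2-server.cc; the helper names are hypothetical, and the "set:hiveconf:" prefix in the usage note is an assumed example rather than something stated in the review.

```cpp
#include <string>

// rfind(prefix, 0) only considers matches that start at index 0, so it
// inspects at most prefix.size() characters whether or not the prefix matches.
bool HasPrefixViaRfind(const std::string& s, const std::string& prefix) {
  return s.rfind(prefix, 0) == 0;
}

// find(prefix) == 0 gives the same answer, but when 's' does not start with
// 'prefix' it keeps scanning the rest of the string before giving up.
bool HasPrefixViaFind(const std::string& s, const std::string& prefix) {
  return s.find(prefix) == 0;
}
```

Both helpers agree on every input, for example returning true for "set:hiveconf:mem_limit" with the prefix "set:hiveconf:" and false for "x_set:hiveconf:mem_limit". The difference is only in how much of the string is examined on a mismatch, which is one plausible reason to prefer the rfind form.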
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/19483/10/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/19483/10/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3340 PS10, Line 3340: paramter Nit: typo. -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 10 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 13:46:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19654 ) Change subject: IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables .. Patch Set 3: Code-Review+1 (4 comments) Thanks for fixing this, the change looks great! http://gerrit.cloudera.org:8080/#/c/19654/3/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/19654/3/be/src/exec/hdfs-table-sink.cc@516 PS3, Line 516: url_encoded_partition_name != nullptr && raw_partition_names != nullptr && : external_partition_name != nullptr nit: can you add separate DCHECKs, so if we hit the DCHECK we'll know what was NULL? http://gerrit.cloudera.org:8080/#/c/19654/3/be/src/exec/output-partition.h File be/src/exec/output-partition.h: http://gerrit.cloudera.org:8080/#/c/19654/3/be/src/exec/output-partition.h@70 PS3, Line 70: names nit: keys and values http://gerrit.cloudera.org:8080/#/c/19654/3/common/fbs/IcebergObjects.fbs File common/fbs/IcebergObjects.fbs: http://gerrit.cloudera.org:8080/#/c/19654/3/common/fbs/IcebergObjects.fbs@68 PS3, Line 68: partition_names nit: raw_partition_fields? or raw_partition_key_values? http://gerrit.cloudera.org:8080/#/c/19654/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test: http://gerrit.cloudera.org:8080/#/c/19654/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test@694 PS3, Line 694: Could you please add RUNTIME_PROFILE to check partition pruning? 
-- To view, visit http://gerrit.cloudera.org:8080/19654 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67edc3d04738306fed0d4ebc5312f3d8d4f14254 Gerrit-Change-Number: 19654 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 13:37:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12706/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 10 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 13:36:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Tamas Mate has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. IMPALA-11908: Parser change for Iceberg metadata querying This change extends parsing table references with Iceberg metadata tables. The TableName class has been extended with an extra vTbl field which is filled when a virtual table reference is suspected. This additional field helps to keep the real table in the statement table cache next to the virtual table, which should be loaded so Iceberg metadata tables can be created. Iceberg provides a rich API to query metadata, these Iceberg API tables are accessible through the MetadataTableUtils class. Using these table schemas it is possible to create an Impala table that can be queried later on. Querying a metadata table at this point is expected to throw a NotImplementedException. Testing: - Added E2E test to test it for some tables. Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/FromClause.java A fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java M fe/src/main/java/org/apache/impala/analysis/Path.java M fe/src/main/java/org/apache/impala/analysis/TableName.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test M tests/query_test/test_iceberg.py 14 files changed, 423 insertions(+), 37 
deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/19483/10 -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 10 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-11908: Parser change for Iceberg metadata querying
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/19483 ) Change subject: IMPALA-11908: Parser change for Iceberg metadata querying .. Patch Set 6: (2 comments) Thank you for the reviews, updated the patch. http://gerrit.cloudera.org:8080/#/c/19483/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/19483/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3341 PS6, Line 3341: addMetadataVirtualTable > You could mention in the doc comment that tblRefPath is expected to be an I Done http://gerrit.cloudera.org:8080/#/c/19483/9/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/19483/9/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@119 PS9, Line 119: > Have you tried out if we can refer to the ITEM and POS pseudo columns of th Added 'pos' for this test case. Added another test case with this exploding join functionality. -- To view, visit http://gerrit.cloudera.org:8080/19483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b5db884b5f3fecbd132fcb2c2cbd6c622ff965b Gerrit-Change-Number: 19483 Gerrit-PatchSet: 6 Gerrit-Owner: Tamas Mate Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 13:16:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg .. Patch Set 8: Code-Review+1 (4 comments) LGTM! http://gerrit.cloudera.org:8080/#/c/19534/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19534/8//COMMIT_MSG@14 PS8, Line 14: partition columns nit: IDENTITY-partition columns? http://gerrit.cloudera.org:8080/#/c/19534/8//COMMIT_MSG@20 PS8, Line 20: materialize less slots And in this example we don't just materialize fewer slots, we also get the count-star optimization. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: > 'nonIdentityConjuncts_' is in fact a subset of 'conjuncts_' that doesn't in Yeah, I think 'nonIdentityConjuncts_' can be retired with this change, and it's more precise to use 'conjuncts_', as it is already filtered by 'IcebergScanPlanner.filterConjuncts()'. http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@990 PS3, Line 990: event_time='2020-01-01 11:00:00'; > I had to check, but apparently we push down predicates of an HOUR() partiti I see, thanks for checking. 
-- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 13:08:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg .. Patch Set 8: Code-Review+1 (2 comments) Looks good to me. http://gerrit.cloudera.org:8080/#/c/19534/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/19534/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@175 PS8, Line 175: file We could write "file(s)". http://gerrit.cloudera.org:8080/#/c/19534/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@371 PS8, Line 371: file See L175. -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 13:01:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12705/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Wed, 29 Mar 2023 08:39:52 + Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Update download links for release 4.1.2
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19658 ) Change subject: Update download links for release 4.1.2 .. Update download links for release 4.1.2 Also update the download links of 4.1.1 to use archive URLs. Change-Id: Ife018360ebb582bd55ab4138ac06de5c302394c3 --- M downloads.html 1 file changed, 15 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/19658/1 -- To view, visit http://gerrit.cloudera.org:8080/19658 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: newchange Gerrit-Change-Id: Ife018360ebb582bd55ab4138ac06de5c302394c3 Gerrit-Change-Number: 19658 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR](asf-site) Add 4.1.2 change log
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19657 ) Change subject: Add 4.1.2 change log .. Patch Set 1: Verified-1 Build Failed https://jenkins.impala.io/job/gerrit-docs-auto-test/717/ : Doc tests failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19657 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: comment Gerrit-Change-Id: Ib0732aaaf51003da205814184006f2814a30f96a Gerrit-Change-Number: 19657 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 29 Mar 2023 08:26:39 + Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Add 4.1.2 change log
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19657 ) Change subject: Add 4.1.2 change log .. Patch Set 1: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/717/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/19657 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: comment Gerrit-Change-Id: Ib0732aaaf51003da205814184006f2814a30f96a Gerrit-Change-Number: 19657 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 29 Mar 2023 08:22:46 + Gerrit-HasComments: No
[Impala-ASF-CR](asf-site) Add 4.1.2 change log
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19657 ) Change subject: Add 4.1.2 change log .. Add 4.1.2 change log Tested by opening the files in my browser. Change-Id: Ib0732aaaf51003da205814184006f2814a30f96a --- A docs/changelog-4.1.2.html M impala-docs.html 2 files changed, 51 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/19657/1 -- To view, visit http://gerrit.cloudera.org:8080/19657 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: asf-site Gerrit-MessageType: newchange Gerrit-Change-Id: Ib0732aaaf51003da205814184006f2814a30f96a Gerrit-Change-Number: 19657 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/18999 ) Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted .. IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted If, in a VALUES clause, for the same column all of the values are CHAR types but not all are of the same length, the common type chosen is CHAR(max(lengths)). This means that shorter values are padded with spaces. If the destination column is not CHAR but VARCHAR or STRING, this produces different results than if the values in the column are inserted individually, in separate statements. This behaviour is suboptimal because information is lost. This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE which, when set to true, fixes the problem by implicitly casting the values to the VARCHAR type of the longest value if all values in a column are CHAR types AND not all have the same length. This VARCHAR type will be the common type of the column in the VALUES statement. The new behaviour is not turned on by default because it is a breaking change. We choose VARCHAR instead of STRING as the common type because VARCHAR can be converted to any VARCHAR type shorter or the same length and also to STRING, while STRING cannot safely be converted to VARCHAR because its length is not bounded - we therefore would run into problems if the common type were STRING and the destination column were VARCHAR. Note: although the VALUES statement is implemented as a special UNION operation under the hood, this patch doesn't change the behaviour of explicit UNION statements, it only applies to VALUES statements. Testing: - Added tests verifying that unneeded padding doesn't occur and the queries succeed in various situations, e.g. different destination column types and multi-column inserts. 
See testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/catalog/Type.java A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test M tests/query_test/test_chars.py 10 files changed, 383 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/6 -- To view, visit http://gerrit.cloudera.org:8080/18999 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86 Gerrit-Change-Number: 18999 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
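The common-type rule described in the commit message above can be illustrated with a small toy model. This is only a sketch of the rule as the message states it, not Impala's analyzer code: the function names `values_common_type` and `cast_value` and the tuple representation of a type are assumptions made for illustration, and the `non_lossy` flag stands in for the VALUES_STMT_NON_LOSSY_COMMON_TYPE query option.

```python
def values_common_type(char_lengths, non_lossy=False):
    """Pick the common type of a VALUES column whose values are all CHAR(N).

    Hypothetical helper: char_lengths holds the N of each CHAR(N) value,
    non_lossy models VALUES_STMT_NON_LOSSY_COMMON_TYPE being set to true.
    """
    if not char_lengths:
        raise ValueError("need at least one CHAR length")
    longest = max(char_lengths)
    all_same_length = len(set(char_lengths)) == 1
    # With the option on, mixed-length CHARs become VARCHAR(longest).
    if non_lossy and not all_same_length:
        return ("VARCHAR", longest)
    # Otherwise the common type is CHAR(max(lengths)).
    return ("CHAR", longest)


def cast_value(value, common_type):
    """Cast a string to the toy common type; CHAR pads with spaces."""
    type_name, length = common_type
    if type_name == "CHAR":
        return value.ljust(length)
    return value  # VARCHAR keeps the value as written


if __name__ == "__main__":
    values = ["a", "abc"]
    lengths = [len(v) for v in values]
    for flag in (False, True):
        common = values_common_type(lengths, non_lossy=flag)
        print(flag, common, [cast_value(v, common) for v in values])
```

With the flag off, the shorter value is padded to the longest length, which is the information loss the commit message describes when the destination column is VARCHAR or STRING; with the flag on, the values keep their original lengths.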
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12704/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Mar 2023 07:29:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19534 to look at the new patch set (#7). Change subject: IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg .. IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg We push down predicates to Iceberg, which uses them to filter out files when getting the results of planFiles(). Using the FileScanTask.residual() function we can find out if we have to use the predicates to further filter the rows of the given files or if Iceberg has already performed all the filtering. Basically, if we only filter on partition columns, then Iceberg can filter the files, and using these filters in Impala wouldn't filter any more rows from the output (assuming that no partition evolution was performed on the table). An additional benefit of not pushing down no-op predicates to the scanner is that we can potentially materialize less slots. For example: SELECT count(1) from iceberg_tbl where part_col = 10; In the above query Iceberg filters the files using the predicate on a partition column and then there won't be any need to materialize 'part_col' in Impala, nor to push down the 'part_col = 10' predicate. Note, this is an all-or-nothing approach: given N predicates, we either push down all predicates to the scanner or none of them. There is room for improvement by identifying a subset of the predicates that we still have to push down to the scanner. However, for this we'd need a mapping between Impala predicates and the predicates returned by Iceberg's FileScanTask.residual() function, which would significantly increase the complexity of the relevant code. Testing: - Some existing tests needed some extra care as they were checking for predicates being pushed down to the scanner, but with this patch not all of them are pushed down. 
For these tests I added some extra predicates to achieve that all of the predicates are pushed down to the scanner. - Added a new planner test suite for checking how predicate push down works with Iceberg tables. Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b --- M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test 12 files changed, 309 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/7 -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
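The all-or-nothing decision described in the commit message above can be sketched as follows. This is a toy model under stated assumptions, not the code in IcebergScanPlanner: residuals are represented as plain strings, the `ALWAYS_TRUE` marker and the `predicates_to_push_down` function are hypothetical names, and the sketch assumes the decision is made by checking the residual of every planned file.

```python
# Hypothetical marker for "Iceberg already applied every predicate to this file".
ALWAYS_TRUE = "true"


def predicates_to_push_down(conjuncts, residuals):
    """Return the predicates the scanner still has to evaluate.

    conjuncts: the query's predicates (strings in this toy model).
    residuals: one entry per planned file, modelling what
               FileScanTask.residual() would report for that file.
    """
    # If no file has any filtering left, push down nothing.
    if all(r == ALWAYS_TRUE for r in residuals):
        return []
    # Otherwise push down everything: no attempt is made to map
    # residual expressions back to individual predicates.
    return list(conjuncts)


if __name__ == "__main__":
    # Predicate only on a partition column: Iceberg filtered the files fully.
    print(predicates_to_push_down(["part_col = 10"], [ALWAYS_TRUE, ALWAYS_TRUE]))
    # One file still needs row-level filtering: all predicates are pushed down.
    print(predicates_to_push_down(
        ["part_col = 10", "val > 5"], [ALWAYS_TRUE, "val > 5"]))
```

In the second call both predicates are returned even though only `val > 5` is still needed, which is the room for improvement the commit message mentions.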