[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sat, 24 Jun 2023 02:24:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operators that have higher processing costs, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fixes the problem in two aspects. The first is to allow a scan fragment to scale up higher as long as it is within the total fragment cost and the number of effective scan ranges. The second is to add missing Math.max() in CostingSegment.java which causes lower fragment parallelism even when the total fragment cost is high. IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS Q78. Ideally, the broadcast vs partitioned costing formula during distributed planning should not rely on numInstance. But enabling this optimization ensures consistent query plan shape when comparing against MT_DOP plan. This optimization can still be disabled by specifying USE_DOP_FOR_COSTING=false. This patch also does some cleanup including: - Fix "max-parallelism" value in explain string. - Make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purposes. - Replace all references to ProcessingCost.isComputeCost() to queryOptions.isCompute_processing_cost() directly. - Add Precondition in PlanFragment.getNumInstances() to verify that the fragment's num instance is not modified anymore after the costing algorithm finish. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before the patch) to 18 (after the patch). - Add test in PlannerTest.testProcessingCost that reproduces the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Reviewed-on: http://gerrit.cloudera.org:8080/20024 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 17 files changed, 585 insertions(+), 409 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 10: Code-Review+2 carry +1 from Kurt -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 23 Jun 2023 20:55:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9434/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 23 Jun 2023 20:56:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 23 Jun 2023 20:56:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13376/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 23 Jun 2023 15:59:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 10: (1 comment) ps10 fix line too long issue. http://gerrit.cloudera.org:8080/#/c/20024/9/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20024/9/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1100 PS9, Line 1100: (int) Math.min(costBasedMaxParallelism, maxRangesPerScanNode)); > line too long (94 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 23 Jun 2023 15:36:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20024 to look at the new patch set (#10). Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operators that have higher processing costs, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fixes the problem in two aspects. The first is to allow a scan fragment to scale up higher as long as it is within the total fragment cost and the number of effective scan ranges. The second is to add missing Math.max() in CostingSegment.java which causes lower fragment parallelism even when the total fragment cost is high. IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS Q78. Ideally, the broadcast vs partitioned costing formula during distributed planning should not rely on numInstance. But enabling this optimization ensures consistent query plan shape when comparing against MT_DOP plan. This optimization can still be disabled by specifying USE_DOP_FOR_COSTING=false. This patch also does some cleanup including: - Fix "max-parallelism" value in explain string. - Make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purposes. - Replace all references to ProcessingCost.isComputeCost() to queryOptions.isCompute_processing_cost() directly. - Add Precondition in PlanFragment.getNumInstances() to verify that the fragment's num instance is not modified anymore after the costing algorithm finish. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before the patch) to 18 (after the patch). - Add test in PlannerTest.testProcessingCost that reproduces the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 17 files changed, 585 insertions(+), 409 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/10 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 23 Jun 2023 14:21:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13370/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 22 Jun 2023 20:11:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13369/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 22 Jun 2023 20:00:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 9: (8 comments) http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@194 PS7, Line 194: Preconditions.checkState(newParallelism <= maxParallelism, > This seems always true now with the local max() bounding. Removed. http://gerrit.cloudera.org:8080/#/c/20024/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20024/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1085 PS6, Line 1085: Preconditions.checkState(child.hasAdjustedInstanceCount()); > rename maxScanRangesPerNode I think maxRangesPerScanNode is more appropriate? http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@150 PS7, Line 150: // Set in adjustToMaxParallelism(). > Not clear what the ForExplainString is about Renamed to maxParallelism_ and update the comment. The intention is to show the maximum desired parallelism, in query profile, before capped against upper limits like MAX_FRAGMENT_INSTANCES_PER_NODE query option or Analyzer.getAvailableCoresPerNode() from llama file config. This is useful for debugging and tuning. http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1093 PS7, Line 1093: // union fargment, but it should be capped at costBasedMaxParallelism. > nit: be capped Done http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1100 PS7, Line 1100: maxParallelism_, (int) Math.min(costBasedMaxParallelism, maxRangesPerScanNode)); > Might be better to round up. I pick min here because scan range count can be really big, but costBasedMaxParallelism should already have scan cost added into it. http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1140 PS7, Line 1140: > Seems strange that maxParallelismForExplainString_does not always follow ma Refactored the variable names within this method. I think is is easier to follow now. http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@357 PS7, Line 357: ExprUtil.computeExprsTotalCost(conjuncts_), rowMaterializationCost()); > Move this into the else if block below. Refactored this code. The if block is now removed. http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@403 PS7, Line 403: return perRowCost + scanRangeCostPerRow; > getScanRangeCostFactor returning double whereas these functons multiply and Move the type casting within getScanRangeCostFactor(). -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 22 Jun 2023 19:49:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/20024/9/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20024/9/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1100 PS9, Line 1100: maxParallelism_, (int) Math.min(costBasedMaxParallelism, maxRangesPerScanNode)); line too long (94 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 22 Jun 2023 19:48:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20024 to look at the new patch set (#9). Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operators that have higher processing costs, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fixes the problem in two aspects. The first is to allow a scan fragment to scale up higher as long as it is within the total fragment cost and the number of effective scan ranges. The second is to add missing Math.max() in CostingSegment.java which causes lower fragment parallelism even when the total fragment cost is high. IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS Q78. Ideally, the broadcast vs partitioned costing formula during distributed planning should not rely on numInstance. But enabling this optimization ensures consistent query plan shape when comparing against MT_DOP plan. This optimization can still be disabled by specifying USE_DOP_FOR_COSTING=false. This patch also does some cleanup including: - Fix "max-parallelism" value in explain string. - Make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purposes. - Replace all references to ProcessingCost.isComputeCost() to queryOptions.isCompute_processing_cost() directly. - Add Precondition in PlanFragment.getNumInstances() to verify that the fragment's num instance is not modified anymore after the costing algorithm finish. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before the patch) to 18 (after the patch). - Add test in PlannerTest.testProcessingCost that reproduces the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 17 files changed, 585 insertions(+), 409 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/9 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20024 to look at the new patch set (#8). Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operators that have higher processing costs, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fixes the problem in two aspects. The first is to allow a scan fragment to scale up higher as long as it is within the total fragment cost and the number of effective scan ranges. The second is to add missing Math.max() in CostingSegment.java which causes lower fragment parallelism even when the total fragment cost is high. IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS Q78. Ideally, the broadcast vs partitioned costing formula during distributed planning should not rely on numInstance. But enabling this optimization ensures consistent query plan shape when comparing against MT_DOP plan. This optimization can still be disabled by specifying USE_DOP_FOR_COSTING=false. This patch also does some cleanup including: - Fix "max-parallelism" value in explain string. - Make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purposes. - Replace all references to ProcessingCost.isComputeCost() to queryOptions.isCompute_processing_cost() directly. - Add Precondition in PlanFragment.getNumInstances() to verify that the fragment's num instance is not modified anymore after the costing algorithm finish. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before the patch) to 18 (after the patch). - Add test in PlannerTest.testProcessingCost that reproduces the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 17 files changed, 585 insertions(+), 409 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/8 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 7: (8 comments) http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@194 PS7, Line 194: Preconditions.checkState(newParallelism >= minParallelism, This seems always true now with the local max() bounding. http://gerrit.cloudera.org:8080/#/c/20024/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20024/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1085 PS6, Line 1085: Preconditions.checkState(child.hasAdjustedInstanceCount()); rename maxScanRangesPerNode http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@150 PS7, Line 150: private int maxParallelismForExplainString_ = -1; Not clear what the ForExplainString is about http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1093 PS7, Line 1093: // union fargment, but it should still capped at costBasedMaxParallelism. nit: be capped http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1100 PS7, Line 1100: (int) Math.min(costBasedMaxParallelism, largestScanRange); Might be better to round up. http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1140 PS7, Line 1140: maxParallelism = costBasedMaxParallelism; Seems strange that maxParallelismForExplainString_does not always follow maxParallelism. http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@357 PS7, Line 357: int maxScannerThreads = Move this into the else if block below. http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@403 PS7, Line 403: * (float) BackendConfig.INSTANCE.getScanRangeCostFactor() / getInputCardinality() getScanRangeCostFactor returning double whereas these functons multiply and return float -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 21 Jun 2023 15:08:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13314/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 16 Jun 2023 16:36:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20024 to look at the new patch set (#7). Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operators that have higher processing costs, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fixes the problem in two aspects. The first is to allow a scan fragment to scale up higher as long as it is within the total fragment cost and the number of effective scan ranges. The second is to add missing Math.max() in CostingSegment.java which causes lower fragment parallelism even when the total fragment cost is high. IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS Q78. Ideally, the broadcast vs partitioned costing formula during distributed planning should not rely on numInstance. But enabling this optimization ensures consistent query plan shape when comparing against MT_DOP plan. This optimization can still be disabled by specifying USE_DOP_FOR_COSTING=false. This patch also does some cleanup including: - Fix "max-parallelism" value in explain string. - Make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purposes. - Replace all references to ProcessingCost.isComputeCost() to queryOptions.isCompute_processing_cost() directly. - Add Precondition in PlanFragment.getNumInstances() to verify that the fragment's num instance is not modified anymore after the costing algorithm finish. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before the patch) to 18 (after the patch). - Add test in PlannerTest.testProcessingCost that reproduces the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 17 files changed, 298 insertions(+), 169 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/7 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13266/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Jun 2023 16:16:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20024 to look at the new patch set (#6). Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operator that have higher processing cost, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fix the problem in two aspect. First is to allow scan fragment scale up higher as long as it is within total fragment cost and number of effective scan ranges. Second is to add missing Math.max() in CostingSegment.java that cause lower fragment parallelism even when the total fragment cost is high. This patch also fix "max-parallelism" value in explain string and make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purpose. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before patch) to 18 (after patch). - Add test in PlannerTest.testProcessingCost that reproduce the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/CostingSegment.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 12 files changed, 277 insertions(+), 151 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/6 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13253/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 09 Jun 2023 23:22:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20024 to look at the new patch set (#5). Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operator that have higher processing cost, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fix the problem by allowing scan fragment scale up higher as long as it is within total processing cost of the fragment and number of effective scan ranges. This patch also fix "max-parallelism" value in explain string and make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purpose. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before patch) to 18 (after patch). - Add test in PlannerTest.testProcessingCost that reproduce the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 11 files changed, 276 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/5 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20024 to look at the new patch set (#4). Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operator that have higher processing cost, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fix the problem by allowing scan fragment scale up higher as long as it is within total processing cost of the fragment and number of effective scan ranges. This patch also fix "max-parallelism" value in explain string and make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purpose. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before patch) to 18 (after patch). - Add test in PlannerTest.testProcessingCost that reproduce the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 11 files changed, 276 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/4 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/13250/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 09 Jun 2023 20:07:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/20024/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20024/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1103 PS1, Line 1103: costBasedMaxParallelism > nit: costBasedMaxParallelism Done http://gerrit.cloudera.org:8080/#/c/20024/1/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/20024/1/tests/custom_cluster/test_executor_groups.py@950 PS1, Line 950: u > flake8: E124 closing bracket does not match visual indentation Done -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 09 Jun 2023 19:46:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20024 to look at the new patch set (#2). Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operator that have higher processing cost, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fix the problem by allowing scan fragment scale up higher as long as it is within total processing cost of the fragment and number of effective scan ranges. This patch also fix "max-parallelism" value in explain string and make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purpose. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before patch) to 18 (after patch). - Add test in PlannerTest.testProcessingCost that reproduce the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 11 files changed, 276 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/2 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13232/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 08 Jun 2023 17:37:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20024/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20024/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1103 PS1, Line 1103: costBasedMaxParallelism_ nit: costBasedMaxParallelism -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 08 Jun 2023 17:32:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20024 Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. IMPALA-12192: Fix scaling bug in scan fragment IMPALA-12091 has a bug where scan fragment parallelism will always be limited solely by the ScanNode cost. If ScanNode is colocated with other query node operator that have higher processing cost, Planner will not scale it up beyond what is allowed by the ScanNode cost. This patch fix the problem by allowing scan fragment scale up higher as long as it is within total processing cost of the fragment and number of effective scan ranges. This patch also fix "max-parallelism" value in explain string and make a constant in ScanNode.rowMaterializationCost() into a backend flag named scan_range_cost_factor for experimental purpose. Testing: - Manually run TPCDS Q84 over tpcds10_parquet and confirm that the leftmost scan fragment parallelism is raised from 12 (before patch) to 18 (after patch). - Add test in PlannerTest.testProcessingCost that reproduce the issue. - Update compute stats test in test_executor_groups.py to maintain test assertion. - Pass core tests. Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test M tests/custom_cluster/test_executor_groups.py 11 files changed, 276 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/1 -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20024 ) Change subject: IMPALA-12192: Fix scaling bug in scan fragment .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20024/1/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/20024/1/tests/custom_cluster/test_executor_groups.py@950 PS1, Line 950: ) flake8: E124 closing bracket does not match visual indentation -- To view, visit http://gerrit.cloudera.org:8080/20024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226 Gerrit-Change-Number: 20024 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 08 Jun 2023 17:16:21 + Gerrit-HasComments: Yes