[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to match a query with the smallest executor group set that can fit the memory and CPU requirements of the compiled query. For some kinds of query, the compiled plan fits in any executor group set but does not necessarily deliver the best performance. An example is Impala's COMPUTE STATS query: it does a full table scan and aggregates the stats, and has a fairly simple query plan shape, but can benefit from higher scan parallelism.

This patch relaxes the scan fragment parallelism in the first round of query planning. This allows the scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fits in an executor group set, we replan once more with that executor group set, but with scan fragment parallelism returned to MT_DOP. This extra round of query planning adds a couple of milliseconds of overhead, depending on the complexity of the query plan, but it is necessary because the backend scheduler still expects at most MT_DOP scan fragment instances. We can remove the extra replanning in the future once we can fully manage scan node parallelism without MT_DOP.

This patch also adds some improvements, including:
- Tune computeScanProcessingCost() to guard against scheduling too many scan fragments by comparing with the actual scan range count that Planner knows.
- Use NUM_SCANNER_THREADS as a hint to cap scan node cost during the first round of planning.
- Multiply memory-related counters by the number of executors to make them per group set rather than per node.
- Fix a bug in doCreateExecRequest() in the selection of the number of executors for planning.

Testing:
- Pass test_executor_groups.py
- Add test cases in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Reviewed-on: http://gerrit.cloudera.org:8080/19656
Tested-by: Impala Public Jenkins
Reviewed-by: Kurt Deschler
Reviewed-by: Wenzhe Zhou
---
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
11 files changed, 265 insertions(+), 56 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Kurt Deschler: Looks good to me, but someone else must approve
  Wenzhe Zhou: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/19656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Andrew Sherman
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
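The two-round planning described in the commit message can be illustrated with a minimal Python sketch. It is not the Frontend.java implementation: GroupSet, scan_instances, choose_group_set, and the "fits if the instance count does not exceed total cores" rule are all assumptions made for illustration, and the memory check is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupSet:
    """Hypothetical stand-in for an executor group set."""
    name: str
    num_executors: int
    cores_per_executor: int

    @property
    def total_cores(self) -> int:
        return self.num_executors * self.cores_per_executor


def scan_instances(scan_cost, min_cost_per_thread, mt_dop, num_executors, relaxed):
    """Scan fragment instance count for one planning round (assumed model)."""
    by_cost = max(1, -(-scan_cost // min_cost_per_thread))  # ceiling division
    if relaxed:
        # First round: parallelism follows the cost estimate only.
        return by_cost
    # Later rounds: at most MT_DOP instances per executor.
    return min(by_cost, mt_dop * num_executors)


def choose_group_set(group_sets, scan_cost, min_cost_per_thread, mt_dop,
                     relax_first_round=True):
    """Pick the smallest group set that fits, then replan bounded by MT_DOP."""
    ordered = sorted(group_sets, key=lambda gs: gs.total_cores)
    chosen = ordered[-1]  # fall back to the largest set if nothing fits
    for gs in ordered:
        wanted = scan_instances(scan_cost, min_cost_per_thread, mt_dop,
                                gs.num_executors, relax_first_round)
        if wanted <= gs.total_cores:  # assumed CPU-fit rule
            chosen = gs
            break
    # Extra planning round: scan parallelism goes back to the MT_DOP bound.
    final = scan_instances(scan_cost, min_cost_per_thread, mt_dop,
                           chosen.num_executors, relaxed=False)
    return chosen.name, final


small = GroupSet("small", num_executors=2, cores_per_executor=4)
large = GroupSet("large", num_executors=4, cores_per_executor=16)
print(choose_group_set([small, large], 2_000_000, 100_000, mt_dop=2))
print(choose_group_set([small, large], 2_000_000, 100_000, mt_dop=2,
                       relax_first_round=False))
```

With these made-up numbers, the relaxed first round asks for 20 scan instances, so the scan-heavy query is routed to the larger set. With relaxation disabled, the MT_DOP bound hides the scan cost and the query fits in the small set, which is the situation the commit message describes for COMPUTE STATS.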
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 13: Code-Review+2

Gerrit-MessageType: comment
Gerrit-PatchSet: 13
Gerrit-Comment-Date: Tue, 04 Apr 2023 17:34:35 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 13: Code-Review+1

Gerrit-MessageType: comment
Gerrit-PatchSet: 13
Gerrit-Comment-Date: Tue, 04 Apr 2023 16:05:56 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 13: Verified+1

Gerrit-MessageType: comment
Gerrit-PatchSet: 13
Gerrit-Comment-Date: Tue, 04 Apr 2023 00:45:42 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9196/ DRY_RUN=true

Gerrit-MessageType: comment
Gerrit-PatchSet: 13
Gerrit-Comment-Date: Mon, 03 Apr 2023 19:30:21 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 13: Code-Review+1

Gerrit-MessageType: comment
Gerrit-PatchSet: 13
Gerrit-Comment-Date: Fri, 31 Mar 2023 22:13:36 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12731/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-MessageType: comment
Gerrit-PatchSet: 13
Gerrit-Comment-Date: Fri, 31 Mar 2023 22:02:54 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12730/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-MessageType: comment
Gerrit-PatchSet: 12
Gerrit-Comment-Date: Fri, 31 Mar 2023 21:59:45 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19656/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/11//COMMIT_MSG@33
PS11, Line 33: - Use NUM_SCANNER_THREADS as a hint to cap scan node cost
> I think this should only be applied on first iteration and not applied when
Done

http://gerrit.cloudera.org:8080/#/c/19656/11/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/11/fe/src/main/java/org/apache/impala/planner/ScanNode.java@378
PS11, Line 378: long syntheticPerRowCost = Lon
> can return directly
Done

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py@974
PS11, Line 974: that high_scan_cost_qu
> nit: that high_scan_cost_query
Done

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py@976
PS11, Line 976: options['NUM_SCANNER_THREADS'] = '1'
> nit: duplicated line.
Done

Gerrit-MessageType: comment
Gerrit-PatchSet: 13
Gerrit-Comment-Date: Fri, 31 Mar 2023 21:42:22 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#13).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

11 files changed, 265 insertions(+), 56 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/13

Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 13
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#12).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

11 files changed, 266 insertions(+), 56 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/12

Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 12
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/11/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/11/fe/src/main/java/org/apache/impala/planner/ScanNode.java@378
PS11, Line 378: ProcessingCost syntheticCost =
can return directly

Gerrit-MessageType: comment
Gerrit-PatchSet: 11
Gerrit-Comment-Date: Fri, 31 Mar 2023 21:38:59 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/11//COMMIT_MSG@33
PS11, Line 33: - Use NUM_SCANNER_THREADS as a hint to cap scan node cost.
I think this should only be applied on first iteration and not applied when we restore MT_DOP. Otherwise, the scan will have low cost, but high number of instances because MT_DOP wins.

Gerrit-MessageType: comment
Gerrit-PatchSet: 11
Gerrit-Comment-Date: Fri, 31 Mar 2023 20:44:29 +
Gerrit-HasComments: Yes
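The point of the comment above can be sketched as follows. This is a rough Python model, not the ScanNode.java code: scan_plan and its parameters are hypothetical, and the way the NUM_SCANNER_THREADS hint is turned into a cost ceiling (threads x executors x minimum cost per thread) is an assumption chosen only to make the effect visible.

```python
def scan_plan(raw_cost, min_cost_per_thread, scan_range_count, num_executors,
              mt_dop, num_scanner_threads=0, first_round=True):
    """Return (effective_cost, instance_count) for one planning round."""
    cost = raw_cost
    if first_round and num_scanner_threads > 0:
        # Assumed form of the hint: cap the cost only while parallelism is
        # relaxed, so cost and instance count stay consistent with each other.
        cost = min(cost, num_scanner_threads * num_executors * min_cost_per_thread)

    by_cost = max(1, -(-cost // min_cost_per_thread))  # ceiling division
    # Never plan more scan instances than there are scan ranges to read.
    instances = min(by_cost, max(1, scan_range_count))

    if not first_round:
        # Replanning round: the scheduler expects at most MT_DOP per executor.
        instances = min(instances, mt_dop * num_executors)
    return cost, instances


# First round, no hint: bounded by the 12 known scan ranges.
print(scan_plan(5_000_000, 100_000, 12, num_executors=3, mt_dop=2))
# First round, NUM_SCANNER_THREADS=1: cost and parallelism both shrink.
print(scan_plan(5_000_000, 100_000, 12, num_executors=3, mt_dop=2,
                num_scanner_threads=1))
# Replanning round: hint ignored, MT_DOP bound applies to the full cost.
print(scan_plan(5_000_000, 100_000, 12, num_executors=3, mt_dop=2,
                num_scanner_threads=1, first_round=False))
```

If the cap were also applied in the replanning round, the third call would report the reduced cost of 300000 alongside 3 instances in this model, whereas in the real planner MT_DOP would still decide the instance count. That mismatch is what the comment warns about.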
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12728/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-MessageType: comment
Gerrit-PatchSet: 11
Gerrit-Comment-Date: Fri, 31 Mar 2023 19:39:18 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py@974
PS11, Line 974: thatgh_scan_cost_query
nit: that high_scan_cost_query

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py@976
PS11, Line 976: options = copy.deepcopy(CPU_DOP_OPTIONS)
nit: duplicated line.

Gerrit-MessageType: comment
Gerrit-PatchSet: 11
Gerrit-Comment-Date: Fri, 31 Mar 2023 19:21:42 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2178
PS10, Line 2178: planCtx.c
> Missing planCtx.compilationState_.restoreState() before continue here.
Done

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2324
PS10, Line 2324: LOG.info("Analysis finished.");
               : }
> This should be fixed.
Done

Gerrit-MessageType: comment
Gerrit-PatchSet: 11
Gerrit-Comment-Date: Fri, 31 Mar 2023 19:19:19 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#11).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to match a query with the smallest executor group set that can fit the memory and CPU requirements of the compiled query. For some kinds of query, the compiled plan fits in any executor group set but does not necessarily deliver the best performance. An example is Impala's COMPUTE STATS query: it does a full table scan and aggregates the stats, and has a fairly simple query plan shape, but can benefit from higher scan parallelism.

This patch relaxes the scan fragment parallelism in the first round of query planning. This allows the scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fits in an executor group set, we replan once more with that executor group set, but with scan fragment parallelism returned to MT_DOP. This extra round of query planning adds a couple of milliseconds of overhead, depending on the complexity of the query plan, but it is necessary because the backend scheduler still expects at most MT_DOP scan fragment instances. We can remove the extra replanning in the future once we can fully manage scan node parallelism without MT_DOP.

This patch also adds some improvements, including:
- Tune computeScanProcessingCost() to guard against scheduling too many scan fragments by comparing with the actual scan range count that Planner knows.
- Use NUM_SCANNER_THREADS as a hint to cap scan node cost.
- Multiply memory-related counters by the number of executors to make them per group set rather than per node.
- Fix a bug in doCreateExecRequest() in the selection of the number of executors for planning.

Testing:
- Pass test_executor_groups.py
- Add test cases in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
11 files changed, 253 insertions(+), 56 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/11

Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 11
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2178
PS10, Line 2178: continue;
Missing planCtx.compilationState_.restoreState() before continue here.

Gerrit-MessageType: comment
Gerrit-PatchSet: 10
Gerrit-Comment-Date: Fri, 31 Mar 2023 19:18:13 +
Gerrit-HasComments: Yes
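Why a missing restore before `continue` matters in a replanning loop can be shown with a small Python sketch. Only the name restoreState() comes from the comment above; the CompilationState class here, its save_state/restore_state methods, and the "planned_for" side effect are hypothetical and do not mirror the real Frontend.java code.

```python
import copy


class CompilationState:
    """Hypothetical holder of state that a planning attempt may mutate."""

    def __init__(self, options):
        self.options = options
        self._saved = None

    def save_state(self):
        self._saved = copy.deepcopy(self.options)

    def restore_state(self):
        self.options = copy.deepcopy(self._saved)


def plan_over_group_sets(state, group_sets, fits, restore_on_retry=True):
    """Try group sets in order; undo each failed attempt before retrying."""
    for gs in group_sets:
        state.save_state()
        # Stand-in for the side effects of one planning attempt.
        state.options["planned_for"].append(gs)
        if fits(gs):
            return gs
        if restore_on_retry:
            # The call the review found missing before `continue`.
            state.restore_state()
        continue
    return None


state = CompilationState({"planned_for": []})
chosen = plan_over_group_sets(state, ["small", "large"], lambda gs: gs == "large")
print(chosen, state.options["planned_for"])
```

With the restore in place, the state seen by the successful attempt contains only that attempt's changes. Passing restore_on_retry=False leaves the rejected attempt's changes behind as well.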
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning

Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2324
PS10, Line 2324: analysisResult.getAnalyzer().setNumExecutorsForPlanning(
               :     planCtx.compilationState_.getGroupSet().getCurr_num_executors());
> Is this a potential bug? What if getCurr_num_executors() == 0? Maybe we sho
This should be fixed.

Gerrit-MessageType: comment
Gerrit-PatchSet: 10
Gerrit-Comment-Date: Fri, 31 Mar 2023 16:38:41 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2324 PS10, Line 2324: analysisResult.getAnalyzer().setNumExecutorsForPlanning( : planCtx.compilationState_.getGroupSet().getCurr_num_executors()); Is this a potential bug? What if getCurr_num_executors() == 0? Maybe we should fall back to getExpected_num_executors() in that case? -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 15:58:32 + Gerrit-HasComments: Yes
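The fallback raised in this comment could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Frontend.java change: GroupSetInfo and PlanningExecutorCount are hypothetical stand-ins, and only the getter names getCurr_num_executors() and getExpected_num_executors() are taken from the review comment.

```java
// Hypothetical stand-in for the executor group set descriptor. The real
// Impala type is not reproduced here; only the two getter names come from
// the review comment.
final class GroupSetInfo {
  private final int currNumExecutors;
  private final int expectedNumExecutors;

  GroupSetInfo(int currNumExecutors, int expectedNumExecutors) {
    this.currNumExecutors = currNumExecutors;
    this.expectedNumExecutors = expectedNumExecutors;
  }

  int getCurr_num_executors() { return currNumExecutors; }
  int getExpected_num_executors() { return expectedNumExecutors; }
}

// Hypothetical helper showing the suggested fallback.
final class PlanningExecutorCount {
  private PlanningExecutorCount() {}

  // Use the current executor count when it is positive; otherwise fall
  // back to the expected count so planning never sees zero executors.
  static int numExecutorsForPlanning(GroupSetInfo groupSet) {
    int curr = groupSet.getCurr_num_executors();
    return curr > 0 ? curr : groupSet.getExpected_num_executors();
  }

  public static void main(String[] args) {
    // Group set with no live executors yet: the expected count is used.
    System.out.println(numExecutorsForPlanning(new GroupSetInfo(0, 10)));
    // Group set with live executors: the current count is used.
    System.out.println(numExecutorsForPlanning(new GroupSetInfo(4, 10)));
  }
}
```

The value returned by such a helper would then be what gets passed to setNumExecutorsForPlanning(); whether the merged patch takes exactly this shape is not shown in this thread.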
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 10: Saw your comment on the other patch, where you are waiting for David to confirm something about the scan costing. Stopped the verification job. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 02:17:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 10: Code-Review+2 carry +1 from Kurt -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 02:10:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12723/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 02:14:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9187/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 02:11:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java@362 PS8, Line 362: // Input cardinality is unknown or cost is too high compared to > typo: unkown Done -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 01:54:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#10). Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning ..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to match a query with the smallest executor group set that can fit the memory and CPU requirements of the compiled query. There are kinds of queries whose compiled plan will fit in any executor group set but will not necessarily deliver the best performance there. An example is Impala's COMPUTE STATS query. It does a full table scan and aggregates the stats, and has a fairly simple query plan shape, but can benefit from higher scan parallelism.

This patch relaxes the scan fragment parallelism in the first round of query planning. This allows a scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fits in an executor group set, we replan once more with that executor group set but with scan fragment parallelism set back to MT_DOP. This one extra round of query planning adds a couple of milliseconds of overhead, depending on the complexity of the query plan, but is necessary since the backend scheduler still expects at most MT_DOP scan fragment instances. We can remove the extra replanning in the future once we can fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against scheduling too many scan fragments by comparing with the actual scan range count that the Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool.
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/resources/llama-site-3-groups.xml M tests/custom_cluster/test_executor_groups.py 10 files changed, 209 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/10 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
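The two-round planning described in the commit message can be pictured with the following sketch. It is a simplified reading of this thread, not the Frontend.java implementation: RelaxedPlanningSketch, GroupSet, Estimate, PlannerFn, and select() are all hypothetical names, the limits and estimates are toy numbers, and the assumption that the MT_DOP limit stays in place once restored (and is forced on the last group set) is inferred from the review discussion.

```java
import java.util.Arrays;
import java.util.List;

final class RelaxedPlanningSketch {
  /** Hypothetical executor group set with memory and core limits. */
  static final class GroupSet {
    final String name;
    final long maxMemBytes;
    final int maxCores;
    GroupSet(String name, long maxMemBytes, int maxCores) {
      this.name = name;
      this.maxMemBytes = maxMemBytes;
      this.maxCores = maxCores;
    }
  }

  /** Hypothetical resource estimate produced by one round of planning. */
  static final class Estimate {
    final long memBytes;
    final int cores;
    Estimate(long memBytes, int cores) {
      this.memBytes = memBytes;
      this.cores = cores;
    }
  }

  /** Stand-in for one round of query planning against a group set. */
  interface PlannerFn {
    Estimate plan(GroupSet groupSet, boolean limitScanParallelism);
  }

  /** Returns the name of the chosen group set; sets are ordered small to large. */
  static String select(List<GroupSet> sortedSets, PlannerFn planner) {
    boolean limitScanParallelism = false; // first round: scan parallelism relaxed
    int i = 0;
    while (i < sortedSets.size()) {
      GroupSet gs = sortedSets.get(i);
      boolean isLast = (i == sortedSets.size() - 1);
      if (isLast) limitScanParallelism = true; // last set: plan with the limit directly
      Estimate est = planner.plan(gs, limitScanParallelism);
      boolean fits = est.memBytes <= gs.maxMemBytes && est.cores <= gs.maxCores;
      if (fits && !limitScanParallelism) {
        // Relaxed plan fits: replan the same group set with the limit restored.
        limitScanParallelism = true;
        continue;
      }
      if (fits || isLast) return gs.name;
      i++; // does not fit: try the next larger group set
    }
    throw new IllegalArgumentException("no executor group sets");
  }

  public static void main(String[] args) {
    List<GroupSet> sets = Arrays.asList(
        new GroupSet("root.tiny", 64L << 20, 2),
        new GroupSet("root.small", 70L << 20, 16),
        new GroupSet("root.large", 1L << 40, 64));
    // Toy planner: the relaxed plan is small, the limited plan needs 66 MB and 4 cores.
    PlannerFn planner = (gs, limit) ->
        limit ? new Estimate(66L << 20, 4) : new Estimate(40L << 20, 1);
    System.out.println(select(sets, planner));
  }
}
```

In this toy run the relaxed plan fits the smallest set, the replan with the limit restored does not, and selection moves on to the next set with the limit still in place.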
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 9: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java@362 PS8, Line 362: // Input cardinality is unkown or cost is too high compared to typo: unkown -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 00:46:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 9: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/9/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19656/9/fe/src/main/java/org/apache/impala/service/Frontend.java@2167 PS9, Line 2167: if (!planCtx.compilationState_.isLimitScanParallelism()) { > If CPU cost option is not enabled, cpuReqSatisfied equals true. Add checkin ignore it -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 00:14:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/9/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19656/9/fe/src/main/java/org/apache/impala/service/Frontend.java@2167 PS9, Line 2167: if (!planCtx.compilationState_.isLimitScanParallelism()) { If the CPU cost option is not enabled, cpuReqSatisfied is always true. Add a check for ProcessingCost.isComputeCost(queryOptions). -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 00:11:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12722/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 31 Mar 2023 00:01:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java@2110 PS8, Line 2110: > Should we check if ProcessingCost.isComputeCost(queryOptions) is true when Good idea. Moved below with the rest of cpu checking. http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java@2144 PS8, Line 2144: if (memoryAskUnbounded > 0) { : addCounter(groupSetProfile, : new TCounter(MEMORY_ASK_UNBOUNDED, TUnit.BYTES, memoryAskUnbounded)); : memoryAskUnbounded = -1; : } : if (cpuAskUnbounded > 0) { : addCounter(groupSetProfile, : new TCounter(CPU_ASK_UNBOUNDED, TUnit.UNIT, cpuAskUnbounded)); : cpuAskUnbounded = -1; : > move this code block in front of line 2142 '}" ? Done -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 23:41:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#9). Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning ..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to match a query with the smallest executor group set that can fit the memory and CPU requirements of the compiled query. There are kinds of queries whose compiled plan will fit in any executor group set but will not necessarily deliver the best performance there. An example is Impala's COMPUTE STATS query. It does a full table scan and aggregates the stats, and has a fairly simple query plan shape, but can benefit from higher scan parallelism.

This patch relaxes the scan fragment parallelism in the first round of query planning. This allows a scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fits in an executor group set, we replan once more with that executor group set but with scan fragment parallelism set back to MT_DOP. This one extra round of query planning adds a couple of milliseconds of overhead, depending on the complexity of the query plan, but is necessary since the backend scheduler still expects at most MT_DOP scan fragment instances. We can remove the extra replanning in the future once we can fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against scheduling too many scan fragments by comparing with the actual scan range count that the Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool.
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/resources/llama-site-3-groups.xml M tests/custom_cluster/test_executor_groups.py 10 files changed, 209 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/9 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
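The scan range guard mentioned in the commit message can be illustrated with a small sketch. The formula below is an assumption made for illustration and is not the actual computeScanProcessingCost() logic: ScanParallelismCap, cappedScanParallelism(), and the minCostPerThread parameter are hypothetical names. The only idea taken from the commit message is that cost-based scan parallelism should not exceed the number of scan ranges the Planner knows about.

```java
// Hypothetical helper; not part of Impala.
final class ScanParallelismCap {
  private ScanParallelismCap() {}

  // Derives a parallelism from the processing cost, then caps it by the
  // scan range count so that no scan instance is left without a range.
  static int cappedScanParallelism(
      long processingCost, long minCostPerThread, long scanRangeCount) {
    if (minCostPerThread <= 0) {
      throw new IllegalArgumentException("minCostPerThread must be positive");
    }
    long byCost = 1;
    if (processingCost > 0) {
      // Ceiling division without risking overflow on large costs.
      byCost = processingCost / minCostPerThread
          + (processingCost % minCostPerThread != 0 ? 1 : 0);
    }
    long cap = Math.max(1L, scanRangeCount); // always allow one instance
    return (int) Math.min((long) Integer.MAX_VALUE, Math.min(byCost, cap));
  }

  public static void main(String[] args) {
    // Cost alone would ask for 10 instances, but only 4 scan ranges exist.
    System.out.println(cappedScanParallelism(1000, 100, 4));
    // Plenty of scan ranges: the cost-based value is used.
    System.out.println(cappedScanParallelism(250, 100, 100));
  }
}
```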
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java@2110 PS8, Line 2110: addCounter(groupSetProfile, new TCounter(CPU_MAX, TUnit.UNIT, available_cores)); Should we check whether ProcessingCost.isComputeCost(queryOptions) is true when adding the CPU_MAX counter? http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java@2144 PS8, Line 2144: if (memoryAskUnbounded > 0) { : addCounter(groupSetProfile, : new TCounter(MEMORY_ASK_UNBOUNDED, TUnit.BYTES, memoryAskUnbounded)); : memoryAskUnbounded = -1; : } : if (cpuAskUnbounded > 0) { : addCounter(groupSetProfile, : new TCounter(CPU_ASK_UNBOUNDED, TUnit.UNIT, cpuAskUnbounded)); : cpuAskUnbounded = -1; : } Move this code block in front of the '}' on line 2142? -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 23:10:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12721/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 21:25:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/7/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19656/7/fe/src/main/java/org/apache/impala/service/Frontend.java@2160 PS7, Line 2160: : reason = "query option REQUEST_POOL=" + queryOptions.getRequest_pool() : + " is set. Memory and cpu limit checking is skipped."; : add > Just realized a bug here. This will be set by current EG, but replan does n

Fixed by moving the addCounter up. The EG counters will look like the following in the last test case of test_query_cpu_count_divisor_default.

    - CpuCountDivisor: 1.00
    - ExecutorGroupsConsidered: 2 (2)
    Executor group 1 (root.tiny):
      Verdict: not enough per-host memory
      - CpuAsk: 2 (2)
      - CpuAskUnbounded: 1 (1)
      - CpuMax: 2 (2)
      - EffectiveParallelism: 2 (2)
      - MemoryAsk: 66.27 MB (69489080)
      - MemoryAskUnbounded: 40.21 MB (42164664)
      - MemoryMax: 64.00 MB (67108864)
    Executor group 2 (root.small):
      Verdict: Match
      - CpuAsk: 4 (4)
      - CpuMax: 16 (16)
      - EffectiveParallelism: 4 (4)
      - MemoryAsk: 66.32 MB (69540060)
      - MemoryMax: 70.00 MB (73400320)

What happens here is that the unbounded plan fits in EG 1; MT_DOP is then restored, but that causes the new plan to no longer fit in EG 1, so planning moves on to EG 2 with MT_DOP in place. Neither CpuAskUnbounded nor MemoryAskUnbounded will show up in test_min_processing_per_thread_small because MT_DOP is restored automatically in the last EG (large pool).
-- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 21:11:43 + Gerrit-HasComments: Yes
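The counter attribution issue discussed in this comment can be pictured with the sketch below. It is a guess at the general idea only, assuming that a pending "unbounded" value must be written to the profile of the group set that produced it before planning moves on. UnboundedCounterSketch and its methods are hypothetical and do not mirror the real addCounter() or TCounter code in Frontend.java; the numbers are the ones quoted in the profile above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-group-set profile counters.
final class UnboundedCounterSketch {
  private final Map<String, List<String>> countersByGroupSet = new LinkedHashMap<>();
  private long memoryAskUnbounded = -1; // -1 means nothing pending

  // Called after a relaxed (unbounded) plan was found to fit a group set.
  void recordRelaxedPlan(long perHostMemEstimate) {
    memoryAskUnbounded = perHostMemEstimate;
  }

  // Called once per group set attempt, whether or not the limited plan fits.
  // Flushing the pending value here keeps it under the group set that
  // computed it instead of leaking into the next larger one.
  void finishGroupSetAttempt(String groupSet, long memoryAsk) {
    List<String> counters =
        countersByGroupSet.computeIfAbsent(groupSet, k -> new ArrayList<>());
    counters.add("MemoryAsk=" + memoryAsk);
    if (memoryAskUnbounded > 0) {
      counters.add("MemoryAskUnbounded=" + memoryAskUnbounded);
      memoryAskUnbounded = -1;
    }
  }

  List<String> counters(String groupSet) {
    return countersByGroupSet.getOrDefault(groupSet, new ArrayList<>());
  }

  public static void main(String[] args) {
    UnboundedCounterSketch profile = new UnboundedCounterSketch();
    profile.recordRelaxedPlan(42164664L);                   // relaxed plan fit root.tiny
    profile.finishGroupSetAttempt("root.tiny", 69489080L);  // limited plan did not fit
    profile.finishGroupSetAttempt("root.small", 69540060L); // limited plan fit
    System.out.println(profile.counters("root.tiny"));
    System.out.println(profile.counters("root.small"));
  }
}
```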
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#8). Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning ..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to match a query with the smallest executor group set that can fit the memory and CPU requirements of the compiled query. There are kinds of queries whose compiled plan will fit in any executor group set but will not necessarily deliver the best performance there. An example is Impala's COMPUTE STATS query. It does a full table scan and aggregates the stats, and has a fairly simple query plan shape, but can benefit from higher scan parallelism.

This patch relaxes the scan fragment parallelism in the first round of query planning. This allows a scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fits in an executor group set, we replan once more with that executor group set but with scan fragment parallelism set back to MT_DOP. This one extra round of query planning adds a couple of milliseconds of overhead, depending on the complexity of the query plan, but is necessary since the backend scheduler still expects at most MT_DOP scan fragment instances. We can remove the extra replanning in the future once we can fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against scheduling too many scan fragments by comparing with the actual scan range count that the Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool.
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/resources/llama-site-3-groups.xml M tests/custom_cluster/test_executor_groups.py 10 files changed, 207 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/8 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12720/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 21:02:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/7/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19656/7/fe/src/main/java/org/apache/impala/service/Frontend.java@2160 PS7, Line 2160: memoryAskUnbounded = per_host_mem_estimate; : if (ProcessingCost.isComputeCost(queryOptions)) { : cpuAskUnbounded = scaled_cores_requirement; : } Just realized there is a bug here. These values are set while evaluating the current EG, but the replan is not guaranteed to fit into the current EG, so they can end up being displayed under the next larger EG. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 20:48:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 7: Patch set 7 adds the MemoryAskUnbounded and CpuAskUnbounded counters for research purposes. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 20:42:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#7). Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. IMPALA-12029: Relax scan fragment parallelism on first planning In a setup with multiple executor group set, Frontend will try to match a query with the smallest executor group set that can fit the memory and cpu requirement of the compiled query. There are kinds of query where the compiled plan will fit to any executor group set but not necessarily deliver the best performance. An example for this is Impala's COMPUTE STATS query. It does full table scan and aggregate the stats, have fairly simple query plan shape, but can benefit from higher scan parallelism. This patch relaxes the scan fragment parallelism on first round of query planning. This allows scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fit in an executor group set, we replan once again with that executor group set but with scan fragment parallelism returned back to MT_DOP. This one extra round of query planning adds couple millisecond overhead depending on the complexity of the query plan, but necessary since the backend scheduler still expect at most MT_DOP amount of scan fragment instances. We can remove the extra replanning in the future once we can fully manage scan node parallelism without MT_DOP. This patch also tune computeScanProcessingCost() to guard against scheduling too many scan fragments by comparing with the actual scan range count that Planner knows. Testing: - Pass test_executor_groups.py - Add test case in test_min_processing_per_thread_small. - Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool. 
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/resources/llama-site-3-groups.xml M tests/custom_cluster/test_executor_groups.py 10 files changed, 203 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/7 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
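The two-round planning described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from Frontend.java: the GroupSet and Plan classes, the choose() method, and the toy cost model (one scan instance per minCostPerThread units of cost, spread evenly over executors) are all hypothetical names and assumptions made for illustration. The only behavior taken from the thread is the outline: relax scan parallelism on the first round, replan with MT_DOP once a group set fits, and skip the relaxed round on the largest group set.

```java
import java.util.Arrays;
import java.util.List;

class RelaxedPlanningSketch {
  /** Hypothetical stand-in for an executor group set (not Impala's real class). */
  static final class GroupSet {
    final String name; final int numExecutors; final int maxCores;
    GroupSet(String name, int numExecutors, int maxCores) {
      this.name = name; this.numExecutors = numExecutors; this.maxCores = maxCores;
    }
  }

  /** Hypothetical result of planning: chosen group set, CPU ask, planning rounds used. */
  static final class Plan {
    final String groupSet; final int coresAsk; final int rounds;
    Plan(String groupSet, int coresAsk, int rounds) {
      this.groupSet = groupSet; this.coresAsk = coresAsk; this.rounds = rounds;
    }
  }

  /** Toy cost model (assumption): scan instances per node, fixed at mtDop or cost-driven. */
  static int scanInstancesPerNode(long scanCost, long minCostPerThread, int numExecutors,
      int mtDop, boolean limitScanParallelism) {
    if (limitScanParallelism) return mtDop;
    long wanted = Math.max(1, (scanCost + minCostPerThread - 1) / minCostPerThread);
    return (int) ((wanted + numExecutors - 1) / numExecutors); // ceiling division
  }

  /** Walks group sets from smallest to largest, as the commit message describes. */
  static Plan choose(List<GroupSet> sets, long scanCost, long minCostPerThread, int mtDop) {
    int rounds = 0;
    for (int i = 0; i < sets.size(); i++) {
      GroupSet gs = sets.get(i);
      boolean isLargest = (i == sets.size() - 1);
      // First round: relaxed scan parallelism, except on the largest group set,
      // where parallelism is fixed to MT_DOP immediately.
      int perNode = scanInstancesPerNode(
          scanCost, minCostPerThread, gs.numExecutors, mtDop, isLargest);
      int coresAsk = perNode * gs.numExecutors;
      rounds++;
      if (isLargest) return new Plan(gs.name, coresAsk, rounds);
      if (coresAsk <= gs.maxCores) {
        // The relaxed plan fits: replan once more with scan parallelism back at MT_DOP.
        rounds++;
        return new Plan(gs.name, mtDop * gs.numExecutors, rounds);
      }
    }
    throw new IllegalStateException("no executor group set configured");
  }

  public static void main(String[] args) {
    List<GroupSet> sets =
        Arrays.asList(new GroupSet("small", 2, 8), new GroupSet("large", 10, 160));
    Plan light = choose(sets, 300, 100, 4);
    Plan heavy = choose(sets, 1000, 100, 4);
    System.out.println(light.groupSet + " cores=" + light.coresAsk + " rounds=" + light.rounds);
    System.out.println(heavy.groupSet + " cores=" + heavy.coresAsk + " rounds=" + heavy.rounds);
  }
}
```

In this toy model the heavier scan overflows the small group set during the relaxed round and lands on the large one, while the lighter scan stays on the small group set and pays for one extra planning round.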
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12719/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 17:53:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 17:37:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19656/5//COMMIT_MSG@25 PS5, Line 25: We can : remove the extra replanning in the future > Could you add a TODO comment in the code? Added TODO in Frontend.java -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 17:33:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#6). Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. IMPALA-12029: Relax scan fragment parallelism on first planning In a setup with multiple executor group set, Frontend will try to match a query with the smallest executor group set that can fit the memory and cpu requirement of the compiled query. There are kinds of query where the compiled plan will fit to any executor group set but not necessarily deliver the best performance. An example for this is Impala's COMPUTE STATS query. It does full table scan and aggregate the stats, have fairly simple query plan shape, but can benefit from higher scan parallelism. This patch relaxes the scan fragment parallelism on first round of query planning. This allows scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fit in an executor group set, we replan once again with that executor group set but with scan fragment parallelism returned back to MT_DOP. This one extra round of query planning adds couple millisecond overhead depending on the complexity of the query plan, but necessary since the backend scheduler still expect at most MT_DOP amount of scan fragment instances. We can remove the extra replanning in the future once we can fully manage scan node parallelism without MT_DOP. This patch also tune computeScanProcessingCost() to guard against scheduling too many scan fragments by comparing with the actual scan range count that Planner knows. Testing: - Pass test_executor_groups.py - Add test case in test_min_processing_per_thread_small. - Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool. 
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/resources/llama-site-3-groups.xml M tests/custom_cluster/test_executor_groups.py 10 files changed, 187 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/6 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 5: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19656/5//COMMIT_MSG@25 PS5, Line 25: We can : remove the extra replanning in the future Could you add a TODO comment in the code? -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 16:41:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12718/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 15:53:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@9 PS4, Line 9: In a setup with multiple executor gr > nit: In a setup with multiple executor group set Done http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@11 PS4, Line 11: are > nit: kinds Done http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@18 PS4, Line 18: relax > nit: relaxes Done http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@21 PS4, Line 21: we replan once again with that executor group set but with : scan fragment parallelism returned back to MT_DOP. > Does this mean we need one more replan for any query plan with scan fragmen Yes, there will be one extra replan, except when we arrive at largest executor group set where we immediately fix scan node parallelism to MT_DOP. Looking at test_min_processing_per_thread_small, a single round of query planning there is about 35ms. http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@27 PS4, Line 27: MT_DOP. > nit: fragments Done http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@262 PS4, Line 262: public void computeCostingSegment( > Need a better name for the flag. Maybe limitScanParallelism? Done http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1061 PS4, Line 1061: rNode, int m > here and other places, should we name is as fixedLeafNodes? Renamed to limitScanParallelism. 
http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java@361 PS4, Line 361: Thr > <= 0. Otherwise if getInputCardinality() equals 0, syntheticCardinality equ cardinality == 0 is actually legal for empty scan. Added one more case above this. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 15:36:00 + Gerrit-HasComments: Yes
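The cardinality discussion above (a zero cardinality is legal for an empty scan, while a negative one means unknown) can be sketched as a three-way case split. This is only an illustration of the idea: the method name costPerRow, its parameters, and the choice of what gets divided are assumptions, not the actual code in ScanNode.java.

```java
class ScanCardinalityGuardSketch {
  /**
   * Hypothetical helper: per-row cost for a scan.
   * inputCardinality == 0 : empty scan, legal, handled before any division.
   * inputCardinality <  0 : unknown, fall back to a synthetic estimate (at least 1).
   * inputCardinality >  0 : use the known cardinality.
   */
  static double costPerRow(long inputCardinality, long syntheticCardinality, double totalCost) {
    if (inputCardinality == 0) return 0.0; // empty scan: nothing to divide
    long rows = inputCardinality < 0 ? Math.max(1, syntheticCardinality) : inputCardinality;
    return totalCost / rows; // rows is always >= 1 here
  }

  public static void main(String[] args) {
    System.out.println(costPerRow(0, 0, 100.0));   // empty scan
    System.out.println(costPerRow(-1, 50, 100.0)); // unknown cardinality
    System.out.println(costPerRow(4, 0, 100.0));   // known cardinality
  }
}
```

Handling the empty scan as its own case keeps the "unknown" branch free to clamp the synthetic estimate without treating zero as an error.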
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#5). Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. IMPALA-12029: Relax scan fragment parallelism on first planning In a setup with multiple executor group set, Frontend will try to match a query with the smallest executor group set that can fit the memory and cpu requirement of the compiled query. There are kinds of query where the compiled plan will fit to any executor group set but not necessarily deliver the best performance. An example for this is Impala's COMPUTE STATS query. It does full table scan and aggregate the stats, have fairly simple query plan shape, but can benefit from higher scan parallelism. This patch relaxes the scan fragment parallelism on first round of query planning. This allows scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fit in an executor group set, we replan once again with that executor group set but with scan fragment parallelism returned back to MT_DOP. This one extra round of query planning adds couple millisecond overhead depending on the complexity of the query plan, but necessary since the backend scheduler still expect at most MT_DOP amount of scan fragment instances. We can remove the extra replanning in the future once we can fully manage scan node parallelism without MT_DOP. This patch also tune computeScanProcessingCost() to guard against scheduling too many scan fragments by comparing with the actual scan range count that Planner knows. Testing: - Pass test_executor_groups.py - Add test case in test_min_processing_per_thread_small. - Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool. 
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/resources/llama-site-3-groups.xml M tests/custom_cluster/test_executor_groups.py 10 files changed, 185 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/5 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@262 PS4, Line 262: public void computeCostingSegment(TQueryOptions queryOptions, boolean fixLeafNodes) { Need a better name for the flag. Maybe limitScanParallelism? -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 14:09:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@9 PS4, Line 9: In multiple executor group set setup nit: In a setup with multiple executor group set http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@11 PS4, Line 11: kind nit: kinds http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@18 PS4, Line 18: relax nit: relaxes http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@21 PS4, Line 21: we replan once again with that executor group set but with : scan fragment parallelism returned back to MT_DOP. Does this mean we need one more replan for any query plan with a scan fragment? What's the overhead for a small query? http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@27 PS4, Line 27: fragment nit: fragments http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1061 PS4, Line 1061: fixLeafNodes Here and in other places, should we name it fixedLeafNodes? http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java@361 PS4, Line 361: < 0 Should be <= 0. Otherwise, if getInputCardinality() equals 0, syntheticCardinality also equals 0 and we then divide by 0.
-- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 30 Mar 2023 02:06:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12714/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 23:11:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#4). Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. IMPALA-12029: Relax scan fragment parallelism on first planning In multiple executor group set setup, Frontend will try to match a query with the smallest executor group set that can fit the memory and cpu requirement of the compiled query. There are kind of query where the compiled plan will fit to any executor group set but not necessarily deliver the best performance. An example for this is Impala's COMPUTE STATS query. It does full table scan and aggregate the stats, have fairly simple query plan shape, but can benefit from higher scan parallelism. This patch relax the scan fragment parallelism on first round of query planning. This allows scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fit in an executor group set, we replan once again with that executor group set but with scan fragment parallelism returned back to MT_DOP. We can remove the replanning in the future once we can fully manage scan node parallelism without MT_DOP. This patch also tune computeScanProcessingCost() to guard against scheduling too many scan fragment by comparing with the actual scan range count that Planner knows. Testing: - Pass test_executor_groups.py - Add test case in test_min_processing_per_thread_small. - Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool. 
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 --- M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/resources/llama-site-3-groups.xml M tests/custom_cluster/test_executor_groups.py 10 files changed, 177 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/4 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
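The scan range guard mentioned in the commit message (do not schedule more scan fragment instances than there are scan ranges) can be sketched as a simple cap. The method name cappedScanInstances and its parameters are hypothetical, and the cost-to-instances rule is an assumed toy model; the real logic lives in computeScanProcessingCost() and may differ.

```java
class ScanParallelismCapSketch {
  /** Ceiling division for positive operands. */
  static long ceilDiv(long a, long b) {
    return (a + b - 1) / b;
  }

  /**
   * Hypothetical cap: the cost-based instance count is limited both by a
   * configured maximum and by the number of scan ranges known at planning time,
   * since an instance with no scan range to read would do no work.
   */
  static int cappedScanInstances(long scanCost, long minCostPerThread,
      int maxInstances, long scanRangeCount) {
    long byCost = Math.max(1, ceilDiv(scanCost, minCostPerThread));
    long byRanges = Math.max(1, scanRangeCount); // keep at least one instance
    return (int) Math.min(Math.min(byCost, byRanges), maxInstances);
  }

  public static void main(String[] args) {
    // Expensive scan over only 3 scan ranges: the range count is the binding limit.
    System.out.println(cappedScanInstances(10_000, 100, 64, 3));
    // Cheap scan over many ranges: the cost estimate is the binding limit.
    System.out.println(cappedScanInstances(250, 100, 64, 1000));
  }
}
```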
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/12712/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 21:20:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1078 PS2, Line 1078: { : // TODO: Fragment with UnionNode but without ScanNode should have it > Comment here and other places should say leaf node instead of scan node to Done http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py File tests/custom_cluster/test_executor_groups.py: http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py@955 PS2, Line 955: "Test processing cost with min_processing_per_thread s > Misplaced comment. Done -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 29 Mar 2023 21:10:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19656 to look at the new patch set (#3). Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. IMPALA-12029: Relax scan fragment parallelism on first planning In multiple executor group set setup, Frontend will try to match a query with the smallest executor group set that can fit the memory and cpu requirement of the compiled query. There are kind of query where the compiled plan will fit to any executor group set but not necessarily deliver the best performance. An example for this is Impala's COMPUTE STATS query. It does full table scan and aggregate the stats, have fairly simple query plan shape, but can benefit from higher scan parallelism. This patch relax the scan fragment parallelism on first round of query planning. This allows scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fit in an executor group set, we replan once again with that executor group set but with scan fragment parallelism returned back to MT_DOP. We can remove the replanning in the future once we can fully manage scan node parallelism without MT_DOP. Testing: - Pass test_executor_groups.py - Add test case in test_min_processing_per_thread_small. - Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in root.small pool. 
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 --- M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/resources/llama-site-3-groups.xml M tests/custom_cluster/test_executor_groups.py 6 files changed, 125 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/3 -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 ) Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12703/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19656 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9 Gerrit-Change-Number: 19656 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 29 Mar 2023 01:48:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19656/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/1//COMMIT_MSG@23
PS1, Line 23: ag
> is not
Updated.

http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1078
PS2, Line 1078: One exception to that is if this fragment has ScanNode and this is the first
:   // planning round over selected executor group (see IMPALA-12029).
The comment here and in other places should say "leaf node" instead of "scan node" to be consistent. EmptySetNode, ScanNode, and UnionNode are all considered leaf nodes.

http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py@955
PS2, Line 955: Expect to run the query on the small group by default.
Misplaced comment.

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
Gerrit-Comment-Date: Wed, 29 Mar 2023 01:35:49 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning
Hello Kurt Deschler, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#2).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to match a query with the smallest executor group set that can fit the memory and CPU requirements of the compiled query. For some kinds of queries, the compiled plan fits in any executor group set, but the smallest one does not necessarily deliver the best performance. An example is Impala's COMPUTE STATS query. It does a full table scan and aggregates the stats, and has a fairly simple query plan shape, but can benefit from higher scan parallelism.

This patch relaxes the scan fragment parallelism on the first round of query planning. This allows a scan fragment to increase its parallelism based on its ProcessingCost estimation. If the relaxed plan fits in an executor group set, we replan once more with that executor group set, but with scan fragment parallelism set back to MT_DOP. We can remove the replanning in the future once we can fully manage scan node parallelism without MT_DOP.

Testing:
- Pass test_executor_groups.py
- Add test cases in query-options-test.cc.
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from 64MB to 70MB in llama-site-3-groups.xml so that the new grouping query can fit in the root.small pool.
Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M be/src/service/query-options-test.cc
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
7 files changed, 129 insertions(+), 35 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Riza Suminto
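
For readers following the thread, the two-round planning described in the patch set 2 commit message can be sketched roughly as below. This is a toy model in Python, not the Frontend code from the patch. GroupSet, Plan, toy_planner, the fit check, and all numbers are hypothetical names and values chosen for illustration, and returning None when nothing fits is an assumption of this sketch rather than Impala's documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class GroupSet:
    """Hypothetical executor group set with its capacity limits."""
    name: str
    num_executors: int
    max_cores: int
    max_mem_bytes: int


@dataclass(frozen=True)
class Plan:
    """Hypothetical summary of what a compiled plan requires."""
    cores_needed: int
    mem_bytes_needed: int
    relaxed_scan: bool


# Stands in for one full planning round: (group set, relax_scan) -> Plan.
PlannerFn = Callable[[GroupSet, bool], Plan]


def fits(plan: Plan, gs: GroupSet) -> bool:
    """A plan fits when both its CPU and memory needs are within the limits."""
    return (plan.cores_needed <= gs.max_cores
            and plan.mem_bytes_needed <= gs.max_mem_bytes)


def choose_group_set(group_sets: List[GroupSet],
                     planner: PlannerFn) -> Optional[Tuple[GroupSet, Plan]]:
    """Try group sets from smallest to largest.

    Round 1 plans with relaxed scan parallelism to pick the group set.
    Round 2 replans on that group set with scan parallelism capped again.
    """
    for gs in sorted(group_sets, key=lambda g: g.max_cores):
        if fits(planner(gs, True), gs):
            return gs, planner(gs, False)
    return None  # Assumption of this sketch: report that nothing fits.


# Illustrative numbers only.
MT_DOP = 4
COST_BASED_SCAN_INSTANCES = 24
GIB = 1 << 30


def toy_planner(gs: GroupSet, relax_scan: bool) -> Plan:
    """Relaxed: use the cost-based instance count. Otherwise cap by MT_DOP."""
    capped = min(COST_BASED_SCAN_INSTANCES, MT_DOP * gs.num_executors)
    cores = COST_BASED_SCAN_INSTANCES if relax_scan else capped
    return Plan(cores, GIB, relax_scan)


SMALL = GroupSet("small", num_executors=2, max_cores=8, max_mem_bytes=2 * GIB)
LARGE = GroupSet("large", num_executors=8, max_cores=64, max_mem_bytes=16 * GIB)
```

In this toy setup, the MT_DOP-capped plan would fit in the small group set, but the relaxed first round asks for more scan instances than the small group set offers, so choose_group_set([SMALL, LARGE], toy_planner) selects the large group set and then returns a plan that is capped by MT_DOP again.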