[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 24 Jun 2023 02:24:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operators that have higher processing costs, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fixes the problem in two aspects. The first is to allow a
scan fragment to scale up higher as long as it is within the total
fragment cost and the number of effective scan ranges. The second is to
add missing Math.max() in CostingSegment.java which causes lower
fragment parallelism even when the total fragment cost is high.

IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS
Q78. Ideally, the broadcast vs partitioned costing formula during
distributed planning should not rely on numInstance. But enabling this
optimization ensures consistent query plan shape when comparing against
MT_DOP plan. This optimization can still be disabled by specifying
USE_DOP_FOR_COSTING=false.

This patch also does some cleanup including:
- Fix "max-parallelism" value in explain string.
- Make a constant in ScanNode.rowMaterializationCost() into a backend
  flag named scan_range_cost_factor for experimental purposes.
- Replace all references to ProcessingCost.isComputeCost() to
  queryOptions.isCompute_processing_cost() directly.
- Add Precondition in PlanFragment.getNumInstances() to verify that the
  fragment's num instance is not modified anymore after the costing
  algorithm finish.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before the
  patch) to 18 (after the patch).
- Add test in PlannerTest.testProcessingCost that reproduces the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Reviewed-on: http://gerrit.cloudera.org:8080/20024
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
17 files changed, 585 insertions(+), 409 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 10: Code-Review+2

carry +1 from Kurt


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 23 Jun 2023 20:55:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9434/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 23 Jun 2023 20:56:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 23 Jun 2023 20:56:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13376/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 23 Jun 2023 15:59:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 10:

(1 comment)

ps10 fix line too long issue.

http://gerrit.cloudera.org:8080/#/c/20024/9/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20024/9/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1100
PS9, Line 1100:   (int) Math.min(costBasedMaxParallelism, 
maxRangesPerScanNode));
> line too long (94 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 23 Jun 2023 15:36:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20024

to look at the new patch set (#10).

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operators that have higher processing costs, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fixes the problem in two aspects. The first is to allow a
scan fragment to scale up higher as long as it is within the total
fragment cost and the number of effective scan ranges. The second is to
add missing Math.max() in CostingSegment.java which causes lower
fragment parallelism even when the total fragment cost is high.

IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS
Q78. Ideally, the broadcast vs partitioned costing formula during
distributed planning should not rely on numInstance. But enabling this
optimization ensures consistent query plan shape when comparing against
MT_DOP plan. This optimization can still be disabled by specifying
USE_DOP_FOR_COSTING=false.

This patch also does some cleanup including:
- Fix "max-parallelism" value in explain string.
- Make a constant in ScanNode.rowMaterializationCost() into a backend
  flag named scan_range_cost_factor for experimental purposes.
- Replace all references to ProcessingCost.isComputeCost() to
  queryOptions.isCompute_processing_cost() directly.
- Add Precondition in PlanFragment.getNumInstances() to verify that the
  fragment's num instance is not modified anymore after the costing
  algorithm finish.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before the
  patch) to 18 (after the patch).
- Add test in PlannerTest.testProcessingCost that reproduces the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
17 files changed, 585 insertions(+), 409 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/10
--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-23 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 9: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 23 Jun 2023 14:21:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13370/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 22 Jun 2023 20:11:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13369/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 22 Jun 2023 20:00:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/CostingSegment.java
File fe/src/main/java/org/apache/impala/planner/CostingSegment.java:

http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@194
PS7, Line 194: Preconditions.checkState(newParallelism <= maxParallelism,
> This seems always true now with the local max() bounding.
Removed.


http://gerrit.cloudera.org:8080/#/c/20024/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20024/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1085
PS6, Line 1085:   
Preconditions.checkState(child.hasAdjustedInstanceCount());
> rename maxScanRangesPerNode
I think maxRangesPerScanNode is more appropriate?


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@150
PS7, Line 150:   // Set in adjustToMaxParallelism().
> Not clear what the ForExplainString is about
Renamed to maxParallelism_ and update the comment.
The intention is to show the maximum desired parallelism, in query profile, 
before capped against upper limits like MAX_FRAGMENT_INSTANCES_PER_NODE query 
option or Analyzer.getAvailableCoresPerNode() from llama file config.
This is useful for debugging and tuning.


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1093
PS7, Line 1093:   // union fargment, but it should be capped at 
costBasedMaxParallelism.
> nit: be capped
Done


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1100
PS7, Line 1100:   maxParallelism_, (int) 
Math.min(costBasedMaxParallelism, maxRangesPerScanNode));
> Might be better to round up.
I pick min here because scan range count can be really big, but 
costBasedMaxParallelism should already have scan cost added into it.


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1140
PS7, Line 1140:
> Seems strange that maxParallelismForExplainString_does not always follow ma
Refactored the variable names within this method. I think is is easier to 
follow now.


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@357
PS7, Line 357: ExprUtil.computeExprsTotalCost(conjuncts_), 
rowMaterializationCost());
> Move this into the else if block below.
Refactored this code. The if block is now removed.


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@403
PS7, Line 403: return perRowCost + scanRangeCostPerRow;
> getScanRangeCostFactor returning double whereas these functons multiply and
Move the type casting within getScanRangeCostFactor().



--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 22 Jun 2023 19:49:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20024/9/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20024/9/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1100
PS9, Line 1100:   maxParallelism_, (int) 
Math.min(costBasedMaxParallelism, maxRangesPerScanNode));
line too long (94 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 22 Jun 2023 19:48:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-22 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20024

to look at the new patch set (#9).

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operators that have higher processing costs, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fixes the problem in two aspects. The first is to allow a
scan fragment to scale up higher as long as it is within the total
fragment cost and the number of effective scan ranges. The second is to
add missing Math.max() in CostingSegment.java which causes lower
fragment parallelism even when the total fragment cost is high.

IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS
Q78. Ideally, the broadcast vs partitioned costing formula during
distributed planning should not rely on numInstance. But enabling this
optimization ensures consistent query plan shape when comparing against
MT_DOP plan. This optimization can still be disabled by specifying
USE_DOP_FOR_COSTING=false.

This patch also does some cleanup including:
- Fix "max-parallelism" value in explain string.
- Make a constant in ScanNode.rowMaterializationCost() into a backend
  flag named scan_range_cost_factor for experimental purposes.
- Replace all references to ProcessingCost.isComputeCost() to
  queryOptions.isCompute_processing_cost() directly.
- Add Precondition in PlanFragment.getNumInstances() to verify that the
  fragment's num instance is not modified anymore after the costing
  algorithm finish.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before the
  patch) to 18 (after the patch).
- Add test in PlannerTest.testProcessingCost that reproduces the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
17 files changed, 585 insertions(+), 409 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/9
--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-22 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20024

to look at the new patch set (#8).

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operators that have higher processing costs, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fixes the problem in two aspects. The first is to allow a
scan fragment to scale up higher as long as it is within the total
fragment cost and the number of effective scan ranges. The second is to
add missing Math.max() in CostingSegment.java which causes lower
fragment parallelism even when the total fragment cost is high.

IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS
Q78. Ideally, the broadcast vs partitioned costing formula during
distributed planning should not rely on numInstance. But enabling this
optimization ensures consistent query plan shape when comparing against
MT_DOP plan. This optimization can still be disabled by specifying
USE_DOP_FOR_COSTING=false.

This patch also does some cleanup including:
- Fix "max-parallelism" value in explain string.
- Make a constant in ScanNode.rowMaterializationCost() into a backend
  flag named scan_range_cost_factor for experimental purposes.
- Replace all references to ProcessingCost.isComputeCost() to
  queryOptions.isCompute_processing_cost() directly.
- Add Precondition in PlanFragment.getNumInstances() to verify that the
  fragment's num instance is not modified anymore after the costing
  algorithm finish.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before the
  patch) to 18 (after the patch).
- Add test in PlannerTest.testProcessingCost that reproduces the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
17 files changed, 585 insertions(+), 409 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/8
--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-21 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 7:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/CostingSegment.java
File fe/src/main/java/org/apache/impala/planner/CostingSegment.java:

http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@194
PS7, Line 194: Preconditions.checkState(newParallelism >= minParallelism,
This seems always true now with the local max() bounding.


http://gerrit.cloudera.org:8080/#/c/20024/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20024/6/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1085
PS6, Line 1085:   
Preconditions.checkState(child.hasAdjustedInstanceCount());
rename maxScanRangesPerNode


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@150
PS7, Line 150:   private int maxParallelismForExplainString_ = -1;
Not clear what the ForExplainString is about


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1093
PS7, Line 1093:   // union fargment, but it should still capped at 
costBasedMaxParallelism.
nit: be capped


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1100
PS7, Line 1100:   (int) Math.min(costBasedMaxParallelism, 
largestScanRange);
Might be better to round up.


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1140
PS7, Line 1140: maxParallelism = costBasedMaxParallelism;
Seems strange that maxParallelismForExplainString_does not always follow 
maxParallelism.


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@357
PS7, Line 357: int maxScannerThreads =
Move this into the else if block below.


http://gerrit.cloudera.org:8080/#/c/20024/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@403
PS7, Line 403: * (float) 
BackendConfig.INSTANCE.getScanRangeCostFactor() / getInputCardinality()
getScanRangeCostFactor returning double whereas these functons multiply and 
return float



--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 21 Jun 2023 15:08:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13314/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 16 Jun 2023 16:36:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-16 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20024

to look at the new patch set (#7).

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operators that have higher processing costs, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fixes the problem in two aspects. The first is to allow a
scan fragment to scale up higher as long as it is within the total
fragment cost and the number of effective scan ranges. The second is to
add missing Math.max() in CostingSegment.java which causes lower
fragment parallelism even when the total fragment cost is high.

IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS
Q78. Ideally, the broadcast vs partitioned costing formula during
distributed planning should not rely on numInstance. But enabling this
optimization ensures consistent query plan shape when comparing against
MT_DOP plan. This optimization can still be disabled by specifying
USE_DOP_FOR_COSTING=false.

This patch also does some cleanup including:
- Fix "max-parallelism" value in explain string.
- Make a constant in ScanNode.rowMaterializationCost() into a backend
  flag named scan_range_cost_factor for experimental purposes.
- Replace all references to ProcessingCost.isComputeCost() to
  queryOptions.isCompute_processing_cost() directly.
- Add Precondition in PlanFragment.getNumInstances() to verify that the
  fragment's num instance is not modified anymore after the costing
  algorithm finish.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before the
  patch) to 18 (after the patch).
- Add test in PlannerTest.testProcessingCost that reproduces the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
17 files changed, 298 insertions(+), 169 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/7
--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13266/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Jun 2023 16:16:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-12 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20024

to look at the new patch set (#6).

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operator that have higher processing cost, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fix the problem in two aspect. First is to allow scan
fragment scale up higher as long as it is within total fragment cost and
number of effective scan ranges. Second is to add missing Math.max() in
CostingSegment.java that cause lower fragment parallelism even when the
total fragment cost is high.

This patch also fix "max-parallelism" value in explain string and make a
constant in ScanNode.rowMaterializationCost() into a backend flag named
scan_range_cost_factor for experimental purpose.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before patch) to
  18 (after patch).
- Add test in PlannerTest.testProcessingCost that reproduce the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
12 files changed, 277 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/6
--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13253/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 09 Jun 2023 23:22:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-09 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20024

to look at the new patch set (#5).

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operator that have higher processing cost, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fix the problem by allowing scan fragment scale up higher as
long as it is within total processing cost of the fragment and number of
effective scan ranges.

This patch also fix "max-parallelism" value in explain string and make a
constant in ScanNode.rowMaterializationCost() into a backend flag named
scan_range_cost_factor for experimental purpose.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before patch) to
  18 (after patch).
- Add test in PlannerTest.testProcessingCost that reproduce the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
11 files changed, 276 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/5
-- 
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-09 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20024

to look at the new patch set (#4).

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operator that have higher processing cost, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fix the problem by allowing scan fragment scale up higher as
long as it is within total processing cost of the fragment and number of
effective scan ranges.

This patch also fix "max-parallelism" value in explain string and make a
constant in ScanNode.rowMaterializationCost() into a backend flag named
scan_range_cost_factor for experimental purpose.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before patch) to
  18 (after patch).
- Add test in PlannerTest.testProcessingCost that reproduce the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
11 files changed, 276 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/4
--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/13250/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 09 Jun 2023 20:07:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-09 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20024/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20024/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1103
PS1, Line 1103: costBasedMaxParallelism
> nit: costBasedMaxParallelism
Done


http://gerrit.cloudera.org:8080/#/c/20024/1/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/20024/1/tests/custom_cluster/test_executor_groups.py@950
PS1, Line 950: u
> flake8: E124 closing bracket does not match visual indentation
Done



--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 09 Jun 2023 19:46:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-09 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20024

to look at the new patch set (#2).

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operator that have higher processing cost, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fix the problem by allowing scan fragment scale up higher as
long as it is within total processing cost of the fragment and number of
effective scan ranges.

This patch also fix "max-parallelism" value in explain string and make a
constant in ScanNode.rowMaterializationCost() into a backend flag named
scan_range_cost_factor for experimental purpose.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before patch) to
  18 (after patch).
- Add test in PlannerTest.testProcessingCost that reproduce the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
11 files changed, 276 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/2
-- 
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13232/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 08 Jun 2023 17:37:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-08 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20024/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20024/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1103
PS1, Line 1103: costBasedMaxParallelism_
nit: costBasedMaxParallelism



--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 08 Jun 2023 17:32:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-08 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20024


Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operator that have higher processing cost, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fix the problem by allowing scan fragment scale up higher as
long as it is within total processing cost of the fragment and number of
effective scan ranges.

This patch also fix "max-parallelism" value in explain string and make a
constant in ScanNode.rowMaterializationCost() into a backend flag named
scan_range_cost_factor for experimental purpose.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before patch) to
  18 (after patch).
- Add test in PlannerTest.testProcessingCost that reproduce the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M tests/custom_cluster/test_executor_groups.py
11 files changed, 276 insertions(+), 150 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/20024/1
--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-12192: Fix scaling bug in scan fragment

2023-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20024 )

Change subject: IMPALA-12192: Fix scaling bug in scan fragment
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20024/1/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/20024/1/tests/custom_cluster/test_executor_groups.py@950
PS1, Line 950: )
flake8: E124 closing bracket does not match visual indentation



--
To view, visit http://gerrit.cloudera.org:8080/20024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Gerrit-Change-Number: 20024
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 08 Jun 2023 17:16:21 +
Gerrit-HasComments: Yes