[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-04-04 Thread Riza Suminto (Code Review)
Riza Suminto has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to
match a query with the smallest executor group set that can fit the
memory and CPU requirements of the compiled query. For some kinds of
queries, the compiled plan fits in any executor group set but does not
necessarily deliver the best performance there. An example is Impala's
COMPUTE STATS query: it does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.
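
To make the matching step concrete, here is a minimal Python sketch of picking the smallest executor group set that fits a query. The GroupSet class, its max_mem_bytes and max_cores fields, and pick_smallest_fitting() are hypothetical names used only for illustration, not Impala's actual Frontend code. What happens when no group set fits is left open here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GroupSet:
    """Hypothetical stand-in for an executor group set."""
    name: str
    max_mem_bytes: int  # assumed memory limit of the group set
    max_cores: int      # assumed CPU limit of the group set


def pick_smallest_fitting(group_sets: Sequence[GroupSet],
                          mem_needed: int,
                          cores_needed: int) -> Optional[GroupSet]:
    """Return the smallest group set that fits both requirements."""
    ordered = sorted(group_sets, key=lambda g: (g.max_cores, g.max_mem_bytes))
    for gs in ordered:
        if mem_needed <= gs.max_mem_bytes and cores_needed <= gs.max_cores:
            return gs
    return None  # behaviour when nothing fits is not modelled here


sets = [GroupSet("large", 64 << 30, 64), GroupSet("small", 8 << 30, 8)]
chosen = pick_smallest_fitting(sets, mem_needed=1 << 30, cores_needed=4)
print(chosen.name)
```

With a low CPU estimate, a scan-heavy query lands in the small group set even though a larger one might run it faster, which is the situation this change targets.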

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once more with that executor group set,
but with scan fragment parallelism restored to MT_DOP. This one extra
round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but it is necessary
because the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.
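
The two planning rounds described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the Frontend implementation: two_round_planning(), toy_plan(), toy_fits(), the (name, capacity) tuples and the numbers are all made up for illustration, and a "plan" is reduced to a single scan instance count.

```python
from typing import Callable, Optional, Sequence, Tuple

GroupSet = Tuple[str, int]  # (name, CPU capacity); a toy model


def two_round_planning(
    group_sets: Sequence[GroupSet],
    plan: Callable[[GroupSet, bool], int],
    fits: Callable[[GroupSet, int], bool],
) -> Optional[Tuple[GroupSet, int]]:
    """Pick a group set with a relaxed plan, then replan under MT_DOP."""
    for gs in group_sets:  # assumed ordered from smallest to largest
        relaxed = plan(gs, True)  # round 1: scan parallelism follows cost
        if not fits(gs, relaxed):
            continue  # relaxed plan is too big; try the next group set
        final = plan(gs, False)  # round 2: scan parallelism capped at MT_DOP
        return gs, final
    return None  # behaviour when nothing fits is not modelled here


MT_DOP = 4
COST_BASED_SCAN_INSTANCES = 12  # made-up estimate for illustration


def toy_plan(gs: GroupSet, relax_scan: bool) -> int:
    """Return the number of scan instances the toy planner asks for."""
    if relax_scan:
        return COST_BASED_SCAN_INSTANCES
    return min(COST_BASED_SCAN_INSTANCES, MT_DOP)


def toy_fits(gs: GroupSet, scan_instances: int) -> bool:
    """A plan fits when its instance count is within the toy capacity."""
    return scan_instances <= gs[1]


result = two_round_planning([("small", 8), ("large", 16)], toy_plan, toy_fits)
print(result)  # (('large', 16), 4)
```

In this toy run, the strict plan (4 instances) would have fit in the small group set, but the relaxed plan (12 instances) pushes the query to the large one, and the final plan is still produced with the MT_DOP cap.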

This patch also adds some improvements, including:
- Tune computeScanProcessingCost() to guard against scheduling too many
  scan fragments by comparing with the actual scan range count that the
  Planner knows.
- Use NUM_SCANNER_THREADS as a hint to cap scan node cost during the
  first round of planning.
- Multiply memory-related counters by the number of executors to make
  them per group set rather than per node.
- Fix a bug in doCreateExecRequest() in the selection of the number of
  executors for planning.
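
As a rough illustration of the first three items, the sketch below caps a cost-based scan instance count and scales a per-node memory counter. It is a simplification: the patch caps the scan node cost, while this example caps an instance count directly. The function names, and the assumption that a NUM_SCANNER_THREADS value of 0 means "no hint given", are illustrative only.

```python
def cap_scan_instances(cost_based: int, scan_range_count: int,
                       num_scanner_threads: int = 0) -> int:
    """Cap a cost-based scan instance count (a simplified stand-in)."""
    # Never ask for more instances than there are scan ranges to read.
    capped = min(cost_based, scan_range_count)
    # Assumption: 0 means no hint was given, so no extra cap applies.
    if num_scanner_threads > 0:
        capped = min(capped, num_scanner_threads)
    return max(1, capped)  # always keep at least one instance


def per_group_set_bytes(per_node_bytes: int, num_executors: int) -> int:
    """Scale a per-node memory counter to the whole executor group set."""
    return per_node_bytes * num_executors


print(cap_scan_instances(12, 5))        # limited by 5 scan ranges
print(cap_scan_instances(12, 100, 2))   # limited by the hint of 2
print(per_group_set_bytes(64, 3))
```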

Testing:
- Pass test_executor_groups.py
- Add test cases in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Reviewed-on: http://gerrit.cloudera.org:8080/19656
Tested-by: Impala Public Jenkins 
Reviewed-by: Kurt Deschler 
Reviewed-by: Wenzhe Zhou 
---
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
11 files changed, 265 insertions(+), 56 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Kurt Deschler: Looks good to me, but someone else must approve
  Wenzhe Zhou: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/19656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-04-04 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 13: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 04 Apr 2023 17:34:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-04-04 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 13: Code-Review+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 04 Apr 2023 16:05:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-04-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 13: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 04 Apr 2023 00:45:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-04-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9196/ 
DRY_RUN=true


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 03 Apr 2023 19:30:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 13: Code-Review+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 22:13:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12731/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 22:02:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12730/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 21:59:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19656/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/11//COMMIT_MSG@33
PS11, Line 33: - Use NUM_SCANNER_THREADS as a hint to cap scan node cost
> I think this should only be applied on first iteration and not applied when
Done


http://gerrit.cloudera.org:8080/#/c/19656/11/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/11/fe/src/main/java/org/apache/impala/planner/ScanNode.java@378
PS11, Line 378: long syntheticPerRowCost = Lon
> can return directly
Done


http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py@974
PS11, Line 974: that high_scan_cost_qu
> nit: that high_scan_cost_query
Done


http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py@976
PS11, Line 976: options['NUM_SCANNER_THREADS'] = '1'
> nit: duplicated line.
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 21:42:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#13).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to
match a query with the smallest executor group set that can fit the
memory and CPU requirements of the compiled query. For some kinds of
queries, the compiled plan fits in any executor group set but does not
necessarily deliver the best performance there. An example is Impala's
COMPUTE STATS query: it does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once more with that executor group set,
but with scan fragment parallelism restored to MT_DOP. This one extra
round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but it is necessary
because the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also adds some improvements, including:
- Tune computeScanProcessingCost() to guard against scheduling too many
  scan fragments by comparing with the actual scan range count that the
  Planner knows.
- Use NUM_SCANNER_THREADS as a hint to cap scan node cost during the
  first round of planning.
- Multiply memory-related counters by the number of executors to make
  them per group set rather than per node.
- Fix a bug in doCreateExecRequest() in the selection of the number of
  executors for planning.

Testing:
- Pass test_executor_groups.py
- Add test cases in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
11 files changed, 265 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/13
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#12).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to
match a query with the smallest executor group set that can fit the
memory and CPU requirements of the compiled query. For some kinds of
queries, the compiled plan fits in any executor group set but does not
necessarily deliver the best performance there. An example is Impala's
COMPUTE STATS query: it does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once more with that executor group set,
but with scan fragment parallelism restored to MT_DOP. This one extra
round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but it is necessary
because the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also adds some improvements, including:
- Tune computeScanProcessingCost() to guard against scheduling too many
  scan fragments by comparing with the actual scan range count that the
  Planner knows.
- Use NUM_SCANNER_THREADS as a hint to cap scan node cost during the
  first round of planning.
- Multiply memory-related counters by the number of executors to make
  them per group set rather than per node.
- Fix a bug in doCreateExecRequest() in the selection of the number of
  executors for planning.

Testing:
- Pass test_executor_groups.py
- Add test cases in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
11 files changed, 266 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/12
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/11/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/11/fe/src/main/java/org/apache/impala/planner/ScanNode.java@378
PS11, Line 378: ProcessingCost syntheticCost =
can return directly



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 21:38:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/11//COMMIT_MSG@33
PS11, Line 33: - Use NUM_SCANNER_THREADS as a hint to cap scan node cost.
I think this should only be applied on first iteration and not applied when we 
restore MT_DOP. Otherwise, the scan will have low cost, but high number of 
instances because MT_DOP wins.
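
A small numeric sketch may help show the mismatch described in this comment. It is a toy model, not the ScanNode code: scan_plan(), its parameters, and the idea that instance count is cost divided by a minimum cost per instance are assumptions made for illustration. The hint_on_restore flag models applying the NUM_SCANNER_THREADS hint on the round where MT_DOP is restored.

```python
import math
from typing import Tuple


def scan_plan(raw_cost: int, min_cost_per_instance: int, mt_dop: int,
              num_scanner_threads: int, first_round: bool,
              hint_on_restore: bool = False) -> Tuple[int, int]:
    """Return (cost, instances) for a toy scan node."""
    use_hint = num_scanner_threads > 0 and (first_round or hint_on_restore)
    if use_hint:
        # The hint caps the cost at what that many threads would cover.
        cost = min(raw_cost, num_scanner_threads * min_cost_per_instance)
    else:
        cost = raw_cost
    if first_round:
        # Relaxed round: the instance count follows the (capped) cost.
        instances = max(1, math.ceil(cost / min_cost_per_instance))
    else:
        # Restore round: MT_DOP decides the instance count.
        instances = mt_dop
    return cost, instances


print(scan_plan(1200, 100, 8, 1, first_round=True))
print(scan_plan(1200, 100, 8, 1, first_round=False))
print(scan_plan(1200, 100, 8, 1, first_round=False, hint_on_restore=True))
```

The last call yields a low cost paired with 8 instances, which is the inconsistent combination the comment warns about.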



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 20:44:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12728/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 19:39:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py@974
PS11, Line 974: thatgh_scan_cost_query
nit: that high_scan_cost_query


http://gerrit.cloudera.org:8080/#/c/19656/11/tests/custom_cluster/test_executor_groups.py@976
PS11, Line 976: options = copy.deepcopy(CPU_DOP_OPTIONS)
nit: duplicated line.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 19:21:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2178
PS10, Line 2178: planCtx.c
> Missing planCtx.compilationState_.restoreState() before continue here.
Done


http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2324
PS10, Line 2324:   LOG.info("Analysis finished.");
   : }
> This should be fixed.
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 19:19:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#11).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, the Frontend tries to
match a query with the smallest executor group set that can fit the
memory and CPU requirements of the compiled query. For some kinds of
queries, the compiled plan fits in any executor group set but does not
necessarily deliver the best performance there. An example is Impala's
COMPUTE STATS query: it does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once more with that executor group set,
but with scan fragment parallelism restored to MT_DOP. This one extra
round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but it is necessary
because the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also adds some improvements, including:
- Tune computeScanProcessingCost() to guard against scheduling too many
  scan fragments by comparing with the actual scan range count that the
  Planner knows.
- Use NUM_SCANNER_THREADS as a hint to cap scan node cost.
- Multiply memory-related counters by the number of executors to make
  them per group set rather than per node.
- Fix a bug in doCreateExecRequest() in the selection of the number of
  executors for planning.

Testing:
- Pass test_executor_groups.py
- Add test cases in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
11 files changed, 253 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/11
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2178
PS10, Line 2178: continue;
Missing planCtx.compilationState_.restoreState() before continue here.
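
The pattern behind this comment can be sketched as a save/restore pair around each planning attempt. The class and method names below are hypothetical Python stand-ins that only loosely mirror planCtx.compilationState_; the point is that an attempt which is skipped with continue should not leave its mutations behind for the next attempt.

```python
import copy
from typing import Callable, Optional, Sequence


class CompilationState:
    """Hypothetical holder of state that planning mutates."""

    def __init__(self, query_options: dict):
        self.query_options = query_options
        self._snapshot: Optional[dict] = None

    def save_state(self) -> None:
        self._snapshot = copy.deepcopy(self.query_options)

    def restore_state(self) -> None:
        self.query_options = copy.deepcopy(self._snapshot)


def plan_over_group_sets(state: CompilationState,
                         group_sets: Sequence[str],
                         fits: Callable[[str], bool]) -> Optional[str]:
    """Try each group set, undoing state changes from rejected attempts."""
    for gs in group_sets:
        state.save_state()
        # Planning mutates shared state (stand-in for real side effects).
        state.query_options["planned_for"] = gs
        state.query_options.setdefault("attempts", []).append(gs)
        if not fits(gs):
            state.restore_state()  # undo before moving to the next group set
            continue
        return gs
    return None


state = CompilationState({"MT_DOP": 4})
chosen = plan_over_group_sets(state, ["small", "large"], lambda gs: gs == "large")
print(chosen, state.query_options)
```

Without the restore_state() call, the "attempts" list in this toy run would still contain "small" when the "large" attempt is planned.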



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 19:18:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2324
PS10, Line 2324: analysisResult.getAnalyzer().setNumExecutorsForPlanning(
   :     planCtx.compilationState_.getGroupSet().getCurr_num_executors());
> Is this a potential bug? What if getCurr_num_executors() == 0? Maybe we sho
This should be fixed.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 16:38:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-31 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2324
PS10, Line 2324: analysisResult.getAnalyzer().setNumExecutorsForPlanning(
   :     planCtx.compilationState_.getGroupSet().getCurr_num_executors());
Is this a potential bug? What if getCurr_num_executors() == 0? Maybe we should
fall back to getExpected_num_executors() in that case?
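
The proposed fallback could be sketched as below. GroupSetSketch is a hypothetical stand-in for the group-set descriptor; the two field names mirror the getters mentioned above, and whether the merged change uses exactly this rule is not shown in this thread.

```java
// Hypothetical stand-in for the executor group set descriptor.
final class GroupSetSketch {
  final int currNumExecutors;     // executors currently registered
  final int expectedNumExecutors; // configured size of the group set

  GroupSetSketch(int currNumExecutors, int expectedNumExecutors) {
    this.currNumExecutors = currNumExecutors;
    this.expectedNumExecutors = expectedNumExecutors;
  }

  // Prefer the live executor count, but fall back to the expected size when
  // no executor has registered yet, so planning never sees zero executors.
  int numExecutorsForPlanning() {
    return currNumExecutors > 0 ? currNumExecutors : expectedNumExecutors;
  }

  public static void main(String[] args) {
    System.out.println(new GroupSetSketch(0, 10).numExecutorsForPlanning()); // 10
    System.out.println(new GroupSetSketch(3, 10).numExecutorsForPlanning()); // 3
  }
}
```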




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 15:58:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 10:

Saw your comment in the other patch, where you are waiting for David to
confirm something about the scan costing. Stopped the verification job.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 02:17:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 10: Code-Review+2

carry +1 from Kurt



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 02:10:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12723/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 02:14:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9187/ 
DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 02:11:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java@362
PS8, Line 362:   // Input cardinality is unknown or cost is too high 
compared to
> typo: unkown
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 01:54:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#10).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, Frontend will try to match
a query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. There are kinds of queries where
the compiled plan will fit in any executor group set but will not
necessarily deliver the best performance. An example of this is Impala's
COMPUTE STATS query. It does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once again with that executor group set
but with scan fragment parallelism returned to MT_DOP. This one extra
round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but is necessary since
the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against
scheduling too many scan fragments by comparing with the actual scan
range count that Planner knows.
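
As a rough illustration of the scan range guard, the sketch below derives an instance count from an estimated cost and then caps it at the number of scan ranges. The class, method, and parameter names are hypothetical, and the "one instance per fixed amount of cost" rule is an assumption made for the example, not the actual computeScanProcessingCost() logic.

```java
final class ScanParallelismSketch {
  // Cost-based instance count: one instance per `minCostPerInstance` units of
  // estimated cost (rounded up), capped by the number of scan ranges, since an
  // instance beyond that count would have no range to read.
  static int relaxedScanInstances(long estimatedCost, long minCostPerInstance,
      int scanRangeCount) {
    if (estimatedCost < 0 || minCostPerInstance <= 0 || scanRangeCount < 0) {
      throw new IllegalArgumentException("cost and range count must be >= 0, "
          + "threshold must be > 0");
    }
    // Ceiling division written so it cannot overflow.
    long byCost = estimatedCost / minCostPerInstance
        + (estimatedCost % minCostPerInstance == 0 ? 0 : 1);
    long capped = Math.min(byCost, scanRangeCount);
    // Always schedule at least one instance.
    return (int) Math.max(1L, capped);
  }

  public static void main(String[] args) {
    System.out.println(relaxedScanInstances(1_000_000L, 100_000L, 4));   // 4
    System.out.println(relaxedScanInstances(250_000L, 100_000L, 120));   // 3
    System.out.println(relaxedScanInstances(0L, 100_000L, 120));         // 1
  }
}
```

In the first call the cost alone would suggest 10 instances, but only 4 scan ranges exist, so the cap wins.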

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
10 files changed, 209 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 9: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java@362
PS8, Line 362:   // Input cardinality is unkown or cost is too high 
compared to
typo: unkown




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 00:46:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 9: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/9/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/9/fe/src/main/java/org/apache/impala/service/Frontend.java@2167
PS9, Line 2167: if (!planCtx.compilationState_.isLimitScanParallelism()) {
> If CPU cost option is not enabled, cpuReqSatisfied equals true. Add checkin
ignore it




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 00:14:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/9/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/9/fe/src/main/java/org/apache/impala/service/Frontend.java@2167
PS9, Line 2167: if (!planCtx.compilationState_.isLimitScanParallelism()) {
If the CPU cost option is not enabled, cpuReqSatisfied is always true. Add a
check for ProcessingCost.isComputeCost(queryOptions).




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 00:11:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12722/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 31 Mar 2023 00:01:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java@2110
PS8, Line 2110:
> Should we check if ProcessingCost.isComputeCost(queryOptions) is true when
Good idea. Moved below with the rest of cpu checking.


http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java@2144
PS8, Line 2144: if (memoryAskUnbounded > 0) {
  :   addCounter(groupSetProfile,
  :       new TCounter(MEMORY_ASK_UNBOUNDED, TUnit.BYTES, memoryAskUnbounded));
  :   memoryAskUnbounded = -1;
  : }
  : if (cpuAskUnbounded > 0) {
  :   addCounter(groupSetProfile,
  :       new TCounter(CPU_ASK_UNBOUNDED, TUnit.UNIT, cpuAskUnbounded));
  :   cpuAskUnbounded = -1;
  :
> move this code block in front of line 2142 '}" ?
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 23:41:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#9).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, Frontend will try to match
a query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. There are kinds of queries where
the compiled plan will fit in any executor group set but will not
necessarily deliver the best performance. An example of this is Impala's
COMPUTE STATS query. It does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once again with that executor group set
but with scan fragment parallelism returned to MT_DOP. This one extra
round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but is necessary since
the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against
scheduling too many scan fragments by comparing with the actual scan
range count that Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
10 files changed, 209 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java@2110
PS8, Line 2110: addCounter(groupSetProfile, new TCounter(CPU_MAX, TUnit.UNIT, available_cores));
Should we check whether ProcessingCost.isComputeCost(queryOptions) is true when
adding the CPU_MAX counter?


http://gerrit.cloudera.org:8080/#/c/19656/8/fe/src/main/java/org/apache/impala/service/Frontend.java@2144
PS8, Line 2144:   if (memoryAskUnbounded > 0) {
  :     addCounter(groupSetProfile,
  :         new TCounter(MEMORY_ASK_UNBOUNDED, TUnit.BYTES, memoryAskUnbounded));
  :     memoryAskUnbounded = -1;
  :   }
  :   if (cpuAskUnbounded > 0) {
  :     addCounter(groupSetProfile,
  :         new TCounter(CPU_ASK_UNBOUNDED, TUnit.UNIT, cpuAskUnbounded));
  :     cpuAskUnbounded = -1;
  :   }
Move this code block in front of the '}' on line 2142?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 23:10:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12721/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 21:25:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/7/fe/src/main/java/org/apache/impala/service/Frontend.java@2160
PS7, Line 2160:
  : reason = "query option REQUEST_POOL=" + queryOptions.getRequest_pool()
  :     + " is set. Memory and cpu limit checking is skipped.";
  : add
> Just realized a bug here. This will be set by current EG, but replan does n
Fixed by moving the addCounter up.

The EG counters will look like the following in the last test case of 
test_query_cpu_count_divisor_default.

   - CpuCountDivisor: 1.00
   - ExecutorGroupsConsidered: 2 (2)
  Executor group 1 (root.tiny):
    Verdict: not enough per-host memory
     - CpuAsk: 2 (2)
     - CpuAskUnbounded: 1 (1)
     - CpuMax: 2 (2)
     - EffectiveParallelism: 2 (2)
     - MemoryAsk: 66.27 MB (69489080)
     - MemoryAskUnbounded: 40.21 MB (42164664)
     - MemoryMax: 64.00 MB (67108864)
  Executor group 2 (root.small):
    Verdict: Match
     - CpuAsk: 4 (4)
     - CpuMax: 16 (16)
     - EffectiveParallelism: 4 (4)
     - MemoryAsk: 66.32 MB (69540060)
     - MemoryMax: 70.00 MB (73400320)

What happens here is that the unbounded plan fits in EG 1 and MT_DOP is
restored, but that causes the new plan to no longer fit in EG 1, so planning
moves on to EG 2 with MT_DOP in place.

Neither CpuAskUnbounded nor MemoryAskUnbounded will show up in
test_min_processing_per_thread_small because MT_DOP is restored automatically
in the last EG (large pool).
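
The flow described above can be sketched as a small simulation. Everything here is illustrative: the class and method names are hypothetical, the per-group asks are supplied as fixed inputs instead of being planned, and the root.large limits and the relaxed asks for root.small and root.large are made-up placeholders. The root.tiny and root.small limits and asks are copied from the counters above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GroupSetSelectionSketch {
  // Resource ask produced by one planning attempt.
  static final class Ask {
    final long cpu;
    final long memBytes;
    Ask(long cpu, long memBytes) { this.cpu = cpu; this.memBytes = memBytes; }
  }

  // One executor group set: its limits plus the asks a planner would produce
  // for it in relaxed mode and with MT_DOP enforced ("bounded").
  static final class GroupSet {
    final String name;
    final long cpuMax;
    final long memMax;
    final Ask relaxed;
    final Ask bounded;
    GroupSet(String name, long cpuMax, long memMax, Ask relaxed, Ask bounded) {
      this.name = name; this.cpuMax = cpuMax; this.memMax = memMax;
      this.relaxed = relaxed; this.bounded = bounded;
    }
    boolean fits(Ask ask) { return ask.cpu <= cpuMax && ask.memBytes <= memMax; }
  }

  // Walks the group sets from smallest to largest and records each step.
  static List<String> select(List<GroupSet> groups) {
    List<String> trace = new ArrayList<>();
    boolean limitScanParallelism = false; // the first round is relaxed
    for (int i = 0; i < groups.size(); i++) {
      GroupSet g = groups.get(i);
      boolean isLast = i == groups.size() - 1;
      if (!limitScanParallelism && !isLast) {
        boolean relaxedFits = g.fits(g.relaxed);
        trace.add(g.name + " relaxed " + (relaxedFits ? "fits" : "too big"));
        if (!relaxedFits) continue;  // try the next larger group set
        limitScanParallelism = true; // from now on, plan with MT_DOP
      }
      if (g.fits(g.bounded)) {
        trace.add(g.name + " match");
        return trace;
      }
      trace.add(g.name + " bounded too big");
    }
    return trace;
  }

  static List<GroupSet> profileExample() {
    return Arrays.asList(
        new GroupSet("root.tiny", 2, 67108864L,
            new Ask(1, 42164664L), new Ask(2, 69489080L)),
        new GroupSet("root.small", 16, 73400320L,
            new Ask(4, 69540060L), new Ask(4, 69540060L)),
        new GroupSet("root.large", 192, 8L << 30, // placeholder limits
            new Ask(4, 69540060L), new Ask(4, 69540060L)));
  }

  public static void main(String[] args) {
    for (String step : select(profileExample())) System.out.println(step);
  }
}
```

Running main prints three steps: the relaxed plan fits root.tiny, the bounded replan does not, and root.small matches without another relaxed round.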




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 21:11:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#8).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, Frontend will try to match
a query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. There are kinds of queries where
the compiled plan will fit in any executor group set but will not
necessarily deliver the best performance. An example of this is Impala's
COMPUTE STATS query. It does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once again with that executor group set
but with scan fragment parallelism returned to MT_DOP. This one extra
round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but is necessary since
the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against
scheduling too many scan fragments by comparing with the actual scan
range count that Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
10 files changed, 207 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12720/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 21:02:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19656/7/fe/src/main/java/org/apache/impala/service/Frontend.java@2160
PS7, Line 2160: memoryAskUnbounded = per_host_mem_estimate;
  :   if (ProcessingCost.isComputeCost(queryOptions)) {
  :     cpuAskUnbounded = scaled_cores_requirement;
  :   }
Just realized a bug here. This will be set by the current EG, but the replan 
is not guaranteed to fit into the current EG, causing this to be displayed in 
the next larger EG.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 20:48:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 7:

Patch set 7 adds MemoryAskUnbounded and CpuAskUnbounded counters for research 
purposes.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 20:42:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#7).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, Frontend will try to match
a query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. There are kinds of queries where
the compiled plan will fit in any executor group set but will not
necessarily deliver the best performance. An example of this is Impala's
COMPUTE STATS query. It does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once again with that executor group set
but with scan fragment parallelism returned back to MT_DOP. This one
extra round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but is necessary since
the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against
scheduling too many scan fragments by comparing with the actual scan
range count that Planner knows.
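
The scan range guard described above could look roughly like the sketch
below. It is only an illustration: the class, the method, its parameters
(targetCostPerInstance, maxInstances) and the ceiling-division formula are
assumptions made for this example and are not taken from
computeScanProcessingCost().

```java
/** Illustrative sketch only; names and formula are assumptions, not Impala's code. */
final class ScanInstanceCapSketch {
  /**
   * Derives a scan instance count from a cost estimate, then caps it by the
   * number of scan ranges, since an instance without a range has nothing to read.
   */
  static int scanInstances(long processingCost, long targetCostPerInstance,
      int scanRangeCount, int maxInstances) {
    if (processingCost < 0 || targetCostPerInstance <= 0 || maxInstances <= 0) {
      throw new IllegalArgumentException("invalid cost or limit");
    }
    // Ceiling division without overflow: instances suggested by cost alone.
    long byCost = processingCost / targetCostPerInstance
        + (processingCost % targetCostPerInstance == 0 ? 0 : 1);
    byCost = Math.max(1, byCost);
    // Never plan more instances than there are scan ranges (at least one).
    long byRanges = Math.max(1, scanRangeCount);
    return (int) Math.min(byCost, Math.min(byRanges, maxInstances));
  }
}
```

With a cost that suggests 10 instances but only 4 scan ranges, the sketch
returns 4, so the cost estimate alone never drives the instance count past
the work that actually exists.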

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
10 files changed, 203 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12719/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 17:53:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 6: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 17:37:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/5//COMMIT_MSG@25
PS5, Line 25: We can
: remove the extra replanning in the future
> Could you add a TODO comment in the code?
Added TODO in Frontend.java




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 17:33:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#6).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, Frontend will try to match
a query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. There are kinds of queries where
the compiled plan will fit in any executor group set but will not
necessarily deliver the best performance. An example of this is Impala's
COMPUTE STATS query. It does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once again with that executor group set
but with scan fragment parallelism returned back to MT_DOP. This one
extra round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but is necessary since
the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against
scheduling too many scan fragments by comparing with the actual scan
range count that Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
10 files changed, 187 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/5//COMMIT_MSG@25
PS5, Line 25: We can
: remove the extra replanning in the future
Could you add a TODO comment in the code?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 16:41:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12718/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 15:53:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@9
PS4, Line 9: In a setup with multiple executor gr
> nit: In a setup with multiple executor group set
Done


http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@11
PS4, Line 11: are
> nit: kinds
Done


http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@18
PS4, Line 18: relax
> nit: relaxes
Done


http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@21
PS4, Line 21: we replan once again with that executor group set but with
: scan fragment parallelism returned back to MT_DOP.
> Does this mean we need one more replan for any query plan with scan fragmen
Yes, there will be one extra replan, except when we arrive at the largest 
executor group set, where we immediately fix scan node parallelism to MT_DOP.
Looking at test_min_processing_per_thread_small, a single round of query 
planning there takes about 35 ms.
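
A rough sketch of the planning loop described in this reply, for readers
following the thread. Every type and method name here (ReplanSketch,
PlanAsk, GroupSet, Planner, choose) is hypothetical and does not appear in
Frontend.java. The sketch also assumes that the MT_DOP-limited replan is
checked for fit again and that planning moves on to the next larger
executor group set when it does not fit.

```java
import java.util.List;

/** Illustrative sketch only; these types are not Impala's real classes. */
final class ReplanSketch {
  /** Memory and CPU ask of one compiled plan. */
  static final class PlanAsk {
    final long memBytes;
    final int cores;
    PlanAsk(long memBytes, int cores) { this.memBytes = memBytes; this.cores = cores; }
  }

  /** An executor group set and the largest ask it can admit. */
  static final class GroupSet {
    final String name;
    final long maxMemBytes;
    final int maxCores;
    GroupSet(String name, long maxMemBytes, int maxCores) {
      this.name = name; this.maxMemBytes = maxMemBytes; this.maxCores = maxCores;
    }
    boolean fits(PlanAsk ask) {
      return ask.memBytes <= maxMemBytes && ask.cores <= maxCores;
    }
  }

  /** Hypothetical planner hook: compiles a plan for one group set. */
  interface Planner {
    PlanAsk plan(GroupSet groupSet, boolean limitScanParallelism);
  }

  /** Chosen group set plus the number of planning rounds it took. */
  static final class Choice {
    final GroupSet groupSet;
    final int planningRounds;
    Choice(GroupSet groupSet, int planningRounds) {
      this.groupSet = groupSet; this.planningRounds = planningRounds;
    }
  }

  static Choice choose(List<GroupSet> smallestFirst, Planner planner) {
    if (smallestFirst.isEmpty()) {
      throw new IllegalArgumentException("no executor group set configured");
    }
    int rounds = 0;
    int last = smallestFirst.size() - 1;
    for (int i = 0; i < last; i++) {
      GroupSet gs = smallestFirst.get(i);
      // First round: scan parallelism relaxed, so it may exceed MT_DOP.
      rounds++;
      if (!gs.fits(planner.plan(gs, false))) continue;
      // Extra round: same group set, scan parallelism limited to MT_DOP again.
      rounds++;
      if (gs.fits(planner.plan(gs, true))) return new Choice(gs, rounds);
    }
    // Largest group set: limit scan parallelism right away, no extra round.
    GroupSet largest = smallestFirst.get(last);
    planner.plan(largest, true);
    return new Choice(largest, rounds + 1);
  }
}
```

In this sketch a query that is rejected by the smallest set and accepted by
the middle one costs three planning rounds in total, which at roughly 35 ms
per round matches the small overhead discussed above.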


http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@27
PS4, Line 27: MT_DOP.
> nit: fragments
Done


http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@262
PS4, Line 262:   public void computeCostingSegment(
> Need a better name for the flag. Maybe limitScanParallelism?
Done


http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1061
PS4, Line 1061: rNode, int m
> here and other places, should we name it fixedLeafNodes?
Renamed to limitScanParallelism.


http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java@361
PS4, Line 361: Thr
> <= 0. Otherwise if getInputCardinality() equals 0, syntheticCardinality equ
cardinality == 0 is actually legal for an empty scan.
Added one more case above this.
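
One way to picture the extra case is the sketch below. It is not the code
at ScanNode.java line 361: the method costPerRow, its parameters, and the
rule that a negative cardinality means "unknown" are assumptions made for
illustration. The point is only that a zero cardinality is handled as its
own legal case before any division happens.

```java
/** Illustrative sketch only; not the actual ScanNode code. */
final class CardinalityGuardSketch {
  /** Returns the cost attributed to each row, guarding the division. */
  static long costPerRow(long totalCost, long inputCardinality,
      long fallbackCardinality) {
    if (inputCardinality == 0) {
      // Empty scan: a legal input, handled before anything is divided.
      return 0;
    }
    // In this sketch a negative cardinality means "unknown", so a
    // synthetic value is substituted.
    long syntheticCardinality =
        inputCardinality < 0 ? fallbackCardinality : inputCardinality;
    if (syntheticCardinality <= 0) {
      throw new IllegalArgumentException("fallback cardinality must be positive");
    }
    return totalCost / syntheticCardinality;
  }
}
```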




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 15:36:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#5).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, Frontend will try to match
a query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. There are kinds of queries where
the compiled plan will fit in any executor group set but will not
necessarily deliver the best performance. An example of this is Impala's
COMPUTE STATS query. It does a full table scan and aggregates the stats,
has a fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once again with that executor group set
but with scan fragment parallelism returned back to MT_DOP. This one
extra round of query planning adds a couple of milliseconds of overhead,
depending on the complexity of the query plan, but is necessary since
the backend scheduler still expects at most MT_DOP scan fragment
instances. We can remove the extra replanning in the future once we can
fully manage scan node parallelism without MT_DOP.

This patch also tunes computeScanProcessingCost() to guard against
scheduling too many scan fragments by comparing with the actual scan
range count that Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
10 files changed, 185 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-30 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@262
PS4, Line 262:   public void computeCostingSegment(TQueryOptions queryOptions, boolean fixLeafNodes) {
Need a better name for the flag. Maybe limitScanParallelism?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 14:09:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-29 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@9
PS4, Line 9: In multiple executor group set setup
nit: In a setup with multiple executor group set


http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@11
PS4, Line 11: kind
nit: kinds


http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@18
PS4, Line 18: relax
nit: relaxes


http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@21
PS4, Line 21: we replan once again with that executor group set but with
: scan fragment parallelism returned back to MT_DOP.
Does this mean we need one more replan for any query plan with scan fragment? 
What's the overhead for a small query?


http://gerrit.cloudera.org:8080/#/c/19656/4//COMMIT_MSG@27
PS4, Line 27: fragment
nit: fragments


http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1061
PS4, Line 1061: fixLeafNodes
here and other places, should we name it fixedLeafNodes?


http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19656/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java@361
PS4, Line 361: < 0
<= 0. Otherwise if getInputCardinality() equals 0, syntheticCardinality equals 
0. Then divide by 0.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 30 Mar 2023 02:06:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12714/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 29 Mar 2023 23:11:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-29 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#4).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In multiple executor group set setup, Frontend will try to match a query
with the smallest executor group set that can fit the memory and cpu
requirement of the compiled query. There are kind of query where the
compiled plan will fit to any executor group set but not necessarily
deliver the best performance. An example for this is Impala's COMPUTE
STATS query. It does full table scan and aggregate the stats, have
fairly simple query plan shape, but can benefit from higher scan
parallelism.

This patch relax the scan fragment parallelism on first round of query
planning. This allows scan fragment to increase its parallelism based on
its ProcessingCost estimation. If the relaxed plan fit in an executor
group set, we replan once again with that executor group set but with
scan fragment parallelism returned back to MT_DOP. We can remove the
replanning in the future once we can fully manage scan node parallelism
without MT_DOP.

This patch also tune computeScanProcessingCost() to guard against
scheduling too many scan fragment by comparing with the actual scan
range count that Planner knows.

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
10 files changed, 177 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/12712/ : Initial code 
review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 29 Mar 2023 21:20:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-29 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1078
PS2, Line 1078: {
  :   // TODO: Fragment with UnionNode but without ScanNode should have it
> Comment here and other places should say leaf node instead of scan node to
Done


http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py@955
PS2, Line 955: "Test processing cost with min_processing_per_thread s
> Misplaced comment.
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 29 Mar 2023 21:10:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-29 Thread Riza Suminto (Code Review)
Hello Andrew Sherman, Kurt Deschler, Abhishek Rawat, Wenzhe Zhou, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#3).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, Frontend tries to match a
query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. For some kinds of queries, the
compiled plan fits in any executor group set but does not necessarily
deliver the best performance. An example is Impala's COMPUTE STATS
query. It does a full table scan and aggregates the stats, has a fairly
simple query plan shape, but can benefit from higher scan parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once more with that executor group set but
with scan fragment parallelism set back to MT_DOP. We can remove the
replanning in the future once we can fully manage scan node parallelism
without MT_DOP.
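
The two-round planning described above can be sketched roughly as
follows. This is only an illustration, not the Frontend code: the
GroupSet class, its max_cores field, the choose_group_set() helper, and
the cost-to-parallelism formula are all hypothetical names and
assumptions made for the sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GroupSet:
    """Hypothetical model of an executor group set."""
    name: str
    num_executors: int
    max_cores: int  # assumed total CPU capacity of the whole group set


def choose_group_set(group_sets: List[GroupSet], scan_cost: int,
                     min_cost_per_thread: int,
                     mt_dop: int) -> Tuple[str, int, int]:
    """Return (group set name, relaxed scan instances, final scan instances)."""
    ordered = sorted(group_sets, key=lambda g: g.max_cores)
    for i, gs in enumerate(ordered):
        strict = mt_dop * gs.num_executors
        # First round: let the cost estimate raise scan parallelism
        # above what MT_DOP alone would allow (assumed formula).
        relaxed = max(strict, math.ceil(scan_cost / min_cost_per_thread))
        is_last = i == len(ordered) - 1
        if relaxed <= gs.max_cores or is_last:
            # Second round: replan on the chosen group set with scan
            # parallelism set back to MT_DOP.
            return gs.name, relaxed, strict
    raise ValueError("no executor group sets configured")


groups = [GroupSet("small", 2, 8), GroupSet("large", 8, 64)]
print(choose_group_set(groups, 3_000_000, 100_000, 2))  # ('large', 30, 16)
print(choose_group_set(groups, 200_000, 100_000, 2))    # ('small', 4, 4)
```

In this sketch the expensive scan no longer fits the small group set
once its parallelism is relaxed, so it is assigned to the large one,
while the final plan still uses at most MT_DOP scan instances per
executor.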

Testing:
- Pass test_executor_groups.py
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
6 files changed, 125 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12703/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 29 Mar 2023 01:48:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19656/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/1//COMMIT_MSG@23
PS1, Line 23: ag
> is not
Updated.


http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19656/2/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1078
PS2, Line 1078: One exception to that is if this fragment has ScanNode and this is the first
  :   //   planning round over selected executor group (see IMPALA-12029).
Comments here and in other places should say leaf node instead of scan node to
be consistent.
EmptySetNode, ScanNode, and UnionNode are all considered leaf nodes.
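
To illustrate the terminology, a minimal sketch of the leaf-node check
might look like this. It is not the PlanFragment.java code: node kinds
are plain strings here, and LEAF_NODE_KINDS and has_leaf_node() are
hypothetical names used only for this example.

```python
from typing import Iterable

# Node kinds that the comment above treats as leaf nodes.
LEAF_NODE_KINDS = frozenset({"EmptySetNode", "ScanNode", "UnionNode"})


def has_leaf_node(fragment_nodes: Iterable[str]) -> bool:
    """Return True if any node kind in the fragment counts as a leaf node."""
    return any(kind in LEAF_NODE_KINDS for kind in fragment_nodes)


# A fragment with a UnionNode but no ScanNode still has a leaf node.
print(has_leaf_node(["AggregationNode", "UnionNode"]))     # True
print(has_leaf_node(["AggregationNode", "ExchangeNode"]))  # False
```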


http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19656/2/tests/custom_cluster/test_executor_groups.py@955
PS2, Line 955: Expect to run the query on the small group by default.
Misplaced comment.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 29 Mar 2023 01:35:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Relax scan fragment parallelism on first planning

2023-03-28 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19656

to look at the new patch set (#2).

Change subject: IMPALA-12029: Relax scan fragment parallelism on first planning
..

IMPALA-12029: Relax scan fragment parallelism on first planning

In a setup with multiple executor group sets, Frontend tries to match a
query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. For some kinds of queries, the
compiled plan fits in any executor group set but does not necessarily
deliver the best performance. An example is Impala's COMPUTE STATS
query. It does a full table scan and aggregates the stats, has a fairly
simple query plan shape, but can benefit from higher scan parallelism.

This patch relaxes the scan fragment parallelism on the first round of
query planning. This allows a scan fragment to increase its parallelism
based on its ProcessingCost estimation. If the relaxed plan fits in an
executor group set, we replan once more with that executor group set but
with scan fragment parallelism set back to MT_DOP. We can remove the
replanning in the future once we can fully manage scan node parallelism
without MT_DOP.

Testing:
- Pass test_executor_groups.py
- Add test cases in query-options-test.cc.
- Add test case in test_min_processing_per_thread_small.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new grouping query
  can fit in root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M be/src/service/query-options-test.cc
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
7 files changed, 129 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto