[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-09 Thread Riza Suminto (Code Review)
Riza Suminto has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..

IMPALA-11068: Add query option to reduce scanner thread launch.

Under a heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit an OOM error by launching too many
threads too soon. ScannerMemLimiter limits the number of scanner threads
by calculating each thread's memory requirement and estimating the
memory growth rate of all threads. However, it does not prevent a
scanner node from quickly launching many threads and immediately
consuming the memtracker's spare capacity. Even after ScannerMemLimiter
rejects a new thread launch, some existing threads might keep growing
their non-reserved memory for decompression work until the memory limit
is exceeded.

IMPALA-7096 added the hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to account for non-reserved memory growth. Increasing this
flag value can help reduce the thread count, but might severely regress
other queries that do not have heavy decompression characteristics. The
same applies to lowering the NUM_SCANNER_THREADS query option.
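
As a rough illustration of why a larger per-thread estimate reduces the
thread count, here is a toy model. It is not Impala's ScannerMemLimiter:
the class name, function name, and byte values are all hypothetical, and
it assumes each launched thread simply claims a fixed estimate against a
fixed spare capacity.

```cpp
#include <cstdint>

// Toy model only (hypothetical names, not Impala code). Each launched thread
// claims a fixed estimate of non-reserved memory; a launch is rejected once
// the summed claims would exceed the spare capacity.
class ToyScannerMemLimiter {
 public:
  explicit ToyScannerMemLimiter(int64_t spare_capacity_bytes)
    : spare_capacity_bytes_(spare_capacity_bytes), claimed_bytes_(0) {}

  // Returns true and records the claim if the thread fits, false otherwise.
  bool TryClaimThread(int64_t estimated_bytes_per_thread) {
    if (claimed_bytes_ + estimated_bytes_per_thread > spare_capacity_bytes_) {
      return false;
    }
    claimed_bytes_ += estimated_bytes_per_thread;
    return true;
  }

 private:
  int64_t spare_capacity_bytes_;  // assumed headroom under the memory limit
  int64_t claimed_bytes_;         // sum of estimates for launched threads
};

// Counts how many threads the toy limiter admits before the first rejection,
// capped at max_threads.
int CountAdmittedThreads(int64_t spare_capacity_bytes,
    int64_t estimated_bytes_per_thread, int max_threads) {
  ToyScannerMemLimiter limiter(spare_capacity_bytes);
  int admitted = 0;
  while (admitted < max_threads
      && limiter.TryClaimThread(estimated_bytes_per_thread)) {
    ++admitted;
  }
  return admitted;
}
```

In this model, quadrupling the per-thread estimate cuts the admitted
thread count to a quarter. It does not model threads that grow past their
estimate after launch, which is the failure mode described above.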

This patch adds one more query option, HDFS_SCANNER_NON_RESERVED_BYTES,
as an alternative way to mitigate OOM. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option so that tuning can be done at per-query granularity. If
this query option is not set, or is set to 0 or a negative value, the
backend falls back to the value of the
hdfs_scanner_thread_max_estimated_bytes flag.
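
The fallback rule can be sketched as below. This is a minimal sketch, not
the code from the patch: the function name is hypothetical, and an unset
option is assumed to be represented as 0.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only. 'query_option_bytes' stands for
// HDFS_SCANNER_NON_RESERVED_BYTES and 'flag_bytes' for the value of the
// hdfs_scanner_thread_max_estimated_bytes flag. An unset query option is
// assumed to arrive here as 0.
int64_t EffectiveNonReservedBytes(int64_t query_option_bytes,
    int64_t flag_bytes) {
  // Only a positive query option overrides the flag.
  return query_option_bytes > 0 ? query_option_bytes : flag_bytes;
}
```

With this rule a query can raise or lower its own estimate without
changing the flag value that other queries see.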

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Reviewed-on: http://gerrit.cloudera.org:8080/18126
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 89 insertions(+), 6 deletions(-)

Approvals:
  Csaba Ringhofer: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 7: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 08 Oct 2023 23:38:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9796/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 08 Oct 2023 19:01:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90: if (node == element.first) found_scan = scan.get();
> After some internal discussion, we come up with conclusion that query optio
Thanks for adding the query option!




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 07 Oct 2023 14:16:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14151/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 07 Oct 2023 01:04:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-06 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/6/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/18126/6/be/src/service/query-options.cc@1146
PS6, Line 1146: 
query_options->__set_mem_limit_coordinators(mem_spec_val.value);
> A break is missing from the end of the block here.
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 07 Oct 2023 00:35:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-06 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Wenzhe Zhou, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#7).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 89 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-06 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 6:

(1 comment)

Just one comment, otherwise +1.

http://gerrit.cloudera.org:8080/#/c/18126/6/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/18126/6/be/src/service/query-options.cc@1146
PS6, Line 1146: 
query_options->__set_mem_limit_coordinators(mem_spec_val.value);
A break is missing from the end of the block here.
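
For context, a minimal sketch of why the missing break matters. The enum,
struct, and function below are hypothetical stand-ins, not the code in
query-options.cc: without the first break, control would fall through
into the next case and also run its body.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration; not Impala's real types.
enum class ToyOption { MEM_LIMIT_COORDINATORS, NON_RESERVED_BYTES };

struct ToyQueryOptions {
  int64_t mem_limit_coordinators = 0;
  int64_t non_reserved_bytes = 0;
};

void SetToyOption(ToyOption opt, int64_t value, ToyQueryOptions* out) {
  switch (opt) {
    case ToyOption::MEM_LIMIT_COORDINATORS: {
      out->mem_limit_coordinators = value;
      // Without this break, execution would continue into the next case and
      // overwrite non_reserved_bytes with the same value.
      break;
    }
    case ToyOption::NON_RESERVED_BYTES: {
      out->non_reserved_bytes = value;
      break;
    }
  }
}
```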




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 07 Oct 2023 00:30:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-04 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 6: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 05 Oct 2023 01:10:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-02 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 6: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Oct 2023 22:32:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-02 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/5//COMMIT_MSG@29
PS5, Line 29: query option such that tuning can be done at per query 
granularity. If
> nit: granularity
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Oct 2023 22:31:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-02 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#6).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 88 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-02 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 5: Code-Review+1

(1 comment)

This seems like what IMPALA-7096 should have been originally. Reasonable enough.

http://gerrit.cloudera.org:8080/#/c/18126/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/5//COMMIT_MSG@29
PS5, Line 29: query option such that tuning can be done at per query 
granulality. If
nit: granularity




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Oct 2023 22:24:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14126/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Oct 2023 17:25:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-02 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 5:

Patch set 4 is a rebase to resolve merge conflict.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Oct 2023 16:57:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-10-02 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Bikramjeet 
Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#5).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..

IMPALA-11068: Add query option to reduce scanner thread launch.

Under a heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit an OOM error by launching too many
threads too soon. ScannerMemLimiter limits the number of scanner threads
by calculating each thread's memory requirement and estimating the
memory growth rate of all threads. However, it does not prevent a
scanner node from quickly launching many threads and immediately
consuming the memtracker's spare capacity. Even after ScannerMemLimiter
rejects a new thread launch, some existing threads might keep growing
their non-reserved memory for decompression work until the memory limit
is exceeded.

IMPALA-7096 added the hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to account for non-reserved memory growth. Increasing this
flag value can help reduce the thread count, but might severely regress
other queries that do not have heavy decompression characteristics. The
same applies to lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option, HDFS_SCANNER_NON_RESERVED_BYTES,
as an alternative way to mitigate OOM. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per query granulality. If
this query option is not set, or is set to 0 or a negative value, the
backend falls back to the value of the
hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 88 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/5

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-27 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 4: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 27 Sep 2023 16:00:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14001/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 14 Sep 2023 16:10:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: aking
> Nit: taking
Done


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: with
> Nit: with
Done


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222:
> Nit: precedence
Done


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@223
PS3, Line 223: ery op
> "is set to a positive value"?
Done


http://gerrit.cloudera.org:8080/#/c/18126/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/3/common/thrift/ImpalaService.thrift@849
PS3, Line 849: is not
> "is not set to a positive value"?
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 14 Sep 2023 15:43:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-14 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Bikramjeet 
Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#4).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit is exceeded.

IMPALA-7096 adds the hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to account for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. The same
is true of lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per-query granularity. If
this query option is not set, or is set to 0 or a negative value, the
backend will fall back to the value of the
hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 87 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/4
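
The effect described in the commit message above, where a larger per-thread
estimate admits fewer scanner threads, can be illustrated with a simplified
model. This is only a sketch: MaxAdditionalThreads and its parameters are
hypothetical names, the real ScannerMemLimiter logic also estimates memory
growth rate and is not reproduced here, and the byte values are made up for
illustration.

```cpp
#include <cstdint>

constexpr int64_t kMiB = 1024LL * 1024;
constexpr int64_t kGiB = 1024LL * kMiB;

// Hypothetical, simplified model (not Impala's actual code): how many more
// scanner threads fit into the memtracker's spare capacity if each thread is
// charged its reservation plus an estimate of its non-reserved memory.
constexpr int64_t MaxAdditionalThreads(int64_t spare_capacity_bytes,
                                       int64_t reserved_bytes_per_thread,
                                       int64_t est_non_reserved_bytes_per_thread) {
  // No spare capacity, or a nonsensical per-thread cost, admits no threads.
  return (spare_capacity_bytes <= 0 ||
          reserved_bytes_per_thread + est_non_reserved_bytes_per_thread <= 0)
      ? 0
      : spare_capacity_bytes /
            (reserved_bytes_per_thread + est_non_reserved_bytes_per_thread);
}

// With 1 GiB spare and an 8 MiB reservation per thread (illustrative values):
// a 32 MiB estimate admits 25 threads, a 256 MiB estimate admits only 3.
static_assert(MaxAdditionalThreads(kGiB, 8 * kMiB, 32 * kMiB) == 25, "32 MiB estimate");
static_assert(MaxAdditionalThreads(kGiB, 8 * kMiB, 256 * kMiB) == 3, "256 MiB estimate");
```

Under this model, raising the estimate for one decompression-heavy query
throttles only that query's thread launches, which is the motivation for
exposing the setting as a query option rather than only as a global flag.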
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-14 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: whith
Nit: with


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: precedent
Nit: precedence


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: takes
Nit: taking


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@223
PS3, Line 223: is set
"is set to a positive value"?


http://gerrit.cloudera.org:8080/#/c/18126/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/3/common/thrift/ImpalaService.thrift@849
PS3, Line 849: not set
"is not set to a positive value"?



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 14 Sep 2023 15:32:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13991/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Sep 2023 18:08:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG@27
PS2, Line 27: n is i
> Nit: could be "is intended to offer"
Done


http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG@30
PS2, Line 30: tion not s
> Mention that we also fall back to the flag if the query option is set to ze
Done


http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc@220
PS2, Line 220: from
 :   // either of the HDFS_SCANNER_NON_RESERVED_BYTES query op
> Nit: it would be better like this:
Done. Rephrased.


http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc@221
PS2, Line 221: ion or the
 :   // hdfs_scanner_thread_max_estimated_byte
> Nit: it would be better like this:
Done. Rephrased.


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift@847
PS2, Line 847: by the pla
> Nit: by the planner.
Done


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift@849
PS2, Line 849: al sca
> Nit: threads.
Done


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/Query.thrift@665
PS2, Line 665:   166: optional i64 hdfs_scanner_non_reserved_bytes = -1
> Shouldn't the default be unset, i.e. for example -1? That way the original
Agree. Done.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Sep 2023 17:42:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-12 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Bikramjeet 
Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#3).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit is exceeded.

IMPALA-7096 adds the hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to account for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. The same
is true of lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per-query granularity. If
this query option is not set, or is set to 0 or a negative value, the
backend will fall back to the value of the
hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 87 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-12 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG@27
PS2, Line 27: intent
Nit: could be "is intended to offer"


http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG@30
PS2, Line 30: is not set
Mention that we also fall back to the flag if the query option is set to zero 
or a negative value.


http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc@220
PS2, Line 220: from
 :   // either of hdfs_scanner_thread_max_estimated_bytes flag
Nit: it would be better like this:
"from either the hdfs_scanner_thread_max_estimated_bytes flag ..."


http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc@221
PS2, Line 221: or
 :   // hdfs_scanner_non_reserved_bytes option
Nit: it would be better like this:
"or the HDFS_SCANNER_NON_RESERVED_BYTES query option."

We could mention that the query option takes precedence over the flag if it is 
set.


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift@847
PS2, Line 847: by planner
Nit: by the planner.


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift@849
PS2, Line 849: thread
Nit: threads.


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/Query.thrift@665
PS2, Line 665:   166: optional i64 hdfs_scanner_non_reserved_bytes = 33554432 // 32MB
Shouldn't the default be unset, i.e. for example -1? That way the original 
behaviour would be conserved, i.e. the 
'hdfs_scanner_thread_max_estimated_bytes' flag would be effective unless this 
option is set.

If the default value stays this, we should mention it also in the commit 
message.

My reason for having this option unset by default is that a user may set the 
'hdfs_scanner_thread_max_estimated_bytes' flag but it won't have any effect and 
it could be difficult for the user to find out why.
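
The precedence discussed in these comments can be sketched as follows. This is
a minimal illustration, not the code in hdfs-scan-node.cc:
EffectiveNonReservedBytes and kExampleFlagBytes are hypothetical names, and
the 32MB flag value is just the figure quoted from the PS2 diff, used here as
an example.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption, not Impala's actual function): picks the
// per-thread non-reserved memory estimate for a scan node.
//   query_option_bytes: HDFS_SCANNER_NON_RESERVED_BYTES, -1 when unset.
//   flag_bytes: the hdfs_scanner_thread_max_estimated_bytes startup flag.
constexpr int64_t EffectiveNonReservedBytes(int64_t query_option_bytes,
                                            int64_t flag_bytes) {
  // Only a positive query option takes precedence over the flag; an unset
  // option (-1), zero, or any negative value falls back to the flag.
  return query_option_bytes > 0 ? query_option_bytes : flag_bytes;
}

// Example flag value only: 33554432 bytes (32MB).
constexpr int64_t kExampleFlagBytes = 32LL * 1024 * 1024;

static_assert(EffectiveNonReservedBytes(-1, kExampleFlagBytes) == kExampleFlagBytes,
              "unset option falls back to the flag");
static_assert(EffectiveNonReservedBytes(0, kExampleFlagBytes) == kExampleFlagBytes,
              "zero falls back to the flag");
static_assert(EffectiveNonReservedBytes(64LL * 1024 * 1024, kExampleFlagBytes) ==
                  64LL * 1024 * 1024,
              "positive option overrides the flag");
```

With a default of -1, a user who sets only the startup flag still sees it take
effect, which is the behaviour this comment asks to keep.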



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 12 Sep 2023 09:03:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13974/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 11 Sep 2023 22:15:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

2023-09-11 Thread Riza Suminto (Code Review)
Hello Kurt Deschler, Joe McDonnell, Csaba Ringhofer, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#2).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
..

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until finally the memory limit is exceeded.

IMPALA-7096 adds the hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to account for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. The same
is true of lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This flag intent to offer the
same control as hdfs_scanner_thread_max_estimated_bytes, but as a query
option such that tuning can be done at per-query granularity. If this
query option is not set, revert to use the value of
hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 86 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto