[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner may sometimes accidentally choose broadcast
distribution for inputs that are larger than the destination executor's
total memory. This could potentially happen if the cluster membership is
not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast
distribution. This causes spilling and severely affects performance.
Although the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.

As a safety here we introduce an explicit configurable limit:
broadcast_bytes_limit for the size of the broadcast input and set it
to default of 32GB. The default is chosen based on analysis of existing
benchmark queries and representative workloads such that in vast
majority of the cases the parameter value does not need to be changed.
If the estimated input size on the build side is greater than this
threshold, the DistributedPlanner will fall back to a partition
distribution. Setting this parameter to 0 causes it to be ignored.

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a few unit testis in PlannerTest that (a) set the
broadcast_bytes_limit to a small value and checks whether the
distributed plan does hash partitioning on the build side instead
of broadcast, (b) pass a broadcast hint to override the config
setting, (c) verify the standard case where broadcast threshold
is larger than the build input size.

Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Reviewed-on: http://gerrit.cloudera.org:8080/14690
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-large.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
10 files changed, 131 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 10
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 9
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Nov 2019 06:47:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5295/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 9
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Nov 2019 02:08:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 9
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Nov 2019 02:08:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 8:

Hit IMPALA-9152 and IMPALA-9165. I might wait until Joe's fix for IMPALA-9165 
is merged before retrying.


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 21 Nov 2019 19:15:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5261/


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 21 Nov 2019 07:13:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5261/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 21:13:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 21:13:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 21:12:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5088/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 21:02:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 7:

(2 comments)

Addressed comments.  Pls take another look.  Thanks !

http://gerrit.cloudera.org:8080/#/c/14690/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/14690/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1017
PS6, Line 1017: options.setBroadcast_bytes_limit(100);
> Optional: you could probably put this test in the same file as the previous
Made this change.


http://gerrit.cloudera.org:8080/#/c/14690/6/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test:

http://gerrit.cloudera.org:8080/#/c/14690/6/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test@3
PS6, Line 3:
> nit: trailing whitespace. The bot ignores .test files, but would be best to
I have removed this file now since it is combined with 
broadcast-bytes-limit.test



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 20:16:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Aman Sinha (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14690

to look at the new patch set (#7).

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner may sometimes accidentally choose broadcast
distribution for inputs that are larger than the destination executor's
total memory. This could potentially happen if the cluster membership is
not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast
distribution. This causes spilling and severely affects performance.
Although the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.

As a safety here we introduce an explicit configurable limit:
broadcast_bytes_limit for the size of the broadcast input and set it
to default of 32GB. The default is chosen based on analysis of existing
benchmark queries and representative workloads such that in vast
majority of the cases the parameter value does not need to be changed.
If the estimated input size on the build side is greater than this
threshold, the DistributedPlanner will fall back to a partition
distribution. Setting this parameter to 0 causes it to be ignored.

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a few unit testis in PlannerTest that (a) set the
broadcast_bytes_limit to a small value and checks whether the
distributed plan does hash partitioning on the build side instead
of broadcast, (b) pass a broadcast hint to override the config
setting, (c) verify the standard case where broadcast threshold
is larger than the build input size.

Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-large.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
10 files changed, 131 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/14690/7
--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5085/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 17:27:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5084/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 17:23:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 6:

(2 comments)

I'm ready to +2 except for a couple of minor things

http://gerrit.cloudera.org:8080/#/c/14690/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/14690/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1017
PS6, Line 1017: runPlannerTestFile("broadcast-bytes-limit-hint", 
"tpch_parquet", options);
Optional: you could probably put this test in the same file as the previous 
one, since it uses all the same settings.


http://gerrit.cloudera.org:8080/#/c/14690/6/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test:

http://gerrit.cloudera.org:8080/#/c/14690/6/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test@3
PS6, Line 3: # greater than broadcast_bytes_limit because the query has a
nit: trailing whitespace. The bot ignores .test files, but would be best to 
avoid.



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 16:56:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 6:

(2 comments)

Thanks for the code review.  I have addressed the review comments.  For the 
unit tests, I ended up creating separate test files and changing the option 
through the Java code.  I see some tests are setting QUERYOPTIONS within a 
single test file..not sure if there's a preference.

http://gerrit.cloudera.org:8080/#/c/14690/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14690/4//COMMIT_MSG@30
PS4, Line 30:  - Added a few unit testis in PlannerTest that (a) set the
> Can you also add the option to query-options-test just to test the validati
Done


http://gerrit.cloudera.org:8080/#/c/14690/4/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test:

PS4:
> Ah, sure will add that.
Done



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 16:56:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Aman Sinha (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14690

to look at the new patch set (#6).

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner may sometimes accidentally choose broadcast
distribution for inputs that are larger than the destination executor's
total memory. This could potentially happen if the cluster membership is
not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast
distribution. This causes spilling and severely affects performance.
Although the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.

As a safety here we introduce an explicit configurable limit:
broadcast_bytes_limit for the size of the broadcast input and set it
to default of 32GB. The default is chosen based on analysis of existing
benchmark queries and representative workloads such that in vast
majority of the cases the parameter value does not need to be changed.
If the estimated input size on the build side is greater than this
threshold, the DistributedPlanner will fall back to a partition
distribution. Setting this parameter to 0 causes it to be ignored.

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a few unit testis in PlannerTest that (a) set the
broadcast_bytes_limit to a small value and checks whether the
distributed plan does hash partitioning on the build side instead
of broadcast, (b) pass a broadcast hint to override the config
setting, (c) verify the standard case where broadcast threshold
is larger than the build input size.

Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-large.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
11 files changed, 135 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/14690/6
--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Aman Sinha (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14690

to look at the new patch set (#5).

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner may sometimes accidentally choose broadcast
distribution for inputs that are larger than the destination executor's
total memory. This could potentially happen if the cluster membership is
not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast
distribution. This causes spilling and severely affects performance.
Although the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.

As a safety here we introduce an explicit configurable limit:
broadcast_bytes_limit for the size of the broadcast input and set it
to default of 32GB. The default is chosen based on analysis of existing
benchmark queries and representative workloads such that in vast
majority of the cases the parameter value does not need to be changed.
If the estimated input size on the build side is greater than this
threshold, the DistributedPlanner will fall back to a partition
distribution. Setting this parameter to 0 causes it to be ignored.

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a few unit testis in PlannerTest that (a) set the
broadcast_bytes_limit to a small value and checks whether the
distributed plan does hash partitioning on the build side instead
of broadcast, (b) pass a broadcast hint to override the config
setting, (c) verify the standard case where broadcast threshold
is larger than the build input size.

Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-large.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
11 files changed, 136 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/14690/5
--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14690/5/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/14690/5/common/thrift/ImpalaService.thrift@488
PS5, Line 488:
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 16:40:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-19 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14690/4/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test:

PS4:
> Oh I meant something simpler - same option values, but a plan that has an e
Ah, sure will add that.



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 05:40:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-19 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14690/4/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test:

PS4:
> Good suggestion. Just want to clarify..the first negative test case can be
Oh I meant something simpler - same option values, but a plan that has an 
estimate of < broadcast_bytes_limit so we're testing both sides of the 
boundary. We kinda get that coverage from all the other planner tests but it'd 
be nice to have it explicitly tested here.

It should have read "Can you also add a negative test where it is below the 
threshold and stays as a broadcast join." Sorry about that.



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 05:20:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-19 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14690/4/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test:

PS4:
> Can you also add a negative test where it is below the threshold and doesn'
Good suggestion. Just want to clarify..the first negative test case can be 
something like setting the 'mem_limit' to a small number but leave the 
broadcast_bytes_limit as default (or sufficiently high). In this case, the 
planner should not pick broadcast.



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Nov 2019 02:16:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-19 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 4:

(3 comments)

Code looks good, just had a few asks as far as additional tests.

http://gerrit.cloudera.org:8080/#/c/14690/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14690/4//COMMIT_MSG@30
PS4, Line 30:  - Added a new unit test in PlannerTest that sets the
Can you also add the option to query-options-test just to test the validations. 
I think you can just add it to the table of byte options in SetByteOptions() 
since its behaviour is consistent with the other options.


http://gerrit.cloudera.org:8080/#/c/14690/4/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/14690/4/be/src/service/query-options.cc@914
PS4, Line 914: ParseMemValue(value, "broadcast bytes limit for join 
operations", _bytes_limit));
> line too long (103 > 90)
Assume you'll fix in next patch. You can run these checks locally the HEAD 
commit with:

  ./bin/jenkins/critique-gerrit-review.py --dryrun

If you want to check whether they're resolved. Also ok to rely on the bot.


http://gerrit.cloudera.org:8080/#/c/14690/4/testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test:

PS4:
Can you also add a negative test where it is below the threshold and doesn't 
switch to broadcast.

And a negative test where it's above the threshold but the /* +broadcast */ 
hint overrides it.



-- 
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 19 Nov 2019 19:39:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5067/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 19 Nov 2019 18:38:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-19 Thread Aman Sinha (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14690

to look at the new patch set (#4).

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner may sometimes accidentally choose broadcast
distribution for inputs that are larger than the destination executor's
total memory. This could potentially happen if the cluster membership is
not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast
distribution. This causes spilling and severely affects performance.
Although the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.

As a safety here we introduce an explicit configurable limit:
broadcast_bytes_limit for the size of the broadcast input and set it
to default of 32GB. The default is chosen based on analysis of existing
benchmark queries and representative workloads such that in vast
majority of the cases the parameter value does not need to be changed.
If the estimated input size on the build side is greater than this
threshold, the DistributedPlanner will fall back to a partition
distribution. Setting this parameter to 0 causes it to be ignored.

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a new unit test in PlannerTest that sets the
broadcast_bytes_limit to a small value and checks whether the
distributed plan does hash partitioning on the build side instead of
broadcast.

Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
8 files changed, 72 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/14690/4
--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14690/4/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/14690/4/be/src/service/query-options.cc@914
PS4, Line 914: ParseMemValue(value, "broadcast bytes limit for join 
operations", _bytes_limit));
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/14690/4/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/14690/4/common/thrift/ImpalaService.thrift@488
PS4, Line 488:
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 19 Nov 2019 18:21:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5039/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 16 Nov 2019 20:54:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-16 Thread Aman Sinha (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14690

to look at the new patch set (#3).

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner may sometimes accidentally choose broadcast
distribution for inputs that are larger than the destination executor's
total memory. This could potentially happen if the cluster membership is
not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast
distribution. This causes spilling and severely affects performance.
Although the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.

As a safety here we introduce an explicit configurable limit:
broadcast_bytes_limit for the size of the broadcast input and set it
to default of 32GB. The default is chosen based on analysis of existing
benchmark queries and representative workloads. If the estimated input
size on the build side is greater than this threshold, the
DistributedPlanner will fall back to a partition distribution.

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a new unit test in PlannerTest that sets the
broadcast_bytes_limit to a small value and checks whether the
distributed plan does hash partitioning on the build side instead of
broadcast.

Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
9 files changed, 75 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/14690/3
--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-16 Thread Aman Sinha (Code Review)
Aman Sinha has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner may sometimes accidentally choose broadcast 
distribution
for inputs that are larger than the destination executor's total memory. This 
could
potentially happen if the cluster membership is not accurately known and the 
planner's
cost computation of the broadcastCost vs partitionCost happens to favor the 
broadcast
distribution. This causes spilling and severely affects performance. Although 
the
DistributedPlanner does a mem_limit check before picking broadcast, the 
mem_limit is
not an accurate reflection since it is assigned during admission control.

As a safety here we introduce an explicit configurable limit: 
broadcast_bytes_limit
for the size of the broadcast input and set it to default of 32GB. The default 
is chosen
based on analysis of existing benchmark queries and representative workloads. 
If the
estimated input size on the build side is greater than this threshold,
the DistributedPlanner will fall back to a partition distribution.

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a new unit test in PlannerTest that sets the
broadcast_bytes_limit to a small value and checks whether the
distributed plan does hash partitioning on the build side instead of
broadcast.

Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
9 files changed, 75 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/14690/2
--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha 


[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

2019-11-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast 
input.
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14690/2/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/14690/2/be/src/service/query-options.cc@914
PS2, Line 914: ParseMemValue(value, "broadcast bytes limit for join 
operations", _bytes_limit));
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/14690/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/14690/2/common/thrift/ImpalaService.thrift@488
PS2, Line 488:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14690/2/common/thrift/ImpalaService.thrift@489
PS2, Line 489:   // The max number of estimated bytes eligible for a Broadcast 
operation during a join.
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14690/2/common/thrift/ImpalaService.thrift@490
PS2, Line 490:   // If the planner thinks the total bytes sent to all 
destinations of a broadcast exchange
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/14690/2/common/thrift/ImpalaService.thrift@491
PS2, Line 491:   // will exceed this limit, it will not consider a broadcast 
and instead fall back on a
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14690/2/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

http://gerrit.cloudera.org:8080/#/c/14690/2/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java@861
PS2, Line 861: Collections.emptySet());
tab used for whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 16 Nov 2019 20:12:29 +
Gerrit-HasComments: Yes