Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14690

to look at the new patch set (#5).

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast input.
......................................................................

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner can occasionally choose broadcast
distribution for inputs that are larger than the destination
executor's total memory. This can happen when the cluster membership
is not accurately known and the planner's comparison of broadcastCost
against partitionCost happens to favor the broadcast distribution.
The result is spilling and severely degraded performance. Although
the DistributedPlanner does a mem_limit check before picking
broadcast, the mem_limit is not an accurate reflection since it is
assigned during admission control.

As a safeguard, this change introduces an explicit configurable
limit, broadcast_bytes_limit, on the size of the broadcast input,
with a default of 32GB. The default was chosen based on analysis of
existing benchmark queries and representative workloads so that in
the vast majority of cases the parameter value does not need to be
changed. If the estimated input size on the build side is greater
than this threshold, the DistributedPlanner falls back to a
partitioned distribution. Setting this parameter to 0 causes it to be
ignored.

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a few unit tests in PlannerTest that (a) set
   broadcast_bytes_limit to a small value and check that the
   distributed plan uses hash partitioning on the build side instead
   of broadcast, (b) pass a broadcast hint to override the config
   setting, and (c) verify the standard case where the broadcast
   threshold is larger than the build input size.
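
For readers who want the fallback rule in one place, here is a minimal Python sketch of the decision described above. It is not the code in DistributedPlanner.java: the function name, its parameters, and the "BROADCAST"/"PARTITIONED" labels are hypothetical, the 32GB default is assumed to mean 32 * 1024^3 bytes, the tie-break on equal costs is an assumption, and the existing mem_limit check is left out.

```python
# Illustrative sketch only; names and tie-breaking are assumptions, not Impala's code.
GIB = 1024 ** 3
# Commit message states a 32GB default; the binary interpretation is assumed here.
DEFAULT_BROADCAST_BYTES_LIMIT = 32 * GIB


def choose_distribution(
    build_bytes: int,
    broadcast_cost: float,
    partition_cost: float,
    broadcast_bytes_limit: int = DEFAULT_BROADCAST_BYTES_LIMIT,
    has_broadcast_hint: bool = False,
) -> str:
    """Return "BROADCAST" or "PARTITIONED" for a join's build side."""
    # An explicit broadcast hint overrides the configured limit (test case b).
    if has_broadcast_hint:
        return "BROADCAST"
    # A limit of 0 disables the check; otherwise an estimated build input
    # larger than the limit forces the partitioned fallback (test case a).
    if broadcast_bytes_limit > 0 and build_bytes > broadcast_bytes_limit:
        return "PARTITIONED"
    # Below the limit, the usual cost comparison decides (test case c).
    # Preferring broadcast on a tie is an assumption of this sketch.
    return "BROADCAST" if broadcast_cost <= partition_cost else "PARTITIONED"
```

With the default limit, a 64 GiB build input goes to the partitioned plan even when its broadcast cost looks lower; passing broadcast_bytes_limit=0 or a broadcast hint restores the broadcast choice.
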
Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-hint.test
A testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-large.test
A testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
11 files changed, 136 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/14690/5
--
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>