Hello Kurt Deschler, Abhishek Rawat, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/20104

to look at the new patch set (#9).

Change subject: IMPALA-11842: Improve memory estimation for Aggregate
......................................................................

IMPALA-11842: Improve memory estimation for Aggregate

The planner often overestimates the memory of aggregation nodes because
it simply multiplies the NDVs of the contributing grouping columns.

This patch introduces the new query options LARGE_AGG_MEM_THRESHOLD and
AGG_MEM_CORRELATION_FACTOR. If the perInstanceDataBytes estimated by the
NDV multiplication method exceeds LARGE_AGG_MEM_THRESHOLD, the planner
recomputes perInstanceDataBytes by comparing it against the estimate
from the max(NDV) and AGG_MEM_CORRELATION_FACTOR method.
perInstanceDataBytes is kept at a minimum of LARGE_AGG_MEM_THRESHOLD so
that a low max(NDV) does not negatively impact query execution.

Unlike PREAGG_BYTES_LIMIT, LARGE_AGG_MEM_THRESHOLD is evaluated on both
preaggregation and final aggregation, and it does not cap the maximum
memory reservation of the aggregation node (the node may still allocate
memory beyond the estimate if it is available). However, if the plan
node is a streaming preaggregation node and PREAGG_BYTES_LIMIT is set,
PREAGG_BYTES_LIMIT overrides LARGE_AGG_MEM_THRESHOLD as the threshold.

Testing:
- Ran the patch on 10 nodes with MT_DOP=12 against TPC-DS at 3TB scale.
  Of 103 queries, 20 have lower "Per-Host Resource Estimates", 11 have
  lower "Cluster Memory Admitted", and 3 have more than 10% lower
  latency. No significant regression in query latency was observed.
- Passed core tests.
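The re-estimation rule described in the commit message can be sketched in Java as follows. This is a hedged illustration, not the code in AggregationNode.java: the class and method names, the rowBytes input, reading "comparing against" as "keep the lower estimate", and especially the linear interpolation between max(NDV) and the NDV product are assumptions, because the commit message does not spell out the exact formula.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Impala's AggregationNode code) of the IMPALA-11842
 * re-estimation rule. Names, rowBytes and the interpolation are assumptions.
 */
public final class AggMemEstimateSketch {

  /** PREAGG_BYTES_LIMIT (if > 0) overrides LARGE_AGG_MEM_THRESHOLD for streaming preaggs. */
  public static long effectiveThreshold(
      boolean isStreamingPreagg, long preaggBytesLimit, long largeAggMemThreshold) {
    if (isStreamingPreagg && preaggBytesLimit > 0) return preaggBytesLimit;
    return largeAggMemThreshold;
  }

  /**
   * @param ndvs NDV of each grouping column
   * @param rowBytes assumed average bytes per aggregation group (hypothetical input)
   * @param threshold value returned by effectiveThreshold()
   * @param corrFactor AGG_MEM_CORRELATION_FACTOR, assumed to lie in [0.0, 1.0]
   */
  public static long estimatePerInstanceDataBytes(
      List<Long> ndvs, double rowBytes, long threshold, double corrFactor) {
    double productNdv = 1.0;
    long maxNdv = 1;
    for (long ndv : ndvs) {
      productNdv *= ndv;              // default method: assume independent columns
      maxNdv = Math.max(maxNdv, ndv);
    }
    // A double-to-long cast saturates at Long.MAX_VALUE instead of overflowing.
    long productBytes = (long) (productNdv * rowBytes);
    if (productBytes <= threshold) return productBytes;  // below threshold: keep as is

    // ASSUMPTION: interpolate linearly between max(NDV) (fully correlated columns,
    // corrFactor = 1.0) and the NDV product (independent columns, corrFactor = 0.0).
    double corrNdv = maxNdv + (1.0 - corrFactor) * (productNdv - maxNdv);
    long corrBytes = (long) (corrNdv * rowBytes);

    // Keep the lower of the two estimates, but never drop below the threshold.
    return Math.max(threshold, Math.min(productBytes, corrBytes));
  }

  public static void main(String[] args) {
    long threshold = effectiveThreshold(false, 0L, 1_000_000L);
    List<Long> ndvs = List.of(1000L, 1000L);
    System.out.println(estimatePerInstanceDataBytes(ndvs, 100.0, threshold, 0.5)); // 50050000
    System.out.println(estimatePerInstanceDataBytes(ndvs, 100.0, threshold, 1.0)); // 1000000
  }
}
```

With corrFactor = 1.0 the max(NDV) estimate (100,000 bytes) falls below the threshold, so the result is clamped up to the threshold, mirroring the minimum that keeps a low max(NDV) from hurting execution.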
Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
16 files changed, 1,379 insertions(+), 105 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20104/9
--
To view, visit http://gerrit.cloudera.org:8080/20104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421
Gerrit-Change-Number: 20104
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>