Hello Quanlong Huang, Aman Sinha, Kurt Deschler, Abhishek Rawat, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20104

to look at the new patch set (#14).

Change subject: IMPALA-11842: Improve memory estimation for Aggregate
......................................................................

IMPALA-11842: Improve memory estimation for Aggregate

The planner often overestimates the memory needed by an aggregation node
because it simply multiplies the NDVs of the contributing grouping
columns. This patch introduces new query options LARGE_AGG_MEM_THRESHOLD
and AGG_MEM_CORRELATION_FACTOR. If the perInstanceDataBytes estimated by
the NDV multiplication method exceeds LARGE_AGG_MEM_THRESHOLD, recompute
perInstanceDataBytes by comparing it against the estimate from the
max(NDV) & AGG_MEM_CORRELATION_FACTOR method.
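A minimal Java sketch of the two row-count estimates described above, for
illustration only. The class and method names are hypothetical; this is not
the code in AggregationNode.java. The commit message does not say how
max(NDV) and AGG_MEM_CORRELATION_FACTOR are combined, so the sketch assumes a
linear interpolation where a factor of 1.0 (fully correlated columns) yields
max(NDV) and 0.0 (independent columns) yields the NDV product. NDVs are
assumed to be known and non-negative.

```java
import java.util.Arrays;

/** Hypothetical sketch of the grouping-cardinality estimates; not Impala code. */
public class AggNdvEstimateSketch {
  /** Naive estimate: product of the grouping columns' NDVs, saturating at Long.MAX_VALUE. */
  public static long productOfNdvs(long[] ndvs) {
    long product = 1;
    for (long ndv : ndvs) {
      // Saturate instead of overflowing when the product gets too large.
      if (ndv != 0 && product > Long.MAX_VALUE / ndv) return Long.MAX_VALUE;
      product *= ndv;
    }
    return product;
  }

  /** Largest single-column NDV: the row count if all grouping columns were perfectly correlated. */
  public static long maxNdv(long[] ndvs) {
    return Arrays.stream(ndvs).max().orElse(1L);
  }

  /**
   * ASSUMPTION: linear interpolation between the NDV product (factor 0.0)
   * and max(NDV) (factor 1.0). The real formula may differ.
   */
  public static long correlatedEstimate(long[] ndvs, double correlationFactor) {
    if (correlationFactor < 0.0 || correlationFactor > 1.0) {
      throw new IllegalArgumentException("correlation factor must be in [0, 1]");
    }
    long max = maxNdv(ndvs);
    long product = productOfNdvs(ndvs);
    // Computed in double; the cast to long saturates at Long.MAX_VALUE.
    return (long) (max + (product - max) * (1.0 - correlationFactor));
  }

  public static void main(String[] args) {
    long[] ndvs = {1000, 500, 20};
    System.out.println(productOfNdvs(ndvs));          // 10000000
    System.out.println(maxNdv(ndvs));                 // 1000
    System.out.println(correlatedEstimate(ndvs, 0.5)); // 5000500
  }
}
```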

perInstanceDataBytes is kept at no less than LARGE_AGG_MEM_THRESHOLD so
that a low max(NDV) does not negatively impact query execution. Unlike
PREAGG_BYTES_LIMIT, LARGE_AGG_MEM_THRESHOLD is evaluated on both
preaggregation and final aggregation, and it does not cap the max memory
reservation of the aggregation node (the node may still allocate memory
beyond the estimate if more is available). However, if a plan node is a
streaming preaggregation node and PREAGG_BYTES_LIMIT is set, then
PREAGG_BYTES_LIMIT overrides LARGE_AGG_MEM_THRESHOLD as the threshold.
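A similar sketch, again with hypothetical names, of how the threshold rules
above might fit together. It assumes that "comparing against" means taking the
smaller of the two byte estimates, and that PREAGG_BYTES_LIMIT counts as set
when it is positive. It only computes the estimate and does not model the
node's max memory reservation, which this option does not cap.

```java
/** Hypothetical sketch of the threshold rules; names and signatures are illustrative. */
public class AggMemThresholdSketch {
  /**
   * PREAGG_BYTES_LIMIT overrides LARGE_AGG_MEM_THRESHOLD only for a streaming
   * preaggregation node. ASSUMPTION: a positive value means "set".
   */
  public static long effectiveThreshold(
      boolean isStreamingPreagg, long preaggBytesLimit, long largeAggMemThreshold) {
    if (isStreamingPreagg && preaggBytesLimit > 0) return preaggBytesLimit;
    return largeAggMemThreshold;
  }

  /**
   * ndvProductBytes: estimate from NDV multiplication.
   * correlatedBytes: estimate from max(NDV) and the correlation factor.
   */
  public static long perInstanceDataBytes(
      long ndvProductBytes, long correlatedBytes, long threshold) {
    // Below the threshold, keep the naive NDV-product estimate unchanged.
    if (ndvProductBytes <= threshold) return ndvProductBytes;
    // Otherwise take the lower estimate, but never drop below the threshold.
    return Math.max(threshold, Math.min(ndvProductBytes, correlatedBytes));
  }

  public static void main(String[] args) {
    long mb = 1L << 20;
    long threshold = effectiveThreshold(false, 0, 512 * mb);
    System.out.println(perInstanceDataBytes(4096 * mb, 1024 * mb, threshold) / mb); // 1024
    System.out.println(perInstanceDataBytes(4096 * mb, 100 * mb, threshold) / mb);  // 512
    System.out.println(perInstanceDataBytes(256 * mb, 100 * mb, threshold) / mb);   // 256
  }
}
```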

Testing:
- Ran the patch on 10 nodes with MT_DOP=12 against TPC-DS at 3TB scale.
  Of the 103 queries, 20 have a lower "Per-Host Resource Estimates",
  11 have a lower "Cluster Memory Admitted", and 3 have over 10% lower
  latency. No significant query latency regression was observed.
- Passed core tests.

Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
16 files changed, 1,382 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20104/14
--
To view, visit http://gerrit.cloudera.org:8080/20104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421
Gerrit-Change-Number: 20104
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
