Hello Kurt Deschler, Abhishek Rawat, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20104

to look at the new patch set (#5).

Change subject: IMPALA-11842: Improve memory estimation for Aggregate
......................................................................

IMPALA-11842: Improve memory estimation for Aggregate

The planner often overestimates the memory needed by an aggregation node
because it simply multiplies the NDVs of the contributing grouping
columns. This patch introduces two new query options,
LARGE_AGG_MEM_THRESHOLD and AGG_MEM_CORRELATION_FACTOR. If the
perInstanceDataBytes estimated by the NDV multiplication method exceeds
LARGE_AGG_MEM_THRESHOLD, the planner recomputes perInstanceDataBytes by
comparing it against an estimate based on max(NDV) and
AGG_MEM_CORRELATION_FACTOR.
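The NDV multiplication method treats the grouping columns as independent, which is where the overestimate comes from when they are correlated. The minimal Java sketch below contrasts it with a correlation-aware estimate. This message does not spell out how max(NDV) and AGG_MEM_CORRELATION_FACTOR are combined, so the log-space interpolation used here is an assumption chosen only for illustration, and the class and method names (GroupNdvSketch, ndvProduct, correlatedNdv) are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch: two ways to estimate how many groups an aggregation
 * produces. Names and the correlation formula are illustrative assumptions,
 * not Impala's AggregationNode code.
 */
public class GroupNdvSketch {
  /** Multiplies two non-negative longs, saturating at Long.MAX_VALUE on overflow. */
  public static long saturatingMultiply(long a, long b) {
    try {
      return Math.multiplyExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  /** Default method: assume independent columns and multiply their NDVs. */
  public static long ndvProduct(List<Long> ndvs) {
    long product = 1;
    for (long ndv : ndvs) {
      // Treat unknown (negative) or zero NDVs as 1 so they do not zero out the product.
      product = saturatingMultiply(product, Math.max(ndv, 1));
    }
    return product;
  }

  /**
   * Assumed correlation-aware method: interpolate in log space between
   * max(NDV) (corrFactor = 1.0, every other column is determined by the
   * highest-NDV column) and the full product (corrFactor = 0.0, independent).
   */
  public static long correlatedNdv(List<Long> ndvs, double corrFactor) {
    if (corrFactor < 0.0 || corrFactor > 1.0) {
      throw new IllegalArgumentException("corrFactor must be in [0.0, 1.0]");
    }
    long maxNdv = 1;
    for (long ndv : ndvs) {
      maxNdv = Math.max(maxNdv, ndv);
    }
    // Combined NDV of all columns other than the largest one.
    double rest = (double) ndvProduct(ndvs) / maxNdv;
    double estimate = maxNdv * Math.pow(rest, 1.0 - corrFactor);
    return (long) Math.min(estimate, (double) Long.MAX_VALUE);
  }

  public static void main(String[] args) {
    // Hypothetical grouping on (date, year, month): year and month are
    // fully determined by date, so the true group count is max(NDV).
    List<Long> ndvs = List.of(3650L, 10L, 12L);
    System.out.println(ndvProduct(ndvs));          // 438000
    System.out.println(correlatedNdv(ndvs, 1.0));  // 3650
    System.out.println(correlatedNdv(ndvs, 0.5));
    System.out.println(correlatedNdv(ndvs, 0.0));  // 438000
  }
}
```

A factor of 1.0 collapses the estimate to max(NDV), while 0.0 reproduces the default product, so intermediate values trade off between the two.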

perInstanceDataBytes is kept at a minimum of LARGE_AGG_MEM_THRESHOLD so
that a low max(NDV) does not negatively impact query execution. Unlike
PREAGG_BYTES_LIMIT, LARGE_AGG_MEM_THRESHOLD applies to both
preaggregation and final aggregation, and it does not cap the maximum
memory reservation of the aggregation node (the node may still allocate
memory beyond the estimate if it is available).
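The selection rule described in the two paragraphs above can be sketched in a few lines of Java. This is a minimal sketch under one reading of "comparing against": when the default estimate exceeds the threshold, take the lower of the two estimates, floored at LARGE_AGG_MEM_THRESHOLD. The reduced estimate is passed in rather than derived, and the class name AggMemEstimateSketch and method finalPerInstanceDataBytes are hypothetical.

```java
/**
 * Hypothetical sketch of the final per-instance memory decision. How the
 * reduced estimate is computed from max(NDV) and AGG_MEM_CORRELATION_FACTOR
 * is not modeled here; it is taken as an input.
 */
public class AggMemEstimateSketch {
  /**
   * @param defaultBytes   estimate from multiplying grouping-column NDVs
   * @param reducedBytes   estimate from the max(NDV) and correlation-factor method
   * @param thresholdBytes value of LARGE_AGG_MEM_THRESHOLD
   */
  public static long finalPerInstanceDataBytes(
      long defaultBytes, long reducedBytes, long thresholdBytes) {
    // Estimates at or below the threshold are used unchanged.
    if (defaultBytes <= thresholdBytes) {
      return defaultBytes;
    }
    // Take the lower estimate, but never drop below the threshold so that a
    // low max(NDV) cannot shrink the estimate too far.
    return Math.max(thresholdBytes, Math.min(defaultBytes, reducedBytes));
  }

  public static void main(String[] args) {
    long mb = 1L << 20;
    long threshold = 512 * mb;  // illustrative value, not necessarily the default
    System.out.println(finalPerInstanceDataBytes(100 * mb, 10 * mb, threshold) / mb);    // 100
    System.out.println(finalPerInstanceDataBytes(4096 * mb, 1024 * mb, threshold) / mb); // 1024
    System.out.println(finalPerInstanceDataBytes(4096 * mb, 64 * mb, threshold) / mb);   // 512
  }
}
```

Per the description above, this only lowers the planner's estimate; the sketch says nothing about the reservation the node may grow into at runtime.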

Testing:
- Ran the patch on 10 nodes with MT_DOP=12 against TPC-DS at 3TB scale.
  Of the 103 queries, 20 have lower "Per-Host Resource Estimates", 11
  have lower "Cluster Memory Admitted", and 3 have latency reduced by
  more than 10%. No significant query latency regression was observed.
- Pass core tests.

Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
16 files changed, 1,365 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/20104/5
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421
Gerrit-Change-Number: 20104
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
