Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20104 )

Change subject: IMPALA-11842: Improve memory estimation for Aggregate
......................................................................


Patch Set 11:

(6 comments)

> Patch Set 9: Code-Review+1
>
> (6 comments)
>
> This makes sense to me. However, we are adding more and more query options 
> that might be hard to tune. E.g. given the actual mem usage, what option 
> values should be set to correct the estimation to it. Maybe some options or 
> hints that can be set directly on the node's estimation value would be easier 
> to use, and easier for integration with History-based optimizer.

I generally agree with issues you raise in your comment. Having history-based 
optimizer and applying it at per plan node granularity level is much better, 
but will require require more infrastructure work. This patch, on the other 
hand, only focus on memory estimation of aggregation node, which often severely 
overestimated due to the NDV multiplication. There is a future work that will 
try to incorporate multi-column statistics that should estimate better than 
this.

http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift@810
PS9, Line 810: deemed
> typo: deemed
Done


http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift@816
PS9, Line 816: beyond the threshol
> nit: "beyond the threshold"? Initially I thought this means we might get a
Done


http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift@822
PS9, Line 822: be
> nit: be
Done


http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift@826
PS9, Line 826:   //   corrFactor = AGG_MEM_CORRELATION_FACTOR ^ N
> nit: can we explan a little bit how it's used? E.g. the real factor (F) als
Done


http://gerrit.cloudera.org:8080/#/c/20104/9/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20104/9/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@598
PS9, Line 598: perInstanceCardinality, l
> nit: the arg can be 'perInstanceCardinality' directly
Done


http://gerrit.cloudera.org:8080/#/c/20104/9/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@640
PS9, Line 640: s;
> Might be an existing issue. Should we use PlanNode.checkedMultiply() here?
I think this is fine, since the multiplication is in double. The ceil and long 
casting seems to cap it to Long.MAX_VALUE.



--
To view, visit http://gerrit.cloudera.org:8080/20104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421
Gerrit-Change-Number: 20104
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Tue, 11 Jul 2023 15:58:30 +0000
Gerrit-HasComments: Yes

Reply via email to