Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20104 )
Change subject: IMPALA-11842: Improve memory estimation for Aggregate ...................................................................... Patch Set 11: (6 comments) > Patch Set 9: Code-Review+1 > > (6 comments) > > This makes sense to me. However, we are adding more and more query options > that might be hard to tune. E.g. given the actual mem usage, what option > values should be set to correct the estimation to it. Maybe some options or > hints that can be set directly on the node's estimation value would be easier > to use, and easier for integration with History-based optimizer. I generally agree with issues you raise in your comment. Having history-based optimizer and applying it at per plan node granularity level is much better, but will require require more infrastructure work. This patch, on the other hand, only focus on memory estimation of aggregation node, which often severely overestimated due to the NDV multiplication. There is a future work that will try to incorporate multi-column statistics that should estimate better than this. http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift@810 PS9, Line 810: deemed > typo: deemed Done http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift@816 PS9, Line 816: beyond the threshol > nit: "beyond the threshold"? Initially I thought this means we might get a Done http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift@822 PS9, Line 822: be > nit: be Done http://gerrit.cloudera.org:8080/#/c/20104/9/common/thrift/ImpalaService.thrift@826 PS9, Line 826: // corrFactor = AGG_MEM_CORRELATION_FACTOR ^ N > nit: can we explan a little bit how it's used? E.g. the real factor (F) als Done http://gerrit.cloudera.org:8080/#/c/20104/9/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/20104/9/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@598 PS9, Line 598: perInstanceCardinality, l > nit: the arg can be 'perInstanceCardinality' directly Done http://gerrit.cloudera.org:8080/#/c/20104/9/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@640 PS9, Line 640: s; > Might be an existing issue. Should we use PlanNode.checkedMultiply() here? I think this is fine, since the multiplication is in double. The ceil and long casting seems to cap it to Long.MAX_VALUE. -- To view, visit http://gerrit.cloudera.org:8080/20104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4b4b2e519ee89f0a13fdb62d0471ee4047f6421 Gerrit-Change-Number: 20104 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Comment-Date: Tue, 11 Jul 2023 15:58:30 +0000 Gerrit-HasComments: Yes