Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 )
Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations ...................................................................... Patch Set 11: (8 comments) Looks pretty good. http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG@10 PS11, Line 10: In some cases nit. When both conditions below are true, http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h File be/src/exec/aggregation-node-base.h: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h@71 PS11, Line 71: bool fast_limit_check_; nit. May init to false here. http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc File be/src/exec/aggregation-node-base.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc@93 PS11, Line 93: if (num_aggs == 1 && limit() != -1 && pnode.aggs_[0]->GetNumGroupingExprs() > 0 : && pnode.aggs_[0]->aggregate_functions_.empty() : && pnode.aggs_[0]->conjuncts_.empty()) { : fast_limit_check_ = true; : GroupingAggregator* gpAgg = dynamic_cast<GroupingAggregator*>(aggs_[0].get()); : gpAgg->UnsetLimit(); I also wonder if fast_limit_check_ can be moved to the plan node for Aggregation and set by FE. The same comment goes to limit. That is, set the limit in TAggregator in FE. In this way, the solution is cleaner. http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc@75 PS11, Line 75: LOG_QUERY << "the number of rows returned by agg node has exceeded" : " the limit."; nit. May make the message more specific by plugging in the numbers. For example, the number of rows (3) returned from the aggregation node has exceeded the limit of 2. http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@129 PS11, Line 129: if (aggs_[0]->GetNumKeys() >= limit()) { nit. We probably should DCHECK(limit() > -1) here before the IF. http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@133 PS11, Line 133: VLOG_QUERY << "the number of rows returned by streaming agg node has exceeded" : " the limit."; nit. same as the other one to make the message more specific. http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 nit. I wonder if this correct. On paper, we need to look at all distinct values (via grouping) to make a decision. http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test@2731 PS11, Line 2731: count(*) This violates condition 1? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 11 Gerrit-Owner: liuyao <liu...@sensorsdata.cn> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 09 Sep 2021 15:08:25 +0000 Gerrit-HasComments: Yes