Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 )
Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations ...................................................................... Patch Set 12: (6 comments) Looks great. The comment on subquery rewrite test is orthogonal to this patch. http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10 PS12, Line 10: pre-aggregation nit. exceed the max line threshold. http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76 PS12, Line 76: VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() << ") returned" : " from the aggregation node has exceeded the limit of " << limit(); nit. May use Substitute() which is faster. Example: orc-column-readers.cc: return Substitute("$0 column (ORC id=$1)", node->toString(), node->getColumnId()); sort-node.cc: VLOG(3) << Substitute("Sort estimation: estimated_full_input_size=$0 " http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479 PS12, Line 479: halt nit. complete http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646 PS12, Line 646: For SELECT DISTINCT f1,f2,...fn FROM t LIMIT n nit. I think we should mention the two conditions in the commit message here. The above SQL is an example. http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648 PS12, Line 648: Halt nit. May use Complete, as Halt implies stop for some reason. http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > This is an SQL optimization. Look at SubqueryRewriter# mergeExpr. This is a Since we do not push down id from the outer side to the inner, I would think we need to have ALL distinct values from the union side. If limit 2 is for the # of groups (i.e., taking only 2 groups, or 2 distinct values), then is it right? -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 12 Gerrit-Owner: liuyao <liu...@sensorsdata.cn> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: liuyao <liu...@sensorsdata.cn> Gerrit-Comment-Date: Fri, 10 Sep 2021 13:40:16 +0000 Gerrit-HasComments: Yes