Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
......................................................................


Patch Set 12:

(6 comments)

Looks great.

The comment on subquery rewrite test is orthogonal to this patch.

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10
PS12, Line 10: pre-aggregation
nit. exceed the max line threshold.


http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76
PS12, Line 76:  VLOG_QUERY << "the number of rows (" << aggs_[0]->GetNumKeys() 
<< ") returned"
             :               " from the aggregation node has exceeded the limit 
of " << limit();
nit. May use Substitute() which is faster.

Example:
orc-column-readers.cc:  return Substitute("$0 column (ORC id=$1)", 
node->toString(), node->getColumnId());

sort-node.cc:  VLOG(3) << Substitute("Sort estimation: 
estimated_full_input_size=$0 "


http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479
PS12, Line 479: halt
nit. complete


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646
PS12, Line 646: For SELECT DISTINCT f1,f2,...fn FROM t LIMIT n
nit. I think we should mention the two conditions in the commit message here. 
The above SQL is an example.


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648
PS12, Line 648: Halt
nit. May use Complete, as Halt implies stop for some reason.


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> This is an SQL optimization. Look at SubqueryRewriter# mergeExpr. This is a
Since we do not push down id from the outer side to the inner, I would think we 
need to have ALL distinct values from the union side.

If limit 2 is for the # of groups (i.e., taking only 2 groups, or 2 distinct 
values), then is it right?



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 12
Gerrit-Owner: liuyao <liu...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: liuyao <liu...@sensorsdata.cn>
Gerrit-Comment-Date: Fri, 10 Sep 2021 13:40:16 +0000
Gerrit-HasComments: Yes

Reply via email to