Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16242 )

Change subject: IMPALA-9979: part 2: partitioned top-n
......................................................................


Patch Set 30:

(15 comments)

This is a complex optimization. Nice work on piecing it together!
Sending the comments I have so far. Still have to go through the plans in more 
details.

http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG@49
PS30, Line 49: The
nit: 'This'


http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG@57
PS30, Line 57: both
             : the partitioning
nit: ..and non-partitioned case ?


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.h@133
PS30, Line 133: partition are returned
Would be good to add how the returned partitions are ordered among 
themselves...for instance whether this is a deterministic ascending order of 
partition keys.   (update:  I see below you have more details on, so maybe ok 
to ignore).


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@53
PS30, Line 53: the the
nit: repetition 'the'


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@59
PS30, Line 59: the the
nit:  repetition 'the'


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@398
PS30, Line 398:     RETURN_IF_CANCELLED(state);
I was wondering if the inner while loop (line 418) should also have a 
cancellation check or is that an overkill considering the  number of times 
inner loop is executed is expected to be small.


http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@559
PS30, Line 559:   // We evict heaps starting with the heaps that were least 
effective at filtering
The eviction policy is interesting.  Considering that a rank()/rownum() can 
only be computed after a partition has seen all its rows, the filtering would 
be blocked until then.  So it seems all partitions will have equal weightage, 
no ?


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@279
PS30, Line 279:   public long estimateTopNMaterializedSize(long totalRows) {
Should this estimate account for any per-partition overhead (additional 
metadata needed per heap) ?


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@646
PS30, Line 646: limit on
nit: incomplete sentence


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@453
PS30, Line 453:           if (limit.limit > 1) {
Maybe simplify this if-else to  planNodeLimit = Math.max(limit.limit, 1)


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@461
PS30, Line 461:           if (partitionByExprs.isEmpty()) {
There was a recent issue with constant partition-by and order-by exprs 
(https://issues.apache.org/jira/browse/IMPALA-10473). This check could also 
consider such cases probably after the other issue is fixed such that we don't 
create partitioned top-n for constant partition-by.


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@885
PS30, Line 885:     /// Whether the source predicate was a simple < or <= 
inequality.
This sounded confusing at first ..I interpreted it to mean true for < (the 
variable name is also isLessThan) and false for <= .
Maybe add the comment that this is false for =  and true for both < and <= .


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@928
PS30, Line 928: to .
nit:  incomplete


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@1158
PS30, Line 1158: returns
nit: 'returns ties'


http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/SelectNode.java
File fe/src/main/java/org/apache/impala/planner/SelectNode.java:

http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/SelectNode.java@81
PS30, Line 81:       selectNode.conjuncts_.addAll(conjuncts);
We should check if the child's conjuncts being added are unique compared to 
what this selectNode already had (unless we can be sure it is getting enforced 
in the way conjuncts are migrated).



--
To view, visit http://gerrit.cloudera.org:8080/16242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
Gerrit-Change-Number: 16242
Gerrit-PatchSet: 30
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Feb 2021 07:48:37 +0000
Gerrit-HasComments: Yes

Reply via email to