Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10771 )

Change subject: IMPALA-110: Support for multiple DISTINCT
......................................................................


Patch Set 10:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@40
PS10, Line 40: unique set of DISTINCT expressions
nit: unique list of grouping expressions?


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@53
PS10, Line 53: contains the phase 2 aggregate functions and
             :  *     grouping exprs
also contains merging aggregate functions for non-distinct aggregates, right?


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@56
PS10, Line 56: grouping exprs are identical
identical to aggInfo's grouping exprs


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@58
PS10, Line 58: grouping exprs are identical
grouping exprs are identical to aggInfo.secondPhaseDistinctAggInfo's grouping 
exprs, right? It's not too clear from the comment.


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@178
PS10, Line 178:    * At the moment, we require that all distinct aggregate
              :    * functions be applied to the same set of exprs (ie, we 
can't do something
              :    * like SELECT COUNT(DISTINCT id), COUNT(DISTINCT address)).
Obsolete comment


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@193
PS10, Line 193:    * TODO: expand implementation to cover the general case; 
this will require
              :    * a different execution strategy
please resolve TODO


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@44
PS10, Line 44: DISTINCT exprs.
nit: list of grouping exprs?


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@57
PS10, Line 57: the non-distinct class is carried along the two phases
Could be elaborated a bit:
* in the 1st phase the non-distinct class is grouped on the grouping exprs from 
GROUP BY (if any) concatenated by the distinct expr
* in the 2nd phase everything is grouped on the grouping exprs and the 
non-distinct class is merge aggregated


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@78
PS10, Line 78: To distinguish between the aggregation cla
nit: Maybe it would be better to start with the context/motivation before going 
into the details

E.g.:
In a distributed plan we need to send the tuple rows over the network to some 
merge aggregators, therefore we need to partition our rows based on the actual 
values...

In the final transposition step we need to group by on the proper slots...


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@289
PS10, Line 289: aggClass
nit: we both have aggregation classes and aggregation infos here, I think it's 
better not to use the terms interchangeably.


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@307
PS10, Line 307: has
typo: hash, but as far I understood no hashing is involved in the transposing 
step.


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@472
PS10, Line 472: lastAggClass = aggInfos_
nit: lastAggInfo?


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@476
PS10, Line 476: else
I'm just wondering if this 'else' is intentional here.
In other words, if the only aggregation info was added in the previous block, 
do we want to create a simplified agg Smap?

The tests doesn't seem to catch the removal of this 'else'.


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@649
PS10, Line 649: aggClass : aggInfos_
nit: agg class and agg info also used interchangeably here


http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@793
PS10, Line 793:    * for the phase 2 AggregationNode.
Might be worth mentioning the case when the aggregation phase is transpose.



--
To view, visit http://gerrit.cloudera.org:8080/10771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4
Gerrit-Change-Number: 10771
Gerrit-PatchSet: 10
Gerrit-Owner: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Sep 2018 15:48:34 +0000
Gerrit-HasComments: Yes

Reply via email to