Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10771 )
Change subject: IMPALA-110: Support for multiple DISTINCT ...................................................................... Patch Set 10: (15 comments) http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@40 PS10, Line 40: unique set of DISTINCT expressions nit: unique list of grouping expressions? http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@53 PS10, Line 53: contains the phase 2 aggregate functions and : * grouping exprs also contains merging aggregate functions for non-distinct aggregates, right? http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@56 PS10, Line 56: grouping exprs are identical identical to aggInfo's grouping exprs http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@58 PS10, Line 58: grouping exprs are identical grouping exprs are identical to aggInfo.secondPhaseDistinctAggInfo's grouping exprs, right? It's not too clear from the comment. http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@178 PS10, Line 178: * At the moment, we require that all distinct aggregate : * functions be applied to the same set of exprs (ie, we can't do something : * like SELECT COUNT(DISTINCT id), COUNT(DISTINCT address)). Obsolete comment http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@193 PS10, Line 193: * TODO: expand implementation to cover the general case; this will require : * a different execution strategy please resolve TODO http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@44 PS10, Line 44: DISTINCT exprs. nit: list of grouping exprs? http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@57 PS10, Line 57: the non-distinct class is carried along the two phases Could be elaborated a bit: * in the 1st phase the non-distinct class is grouped on the grouping exprs from GROUP BY (if any) concatenated by the distinct expr * in the 2nd phase everything is grouped on the grouping exprs and the non-distinct class is merge aggregated http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@78 PS10, Line 78: To distinguish between the aggregation cla nit: Maybe it would be better to start with the context/motivation before going into the details E.g.: In a distributed plan we need to send the tuple rows over the network to some merge aggregators, therefore we need to partition our rows based on the actual values... In the final transposition step we need to group by on the proper slots... http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@289 PS10, Line 289: aggClass nit: we both have aggregation classes and aggregation infos here, I think it's better not to use the terms interchangeably. http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@307 PS10, Line 307: has typo: hash, but as far I understood no hashing is involved in the transposing step. http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@472 PS10, Line 472: lastAggClass = aggInfos_ nit: lastAggInfo? http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@476 PS10, Line 476: else I'm just wondering if this 'else' is intentional here. In other words, if the only aggregation info was added in the previous block, do we want to create a simplified agg Smap? The tests doesn't seem to catch the removal of this 'else'. http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java@649 PS10, Line 649: aggClass : aggInfos_ nit: agg class and agg info also used interchangeably here http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/10771/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@793 PS10, Line 793: * for the phase 2 AggregationNode. Might be worth mentioning the case when the aggregation phase is transpose. -- To view, visit http://gerrit.cloudera.org:8080/10771 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4 Gerrit-Change-Number: 10771 Gerrit-PatchSet: 10 Gerrit-Owner: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 04 Sep 2018 15:48:34 +0000 Gerrit-HasComments: Yes