[ https://issues.apache.org/jira/browse/IMPALA-12964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877409#comment-17877409 ]
ASF subversion and git services commented on IMPALA-12964:
----------------------------------------------------------

Commit 02469e723b393de3815acdf2df027a32e0df7352 in impala's branch refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=02469e723 ]

IMPALA-12964: Implement basic aggregation in the Calcite planner

Basic aggregation functionality is now added to the Calcite planner.

The tricky part of the implementation was the conversion from the Aggregate RelNode to the Impala Agg PlanNode. Compilation in Impala requires AggregateInfo structures, which may set up multiple internal PlanNodes, and AggregateInfo uses some parts of the Analyzer. This use of the Analyzer puts two design goals in conflict:

1) Remove the dependency on the Analyzer, since Calcite does all the parsing and validation.
2) Avoid refactoring in the first major iteration of the Calcite planner.

To resolve this, a SimplifiedAnalyzer class has been created and injected into the AggregateInfo. It overrides some methods of the Analyzer class to bypass the non-Calcite planner's analysis. The SimplifiedAnalyzer overrides two aspects of the Analyzer:

1) "Having" filter conjuncts are treated as "unassigned conjuncts". After Calcite validates and optimizes the plan, the only filter conjuncts above the aggregation are the "having" clause, so all of these conjuncts are used in the aggregate. (Side note: the optimization rules that move filters underneath the aggregate have not been pushed yet, but they will come in a near-future push.) Once the aggregate has been converted to a PlanNode, the unassigned conjuncts can be cleared.

2) Because the Aggregate PlanNodes can have multiple layers, the aggregate may be responsible for creating some TupleDescriptors and SlotDescriptors for these PlanNodes. The SlotDescriptors need to be "materialized". In the non-Calcite planner, this is done through its planning process.
In the Calcite planner, the materialization can happen immediately when the PlanNode is created, so "addSlotDescriptor" is overridden to call the parent method and then immediately materialize the SlotDescriptor.

The rest of the ImpalaAggRel is hopefully self-explanatory. The groups, aggregates, and grouping sets are extracted from the RelNodes and used in the PlanNodes. The logic that sets up multiple PlanNodes and creates the MultiAggregateInfo and AggregateInfo objects is similar to the logic used in the non-Calcite planner.

Change-Id: Iacf0de8ba11f0d31d73d624f0c9a91db9997cfd5
Reviewed-on: http://gerrit.cloudera.org:8080/21238
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Implement aggregation capability
> --------------------------------
>
>                 Key: IMPALA-12964
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12964
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Steve Carlin
>            Priority: Major

--
This message was sent by Atlassian Jira (v8.20.10#820010)
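A minimal sketch of the two SimplifiedAnalyzer overrides described in the commit message may help make the design concrete. This is not Impala's code. The Analyzer, SimplifiedAnalyzer, and SlotDescriptor classes below are hypothetical, heavily simplified stand-ins. Conjuncts are modeled as plain strings, and every method signature (including the hypothetical clearUnassignedConjuncts helper) is invented for illustration. The sketch only shows the override pattern: the subclass hands back all "having" conjuncts as unassigned, and it calls the parent's addSlotDescriptor before materializing the new slot immediately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a slot descriptor (NOT Impala's real class).
class SlotDescriptor {
  private final int id;
  private boolean materialized = false;

  SlotDescriptor(int id) { this.id = id; }

  int getId() { return id; }
  boolean isMaterialized() { return materialized; }
  void setIsMaterialized(boolean value) { materialized = value; }
}

// Hypothetical, simplified base analyzer. In this model, new slots start
// unmaterialized, mirroring the non-Calcite planner, which materializes
// them later during planning.
class Analyzer {
  private int nextSlotId = 0;
  protected final List<SlotDescriptor> slots = new ArrayList<>();

  SlotDescriptor addSlotDescriptor() {
    SlotDescriptor slot = new SlotDescriptor(nextSlotId++);
    slots.add(slot);
    return slot;
  }

  // A real analyzer would derive unassigned conjuncts from its own analysis
  // state; this stub has none.
  List<String> getUnassignedConjuncts() {
    return Collections.emptyList();
  }
}

// Hypothetical SimplifiedAnalyzer showing the two overrides.
class SimplifiedAnalyzer extends Analyzer {
  // After Calcite validation and optimization, the only filters above the
  // aggregate are the HAVING conjuncts, so all of them are "unassigned".
  private final List<String> havingConjuncts = new ArrayList<>();

  SimplifiedAnalyzer(List<String> havingConjuncts) {
    this.havingConjuncts.addAll(havingConjuncts);
  }

  @Override
  List<String> getUnassignedConjuncts() {
    // Read-only view; it reflects a later clearUnassignedConjuncts() call.
    return Collections.unmodifiableList(havingConjuncts);
  }

  // Hypothetical helper: called once the aggregate has become a PlanNode.
  void clearUnassignedConjuncts() {
    havingConjuncts.clear();
  }

  @Override
  SlotDescriptor addSlotDescriptor() {
    // Delegate to the parent, then materialize right away, because the
    // Calcite planner has no later pass that would do it.
    SlotDescriptor slot = super.addSlotDescriptor();
    slot.setIsMaterialized(true);
    return slot;
  }
}
```

For example, new SimplifiedAnalyzer(List.of("count(*) > 10")) returns that single conjunct from getUnassignedConjuncts() until clearUnassignedConjuncts() is called. Every slot it creates through addSlotDescriptor() reports isMaterialized() == true, whereas slots from the plain Analyzer stub do not. Calling super.addSlotDescriptor() keeps the parent's bookkeeping intact, which matches the commit's goal of avoiding refactoring in the first iteration.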