[jira] [Commented] (IMPALA-12964) Implement aggregation capability

ASF subversion and git services (Jira) Wed, 28 Aug 2024 07:02:43 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-12964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877409#comment-17877409
 ]


ASF subversion and git services commented on IMPALA-12964:
----------------------------------------------------------

Commit 02469e723b393de3815acdf2df027a32e0df7352 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=02469e723 ]

IMPALA-12964: Implement basic aggregation in the Calcite planner

Basic aggregation functionality is now added to the Calcite planner.

The tricky part of the implementation was the conversion from the
Aggregate RelNode to the Impala Agg PlanNode. Compilation in Impala
requires AggregateInfo structures, which may set up multiple internal
PlanNodes, and AggregateInfo uses some parts of the Analyzer.

This use of the Analyzer puts two design goals in conflict with each
other:
1) Remove dependency on the Analyzer since Calcite does all the parsing
and validation
2) Avoid refactoring in the first major iteration of the Calcite planner.

To resolve this, a SimplifiedAnalyzer class has been created which is
injected into the AggregateInfo. Some methods of the Analyzer class are
overridden to avoid the non-Calcite planner analysis.

The SimplifiedAnalyzer overrides two aspects of the Analyzer:
1) "Having" filter conjuncts are treated as "unassigned conjuncts".
After Calcite validates and optimizes the plan, the only filter
conjuncts above the aggregation are those of the "having" clause, so
all of them are used in the aggregate. (Side note: the optimization
rules that move filters underneath the aggregate have not been pushed
yet, but they will come in a push in the near future.) Once the
aggregate has been converted to a PlanNode, the unassigned conjuncts
can be cleared.
2) Because the Aggregate PlanNodes can have multiple layers, it may
be responsible for creating some TupleDescriptors and SlotDescriptors
for these PlanNodes. The SlotDescriptors need to be "materialized".
In the non-Calcite planner, this is done through its planning process.
In the Calcite planner, the materialization can happen immediately when
the PlanNode is created. So "addSlotDescriptor" is overridden to call
the parent and then immediately materialize the SlotDescriptor.
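
The two overrides described above can be sketched roughly as follows.
This is a minimal, self-contained illustration and not the code from the
commit: Analyzer, TupleDescriptor and SlotDescriptor here are simplified
stand-ins for the Impala classes of the same names, conjuncts are modeled
as plain strings, and setHavingConjuncts and clearUnassignedConjuncts are
hypothetical helper names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SimplifiedAnalyzerSketch {

  // Stand-in for Impala's SlotDescriptor; only the materialized flag is modeled.
  static class SlotDescriptor {
    private final int id;
    private boolean materialized = false;
    SlotDescriptor(int id) { this.id = id; }
    int getId() { return id; }
    boolean isMaterialized() { return materialized; }
    void setIsMaterialized(boolean value) { materialized = value; }
  }

  // Stand-in for Impala's TupleDescriptor; it just collects its slots.
  static class TupleDescriptor {
    private final List<SlotDescriptor> slots = new ArrayList<>();
    void addSlot(SlotDescriptor slot) { slots.add(slot); }
    List<SlotDescriptor> getSlots() { return slots; }
  }

  // Stand-in for the regular Analyzer: new slots start out unmaterialized,
  // because the non-Calcite planner materializes them later in planning.
  static class Analyzer {
    private int nextSlotId = 0;

    SlotDescriptor addSlotDescriptor(TupleDescriptor tuple) {
      SlotDescriptor slot = new SlotDescriptor(nextSlotId++);
      tuple.addSlot(slot);
      return slot;
    }

    // Placeholder for the non-Calcite conjunct bookkeeping (not modeled here).
    List<String> getUnassignedConjuncts() { return new ArrayList<>(); }
  }

  // Sketch of the two overridden aspects.
  static class SimplifiedAnalyzer extends Analyzer {
    private final List<String> havingConjuncts = new ArrayList<>();

    // Hypothetical helper: register the "having" conjuncts before conversion.
    void setHavingConjuncts(List<String> conjuncts) {
      havingConjuncts.clear();
      havingConjuncts.addAll(conjuncts);
    }

    // Aspect 1: the "having" conjuncts are reported as the unassigned conjuncts.
    @Override
    List<String> getUnassignedConjuncts() { return havingConjuncts; }

    // Hypothetical helper: called once the aggregate has become a PlanNode.
    void clearUnassignedConjuncts() { havingConjuncts.clear(); }

    // Aspect 2: call the parent, then materialize the new slot immediately.
    @Override
    SlotDescriptor addSlotDescriptor(TupleDescriptor tuple) {
      SlotDescriptor slot = super.addSlotDescriptor(tuple);
      slot.setIsMaterialized(true);
      return slot;
    }
  }

  public static void main(String[] args) {
    SimplifiedAnalyzer analyzer = new SimplifiedAnalyzer();
    analyzer.setHavingConjuncts(List.of("sum(x) > 10"));
    SlotDescriptor slot = analyzer.addSlotDescriptor(new TupleDescriptor());
    System.out.println("materialized: " + slot.isMaterialized());
    System.out.println("conjuncts: " + analyzer.getUnassignedConjuncts());
    analyzer.clearUnassignedConjuncts();
    System.out.println("after clear: " + analyzer.getUnassignedConjuncts());
  }
}
```

Subclassing keeps the existing AggregateInfo code path untouched, which
is in line with the stated goal of avoiding refactoring in the first
iteration.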

The rest of the ImpalaAggRel is hopefully self-explanatory. The groups,
aggregates, and grouping sets are extracted from the RelNodes and used
in the PlanNodes. The logic to set up multiple PlanNodes and the creation
of MultiAggregateInfo and AggregateInfo objects are similar to what is
used in the non-Calcite planner.
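
As a rough illustration of the extraction step, the sketch below turns
grouping sets expressed as bit sets over the input columns into lists of
grouping expressions. It is an assumption-laden example, not the
ImpalaAggRel code: AggregateRel is a hypothetical stand-in for an
Aggregate RelNode, columns are modeled as plain strings, and
java.util.BitSet is used in place of Calcite's own bit set type.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class AggExtractionSketch {

  // Hypothetical stand-in for an Aggregate RelNode: input column names,
  // one bit set per grouping set, and the aggregate calls as strings.
  static class AggregateRel {
    final List<String> inputFields;
    final List<BitSet> groupSets;
    final List<String> aggCalls;

    AggregateRel(List<String> inputFields, List<BitSet> groupSets,
        List<String> aggCalls) {
      this.inputFields = inputFields;
      this.groupSets = groupSets;
      this.aggCalls = aggCalls;
    }
  }

  // Builds a BitSet with the given column indexes set.
  static BitSet bits(int... indexes) {
    BitSet set = new BitSet();
    for (int i : indexes) set.set(i);
    return set;
  }

  // Resolves every grouping set to the list of input columns it groups on.
  static List<List<String>> groupingSetExprs(AggregateRel rel) {
    List<List<String>> result = new ArrayList<>();
    for (BitSet set : rel.groupSets) {
      List<String> exprs = new ArrayList<>();
      for (int i = set.nextSetBit(0); i >= 0; i = set.nextSetBit(i + 1)) {
        exprs.add(rel.inputFields.get(i));
      }
      result.add(exprs);
    }
    return result;
  }

  public static void main(String[] args) {
    // Models GROUP BY ROLLUP(a, b) over input columns (a, b, c).
    AggregateRel rel = new AggregateRel(
        List.of("a", "b", "c"),
        List.of(bits(0, 1), bits(0), bits()),
        List.of("sum(c)"));
    System.out.println(groupingSetExprs(rel));
  }
}
```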

Change-Id: Iacf0de8ba11f0d31d73d624f0c9a91db9997cfd5
Reviewed-on: http://gerrit.cloudera.org:8080/21238
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Implement aggregation capability
> --------------------------------
>
>                 Key: IMPALA-12964
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12964
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Steve Carlin
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
