[
https://issues.apache.org/jira/browse/CALCITE-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yu Xu updated CALCITE-7484:
---------------------------
Summary: Add a rule to eliminate redundant aggregates functions over GROUP
BY keys (was: Add a rule to eliminate redundant aggregates over GROUP BY keys)
> Add a rule to eliminate redundant aggregates functions over GROUP BY keys
> -------------------------------------------------------------------------
>
> Key: CALCITE-7484
> URL: https://issues.apache.org/jira/browse/CALCITE-7484
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.41.0
> Reporter: Yu Xu
> Assignee: Yu Xu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> Given a SQL query like:
> {code:java}
> select sal, max(sal) as sal_max, sum(comm) as comm_sum from emp group by sal,
> deptno; {code}
> It could be rewritten as follows, because computing MAX(sal) is redundant
> when sal is itself a GROUP BY key:
>
> {code:java}
> select sal, sal as sal_max, sum(comm) as comm_sum from emp group by sal,
> deptno; {code}
> The current plan is:
>
> {code:java}
> LogicalProject(SAL=[$0], SAL_MAX=[$2], COMM_SUM=[$3])
>   LogicalAggregate(group=[{0, 1}], SAL_MAX=[MAX($0)], COMM_SUM=[SUM($2)])
>     LogicalProject(SAL=[$5], DEPTNO=[$7], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> It would be better to optimize it to:
>
> {code:java}
> LogicalProject(SAL=[$0], SAL_MAX=[$2], COMM_SUM=[$3])
>   LogicalProject(SAL=[$0], DEPTNO=[$1], SAL0=[$0], COMM_SUM=[$2])
>     LogicalAggregate(group=[{0, 1}], COMM_SUM=[SUM($2)])
>       LogicalProject(SAL=[$5], DEPTNO=[$7], COMM=[$6])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
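
The plan rewrite quoted above can be modelled in a few lines. This is only a sketch of the idea, not Calcite code: `AggCall`, `KEY_PRESERVING` and `eliminate_redundant_aggs` are hypothetical names, the group keys are assumed to be a plain list of input field indexes, and cases such as GROUPING SETS, FILTER clauses and result-type differences are deliberately ignored.

```python
from dataclasses import dataclass

# Assumption: these functions return the key itself when applied to a group key.
KEY_PRESERVING = {"MAX", "MIN", "AVG", "ANY_VALUE"}


@dataclass(frozen=True)
class AggCall:
    func: str  # aggregate function name, e.g. "MAX"
    arg: int   # input field index of the single argument
    name: str  # output field name


def eliminate_redundant_aggs(group_keys, agg_calls):
    """Return (remaining_calls, projection).

    projection has one entry per output field of the original aggregate
    (group keys first, then aggregate calls) and gives the field index to
    read from the output of the reduced aggregate.
    """
    remaining = []
    projection = list(range(len(group_keys)))  # keys keep their positions
    for call in agg_calls:
        if call.func in KEY_PRESERVING and call.arg in group_keys:
            # Redundant call: read the group key column instead.
            projection.append(group_keys.index(call.arg))
        else:
            # Kept call: it lands after the keys in the reduced aggregate.
            projection.append(len(group_keys) + len(remaining))
            remaining.append(call)
    return remaining, projection


calls = [AggCall("MAX", 0, "SAL_MAX"), AggCall("SUM", 2, "COMM_SUM")]
remaining, projection = eliminate_redundant_aggs([0, 1], calls)
print([c.name for c in remaining], projection)
```

For the example in this issue the sketch keeps only `COMM_SUM` and yields the projection `[0, 1, 0, 2]`, which lines up with `LogicalProject(SAL=[$0], DEPTNO=[$1], SAL0=[$0], COMM_SUM=[$2])` in the proposed plan.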
> As far as I know, similar optimizations exist in some mainstream databases.
>
> Therefore, we could introduce a rule to eliminate redundant aggregate
> functions over GROUP BY keys.
> Reason: when the argument of an aggregate function is itself a GROUP BY
> key, the aggregation is redundant, because every row within a group has
> the same value for that key. In this case:
> MAX(a) returns 'a' itself
> MIN(a) returns 'a' itself
> AVG(a) returns 'a' itself
> The rule would cover aggregate functions such as MAX, MIN, AVG and ANY_VALUE.
>
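
The reasoning quoted above can be checked quickly outside Calcite. The following is a minimal sketch using Python's built-in `sqlite3` module with made-up rows (the table contents are an assumption for illustration, not the real EMP data). It compares the original and rewritten queries from the issue and checks that MIN, AVG and MAX of a group key equal the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (sal INTEGER, deptno INTEGER, comm INTEGER)")
# Illustrative rows only, including NULLs in both the key and the argument.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (800, 20, None),
        (1600, 30, 300),
        (1600, 30, 200),
        (1250, 30, 500),
        (1250, 10, 1400),
        (None, 10, 50),
    ],
)

original = """
    SELECT sal, MAX(sal) AS sal_max, SUM(comm) AS comm_sum
    FROM emp GROUP BY sal, deptno ORDER BY sal, deptno"""
rewritten = """
    SELECT sal, sal AS sal_max, SUM(comm) AS comm_sum
    FROM emp GROUP BY sal, deptno ORDER BY sal, deptno"""

# The two queries from the issue should return identical rows.
same_result = conn.execute(original).fetchall() == conn.execute(rewritten).fetchall()

# MIN, AVG and MAX over a group key should equal the key in every group.
rows = conn.execute(
    "SELECT sal, MIN(sal), AVG(sal), MAX(sal) FROM emp GROUP BY sal"
).fetchall()
keys_match = all(key == lo == avg == hi for key, lo, avg, hi in rows)

print(same_result, keys_match)
```

One detail shows up here: SQLite's AVG returns a floating-point value, so it equals the key by value but not by type. A real rule would probably need a cast wherever the aggregate's result type differs from the key's type.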