[ 
https://issues.apache.org/jira/browse/CALCITE-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Xu updated CALCITE-7484:
---------------------------
    Summary: Add a rule to eliminate redundant aggregates over GROUP BY keys  
(was: Add AggregateFunctionOfGroupByKeysRule to eliminate redundant aggregates 
over GROUP BY keys)

> Add a rule to eliminate redundant aggregates over GROUP BY keys
> ---------------------------------------------------------------
>
>                 Key: CALCITE-7484
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7484
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Yu Xu
>            Assignee: Yu Xu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> Consider SQL like:
> {code:java}
> select sal, max(sal) as sal_max, sum(comm) as comm_sum from emp group by sal, 
> deptno; {code}
> It could be rewritten as follows, because computing max(sal) is redundant 
> when sal is itself a GROUP BY key:
>  
> {code:java}
> select sal, sal as sal_max, sum(comm) as comm_sum from emp group by sal, 
> deptno; {code}
> The current plan is:
>  
> {code:java}
> LogicalProject(SAL=[$0], SAL_MAX=[$2], COMM_SUM=[$3])
>   LogicalAggregate(group=[{0, 1}], SAL_MAX=[MAX($0)], COMM_SUM=[SUM($2)])
>     LogicalProject(SAL=[$5], DEPTNO=[$7], COMM=[$6])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> It would be better to optimize it to:
>  
> {code:java}
> LogicalProject(SAL=[$0], SAL_MAX=[$2], COMM_SUM=[$3])
>   LogicalProject(SAL=[$0], DEPTNO=[$1], SAL0=[$0], COMM_SUM=[$2])
>     LogicalAggregate(group=[{0, 1}], COMM_SUM=[SUM($2)])
>       LogicalProject(SAL=[$5], DEPTNO=[$7], COMM=[$6])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> As far as I know, similar optimizations exist in some mainstream databases.
>  
> Therefore, we could introduce a rule that eliminates redundant aggregate 
> functions over GROUP BY keys.
> Reason: when the argument of an aggregate function is itself a GROUP BY 
> key, the aggregation is redundant, because after grouping by a key the value 
> of that key is constant within each group. In this case:
> MAX(a) results in 'a' itself
> MIN(a) results in 'a' itself
> AVG(a) results in 'a' itself
> The rule would optimize aggregate functions such as max/min/avg/any_val.
>  
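To make the proposed rewrite concrete, here is a minimal Java sketch of the bookkeeping such a rule would need. It is not Calcite code and does not use Calcite's API: the class `RedundantAggSketch`, the records `AggCall` and `Rewrite`, and the set `REDUNDANT_ON_KEY` are all hypothetical names introduced for illustration, and `ANY_VALUE` is assumed to be what "any_val" refers to. The sketch also ignores DISTINCT, FILTER, and any result-type differences (for example for AVG), which a real rule would have to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RedundantAggSketch {
    /** Hypothetical, simplified aggregate call: function name, one input column index, alias. */
    record AggCall(String func, int arg, String alias) {}

    /** Rewrite result: calls left in the Aggregate, plus the Project placed on top of it. */
    record Rewrite(List<AggCall> keptCalls, List<String> projection) {}

    /** Functions assumed to return the key itself when applied to a GROUP BY key. */
    static final Set<String> REDUNDANT_ON_KEY = Set.of("MAX", "MIN", "AVG", "ANY_VALUE");

    static Rewrite rewrite(List<Integer> groupKeys, List<AggCall> calls) {
        List<AggCall> kept = new ArrayList<>();
        List<String> projection = new ArrayList<>();
        // The Aggregate outputs its group keys first, as fields $0..$(n-1).
        for (int i = 0; i < groupKeys.size(); i++) {
            projection.add("$" + i);
        }
        for (AggCall call : calls) {
            int keyPos = groupKeys.indexOf(call.arg());
            if (keyPos >= 0 && REDUNDANT_ON_KEY.contains(call.func())) {
                // Redundant: project the group key instead of computing the aggregate.
                projection.add(call.alias() + "=$" + keyPos);
            } else {
                // Still needed: it lands after the keys and the calls kept so far.
                int outputIndex = groupKeys.size() + kept.size();
                kept.add(call);
                projection.add(call.alias() + "=$" + outputIndex);
            }
        }
        return new Rewrite(kept, projection);
    }

    public static void main(String[] args) {
        // Mirrors the example plan: group=[{0, 1}], MAX($0) AS SAL_MAX, SUM($2) AS COMM_SUM.
        Rewrite r = rewrite(
            List.of(0, 1),
            List.of(new AggCall("MAX", 0, "SAL_MAX"), new AggCall("SUM", 2, "COMM_SUM")));
        System.out.println(r.keptCalls());
        System.out.println(r.projection());
    }
}
```

For the example query, only `SUM($2)` remains in the aggregate and the projection becomes `[$0, $1, SAL_MAX=$0, COMM_SUM=$2]`, which lines up with the inner `LogicalProject` in the proposed plan above.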



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
