[ https://issues.apache.org/jira/browse/CALCITE-7479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075582#comment-18075582 ]

Zhen Chen commented on CALCITE-7479:
------------------------------------

During testing, I found that if the {{GROUP BY}} keys include the primary key, 
the rewrite works, because each group then contains exactly one row. However, 
if the {{GROUP BY}} keys do not include the primary key, using {{SINGLE_VALUE}} 
to restore the removed keys can fail at execution time.

Consider the following example:
{code:java}
SELECT
    deptno,
    deptno + 2
FROM emp
GROUP BY deptno, deptno + 2
ORDER BY deptno; {code}
Since {{deptno -> deptno + 2}}, according to the current rule, {{deptno + 2}} 
should be removable from the grouping keys. However, executing the rewritten 
plan fails.

If I remove the {{GROUP BY}} clause from the SQL above, the query returns the 
rows that feed the aggregation:
{code:java}
 deptno | deptno + 2
--------+------------
     10 |         12
     10 |         12
     10 |         12
     20 |         22
     20 |         22
     20 |         22
     20 |         22
     20 |         22
     30 |         32
     30 |         32
     30 |         32
     30 |         32
     30 |         32
     30 |         32
(14 rows) {code}
If we rewrite it using {{SINGLE_VALUE}}, the plan becomes:
{code:java}
EnumerableSort(sort0=[$0], dir0=[ASC])
  EnumerableAggregate(group=[{0}], EXPR$1=[SINGLE_VALUE($1)])
    EnumerableCalc(expr#0..7=[{inputs}], expr#8=[2], expr#9=[+($t7, $t8)], DEPTNO=[$t7], EXPR$1=[$t9])
      EnumerableTableScan(table=[[scott, EMP]]) {code}
During execution, it throws the following error:
{code:java}
Caused by: java.lang.IllegalStateException: more than one value in agg SINGLE_VALUE {code}
So my understanding is that {{SINGLE_VALUE}} checks the number of input rows 
in each group, not the number of distinct values: it fails as soon as a group 
receives a second row. I think this logic is reasonable: although all values of 
the column within a group are the same (for example, {{deptno + 2}} in this 
SQL), that does not mean the group has only one row, so the exception is 
expected.

Therefore, I think this rewrite should use {{ANY_VALUE}} instead of 
{{SINGLE_VALUE}}.
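To illustrate the difference, here is a minimal Java sketch of the two aggregate semantics applied to the 14 rows above. The classes {{SingleVsAnyValue}}, {{SingleValueAcc}} and {{AnyValueAcc}} are hypothetical and are not Calcite's implementors: {{SingleValueAcc}} only mimics the behavior observed above (its message is copied from the stack trace), and {{AnyValueAcc}} simply keeps the first value of each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

/** Hypothetical model of SINGLE_VALUE vs ANY_VALUE; not Calcite's actual implementors. */
public class SingleVsAnyValue {

  interface Accumulator {
    void add(int value);

    int result();
  }

  /** SINGLE_VALUE-like: fails on a group's second input row, even if the value is equal. */
  static final class SingleValueAcc implements Accumulator {
    private boolean seen;
    private int value;

    @Override
    public void add(int v) {
      if (seen) {
        throw new IllegalStateException("more than one value in agg SINGLE_VALUE");
      }
      seen = true;
      value = v;
    }

    @Override
    public int result() {
      return value;
    }
  }

  /** ANY_VALUE-like: keeps the first value of each group and ignores later rows. */
  static final class AnyValueAcc implements Accumulator {
    private boolean seen;
    private int value;

    @Override
    public void add(int v) {
      if (!seen) {
        seen = true;
        value = v;
      }
    }

    @Override
    public int result() {
      return value;
    }
  }

  /** Mimics Aggregate(group=[{0}], EXPR$1=[AGG($1)]) over (deptno, deptno + 2) rows. */
  static Map<Integer, Integer> aggregate(List<int[]> rows, Supplier<Accumulator> agg) {
    Map<Integer, Accumulator> groups = new TreeMap<>();
    for (int[] row : rows) {
      groups.computeIfAbsent(row[0], k -> agg.get()).add(row[1]);
    }
    Map<Integer, Integer> out = new TreeMap<>();
    groups.forEach((k, acc) -> out.put(k, acc.result()));
    return out;
  }

  /** The 14 (deptno, deptno + 2) rows from the result set above: 3, 5 and 6 rows per deptno. */
  static List<int[]> sampleRows() {
    List<int[]> rows = new ArrayList<>();
    int[][] counts = {{10, 3}, {20, 5}, {30, 6}};
    for (int[] c : counts) {
      for (int i = 0; i < c[1]; i++) {
        rows.add(new int[] {c[0], c[0] + 2});
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    System.out.println(aggregate(sampleRows(), AnyValueAcc::new)); // {10=12, 20=22, 30=32}
    try {
      aggregate(sampleRows(), SingleValueAcc::new);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // more than one value in agg SINGLE_VALUE
    }
    // One row per group (as when GROUP BY contains the primary key): SINGLE_VALUE succeeds.
    System.out.println(aggregate(List.of(new int[] {10, 12}, new int[] {20, 22}), SingleValueAcc::new));
  }
}
```

With several rows per {{deptno}}, only the {{ANY_VALUE}}-like accumulator produces a result; with one row per group, both agree, which matches the primary-key observation at the top of this comment.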

> Remove redundant aggregate group keys with FD
> ---------------------------------------------
>
>                 Key: CALCITE-7479
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7479
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Zhen Chen
>            Priority: Minor
>              Labels: pull-request-available
>
> {*}Proposal{*}
> Add a new {{AGGREGATE}} rewrite rule that identifies and removes grouping 
> keys of an {{AGGREGATE}} that are functionally determined by preceding 
> grouping keys. To preserve the original semantics, the rewrite will:
>  * Shorten the grouping key list of the {{AGGREGATE}}
>  * Use ANY_VALUE to restore the removed grouping column(s)
>  * Reorder the output columns to match the original order using a {{PROJECT}}
> {*}Example of the Planned Change{*}
> Original SQL:
> {code:java}
> select deptno, name, count(*) as c
> from sales.dept
> group by deptno, name {code}
> Original plan:
> {code:java}
> LogicalAggregate(group=[{0, 1}], C=[COUNT()])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code}
> Rewritten plan:
> {code:java}
> LogicalProject(DEPTNO=[$0], NAME=[$1], C=[$2])
>   LogicalAggregate(group=[{0}], NAME=[ANY_VALUE($1)], C=[COUNT()])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code}
> {*}Explanation{*}
> If {{name}} is functionally determined by {{deptno}}, then {{name}} is 
> removed from the grouping keys.
>  * ANY_VALUE retains the value of the removed column; because the column is 
> determined by the remaining keys, every row in a group has the same value.
>  * The PROJECT restores the final output column order and field names.
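To make the proposal concrete, here is a minimal Java sketch of the key-reduction step, assuming the functional dependencies are already known as (determinant columns -> column) pairs. {{RedundantGroupKeys}}, {{Fd}}, {{closure}} and {{reduceKeys}} are hypothetical names, not Calcite API and not the code in the pull request; the closure computation is the textbook attribute-closure algorithm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of FD-based group-key reduction; not the code in the pull request. */
public class RedundantGroupKeys {

  /** A functional dependency: the columns in {@code from} determine column {@code to}. */
  record Fd(Set<Integer> from, int to) {}

  /** Attribute-set closure: every column determined by {@code cols} under {@code fds}. */
  static Set<Integer> closure(Set<Integer> cols, List<Fd> fds) {
    Set<Integer> result = new HashSet<>(cols);
    boolean changed = true;
    while (changed) {
      changed = false;
      for (Fd fd : fds) {
        if (result.containsAll(fd.from()) && result.add(fd.to())) {
          changed = true;
        }
      }
    }
    return result;
  }

  /**
   * Keeps a group key only if the keys kept so far (the preceding keys) do not
   * determine it. Removed keys would be restored with ANY_VALUE, and a Project
   * would restore the original column order.
   */
  static List<Integer> reduceKeys(List<Integer> groupKeys, List<Fd> fds) {
    List<Integer> kept = new ArrayList<>();
    for (int key : groupKeys) {
      // Always keep the first key so a non-empty GROUP BY never turns into a
      // global aggregate (which would return one row even for empty input).
      if (kept.isEmpty() || !closure(new HashSet<>(kept), fds).contains(key)) {
        kept.add(key);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // DEPT columns: 0 = DEPTNO, 1 = NAME. Assume DEPTNO -> NAME, as in the example.
    List<Fd> fds = List.of(new Fd(Set.of(0), 1));
    List<Integer> groupKeys = List.of(0, 1);
    List<Integer> kept = reduceKeys(groupKeys, fds);
    List<Integer> removed = new ArrayList<>(groupKeys);
    removed.removeAll(kept);
    System.out.println("group=" + kept + ", ANY_VALUE over " + removed);
    // prints: group=[0], ANY_VALUE over [1]
  }
}
```

Applied to {{deptno}}, {{deptno + 2}} from the comment above (columns 0 and 1 with the FD 0 -> 1), it likewise keeps only column 0. Walking the keys in order means a key is only removed in favor of keys before it, which matches the "preceding grouping keys" wording of the proposal.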



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
