[
https://issues.apache.org/jira/browse/CALCITE-7479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075582#comment-18075582
]
Zhen Chen commented on CALCITE-7479:
------------------------------------
During testing, I found that if the {{GROUP BY}} contains the primary key,
there is no issue. However, if the {{GROUP BY}} does not contain the primary
key, using {{SINGLE_VALUE}} may cause problems.
Consider the following example:
{code:java}
SELECT
deptno,
deptno + 2
FROM emp
GROUP BY deptno, deptno + 2
ORDER BY deptno; {code}
Since {{deptno -> deptno + 2}}, the current rule treats {{deptno + 2}} as removable from the grouping keys. However, executing the rewritten query fails.
If I remove the {{GROUP BY}} clause from the SQL above, the query correctly returns the 14 rows that feed the aggregate:
{code:java}
deptno | deptno + 2
--------+------------
10 | 12
10 | 12
10 | 12
20 | 22
20 | 22
20 | 22
20 | 22
20 | 22
30 | 32
30 | 32
30 | 32
30 | 32
30 | 32
30 | 32
(14 rows) {code}
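A minimal Java sketch of the two facts these rows show (the class {{FdCheck}} and its methods are hypothetical illustrations, not Calcite code): the functional dependency {{deptno -> deptno + 2}} holds, yet every {{deptno}} group still contains several rows.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical check over the 14 (deptno, deptno + 2) rows listed above.
 * Not Calcite code.
 */
final class FdCheck {
  // The rows from the result set above: {deptno, deptno + 2}.
  static final int[][] ROWS = {
      {10, 12}, {10, 12}, {10, 12},
      {20, 22}, {20, 22}, {20, 22}, {20, 22}, {20, 22},
      {30, 32}, {30, 32}, {30, 32}, {30, 32}, {30, 32}, {30, 32}};

  /** True if each value in column 0 is paired with exactly one value in column 1. */
  static boolean fdHolds(int[][] rows) {
    Map<Integer, Integer> seen = new HashMap<>();
    for (int[] row : rows) {
      Integer prev = seen.putIfAbsent(row[0], row[1]);
      // A second, different column-1 value for the same key breaks the FD.
      if (prev != null && prev != row[1]) {
        return false;
      }
    }
    return true;
  }

  /** Number of input rows per deptno group, sorted by deptno. */
  static Map<Integer, Integer> rowsPerGroup(int[][] rows) {
    Map<Integer, Integer> counts = new TreeMap<>();
    for (int[] row : rows) {
      counts.merge(row[0], 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(fdHolds(ROWS));      // true
    System.out.println(rowsPerGroup(ROWS)); // {10=3, 20=5, 30=6}
  }
}
```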
If we rewrite it using {{SINGLE_VALUE}}, the plan becomes:
{code:java}
EnumerableSort(sort0=[$0], dir0=[ASC])
  EnumerableAggregate(group=[{0}], EXPR$1=[SINGLE_VALUE($1)])
    EnumerableCalc(expr#0..7=[{inputs}], expr#8=[2], expr#9=[+($t7, $t8)], DEPTNO=[$t7], EXPR$1=[$t9])
      EnumerableTableScan(table=[[scott, EMP]]) {code}
During execution, it throws the following error:
{code:java}
Caused by: java.lang.IllegalStateException: more than one value in agg SINGLE_VALUE {code}
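The failure can be modeled with a small accumulator. This is a hypothetical sketch written to match the error message above, not Calcite's actual implementation: it rejects the second row of a group even when that row carries the same value as the first.

```java
import java.util.List;

/**
 * Hypothetical model of a SINGLE_VALUE accumulator (not Calcite code).
 */
final class SingleValueDemo {
  static final class SingleValue {
    private boolean seen;   // has a row already been added to this group?
    private Integer value;

    void add(Integer v) {
      // Fails on the second row, regardless of whether v equals the stored value.
      if (seen) {
        throw new IllegalStateException("more than one value in agg SINGLE_VALUE");
      }
      seen = true;
      value = v;
    }

    Integer result() {
      return value;
    }
  }

  /** Aggregates one group's values; throws if the group has two or more rows. */
  static Integer singleValue(List<Integer> group) {
    SingleValue acc = new SingleValue();
    for (Integer v : group) {
      acc.add(v);
    }
    return acc.result();
  }

  public static void main(String[] args) {
    System.out.println(singleValue(List.of(12))); // 12
    try {
      // The deptno = 10 group: three rows, all with deptno + 2 = 12.
      singleValue(List.of(12, 12, 12));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```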
So my understanding is that {{SINGLE_VALUE}} checks the number of input rows in each group, not the number of distinct values, and this check happens before aggregation. I think this logic is reasonable: although all values of the column within a group are the same (for example, {{deptno + 2}} in this SQL), that does not mean the group contains only one row, so the exception is expected.
Therefore, I think this rewrite should use {{ANY_VALUE}} instead of {{SINGLE_VALUE}}.
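For comparison, a sketch of the same aggregation with {{ANY_VALUE}} in place of {{SINGLE_VALUE}}, in plain Java over the {{deptno}} values of the 14 rows above (the class {{AnyValueDemo}} is hypothetical). Each group keeps the first value it sees and later rows are not an error, so the result is one row per {{deptno}}.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of GROUP BY deptno with ANY_VALUE(deptno + 2),
 * ordered by deptno. Not Calcite code.
 */
final class AnyValueDemo {
  // DEPTNO of the 14 rows in the result set above.
  static final int[] DEPTNOS = {
      10, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30, 30, 30, 30};

  static Map<Integer, Integer> aggregate(int[] deptnos) {
    // TreeMap keeps keys sorted, standing in for ORDER BY deptno.
    Map<Integer, Integer> result = new TreeMap<>();
    for (int d : deptnos) {
      // ANY_VALUE: keep whichever value arrives first; repeated rows are fine.
      result.putIfAbsent(d, d + 2);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(aggregate(DEPTNOS)); // {10=12, 20=22, 30=32}
  }
}
```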
> Remove redundant aggregate group keys with FD
> ---------------------------------------------
>
> Key: CALCITE-7479
> URL: https://issues.apache.org/jira/browse/CALCITE-7479
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.41.0
> Reporter: Zhen Chen
> Priority: Minor
> Labels: pull-request-available
>
> {*}Proposal{*}
> Add a new {{AGGREGATE}} rewrite rule to identify and remove redundant
> grouping keys from an {{AGGREGATE}} that are functionally determined by
> preceding grouping keys. To preserve the original semantics, the rewrite will:
> * Shorten the grouping key list of the {{AGGREGATE}}
> * Use ANY_VALUE to restore the removed grouping column(s)
> * Reorder the output columns to match the original order using a {{PROJECT}}
> {*}Example of the Planned Change{*}
> Original SQL:
> {code:java}
> select deptno, name, count(*) as c
> from sales.dept
> group by deptno, name {code}
> Original plan:
> {code:java}
> LogicalAggregate(group=[{0, 1}], C=[COUNT()])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code}
> Rewritten plan:
> {code:java}
> LogicalProject(DEPTNO=[$0], NAME=[$1], C=[$2])
>   LogicalAggregate(group=[{0}], NAME=[ANY_VALUE($1)], C=[COUNT()])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code}
> {*}Explanation{*}
> If {{name}} is functionally determined by {{deptno}}, then {{name}} is removed from the grouping keys.
> * ANY_VALUE restores the value of the removed column: every row in a {{deptno}} group has the same {{name}}, so any row's value is correct.
> * The PROJECT restores the final output column order and field names.
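A sketch of the index bookkeeping such a rewrite needs, assuming the functional dependencies are available as a map from a column to the set of columns that determine it. The class, the method names, and this map representation are hypothetical, not Calcite's API. A key is removed only if a key that stays in the {{GROUP BY}} determines it; the rewritten aggregate emits the kept keys, then one {{ANY_VALUE}} per removed key, then the original aggregate calls, and the mapping tells the {{PROJECT}} where each original output column now lives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the key-removal bookkeeping (not Calcite code).
 * Assumed layout of the rewritten aggregate's output:
 * [kept keys..., ANY_VALUE(removed key)..., original aggregate calls...].
 */
final class FdGroupKeyRewrite {
  /** Splits the grouping keys into [kept, removed]. */
  static List<List<Integer>> splitKeys(List<Integer> groupKeys,
      Map<Integer, Set<Integer>> determinedBy) {
    List<Integer> kept = new ArrayList<>();
    List<Integer> removed = new ArrayList<>();
    for (int key : groupKeys) {
      Set<Integer> determinants = determinedBy.getOrDefault(key, Set.of());
      // Only a preceding key that stays in the GROUP BY may justify removal.
      boolean redundant = kept.stream().anyMatch(determinants::contains);
      (redundant ? removed : kept).add(key);
    }
    return List.of(kept, removed);
  }

  /** For each original output column, its position in the rewritten aggregate. */
  static List<Integer> projectMapping(List<Integer> groupKeys,
      Map<Integer, Set<Integer>> determinedBy, int aggCallCount) {
    List<List<Integer>> split = splitKeys(groupKeys, determinedBy);
    List<Integer> kept = split.get(0);
    List<Integer> removed = split.get(1);
    List<Integer> mapping = new ArrayList<>();
    for (int key : groupKeys) {
      mapping.add(kept.contains(key)
          ? kept.indexOf(key)                     // still a grouping key
          : kept.size() + removed.indexOf(key));  // now an ANY_VALUE call
    }
    for (int j = 0; j < aggCallCount; j++) {
      // Original aggregate calls come after the ANY_VALUE calls.
      mapping.add(kept.size() + removed.size() + j);
    }
    return mapping;
  }

  public static void main(String[] args) {
    // GROUP BY deptno ($0), name ($1), where name depends on deptno.
    Map<Integer, Set<Integer>> fd = Map.of(1, Set.of(0));
    System.out.println(splitKeys(List.of(0, 1), fd));            // [[0], [1]]
    System.out.println(projectMapping(List.of(0, 1), fd, 1));    // [0, 1, 2]
    // Removed key between two kept keys: the PROJECT must reorder columns.
    System.out.println(projectMapping(List.of(0, 1, 2), fd, 1)); // [0, 2, 1, 3]
  }
}
```

For the issue's example the mapping is the identity, which matches {{LogicalProject(DEPTNO=[$0], NAME=[$1], C=[$2])}}. When the removed key sits between two kept keys, the {{PROJECT}} has to reorder columns.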