[ 
https://issues.apache.org/jira/browse/CALCITE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709987#comment-17709987
 ] 

Pinhan Zhao commented on CALCITE-5637:
--------------------------------------

I think by MySQL's definition 
([https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_any-value),]
 ANY_VALUE is not an aggregate function and therefore is not comparable to 
other aggregate functions like MAX - as being said in its documentation, using 
ANY_VALUE (with ONLY_FULL_GROUP_BY enabled) is essentially equivalent to 
disabling ONLY_FULL_GROUP_BY.

 

Given this fact, I tend to believe the testPushAggregateThroughJoin7 rule in 
Calcite is correct in terms of equivalence preserving for most DBMS generally, 
but it's erroneous for MySQL specifically since MySQL has different semantics 
for ANY_VALUE whereas Calcite fails to accommodate this diversity

> RelOptRulesTest testPushAggregateThroughJoin7 optimized query not equivalent 
> to the original one
> ------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5637
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5637
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Pinhan Zhao
>            Priority: Major
>
> [https://github.com/apache/calcite/blob/fb340ece8c5e60ecfd6371950f8dcb665c85a712/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L5046]
> We first converted plans to SQL queries, resulting in two queries below:
> {code:sql}
> SELECT ANY_VALUE(t0.SAL) FROM EMP INNER JOIN (SELECT SAL FROM EMP GROUP BY 
> SAL) AS t0 ON EMP.SAL = t0.SAL;{code}
> and
> {code:sql}
> SELECT ANY_VALUE(t4.SAL) FROM (SELECT SAL FROM EMP GROUP BY SAL) AS t3 INNER 
> JOIN (SELECT SAL FROM EMP GROUP BY SAL) AS t4 ON t3.SAL = t4.SAL;{code}
> However, with the DB:
> {code:sql}
> CREATE TABLE EMP (
> EMPNO INTEGER,
> DEPTNO INTEGER,
> ENAME VARCHAR(20),
> JOB VARCHAR(20),
> MGR INTEGER,
> HIREDATE DATE,
> SAL INTEGER,
> COMM INTEGER,
> SLACKER INTEGER
> );
> INSERT INTO EMP VALUES (-1, 1, '2147483648', '-2147483649', 0, '1970-01-01', 
> 0, 0, 1);
> INSERT INTO EMP VALUES (0, 1, '2147483648', '-2147483649', NULL, 
> '1970-01-01', 0, 0, 1);{code}
>  
> the outputs of these two queries running on MySQL are different:
> {code:sql}
> ANY_VALUE(t0.SAL)
> 0
> 0{code}
> vs
> {code:sql}
> ANY_VALUE(t4.SAL)
> 0{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to