[ https://issues.apache.org/jira/browse/CALCITE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699933#comment-17699933 ]
Pinhan Zhao commented on CALCITE-5578: -------------------------------------- I believe this would be the same issue (with the testAggregateCaseToFilter rule) as in [https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-5516?filter=allopenissues|https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-5516?filter=allopenissues.] I've marked the case documented in CALCITE-5516 as a duplicate of this. Please check. Thanks so much. > RelOptRulesTest testAggregateCaseToFilter optimized plan not semantically > equivalent to the original one after conversion > ------------------------------------------------------------------------------------------------------------------------- > > Key: CALCITE-5578 > URL: https://issues.apache.org/jira/browse/CALCITE-5578 > Project: Calcite > Issue Type: Bug > Components: core > Reporter: Pinhan Zhao > Priority: Blocker > > After converting the "plan before" and "plan after" of > testAggregateCaseToFilter > ([https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L4140]) > to PostgreSQL queries, we obtained two queries: > {code:java} > SELECT SUM(SAL) AS SUM_SAL, COUNT(DISTINCT CASE WHEN JOB = 'CLERK' THEN > DEPTNO ELSE NULL END) AS COUNT_DISTINCT_CLERK, SUM(CASE WHEN DEPTNO = 10 THEN > SAL ELSE NULL END) AS SUM_SAL_D10, SUM(CASE WHEN DEPTNO = 20 THEN SAL ELSE 0 > END) AS SUM_SAL_D20, SUM(CASE WHEN DEPTNO = 30 THEN 1 ELSE 0 END) AS > COUNT_D30, COUNT(CASE WHEN DEPTNO = 40 THEN 'x' ELSE NULL END) AS COUNT_D40, > SUM(CASE WHEN DEPTNO = 45 THEN 1 ELSE NULL END) AS COUNT_D45, SUM(CASE WHEN > DEPTNO = 50 THEN 1 ELSE NULL END) AS COUNT_D50, SUM(CAST(NULL AS DECIMAL(19, > 9))) AS SUM_NULL_D60, SUM(CASE WHEN DEPTNO = 70 THEN NULL ELSE 1 END) AS > SUM_NULL_D70, COUNT(CASE WHEN DEPTNO = 20 THEN 1 ELSE NULL END) AS COUNT_D20 > FROM EMP{code} > and > {code:java} > SELECT SUM(SAL) AS SUM_SAL, COUNT(DISTINCT DEPTNO) FILTER (WHERE JOB = > 'CLERK') AS COUNT_DISTINCT_CLERK, SUM(SAL) FILTER (WHERE DEPTNO = 10) AS > SUM_SAL_D10, SUM(SAL) FILTER (WHERE DEPTNO = 20) AS SUM_SAL_D20, > CAST(COUNT(*) FILTER (WHERE DEPTNO = 30) AS INTEGER) AS COUNT_D30, COUNT(*) > FILTER (WHERE DEPTNO = 40) AS COUNT_D40, SUM(1) FILTER (WHERE DEPTNO = 45) AS > COUNT_D45, SUM(1) FILTER (WHERE DEPTNO = 50) AS COUNT_D50, SUM(CAST(NULL AS > DECIMAL(19, 9))) AS SUM_NULL_D60, SUM(1) FILTER (WHERE DEPTNO <> 70) AS > SUM_NULL_D70, COUNT(*) FILTER (WHERE DEPTNO = 20) AS COUNT_D20 FROM EMP{code} > > The code we used to convert the plans to PostgreSQL queries is: > {code:java} > converter.visitRoot(rel).asStatement().toSqlString(PostgresqlSqlDialect.DEFAULT).getSql().replace('\n', > '');{code} > > However, these queries are not semantically equivalent. With a > counterexample below (which I believe is also satisfying the integrity > constraints): > {code:java} > CREATE TABLE EMP ( > EMPNO INTEGER, > DEPTNO INTEGER, > ENAME VARCHAR(20), > JOB VARCHAR(20), > MGR INTEGER, > HIREDATE DATE, > SAL INTEGER, > COMM INTEGER, > SLACKER INTEGER > ); > INSERT INTO EMP VALUES (0, 70, '-2147483649', '-6721455509335307966', 0, > '1970-01-01', 0, 0, 1); > CREATE TABLE DEPT ( > DEPTNO INTEGER, > NAME VARCHAR(20) > ); > INSERT INTO DEPT VALUES (70, '-2147483649'); > CREATE TABLE BONUS ( > ENAME VARCHAR(20), > JOB VARCHAR(20), > SAL INTEGER, > COMM INTEGER > ); > INSERT INTO BONUS VALUES ('2147483648', '2147483648', 0, 0); > CREATE TABLE EMPNULLABLES ( > EMPNO INTEGER, > DEPTNO INTEGER, > ENAME VARCHAR(20), > JOB VARCHAR(20), > MGR INTEGER, > HIREDATE DATE, > SAL INTEGER, > COMM INTEGER, > SLACKER INTEGER > ); > INSERT INTO EMPNULLABLES VALUES (0, 70, '2147483648', '2147483648', 0, > '1970-01-01', 0, 0, 1); > CREATE TABLE EMPNULLABLES_20 ( > EMPNO INTEGER, > DEPTNO INTEGER, > ENAME VARCHAR(20), > JOB VARCHAR(20), > MGR INTEGER, > HIREDATE DATE, > SAL INTEGER, > COMM INTEGER, > SLACKER INTEGER > ); > INSERT INTO EMPNULLABLES_20 VALUES (0, 70, '2147483648', '2147483648', 0, > '1970-01-01', 0, 0, 1); > CREATE TABLE EMP_B ( > EMPNO INTEGER, > DEPTNO INTEGER, > ENAME VARCHAR(20), > JOB VARCHAR(20), > MGR INTEGER, > HIREDATE DATE, > SAL INTEGER, > COMM INTEGER, > SLACKER INTEGER, > BIRTHDATE DATE > ); > INSERT INTO EMP_B VALUES (0, 70, '2147483648', '2147483648', 0, '1970-01-01', > 0, 0, 1, '1970-01-01');{code} > > we can have two queries produce different output: > {code:java} > sum_sal | count_distinct_clerk | sum_sal_d10 | sum_sal_d20 | count_d30 | > count_d40 | count_d45 | count_d50 | sum_null_d60 | sum_null_d70 | count_d20 > ---------+----------------------+-------------+-------------+-----------+-----------+-----------+-----------+--------------+--------------+----------- > 0 | 0 | | 0 | 0 | > 0 | | | | | 0 > (1 row){code} > vs > {code:java} > sum_sal | count_distinct_clerk | sum_sal_d10 | sum_sal_d20 | count_d30 | > count_d40 | count_d45 | count_d50 | sum_null_d60 | sum_null_d70 | count_d20 > ---------+----------------------+-------------+-------------+-----------+-----------+-----------+-----------+--------------+--------------+----------- > 0 | 0 | | | 0 | > 0 | | | | | 0 > (1 row){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)