[ https://issues.apache.org/jira/browse/CALCITE-5516 ]
fugang deleted comment on CALCITE-5516: --------------------------------- was (Author: JIRAUSER305412): For the first example, the main issue lies in the function {{{}COALESCE(SUM(SAL), 0){}}}. By converting it to sql复制代码 | |CAST(CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM(SAL) END / COUNT(*) AS INTEGER)| we can avoid such inconsistencies, and this problem is not only present in the {{{}PostgresqlSqlDialect{}}}, but also in the {{{}MysqlSqlDialect{}}}. By modifying the {{reduceSum}} method and the {{matches}} method in the {{{}AggregateReduceFunctionsRule{}}}, it is possible to correct this issue. However, I am not sure if eliminating the conversion of {{SUM}} to {{SUM0}} would have any impact on other dialects. Although Postgresql itself does not support syntax like {{{}t WITHIN DISTINCT{}}}, Calcite itself is not tied to a specific SQL version or standard, and I have not found any distinction based on dialects in the validator module, which I believe is reasonable. > RelOptRulesTest AggregateReduceFunctionsRule optimized plan not semantically > equivalent to the original one after conversion > ---------------------------------------------------------------------------------------------------------------------------- > > Key: CALCITE-5516 > URL: https://issues.apache.org/jira/browse/CALCITE-5516 > Project: Calcite > Issue Type: Bug > Components: core > Affects Versions: 1.32.0 > Reporter: Pinhan Zhao > Priority: Blocker > > After converting the "plan before" and "plan after" of > testReduceWithNonTypePredicate > (https://github.com/apache/calcite/blob/fb340ece8c5e60ecfd6371950f8dcb665c85a712/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L6820) > to PostgreSQL queries, we obtained two queries: > {code:sql} > SELECT AVG(SAL), AVG(SAL) FROM EMP{code} > and > {code:sql} > SELECT AVG(SAL), CAST(CASE WHEN COUNT(*) = 0 THEN NULL ELSE > COALESCE(SUM(SAL), 0) END / COUNT(*) AS INTEGER) FROM EMP{code} > > The code we used to convert the plans to PostgreSQL queries is: > {code:java} > converter.visitRoot(rel).asStatement().toSqlString(PostgresqlSqlDialect.DEFAULT).getSql().replace('\n', > ''){code} > > However, these queries are not semantically equivalent. With a > counterexample below: > {code:sql} > --------DEPT---------- > CREATE TABLE DEPT ( > DEPTNO INTEGER PRIMARY KEY, > NAME VARCHAR(20) > ); > INSERT INTO DEPT VALUES (0,'0'); > INSERT INTO DEPT VALUES (1,'0'); > --------EMP---------- > CREATE TABLE EMP ( > EMPNO INTEGER PRIMARY KEY, > DEPTNO INTEGER REFERENCES DEPT(DEPTNO), > ENAME VARCHAR(20), > JOB VARCHAR(20), > MGR INTEGER, > HIREDATE DATE, > SAL INTEGER, > COMM INTEGER, > SLACKER BOOLEAN > ); > INSERT INTO EMP VALUES (0,0,'1','0',0,'2000-01-01',NULL,0,True); > INSERT INTO EMP VALUES (-1,1,'0','0',0,'2000-01-01',NULL,0,True);{code} > > we can have two queries produce different output: > {code:java} > avg | avg > -----+----- > | > (1 row) {code} > (output of the plan before optimization, with converted to PostgreSQL query) > > {code:java} > avg | int4 > -----+------ > | 0 > (1 row) > {code} > (output of the plan after optimization, with converted to PostgreSQL query) > -- This message was sent by Atlassian Jira (v8.20.10#820010)