[ https://issues.apache.org/jira/browse/CALCITE-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737204#comment-17737204 ]
Ruben Q L edited comment on CALCITE-5716 at 6/26/23 4:15 PM: ------------------------------------------------------------- After some discussion between [~libenchao] and myself, it was decided that, due to potential regression risks, CALCITE-5683 will NOT include a fix for the current issue (they are two separate, related problems), so I'm re-opening the current ticket. The current issue could be reproduced by adding the following tests in sub-query.iq: {code:sql} select deptno from dept d1 where exists ( select 1 from dept d2 where d2.deptno = d1.deptno and exists ( select 1 from dept d3 where d3.dname = d1.dname)); +--------+ | DEPTNO | +--------+ | 10 | | 20 | | 30 | | 40 | +--------+ (4 rows) !ok SELECT dept.deptno, ( SELECT max(emp.empno) FROM emp WHERE empno = (SELECT max(empno) AS maxDept FROM emp e2 WHERE e2.deptno = dept.deptno) AND emp.deptno = dept.deptno), dept.dname FROM dept; +--------+--------+------------+ | DEPTNO | EXPR$1 | DNAME | +--------+--------+------------+ | 10 | 7934 | ACCOUNTING | | 20 | 7902 | RESEARCH | | 30 | 7900 | SALES | | 40 | | OPERATIONS | +--------+--------+------------+ (4 rows) !ok {code} It seems the fix could simply be changing the following line in {{SubQueryRemoveRule#matchFilter}}: {code:java} if (!filterVariablesSet.isEmpty()) { variablesSet.retainAll(filterVariablesSet); } ==> variablesSet.retainAll(filterVariablesSet); {code} However, that could break downstream projects which may be creating Filter without correctly specifying their set of variables (see more info [here|https://github.com/apache/calcite/pull/3277#discussion_r1240592799] and [here|https://github.com/apache/calcite/pull/3193#discussion_r1204993079]); so for the moment (given the limited time before 1.35RC) we decided to NOT apply it. This needs further discussion. was (Author: rubenql): After some discussion between [~libenchao] and myself, it was decided that, due to potential regression risks, CALCITE-5683 will NOT include a fix for the current issue (they are two separate, related problems), so I'm re-opening the current ticket. The current issue could be reproduced by adding the following tests in sub-query.iq: {code:sql} select deptno from dept d1 where exists ( select 1 from dept d2 where d2.deptno = d1.deptno and exists ( select 1 from dept d3 where d3.dname = d1.dname)); +--------+ | DEPTNO | +--------+ | 10 | | 20 | | 30 | | 40 | +--------+ (4 rows) !ok SELECT dept.deptno, ( SELECT max(emp.empno) FROM emp WHERE empno = (SELECT max(empno) AS maxDept FROM emp e2 WHERE e2.deptno = dept.deptno) AND emp.deptno = dept.deptno), dept.dname FROM dept; +--------+--------+------------+ | DEPTNO | EXPR$1 | DNAME | +--------+--------+------------+ | 10 | 7934 | ACCOUNTING | | 20 | 7902 | RESEARCH | | 30 | 7900 | SALES | | 40 | | OPERATIONS | +--------+--------+------------+ (4 rows) !ok {code} The fix could simply be changing the following line in {{SubQueryRemoveRule#matchFilter}}: {code:java} if (!filterVariablesSet.isEmpty()) { variablesSet.retainAll(filterVariablesSet); } ==> variablesSet.retainAll(filterVariablesSet); {code} However, that could break downstream projects which may be creating Filter without correctly specifying their set of variables (see more info [here|https://github.com/apache/calcite/pull/3277#discussion_r1240592799] and [here|https://github.com/apache/calcite/pull/3193#discussion_r1204993079]); so for the moment (given the limited time before 1.35RC) we decided to NOT apply it. This needs further discussion. > Two level nested correlated subquery translates to incorrect ON condition > ------------------------------------------------------------------------- > > Key: CALCITE-5716 > URL: https://issues.apache.org/jira/browse/CALCITE-5716 > Project: Calcite > Issue Type: Improvement > Reporter: John Wright > Priority: Major > > I have a query that goes through the Calcite Core decorrelation rules and > comes out with an ON condition that isn't correct and returns incorrect > results to a different engine. > > {code:java} > SELECT summary.id, > (SELECT Max(detail.id) > FROM detail > WHERE date = (SELECT Max(date) AS maxDate > FROM detail detail2 > WHERE detail2.summary_id = summary.id) > AND detail.summary_id = summary.id) AS Detail_With_MaxDate, > summary.data > FROM summary {code} > translates to: > {code:java} > select summary.ID AS id, t3.SINGLE_DETAIL AS Detail_With_MaxDate, > summary.data AS data > from summary > left join (select t2.summary_id, MAX(t2.ID) AS SINGLE_DETAIL > FROM (select t.summary_id, t.ID > FROM (select * > FROM detail > WHERE summary_id IS NOT NULL) AS t > INNER JOIN (select summary_id, MAX(date) AS maxDate > FROM detail > WHERE summary_id IS NOT NULL > GROUP BY summary_id) AS t1 > ON t.ID = t1.summary_id > WHERE t.date = t1.maxDate) AS t2 > INNER JOIN summary AS summary0 ON t2.summary_id = summary0.ID > GROUP BY t2.summary_id) AS t3 > ON summary.ID = t3.summary_id {code} > where the condition `ON t.ID = t1.summary_id` doesn't make any sense. > [http://sqlfiddle.com/#!9/5758a6/3] > includes testdata and the incorrect results (null for the one column). > When I manually decorrelate the query I can get the correct answer: > [http://sqlfiddle.com/#!9/5758a6/6] > {code:java} > SELECT summary.id, > extraction.single_detail AS Detail_With_MaxDate, > summary.data > FROM summary > JOIN (SELECT detail.summary_id, > Max(detail.id) AS SINGLE_DETAIL > FROM detail > JOIN (SELECT summary_id, > Max(date) AS maxDate > FROM detail > GROUP BY summary_id) maxDates > ON detail.summary_id = maxDates.summary_id > WHERE detail.date = maxDates.maxdate > GROUP BY detail.summary_id) extraction > ON summary.id = extraction.summary_id {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)