[jira] [Comment Edited] (CALCITE-5716) Two level nested correlated subquery translates to incorrect ON condition

Ruben Q L (Jira) Mon, 26 Jun 2023 09:16:05 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737204#comment-17737204
 ]


Ruben Q L edited comment on CALCITE-5716 at 6/26/23 4:15 PM:
-------------------------------------------------------------

After some discussion between [~libenchao] and myself, it was decided that, due 
to potential regression risks, CALCITE-5683 will NOT include a fix for the 
current issue (they are two separate, related problems), so I'm re-opening the 
current ticket.

The current issue could be reproduced by adding the following tests in 
sub-query.iq:
{code:sql}
select deptno from dept d1 where exists (
 select 1 from dept d2 where d2.deptno = d1.deptno and exists (
  select 1 from dept d3 where d3.dname = d1.dname));
+--------+
| DEPTNO |
+--------+
|     10 |
|     20 |
|     30 |
|     40 |
+--------+
(4 rows)

!ok

SELECT dept.deptno, (
  SELECT max(emp.empno)
  FROM emp
  WHERE empno = (SELECT max(empno) AS maxDept
                 FROM emp e2
                 WHERE e2.deptno = dept.deptno)
        AND emp.deptno = dept.deptno),
  dept.dname
FROM dept;
+--------+--------+------------+
| DEPTNO | EXPR$1 | DNAME      |
+--------+--------+------------+
|     10 |   7934 | ACCOUNTING |
|     20 |   7902 | RESEARCH   |
|     30 |   7900 | SALES      |
|     40 |        | OPERATIONS |
+--------+--------+------------+
(4 rows)

!ok
{code}

It seems the fix could simply be changing the following line in 
{{SubQueryRemoveRule#matchFilter}}:
{code:java}
if (!filterVariablesSet.isEmpty()) {
  variablesSet.retainAll(filterVariablesSet);
}
==>
variablesSet.retainAll(filterVariablesSet);
{code}

However, that could break downstream projects which may be creating Filter 
without correctly specifying their set of variables (see more info 
[here|https://github.com/apache/calcite/pull/3277#discussion_r1240592799] and 
[here|https://github.com/apache/calcite/pull/3193#discussion_r1204993079]); so 
for the moment (given the limited time before 1.35RC) we decided to NOT apply 
it. This needs further discussion.


was (Author: rubenql):
After some discussion between [~libenchao] and myself, it was decided that, due 
to potential regression risks, CALCITE-5683 will NOT include a fix for the 
current issue (they are two separate, related problems), so I'm re-opening the 
current ticket.

The current issue could be reproduced by adding the following tests in 
sub-query.iq:
{code:sql}
select deptno from dept d1 where exists (
 select 1 from dept d2 where d2.deptno = d1.deptno and exists (
  select 1 from dept d3 where d3.dname = d1.dname));
+--------+
| DEPTNO |
+--------+
|     10 |
|     20 |
|     30 |
|     40 |
+--------+
(4 rows)

!ok

SELECT dept.deptno, (
  SELECT max(emp.empno)
  FROM emp
  WHERE empno = (SELECT max(empno) AS maxDept
                 FROM emp e2
                 WHERE e2.deptno = dept.deptno)
        AND emp.deptno = dept.deptno),
  dept.dname
FROM dept;
+--------+--------+------------+
| DEPTNO | EXPR$1 | DNAME      |
+--------+--------+------------+
|     10 |   7934 | ACCOUNTING |
|     20 |   7902 | RESEARCH   |
|     30 |   7900 | SALES      |
|     40 |        | OPERATIONS |
+--------+--------+------------+
(4 rows)

!ok
{code}

The fix could simply be changing the following line in 
{{SubQueryRemoveRule#matchFilter}}:
{code:java}
if (!filterVariablesSet.isEmpty()) {
  variablesSet.retainAll(filterVariablesSet);
}
==>
variablesSet.retainAll(filterVariablesSet);
{code}

However, that could break downstream projects which may be creating Filter 
without correctly specifying their set of variables (see more info 
[here|https://github.com/apache/calcite/pull/3277#discussion_r1240592799] and 
[here|https://github.com/apache/calcite/pull/3193#discussion_r1204993079]); so 
for the moment (given the limited time before 1.35RC) we decided to NOT apply 
it. This needs further discussion.

> Two level nested correlated subquery translates to incorrect ON condition
> -------------------------------------------------------------------------
>
>                 Key: CALCITE-5716
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5716
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: John Wright
>            Priority: Major
>
> I have a query that goes through the Calcite Core decorrelation rules and 
> comes out with an ON condition that isn't correct and returns incorrect 
> results to a different engine.
>  
> {code:java}
>  SELECT summary.id,
>        (SELECT Max(detail.id)
>         FROM   detail
>         WHERE  date = (SELECT Max(date) AS maxDate
>                        FROM   detail detail2
>                        WHERE  detail2.summary_id = summary.id)
>                AND detail.summary_id = summary.id) AS Detail_With_MaxDate,
>        summary.data
> FROM   summary  {code}
> translates to:
> {code:java}
> select summary.ID AS id, t3.SINGLE_DETAIL AS Detail_With_MaxDate, 
> summary.data AS data
> from summary
> left join (select t2.summary_id, MAX(t2.ID) AS SINGLE_DETAIL
>     FROM (select t.summary_id, t.ID
>         FROM (select *
>             FROM detail
>             WHERE summary_id IS NOT NULL) AS t
>             INNER JOIN (select summary_id, MAX(date) AS maxDate
>                         FROM detail
>                         WHERE summary_id IS NOT NULL
>                         GROUP BY summary_id) AS t1 
>             ON t.ID = t1.summary_id
>             WHERE t.date = t1.maxDate) AS t2
>     INNER JOIN summary AS summary0 ON t2.summary_id = summary0.ID
>     GROUP BY t2.summary_id) AS t3 
> ON summary.ID = t3.summary_id {code}
> where the condition `ON t.ID = t1.summary_id` doesn't make any sense.
> [http://sqlfiddle.com/#!9/5758a6/3]
> includes testdata and the incorrect results (null for the one column).
> When I manually decorrelate the query I can get the correct answer:
> [http://sqlfiddle.com/#!9/5758a6/6]
> {code:java}
> SELECT summary.id,
>        extraction.single_detail AS Detail_With_MaxDate,
>        summary.data
> FROM   summary
>        JOIN (SELECT detail.summary_id,
>                     Max(detail.id) AS SINGLE_DETAIL
>              FROM   detail
>                     JOIN (SELECT summary_id,
>                                  Max(date) AS maxDate
>                           FROM   detail
>                           GROUP  BY summary_id) maxDates
>                       ON detail.summary_id = maxDates.summary_id
>              WHERE  detail.date = maxDates.maxdate
>              GROUP  BY detail.summary_id) extraction
>          ON summary.id = extraction.summary_id  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-5716) Two level nested correlated subquery translates to incorrect ON condition

Reply via email to