[ 
https://issues.apache.org/jira/browse/CALCITE-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Silun Dong updated CALCITE-7482:
--------------------------------
    Description: 
{code:java}
// query
WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as (VALUES 
(2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4, 40)), t4(id, 
sal) as (VALUES (2, 200), (5, 50))
select * from 
t1, 
lateral(
  select t2.id, t2.sal from t2 join t3 on exists(
    select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
  )
); 

// initial plan
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
  LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
    LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
    LogicalProject(ID=[$0], SAL=[$1])
      LogicalJoin(condition=[EXISTS({
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
  LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
})], joinType=[inner], variablesSet=[[$cor1]])          <- variablesSet of Join
        LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
        LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])

// after removing subquery
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
  LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
    LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
    LogicalProject(ID=[$0], SAL=[$1])
      LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor0, 
$cor1]])         <- here
            LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
            LogicalAggregate(group=[{0}])
              LogicalProject(i=[true])
                LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, 
$cor1.SAL))])
                  LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
          LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])

// final output
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
|  2 |  20 |   2 |   20 |
|  2 |  20 |   2 |   20 |
|  2 |  20 |   2 |  200 |
|  2 |  20 |   2 |  200 |
|  2 |  20 |   3 |   30 |
|  2 |  20 |   3 |   30 |
+----+-----+-----+------+

// correct output (verified in PG)
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
|  2 |  20 |   2 |  200 |
|  2 |  20 |   2 |  200 |
+----+-----+-----+------+{code}
The subquery should have been rewritten as a {{Correlate}} with 
{{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its 
{{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.

The root cause is that when rewriting a subquery within the ON clause, the 
program detected the {{CorrelationId}} used in the subquery but did not filter 
out the {{CorrelationId}} that did not belong only to the current scope.

 

  was:
 
{code:java}
// query
WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as (VALUES 
(2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4, 40)), t4(id, 
sal) as (VALUES (2, 200), (5, 50))
select * from 
t1, 
lateral(
  select t2.id, t2.sal from t2 join t3 on exists(
    select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
  )
); 

// initial plan
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
  LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
    LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
    LogicalProject(ID=[$0], SAL=[$1])
      LogicalJoin(condition=[EXISTS({
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
  LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
})], joinType=[inner], variablesSet=[[$cor1]])          <- variablesSet of Join
        LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
        LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])

// after removing subquery
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
  LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
    LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
    LogicalProject(ID=[$0], SAL=[$1])
      LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor0, 
$cor1]])         <- here
            LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
            LogicalAggregate(group=[{0}])
              LogicalProject(i=[true])
                LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, 
$cor1.SAL))])
                  LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
          LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])

// final output
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
|  2 |  20 |   2 |   20 |
|  2 |  20 |   2 |   20 |
|  2 |  20 |   2 |  200 |
|  2 |  20 |   2 |  200 |
|  2 |  20 |   3 |   30 |
|  2 |  20 |   3 |   30 |
+----+-----+-----+------+

// correct output (verified in PG)
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
|  2 |  20 |   2 |  200 |
|  2 |  20 |   2 |  200 |
+----+-----+-----+------+{code}
The subquery should have been rewritten as a {{Correlate}} with 
{{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its 
{{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.

The root cause is that when rewriting a subquery within the ON clause, the 
program detected the {{CorrelationId}} used in the subquery but did not filter 
out the {{CorrelationId}} that did not belong only to the current scope.

 


> Wrong variablesSet used when rewriting subquery in JOIN ON clause
> -----------------------------------------------------------------
>
>                 Key: CALCITE-7482
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7482
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Silun Dong
>            Assignee: Silun Dong
>            Priority: Major
>             Fix For: 1.42.0
>
>
> {code:java}
> // query
> WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as 
> (VALUES (2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4, 
> 40)), t4(id, sal) as (VALUES (2, 200), (5, 50))
> select * from 
> t1, 
> lateral(
>   select t2.id, t2.sal from t2 join t3 on exists(
>     select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
>   )
> ); 
> // initial plan
> LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
>   LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{0}])
>     LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
>     LogicalProject(ID=[$0], SAL=[$1])
>       LogicalJoin(condition=[EXISTS({
> LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
>   LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
> })], joinType=[inner], variablesSet=[[$cor1]])          <- variablesSet of 
> Join
>         LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
>         LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
> // after removing subquery
> LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
>   LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{0}])
>     LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
>     LogicalProject(ID=[$0], SAL=[$1])
>       LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
>         LogicalJoin(condition=[true], joinType=[inner])
>           LogicalJoin(condition=[true], joinType=[inner], 
> variablesSet=[[$cor0, $cor1]])         <- here
>             LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
>             LogicalAggregate(group=[{0}])
>               LogicalProject(i=[true])
>                 LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, 
> $cor1.SAL))])
>                   LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
>           LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
> // final output
> +----+-----+-----+------+
> | ID | SAL | ID0 | SAL0 |
> +----+-----+-----+------+
> |  2 |  20 |   2 |   20 |
> |  2 |  20 |   2 |   20 |
> |  2 |  20 |   2 |  200 |
> |  2 |  20 |   2 |  200 |
> |  2 |  20 |   3 |   30 |
> |  2 |  20 |   3 |   30 |
> +----+-----+-----+------+
> // correct output (verified in PG)
> +----+-----+-----+------+
> | ID | SAL | ID0 | SAL0 |
> +----+-----+-----+------+
> |  2 |  20 |   2 |  200 |
> |  2 |  20 |   2 |  200 |
> +----+-----+-----+------+{code}
> The subquery should have been rewritten as a {{Correlate}} with 
> {{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its 
> {{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.
> The root cause is that when rewriting a subquery within the ON clause, the 
> program detected the {{CorrelationId}} used in the subquery but did not 
> filter out the {{CorrelationId}} that did not belong only to the current 
> scope.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to