[ 
https://issues.apache.org/jira/browse/CALCITE-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-7482:
------------------------------------
    Labels: pull-request-available  (was: )

> Wrong variablesSet used when rewriting subquery in JOIN ON clause
> -----------------------------------------------------------------
>
>                 Key: CALCITE-7482
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7482
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Silun Dong
>            Assignee: Silun Dong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> {code:java}
> // query
> WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as 
> (VALUES (2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4, 
> 40)), t4(id, sal) as (VALUES (2, 200), (5, 50))
> select * from 
> t1, 
> lateral(
>   select t2.id, t2.sal from t2 join t3 on exists(
>     select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
>   )
> ); 
> // initial plan
> LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
>   LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{0}])
>     LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
>     LogicalProject(ID=[$0], SAL=[$1])
>       LogicalJoin(condition=[EXISTS({
> LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
>   LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
> })], joinType=[inner], variablesSet=[[$cor1]])          <- variablesSet of 
> Join
>         LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
>         LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
> // after removing subquery
> LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
>   LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{0}])
>     LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
>     LogicalProject(ID=[$0], SAL=[$1])
>       LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
>         LogicalJoin(condition=[true], joinType=[inner])
>           LogicalJoin(condition=[true], joinType=[inner], 
> variablesSet=[[$cor0, $cor1]])         <- here
>             LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
>             LogicalAggregate(group=[{0}])
>               LogicalProject(i=[true])
>                 LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, 
> $cor1.SAL))])
>                   LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
>           LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
> // final output
> +----+-----+-----+------+
> | ID | SAL | ID0 | SAL0 |
> +----+-----+-----+------+
> |  2 |  20 |   2 |   20 |
> |  2 |  20 |   2 |   20 |
> |  2 |  20 |   2 |  200 |
> |  2 |  20 |   2 |  200 |
> |  2 |  20 |   3 |   30 |
> |  2 |  20 |   3 |   30 |
> +----+-----+-----+------+
> // correct output (verified in PG)
> +----+-----+-----+------+
> | ID | SAL | ID0 | SAL0 |
> +----+-----+-----+------+
> |  2 |  20 |   2 |  200 |
> |  2 |  20 |   2 |  200 |
> +----+-----+-----+------+{code}
> The subquery should have been rewritten as a {{Correlate}} with 
> {{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its 
> {{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.
> The root cause is that when rewriting a subquery within the ON clause, the 
> program detected the {{CorrelationId}} used in the subquery but did not 
> filter out the {{CorrelationId}} that did not belong only to the current 
> scope.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to