[
https://issues.apache.org/jira/browse/CALCITE-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-7482:
------------------------------------
Labels: pull-request-available (was: )
> Wrong variablesSet used when rewriting subquery in JOIN ON clause
> -----------------------------------------------------------------
>
> Key: CALCITE-7482
> URL: https://issues.apache.org/jira/browse/CALCITE-7482
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Silun Dong
> Assignee: Silun Dong
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> {code:java}
> // query
> WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as
> (VALUES (2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4,
> 40)), t4(id, sal) as (VALUES (2, 200), (5, 50))
> select * from
> t1,
> lateral(
> select t2.id, t2.sal from t2 join t3 on exists(
> select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
> )
> );
> // initial plan
> LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
> LogicalCorrelate(correlation=[$cor0], joinType=[inner],
> requiredColumns=[{0}])
> LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
> LogicalProject(ID=[$0], SAL=[$1])
> LogicalJoin(condition=[EXISTS({
> LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
> LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
> })], joinType=[inner], variablesSet=[[$cor1]]) <- variablesSet of
> Join
> LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
> LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
> // after removing subquery
> LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
> LogicalCorrelate(correlation=[$cor0], joinType=[inner],
> requiredColumns=[{0}])
> LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
> LogicalProject(ID=[$0], SAL=[$1])
> LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalJoin(condition=[true], joinType=[inner],
> variablesSet=[[$cor0, $cor1]]) <- here
> LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
> LogicalAggregate(group=[{0}])
> LogicalProject(i=[true])
> LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1,
> $cor1.SAL))])
> LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
> LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
> // final output
> +----+-----+-----+------+
> | ID | SAL | ID0 | SAL0 |
> +----+-----+-----+------+
> | 2 | 20 | 2 | 20 |
> | 2 | 20 | 2 | 20 |
> | 2 | 20 | 2 | 200 |
> | 2 | 20 | 2 | 200 |
> | 2 | 20 | 3 | 30 |
> | 2 | 20 | 3 | 30 |
> +----+-----+-----+------+
> // correct output (verified in PG)
> +----+-----+-----+------+
> | ID | SAL | ID0 | SAL0 |
> +----+-----+-----+------+
> | 2 | 20 | 2 | 200 |
> | 2 | 20 | 2 | 200 |
> +----+-----+-----+------+{code}
> The subquery should have been rewritten as a {{Correlate}} with
> {{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its
> {{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.
> The root cause is that when rewriting a subquery within the ON clause, the
> program detected the {{CorrelationId}} used in the subquery but did not
> filter out the {{CorrelationId}} that did not belong only to the current
> scope.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)