[
https://issues.apache.org/jira/browse/CALCITE-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Silun Dong updated CALCITE-7482:
--------------------------------
Description:
{code:java}
// query
WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as (VALUES
(2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4, 40)), t4(id,
sal) as (VALUES (2, 200), (5, 50))
select * from
t1,
lateral(
select t2.id, t2.sal from t2 join t3 on exists(
select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
)
);
// initial plan
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
LogicalProject(ID=[$0], SAL=[$1])
LogicalJoin(condition=[EXISTS({
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
})], joinType=[inner], variablesSet=[[$cor1]]) <- variablesSet of Join
LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
// after removing subquery
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
LogicalProject(ID=[$0], SAL=[$1])
LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
LogicalJoin(condition=[true], joinType=[inner])
LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor0,
$cor1]]) <- here
LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
LogicalAggregate(group=[{0}])
LogicalProject(i=[true])
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1,
$cor1.SAL))])
LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
// final output
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
| 2 | 20 | 2 | 20 |
| 2 | 20 | 2 | 20 |
| 2 | 20 | 2 | 200 |
| 2 | 20 | 2 | 200 |
| 2 | 20 | 3 | 30 |
| 2 | 20 | 3 | 30 |
+----+-----+-----+------+
// correct output (verified in PG)
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
| 2 | 20 | 2 | 200 |
| 2 | 20 | 2 | 200 |
+----+-----+-----+------+{code}
The subquery should have been rewritten as a {{Correlate}} with
{{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its
{{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.
The root cause is that when rewriting a subquery within the ON clause, the
program detected the {{CorrelationId}} used in the subquery but did not filter
out the {{CorrelationId}} that did not belong only to the current scope.
was:
{code:java}
// query
WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as (VALUES
(2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4, 40)), t4(id,
sal) as (VALUES (2, 200), (5, 50))
select * from
t1,
lateral(
select t2.id, t2.sal from t2 join t3 on exists(
select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
)
);
// initial plan
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
LogicalProject(ID=[$0], SAL=[$1])
LogicalJoin(condition=[EXISTS({
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
})], joinType=[inner], variablesSet=[[$cor1]]) <- variablesSet of Join
LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
// after removing subquery
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
LogicalProject(ID=[$0], SAL=[$1])
LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
LogicalJoin(condition=[true], joinType=[inner])
LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor0,
$cor1]]) <- here
LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
LogicalAggregate(group=[{0}])
LogicalProject(i=[true])
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1,
$cor1.SAL))])
LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
// final output
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
| 2 | 20 | 2 | 20 |
| 2 | 20 | 2 | 20 |
| 2 | 20 | 2 | 200 |
| 2 | 20 | 2 | 200 |
| 2 | 20 | 3 | 30 |
| 2 | 20 | 3 | 30 |
+----+-----+-----+------+
// correct output (verified in PG)
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
| 2 | 20 | 2 | 200 |
| 2 | 20 | 2 | 200 |
+----+-----+-----+------+{code}
The subquery should have been rewritten as a {{Correlate}} with
{{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its
{{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.
The root cause is that when rewriting a subquery within the ON clause, the
program detected the {{CorrelationId}} used in the subquery but did not filter
out the {{CorrelationId}} that did not belong only to the current scope.
> Wrong variablesSet used when rewriting subquery in JOIN ON clause
> -----------------------------------------------------------------
>
> Key: CALCITE-7482
> URL: https://issues.apache.org/jira/browse/CALCITE-7482
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Silun Dong
> Assignee: Silun Dong
> Priority: Major
> Fix For: 1.42.0
>
>
> {code:java}
> // query
> WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as
> (VALUES (2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4,
> 40)), t4(id, sal) as (VALUES (2, 200), (5, 50))
> select * from
> t1,
> lateral(
> select t2.id, t2.sal from t2 join t3 on exists(
> select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
> )
> );
> // initial plan
> LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
> LogicalCorrelate(correlation=[$cor0], joinType=[inner],
> requiredColumns=[{0}])
> LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
> LogicalProject(ID=[$0], SAL=[$1])
> LogicalJoin(condition=[EXISTS({
> LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
> LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
> })], joinType=[inner], variablesSet=[[$cor1]]) <- variablesSet of
> Join
> LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
> LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
> // after removing subquery
> LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
> LogicalCorrelate(correlation=[$cor0], joinType=[inner],
> requiredColumns=[{0}])
> LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
> LogicalProject(ID=[$0], SAL=[$1])
> LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalJoin(condition=[true], joinType=[inner],
> variablesSet=[[$cor0, $cor1]]) <- here
> LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
> LogicalAggregate(group=[{0}])
> LogicalProject(i=[true])
> LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1,
> $cor1.SAL))])
> LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
> LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
> // final output
> +----+-----+-----+------+
> | ID | SAL | ID0 | SAL0 |
> +----+-----+-----+------+
> | 2 | 20 | 2 | 20 |
> | 2 | 20 | 2 | 20 |
> | 2 | 20 | 2 | 200 |
> | 2 | 20 | 2 | 200 |
> | 2 | 20 | 3 | 30 |
> | 2 | 20 | 3 | 30 |
> +----+-----+-----+------+
> // correct output (verified in PG)
> +----+-----+-----+------+
> | ID | SAL | ID0 | SAL0 |
> +----+-----+-----+------+
> | 2 | 20 | 2 | 200 |
> | 2 | 20 | 2 | 200 |
> +----+-----+-----+------+{code}
> The subquery should have been rewritten as a {{Correlate}} with
> {{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its
> {{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.
> The root cause is that when rewriting a subquery within the ON clause, the
> program detected the {{CorrelationId}} used in the subquery but did not
> filter out the {{CorrelationId}} that did not belong only to the current
> scope.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)