Silun Dong created CALCITE-7482:
-----------------------------------
Summary: Wrong variablesSet used when rewriting subquery in JOIN
ON clause
Key: CALCITE-7482
URL: https://issues.apache.org/jira/browse/CALCITE-7482
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.41.0
Reporter: Silun Dong
Assignee: Silun Dong
Fix For: 1.42.0
{code:java}
// query
WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as (VALUES
(2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4, 40)), t4(id,
sal) as (VALUES (2, 200), (5, 50))
select * from
t1,
lateral(
select t2.id, t2.sal from t2 join t3 on exists(
select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
)
);
// initial plan
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
LogicalProject(ID=[$0], SAL=[$1])
LogicalJoin(condition=[EXISTS({
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
})], joinType=[inner], variablesSet=[[$cor1]]) <- variablesSet of Join
LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
// after removing subquery
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
LogicalProject(ID=[$0], SAL=[$1])
LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
LogicalJoin(condition=[true], joinType=[inner])
LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor0,
$cor1]]) <- here
LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
LogicalAggregate(group=[{0}])
LogicalProject(i=[true])
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1,
$cor1.SAL))])
LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])
// final output
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
| 2 | 20 | 2 | 20 |
| 2 | 20 | 2 | 20 |
| 2 | 20 | 2 | 200 |
| 2 | 20 | 2 | 200 |
| 2 | 20 | 3 | 30 |
| 2 | 20 | 3 | 30 |
+----+-----+-----+------+
// correct output (verified in PG)
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
| 2 | 20 | 2 | 200 |
| 2 | 20 | 2 | 200 |
+----+-----+-----+------+{code}
The subquery should have been rewritten as a {{Correlate}} with
{{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its
{{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.
The root cause is that when rewriting a subquery within the ON clause, the
program detected the {{CorrelationId}} used in the subquery but did not filter
out the {{CorrelationId}} that did not belong only to the current scope.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)