Ian Bertolacci created CALCITE-6504:
---------------------------------------
Summary: JOIN_SUB_QUERY_TO_CORRELATE/Join SubQueryRemoveRule
produces incorrect tree when using correlated sub-query in on clause of
equi-join
Key: CALCITE-6504
URL: https://issues.apache.org/jira/browse/CALCITE-6504
Project: Calcite
Issue Type: Bug
Reporter: Ian Bertolacci
JOIN_SUB_QUERY_TO_CORRELATE/Join SubQueryRemoveRule produces incorrect trees
when using a correlated sub-query as part of an equi-join condition of a join.
For example:
{code:sql}
select * from T3 join T2 on (select max(id) from T1 where T3.C301 = T1.id) =
T2.C201
{code}
Has the initial tree:
{code:none}
20:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
C203=[$6], C204=[$7])
└── 19:Join(condition=[=(SCALAR_SUBQUERY{
│ 18:Aggregate(group=[{}], EXPR$0=[MAX($0)])
│ └── 17:Project(ID=[$0])
│ └── 16:Filter(condition=[=($cor0.C301, $0)])
│ └── 15:TableScan(table=[T1], Schema=[ID:Dimension,
C101:Decimal(0)])
│}, $4)], joinType=[inner])
├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension,
C302:Dimension])
└── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension,
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
{code}
Using only the JOIN_SUB_QUERY_TO_CORRELATE program as a hep program, this is
the resulting tree:
{code}
25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
C203=[$6], C204=[$7])
└── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
C203=[$6], C204=[$7])
└── 39:Join(condition=[=($8, $4)], joinType=[inner])
├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension,
C302:Dimension])
└── 37:Correlate(correlation=[$cor0], joinType=[left],
requiredColumns=[{1}])
├── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension,
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
└── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
└── 33:Project(ID=[$0])
└── 31:Filter(condition=[=($cor0.C301, $0)])
└── 15:TableScan(table=[T1], Schema=[ID:Dimension,
C101:Decimal(0)])
{code}
Notice that the original correlation expression is between T1 and T3, but the
rule as created a correlate between T1 and T2.
I would have expected this tree:
{code:none}
25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
C203=[$6], C204=[$7])
└── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5],
C203=[$6], C204=[$7])
└── 39:Join(condition=[=($3, $4)], joinType=[inner])
├── 37:Correlate(correlation=[$cor0], joinType=[left],
requiredColumns=[{1}])
│ ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension,
C302:Dimension])
│ └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
│ └── 33:Project(ID=[$0])
│ └── 31:Filter(condition=[=($cor0.C301, $0)])
│ └── 15:TableScan(table=[T1], Schema=[ID:Dimension,
C101:Decimal(0)])
└── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension,
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
{code}
If you swap the sides of the join (making {{`T2 join T3`}}), you do get the
correct association, but the correlate has an invalid `requiredColumn`:
{code:none}
25:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5],
C301=[$6], C302=[$7])
└── 41:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5],
C301=[$6], C302=[$7])
└── 39:Join(condition=[=($8, $1)], joinType=[inner])
├── 13:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension,
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
└── 37:Correlate(correlation=[$cor0], joinType=[left],
requiredColumns=[{6}])
├── 14:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension,
C302:Dimension])
└── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
└── 33:Project(ID=[$0])
└── 31:Filter(condition=[=($cor0.C301, $0)])
└── 15:TableScan(table=[T1], Schema=[ID:Dimension,
C101:Decimal(0)])
{code}
Here `requiredColumn` should be 1 (pointing to `C301`) but is actually 6, which
is where `C301` would be after the join {{`T2 join T3`}}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)