[ 
https://issues.apache.org/jira/browse/CALCITE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Bertolacci updated CALCITE-6504:
------------------------------------
    Description: 
JOIN_SUB_QUERY_TO_CORRELATE/Join SubQueryRemoveRule produces incorrect trees 
when using a correlated sub-query as part of an equi-join condition of a join.

For example:
{code:sql}
select * from T3 join T2 on (select max(id) from T1 where T3.C301 = T1.id) = 
T2.C201
{code}
Has the initial tree:
{code:none}
20:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
└── 19:Join(condition=[=(SCALAR_SUBQUERY{
    │    18:Aggregate(group=[{}], EXPR$0=[MAX($0)])
    │    └── 17:Project(ID=[$0])
    │        └── 16:Filter(condition=[=($cor0.C301, $0)])
    │            └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
C101:Decimal(0)])
    │}, $4)], joinType=[inner])
    ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
C302:Dimension])
    └── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
{code}
Using only the JOIN_SUB_QUERY_TO_CORRELATE rules in a hep program in planning, 
this is the resulting tree:
{code:java}
25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
└── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
    └── 39:Join(condition=[=($8, $4)], joinType=[inner])
        ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
C302:Dimension])
        └── 37:Correlate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{1}])
            ├── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
            └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
                └── 33:Project(ID=[$0])
                    └── 31:Filter(condition=[=($cor0.C301, $0)])
                        └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
C101:Decimal(0)])
{code}
Notice that the original correlation expression is between T1 and T3, but the 
rule has created a correlate between T1 and T2.

I would have expected this tree:
{code:none}
25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
└── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
    └── 39:Join(condition=[=($3, $4)], joinType=[inner])
        ├── 37:Correlate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{1}])
        │   ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
C302:Dimension])
        │   └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
        │       └── 33:Project(ID=[$0])
        │           └── 31:Filter(condition=[=($cor0.C301, $0)])
        │               └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
C101:Decimal(0)])
        └── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
{code}
 

Additionally if you swap the sides of the join (making {{{}`T2 join T3`{}}}), 
you do get the correct association, but the correlate has an invalid 
`requiredColumn`:
{code:none}
25:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5], 
C301=[$6], C302=[$7])
└── 41:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5], 
C301=[$6], C302=[$7])
    └── 39:Join(condition=[=($8, $1)], joinType=[inner])
        ├── 13:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
        └── 37:Correlate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{6}])
            ├── 14:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
C302:Dimension])
            └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
                └── 33:Project(ID=[$0])
                    └── 31:Filter(condition=[=($cor0.C301, $0)])
                        └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
C101:Decimal(0)])
{code}
Here `requiredColumn` should be 1 (pointing to `C301`) but is actually 6, which 
is where `C301` would be after the join {{`T2 join T3`}}

  was:
JOIN_SUB_QUERY_TO_CORRELATE/Join SubQueryRemoveRule produces incorrect trees 
when using a correlated sub-query as part of an equi-join condition of a join.

For example:
{code:sql}
select * from T3 join T2 on (select max(id) from T1 where T3.C301 = T1.id) = 
T2.C201
{code}
Has the initial tree:
{code:none}
20:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
└── 19:Join(condition=[=(SCALAR_SUBQUERY{
    │    18:Aggregate(group=[{}], EXPR$0=[MAX($0)])
    │    └── 17:Project(ID=[$0])
    │        └── 16:Filter(condition=[=($cor0.C301, $0)])
    │            └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
C101:Decimal(0)])
    │}, $4)], joinType=[inner])
    ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
C302:Dimension])
    └── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
{code}
Using only the JOIN_SUB_QUERY_TO_CORRELATE rules in a hep program in planning, 
this is the resulting tree:
{code}
25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
└── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
    └── 39:Join(condition=[=($8, $4)], joinType=[inner])
        ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
C302:Dimension])
        └── 37:Correlate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{1}])
            ├── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
            └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
                └── 33:Project(ID=[$0])
                    └── 31:Filter(condition=[=($cor0.C301, $0)])
                        └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
C101:Decimal(0)])
{code}

Notice that the original correlation expression is between T1 and T3, but the 
rule as created a correlate between T1 and T2.

I would have expected this tree:
{code:none}
25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
└── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
C203=[$6], C204=[$7])
    └── 39:Join(condition=[=($3, $4)], joinType=[inner])
        ├── 37:Correlate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{1}])
        │   ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
C302:Dimension])
        │   └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
        │       └── 33:Project(ID=[$0])
        │           └── 31:Filter(condition=[=($cor0.C301, $0)])
        │               └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
C101:Decimal(0)])
        └── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
{code}

If you swap the sides of the join (making {{`T2 join T3`}}), you do get the 
correct association, but the correlate has an invalid `requiredColumn`:
{code:none}
25:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5], 
C301=[$6], C302=[$7])
└── 41:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5], 
C301=[$6], C302=[$7])
    └── 39:Join(condition=[=($8, $1)], joinType=[inner])
        ├── 13:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
        └── 37:Correlate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{6}])
            ├── 14:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
C302:Dimension])
            └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
                └── 33:Project(ID=[$0])
                    └── 31:Filter(condition=[=($cor0.C301, $0)])
                        └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
C101:Decimal(0)])
{code}
Here `requiredColumn` should be 1 (pointing to `C301`) but is actually 6, which 
is where `C301` would be after the join {{`T2 join T3`}}


> JOIN_SUB_QUERY_TO_CORRELATE/Join SubQueryRemoveRule produces incorrect tree 
> when using correlated sub-query in on clause of equi-join
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6504
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6504
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Ian Bertolacci
>            Priority: Major
>
> JOIN_SUB_QUERY_TO_CORRELATE/Join SubQueryRemoveRule produces incorrect trees 
> when using a correlated sub-query as part of an equi-join condition of a join.
> For example:
> {code:sql}
> select * from T3 join T2 on (select max(id) from T1 where T3.C301 = T1.id) = 
> T2.C201
> {code}
> Has the initial tree:
> {code:none}
> 20:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
> C203=[$6], C204=[$7])
> └── 19:Join(condition=[=(SCALAR_SUBQUERY{
>     │    18:Aggregate(group=[{}], EXPR$0=[MAX($0)])
>     │    └── 17:Project(ID=[$0])
>     │        └── 16:Filter(condition=[=($cor0.C301, $0)])
>     │            └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
> C101:Decimal(0)])
>     │}, $4)], joinType=[inner])
>     ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
> C302:Dimension])
>     └── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
> C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
> {code}
> Using only the JOIN_SUB_QUERY_TO_CORRELATE rules in a hep program in 
> planning, this is the resulting tree:
> {code:java}
> 25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
> C203=[$6], C204=[$7])
> └── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
> C203=[$6], C204=[$7])
>     └── 39:Join(condition=[=($8, $4)], joinType=[inner])
>         ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, C301:Dimension, 
> C302:Dimension])
>         └── 37:Correlate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{1}])
>             ├── 14:TableScan(table=[T2], Schema=[ID:Dimension, 
> C201:Dimension, C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
>             └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
>                 └── 33:Project(ID=[$0])
>                     └── 31:Filter(condition=[=($cor0.C301, $0)])
>                         └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
> C101:Decimal(0)])
> {code}
> Notice that the original correlation expression is between T1 and T3, but the 
> rule has created a correlate between T1 and T2.
> I would have expected this tree:
> {code:none}
> 25:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
> C203=[$6], C204=[$7])
> └── 41:Project(ID=[$0], C301=[$1], C302=[$2], ID0=[$3], C201=[$4], C202=[$5], 
> C203=[$6], C204=[$7])
>     └── 39:Join(condition=[=($3, $4)], joinType=[inner])
>         ├── 37:Correlate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{1}])
>         │   ├── 13:TableScan(table=[T3], Schema=[ID:Dimension, 
> C301:Dimension, C302:Dimension])
>         │   └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
>         │       └── 33:Project(ID=[$0])
>         │           └── 31:Filter(condition=[=($cor0.C301, $0)])
>         │               └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
> C101:Decimal(0)])
>         └── 14:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
> C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
> {code}
>  
> Additionally if you swap the sides of the join (making {{{}`T2 join T3`{}}}), 
> you do get the correct association, but the correlate has an invalid 
> `requiredColumn`:
> {code:none}
> 25:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5], 
> C301=[$6], C302=[$7])
> └── 41:Project(ID=[$0], C201=[$1], C202=[$2], C203=[$3], C204=[$4], ID0=[$5], 
> C301=[$6], C302=[$7])
>     └── 39:Join(condition=[=($8, $1)], joinType=[inner])
>         ├── 13:TableScan(table=[T2], Schema=[ID:Dimension, C201:Dimension, 
> C202:Decimal(0), C203:Decimal(0), C204:Decimal(2)])
>         └── 37:Correlate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{6}])
>             ├── 14:TableScan(table=[T3], Schema=[ID:Dimension, 
> C301:Dimension, C302:Dimension])
>             └── 35:Aggregate(group=[{}], EXPR$0=[MAX($0)])
>                 └── 33:Project(ID=[$0])
>                     └── 31:Filter(condition=[=($cor0.C301, $0)])
>                         └── 15:TableScan(table=[T1], Schema=[ID:Dimension, 
> C101:Decimal(0)])
> {code}
> Here `requiredColumn` should be 1 (pointing to `C301`) but is actually 6, 
> which is where `C301` would be after the join {{`T2 join T3`}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to