[ 
https://issues.apache.org/jira/browse/CALCITE-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044318#comment-18044318
 ] 

Zhen Chen commented on CALCITE-7319:
------------------------------------

I'm looking at this issue. I've roughly reviewed the logic of 
FilterCorrelateRule, and it matches my thinking. I'm just trying to figure out 
how to modify FilterIntoJoinRule more elegantly. If you've already found a 
solution, would you submit a PR directly?

> FILTER_INTO_JOIN rule loses correlation variable context in HepPlanner
> ----------------------------------------------------------------------
>
>                 Key: CALCITE-7319
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7319
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>         Environment: Java
>            Reporter: Matthew White
>            Assignee: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.42.0
>
>
> FILTER_INTO_JOIN Calcite HepPlanner rules loose variable context for 
> subqueries
> For queries that involve a subplan and therefore correlation variables, 
> applying some HepPlanner rules means
> that the correlation variables. (The variable set in the nodes) is lost.
> As an example start with this SQL query with a subquery.
> ```sql
> final String sql = """
>         SELECT
>             P.P_PARTKEY
>         FROM
>             PART P
>             JOIN PARTSUPP PS ON P.P_PARTKEY = PS.PS_PARTKEY
>         WHERE
>                 PS.PS_SUPPLYCOST = (
>                 SELECT
>                     MIN(PS_INNER.PS_SUPPLYCOST)
>                 FROM
>                     PARTSUPP PS_INNER
>                 WHERE
>                     PS_INNER.PS_PARTKEY = P.P_PARTKEY
>             )
>         """;
> ```
> When this is parsed and converted to Calcite RelNodes, there is a reference 
> to `$cor0.P_PARTKEY` and the `LogicalFilter` has a varaibleSet present. The 
> 'explain()' format of the nodes gives:
> ```
>     LogicalProject(P_PARTKEY=[$0])
>       LogicalFilter(condition=[=($12, $SCALAR_QUERY({
>     LogicalAggregate(group=[{}], EXPR$0=[MIN($0)])
>       LogicalProject(PS_SUPPLYCOST=[$3])
>         LogicalFilter(condition=[=($0, $cor0.P_PARTKEY)])
>           LogicalTableScan(table=[[TPCH, PARTSUPP]])
>     }))], variablesSet=[[$cor0]])
>         LogicalJoin(condition=[=($0, $9)], joinType=[inner])
>           LogicalTableScan(table=[[TPCH, PART]])
>           LogicalTableScan(table=[[TPCH, PARTSUPP]])
> ```
> Applying the `FILTER_INTO_JOIN` rule. (we believe that other rules can also 
> have the same effect) result in an optimized query, with the calcite nodes 
> being as follows
> ```
>     LogicalProject(P_PARTKEY=[$0])
>       LogicalJoin(condition=[=($0, $9)], joinType=[inner])
>         LogicalTableScan(table=[[TPCH, PART]])
>         LogicalFilter(condition=[=($3, $SCALAR_QUERY({
>     LogicalAggregate(group=[{}], EXPR$0=[MIN($0)])
>       LogicalProject(PS_SUPPLYCOST=[$3])
>         LogicalFilter(condition=[=($0, $cor0.P_PARTKEY)])
>           LogicalTableScan(table=[[TPCH, PARTSUPP]])
>     }))])
>           LogicalTableScan(table=[[TPCH, PARTSUPP]])
> ```
> Note that the `LogicalFilter` still has the condition `[=($0, 
> $cor0.P_PARTKEY)]` - but no variableset present on any node.
> Attempting to convert this back into SQL via Calcite gives
> ```
> java.lang.NullPointerException: variable $cor0 is not found
> ```
> The complete code is available in this [github 
> gist](https://gist.github.com/mbwhite/4899dfd959839a6eedd6ee1542c9fb08)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to