Matthew White created CALCITE-7319:
--------------------------------------

             Summary: FILTER_INTO_JOIN Calcite HepPlanner rules loose variable 
context for subqueries
                 Key: CALCITE-7319
                 URL: https://issues.apache.org/jira/browse/CALCITE-7319
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.41.0
         Environment: Java
            Reporter: Matthew White


FILTER_INTO_JOIN Calcite HepPlanner rules loose variable context for subqueries

For queries that involve a subplan and therefore correlation variables, 
applying some HepPlanner rules means
that the correlation variables. (The variable set in the nodes) is lost.

As an example start with this SQL query with a subquery.

```sql
final String sql = """
        SELECT
            P.P_PARTKEY
        FROM
            PART P
            JOIN PARTSUPP PS ON P.P_PARTKEY = PS.PS_PARTKEY
        WHERE
                PS.PS_SUPPLYCOST = (
                SELECT
                    MIN(PS_INNER.PS_SUPPLYCOST)
                FROM
                    PARTSUPP PS_INNER
                WHERE
                    PS_INNER.PS_PARTKEY = P.P_PARTKEY
            )
        """;
```

When this is parsed and converted to Calcite RelNodes, there is a reference to 
`$cor0.P_PARTKEY` and the `LogicalFilter` has a varaibleSet present. The 
'explain()' format of the nodes gives:

```
    LogicalProject(P_PARTKEY=[$0])
      LogicalFilter(condition=[=($12, $SCALAR_QUERY({
    LogicalAggregate(group=[{}], EXPR$0=[MIN($0)])
      LogicalProject(PS_SUPPLYCOST=[$3])
        LogicalFilter(condition=[=($0, $cor0.P_PARTKEY)])
          LogicalTableScan(table=[[TPCH, PARTSUPP]])
    }))], variablesSet=[[$cor0]])
        LogicalJoin(condition=[=($0, $9)], joinType=[inner])
          LogicalTableScan(table=[[TPCH, PART]])
          LogicalTableScan(table=[[TPCH, PARTSUPP]])
```

Applying the `FILTER_INTO_JOIN` rule. (we believe that other rules can also 
have the same effect) result in an optimized query, with the calcite nodes 
being as follows

```
    LogicalProject(P_PARTKEY=[$0])
      LogicalJoin(condition=[=($0, $9)], joinType=[inner])
        LogicalTableScan(table=[[TPCH, PART]])
        LogicalFilter(condition=[=($3, $SCALAR_QUERY({
    LogicalAggregate(group=[{}], EXPR$0=[MIN($0)])
      LogicalProject(PS_SUPPLYCOST=[$3])
        LogicalFilter(condition=[=($0, $cor0.P_PARTKEY)])
          LogicalTableScan(table=[[TPCH, PARTSUPP]])
    }))])
          LogicalTableScan(table=[[TPCH, PARTSUPP]])

```

Note that the `LogicalFilter` still has the condition `[=($0, 
$cor0.P_PARTKEY)]` - but no variableset present on any node.

Attempting to convert this back into SQL via Calcite gives

```
java.lang.NullPointerException: variable $cor0 is not found
```

The complete code is available in this [github 
gist](https://gist.github.com/mbwhite/4899dfd959839a6eedd6ee1542c9fb08)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to