Matthew White created CALCITE-7319:
--------------------------------------
Summary: FILTER_INTO_JOIN Calcite HepPlanner rules loose variable
context for subqueries
Key: CALCITE-7319
URL: https://issues.apache.org/jira/browse/CALCITE-7319
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.41.0
Environment: Java
Reporter: Matthew White
FILTER_INTO_JOIN Calcite HepPlanner rules loose variable context for subqueries
For queries that involve a subplan and therefore correlation variables,
applying some HepPlanner rules means
that the correlation variables. (The variable set in the nodes) is lost.
As an example start with this SQL query with a subquery.
```sql
final String sql = """
SELECT
P.P_PARTKEY
FROM
PART P
JOIN PARTSUPP PS ON P.P_PARTKEY = PS.PS_PARTKEY
WHERE
PS.PS_SUPPLYCOST = (
SELECT
MIN(PS_INNER.PS_SUPPLYCOST)
FROM
PARTSUPP PS_INNER
WHERE
PS_INNER.PS_PARTKEY = P.P_PARTKEY
)
""";
```
When this is parsed and converted to Calcite RelNodes, there is a reference to
`$cor0.P_PARTKEY` and the `LogicalFilter` has a varaibleSet present. The
'explain()' format of the nodes gives:
```
LogicalProject(P_PARTKEY=[$0])
LogicalFilter(condition=[=($12, $SCALAR_QUERY({
LogicalAggregate(group=[{}], EXPR$0=[MIN($0)])
LogicalProject(PS_SUPPLYCOST=[$3])
LogicalFilter(condition=[=($0, $cor0.P_PARTKEY)])
LogicalTableScan(table=[[TPCH, PARTSUPP]])
}))], variablesSet=[[$cor0]])
LogicalJoin(condition=[=($0, $9)], joinType=[inner])
LogicalTableScan(table=[[TPCH, PART]])
LogicalTableScan(table=[[TPCH, PARTSUPP]])
```
Applying the `FILTER_INTO_JOIN` rule. (we believe that other rules can also
have the same effect) result in an optimized query, with the calcite nodes
being as follows
```
LogicalProject(P_PARTKEY=[$0])
LogicalJoin(condition=[=($0, $9)], joinType=[inner])
LogicalTableScan(table=[[TPCH, PART]])
LogicalFilter(condition=[=($3, $SCALAR_QUERY({
LogicalAggregate(group=[{}], EXPR$0=[MIN($0)])
LogicalProject(PS_SUPPLYCOST=[$3])
LogicalFilter(condition=[=($0, $cor0.P_PARTKEY)])
LogicalTableScan(table=[[TPCH, PARTSUPP]])
}))])
LogicalTableScan(table=[[TPCH, PARTSUPP]])
```
Note that the `LogicalFilter` still has the condition `[=($0,
$cor0.P_PARTKEY)]` - but no variableset present on any node.
Attempting to convert this back into SQL via Calcite gives
```
java.lang.NullPointerException: variable $cor0 is not found
```
The complete code is available in this [github
gist](https://gist.github.com/mbwhite/4899dfd959839a6eedd6ee1542c9fb08)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)