Hello,
I'm working with Apache Calcite and using variable sets in each relational node
to track correlations between outer attributes in correlated subqueries and
their corresponding outer nodes.
However, I'm encountering an issue where I cannot find the expected correlated
variables in the variable sets when dealing with scalar subqueries.
Example Query:
SELECT
(SELECT name FROM t1
WHERE a = (SELECT a
FROM t2
WHERE b = t.b)) AS name
FROM t3 t
WHERE t.b = 'xxxxx';
Generated Logical Plan:
rel#22:LogicalProject.NONE.[](input=LogicalFilter#15,exprs=[$SCALAR_QUERY({
LogicalProject(NAME=[$0])
LogicalFilter(condition=[=($1, $SCALAR_QUERY({
LogicalProject(a=[$0])
LogicalFilter(condition=[=($1, $cor0.contno)])
LogicalTableScan(table=[[t2]])
}))])
LogicalTableScan(table=[[t1]])
})])
Expected vs. Actual Behavior:
Based on my understanding, the correlation variable $cor0 should be stored in
the variable set of the outermost LogicalProject node, indicating that cor0
correlates to the outer table t. However, when I examine the variable sets,
they are empty for all nodes in this plan.
Question:
Could you help clarify whether this is:
1. A potential bug in how correlated variables are populated in variable
sets for scalar subqueries, or
2. Expected behavior where Calcite uses special logic for handling
correlations in scalar subqueries that differs from the standard variable set
mechanism?
Any guidance on the correct way to track these correlations in scalar subquery
scenarios would be greatly appreciated.
Best regards,
Zikai Wang