Hello,
I'm working with Apache Calcite and using variable sets in each relational node 
to track correlations between outer attributes in correlated subqueries and 
their corresponding outer nodes.

However, I'm encountering an issue where I cannot find the expected correlated 
variables in the variable sets when dealing with scalar subqueries.

Example Query:
SELECT
    (SELECT name FROM t1
     WHERE a = (SELECT a
                FROM t2
                WHERE b = t.b)) AS name
FROM t3 t
WHERE t.b = 'xxxxx';

Generated Logical Plan:
rel#22:LogicalProject.NONE.[](input=LogicalFilter#15,exprs=[$SCALAR_QUERY({
LogicalProject(NAME=[$0])
  LogicalFilter(condition=[=($1, $SCALAR_QUERY({
LogicalProject(a=[$0])
  LogicalFilter(condition=[=($1, $cor0.contno)])
    LogicalTableScan(table=[[t2]])
}))])
    LogicalTableScan(table=[[t1]])
})])

Expected vs. Actual Behavior:
Based on my understanding, the correlation variable $cor0 should be stored in 
the variable set of the outermost LogicalProject node, indicating that cor0 
correlates to the outer table t. However, when I examine the variable sets, 
they are empty for all nodes in this plan.

Question:
Could you help clarify whether this is:

  1.  A potential bug in how correlated variables are populated in variable 
sets for scalar subqueries, or
  2.  Expected behavior where Calcite uses special logic for handling 
correlations in scalar subqueries that differs from the standard variable set 
mechanism?

Any guidance on the correct way to track these correlations in scalar subquery 
scenarios would be greatly appreciated.

Best regards,
Zikai Wang

Reply via email to