damccorm opened a new pull request, #38929:
URL: https://github.com/apache/beam/pull/38929

   Adds a custom subquery decorrelation pass to Beam SQL to handle complex 
correlated subqueries.
   
   ### The Problem
   Calcite's default `SqlToRelConverter` decorrelates most queries, but some 
complex correlated shapes (such as correlated `EXISTS` or `IN` subqueries inside 
`PROJECT` or `JOIN` conditions) survive conversion as un-expanded `RexSubQuery` 
expressions or residual `LogicalCorrelate` nodes.
   
   Beam's Volcano planner ruleset has no physical converter rules for a general 
`LogicalCorrelate` node. When these residues survive, the planner fails with a 
`CannotPlanException`.
   
   ### The Fix
   Implemented a pre-Volcano normalization pass, **normalizeForVolcano**, in 
`CalciteQueryPlanner.convertToBeamRel`:
   1. The pass first runs a short-lived `HepPlanner` with Calcite's 
`*SUB_QUERY_TO_CORRELATE` rules to turn any un-expanded `RexSubQuery` into a 
`LogicalCorrelate`.
   2. It then calls `RelDecorrelator.decorrelateQuery` to lower these 
correlates into standard relational shapes (`Join`, `Aggregate`, `Project`, 
`Filter`) which existing Beam rules already know how to handle.
   3. The pass runs only when the tree actually *references* a correlation 
variable (checked with `RelOptUtil.getVariablesUsed`), so it is a no-op for 
shapes like `UNNEST` (`LogicalCorrelate(_, Uncollect)`), which define but do not 
reference a correlation variable. Those shapes are preserved for `BeamUnnestRule`.
   4. It runs before the Volcano planner is invoked and before the cost-based 
metadata provider is swapped in, keeping it off the recursive cost path.
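
The gating idea in step 3 can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `PlanNode`, `CorrelationGate`, and `normalizeIfCorrelated` are hypothetical names invented for illustration and are not Beam or Calcite classes. In the PR itself the check is described as using `RelOptUtil.getVariablesUsed`, and the rewrite passed in here is a placeholder for the real HepPlanner and `RelDecorrelator` steps.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical gate: run a rewrite only if the tree references a correlation variable. */
final class CorrelationGate {
  /** Collects every correlation variable referenced anywhere in the tree. */
  static Set<String> variablesReferenced(PlanNode root) {
    Set<String> found = new LinkedHashSet<>();
    collect(root, found);
    return found;
  }

  private static void collect(PlanNode node, Set<String> found) {
    found.addAll(node.referencedVars);
    for (PlanNode input : node.inputs) {
      collect(input, found);
    }
  }

  /** Returns the tree untouched when nothing is referenced; otherwise applies the rewrite. */
  static PlanNode normalizeIfCorrelated(PlanNode root, UnaryOperator<PlanNode> rewrite) {
    return variablesReferenced(root).isEmpty() ? root : rewrite.apply(root);
  }

  public static void main(String[] args) {
    // Placeholder rewrite standing in for the real decorrelation steps (assumption).
    UnaryOperator<PlanNode> rewrite =
        n -> PlanNode.of("Join", n.inputs.toArray(new PlanNode[0]));

    // UNNEST-like shape: a Correlate whose subtree references no variable.
    PlanNode unnest =
        PlanNode.of("Correlate", PlanNode.of("Scan"), PlanNode.of("Uncollect"));
    // Correlated-subquery-like shape: the inner Filter reads $cor0.
    PlanNode exists =
        PlanNode.of("Correlate", PlanNode.of("Scan"),
            PlanNode.reading("$cor0", "Filter", PlanNode.of("Scan")));

    System.out.println(normalizeIfCorrelated(unnest, rewrite).kind);
    System.out.println(normalizeIfCorrelated(exists, rewrite).kind);
  }
}

/** Toy plan node for illustration only; this is NOT Calcite's RelNode. */
final class PlanNode {
  final String kind;                // e.g. "Correlate", "Filter", "Scan"
  final Set<String> referencedVars; // correlation variables read by this node's expressions
  final List<PlanNode> inputs;

  PlanNode(String kind, Set<String> referencedVars, PlanNode... inputs) {
    this.kind = kind;
    this.referencedVars = referencedVars;
    this.inputs = Arrays.asList(inputs);
  }

  /** Node whose expressions read no correlation variable. */
  static PlanNode of(String kind, PlanNode... inputs) {
    return new PlanNode(kind, Collections.<String>emptySet(), inputs);
  }

  /** Node whose expressions read the given correlation variable. */
  static PlanNode reading(String var, String kind, PlanNode... inputs) {
    return new PlanNode(kind, Collections.singleton(var), inputs);
  }
}
```

In this model the UNNEST-like tree comes back as the same object, while the tree containing a reference to `$cor0` is handed to the rewrite. Gating on references rather than on the mere presence of a `Correlate` node is what keeps the first shape out of the rewrite.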
   
   Added a detailed design document in `decorrelation_design.md` explaining the 
approach and the Volcano planner convergence safety.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

