damccorm opened a new pull request, #38929:
URL: https://github.com/apache/beam/pull/38929
Adds a custom subquery decorrelation pass to Beam SQL to handle complex correlated subqueries.

### The Problem

Calcite's default `SqlToRelConverter` decorrelates most queries. However, some complex correlated shapes survive as `RexSubQuery` expressions or residual `LogicalCorrelate` nodes. Examples include correlated `EXISTS` or `IN` subqueries inside `PROJECT` expressions or `JOIN` conditions. Beam's Volcano planner ruleset has no physical converter rule for a general `LogicalCorrelate` node, so when these residues survive, planning fails with a `CannotPlanException`.

### The Fix

Implemented a pre-Volcano normalization pass in `CalciteQueryPlanner.convertToBeamRel`:

1. **normalizeForVolcano**: runs a short-lived `HepPlanner` with Calcite's `*SUB_QUERY_TO_CORRELATE` rules to turn any unexpanded `RexSubQuery` into a `LogicalCorrelate`.
2. It then calls `RelDecorrelator.decorrelateQuery` to lower these correlates into standard relational shapes (`Join`, `Aggregate`, `Project`, `Filter`) that existing Beam rules already know how to handle.
3. The pass is strictly gated on the tree actually *referencing* a correlation variable, checked with `RelOptUtil.getVariablesUsed`. This makes it a no-op for shapes like `UNNEST` (`LogicalCorrelate(_, Uncollect)`), which define but do not reference a correlation variable, so they are preserved for `BeamUnnestRule`.
4. It runs before the Volcano planner and before the cost-based metadata provider is swapped in, keeping it off the recursive cost path.

Added a detailed design document, `decorrelation_design.md`, explaining the approach and why the Volcano planner still converges safely.
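The gating in step 3 can be illustrated with a small, self-contained toy model. Everything in it is hypothetical and is **not** a Calcite or Beam API, including `Node`, `referencedVariables`, and `normalizeIfCorrelated`. The `rewrite` argument stands in for the `HepPlanner` plus `RelDecorrelator.decorrelateQuery` steps. The check follows the description above (run only if some node references a correlation variable) and does not attempt to reproduce the exact semantics of `RelOptUtil.getVariablesUsed`.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Toy model of the "only decorrelate if a correlation variable is referenced" gate (hypothetical names). */
public class DecorrelationGateSketch {

  /** Hypothetical plan node: operator name, correlation ids it defines and references, and its inputs. */
  static final class Node {
    final String op;
    final Set<String> defines;
    final Set<String> references;
    final List<Node> inputs;

    Node(String op, Set<String> defines, Set<String> references, List<Node> inputs) {
      this.op = op;
      this.defines = defines;
      this.references = references;
      this.inputs = inputs;
    }

    static Node of(String op, Set<String> defines, Set<String> references, Node... inputs) {
      return new Node(op, defines, references, List.of(inputs));
    }
  }

  /** Collects every correlation id referenced anywhere in the tree; definitions alone are ignored. */
  static Set<String> referencedVariables(Node root) {
    Set<String> out = new TreeSet<>();
    collect(root, out);
    return out;
  }

  private static void collect(Node node, Set<String> out) {
    out.addAll(node.references);
    for (Node input : node.inputs) {
      collect(input, out);
    }
  }

  /**
   * Applies `rewrite` (a stand-in for the HepPlanner + decorrelation steps) only when some node
   * references a correlation variable; otherwise returns the very same tree instance (a no-op).
   */
  static Node normalizeIfCorrelated(Node root, UnaryOperator<Node> rewrite) {
    return referencedVariables(root).isEmpty() ? root : rewrite.apply(root);
  }

  public static void main(String[] args) {
    Node scan = Node.of("TableScan", Set.of(), Set.of());

    // UNNEST-like shape as described in the PR: $cor0 is defined but never referenced.
    Node unnest = Node.of("LogicalCorrelate", Set.of("$cor0"), Set.of(),
        scan, Node.of("Uncollect", Set.of(), Set.of()));

    // Correlated EXISTS residue: a filter on the right side references $cor0.
    Node exists = Node.of("LogicalCorrelate", Set.of("$cor0"), Set.of(),
        scan, Node.of("Filter", Set.of(), Set.of("$cor0"), scan));

    // Fake rewrite that only relabels the root; real decorrelation builds Join/Aggregate/Project/Filter shapes.
    UnaryOperator<Node> fakeDecorrelate =
        n -> Node.of("Join", Set.of(), Set.of(), n.inputs.toArray(new Node[0]));

    System.out.println(normalizeIfCorrelated(unnest, fakeDecorrelate).op); // LogicalCorrelate
    System.out.println(normalizeIfCorrelated(exists, fakeDecorrelate).op); // Join
  }
}
```

When nothing is referenced, the sketch returns the same instance, and that is what makes the pass a no-op. The UNNEST-style correlate reaches the Volcano planner exactly as it was built, where a rule such as `BeamUnnestRule` can still match it.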
