peter-toth opened a new pull request, #55628:
URL: https://github.com/apache/spark/pull/55628

   ### What changes were proposed in this pull request?
   
   `PlanMerger` now supports filter propagation through `Join` nodes when 
merging similar subplans. Previously, when two subplans contained identical 
`Join` nodes but differed only in a filter applied to one of the join's 
children, they could not be merged.
   
   This PR adds the ability to propagate such filter conditions through a 
`Join` and into the parent `Aggregate`'s `FILTER` clause. A new 
`filterSafeForJoin` helper checks that the filter originates from the 
non-nullable (preserved) side of the join: the left side of 
`LeftOuter`/`LeftSemi`/`LeftAnti`, the right side of `RightOuter`, or either 
side of `Inner`/`Cross`. `FullOuter` joins are not eligible.
   
   The feature is gated by a new SQL config:
   `spark.sql.optimizer.mergeSubplans.filterPropagationThroughJoin.enabled` 
(default: `true`).
   
   ### Why are the changes needed?
   
   Without this change, scalar subqueries that differ only in a filter on one 
side of an identical join cannot be merged, resulting in redundant scans and 
compute. For example:
   
     SELECT
       (SELECT sum(key) FROM t1 JOIN t2 ON t1.id = t2.id),
       (SELECT sum(key) FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.b > 1)
   
   Both subqueries scan `t1` and `t2` in full even though they share the same 
base join. After this change a single merged scan is used and the second 
subquery's result is derived from it via an aggregate `FILTER` clause.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. The optimizer may now merge scalar subqueries that were previously kept 
separate, reducing the number of scan and join operations. The new config 
`spark.sql.optimizer.mergeSubplans.filterPropagationThroughJoin.enabled` 
(default `true`) can be used to opt out.
   
   ### How was this patch tested?
   
   Added unit tests in `MergeSubplansSuite`:
   - Merge with filter on left inner join child
   - Merge with filter on right inner join child
   - No merge when both join children have independent filters
   - Merge with filter on the preserved side of a `LeftSemi` join
   - No merge when filter is on the non-output side of a `LeftSemi` join
   - No merge when filter is on the nullable side of an outer join
   - No merge when the feature is disabled via config
   
   Added integration test in `PlanMergeSuite` verifying correctness 
(`checkAnswer`) and plan shape (`SubqueryExec`/`ReusedSubqueryExec` counts) for 
both the enabled and disabled config cases, with and without AQE.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Sonnet 4.6
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to