cloud-fan opened a new pull request, #55500:
URL: https://github.com/apache/spark/pull/55500
### What changes were proposed in this pull request?
Follow-up to #55298 (SPARK-40193). Two related cleanups to `PlanMerger`'s
filter propagation:
1. **Correctness fix in `mergeNamedExpressions`.** Wrapping of unmatched
cached expressions with the cached plan's filter now iterates only over the
original cached range `[0, cachedPlanExpressions.length)`, not over all of
`mergedExpressions`. The previous loop also touched new-plan entries that were
appended earlier in the same call and already wrapped with the new plan's
filter.
2. **Tighten the `(np: Filter, cp)` / `(np, cp: Filter)` cases in
`tryMergePlans`.** Drop the structurally unreachable branches that appended
`cpFilter.toSeq` / `npFilter.map(_._1).toSeq` to the new `Project` and the
corresponding `symmetricFilterPropagationEnabled` escape in the guard. In both
cases the recursion keeps the non-Filter side unchanged, so no deeper case can
expose a `Filter` on that side — the child result always has `cpFilter = None`
/ `npFilter = None`. Matching `None` explicitly makes the invariant explicit
and removes dead code that would have produced a `Project` with duplicate
attributes if ever reached.
### Why are the changes needed?
For (1): with symmetric filter propagation enabled
(`spark.sql.optimizer.mergeSubplans.symmetricFilterPropagation.enabled = true`)
and non-attribute `Project` expressions on both sides of the merge, the
cached-side loop double-wrapped new-plan-appended expressions with
`If(cpFilter, If(npFilter, expr, null), null)` and replaced the slot in
`mergedExpressions` with a new `Alias` (fresh `exprId`). The `newNPMapping`
built earlier in the same call still pointed at the single-wrap alias's
attribute, so the parent `Aggregate` was rewritten to reference an attribute
that was no longer in the merged `Project`'s output. The resulting plan failed
analysis with `MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION`.
Minimal reproducer (fails on master before this PR):
```scala
withSQLConf(SQLConf.MERGE_SUBPLANS_SYMMETRIC_FILTER_PROPAGATION_ENABLED.key
-> "true") {
val subquery1 = ScalarSubquery(
testRelation.where($"a" > 1).select(($"a" *
2).as("x")).groupBy()(sum($"x").as("sum_x")))
val subquery2 = ScalarSubquery(
testRelation.where($"a" < 1).select(($"a" +
1).as("y")).groupBy()(max($"y").as("max_y")))
val df = testRelation.select(subquery1, subquery2).analyze
MergeSubplans(df) // analyzer error: Resolved attribute(s) "y" missing
from "x", "y", ...
}
```
For (2): the branches in question are unreachable by case analysis and the
appended `cpFilter.toSeq` / `npFilter.map(_._1).toSeq` would duplicate an
attribute already present in `mergedChild.output`. Removing them makes the
reachable contract explicit.
### Does this PR introduce _any_ user-facing change?
No. The bug was only observable as an analyzer failure, and only when
`spark.sql.optimizer.mergeSubplans.symmetricFilterPropagation.enabled` (which
defaults to `false`) was enabled together with subqueries whose merge path
exercises non-attribute `Project` expressions on both sides. Behavior otherwise
matches the released master.
### How was this patch tested?
- New unit test `MergeSubplansSuite`: `"SPARK-40193: Merge non-grouping
subqueries with different filter conditions and non-attribute Project
expressions on both sides"` — fails on master without the fix (analysis error),
passes with the fix.
- Full `MergeSubplansSuite` (42 tests) and `PlanMergeSuite` (12 tests)
continue to pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.5
This pull request and its description were written by Isaac.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]