jbharadw-oai opened a new pull request, #55678: URL: https://github.com/apache/spark/pull/55678
### What changes were proposed in this pull request?

This PR restricts the single-column null-aware anti join optimization to cases where the right side can actually be broadcast, following up on the earlier proposal in #33289.

It also makes adaptive query stage reuse mode-aware for hashed broadcast exchanges:
- regular equi joins only reuse non-null-aware hashed broadcast stages
- null-aware anti joins only reuse null-aware hashed broadcast stages

### Why are the changes needed?

Single-column null-aware anti joins build the right side as a broadcast hash relation, but the planner currently selects that path unconditionally once the logical pattern matches. That can choose a broadcast hash join even when the right side is above the broadcast threshold.

In addition, adaptive planning was treating hashed broadcast stages as interchangeable without checking whether they were null-aware. Null-aware and regular hashed relations have different semantics, so they should not be reused across each other.

### Does this PR introduce _any_ user-facing change?

Yes. Queries that match the single-column null-aware anti join optimization no longer force a broadcast hash join when the right side exceeds the broadcast threshold; they fall back to normal join planning instead.

### How was this patch tested?
Added regression coverage for:
- `JoinSelectionHelper.canPlanAsBroadcastHashJoin`
- physical planning when a null-aware anti join right side is above the broadcast threshold
- adaptive query stage reuse for null-aware vs regular hashed broadcast modes

Ran:
- `./build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.JoinSelectionHelperSuite -- -z single-column"`
- `./build/sbt "sql/testOnly org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite -- -z hashed"`
- `./build/sbt "sql/testOnly org.apache.spark.sql.JoinSuite -- -z SPARK-36082"`
- `./build/sbt "sql/testOnly org.apache.spark.sql.SparkSessionExtensionSuite"`
- `./build/sbt "sql/testOnly org.apache.spark.sql.SubquerySuite -- -z SingleColumn"`

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex GPT-5
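
### Illustrative sketch (not Spark code)

To make the two behaviors described under "What changes were proposed" easier to picture, here is a minimal Python model. It is not taken from the patch. `BroadcastMode`, `plan_anti_join`, and `StageCache` are hypothetical names invented for this illustration. The sketch assumes that "can be broadcast" means the estimated right-side size is non-negative and at most the threshold, and that stage reuse can be modeled as a lookup keyed by the plan and its broadcast mode.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class BroadcastMode:
    """Toy stand-in for a hashed broadcast mode (hypothetical, not Spark's class)."""
    keys: Tuple[str, ...]
    null_aware: bool


def plan_anti_join(right_size_bytes: int, threshold_bytes: int) -> str:
    """Pick a strategy for a single-column null-aware anti join.

    Assumption: the null-aware broadcast path is chosen only when the
    right side fits under the broadcast threshold; otherwise the query
    falls back to normal join planning.
    """
    if 0 <= right_size_bytes <= threshold_bytes:
        return "null_aware_broadcast_hash_join"
    return "fallback_join_planning"


class StageCache:
    """Toy reuse cache whose key includes the broadcast mode."""

    def __init__(self) -> None:
        self._stages: Dict[Tuple[str, BroadcastMode], str] = {}

    def get_or_create(self, plan_id: str, mode: BroadcastMode) -> str:
        # Because the mode is part of the key, a regular stage is never
        # handed to a null-aware join, and vice versa.
        key = (plan_id, mode)
        if key not in self._stages:
            self._stages[key] = f"stage-{len(self._stages)}"
        return self._stages[key]


if __name__ == "__main__":
    MB = 1024 * 1024
    print(plan_anti_join(5 * MB, 10 * MB))   # right side under the threshold
    print(plan_anti_join(50 * MB, 10 * MB))  # right side over the threshold

    cache = StageCache()
    regular = cache.get_or_create("scan_t2", BroadcastMode(("id",), False))
    null_aware = cache.get_or_create("scan_t2", BroadcastMode(("id",), True))
    print(regular != null_aware)  # same plan, different modes: no reuse
```

In this model a negative threshold means no right side qualifies, so the anti join always takes the fallback path. Whether the actual planner treats a disabled threshold the same way should be checked against the patch itself.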
