jbharadw-oai opened a new pull request, #55678:
URL: https://github.com/apache/spark/pull/55678

   ### What changes were proposed in this pull request?
   
   This PR restricts the single-column null-aware anti join optimization to cases where the right side can actually be broadcast, following up on the earlier proposal in #33289.
   
   It also makes adaptive query stage reuse mode-aware for hashed broadcast exchanges:
   - regular equi joins only reuse non-null-aware hashed broadcast stages
   - null-aware anti joins only reuse null-aware hashed broadcast stages
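   A minimal Scala sketch of the reuse rule above, under a simplified model: `BroadcastMode`, `HashedMode`, `Stage`, `canReuse`, and `findReusable` are hypothetical names chosen for illustration, not Spark classes or the code in this PR. A cached stage is considered reusable only when it was built over the same plan *and* with the same hashed mode, including the null-aware flag.

```scala
// Hypothetical, simplified model of mode-aware broadcast stage reuse.
// None of these names are Spark APIs; they only illustrate the rule.
object ModeAwareReuse {
  sealed trait BroadcastMode
  // A hashed broadcast built on `keys`; `isNullAware` marks a relation
  // built for a null-aware anti join.
  final case class HashedMode(keys: Seq[String], isNullAware: Boolean) extends BroadcastMode
  case object IdentityMode extends BroadcastMode

  // A materialized broadcast stage: an id, the canonical form of its
  // child plan, and the mode used to build the broadcast relation.
  final case class Stage(id: Int, canonicalPlan: String, mode: BroadcastMode)

  // A cached stage may serve a new request only if the modes agree,
  // including the null-aware flag for hashed modes.
  def canReuse(cached: BroadcastMode, requested: BroadcastMode): Boolean =
    (cached, requested) match {
      case (HashedMode(k1, na1), HashedMode(k2, na2)) => k1 == k2 && na1 == na2
      case (IdentityMode, IdentityMode)               => true
      case _                                          => false
    }

  // Look for an existing stage over the same plan with a compatible mode.
  def findReusable(stages: Seq[Stage], plan: String, mode: BroadcastMode): Option[Stage] =
    stages.find(s => s.canonicalPlan == plan && canReuse(s.mode, mode))
}
```

   Keying the check on the full mode, rather than on the child plan alone, means a null-aware stage built for a `NOT IN` subquery is never handed to a regular equi join, and vice versa.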
   
   ### Why are the changes needed?
   
   Single-column null-aware anti joins build the right side as a broadcast hash relation, but the planner currently selects that path unconditionally once the logical pattern matches. As a result, it can choose a broadcast hash join even when the right side is above the broadcast threshold.
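   The intended planning rule can be pictured as a size gate in front of the null-aware broadcast path. This is a hypothetical, simplified Scala model: `Stats`, `Conf`, `canBroadcastBySize`, and `planSingleColumnNaaj` are illustrative names, and the threshold follows the documented behavior of `spark.sql.autoBroadcastJoinThreshold` (a maximum size in bytes, with `-1` disabling broadcasting). The actual check in this PR, covered by the `JoinSelectionHelper.canPlanAsBroadcastHashJoin` tests, may consider more than size.

```scala
// Hypothetical, simplified model of gating the single-column null-aware
// anti join on broadcastability. Not Spark's actual planner code.
object NaajPlanning {
  // Estimated size of the right (build) side.
  final case class Stats(sizeInBytes: BigInt)
  // Broadcast threshold in bytes; a negative value disables broadcasting.
  final case class Conf(autoBroadcastJoinThreshold: Long)

  sealed trait JoinStrategy
  case object BroadcastNullAwareAntiJoin extends JoinStrategy
  case object RegularJoinPlanning extends JoinStrategy

  // The size must be non-negative and no larger than the threshold.
  // With a threshold of -1 this is always false, since sizes are non-negative.
  def canBroadcastBySize(right: Stats, conf: Conf): Boolean =
    right.sizeInBytes >= BigInt(0) &&
      right.sizeInBytes <= BigInt(conf.autoBroadcastJoinThreshold)

  // Take the null-aware broadcast path only when the right side fits;
  // otherwise fall back to normal join planning.
  def planSingleColumnNaaj(right: Stats, conf: Conf): JoinStrategy =
    if (canBroadcastBySize(right, conf)) BroadcastNullAwareAntiJoin
    else RegularJoinPlanning
}
```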
   
   In addition, adaptive planning treated hashed broadcast stages as interchangeable without checking whether they were null-aware. Null-aware and regular hashed relations have different semantics, so one must not be reused in place of the other.
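   To illustrate why the two relation kinds differ, here is a small Scala model of single-column `NOT IN` semantics next to a regular anti join, with `None` standing for SQL NULL. The object and method names are hypothetical, and the sketch works on in-memory sequences rather than Spark's hashed relations. The null-aware result depends on whether the build side is empty or contains a NULL key, information a regular hashed relation typically has no reason to keep, since NULL keys can never match an equi join.

```scala
// Hypothetical model of join semantics; None stands for SQL NULL.
object NullAwareAntiJoinSemantics {
  // Single-column null-aware anti join: `a NOT IN (SELECT b ...)`.
  def notIn(left: Seq[Option[Int]], right: Seq[Option[Int]]): Seq[Option[Int]] =
    if (right.isEmpty) left // NOT IN over an empty set is TRUE for every row, even NULL.
    else if (right.contains(None)) Seq.empty // A NULL on the build side means the predicate is never TRUE.
    else {
      val keys = right.flatten.toSet
      left.filter {
        case None    => false // NULL NOT IN (non-empty set) is UNKNOWN, so the row is dropped.
        case Some(a) => !keys.contains(a)
      }
    }

  // Regular left anti join on a = b: NULL keys never match, so NULL rows are kept.
  def leftAnti(left: Seq[Option[Int]], right: Seq[Option[Int]]): Seq[Option[Int]] = {
    val keys = right.flatten.toSet
    left.filter {
      case None    => true
      case Some(a) => !keys.contains(a)
    }
  }
}
```

   For example, with left keys `1, 2, NULL` and right keys `2, NULL`, `notIn` returns no rows while `leftAnti` keeps `1` and `NULL`.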
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Queries that match the single-column null-aware anti join optimization no longer force a broadcast hash join when the right side exceeds the broadcast threshold; they fall back to normal join planning instead.
   
   ### How was this patch tested?
   
   Added regression coverage for:
   - `JoinSelectionHelper.canPlanAsBroadcastHashJoin`
   - physical planning when the right side of a null-aware anti join is above the broadcast threshold
   - adaptive query stage reuse for null-aware vs regular hashed broadcast modes
   
   Ran:
   - `./build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.JoinSelectionHelperSuite -- -z single-column"`
   - `./build/sbt "sql/testOnly org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite -- -z hashed"`
   - `./build/sbt "sql/testOnly org.apache.spark.sql.JoinSuite -- -z SPARK-36082"`
   - `./build/sbt "sql/testOnly org.apache.spark.sql.SparkSessionExtensionSuite"`
   - `./build/sbt "sql/testOnly org.apache.spark.sql.SubquerySuite -- -z SingleColumn"`
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Codex GPT-5
   


-- 
This is an automated message from the Apache Git Service.