cloud-fan opened a new pull request, #56031:
URL: https://github.com/apache/spark/pull/56031

   ### What changes were proposed in this pull request?
   
   Followup to https://github.com/apache/spark/pull/54972.
   
   Narrow the `V2ScanRelationPushDown` join-pushdown guard so it only blocks 
pushdown when at least one side has a pushed `Sample` with fraction < 1. At 
fraction = 1 the sample is a no-op on the result set, so dropping it inside the 
merged scan builder is safe.
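
The narrowed guard can be sketched as a small predicate. This is a minimal illustration and not the actual `V2ScanRelationPushDown` code: `PushedSample`, `SideState`, and `JoinPushdownGuard` are hypothetical names standing in for whatever per-side state the rule tracks.

```scala
// Hypothetical stand-ins for the per-side pushdown state; these are NOT
// Spark's real classes, only a model of the decision described above.
final case class PushedSample(fraction: Double)
final case class SideState(pushedSample: Option[PushedSample])

object JoinPushdownGuard {
  // A pushed sample can only change the result when it may exclude rows,
  // which is the case exactly when fraction < 1.
  def sampleBlocksPushdown(side: SideState): Boolean =
    side.pushedSample.exists(_.fraction < 1.0)

  // Join pushdown is skipped if either side carries a sample that the
  // merged scan builder would silently drop.
  def canPushDownJoin(left: SideState, right: SideState): Boolean =
    !sampleBlocksPushdown(left) && !sampleBlocksPushdown(right)
}
```

Under this model, a side with no sample or with a fraction = 1 sample never blocks pushdown, while any side with fraction < 1 still does.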
   
   ### Why are the changes needed?
   
   The guard added in SPARK-55978 exists because the merged scan builder for 
`SupportsPushDownJoin` cannot carry a pushed sample and would silently discard 
it. The hazard is *silent result change*. At fraction = 1, no rows are 
excluded, so dropping the sample changes nothing observable. The current guard 
is therefore stricter than its rationale requires, and unnecessarily skips join 
pushdown for queries that land at `TABLESAMPLE SYSTEM (100 PERCENT)` 
(parameterized queries, query generators, environment-tuned fractions).
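
To illustrate why fraction = 1 is a no-op, here is a toy model of block-level sampling. It assumes each block is kept independently with probability `fraction`; real data sources may implement `SYSTEM` sampling differently, and `SampleNoOp` and `systemSample` are hypothetical names used only for this sketch.

```scala
import scala.util.Random

object SampleNoOp {
  // Toy model (an assumption, not a data source's real algorithm):
  // keep each block with probability `fraction`, then flatten to rows.
  def systemSample[A](blocks: Seq[Seq[A]], fraction: Double, seed: Long): Seq[A] = {
    val rng = new Random(seed)
    // nextDouble() returns a value in [0.0, 1.0), so with fraction = 1.0
    // the comparison is always true and every block is kept.
    blocks.filter(_ => rng.nextDouble() < fraction).flatten
  }
}
```

In this model, sampling at fraction = 1.0 returns exactly the input rows for any seed, so discarding the sample cannot change the query result.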
   
   ### Does this PR introduce _any_ user-facing change?
   
   No behavior change for queries with fraction < 1. For queries where the 
pushed sample has fraction = 1, join pushdown now proceeds, with the same 
result set and a faster plan.
   
   ### How was this patch tested?
   
   - The existing `"join pushdown is skipped when a side has a pushed sample"` 
test was changed from `100 PERCENT` to `50 PERCENT` so it keeps exercising the 
(still-active) fraction < 1 branch of the guard.
   - A new `"100% SYSTEM sample does not block join pushdown"` test asserts 
that a pushed sample with fraction = 1 no longer blocks join pushdown, locking 
in the contract.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

