adriangb opened a new pull request, #21666:
URL: https://github.com/apache/datafusion/pull/21666

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   Dynamic filters for partitioned hash joins assumed that every build-side partition 
would be polled far enough to report its build summary. That assumption breaks 
when an upstream partitioned `RightSemi` join legitimately completes early for 
partitions whose build side is empty. The child partitioned hash join can then 
be dropped before it reports its empty build partition, leaving sibling 
partitions waiting forever on the shared dynamic-filter barrier.
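
The failure mode can be illustrated with a minimal, std-only sketch. It is not DataFusion code: `BuildBarrier`, `report`, `wait_all`, and `run` are hypothetical names, and a timeout stands in for the indefinite wait so the example terminates. The barrier expects one report per partition. If one partition is dropped before reporting, the barrier never completes.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a shared barrier that waits for every
/// build-side partition to report before a filter can be finalized.
struct BuildBarrier {
    expected: usize,
    reported: Mutex<usize>,
    all_reported: Condvar,
}

impl BuildBarrier {
    fn new(expected: usize) -> Self {
        Self {
            expected,
            reported: Mutex::new(0),
            all_reported: Condvar::new(),
        }
    }

    /// A partition reports its build summary (possibly for an empty build side).
    fn report(&self) {
        let mut n = self.reported.lock().unwrap();
        *n += 1;
        if *n >= self.expected {
            self.all_reported.notify_all();
        }
    }

    /// Waits until all partitions have reported, giving up after `timeout`.
    /// Returns `true` only if the barrier completed.
    fn wait_all(&self, timeout: Duration) -> bool {
        let guard = self.reported.lock().unwrap();
        let (guard, _timed_out) = self
            .all_reported
            .wait_timeout_while(guard, timeout, |n| *n < self.expected)
            .unwrap();
        *guard >= self.expected
    }
}

/// Runs one worker per partition; partitions listed in `dropped_early`
/// are never spawned, modelling a child join dropped before it reports.
fn run(partitions: usize, dropped_early: &[usize]) -> bool {
    let barrier = Arc::new(BuildBarrier::new(partitions));
    let handles: Vec<_> = (0..partitions)
        .filter(|p| !dropped_early.contains(p))
        .map(|_| {
            let b = Arc::clone(&barrier);
            thread::spawn(move || b.report())
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    barrier.wait_all(Duration::from_millis(50))
}
```

In this sketch `run(4, &[])` completes the barrier, while `run(4, &[2])` times out because partition 2 never reports. Without the timeout, the waiters would block indefinitely, which corresponds to the hang described above.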
   
   ## What changes are included in this PR?
   
   - start partitioned hash join build-side collection eagerly at `execute()` 
time when dynamic filter pushdown is enabled
   - report `PartitionBuildData::Partitioned` as soon as build collection 
finishes, including empty partitions
   - avoid double-reporting by skipping the stream-side accumulator path for 
partitioned mode
   - add a regression test covering the parent `RightSemi` / child partitioned 
inner join cancellation pattern that previously hung
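
The first three changes can be sketched as a small model. This is a simplification under stated assumptions, not the actual implementation: `Mode`, `Accumulator`, `run_partition`, and `simulate` are hypothetical names, `Mode::Other` stands for any non-partitioned mode, and `probe_polled` models whether the parent ever polls the output stream.

```rust
/// Hypothetical join modes; `Other` stands for any non-partitioned mode.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mode {
    Partitioned,
    Other,
}

/// Hypothetical accumulator recording (partition, build row count) reports.
#[derive(Default)]
struct Accumulator {
    reports: Vec<(usize, usize)>,
}

impl Accumulator {
    fn report(&mut self, partition: usize, build_rows: usize) {
        self.reports.push((partition, build_rows));
    }
}

/// Models one output partition of the join.
fn run_partition(
    partition: usize,
    build: &[i64],
    mode: Mode,
    probe_polled: bool,
    acc: &mut Accumulator,
) {
    // Build-side collection runs eagerly, independent of probe polling.
    let build_rows = build.len();

    // Partitioned mode reports as soon as collection finishes,
    // including when the build side is empty.
    if mode == Mode::Partitioned {
        acc.report(partition, build_rows);
    }

    // The parent may cancel before the stream side is ever polled.
    if !probe_polled {
        return;
    }

    // The stream-side path is skipped for partitioned mode so that
    // no partition reports twice.
    if mode != Mode::Partitioned {
        acc.report(partition, build_rows);
    }
}

/// Partition 0 has build rows and is polled; partition 1 is empty
/// and is cancelled before its stream side is polled.
fn simulate(mode: Mode) -> Vec<(usize, usize)> {
    let mut acc = Accumulator::default();
    run_partition(0, &[1, 2, 3], mode, true, &mut acc);
    run_partition(1, &[], mode, false, &mut acc);
    acc.reports
}
```

In this model, `simulate(Mode::Partitioned)` yields exactly one report per partition, including the empty, cancelled partition 1. In `Mode::Other` the cancelled partition produces no report, since reporting there still depends on the stream side being polled.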
   
   ## Are these changes tested?
   
   - `cargo fmt --all`
   - `cargo test -p datafusion-physical-plan test_partitioned_dynamic_filter_reports_empty_canceled_partitions -- --nocapture`
   - `cargo test -p datafusion-physical-plan hash_join -- --nocapture`
   - verified that `test_partitioned_dynamic_filter_reports_empty_canceled_partitions` 
times out on the pre-fix revision and passes on this branch
   
   `cargo clippy --all-targets --all-features -- -D warnings` currently fails 
on an unrelated existing workspace lint in 
`datafusion/expr/src/logical_plan/plan.rs:3773` (`clippy::mutable_key_type`).
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

