Rachelint opened a new pull request, #23269:
URL: https://github.com/apache/datafusion/pull/23269

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   The current producer-side repartition coalescer is shared by all input tasks 
for each output partition. Whenever multiple input tasks target the same output 
partition, every batch they coalesce has to synchronize on that shared state.
   
   ## What changes are included in this PR?
   
   This PR replaces the shared per-output-partition coalescer with local 
per-producer-channel coalescers in `RepartitionExec`:
   
   - each non-preserve-order output channel owns its own `LimitedBatchCoalescer`
   - preserve-order mode still skips producer-side coalescing and relies on 
`StreamingMergeBuilder`
   - each local coalescer is finalized by its owning input task when that task reaches end of input
   - the shared `Arc<Mutex<LimitedBatchCoalescer>>` and active-sender tracking 
are removed
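
To illustrate the difference in shape, here is a minimal, self-contained sketch. It does not use DataFusion types: `ToyCoalescer`, `shared_coalescing`, and `local_coalescing` are hypothetical names invented for this example, and `ToyCoalescer` only loosely imitates a batch coalescer by grouping rows into fixed-size batches. It is not the code in this PR.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a batch coalescer: buffers rows and emits a
/// batch each time `target` rows have accumulated.
struct ToyCoalescer {
    target: usize,
    buffer: Vec<i32>,
}

impl ToyCoalescer {
    fn new(target: usize) -> Self {
        Self { target, buffer: Vec::new() }
    }

    /// Buffer a small batch and return every completed batch of `target` rows.
    fn push(&mut self, batch: &[i32]) -> Vec<Vec<i32>> {
        self.buffer.extend_from_slice(batch);
        let mut out = Vec::new();
        while self.buffer.len() >= self.target {
            // Keep the tail in the buffer, hand out the first `target` rows.
            let rest = self.buffer.split_off(self.target);
            out.push(std::mem::replace(&mut self.buffer, rest));
        }
        out
    }

    /// Flush whatever is left at end of input.
    fn finish(&mut self) -> Option<Vec<i32>> {
        if self.buffer.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffer))
        }
    }
}

/// Shared shape: every producer locks the same coalescer on every push.
/// Returns the total number of rows emitted.
fn shared_coalescing(inputs: Vec<Vec<Vec<i32>>>, target: usize) -> usize {
    let shared = Arc::new(Mutex::new(ToyCoalescer::new(target)));
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|batches| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let mut rows = 0;
                for b in &batches {
                    // Lock contention happens here when producers overlap.
                    let done = shared.lock().unwrap().push(b);
                    rows += done.iter().map(Vec::len).sum::<usize>();
                }
                rows
            })
        })
        .collect();
    let mut total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    // The flush must wait until the last producer is done; this toy simply
    // joins all threads first.
    if let Some(last) = shared.lock().unwrap().finish() {
        total += last.len();
    }
    total
}

/// Local shape: each producer owns its coalescer and finalizes it itself,
/// so no lock and no cross-task bookkeeping is needed.
fn local_coalescing(inputs: Vec<Vec<Vec<i32>>>, target: usize) -> usize {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|batches| {
            thread::spawn(move || {
                let mut local = ToyCoalescer::new(target);
                let mut rows = 0;
                for b in &batches {
                    rows += local.push(b).iter().map(Vec::len).sum::<usize>();
                }
                if let Some(last) = local.finish() {
                    rows += last.len();
                }
                rows
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

In this toy, both variants emit every input row. The local variant can emit more undersized trailing batches, because each producer flushes its own remainder; in exchange, the per-batch lock disappears.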
   
   ## Are these changes tested?
   
   Ran:
   
   - `cargo fmt --all`
   - `cargo check -p datafusion-physical-plan`
   - `cargo clippy -p datafusion-physical-plan --all-targets --all-features -- 
-D warnings`
   - `cargo clippy --all-targets --all-features -- -D warnings`
   
   Existing repartition tests cover the coalescing and spilling paths.
   
   ## Are there any user-facing changes?
   
   No user-facing API changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

