JanKaul opened a new pull request, #22092: URL: https://github.com/apache/datafusion/pull/22092
## Which issue does this PR close? Closes #22090. ## Rationale for this change See #22090 — today's "all-channels-non-empty" gate doesn't catch the case where one consumer lags behind a balanced producer. The slow channel grows linearly per input batch. ## What changes are included in this PR? - New config `datafusion.execution.repartition_buffer_size_bytes` (default 100 MiB). - `distributor_channels.rs`: gate predicate becomes `empty_channels == 0 || buffered_bytes >= max_buffered_bytes`. Items are stored as `(T, usize)` so receivers refund bytes on pop. Overdraw escape: an empty channel always accepts a push (a single oversize batch can't head-of-line block its consumer). The `Mutex<Option<Vec<Waker>>>` double-state is collapsed to `Mutex<Vec<Waker>>` with the counters as the single source of truth. - `mod.rs`: plumbs the config through `channels` / `partition_aware_channels`; passes batch size at the data-path send and `0` for spill-marker / sentinel sends. - 5 new unit tests: skewed producer parking on B, oversize-overdraw, Gate A still working under a generous B, multiple parked senders waking on release, receiver-drop refund. ## Are these changes tested? Yes. 21 `distributor_channels` unit tests and 48 `repartition` tests pass. ## Are there any user-facing changes? One new config option; otherwise transparent. Default value caps in-memory repartition buffering at 100 MiB — workloads that rely on the previous unbounded buffering may want to set it higher. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
