xudong963 commented on code in PR #20642:
URL: https://github.com/apache/datafusion/pull/20642#discussion_r2911262610
##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -355,6 +355,13 @@ impl ExternalSorter {
self.sort_and_spill_in_mem_batches().await?;
}
+ // Transfer the pre-reserved merge memory to the streaming merge
+ // using `take()` instead of `new_empty()`. This ensures the merge
+ // stream starts with `sort_spill_reservation_bytes` already
+ // allocated, preventing starvation when concurrent sort partitions
+ // compete for pool memory. `take()` moves the bytes atomically
+ // without releasing them back to the pool, so other partitions
+ // cannot race to consume the freed memory.
Review Comment:
Thanks! Good point: `take()` alone wouldn't be enough if the merge
stream didn't know about the pre-reserved bytes.
The PR does address this in the other changed files:
- In `builder.rs`: `BatchBuilder` now tracks `batches_mem_used` separately
and only calls `try_grow()` when actual usage exceeds the current
reservation size. It also records `initial_reservation` so it never
shrinks below that during `build_output`. This way the pre-reserved bytes
serve as the initial budget rather than being requested from the pool on
top of it.
- In `multi_level_merge.rs`: `get_sorted_spill_files_to_merge` now tracks
`total_needed` and only requests additional pool memory when
`total_needed > reservation.size()`, so spill-file buffers covered by the
pre-reserved bytes don't trigger extra pool allocations.
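To make the accounting concrete, here is a minimal sketch of the idea in both bullets, using toy types (`ToyPool`, `ToyReservation`, `BudgetedBuilder` are hypothetical stand-ins, not the actual DataFusion `MemoryPool`/`MemoryReservation` API): the builder only asks the pool for bytes beyond what the pre-reserved budget already covers, and never shrinks below the initial budget.

```rust
// Toy model of a memory pool and a reservation against it.
struct ToyPool {
    used: usize,
    limit: usize,
}

struct ToyReservation {
    size: usize,
}

impl ToyReservation {
    fn try_grow(&mut self, pool: &mut ToyPool, bytes: usize) -> Result<(), String> {
        if pool.used + bytes > pool.limit {
            return Err(format!("out of memory: need {bytes} more bytes"));
        }
        pool.used += bytes;
        self.size += bytes;
        Ok(())
    }

    fn shrink(&mut self, pool: &mut ToyPool, bytes: usize) {
        self.size -= bytes;
        pool.used -= bytes;
    }
}

/// Builder-side accounting: grow only past the pre-reserved budget,
/// and never shrink the reservation below `initial`.
struct BudgetedBuilder {
    reservation: ToyReservation,
    initial: usize,  // bytes handed over up front (e.g. via `take()`)
    mem_used: usize, // actual bytes held in buffered batches
}

impl BudgetedBuilder {
    fn push_batch(&mut self, pool: &mut ToyPool, batch_bytes: usize) -> Result<(), String> {
        self.mem_used += batch_bytes;
        if self.mem_used > self.reservation.size {
            // Only the excess over the current reservation hits the pool.
            let extra = self.mem_used - self.reservation.size;
            self.reservation.try_grow(pool, extra)?;
        }
        Ok(())
    }

    fn pop_batch(&mut self, pool: &mut ToyPool, batch_bytes: usize) {
        self.mem_used -= batch_bytes;
        // Release only what exceeds the initial budget (and current usage).
        let floor = self.initial.max(self.mem_used);
        if self.reservation.size > floor {
            let release = self.reservation.size - floor;
            self.reservation.shrink(pool, release);
        }
    }
}
```

With a 40-byte pre-reserved budget, pushing 30 bytes of batches touches the pool not at all; only once usage crosses 40 does the builder grow, and popping batches releases back down to the 40-byte floor, never below it.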
So the merge stream is aware of the pre-reserved bytes and uses them as
its starting budget; it doesn't begin its accounting from zero.
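For illustration, here is a toy sketch of the semantic difference between `take()` and `new_empty()` that the comment in the diff describes. `ToyReservation` is a hypothetical stand-in, not the real `MemoryReservation` type: the point is only that `take()` moves the reserved bytes into the new reservation without ever returning them to the pool, while `new_empty()` starts from zero.

```rust
struct ToyReservation {
    size: usize,
}

impl ToyReservation {
    /// Move all reserved bytes into a new reservation, leaving this one
    /// empty. Pool usage is unchanged, so no other partition can race to
    /// grab the bytes in between.
    fn take(&mut self) -> ToyReservation {
        let moved = std::mem::take(&mut self.size);
        ToyReservation { size: moved }
    }

    /// A fresh reservation holding no bytes; a consumer starting from this
    /// would have to grow from 0 and could starve under pool contention.
    fn new_empty(&self) -> ToyReservation {
        ToyReservation { size: 0 }
    }
}
```

Starting the merge from `take()` means its budget already equals the pre-reserved bytes; starting from `new_empty()` would mean competing with other partitions for every byte.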
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]