Nachiket-Roy opened a new pull request, #19695:
URL: https://github.com/apache/datafusion/pull/19695
## Which issue does this PR close?
- Closes #19679
## Rationale for this change
External sort previously assumed that, under memory pressure, there would
always be buffered in-memory batches available to spill before sorting. This
assumption breaks down when a single oversized `RecordBatch` arrives and cannot
be fully sorted in memory, while no other buffered batches exist to spill
first. In this scenario, the sorter could fail with an out-of-memory error or
violate expected output batch sizing. This PR adds a safe fallback that allows
external sort to make progress without unbounded memory growth.
## What changes are included in this PR?
This PR introduces a chunked spill fallback for oversized batches under
memory pressure:
- Adds a new helper `sort_and_spill_large_batch()` that:
  - Sorts a single oversized batch once
  - Splits the sorted output into `batch_size`-sized chunks
  - Incrementally appends these chunks to a single spill file
- Integrates this helper at the memory-reservation boundary to handle the
  case where:
  - Memory cannot be reserved
  - No buffered batches are available to spill
  - The input batch exceeds the configured `batch_size`
- Ensures:
  - Correct ordering is preserved
  - Output batches respect `batch_size`
  - Memory is released eagerly
  - No async recursion or API changes are introduced
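
The fallback condition and the sort-then-chunk step described above can be sketched roughly as follows. This is a simplified, std-only illustration, not the code in this PR: `SpillContext`, `needs_chunked_fallback`, and `sort_and_chunk` are hypothetical names, a `Vec<i64>` stands in for a `RecordBatch`, and a `Vec<Vec<i64>>` stands in for the spill file.

```rust
/// Hypothetical snapshot of the sorter's state at the memory-reservation
/// boundary. None of these field names come from DataFusion.
struct SpillContext {
    /// True when the attempt to reserve memory for the incoming batch failed.
    reservation_failed: bool,
    /// Number of batches already buffered in memory (candidates for spilling).
    buffered_batches: usize,
    /// Row count of the incoming batch.
    incoming_rows: usize,
    /// Configured target number of rows per output batch.
    batch_size: usize,
}

/// Returns true only when all three conditions from the description hold:
/// memory cannot be reserved, nothing is buffered to spill first, and the
/// incoming batch is larger than `batch_size`.
fn needs_chunked_fallback(ctx: &SpillContext) -> bool {
    ctx.reservation_failed
        && ctx.buffered_batches == 0
        && ctx.incoming_rows > ctx.batch_size
}

/// Sorts the oversized input once, then emits it as chunks of at most
/// `batch_size` rows. The returned vector models the sequence of chunks
/// appended to a single spill file (assumption: the real helper writes
/// each chunk to disk instead of collecting them in memory).
fn sort_and_chunk(mut rows: Vec<i64>, batch_size: usize) -> Vec<Vec<i64>> {
    assert!(batch_size > 0, "batch_size must be non-zero");

    // Sort the whole batch a single time.
    rows.sort_unstable();

    // Split the sorted rows into batch_size-sized pieces; the last piece
    // may be shorter. Order across chunks is preserved.
    let mut spill = Vec::new();
    for chunk in rows.chunks(batch_size) {
        spill.push(chunk.to_vec());
    }
    spill
}
```

In this sketch, calling `sort_and_chunk(vec![5, 3, 9, 1, 7, 2, 8], 3)` yields the chunks `[1, 2, 3]`, `[5, 7, 8]`, and `[9]`, so every chunk respects the size limit and reading the chunks back in order gives a fully sorted stream.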
## Are these changes tested?
Yes, by existing tests.
No new tests were added, as this change is fully covered by existing tests
that already exercise external sort spilling behavior.
## Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]