adriangb commented on code in PR #18014:
URL: https://github.com/apache/datafusion/pull/18014#discussion_r2422792326


##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -1007,12 +1071,37 @@ impl RepartitionExec {
 
                 let timer = metrics.send_time[partition].timer();
                 // if there is still a receiver, send to it
-                if let Some((tx, reservation)) = output_channels.get_mut(&partition) {
-                    reservation.lock().try_grow(size)?;
-
-                    if tx.send(Some(Ok(batch))).await.is_err() {
+                if let Some((tx, reservation, spill_manager)) =
+                    output_channels.get_mut(&partition)
+                {
+                    let (batch_to_send, is_memory_batch) =
+                        match reservation.lock().try_grow(size) {
+                            Ok(_) => {
+                                // Memory available - send in-memory batch
+                                (RepartitionBatch::Memory(batch), true)
+                            }
+                            Err(_) => {
+                                // We're memory limited - spill this single batch to its own file

Review Comment:
   > yes, having a file per batch can get the process killed (by the OS) due to too many open files
   
   I checked, and indeed we close the files after writing, so this is not going to happen. The only thing we accumulate is `PathBuf`s, not open file descriptors.
   
   > closing and then opening files will add at least 6 system calls per batch
   > Ideally, if we could keep the file open and then share offsets after writing, that would save a lot of syscalls
   
   I think that would be nice, but I can't think of a scheme that would make it work: we *need* FIFO ordering here, and a single append-only file only really lends itself to LIFO (reading back from the end of the file). Do you have any ideas on how we could do this in a relatively simple way?
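   
   For illustration only, here is one hypothetical shape the "keep the file open and share offsets" idea could take: a single open spill file plus a queue of `(offset, len)` pairs, so batches are appended at the end but replayed in FIFO order by seeking to their recorded offsets. This sketch uses plain byte buffers instead of Arrow `RecordBatch`es, and `FifoSpill`/its methods are made-up names, not anything in DataFusion:
   
   ```rust
   use std::collections::VecDeque;
   use std::fs::File;
   use std::io::{Read, Seek, SeekFrom, Write};
   
   /// Hypothetical single-file FIFO spill queue: every batch is appended to one
   /// open file, and a queue of (offset, len) pairs preserves insertion order.
   struct FifoSpill {
       file: File,
       write_pos: u64,
       queue: VecDeque<(u64, usize)>, // (offset, len) per spilled "batch"
   }
   
   impl FifoSpill {
       fn new(file: File) -> Self {
           Self { file, write_pos: 0, queue: VecDeque::new() }
       }
   
       /// Append a serialized batch at the current end of the file and record
       /// where it landed. The file descriptor stays open across pushes.
       fn push(&mut self, bytes: &[u8]) -> std::io::Result<()> {
           self.file.seek(SeekFrom::Start(self.write_pos))?;
           self.file.write_all(bytes)?;
           self.queue.push_back((self.write_pos, bytes.len()));
           self.write_pos += bytes.len() as u64;
           Ok(())
       }
   
       /// Read back the *oldest* batch (FIFO) by seeking to its recorded offset,
       /// even though newer batches sit after it in the file.
       fn pop(&mut self) -> std::io::Result<Option<Vec<u8>>> {
           let Some((offset, len)) = self.queue.pop_front() else {
               return Ok(None);
           };
           let mut buf = vec![0u8; len];
           self.file.seek(SeekFrom::Start(offset))?;
           self.file.read_exact(&mut buf)?;
           Ok(Some(buf))
       }
   }
   
   fn main() -> std::io::Result<()> {
       let path = std::env::temp_dir().join("fifo_spill_demo.bin");
       let file = File::options()
           .read(true)
           .write(true)
           .create(true)
           .truncate(true)
           .open(&path)?;
       let mut spill = FifoSpill::new(file);
       spill.push(b"batch-1")?;
       spill.push(b"batch-2")?;
       // FIFO: batch-1 comes back first even though batch-2 was written later.
       assert_eq!(spill.pop()?.as_deref(), Some(&b"batch-1"[..]));
       assert_eq!(spill.pop()?.as_deref(), Some(&b"batch-2"[..]));
       assert!(spill.pop()?.is_none());
       std::fs::remove_file(&path)?;
       println!("fifo ok");
       Ok(())
   }
   ```
   
   The trade-off versus one-file-per-batch is that reads need an extra seek, and reclaiming space for already-consumed batches is not free, but write/read syscall counts per batch drop since the descriptor stays open.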
   
   Given that any query that spills here would previously have errored out, I'm not too worried about performance. And frankly, if batch sizes are reasonable, a couple of extra syscalls won't be measurable.
   
   That said, since how the spilling happens is entirely internal/private, there's no reason we can't merge this and then come back and improve it later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

