[I] Do we need bounded channels in RepartitionExec? [datafusion]

via GitHub Fri, 07 Nov 2025 13:48:34 -0800


adriangb opened a new issue, #18538:
URL: https://github.com/apache/datafusion/issues/18538


   Now that we have a good spilling implementation in 
https://github.com/apache/datafusion/pull/18207, do we still want bounded 
channels for the in-memory data? They were first introduced in 
https://github.com/apache/datafusion/pull/4867.
   
   My feeling is that we could probably drop them. The one situation I am 
worried about is when RepartitionExec's buffers fill up and consume the entire 
memory budget, so that the query fails. If we could do "cooperative" spilling, 
RepartitionExec would be the ideal candidate to spill and this would not be a 
problem: an upstream GroupBy could ask other operators to spill, and 
RepartitionExec would spill easily and free up memory. But today that's not 
the case.
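
   To make the trade-off concrete, here is a minimal sketch of the difference 
between a bounded and an unbounded channel when the consumer is not keeping 
up. It uses only the Rust standard library (`std::sync::mpsc`), not the 
channels RepartitionExec actually uses; `Batch`, `accepted_by_bounded` and 
`accepted_by_unbounded` are hypothetical names for illustration only.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Stand-in for a record batch; only the fact that it occupies memory matters.
type Batch = Vec<u8>;

/// Push up to `n` batches into a bounded channel of capacity `cap` while
/// nothing is being received, and return how many were accepted before the
/// channel reported that it was full (i.e. before backpressure kicked in).
fn accepted_by_bounded(cap: usize, n: usize) -> usize {
    // `_rx` is kept alive so the channel is not reported as disconnected.
    let (tx, _rx) = sync_channel::<Batch>(cap);
    let mut accepted = 0;
    for _ in 0..n {
        match tx.try_send(vec![0u8; 1024]) {
            Ok(()) => accepted += 1,
            // Buffer is full: a real producer would wait here (or spill).
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Push `n` batches into an unbounded channel while nothing is being
/// received, then count how many ended up buffered in memory.
fn accepted_by_unbounded(n: usize) -> usize {
    let (tx, rx) = channel::<Batch>();
    for _ in 0..n {
        // Never blocks and never reports "full": memory use grows with `n`.
        tx.send(vec![0u8; 1024]).expect("receiver is still alive");
    }
    drop(tx); // close the channel so the iterator below terminates
    rx.iter().count()
}
```

   With a capacity of 2, `accepted_by_bounded(2, 10)` stops after 2 batches, 
whereas `accepted_by_unbounded(10)` buffers all 10. The unbounded case is the 
one where buffered data can grow until the memory budget is gone, unless 
something else (such as spilling) steps in.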
   
   One way to collect more information is to run ClickBench (and other 
benchmarks) without the Distribution infrastructure and compare runtimes, peak 
memory use, and behavior under constrained memory budgets.
   
   cc @Dandandan @alamb @crepererum @2010YOUY01 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

