adriangb opened a new issue, #18538: URL: https://github.com/apache/datafusion/issues/18538
Now that we have a good spilling implementation in https://github.com/apache/datafusion/pull/18207, do we still want bounded channels for the in-memory data? They were first introduced in https://github.com/apache/datafusion/pull/4867. My feeling is that we could probably drop them.

The one situation I am worried about is when we fill RepartitionExec's buffers and consume the entire memory budget, at which point the query fails. If we had "cooperative" spilling, RepartitionExec would be the ideal candidate to spill and this would not be a problem: an upstream GroupBy could ask other operators to spill, and RepartitionExec could spill easily and free up memory. But today that's not the case.

One way to collect more information is to run ClickBench (and other benchmarks) without the Distribution infrastructure and compare runtimes, peak memory use, and behavior under constrained memory budgets.

cc @Dandandan @alamb @crepererum @2010YOUY01
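
The trade-off between bounded and unbounded buffering raised in this issue can be illustrated with a minimal sketch. It uses the standard library's `std::sync::mpsc` channels as a stand-in and is not DataFusion's actual channel implementation; `Batch`, `fill_bounded`, and `fill_unbounded` are hypothetical names introduced only for illustration. With no consumer draining the channel, the bounded variant stops accepting data once it is full, while the unbounded variant buffers everything it is given.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Hypothetical stand-in for a record batch; only its size matters here.
type Batch = Vec<u8>;

/// Tries to push `n` batches into a bounded channel that nobody is reading
/// from, and returns how many were accepted before the channel was full.
fn fill_bounded(capacity: usize, n: usize) -> usize {
    // Keep the receiver alive (but idle) so the channel is not disconnected.
    let (tx, _rx) = sync_channel::<Batch>(capacity);
    let mut accepted = 0;
    for _ in 0..n {
        match tx.try_send(vec![0u8; 1024]) {
            Ok(()) => accepted += 1,
            // A full channel is the backpressure signal: the producer must wait.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Pushes `n` batches into an unbounded channel that nobody is reading from,
/// then drains it and returns how many batches had been buffered in memory.
fn fill_unbounded(n: usize) -> usize {
    let (tx, rx) = channel::<Batch>();
    for _ in 0..n {
        // An unbounded send never blocks; memory use grows with every batch.
        tx.send(vec![0u8; 1024]).expect("receiver is still alive");
    }
    // Dropping the sender lets the draining iterator below terminate.
    drop(tx);
    rx.iter().count()
}
```

Calling `fill_bounded(4, 100)` accepts only 4 batches, whereas `fill_unbounded(100)` holds all 100 in memory at once. In this simplified picture, dropping the bound means relying on something else, such as spilling, to keep buffered data within the memory budget.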
