2010YOUY01 commented on PR #18207: URL: https://github.com/apache/datafusion/pull/18207#issuecomment-3434946419
> > This PR is setting size limit to spill files, when the size exceeds threshold, the spiller rotates to new file. I'm wondering why this design? Now the spill writer and reader is able to do streaming read/write, so a large spill file usually won't be the issue, unless it needs more parallelism somewhere.
>
> The issue with using a single FIFO file is that you accumulate dead data, bloating disk usage considerably. The idea is to cap that at say 100MB and then start a new file so that once all of the original file has been consumed we can garbage collect it.

This makes a lot of sense: operators should release disk usage sooner if possible. I will review it soon.
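
To make the rotation idea from the quoted reply concrete, here is a minimal in-memory sketch. It is not the PR's implementation and does not use any DataFusion API: `SpillFile`, `RotatingSpill`, `max_file_size`, and `disk_usage` are hypothetical names, and each "file" is modeled as a queue of byte buffers rather than a real file on disk. It only illustrates why capping file size lets consumed data be reclaimed before the whole FIFO is drained.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for one spill file on disk.
struct SpillFile {
    /// Batches written to this file that have not been read yet.
    batches: VecDeque<Vec<u8>>,
    /// Total bytes ever written to this file (consumed or not).
    written: usize,
}

/// Hypothetical FIFO spill queue that rotates to a new file at a size cap.
struct RotatingSpill {
    max_file_size: usize,
    files: VecDeque<SpillFile>,
    /// Bytes currently held on "disk", including already-consumed (dead) data.
    disk_usage: usize,
}

impl RotatingSpill {
    fn new(max_file_size: usize) -> Self {
        Self { max_file_size, files: VecDeque::new(), disk_usage: 0 }
    }

    /// Append a batch, starting a new file once the current one has reached the cap.
    fn push(&mut self, batch: Vec<u8>) {
        let need_new = match self.files.back() {
            Some(f) => f.written >= self.max_file_size,
            None => true,
        };
        if need_new {
            self.files.push_back(SpillFile { batches: VecDeque::new(), written: 0 });
        }
        let file = self.files.back_mut().expect("a file was just ensured");
        file.written += batch.len();
        self.disk_usage += batch.len();
        file.batches.push_back(batch);
    }

    /// Drop fully consumed files, except the one still being written to.
    fn gc(&mut self) {
        while self.files.len() > 1
            && self.files.front().map_or(false, |f| f.batches.is_empty())
        {
            let file = self.files.pop_front().expect("front exists");
            self.disk_usage -= file.written;
        }
    }

    /// Read the oldest batch, reclaiming any file that becomes fully consumed.
    fn pop(&mut self) -> Option<Vec<u8>> {
        self.gc();
        let batch = self.files.front_mut()?.batches.pop_front();
        self.gc();
        batch
    }
}
```

With a single unbounded file, `disk_usage` in this model could only shrink once everything had been read. With rotation, it drops each time the reader finishes an older file, which is the earlier release of disk space discussed above.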
