adriangb commented on issue #17334: URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3447056234
Having played around with spilling in DataFusion now, I think @milenkovicm makes some very good points. The way non-spillable operators work right now is not great.

What I would propose is that non-spillable operators *register* their memory usage but don't error if they blow past the memory limit. That would put pressure on spillable operators to start spilling without running into these sorts of memory reservation deadlocks. I think this is also what @milenkovicm means by "we do not limit memory for non-spillable operators [...] Actually, we do track the memory usage of them, but we let it grow unbounded."

Also, as @ding-young said:

> Yes, when a non-spillable operator runs out of memory, it’s difficult to trigger spilling in another spillable operator to reclaim memory which seems to be a limitation currently.

I think if we could make those two changes, DataFusion would be in a much better place w.r.t. spilling. There would still be a long tail of work to make as many operators spillable as possible, and as performant as we can when they do spill, but that becomes less important.
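To make the "register but don't error" idea concrete, here is a rough, dependency-free sketch. `ToyPool`, `OverLimit`, `register`, `try_reserve`, and `release` are hypothetical names made up for illustration; this is not DataFusion's `MemoryPool` API, only a toy model of the accounting behavior proposed above.

```rust
// Toy model of the proposal. All names here are hypothetical and are
// NOT part of DataFusion's memory pool API.

/// Error returned to a spillable operator when its request does not fit.
#[derive(Debug, PartialEq)]
pub struct OverLimit {
    pub requested: usize,
    pub available: usize,
}

/// A single shared pool that tracks every operator's usage against one limit.
#[derive(Debug)]
pub struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Non-spillable operators: record the usage and never fail,
    /// even if this pushes the pool past its limit.
    pub fn register(&mut self, bytes: usize) {
        self.used = self.used.saturating_add(bytes);
    }

    /// Spillable operators: fail when the request would not fit.
    /// The error is the operator's signal to spill and retry.
    pub fn try_reserve(&mut self, bytes: usize) -> Result<(), OverLimit> {
        // Headroom is zero once non-spillable usage has overshot the limit.
        let available = self.limit.saturating_sub(self.used);
        if bytes > available {
            return Err(OverLimit { requested: bytes, available });
        }
        self.used += bytes;
        Ok(())
    }

    /// Return memory to the pool, e.g. after a spill or when an operator finishes.
    pub fn release(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }

    pub fn used(&self) -> usize {
        self.used
    }
}
```

In this model a non-spillable operator's growth is always visible in `used()`, so the next `try_reserve` from a spillable operator fails and pushes it to spill, instead of the non-spillable operator failing with nothing able to free memory on its behalf.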
