adriangb commented on issue #17334:
URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3447056234

   Having played around with spilling in DataFusion now, I think @milenkovicm 
makes some very good points. The way non-spillable operators work right now is 
not great.
   
   What I would propose is that non-spillable operators *register* their memory 
usage but don't error if they blow past the memory limit. That would put 
pressure on spillable operators to start spilling without running into these 
sorts of memory reservation deadlocks. I think this is what @milenkovicm means 
as well by "we do not limit memory for non-spillable operators [...] Actually, 
we do track the memory usage of them, but we let it grow unbounded.".
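
   To make the idea concrete, here is a minimal sketch of the accounting I have 
in mind. It is a toy model, not DataFusion's `MemoryPool` trait: `ToyPool`, 
`Grow`, and the `can_spill` flag are all hypothetical names made up for 
illustration. Non-spillable reservations are always recorded, even past the 
limit, while spillable reservations are refused once the pool is full, which is 
the signal for that operator to spill.

```rust
/// Toy memory pool. Hypothetical sketch, NOT DataFusion's `MemoryPool` API.
#[derive(Debug)]
struct ToyPool {
    limit: usize,
    used: usize,
}

/// Outcome of a reservation request (hypothetical type).
#[derive(Debug, PartialEq)]
enum Grow {
    /// Reservation recorded and the pool is still within its limit.
    Ok,
    /// Reservation recorded although the pool is now over its limit.
    /// Only returned to non-spillable consumers.
    OverLimit,
    /// Reservation refused. The spillable caller should spill and retry.
    MustSpill,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Request `bytes` more memory. `can_spill` says whether the caller
    /// is able to spill to disk when refused.
    fn grow(&mut self, bytes: usize, can_spill: bool) -> Grow {
        let wanted = self.used.saturating_add(bytes);
        if wanted <= self.limit {
            self.used = wanted;
            Grow::Ok
        } else if can_spill {
            // Spillable consumers are held to the limit.
            Grow::MustSpill
        } else {
            // Non-spillable consumers are tracked but never refused.
            self.used = wanted;
            Grow::OverLimit
        }
    }

    /// Release `bytes`, for example after a spill.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

fn main() {
    let mut pool = ToyPool::new(100);

    // A spillable sort reserves 60 bytes.
    assert_eq!(pool.grow(60, true), Grow::Ok);
    // A non-spillable operator needs 70 bytes: recorded, no error.
    assert_eq!(pool.grow(70, false), Grow::OverLimit);
    // The sort asks for more and is told to spill instead.
    assert_eq!(pool.grow(10, true), Grow::MustSpill);
    // The sort spills and releases its 60 bytes.
    pool.shrink(60);

    println!("used after spill: {}", pool.used);
}
```

   The point of the sketch is that the non-spillable operator never fails its 
reservation, but its usage still counts against the limit, so the next 
spillable operator that asks for memory is the one that gets pushed to disk.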
   
   Also what @ding-young said:
   
   > Yes, when a non-spillable operator runs out of memory, it’s difficult to 
trigger spilling in another spillable operator to reclaim memory which seems to 
be a limitation currently.
   
   I think if we could make those two changes, DataFusion would be in a much 
better place w.r.t. spilling. There would still be a long tail of making as 
many operators spillable as possible, and making them as performant as we can 
when they do spill, but that becomes less important.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

