milenkovicm commented on issue #17334: URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3393169247
> Yes, when a non-spillable operator runs out of memory, it’s difficult to trigger spilling in another spillable operator to reclaim memory which seems to be a limitation currently. We need collaborative spilling like in spark, but I don't think current API can support it.
>
> Have you ever observed any cases where a non-spillable operator showed a memory usage spike (for example, due to skewness or similar factors)?

I can't really say; we only track the maximum memory used by non-spillable operators.

> I wonder what would be the solution for these frequent failures on non-spillable operators - especially when other concurrent operators are spillable. If the memory usage of non-spillable operators can be roughly estimated before execution, do you think it would make sense to bypass or pre-reserve memory for them, instead of continuously growing the shared memory reservation along the non-spillable path?

For me, tuning the spillable and unspillable limits is like solving an equation with two unknowns: it is very hard to get right, because the two numbers are linearly related. So we need to either treat one variable as a constant or add another equation.

Adding a separate pool, one for spillable and another for non-spillable memory (as, I believe, @2010YOUY01 also mentioned), would break the relation between spillable and unspillable memory and simplify things a bit: one variable can be approximated by a constant, leaving a relation with a single variable. The spillable pool is limited in order to trigger spilling; the unspillable pool may or may not be. If OS-level memory enforcement is used (cgroups), an unlimited unspillable pool may make sense. If we want to limit unspillable memory, a few empirical tests can give us a rough estimate of that limit.
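To make the two-pool idea concrete, here is a minimal, self-contained Rust sketch under stated assumptions. It is not DataFusion's `MemoryPool` API: `SplitPool`, `Kind`, `GrowError` and the limits are hypothetical names chosen for illustration. With one shared pool of size M, spilling starts when spillable usage reaches M minus current unspillable usage, so the spill point moves with the unspillable side. With split pools, spilling starts at a fixed spillable limit, and unspillable usage is checked against its own optional limit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Which budget a reservation draws from (hypothetical; not a DataFusion type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Spillable,
    Unspillable,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GrowError {
    /// Spillable budget is full: the operator should spill, then retry.
    SpillRequired { requested: usize, available: usize },
    /// Unspillable budget is full: nothing can be spilled, so the query fails.
    OutOfMemory { requested: usize, available: usize },
}

/// One independent budget; `limit == None` means unbounded
/// (e.g. when cgroups enforce memory at the OS level).
struct Budget {
    used: AtomicUsize,
    limit: Option<usize>,
}

impl Budget {
    fn new(limit: Option<usize>) -> Self {
        Budget { used: AtomicUsize::new(0), limit }
    }

    /// Atomically reserve `additional` bytes; on failure return the bytes left.
    fn try_grow(&self, additional: usize) -> Result<(), usize> {
        let limit = self.limit.unwrap_or(usize::MAX);
        self.used
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |used| {
                used.checked_add(additional).filter(|&total| total <= limit)
            })
            .map(|_| ())
            .map_err(|used| limit.saturating_sub(used))
    }

    fn shrink(&self, bytes: usize) {
        let prev = self.used.fetch_sub(bytes, Ordering::SeqCst);
        debug_assert!(prev >= bytes, "released more than was reserved");
    }
}

/// Hypothetical pool with separate spillable and unspillable budgets, so
/// unspillable usage no longer moves the point at which spilling starts.
pub struct SplitPool {
    spillable: Budget,
    unspillable: Budget,
}

impl SplitPool {
    pub fn new(spillable_limit: usize, unspillable_limit: Option<usize>) -> Self {
        SplitPool {
            spillable: Budget::new(Some(spillable_limit)),
            unspillable: Budget::new(unspillable_limit),
        }
    }

    fn budget(&self, kind: Kind) -> &Budget {
        match kind {
            Kind::Spillable => &self.spillable,
            Kind::Unspillable => &self.unspillable,
        }
    }

    /// Reserve memory from the budget matching `kind`.
    pub fn try_grow(&self, kind: Kind, additional: usize) -> Result<(), GrowError> {
        self.budget(kind).try_grow(additional).map_err(|available| match kind {
            Kind::Spillable => GrowError::SpillRequired { requested: additional, available },
            Kind::Unspillable => GrowError::OutOfMemory { requested: additional, available },
        })
    }

    pub fn shrink(&self, kind: Kind, bytes: usize) {
        self.budget(kind).shrink(bytes);
    }

    pub fn reserved(&self, kind: Kind) -> usize {
        self.budget(kind).used.load(Ordering::SeqCst)
    }
}
```

For example, `SplitPool::new(100, Some(50))` lets spillable operators reserve up to 100 bytes regardless of how much the unspillable side holds, while passing `None` as the unspillable limit leaves that side unbounded and relies on cgroups for the overall cap.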
