Samyak2 commented on issue #22526: URL: https://github.com/apache/datafusion/issues/22526#issuecomment-4623596957
> the reservation for this batch fails (even though theoretically the hash aggregation released the memory reservation, the new reservation for this RecordBatch could require a bit more memory, meaning it would fail) Not sure I follow. Why would this sliced record batch require more memory than what agg released? But the problem does make sense. Another example is `RepartitionExec`. Even if only one record batch is actually buffered in the channels, we would be reserving size for the whole buffer (this happens today too). -- > It would be nice if we could use something like: > > [datafusion/datafusion/physical-plan/src/spill/spill_manager.rs](https://github.com/apache/datafusion/blob/e7f7fa9929fdcca01df8c90023f961cdcdb217de/datafusion/physical-plan/src/spill/spill_manager.rs#L214) > > Line 214 in [e7f7fa9](/apache/datafusion/commit/e7f7fa9929fdcca01df8c90023f961cdcdb217de) > impl GetSlicedSize for RecordBatch { > > (but without breaking the functionality of the other operators) This also makes sense, but it would have the problem of undercounting in case of unreferenced data. One scenario is `StringViewArray`s that were `slice()`d/`.take()`n but not GC'd yet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
