milenkovicm commented on issue #3941: URL: https://github.com/apache/arrow-datafusion/issues/3941#issuecomment-1292030069
Thanks for your comment @yjshen. The concern raised in the previous comments is not the spill algorithm for aggregation; it is the interaction between `non-async` and `async` code.

As currently implemented, `GroupedHashAggregateStreamV2` lives in the non-async world, but the memory manager / memory consumer expose `async` methods, which can't be called from `poll_next`:

```rust
impl Stream for GroupedHashAggregateStreamV2 {
    type Item = ArrowResult<RecordBatch>;

    fn poll_next(
        mut self: std::pin::Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>> {
        let this = &mut *self;

        // compile error: `await` is only allowed inside `async` functions and blocks
        this.mem_manager.try_grow(200).await;

        // rest of the code which does aggregation and spill
    }
}
```

So the question is how to bridge the gap.
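One possible way to bridge the gap, sketched here with the standard library only, is to keep the in-flight `try_grow` future as a field of the stream and poll it from `poll_next` instead of awaiting it. Everything in this sketch is a hypothetical stand-in, not DataFusion's API: `AggregateStream`, the free function `try_grow`, `YieldOnce`, `NoopWaker` and `drain` are invented for illustration, and `poll_next` is an inherent method that only mirrors the shape of `Stream::poll_next`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Returns `Pending` once, then `Ready`; simulates a reservation that must wait.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask the executor to poll again
        Poll::Pending
    }
}

/// Hypothetical async reservation, standing in for the memory manager call.
async fn try_grow(requested: usize, limit: usize) -> Result<(), String> {
    YieldOnce(false).await;
    if requested <= limit {
        Ok(())
    } else {
        Err(format!("cannot reserve {requested} bytes"))
    }
}

type GrowFuture = Pin<Box<dyn Future<Output = Result<(), String>>>>;

/// Hypothetical stream: yields one "batch" (its reserved size) per reservation.
struct AggregateStream {
    batches_left: usize,
    limit: usize,
    pending_grow: Option<GrowFuture>, // reservation kept alive across polls
}

impl AggregateStream {
    fn new(batches: usize, limit: usize) -> Self {
        Self { batches_left: batches, limit, pending_grow: None }
    }

    /// Mirrors the shape of `Stream::poll_next` without using `.await`.
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Result<usize, String>>> {
        if self.batches_left == 0 {
            return Poll::Ready(None);
        }
        // Create the future only once; later polls resume the same future.
        if self.pending_grow.is_none() {
            self.pending_grow = Some(Box::pin(try_grow(200, self.limit)));
        }
        let fut = self.pending_grow.as_mut().expect("set just above");
        match fut.as_mut().poll(cx) {
            Poll::Pending => Poll::Pending, // the waker is already registered
            Poll::Ready(result) => {
                self.pending_grow = None;
                match result {
                    Ok(()) => {
                        self.batches_left -= 1;
                        Poll::Ready(Some(Ok(200)))
                    }
                    Err(e) => {
                        self.batches_left = 0; // end the stream after an error
                        Poll::Ready(Some(Err(e)))
                    }
                }
            }
        }
    }
}

/// Waker that does nothing; enough for the busy-polling driver below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the stream to completion; returns the items and the `Pending` count.
fn drain(mut stream: AggregateStream) -> (Vec<Result<usize, String>>, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let (mut items, mut pending_polls) = (Vec::new(), 0);
    loop {
        match stream.poll_next(&mut cx) {
            Poll::Pending => pending_polls += 1,
            Poll::Ready(Some(item)) => items.push(item),
            Poll::Ready(None) => return (items, pending_polls),
        }
    }
}
```

Calling `drain(AggregateStream::new(2, 1024))` polls until the stream ends, with each reservation returning `Pending` once before its batch is produced. The cost of this approach is one boxed future per reservation plus an extra state field. Whether that is acceptable on the aggregation hot path, compared with making the whole operator an `async` block wrapped into a stream, is a separate design question.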