[GitHub] [arrow-datafusion] milenkovicm commented on issue #3941: Generate runtime errors if the memory budget is exceeded [EPIC]

GitBox Wed, 26 Oct 2022 06:23:54 -0700


milenkovicm commented on issue #3941:
URL: https://github.com/apache/arrow-datafusion/issues/3941#issuecomment-1292030069


   Thanks for your comment, @yjshen.
   
   The concern raised in the previous comments is not about the spill algorithm for aggregation; it is about the interaction between `non-async` and `async` code.
   
   As currently implemented, `GroupedHashAggregateStreamV2` lives in the non-async world, but the memory manager / memory consumer expose `async` methods, which can't be called from `poll_next`:
   
   ```rust
   impl Stream for GroupedHashAggregateStreamV2 {
       type Item = ArrowResult<RecordBatch>;
   
       fn poll_next(
           mut self: std::pin::Pin<&mut Self>,
           cx: &mut Context<'_>,
       ) -> Poll<Option<Self::Item>> {
           let this = &mut *self;
           // compile error: `await` is only allowed inside `async` functions and blocks
           this.mem_manager.try_grow(200).await;
           // rest of the code, which does aggregation and spill
       }
   }
   ```
   So the question is how to bridge the gap.
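
One common way to call an `async` method from a poll-based function is to store the future in the stream's state and poll it by hand. The sketch below only illustrates that pattern and is not a proposal for the actual DataFusion code: `MemManager`, `AggStream`, `YieldOnce` and `poll_trace` are hypothetical names, `try_grow` is a stand-in with an assumed signature, and an inherent `poll_next` method replaces the `Stream` trait so that the example needs only the standard library.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Future that returns `Pending` once, standing in for "wait for memory".
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

/// Hypothetical memory manager; NOT DataFusion's actual type or signature.
struct MemManager;

impl MemManager {
    async fn try_grow(self: Arc<Self>, _bytes: usize) -> Result<(), String> {
        YieldOnce(false).await;
        Ok(())
    }
}

type GrowFuture = Pin<Box<dyn Future<Output = Result<(), String>> + Send>>;

/// Hypothetical stream state: it owns the in-flight `try_grow` future.
struct AggStream {
    mem_manager: Arc<MemManager>,
    pending_grow: Option<GrowFuture>,
    remaining: usize,
}

impl AggStream {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Result<usize, String>>> {
        if self.remaining == 0 {
            return Poll::Ready(None);
        }
        // Create the future once and keep it across calls to `poll_next`.
        if self.pending_grow.is_none() {
            let manager = Arc::clone(&self.mem_manager);
            let fut: GrowFuture = Box::pin(manager.try_grow(200));
            self.pending_grow = Some(fut);
        }
        let fut = self.pending_grow.as_mut().expect("future was just stored");
        // Poll by hand instead of using `.await`; `Pending` is passed on to the caller.
        let outcome = fut.as_mut().poll(cx);
        match outcome {
            Poll::Pending => Poll::Pending,
            Poll::Ready(result) => {
                self.pending_grow = None;
                self.remaining -= 1;
                // Aggregation and spill would happen here; we just report the grant.
                Poll::Ready(Some(result.map(|()| 200)))
            }
        }
    }
}

/// Waker that does nothing; enough to drive the sketch without a runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the stream three times and records what each poll returned.
fn poll_trace() -> Vec<&'static str> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut stream = AggStream {
        mem_manager: Arc::new(MemManager),
        pending_grow: None,
        remaining: 1,
    };
    (0..3)
        .map(|_| match stream.poll_next(&mut cx) {
            Poll::Pending => "pending",
            Poll::Ready(Some(_)) => "item",
            Poll::Ready(None) => "end",
        })
        .collect()
}
```

Calling `poll_trace()` returns `["pending", "item", "end"]`: the first poll parks on the stored future, the second completes it and yields an item, and the third ends the stream. The cost of this approach is one boxed future per reservation and extra state in the stream; whether that is acceptable here, or whether a synchronous reservation method would be a better fit, is exactly the open question.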
   


