alamb commented on PR #19360:
URL: https://github.com/apache/datafusion/pull/19360#issuecomment-3666437472

   > Edit: I was mistaken, the cross join is actually returning batches 
sufficiently frequently. It's actually the aggregation loop that's not 
yielding. I'm reminded of the original discussion we had on this topic in 
#16196. Whose responsibility is it to yield sufficiently frequently? Back then 
I argued that it's the looping code (i.e. the aggregation loop) that's 
responsible for this because that's where the root cause of the problem lies. 
   
   My memory was that the general consensus was that the operators themselves 
(ie. in this case the aggregation loop) should be yielding periodically to the 
scheduler, and that the cooperative() method was a workaround that gave us some 
more time to figure out how to do this (and tokio's budget wasn't yet public)
   
   > I've experimented a bit with a variant of what's suggested in this PR in 
https://github.com/pepijnve/datafusion/tree/coop_joins. 
   
   I really like this method  from 
https://github.com/pepijnve/datafusion/commit/aeecad3c4e88d68f3593c065f6b2266b20fb8809
   
   
   ```rust
   /// Consumes a unit of budget. If the task budget has been depleted 
`Poll::Pending` is returned.
   /// Otherwise, `Poll::Ready(()))` is returned.
   ///
   /// The task will only yield if its entire coop budget has been exhausted.
   /// This function can be used in order to insert optional yield points into 
long
   /// computations that do not use Tokio resources,
   /// without redundantly yielding to the runtime each time.
   pub fn consume_budget(cx: &mut Context<'_>) -> Poll<()> {
       tokio::task::coop::poll_proceed(cx).map(|r| r.made_progress())
   }
   ```
   
   @pepijnve  would your proposal be to add the appropriate `consume_budget` 
calls in group by hash instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to