zhuqi-lucas commented on issue #16353:
URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2969886007

   > > fixes any bugs / adds features over the current one,
   > > Is "just" a cleaner way to implement the same thing (this is also a fine 
thing to contribute as well).
   > 
   > There are a couple of benefits.
   > 
   > It removes the edge case seen in the interleave operator (or any `select!` 
style code in general). With the current per stream counter, one stream might 
want to yield, but the parent stream may decide to poll another stream in 
response which happens to be ready. The end result is that two cooperating 
streams may turn into a non-cooperating stream when they are merged. To fix this, you 
would need to adjust the merging operator as well and we're basically back 
where we started. If all cooperating streams use the same budget, then this 
problem goes away. Once the yield point has been hit, all cooperating streams 
will yield.
   
   So does this mean the sub-task corner case can be resolved, i.e. the change 
would fix the corner case shown in this gist? 
https://gist.github.com/pepijnve/0e1a66f98033c6c44c62a51fb9dbae5a
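
   To make the quoted edge case concrete, here is a rough std-only simulation 
(not DataFusion or Tokio code). `CountingStream`, `BudgetedStream` and the 
`select!`-style merge loops are hypothetical stand-ins I made up for 
illustration, and the limit of 4 is arbitrary. It counts how many items the 
merged stream produces before it first returns `Pending`.

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
enum Step {
    Ready,
    Pending,
}

/// Hypothetical leaf stream with its own private yield counter.
struct CountingStream {
    limit: u32,
    used: u32,
}

impl CountingStream {
    fn poll(&mut self) -> Step {
        if self.used == self.limit {
            // Private counter exhausted: yield once, then start over.
            self.used = 0;
            return Step::Pending;
        }
        self.used += 1;
        Step::Ready
    }
}

/// Hypothetical leaf stream that draws on a budget shared by the whole task.
struct BudgetedStream;

impl BudgetedStream {
    fn poll(&mut self, budget: &Cell<u32>) -> Step {
        match budget.get() {
            0 => Step::Pending,
            n => {
                budget.set(n - 1);
                Step::Ready
            }
        }
    }
}

/// Merge two counting streams `select!`-style: take the first child that is
/// ready, and only report `Pending` when every child is pending.
fn merged_run_with_counters(limit: u32) -> u32 {
    let mut streams = [
        CountingStream { limit, used: 0 },
        CountingStream { limit, used: 0 },
    ];
    let mut run = 0;
    loop {
        // `any` short-circuits, so later children are only polled when the
        // earlier ones returned `Pending`.
        let any_ready = streams.iter_mut().any(|s| s.poll() == Step::Ready);
        if !any_ready {
            return run;
        }
        run += 1;
    }
}

/// Same merge, but both children consume one shared budget.
fn merged_run_with_shared_budget(limit: u32) -> u32 {
    let budget = Cell::new(limit);
    let mut streams = [BudgetedStream, BudgetedStream];
    let mut run = 0;
    loop {
        let any_ready = streams.iter_mut().any(|s| s.poll(&budget) == Step::Ready);
        if !any_ready {
            return run;
        }
        run += 1;
    }
}

fn main() {
    // With a limit of 4 the merged stream yields after 24 items, not 4.
    println!("per-stream counters: {}", merged_run_with_counters(4));
    // With a shared budget of 4 it yields after exactly 4 items.
    println!("shared budget:       {}", merged_run_with_shared_budget(4));
}
```

   In this toy model the per-stream counters never line up until much later, 
so the merged stream runs six times longer than either child's limit before it 
yields, while the shared budget makes both children pending at the same moment.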
   
   > Using the task budget also avoids the 'redundant yield' problem in the 
current version. If you now do a simple `SELECT * FROM ...` query, by default 
you'll get a `Pending` after every 64 `Ready(RecordBatch)`. With the task 
budget you will only actually inject the `Pending` when it's actually 
necessary. The system automatically does the right thing.
   
   I am curious what the budget size is, since we can't configure it from 
DataFusion. Will it affect performance or anything else? It seems not, since 
we already rely on the budget through `RecordBatchReceiverStream`?
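
   As a way to reason about the 'redundant yield' point in the quote, here is 
a small std-only model (again not real Tokio or DataFusion code). Both 
functions are hypothetical. The limit of 64 comes from the quote; the real 
size of the task budget is not modelled, and I assume the budget is refilled 
whenever the task yields for its own reasons, for example while waiting on I/O.

```rust
/// Forced yields when the yield point is a private counter that knows
/// nothing about whether the task already yielded for other reasons.
fn forced_yields_with_counter(batches: u32, limit: u32) -> u32 {
    let mut used = 0;
    let mut forced = 0;
    for _ in 0..batches {
        if used == limit {
            // Counter exhausted: inject a `Pending` and start over.
            used = 0;
            forced += 1;
        }
        used += 1;
    }
    forced
}

/// Forced yields when the yield point consumes a task-wide budget.
/// Assumption: the budget is refilled every time the task yields naturally,
/// which here happens after every `natural_every` batches.
fn forced_yields_with_task_budget(batches: u32, budget: u32, natural_every: u32) -> u32 {
    let mut remaining = budget;
    let mut forced = 0;
    for i in 0..batches {
        if remaining == 0 {
            // Budget exhausted without a natural yield: inject a `Pending`.
            forced += 1;
            remaining = budget;
        }
        remaining -= 1;
        if (i + 1) % natural_every == 0 {
            // The task yielded on its own, so it gets a fresh budget.
            remaining = budget;
        }
    }
    forced
}

fn main() {
    // Private counter: yields are injected no matter what the task does.
    println!("counter:                 {}", forced_yields_with_counter(1000, 64));
    // Task budget, task yields naturally every 10 batches: nothing injected.
    println!("budget, natural yields:  {}", forced_yields_with_task_budget(1000, 64, 10));
    // Task budget, task never yields on its own: same as the counter.
    println!("budget, no natural yield: {}", forced_yields_with_task_budget(1000, 64, u32::MAX));
}
```

   Under these assumptions the budget only injects a `Pending` when the task 
would otherwise run the full 64 batches without yielding, which is how I read 
"only when it's actually necessary".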
   
   
   Another question:
   
   If all leaf nodes share a single budget, could one leaf node that consumes 
the budget very aggressively hurt overall fairness or performance?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

