zhuqi-lucas commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2969886007
> > fixes any bugs / adds features over the current one,
> > Is "just" cleaner way to implement the same thing (this is also a fine thing to contribute as well).
>
> There are a couple of benefits.
>
> It removes the edge case seen in the interleave operator (or any `select!` style code in general). With the current per stream counter, one stream might want to yield, but the parent stream may decide to poll another stream in response which happens to be ready. The end result is that two cooperating streams may turn into a non-cooperating stream when they are merged. To fix this, you would need to adjust the merging operator as well and we're basically back where we started. If all cooperating streams use the same budget, then this problem goes away. Once the yield point has been hit, all cooperating streams will yield.

So does this mean the sub-task corner case can be resolved, i.e. that it fixes the corner case provided in this link?
https://gist.github.com/pepijnve/0e1a66f98033c6c44c62a51fb9dbae5a

> Using the task budget also avoids the 'redundant yield' problem in the current version. If you now do a simple `SELECT * FROM ...` query, by default you'll get a `Pending` after every 64 `Ready(RecordBatch)`. With the task budget you will only actually inject the `Pending` when it's actually necessary. The system automatically does the right thing.

I am curious what the budget count is, since we can't configure it from DataFusion. Will that affect performance or anything else? It seems not, because we already use RecordBatchReceiverStream for the budget?

Another question: if we share one budget across all leaf nodes, will a leaf node that consumes the budget very aggressively affect overall fairness or performance?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
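To make the merge edge case quoted above concrete, here is a toy model of it in plain Rust. It is only a sketch under stated assumptions: `Source`, `PerStreamCounter`, `SharedBudget`, `poll_merged` and the two `first_yield_*` helpers are hypothetical names invented for illustration, not DataFusion or Tokio APIs, and the merge simply tries its inputs in order, which is one possible `select!`-style policy. The model only counts how many items a merged stream emits before it first returns `Pending`; it does not model the runtime resetting the budget.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::task::Poll;

/// Hypothetical stand-in for a stream: Ready(item) or Pending (a yield).
trait Source {
    fn poll_next(&mut self) -> Poll<u64>;
}

/// Endless source with its own counter: yields once after every `limit` items.
struct PerStreamCounter {
    limit: u32,
    used: u32,
}

impl Source for PerStreamCounter {
    fn poll_next(&mut self) -> Poll<u64> {
        if self.used == self.limit {
            self.used = 0; // the counter resets once this stream has yielded
            return Poll::Pending;
        }
        self.used += 1;
        Poll::Ready(1)
    }
}

/// Endless source that draws from one budget shared by every source in the task.
struct SharedBudget {
    budget: Rc<Cell<u32>>,
}

impl Source for SharedBudget {
    fn poll_next(&mut self) -> Poll<u64> {
        let left = self.budget.get();
        if left == 0 {
            return Poll::Pending; // budget exhausted: every source sees this
        }
        self.budget.set(left - 1);
        Poll::Ready(1)
    }
}

/// select!-style merge: try inputs in order, Pending only if all are Pending.
fn poll_merged(inputs: &mut [Box<dyn Source>]) -> Poll<u64> {
    for input in inputs.iter_mut() {
        if let Poll::Ready(v) = input.poll_next() {
            return Poll::Ready(v);
        }
    }
    Poll::Pending
}

/// Number of items the merged stream emits before it first yields.
fn items_before_yield(inputs: &mut [Box<dyn Source>]) -> u64 {
    let mut n = 0;
    while let Poll::Ready(_) = poll_merged(inputs) {
        n += 1;
    }
    n
}

/// Two sources, each with its own counter, merged together.
fn first_yield_per_stream(limit: u32) -> u64 {
    let mut inputs: Vec<Box<dyn Source>> = vec![
        Box::new(PerStreamCounter { limit, used: 0 }),
        Box::new(PerStreamCounter { limit, used: 0 }),
    ];
    items_before_yield(&mut inputs)
}

/// Two sources drawing from the same budget, merged together.
fn first_yield_shared(limit: u32) -> u64 {
    let budget = Rc::new(Cell::new(limit));
    let mut inputs: Vec<Box<dyn Source>> = vec![
        Box::new(SharedBudget { budget: Rc::clone(&budget) }),
        Box::new(SharedBudget { budget }),
    ];
    items_before_yield(&mut inputs)
}
```

In this model, with a limit of 64, `first_yield_shared(64)` returns 64, while `first_yield_per_stream(64)` returns 4224: each time the first input yields, the merge falls through to the second input, which is still ready, so the merged stream does not yield until both counters happen to expire on the same poll. A different polling order would give a different number, but the effect is the same.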
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org