pepijnve opened a new pull request, #16322: URL: https://github.com/apache/datafusion/pull/16322
## Which issue does this PR close?

- Closes #16321.

## Rationale for this change

`SortPreservingMergeStream` works in two phases. It first waits until every input stream is ready to emit. Once all of them are ready, it proceeds to the emit phase.

During the waiting phase, it polls each stream in round-robin fashion. If any stream returns `Pending`, the code self-wakes the current task and immediately returns `Pending`. This results in busy-waiting while waiting on, for instance, a `SortExec` that is still sorting its data, or on any other pipeline breaker. This works, but it wastes CPU cycles.

## What changes are included in this PR?

Rather than returning as soon as one stream is pending, poll each stream once. Only return `Pending` when there are still streams left that have not started emitting. This assumes that the pending streams are well behaved and, as the `Stream` contract requires, will wake the task when they need to be polled again. Note that this may surface bugs in other streams.

Rotation of `uninitiated_partitions` has been removed, since it is no longer useful. There was a comment in the code about an 'upstream buffer size increase', but I'm not sure what that was referring to.

## Are these changes tested?

Only by existing tests and manual testing.

## Are there any user-facing changes?

No
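The difference between the old and new waiting-phase behaviour described under "What changes are included in this PR?" can be illustrated with a minimal, std-only sketch. It is not DataFusion's code: `MockInput`, `poll_first`, `CountingWaker`, `poll_initial_busy` and `poll_initial` are hypothetical names. Each input is reduced to a "ready or not" flag instead of a real `RecordBatch` stream, and the `VecDeque<usize>` merely stands in for `uninitiated_partitions`. The old variant self-wakes on the first pending input. The new variant polls every uninitiated input once and relies on pending inputs having stored the waker.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that only counts wake-ups (stand-in for an executor).
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
    fn wake_by_ref(self: &Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical input partition, e.g. an upstream sort that is still running.
struct MockInput {
    ready: bool,
    polls: usize,
    waker: Option<Waker>,
}

impl MockInput {
    fn new() -> Self {
        MockInput { ready: false, polls: 0, waker: None }
    }

    /// Well-behaved poll: when not ready, store the waker so the task is
    /// woken later, as the `Stream` contract requires.
    fn poll_first(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        self.polls += 1;
        if self.ready {
            Poll::Ready(())
        } else {
            self.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }

    /// Simulates upstream work finishing: mark ready and wake the stored task.
    fn make_ready(&mut self) {
        self.ready = true;
        if let Some(w) = self.waker.take() {
            w.wake();
        }
    }
}

/// Old behaviour (sketch): stop at the first pending input, rotate it to the
/// back, and self-wake so the executor polls again right away (busy-waiting).
fn poll_initial_busy(
    uninitiated: &mut VecDeque<usize>,
    inputs: &mut [MockInput],
    cx: &mut Context<'_>,
) -> Poll<()> {
    while let Some(&i) = uninitiated.front() {
        match inputs[i].poll_first(cx) {
            Poll::Ready(()) => {
                uninitiated.pop_front();
            }
            Poll::Pending => {
                uninitiated.rotate_left(1);
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
        }
    }
    Poll::Ready(())
}

/// New behaviour (sketch): poll every uninitiated input exactly once, drop
/// the ones that became ready, and return `Pending` without self-waking
/// while any remain.
fn poll_initial(
    uninitiated: &mut VecDeque<usize>,
    inputs: &mut [MockInput],
    cx: &mut Context<'_>,
) -> Poll<()> {
    uninitiated.retain(|&i| inputs[i].poll_first(cx).is_pending());
    if uninitiated.is_empty() {
        Poll::Ready(())
    } else {
        Poll::Pending
    }
}
```

With two inputs that are both still pending, one call to `poll_initial` polls each input once and leaves the waker count at zero. The task is only woken again when an input calls `make_ready`. By contrast, `poll_initial_busy` increments the count on every call and never reaches the second input. In this sketch, rotation adds nothing to the new variant, because `retain` already visits every remaining input on each poll.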