wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1958892432
##########
datafusion/physical-optimizer/src/enforce_sorting/mod.rs:
##########
@@ -246,32 +282,50 @@ fn replace_with_partial_sort(
 /// This function turns plans of the form
 /// ```text
 /// "SortExec: expr=\[a@0 ASC\]",
-/// "  CoalescePartitionsExec",
-/// "    RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
+/// "  ...nodes..."
+/// "    CoalescePartitionsExec",
+/// "      RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
 /// ```
 /// to
 /// ```text
 /// "SortPreservingMergeExec: \[a@0 ASC\]",
 /// "  SortExec: expr=\[a@0 ASC\]",
-/// "    RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
+/// "    ...nodes..."
+/// "      RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
 /// ```
 /// by following connections from [`CoalescePartitionsExec`]s to [`SortExec`]s.
 /// By performing sorting in parallel, we can increase performance in some scenarios.
+///
+/// This requires that there are no nodes between the [`SortExec`] and [`CoalescePartitionsExec`]

Review Comment:
I think I need better wording. 😆

The context is built to find linked Sort->Coalesce cascades.

```
SortExec         ctx.data=false (to halt remove_bottleneck_in_subplan)
  ...nodes...    ctx.data=true  (i.e. linked in the cascade)
    Coalesce     ctx.data=true  (i.e. is a coalesce)
```

This linkage is then used to say "if we find a sort, remove the coalesce from the subplan". Specifically, [this code](https://github.com/apache/datafusion/blob/e4b78c7ed40c248cfc9596d53f1813b62c668249/datafusion/physical-optimizer/src/enforce_sorting/mod.rs#L334-L348).

If the link is broken, i.e. if `ctx.data=false`, we stop [going down the subplan](https://github.com/apache/datafusion/blob/e4b78c7ed40c248cfc9596d53f1813b62c668249/datafusion/physical-optimizer/src/enforce_sorting/mod.rs#L610) looking for a coalesce to remove. So the link only exists as long as no node in between breaks it; that is what "no nodes" was meant to convey.

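To make the linkage concrete, here is a toy Rust sketch of how such a flag could propagate up a single-child chain of plan nodes. It is not the DataFusion implementation: the `Node` enum, `link_flags`, and `sort_can_remove_coalesce` are hypothetical names made up for illustration, and `Aggregate` is simply assumed to require single-partition input.

```rust
/// Toy plan node kinds (hypothetical; not DataFusion types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Node {
    Sort,
    Coalesce,
    Repartition,
    /// Assumption: this node requires single-partition input.
    Aggregate,
    Other,
}

/// Computes a toy `ctx.data` flag for every node of a single-child chain.
/// The chain is listed from the root (index 0) down to the leaf.
fn link_flags(chain: &[Node]) -> Vec<bool> {
    let mut flags = vec![false; chain.len()];
    // The leaf has no child, so nothing is linked below it.
    let mut child_linked = false;
    // Walk bottom-up so each node can see its child's flag.
    for (i, node) in chain.iter().enumerate().rev() {
        let linked = match node {
            // A coalesce starts a link.
            Node::Coalesce => true,
            // A sort halts the search; an aggregate needs the coalesce
            // below it, so both break the link.
            Node::Sort | Node::Aggregate => false,
            // Every other node just passes its child's flag upward.
            Node::Repartition | Node::Other => child_linked,
        };
        flags[i] = linked;
        child_linked = linked;
    }
    flags
}

/// Returns true if the sort at the root is linked, without a break,
/// to a coalesce further down the chain.
fn sort_can_remove_coalesce(chain: &[Node]) -> bool {
    if chain.first() != Some(&Node::Sort) {
        return false;
    }
    let flags = link_flags(chain);
    // Descend from the sort's child; stop as soon as the link is broken.
    for (node, linked) in chain.iter().zip(flags).skip(1) {
        if !linked {
            return false;
        }
        if *node == Node::Coalesce {
            return true;
        }
    }
    false
}
```

Under this model, a chain of `Sort`, `Other`, `Coalesce`, `Repartition` stays linked from the sort down to the coalesce, while placing an `Aggregate` directly under the sort breaks the link, so the search stops before reaching the coalesce.
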
Example of an unlinked Coalesce->Sort, where the aggregate requires the coalesce in order to get single-partition input:

```
SortExec             ctx.data=false (to halt remove_bottleneck_in_subplan)
  AggregateExec      ctx.data=false (to break the link)
    ...nodes...      ctx.data=true  (i.e. linked in the cascade)
      Coalesce       ctx.data=true  (i.e. is a coalesce)
```