wiedld commented on code in PR #14637:
URL: https://github.com/apache/datafusion/pull/14637#discussion_r1958892432


##########
datafusion/physical-optimizer/src/enforce_sorting/mod.rs:
##########
@@ -246,32 +282,50 @@ fn replace_with_partial_sort(
 /// This function turns plans of the form
 /// ```text
 ///      "SortExec: expr=\[a@0 ASC\]",
-///      "  CoalescePartitionsExec",
-///      "    RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
+///      "  ...nodes..."
+///      "    CoalescePartitionsExec",
+///      "      RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
 /// ```
 /// to
 /// ```text
 ///      "SortPreservingMergeExec: \[a@0 ASC\]",
 ///      "  SortExec: expr=\[a@0 ASC\]",
-///      "    RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
+///      "    ...nodes..."
+///      "      RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
 /// ```
 /// by following connections from [`CoalescePartitionsExec`]s to [`SortExec`]s.
 /// By performing sorting in parallel, we can increase performance in some scenarios.
+///
+/// This requires that there are no nodes between the [`SortExec`] and [`CoalescePartitionsExec`]

Review Comment:
   Think I need better words. 😆 
   
   The context exists to find linked Sort->Coalesce cascades.
   ```
   SortExec  ctx.data=false (to halt remove_bottleneck_in_subplan)
      ...nodes...   ctx.data=true (i.e. are linked in the cascade)
            Coalesce  ctx.data=true (i.e. is a coalesce)
   ```
   
   This linkage is then used to say "if we find a sort, remove the coalesce from the subplan". Specifically, see [this code](https://github.com/apache/datafusion/blob/e4b78c7ed40c248cfc9596d53f1813b62c668249/datafusion/physical-optimizer/src/enforce_sorting/mod.rs#L334-L348). If the link is broken, i.e. if `ctx.data=false`, we stop [going down the subplan](https://github.com/apache/datafusion/blob/e4b78c7ed40c248cfc9596d53f1813b62c668249/datafusion/physical-optimizer/src/enforce_sorting/mod.rs#L610) looking for a coalesce to remove.
   
   So the link only exists as long as "no nodes" break the link.
   
   Example of an unlinked Sort->Coalesce, where the aggregate requires the coalesce for its single-partition input:
   ```
   SortExec  ctx.data=false (to halt remove_bottleneck_in_subplan)
      AggregateExec  ctx.data=false (breaks the link)
         ...nodes...   ctx.data=true (i.e. are linked in the cascade)
            Coalesce  ctx.data=true (i.e. is a coalesce)
   ```
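
To make the linkage rule concrete, here is a rough, self-contained sketch of the idea. It is a toy model, not the code in `enforce_sorting/mod.rs`: `Node`, `Ctx`, `remove_linked_coalesce`, `names`, and the two plan builders are hypothetical names invented for illustration, and the rule that an aggregate always breaks the link is a simplifying assumption.

```rust
/// Hypothetical stand-ins for physical plan operators (not DataFusion types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Node {
    Sort,
    Coalesce,
    /// Assumed here to always need single-partition input, so it breaks the link.
    Aggregate,
    /// Any operator that does not depend on the coalesce below it.
    Other(&'static str),
}

impl Node {
    fn name(self) -> &'static str {
        match self {
            Node::Sort => "Sort",
            Node::Coalesce => "Coalesce",
            Node::Aggregate => "Aggregate",
            Node::Other(n) => n,
        }
    }
}

/// A plan node plus the boolean `data` flag described in the comment above.
#[derive(Debug)]
struct Ctx {
    node: Node,
    data: bool,
    children: Vec<Ctx>,
}

impl Ctx {
    /// Children are built before their parent, so the flag is set bottom-up.
    fn new(node: Node, children: Vec<Ctx>) -> Self {
        let data = match node {
            Node::Coalesce => true,                 // is a coalesce
            Node::Sort | Node::Aggregate => false,  // halts / breaks the link
            Node::Other(_) => children.iter().any(|c| c.data), // linked in cascade
        };
        Ctx { node, data, children }
    }
}

/// Called on a sort: walk down only while `data` is true and splice out the
/// coalesce that the chain leads to. Stop as soon as the link is broken.
fn remove_linked_coalesce(mut ctx: Ctx) -> Ctx {
    let children = std::mem::take(&mut ctx.children);
    ctx.children = children
        .into_iter()
        .map(|child| {
            if !child.data {
                child // link broken: leave this subplan untouched
            } else if child.node == Node::Coalesce {
                // Replace the coalesce with its single input.
                child.children.into_iter().next().expect("coalesce has one input")
            } else {
                remove_linked_coalesce(child)
            }
        })
        .collect();
    ctx
}

/// Pre-order list of operator names, for inspecting a plan.
fn names(ctx: &Ctx) -> Vec<&'static str> {
    let mut out = vec![ctx.node.name()];
    for c in &ctx.children {
        out.extend(names(c));
    }
    out
}

/// Sort -> Filter -> Coalesce -> Repartition (linked).
fn linked_plan() -> Ctx {
    let repartition = Ctx::new(Node::Other("Repartition"), vec![]);
    let coalesce = Ctx::new(Node::Coalesce, vec![repartition]);
    let filter = Ctx::new(Node::Other("Filter"), vec![coalesce]);
    Ctx::new(Node::Sort, vec![filter])
}

/// Sort -> Aggregate -> Filter -> Coalesce -> Repartition (unlinked).
fn unlinked_plan() -> Ctx {
    let repartition = Ctx::new(Node::Other("Repartition"), vec![]);
    let coalesce = Ctx::new(Node::Coalesce, vec![repartition]);
    let filter = Ctx::new(Node::Other("Filter"), vec![coalesce]);
    let aggregate = Ctx::new(Node::Aggregate, vec![filter]);
    Ctx::new(Node::Sort, vec![aggregate])
}
```

In this model, `names(&remove_linked_coalesce(linked_plan()))` no longer contains `"Coalesce"`, while `unlinked_plan()` comes back unchanged because the aggregate's `data=false` stops the descent. The sketch does not add the `SortPreservingMergeExec` or refresh the flags after the rewrite.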



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

