[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4043: [Part2] Partition and Sort Enforcement, ExecutionPlan enhancement

GitBox Wed, 02 Nov 2022 17:09:00 -0700


mingmwang commented on code in PR #4043:
URL: https://github.com/apache/arrow-datafusion/pull/4043#discussion_r1012378962



##########
datafusion/core/src/physical_plan/windows/window_agg_exec.rs:
##########
@@ -119,22 +129,25 @@ impl ExecutionPlan for WindowAggExec {
         true
     }
 
-    fn relies_on_input_order(&self) -> bool {
-        true
+    fn required_input_ordering(&self) -> Vec<Option<&[PhysicalSortExpr]>> {
+        let sort_keys = self.sort_keys.as_deref();
+        vec![sort_keys]
     }
 
-    fn required_child_distribution(&self) -> Distribution {
-        if self
-            .window_expr()
-            .iter()
-            .all(|expr| expr.partition_by().is_empty())
-        {
-            Distribution::SinglePartition
+    fn required_input_distribution(&self) -> Vec<Distribution> {
+        if self.partition_keys.is_empty() {
+            warn!("No partition defined for WindowAggExec!!!");

Review Comment:
   Yes, this is a valid case, but without a `Partition By` clause the query
   might run very slowly, because the input collapses to
   `Distribution::SinglePartition`. I can remove the warning if we think it is
   not useful. There is one optimization we can do here in the future, after we
   add `Range Partitioning` (I can work on this, maybe next month): when there
   is no `Partition By` clause but only an `Order By`, then, depending on the
   window functions, in some cases we can make `required_input_distribution`
   return `SortDistribution`, so that `WindowAggExec` can still run in parallel.
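
A minimal sketch of the decision described above, under stated assumptions. The `Distribution` enum here is a simplified stand-in, not DataFusion's real type; the `SortDistribution` variant does not exist yet and only mirrors the idea in this comment; the `range_friendly` flag is a hypothetical placeholder for "the window functions allow range-partitioned input".

```rust
// Simplified stand-in for illustration only; NOT DataFusion's real enum.
#[derive(Debug, Clone, PartialEq)]
enum Distribution {
    /// All rows in one partition: correct, but no parallelism.
    SinglePartition,
    /// Rows hash-partitioned on the `Partition By` keys.
    HashPartitioned(Vec<String>),
    /// Hypothetical future variant: rows range-partitioned on the `Order By` keys.
    SortDistribution(Vec<String>),
}

/// Picks the distribution a window operator would require from its input.
/// `range_friendly` is an assumed flag, not an existing DataFusion concept.
fn required_input_distribution(
    partition_keys: &[String],
    order_keys: &[String],
    range_friendly: bool,
) -> Distribution {
    if !partition_keys.is_empty() {
        // Normal case: each hash partition can be processed in parallel.
        Distribution::HashPartitioned(partition_keys.to_vec())
    } else if !order_keys.is_empty() && range_friendly {
        // Proposed optimization: keep parallelism with only an `Order By`.
        Distribution::SortDistribution(order_keys.to_vec())
    } else {
        // Current behavior without `Partition By`: collapse to one partition.
        Distribution::SinglePartition
    }
}
```

With no partition keys and `range_friendly` set to `false`, the function falls back to `SinglePartition`, which is the slow case the warning is meant to flag.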


