mertak-synnada commented on code in PR #14418:
URL: https://github.com/apache/datafusion/pull/14418#discussion_r1940564238


##########
datafusion/physical-optimizer/src/limit_pushdown.rs:
##########
@@ -146,6 +146,15 @@ pub fn pushdown_limit_helper(
         global_state.skip = skip;
         global_state.fetch = fetch;
 
+        if limit_exec.input().as_any().is::<CoalescePartitionsExec>() {

Review Comment:
   While I agree with checking with_fetch API suggestion, please also check 
with the `combines_input_partitions()` helper function so that 
SortPreservingMerge can be affected as well.
   
    In the optimizer logic, we remove the Limit operators first, and then we 
add them to the lowest possible point at the plan, if the plan is "satisfied" 
we drop the limit information. So if the plan is combining input partitions, 
we're only adding a global limit if `skip` information is there, maybe we can 
identify if the local limits are enough or not and then decide to add the 
global limit at there. But in the end, I think rather than adding a global 
limit, we should be able to limit in the `CoalescePartitionsExec` or in 
`SortPreservingMerge` so that it won't unnecessarily push more data
    
    ```
    // Execution plans can't (yet) handle skip, so if we have one,
   // we still need to add a global limit
   if global_state.skip > 0 {
       new_plan =
           add_global_limit(new_plan, global_state.skip, global_state.fetch);
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to