mertak-synnada commented on code in PR #14418:
URL: https://github.com/apache/datafusion/pull/14418#discussion_r1940564238
##########
datafusion/physical-optimizer/src/limit_pushdown.rs:
##########
@@ -146,6 +146,15 @@ pub fn pushdown_limit_helper(
global_state.skip = skip;
global_state.fetch = fetch;
+ if limit_exec.input().as_any().is::<CoalescePartitionsExec>() {
Review Comment:
While I agree with the suggestion to check via the `with_fetch` API, please
also check with the `combines_input_partitions()` helper function so that
`SortPreservingMerge` is handled as well.
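
A minimal sketch of the broadened check, assuming `combines_input_partitions`
is the existing helper in `limit_pushdown.rs` that matches both combining
operators; the surrounding branch shape is illustrative only:
```
// Illustrative sketch: use the helper instead of the
// CoalescePartitionsExec-specific downcast, so SortPreservingMergeExec
// takes the same branch.
if combines_input_partitions(limit_exec.input()) {
    // ... same handling as the CoalescePartitionsExec-only branch ...
}
```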
In the optimizer logic, we first remove the Limit operators and then re-add
them at the lowest possible point in the plan; if the plan is already
"satisfied", we drop the limit information. So when the plan combines input
partitions, we currently only add a global limit if `skip` information is
present. Maybe we can identify whether the local limits are sufficient and
decide whether to add the global limit there. But in the end, rather than
adding a global limit, I think we should be able to apply the limit inside
`CoalescePartitionsExec` or `SortPreservingMerge` so that it doesn't
unnecessarily push more data:
```
// Execution plans can't (yet) handle skip, so if we have one,
// we still need to add a global limit
if global_state.skip > 0 {
new_plan =
add_global_limit(new_plan, global_state.skip, global_state.fetch);
}
```
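
A rough sketch of the alternative described above: try to push the fetch into
the combining operator via `ExecutionPlan::with_fetch`, and only fall back to
a `GlobalLimitExec` when that is not supported or a `skip` must be applied.
The fallback structure is an assumption for illustration, not the final shape:
```
// Illustrative sketch: prefer letting the combining operator enforce the
// fetch itself; only add a GlobalLimitExec when with_fetch is unsupported
// or a skip is present (skip still needs a global limit today).
if global_state.skip == 0 {
    if let Some(fetched) = new_plan.with_fetch(global_state.fetch) {
        // CoalescePartitionsExec / SortPreservingMergeExec now carries the
        // fetch, so no separate limit operator is needed.
        new_plan = fetched;
    } else {
        new_plan = add_global_limit(new_plan, 0, global_state.fetch);
    }
} else {
    // Execution plans can't (yet) handle skip, so keep the global limit.
    new_plan =
        add_global_limit(new_plan, global_state.skip, global_state.fetch);
}
```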