xiedeyantu opened a new pull request, #22150:
URL: https://github.com/apache/datafusion/pull/22150

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   `GlobalLimitExec` (and `LocalLimitExec`) are sometimes redundant: if the 
input can be proven via exact statistics to produce no more rows than the fetch 
value, the limit node does nothing and should be removed entirely.
   
   Previously, the `LimitPushdown` rule had no mechanism to eliminate such 
trivially satisfied limits. A query like `SELECT * FROM (VALUES ...) LIMIT 10`, 
where the input is a single-row `PlaceholderRowExec`, still carried an 
unnecessary `GlobalLimitExec` in the physical plan. Similarly, a `LIMIT N` over 
an `EmptyExec` or any other zero-row plan was retained.
   
   ## What changes are included in this PR?
   
   - Adds `limit_satisfied_by_input()` in `limit_pushdown.rs`: checks whether a 
plan's child provably produces at most `fetch` rows (requires `skip == 0` and a 
single output partition).
   - Adds `limit_eliminable_exact_num_rows()`: iteratively unwraps 
`ProjectionExec` wrappers and recognises `EmptyExec` (0 rows), 
`PlaceholderRowExec` (1 row), and any plan reporting `Precision::Exact(0)` rows 
as eliminable producers.
   - When a limit is statically satisfied, sets `global_state.satisfied = 
true` and returns early **without** resetting `fetch`/`skip`, so that nested 
limit nodes still receive the correct outer constraints to merge against.
   - Updates the `merges_local_limit_with_local_limit` snapshot: the result is 
now a bare `EmptyExec` (the limit is eliminated).
   - Updates `union.slt`: `ProjectionExec` over `PlaceholderRowExec` (1 row) 
with `fetch=3` no longer carries a redundant `GlobalLimitExec`.
   - Adds an `explain_tree.slt` test: `SELECT count(*) … LIMIT 10` over a 
two-row VALUES clause is correctly reduced to `ProjectionExec → 
PlaceholderRowExec` with no limit node.
   - Updates `copy.slt`: `fetch=10` is now correctly pushed all the way down to 
`DataSourceExec`.
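
   The check described in the first two bullets can be illustrated with a small, self-contained sketch. This is not the code in `limit_pushdown.rs`: the `Plan` enum, the `_sketch` function names, and the explicit `input_partitions` parameter are hypothetical stand-ins for DataFusion's plan nodes and statistics, chosen only so the example compiles without the crate.

```rust
/// Toy stand-in for a physical plan node (hypothetical; not DataFusion's
/// `ExecutionPlan` trait).
#[derive(Debug)]
enum Plan {
    /// Models `EmptyExec`: always 0 rows.
    Empty,
    /// Models a single-row `PlaceholderRowExec`.
    PlaceholderRow,
    /// Models `ProjectionExec`: preserves the row count of its child.
    Projection(Box<Plan>),
    /// Any other node; `exact_rows` models an exact row-count statistic,
    /// with `None` meaning "inexact or unknown".
    Other { exact_rows: Option<usize> },
}

/// Iteratively unwraps projections, then returns the exact row count only
/// for the producers the description lists: empty input (0 rows), a
/// placeholder row (1 row), or a plan with an exact statistic of 0 rows.
fn eliminable_exact_num_rows_sketch(plan: &Plan) -> Option<usize> {
    let mut current = plan;
    while let Plan::Projection(child) = current {
        current = &**child;
    }
    match current {
        Plan::Empty => Some(0),
        Plan::PlaceholderRow => Some(1),
        Plan::Other { exact_rows: Some(0) } => Some(0),
        _ => None,
    }
}

/// A limit is treated as satisfied only when there is no skip, the input
/// has a single output partition, and the input provably yields at most
/// `fetch` rows.
fn limit_satisfied_sketch(
    input: &Plan,
    input_partitions: usize,
    skip: usize,
    fetch: usize,
) -> bool {
    if skip != 0 || input_partitions != 1 {
        return false;
    }
    matches!(eliminable_exact_num_rows_sketch(input), Some(rows) if rows <= fetch)
}
```

   In this sketch, a projection over a placeholder row with `fetch = 3` counts as satisfied, while the same input with a non-zero `skip`, more than one partition, or `fetch = 0` does not. A node reporting an exact non-zero row count is deliberately not recognised, mirroring the description above, which lists only `Precision::Exact(0)`.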
   
   ## Are these changes tested?
   
   Yes.
   
   - `cargo fmt --all`
   - `cargo clippy --all-targets --all-features -- -D warnings`
   - `cargo test -p datafusion-core --test physical_optimizer limit`
   - `cargo test --features backtrace,parquet_encryption --profile ci --package 
datafusion-sqllogictest --test sqllogictests -- copy.slt union.slt 
explain_tree.slt`
   
   ## Are there any user-facing changes?
   
   No API changes. Physical plans for queries with `LIMIT` over statically 
small inputs (`EmptyExec`, `PlaceholderRowExec`, or zero-row tables) will now 
have the redundant `GlobalLimitExec`/`LocalLimitExec` nodes eliminated, 
resulting in simpler and slightly more efficient plans.

