xiedeyantu opened a new pull request, #22150:

URL: https://github.com/apache/datafusion/pull/22150
## Which issue does this PR close?

- Closes #.

## Rationale for this change

`GlobalLimitExec` (and `LocalLimitExec`) are sometimes redundant. If exact statistics prove that the input produces no more rows than the fetch value, the limit node is a no-op and should be removed entirely.

Previously, the `LimitPushdown` rule had no mechanism to eliminate such trivially satisfied limits. A query like `SELECT * FROM (VALUES ...) LIMIT 10`, where the input is a single-row `PlaceholderRowExec`, still carried an unnecessary `GlobalLimitExec` in the physical plan. Similarly, a `LIMIT N` over an `EmptyExec` or any other zero-row plan was retained.

## What changes are included in this PR?

- Adds `limit_satisfied_by_input()` in `limit_pushdown.rs`: checks whether a plan's child provably produces at most `fetch` rows (requires `skip == 0` and a single output partition).
- Adds `limit_eliminable_exact_num_rows()`: iteratively unwraps `ProjectionExec` wrappers and recognises `EmptyExec` (0 rows), `PlaceholderRowExec` (1 row), and any plan reporting `Precision::Exact(0)` rows as eliminable producers.
- When a limit is statically satisfied, sets `global_state.satisfied = true` and returns early, **without** resetting `fetch`/`skip`, so nested limit nodes still receive the correct outer constraints to merge against.
- Updates the `merges_local_limit_with_local_limit` snapshot: the result is now a bare `EmptyExec` (limit eliminated).
- Updates `union.slt`: `ProjectionExec` over `PlaceholderRowExec` (1 row) with `fetch=3` no longer carries a redundant `GlobalLimitExec`.
- Adds an `explain_tree.slt` test: `SELECT count(*) … LIMIT 10` over a two-row VALUES clause is reduced to `ProjectionExec → PlaceholderRowExec` with no limit node.
- Updates `copy.slt`: `fetch=10` is now pushed all the way down to `DataSourceExec`.

## Are these changes tested?

Yes.
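As a rough illustration of the elimination check listed above, the following Rust sketch models it on a toy plan tree. Everything in it is hypothetical: `Plan`, `RowCount`, `eliminable_exact_rows`, and `satisfied_by_input` are illustrative stand-ins, not DataFusion's `ExecutionPlan` or `Precision` types and not the PR's actual code. It only assumes the rules stated in the change list: unwrap projections, treat empty, single-row, and exact-zero producers as eliminable, and require `skip == 0` with a single output partition.

```rust
// Toy model only: NOT DataFusion's `ExecutionPlan`/`Precision` API.
// All names are hypothetical stand-ins that mirror the PR description.

/// Row-count statistic a node reports (hypothetical analogue of `Precision`).
#[derive(Debug)]
enum RowCount {
    Exact(usize),
    Inexact(usize),
}

/// Minimal stand-in for a physical plan tree.
#[derive(Debug)]
enum Plan {
    /// Produces zero rows (plays the role of `EmptyExec`).
    Empty,
    /// Produces exactly one row (plays the role of `PlaceholderRowExec`).
    PlaceholderRow,
    /// Row-count-preserving wrapper (plays the role of `ProjectionExec`).
    Projection(Box<Plan>),
    /// Any other operator, described only by its statistics and partitioning.
    Other { rows: RowCount, partitions: usize },
}

impl Plan {
    /// Number of output partitions; projections inherit their input's.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Empty | Plan::PlaceholderRow => 1,
            Plan::Projection(input) => input.output_partitions(),
            Plan::Other { partitions, .. } => *partitions,
        }
    }
}

/// Hypothetical analogue of `limit_eliminable_exact_num_rows()`: peel off
/// projections, then return an exact row count only for recognised producers.
fn eliminable_exact_rows(mut plan: &Plan) -> Option<usize> {
    while let Plan::Projection(input) = plan {
        plan = &**input; // step through the row-preserving wrapper
    }
    match plan {
        Plan::Empty => Some(0),
        Plan::PlaceholderRow => Some(1),
        Plan::Other { rows: RowCount::Exact(0), .. } => Some(0),
        // Inexact statistics or non-zero counts are never trusted here.
        _ => None,
    }
}

/// Hypothetical analogue of `limit_satisfied_by_input()`: the limit over
/// `child` is a no-op when nothing is skipped, the child has a single output
/// partition, and it provably yields at most `fetch` rows.
fn satisfied_by_input(child: &Plan, skip: usize, fetch: Option<usize>) -> bool {
    if skip != 0 || child.output_partitions() != 1 {
        return false;
    }
    match (fetch, eliminable_exact_rows(child)) {
        (Some(fetch), Some(rows)) => rows <= fetch,
        // No fetch to compare against, or no exact count: keep the limit.
        _ => false,
    }
}

fn main() {
    // LIMIT 3 over a projection of a single placeholder row: removable.
    let projected_row = Plan::Projection(Box::new(Plan::PlaceholderRow));
    println!("{}", satisfied_by_input(&projected_row, 0, Some(3))); // true

    // LIMIT 10 over an empty input: removable.
    println!("{}", satisfied_by_input(&Plan::Empty, 0, Some(10))); // true

    // LIMIT 0 over one row would drop that row, so the limit must stay.
    println!("{}", satisfied_by_input(&Plan::PlaceholderRow, 0, Some(0))); // false

    // Inexact statistics are not proof, so the limit must stay.
    let estimated = Plan::Other { rows: RowCount::Inexact(0), partitions: 1 };
    println!("{}", satisfied_by_input(&estimated, 0, Some(5))); // false
}
```

Under these assumptions the check is deliberately conservative: estimated row counts never remove a limit, and a one-row input still needs its `LIMIT 0`. The sketch leaves out how the rule records `global_state.satisfied` and carries the unchanged `fetch`/`skip` down to nested limits.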
Checked with:

- `cargo fmt --all`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test -p datafusion-core --test physical_optimizer limit`
- `cargo test --features backtrace,parquet_encryption --profile ci --package datafusion-sqllogictest --test sqllogictests -- copy.slt union.slt explain_tree.slt`

## Are there any user-facing changes?

No API changes. Physical plans for queries with `LIMIT` over statically small inputs (`EmptyExec`, `PlaceholderRowExec`, or zero-row tables) will now have the redundant `GlobalLimitExec`/`LocalLimitExec` nodes eliminated, resulting in simpler and slightly more efficient plans.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
