zhuqi-lucas opened a new pull request, #186: URL: https://github.com/apache/datafusion-site/pull/186
## Summary A walkthrough of the sort pushdown work landed and in flight on Apache DataFusion. Opening as a **draft** to share the narrative early — the in-flight PRs the post discusses are still in flight, but the structure and the merged work (Phase 1 #19064, Phase 2 #21182, BufferExec #21426, row-group reverse #18817) are in their final shape. ## What this post covers - Why `SortExec` is expensive, and what `Exact` / `Inexact` mean at *runtime* (static `fetch` vs `TopK` dynamic filter). - **Phase 1** ([#19064](https://github.com/apache/datafusion/pull/19064)) — the `PushdownSort` rule + reverse row-group case. - **Phase 2** ([#21182](https://github.com/apache/datafusion/pull/21182)) — statistics-based file sort that upgrades `Unsupported` to `Exact`, eliminating the `SortExec` on non-overlapping ASC scans. Includes the `BufferExec` compensation ([#21426](https://github.com/apache/datafusion/pull/21426)) so the SPM above doesn't lose its implicit memory buffer. - **Reverse scans** — today's row-group reverse ([#18817](https://github.com/apache/datafusion/pull/18817), Inexact) and the community decision to wait for arrow-rs page-level reverse ([apache/arrow-rs#9937](https://github.com/apache/arrow-rs/pull/9937)) before pursuing `Exact` reverse, after memory-profile pushback on the original row-group-level proposal. - **Benchmarks** — 2.1×–49× on the ASC-LIMIT `sort_pushdown` suite. - **What's next** — the dynamic / TopK-driven path (#21351 merged, #21733, #21712, #21956, #21580) including the precise RG-level pruning vs mid-stream early return distinction, and the `EnsureRequirements` unification ([#21976](https://github.com/apache/datafusion/pull/21976)). - Links into the prior [dynamic filters](https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/) and [limit pruning](https://datafusion.apache.org/blog/2026/03/20/limit-pruning/) posts so the series reads as a coherent thread. ## Why a draft A few of the in-flight PRs the "What's next" section references may evolve in review (e.g. #21580 may be split into smaller pieces, dynamic RG scheduling on top of #21351 is described but not yet on a PR). Opening as draft so we can adjust wording as those land or change shape — happy to flip to ready for review when the dust settles, or earlier if reviewers prefer. ## Test plan - [x] Rendered locally with the Pelican Docker image from the project README — images, internal links, code blocks, and tables all render correctly. - [x] All issue / PR / blog-post links checked against current state. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
