zhuqi-lucas opened a new pull request, #186:
URL: https://github.com/apache/datafusion-site/pull/186

   ## Summary
   
   A walkthrough of the sort pushdown work on Apache DataFusion, both merged 
and in flight. Opening as a **draft** to share the narrative early: several of 
the PRs the post discusses are still under review, but the structure and the 
coverage of the merged work (Phase 1 #19064, Phase 2 #21182, BufferExec #21426, 
row-group reverse #18817) are in their final shape.
   
   ## What this post covers
   
   - Why `SortExec` is expensive, and what `Exact` / `Inexact` mean at 
*runtime* (static `fetch` vs `TopK` dynamic filter).
   - **Phase 1** ([#19064](https://github.com/apache/datafusion/pull/19064)) — 
the `PushdownSort` rule + reverse row-group case.
   - **Phase 2** ([#21182](https://github.com/apache/datafusion/pull/21182)) — 
statistics-based file sort that upgrades `Unsupported` to `Exact`, eliminating 
the `SortExec` on non-overlapping ASC scans. Includes the `BufferExec` 
compensation ([#21426](https://github.com/apache/datafusion/pull/21426)) so the 
SPM above doesn't lose its implicit memory buffer.
   - **Reverse scans** — today's row-group reverse 
([#18817](https://github.com/apache/datafusion/pull/18817), Inexact) and the 
community decision to wait for arrow-rs page-level reverse 
([apache/arrow-rs#9937](https://github.com/apache/arrow-rs/pull/9937)) before 
pursuing `Exact` reverse, after memory-profile pushback on the original 
row-group-level proposal.
   - **Benchmarks** — 2.1×–49× on the ASC-LIMIT `sort_pushdown` suite.
   - **What's next** — the dynamic / TopK-driven path (#21351 merged, #21733, 
#21712, #21956, #21580), including the distinction between precise RG-level 
pruning and mid-stream early return, and the `EnsureRequirements` unification 
([#21976](https://github.com/apache/datafusion/pull/21976)).
   - Links into the prior [dynamic 
filters](https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/) and 
[limit pruning](https://datafusion.apache.org/blog/2026/03/20/limit-pruning/) 
posts so the series reads as a coherent thread.
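
To make the Phase 2 bullet concrete, here is a minimal sketch of the idea behind a statistics-based file sort: order the files by the minimum of the sort column, then check that no file's range overlaps the next. `FileStats`, `Pushdown`, and `plan_ascending_scan` are hypothetical names for illustration only, not DataFusion APIs, and the sketch assumes each file is already sorted ascending internally.

```rust
/// Hypothetical per-file min/max statistics for the sort column.
/// This is an illustrative type, not a DataFusion struct.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    name: &'static str,
    min: i64,
    max: i64,
}

/// Hypothetical outcome, loosely following the terminology in the post.
#[derive(Debug, PartialEq)]
enum Pushdown {
    /// Files can be read in this order and the output is fully sorted,
    /// so no sort is needed above the scan.
    Exact(Vec<FileStats>),
    /// File ranges overlap, so the scan alone cannot guarantee order.
    Unsupported,
}

/// Sort files by their minimum value, then verify that each file ends
/// before (or exactly where) the next one begins.
/// Assumption: every file is internally sorted ascending.
fn plan_ascending_scan(mut files: Vec<FileStats>) -> Pushdown {
    files.sort_by_key(|f| (f.min, f.max));
    let non_overlapping = files
        .windows(2)
        .all(|pair| pair[0].max <= pair[1].min);
    if non_overlapping {
        Pushdown::Exact(files)
    } else {
        Pushdown::Unsupported
    }
}
```

Under these assumptions, files with ranges `[10, 19]` and `[0, 9]` are reordered and reported as `Exact`, while `[0, 15]` and `[10, 19]` overlap and remain `Unsupported`. The actual rule in #21182 may differ in how it handles nulls, multi-column orderings, and missing statistics.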
   
   ## Why a draft
   
   A few of the in-flight PRs that the "What's next" section references may 
evolve in review (e.g. #21580 may be split into smaller pieces, and dynamic RG 
scheduling on top of #21351 is described but does not yet have a PR). Opening 
as a draft so we can adjust wording as those land or change shape. Happy to 
flip to ready for review when the dust settles, or earlier if reviewers prefer.
   
   ## Test plan
   
   - [x] Rendered locally with the Pelican Docker image from the project README 
— images, internal links, code blocks, and tables all render correctly.
   - [x] All issue / PR / blog-post links checked against current state.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

