[PR] perf: Coalesce page fetches when RowSelection selects all rows [arrow-rs]

via GitHub Thu, 19 Mar 2026 01:08:32 -0700


Dandandan opened a new pull request, #9578:
URL: https://github.com/apache/arrow-rs/pull/9578


   ## Summary
   
   - When a `RowSelection` selects every row in a row group, `fetch_ranges` now 
treats it as no selection, producing a single whole-column-chunk I/O request 
instead of N individual page requests
   - This reduces the number of I/O requests for subsequent filter predicates 
when an earlier predicate passes all rows
   
   ## Details
   
   In `InMemoryRowGroup::fetch_ranges`, when both a `RowSelection` and an 
`OffsetIndex` are present, the code enters a page-level fetch path that uses 
`scan_ranges()` to produce individual page ranges. Even when the selection 
covers all rows, this produces N separate ranges (one per page).
   
   The fix: before entering the page-level path, check if the selection's 
`row_count()` equals the row group's total row count. If so, drop the selection 
and take the simpler whole-column-chunk path.
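
The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual arrow-rs code: `Selector`, `selected_row_count`, `overlaps_selected`, and `plan_fetch` are hypothetical stand-ins, and the page-level branch is a simplified substitute for what `scan_ranges()` does. It assumes the selection spans exactly the rows of the row group.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one run of a row selection:
/// either skip or select `row_count` consecutive rows.
#[derive(Debug, Clone, Copy)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Total number of rows the selection keeps.
fn selected_row_count(selection: &[Selector]) -> usize {
    selection.iter().filter(|s| !s.skip).map(|s| s.row_count).sum()
}

/// True if any selected run intersects the row interval `start..end`.
fn overlaps_selected(selection: &[Selector], start: usize, end: usize) -> bool {
    let mut row = 0;
    for s in selection {
        let run_end = row + s.row_count;
        if !s.skip && s.row_count > 0 && row < end && run_end > start {
            return true;
        }
        row = run_end;
    }
    false
}

/// Plan the byte ranges to fetch for one column chunk.
/// `pages` holds (first_row_index, byte_range) per page, sorted by first row.
fn plan_fetch(
    selection: Option<&[Selector]>,
    pages: &[(usize, Range<u64>)],
    chunk: Range<u64>,
    total_rows: usize,
) -> Vec<Range<u64>> {
    // The check from the PR description: a selection that keeps every row
    // is treated the same as no selection at all.
    let selection = selection.filter(|s| selected_row_count(s) != total_rows);

    match selection {
        // Whole-column-chunk path: a single I/O request.
        None => vec![chunk],
        // Page-level path: one range per page that contains a selected row.
        Some(sel) => pages
            .iter()
            .enumerate()
            .filter(|(i, (first_row, _))| {
                let end_row = pages.get(i + 1).map(|p| p.0).unwrap_or(total_rows);
                overlaps_selected(sel, *first_row, end_row)
            })
            .map(|(_, (_, bytes))| bytes.clone())
            .collect(),
    }
}
```

In this sketch, a selection that keeps all rows yields the single chunk range, while a partial selection still yields only the ranges of the pages it touches.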
   
   This commonly happens when a multi-predicate `RowFilter` has an early 
predicate that passes all rows in a row group (e.g., `CounterID = 62` on a row 
group where all rows have `CounterID = 62`).
   
   ## Test plan
   
   - [x] Existing tests pass (snapshot updated to reflect fewer I/O requests)
   - [x] `test_read_multiple_row_filter` verifies the coalesced fetch pattern
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
