Dandandan opened a new pull request, #9578: URL: https://github.com/apache/arrow-rs/pull/9578
## Summary

- When a `RowSelection` selects every row in a row group, `fetch_ranges` now treats it as no selection, producing a single whole-column-chunk I/O request instead of N individual page requests
- This reduces the number of I/O requests for subsequent filter predicates when an earlier predicate passes all rows

## Details

In `InMemoryRowGroup::fetch_ranges`, when both a `RowSelection` and an `OffsetIndex` are present, the code enters a page-level fetch path that uses `scan_ranges()` to produce individual page ranges. Even when the selection covers all rows, this produces N separate ranges (one per page).

The fix: before entering the page-level path, check whether the selection's `row_count()` equals the row group's total row count. If so, drop the selection and take the simpler whole-column-chunk path.

This commonly happens when a multi-predicate `RowFilter` has an early predicate that passes all rows in a row group (e.g., `CounterID = 62` on a row group where all rows have `CounterID = 62`).

## Test plan

- [x] Existing tests pass (snapshot updated to reflect fewer I/O requests)
- [x] `test_read_multiple_row_filter` verifies the coalesced fetch pattern

🤖 Generated with [Claude Code](https://claude.com/claude-code)
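For reviewers who want the idea in isolation, here is a minimal, self-contained sketch of the check described under Details. It is not the arrow-rs code: `Run`, `Page`, `selected_rows`, `effective_selection`, and `plan_fetch` are hypothetical stand-ins for `RowSelection`, the `OffsetIndex` page locations, and `fetch_ranges`. It also assumes the selection never covers more rows than the row group contains.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one run of a row selection:
/// either select or skip `row_count` consecutive rows.
#[derive(Debug, Clone, Copy)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical stand-in for one page of a column chunk:
/// its byte range in the file and the rows it holds.
#[derive(Debug, Clone)]
pub struct Page {
    pub bytes: Range<u64>,
    pub rows: Range<usize>,
}

/// Number of rows the selection keeps.
pub fn selected_rows(selection: &[Run]) -> usize {
    selection
        .iter()
        .filter(|run| !run.skip)
        .map(|run| run.row_count)
        .sum()
}

/// Drops the selection when it keeps every row in the row group,
/// so that the caller falls through to the whole-chunk path.
pub fn effective_selection(selection: Option<&[Run]>, total_rows: usize) -> Option<&[Run]> {
    match selection {
        Some(runs) if selected_rows(runs) != total_rows => Some(runs),
        _ => None,
    }
}

/// Plans the byte ranges to fetch for one column chunk.
pub fn plan_fetch(
    chunk: Range<u64>,
    pages: &[Page],
    selection: Option<&[Run]>,
    total_rows: usize,
) -> Vec<Range<u64>> {
    match effective_selection(selection, total_rows) {
        // No selection, or one that keeps all rows: one request for the chunk.
        None => vec![chunk],
        // Partial selection: one request per page that holds a selected row.
        Some(runs) => {
            let mut selected: Vec<Range<usize>> = Vec::new();
            let mut start = 0usize;
            for run in runs {
                let end = start + run.row_count;
                if !run.skip && run.row_count > 0 {
                    selected.push(start..end);
                }
                start = end;
            }
            pages
                .iter()
                .filter(|page| {
                    selected
                        .iter()
                        .any(|s| s.start < page.rows.end && page.rows.start < s.end)
                })
                .map(|page| page.bytes.clone())
                .collect()
        }
    }
}
```

In this sketch, a selection that keeps all 30 rows of a three-page chunk yields a single range for the whole chunk, while a selection that skips the first 10 rows yields one range per remaining page.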
