[I] ParquetPushDecoder: expose the next row-group index that try_next_reader will yield [arrow-rs]

via GitHub Tue, 16 Jun 2026 07:12:04 -0700


zhuqi-lucas opened a new issue, #10148:
URL: https://github.com/apache/arrow-rs/issues/10148


   ## Description
   
   `ParquetPushDecoder` advertises `is_at_row_group_boundary()` and 
`row_groups_remaining()`, but a consumer that needs to track per-RG state in 
lock-step with the decoder has no way to ask **which row-group index** the next 
`try_next_reader()` call will produce a reader for. Today the call sites have 
to *assume* a 1:1 correspondence between the row-group list passed to 
`with_row_groups(...)` and the readers returned in order. That assumption 
breaks silently when page-index pruning eliminates every page of a row group: 
the decoder skips that RG internally and the next reader is for the RG after 
it, with no observable signal.
   
   ## Concrete example
   
   In DataFusion 
([apache/datafusion#22450](https://github.com/apache/datafusion/pull/22450)) we 
maintain an `rg_plan: VecDeque<RgEntry>` parallel to the decoder's row-group 
list so we can consult per-RG metadata (statistics for runtime row-group 
pruning, fully-matched flag for skipping `RowFilter` evaluation, etc.) at each 
boundary. The current implementation `pop_front()`s the queue every time 
`try_next_reader()` yields a reader. With this query:
   
   ```sql
   WHERE species > 'M' AND s >= 50 ORDER BY species LIMIT 3
   ```
   
   against a 3-row-group file where every page of RG 1 is page-index-pruned, 
the trace is:
   
   - `rg_plan = [RG1 (not fully-matched), RG2 (fully-matched), RG3 (not 
fully-matched)]`
   - arrow-rs page-index-prunes RG 1 entirely
   - `try_next_reader()` returns the reader for **RG 2**; our code pops RG 1 
from the queue, so the queue is now one entry behind the decoder
   - The next boundary check looks at the new head (RG 2), sees 
`fully-matched`, triggers a per-RG `RowFilter` toggle, and calls 
`into_builder().with_row_filter(empty).with_row_groups([2, 3]).build()`
   - The rebuilt decoder dutifully reads RG 2 again, producing duplicate output
   
   The final query result is still correct because TopK ranks the rows, but the 
scan emits 2x the rows it should and downstream operators do extra work. Any 
consumer that wants to drive per-RG state alongside the decoder (per-RG 
pruners, fully-matched filter toggles, custom statistics, etc.) hits this.
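
The drift described above can be reproduced without arrow-rs at all. The following is a minimal sketch, not the DataFusion or arrow-rs code: `MockDecoder`, `next_reader`, and `naive_lockstep` are hypothetical names invented for illustration, and the mock only models "skip any row group whose pages were all pruned".

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the push decoder (NOT the arrow-rs type).
/// It yields file-level row-group indices and silently skips any row
/// group listed in `fully_pruned`.
struct MockDecoder {
    remaining: VecDeque<usize>,
    fully_pruned: Vec<usize>,
}

impl MockDecoder {
    fn new(row_groups: &[usize], fully_pruned: &[usize]) -> Self {
        Self {
            remaining: row_groups.iter().copied().collect(),
            fully_pruned: fully_pruned.to_vec(),
        }
    }

    /// Mimics the observable behaviour of `try_next_reader()`: returns the
    /// index of the row group actually read next, with no skip signal.
    fn next_reader(&mut self) -> Option<usize> {
        while let Some(rg) = self.remaining.pop_front() {
            if !self.fully_pruned.contains(&rg) {
                return Some(rg);
            }
        }
        None
    }
}

/// Pops one plan entry per reader, as the consumer in the trace does.
/// Returns (row group the consumer assumed, row group actually read).
fn naive_lockstep(row_groups: &[usize], fully_pruned: &[usize]) -> Vec<(usize, usize)> {
    let mut plan: VecDeque<usize> = row_groups.iter().copied().collect();
    let mut decoder = MockDecoder::new(row_groups, fully_pruned);
    let mut pairs = Vec::new();
    while let Some(actual) = decoder.next_reader() {
        if let Some(assumed) = plan.pop_front() {
            pairs.push((assumed, actual));
        }
    }
    pairs
}
```

With row groups `[1, 2, 3]` and RG 1 fully pruned, `naive_lockstep(&[1, 2, 3], &[1])` returns `[(1, 2), (2, 3)]`: every pair is mismatched from the first reader onward, and nothing in the decoder's output indicates it.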
   
   ## Proposed API
   
   Add a `peek_next_row_group(&self) -> Option<usize>` accessor on 
`ParquetPushDecoder` (and the underlying `RemainingRowGroups`). When 
`is_at_row_group_boundary()` is `true`, this returns `Some(idx)` where `idx` is 
the file-level row-group index that the next `try_next_reader()` call will 
yield a reader for — i.e. after any page-index-driven skipping the decoder will 
perform internally. When the decoder is mid-row-group or finished, it returns 
`None`.
   
   An alternative shape would be `remaining_row_group_indices(&self) -> 
&[usize]` — the full sequence the decoder still intends to read. Either works 
for our use case.
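
As a rough illustration of how a consumer might use the proposed accessor, here is a sketch against a mock. Everything in it is an assumption: `MockDecoder` is a hypothetical stand-in, `peek_next_row_group` is the accessor proposed in this issue (it does not exist in arrow-rs today), and the mock ignores the mid-row-group state, treating every call as being at a boundary.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the push decoder (NOT the arrow-rs type).
struct MockDecoder {
    remaining: VecDeque<usize>,
    fully_pruned: Vec<usize>,
}

impl MockDecoder {
    fn new(row_groups: &[usize], fully_pruned: &[usize]) -> Self {
        Self {
            remaining: row_groups.iter().copied().collect(),
            fully_pruned: fully_pruned.to_vec(),
        }
    }

    /// Sketch of the PROPOSED accessor: the file-level index the next
    /// reader will be for, after internal skipping. Does not advance.
    fn peek_next_row_group(&self) -> Option<usize> {
        self.remaining
            .iter()
            .copied()
            .find(|rg| !self.fully_pruned.contains(rg))
    }

    /// Mimics `try_next_reader()`: advances past skipped row groups.
    fn next_reader(&mut self) -> Option<usize> {
        while let Some(rg) = self.remaining.pop_front() {
            if !self.fully_pruned.contains(&rg) {
                return Some(rg);
            }
        }
        None
    }
}

/// Realigns the plan queue with the decoder before each reader.
/// Returns (plan entry consulted, row group actually read).
fn aligned_lockstep(row_groups: &[usize], fully_pruned: &[usize]) -> Vec<(usize, usize)> {
    let mut plan: VecDeque<usize> = row_groups.iter().copied().collect();
    let mut decoder = MockDecoder::new(row_groups, fully_pruned);
    let mut pairs = Vec::new();
    while let Some(next) = decoder.peek_next_row_group() {
        // Discard plan entries for row groups the decoder will skip.
        while let Some(&head) = plan.front() {
            if head == next {
                break;
            }
            plan.pop_front();
        }
        if let (Some(entry), Some(actual)) = (plan.pop_front(), decoder.next_reader()) {
            pairs.push((entry, actual));
        }
    }
    pairs
}
```

For the scenario in the concrete example, `aligned_lockstep(&[1, 2, 3], &[1])` returns `[(2, 2), (3, 3)]`: the entry for RG 1 is dropped before the first reader, so the plan entry consulted always matches the row group being read.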
   
   ## Why this matters
   
   This isn't specific to DataFusion — any push-decoder consumer that needs 
per-RG bookkeeping aligned with what the decoder is actually doing will run 
into this. The information already exists inside `RemainingRowGroups`; there 
is just no public accessor for it.
   
   Happy to send a PR if there's interest.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
