Re: [PR] feat(parquet): make `PushBuffers` boundary-agnostic for prefetch IO [arrow-rs]

via GitHub Tue, 14 Apr 2026 21:29:50 -0700


HippoBaro commented on PR #9697:
URL: https://github.com/apache/arrow-rs/pull/9697#issuecomment-4249093758


   I've pushed two follow-up commits that address the feedback.
   
   The watermark is gone, replaced by per-range release backed by a `BTreeMap` 
(as @alamb suggested). Row group column chunks are released individually when 
consumed or skipped — no monotonicity assumption, no restriction on file layout 
or access order. This fixes the reverse-scan regression @nathanb9 found.
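
To make the per-range release concrete, here is a minimal sketch, not the code in this PR. `RangeStore`, `push`, `release`, and `buffered_bytes` are hypothetical names, and `Vec<u8>` stands in for whatever buffer type `PushBuffers` actually holds. Buffers are keyed by starting file offset in a `BTreeMap`, so a column chunk's byte range can be released in any order without tracking a watermark.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical stand-in for the buffer store; not the real `PushBuffers`.
#[derive(Default)]
struct RangeStore {
    /// Buffered bytes keyed by their starting file offset.
    buffers: BTreeMap<u64, Vec<u8>>,
}

impl RangeStore {
    /// Store a buffer that starts at `offset` in the file.
    fn push(&mut self, offset: u64, data: Vec<u8>) {
        self.buffers.insert(offset, data);
    }

    /// Drop every buffer that lies entirely inside `range` and return the
    /// number of bytes freed. Buffers that extend past `range.end` are kept.
    fn release(&mut self, range: Range<u64>) -> usize {
        if range.start >= range.end {
            return 0; // empty or inverted range: nothing to release
        }
        // Collect matching keys first so the map can be mutated afterwards.
        let keys: Vec<u64> = self
            .buffers
            .range(range.start..range.end)
            .filter(|(start, data)| **start + data.len() as u64 <= range.end)
            .map(|(start, _)| *start)
            .collect();
        keys.iter()
            .filter_map(|k| self.buffers.remove(k))
            .map(|b| b.len())
            .sum()
    }

    /// Total number of bytes currently held.
    fn buffered_bytes(&self) -> usize {
        self.buffers.values().map(Vec::len).sum()
    }
}
```

Because lookup is by range rather than by a single high-water offset, releasing `200..230` before `0..10` works the same as the forward order, which is the property a reverse scan needs.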
   
   On the prefetching side, incoming buffers are now filtered at push time 
against the column chunk byte ranges of the queued row groups, so data the 
decoder will never consume is discarded before it enters `PushBuffers`. The IO 
layer can push anything (coalesced ranges, prefetched data, even the entire 
file): the decoder sheds what it doesn't need and releases the rest in 
row-group increments, regardless of how the data is laid out in the file.
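
The push-time filtering can be pictured roughly as follows. This is a sketch under assumptions, not the PR's implementation: `filter_incoming` and its parameters are hypothetical, and it copies into `Vec<u8>` to stay self-contained, whereas a real implementation would likely slice the incoming buffer without copying. Each incoming buffer is clipped against the byte ranges the queued row groups still need, and anything outside those ranges is dropped.

```rust
use std::ops::Range;

/// Return the pieces of an incoming buffer that overlap a needed byte range.
/// `offset` is the file offset of `data[0]`; `needed` lists the column chunk
/// byte ranges still wanted by queued row groups (hypothetical input).
fn filter_incoming(
    offset: u64,
    data: &[u8],
    needed: &[Range<u64>],
) -> Vec<(u64, Vec<u8>)> {
    let end = offset + data.len() as u64;
    let mut kept = Vec::new();
    for r in needed {
        // Intersect the incoming buffer [offset, end) with the needed range.
        let lo = r.start.max(offset);
        let hi = r.end.min(end);
        if lo < hi {
            let slice = &data[(lo - offset) as usize..(hi - offset) as usize];
            kept.push((lo, slice.to_vec()));
        }
    }
    kept
}
```

Under this scheme, pushing a whole 60-byte file when only `10..20` and `40..50` are needed retains 20 bytes, and a buffer covering only `20..40` is discarded outright.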
   
   > I was thinking more about this change and I thought maybe we could take a 
step back and figure out what you are trying to accomplish
   
   I'm interested to know your thoughts on this latest version, but I'm 
completely open to shifting direction if you feel this is not the right way to 
go about it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
