HippoBaro commented on PR #9697: URL: https://github.com/apache/arrow-rs/pull/9697#issuecomment-4249093758
I've pushed two follow-up commits that address the feedback.

The watermark is gone, replaced by per-range release backed by a `BTreeMap` (as @alamb suggested). Row group column chunks are released individually when consumed or skipped: no monotonicity assumption, no restriction on file layout or access order. This fixes the reverse-scan regression @nathanb9 found.

On the prefetching side, incoming buffers are now filtered at push time against the column chunk byte ranges of the queued row groups, so data the decoder will never consume is discarded before it enters `PushBuffers`. The IO layer can push anything (coalesced ranges, prefetched data, the entire file), and the decoder sheds what it doesn't need and releases the rest in row-group increments, regardless of how the column chunks are laid out in the file.

> I was thinking more about this change and I thought maybe we could take a step back and figure out what you are trying to accomplish

I'm interested to know your thoughts on this latest version, but I'm completely open to shifting direction if you feel this is not the right way to go about it!
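To make the two mechanisms concrete, here is a minimal sketch of the idea, not the code in the PR. `RangeBuffers`, `push`, `release`, and `buffered_bytes` are hypothetical names chosen for illustration, and the real `PushBuffers` type may be structured quite differently. It assumes the needed column chunk ranges are known up front and do not overlap.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical stand-in for the decoder's buffer store
/// (illustration only, not the arrow-rs `PushBuffers` type).
#[derive(Default)]
struct RangeBuffers {
    /// Buffered bytes keyed by their starting file offset.
    buffers: BTreeMap<u64, Vec<u8>>,
}

impl RangeBuffers {
    /// Filter at push time: keep only the parts of `data` (which starts at
    /// file offset `offset`) that overlap one of the `needed` ranges.
    /// Bytes outside every needed range are never stored.
    fn push(&mut self, offset: u64, data: &[u8], needed: &[Range<u64>]) {
        let end = offset + data.len() as u64;
        for r in needed {
            let start = r.start.max(offset);
            let stop = r.end.min(end);
            if start < stop {
                let lo = (start - offset) as usize;
                let hi = (stop - offset) as usize;
                self.buffers.insert(start, data[lo..hi].to_vec());
            }
        }
    }

    /// Per-range release: drop every buffer that lies entirely inside
    /// `range`. Calls may arrive in any order, so nothing here assumes
    /// that offsets are consumed monotonically.
    fn release(&mut self, range: Range<u64>) {
        let keys: Vec<u64> = self
            .buffers
            .range(range.start..range.end)
            .filter(|(start, buf)| **start + (buf.len() as u64) <= range.end)
            .map(|(start, _)| *start)
            .collect();
        for key in keys {
            self.buffers.remove(&key);
        }
    }

    /// Total number of bytes currently held.
    fn buffered_bytes(&self) -> usize {
        self.buffers.values().map(Vec::len).sum()
    }
}
```

With needed ranges `100..200` and `300..400`, pushing a whole 500-byte file at offset 0 retains only 200 bytes. Releasing `300..400` before `100..200` then works the same as the forward order, which is the property a reverse scan relies on.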
