Re: [PR] [POC for #9934] parquet: add ReverseSerializedPageReader [arrow-rs]

via GitHub Wed, 06 May 2026 23:06:33 -0700


zhuqi-lucas commented on PR #9937:
URL: https://github.com/apache/arrow-rs/pull/9937#issuecomment-4394494848


   Added a criterion benchmark in `e3d7ddb` 
(`parquet/benches/reverse_page_reader.rs`).
   
   Phase 1's empirical claim is **per-page cost parity** with the existing 
forward `SerializedPageReader`, not a speedup; the speedup comes at higher 
levels (Phase 2/3). Numbers are from `cargo bench -- --quick` on Apple 
M-series, with 100k INT32 values, ~98 data pages, and no dictionary:
   
   | Bench | Forward | Reverse | Δ |
   |---|---:|---:|---:|
   | uncompressed / drain (full chunk) | 9.73 µs | 9.34 µs | -4% |
   | uncompressed / first_page latency | 164 ns | 156 ns | -5% |
   | snappy / drain (full chunk) | 21.5 µs | 21.2 µs | -1% |
   | snappy / first_page latency | 313 ns | 283 ns | -10% |
   
   All deltas are within measurement noise for a `--quick` run; reverse iteration adds no measurable per-page overhead.
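
For reference, the Δ column can be reproduced from the Forward and Reverse columns as a relative change. This is a minimal sketch; the helper name `relative_delta_percent` is hypothetical and not part of the benchmark file.

```rust
/// Relative change of `reverse` versus `forward`, in percent.
/// Both inputs must use the same unit (e.g. both µs or both ns).
/// A negative result means the reverse reader was faster.
fn relative_delta_percent(forward: f64, reverse: f64) -> f64 {
    (reverse - forward) / forward * 100.0
}
```

Rounding `relative_delta_percent(313.0, 283.0)` to the nearest integer gives the -10% shown for the snappy first-page row.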
   
   Phase 2's value-add (early termination and low peak memory under 
`WHERE filter ORDER BY DESC LIMIT N`) is not exercised here; it requires 
record-batch-level integration and a different benchmark setup, so it is 
left as future work.
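
To illustrate why reverse page iteration could enable early termination for a `WHERE filter ORDER BY DESC LIMIT N` query, here is a minimal, std-only sketch. It does not use the parquet crate or the `ReverseSerializedPageReader` API: pages are modeled as plain `Vec<i32>`, the function name `top_n_desc` is hypothetical, and it assumes the values are stored in ascending order across the chunk, so that walking pages back to front yields descending order.

```rust
/// Illustrative model only: each inner Vec stands in for one decoded data page,
/// with values ascending across the whole chunk (an assumption of this sketch).
///
/// Returns the first `limit` values matching `pred` in descending order,
/// plus the number of pages that had to be touched.
fn top_n_desc(
    pages: &[Vec<i32>],
    pred: impl Fn(i32) -> bool,
    limit: usize,
) -> (Vec<i32>, usize) {
    let mut out = Vec::with_capacity(limit);
    let mut pages_read = 0;
    if limit == 0 {
        return (out, pages_read);
    }
    // Walk pages from last to first, and values within a page from last to first.
    for page in pages.iter().rev() {
        pages_read += 1;
        for &v in page.iter().rev() {
            if pred(v) {
                out.push(v);
                if out.len() == limit {
                    // Early termination: remaining pages are never touched.
                    return (out, pages_read);
                }
            }
        }
    }
    (out, pages_read)
}
```

With 100 ascending values split into 10 pages and a filter for even numbers, a limit of 3 is satisfied from the last page alone, while a forward reader would have to scan all pages before it could return the largest matches.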
   

