Re: [PR] fix(parquet): bound data page byte size for large variable-width values [arrow-rs]

via GitHub Tue, 26 May 2026 15:36:12 -0700


etseidl commented on code in PR #9972:
URL: https://github.com/apache/arrow-rs/pull/9972#discussion_r3307256819



##########
parquet/benches/arrow_writer.rs:
##########
@@ -100,6 +100,29 @@ fn create_string_bench_batch(
     )?)
 }
 
+/// 1 M short, fixed-width 8-byte strings. Exercises the BYTE_ARRAY hot path
+/// for the case where individual values are small enough that the byte-budget
+/// based sub-batch sizing in `write_batch_internal` should always resolve to
+/// the full chunk (no granular splitting, no regression vs. current behavior).
+fn create_short_string_bench_batch(size: usize) -> Result<RecordBatch> {

Review Comment:
   Let's break the bench update into a separate PR so we can see the difference 
in the large string case. I'm seeing a 13% slowdown vs main, but that may just 
be the price for getting smaller batches.
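
   For reference, here is a rough sketch of what byte-budget based sub-batch
sizing could look like. This is not the PR's actual code: `sub_batch_len`, its
parameters, and the 1 MiB budget used in the note below are hypothetical names
and values chosen only for illustration.

```rust
/// Hypothetical helper (not taken from the PR): returns how many leading
/// values of a chunk fit within `byte_budget`.
///
/// At least one value is always taken from a non-empty chunk, so the writer
/// makes progress even when a single value is larger than the budget.
fn sub_batch_len(value_sizes: &[usize], byte_budget: usize) -> usize {
    let mut total = 0usize;
    for (i, &size) in value_sizes.iter().enumerate() {
        // Saturate rather than overflow on pathological sizes.
        total = total.saturating_add(size);
        if total > byte_budget {
            // Value `i` pushed the running total over the budget, so stop
            // before it, unless it is the very first value.
            return i.max(1);
        }
    }
    // Everything fits: the sub-batch is the full chunk (no splitting).
    value_sizes.len()
}
```

   With an assumed 1 MiB budget, a chunk of 1024 values of 8 bytes each
resolves to the full chunk (1024), which is the short-string case the new
bench targets, while a chunk of 512 KiB values is cut after 2 values.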



