[PR] feat(parquet): add wide-schema writer overhead benchmark [arrow-rs]

via GitHub Tue, 14 Apr 2026 23:13:45 -0700


HippoBaro opened a new pull request, #9723:
URL: https://github.com/apache/arrow-rs/pull/9723


   # Which issue does this PR close?
   
   - Contributes to #9722
   
   # Rationale for this change
   
   Existing writer benchmarks use narrow schemas (5–10 columns) and primarily 
measure data encoding throughput. They don't capture the per-column structural 
overhead, such as allocation and metadata assembly, that dominates at high 
column counts (thousands to hundreds of thousands of columns).
   
   # What changes are included in this PR?
   
   This commit adds benchmarks to fill that gap by writing a single-row batch 
through `ArrowWriter` with 1k/5k/10k flat `Float32` columns and per-column 
`WriterProperties` entries, isolating the cost of the writer infrastructure 
itself.
   
   Baseline results (Apple M1 Max):
   
   ```
     writer_overhead/1000_cols/per_column_props      3.72 ms
     writer_overhead/5000_cols/per_column_props     54.96 ms
     writer_overhead/10000_cols/per_column_props   220.73 ms
   ```
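
As a rough reading aid (not part of the benchmark itself), the figures above can be turned into a per-column cost and an empirical scaling exponent. The sketch below assumes wall time grows roughly as `cols^k`, which is an assumption made only for illustration; the helper names `per_column_us` and `scaling_exponent` are hypothetical and do not exist in arrow-rs.

```rust
/// Baseline figures copied from the results above: (column count, wall time in ms).
const BASELINE: [(f64, f64); 3] = [(1_000.0, 3.72), (5_000.0, 54.96), (10_000.0, 220.73)];

/// Average cost per column, in microseconds.
fn per_column_us(cols: f64, total_ms: f64) -> f64 {
    total_ms * 1_000.0 / cols
}

/// Empirical exponent k between two measurements, assuming time ~ cols^k.
/// k = 1 would mean linear scaling; k > 1 means per-column cost grows with width.
fn scaling_exponent(a: (f64, f64), b: (f64, f64)) -> f64 {
    (b.1 / a.1).ln() / (b.0 / a.0).ln()
}

fn main() {
    for (cols, ms) in BASELINE {
        println!("{cols:>6} cols: {:.2} us/column", per_column_us(cols, ms));
    }
    for pair in BASELINE.windows(2) {
        println!(
            "{} -> {} cols: exponent {:.2}",
            pair[0].0,
            pair[1].0,
            scaling_exponent(pair[0], pair[1])
        );
    }
}
```

If the per-column cost were constant, every exponent would be close to 1. These baseline numbers come out well above that, which suggests the overhead grows faster than linearly with column count on this machine.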
   
   # Are these changes tested?
   
   N/A
   
   # Are there any user-facing changes?
   
   N/A
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
