HippoBaro opened a new pull request, #9724:
URL: https://github.com/apache/arrow-rs/pull/9724
# Which issue does this PR close?
- Depends on #9723
- Contributes to #9722
# Rationale for this change
`WriterProperties::offset_index_disabled()` checked whether any column in
the `column_properties` HashMap had page-level statistics enabled, scanning the
entire map on every call. This method is called from `GenericColumnWriter::new`,
that is, once per column per row group. With N columns each having per-column
properties, this resulted in O(N²) HashMap entry visits during row group
construction.
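To make the cost concrete, here is a minimal sketch that counts map entries visited when the check is repeated once per column versus performed once. It is a standalone model, not code from the parquet crate: the function names and the `HashMap<u32, bool>` stand-in for `column_properties` are hypothetical, and it assumes the worst case where no column has page-level statistics enabled, so the scan never short-circuits.

```rust
use std::collections::HashMap;

/// Hypothetical model: entries visited when every column writer re-scans the map.
fn visits_scan_per_call(page_stats: &HashMap<u32, bool>) -> usize {
    let mut visited = 0;
    // One call per column, mirroring one writer constructed per column.
    for _ in 0..page_stats.len() {
        // `any` stops at the first `true`; with none set it walks the whole map.
        let _ = page_stats.values().any(|&enabled| {
            visited += 1;
            enabled
        });
    }
    visited
}

/// Hypothetical model: entries visited when the scan runs a single time.
fn visits_scan_once(page_stats: &HashMap<u32, bool>) -> usize {
    let mut visited = 0;
    let _ = page_stats.values().any(|&enabled| {
        visited += 1;
        enabled
    });
    visited
}

fn main() {
    // 1000 columns, none with page-level statistics enabled (worst case).
    let page_stats: HashMap<u32, bool> = (0..1000).map(|i| (i, false)).collect();
    println!("scan per call: {}", visits_scan_per_call(&page_stats)); // 1000000
    println!("scan once:     {}", visits_scan_once(&page_stats)); // 1000
}
```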
# What changes are included in this PR?
Move the scan into `WriterPropertiesBuilder::build()` so it runs once at
construction time.
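The idea can be sketched with simplified stand-in types. Everything below is hypothetical and only loosely modeled on `WriterPropertiesBuilder` and `WriterProperties`; in particular, the rule that page-level statistics on any column override a requested "disabled" setting is an assumption made for illustration, not a quote of the crate's logic.

```rust
use std::collections::HashMap;

/// Hypothetical per-column settings; not the parquet crate's type.
#[derive(Clone, Copy, Default)]
struct ColumnProps {
    page_stats_enabled: bool,
}

/// Hypothetical builder, loosely modeled on `WriterPropertiesBuilder`.
#[derive(Default)]
struct PropsBuilder {
    offset_index_disabled: bool,
    column_properties: HashMap<String, ColumnProps>,
}

/// Hypothetical built properties, loosely modeled on `WriterProperties`.
struct Props {
    offset_index_disabled: bool,
    column_properties: HashMap<String, ColumnProps>,
}

impl PropsBuilder {
    fn set_offset_index_disabled(mut self, disabled: bool) -> Self {
        self.offset_index_disabled = disabled;
        self
    }

    fn set_column_page_stats(mut self, column: &str, enabled: bool) -> Self {
        self.column_properties
            .entry(column.to_string())
            .or_default()
            .page_stats_enabled = enabled;
        self
    }

    fn build(self) -> Props {
        // The single O(N) scan happens here, once per built `Props`.
        let any_page_stats = self
            .column_properties
            .values()
            .any(|c| c.page_stats_enabled);
        Props {
            // Assumption: page-level statistics force the offset index to stay on.
            offset_index_disabled: self.offset_index_disabled && !any_page_stats,
            column_properties: self.column_properties,
        }
    }
}

impl Props {
    /// O(1): returns the value computed in `build()`.
    fn offset_index_disabled(&self) -> bool {
        self.offset_index_disabled
    }

    fn num_column_overrides(&self) -> usize {
        self.column_properties.len()
    }
}

fn main() {
    let props = PropsBuilder::default()
        .set_offset_index_disabled(true)
        .set_column_page_stats("a", true)
        .build();
    println!(
        "{} column override(s), offset index disabled: {}",
        props.num_column_overrides(),
        props.offset_index_disabled()
    );
}
```

With this shape, each per-column caller reads a stored boolean, so the total cost of the check across N columns is one O(N) scan plus N constant-time reads.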
Benchmark results (vs baseline):
```
writer_overhead/1000_cols/per_column_props    2.44 ms (was   3.25 ms, −25%)
writer_overhead/5000_cols/per_column_props   13.28 ms (was  47.45 ms, −72%)
writer_overhead/10000_cols/per_column_props  27.97 ms (was 197.97 ms, −86%)
```
Scaling is now approximately linear in the number of columns.
# Are these changes tested?
All tests pass.
# Are there any user-facing changes?
None.