CuteChuanChuan opened a new pull request, #9877:
URL: https://github.com/apache/arrow-rs/pull/9877

   # Which issue does this PR close?
   - Closes #9667.
   
   # Rationale for this change
   There is no builder for `BloomFilterProperties`, so callers must construct 
it via the struct literal `BloomFilterProperties { fpp, ndv }`, which locks the 
public field layout into the API surface and gives no validation of `fpp` at 
construction time.
   
   # What changes are included in this PR?
   - `BloomFilterPropertiesBuilder` with `new` / `with_fpp` / `with_max_ndv` / 
`build` / `try_build`. Per maintainer feedback in #9667, both 
`BloomFilterProperties::builder()` and `BloomFilterPropertiesBuilder::new()` 
are valid entry points; `build()` panics on invalid `fpp`, `try_build()` 
returns `Result`.
   - `WriterPropertiesBuilder::set_bloom_filter_properties` and 
`set_column_bloom_filter_properties` for atomic struct-based configuration. NDV 
from the supplied `BloomFilterProperties` is treated as explicit and is **not** 
overridden by the build-time row-group-size NDV fallback. For dynamic NDV 
sizing, `set_bloom_filter_enabled` / `set_bloom_filter_fpp` remain available.
   - Renamed `set_bloom_filter_ndv` → `set_bloom_filter_max_ndv` (and the 
per-column variant). Old names kept as `#[deprecated(since = "59.0.0")]` 
aliases that forward to the new ones, so this is non-breaking. Internal call 
sites in `parquet-rewrite` and `arrow_writer` migrated.
   - The fpp range check is centralized in a single helper used by both the 
panicking setter and `try_build()`, so the two paths can't drift out of sync.
   - Doc-test on `BloomFilterProperties` showing the builder and the new setter.
   
   **Out of scope** (deferred to follow-up PRs per maintainer suggestion in the 
issue):
   - The disable → re-enable stale-state issue mentioned in #9667.
   - Making `BloomFilterProperties::{fpp, ndv}` private and exposing accessors 
(breaking change, would target a separate major-version PR).
   
   # Are these changes tested?
   
   Yes — 10 new unit tests in `parquet/src/file/properties.rs`:
   - 5 builder tests: defaults, partial setting (fpp-only / ndv-only), 
`build()` panicking on bad fpp, `try_build()` returning `Err` on bad fpp + `Ok` 
on valid input.
   - 3 integration tests for the new writer-side setters: global broadcast, 
per-column override of global, and the explicit-NDV-survives-build semantics.
   - 1 regression test pinning the existing dynamic-NDV fallback for per-column 
bloom filters (so the new code paths don't accidentally break the existing 
behaviour).
   - 1 test verifying the deprecated `set_bloom_filter_ndv` aliases still work.
   
   The existing `test_writer_properties_bloom_filter_ndv_fpp_set` was migrated 
to use the new setter name. 
   
   # Are there any user-facing changes?
   Yes, all additive and non-breaking:
   - New public types/methods: `BloomFilterPropertiesBuilder`, 
`BloomFilterProperties::builder`, 
`WriterPropertiesBuilder::set_bloom_filter_properties`,   
`WriterPropertiesBuilder::set_column_bloom_filter_properties`,  
`WriterPropertiesBuilder::set_bloom_filter_max_ndv`, 
`WriterPropertiesBuilder::set_column_bloom_filter_max_ndv`.
   - Soft deprecation: `WriterPropertiesBuilder::set_bloom_filter_ndv` and 
`set_column_bloom_filter_ndv` are now `#[deprecated(since = "59.0.0")]` aliases 
for the `_max_ndv` variants. Calling them still works and just emits a 
deprecation warning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to