CuteChuanChuan commented on issue #9667:
URL: https://github.com/apache/arrow-rs/issues/9667#issuecomment-4322291963

   Hi @alamb, Posting a plan and a few open questions before sending a draft PR.
   
   I'm targeting `59.0.0` (May 2026) since this involves making 
`BloomFilterProperties` fields private, which is a breaking change. Let me know 
if that's the wrong release to aim for.
   
   ### Plan
   
   - [ ] Add `BloomFilterPropertiesBuilder` in `parquet/src/file/properties.rs`
   - [ ] Make `BloomFilterProperties` fields private; add `fpp()` and `ndv()` 
accessors
   - [ ] Migrate `WriterPropertiesBuilder::set_bloom_filter_*` to delegate to 
the builder internally
   - [ ] Fix the disable → re-enable stale-state issue mentioned in the issue
   - [ ] Add the regression tests called out in the issue (disable/re-enable, 
per-column implicit NDV, explicit vs default NDV)
   
   ### Proposed API sketch
   
   ```rust
   // Building
   let props = BloomFilterProperties::builder()
       .with_fpp(0.01)
       .with_max_ndv(10_000)
       .build();
   
   // Reading
   assert_eq!(props.fpp(), 0.01);
   
   // Per-column via WriterProperties
   let writer_props = WriterProperties::builder()
       .set_column_bloom_filter_properties(
           ColumnPath::from("col"),
           BloomFilterProperties::builder().with_fpp(0.01).build(),
       )
       .build();
   ```
   
   ### Open questions
   
   1. **Entry point: `::new()` vs `::builder()`**
   The issue description shows `BloomFilterProperties::new().with_fpp(...)`, 
but the rest of this crate consistently uses `Foo::builder()` to return the 
builder (e.g. `WriterProperties::builder()`). I'd lean toward 
`BloomFilterProperties::builder()` for consistency. OK to deviate from the 
issue example?
   
   2. **Validation: `Result` vs panic**
   The existing `set_bloom_filter_fpp` panics on out-of-range input. I'd keep 
the same behavior in `with_fpp`/`with_ndv` (panic via `assert!`) so `build()` 
can return `BloomFilterProperties` directly without `Result`, matching 
`WriterPropertiesBuilder::build()`. Any reason to prefer `Result` here?
   
   3. **`with_max_ndv` naming**
   Agreed `with_max_ndv` better reflects the post-folding semantics. For the 
existing `WriterPropertiesBuilder::set_bloom_filter_ndv`, want me to rename → 
`set_bloom_filter_max_ndv` (breaking) or keep the old name?
   
   4. **Internal state for enable/disable**
   To fix the stale-state issue, I'm thinking of representing per-column config 
internally as `Option<BloomFilterProperties>` so `disable` cleanly resets to 
`None`. This may touch a few call sites — happy to go a different direction if 
you'd prefer keeping the separate `enabled: bool` flag.
   
   I'll wait for direction on these before sending the PR. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to