zeroshade commented on code in PR #864:
URL: https://github.com/apache/arrow-go/pull/864#discussion_r3521460586


##########
parquet/metadata/bloom_filter.go:
##########
@@ -549,6 +551,44 @@ func (r *RowGroupBloomFilterReader) GetColumnBloomFilter(i 
int) (BloomFilter, er
        return bf, nil
 }
 
+// VisitColumnBloomFilter invokes fn for the BloomFilter of the column at 
index i.
+//
+// Lifetime contract: The BloomFilter passed to fn (and its backing bitset)
+// is recycled immediately after fn returns. Callers must not retain, store,
+// or use the BloomFilter or its data outside the scope of the fn function.
+func (r *RowGroupBloomFilterReader) VisitColumnBloomFilter(i int, fn 
func(BloomFilter) error) error {
+       bf, err := r.GetColumnBloomFilter(i)
+       if err != nil || bf == nil {
+               return err
+       }
+
+       defer r.recycle(bf)
+
+       return fn(bf)
+}
+
+func (r *RowGroupBloomFilterReader) recycle(bf BloomFilter) {
+       if bf == nil {
+               return
+       }
+
+       b, ok := bf.(*blockSplitBloomFilter)
+       if !ok {
+               return
+       }
+
+       b.bitset32 = nil
+
+       if r.bufferPool != nil && b.cancelCleanup != nil {
+               r.bufferPool.Put(b.data)
+               b.cancelCleanup = nil

Review Comment:
   **[blocker] The GC cleanup is never actually cancelled, so `b.data` is 
returned to the pool twice.**
   
   `addCleanup` (the `//go:build go1.24` variant) registers 
`runtime.AddCleanup(bf, fn, bf.data)` and stores `bf.cancelCleanup = c.Stop`; 
`fn` captures the buffer and does `bufferPool.Put(data)`. Here `recycle` sets 
`b.cancelCleanup = nil`, which only drops the reference to `Stop` — it does 
**not** unregister the cleanup. So once `VisitColumnBloomFilter` returns and 
`bf` becomes unreachable, the cleanup still fires and `Put`s the **same** 
buffer a second time.
   
   - **Go 1.24+**: the buffer is now in the pool twice → two `Get()` callers 
receive the same `*memory.Buffer` → concurrent use / corruption (the 
pool-corruption class this PR is meant to fix).
   - **Go ≤1.23** (`SetFinalizer` variant): worse — `recycle` already set 
`b.data = nil`, so the finalizer runs `f.data.ResizeNoShrink(0)` on a nil 
buffer → panic in the finalizer goroutine.
   
   Fix: invoke the canceler before returning to the pool.
   
   ```suggestion
        if r.bufferPool != nil && b.cancelCleanup != nil {
                // Stop the GC cleanup so it can't return b.data to the pool a 
second time.
                b.cancelCleanup()
                b.cancelCleanup = nil
                r.bufferPool.Put(b.data)
   ```
   
   Worth a regression test too: `TestRecycleEncryptedPathPool` covers the 
encrypted branch, but nothing catches the double-`Put`. A test that recycles a 
pool-backed filter, forces `runtime.GC()`, and asserts the buffer isn't 
enqueued twice (e.g. two `pool.Get()`s return distinct pointers) would guard 
this.



##########
parquet/metadata/bloom_filter.go:
##########
@@ -495,7 +495,7 @@ func (r *RowGroupBloomFilterReader) GetColumnBloomFilter(i 
int) (BloomFilter, er
        headerBuf.ResizeNoShrink(int(bloomFilterReadSize))
        defer func() {
                if headerBuf != nil {
-                       headerBuf.ResizeNoShrink(0)
+                       headerBuf.Reset(headerBuf.Buf()[:cap(headerBuf.Buf())])

Review Comment:
   Minor / possibly unintended: this changes `headerBuf.ResizeNoShrink(0)` → 
`Reset(...[:cap])`. `ResizeNoShrink(0)` already retains the backing capacity 
(it only sets `length = 0`), so this isn't needed for reuse, and it's 
inconsistent with the cleanup path which still does `data.ResizeNoShrink(0)` 
before `Put`. Not harmful (`Reset` preserves `mem`/`refCount`), but unless 
there's a reason I'd revert it to keep the recycling style uniform.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to