zeroshade commented on PR #864:
URL: https://github.com/apache/arrow-go/pull/864#issuecomment-4848676420
Following up on my last comment with a concrete alternative direction. I
measured the original path against a prompt-return path on current `main` (same
64×100 harness you posted, `b.N=1`, go1.26.4), splitting the profile into
`alloc_space` (churn) vs `inuse_space` (live after the test's end-of-run GC):
| Path | `alloc_space` (churn) | `inuse_space` (retained) | time/batch |
|---|---|---|---|
| Current `main` (return via `runtime.AddCleanup` only) | ~5.1 GB | ~3.0 GB
| 0.49 s |
| Return the bitset buffer to the pool **promptly** | ~0.40 GB | **~83 MB**
| 0.088 s |
So the per-reader `sync.Pool` already reuses correctly — the only problem is
*when* the buffer goes back. Today the bitset buffer is returned only when the
`BloomFilter` is GC'd (`addCleanup` → `runtime.AddCleanup`). Under a burst
those cleanups lag, so `Get()` keeps allocating fresh ~1 MB buffers. Returning
the same buffer synchronously collapses both churn (~13×) and retained memory
(~36×) and runs ~5.5× faster. The remaining ~83 MB is ≈ `concurrency × 1 MB` —
one live buffer per in-flight goroutine, which is exactly the bounded
steady-state we're after.
The only change from your `BenchmarkGetColumnBloomFilter_OriginalSyncPool`
that produces the second row is returning the buffer to the pool instead of `_
= bf`:
```go
bf, err := threadRdr.GetColumnBloomFilter(0)
if err != nil {
return
}
// return the bitset buffer to the pool immediately instead of waiting for GC
if b, ok := bf.(*blockSplitBloomFilter); ok && b.data != nil {
if b.cancelCleanup != nil {
b.cancelCleanup() // cancel the AddCleanup so the buffer isn't
returned twice
}
b.data.ResizeNoShrink(0)
originalArrowPool.Put(b.data)
}
```
Same harness, same pool, same allocator — only the return timing changes,
and that alone accounts for the 5.1 GB → 0.40 GB / 3.0 GB → 83 MB difference.
(There's also a real pool-pollution issue: `file.Reader` hands the bloom reader
the *same* `sync.Pool` the column/record readers use, so MB-sized bitsets and
small page buffers churn each other.)
## Proposal
Keep `GetColumnBloomFilter` and its `AddCleanup` exactly as they are
(back-compat + a GC safety net so nothing leaks if a caller retains the
filter), and add an **opt-in prompt-return path**. Two possible shapes — I lean
toward (A):
**Option A — scoped accessor (preferred):**
```go
func (r *RowGroupBloomFilterReader) VisitColumnBloomFilter(i int, fn
func(BloomFilter) error) error {
bf, err := r.GetColumnBloomFilter(i)
if err != nil || bf == nil {
return err
}
defer r.recycle(bf) // cancelCleanup() + data.ResizeNoShrink(0) +
pool.Put(data)
return fn(bf)
}
```
The buffer returns to the pool the instant the callback ends —
deterministic, lexically-scoped lifetime, hard to misuse.
**Option B — explicit recycle:**
```go
bf, err := rg.GetColumnBloomFilter(i)
// ... use bf ...
rg.RecycleBloomFilter(bf) // cancels the cleanup, ResizeNoShrink(0), Put
back to pool
```
More flexible for callers who can't wrap usage in a closure, but easier to
forget (forgetting just falls back to today's GC behavior, so it's safe, only
slower).
Both call the same helper that cancels the registered cleanup (so the buffer
isn't double-returned) and puts it back. Either can pair with giving the bloom
reader its **own per-reader pool**, separate from the page-buffer pool, to fix
the pollution.
## Why this over the current PR
The PR reaches a similar steady-state, but the way it gets there trips each
of the constraints I called out earlier:
- **Public API break** — it adds `Close()` to the `BloomFilter` interface;
every external implementer/consumer breaks. The proposal adds a method on the
*reader* and leaves `BloomFilter` untouched.
- **Allocator bypass** — it reads headers into raw `make([]byte, …)`; the
proposal keeps everything in allocator-backed `*memory.Buffer`s.
- **Process-global pools** — it declares package-level
`headerPool`/`filterPool` channels (currently unused, but the wrong direction);
the proposal keeps strictly per-reader pools.
- **Removes the GC safety net** — it drops `AddCleanup` on the read path, so
a filter backed by a CGO/mallocator allocator leaks if the caller never calls
`Close()`. The proposal keeps `AddCleanup` as the fallback.
- **Use-after-recycle** — `Close()` resets the buffer to full cap and pushes
it into a shared pool while the returned filter's `bitset32` still aliases that
memory; a concurrent reader can grab and overwrite it. The scoped accessor
bounds the lifetime so that can't happen.
Net: same bounded-memory result, no interface break, no allocator bypass, no
global pools, and the existing safety net stays. Happy to put up a PR with
Option A + the dedicated pool if that direction works for you.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]