zeroshade opened a new pull request, #745: URL: https://github.com/apache/arrow-go/pull/745
## Summary

- Add `BinaryMemoTable.ExistsDirect` that inlines the hash table probe loop, avoiding the closure in `HashTable.Lookup` that causes `val []byte` to escape to the heap
- Add `isInBinaryDirect` specialized kernel path that bypasses the `visitBinary` → `VisitBitBlocksShort` closure chain by directly iterating with `OptionalBitBlockCounter`
- Route `BinaryDataType` dispatch in `DispatchIsIn` to the new direct path (handles both int32 and int64 offsets)

## Motivation

The `is_in` kernel for binary types allocated once per input element because the `[]byte` value escaped to the heap through a closure chain:

1. `visitBinary` slices `rawBytes[offsets[pos]:offsets[pos+1]]` and passes the slice to a callback
2. The callback calls `BinaryMemoTable.Exists(v)`
3. `Exists` calls `lookup`, which creates a closure capturing `val`
4. The closure is passed to `HashTable.Lookup`, causing escape analysis to move `val` to the heap

Closes #736

## Benchmark (100k rows, 10-element value set)

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| ns/op | 4,133,679 | 923,565 | **4.5x faster** |
| B/op | 2,435,327 | 33,092 | **73x less memory** |
| allocs/op | 100,075 | 70 | **1,430x fewer allocs** |

All existing `TestIsInBinary` subtests pass (binary, large\_binary, utf8, large\_utf8 × all null matching behaviors).
