zeroshade opened a new pull request, #745:
URL: https://github.com/apache/arrow-go/pull/745

   ## Summary
   
   - Add `BinaryMemoTable.ExistsDirect` that inlines the hash table probe loop, 
avoiding the closure in `HashTable.Lookup` that causes `val []byte` to escape 
to the heap
   - Add `isInBinaryDirect` specialized kernel path that bypasses the 
`visitBinary` → `VisitBitBlocksShort` closure chain by directly iterating with 
`OptionalBitBlockCounter`
   - Route `BinaryDataType` dispatch in `DispatchIsIn` to the new direct path 
(handles both int32 and int64 offsets)
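
As a rough illustration of the direct path described above, the sketch below walks a binary column's offsets and validity bitmap in one plain loop, with no per-element callback. It is not the arrow-go kernel: `isInDirect`, `offsetType`, and `bitIsSet` are hypothetical names, a Go map stands in for `BinaryMemoTable`, a per-bit validity check stands in for `OptionalBitBlockCounter`, and nulls are simply reported as `false` rather than following the configurable null matching behaviors.

```go
package main

import "fmt"

// offsetType covers both offset widths (binary/utf8 use int32,
// large_binary/large_utf8 use int64). Hypothetical name.
type offsetType interface{ ~int32 | ~int64 }

// bitIsSet reads bit i of an Arrow-style validity bitmap
// (least-significant bit first within each byte).
func bitIsSet(bitmap []byte, i int) bool {
	return bitmap[i/8]&(1<<(uint(i)%8)) != 0
}

// isInDirect reports, for each element, whether its bytes are in set.
// A nil validity bitmap means "no nulls". Nulls yield false here, which
// is a simplification made for this sketch.
func isInDirect[O offsetType](offsets []O, data, validity []byte, set map[string]struct{}) []bool {
	if len(offsets) < 2 {
		return nil
	}
	n := len(offsets) - 1
	out := make([]bool, n)
	for i := 0; i < n; i++ {
		if validity != nil && !bitIsSet(validity, i) {
			continue // null element
		}
		start, end := int(offsets[i]), int(offsets[i+1])
		v := data[start:end] // slice of the shared buffer, no copy
		_, ok := set[string(v)]
		out[i] = ok
	}
	return out
}

func main() {
	// Four elements: "foo", null, "bar", "baz".
	data := []byte("foobarbaz")
	offsets := []int32{0, 3, 3, 6, 9}
	validity := []byte{0b00001101} // bits 0, 2, 3 set

	set := map[string]struct{}{"foo": {}, "baz": {}}
	fmt.Println(isInDirect(offsets, data, validity, set))
}
```

Because the loop body is ordinary straight-line code, the value slice never has to be handed to a function value, which is the property the PR relies on. The same generic function accepts `[]int64` offsets unchanged.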
   
   ## Motivation
   
   The `is_in` kernel for binary types allocated once per input element because 
the `[]byte` value escaped to the heap through a closure chain:
   
   1. `visitBinary` slices `rawBytes[offsets[pos]:offsets[pos+1]]` and passes 
the slice to a callback
   2. The callback calls `BinaryMemoTable.Exists(v)`
   3. `Exists` calls `lookup`, which creates a closure capturing `val`
   4. The closure is passed to `HashTable.Lookup`, so escape analysis 
moves `val` to the heap
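
The chain above can be approximated with a toy hash set. This is only a sketch of the pattern, not the arrow-go code: `memoTable`, `lookupWith`, `existsViaClosure`, and `existsDirect` are hypothetical names, and the table is a minimal linear-probing set with an FNV-1a hash. In a program this small the compiler may inline the callback and avoid the allocation entirely; running `go build -gcflags=-m` shows what escape analysis decides for a given build.

```go
package main

import (
	"bytes"
	"fmt"
)

// memoTable is a hypothetical linear-probing set of byte strings.
// size must be a power of two and larger than the number of entries.
type memoTable struct {
	slots [][]byte // nil marks an empty slot
	mask  uint64
}

func newMemoTable(size uint64) *memoTable {
	return &memoTable{slots: make([][]byte, size), mask: size - 1}
}

// hashBytes is 64-bit FNV-1a.
func hashBytes(b []byte) uint64 {
	h := uint64(14695981039346656037)
	for _, c := range b {
		h ^= uint64(c)
		h *= 1099511628211
	}
	return h
}

func (t *memoTable) insert(v []byte) {
	i := hashBytes(v) & t.mask
	for t.slots[i] != nil {
		if bytes.Equal(t.slots[i], v) {
			return
		}
		i = (i + 1) & t.mask
	}
	t.slots[i] = append([]byte{}, v...) // non-nil copy, even for empty v
}

// lookupWith is the generic probe: the comparison arrives as a function value.
func (t *memoTable) lookupWith(h uint64, eq func(stored []byte) bool) bool {
	i := h & t.mask
	for t.slots[i] != nil {
		if eq(t.slots[i]) {
			return true
		}
		i = (i + 1) & t.mask
	}
	return false
}

// existsViaClosure captures val in a closure handed to lookupWith. If the
// compiler cannot prove the closure stays local, val may be heap-allocated.
func (t *memoTable) existsViaClosure(val []byte) bool {
	return t.lookupWith(hashBytes(val), func(stored []byte) bool {
		return bytes.Equal(stored, val)
	})
}

// existsDirect repeats the probe loop inline, so val is never captured.
func (t *memoTable) existsDirect(val []byte) bool {
	i := hashBytes(val) & t.mask
	for t.slots[i] != nil {
		if bytes.Equal(t.slots[i], val) {
			return true
		}
		i = (i + 1) & t.mask
	}
	return false
}

func main() {
	tbl := newMemoTable(16)
	for _, s := range []string{"foo", "bar", "baz"} {
		tbl.insert([]byte(s))
	}
	for _, s := range []string{"bar", "qux"} {
		fmt.Println(s, tbl.existsViaClosure([]byte(s)), tbl.existsDirect([]byte(s)))
	}
}
```

Both methods return the same answers; the only difference is whether the probed value crosses a function-value boundary, which is what the direct variant removes.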
   
   Closes #736
   
   ## Benchmark (100k rows, 10-element value set)
   
   | Metric | Before | After | Improvement |
   |--------|--------|-------|-------------|
   | ns/op | 4,133,679 | 923,565 | **4.5x faster** |
   | B/op | 2,435,327 | 33,092 | **73x less memory** |
   | allocs/op | 100,075 | 70 | **1,430x fewer allocs** |
   
   All existing `TestIsInBinary` subtests pass (binary, large_binary, utf8, 
large_utf8 × all null matching behaviors).

