zeroshade opened a new pull request, #715:
URL: https://github.com/apache/arrow-go/pull/715

   ### Rationale for this change
   The initial bool optimization (#707) added direct bitmap encoding with 
`WriteBitmapBatch` and `WriteBitmapBatchSpaced` functions to reduce allocations 
during encoding and decoding. However, it didn't apply the same 
optimization to statistics updates, bloom filter inserts, or spaced encoding.
   
   ### What changes are included in this PR?
   1. Update the bloom filter handling to compute hashes directly from bitmaps, 
including for spaced values, and reuse the slice allocation across iterations.
   2. Update statistics handling to update min/max directly from bitmaps for 
spaced and non-spaced scenarios.
   3. Update the encoder interface for the boolean encoder to add a 
`PutSpacedBitmap` method and compress bitmaps with validity buffers.
   4. Add `writeBitmapValues` and `writeBitmapValuesSpaced` for the boolean 
column writer with fallback to `[]bool` conversions if the encoder doesn't 
implement the interface.
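
The fallback described in item 4 can be sketched with an optional-interface check. This is a minimal illustration, not the code in this PR: only the names `PutSpacedBitmap` and `writeBitmapValuesSpaced` come from the description above, while the signatures, the `boolEncoder` and `spacedBitmapEncoder` interfaces, and both encoder types are hypothetical.

```go
package main

import "fmt"

// bitSet reports whether bit i is set, using LSB-first order as in Arrow bitmaps.
func bitSet(buf []byte, i int) bool { return buf[i/8]&(1<<(i%8)) != 0 }

// boolEncoder is a hypothetical minimal encoder interface that only takes []bool.
type boolEncoder interface {
	Put(values []bool)
}

// spacedBitmapEncoder is a hypothetical optional interface. Only the method
// name comes from the PR text; the signature is an assumption.
type spacedBitmapEncoder interface {
	PutSpacedBitmap(bitmap []byte, numValues int, validBits []byte)
}

// sliceEncoder only understands []bool and records what it receives.
type sliceEncoder struct{ got []bool }

func (e *sliceEncoder) Put(values []bool) { e.got = append(e.got, values...) }

// bitmapAwareEncoder also accepts bitmaps and skips null slots itself.
type bitmapAwareEncoder struct {
	sliceEncoder
	usedBitmapPath bool
}

func (e *bitmapAwareEncoder) PutSpacedBitmap(bitmap []byte, numValues int, validBits []byte) {
	e.usedBitmapPath = true
	for i := 0; i < numValues; i++ {
		if bitSet(validBits, i) {
			e.got = append(e.got, bitSet(bitmap, i))
		}
	}
}

// writeBitmapValuesSpaced prefers the bitmap path and otherwise falls back
// to materializing the valid values as []bool.
func writeBitmapValuesSpaced(enc boolEncoder, bitmap []byte, numValues int, validBits []byte) {
	if be, ok := enc.(spacedBitmapEncoder); ok {
		be.PutSpacedBitmap(bitmap, numValues, validBits)
		return
	}
	vals := make([]bool, 0, numValues)
	for i := 0; i < numValues; i++ {
		if bitSet(validBits, i) {
			vals = append(vals, bitSet(bitmap, i))
		}
	}
	enc.Put(vals)
}

func main() {
	bitmap := []byte{0b00001101} // slots 0..3 hold T F T T
	valid := []byte{0b00001011}  // slot 2 is null
	slow, fast := &sliceEncoder{}, &bitmapAwareEncoder{}
	writeBitmapValuesSpaced(slow, bitmap, 4, valid)
	writeBitmapValuesSpaced(fast, bitmap, 4, valid)
	fmt.Println(slow.got, fast.got, fast.usedBitmapPath)
}
```

Both encoders end up with the same values; only the second one avoids the intermediate `[]bool`.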
   
   ### Are these changes tested?
   Yes, unit tests are added for each of the changes above.
   
   ### Are there any user-facing changes?
   No user-facing API changes. This is a purely internal optimization for 
boolean columns.
   
   *Benchmark Results*
   
   *Statistics Update*
   ```
     BEFORE (with bitmap → []bool conversion):
     BenchmarkBooleanStatisticsWithConversion-16
       153,398 ns/op    109,278 B/op (107 KB)    6 allocs/op
   
     AFTER (direct bitmap operations):
     BenchmarkBooleanStatisticsDirectBitmap-16
           393 ns/op      2,698 B/op (2.6 KB)    5 allocs/op
   ```
   
   * 390x faster 
   * 97.5% less memory
   * 1 fewer allocation
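
As a rough illustration of what "direct bitmap operations" can look like for boolean min/max, the sketch below counts bits a byte at a time instead of expanding to `[]bool`. It is not the implementation in this PR: the function name `boolMinMax`, its signature, and the assumption that the bitmap starts at bit offset 0 are all hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// boolMinMax scans the first n bits (LSB-first) of values. When valid is
// non-nil, only slots whose validity bit is set are considered (the spaced
// case). ok is false when there are no valid values at all.
// Hypothetical helper; assumes both bitmaps start at bit offset 0.
func boolMinMax(values, valid []byte, n int) (minVal, maxVal, ok bool) {
	var trues, total int
	for i := 0; i < (n+7)/8; i++ {
		mask := byte(0xFF)
		if rem := n - i*8; rem < 8 {
			mask = byte(1<<rem) - 1 // keep only the live bits of the last byte
		}
		if valid != nil {
			mask &= valid[i]
		}
		total += bits.OnesCount8(mask)
		trues += bits.OnesCount8(values[i] & mask)
	}
	if total == 0 {
		return false, false, false
	}
	// min is true only if every valid value is true; max is true if any is.
	return trues == total, trues > 0, true
}

func main() {
	values := []byte{0b00001101} // slots 0..3 hold T F T T
	fmt.Println(boolMinMax(values, nil, 4))                // non-spaced
	fmt.Println(boolMinMax(values, []byte{0b00001001}, 4)) // only slots 0 and 3 valid
}
```

For booleans the whole statistic reduces to two questions, "is any valid value true?" and "is any valid value false?", which is why popcounts over masked bytes are enough.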
   
   *Bloom filter hashing*
   ```
     BEFORE (with bitmap → []bool conversion):
     BenchmarkBloomFilterHashingWithConversion-16
       1,084,001 ns/op    3,309,593 B/op (3.23 MB)    3 allocs/op
   
     AFTER (direct bitmap operations):
     BenchmarkBloomFilterHashingDirectBitmap-16
         448,882 ns/op      802,821 B/op (784 KB)    2 allocs/op
   ```
   
   * 2.4x faster
   * 76% less memory
   * 2.5 MB saved/operation
   * 1 fewer allocation
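
One way to hash straight from a bitmap while reusing the output slice is sketched below. It is an illustration under stated assumptions, not the code in this PR: `hashBitmap` and `hashByte` are hypothetical names, FNV-1a from the standard library stands in for the xxHash64 that Parquet bloom filters use, and hashing each boolean as a single 0/1 byte is assumed.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bitSet reports whether bit i is set, using LSB-first order as in Arrow bitmaps.
func bitSet(buf []byte, i int) bool { return buf[i/8]&(1<<(i%8)) != 0 }

// hashByte is a stand-in hash (FNV-1a), used here only to keep the sketch
// dependency-free.
func hashByte(b byte) uint64 {
	h := fnv.New64a()
	h.Write([]byte{b})
	return h.Sum64()
}

// hashBitmap writes one hash per valid value into dst[:0] and returns the
// resulting slice, so a caller can pass the same buffer on every iteration.
// A nil valid bitmap means every slot is valid.
func hashBitmap(dst []uint64, values, valid []byte, n int) []uint64 {
	table := [2]uint64{hashByte(0), hashByte(1)} // only two distinct inputs exist
	dst = dst[:0]
	for i := 0; i < n; i++ {
		if valid != nil && !bitSet(valid, i) {
			continue // null slot: nothing to insert into the filter
		}
		idx := 0
		if bitSet(values, i) {
			idx = 1
		}
		dst = append(dst, table[idx])
	}
	return dst
}

func main() {
	values := []byte{0b00001101} // slots 0..3 hold T F T T
	valid := []byte{0b00001011}  // slot 2 is null
	buf := make([]uint64, 0, 8)  // reused across calls
	buf = hashBitmap(buf, values, valid, 4)
	fmt.Println(len(buf))
	buf = hashBitmap(buf, values, nil, 4)
	fmt.Println(len(buf))
}
```

As long as the buffer's capacity covers the batch, repeated calls append into the same backing array and allocate nothing new.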
   
   *Full write path (Stats + Bloom Filter)*
   ```
     BEFORE (with bitmap → []bool conversion):
     BenchmarkFullWritePathWithConversion-16
       1,211,525 ns/op    3,315,566 B/op (3.24 MB)    15 allocs/op
   
     AFTER (direct bitmap operations):
     BenchmarkFullWritePathDirectBitmap-16
         580,934 ns/op      807,640 B/op (789 KB)    13 allocs/op
   ```
   
   * 2.1x faster
   * 76% less memory
   * 2.5 MB saved per 100k bools written
   * 2 fewer allocations


