Re: [PR] GH-50280: [C++] Implement VisitTwoBitRuns and VisitTwoSetBitRuns methods [arrow]

via GitHub Tue, 30 Jun 2026 08:25:38 -0700


HuaHuaY commented on PR #50281:
URL: https://github.com/apache/arrow/pull/50281#issuecomment-4845192292


   I've compiled the test data into three tables. `BinaryBitBlockCounterSum` is 
faster than `BinaryVisitTwoSetBitRunsSum` when there are many null values, 
which I think is expected. In this case, merging the two bitmaps results in too 
many consecutive intervals, making it slower than judging multiple bits at once.
   
   However, I was somewhat surprised that in x86 tests, 
`BinaryBitBlockCounterSum` achieved better performance when there were almost 
no null values. I don't understand why.
   
   There were also some unexpected findings. Currently, 
`BinaryVisitTwoSetBitRunsSum` performs better than `BinaryVisitTwoBitRunsSum`. 
This is partly due to function implementation, and partly because 
`SetBitRunReader` performs better than `BitRunReader`. I will later adjust 
`BinaryVisitTwoBitRunsSum` to use `SetBitRunReader` (`BinaryVisitTwoBitRunsSum` 
can directly wrap `BinaryVisitTwoSetBitRunsSum`). I haven't yet figured out why 
`SetBitRunReader` is faster.
   
   Based on these test results, I plan to revert the changes to 
`cpp/src/arrow/compute/kernels/hash_aggregate_pivot.cc`. However, I think that 
the real value of `BinaryVisitTwoSetBitRunsSum` lies in its ability to handle 
data in batches rather than row-by-row, allowing the use of vector-processing 
functions such as `::arrow::bit_util::SetBitsTo` or 
`::arrow::internal::CountSetBits`. Since I plan to use it in a future PR, I'd 
like to add benchmark scenarios that will showcase the advantages of 
`BinaryVisitTwoSetBitRunsSum`.
   
   ## 4ab7d672 / amd64-c6a-4xlarge-linux / c0d6bf7
   
   | Benchmark | 2 | 8 | 64 | 512 | 4096 | 32768 | 65536 |
   |---|---:|---:|---:|---:|---:|---:|---:|
   | `BitBlockCounterSum` | 2.59e+8 | 6.46e+8 | 1.39e+9 | 3.58e+9 | 4.90e+9 | 
5.15e+9 | 5.17e+9 |
   | `VisitBitRunsSum` | 3.10e+8 | 5.70e+8 | 2.07e+9 | 4.30e+9 | 4.68e+9 | 
4.73e+9 | 4.73e+9 |
   | `VisitSetBitRunsSum` | 3.59e+8 | 6.40e+8 | 2.20e+9 | 4.58e+9 | 4.88e+9 | 
4.93e+9 | 4.94e+9 |
   | `BinaryBitBlockCounterSum` | 1.79e+8 | 3.68e+8 | 7.40e+8 | 1.99e+9 | 
3.69e+9 | 4.18e+9 | 4.22e+9 |
   | `BinaryVisitTwoBitRunsSum` | 1.40e+8 | 2.42e+8 | 9.38e+8 | 2.61e+9 | 
3.36e+9 | 3.49e+9 | 3.52e+9 |
   | `BinaryVisitTwoSetBitRunsSum` | 1.78e+8 | 2.77e+8 | 1.03e+9 | 3.05e+9 | 
3.94e+9 | 4.06e+9 | 4.08e+9 |
   
   ## fd5a984c / amd64-m5-4xlarge-linux / c0d6bf7
   
   | Benchmark | 2 | 8 | 64 | 512 | 4096 | 32768 | 65536 |
   |---|---:|---:|---:|---:|---:|---:|---:|
   | `BitBlockCounterSum` | 2.00e+8 | 4.55e+8 | 9.33e+8 | 1.88e+9 | 2.26e+9 | 
2.32e+9 | 2.32e+9 |
   | `VisitBitRunsSum` | 2.07e+8 | 3.73e+8 | 1.15e+9 | 1.98e+9 | 2.17e+9 | 
2.19e+9 | 2.19e+9 |
   | `VisitSetBitRunsSum` | 2.34e+8 | 4.01e+8 | 1.15e+9 | 2.03e+9 | 2.24e+9 | 
2.26e+9 | 2.26e+9 |
   | `BinaryBitBlockCounterSum` | 1.20e+8 | 2.13e+8 | 4.06e+8 | 1.02e+9 | 
1.79e+9 | 2.00e+9 | 2.01e+9 |
   | `BinaryVisitTwoBitRunsSum` | 1.05e+8 | 1.77e+8 | 5.93e+8 | 1.34e+9 | 
1.63e+9 | 1.67e+9 | 1.68e+9 |
   | `BinaryVisitTwoSetBitRunsSum` | 1.16e+8 | 1.83e+8 | 6.26e+8 | 1.51e+9 | 
1.87e+9 | 1.91e+9 | 1.91e+9 |
   
   ## 457fe991 / test-mac-arm / c0d6bf7
   
   | Benchmark | 2 | 8 | 64 | 512 | 4096 | 32768 | 65536 |
   |---|---:|---:|---:|---:|---:|---:|---:|
   | `BitBlockCounterSum` | 2.60e+8 | 7.22e+8 | 1.98e+9 | 5.40e+9 | 7.09e+9 | 
7.29e+9 | 7.30e+9 |
   | `VisitBitRunsSum` | 3.41e+8 | 6.44e+8 | 1.57e+9 | 2.33e+9 | 2.57e+9 | 
2.59e+9 | 2.59e+9 |
   | `VisitSetBitRunsSum` | 4.00e+8 | 6.36e+8 | 2.64e+9 | 9.07e+9 | 1.22e+10 | 
1.29e+10 | 1.29e+10 |
   | `BinaryBitBlockCounterSum` | 1.79e+8 | 4.07e+8 | 1.00e+9 | 2.52e+9 | 
4.76e+9 | 5.38e+9 | 5.44e+9 |
   | `BinaryVisitTwoBitRunsSum` | 1.75e+8 | 2.99e+8 | 1.12e+9 | 3.91e+9 | 
5.64e+9 | 5.90e+9 | 5.93e+9 |
   | `BinaryVisitTwoSetBitRunsSum` | 2.28e+8 | 3.17e+8 | 1.21e+9 | 4.14e+9 | 
6.09e+9 | 6.40e+9 | 6.42e+9 |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-50280: [C++] Implement VisitTwoBitRuns and VisitTwoSetBitRuns methods [arrow]

Reply via email to