Re: [go-nuts] Bitmask is slower than modulo calculation

'Keith Randall' via golang-nuts Tue, 14 May 2024 09:55:48 -0700

Your test is benchmarking int->any conversions more than it is testing the 
underlying modulo/mask difference.
When you pass an integer to Enqueue, it is converted to any, which 
(usually) involves an allocation. That will swamp the cost of any single 
arithmetic operation.


There are a few ways to fix it. Probably the simplest is to make your ring 
buffers generic on the element type.
You could also fix the benchmark, by precomputing the any-typed elements 
you're going to enqueue before starting the timer.

When you get a surprising benchmark result, always run the benchmark with 
profiling on (-cpuprofile) and look at the resulting profile. This is a 
case where it would be obvious what the problem is.
On Sunday, May 12, 2024 at 10:18:38 PM UTC-7 robert engels wrote:

> Hi. This is still not correct. Use the “for.. in b.N” as discussed in the 
> blog in order to understand the per op difference - which will be more 
> accurate for a microbenchmark timing,
>
> But, if you really want to make a case that bit mask is slower than mod, 
> then a simpler test would be better - you can do these ops on in loop using 
> b.N to time the difference.
>
>
>
> On May 12, 2024, at 8:28 PM, Yuta MIYAKE <yuta_...@mercari.com> wrote:
>
> Thank you. 
>
> This is benchstat result. new test code follows:
>
> ❯ go test -test.bench BenchmarkTestSingleModulo -count=10 -cpu=1 > 
> modulo.txt
> ❯ go test -test.bench BenchmarkTestSingleBitMask -count=10 -cpu=1 > 
> bitmask.txt
> ❯ benchstat modulo.txt bitmask.txt
> goos: darwin
> goarch: arm64
> pkg: ringbuffer
>                   │ modulo.txt │    bitmask.txt    │
>                   │   sec/op   │   sec/op    vs base   │
> TestSingleModulo    6.648 ± 1%
> TestSingleBitMask                6.694 ± 5%
> geomean             6.648        6.694       ? ¹ ²
>
>
> new test code:
>
> const BufferSize = 2 * 1024 * 1024
>
> func benchmarkSingle(rb RingBuffer) {
>
>
> total := 500000
> for i := 0; i < total; i++ {
> for j := 0; j < 1000; j++ {
> rb.Enqueue(j)
> }
> for j := 0; j < 1000; j++ {
> rb.Dequeue()
> }
> }
> }
>
>
> func BenchmarkTestSingleModulo(b *testing.B) {
> rb := NewRingBuffer0(BufferSize)
> b.ResetTimer()
> benchmarkSingle(rb)
> }
>
> func BenchmarkTestSingleBitMask(b *testing.B) {
> rb := NewRingBuffer1(BufferSize)
> b.ResetTimer()
> benchmarkSingle(rb)
> }
>
> On Monday, May 13, 2024 at 8:20:05 AM UTC+9 robert engels wrote:
>
>> Use the Go benchmarking facilities, see 
>> https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go
>>
>> On May 11, 2024, at 9:57 PM, leon <leonca...@gmail.com> wrote:
>>
>> I'm trying to prove an optimization technique for ring buffer is 
>> effective. One of the technique is using bitmask instead of modulo to 
>> calculate a wrap around. However, in my environment, modulo is slightly 
>> faster in a test where 1 billion items are enqueued /dequeued by a single 
>> goroutine. What do you think could be the cause? 
>>
>> Full code:
>> https://go.dev/play/p/H933oqrhPI-
>>
>> Environment:
>> * go version go1.21.4 darwin/arm64
>> * Apple M1 Pro
>>
>> RingBuffer with modulo:
>> ```
>> type RingBuffer0 struct {
>> writeIdx uint64
>> readIdx  uint64
>> buffers  []any
>> size     uint64
>> }
>>
>> func NewRingBuffer0(size uint64) *RingBuffer0 {
>> rb := &RingBuffer0{}
>> rb.init(size)
>> return rb
>> }
>>
>> func (rb *RingBuffer0) init(size uint64) {
>> rb.buffers = make([]any, size)
>> rb.size = size
>> }
>>
>> func (rb *RingBuffer0) Enqueue(item any) error {
>> if rb.writeIdx-rb.readIdx == rb.size {
>> return ErrBufferFull
>> }
>> rb.buffers[rb.writeIdx%rb.size] = item
>> rb.writeIdx++
>> return nil
>> }
>>
>> func (rb *RingBuffer0) Dequeue() (any, error) {
>> if rb.writeIdx == rb.readIdx {
>> return nil, ErrBufferEmpty
>> }
>> item := rb.buffers[rb.readIdx%rb.size]
>> rb.readIdx++
>> return item, nil
>> }
>> ```
>>
>> RingBuffer with bitmask:
>> change each module calculation to the code below
>> * rb.buffers[rb.writeIdx&(rb.size-1)] = item
>> * item := rb.buffers[rb.readIdx&(rb.size-1)]
>>
>> Test:
>> func TestSingle(rb RingBuffer) {
>> start := time.Now()
>> total := 500000
>> for i := 0; i < total; i++ {
>> for j := 0; j < 1000; j++ {
>> rb.Enqueue(j)
>> }
>> for j := 0; j < 1000; j++ {
>> rb.Dequeue()
>> }
>> }
>> end := time.Now()
>> count := total * 2000
>> duration := end.Sub(start).Milliseconds()
>> fmt.Printf("%d ops in %d ms\n", count, duration)
>> }
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to golang-nuts...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/golang-nuts/b9c4d2e0-4ab4-4d27-9359-abd8c090ae33n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/golang-nuts/b9c4d2e0-4ab4-4d27-9359-abd8c090ae33n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>>
>>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to golang-nuts...@googlegroups.com.
>
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/golang-nuts/efe2618f-c520-4b53-b233-a724fe9b77d5n%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/golang-nuts/efe2618f-c520-4b53-b233-a724fe9b77d5n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/23cfcbd1-73ac-4546-93da-b1b47087ed65n%40googlegroups.com.

Re: [go-nuts] Bitmask is slower than modulo calculation

Reply via email to