Hi,

I've written a small library (https://github.com/toefel18/go-patan) that 
stores counters and collects statistics (running 
min/max/average/standard deviation) of a program during runtime. There is a 
lock-based implementation and a channel-based implementation. I've previously 
written this library in Java as well: https://github.com/toefel18/patan. The 
Java version, with an equivalent implementation, is much faster than both the 
channel and the locking implementations in Go, and I really don't understand why. 

The program has a store that holds the data and a sync.Mutex that guards 
concurrent access on reads and writes. This is a snippet of the lock-based 
implementation:

type Store struct {
   durations map[string]*Distribution
   counters  map[string]int64
   samples   map[string]*Distribution

   lock *sync.Mutex
}

func (store *Store) addSample(key string, value int64) {
   store.addToStore(store.samples, key, value)
}

func (store *Store) addDuration(key string, value int64) {
   store.addToStore(store.durations, key, value)
}


func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
   store.lock.Lock()
   defer store.lock.Unlock()
   distribution, exists := destination[key]
   if !exists {
      distribution = NewDistribution()
      destination[key] = distribution
   }
   distribution.addSample(value)
}
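
To clarify what the numbers below mean: each run starts a number of goroutines 
("threads" in the results) that each write the given number of items into one 
shared store, and measures how long it takes for all of them to finish. Roughly 
like this sketch (NewStore and the key name are placeholders; the real benchmark 
code is in the gist linked below):

func benchmark(goroutines, itemsPerGoroutine int) time.Duration {
   store := NewStore() // placeholder for however the store is constructed
   start := time.Now()
   var wg sync.WaitGroup
   for i := 0; i < goroutines; i++ {
      wg.Add(1)
      go func() {
         defer wg.Done()
         for n := 0; n < itemsPerGoroutine; n++ {
            store.addSample("some.sample", int64(n)) // same path as addToStore above
         }
      }()
   }
   wg.Wait()
   return time.Since(start)
}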

Now, when I benchmark this Go code, I get the following results (see gist: 
benchmark code 
<https://gist.github.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec>):

10 threads with 20000 items took 133 millis 
100 threads with 20000 items took 1809 millis 
1000 threads with 20000 items took 17576 millis 
10 threads with 200000 items took 1228 millis 
100 threads with 200000 items took 17900 millis

When I benchmark the Java code, the results are much better (see gist: Java 
benchmark code 
<https://gist.github.com/toefel18/9def55a8c3c53c79a4488c29c66f31e5>):

10 threads with 20000 items takes 89 millis 
100 threads with 20000 items takes 265 millis 
1000 threads with 20000 items takes 2888 millis 
10 threads with 200000 items takes 311 millis 
100 threads with 200000 items takes 3067 millis


I have profiled the Go code and created a call graph 
<https://gist.githubusercontent.com/toefel18/96edac8f57f9ad8a4f9a86d83e726aec/raw/c1ff6452abaefe49c3c688b3d07e4574cbe77264/call-graph.png>.
I interpret it as follows: Go spends 0.31 and 0.25 seconds in my methods, and 
pretty much all of the rest in sync.(*Mutex).Lock() and sync.(*Mutex).Unlock().
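
For reference, a CPU profile like this can be collected with the standard 
runtime/pprof flow, roughly along these lines (runBenchmark is a placeholder 
for the workload above):

f, err := os.Create("cpu.pprof")
if err != nil {
   log.Fatal(err)
}
pprof.StartCPUProfile(f) // import "runtime/pprof"
runBenchmark()           // placeholder: runs the benchmark workload
pprof.StopCPUProfile()

Running go tool pprof on the binary and the resulting cpu.pprof then produces 
the call graph ("web"/"png") and the top20 listing below.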

The top20 output of the profiler:

(pprof) top20
59110ms of 73890ms total (80.00%)
Dropped 22 nodes (cum <= 369.45ms)
Showing top 20 nodes out of 65 (cum >= 50220ms)
      flat  flat%   sum%        cum   cum%
    8900ms 12.04% 12.04%     8900ms 12.04%  runtime.futex
    7270ms  9.84% 21.88%     7270ms  9.84%  runtime/internal/atomic.Xchg
    7020ms  9.50% 31.38%     7020ms  9.50%  runtime.procyield
    4560ms  6.17% 37.56%     4560ms  6.17%  sync/atomic.CompareAndSwapUint32
    4400ms  5.95% 43.51%     4400ms  5.95%  runtime/internal/atomic.Xadd
    4210ms  5.70% 49.21%    22040ms 29.83%  runtime.lock
    3650ms  4.94% 54.15%     3650ms  4.94%  runtime/internal/atomic.Cas
    3260ms  4.41% 58.56%     3260ms  4.41%  runtime/internal/atomic.Load
    2220ms  3.00% 61.56%    22810ms 30.87%  sync.(*Mutex).Lock
    1870ms  2.53% 64.10%     1870ms  2.53%  runtime.osyield
    1540ms  2.08% 66.18%    16740ms 22.66%  runtime.findrunnable
    1430ms  1.94% 68.11%     1430ms  1.94%  runtime.freedefer
    1400ms  1.89% 70.01%     1400ms  1.89%  sync/atomic.AddUint32
    1250ms  1.69% 71.70%     1250ms  1.69%  github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
    1240ms  1.68% 73.38%     3140ms  4.25%  runtime.deferreturn
    1070ms  1.45% 74.83%     6520ms  8.82%  runtime.systemstack
    1010ms  1.37% 76.19%     1010ms  1.37%  runtime.newdefer
    1000ms  1.35% 77.55%     1000ms  1.35%  runtime.mapaccess1_faststr
     950ms  1.29% 78.83%    15660ms 21.19%  runtime.semacquire
     860ms  1.16% 80.00%    50220ms 67.97%  main.Benchmrk.func1


I would really like to understand why locking in Go is so much slower than in 
Java. I initially wrote this program using channels, but that was much slower 
than the locking version; a simplified sketch of that approach is below. Can 
somebody please help me out?
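
For reference, the channel-based variant funnels all writes through a single 
goroutine that owns the maps, roughly like this (simplified sketch; the real 
implementation is in the repo, and the names here are illustrative):

// hypothetical single-owner store for the channel-based variant;
// one goroutine owns the maps, so no mutex is needed
type channelStore struct {
   samples map[string]*Distribution
   updates chan update
}

type update struct {
   key   string
   value int64
}

func (s *channelStore) addSample(key string, value int64) {
   s.updates <- update{key: key, value: value}
}

// run is the only goroutine that touches the maps
func (s *channelStore) run() {
   for u := range s.updates {
      distribution, exists := s.samples[u.key]
      if !exists {
         distribution = NewDistribution()
         s.samples[u.key] = distribution
      }
      distribution.addSample(u.value)
   }
}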


