Re: [go-nuts] Go locking and channels much slower than Java equivalent, program spends most of time in sync.(Mutex).Lock() and sync.(Mutex).Unlock()

Matt Harden Sun, 02 Oct 2016 13:41:50 -0700

Re: transactional memory, in Intel this is called TSX-NI. It looks like
even for the latest generation (Skylake), it's enabled for certain
processors and disabled for others: compare all skylake processors
<http://ark.intel.com/compare/?ids=93742,93741,93740,93848,93847,93339,93338,93337,91169,93336,93340,93341,93335,93354,93359,93358,90619,90713,93277,93366,90737,90738,90741,92213,92211,90711,90610,88171,88177,88174,88175,88173,88169,88176,88170,88182,88168,88172,88178,90615,90426,88201,88186,90425,90617,90611,90612,88181,90618,88972,88970,88969,88967,91497,88192,91167,91163,88194,88962,91156,88959,88190,88193,91160,91166,91164,89063,88180,91154,88199,88197,88202,88198,89610,89608,89612,89611,88200,88196,88188,88189,88184,88183,88185,88187,90731,90728,90733,90729,90734,90730,90725,90732,88179,90614,88195,88191>



On Sun, Oct 2, 2016 at 12:07 PM Jesper Louis Andersen <
jesper.louis.ander...@gmail.com> wrote:

>
> This program has a store that holds the data, and a sync.Mutex that guards
> concurrent access on reads and writes. This is a snippet of the locking
> based implementation:
>
> type Store struct {
>    durations map[string]*Distribution
>    counters  map[string]int64
>    samples   map[string]*Distribution
>
>    lock *sync.Mutex
> }
>
> If I were to write this, I would have one Store per goroutine. Then at
> intervals, you send this Store to a central location (over a channel) and
> tally up the counts. In turn, you avoid locking in the tight loops since
> each goroutine owns the Store it operates on, until you make a copy of it
> and send it along. approach, which is a bit harder to pull of in Go, is to
> have a mutex/Store per core. Read operations then take all mutexes for
> reading, but write operations will almost never compete on the lock, so it
> becomes free of contention.
>
> The reason this works is because your counters work as CRDT P-counters and
> max/min are CRDTs on their own. So you don't have to synchronize at all[0].
> Also, if you keep the count, the sum and the sum of squares, it is easy to
> compute the mean and the std. dev. of the data set. All form CRDTs [1].
>
> The key to fast concurrent programming which achieves good parallelism is
> to avoid having to synchronize at all.
>
>
> (Aside: both of the above schemes have been used by me with success in
> Erlang, so I know they work in practice as well as in theory).
>
> I would really like to understand why Locking in Go is so much slower than in 
> Java. I've initially written this program using channels, but that was much 
> slower than locking. Can somebody please help me out?
>
> Javas compiler employs a number of optimizations w.r.t. Locking:
>
> * Escape analysis and lock elision: If we can prove data is thread-local
> and never escapes the thread, then we can elide the lock and never take it.
>
> * Lock coarsening: Suppose we take and release the same lock multiple
> times in a row. Say you roll out the inner loop of your benchmark a bit.
> Then you can take the lock once, update several times and release the lock.
> This is called coarsening the lock. If the lock is heavily contended, this
> may be faster. Especially if other optimizations allows us to
> strength-reduce the updates and fold several of them into a single update.
> Which would be exactly what we do once we have unrolled the loop 4-8 times.
>
> There is also the ability to exploit modern CPUs hardware transactional
> memory: If you have a coarse lock over a large array, and most updates
> affect different indexes in the array, then you can run a transaction
> optimistically under the assumption threads will write to different slots.
> On a collision, you restart the transaction, this time with normal mutexes.
> It is sometimes called hardware lock elision, but the actual state of it on
> modern CPUs eludes me. The extensions were there but I think they were
> disabled by microcode since there were bugs in them (At least on the
> Haswell generation of Intel CPUs). Also, I don't think Java has this, yet,
> but I might be mistaken.
>
> [0] CRDT: Commutative/Convergent Replicated Data Type.
> [1] Consider just using https://godoc.org/github.com/codahale/hdrhistogram by
> Coda Hale. HDR Histogram is a nice data structure, invented by Gil Tene,
> which combines the ideas of Flajolet's HyperLogLog with the structure of
> floating point representations. Histogram updates are measured in the lower
> nanoseconds.
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [go-nuts] Go locking and channels much slower than Java equivalent, program spends most of time in sync.(*Mutex).Lock() and sync.(*Mutex).Unlock()

Reply via email to

Re: [go-nuts] Go locking and channels much slower than Java equivalent, program spends most of time in sync.(Mutex).Lock() and sync.(Mutex).Unlock()