Re: transactional memory, in Intel this is called TSX-NI. It looks like even for the latest generation (Skylake), it's enabled for certain processors and disabled for others: compare all skylake processors <http://ark.intel.com/compare/?ids=93742,93741,93740,93848,93847,93339,93338,93337,91169,93336,93340,93341,93335,93354,93359,93358,90619,90713,93277,93366,90737,90738,90741,92213,92211,90711,90610,88171,88177,88174,88175,88173,88169,88176,88170,88182,88168,88172,88178,90615,90426,88201,88186,90425,90617,90611,90612,88181,90618,88972,88970,88969,88967,91497,88192,91167,91163,88194,88962,91156,88959,88190,88193,91160,91166,91164,89063,88180,91154,88199,88197,88202,88198,89610,89608,89612,89611,88200,88196,88188,88189,88184,88183,88185,88187,90731,90728,90733,90729,90734,90730,90725,90732,88179,90614,88195,88191>
On Sun, Oct 2, 2016 at 12:07 PM Jesper Louis Andersen < jesper.louis.ander...@gmail.com> wrote: > > This program has a store that holds the data, and a sync.Mutex that guards > concurrent access on reads and writes. This is a snippet of the locking > based implementation: > > type Store struct { > durations map[string]*Distribution > counters map[string]int64 > samples map[string]*Distribution > > lock *sync.Mutex > } > > If I were to write this, I would have one Store per goroutine. Then at > intervals, you send this Store to a central location (over a channel) and > tally up the counts. In turn, you avoid locking in the tight loops since > each goroutine owns the Store it operates on, until you make a copy of it > and send it along. approach, which is a bit harder to pull of in Go, is to > have a mutex/Store per core. Read operations then take all mutexes for > reading, but write operations will almost never compete on the lock, so it > becomes free of contention. > > The reason this works is because your counters work as CRDT P-counters and > max/min are CRDTs on their own. So you don't have to synchronize at all[0]. > Also, if you keep the count, the sum and the sum of squares, it is easy to > compute the mean and the std. dev. of the data set. All form CRDTs [1]. > > The key to fast concurrent programming which achieves good parallelism is > to avoid having to synchronize at all. > > > (Aside: both of the above schemes have been used by me with success in > Erlang, so I know they work in practice as well as in theory). > > I would really like to understand why Locking in Go is so much slower than in > Java. I've initially written this program using channels, but that was much > slower than locking. Can somebody please help me out? > > Javas compiler employs a number of optimizations w.r.t. Locking: > > * Escape analysis and lock elision: If we can prove data is thread-local > and never escapes the thread, then we can elide the lock and never take it. > > * Lock coarsening: Suppose we take and release the same lock multiple > times in a row. Say you roll out the inner loop of your benchmark a bit. > Then you can take the lock once, update several times and release the lock. > This is called coarsening the lock. If the lock is heavily contended, this > may be faster. Especially if other optimizations allows us to > strength-reduce the updates and fold several of them into a single update. > Which would be exactly what we do once we have unrolled the loop 4-8 times. > > There is also the ability to exploit modern CPUs hardware transactional > memory: If you have a coarse lock over a large array, and most updates > affect different indexes in the array, then you can run a transaction > optimistically under the assumption threads will write to different slots. > On a collision, you restart the transaction, this time with normal mutexes. > It is sometimes called hardware lock elision, but the actual state of it on > modern CPUs eludes me. The extensions were there but I think they were > disabled by microcode since there were bugs in them (At least on the > Haswell generation of Intel CPUs). Also, I don't think Java has this, yet, > but I might be mistaken. > > [0] CRDT: Commutative/Convergent Replicated Data Type. > [1] Consider just using https://godoc.org/github.com/codahale/hdrhistogram by > Coda Hale. HDR Histogram is a nice data structure, invented by Gil Tene, > which combines the ideas of Flajolet's HyperLogLog with the structure of > floating point representations. Histogram updates are measured in the lower > nanoseconds. > > -- > You received this message because you are subscribed to the Google Groups > "golang-nuts" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-nuts+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.