On Thu, Feb 08, 2018 at 04:03:41PM +0000, Will Deacon wrote:
> On Thu, Feb 08, 2018 at 04:46:43PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 08, 2018 at 03:30:31PM +0000, Will Deacon wrote:
> > > On Thu, Feb 08, 2018 at 03:00:05PM +0100, Peter Zijlstra wrote:
> >
> > > > Without this ordering I think it would be possible to lose has_blocked
> > > > and not observe the CPU either.
> > >
> > > I had a quick look at this, and I think you're right. This looks very much
> > > like an 'R'-shaped test, which means it's smp_mb() all round otherwise
> > > Power will go wrong. That also means the smp_mb__after_atomic() in
> > > nohz_balance_enter_idle *cannot* be an smp_wmb(), so you might want a
> > > comment stating that explicitly.
> >
> > Thanks Will. BTW, where does that 'R' shape nomenclature come from?
> > This is the first I've heard of it.
>
> I don't know where it originates from, but the infamous "test6.pdf" has it:
>
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf
>
> half way down the first page on the left. It says "needs sync+sync" which
> is about as bad as it gets for Power (compare with "2+2w", which gets away
> with lwsync+lwsync). See also:
>
> http://materials.dagstuhl.de/files/16/16471/16471.DerekWilliams.Slides.pdf
>
> for a light-hearted, yet technically accurate story about the latter.
>
> Will

Indeed. As a curiosity: I've never _observed_ R+lwsync+sync (the lwsync
separating the two writes), and other people who tried found the same:

  http://moscova.inria.fr/~maranget/cats7/linux/hard.html#unseen
  http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc051.html#toc8

It would be interesting to hear about different results ... ;-)

  Andrea
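
For readers who have not met the 'R' shape before, here is a minimal sketch of it. It only enumerates sequentially consistent interleavings, so it says nothing about what Power hardware does. It merely shows which outcome the full barriers on both sides are there to forbid: final y == 2 together with r0 == 0. The names P0, P1, x, y and r0 are illustrative and do not come from the scheduler code discussed above.

```python
from itertools import combinations

# The 'R' shape: P0 does two writes, P1 does a write followed by a read.
# Each op is (kind, variable, argument); all names here are illustrative.
P0 = [("W", "x", 1), ("W", "y", 1)]
P1 = [("W", "y", 2), ("R", "x", "r0")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving on a single shared memory (i.e. under SC)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "W":
            mem[var] = arg          # write: arg is the value stored
        else:
            regs[arg] = mem[var]    # read: arg is the destination register
    return mem["y"], regs["r0"]

# Set of (final y, r0) pairs reachable under sequential consistency.
outcomes = {run(s) for s in interleavings(P0, P1)}
print(sorted(outcomes))
```

The pair (2, 0) never shows up in `outcomes`: y ending up as 2 means P1's write came after P0's write to y, so P1's later read of x must already see x == 1. A weaker barrier on either side no longer guarantees that, which is the point of the "needs sync+sync" remark quoted above.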