On Wed, Aug 10, 2016 at 12:17:57PM -0700, Davidlohr Bueso wrote:
> On Wed, 10 Aug 2016, Manfred Spraul wrote:
>
> > On 08/10/2016 02:05 AM, Benjamin Herrenschmidt wrote:
> > > On Tue, 2016-08-09 at 20:52 +0200, Manfred Spraul wrote:
> > > > Hi Benjamin, Hi Michael,
> > > >
> > > > regarding commit 51d7d5205d33 ("powerpc: Add smp_mb() to
> > > > arch_spin_is_locked()"):
> > > >
> > > > For the ipc/sem code, I would like to replace the spin_is_locked()
> > > > with a smp_load_acquire(), see:
> > > >
> > > > http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n367
> > > >
> > > > http://www.ozlabs.org/~akpm/mmots/broken-out/ipc-semc-fix-complex_count-vs-simple-op-race.patch
> > > >
> > > > To my understanding, I must now add a smp_mb(), otherwise it would
> > > > be broken on PowerPC: the approach of adding the memory barrier
> > > > into spin_is_locked() doesn't work, because the code doesn't use
> > > > spin_is_locked().
> > > >
> > > > Correct?
> > > Right, otherwise you aren't properly ordered. The current powerpc
> > > locks provide good protection between what's inside vs. what's
> > > outside the lock, but not vs. the lock *value* itself. So if, as you
> > > do in the sem code, you use the lock value as something that is
> > > relevant in terms of ordering, you probably need an explicit full
> > > barrier.
>
> But the problem here is with spin_unlock_wait() (for ll/sc spin_lock)
> not seeing the store that makes the lock visibly taken, so both threads
> end up exiting out of sem_lock(); a similar scenario to the
> spin_is_locked() commit mentioned above, which is a crossing of locks.
>
> Now that spin_unlock_wait() always implies at least a load-acquire
> barrier (for both ticket locks and qspinlocks, the latter still
> x86-only), we wait on the full critical region.
>
> So this patch takes this locking scheme:
>
> 	CPU0			CPU1
> 	spin_lock(l)		spin_lock(L)
> 	spin_unlock_wait(L)	if (spin_is_locked(l))
> 	foo()			    foo()
>
> ... and converts it now to:
>
> 	CPU0			CPU1
> 	complex_mode = true	spin_lock(l)
> 	smp_mb() <--- do we want a smp_mb() here?
> 	spin_unlock_wait(l)	if (!smp_load_acquire(complex_mode))
> 	foo()			    foo()
>
> We should not be doing an smp_mb() right after a spin_lock(); it makes
> no sense. The spinlock machinery should guarantee us the barriers in
> unorthodox locking cases such as this.
>
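In code, the converted scheme above corresponds to roughly the following
sketch. The names complex_mode, enter_complex_mode() and simple_op() are
made up for illustration and follow the diagram, not the actual ipc/sem.c
code in the patch, which is more involved:

#include <linux/spinlock.h>

static bool complex_mode;

/* CPU0: announce complex mode, then wait out any simple-op holder. */
static void enter_complex_mode(spinlock_t *per_sem_lock)
{
	WRITE_ONCE(complex_mode, true);
	/*
	 * The barrier in question: order the complex_mode store above
	 * before the lock-value read inside spin_unlock_wait().
	 */
	smp_mb();
	spin_unlock_wait(per_sem_lock);
	/* foo(): run the complex operation. */
}

/* CPU1: simple op, backing off if a complex op is (becoming) active. */
static void simple_op(spinlock_t *per_sem_lock)
{
	spin_lock(per_sem_lock);
	if (!smp_load_acquire(&complex_mode)) {
		/* foo(): run the simple operation. */
	}
	spin_unlock(per_sem_lock);
}

The race being discussed is that either CPU0's spin_unlock_wait() must
observe CPU1's lock acquisition, or CPU1's load must observe complex_mode
== true; if neither holds, both sides run foo() concurrently.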
Right. If you have:

	6262db7c088b ("powerpc/spinlock: Fix spin_unlock_wait()")

you don't need smp_mb() after spin_lock() on PPC.

And, IIUC, if you have:

	3a5facd09da8 ("arm64: spinlock: fix spin_unlock_wait for LSE atomics")
	d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against
	concurrent lockers")

you don't need smp_mb() after spin_lock() on ARM64.

And, IIUC, if you have:

	2c6100227116 ("locking/qspinlock: Fix spin_unlock_wait() some more")

you don't need smp_mb() after spin_lock() on x86 with qspinlock.

Regards,
Boqun

> Thanks,
> Davidlohr
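Translated back to the earlier sketch: on a kernel that carries the
commits Boqun lists, the explicit smp_mb() in enter_complex_mode() should
be able to go away, since the fixed spin_unlock_wait() itself provides the
required ordering. Again an illustrative sketch only, not the actual
ipc/sem.c change:

/* CPU0's side, assuming the spin_unlock_wait() fixes above are present. */
static void enter_complex_mode(spinlock_t *per_sem_lock)
{
	WRITE_ONCE(complex_mode, true);
	/*
	 * No explicit smp_mb(): per the commits above, the fixed
	 * spin_unlock_wait() is expected to order the store above before
	 * the lock-word access, and not to return until a concurrent
	 * simple_op() holder is visible and has released the lock.
	 */
	spin_unlock_wait(per_sem_lock);
	/* foo(): run the complex operation. */
}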