On Wed, Aug 10, 2016 at 12:17:57PM -0700, Davidlohr Bueso wrote:
> On Wed, 10 Aug 2016, Manfred Spraul wrote:
> 
> > On 08/10/2016 02:05 AM, Benjamin Herrenschmidt wrote:
> > > On Tue, 2016-08-09 at 20:52 +0200, Manfred Spraul wrote:
> > > > Hi Benjamin, Hi Michael,
> > > > 
> > > > regarding commit 51d7d5205d33 ("powerpc: Add smp_mb() to
> > > > arch_spin_is_locked()"):
> > > > 
> > > > For the ipc/sem code, I would like to replace the spin_is_locked() with
> > > > a smp_load_acquire(), see:
> > > > 
> > > > http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n367
> > > > 
> > > > http://www.ozlabs.org/~akpm/mmots/broken-out/ipc-semc-fix-complex_count-vs-simple-op-race.patch
> > > > 
> > > > To my understanding, I must now add a smp_mb(), otherwise it would be
> > > > broken on PowerPC:
> > > > 
> > > > The approach of adding the memory barrier inside spin_is_locked()
> > > > doesn't work here, because the new code doesn't use spin_is_locked().
> > > > 
> > > > Correct?
> > > Right, otherwise you aren't properly ordered. The current powerpc locks
> > > provide good protection between what's inside vs. what's outside the
> > > lock, but not vs. the lock *value* itself. So if, as you do in the sem
> > > code, you use the lock value as something that is relevant in terms of
> > > ordering, you probably need an explicit full barrier.
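
To make that concrete: in the old code the barrier effectively lived inside
spin_is_locked(). A minimal sketch of that pattern (invented names, not the
actual ipc/sem.c code, assuming the usual <linux/spinlock.h> primitives):

  /* Old-style fast path: take the fine-grained lock, then peek at the
   * global lock.  On powerpc the store that takes obj_lock is only an
   * acquire, so the load of the global lock's value needs a full barrier
   * in front of it; commit 51d7d5205d33 put that smp_mb() inside
   * arch_spin_is_locked() itself. */
  static bool foo_old_fastpath(spinlock_t *obj_lock, spinlock_t *global_lock)
  {
          spin_lock(obj_lock);
          if (!spin_is_locked(global_lock))
                  return true;            /* fast path: keep obj_lock */
          spin_unlock(obj_lock);
          return false;                   /* fall back to the global lock */
  }

Once spin_is_locked() is replaced by an smp_load_acquire() of a flag, that
implicit full barrier is gone, which is exactly the question above.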
> 
> But the problem here is with spin_unlock_wait() (for ll/sc spin_lock) not
> seeing the store that makes the lock visibly taken, so both threads end up
> exiting out of sem_lock(); a similar scenario to the spin_is_locked commit
> mentioned above, which is a crossing of locks.
> 
> Now that spin_unlock_wait() always implies at least a load-acquire barrier
> (for both ticket locks and qspinlocks, the latter still x86 only), we wait
> on the full critical region.
> 
> So this patch takes this locking scheme:
> 
>   CPU0                        CPU1
>   spin_lock(l)                spin_lock(L)
>   spin_unlock_wait(L)         if (spin_is_locked(l))
>   foo()                           foo()
> 
> ... and converts it now to:
> 
>   CPU0                        CPU1
>   complex_mode = true         spin_lock(l)
>   smp_mb()                    <--- do we want a smp_mb() here?
>   spin_unlock_wait(l)         if (!smp_load_acquire(complex_mode))
>   foo()                           foo()
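
Spelled out as kernel-style C, the simple-op side of the converted scheme
looks roughly like this (a sketch only; the structure and function names
below are invented and the real patch differs in detail):

  /* Invented structures, for illustration only. */
  #define FOO_NSEMS 64

  struct foo_sem {
          spinlock_t lock;                /* fine-grained per-object lock */
  };

  struct foo_sem_array {
          bool complex_mode;              /* set while a complex op runs */
          struct foo_sem sems[FOO_NSEMS];
  };

  /* CPU1 above: take the per-object lock, then check whether a complex
   * op has claimed the whole array.  The "<---" question in the diagram
   * is whether an smp_mb() is needed between the two. */
  static bool foo_try_simple(struct foo_sem_array *sma, int idx)
  {
          spin_lock(&sma->sems[idx].lock);
          /* <--- the smp_mb() in question would go here */
          if (!smp_load_acquire(&sma->complex_mode))
                  return true;            /* fast path: keep this lock */
          spin_unlock(&sma->sems[idx].lock);
          return false;                   /* fall back to the global lock */
  }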
> 
> We should not be doing an smp_mb() right after a spin_lock(); that makes
> no sense. The spinlock machinery should guarantee us the barriers in the
> unorthodox locking cases, such as this.
> 

Right.

If you have:

6262db7c088b ("powerpc/spinlock: Fix spin_unlock_wait()")

you don't need smp_mb() after spin_lock() on PPC.

And, IIUC, if you have:

3a5facd09da8 ("arm64: spinlock: fix spin_unlock_wait for LSE atomics")
d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")

you don't need smp_mb() after spin_lock() on ARM64.

And, IIUC, if you have:

2c6100227116 ("locking/qspinlock: Fix spin_unlock_wait() some more")

you don't need smp_mb() after spin_lock() on x86 with qspinlock.
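
In other words, with those commits the ordering comes entirely from
spin_unlock_wait() on the complex-op side; a sketch of that side, using the
same invented names as in the sketch above:

  /* Complex-op side (sketch): with a fixed spin_unlock_wait(), either the
   * loop below observes a simple op's spin_lock() and then waits for that
   * whole critical section, or it does not observe it, in which case that
   * later lock acquisition is ordered after us and its smp_load_acquire()
   * is guaranteed to see complex_mode == true.  Either way the simple-op
   * side needs no extra smp_mb() after spin_lock(). */
  static void foo_enter_complex(struct foo_sem_array *sma)
  {
          int i;

          WRITE_ONCE(sma->complex_mode, true);
          smp_mb();       /* complex_mode store before sampling the locks */
          for (i = 0; i < FOO_NSEMS; i++)
                  spin_unlock_wait(&sma->sems[i].lock);
  }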

Regards,
Boqun

> Thanks,
> Davidlohr
