qspinlock: Remove unbounded cmpxchg loop from locking slowpath

Peter Zijlstra Mon, 09 Apr 2018 08:54:49 -0700

On Mon, Apr 09, 2018 at 03:54:09PM +0100, Will Deacon wrote:

> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index 19261af9f61e..71eb5e3a3d91 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -139,6 +139,20 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
>       WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
>  }
>  
> +/**
> + * set_pending_fetch_acquire - set the pending bit and return the old lock
> + *                             value with acquire semantics.
> + * @lock: Pointer to queued spinlock structure
> + *
> + * *,*,* -> *,1,*
> + */
> +static __always_inline u32 set_pending_fetch_acquire(struct qspinlock *lock)
> +{
> +     u32 val = xchg_relaxed(&lock->pending, 1) << _Q_PENDING_OFFSET;
> +     val |= (atomic_read_acquire(&lock->val) & ~_Q_PENDING_MASK);
> +     return val;
> +}


> @@ -289,18 +315,26 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>               return;
>  
>       /*
> -      * If we observe any contention; queue.
> +      * If we observe queueing, then queue ourselves.
>        */
> -     if (val & ~_Q_LOCKED_MASK)
> +     if (val & _Q_TAIL_MASK)
>               goto queue;
>  
>       /*
> +      * We didn't see any queueing, so have one more try at snatching
> +      * the lock in case it became available whilst we were taking the
> +      * slow path.
> +      */
> +     if (queued_spin_trylock(lock))
> +             return;
> +
> +     /*
>        * trylock || pending
>        *
>        * 0,0,0 -> 0,0,1 ; trylock
>        * 0,0,1 -> 0,1,1 ; pending
>        */
> +     val = set_pending_fetch_acquire(lock);
>       if (!(val & ~_Q_LOCKED_MASK)) {

So, if I remember that partial paper correctly, the atomic_read_acquire()
can see 'arbitrary' old values for everything except the pending byte,
which we just wrote and which will forward into our load, right?
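
To make the composition concrete, here is a minimal userspace model of what
set_pending_fetch_acquire() returns. It is only a sketch: the Q_* constants
assume the layout where pending occupies a whole byte (bits 8-15), and
compose_fetch() is a hypothetical helper, not a kernel function. The point is
that the pending field comes from the xchg, while locked and tail come from
whatever snapshot the acquire load happened to return.

```c
#include <stdint.h>

/* Assumed layout: locked in bits 0-7, pending in 8-15, tail in 16-31. */
#define Q_LOCKED_MASK    0x000000ffu
#define Q_PENDING_OFFSET 8
#define Q_PENDING_MASK   0x0000ff00u
#define Q_TAIL_MASK      0xffff0000u

/*
 * Hypothetical model of the value built by set_pending_fetch_acquire():
 *   old_pending - what the xchg on the pending byte returned
 *   snapshot    - the (possibly stale) word returned by the acquire load
 * The snapshot's own pending field is discarded; only locked and tail
 * are taken from it.
 */
static uint32_t compose_fetch(uint8_t old_pending, uint32_t snapshot)
{
	uint32_t val = (uint32_t)old_pending << Q_PENDING_OFFSET;

	val |= snapshot & ~Q_PENDING_MASK;
	return val;
}
```

With old_pending == 0 and a snapshot of 0x0101 (our own pending write
forwarded, lock still seen as held), the result is 0x0001, i.e. 0,0,1.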

But I think coherence requires the read to not be older than the one
observed by the trylock before (since it uses c-cas its acquire can be
elided).

I think this means we can miss a concurrent unlock vs the fetch_or. And
I think that's fine, if we still see the lock set we'll needlessly 'wait'
for it to become unlocked.
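
A rough single-threaded model of why the stale value should be harmless,
assuming the pending waiter simply re-reads the lock word until the locked
byte is clear. polls_until_unlocked() and its arguments are made up for
illustration; 'later' stands for the sequence of values subsequent loads
would return.

```c
#include <stddef.h>
#include <stdint.h>

#define Q_LOCKED_MASK 0x000000ffu	/* assumed: locked byte in bits 0-7 */

/*
 * Hypothetical model of the pending waiter.
 *   fetched - value returned by the set-pending step (may be stale)
 *   later   - values that subsequent re-reads of the lock word return
 * Returns how many re-reads are needed before the locked byte is seen
 * clear, 0 if no wait is needed, or -1 if it is never seen clear.
 */
static int polls_until_unlocked(uint32_t fetched, const uint32_t *later,
				size_t n)
{
	if (!(fetched & Q_LOCKED_MASK))
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (!(later[i] & Q_LOCKED_MASK))
			return (int)(i + 1);
	}
	return -1;
}
```

If the unlock already happened but we fetched a stale 0,0,1, the cost in this
model is one extra re-read, not a missed wakeup.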

Re: [PATCH 02/10] locking/qspinlock: Remove unbounded cmpxchg loop from locking slowpath
