Hi Waiman,

On Mon, Apr 23, 2018 at 12:46:12PM -0400, Waiman Long wrote:
> On 04/10/2018 01:22 PM, Waiman Long wrote:
> > It was observed occasionally on PowerPC systems that there was a reader
> > that had not been woken up even though its waiter->task had been cleared.

Can you provide more details about these observations?  (links to LKML
posts, traces, applications used/micro-benchmarks, ...)


> >
> > One probable cause of this missed wakeup may be the fact that the
> > waiter->task and the task state have not been properly synchronized as
> > the lock release-acquire pair of different locks in the wakeup code path
> > does not provide a full memory barrier guarantee.

I guess that by the "pair of different locks" you mean (sem->wait_lock,
p->pi_lock), right?  BTW, __rwsem_down_write_failed_common() is calling
wake_up_q() _before_ releasing the wait_lock: did you intend to exclude
this callsite? (why?)


> > So smp_store_mb()
> > is now used to set waiter->task to NULL to provide a proper memory
> > barrier for synchronization.

Mmh; the patch does not introduce an smp_store_mb()... My guess is that
you are thinking of the sequence:

        smp_store_release(&waiter->task, NULL);
        [...]
        smp_mb(); /* added with your patch */

or what am I missing?


> >
> > Signed-off-by: Waiman Long <long...@redhat.com>
> > ---
> >  kernel/locking/rwsem-xadd.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > index e795908..b3c588c 100644
> > --- a/kernel/locking/rwsem-xadd.c
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -209,6 +209,23 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
> >             smp_store_release(&waiter->task, NULL);
> >     }
> >  
> > +   /*
> > +    * To avoid missed wakeup of reader, we need to make sure
> > +    * that task state and waiter->task are properly synchronized.
> > +    *
> > +    *     wakeup                 sleep
> > +    *     ------                 -----
> > +    * __rwsem_mark_wake:   rwsem_down_read_failed*:
> > +    *   [S] waiter->task     [S] set_current_state(state)
> > +    *       MB                   MB
> > +    * try_to_wake_up:
> > +    *   [L] state            [L] waiter->task
> > +    *
> > +    * For the wakeup path, the original lock release-acquire pair
> > +    * does not provide enough guarantee of proper synchronization.
> > +    */
> > +   smp_mb();
> > +
> >     adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
> >     if (list_empty(&sem->wait_list)) {
> >             /* hit end of list above */
> 
> Ping!
> 
> Any thought on this patch?
> 
> I am wondering if there is a cheaper way to apply the memory barrier
> just on architectures that need it.

try_to_wake_up() does:

        raw_spin_lock_irqsave(&p->pi_lock, flags);
        smp_mb__after_spinlock();
        if (!(p->state & state))

My understanding is that this smp_mb__after_spinlock() provides us with
the guarantee you described above.  The smp_mb__after_spinlock() should
represent a 'cheaper way' to provide such a guarantee.

If this understanding is correct, the remaining question would be about
whether you want to rely on (and document) the smp_mb__after_spinlock()
in the callsite in question; the comment in wake_up_q(),

   /*
    * wake_up_process() implies a wmb() to pair with the queueing
    * in wake_q_add() so as not to miss wakeups.
    */

does not appear to be sufficient...

  Andrea


> 
> Cheers,
> Longman
> 
