Peter Zijlstra wrote:
> On Thu, Oct 22, 2015 at 08:01:37AM +0000, Kosuke Tatsukawa wrote:
> 
> Its somewhat unfortunate you chose the whole wait_woken() thing, its
> 'rare'.

Yes.  I first noticed this lack of memory barrier before
waitqueue_active() issue in drivers/tty/n_tty.c which was using
wait_woken().  However, other places were mostly using prepare_to_wait()
or wait_event*(), so wait_woken() is 'rare'.


>> Second, on the waiting thread side, the CPU can reorder the load of
>> CONDITION to occur during add_wait_queue active, before the entry is
>> added to the wait queue.
>>      wake_up thread                 waiting thread
>>                                       (reordered)
>> ------------------------------------------------------------------------
>>                                 spin_lock_irqsave(...)      <add_wait_queue>
>>                                 if (CONDITION)
>> CONDITION = 1;
>> if (waitqueue_active(wq))
>       wake_up();
>>                                 __add_wait_queue(...)       <add_wait_queue>
>>                                 spin_unlock_irqrestore(...) <add_wait_queue>
>>                                 wait_woken(&wait, ...);
>> ------------------------------------------------------------------------
> 
> This isn't actually a problem IIRC, because wait_woken() will test
> WQ_FLAG_WOKEN and not actually sleep.

In the above figure, waitqueue_active(wq) will return 0 (queue is
inactive) and skip the whole wake_up() call, because __add_wait_queue()
hasn't been called yet.  This actually does occur using a reproducer.


>> However, if that is too expensive, the reordering could be prevented by
>> adding memory barriers in the following places.
>>      wake_up thread                 waiting thread
>> ------------------------------------------------------------------------
>> CONDITION = 1;                  add_wait_queue(wq, &wait);
>> smp_mb();                       smp_mb();
>> if (waitqueue_active(wq))       for (;;) {
>>         wake_up(wq);                    if (CONDITION)
>>                                                 break;
>>                                         wait_woken(&wait, ...);
>>                                 }
> 
> So for wait_woken, WQ_FLAG_WOKEN should 'fix' that, and for pretty much
> anything else you must have a set_current_state() before testing
> CONDITION and you're good (as you state elsewhere).

wait_woken() calls set_current_state(), but that is after the CONDITION
test.


>> +++ b/include/linux/wait.h
>> @@ -102,6 +102,19 @@ init_waitqueue_func_entry(wait_queue_t *q, 
>> wait_queue_func_t func)
>>      q->func         = func;
>>  }
>>  
>> +/*
>> + * Note: When adding waitqueue_active before calling wake_up for
>> + * optimization, some sort of memory barrier is required on SMP so
>> + * that the waiting thread does not miss the wake up.
>> + *
>> + * A memory barrier is required before waitqueue_active to prevent
>> + * waitqueue_active from being reordered by the CPU before any writes
>> + * done prior to it.
>> + *
>> + * The waiting side also needs a memory barrier which pairs with the
>> + * wake_up side.  If prepare_to_wait() or wait_event*() is used, they
>> + * contain the memory barrier in set_current_state().
>> + */
>>  static inline int waitqueue_active(wait_queue_head_t *q)
>>  {
>>      return !list_empty(&q->task_list);
> 
> How about something like:
> 
> /**
>  * waitqueue_active -- locklessly test for waiters on the queue
>  * @q: the waitqueue to test for waiters
>  *
>  * returns true if the wait list is not empty
>  *
>  * NOTE: this function is lockless and requires care, incorrect usage
>  * _will_ lead to sporadic and non-obvious failure.
>  *
>  * Use either while holding wait_queue_head_t::lock or when used for
>  * wakeups with an extra smp_mb() like:
>  *
>  *    CPU0 - waker                    CPU1 - waiter
>  *
>  *                                    for (;;) {
>  *    @cond = true;                     prepare_to_wait(&wq, &wait, state);
>  *    smp_mb();                         /* smp_mb() from set_current_state() 
> */
>  *    if (waitqueue_active(wq))         if (@cond)
>  *      wake_up(wq);                      break;
>  *                                        schedule();
>  *                                      }
>  *
>  * Because without the explicit smp_mb() its possible for the
>  * waitqueue_active() load to get hoisted over the @cond store such that
>  * we'll observe an empty wait list while the waiter might not observe
>  * @cond.
>  *
>  * Also note that this 'optimization' trades a spin_lock() for an
>  * smp_mb(), which (when the lock is uncontended) are of roughly equal
>  * cost.
>  */
> 
> Does that work for you?

Yes.  Considering that the use of wait_woken is pretty rare, I think the
explanation is more focused and easier to understand this way.

Best regards.
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | [email protected]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to