On Thu, Nov 29, 2018 at 11:17:14PM +0100, Peter Zijlstra wrote: > On Thu, Nov 29, 2018 at 01:34:21PM -0800, Davidlohr Bueso wrote: > > I messed up something such that waiman was not in the thread. Ccing. > > > > > On Thu, 29 Nov 2018, Waiman Long wrote: > > > > > > > That can be costly for x86 which will now have 2 locked instructions. > > > > > > Yeah, and when used as an actual queue we should really start to notice. > > > Some users just have a single task in the wake_q because avoiding the cost > > > of wake_up_process() with locks held is significant. > > > > > > How about instead of adding the barrier before the cmpxchg, we do it > > > in the failed branch, right before we return. This is the uncommon > > > path. > > > > > > Thanks, > > > Davidlohr > > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > > index 091e089063be..0d844a18a9dc 100644 > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -408,8 +408,14 @@ void wake_q_add(struct wake_q_head *head, struct > > > task_struct *task) > > > * This cmpxchg() executes a full barrier, which pairs with the full > > > * barrier executed by the wakeup in wake_up_q(). > > > */ > > > - if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL)) > > > + if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL)) { > > > + /* > > > + * Ensure, that when the cmpxchg() fails, the corresponding > > > + * wake_up_q() will observe our prior state. > > > + */ > > > + smp_mb__after_atomic(); > > > return; > > > + } > > So wake_up_q() does: > > wake_up_q(): > node->next = NULL; > /* implied smp_mb */ > wake_up_process(); > > So per the cross your variables 'rule', this side then should do: > > wake_q_add(): > /* wake_cond = true */ > smp_mb() > cmpxchg_relaxed(&node->next, ...); > > So that the ordering pivots around node->next. > > Either we see NULL and win the cmpxchg (in which case we'll do the > wakeup later) or, when we fail the cmpxchg, we must observe what came > before the failure. > > If it wasn't so damn late, I'd try and write a litmus test for this, > because now I'm starting to get confused -- also probably because it's > late.
The above description suggests: C wake_up_q-wake_q_add { int next = 0; int y = 0; } P0(int *next, int *y) { int r0; /* in wake_up_q() */ WRITE_ONCE(*next, 1); /* node->next = NULL */ smp_mb(); /* implied by wake_up_process() */ r0 = READ_ONCE(*y); } P1(int *next, int *y) { int r1; /* in wake_q_add() */ WRITE_ONCE(*y, 1); /* wake_cond = true */ smp_mb__before_atomic(); r1 = cmpxchg_relaxed(next, 1, 2); } exists (0:r0=0 /\ 1:r1=0) This "exists" clause cannot be satisfied according to the LKMM: Test wake_up_q-wake_q_add Allowed States 3 0:r0=0; 1:r1=1; 0:r0=1; 1:r1=0; 0:r0=1; 1:r1=1; No Witnesses Positive: 0 Negative: 3 Condition exists (0:r0=0 /\ 1:r1=0) Observation wake_up_q-wake_q_add Never 0 3 Time wake_up_q-wake_q_add 0.00 Hash=72d85545f97ef7fd35c8928259225ee0 (TBH, I'm not sure what "y" (you denoted it "wake_cond") is pointing to here/is modeling, but I might have missed some previous remarks...) Andrea > > In any case, I think you patch is 'wrong' because it puts the barrier on > the wrong side of the cmpxchg() (after, as opposed to before).