On 07/07/20 09:17, Peter Zijlstra wrote: > On Tue, Jul 07, 2020 at 12:56:04AM +0100, Valentin Schneider wrote: > >> > @@ -2605,8 +2596,20 @@ try_to_wake_up(struct task_struct *p, unsigned int >> > state, int wake_flags) >> > * >> > * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in >> > * __schedule(). See the comment for smp_mb__after_spinlock(). >> > + * >> > + * Form a control-dep-acquire with p->on_rq == 0 above, to ensure >> > + * schedule()'s deactivate_task() has 'happened' and p will no longer >> > + * care about it's own p->state. See the comment in __schedule(). >> > */ >> > - smp_rmb(); >> > + smp_acquire__after_ctrl_dep(); >> >> Apologies for asking again, but I'm foolishly hopeful I'll someday be able >> to grok those things without half a dozen tabs open with documentation and >> Paul McKenney papers. >> >> Do I get it right that the 'acquire' part hints this is equivalent to >> issuing a load-acquire on whatever was needed to figure out whether or not >> the take the branch (in this case, p->on_rq, amongst other things); IOW >> ensures any memory access appearing later in program order has to happen >> after the load? >> >> That at least explains to me the load->{load,store} wording in >> smp_acquire__after_ctrl_dep(). > > Yes. > > So the thing is that hardware MUST NOT speculate stores, or rather, if > it does, it must take extreme measures to ensure they do not become > visible in any way shape or form, since speculative stores lead to > instant OOTA problems. > > Therefore we can say that branches order stores and if the branch > condition depends on a load, we get a load->store order. IOW the load > must complete before we can resolve the branch, which in turn enables > the store to become visible/happen. >
Right, I think that point is made clear in memory-barriers.txt. > If we then add an smp_rmb() to the branch to order load->load, we end up > with a load->{load,store} ordering, which is equivalent to a > load-acquire. > > The reason to do it like that, is that load-aquire would otherwise > require an smp_mb(), since for many platforms that's the only barrier > that has load->store ordering. > > The down-side of doing it like this, as Paul will be quick to point out, > is that the C standard doesn't recognise control dependencies and thus > the compiler would be in its right to 'optimize' our conditional away. > Yikes! > We're relying on the compilers not having done this in the past and > there being sufficient compiler people interested in compiling Linux to > avoid this from happening. > > > Anyway, this patch is basically: > > LOAD p->state LOAD-ACQUIRE p->on_rq == 0 > MB > STORE p->on_rq, 0 STORE p->state, TASK_WAKING > > which ensures the TASK_WAKING store happens after the p->state load. > Just a wee bit complicated due to not actually adding any barriers while > adding additional ordering. > Your newer changelog also helped in that regards. Thanks a ton for the writeup! > Anyway, let me now endeavour to write a coherent Changelog for this mess > :-(