On 23/02/2008, Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Sat, 23 Feb 2008, Oleg Nesterov wrote: > > > > > In short: wake_up_process() doesn't imply mb(), this means that _in theory_ > > the commonly used code like > > > > set_current_state(TASK_INTERRUPTIBLE); > > if (CONDITION) > > return; > > schedule(); > > > > is racy wrt > > > > CONDITION = 1; > > wake_up_process(p); > > > > I'll be happy to be wrong though, please correct me. > > > Well, you should be wrong on x86, because the spinlock at the head of > wake_up_process() (well, "try_to_wake_up()" to be exact) will be a full > memory barrier. > > But yeah, in general spinlocks can have weaker semantics, and let > preceding writes percolate into the critical section and thus past the > point that actually sets task->state. > > And I do agree that we should *not* add a memory barrier in the caller > (that's just going to be really confusing for everybody, and make these > things much harder than they should be), and we should make sure that the > above sequence is always race-free. > > I also think that a full memory barrier is overkill. We should be ok with > just adding a write barrier to the top of wake_up_process(), no?
No, wmb is not enough. I've provided an explanation in the original thread. (http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/44c45685680585fc/e58785df0eeee6f8?lnk=raot) Actually, there seems to be _no_ problem at all, provided a task to be woken up is _not_ running on another CPU at the exact moment of wakeup. Why? shared_data = new; wake_up_task(p); (1) try_to_wake_up() holds a lock of the runqueue on which 'p' is to-be-placed ; (2) it's _guaranteed_ that 'shared_data' will be updated by the moment any UNLOCK is called -- in our case, try_to_wake_up(p) calls unlock(&rq->lock); (3) for 'p' to start running, something must call schedule() -> which will take 'rq->lock'... and 'rq' is the same as in (2). IOW, 'p' can't start running untill try_to_wake_up(p) releases 'rq->lock', and as said in (2), that implies that 'shared_data' will be up-to-date by this moment. IOW #2, 'rq->lock' is kind of a synchronization point/'barrier' in this case. does it make sense now? Another point is if 'p' is actually _running_ on another CPU at the time when we do try_to_wake_up()... and I guess, the potential problem is only relevant for situations like below: CPU #0: EIP_1 ---> (*) /* so 'p' is at this point now */ set_current_state(TASK_INTERRUPTIBLE); if (shared_data == magic) schedule(); CPU #1: shared_data = magic; try_to_wake_up(p); now the problem is if 'p->state' (inside try_to_wake_up()) will be loaded _before_ 'shared_data' has been updated. recall, we are about to execute set_current_state(TASK_INTERRUPTIBLE) on CPU #0... so p->state is still TASK_RUNNING, meaning that try_to_wake_up() just exits ! (nothing to be done) in the mean time, on CPU #0: set_current_state(TASK_INTERRUPTIBLE) is called. shared_data is checked _but_ it's still an old value (CPU #1 is still inside try_to_wake_up(p)) so we call schedule() and go to sleep... i.e. we actually lost a wakeup. to sum it up, for the following scheme to work: set_current_state(TASK_INTERRUPTIBLE); <--- here we have a smb_mb() if (condition) schedule(); effectively => (1) MODIFY(current->state) ; (2) LOAD(condition) and a wake_up path must ensure access (LOAD or MODIFY) to the same data happens in the _reverse_ order: condition = new; smb_mb(); try_to_wake_up(); => (1) MODIFY(condition); (2) LOAD(current->state) try_to_wake_up() does not need to be a full mb per se, the only requirement (and only for situation like above) is that there is a full mb between possible write ops. that have taken place before try_to_wake_up() _and_ a load of p->state inside try_to_wake_up(). does it make sense #2 ? :-) (yeah, maybe I'm just too paranoid :-) > > Linus > -- Best regards, Dmitry Adamushko -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/