Oleg Nesterov wrote: >> and this. After grabbing cwq lock, compare it to get_wq_data() first, >> if they are the same, both are using the same lock so there's no >> reason for the barriers. If they are different, it doesn't matter >> anyway. We need to retry the locking. > > I think this is not right. The problem is that work->data (its > WORK_STRUCT_WQ_DATA_MASK part, cwq) could be changed _without_ holding > the old cwq->lock. > > Suppose that CPU_0 does queue_delayed_work(dwork). We start the timer, > work->data points to cwq_0. CPU_DEAD comes, the timer migrates to > CPU_1, but work->data was not changed yet. > >> retry: >> cwq = get_sw_data(work); >> if (!cwq) >> return ret; >> >> spin_lock_irq(&cwq->lock); >> if (unlikely(cwq != get_wq_data(work))) { >> /* oops wrong cwq */ >> spin_unlock_irq(&cwq->lock); >> goto retry; /* or just return 0; */ >> } > > dwork->timer fires, delayed_work_timer_fn() changes work->data to cwq_1 > and queues this work on cwq_1. > >> if (!list_empty(&work->entry)) { >> list_del_init(&work->entry); > > Oops! we are holding cwq_0->lock, but modify cwq_1->worklist. > > Actually, we have the same problem with a plain queue_work() if _cpu_down() > comes at "right" time. > > However, I agree, this smp_wmb() in insert_work() should die. We can > introduce "smp_mb__before_spinlock()" (no-op on x86 at least) to kill it.
Yeah, right, we allow cwq pointer to change without holding the lock. Although I still think that is where we should fix the problem. Taking down CPU is a cold cold path. We can afford a lot of overhead there. IMHO, if we can do that, it would be far better than memory barrier dance which tends to be difficult to understand and thus prove/maintain correctness. I'll think about it more. >>> + * It is possible to use this function if the work re-queues itself. It can >>> + * cancel the work even if it migrates to another workqueue, however in >>> that >>> + * case it only garantees that work->func() has completed on the last >>> queued >>> + * workqueue. >> We first prevent requeueing from happening; then, wait for each cwq to >> finish the work, so I think we're guaranteed that they're finished on >> all cpus. Otherwise, the 'sync' part isn't too useful as it means all >> rearming tasks might be running on completion of cancel_work_sync(). > > Yes, sure, you are right. What I meant was: > > struct workqueue_struct *WQ1, *WQ1; > > void work_func(struct work_struct *self) > { > // migrate on another workqueue > queue_work(WQ2, self); > > do_something(); > } > > queue_work(WQ1, work); > > Now, cancel_work_sync() can't guarantee that this work has finished > its execution on WQ1. Right again, I misunderstood that the migration was between cwqs. Things definitely won't work if it's between different wqs. Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/