On 12/22, Andrew Morton wrote:
>
> On Fri, 22 Dec 2006 16:07:24 +0530
> Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > While we are at this per-subsystem cpuhotplug "locking", here's a
> > proposal that might put an end to the workqueue deadlock woes.
>
> Oleg is working on some patches which will permit us to cancel or wait upon
> a particular work_struct, rather than upon all pending work_structs.
I hope they are completed. I am waiting for the next -mm release to send a
"final" patch; I need to look at set_wq_data()/get_wq_data() once workqueue.c
is in sync with Linus's changes.

> This will fix the problem where we accidentally wait upon some unrelated
> work_struct which takes a lock which is related to one which we already
> hold.
>
> I hope. It'll be a bit tricky to implement: if some foreign work_struct is
> running right now, we cannot wait upon it - we must non-blockingly dequeue
> the work_struct which we want to kill before it gets to run.

The previous patch I sent,

	[PATCH, RFC rc1-mm1] implement flush_work()
	http://marc.theaimsgroup.com/?l=linux-kernel&m=116647310413104

has a race:

+static void wait_on_work(struct cpu_workqueue_struct *cwq,
+			 struct work_struct *work)
+{
+	struct wq_barrier barr;
+	int running = 0;
+
+	spin_lock_irq(&cwq->lock);
+	if (get_wq_data(work) == cwq) {
+		list_del_init(&work->entry);
+		work_release(work);
+	}

If that work is pending on CPU 1, and that CPU goes down, the work may be
moved to CPU 0 after flush_work() has already checked CPU 0.

I think we can do this:

static void wait_on_work(struct cpu_workqueue_struct *cwq,
			 struct work_struct *work)
{
	struct wq_barrier barr;
	int running = 0;

	spin_lock_irq(&cwq->lock);
	/* @work is running on this CPU: queue a barrier behind it */
	if (unlikely(cwq->current_work == work)) {
		init_wq_barrier(&barr);
		insert_work(cwq, &barr.work, 0);
		running = 1;
	}
	spin_unlock_irq(&cwq->lock);

	if (unlikely(running)) {
		/* drop the hotplug lock while we sleep */
		mutex_unlock(&workqueue_mutex);
		wait_for_completion(&barr.done);
		mutex_lock(&workqueue_mutex);
	}
}

void flush_work(struct workqueue_struct *wq, struct work_struct *work)
{
	struct cpu_workqueue_struct *cwq;

	cwq = get_wq_data(work);
	if (!cwq)
		return;

	/* dequeue @work if it is still pending */
	spin_lock_irq(&cwq->lock);
	list_del_init(&work->entry);
	work_release(work);
	spin_unlock_irq(&cwq->lock);

	mutex_lock(&workqueue_mutex);
	if (is_single_threaded(wq)) {
		/* Always use first cpu's area. */
		wait_on_work(per_cpu_ptr(wq->cpu_wq, singlethread_cpu), work);
	} else {
		int cpu;

		for_each_online_cpu(cpu)
			wait_on_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
	}
	mutex_unlock(&workqueue_mutex);
}

Do you see any problems?

When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
against CPU hotplug), a CPU may go away. But in that case take_over_work()
will move the barrier we queued to another CPU, it will fire eventually, and
wait_on_work() will be woken up.

Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
take_over_work(), so cwq->thread should complete its ->worklist (and thus the
barrier), because currently we don't check kthread_should_stop() in
run_workqueue(). But even if we did, everything still looks safe to me.

Oleg.
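
As an illustration of the technique discussed above, here is a minimal
user-space sketch of the same barrier idea, using POSIX threads rather than
the kernel primitives. Everything in it (work_item, queue_work, flush_work,
barrier_item, worker) is a hypothetical stand-in, not the kernel API: a
pthread condition variable plays the role of the completion, a single worker
thread plays the role of one cwq, and the per-CPU and CPU-hotplug aspects are
left out entirely.

/*
 * User-space sketch of the "queue a barrier behind the running work"
 * technique.  All names are hypothetical stand-ins; the kernel version
 * uses cwq->lock, insert_work() and a struct completion instead.
 */
#include <pthread.h>

struct work_item {
	void (*fn)(struct work_item *);
	struct work_item *next;
};

static struct work_item *head, **tail = &head;
static struct work_item *current_work;		/* like cwq->current_work */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t more = PTHREAD_COND_INITIALIZER;

struct barrier_item {				/* like struct wq_barrier */
	struct work_item work;			/* must stay the first member */
	pthread_cond_t done;
	int fired;
};

static void barrier_fn(struct work_item *w)
{
	/* 'work' is the first member, so this cast is safe */
	struct barrier_item *b = (struct barrier_item *)w;

	pthread_mutex_lock(&lock);
	b->fired = 1;				/* like complete(&barr->done) */
	pthread_cond_signal(&b->done);
	pthread_mutex_unlock(&lock);
}

static void queue_work(struct work_item *w)
{
	pthread_mutex_lock(&lock);
	w->next = NULL;
	*tail = w;
	tail = &w->next;
	pthread_cond_signal(&more);
	pthread_mutex_unlock(&lock);
}

/* Unlink @w if it is still pending; if it is running, wait behind a barrier. */
static void flush_work(struct work_item *w)
{
	struct barrier_item b = { { barrier_fn, NULL } };

	pthread_cond_init(&b.done, NULL);
	pthread_mutex_lock(&lock);
	for (struct work_item **p = &head; *p; p = &(*p)->next) {
		if (*p == w) {			/* still pending: just dequeue */
			*p = w->next;
			if (tail == &w->next)
				tail = p;
			goto out;
		}
	}
	if (current_work == w) {
		/* Running right now: insert the barrier at the head of the
		 * queue, as insert_work(cwq, &barr.work, 0) does above. */
		b.work.next = head;
		head = &b.work;
		if (!b.work.next)
			tail = &b.work.next;
		pthread_cond_signal(&more);
		while (!b.fired)		/* like wait_for_completion() */
			pthread_cond_wait(&b.done, &lock);
	}
out:
	pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&lock);
		while (!head)
			pthread_cond_wait(&more, &lock);
		struct work_item *w = head;
		head = w->next;
		if (tail == &w->next)
			tail = &head;
		current_work = w;		/* flush_work() checks this */
		pthread_mutex_unlock(&lock);

		w->fn(w);			/* run the work with the lock dropped */

		pthread_mutex_lock(&lock);
		current_work = NULL;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

As in the proposed wait_on_work(), the barrier is inserted at the head of the
pending list, so it runs immediately after the work it chases; flush_work()
then sleeps until the worker has executed the barrier, which by construction
happens only after the flushed work has returned. The sketch compiles with,
e.g., cc -std=c11 -pthread.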