Re: [PATCH] workqueue: fix possible bug which may silence the pool

Tejun Heo Mon, 04 Mar 2013 11:21:13 -0800

Hello, Lai.

On Sat, Mar 02, 2013 at 11:55:29PM +0800, Lai Jiangshan wrote:
> After we introduce multiple pools for cpu pools, a part of the comments
> in wq_unbind_fn() becomes wrong.
> 
> It said that "current worker would trigger unbound chain execution".
> It is wrong. current worker only belongs to one of the multiple pools.
> 
> If wq_unbind_fn() does unbind the normal_pri pool(not the pool of the current
> worker), the current worker is not the available worker to trigger unbound
> chain execution of the normal_pri pool, and if all the workers of
> the normal_pri goto sleep after they were set %WORKER_UNBOUND but before
> they finish their current work, unbound chain execution is not triggered
> totally. The pool is stopped!
> 
> We can change wq_unbind_fn() only does unbind one pool and we launch multiple
> wq_unbind_fn()s, one for each pool to solve the problem.
> But this change will add much latency to hotplug path unnecessarily.
> 
> So we choice to wake up a worker directly to trigger unbound chain execution.
> 
> current worker may sleep on &second_pool->assoc_mutex, so we also move
> the wakeup code into the loop to avoid second_pool silences the first_pool.
> 
> Signed-off-by: Lai Jiangshan <[email protected]>


Nice catch.

> @@ -3446,28 +3446,35 @@ static void wq_unbind_fn(struct work_struct *work)
>  
>               spin_unlock_irq(&pool->lock);
>               mutex_unlock(&pool->assoc_mutex);
> -     }
>  
> -     /*
> -      * Call schedule() so that we cross rq->lock and thus can guarantee
> -      * sched callbacks see the %WORKER_UNBOUND flag.  This is necessary
> -      * as scheduler callbacks may be invoked from other cpus.
> -      */
> -     schedule();
> +             /*
> +              * Call schedule() so that we cross rq->lock and thus can
> +              * guarantee sched callbacks see the %WORKER_UNBOUND flag.
> +              * This is necessary as scheduler callbacks may be invoked
> +              * from other cpus.
> +              */
> +             schedule();
>  
> -     /*
> -      * Sched callbacks are disabled now.  Zap nr_running.  After this,
> -      * nr_running stays zero and need_more_worker() and keep_working()
> -      * are always true as long as the worklist is not empty.  Pools on
> -      * @cpu now behave as unbound (in terms of concurrency management)
> -      * pools which are served by workers tied to the CPU.
> -      *
> -      * On return from this function, the current worker would trigger
> -      * unbound chain execution of pending work items if other workers
> -      * didn't already.
> -      */
> -     for_each_std_worker_pool(pool, cpu)
> +             /*
> +              * Sched callbacks are disabled now.  Zap nr_running.
> +              * After this, nr_running stays zero and need_more_worker()
> +              * and keep_working() are always true as long as the worklist
> +              * is not empty.  This pool now behave as unbound (in terms of
> +              * concurrency management) pool which are served by workers
> +              * tied to the pool.
> +              */
>               atomic_set(&pool->nr_running, 0);
> +
> +             /* The current busy workers of this pool may goto sleep without
> +              * wake up any other worker after they were set %WORKER_UNBOUND
> +              * flag. Here we wake up another possible worker to start
> +              * the unbound chain execution of pending work items in this
> +              * case.
> +              */
> +             spin_lock_irq(&pool->lock);
> +             wake_up_worker(pool);
> +             spin_unlock_irq(&pool->lock);
> +     }

But can we please just addd wake_up_worker() in the
for_each_std_worker_pool() loop?  We want to mark the patch for
-stable and keep it short and to the point.  This patch is a couple
times larger than necessary.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] workqueue: fix possible bug which may silence the pool

Reply via email to