When there is serious memory pressure, all workers in a pool could be blocked, and a new thread cannot be created because it requires memory allocation.
In this situation a WQ_MEM_RECLAIM workqueue will wake up the rescuer thread to do some work.

The rescuer will only handle requests that are already on ->worklist. If max_requests is 1, that means it will handle a single request. The rescuer will be woken again in 100ms to handle another max_requests requests.

I've seen a machine (running a 3.0-based "enterprise" kernel) with thousands of requests queued for xfslogd, which has a max_requests of 1 and is needed for retiring all 'xfs' write requests. When one of the worker pools gets into this state, it progresses extremely slowly and possibly never recovers (I only waited an hour or two).

So if, after handling everything on ->worklist, there is again something on ->worklist (counted in ->nr_active) and the queue is still congested, keep processing instead of waiting for the next wake-up.

Signed-off-by: NeilBrown <ne...@suse.de>
---

Hi Tejun,

I haven't tested this patch yet, so this really is an 'RFC'.

In general ->nr_active should only be accessed under the pool->lock, but a misread here will at most cause a very occasional 100ms delay, so it shouldn't be a big problem. The only thread likely to change ->nr_active is this thread, so such a delay would be extremely unlikely.

Thanks,
NeilBrown


diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 09b685daee3d..d0a8b101c5d9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2244,16 +2244,18 @@ repeat:
 		spin_lock_irq(&pool->lock);
 		rescuer->pool = pool;
 
-		/*
-		 * Slurp in all works issued via this workqueue and
-		 * process'em.
-		 */
-		WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
-		list_for_each_entry_safe(work, n, &pool->worklist, entry)
-			if (get_work_pwq(work) == pwq)
-				move_linked_works(work, scheduled, &n);
+		do {
+			/*
+			 * Slurp in all works issued via this workqueue and
+			 * process'em.
+			 */
+			WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
+			list_for_each_entry_safe(work, n, &pool->worklist, entry)
+				if (get_work_pwq(work) == pwq)
+					move_linked_works(work, scheduled, &n);
 
-		process_scheduled_works(rescuer);
+			process_scheduled_works(rescuer);
+		} while (need_more_worker(pool) && pwq->nr_active);
 
 		/*
 		 * Put the reference grabbed by send_mayday().  @pool won't
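To see why a single pass per wake-up is so slow when max_requests is 1, here is a rough user-space model in C. Everything in it (struct model_pwq, model_queue(), model_complete_one(), model_rescuer_wakeup(), wakeups_to_drain()) is hypothetical and only mimics the behaviour described above: a pass handles only what was already on ->worklist when it started, and finishing one item lets the next queued request become active. It is not kernel code; it ignores locking and linked works, and it replaces the real need_more_worker(pool) test with a fixed "congested" flag, assuming every normal worker stays blocked.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of one pool_workqueue feeding the rescuer.
 * None of these names exist in kernel/workqueue.c; this is an illustration.
 */
struct model_pwq {
	int max_requests; /* concurrency limit (1 for xfslogd in the report) */
	int nr_active;    /* items currently on the pool's worklist */
	int nr_delayed;   /* items parked until an active slot frees up */
};

/* Queue n items: the first max_requests become active, the rest wait. */
static void model_queue(struct model_pwq *pwq, int n)
{
	for (int i = 0; i < n; i++) {
		if (pwq->nr_active < pwq->max_requests)
			pwq->nr_active++;
		else
			pwq->nr_delayed++;
	}
}

/* Finishing one item frees a slot, which activates a delayed item (if any). */
static void model_complete_one(struct model_pwq *pwq)
{
	pwq->nr_active--;
	if (pwq->nr_delayed > 0) {
		pwq->nr_delayed--;
		pwq->nr_active++;
	}
}

/*
 * One rescuer wake-up. Each pass handles only the items that were on the
 * worklist when the pass started; items activated during the pass must
 * wait for another pass. loop == false models the pre-patch code (one
 * pass); loop == true keeps going while the pool is congested and work is
 * still active, like the patched do/while.
 */
static int model_rescuer_wakeup(struct model_pwq *pwq, bool congested, bool loop)
{
	int done = 0;

	do {
		int batch = pwq->nr_active; /* snapshot of the worklist */

		for (int i = 0; i < batch; i++) {
			model_complete_one(pwq);
			done++;
		}
	} while (loop && congested && pwq->nr_active);

	return done;
}

/* Count the wake-ups (each ~100ms apart) needed to drain `items` requests. */
int wakeups_to_drain(int items, int max_requests, bool loop)
{
	struct model_pwq pwq = { .max_requests = max_requests };
	int wakeups = 0;

	assert(max_requests > 0); /* otherwise nothing could ever become active */
	model_queue(&pwq, items);

	while (pwq.nr_active + pwq.nr_delayed > 0) {
		/* Assume every normal worker stays blocked: always congested. */
		model_rescuer_wakeup(&pwq, true, loop);
		wakeups++;
	}
	return wakeups;
}
```

Under these assumptions, wakeups_to_drain(1000, 1, false) returns 1000, i.e. roughly 100 seconds of wake-ups at one every 100ms, while wakeups_to_drain(1000, 1, true) returns 1. In the actual patch the loop also checks need_more_worker(pool), so the rescuer stops looping as soon as the pool is no longer congested.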