When there is serious memory pressure, all workers in a pool could be blocked,
and a new thread cannot be created because it requires memory allocation.

In this situation a WQ_MEM_RECLAIM workqueue will wake up the
rescuer thread to do some work.

The rescuer will only handle requests that are already on ->worklist.
If max_requests is 1, that means it will handle a single request.

The rescuer will be woken again in 100ms to handle another max_requests
requests.

I've seen a machine (running a 3.0 based "enterprise" kernel) with thousands
of requests queued for xfslogd, which has a max_requests of 1, and is needed
for retiring all 'xfs' write requests.  When one of the worker pools gets
into this state, it progresses extremely slowly and possibly never recovers
(I only waited an hour or two).

So if, after handling everything on worklist, there is again something on
worklist (counted in nr_active), and if the queue is still congested, keep
processing instead of waiting for the next wake-up.

Signed-off-by: NeilBrown <ne...@suse.de>
---

Hi Tejun,
  I haven't tested this patch yet so this really is an 'RFC'.
In general ->nr_active should only be accessed under the pool->lock,
but a misread here will at most cause a very occasional 100ms delay so
shouldn't be a big problem.  The only thread likely to change ->nr_active is
this thread, so such a delay would be extremely unlikely.
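
To illustrate the effect described in the changelog above, here is a rough
userspace toy model, not kernel code.  Every name in it (toy_queue, activate,
rescue_pass, drain) is hypothetical, and it assumes the pool stays congested
for the whole run and that completing a request immediately makes the next
one visible on the worklist.  It only counts how many rescuer wake-ups are
needed to drain a backlog, with and without the "keep processing" loop.

```c
/* Toy model only: none of these names are kernel APIs. */
struct toy_queue {
	int pending;     /* queued requests not yet visible to the rescuer */
	int on_worklist; /* requests the rescuer can see right now */
	int max_active;  /* cap on requests visible at once */
};

/* Move pending requests onto the worklist, up to max_active. */
static void activate(struct toy_queue *q)
{
	while (q->on_worklist < q->max_active && q->pending > 0) {
		q->pending--;
		q->on_worklist++;
	}
}

/* One pass: handle everything currently on the worklist. */
static int rescue_pass(struct toy_queue *q)
{
	int done = q->on_worklist;

	q->on_worklist = 0;
	activate(q); /* finishing requests lets the next ones become visible */
	return done;
}

/*
 * Count the wake-ups needed to drain 'total' requests.
 * keep_going == 0 models one pass per wake-up;
 * keep_going != 0 models looping while work is still visible.
 */
static int drain(int total, int max_active, int keep_going)
{
	struct toy_queue q = { total, 0, max_active };
	int wakeups = 0;

	activate(&q);
	while (q.on_worklist > 0) {
		wakeups++;
		do {
			rescue_pass(&q);
		} while (keep_going && q.on_worklist > 0);
	}
	return wakeups;
}
```

In this model, drain(3000, 1, 0) needs 3000 wake-ups, which at one wake-up
per 100ms is about five minutes for 3000 requests, while drain(3000, 1, 1)
needs a single wake-up.  The real rescuer is of course subject to locking
and to the pool leaving the congested state, which the model ignores.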

Thanks,
NeilBrown


diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 09b685daee3d..d0a8b101c5d9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2244,16 +2244,18 @@ repeat:
                spin_lock_irq(&pool->lock);
                rescuer->pool = pool;
 
-               /*
-                * Slurp in all works issued via this workqueue and
-                * process'em.
-                */
-               WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
-               list_for_each_entry_safe(work, n, &pool->worklist, entry)
-                       if (get_work_pwq(work) == pwq)
-                               move_linked_works(work, scheduled, &n);
+               do {
+                       /*
+                        * Slurp in all works issued via this workqueue and
+                        * process'em.
+                        */
+                       WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
+                       list_for_each_entry_safe(work, n, &pool->worklist, entry)
+                               if (get_work_pwq(work) == pwq)
+                                       move_linked_works(work, scheduled, &n);
 
-               process_scheduled_works(rescuer);
+                       process_scheduled_works(rescuer);
+               } while (need_more_worker(pool) && pwq->nr_active);
 
                /*
                 * Put the reference grabbed by send_mayday().  @pool won't
