On Tue, Jan 12, 2021 at 11:38:12PM +0800, Lai Jiangshan wrote:

> But the hard problem is "how to suppress the warning of
> online&!active in __set_cpus_allowed_ptr()" for late spawned
> unbound workers during hotplug.

I cannot see create_worker() go bad like that.

The thing is, it uses:

  kthread_bind_mask(worker->task, pool->attrs->cpumask)
  worker_attach_to_pool()
    set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask)

which means set_cpus_allowed_ptr() must be a NOP, because the affinity
is already set by kthread_bind_mask(). Further, the first wakeup of that
worker will then hit:

  select_task_rq()
    is_cpu_allowed()
      is_per_cpu_kthread() -- false
    select_fallback_rq()
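
To make the two steps above concrete, here is a small userspace toy
model, not kernel code. All toy_* names are made up for illustration.
It assumes set_cpus_allowed_ptr() returns early when the requested mask
equals the current one, and that a kthread counts as per-CPU only when
it is allowed on exactly one CPU:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 8

/* Toy stand-in for a cpumask: one bit per CPU. Not a kernel type. */
typedef unsigned long toy_mask;

struct toy_task {
	toy_mask cpus_mask;	/* CPUs the task may run on */
	int nr_cpus_allowed;	/* number of bits set in cpus_mask */
	int affinity_updates;	/* how often the mask was really rewritten */
};

static int toy_weight(toy_mask m)
{
	int n = 0;

	for (; m; m >>= 1)
		n += (int)(m & 1UL);
	return n;
}

/* Models kthread_bind_mask(): install the mask unconditionally. */
static void toy_bind_mask(struct toy_task *t, toy_mask mask)
{
	t->cpus_mask = mask;
	t->nr_cpus_allowed = toy_weight(mask);
	t->affinity_updates++;
}

/*
 * Models set_cpus_allowed_ptr() under the ASSUMPTION that an equal
 * mask is a NOP. Returns true only if the mask really changed.
 */
static bool toy_set_cpus_allowed(struct toy_task *t, toy_mask mask)
{
	if (t->cpus_mask == mask)
		return false;
	toy_bind_mask(t, mask);
	return true;
}

/* ASSUMPTION: per-CPU kthread means allowed on exactly one CPU. */
static bool toy_is_per_cpu_kthread(const struct toy_task *t)
{
	return t->nr_cpus_allowed == 1;
}

/*
 * Models the wakeup-time check: per-CPU kthreads only need the CPU to
 * be online, everything else needs it to be active.
 */
static bool toy_is_cpu_allowed(const struct toy_task *t, int cpu,
			       toy_mask online, toy_mask active)
{
	toy_mask bit;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	bit = 1UL << cpu;

	if (!(t->cpus_mask & bit))
		return false;
	if (toy_is_per_cpu_kthread(t))
		return (online & bit) != 0;
	return (active & bit) != 0;
}
```

With a worker bound to CPUs 1-2 and CPU 2 online but no longer active,
the second affinity call changes nothing and the wakeup check rejects
CPU 2, which is the point where the real code would fall through to
select_fallback_rq().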


So normally that really isn't a problem. I can only see a tiny hole
there, where someone changes the cpumask between kthread_bind_mask() and
set_cpus_allowed_ptr(). AFAICT that can be fixed in two ways:

 - add wq_pool_mutex around things in create_worker(), or
 - move the set_cpus_allowed_ptr() out of worker_attach_to_pool() and
   into rescuer_thread().
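
The window and the first option can be sketched with another userspace
toy, again not kernel code. All toy_* names are hypothetical, the
"mutex" is a plain flag, and the model assumes that anyone changing the
pool cpumask has to hold wq_pool_mutex, which is what the first option
relies on:

```c
#include <stdbool.h>

/* Toy stand-in for a cpumask: one bit per CPU. Not a kernel type. */
typedef unsigned long toy_mask;

struct toy_pool {
	toy_mask cpumask;	/* stands in for pool->attrs->cpumask */
	bool mutex_held;	/* stands in for wq_pool_mutex */
};

struct toy_worker {
	toy_mask cpus_mask;
};

/*
 * A concurrent writer. ASSUMPTION: it must take the mutex first; a
 * real writer would block, the toy simply gives up.
 */
static bool toy_update_pool_mask(struct toy_pool *pool, toy_mask new_mask)
{
	if (pool->mutex_held)
		return false;
	pool->cpumask = new_mask;
	return true;
}

/*
 * Models create_worker(). A non-zero racer_mask simulates a writer
 * running between the bind and the attach. Returns true if the
 * attach-time affinity call would have been a NOP.
 */
static bool toy_create_worker(struct toy_pool *pool, struct toy_worker *w,
			      bool take_mutex, toy_mask racer_mask)
{
	bool nop;

	if (take_mutex)
		pool->mutex_held = true;

	/* kthread_bind_mask() */
	w->cpus_mask = pool->cpumask;

	/* the window: someone tries to change the pool cpumask */
	if (racer_mask)
		toy_update_pool_mask(pool, racer_mask);

	/* set_cpus_allowed_ptr() as called from worker_attach_to_pool() */
	nop = (w->cpus_mask == pool->cpumask);
	w->cpus_mask = pool->cpumask;

	if (take_mutex)
		pool->mutex_held = false;
	return nop;
}
```

Without the mutex the racing writer gets in and the second call is no
longer a NOP; with it, the writer is kept out and the call stays a NOP.
The second option sidesteps the question by not making that call from
create_worker() at all.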

Which then brings us to rescuer_thread...  If we manage to trigger the
rescuer during hotplug, then yes, I think that can go wobbly.

Let me consider that a bit more while I try and make sense of that splat
Paul reported.

---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ec0771e4a3fb..fe05308dc472 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1844,15 +1844,19 @@ static struct worker *alloc_worker(int node)
  * cpu-[un]hotplugs.
  */
 static void worker_attach_to_pool(struct worker *worker,
-                                  struct worker_pool *pool)
+                                 struct worker_pool *pool,
+                                 bool set_affinity)
 {
        mutex_lock(&wq_pool_attach_mutex);
 
-       /*
-        * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-        * online CPUs.  It'll be re-applied when any of the CPUs come up.
-        */
-       set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+       if (set_affinity) {
+               /*
+                * set_cpus_allowed_ptr() will fail if the cpumask doesn't have
+                * any online CPUs.  It'll be re-applied when any of the CPUs
+                * come up.
+                */
+               set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+       }
 
        /*
         * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -1944,7 +1948,7 @@ static struct worker *create_worker(struct worker_pool *pool)
        kthread_bind_mask(worker->task, pool->attrs->cpumask);
 
        /* successful, attach the worker to the pool */
-       worker_attach_to_pool(worker, pool);
+       worker_attach_to_pool(worker, pool, false);
 
        /* start the newly created worker */
        raw_spin_lock_irq(&pool->lock);
@@ -2509,7 +2513,11 @@ static int rescuer_thread(void *__rescuer)
 
                raw_spin_unlock_irq(&wq_mayday_lock);
 
-               worker_attach_to_pool(rescuer, pool);
+               /*
+                * XXX can go splat when running during hot-un-plug and
+                * the pool affinity is wobbly.
+                */
+               worker_attach_to_pool(rescuer, pool, true);
 
                raw_spin_lock_irq(&pool->lock);
 
