* Tejun Heo <[email protected]> [2026-04-10 08:53:30]:

Hi Tejun,

[ copying Samir Mulani to this thread ]

> Hello,
> 
> > Seems that we (mostly Paul) have our own trick to track whether a CPU
> > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> > used it in his fix [1]. And I think it won't be that hard to copy it
> > into workqueue and let queue_work_on() use it so that if the user queues
> > a work on a never-onlined CPU, it can detect it (with a warning?) and do
> > something?
> 
> The easiest way to do this is just creating the initial workers for all
> possible pools. Please see below. However, the downside is that it's going
> to create all workers for all possible cpus. This isn't a problem for
> anybody else but these IBM mainframes often come up with a lot of possible
> but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> be negligible on some configurations.
> 
> IBM folks, is that okay?

Even on PowerPC LPARs, it's not uncommon to have possible CPUs != online CPUs
at boot. However, your approach will work.

Samir has already tested this as well and reported the results here:
https://lkml.kernel.org/r/[email protected]

> 
> Also, why do you need to queue work items on an offline CPU? Do they
> actually have to be per-cpu? Can you get away with using an unbound
> workqueue?
> 
> Thanks.
> 
> From: Tejun Heo <[email protected]>
> Subject: workqueue: Create workers for all possible CPUs on init
> 
> Per-CPU worker pools are initialized for every possible CPU during early boot,
> but workqueue_init() only creates initial workers for online CPUs. On systems
> where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> set but no workers. Any work item queued on such a CPU hangs indefinitely.
> 
> This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> during size transitions, triggering workqueue lockup warnings for all
> never-onlined CPUs.
> 
> Create workers for all possible CPUs during init, not just online ones. For
> online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> execute on any CPU. When the CPU later comes online, rebind_workers() handles
> the transition to associated operation as usual.
> 

With this patch, if a CPU has been onlined once, it should be OK to queue
work on that CPU even if it is offline now.
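
To make the before/after behavior of the init loop concrete, here is a rough
userspace model of it. This is only a sketch, not kernel code: struct
pool_model, cpu_online_model(), init_pools() and count_workerless() are
hypothetical names invented for illustration, "online" is simplified to
"cpu < nr_online", and each CPU is given a single pool instead of the
per-CPU pool set that for_each_cpu_worker_pool() walks.

```c
#include <stdbool.h>

/* Model values only; chosen for illustration, not taken from the kernel. */
#define NR_POSSIBLE        8
#define POOL_DISASSOCIATED (1u << 2)

/* Hypothetical stand-in for a per-CPU worker pool. */
struct pool_model {
	unsigned int flags;
	int nr_workers;
};

/* Assumption: CPUs [0, nr_online) are online, the rest are only possible. */
static bool cpu_online_model(int cpu, int nr_online)
{
	return cpu < nr_online;
}

/*
 * Model of the init loop in the patch.
 * all_possible == false: old behavior, workers only for online CPUs.
 * all_possible == true:  patched behavior, workers for every possible CPU,
 *                        POOL_DISASSOCIATED cleared only for online ones.
 */
static void init_pools(struct pool_model *pools, int nr_possible,
		       int nr_online, bool all_possible)
{
	int cpu;

	/* Early boot: every possible pool exists and starts disassociated. */
	for (cpu = 0; cpu < nr_possible; cpu++) {
		pools[cpu].flags = POOL_DISASSOCIATED;
		pools[cpu].nr_workers = 0;
	}

	for (cpu = 0; cpu < nr_possible; cpu++) {
		bool online = cpu_online_model(cpu, nr_online);

		if (!online && !all_possible)
			continue;	/* old loop never visits this CPU */
		if (online)
			pools[cpu].flags &= ~POOL_DISASSOCIATED;
		pools[cpu].nr_workers++;	/* stands in for create_worker() */
	}
}

/* Pools with no worker are the ones where queued work would never run. */
static int count_workerless(const struct pool_model *pools, int nr_possible)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < nr_possible; cpu++)
		if (pools[cpu].nr_workers == 0)
			n++;
	return n;
}
```

Calling init_pools() with 8 possible and 3 online CPUs leaves 5 pools without
a worker in the old mode and none in the patched mode, while the pools of the
5 not-yet-online CPUs keep POOL_DISASSOCIATED set.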

> Reported-by: Vasily Gorbik <[email protected]>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Paul E. McKenney <[email protected]>

Reviewed-by: Srikar Dronamraju <[email protected]>

> ---
>  kernel/workqueue.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -8068,9 +8068,10 @@ void __init workqueue_init(void)
>               for_each_bh_worker_pool(pool, cpu)
>                       BUG_ON(!create_worker(pool));
> 
> -     for_each_online_cpu(cpu) {
> +     for_each_possible_cpu(cpu) {
>               for_each_cpu_worker_pool(pool, cpu) {
> -                     pool->flags &= ~POOL_DISASSOCIATED;
> +                     if (cpu_online(cpu))
> +                             pool->flags &= ~POOL_DISASSOCIATED;
>                       BUG_ON(!create_worker(pool));
>               }
>       }
> -- 
> tejun

-- 
Thanks and Regards
Srikar Dronamraju
