Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition

Paul E. McKenney Wed, 29 Apr 2026 11:06:03 -0700

On Wed, Apr 29, 2026 at 11:14:56PM +0530, Shrikanth Hegde wrote:
> 
> I have limited understanding in rcu or workqueues, but my two cents.
> 
> On 4/29/26 10:48 PM, Paul E. McKenney wrote:
> > On Wed, Apr 29, 2026 at 07:08:23PM +0200, Vasily Gorbik wrote:
> > > On Wed, Apr 29, 2026 at 08:30:38PM +0530, Srikar Dronamraju wrote:
> > > > * Tejun Heo <[email protected]> [2026-04-10 08:53:30]:
> > > > > Hello,
> > > > > 
> > > > > > Seems that we (mostly Paul) have our own trick to track whether a 
> > > > > > CPU
> > > > > > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul 
> > > > > > also
> > > > > > used it in his fix [1]. And I think it won't be that hard to copy it
> > > > > > into workqueue and let queue_work_on() use it so that if the user 
> > > > > > queues
> > > > > > a work on a never-onlined CPU, it can detect it (with a warning?) 
> > > > > > and do
> > > > > > something?
> > > > > 
> > > > > The easiest way to do this is just creating the initial workers for 
> > > > > all
> > > > > possible pools. Please see below. However, the downside is that it's 
> > > > > going
> > > > > to create all workers for all possible cpus. This isn't a problem for
> > > > > anybody else but these IBM mainframes often come up with a lot of 
> > > > > possible
> > > > > but not-yet-or-ever-online CPUs for capacity management, so the cost 
> > > > > may not
> > > > > be negligible on some configurations.
> > > > > 
> > > > > IBM folks, is that okay?
> > > > 
> > > > Even on PowerPC LPARS, its not uncommon to have possible cpus != online 
> > > > cpus
> > > > at boot.  However your approach will work.
> > > > 
> > > > And Samir has already tested the same too and reported here
> > > > https://lkml.kernel.org/r/[email protected]
> > > > 
> > > > > From: Tejun Heo <[email protected]>
> > > > > Subject: workqueue: Create workers for all possible CPUs on init
> > > > > 
> > > > > Per-CPU worker pools are initialized for every possible CPU during 
> > > > > early boot,
> > > > > but workqueue_init() only creates initial workers for online CPUs. On 
> > > > > systems
> > > > > where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 
> > > > > online and
> > > > > 400 possible CPUs), the pools for never-onlined CPUs have 
> > > > > POOL_DISASSOCIATED
> > > > > set but no workers. Any work item queued on such a CPU hangs 
> > > > > indefinitely.
> > > > > 
> > > > > This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to 
> > > > > GP when
> > > > > non-preemptible") which made SRCU schedule callbacks on all possible 
> > > > > CPUs
> > > > > during size transitions, triggering workqueue lockup warnings for all
> > > > > never-onlined CPUs.
> > > > > 
> > > > > Create workers for all possible CPUs during init, not just online 
> > > > > ones. For
> > > > > online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is 
> > > > > cleared and the
> > > > > worker is bound to the CPU. For not-yet-online CPUs, 
> > > > > POOL_DISASSOCIATED
> > > > > remains set, so worker_attach_to_pool() marks the worker UNBOUND and 
> > > > > it can
> > > > > execute on any CPU. When the CPU later comes online, rebind_workers() 
> > > > > handles
> > > > > the transition to associated operation as usual.
> > > > > 
> > > > 
> > > > With these patch, if a CPU has been onlined once, it's should be ok to 
> > > > queue
> > > > the work on that CPU even if its offline now.
> > > 
> > > That already seems to hold without this patch, what this patch newly
> > > covers is queueing on CPUs that have never been online.
> > > 
> > > Do we actually need to create workers for every possible CPU at boot?
> > > On the s390 LPAR in question (76 online / 400 possible) that's a few
> > > hundred extra kthreads kept around for the life of the system.
> > > That's probably the same on PowerPC.
> > > 
> > > Wouldn't Paul's SRCU-side fix [1] alone be enough here for PowerPC
> > > as well? I retested it on s390 (76/400) and on x86 KVM with
> > > --smp 16,maxcpus=255 and the lockup didn't reproduce in either case.
> > > 
> > > [1] 
> > > https://lore.kernel.org/rcu/ed1fa6cd-7343-4ca3-8b9d-d699ca496f83@paulmck-laptop/
> > 
> > Just to emphasize that SRCU really was buggy before my fix.  The
> > queue_work_on() kernel-doc header clearly states the rules.  The bug
> > is even more embarrassing given just who it was that wrote those two
> > sentences.  ;-)
> 
> That mask = ~0 is really looks uncomfortable to me. What does it mean?
> It might end up even sending to non possible CPUs without proper checks.
> 
> It should use either cpumask_setall? or use cpu_online_mask?
> 
> Your current patch rcu_cpu_beenfullyonline indicates that code around
> srcu_schedule_cbs_sdp handles hotplug already right?
> in that case, just setting mask = cpu_online_mask would work?


Agreed.  Which is why I have this commit queued:

f8d5aaaf90f8 ("srcu: Don't queue workqueue handlers to never-online CPUs")

This is currently slated for the upcoming merge window, but if you
need it sooner, please let us know.  Please see the end of this email
for the full commit.


                                                        Thanx, Paul

> > /**
> >   * queue_work_on - queue work on specific cpu
> >   * @cpu: CPU number to execute work on
> >   * @wq: workqueue to use
> >   * @work: work to queue
> >   *
> >   * We queue the work to a specific CPU, the caller must ensure it
> >   * can't go away.  Callers that fail to ensure that the specified
> >   * CPU cannot go away will execute on a randomly chosen CPU.
> >   * But note well that callers specifying a CPU that never has been
> >   * online will get a splat.
> >   *
> >   * Return: %false if @work was already on a queue, %true otherwise.
> >   */
> 
> 
> In that case, making offline CPUs have a unbound workqueue is wrong. no?
> 
> It might encourage more users to abuse queue_work_on interface to
> send to offline CPUs without any checks and onus now falls onto
> workqueue to disaptch to unbound wqs.
> 
> So I think it is better to put the guardrails in SRCU instead of any change in
> workqueue.

------------------------------------------------------------------------

commit f8d5aaaf90f8294890802ce8dccbafd9850ac5f9
Author: Paul E. McKenney <[email protected]>
Date:   Thu Apr 9 11:16:02 2026 -0700

    srcu: Don't queue workqueue handlers to never-online CPUs
    
    While an srcu_struct structure is in the midst of switching from CPU-0
    to all-CPUs state, it can attempt to invoke callbacks for CPUs that
    have never been online.  Worse yet, it can attempt in invoke callbacks
    for CPUs that never will be online due to not being present in the
    cpu_possible_mask.  This can cause hangs on s390, which is not set up to
    deal with workqueue handlers being scheduled on such CPUs.  This commit
    therefore causes Tree SRCU to refrain from queueing workqueue handlers
    on CPUs that have not yet (and might never) come online.
    
    Because callbacks are not invoked on CPUs that have not been
    online, it is an error to invoke call_srcu(), synchronize_srcu(), or
    synchronize_srcu_expedited() on a CPU that is not yet fully online.
    However, it turns out to be less code to redirect the callbacks
    from too-early invocations of call_srcu() than to warn about such
    invocations.  This commit therefore also redirects callbacks queued on
    not-yet-fully-online CPUs to the boot CPU.
    
    Reported-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Paul E. McKenney <[email protected]>
    Tested-by: Vasily Gorbik <[email protected]>
    Cc: Tejun Heo <[email protected]>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a7b..7c2f7cc131f7ae 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, 
struct srcu_node *snp
 {
        int cpu;
 
-       for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-               if (!(mask & (1UL << (cpu - snp->grplo))))
-                       continue;
-               srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
-       }
+       for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
+               if ((mask & (1UL << (cpu - snp->grplo))) && 
rcu_cpu_beenfullyonline(cpu))
+                       srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), 
delay);
 }
 
 /*
@@ -1322,7 +1320,9 @@ static unsigned long srcu_gp_start_if_needed(struct 
srcu_struct *ssp,
         */
        idx = __srcu_read_lock_nmisafe(ssp);
        ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
-       if (ss_state < SRCU_SIZE_WAIT_CALL)
+       // If !rcu_cpu_beenfullyonline(), interrupts are still disabled,
+       // so no migration is possible in either direction from this CPU.
+       if (ss_state < SRCU_SIZE_WAIT_CALL || 
!rcu_cpu_beenfullyonline(raw_smp_processor_id()))
                sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
        else
                sdp = raw_cpu_ptr(ssp->sda);

Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition

Reply via email to