On Thu, Apr 09, 2026 at 09:15:50PM +0200, Vasily Gorbik wrote:
> On Thu, Apr 09, 2026 at 10:22:00AM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > non-preemptible") defers srcu_node tree allocation when called under
> > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > per-CPU pools directly. Pools for not-online CPUs have no workers,
> > > so work accumulates and the workqueue lockup detector fires.
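The effect of mask = ~0 can be illustrated with a small userspace model of the per-leaf loop in srcu_schedule_cbs_snp(). Everything here is hypothetical: GRPLO, GRPHI, NR_EVER_ONLINE, model_beenfullyonline() and model_schedule_cbs_snp() are stand-ins rather than kernel code. The model assumes CPUs 0-3 have been online and CPUs 4-15 are possible but never online. The filter flag mimics the extra rcu_cpu_beenfullyonline() test added by the patch in this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical model: one leaf srcu_node covering CPUs GRPLO..GRPHI.
 * CPUs below NR_EVER_ONLINE have come online (so their per-CPU pools
 * have workers); the rest are possible but never online.
 */
#define GRPLO 0
#define GRPHI 15
#define NR_EVER_ONLINE 4

static bool model_beenfullyonline(int cpu)
{
	return cpu < NR_EVER_ONLINE;
}

/*
 * Returns how many work items would be queued for this leaf, and stores
 * in *stranded how many of them target never-online CPUs, where no
 * worker will ever run them.  'filter' mimics the patch's additional
 * rcu_cpu_beenfullyonline() check.
 */
static int model_schedule_cbs_snp(unsigned long mask, bool filter,
				  int *stranded)
{
	int queued = 0;

	*stranded = 0;
	for (int cpu = GRPLO; cpu <= GRPHI; cpu++) {
		/* Same per-leaf bit test as the original loop. */
		if (!(mask & (1UL << (cpu - GRPLO))))
			continue;
		if (filter && !model_beenfullyonline(cpu))
			continue;
		queued++;
		if (!model_beenfullyonline(cpu))
			(*stranded)++;
	}
	return queued;
}
```

With mask = ~0UL and no filter, all 16 CPUs in the leaf get work and 12 of those items are stranded. With the filter enabled, only the 4 ever-online CPUs get work. A normal sparse mask such as 0x5UL only touches CPUs 0 and 2, which is why the problem stays hidden outside the SRCU_SIZE_ALLOC to SRCU_SIZE_BIG transition.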
> > >
> > > Before 61bbcfb50514, GFP_ATOMIC allocation went straight to
> > > SRCU_SIZE_BIG, so the mask = ~0 path was never reached.
> > >
> > > Affects systems with convert_to_big active (auto when nr_cpu_ids >= 128)
> > > and possible CPUs > online CPUs. Hit on s390 LPAR (76 online, 400
> > > possible), where possible CPUs > online CPUs is the usual case.
> > > Also reproducible on x86 KVM --smp 16,maxcpus=255 (CONFIG_NR_CPUS=256)
> > > or simply -smp 1,maxcpus=2 with srcutree.convert_to_big=1
> > > or --smp 16,maxcpus=64 with srcutree.big_cpu_lim=32 (CONFIG_NR_CPUS=64)
> > >
> > > s390 log (76 online CPUs, 400 possible, all pools 76-399 stuck):
> > >
> > > BUG: workqueue lockup - pool cpus=76 node=0 flags=0x4 nice=0 stuck for 1842s!
> > > BUG: workqueue lockup - pool cpus=77 node=0 flags=0x4 nice=0 stuck for 1842s!
> > > ...
> > > BUG: workqueue lockup - pool cpus=399 node=0 flags=0x4 nice=0 stuck for 1842s!
> > > Showing busy workqueues and worker pools:
> > > workqueue rcu_gp: flags=0x108
> > >   pwq 306: cpus=76 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >     pending: 3*srcu_invoke_callbacks
> > >   pwq 310: cpus=77 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >     pending: 3*srcu_invoke_callbacks
> > > ...
> > >   pwq 1598: cpus=399 node=0 flags=0x4 nice=0 active=3 refcnt=4
> > >     pending: 3*srcu_invoke_callbacks
> > >
> > > Not sure if replacing mask = ~0 with something derived from
> > > cpu_online_mask would be racy in that context.
> >
> > This was a pre-existing bug, but the change made it much more likely
> > to happen.
>
> Yes, indeed.
>
> > Does the alleged (and untested) fix below do the trick? The theory is
> > that if a given CPU has ever been fully online, it has workqueues set up.
> > Directly checking whether a CPU is currently online is vulnerable to a CPU
> > piling up lots of SRCU callbacks, then going offline. So we do need to
> > be prepared to invoke SRCU callbacks for CPUs that are currently offline.
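The reasoning for testing "has ever been fully online" instead of "is online now" can be sketched with a small userspace model. All names here are hypothetical: struct cpu_model, cpu_up_model(), cpu_down_model(), invoke_all() and scenario_offline_with_cbs(). The model only imitates the idea of a sticky per-CPU flag like the one rcu_cpu_beenfullyonline() reports. It does not reproduce the kernel's rcu_data layout or hotplug machinery.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 8

/* Hypothetical per-CPU state, not the kernel's data structures. */
struct cpu_model {
	bool online;          /* currently online */
	bool beenfullyonline; /* sticky: set on first online, never cleared */
	int pending_cbs;      /* SRCU callbacks waiting on this CPU */
};

static struct cpu_model cpus[NR_CPUS_MODEL];

static void cpu_up_model(int cpu)
{
	cpus[cpu].online = true;
	cpus[cpu].beenfullyonline = true;
}

static void cpu_down_model(int cpu)
{
	cpus[cpu].online = false; /* beenfullyonline deliberately left set */
}

/*
 * Scenario: CPUs 0 and 1 come online, CPU 1 piles up 5 callbacks and
 * then goes offline.  CPUs 2..7 are possible but never online.
 */
static void scenario_offline_with_cbs(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		cpus[cpu] = (struct cpu_model){ 0 };
	cpu_up_model(0);
	cpu_up_model(1);
	cpus[1].pending_cbs = 5;
	cpu_down_model(1);
}

/*
 * "Queue a handler" for each CPU allowed by the policy and return the
 * number of callbacks left stranded.  use_sticky selects the
 * been-fully-online check; otherwise only currently-online CPUs qualify.
 */
static int invoke_all(bool use_sticky)
{
	int stranded = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		bool ok = use_sticky ? cpus[cpu].beenfullyonline
				     : cpus[cpu].online;

		if (ok)
			cpus[cpu].pending_cbs = 0; /* handler runs them */
		else
			stranded += cpus[cpu].pending_cbs;
	}
	return stranded;
}
```

After scenario_offline_with_cbs(), the currently-online policy leaves CPU 1's 5 callbacks stranded. The sticky policy invokes them and still never queues anything for CPUs 2-7. This matches the trade-off described in the paragraph the sketch follows.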
>
> Yes, tested on s390 LPAR (76 online, 400 possible) as well as
> on x86 KVM with --smp 16,maxcpus=255 and CONFIG_NR_CPUS=256:
> no more workqueue lockup in either case.
>
> Thank you!
>
> Tested-by: Vasily Gorbik <[email protected]>
Thank you for testing this!
Please see below for an updated patch. Tejun's patch might obsolete
this one, but just in case he balks at SRCU queueing handlers for CPUs
that are not even in the cpu_possible_mask. ;-)
Thanx, Paul
------------------------------------------------------------------------
commit dcc14db7e76af899f1ff4606ec4316580d7b6f88
Author: Paul E. McKenney <[email protected]>
Date: Thu Apr 9 11:16:02 2026 -0700

srcu: Don't queue workqueue handlers to never-online CPUs

While an srcu_struct structure is in the midst of switching from CPU-0
to all-CPUs state, it can attempt to invoke callbacks for CPUs that
have never been online. Worse yet, it can attempt to invoke callbacks
for CPUs that never will be online due to not being present in the
cpu_possible_mask. This can cause hangs on s390, which is not set up to
deal with workqueue handlers being scheduled on such CPUs. This commit
therefore causes Tree SRCU to refrain from queueing workqueue handlers
on CPUs that have not yet (and might never) come online.

Reported-by: Vasily Gorbik <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Vasily Gorbik <[email protected]>
Cc: Tejun Heo <[email protected]>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0d01cd8c4b4a7..a67af44fc0745 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -897,11 +897,9 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 {
 	int cpu;
 
-	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1UL << (cpu - snp->grplo))))
-			continue;
-		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
-	}
+	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
+		if ((mask & (1UL << (cpu - snp->grplo))) && rcu_cpu_beenfullyonline(cpu))
+			srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 }
 
 /*