On 5/6/2025 2:21 PM, Paul E. McKenney wrote:
> On Tue, May 06, 2025 at 11:03:40AM -0700, Paul E. McKenney wrote:
>> On Tue, May 06, 2025 at 10:09:54AM -0400, Joel Fernandes wrote:
>>>
>>>
>>> On 5/5/2025 3:03 PM, Paul E. McKenney wrote:
>>>> Hello!
>>>>
>>>> Following up on off-list discussions and requests for more information.
>>>>
>>>> These reproduce reasonably easily, and here is an example:
>>>>
>>>> ------------------------------------------------------------------------
>>>> [ 426.022056] rcu: NOCB: Cannot CB-offload online CPU 25
>>>> [ 460.867350] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>> [ 460.867980] rcu: 0-...!: (1 GPs behind)
>>>> idle=8fe4/1/0x4000000000000000 softirq=2124/2125 fqs=0
>>>> [ 460.868726] rcu: 1-...!: (1 GPs behind)
>>>> idle=e0ac/1/0x4000000000000000 softirq=3547/3547 fqs=0
>>>> [ 460.869468] rcu: 5-...!: (1 GPs behind)
>>>> idle=5c34/1/0x4000000000000000 softirq=2084/2084 fqs=0
>>>> [ 460.870233] rcu: 8-...!: (1 GPs behind)
>>>> idle=9fdc/1/0x4000000000000000 softirq=4301/4301 fqs=0
>>>> [ 460.871007] rcu: 9-...!: (0 ticks this GP)
>>>> idle=4a2c/1/0x4000000000000000 softirq=3873/3873 fqs=0
>>>> [ 460.871772] rcu: 10-...!: (1 GPs behind)
>>>> idle=4994/1/0x4000000000000000 softirq=4760/4760 fqs=0
>>>> [ 460.872511] rcu: 12-...!: (1 GPs behind)
>>>> idle=0bbc/1/0x4000000000000000 softirq=3545/3545 fqs=0
>>>> [ 460.873300] rcu: 13-...!: (1 GPs behind)
>>>> idle=a59c/1/0x4000000000000000 softirq=2097/2098 fqs=0
>>>> [ 460.874084] rcu: 14-...!: (1 GPs behind)
>>>> idle=22cc/1/0x4000000000000000 softirq=4424/4424 fqs=0
>>>> [ 460.874864] rcu: 15-...!: (1 GPs behind)
>>>> idle=0e5c/1/0x4000000000000000 softirq=5984/5984 fqs=0
>>>> [ 460.875625] rcu: 16-...!: (1 GPs behind)
>>>> idle=d5bc/1/0x4000000000000000 softirq=6313/6313 fqs=0
>>>> [ 460.876374] rcu: 17-...!: (1 GPs behind)
>>>> idle=848c/1/0x4000000000000000 softirq=4160/4160 fqs=0
>>>> [ 460.877182] rcu: 18-...!: (1 GPs behind)
>>>> idle=a1c4/1/0x4000000000000000 softirq=3217/3217 fqs=0
>>>> [ 460.877930] rcu: 19-...!: (0 ticks this GP)
>>>> idle=cbe4/1/0x4000000000000000 softirq=7886/7886 fqs=0
>>>> [ 460.878772] rcu: 20-...!: (1 GPs behind)
>>>> idle=3acc/1/0x4000000000000000 softirq=4315/4317 fqs=0
>>>> [ 460.879514] rcu: 21-...!: (1 GPs behind)
>>>> idle=eb7c/1/0x4000000000000000 softirq=4221/4221 fqs=0
>>>> [ 460.880225] rcu: 22-...!: (1 GPs behind)
>>>> idle=477c/1/0x4000000000000000 softirq=5049/5049 fqs=0
>>>> [ 460.880990] rcu: 23-...!: (1 GPs behind)
>>>> idle=8404/1/0x4000000000000000 softirq=5635/5636 fqs=0
>>>> [ 460.881739] rcu: 24-...!: (0 ticks this GP)
>>>> idle=11e4/1/0x4000000000000000 softirq=2031/2031 fqs=0
>>>> [ 460.882497] rcu: 25-...!: (0 ticks this GP)
>>>> idle=3434/1/0x4000000000000000 softirq=3495/3495 fqs=0
>>>> [ 460.883296] rcu: 26-...!: (1 GPs behind)
>>>> idle=f864/1/0x4000000000000000 softirq=3645/3645 fqs=0
>>>> [ 460.884054] rcu: 27-...!: (0 ticks this GP)
>>>> idle=3e74/1/0x4000000000000000 softirq=4168/4168 fqs=0
>>>> [ 460.884805] rcu: 28-...!: (1 GPs behind)
>>>> idle=71a4/1/0x4000000000000000 softirq=3596/3596 fqs=0
>>>> [ 460.885570] rcu: 29-...!: (1 GPs behind)
>>>> idle=9aa4/1/0x4000000000000000 softirq=3111/3112 fqs=0
>>>> [ 460.886276] rcu: 30-...!: (1 GPs behind)
>>>> idle=10dc/1/0x4000000000000000 softirq=3030/3030 fqs=0
>>>> [ 460.887030] rcu: 31-...!: (0 ticks this GP)
>>>> idle=0fb4/1/0x4000000000000000 softirq=3540/3540 fqs=0
>>>> [ 460.887777] rcu: 32-...!: (1 GPs behind)
>>>> idle=fad4/1/0x4000000000000000 softirq=2645/2645 fqs=0
>>>> [ 460.888548] rcu: 33-...!: (0 ticks this GP)
>>>> idle=37ac/1/0x4000000000000000 softirq=2201/2201 fqs=0
>>>> [ 460.889336] rcu: 34-...!: (1 GPs behind)
>>>> idle=85ac/1/0x4000000000000000 softirq=2336/2336 fqs=0
>>>> [ 460.890081] rcu: 35-...!: (1 GPs behind)
>>>> idle=756c/1/0x4000000000000000 softirq=1534/1536 fqs=0
>>>> [ 460.890820] rcu: 36-...!: (0 ticks this GP)
>>>> idle=b2ec/1/0x4000000000000000 softirq=4286/4286 fqs=0
>>>> [ 460.891572] rcu: 37-...!: (1 GPs behind)
>>>> idle=de1c/1/0x4000000000000000 softirq=2725/2725 fqs=0
>>>> [ 460.892323] rcu: 38-...!: (1 GPs behind)
>>>> idle=c62c/1/0x4000000000000000 softirq=2341/2342 fqs=0
>>>> [ 460.893096] rcu: 39-...!: (1 GPs behind)
>>>> idle=be2c/1/0x4000000000000000 softirq=2650/2651 fqs=0
>>>> [ 460.893830] rcu: 40-...!: (1 GPs behind)
>>>> idle=7ecc/1/0x4000000000000000 softirq=2119/2121 fqs=0
>>>> [ 460.894621] rcu: 41-...!: (0 ticks this GP)
>>>> idle=19dc/1/0x4000000000000000 softirq=3539/3539 fqs=0
>>>> [ 460.895384] rcu: 42-...!: (0 ticks this GP)
>>>> idle=a74c/1/0x4000000000000000 softirq=2259/2259 fqs=0
>>>> [ 460.896148] rcu: Tasks blocked on level-1 rcu_node (CPUs 30-42):
>>>> P91/1:b..l
>>>> [ 460.896778] rcu: (detected by 11, t=7962 jiffies, g=15565, q=48
>>>> ncpus=39)
>>>> [ 460.897395] Sending NMI from CPU 11 to CPUs 0:
>>>> [ 461.047304] rcu-torture: rtc: 00000000fb6bc815 ver: 2137 tfle: 0 rta:
>>>> 2137 rtaf: 0 rtf: 2128 rtmbe: 0 rtmbkf: 0/2042 rtbe: 0 rtbke: 0 rtbf: 0
>>>> rtb: 0 nt: 148999 onoff: 32/32:36/37 7,348:36,3639 4096:26899 (HZ=250)
>>>> barrier: 979/980:0 read-exits: 416 nocb-toggles: 1013:969 gpwraps: 0
>>>> [ 461.047343] rcu-torture: Reader Pipe: 1919541609 11501 0 0 0 0 0 0 0 0 0
>>>> [ 461.047383] rcu-torture: Reader Batch: 1919533823 19422 0 0 0 0 0 0 0
>>>> 0 0
>>>> [ 461.047401] rcu-torture: Free-Block Circulation: 2136 2136 2135 2134
>>>> 2133 2132 2131 2130 2129 2128 0
>>>> [ 462.777371] NMI backtrace for cpu 0
>>>> [ 462.777547] CPU: 0 UID: 0 PID: 20 Comm: migration/0 Not tainted
>>>> 6.15.0-rc1-00066-g612f2bb48fa2 #22 PREEMPT
>>>> [ 462.777551] Hardware name: linux,dummy-virt (DT)
>>>> [ 462.777552] Stopper: multi_cpu_stop+0x0/0x15c <-
>>>> __stop_cpus.constprop.0+0x78/0xc4
>>>> [ 462.777564] pstate: 43400005 (nZcv daif +PAN -UAO +TCO +DIT -SSBS
>>>> BTYPE=--)
>>>> [ 462.777566] pc : handle_softirqs+0xa0/0x248
>>>> [ 462.777836] lr : handle_softirqs+0x7c/0x248
>>>> [ 462.777838] sp : ffff800080003f50
>>>> [ 462.777839] x29: ffff800080003f50 x28: ffff000003d9d780 x27:
>>>> 0000000000000000
>>>> [ 462.777842] x26: 0000000000000282 x25: 0000000000000000 x24:
>>>> ffffafb5af9f4000
>>>> [ 462.777843] x23: 0000000063400005 x22: ffffafb5adb9b8f0 x21:
>>>> ffffafb5af9e93c8
>>>> [ 462.778009] x20: ffffafb5b02970c0 x19: ffff000003d9d780 x18:
>>>> 0000000000000001
>>>> [ 462.778012] x17: ffff504a6d7bc000 x16: ffff800080000000 x15:
>>>> ffff00001d5f4d00
>>>> [ 462.778013] x14: 0000000000000156 x13: 0000000000000000 x12:
>>>> 0000000000000000
>>>> [ 462.778015] x11: ffff00001d1b0d80 x10: 00000000002ab980 x9 :
>>>> 0000000000000000
>>>> [ 462.778017] x8 : 000001d6146032ba x7 : ffff00000500b800 x6 :
>>>> 0000000000000000
>>>> [ 462.778019] x5 : 00000000002ab980 x4 : 0000000100009e3e x3 :
>>>> ffff504a6d7bc000
>>>> [ 462.778020] x2 : 0000006bbfae36c0 x1 : ffff504a6d7bc000 x0 :
>>>> ffffafb5af9f4000
>>>> [ 462.778232] Call trace:
>>>> [ 462.778233] handle_softirqs+0xa0/0x248 (P)
>>>> [ 462.778296] __do_softirq+0x14/0x20
>>>> [ 462.778299] ____do_softirq+0x10/0x1c
>>>> [ 462.778389] call_on_irq_stack+0x24/0x4c
>>>> [ 462.778391] do_softirq_own_stack+0x1c/0x2c
>>>> [ 462.778393] __irq_exit_rcu+0xd8/0x110
>>>> [ 462.778395] irq_exit_rcu+0x10/0x1c
>>>> [ 462.778456] el1_interrupt+0x38/0x68
>>>> [ 462.778531] el1h_64_irq_handler+0x18/0x24
>>>> [ 462.778533] el1h_64_irq+0x6c/0x70
>>>> [ 462.778534] multi_cpu_stop+0xf0/0x15c (P)
>>>> [ 462.778536] cpu_stopper_thread+0x8c/0x11c
>>>> [ 462.778642] smpboot_thread_fn+0x220/0x24c
>>>> [ 462.778647] kthread+0x134/0x208
>>>> [ 462.778649] ret_from_fork+0x10/0x20
>>>> [ 463.888107] NMI backtrace for cpu 1
>>>> [ 463.888117] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted
>>>> 6.15.0-rc1-00066-g612f2bb48fa2 #22 PREEMPT
>>>> [ 463.888120] Hardware name: linux,dummy-virt (DT)
>>>> [ 463.888121] pstate: 43400005 (nZcv daif +PAN -UAO +TCO +DIT -SSBS
>>>> BTYPE=--)
>>>> [ 463.888123] pc : handle_softirqs+0xa0/0x248
>>>> [ 463.888135] lr : handle_softirqs+0x7c/0x248
>>>> [ 463.888140] sp : ffff80008000bf50
>>>> [ 463.888140] x29: ffff80008000bf50 x28: ffff000003d9e900 x27:
>>>> 0000000000000000
>>>> [ 463.888143] x26: 0000000000000282 x25: ffff000003d9e900 x24:
>>>> ffffafb5af9f4000
>>>> [ 463.888146] x23: 0000000063400005 x22: ffffafb5adad9114 x21:
>>>> ffffafb5af9e93c8
>>>> [ 463.888148] x20: ffffafb5b02970c0 x19: ffff000003d9e900 x18:
>>>> 0000000000000000
>>>> [ 463.888150] x17: ffff504a6d7d6000 x16: ffff800080008000 x15:
>>>> ffff00001d1cad00
>>>> [ 463.888152] x14: 0000000000000112 x13: 0000000000000000 x12:
>>>> 0000000000000000
>>>> [ 463.888158] x11: 0000000000000a95 x10: 0000000000000163 x9 :
>>>> 0000000000000000
>>>> [ 463.888160] x8 : 0000006bfc655500 x7 : 7fffffffffffffff x6 :
>>>> 0000006bfc655500
>>>> [ 463.888165] x5 : 0000006bfca25e00 x4 : 0000000100009ff5 x3 :
>>>> ffff504a6d7d6000
>>>> [ 463.888168] x2 : 0000006c01e2bea0 x1 : ffff504a6d7d6000 x0 :
>>>> ffffafb5af9f4000
>>>> [ 463.888171] Call trace:
>>>> [ 463.888172] handle_softirqs+0xa0/0x248 (P)
>>>> [ 463.888175] __do_softirq+0x14/0x20
>>>> [ 463.888178] ____do_softirq+0x10/0x1c
>>>> [ 463.888183] call_on_irq_stack+0x24/0x4c
>>>> [ 463.888185] do_softirq_own_stack+0x1c/0x2c
>>>> [ 463.888187] __irq_exit_rcu+0xd8/0x110
>>>> [ 463.888192] irq_exit_rcu+0x10/0x1c
>>>> [ 463.888194] el1_interrupt+0x38/0x68
>>>> [ 463.888198] el1h_64_irq_handler+0x18/0x24
>>>> [ 463.888200] el1h_64_irq+0x6c/0x70
>>>> [ 463.888202] finish_task_switch.isra.0+0x74/0x220 (P)
>>>> [ 463.888207] __schedule+0x2fc/0x998
>>>> [ 463.888210] schedule_idle+0x28/0x48
>>>> [ 463.888212] do_idle+0x170/0x268
>>>> [ 463.888215] cpu_startup_entry+0x34/0x3c
>>>> [ 463.888217] secondary_start_kernel+0x138/0x158
>>>> [ 463.888224] __secondary_switched+0xc0/0xc4
>>>> [ 463.941174] NMI backtrace for cpu 8
>>>> [ 463.941179] CPU: 8 UID: 0 PID: 19 Comm: rcu_exp_gp_kthr Not tainted
>>>> 6.15.0-rc1-00066-g612f2bb48fa2 #22 PREEMPT
>>>> [ 463.941220] Hardware name: linux,dummy-virt (DT)
>>>> [ 463.941222] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS
>>>> BTYPE=--)
>>>> [ 463.941223] pc : smp_call_function_single+0xe4/0x1b0
>>>> [ 463.941292] lr : __sync_rcu_exp_select_node_cpus+0x224/0x3c8
>>>> [ 463.941366] sp : ffff800080233cd0
>>>> [ 463.941367] x29: ffff800080233d20 x28: 0000000000000002 x27:
>>>> ffff000003d9c600
>>>> [ 463.941370] x26: ffffafb5b032a880 x25: ffff00001d4d7ac0 x24:
>>>> ffffafb5adb49e24
>>>> [ 463.941372] x23: 0000000000001fff x22: ffffafb5af9f5ac0 x21:
>>>> ffffafb5b029ea08
>>>> [ 463.941374] x20: 0000000000000000 x19: ffff800080233ce0 x18:
>>>> 0000000000000001
>>>> [ 463.941376] x17: ffff504a6d88c000 x16: ffff800080040000 x15:
>>>> 0000000000000000
>>>> [ 463.941378] x14: 0000000000000115 x13: 0000000000000115 x12:
>>>> 0000000000000000
>>>> [ 463.941380] x11: ffffafb5af9f0408 x10: 0000000000000001 x9 :
>>>> ffff000003d9c600
>>>> [ 463.941462] x8 : 0000000000000040 x7 : ffffafb5b0322ab0 x6 :
>>>> ffffafb5adb49e24
>>>> [ 463.941464] x5 : 0000000000000000 x4 : ffff00001d281f08 x3 :
>>>> 0000000000000001
>>>> [ 463.941535] x2 : 0000000000000000 x1 : ffff00001d281f00 x0 :
>>>> 000000000000001f
>>>> [ 463.941537] Call trace:
>>>> [ 463.941538] smp_call_function_single+0xe4/0x1b0 (P)
>>>> [ 463.941541] __sync_rcu_exp_select_node_cpus+0x224/0x3c8
>>>> [ 463.941619] sync_rcu_exp_select_cpus+0x138/0x2a4
>>>> [ 463.941622] wait_rcu_exp_gp+0x18/0x30
>>>> [ 463.941644] kthread_worker_fn+0xcc/0x184
>>>> [ 463.941708] kthread+0x134/0x208
>>>> [ 463.941709] ret_from_fork+0x10/0x20
>>>
>>> I wish the backtrace also included which CPU smp_call_function_single()
>>> is waiting on. This is similar to what Boqun sees as well, where we are
>>> spinning on the CSD lock for a CPU that is wedged. In this case, the
>>> 'wedged' CPU appears to be on the level-1 rcu_node (CPUs 30-42)? But we
>>> don't have those backtraces above to confirm which specific one.
>>
>> The way to make that happen is via the CSD_LOCK_WAIT_DEBUG and the
>> CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig options.
>>
>>> Although, CPU 8 looks like it could itself be the one blocking the GP as a
>>> secondary effect of the smp_call IPI not coming through?
>>
>> What caught my eye was this: "Stopper: multi_cpu_stop+0x0/0x15c".
>>
>> This puppy makes all the CPUs spin with interrupts disabled. It looks
>> like it was running while something was blocking context switches on
>> at least some of the CPUs. It is a most wonderful way to amplify a
>> delay on any one CPU so that it affects all CPUs. ;-)
>>
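
Indeed. For anyone reading along who has not looked at stop_machine lately,
the loop is shaped roughly like the sketch below. This is my own paraphrase,
not the real kernel/stop_machine.c code, and every "sketch" name in it (the
enum, the struct, the helpers) is a made-up stand-in; the point is only that
each participating CPU busy-waits in lockstep and keeps interrupts off from
the irq-disable state until the final state:

#include <linux/atomic.h>
#include <linux/irqflags.h>
#include <linux/processor.h>
#include <linux/smp.h>
#include <linux/stop_machine.h>

enum sketch_state { S_NONE, S_PREPARE, S_DISABLE_IRQ, S_RUN, S_EXIT };

struct stop_sketch {			/* hypothetical stand-in type */
	cpu_stop_fn_t	fn;		/* the stop_machine() payload */
	void		*data;
	int		active_cpu;	/* the CPU that actually runs fn() */
	int		num_threads;
	atomic_t	thread_ack;
	int		state;		/* enum sketch_state; the initiator bumps it to S_PREPARE */
};

/* The last CPU to acknowledge the current state advances everyone to the next one. */
static void sketch_ack_state(struct stop_sketch *msdata)
{
	if (atomic_dec_and_test(&msdata->thread_ack)) {
		atomic_set(&msdata->thread_ack, msdata->num_threads);
		WRITE_ONCE(msdata->state, msdata->state + 1);
	}
}

/* One copy of this runs on every participating CPU, via its stopper thread. */
static int multi_cpu_stop_sketch(void *arg)
{
	struct stop_sketch *msdata = arg;
	int cur = S_NONE;

	do {
		int state;

		cpu_relax();			/* spin, waiting for the shared state to advance */
		state = READ_ONCE(msdata->state);
		if (state == cur)
			continue;
		cur = state;
		switch (cur) {
		case S_DISABLE_IRQ:
			local_irq_disable();	/* interrupts stay off from here until S_EXIT */
			break;
		case S_RUN:
			if (msdata->active_cpu == smp_processor_id())
				msdata->fn(msdata->data);
			break;
		default:
			break;
		}
		sketch_ack_state(msdata);
	} while (cur != S_EXIT);

	local_irq_enable();
	return 0;
}

With 43 of these spinning on 8 real CPUs, any one vCPU being preempted by the
host stalls the state machine for all of them, which matches the amplification
you describe.
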
>>> Speaking of smp_call, maybe it does make sense for EXP to not wait too
>>> long on the CSD lock, so as to not make dire matters worse? After all,
>>> the GP may well complete eventually even without the IPI executing?
>>> Perhaps even sooner, if the CPU doing the smp_call wait does some other
>>> useful work as well.
>>
>> What does the RCU expedited grace period do instead of waiting?
>>
>> I could imagine an smp_call_function_single() API that did exponential
>> backoff, spinning for a bit, sleeping, spinning for a bit, sleeping
>> for longer and so on. Or did you have something else in mind?
>>
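
Something along those lines, yes. Roughly what I was picturing is sketched
below. To be clear, this is just an illustration: the wrapper name, the
timeout parameter, and the backoff constants are all invented here, and only
smp_call_function_single_async() (plus the usual helpers) is a real API. It
also glosses over the lifetime of the csd if we give up while it might still
be queued, which a real version would have to handle:

#include <linux/atomic.h>
#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/minmax.h>
#include <linux/smp.h>

struct backoff_call {
	smp_call_func_t		func;
	void			*info;
	atomic_t		done;
	call_single_data_t	csd;
};

/* Runs on the target CPU: invoke the payload, then signal the waiter. */
static void backoff_call_stub(void *arg)
{
	struct backoff_call *bc = arg;

	bc->func(bc->info);
	atomic_set_release(&bc->done, 1);
}

/*
 * Hypothetical variant of smp_call_function_single(..., wait=1): instead of
 * spinning on the CSD lock, poll for completion with exponential backoff and
 * give up after @timeout jiffies.  Must be called from sleepable context.
 * Returning -ETIMEDOUT with bc still on our stack and possibly still queued
 * is a real lifetime problem that this sketch ignores.
 */
static int smp_call_function_single_backoff(int cpu, smp_call_func_t func,
					    void *info, unsigned long timeout)
{
	struct backoff_call bc = {
		.func = func,
		.info = info,
		.done = ATOMIC_INIT(0),
	};
	unsigned long deadline = jiffies + timeout;
	unsigned long delay_us = 10;
	int ret;

	INIT_CSD(&bc.csd, backoff_call_stub, &bc);
	ret = smp_call_function_single_async(cpu, &bc.csd);
	if (ret)
		return ret;

	while (!atomic_read_acquire(&bc.done)) {
		if (time_after(jiffies, deadline))
			return -ETIMEDOUT;	/* let the caller go do something useful */
		usleep_range(delay_us, 2 * delay_us);
		delay_us = min(delay_us * 2, 10000UL);	/* exponential backoff, capped at 10ms */
	}
	return 0;
}

The idea being that on -ETIMEDOUT the expedited GP machinery would go do other
useful work, for example re-check whether the target CPU still owes it a
quiescent state, before deciding whether to retry the IPI at all.
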
>>>> At first glance, this looks like a stop_machine() request gone bad.
>>>> Please note that I am seeing this on most TREE01 runs and nowhere else.
>>>> So this might be a strange side effect of dynamic callback offloading,
>>>> but perhaps more likely due to TREE01 using more CPUs than it is allowing
>>>> for, that is, telling qemu "-smp 43", the kernel "maxcpus=8 nr_cpus=43",
>>>> but forcing the guest OS to run on only eight CPUs. (Why??? Well,
>>>> back in the day, there was an RCU bug exposed by this.)
>>>>
>>>> All that aside, this is one of the more spectacular RCU CPU stall warnings
>>>> I have ever seen, and that is with me leaving off the NMI stack traces
>>>> for most of the CPUs. ;-)
>>>
>>> IIRC, NMI stack tracing on ARM is unreliable due to raisins.
>>
>> Ah, true, older ARM hardware lacks NMIs.
>>
>>>> If there is interest, I could post a tarball of a few of the full
>>>> console logs.
>>>>
>>>> Come to think of it, even if there isn't interest, here you go! ;-)
>>>>
>>>> https://drive.google.com/file/d/1JKmCYqK6kAwhlJZUJQeTle-o62aJXkUQ/view?usp=sharing
>>>
>>> TREE01 in this trace seems to be a different splat I think:
>>>
>>> [ 71.612482] rcu: Offline CPU 1 blocking current GP.
>>>
>>> Causing the stall, I think. Something to do with hotplug? I wonder whether
>>> it would still trigger with the same CPU count settings but without hotplug
>>> testing. It may be worth adding the ARM folks as well.
>>
>> I could try going back to the overload case and disabling CPU hotplug.
>> But the problem seems to be that ARM is taking the extra 35 CPUs much
>> more seriously than does x86, which could cause trouble given that the
>> rcutorture scripting is jamming all 43 of them onto eight real CPUs.
>> The strange thing is that they are online as far as the (de)offloading
>> tests are concerned, but they show no sign of ever having been brought
>> online.
>>
>> No, I take it back, they were brought online, it is just that ARM's CPU
>> ID space is way different than I would expect: 0x1, 0x2, 0x3, ..., 0xe,
>> 0xf, 0x100, 0x101, ...
>>
>> With the change to TREE01, there are only ever eight of them.
>>
>> Either way, the next step seems to be to remove the "CONFIG_NR_CPUS=8"
>> to see what happens.
>
> Which causes all 43 CPUs and the stall warnings to reappear, and with
> the stop-machine code again at the scene of the crime. Which is not
> at all surprising: 43 vCPUs are trying to spin with interrupts disabled
> on only 8 real CPUs, which is just never going to end well.
>
> So this might just be ARM making heavier use of stop-machine than does
> x86. Or maybe x86 bringing the systems only sort of online (which was
> what I was seeing last I looked).
>
> At this point I am tempted to remove the CPU-oversubscription aspect
> of TREE01. What are your thoughts?
I am also of similar opinion. Or maybe oversubscribe less heavily, but still
oversubscribe? :) Since you mentioned there is an RCU bug that this tests.

It sounds like you want to test vCPU preemption more by having a configuration
where vCPU >> pCPU. If not, then maybe the cpu_count in kvm.sh shouldn't
oversubscribe like that?

It could also be a 'bug' in ARM KVM where it is not scheduling the vCPU
threads correctly or something.
- Joel