On Tue, Jan 20, 2026 at 02:40:44PM +0530, Venkat Rao Bagalkote wrote:
> Greetings!!!
> 
> 
> IBM CI has reported a kernel softlockup, while running BPF selftests on
> PowerPC kernel.
> 
> 
> Traces:
> 
> [ 1632.509843] audit: type=1334 audit(1769127975.721:164430): prog-id=82135
> op=LOAD
> [ 1632.509852] audit: type=1334 audit(1769127975.721:164431): prog-id=82135
> op=UNLOAD
> [ 1637.016921] Mode = AA
> [ 1660.780274] watchdog: BUG: soft lockup - CPU#8 stuck for 23s!
> [rqsl_w/8:51609]
> [ 1660.780283] Modules linked in: bpf_test_rqspinlock(OE+) 8021q(E) garp(E)
> mrp(E) stp(E) llc(E) vrf(E) tun(E) bpf_testmod(OE) veth(E) nft_fib_inet(E)
> nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E)
> nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) bonding(E) nft_ct(E)
> tls(E) nft_chain_nat(E) rfkill(E) sunrpc(E) ibmveth(E) hvcs(E) hvcserver(E)
> pseries_rng(E) vmx_crypto(E) dm_multipath(E) fuse(E) dm_mod(E) drm(E)
> drm_panel_orientation_quirks(E) zram(E) ext4(E) crc16(E) mbcache(E) jbd2(E)
> sr_mod(E) sd_mod(E) cdrom(E) ibmvscsi(E) scsi_transport_srp(E) [last
> unloaded: livepatch_sample(EK)]
> [ 1660.780352] CPU: 8 UID: 0 PID: 51609 Comm: rqsl_w/8 Tainted: G        
>  OE K     6.19.0-rc4-g960c1fd29055 #1 VOLUNTARY
> [ 1660.780359] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH
> [ 1660.780362] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202
> 0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
> [ 1660.780365] NIP:  c0000000000399a8 LR: c000000000039c24 CTR:
> c000000000039ca0
> [ 1660.780368] REGS: c000000bc19cfd28 TRAP: 0900   Tainted: G      OE K     
> (6.19.0-rc4-g960c1fd29055)
> [ 1660.780372] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
> 28000288  XER: 0000000a
> [ 1660.780386] CFAR: 0000000000000000 IRQMASK: 0
> [ 1660.780386] GPR00: c000000000039c24 c000000bc19cfd00 c000000001f58100
> c000000bc19cfcf8
> [ 1660.780386] GPR04: c000000bc19cfea8 0000000000000000 4000000000000002
> c0000013ff916e08
> [ 1660.780386] GPR08: 00000013fd6c0000 0000000000000049 fffffffffffffffc
> c0080000f73e0f98
> [ 1660.780386] GPR12: c000000000039ca0 c00000002e9b6700 c000000000270808
> c000000bad8fe980
> [ 1660.780386] GPR16: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [ 1660.780386] GPR20: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [ 1660.780386] GPR24: 0000000000000000 0000000000000000 c0080000f7360048
> 0000000000000000
> [ 1660.780386] GPR28: 0000000000000008 0000000000000000 000001827223b155
> 0000000000000003
> [ 1660.780433] NIP [c0000000000399a8] __replay_soft_interrupts+0x38/0x150
> [ 1660.780443] LR [c000000000039c24]
> arch_local_irq_restore.part.0+0xe4/0x160
> [ 1660.780449] Call Trace:
> [ 1660.780452] [c000000bc19cfd00] [c000000000039a0c]
> __replay_soft_interrupts+0x9c/0x150 (unreliable)
> [ 1660.780460] [c000000bc19cfeb0] [c000000000039c24]
> arch_local_irq_restore.part.0+0xe4/0x160
> [ 1660.780468] [c000000bc19cfef0] [c0080000f73e042c]
> rqspinlock_worker_fn+0x244/0x300 [bpf_test_rqspinlock]
> [ 1660.780476] [c000000bc19cff90] [c000000000270954] kthread+0x154/0x170
> [ 1660.780482] [c000000bc19cffe0] [c00000000000df78]
> start_kernel_thread+0x14/0x18
> [ 1660.780487] Code: 60000000 7c0802a6 f8010010 f821fe51 e92d0c78 f92101a8
> 39200000 38610028 892d0933 61290040 992d0933 48102e15 <60000000> 39200000
> e9410130 f9210160
> 
> 
> 
> If you happen to fix this, Please add below tag.
> 
> 
> Reported-by: Venkat Rao Bagalkote <[email protected]>
> 
> 
> 
> Regards,
> 
> Venkat.
> 
> 
Hi Venkat,

I tried this on bpf-next-6.19.0-rc5+. The system recovers from the soft
lockups after some time, and the selftest passes:

./test_progs -t res_spin_lock_stress
WATCHDOG: test case res_spin_lock_stress executes for 10 seconds...
#377     res_spin_lock_stress:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

On powerpc, there is an arch-specific implementation of queued spinlocks,
but the resilient queued spinlock used by BPF currently falls back to a
simpler test-and-set (TAS) lock. Under heavy contention, this can lead
to soft lockups like the one observed above. Once a powerpc-specific
resilient queued spinlock implementation is added, the issue is expected
to disappear.
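
To illustrate why a TAS lock behaves poorly under contention, here is a
minimal user-space sketch in C11. It is not the kernel's rqspinlock code;
the names tas_lock, tas_lock_timeout, tas_unlock and now_ns are
hypothetical and only serve the illustration. Every waiter spins on the
same word with no queueing or fairness, so an unlucky waiter can keep
losing the race until its timeout expires:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a test-and-set lock: 0 = free, 1 = held. */
struct tas_lock {
	atomic_int locked;
};

/* Monotonic clock in nanoseconds, used only for the spin timeout. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Try to take the lock, giving up after timeout_ns.
 * Returns true on success, false on timeout.
 * All waiters hammer the same word; there is no FIFO ordering, so
 * nothing bounds how long one particular waiter keeps losing.
 */
static bool tas_lock_timeout(struct tas_lock *l, uint64_t timeout_ns)
{
	uint64_t deadline = now_ns() + timeout_ns;

	for (;;) {
		int expected = 0;

		if (atomic_compare_exchange_weak_explicit(&l->locked,
							  &expected, 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
		if (now_ns() >= deadline)
			return false;
	}
}

/* Release the lock; the caller must currently hold it. */
static void tas_unlock(struct tas_lock *l)
{
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

A queued lock instead gives each waiter its own place in line, which is
why an arch-specific queued implementation is expected to help here.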

Thanks,
Saket
