Hi Jiri,

On 26/02/2026 11:37, Jiri Slaby wrote:
> On 06. 02. 26, 12:54, Matthieu Baerts wrote:
>> Our CI for the MPTCP subsystem is now regularly hitting various stalls
>> before even starting the MPTCP test suite. These issues are visible on
>> top of the latest net and net-next trees, which have been sync with
>> Linus' tree yesterday. All these issues have been seen on a "public CI"
>> using GitHub-hosted runners with KVM support, where the tested kernel is
>> launched in a nested (I suppose) VM. I can see the issue with or without
>> debug.config. According to the logs, it might have started around
>> v6.19-rc0, but I was unavailable for a few weeks, and I couldn't react
>> quicker, sorry for that. Unfortunately, I cannot reproduce this locally,
>> and the CI doesn't currently have the ability to execute bisections.
>
> Hmm, after the switch of the qemu guest kernels to 6.19, our (opensuse)
> build service is stalling in smp_call_function_many_cond() randomly too:
> https://bugzilla.suse.com/show_bug.cgi?id=1258936
>
> The attachment from there contains sysrq-t logs too:
> https://bugzilla.suse.com/attachment.cgi?id=888612

I'm glad I'm not the only one with this issue :)

In your case, do you also have nested VMs with KVM support? Are you able
to easily reproduce the issue and change the guest kernel in your build
service? On my side, any debugging steps need to be automated. Lately, it
looks like the issue is more easily triggered on a stable 6.19 kernel,
than on the last RC.

>> The stalls happen before starting the MPTCP test suite. The init program
>> creates a VSOCK listening socket via socat [1], and different hangs are
>> then visible: RCU stalls followed by a soft lockup [2], only a soft
>> lockup [3], sometimes the soft lockup comes with a delay [4] [5], or
>> there is no RCU stalls or soft lockups detected after one minute, but VM
>> is stalled [6]. In the last case, the VM is stopped after having
>> launched GDB to get more details about what was being executed.
>>
>> It feels like the issue is not directly caused by the VSOCK listening
>> socket, but the stalls always happen after having started the socat
>> command [1] in the background.
>
> It fails randomly while building random packages (go, libreoffice,
> bayle, ...). I don't think it is VSOCK related in those cases, but who
> knows what the builds do...

Indeed, unlikely to be VSOCK then.

> I cannot reproduce locally either.
>
> I came across:
> 614da1d3d4cd x86: make page fault handling disable interrupts properly
> but I have no idea if it could have impact on this at all.

Did it help to revert it?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

