On 02. 03. 26, 12:46, Peter Zijlstra wrote:
> On Mon, Mar 02, 2026 at 06:28:38AM +0100, Jiri Slaby wrote:
>>
>> The state of the lock:
>>
>> crash> struct rq.__lock -x ffff8d1a6fd35dc0
>>    __lock = {
>>      raw_lock = {
>>        {
>>          val = {
>>            counter = 0x40003
>>          },
>>          {
>>            locked = 0x3,
>>            pending = 0x0
>>          },
>>          {
>>            locked_pending = 0x3,
>>            tail = 0x4
>>          }
>>        }
>>      }
>>    },
>
> That had me remember the below patch that never quite made it. I've
> rebased it to something more recent so it applies.
>
> If you stick that in, we might get a clue as to who is owning that lock.
> Provided it all wants to reproduce well enough.

Thanks, I applied it, but it has not been accepted yet:
https://build.opensuse.org/requests/1335893
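Even without that patch, the raw value in the lock dump quoted above can be unpacked by hand. It cannot name the owner (a qspinlock does not record one), but it does say something about the waiters. A minimal sketch, assuming the NR_CPUS < 16K layout from include/asm-generic/qspinlock_types.h (the layout whose locked/pending/tail fields the crash output shows) and assuming paravirt spinlocks are enabled in the guest, in which case a locked byte of 3 matches _Q_SLOW_VAL, i.e. the queue head has hashed the lock and is halting until kicked. `decode_qspinlock` is a hypothetical helper, not a crash or drgn command.

```python
# Hypothetical helper: decode a raw qspinlock value taken from a crash dump.
# Assumes the NR_CPUS < 16K layout of include/asm-generic/qspinlock_types.h:
#   bits  0- 7  locked byte
#   bits  8-15  pending byte
#   bits 16-17  tail index (nesting level of the per-CPU MCS node)
#   bits 18-31  tail CPU + 1 (0 means "no queued waiters")
Q_LOCKED_MASK = 0xFF
Q_PENDING_OFFSET = 8
Q_PENDING_MASK = 0xFF << Q_PENDING_OFFSET
Q_TAIL_IDX_OFFSET = 16
Q_TAIL_IDX_MASK = 0x3 << Q_TAIL_IDX_OFFSET
Q_TAIL_CPU_OFFSET = 18
Q_TAIL_CPU_MASK = 0x3FFF << Q_TAIL_CPU_OFFSET

# Locked-byte value used only by paravirt qspinlocks (_Q_SLOW_VAL).
# Assumption: pv spinlocks are enabled in this guest.
Q_SLOW_VAL = 3


def decode_qspinlock(val):
    """Split a 32-bit qspinlock value into its locked, pending and tail parts."""
    locked = val & Q_LOCKED_MASK
    pending = (val & Q_PENDING_MASK) >> Q_PENDING_OFFSET
    tail_cpu_plus_one = (val & Q_TAIL_CPU_MASK) >> Q_TAIL_CPU_OFFSET
    if tail_cpu_plus_one == 0:
        tail = None  # nobody is queued behind the owner
    else:
        tail_idx = (val & Q_TAIL_IDX_MASK) >> Q_TAIL_IDX_OFFSET
        tail = (tail_cpu_plus_one - 1, tail_idx)  # (cpu, idx) of the last waiter
    return {
        "locked": locked,
        "pending": pending,
        "tail": tail,
        "pv_slow_path": locked == Q_SLOW_VAL,
    }


print(decode_qspinlock(0x40003))
# -> {'locked': 3, 'pending': 0, 'tail': (0, 0), 'pv_slow_path': True}
```

Under those assumptions, counter = 0x40003 reads as: the lock is held, nobody is in the pending state, and CPU 0 (MCS node index 0) is the last CPU queued on this runqueue lock.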


In the meantime, Michal K. and I did some digging into the QEMU dumps. Details are at (and in a couple of previous comments):
https://bugzilla.suse.com/show_bug.cgi?id=1258936#c17

tl;dr:

In one of the dumps, one process sits in
  context_switch
    -> mm_get_cid (before switch_to())

> 65 kworker/1:1 SP= 0xffffcf82c022fd98 -> __schedule+0x16ee (ffffffff820f162e) -> call mm_get_cid

Michal extracted the vCPU's RIP and it turned out:
> Hm, I'd say the CPU could be spinning in mm_get_cid() waiting for a free CID.
> ...
> ffff8a88458137c0:  000000000000000f 000000000000000f
>                                                    ^
> Hm, so indeed CIDs for all four CPUs are occupied.
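A full mask leaves an allocator of the "claim the first free CID, otherwise retry" kind with nothing to claim. As a hypothetical illustration only (a simplified model, not the kernel's mm_get_cid(); `find_free_cid`, `get_cid` and `max_spins` are made-up names), with four CIDs and the 0xf mask above:

```python
# Simplified, hypothetical model of "take the first free CID, otherwise retry".
# It is NOT the kernel's mm_get_cid(); it only shows why a fully set mask
# leaves a retry loop with nothing to find.


def find_free_cid(cid_mask, nr_cids):
    """Return the lowest clear bit below nr_cids, or None if every CID is taken."""
    for cid in range(nr_cids):
        if not cid_mask & (1 << cid):
            return cid
    return None


def get_cid(cid_mask, nr_cids, max_spins=1000):
    """Retry until a CID is free; return (cid, new_mask) or (None, cid_mask).

    max_spins exists only so the model terminates; the loop described in the
    dumps keeps spinning. Nothing in this model ever clears a bit, so if the
    mask starts full, every retry fails in exactly the same way.
    """
    for _ in range(max_spins):
        cid = find_free_cid(cid_mask, nr_cids)
        if cid is not None:
            return cid, cid_mask | (1 << cid)
    return None, cid_mask


print(get_cid(0x5, 4))  # -> (1, 7)
print(get_cid(0xF, 4))  # -> (None, 15)
```

The point of the model: progress depends entirely on some other context clearing a bit, so if the one that should do so cannot run, the spinning CPU never leaves the loop.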

To me (I don't know what a CID is either), this might point to Thomas' "sched/mmcid: Cure mode transition woes" series [1] as a possible culprit.

Funnily enough, 47ee94efccf6 ("sched/mmcid: Protect transition on weakly ordered systems") spells out:
>     As a consequence the task will not drop the CID when scheduling out
>     before the fixup is completed, which means the CID space can be
>     exhausted and the next task scheduling in will loop in mm_get_cid()
>     and the fixup thread can livelock on the held runqueue lock as above.

Which sounds like exactly what happens here. Except that the patch is part of the series above, so it is obviously already in 6.19.
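The dependency chain in that commit message can be written down as a wait-for graph; a cycle in it means none of the participants can make progress on its own. A hypothetical sketch: the node labels are illustrative names for the actors in the quoted text, not kernel symbols, and the last edge (the runqueue lock being held by the CPU looping in mm_get_cid(), which is called from context_switch() with the rq lock held) is an inference from "the held runqueue lock", not something the commit message states outright.

```python
# Hypothetical wait-for graph for the livelock described in 47ee94efccf6.
# Each key waits on the nodes listed for it; labels are illustrative only.
wait_for = {
    "task in mm_get_cid()": ["free CID"],  # loops until a CID is free
    "free CID": ["CID fixup"],             # CID is not dropped before the fixup completes
    "CID fixup": ["rq lock"],              # the fixup needs the runqueue lock
    "rq lock": ["task in mm_get_cid()"],   # held by the CPU that is looping (inferred)
}


def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None if acyclic.

    Recursive depth-first search with white/grey/black colouring: reaching a
    grey node again means the search has walked back into the current path.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    path = []

    def dfs(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in graph:
        if colour.get(start, WHITE) == WHITE:
            found = dfs(start)
            if found:
                return found
    return None


print(" -> ".join(find_cycle(wait_for)))
# -> task in mm_get_cid() -> free CID -> CID fixup -> rq lock -> task in mm_get_cid()
```

Removing any one edge (for example, if the task did drop its CID on schedule-out) breaks the cycle, which is what the ordering fix in that commit is meant to guarantee.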


I noticed there is also a 7.0-rc1 fix:
  1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on mode switch
But that already went into 6.19.1 (we are at 6.19.3), so it does not improve the situation.

Any ideas?



[1] https://lore.kernel.org/all/[email protected]/

thanks,
--
js
suse labs
