On Thu, Jan 29, 2026 at 09:46:12AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 29, 2026 at 05:27:04AM +0000, Shinichiro Kawasaki wrote:
> > On Jan 28, 2026 / 08:42, Paul E. McKenney wrote:
> > > On Wed, Jan 28, 2026 at 05:55:01PM +0800, Kunwu Chan wrote:
> > > > On 1/26/26 19:30, Shinichiro Kawasaki wrote:
> > > > > kernel: xfs/for-next, 51aba4ca399, v6.19-rc5+
> > > > > block device: dm-linear on HDD (non-zoned)
> > > > > xfs: zoned
> > > >
> > > > I had a quick look at the attached logs. Across the different runs, the
> > > > stall traces consistently show CPUs spending extended time in
> > > > |mm_get_cid()| along the mm/sched context switch path.
> > > >
> > > > This doesn’t seem to indicate an immediate RCU issue by itself, but it
> > > > raises the question of whether context switch completion can be delayed
> > > > for unusually long periods under these test configurations.
> > >
> > > Thank you all!
> > >
> > > Us RCU guys looked at this and it also looks to us that at least one
> > > part of this issue is that mm_get_cid() is spinning. This is being
> > > investigated over here:
> > >
> > > https://lore.kernel.org/all/877bt29cgv.ffs@tglx/
> > > https://lore.kernel.org/all/[email protected]/
> > > https://lore.kernel.org/all/87y0lh96xo.ffs@tglx/
> >
> > Knuwu, Paul and RCU experts, thank you very much. It's good to know that the
> > similar issue is already under investigation. I hope that a fix gets available
> > in timely manner.
> >
> > > I have seen the static-key pattern called out by Dave Chinner when running
> > > KASAN on large systems. We worked around this by disabling KASAN's use
> > > of static keys. In case you were running KASAN in these tests.
> >
> > As to KASAN, yes, I enable it in my test runs. I find three static-keys under
> > mm/kasan/*. I will think if they can be disabled in my test runs. Thanks.
>
> There is a set of Kconfig options that disables static branches. If you
> cannot find them quickly, please let me know and I can look them up.

And Thomas Gleixner posted an alleged fix to the CID issue here:

https://lore.kernel.org/lkml/[email protected]/

Please let him know whether or not it helps.

Thanx, Paul
