On Jan 28, 2026 / 08:42, Paul E. McKenney wrote:
> On Wed, Jan 28, 2026 at 05:55:01PM +0800, Kunwu Chan wrote:
> > On 1/26/26 19:30, Shinichiro Kawasaki wrote:
> > > kernel: xfs/for-next, 51aba4ca399, v6.19-rc5+
> > > block device: dm-linear on HDD (non-zoned)
> > > xfs: zoned
> >
> > I had a quick look at the attached logs. Across the different runs, the
> > stall traces consistently show CPUs spending extended time in
> > |mm_get_cid()| along the mm/sched context switch path.
> >
> > This doesn’t seem to indicate an immediate RCU issue by itself, but it
> > raises the question of whether context switch completion can be delayed
> > for unusually long periods under these test configurations.
>
> Thank you all!
>
> Us RCU guys looked at this and it also looks to us that at least one
> part of this issue is that mm_get_cid() is spinning. This is being
> investigated over here:
>
> https://lore.kernel.org/all/877bt29cgv.ffs@tglx/
> https://lore.kernel.org/all/[email protected]/
> https://lore.kernel.org/all/87y0lh96xo.ffs@tglx/

Kunwu, Paul and RCU experts, thank you very much. It's good to know that a
similar issue is already under investigation. I hope that a fix becomes
available in a timely manner.

> I have seen the static-key pattern called out by Dave Chinner when running
> KASAN on large systems. We worked around this by disabling KASAN's use
> of static keys. In case you were running KASAN in these tests.

As to KASAN, yes, I enable it in my test runs. I found three static keys under
mm/kasan/*. I will consider whether they can be disabled in my test runs.

Thanks.
