On Sat, Mar 02, 2024 at 08:53:54PM +0100, Max Boone wrote:
>
> Thank you so much for the quick reply!
>
> I haven't filed a bug with Debian specifically, as I'm running the Linux
> kernel built and provided by Microsoft, with Ubuntu as the OS on top. If it
> helps with the search I'd gladly run Debian and file a bug there, but I will
> still need to build my own kernel, as WSL requires some drivers (such as
> Hyper-V storage and sockets) to be built into the kernel (meaning =y)
> instead of as modules (meaning =m).

Ah, if you built your own kernel, then you are your own distro as far
as kernel issues are concerned. ;-)

Thanx, Paul

> I'll stick to using the rcu list from here on to avoid spam, thanks again!
>
> On Saturday, March 02, 2024 20:43 CET, "Paul E. McKenney"
> <[email protected]> wrote:
> [ Adding Boqun and the rcu list on CC. ]
>
> On Sat, Mar 02, 2024 at 07:59:08PM +0100, Max Boone wrote:
> >
> > Dear Dr. McKenney,
> >
> > For a couple of years now I've been the sometimes frustrated owner of a
> > Microsoft Surface Pro X ARM64 device. It has been getting progressively
> > better as more vendors target their builds at ARM64, but ever since the
> > device was introduced there have been issues with the Windows Subsystem
> > for Linux (no more than an opinionated Hyper-V VM with extensive tooling)
> > locking up and hanging.
> >
> > When this happens, traces like the following are dumped in the kernel
> > messages:
> > https://github.com/microsoft/WSL/issues/9454#issuecomment-1942222109
> >
> > In your talk "Decoding Those Inscrutable RCU CPU Stall Warnings" you
> > mentioned that one can feel free to reach out when bumping into such
> > issues. Building other kernel releases, switching modules off and on, and
> > playing with the RCU grace period times have so far not helped me (or
> > others in that thread).
> >
> > Anyways, I don't really know where to start looking and the call stacks
> > aren't very informative (to my eye) either. I'm hoping you might help me
> > find the direction to look for the root of this problem.
>
> I am assuming that you have filed a bug with the Debian folks, and before
> doing that, searched for similar bug reports.
>
> At first glance, this is because things were stuck here:
>
> [ 967.115632] clear_rseq_cs.isra.0+0x4c/0x60
> [ 967.116433] do_notify_resume+0xf8/0xeb0
> [ 967.116960] el0_svc+0x3c/0x50
> [ 967.117537] el0t_64_sync_handler+0x9c/0x120
> [ 967.118323] el0t_64_sync+0x158/0x15c
>
> So including these function names (clear_rseq_cs() and so on) in your
> search for similar bug reports would be a good idea.
>
> I am unfamiliar with that code.
>
> So I added Boqun because he works with Linux on Hyper-V as part of his
> day job and has a great deal of experience with RCU. He will likely
> have quite a number of questions for you including exact versions,
> Debian bug number, the results of your web search, and so on. He might
> also know an ARM person to get involved in this.
>
> Or maybe he knows the solution off the top of his head!
>
> Thanx, Paul