On Wed, Jan 18, 2023 at 03:04:06PM +0000, Peter Maydell wrote:
> On Tue, 17 Jan 2023 at 19:21, Guenter Roeck <li...@roeck-us.net> wrote:
> > Anyway - any idea what to do to help figuring out what is happening ?
> > Add tracing support to pci interrupt handling, maybe ?
>
> For intermittent bugs, I like recording the QEMU session under
> rr (using its chaos mode to provoke the failure if necessary) to
> get a recording that I can debug and re-debug at leisure. Usually
> you want to turn on/add tracing to help with this, and if the
> failure doesn't hit early in bootup then you might need to
> do a QEMU snapshot just before point-of-failure so you can
> run rr only on the short snapshot-to-failure segment.
>
> https://translatedcode.wordpress.com/2015/05/30/tricks-for-debugging-qemu-rr/
> https://translatedcode.wordpress.com/2015/07/06/tricks-for-debugging-qemu-savevm-snapshots/
>
> This gives you a debugging session from the QEMU side's perspective,
> of course -- assuming you know what the hardware is supposed to do
> you hopefully wind up with either "the guest software did X,Y,Z
> and we incorrectly did A" or else "the guest software did X,Y,Z,
> the spec says A is the right/a permitted thing but the guest got confused".
> If it's the latter then you have to look at the guest as a separate
> code analysis/debug problem.

Here's what I got, though I'm way out of my depth here. It looks like the Linux kernel's fasteoi handling for the RISC-V PLIC claims the interrupt after its first handling, which I think is expected. After the claim, QEMU masks the pending interrupt, lowering the level, even though the device that raised it never deasserted.
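
To make the sequence concrete, here is a toy model of what I think I'm seeing. It is only a sketch in Python, not QEMU's PLIC code: the class and method names are made up, it handles one source and one hart, and the re-sampling of the line on completion is my assumption about how a level-triggered source should behave.

```python
class ToyPlic:
    """Toy single-source, single-hart interrupt controller (hypothetical, not QEMU code)."""

    def __init__(self):
        self.level = False    # line as driven by the device
        self.pending = False  # latched request waiting for a claim
        self.claimed = False  # claimed but not yet completed

    def set_level(self, level):
        # The device drives its line; a high line latches a request
        # unless one is already being serviced.
        self.level = level
        if level and not self.claimed:
            self.pending = True

    def output(self):
        # What the hart sees: asserted only while a request is pending and unclaimed.
        return self.pending and not self.claimed

    def claim(self):
        # Guest reads the claim register: the pending bit is cleared.
        if not self.output():
            return False
        self.pending = False
        self.claimed = True
        return True

    def complete(self):
        # Guest writes the completion register.
        self.claimed = False
        # Assumption: a level-triggered source is re-sampled on completion.
        if self.level:
            self.pending = True


plic = ToyPlic()
plic.set_level(True)         # device asserts and never deasserts
trace = [plic.output()]      # before the claim
plic.claim()
trace.append(plic.output())  # after the claim, device line still high
plic.complete()
trace.append(plic.output())  # after completion
print(trace)
```

In this model the output toward the hart drops between claim and completion even though the device's line stays high the whole time, which matches the level lowering described above. Whether the real code re-raises it afterwards is the part I can't tell yet.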