Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

Keith Busch Wed, 18 Jan 2023 08:33:44 -0800

On Wed, Jan 18, 2023 at 03:04:06PM +0000, Peter Maydell wrote:
> On Tue, 17 Jan 2023 at 19:21, Guenter Roeck <li...@roeck-us.net> wrote:
> > Anyway - any idea what to do to help figuring out what is happening ?
> > Add tracing support to pci interrupt handling, maybe ?
> 
> For intermittent bugs, I like recording the QEMU session under
> rr (using its chaos mode to provoke the failure if necessary) to
> get a recording that I can debug and re-debug at leisure. Usually
> you want to turn on/add tracing to help with this, and if the
> failure doesn't hit early in bootup then you might need to
> do a QEMU snapshot just before point-of-failure so you can
> run rr only on the short snapshot-to-failure segment.
> 
> https://translatedcode.wordpress.com/2015/05/30/tricks-for-debugging-qemu-rr/
> https://translatedcode.wordpress.com/2015/07/06/tricks-for-debugging-qemu-savevm-snapshots/
> 
> This gives you a debugging session from the QEMU side's perspective,
> of course -- assuming you know what the hardware is supposed to do
> you hopefully wind up with either "the guest software did X,Y,Z
> and we incorrectly did A" or else "the guest software did X,Y,Z,
> the spec says A is the right/a permitted thing but the guest got confused".
> If it's the latter then you have to look at the guest as a separate
> code analysis/debug problem.


Here's what I got, though I'm way out of my depth here.

It looks like the Linux kernel's fasteoi flow for the RISC-V PLIC claims
the interrupt after its first handling, which I think is expected. After
the claim, QEMU masks the pending interrupt, lowering the level, even
though the device that raised it never deasserted.
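
To make that sequence concrete, here is a toy model of a single
level-triggered source behind a PLIC-like controller. It is only a sketch
under stated assumptions: every name in it (toy_plic, toy_set_level,
toy_claim, toy_complete, toy_pending_after_cycle) is hypothetical, and it
is not taken from QEMU's or Linux's actual code. The "resample" flag is an
assumption used to contrast two possible behaviours at completion time:
re-checking the still-asserted line, or not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not QEMU's hw/intc implementation. */
struct toy_plic {
    bool level;    /* interrupt line as currently driven by the device */
    bool pending;  /* latched pending bit seen by the guest */
    bool claimed;  /* claimed by the guest but not yet completed */
};

/* The device drives its interrupt pin high or low. */
static void toy_set_level(struct toy_plic *p, bool level)
{
    p->level = level;
    /* Latch a new request only when none is already in service. */
    if (level && !p->claimed) {
        p->pending = true;
    }
}

/* The guest reads the claim register: pending is cleared on claim. */
static bool toy_claim(struct toy_plic *p)
{
    if (!p->pending) {
        return false;
    }
    p->pending = false;
    p->claimed = true;
    return true;
}

/*
 * The guest writes the completion register. With resample set, the
 * controller looks at the line again and re-latches pending if the
 * device is still asserting it; without it, the level is never
 * re-examined.
 */
static void toy_complete(struct toy_plic *p, bool resample)
{
    assert(p->claimed);
    p->claimed = false;
    if (resample && p->level) {
        p->pending = true;
    }
}

/*
 * Run one claim/complete cycle while the device keeps its line asserted
 * throughout, and report whether an interrupt is pending afterwards.
 */
static bool toy_pending_after_cycle(bool resample)
{
    struct toy_plic p = { false, false, false };

    toy_set_level(&p, true);      /* device asserts and never deasserts */
    if (!toy_claim(&p)) {
        return false;
    }
    toy_complete(&p, resample);
    return p.pending;
}
```

In this model, toy_pending_after_cycle(true) leaves the interrupt pending
again, while toy_pending_after_cycle(false) leaves nothing pending even
though the device is still asserting its line. A guest waiting on that
second case would see no further interrupt until the device toggles the
pin. Whether that is what actually happens in this bug is not something
the sketch can show.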
