On Dec 12 06:27, Guenter Roeck wrote: > On 12/12/22 05:45, Klaus Jensen wrote: > > On Dec 12 05:39, Guenter Roeck wrote: > > > On 12/12/22 01:58, Klaus Jensen wrote: > > > > On Dec 8 12:39, Guenter Roeck wrote: > > > > > On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote: > > > > > > On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote: > > > > > > > > > > > > > > > > A cq head doorbell mmio is skipped... And it is not the fault > > > > > > > > of the > > > > > > > > kernel. The kernel is in it's good right to skip the mmio since > > > > > > > > the cq > > > > > > > > eventidx is not properly updated. > > > > > > > > > > > > > > > > Adding that and it boots properly on riscv. But I'm perplexed > > > > > > > > as to why > > > > > > > > this didnt show up on our regularly tested platforms. > > > > > > > > > > > > > > > > Gonna try to get this in for 7.2! > > > > > > > > > > > > > > I see another problem with sparc64. > > > > > > > > > > > > > > [ 5.261508] could not locate request for tag 0x0 > > > > > > > [ 5.261711] nvme nvme0: invalid id 0 completed on queue 1 > > > > > > > > > > > > > > That is seen repeatedly until the request times out. I'll test > > > > > > > with > > > > > > > your patch to see if it resolves this problem as well, and will > > > > > > > bisect > > > > > > > otherwise. > > > > > > > > > > > > > The second problem is unrelated to the doorbell problem. > > > > > > It is first seen in qemu v7.1. I'll try to bisect. > > > > > > > > > > > > > > > > Unfortunately, the problem observed with sparc64 also bisects to this > > > > > patch. Making things worse, "hw/nvme: fix missing cq eventidx update" > > > > > does not fix it (which is why I initially thought it was unrelated). > > > > > > > > > > I used the following qemu command line. > > > > > > > > > > qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 > > > > > -snapshot \ > > > > > -device nvme,serial=foo,drive=d0,bus=pciB \ > > > > > -drive file=rootfs.ext2,if=none,format=raw,id=d0 \ > > > > > -kernel arch/sparc/boot/image -no-reboot \ > > > > > -append "root=/dev/nvme0n1 console=ttyS0" \ > > > > > -nographic -monitor none > > > > > > > > > > > > > Hi Guenter, > > > > > > > > Thank you very much for the detailed reports and I apologize for the > > > > fallout of this. > > > > > > > > I think this is related to missing endian conversions when handling the > > > > shadow doorbells. I'm not sure if there is a kernel issue here as well, > > > > because as far as I can tell, the shadow doorbells are updated like so: > > > > > > > > old_value = *dbbuf_db; > > > > *dbbuf_db = value; > > > > > > > > (where `value` is the new head/tail value depending on if this is an sq > > > > or cq). > > > > > > > > Keith, is the kernel doing something magically here I am not aware of, > > > > or isn't this missing some cpu_to_le32() / le32_to_cpu() calls as well? > > > > > > Wouldn't that mean that nvme doorbell support would be broken in Linux > > > on all big endian systems ? Maybe it is, but it seems a bit unlikely. > > > > > > > No, not broken in general - only for shadow doorbells. On regular MMIO, > > the linux helpers takes care of endianness (and so does the MMIO > > callbacks in QEMU). However, for shadow doorbells, the doorbell value is > > written to host memory - and Linux (and QEMU) does not handle that > > correctly wrt. endianness. > > > > I got it all working with the series I just posted for QEMU (v4) and > > fixing endianness conversion for the above in the kernel (patch pending > > for linux-nvme). > > Hmm, interesting. I guess I'll wait for the Linux patch to be posted. > > That makes me wonder, though: Are the Linux changes really necessary ? > If this never worked, would it be possible to adjust the qemu code > in a way that it just works with the existing Linux code ? >
That would mean that hw/nvme would be non-compliant on big-endian and I'd rather not leave it in that state. > Alternatively, would it be possible to add a runtime flag to qemu > that would let me disable shadow doorbell support ? I am testing > all Linux kernel branches, and without such a flag I'd have to carry > a patch just disabling it in qemu until all kernel branches are fixed > (if that ever happens). > YES. The fallout of this has been bad, so I'll cook up a patch to disable shadow doorbells at runtime.
signature.asc
Description: PGP signature