On Dec 8 12:39, Guenter Roeck wrote:
> On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
> > On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > > >
> > > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > > kernel. The kernel is in it's good right to skip the mmio since the cq
> > > > eventidx is not properly updated.
> > > >
> > > > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > > > this didnt show up on our regularly tested platforms.
> > > >
> > > > Gonna try to get this in for 7.2!
> > >
> > > I see another problem with sparc64.
> > >
> > > [ 5.261508] could not locate request for tag 0x0
> > > [ 5.261711] nvme nvme0: invalid id 0 completed on queue 1
> > >
> > > That is seen repeatedly until the request times out. I'll test with
> > > your patch to see if it resolves this problem as well, and will bisect
> > > otherwise.
> > >
> > The second problem is unrelated to the doorbell problem.
> > It is first seen in qemu v7.1. I'll try to bisect.
> >
>
> Unfortunately, the problem observed with sparc64 also bisects to this
> patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
> does not fix it (which is why I initially thought it was unrelated).
>
> I used the following qemu command line.
>
> qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
>     -device nvme,serial=foo,drive=d0,bus=pciB \
>     -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>     -kernel arch/sparc/boot/image -no-reboot \
>     -append "root=/dev/nvme0n1 console=ttyS0" \
>     -nographic -monitor none
>
Hi Guenter,

Thank you very much for the detailed reports, and I apologize for the
fallout from this.

I think this is related to missing endian conversions when handling the
shadow doorbells. I'm not sure whether there is a kernel issue here as
well, because as far as I can tell, the shadow doorbells are updated
like so:

  old_value = *dbbuf_db;
  *dbbuf_db = value;

(where `value` is the new head/tail value, depending on whether this is
an sq or a cq).

Keith, is the kernel doing something magical here that I am not aware
of, or is this missing some cpu_to_le32() / le32_to_cpu() calls as well?
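For illustration only, here is a minimal user-space sketch of the kind of conversion in question. It assumes the shadow doorbell buffer is shared with the device in little-endian byte order, like other NVMe data structures. The `sketch_*` helpers are hypothetical stand-ins written for this example; they are not the kernel's cpu_to_le32() / le32_to_cpu() and not the actual driver code.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for cpu_to_le32(): returns a 32-bit word whose
 * in-memory bytes are the little-endian encoding of v. Written byte-wise
 * so it behaves the same on little- and big-endian hosts.
 */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
    uint8_t b[4];
    uint32_t out;

    b[0] = (uint8_t)(v & 0xff);
    b[1] = (uint8_t)((v >> 8) & 0xff);
    b[2] = (uint8_t)((v >> 16) & 0xff);
    b[3] = (uint8_t)((v >> 24) & 0xff);
    memcpy(&out, b, sizeof(out));
    return out;
}

/* Hypothetical stand-in for le32_to_cpu(): the inverse of the above. */
static uint32_t sketch_le32_to_cpu(uint32_t v)
{
    uint8_t b[4];

    memcpy(b, &v, sizeof(b));
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/*
 * Update a shadow doorbell slot (assumed little-endian in memory) and
 * return the previous value in CPU byte order.
 */
static uint32_t sketch_update_dbbuf(uint32_t *dbbuf_db, uint32_t value)
{
    uint32_t old_value = sketch_le32_to_cpu(*dbbuf_db);

    *dbbuf_db = sketch_cpu_to_le32(value);
    return old_value;
}
```

On a little-endian host both helpers reduce to the identity. That would be consistent with the problem only showing up on a big-endian guest such as sparc64, if the missing conversions are indeed the cause.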