On Dec 12 05:39, Guenter Roeck wrote:
> On 12/12/22 01:58, Klaus Jensen wrote:
> > On Dec  8 12:39, Guenter Roeck wrote:
> > > On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
> > > > On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > > > > > 
> > > > > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > > > > kernel. The kernel is in it's good right to skip the mmio since the 
> > > > > > cq
> > > > > > eventidx is not properly updated.
> > > > > > 
> > > > > > Adding that and it boots properly on riscv. But I'm perplexed as to 
> > > > > > why
> > > > > > this didnt show up on our regularly tested platforms.
> > > > > > 
> > > > > > Gonna try to get this in for 7.2!
> > > > > 
> > > > > I see another problem with sparc64.
> > > > > 
> > > > > [    5.261508] could not locate request for tag 0x0
> > > > > [    5.261711] nvme nvme0: invalid id 0 completed on queue 1
> > > > > 
> > > > > That is seen repeatedly until the request times out. I'll test with
> > > > > your patch to see if it resolves this problem as well, and will bisect
> > > > > otherwise.
> > > > > 
> > > > The second problem is unrelated to the doorbell problem.
> > > > It is first seen in qemu v7.1. I'll try to bisect.
> > > > 
> > > 
> > > Unfortunately, the problem observed with sparc64 also bisects to this
> > > patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
> > > does not fix it (which is why I initially thought it was unrelated).
> > > 
> > > I used the following qemu command line.
> > > 
> > > qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
> > >      -device nvme,serial=foo,drive=d0,bus=pciB \
> > >      -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> > >      -kernel arch/sparc/boot/image -no-reboot \
> > >      -append "root=/dev/nvme0n1 console=ttyS0" \
> > >      -nographic -monitor none
> > > 
> > 
> > Hi Guenter,
> > 
> > Thank you very much for the detailed reports and I apologize for the
> > fallout of this.
> > 
> > I think this is related to missing endian conversions when handling the
> > shadow doorbells. I'm not sure if there is a kernel issue here as well,
> > because as far as I can tell, the shadow doorbells are updated like so:
> > 
> >    old_value = *dbbuf_db;
> >    *dbbuf_db = value;
> > 
> > (where `value` is the new head/tail value depending on if this is an sq
> > or cq).
> > 
> > Keith, is the kernel doing something magically here I am not aware of,
> > or isn't this missing some cpu_to_le32() / le32_to_cpu() calls as well?
> 
> Wouldn't that mean that nvme doorbell support would be broken in Linux
> on all big endian systems ? Maybe it is, but it seems a bit unlikely.
> 

No, not broken in general - only for shadow doorbells. On regular MMIO,
the linux helpers takes care of endianness (and so does the MMIO
callbacks in QEMU). However, for shadow doorbells, the doorbell value is
written to host memory - and Linux (and QEMU) does not handle that
correctly wrt. endianness.

I got it all working with the series I just posted for QEMU (v4) and
fixing endianness conversion for the above in the kernel (patch pending
for linux-nvme).

Attachment: signature.asc
Description: PGP signature

Reply via email to