On 12/8/22 12:28, Keith Busch wrote:
When the request times out, the kernel should be printing the command ID. What does that say? The driver thinks the 0 is invalid, so I'm just curious what value it's expecting.
After some time I see the following. ... [ 88.071197] nvme nvme0: invalid id 0 completed on queue 1 [ 88.071514] could not locate request for tag 0x0 [ 88.071802] nvme nvme0: invalid id 0 completed on queue 1 [ 88.072135] could not locate request for tag 0x0 [ 88.072426] nvme nvme0: invalid id 0 completed on queue 1 [ 88.072720] could not locate request for tag 0x0 [ 88.073007] nvme nvme0: invalid id 0 completed on queue 1 [ 88.073343] nvme nvme0: request 0x50 genctr mismatch (got 0x0 expected 0x1) [ 88.073774] nvme nvme0: invalid id 80 completed on queue 1 [ 88.074127] nvme nvme0: request 0x4f genctr mismatch (got 0x0 expected 0x1) [ 88.074556] nvme nvme0: invalid id 79 completed on queue 1 [ 88.074903] nvme nvme0: request 0x4e genctr mismatch (got 0x0 expected 0x1) [ 88.075318] nvme nvme0: invalid id 78 completed on queue 1 [ 88.075803] nvme nvme0: request 0x45 genctr mismatch (got 0x0 expected 0x1) [ 88.076239] nvme nvme0: invalid id 69 completed on queue 1 [ 88.076585] nvme nvme0: request 0x46 genctr mismatch (got 0x0 expected 0x1) [ 88.076990] nvme nvme0: invalid id 70 completed on queue 1 [ 88.077314] nvme nvme0: request 0x47 genctr mismatch (got 0x0 expected 0x1) [ 88.077744] nvme nvme0: invalid id 71 completed on queue 1 [ 88.078064] nvme nvme0: request 0x48 genctr mismatch (got 0x0 expected 0x1) [ 88.078465] nvme nvme0: invalid id 72 completed on queue 1 [ 88.078792] nvme nvme0: request 0x49 genctr mismatch (got 0x0 expected 0x1) [ 88.079190] nvme nvme0: invalid id 73 completed on queue 1 [ 88.079522] nvme nvme0: request 0x4a genctr mismatch (got 0x0 expected 0x1) [ 88.079918] nvme nvme0: invalid id 74 completed on queue 1 [ 88.080243] nvme nvme0: request 0x4b genctr mismatch (got 0x0 expected 0x1) [ 88.080630] nvme nvme0: invalid id 75 completed on queue 1 [ 88.080963] nvme nvme0: request 0x4c genctr mismatch (got 0x0 expected 0x1) [ 88.081361] nvme nvme0: invalid id 76 completed on queue 1 [ 88.081687] nvme nvme0: request 0x4d genctr mismatch (got 0x0 expected 0x1) [ 88.082081] nvme nvme0: invalid id 77 completed on queue 1 [ 89.061345] irq 9: nobody cared (try booting with the "irqpoll" option) [ 89.061794] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G N 6.1.0-rc8+ #1 [ 89.062220] Call Trace: [ 89.062383] [<0000000000eb7518>] __report_bad_irq+0x38/0xb4 [ 89.062685] [<00000000004e6e58>] note_interrupt+0x318/0x380 [ 89.063000] [<00000000004e2f00>] handle_irq_event+0x80/0xc0 [ 89.063296] [<00000000004e7e10>] handle_fasteoi_irq+0x90/0x220 [ 89.063631] [<00000000004e18e8>] generic_handle_irq+0x28/0x40 [ 89.063946] [<0000000000ede2ec>] handler_irq+0xac/0x100 [ 89.064255] [<00000000004274b0>] sys_call_table+0x760/0x970 [ 89.064578] [<0000000000eb2ee0>] ffs+0x0/0x18 [ 89.064848] [<000000000042be0c>] do_softirq_own_stack+0x2c/0x40 [ 89.065195] [<000000000046fd50>] __irq_exit_rcu+0xf0/0x140 [ 89.065520] [<0000000000470744>] irq_exit+0x4/0x40 [ 89.065830] [<0000000000ede3c4>] timer_interrupt+0x84/0xc0 [ 89.066184] [<00000000004274f8>] sys_call_table+0x7a8/0x970 [ 89.066546] [<00000000008f75e0>] blk_mq_do_dispatch_sched+0xa0/0x380 [ 89.066940] [<00000000008f7ad4>] __blk_mq_sched_dispatch_requests+0x94/0x160 [ 89.067359] [<00000000008f7bfc>] blk_mq_sched_dispatch_requests+0x3c/0x80 [ 89.067774] handlers: [ 89.067952] [<(____ptrval____)>] nvme_irq [ 89.068254] [<(____ptrval____)>] nvme_irq [ 89.068538] Disabling IRQ #9 [ 89.069837] random: crng init done [ 89.183077] could not locate request for tag 0x0 [ 89.183475] nvme nvme0: invalid id 0 completed on queue 1 [ 89.183824] could not locate request for tag 0x0 ... [ 89.766750] nvme nvme0: invalid id 0 completed on queue 1 [ 89.767076] could not locate request for tag 0x0 [ 89.767361] nvme nvme0: invalid id 0 completed on queue 1 [ 89.767701] nvme nvme0: request 0x4d genctr mismatch (got 0x0 expected 0x1) [ 89.768114] nvme nvme0: invalid id 77 completed on queue 1 [ 89.768455] nvme nvme0: request 0x4c genctr mismatch (got 0x0 expected 0x1) [ 89.768876] nvme nvme0: invalid id 76 completed on queue 1 [ 89.769215] nvme nvme0: request 0x4b genctr mismatch (got 0x0 expected 0x1) [ 89.769630] nvme nvme0: invalid id 75 completed on queue 1 [ 89.769991] nvme nvme0: request 0x4a genctr mismatch (got 0x0 expected 0x1) [ 89.770409] nvme nvme0: invalid id 74 completed on queue 1 [ 89.770750] nvme nvme0: request 0x49 genctr mismatch (got 0x0 expected 0x1) [ 89.771171] nvme nvme0: invalid id 73 completed on queue 1 [ 89.771513] nvme nvme0: request 0x48 genctr mismatch (got 0x0 expected 0x1) [ 89.771934] nvme nvme0: invalid id 72 completed on queue 1 [ 89.772286] nvme nvme0: request 0x47 genctr mismatch (got 0x0 expected 0x1) [ 89.772707] nvme nvme0: invalid id 71 completed on queue 1 [ 89.773048] nvme nvme0: request 0x46 genctr mismatch (got 0x0 expected 0x1) [ 89.773478] nvme nvme0: invalid id 70 completed on queue 1 [ 89.773958] nvme nvme0: request 0x51 genctr mismatch (got 0x0 expected 0x1) [ 89.774393] nvme nvme0: invalid id 81 completed on queue 1 [ 89.774740] nvme nvme0: request 0x50 genctr mismatch (got 0x0 expected 0x1) [ 89.775163] nvme nvme0: invalid id 80 completed on queue 1 [ 89.775514] nvme nvme0: request 0x4f genctr mismatch (got 0x0 expected 0x1) [ 89.775902] nvme nvme0: invalid id 79 completed on queue 1 [ 89.776239] nvme nvme0: request 0x4e genctr mismatch (got 0x0 expected 0x1) [ 89.776649] nvme nvme0: invalid id 78 completed on queue 1 [ 89.782920] could not locate request for tag 0x0 Does that help ? If not I can add some additional log messages. Guenter
On Thu, Dec 8, 2022, 8:13 PM Guenter Roeck <li...@roeck-us.net <mailto:li...@roeck-us.net>> wrote: On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote: > > > > A cq head doorbell mmio is skipped... And it is not the fault of the > > kernel. The kernel is in it's good right to skip the mmio since the cq > > eventidx is not properly updated. > > > > Adding that and it boots properly on riscv. But I'm perplexed as to why > > this didnt show up on our regularly tested platforms. > > > > Gonna try to get this in for 7.2! > > I see another problem with sparc64. > > [ 5.261508] could not locate request for tag 0x0 > [ 5.261711] nvme nvme0: invalid id 0 completed on queue 1 > > That is seen repeatedly until the request times out. I'll test with > your patch to see if it resolves this problem as well, and will bisect > otherwise. > The second problem is unrelated to the doorbell problem. It is first seen in qemu v7.1. I'll try to bisect. Guenter