Re: Error when running fio against nvme-of rdma target (mlx5 driver)
Hi Robin,

I ran into the exact same problem while testing with 4 ConnectX-6 cards, kernel 5.18-rc6.

[ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
[ 4879.122015] nvme nvme0: starting error recovery
[ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
[ 4879.122037] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122039] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 00000030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
[ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...

I assume this means that the problem has still not been resolved? If so, I'll try to diagnose the problem.

Thanks,

--Mark

On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" wrote:

    On 2022-02-10 23:58, Martin Oliveira wrote:
    > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
    >> On 2/8/22 6:50 PM, Martin Oliveira wrote:
    >>> Hello,
    >>>
    >>> We have been hitting an error when running IO over our nvme-of setup using the mlx5 driver, and we are wondering if anyone has seen anything similar or has any suggestions.
    >>>
    >>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. The target has 12 NVMe SSDs, which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
    >>>
    >>
    >> Thanks for reporting this. If you can bisect the problem on your setup, it will help others help you better.
    >>
    >> -ck
    >
    > Hi Chaitanya,
    >
    > I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
    >
    > I also learned that I can reproduce this with as little as 3 cards, and I updated the firmware on the Mellanox cards to the latest version.
    >
    > I'd be happy to try any tests if someone has any suggestions.

    The IOMMU is probably your friend here - one thing that might be worth trying is capturing the iommu:map and iommu:unmap tracepoints to see if the address reported in subsequent IOMMU faults was previously mapped as a valid DMA address (be warned that there will likely be a *lot* of trace generated).

    With 5.13 or newer, booting with "iommu.forcedac=1" should also make it easier to tell real DMA IOVAs from rogue physical addresses or other nonsense, as real DMA addresses should then look more like 0xffffffff24d08000.

    That could at least help narrow down whether it's some kind of use-after-free race or a completely bogus address creeping in somehow.

    Robin.
Re: Error when running fio against nvme-of rdma target (mlx5 driver)
Hi,

Can you please send the original scenario, setup details and dumps? I can't find it in my mailbox. You can send it directly to me to avoid spam.

-Max.

On 5/17/2022 11:26 AM, Mark Ruijter wrote:
> Hi Robin,
>
> I ran into the exact same problem while testing with 4 ConnectX-6 cards, kernel 5.18-rc6.
>
> [ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
> [ 4879.122015] nvme nvme0: starting error recovery
> [ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
> [ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
> [ 4879.122037] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122039] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122040] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 4879.122040] 00000030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
> [ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...
>
> I assume this means that the problem has still not been resolved? If so, I'll try to diagnose the problem.
>
> Thanks,
>
> --Mark
>
> On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" wrote:
>
>     On 2022-02-10 23:58, Martin Oliveira wrote:
>     > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>     >> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>     >>> Hello,
>     >>>
>     >>> We have been hitting an error when running IO over our nvme-of setup using the mlx5 driver, and we are wondering if anyone has seen anything similar or has any suggestions.
>     >>>
>     >>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. The target has 12 NVMe SSDs, which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>     >>>
>     >>
>     >> Thanks for reporting this. If you can bisect the problem on your setup, it will help others help you better.
>     >>
>     >> -ck
>     >
>     > Hi Chaitanya,
>     >
>     > I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
>     >
>     > I also learned that I can reproduce this with as little as 3 cards, and I updated the firmware on the Mellanox cards to the latest version.
>     >
>     > I'd be happy to try any tests if someone has any suggestions.
>
>     The IOMMU is probably your friend here - one thing that might be worth trying is capturing the iommu:map and iommu:unmap tracepoints to see if the address reported in subsequent IOMMU faults was previously mapped as a valid DMA address (be warned that there will likely be a *lot* of trace generated).
>
>     With 5.13 or newer, booting with "iommu.forcedac=1" should also make it easier to tell real DMA IOVAs from rogue physical addresses or other nonsense, as real DMA addresses should then look more like 0xffffffff24d08000.
>
>     That could at least help narrow down whether it's some kind of use-after-free race or a completely bogus address creeping in somehow.
>
>     Robin.
Re: Error when running fio against nvme-of rdma target (mlx5 driver)
On 2022-02-10 23:58, Martin Oliveira wrote:
> On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>>> Hello,
>>>
>>> We have been hitting an error when running IO over our nvme-of setup using the mlx5 driver, and we are wondering if anyone has seen anything similar or has any suggestions.
>>>
>>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. The target has 12 NVMe SSDs, which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>>>
>>
>> Thanks for reporting this. If you can bisect the problem on your setup, it will help others help you better.
>>
>> -ck
>
> Hi Chaitanya,
>
> I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.
>
> I also learned that I can reproduce this with as little as 3 cards, and I updated the firmware on the Mellanox cards to the latest version.
>
> I'd be happy to try any tests if someone has any suggestions.

The IOMMU is probably your friend here - one thing that might be worth trying is capturing the iommu:map and iommu:unmap tracepoints to see if the address reported in subsequent IOMMU faults was previously mapped as a valid DMA address (be warned that there will likely be a *lot* of trace generated).

With 5.13 or newer, booting with "iommu.forcedac=1" should also make it easier to tell real DMA IOVAs from rogue physical addresses or other nonsense, as real DMA addresses should then look more like 0xffffffff24d08000.

That could at least help narrow down whether it's some kind of use-after-free race or a completely bogus address creeping in somehow.

Robin.
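[For reference, capturing those tracepoints through tracefs can look like the following; this is a minimal sketch, assuming tracefs is mounted at the usual location (/sys/kernel/debug/tracing on older kernels) and reusing the faulting address from the logs above:

    cd /sys/kernel/tracing
    echo 1 > events/iommu/map/enable      # record every IOMMU map, with IOVA, paddr and size
    echo 1 > events/iommu/unmap/enable    # likewise for every unmap
    echo 1 > tracing_on
    # ... reproduce the fault with the fio workload, then stop tracing:
    echo 0 > tracing_on
    grep 24d08000 trace                   # was the faulting IOVA ever mapped, and when was it unmapped?

If the faulting address shows up as mapped and then unmapped shortly before the IO_PAGE_FAULT, that points at a use-after-free style race; if it never appears at all, the address handed to the device was bogus from the start.]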
Re: Error when running fio against nvme-of rdma target (mlx5 driver)
On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
> On 2/8/22 6:50 PM, Martin Oliveira wrote:
> > Hello,
> >
> > We have been hitting an error when running IO over our nvme-of setup using the mlx5 driver, and we are wondering if anyone has seen anything similar or has any suggestions.
> >
> > Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. The target has 12 NVMe SSDs, which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
> >
>
> Thanks for reporting this. If you can bisect the problem on your setup, it will help others help you better.
>
> -ck

Hi Chaitanya,

I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from.

I also learned that I can reproduce this with as little as 3 cards, and I updated the firmware on the Mellanox cards to the latest version.

I'd be happy to try any tests if someone has any suggestions.

Thanks,
Martin
Re: Error when running fio against nvme-of rdma target (mlx5 driver)
On 2022-02-09 02:50, Martin Oliveira wrote:
> Hello,
>
> We have been hitting an error when running IO over our nvme-of setup using the mlx5 driver, and we are wondering if anyone has seen anything similar or has any suggestions.
>
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. The target has 12 NVMe SSDs, which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>
> When running an fio job targeting the fabrics devices directly (no filesystem, see script at the end), within a minute or so we start seeing errors like this:
>
> [ 408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
> [ 408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
> [ 408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
> [ 408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
> [ 408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
> [ 408.380235] nvme nvme15: starting error recovery
> [ 408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
> [ 408.380246] block nvme15n2: no usable path - requeuing I/O
> [ 408.380284] block nvme15n5: no usable path - requeuing I/O
> [ 408.380298] block nvme15n1: no usable path - requeuing I/O
> [ 408.380304] block nvme15n11: no usable path - requeuing I/O
> [ 408.380304] block nvme15n11: no usable path - requeuing I/O
> [ 408.380330] block nvme15n1: no usable path - requeuing I/O
> [ 408.380350] block nvme15n2: no usable path - requeuing I/O
> [ 408.380371] block nvme15n6: no usable path - requeuing I/O
> [ 408.380377] block nvme15n6: no usable path - requeuing I/O
> [ 408.380382] block nvme15n4: no usable path - requeuing I/O
> [ 408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
> [ 408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
> [ 415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [ 415.131898] nvmet: ctrl 1 fatal error occurred!
>
> Occasionally, we've seen the following stack trace:

FWIW this is indicative that the scatterlist passed to dma_unmap_sg_attrs() was wrong - specifically, it looks like an attempt to unmap a region that's already unmapped (or was never mapped in the first place). Whatever race or data corruption issue is causing that is almost certainly happening much earlier, since the IO_PAGE_FAULT logs further imply that either some pages have been spuriously unmapped while the device was still accessing them, or some DMA address in the scatterlist was already bogus by the time it was handed off to the device.

Robin.

> [ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
> [ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
> [ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P OE 5.13.0-eid-athena-g6fb4e704d11c-dirty #14
> [ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
> [ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
> [ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
> [ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
> [ 1158.485812] RAX: 0001000000061fff RBX: 0000000000000010 RCX: 0000000000000027
> [ 1158.492938] RDX: 0000000030562000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
> [ 1158.507202] R10: 0000000000000001 R11: 00000000000ff000 R12: ffff9984abd9e318
> [ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0000000000000001
> [ 1158.521452] FS: 0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
> [ 1158.529540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
> [ 1158.542419] Call Trace:
> [ 1158.544877] amd_iommu_unmap+0x2c/0x40
> [ 1158.548653] __iommu_unmap+0xc4/0x170
> [ 1158.552344] iommu_unmap_fast+0xe/0x10
> [ 1158.556100] __iommu_dma_unmap+0x85/0x120
> [ 1158.560115] iommu_dma_unmap_sg+0x95/0x110
> [ 1158.564213] dma_unmap_sg_attrs+0x42/0x50
> [ 1158.568225] rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
> [ 1158.573201] nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
> [ 1158.578944] nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
> [ 1158.584683] __ib_process_cq+0x8e/0x150 [ib_core]
> [ 1158.589398] ib_cq_poll_work+0x2b/0x80 [ib_core]
> [ 1158.594027] process_one_work+0x220/0x3c0
> [ 1158.598038] worker_thread+0x4d/0x3f0
> [ 1158.601696] kthread+0x114/0x150
> [ 1158.604928] ? process_one_work+0x3c0/0x3c0
> [ 1158.609114] ? kthread_park+0x90/0x90
> [ 1158.612783] ret_from_fork+0x22/0x30
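[As a sketch of how to pin down which check fires at io_pgtable.c:485, the RIP from the oops can be resolved back to a source line, assuming a vmlinux with debug info for the running kernel is at hand:

    ./scripts/faddr2line vmlinux iommu_v1_unmap_page+0xed/0x100
    # then inspect the code around the reported line:
    sed -n '475,495p' drivers/iommu/amd/io_pgtable.c

faddr2line ships in the kernel source tree and takes the func+offset/size form exactly as printed in the RIP: line.]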
Re: Error when running fio against nvme-of rdma target (mlx5 driver)
On 2/8/22 6:50 PM, Martin Oliveira wrote:
> Hello,
>
> We have been hitting an error when running IO over our nvme-of setup using the mlx5 driver, and we are wondering if anyone has seen anything similar or has any suggestions.
>
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. The target has 12 NVMe SSDs, which are exposed as a single NVMe fabrics device, one physical SSD per namespace.
>

Thanks for reporting this. If you can bisect the problem on your setup, it will help others help you better.

-ck
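[If a last-known-good kernel can be found, a bisect session would look roughly like this; the good/bad tags below are placeholders, not known reference points for this bug:

    git bisect start
    git bisect bad v5.17-rc2     # a kernel that reproduces the error
    git bisect good v5.10        # hypothetical: a kernel verified not to reproduce it
    # build and boot the commit git checks out, run the fio reproducer, then mark it:
    git bisect good              # or "git bisect bad"; repeat until the first bad commit is printed

As the follow-up in this thread notes, the problem reproduces as far back as 4.15, so no good starting point was found here.]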
Error when running fio against nvme-of rdma target (mlx5 driver)
Hello,

We have been hitting an error when running IO over our nvme-of setup using the mlx5 driver, and we are wondering if anyone has seen anything similar or has any suggestions.

Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. The target has 12 NVMe SSDs, which are exposed as a single NVMe fabrics device, one physical SSD per namespace.

When running an fio job targeting the fabrics devices directly (no filesystem, see script at the end), within a minute or so we start seeing errors like this:

[ 408.368677] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d08000 flags=0x0000]
[ 408.372201] infiniband mlx5_0: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error
[ 408.380181] infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
[ 408.380187] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380189] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380191] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 408.380192] 00000030: 00 00 00 00 a9 00 56 04 00 00 01 e9 00 54 e8 e2
[ 408.380230] nvme nvme15: RECV for CQE 0x00000000ce392ed9 failed with status local protection error (4)
[ 408.380235] nvme nvme15: starting error recovery
[ 408.380238] nvme_ns_head_submit_bio: 726 callbacks suppressed
[ 408.380246] block nvme15n2: no usable path - requeuing I/O
[ 408.380284] block nvme15n5: no usable path - requeuing I/O
[ 408.380298] block nvme15n1: no usable path - requeuing I/O
[ 408.380304] block nvme15n11: no usable path - requeuing I/O
[ 408.380304] block nvme15n11: no usable path - requeuing I/O
[ 408.380330] block nvme15n1: no usable path - requeuing I/O
[ 408.380350] block nvme15n2: no usable path - requeuing I/O
[ 408.380371] block nvme15n6: no usable path - requeuing I/O
[ 408.380377] block nvme15n6: no usable path - requeuing I/O
[ 408.380382] block nvme15n4: no usable path - requeuing I/O
[ 408.380472] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d09000 flags=0x0000]
[ 408.391265] mlx5_core 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002f address=0x24d0a000 flags=0x0000]
[ 415.125967] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[ 415.131898] nvmet: ctrl 1 fatal error occurred!

Occasionally, we've seen the following stack trace:

[ 1158.152464] kernel BUG at drivers/iommu/amd/io_pgtable.c:485!
[ 1158.427696] invalid opcode: 0000 [#1] SMP NOPTI
[ 1158.432228] CPU: 51 PID: 796 Comm: kworker/51:1H Tainted: P OE 5.13.0-eid-athena-g6fb4e704d11c-dirty #14
[ 1158.443867] Hardware name: GIGABYTE R272-Z32-00/MZ32-AR0-00, BIOS R21 10/08/2020
[ 1158.451252] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[ 1158.456884] RIP: 0010:iommu_v1_unmap_page+0xed/0x100
[ 1158.461849] Code: 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 1d 48 83 c4 10 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 8d 46 ff 4c 85 f0 74 d6 <0f> 0b e8 1c 38 46 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
[ 1158.480589] RSP: 0018:ffffabb520587bd0 EFLAGS: 00010206
[ 1158.485812] RAX: 0001000000061fff RBX: 0000000000000010 RCX: 0000000000000027
[ 1158.492938] RDX: 0000000030562000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1158.500071] RBP: ffffabb520587c08 R08: ffffabb520587bd0 R09: 0000000000000000
[ 1158.507202] R10: 0000000000000001 R11: 00000000000ff000 R12: ffff9984abd9e318
[ 1158.514326] R13: ffff9984abd9e310 R14: 0001000000062000 R15: 0000000000000001
[ 1158.521452] FS: 0000000000000000(0000) GS:ffff99a36c8c0000(0000) knlGS:0000000000000000
[ 1158.529540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1158.535286] CR2: 00007f75b04f1000 CR3: 00000001eddd8000 CR4: 0000000000350ee0
[ 1158.542419] Call Trace:
[ 1158.544877] amd_iommu_unmap+0x2c/0x40
[ 1158.548653] __iommu_unmap+0xc4/0x170
[ 1158.552344] iommu_unmap_fast+0xe/0x10
[ 1158.556100] __iommu_dma_unmap+0x85/0x120
[ 1158.560115] iommu_dma_unmap_sg+0x95/0x110
[ 1158.564213] dma_unmap_sg_attrs+0x42/0x50
[ 1158.568225] rdma_rw_ctx_destroy+0x6e/0xc0 [ib_core]
[ 1158.573201] nvmet_rdma_rw_ctx_destroy+0xa7/0xc0 [nvmet_rdma]
[ 1158.578944] nvmet_rdma_read_data_done+0x5c/0xf0 [nvmet_rdma]
[ 1158.584683] __ib_process_cq+0x8e/0x150 [ib_core]
[ 1158.589398] ib_cq_poll_work+0x2b/0x80 [ib_core]
[ 1158.594027] process_one_work+0x220/0x3c0
[ 1158.598038] worker_thread+0x4d/0x3f0
[ 1158.601696] kthread+0x114/0x150
[ 1158.604928] ? process_one_work+0x3c0/0x3c0
[ 1158.609114] ? kthread_park+0x90/0x90
[ 1158.612783] ret_from_fork+0x22/0x30

We first saw this on a 5.13 kernel but could reproduce with 5.17-rc2. We found a possibly related bug report [1] that suggested disabling the IOMMU could help, but even after I disabled it (amd_iommu=off iommu=off) I still get errors (nvme IO timeouts). Another thread from 2016 [2] suggested that disabling some kernel debug options could work around the "local protection error" but th
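[As a rough illustration of the kind of fio job described above (raw block I/O aimed straight at the fabrics namespaces, no filesystem), the following sketch uses illustrative values for the device path, block size, queue depth and runtime; it is not the reporter's actual script:

    # hypothetical reproducer along the lines described in the report
    fio --name=nvmf-repro --filename=/dev/nvme15n1 \
        --ioengine=libaio --direct=1 --rw=randrw --bs=4k \
        --iodepth=32 --numjobs=4 --time_based --runtime=120

Running one such job per namespace (nvme15n1 through nvme15n12) would approximate the multi-namespace load under which the faults appeared within a minute or so.]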