Hello, I am reporting an NVMe issue on an RK3588-based board (FriendlyElec CM3588 NAS) running OpenBSD/arm64.

### Hardware

* SoC: Rockchip RK3588
* Board: FriendlyElec CM3588 NAS
* Storage: Samsung 990 EVO Plus 4TB (NVMe 2.0)
* Boot: eMMC / SD (NVMe not used for root in tests)
* U-Boot: vendor (FriendlyElec / Armbian-based)

### Summary

NVMe attaches correctly, but **asynchronous I/O on the I/O queue (qid=1) stalls**, while the same operations complete successfully when using polling (`nvme_poll()`).

This behavior is **dependent on the PCIe root port / physical socket used**.

### Observed Behavior

With a single NVMe installed:

* `nvme0` attaches correctly
* The admin queue (qid=0) works reliably
* `sd0` attaches and reports the correct capacity
* The first READ command on the I/O queue (qid=1) is submitted
* No completion is observed in the normal interrupt-driven path
* The system stalls, or the command eventually fails with an error

Instrumentation shows:

* `nvme_q_submit()` is reached for qid=1
* `nvme_intr()` is triggered
* `nvme_q_complete()` finds no CQ entries for the I/O queue

Example:

```
[NVMEQ] submit qid=1 ...
[NVMEIRQ] intr enter
[NVMEIRQ] q_complete io=0 admin=0
```

However, forcing polling:

```
nvme_poll(sc, sc->sc_q, ...)
```

results in:

* the completion being observed
* `nvme_scsi_io_done()` being called
* the I/O succeeding

### Kernel Versions Tested

* nvme.c rev 1.124 (with instrumentation)
* nvme.c rev 1.126 (current)

Result:

* 1.124 shows a clear discrepancy between the async and polled paths
* 1.126 still hangs/stalls on the affected PCIe path

### PCIe / Socket Observations

The issue appears **dependent on which PCIe root port is used**, not just on the drive:

* Different physical M.2 sockets map inconsistently to `dwpcieX`
* Some controllers fail link training (`can't initialize hardware`)
* Some configurations let NVMe attach but then fail during I/O

This points to a possible problem in, or interaction between:

* the RK3588 PCIe controller(s)
* interrupt/MSI/MSI-X handling
* memory ordering / DMA visibility

### Additional Symptoms

* System instability after failed I/O (e.g. `Bus error`)
* `disklabels not read` during boot when NVMe is present
* Behavior improves when forcing polling (no interrupts)

### Notes

* The issue reproduces with a single NVMe device
* softraid was disabled for testing
* Only small I/O sizes were used (no PRP list involved)

### Request

I would appreciate any guidance on:

* debugging the NVMe interrupt/completion path on arm64
* known issues with RK3588 PCIe + MSI/MSI-X
* additional instrumentation points

I can provide:

* full dmesg
* DTS comparisons (vendor vs OpenBSD)
* instrumented logs

Thanks,
Tiago
