Hello,

I am reporting an NVMe issue on an RK3588-based board (FriendlyElec CM3588 NAS) 
running OpenBSD/arm64.

### Hardware

* SoC: Rockchip RK3588
* Board: FriendlyElec CM3588 NAS
* Storage: Samsung 990 EVO Plus 4TB (NVMe 2.0)
* Boot: eMMC / SD (NVMe not used for root in tests)
* U-Boot: vendor (FriendlyElec / Armbian-based)

### Summary

NVMe attaches correctly, but **asynchronous (interrupt-driven) I/O on the I/O 
queue (qid=1) stalls**, while the same operations complete successfully when 
the queue is polled (`nvme_poll()`).

This behavior **depends on the PCIe root port / physical socket used**.

### Observed Behavior

With a single NVMe installed:

* `nvme0` attaches correctly
* Admin queue (qid=0) works reliably
* `sd0` is attached and reports correct capacity
* First READ command on I/O queue (qid=1) is submitted
* No completion is observed in the normal interrupt-driven path
* The system stalls or eventually reports errors

Instrumentation shows:

* `nvme_q_submit()` is reached for qid=1
* `nvme_intr()` is triggered
* `nvme_q_complete()` reports no CQ entries for the I/O queue

Example:

```
[NVMEQ] submit qid=1 ...
[NVMEIRQ] intr enter
[NVMEIRQ] q_complete io=0 admin=0
```

However, forcing polling:

```
nvme_poll(sc, sc->sc_q, ...)
```

results in:

* completion observed
* `nvme_scsi_io_done()` called
* I/O succeeds
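
To illustrate why a single scan from the interrupt handler can find nothing while repeated polling succeeds, here is a minimal, self-contained model of an NVMe completion queue. It is a sketch, not the OpenBSD driver code: `struct cq`, `cq_post()`, `cq_scan()` and `cq_poll()` are hypothetical names, the device side is simulated, and the only detail taken from the NVMe specification is that a completion entry counts as new when its phase bit matches the phase the host expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CQ_ENTRIES 4
#define CQE_PHASE  0x1	/* phase tag: low bit of the 16-bit status word */

/* Simplified completion entry (hypothetical layout, not the driver's). */
struct cqe {
	uint16_t cid;	/* command identifier */
	uint16_t flags;	/* status word, phase tag in bit 0 */
};

struct cq {
	struct cqe ring[CQ_ENTRIES];
	uint32_t head;		/* host: next slot to inspect */
	uint16_t phase;		/* host: phase value that marks a new entry */
	uint32_t dev_tail;	/* simulated device: next slot to write */
	uint16_t dev_phase;	/* simulated device: phase it currently writes */
};

static void
cq_init(struct cq *q)
{
	assert(q != NULL);
	for (size_t i = 0; i < CQ_ENTRIES; i++) {
		q->ring[i].cid = 0;
		q->ring[i].flags = 0;	/* zeroed ring: phase 0 means "stale" */
	}
	q->head = 0;
	q->phase = CQE_PHASE;		/* first pass is written with phase 1 */
	q->dev_tail = 0;
	q->dev_phase = CQE_PHASE;
}

/* Simulated device: post one completion, flipping phase on wrap-around. */
static void
cq_post(struct cq *q, uint16_t cid)
{
	struct cqe *e = &q->ring[q->dev_tail];

	e->cid = cid;
	e->flags = q->dev_phase;
	if (++q->dev_tail == CQ_ENTRIES) {
		q->dev_tail = 0;
		q->dev_phase ^= CQE_PHASE;
	}
}

/* One scan, as an interrupt handler would do: consume all visible entries. */
static int
cq_scan(struct cq *q)
{
	int n = 0;

	while ((q->ring[q->head].flags & CQE_PHASE) == q->phase) {
		n++;
		if (++q->head == CQ_ENTRIES) {
			q->head = 0;
			q->phase ^= CQE_PHASE;
		}
	}
	return n;
}

/*
 * Polling: rescan up to max_tries times. The entry is made visible on
 * attempt number visible_at (zero-based) to model a late completion.
 * Returns the attempt on which an entry was found, or -1 on timeout.
 */
static int
cq_poll(struct cq *q, int max_tries, int visible_at, uint16_t cid)
{
	for (int attempt = 0; attempt < max_tries; attempt++) {
		if (attempt == visible_at)
			cq_post(q, cid);
		if (cq_scan(q) > 0)
			return attempt;
	}
	return -1;
}
```

In this model, `cq_scan()` on a fresh queue returns 0 (the `io=0` case above), while `cq_poll(&q, 10, 3, 42)` finds the entry on the fourth scan. That matches the symptom if the entry becomes visible only after the interrupt handler has already looked, though it does not by itself say whether interrupt delivery or DMA visibility is at fault.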

### Kernel Versions Tested

* nvme.c rev 1.124 (with instrumentation)
* nvme.c rev 1.126 (current)

Result:

* 1.124 shows clear async vs poll discrepancy
* 1.126 still exhibits hang/stall on affected PCIe path

### PCIe / Socket Observations

The issue appears **dependent on which PCIe root port is used**, not just the 
drive:

* Different physical M.2 sockets map inconsistently to `dwpcieX`
* Some controllers fail link training (`can't initialize hardware`)
* Some configurations allow NVMe attach but fail during I/O

This suggests a possible interaction between:

* RK3588 PCIe controller(s)
* interrupt/MSI/MSI-X handling
* or memory ordering / DMA visibility

### Additional Symptoms

* System instability after failed I/O (e.g. `Bus error`)
* `disklabels not read` during boot when NVMe is present
* behavior improves when forcing polling (no interrupts)

### Notes

* Issue reproduces with a single NVMe device
* softraid disabled for testing
* small I/O sizes (no PRP list involved)

### Request

Any guidance on:

* debugging NVMe interrupt/completion path on arm64
* known issues with RK3588 PCIe + MSI/MSI-X
* or additional instrumentation points

would be appreciated.

I can provide:

* full dmesg
* DTS comparisons (vendor vs OpenBSD)
* instrumented logs

Thanks,
Tiago
