I've now committed my fixes for the NVMe driver. It should be more stable now, so please give it a try.
With those fixes, the driver works without any problems, even under fairly heavy I/O load, when nvme.c and ld_nvme.c are compiled with -O0, on both virtual and real MP machines. An -O2 kernel also works on a virtual machine, but I've had an I/O lockup on a real hardware machine with an -O2 kernel. It may have been unrelated; I'm still investigating.

Jaromir

2016-10-18 22:01 GMT+02:00 Jaromír Doleček <jaromir.dole...@gmail.com>:
> Hey,
>
> thank you. This iostat_unbusy panic is a typical symptom of the current
> MP issues: the command completion queue gets corrupted, and
> nvme_q_complete() delivers some commands twice. This causes either this
> panic (due to a duplicate lddone() for a stale buf) or a random kernel
> crash.
>
> I've been working on debugging this for the past two weeks or so. I have
> some local changes (mainly some volatile qualifiers) which seem to fix
> this issue, at least on my MP VirtualBox test machine. But these changes
> still do not fix the issue completely on another real system I have
> access to. I guess it would be useful to share the ongoing work, at
> least. I'll polish and commit what I have, today or tomorrow.
>
> Jaromir
>
> 2016-10-18 10:40 GMT+02:00 Masanobu SAITOH <msai...@execsw.org>:
>> On 2016/09/22 5:54, Jaromír Doleček wrote:
>>>
>>> Hello,
>>>
>>> the NVMe driver in NetBSD-current was recently tweaked to fix several
>>> MP and locking issues, and the driver is now marked as MPSAFE by default.
>>>
>>> Most of this work was done on emulators since I lack the hardware, so
>>> it's not clear if everything would work properly on real systems too.
>>>
>>> If anyone has the hardware, I'd appreciate it if you could check the
>>> driver out, try to punish the drive with some heavy I/O test with
>>> parallel load if possible, and report the results.
>>>
>>> The driver should work on i386 and amd64, and is enabled in the
>>> INSTALL/GENERIC kernels there, so you could just try to boot the install
>>> ISO from the NetBSD daily builds, and send-pr any issues.
>>>
>>> I'd especially welcome it if someone with a sparc64 system could test
>>> the driver out, too. The driver originates from OpenBSD, where nvme(4)
>>> is enabled in the GENERIC sparc64 kernel, so it should work. But it has
>>> not been confirmed yet on NetBSD/sparc64. Note you might need a fairly
>>> modern system; at least some Intel NVMe cards require PCIe Generation 3
>>> to actually work, so this rules out e.g. T1s.
>>>
>>> I'd also very much welcome any benchmark results; it would be very
>>> interesting to share some IOPS figures.
>>>
>>> Let me know the results; I'd like to update the driver manpage to list
>>> known working hardware.
>>>
>>> In any reports, please include the attachment fragment from dmesg, as
>>> there is quite a significant difference between attachment via
>>> apic/INTx and MSI/MSI-X. Also useful would be intrctl(8) output, to
>>> confirm interrupt handlers are dispatched properly to the individual
>>> available CPUs.
>>>
>>> Thank you.
>>>
>>> Jaromir
>>>
>>
>> With nvme.c rev. 1.16:
>>
>>> Oct 18 17:14:02 five savecore: reboot after panic: panic:
>>> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts SLPOyLW E RN
>>
>>
>> and,
>>
>>> five# crash -M netbsd.36.core -N /netbsd
>>> Crash version 7.99.39, image version 7.99.39.
>>> System panicked: iostat_unbusy
>>> Backtrace from time of crash is available.
>>> crash> trace
>>> _KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
>>> ?() at ffff80008f0e5240
>>> vpanic() at vpanic+0x149
>>> snprintf() at snprintf
>>> iostat_isbusy() at iostat_isbusy
>>> dk_done1() at dk_done1+0xab
>>> lddone() at lddone+0xf
>>> nvme_q_complete() at nvme_q_complete+0xc6
>>> softint_dispatch() at softint_dispatch+0xd3
>>> DDB lost frame for Xsoftintr+0x4f, trying 0xfffffe810e919ff0
>>> Xsoftintr() at Xsoftintr+0x4f
>>> --- interrupt ---
>>> 0:
>>
>>
>> Again, the panic message was:
>>
>>> Oct 18 17:14:02 five savecore: reboot after panic: panic:
>>> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts SLPOyLW E RN
>>
>>
>> -> panic: iostat_unbusy
>> -> WARNINWG:A RSNPILN GNO:T SLPOLW E RN
>>
>> -> WARNING: SPL NOT LOWER
>> -> WARNING: SPL N
>>
>> The full dmesg is at:
>>
>> http://www.netbsd.org/~msaitoh/nvme-20161018-0.log
>>
>> Any test code is welcome!
>>
>> --
>> -----------------------------------------------
>> SAITOH Masanobu (msai...@execsw.org
>> msai...@netbsd.org)