Re: 10.1 NVMe kernel panic

2015-06-02 Thread Jim Harris
On Thu, May 21, 2015 at 8:33 AM, Sean Kelly smke...@smkelly.org wrote:

 Greetings.

 I have a Dell R630 server with four of Dell’s 800GB NVMe SSDs running
 FreeBSD 10.1-p10. According to the PCI vendor, they are some sort of
 rebranded Samsung drive. If I boot the system and then load nvme.ko and
 nvd.ko from a command line, the drives show up okay. If I put
 nvme_load="YES"
 nvd_load="YES"
 in /boot/loader.conf, the box panics on boot:
 panic: nexus_setup_intr: NULL irq resource!

 If I boot the system with “Safe Mode: ON” from the loader menu, it also
 boots successfully and the drives show up.

 You can see a full ‘boot -v’ here:
 http://smkelly.org/stuff/nvme-panic.txt

 Anyone have any insight into what the issue may be here? Ideally I need to
 get this working in the next few days or return this thing to Dell.


Hi Sean,

Can you try adding hw.nvme.force_intx=1 to /boot/loader.conf?
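
That is, alongside the driver lines you already have, so /boot/loader.conf reads:

    nvme_load="YES"
    nvd_load="YES"
    hw.nvme.force_intx=1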

I suspect you are able to load the drivers successfully after boot because
interrupt assignments are not restricted to CPU0 at that point - see
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321 for a related
issue.  Your logs clearly show that vectors were allocated for the first 2
NVMe SSDs, but the third could not get its full allocation.  There is a bug
in the INTx fallback code that needs to be fixed - you do not hit this bug
when loading after boot because bug #199321 only affects interrupt
allocation during boot.

If the force_intx test works, would you be able to upgrade your nvme drivers
to the latest on stable/10?  There are several patches (one related to
interrupt vector allocation) that have been pushed to stable/10 since 10.1
was released, and I will be pushing another patch for the issue you have
reported shortly.
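
If so, the quickest path is probably to check out stable/10 and rebuild just
the two modules - assuming the stable/10 sources are still KBI-compatible with
your running 10.1 kernel:

    # svn co https://svn.freebsd.org/base/stable/10 /usr/src
    # cd /usr/src/sys/modules/nvme && make && make install
    # cd /usr/src/sys/modules/nvd && make && make install

A full buildkernel/installkernel cycle is the safer route if the standalone
module build gives you any trouble.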

Thanks,

-Jim

 Thanks!

 --
 Sean Kelly
 smke...@smkelly.org
 http://smkelly.org


Re: 10.1 NVMe kernel panic

2015-06-02 Thread Sean Kelly
Jim,

Thanks for the reply. I set hw.nvme.force_intx=1 and get a new form of kernel 
panic:
http://smkelly.org/stuff/nvme_crash_force_intx.txt

It looks like the NVMe drives are just failing to initialize at all now. As
long as that tunable is in the kenv, I get this behavior. If I kldload the
modules after boot, initialization fails as well. But if I kldunload, kenv -u,
then kldload, it works again. The only difference is that kldload doesn't
result in a panic, just timeouts while initializing them all.
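
That is, roughly:

    # kldunload nvd nvme
    # kenv -u hw.nvme.force_intx
    # kldload nvme nvd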

I also compiled and tried stable/10 and it crashed in a similar way, but I've
not captured the panic yet. It crashes even without the tunable in place. I'll
see if I can capture it.
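
Probably via the usual crash-dump route, assuming swap is large enough to hold
a dump:

    # echo 'dumpdev="AUTO"' >> /etc/rc.conf
    # reboot
    # (reproduce the panic; savecore(8) saves /var/crash/vmcore.N on next boot)
    # kgdb /boot/kernel/kernel /var/crash/vmcore.0
    (kgdb) bt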

-- 
Sean Kelly
smke...@smkelly.org
http://smkelly.org

 On Jun 2, 2015, at 6:10 PM, Jim Harris jim.har...@gmail.com wrote:
 
 Can you try adding hw.nvme.force_intx=1 to /boot/loader.conf?

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org