On Sun, Oct 23, 2016 at 10:00:08AM -0700, Robert Mustacchi wrote:
> On 10/22/16 17:54 , Frank M. wrote:
> > Hi,
> >
> > I´ve installed the new bloody as a new test virtual machine. The vm
> > crashes whenever there is a access to the nvme device.
> > Tests with a "special nvme driver" on the last stable had no problems.
> > I tested a Samsung SM951 on ESXi 6.0 U2 on a Supermicro X10DRi-T4+.
> > Tomorrow I will test shortly a Samsung SM961 on another Server (HP).
> > I hope, you will find a solution until the next stable release...
>
> Well, where does the system crash exactly? Do you have a dump or
> anything you can actually share? Otherwise, it'll be hard for folks to
> make progress or suggestions.

Here's what happens: nvme gets the initial MSI-X interrupt, then at a
later time releases that interrupt and tries to allocate as many MSI-X
interrupts as it can use, given the device and the number of CPUs
available.

And this is how ESXi sees it:

[ nvme_setup_interrupts(nvme, DDI_INTR_TYPE_MSIX, 1) ]
cpu5:150197)VMKPCIPassthru: 1850: SBDF=0000:0a:00.0 intrType = 4 numIntrs: 1
cpu5:150197)IntrCookie: 3643: cookie 0x7b vector 0xbb
cpu5:150197)IntrCookie: 1935: cookie 0x7b moduleID 0 <pcip_0000:0a:00.0> exclusive, flags 0x1

[ nvme_release_interrupts(nvme) ]
cpu5:150197)VMKPCIPassthru: 1850: SBDF=0000:0a:00.0 intrType = 2 numIntrs: 1
cpu5:150197)IntrCookie: 3643: cookie 0x7c vector 0x34
cpu5:150197)IntrCookie: 1935: cookie 0x7c moduleID 0 <pcip_0000:0a:00.0> exclusive, flags 0x1

[ nvme_setup_interrupts(nvme, DDI_INTR_TYPE_MSIX, 4) ]
cpu5:150197)VMKPCIPassthru: 1850: SBDF=0000:0a:00.0 intrType = 4 numIntrs: 1
cpu5:150197)WARNING: MSI: 593: MSI-X already enabled for device: 0000:0a:00.0, control: 0x8020
cpu5:150197)IntrCookie: 1325: Unable to allocate 1 cookies: Bad parameter
cpu5:150197)VMKPCIPassthru: 1796: Failed to allocate 1 MSIX interrupts
cpu5:150197)UserDump: 1908: Dumping cartel 150187 (from world 150197) to file /vmfs/volumes/6c1351b1-f4405fae/Test-SAN0/vmx-zdump.000 ...
cpu5:150197)UserDump: 2028: Userworld coredump complete.

So perhaps apix/pcplusmp need to do some cleanup work when interrupts
are released so that ESXi understands what happens there. I don't
really know.

Disabling the use of MSI-X in nvme on ESXi seems to work around the
problem. I've asked this question before: there are already a few other
drivers in illumos that disable MSI-X when running on ESXi. Should nvme
do the same, or would it be better to disable MSI-X on ESXi completely?

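To make the workaround concrete, here is a rough standalone sketch of
the selection logic I have in mind. It is not the actual nvme code: the
helper names usable_intr_types() and pick_intr_type() and the on_esxi
flag are made up for illustration, and how the driver would detect ESXi
is left open. The DDI_INTR_TYPE_* values are repeated locally so the
snippet builds outside the kernel tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Interrupt type bits, with the same values illumos uses in
 * <sys/ddi_intr.h>; repeated here so the sketch builds standalone.
 */
#define	DDI_INTR_TYPE_FIXED	0x1
#define	DDI_INTR_TYPE_MSI	0x2
#define	DDI_INTR_TYPE_MSIX	0x4
#define	ALL_INTR_TYPES \
	(DDI_INTR_TYPE_FIXED | DDI_INTR_TYPE_MSI | DDI_INTR_TYPE_MSIX)

/*
 * Hypothetical helper: mask MSI-X out of the supported types when the
 * caller has determined that we are running as an ESXi guest.
 */
static int
usable_intr_types(int supported, bool on_esxi)
{
	/* This sketch only knows about the three bits defined above. */
	assert((supported & ~ALL_INTR_TYPES) == 0);

	if (on_esxi)
		supported &= ~DDI_INTR_TYPE_MSIX;
	return (supported);
}

/*
 * Hypothetical helper: pick the most capable remaining type, trying
 * MSI-X, then MSI, then FIXED. Returns 0 if no type is left.
 */
static int
pick_intr_type(int types)
{
	static const int order[] = {
		DDI_INTR_TYPE_MSIX, DDI_INTR_TYPE_MSI, DDI_INTR_TYPE_FIXED
	};
	size_t i;

	for (i = 0; i < sizeof (order) / sizeof (order[0]); i++) {
		if (types & order[i])
			return (order[i]);
	}
	return (0);
}
```

With all three types supported, pick_intr_type(usable_intr_types(0x7,
true)) falls back to DDI_INTR_TYPE_MSI, while the same call with false
still selects DDI_INTR_TYPE_MSIX. Whether a mask like this belongs in
nvme or in common code is exactly the question above.
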
Hans

-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown