On Sun, Oct 23, 2016 at 10:00:08AM -0700, Robert Mustacchi wrote:
> On 10/22/16 17:54 , Frank M. wrote:
> > Hi,
> > 
> > I've installed the new bloody as a new test virtual machine. The VM
> > crashes whenever there is an access to the NVMe device.
> > Tests with a "special nvme driver" on the last stable had no problems.
> > I tested a Samsung SM951 on ESXi 6.0 U2 on a Supermicro X10DRi-T4+.
> > Tomorrow I will briefly test a Samsung SM961 on another server (HP).
> > I hope you will find a solution before the next stable release...
> 
> Well, where does the system crash exactly? Do you have a dump or
> anything you can actually share? Otherwise, it'll be hard for folks to
> make progress or suggestions.

Here's what happens: nvme gets the initial MSI-X interrupt, then at a
later time releases that interrupt and wants to get as many MSI-X
interrupts as it can use with the device and number of CPUs available.

And this is how ESXi sees it:

[ nvme_setup_interrupts(nvme, DDI_INTR_TYPE_MSIX, 1) ]
cpu5:150197)VMKPCIPassthru: 1850: SBDF=0000:0a:00.0 intrType = 4 numIntrs: 1
cpu5:150197)IntrCookie: 3643: cookie 0x7b vector 0xbb
cpu5:150197)IntrCookie: 1935: cookie 0x7b moduleID 0 <pcip_0000:0a:00.0> exclusive, flags 0x1

[ nvme_release_interrupts(nvme) ]
cpu5:150197)VMKPCIPassthru: 1850: SBDF=0000:0a:00.0 intrType = 2 numIntrs: 1
cpu5:150197)IntrCookie: 3643: cookie 0x7c vector 0x34
cpu5:150197)IntrCookie: 1935: cookie 0x7c moduleID 0 <pcip_0000:0a:00.0> exclusive, flags 0x1

[ nvme_setup_interrupts(nvme, DDI_INTR_TYPE_MSIX, 4) ]
cpu5:150197)VMKPCIPassthru: 1850: SBDF=0000:0a:00.0 intrType = 4 numIntrs: 1
cpu5:150197)WARNING: MSI: 593: MSI-X already enabled for device: 0000:0a:00.0, control: 0x8020
cpu5:150197)IntrCookie: 1325: Unable to allocate 1 cookies: Bad parameter
cpu5:150197)VMKPCIPassthru: 1796: Failed to allocate 1 MSIX interrupts
cpu5:150197)UserDump: 1908: Dumping cartel 150187 (from world 150197) to file /vmfs/volumes/6c1351b1-f4405fae/Test-SAN0/vmx-zdump.000 ...
cpu5:150197)UserDump: 2028: Userworld coredump complete.

So perhaps apix/pcplusmp need to do some cleanup work when interrupts
are released, so that ESXi understands what happens there. I don't really
know.
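
To make that guess concrete, here is a toy model in C of the sequence in
the log. It is only a sketch under a stated assumption: that the host keeps
its "MSI-X enabled" state when the guest releases its interrupts. Nothing
here is ESXi or illumos code, and every name (toy_host_t, toy_alloc_msix,
toy_release_msix, toy_nvme_sequence) is made up for illustration.

```c
/*
 * Toy model only. All types and functions are hypothetical; they do not
 * exist in illumos or ESXi.
 */
#include <assert.h>
#include <stdbool.h>

typedef struct toy_host {
	bool	msix_enabled;	/* host's view: MSI-X is enabled */
	int	vectors;	/* vectors the host believes are allocated */
} toy_host_t;

/* Host side: refuse a new MSI-X allocation while one is still recorded. */
static int
toy_alloc_msix(toy_host_t *h, int count)
{
	assert(count > 0);
	if (h->msix_enabled)
		return (-1);	/* models "MSI-X already enabled for device" */
	h->msix_enabled = true;
	h->vectors = count;
	return (0);
}

/*
 * Host side: release. With notify == false the host never learns that the
 * guest gave its vectors back. That is the assumed failure mode.
 */
static void
toy_release_msix(toy_host_t *h, bool notify)
{
	if (notify) {
		h->msix_enabled = false;
		h->vectors = 0;
	}
}

/* Guest side: allocate 1, release, then allocate `want`, as nvme does. */
static int
toy_nvme_sequence(bool release_notifies_host, int want)
{
	toy_host_t host = { false, 0 };

	if (toy_alloc_msix(&host, 1) != 0)
		return (-1);
	toy_release_msix(&host, release_notifies_host);
	return (toy_alloc_msix(&host, want));
}
```

In this model toy_nvme_sequence(false, 4) fails at the second allocation,
while toy_nvme_sequence(true, 4) succeeds. Whether the real problem sits in
the guest's release path or in the hypervisor is not something the log
settles.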

Disabling the use of MSI-X in nvme on ESXi seems to work around that
problem. I've asked this question before: there are already a few other
drivers in illumos that disable MSI-X when running on ESXi. Should nvme
do the same, or would it be better to disable MSI-X on ESXi completely?
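
As a rough sketch of what the per-driver variant could look like, the
following C fragment masks MSI-X out of the supported interrupt types
before picking one. It is not the nvme driver's code. The TOY_INTR_TYPE_*
constants are stand-ins meant to mirror DDI_INTR_TYPE_FIXED/MSI/MSIX (the
value 4 for MSI-X matches "intrType = 4" in the log above), and the on_esxi
flag is a placeholder for whatever hypervisor detection the platform
provides.

```c
/*
 * Sketch only. The names below are hypothetical stand-ins, not DDI
 * interfaces.
 */
#include <assert.h>
#include <stdbool.h>

#define	TOY_INTR_TYPE_FIXED	0x1
#define	TOY_INTR_TYPE_MSI	0x2
#define	TOY_INTR_TYPE_MSIX	0x4
#define	TOY_INTR_TYPE_ALL	\
	(TOY_INTR_TYPE_FIXED | TOY_INTR_TYPE_MSI | TOY_INTR_TYPE_MSIX)

/* Drop MSI-X from the supported set when running on ESXi. */
static int
toy_filter_intr_types(int supported, bool on_esxi)
{
	assert((supported & ~TOY_INTR_TYPE_ALL) == 0);
	if (on_esxi)
		supported &= ~TOY_INTR_TYPE_MSIX;
	return (supported);
}

/* Prefer MSI-X, then MSI, then fixed; return 0 if nothing is left. */
static int
toy_pick_intr_type(int types)
{
	if (types & TOY_INTR_TYPE_MSIX)
		return (TOY_INTR_TYPE_MSIX);
	if (types & TOY_INTR_TYPE_MSI)
		return (TOY_INTR_TYPE_MSI);
	if (types & TOY_INTR_TYPE_FIXED)
		return (TOY_INTR_TYPE_FIXED);
	return (0);
}
```

With all three types supported, the filter makes the driver fall back to
MSI on ESXi and leaves MSI-X in place elsewhere. Doing the same masking
once in common interrupt code, instead of in each driver, would be the
"disable MSI-X on ESXi completely" option.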


Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

