On Thu, Jul 07, 2016 at 11:13:29AM +1000, Blair Bethwaite wrote:
:Jon,
:
:Awesome, thanks for sharing. We've just run into an issue with SRIOV
:VF passthrough that sounds like it might be the same problem (device
:disappearing after a reboot), but haven't yet investigated deeply -
:this will help with somewhere to start!
:
:By the way, the nouveau mention was because we had missed it on some
:K80 hypervisors recently and seen passthrough apparently work, but
:then the NVIDIA drivers would not build in the guest as they claimed
:they could not find a supported device (despite the GPU being visible
:on the PCI bus).

Definitely sage advice!

:I have also heard passing mention of requiring qemu
:2.3+ but don't have any specific details of the related issue.

I didn't do a bisection, but with qemu 2.2 (from ubuntu cloudarchive
kilo) I was sad, and with 2.5 (from ubuntu cloudarchive mitaka, but
installed on a kilo hypervisor) I am working.

Thanks,
-Jon

:Cheers,
:
:On 7 July 2016 at 08:13, Jonathan Proulx <j...@csail.mit.edu> wrote:
:> On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
:> :
:> :I do have an odd remaining issue where I can run cuda jobs in the vm
:> :but snapshots fail and after pause (for snapshotting) the pci device
:> :can't be reattached (which is where i think it deletes the snapshot
:> :it took). Got same issue with 3.16 and 4.4 kernels.
:> :
:> :Not very well categorized yet, but I'm hoping it's because the VM I
:> :was hacking on had its libvirt.xml written out with the older qemu
:> :maybe? It had been through a couple reboots of the physical system
:> :though.
:> :
:> :Currently building a fresh instance and bashing more keys...
:>
:> After an ugly bout of bashing I've solved my failing snapshot issue,
:> which I'll post here in hopes of saving someone else.
:>
:> Short version:
:>
:> add "/dev/vfio/vfio rw," to /etc/apparmor.d/abstractions/libvirt-qemu
:> add "ulimit -l unlimited" to /etc/init/libvirt-bin.conf
:>
:> Longer version:
:>
:> What was happening.
:>
:> * send snapshot request
:> * instance pauses while snapshot is pending
:> * instance attempts to resume
:> * fails to reattach pci device
:>   * nova-compute.log
:>     Exception during message handling: internal error: unable to execute QEMU command 'device_add': Device initialization failed
:>
:>   * qemu/<id>.log
:>     vfio: failed to open /dev/vfio/vfio: Permission denied
:>     vfio: failed to setup container for group 48
:>     vfio: failed to get group 48
:> * snapshot disappears
:> * instance resumes but without passed through device (hard reboot
:>   reattaches)
:>
:> Seeing permission denied, I thought this would be an easy fix, but:
:>
:> # ls -l /dev/vfio/vfio
:> crw-rw-rw- 1 root root 10, 196 Jul 6 14:05 /dev/vfio/vfio
:>
:> So I'm guessing I'm in apparmor hell. I tried adding "/dev/vfio/vfio
:> rw," to /etc/apparmor.d/abstractions/libvirt-qemu, rebooting the
:> hypervisor and trying again, which got me a different libvirt error
:> set:
:>
:> VFIO_MAP_DMA: -12
:> vfio_dma_map(0x5633a5fa69b0, 0x0, 0xa0000, 0x7f4e7be00000) = -12 (Cannot allocate memory)
:>
:> kern.log (and thus dmesg) showing:
:> vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded
:>
:> Getting rid of this one required inserting 'ulimit -l unlimited' into
:> /etc/init/libvirt-bin.conf in the 'script' section:
:>
:> <previous bits excluded>
:> script
:>     [ -r /etc/default/libvirt-bin ] && . /etc/default/libvirt-bin
:>     ulimit -l unlimited
:>     exec /usr/sbin/libvirtd $libvirtd_opts
:> end script
:>
:>
:> -Jon
:>
:> _______________________________________________
:> OpenStack-operators mailing list
:> OpenStack-operators@lists.openstack.org
:> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
:
:
:
:--
:Cheers,
:~Blairo
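
For anyone wanting to check whether a hypervisor is exposed to the same two failures before a snapshot trips over them, here is a minimal sketch. It assumes the stock Ubuntu path for the AppArmor abstraction quoted in the thread and the usual /proc/<pid>/limits layout. The function names memlock_soft_limit and has_vfio_rule are made up for illustration and are not part of libvirt or nova.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any package).

# Print the soft "Max locked memory" limit from a /proc/<pid>/limits file.
# The thread's failing value was 65536; after the fix it should read
# "unlimited".
memlock_soft_limit() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Succeed if the given AppArmor abstraction file already contains the
# "/dev/vfio/vfio rw," rule from the short version of the fix.
has_vfio_rule() {
    grep -Eq '^[[:space:]]*/dev/vfio/vfio[[:space:]]+rw,' "$1"
}
```

With the functions sourced, something like `memlock_soft_limit /proc/<pid>/limits` (using the PID of libvirtd or of a running qemu guest) and `has_vfio_rule /etc/apparmor.d/abstractions/libvirt-qemu` should show whether both halves of the fix are in place. Taking file paths as arguments rather than hard-coding them keeps the checks usable on hosts where the files live elsewhere.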