On 10/28/2016 07:28 AM, Henning Schild wrote:
Hey,
I am running an unusual setup where I assign PCI devices behind the
back of libvirt. I have two options to do that:
1. a wrapper script for qemu that takes care of suid-root and appends
arguments for pci-assign
2. virsh qemu-monitor-command ... 'device_add pci-assign...'
With any reasonably modern version of Linux/qemu/libvirt, you should not
be using pci-assign, but should use vfio-pci instead. pci-assign is old,
unmaintained, and deprecated (and any other bad words you can think of).
Also, have you done anything to lock the guest's memory in host RAM?
This is necessary so that the source/destination of DMA reads/writes is
always present. It is done automatically by libvirt as required *when
libvirt knows that a device is being assigned to the guest*, but if
you're going behind libvirt's back, you need to take care of that
yourself (or alternatively, don't go behind libvirt's back, which is the
greatly preferred alternative!)
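
If you want a quick look at the locked-memory limit your qemu process is
actually running under, something like the following rough sketch might
help. It only parses the "Max locked memory" row of /proc/<pid>/limits;
the function names are made up for illustration, and a large limit does
not by itself prove that guest memory is really locked, so treat it as a
first check rather than a diagnosis.

```python
from typing import Optional, Tuple


def locked_memory_limit(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return the (soft, hard) 'Max locked memory' values as strings.

    limits_text is the content of /proc/<pid>/limits. The values are
    either a byte count or the word "unlimited". Returns None if the
    row is not present.
    """
    label = "Max locked memory"
    for line in limits_text.splitlines():
        if line.startswith(label):
            fields = line[len(label):].split()
            if len(fields) >= 2:
                return fields[0], fields[1]
    return None


def read_locked_memory_limit(pid: int) -> Optional[Tuple[str, str]]:
    """Read and parse /proc/<pid>/limits for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return locked_memory_limit(f.read())
```

Calling read_locked_memory_limit() with the pid of the qemu process that
libvirt started, and again with the pid of one you launched by hand,
would show whether the two run under different limits.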
I know I should probably not be doing this,
Yes, that is a serious understatement :-) And I suspect that it isn't
necessary.
it is a workaround to
introduce fine-grained pci-assignment in an openstack setup, where
vendor and device id are not enough to pick the right device for a vm.
libvirt selects the device according to its PCI address, not vendor and
device id. Is that not "fine-grained" enough? (And does OpenStack not
let you select devices based on their PCI address?)
In both cases qemu will crash with the following output:
qemu: hardware error: pci read failed, ret = 0 errno = 22
followed by the usual machine state dump. With strace i found it to be
a failing read on the config space file of my device.
/sys/bus/pci/devices/0000:xx:xx.x/config
A few reads out of that file succeeded, as well as accesses on vendor
etc.
Manually launching a qemu with the pci-assign works without a problem,
so I "blame" libvirt and the cgroup environment the qemu ends up in.
So I put a bash shell into the exact same cgroup setup - next to a running
qemu, expecting a dd or hexdump on the config-space file to fail. But
from that bash i can read the file without a problem.
Has anyone seen that problem before?
No, because nobody else (that I've ever heard of) is doing what you are
doing. You're going around behind the back of libvirt (and OpenStack)
to do device assignment with a method that was replaced with something
newer/better/etc about 3 years ago, and in the process are likely
missing a lot of the details that would otherwise be automatically
handled by libvirt.
Right now I do not know what I
am missing; maybe qemu is hitting some limits configured for the
cgroups or whatever. I cannot use pci-assign from libvirt, but if I
did, would it configure cgroups in a different way or relax some limits?
What would be a good next step to debug that? Right now i am looking at
kernel event traces, but the machine is pretty big and so is the trace.
My recommendation would be this:
1) look at OpenStack to see if it allows selecting the device to assign
by PCI address. If so, use that (it will just tell libvirt "assign this
device", and libvirt will automatically use VFIO for the device
assignment if it's available (which it will be))
2) if (1) is a dead end (i.e. OpenStack doesn't allow you to select based
on PCI address), use your "sneaky backdoor method" to do "virsh
attach-device somexmlfile.xml", where somexmlfile.xml has a proper
<hostdev> element to select and assign the host device you want. Again,
libvirt will automatically figure out if VFIO can be used, and will
properly set up everything necessary related to cgroups, locked memory, etc.
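
For option (2), here is a minimal sketch of generating the <hostdev>
element from a host PCI address. The helper name hostdev_xml and the
address 0000:03:00.0 used below are placeholders for illustration, not
anything from your setup; the element and attribute names are the ones
libvirt's domain XML uses for PCI hostdev assignment.

```python
import re
import xml.etree.ElementTree as ET

# Full PCI address in the form domain:bus:slot.function, e.g. "0000:03:00.0".
_PCI_ADDR = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def hostdev_xml(pci_addr: str, managed: bool = True) -> str:
    """Build a libvirt <hostdev> element for the given host PCI address.

    With managed='yes', libvirt detaches the device from its host driver
    before assigning it and reattaches it afterwards.
    """
    m = _PCI_ADDR.match(pci_addr)
    if m is None:
        raise ValueError(f"not a full PCI address: {pci_addr!r}")
    domain, bus, slot, function = (part.lower() for part in m.groups())

    hostdev = ET.Element(
        "hostdev",
        {"mode": "subsystem", "type": "pci", "managed": "yes" if managed else "no"},
    )
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        {
            "domain": f"0x{domain}",
            "bus": f"0x{bus}",
            "slot": f"0x{slot}",
            "function": f"0x{function}",
        },
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Placeholder address; substitute the device you actually want to assign.
    print(hostdev_xml("0000:03:00.0"))
```

Save the output as somexmlfile.xml and pass it to "virsh attach-device"
along with the name of the guest. Selecting by full address is what
distinguishes two devices that share the same vendor and device id.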
That assignment used to work and I do not know how it broke; I have
tried combinations of several kernels, versions of libvirt and qemu.
(kernel 3.18 and 4.4, libvirt 1.3.2 and 2.0.0, and qemu 2.2.1 and 2.7)
All combinations show the same problem, even the ones that work on
other machines. So when it comes to software versions the problem could
well be caused by a software update of another component that I
got with the package manager and did not compile myself. It is a Debian
8.6 with all recent updates installed. My guess would be that systemd
could have an influence on cgroups or limits causing such a problem.
That you would need to think of such things points out that your current
setup is fragile and ultimately unmaintainable. Please consider
"coloring inside the lines" :-) (We'd be happy to help if there are any
hangups along the way, either on the libvirt-users mailing list or in
the #virt channel on irc.oftc.net).