On 09/10/2015 05:23 PM, Brent Eagles wrote:
> Hi,
>
> I was recently informed of a situation that came up when an engineer
> added an SR-IOV NIC to a compute node that was hosting some guests that
> had VFs attached. Unfortunately, adding the card shuffled the PCI
> addresses, causing some degree of havoc. Basically, the PCI addresses
> associated with the previously allocated VFs were no longer valid.
>
> I tend to consider this a non-issue. The expectation that hosts have a
> relatively static hardware configuration (and kernel/driver configs for
> that matter) is the price you pay for having pets with direct hardware
> access. That being said, this did come as a surprise to some of those
> involved, and I don't think we have any messaging around this or advice
> on how to deal with situations like this.
>
> So what should we do? I can't quite see altering OpenStack to deal with
> this situation (or even how that could work). Has anyone done any
> research into this problem, even if only into how to recover or extricate
> a guest that is no longer valid? It seems that at the very least we
> could use some stern warnings in the docs.
Hi Brent,
Interesting issue. We have code in the PCI tracker that ostensibly
handles this problem:
https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L145-L164
But the note from yjiang5 is telling:
# Pci properties may change while assigned because of
# hotplug or config changes. Although normally this should
# not happen.
# As the devices have been assigned to a instance, we defer
# the change till the instance is destroyed. We will
# not sync the new properties with database before that.
# TODO(yjiang5): Not sure if this is a right policy, but
# at least it avoids some confusion and, if
# we can add more action like killing the instance
# by force in future.
Basically, if the PCI device tracker notices that an instance is
assigned a PCI device whose address no longer appears among the PCI
device addresses returned from libvirt, it will eventually (in the
_free_instance() method) remove the PCI device assignment from the
Instance object. It will make no attempt, however, to assign a new PCI
device that meets the original PCI device specification in the launch
request.
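To make the deferral described above concrete, here is a minimal sketch
of that policy. It is not Nova's code: PciDevice, SketchPciTracker,
sync_devices() and free_instance() are hypothetical names invented for
illustration, and the real tracker in nova/pci/manager.py differs in its
details.

```python
# Illustrative sketch only: these classes are hypothetical and are not
# Nova's actual PCI tracker API.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PciDevice:
    address: str                          # e.g. "0000:04:10.1"
    instance_uuid: Optional[str] = None   # set while assigned to a guest


@dataclass
class SketchPciTracker:
    devices: Dict[str, PciDevice] = field(default_factory=dict)
    stale: Set[str] = field(default_factory=set)  # assigned but vanished

    def sync_devices(self, host_addresses: List[str]) -> None:
        """Reconcile tracked devices with the addresses the host reports."""
        reported = set(host_addresses)
        for addr in list(self.devices):
            if addr in reported:
                self.stale.discard(addr)
                continue
            if self.devices[addr].instance_uuid is None:
                # Unassigned and gone: safe to drop immediately.
                del self.devices[addr]
            else:
                # Assigned and gone: defer until the instance is destroyed.
                self.stale.add(addr)
        for addr in reported - set(self.devices):
            self.devices[addr] = PciDevice(addr)

    def free_instance(self, instance_uuid: str) -> None:
        """Release an instance's devices; no replacement is assigned."""
        for addr, dev in list(self.devices.items()):
            if dev.instance_uuid != instance_uuid:
                continue
            dev.instance_uuid = None
            if addr in self.stale:
                # The deferred removal finally happens here.
                self.stale.discard(addr)
                del self.devices[addr]


tracker = SketchPciTracker()
tracker.sync_devices(["0000:04:10.0", "0000:04:10.1"])
tracker.devices["0000:04:10.1"].instance_uuid = "guest-1"

# A new card shuffles the bus numbering: the old addresses disappear.
tracker.sync_devices(["0000:05:10.0", "0000:05:10.1"])
```

In this sketch the guest keeps its record of 0000:04:10.1 until
free_instance("guest-1") is called, even though the host no longer
reports that address, and nothing ever maps the guest onto one of the
new 0000:05:10.x devices. That gap is the one Brent ran into.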
Should we handle this case and attempt a "hot re-assignment" of a PCI
device? Perhaps. Is it high priority? Not really, IMHO.
If you'd like to file a bug against Nova, that would be cool, though.
Best,
-jay
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev