On Mon, Sep 14, 2015 at 09:34:31PM -0400, Jay Pipes wrote:
> On 09/10/2015 05:23 PM, Brent Eagles wrote:
> >Hi,
> >
> >I was recently informed of a situation that came up when an engineer
> >added an SR-IOV nic to a compute node that was hosting some guests that
> >had VFs attached. Unfortunately, adding the card shuffled the PCI
> >addresses causing some degree of havoc. Basically, the PCI addresses
> >associated with the previously allocated VFs were no longer valid.
> >
> >I tend to consider this a non-issue. The expectation that hosts have
> >relatively static hardware configuration (and kernel/driver configs for
> >that matter) is the price you pay for having pets with direct hardware
> >access. That being said, this did come as a surprise to some of those
> >involved and I don't think we have any messaging around this or advice
> >on how to deal with situations like this.
> >
> >So what should we do? I can't quite see altering OpenStack to deal with
> >this situation (or even how that could work). Has anyone done any
> >research into this problem, even if it is how to recover or extricate
> >a guest that is no longer valid? It seems that at the very least we
> >could use some stern warnings in the docs.
>
> Hi Brent,
>
> Interesting issue. We have code in the PCI tracker that ostensibly handles
> this problem:
>
> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L145-L164
>
> But the note from yjiang5 is telling:
>
>     # Pci properties may change while assigned because of
>     # hotplug or config changes. Although normally this should
>     # not happen.
>     # As the devices have been assigned to a instance, we defer
>     # the change till the instance is destroyed. We will
>     # not sync the new properties with database before that.
>     # TODO(yjiang5): Not sure if this is a right policy, but
>     # at least it avoids some confusion and, if
>     # we can add more action like killing the instance
>     # by force in future.
>
> Basically, if the PCI device tracker notices that an instance is assigned a
> PCI device with an address that no longer exists in the PCI device addresses
> returned from libvirt, it will (eventually, in the _free_instance() method)
> remove the PCI device assignment from the Instance object, but it will make
> no attempt to assign a new PCI device that meets the original PCI device
> specification in the launch request.
>
> Should we handle this case and attempt a "hot re-assignment of a PCI
> device"? Perhaps. Is it high priority? Not really, IMHO.
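For anyone trying to picture the "defer until the instance is destroyed"
policy described above, here is a minimal toy sketch in Python. It is
not Nova's code: ToyPciTracker, sync() and free_instance() are
hypothetical names made up for illustration, and the real logic in
nova/pci/manager.py is more involved. It only assumes what the quoted
note and Jay's summary say: an assigned device whose address vanishes is
left alone until the instance is freed, and nothing is re-assigned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ToyPciTracker:
    """Toy stand-in for a PCI tracker; NOT Nova's actual implementation."""

    # PCI address -> owning instance name, or None if the device is free
    devices: Dict[str, Optional[str]] = field(default_factory=dict)
    # assigned addresses that the host no longer reports
    stale: Set[str] = field(default_factory=set)

    def sync(self, host_addresses: List[str]) -> None:
        """Reconcile tracked devices with the addresses the host reports."""
        seen = set(host_addresses)
        for addr in list(self.devices):
            if addr in seen:
                self.stale.discard(addr)
            elif self.devices[addr] is None:
                del self.devices[addr]   # unassigned: drop right away
            else:
                self.stale.add(addr)     # assigned: defer the change
        for addr in seen:
            self.devices.setdefault(addr, None)

    def free_instance(self, instance: str) -> None:
        """Release an instance's devices, dropping any that went stale."""
        for addr, owner in list(self.devices.items()):
            if owner != instance:
                continue
            if addr in self.stale:
                self.stale.discard(addr)
                del self.devices[addr]     # the deferred removal
            else:
                self.devices[addr] = None  # back to the free pool
```

In this toy model, a guest holding 0000:04:10.0 keeps that (now bogus)
record after a sync() that only reports 0000:05:10.x addresses, and the
record disappears only when free_instance() runs for that guest. No
replacement device is ever picked, which matches the gap Jay describes.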
Hotplugging new PCI devices to a running host should not have any impact
on existing PCI device addresses - it'll merely add new addresses for
new devices - existing devices are unchanged. So everything should "just
work" in that case. IIUC, Brent's Q was around turning off the host and
cold-plugging/unplugging hardware, which /is/ liable to arbitrarily
re-arrange existing PCI device addresses.

> If you'd like to file a bug against Nova, that would be cool, though.

I think it is explicitly out of scope for Nova to deal with this scenario.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev