Re: [libvirt PATCH v4 25/25] nodedev: fix hang when destroying an mdev in use
On Wed, Feb 03, 2021 at 11:39:09AM -0600, Jonathon Jongsma wrote: > Calling `mdevctl stop` for a mediated device that is in use by an active > domain will block until that vm exits (or the vm closes the device). > Since the nodedev driver cannot query the hypervisor driver to see > whether any active domains are using the device, we resort to a > workaround that relies on the fact that a vfio group can only be opened > by one user at a time. If we get an EBUSY error when attempting to open > the group file, we assume the device is in use and refuse to try to > destroy that device. > > Signed-off-by: Jonathon Jongsma > --- > src/node_device/node_device_driver.c | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/src/node_device/node_device_driver.c > b/src/node_device/node_device_driver.c > index bf97291041..e6b0213157 100644 > --- a/src/node_device/node_device_driver.c > +++ b/src/node_device/node_device_driver.c > @@ -1164,6 +1164,23 @@ nodeDeviceDestroy(virNodeDevicePtr device) > > ret = 0; > } else if (nodeDeviceHasCapability(def, VIR_NODE_DEV_CAP_MDEV)) { > +/* If this mediated device is in use by a vm, attempting to stop it > + * will block until the vm closes the device. Since the nodedev > driver > + * cannot query the hypervisor driver to determine whether the device A nice detailed commentary, but for future reference I'd still add the reason *why* it is that nodedev driver cannot poke the hypervisor driver. > + * is in use by any active domains, we need to resort to a > workaround. > + * vfio only allows the group for a device to be opened by one user > at > + * a time. So if we get EBUSY when opening the group, we infer that > the > + * device is in use and shouldn't try to remove the device. */ > +g_autofree char *vfiogroup = > +virMediatedDeviceGetIOMMUGroupDev(def->caps->data.mdev.uuid); > +VIR_AUTOCLOSE fd = open(vfiogroup, O_RDONLY); > + > +if (fd < 0 && errno == EBUSY) { > +virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("Unable to destroy a mediated device that is in > use")); I think a slightly better message would look like: _("Unable to destroy '%s': device in use"), def->name This was a simple workaround indeed :). Reviewed-by: Erik Skultety
Re: [libvirt PATCH v4 24/25] nodedev: add ability to specify UUID for new mdevs
On Wed, Feb 03, 2021 at 11:39:08AM -0600, Jonathon Jongsma wrote: > Use the new element in the mdev caps to define and start devices > with a specific UUID. > > Signed-off-by: Jonathon Jongsma > --- > src/node_device/node_device_driver.c | 18 +++--- > ...019_36ea_4111_8f0a_8c9a70e21366-define.argv | 3 ++- > ...d019_36ea_4111_8f0a_8c9a70e21366-start.argv | 3 ++- > ...ev_d069d019_36ea_4111_8f0a_8c9a70e21366.xml | 1 + > 4 files changed, 20 insertions(+), 5 deletions(-) > > diff --git a/src/node_device/node_device_driver.c > b/src/node_device/node_device_driver.c > index d5cdf2b097..bf97291041 100644 > --- a/src/node_device/node_device_driver.c > +++ b/src/node_device/node_device_driver.c > @@ -728,6 +728,10 @@ > nodeDeviceGetMdevctlDefineStartCommand(virNodeDeviceDefPtr def, > NULL); > > virCommandSetInputBuffer(cmd, json); > + > +if (def->caps->data.mdev.uuid) > +virCommandAddArgPair(cmd, "--uuid", def->caps->data.mdev.uuid); > + > virCommandSetOutputBuffer(cmd, uuid_out); > > return cmd; > @@ -806,8 +810,12 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn, > _("Unable to start mediated device")); > return NULL; > } > +if (uuid) { mdevctl returns an empty string when UUID was provided, so this has to become 'if (uuid && uuid[0])' > +g_free(def->caps->data.mdev.uuid); > +def->caps->data.mdev.uuid = g_steal_pointer(); > +} > > -return nodeDeviceFindNewMediatedDevice(conn, uuid); > +return nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid); > } > > > @@ -1213,9 +1221,13 @@ nodeDeviceDefineXML(virConnectPtr conn, > return NULL; > } > > -def->caps->data.mdev.uuid = g_strdup(uuid); > +if (uuid) { ^Here too... Erik > +g_free(def->caps->data.mdev.uuid); > +def->caps->data.mdev.uuid = g_steal_pointer(); > +} > + > mdevGenerateDeviceName(def); > -device = nodeDeviceFindNewMediatedDevice(conn, uuid); > +device = nodeDeviceFindNewMediatedDevice(conn, > def->caps->data.mdev.uuid);
Re: [libvirt PATCH v4 23/25] nodedev: add element to mdev caps
On Wed, Feb 03, 2021 at 11:39:07AM -0600, Jonathon Jongsma wrote: > It will be useful to be able to specify a particular UUID for a mediated > device when defining the node device. To accomodate that, allow this to > be specified in the xml schema. This patch also parses and formats that > value to the xml, but does not yet use it. > > Signed-off-by: Jonathon Jongsma > --- ... > > +if ((uuidstr = virXPathString("string(./uuid[1])", ctxt))) { > +/* make sure that the provided uuid is valid */ > +if (virUUIDParse(uuidstr, uuidbuf) < 0) { > +virReportError(VIR_ERR_INTERNAL_ERROR, > + _("Invalid uuid '%s' for '%s'"), mdev->uuid, we don't have mdev->uuid at this point yet, so: s/mdev->uuid/uuidstr > + def->name); we also haven't generated any name yet, so ^this yields "new device". I'd suggest to hardcode the message with "new mdev device" instead of using def->name. Being able to specify UUIDs with mdevs directly is great. Reviewed-by: Erik Skultety
Re: [libvirt PATCH v4 21/25] api: add virNodeDeviceCreate()
On Wed, Feb 03, 2021 at 11:39:05AM -0600, Jonathon Jongsma wrote: > This new API function provides a way to start a persistently-defined > mediate device that was defined by virNodeDeviceDefineXML() (or one that > was defined externally via mdevctl) > > Signed-off-by: Jonathon Jongsma > --- ... > +int nodeDeviceCreate(virNodeDevicePtr device) > +{ > +int ret = -1; > +virNodeDeviceObjPtr obj = NULL; > +virNodeDeviceDefPtr def; ...actually, please initialize ^this pointer as well to remain consistent. Erik (The RB still stands)
Re: [libvirt PATCH v4 22/25] virsh: add "nodedev-start" command
On Wed, Feb 03, 2021 at 11:39:06AM -0600, Jonathon Jongsma wrote: > This virsh command maps to virNodeDeviceCreate(), which starts a node > device that has been previously defined by virNodeDeviceDefineXML(). > This is only supported for mediated devices. > > Signed-off-by: Jonathon Jongsma > --- Reviewed-by: Erik Skultety
Re: [libvirt PATCH v4 20/25] virsh: add nodedev-undefine command
On Wed, Feb 03, 2021 at 11:39:04AM -0600, Jonathon Jongsma wrote: > Add a virsh command that maps to virNodeDeviceUndefine(). > > Signed-off-by: Jonathon Jongsma Reviewed-by: Erik Skultety
Re: [libvirt PATCH v4 15/25] virsh: Add --inactive, --all to nodedev-list
On Wed, Feb 03, 2021 at 11:38:59AM -0600, Jonathon Jongsma wrote: > Now that we can filter active and inactive node devices in > virConnectListAllNodeDevices(), add these switches to the virsh command. > > Eventual output (once everything is hooked up): > > virsh # nodedev-list --cap mdev > mdev_bd2ea955_3402_4252_8c17_7468083a0f26 > > virsh # nodedev-list --inactive --cap mdev > mdev_07d8b8b0_7e04_4c0f_97ed_9214ce12723c > mdev_927c040f_ae7d_4a35_966e_286ba6ebbe1c > > virsh # nodedev-list --all --cap mdev > mdev_07d8b8b0_7e04_4c0f_97ed_9214ce12723c > mdev_927c040f_ae7d_4a35_966e_286ba6ebbe1c > mdev_bd2ea955_3402_4252_8c17_7468083a0f26 > > Signed-off-by: Jonathon Jongsma > --- > tools/virsh-nodedev.c | 30 ++ > 1 file changed, 26 insertions(+), 4 deletions(-) > > diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c > index 428ead7384..a2e83fb676 100644 > --- a/tools/virsh-nodedev.c > +++ b/tools/virsh-nodedev.c > @@ -378,6 +378,14 @@ static const vshCmdOptDef opts_node_list_devices[] = { > .completer = virshNodeDeviceCapabilityNameCompleter, > .help = N_("capability names, separated by comma") > }, > +{.name = "inactive", > + .type = VSH_OT_BOOL, > + .help = N_("list inactive devices") > +}, > +{.name = "all", > + .type = VSH_OT_BOOL, > + .help = N_("list inactive & active devices") > +}, > {.name = NULL} > }; > > @@ -393,18 +401,27 @@ cmdNodeListDevices(vshControl *ctl, const vshCmd *cmd > G_GNUC_UNUSED) > int ncaps = 0; > virshNodeDeviceListPtr list = NULL; > int cap_type = -1; > +bool inactive, all; 1 declaration per line... > > +inactive = vshCommandOptBool(cmd, "inactive"); > +all = vshCommandOptBool(cmd, "all"); ...also ^these 2 can be used to initialize the variables at their definition. Reviewed-by: Erik Skultety
Re: [PATCH RESEND 20/20] virhostdev.c: remove missing PCI devs from hostdev manager
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: virHostdevReAttachPCIDevices() is called when we want to re-attach a list of hostdevs back to the host, either on the shutdown path or via a 'virsh detach-device' call. This function always count on the existence of the device in the host to work, but this can lead to problems. For example, a SR-IOV device can be removed via an admin "echo 0 > /sys/bus/pci/devices//sriov_numvfs", making the kernel fire up and eventfd_signal() to the process, asking for the process to release the device. The result might vary depending on the device driver and OS/arch, but two possible outcomes are: 1) the hypervisor driver will detach the device from the VM, issuing a delete event to Libvirt. This can be observed in QEMU; 2) the 'echo 0 > ...' will hang waiting for the device to be unplugged. This means that the VM process failed/refused to release the hostdev back to the host, and the hostdev will be detached during VM shutdown. Today we don't behave well for both cases. We'll fail to remove the PCI device reference from mgr->activePCIHostdevs and mgr->inactivePCIHostdevs because we rely on the existence of the PCI device conf file in the sysfs. Attempting to re-utilize the same device (assuming it is now present back in the host) can result in an error like this: $ ./run tools/virsh start vm1-sriov --console error: Failed to start domain vm1-sriov error: Requested operation is not valid: PCI device :01:00.2 is in use by driver QEMU, domain vm1-sriov For (1), a VM destroy/start cycle is needed to re-use the VF in the guest. For (2), the effect is more nefarious, requiring a Libvirtd daemon restart to use the VF again in any guest. We can make it a bit better by checking, during virHostdevReAttachPCIDevices(), if there is any missing PCI device that will be left behind in activePCIHostdevs and inactivePCIHostdevs lists. Remove any missing device found from both lists, unconditionally, matching the current state of the host. This change affects the code path in (1) (processDeviceDeletedEvent into qemuDomainRemoveDevice, all the way back to qemuHostdevReAttachPCIDevices) and also in (b) (qemuProcessStop into qemuHostdevReAttachDomainDevices). NB: Although this patch enables the possibility of 'outside Libvirt' SR-IOV hotunplug of PCI devices, if the hypervisor and the PCI driver copes with it, our goal is to mitigate what it is still considered an user oopsie. For all'' This is one of those odd cases where you use "a" before a word starting with a vowel rather than "an". It's that way because the *sound* starting the next word is "you" rather than "ooh". supported purposes, the admin must remove the SR-IOV VFs from all running domains before removing the VFs from the host. Resolves: https://gitlab.com/libvirt/libvirt/-/issues/72 Signed-off-by: Daniel Henrique Barboza --- src/hypervisor/virhostdev.c | 38 + 1 file changed, 38 insertions(+) diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c index c708791eec..ed43733e71 100644 --- a/src/hypervisor/virhostdev.c +++ b/src/hypervisor/virhostdev.c @@ -1077,6 +1077,40 @@ virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr, } +static void +virHostdevDeleteMissingPCIDevices(virHostdevManagerPtr mgr, + virDomainHostdevDefPtr *hostdevs, + int nhostdevs) +{ +size_t i; + +if (nhostdevs == 0) +return; + +virObjectLock(mgr->activePCIHostdevs); +virObjectLock(mgr->inactivePCIHostdevs); + +for (i = 0; i < nhostdevs; i++) { +virDomainHostdevDef *hostdev = hostdevs[i]; +virDomainHostdevSubsysPCI *pcisrc = >source.subsys.u.pci; +g_autoptr(virPCIDevice) pci = NULL; + +if (virHostdevGetPCIHostDevice(hostdev, ) != -2) +continue; + +/* The PCI device from 'hostdev' does not exist in the host + * anymore. Delete it from both active and inactive lists to + * reflect the current host state. + */ +virPCIDeviceListDel(mgr->activePCIHostdevs, >addr); +virPCIDeviceListDel(mgr->inactivePCIHostdevs, >addr); +} + +virObjectUnlock(mgr->activePCIHostdevs); +virObjectUnlock(mgr->inactivePCIHostdevs); It makes me nervous that you're unlocking these lists in the same order that you lock them, since that could theoretically lead to a deadlock. I haven't even thought beyond this single function to figure out if it's even a possibility in this case, but still would feel better if you unlocked in the opposite order that you lock. With that fixed: Reviewed-by: Laine Stump +} + + /* @oldStateDir: * For upgrade purpose: see virHostdevRestoreNetConfig */ @@ -1102,6 +1136,10 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr, virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs, hostdevs,
Re: [PATCH RESEND 17/20] virpci.c: use virPCIDeviceAddressPtr in virPCIDeviceListFind()
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: Signed-off-by: Daniel Henrique Barboza --- src/hypervisor/virhostdev.c | 12 src/util/virpci.c | 16 src/util/virpci.h | 2 +- 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c index 402d7be42d..3bfb04c674 100644 --- a/src/hypervisor/virhostdev.c +++ b/src/hypervisor/virhostdev.c @@ -657,7 +657,8 @@ virHostdevReattachAllPCIDevices(virHostdevManagerPtr mgr, /* We need to look up the actual device because that's what * virPCIDeviceReattach() expects as its argument */ -if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci))) +if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, +virPCIDeviceGetAddress(pci continue; if (virPCIDeviceGetManaged(actual)) { @@ -777,7 +778,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManagerPtr mgr, /* Unmanaged devices should already have been marked as * inactive: if that's the case, we can simply move on */ -if (virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci)) { +if (virPCIDeviceListFind(mgr->inactivePCIHostdevs, + virPCIDeviceGetAddress(pci))) { Wondered at first why you were using a access function to get the pointer rather than just referencing it directly. I'd forgotten that virPCIDevice is one of those rare "private structures" in libvirt - defined only in a single .c file. Anyway, Reviewed-by: Laine Stump VIR_DEBUG("Not detaching unmanaged PCI device %s", virPCIDeviceGetName(pci)); continue; @@ -860,7 +862,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManagerPtr mgr, * there because 'pci' only contain address information and will * be released at the end of the function */ pci = virPCIDeviceListGet(pcidevs, i); -actual = virPCIDeviceListFind(mgr->activePCIHostdevs, pci); +actual = virPCIDeviceListFind(mgr->activePCIHostdevs, + virPCIDeviceGetAddress(pci)); VIR_DEBUG("Setting driver and domain information for PCI device %s", virPCIDeviceGetName(pci)); @@ -992,7 +995,8 @@ virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr, * information such as by which domain and driver it is used. As a * side effect, by looking it up we can also tell whether it was * really active in the first place */ -actual = virPCIDeviceListFind(mgr->activePCIHostdevs, pci); +actual = virPCIDeviceListFind(mgr->activePCIHostdevs, + virPCIDeviceGetAddress(pci)); if (actual) { const char *actual_drvname; const char *actual_domname; diff --git a/src/util/virpci.c b/src/util/virpci.c index 1554acffb6..9544275c31 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -705,7 +705,7 @@ virPCIDeviceSharesBusWithActive(virPCIDevicePtr dev, virPCIDevicePtr check, void return 0; /* same bus, but inactive, i.e. about to be assigned to guest */ -if (inactiveDevs && virPCIDeviceListFind(inactiveDevs, check)) +if (inactiveDevs && virPCIDeviceListFind(inactiveDevs, >address)) return 0; return 1; @@ -1022,7 +1022,7 @@ virPCIDeviceReset(virPCIDevicePtr dev, return -1; } -if (activeDevs && virPCIDeviceListFind(activeDevs, dev)) { +if (activeDevs && virPCIDeviceListFind(activeDevs, >address)) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Not resetting active device %s"), dev->name); return -1; @@ -1294,7 +1294,7 @@ virPCIDeviceDetach(virPCIDevicePtr dev, if (virPCIProbeStubDriver(dev->stubDriver) < 0) return -1; -if (activeDevs && virPCIDeviceListFind(activeDevs, dev)) { +if (activeDevs && virPCIDeviceListFind(activeDevs, >address)) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Not detaching active device %s"), dev->name); return -1; @@ -1306,7 +1306,7 @@ virPCIDeviceDetach(virPCIDevicePtr dev, /* Add *a copy of* the dev into list inactiveDevs, if * it's not already there. */ -if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev)) { +if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, >address)) { VIR_DEBUG("Adding PCI device %s to inactive list", dev->name); if (virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) return -1; @@ -1324,7 +1324,7 @@ virPCIDeviceReattach(virPCIDevicePtr dev, virPCIDeviceListPtr activeDevs, virPCIDeviceListPtr inactiveDevs) { -if (activeDevs &&
Re: [PATCH RESEND 18/20] virpci.c: use virPCIDeviceAddressPtr in virPCIDeviceListSteal()
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: Signed-off-by: Daniel Henrique Barboza Reviewed-by: Laine Stump
Re: [PATCH RESEND 19/20] virpci.c: use virPCIDeviceAddressPtr in virPCIDeviceListDel()
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: This change will allow us to remove PCI devices from a list without the need of a PCI Device object, which will be need in the next patch. Signed-off-by: Daniel Henrique Barboza Reviewed-by: Laine Stump
Re: [PATCH RESEND 16/20] virpci.c: use virPCIDeviceAddressPtr in virPCIDeviceListFindIndex()
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: We're going to need a way to remove a PCI Device from a list without having a valid virPCIDevicePtr, because the device is missing from the host. This means that virPCIDevicesListDel() must operate with a PCI Device address instead. Turns out that virPCIDevicesListDel() and its related functions only use the virPCIDeviceAddressPtr of the virPCIDevicePtr, so this change is simple to do and will not cause hassle in all other callers. Let's start adapting virPCIDeviceListFindIndex() and crawl our way up to virPCIDevicesListDel(). Signed-off-by: Daniel Henrique Barboza Reviewed-by: Laine Stump (also Declared-should-have-been-done-this-way-in-the-first-place-by: Laine Stump :-)) --- src/util/virpci.c | 15 --- src/util/virpci.h | 2 +- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 7143380348..1554acffb6 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1745,7 +1745,7 @@ virPCIDevicePtr virPCIDeviceListSteal(virPCIDeviceListPtr list, virPCIDevicePtr dev) { -return virPCIDeviceListStealIndex(list, virPCIDeviceListFindIndex(list, dev)); +return virPCIDeviceListStealIndex(list, virPCIDeviceListFindIndex(list, >address)); } void @@ -1756,16 +1756,17 @@ virPCIDeviceListDel(virPCIDeviceListPtr list, } int -virPCIDeviceListFindIndex(virPCIDeviceListPtr list, virPCIDevicePtr dev) +virPCIDeviceListFindIndex(virPCIDeviceListPtr list, + virPCIDeviceAddressPtr devAddr) { size_t i; for (i = 0; i < list->count; i++) { virPCIDevicePtr other = list->devs[i]; -if (other->address.domain == dev->address.domain && -other->address.bus == dev->address.bus&& -other->address.slot == dev->address.slot && -other->address.function == dev->address.function) +if (other->address.domain == devAddr->domain && +other->address.bus == devAddr->bus&& +other->address.slot == devAddr->slot && +other->address.function == devAddr->function) return i; } return -1; @@ -1798,7 +1799,7 @@ virPCIDeviceListFind(virPCIDeviceListPtr list, virPCIDevicePtr dev) { int idx; -if ((idx = virPCIDeviceListFindIndex(list, dev)) >= 0) +if ((idx = virPCIDeviceListFindIndex(list, >address)) >= 0) return list->devs[idx]; else return NULL; diff --git a/src/util/virpci.h b/src/util/virpci.h index a9c597a428..8c6776da21 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -175,7 +175,7 @@ virPCIDeviceListFindByIDs(virPCIDeviceListPtr list, unsigned int slot, unsigned int function); int virPCIDeviceListFindIndex(virPCIDeviceListPtr list, - virPCIDevicePtr dev); + virPCIDeviceAddressPtr devAddr); /* * Callback that will be invoked once for each file
Re: [PATCH RESEND 15/20] qemu_cgroup.c: skip absent PCI devices in qemuTeardownHostdevCgroup()
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: There is no need to bother with cgroup tearing down for absent PCI devices, given that their entries in the sysfs are already gone. Signed-off-by: Daniel Henrique Barboza --- src/qemu/qemu_cgroup.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index f7146a71c9..050df21d87 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -467,6 +467,16 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; +/* Skip tearing down Cgroup for hostdevs that represents absent + * PCI devices, e.g. SR-IOV virtual functions that were removed from + * the host while the domain was still running. */ +if (virHostdevIsPCIDevice(dev)) { +const virDomainHostdevSubsysPCI *pcisrc = >source.subsys.u.pci; + +if (!virPCIDeviceExists(>addr)) +return 0; I would have skipped creating the temprorary variable, since it's only used once, but.. eh. Potato, potahtoe. Reviewed-by: Laine Stump +} + if (qemuDomainGetHostdevPath(dev, , NULL) < 0) return -1;
Re: [PATCH RESEND 14/20] virhostdev.c: add virHostdevIsPCIDevice() helper
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: Add a helper to quickly determine if a hostdev is a PCI device, instead of doing a tedius 'if' check with hostdev mode and s/tedius/tedious/ subsys type. Signed-off-by: Daniel Henrique Barboza Reviewed-by: Laine Stump
Re: [PATCH RESEND 13/20] virsh-domain.c: modernize cmdDetachDevice()
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: Use g_auto* pointers to avoid the need of a cleanup label. The type of the pointer 'virDomainPtr dom' was changed to its alias 'virshDomainPtr' to allow the use of g_autoptr(). Signed-off-by: Daniel Henrique Barboza Reviewed-by: Laine Stump
Re: [PATCH RESEND 11/20] qemu_driver.c: modernize qemuNodeDeviceReAttach()
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: Add virObjectUnref an autoptr cleanup func for virNodeDevice, then remove all unref and free calls from qemuNodeDeviceReAttach(). Signed-off-by: Daniel Henrique Barboza --- src/datatypes.h| 2 ++ src/qemu/qemu_driver.c | 32 2 files changed, 14 insertions(+), 20 deletions(-) diff --git a/src/datatypes.h b/src/datatypes.h index ade3779e43..7a88aba0df 100644 --- a/src/datatypes.h +++ b/src/datatypes.h @@ -707,6 +707,8 @@ struct _virNodeDevice { char *parentName; /* parent device name */ }; +G_DEFINE_AUTOPTR_CLEANUP_FUNC(virNodeDevice, virObjectUnref); It may seem like overkill, but I think this ^^^ should be added in a separate patch. That way if some other patch that uses g_autoptr(virNodeDevice) needs to be backported to a downstream release that doesn't want to take the rest of this patch's refactoring of qemuNodeDeviceReAttach(), they can do it. Reviewed-by: Laine Stump but split the above line into a separate patch (which you can also put my R-b on) + /** * _virSecret: * diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 7581e3c8cb..f9aa93fa3e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12045,17 +12045,16 @@ static int qemuNodeDeviceReAttach(virNodeDevicePtr dev) { virQEMUDriverPtr driver = dev->conn->privateData; -virPCIDevicePtr pci = NULL; +g_autoptr(virPCIDevice) pci = NULL; virPCIDeviceAddress devAddr; -int ret = -1; -virNodeDeviceDefPtr def = NULL; +g_autoptr(virNodeDeviceDef) def = NULL; g_autofree char *xml = NULL; virHostdevManagerPtr hostdev_mgr = driver->hostdevMgr; -virConnectPtr nodeconn = NULL; -virNodeDevicePtr nodedev = NULL; +g_autoptr(virConnect) nodeconn = NULL; +g_autoptr(virNodeDevice) nodedev = NULL; if (!(nodeconn = virGetConnectNodeDev())) -goto cleanup; +return -1; /* 'dev' is associated with the QEMU virConnectPtr, * so for split daemons, we need to get a copy that @@ -12063,36 +12062,29 @@ qemuNodeDeviceReAttach(virNodeDevicePtr dev) */ if (!(nodedev = virNodeDeviceLookupByName( nodeconn, virNodeDeviceGetName(dev -goto cleanup; +return -1; xml = virNodeDeviceGetXMLDesc(nodedev, 0); if (!xml) -goto cleanup; +return -1; def = virNodeDeviceDefParseString(xml, EXISTING_DEVICE, NULL); if (!def) -goto cleanup; +return -1; /* ACL check must happen against original 'dev', * not the new 'nodedev' we acquired */ if (virNodeDeviceReAttachEnsureACL(dev->conn, def) < 0) -goto cleanup; +return -1; if (virDomainDriverNodeDeviceGetPCIInfo(def, ) < 0) -goto cleanup; +return -1; pci = virPCIDeviceNew(); if (!pci) -goto cleanup; - -ret = virHostdevPCINodeDeviceReAttach(hostdev_mgr, pci); +return -1; -virPCIDeviceFree(pci); - cleanup: -virNodeDeviceDefFree(def); -virObjectUnref(nodedev); -virObjectUnref(nodeconn); -return ret; +return virHostdevPCINodeDeviceReAttach(hostdev_mgr, pci); } static int
Re: [PATCH RESEND 10/20] libvirt-nodedev.c: remove return value from virNodeDeviceFree()
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: The function returns -1 on error, but no caller is actually checking the return value. Making it 'void' makes more sense with its current use. Signed-off-by: Daniel Henrique Barboza NAK - you can't change a public function. --- include/libvirt/libvirt-nodedev.h | 2 +- src/libvirt-nodedev.c | 15 +++ 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index eab8abf6ab..5634980a75 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -114,7 +114,7 @@ char * virNodeDeviceGetXMLDesc (virNodeDevicePtr dev, unsigned int flags); int virNodeDeviceRef(virNodeDevicePtr dev); -int virNodeDeviceFree (virNodeDevicePtr dev); +voidvirNodeDeviceFree (virNodeDevicePtr dev); int virNodeDeviceDettach(virNodeDevicePtr dev); int virNodeDeviceDetachFlags(virNodeDevicePtr dev, diff --git a/src/libvirt-nodedev.c b/src/libvirt-nodedev.c index eb8c735a8c..fcca40f47b 100644 --- a/src/libvirt-nodedev.c +++ b/src/libvirt-nodedev.c @@ -445,19 +445,26 @@ virNodeDeviceListCaps(virNodeDevicePtr dev, * Drops a reference to the node device, freeing it if * this was the last reference. * - * Returns the 0 for success, -1 for error. + * Throws a VIR_ERR_INVALID_NODE_DEVICE error if @dev is + * not a valid node device. Does nothing if @dev is + * NULL. */ -int +void virNodeDeviceFree(virNodeDevicePtr dev) { +if (!dev) +return; + VIR_DEBUG("dev=%p, conn=%p", dev, dev ? dev->conn : NULL); virResetLastError(); -virCheckNodeDeviceReturn(dev, -1); +virCheckNodeDeviceGoto(dev, invalid_device); virObjectUnref(dev); -return 0; + + invalid_device: +return; }
Re: [PATCH RESEND 09/20] dac, selinux: skip setting/restoring label for absent PCI devices
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: If the underlying PCI device of a hostdev does not exist in the host (e.g. a SR-IOV VF that was removed while the domain was running), skip security label handling for it. This will avoid errors that happens during qemuProcessStop() time, where a VF that was being used by the domain is not present anymore. The restore label functions of both DAC and SELinux drivers will trigger errors in virPCIDeviceNew(). Signed-off-by: Daniel Henrique Barboza This is fine as far as it goes, but what about AppArmorSetSecurityHostdevLabel() (and also whatever it is that's going on in the get_files() function in virt-aa-helper.c). You likely don't have a setup to test it, but it seems pretty straightforward to extrapolate that if you should skip setting the selinux and DAC labels when a device doesn't exist, you can probably do the same thing for AppArmor. Reviewed-by: Laine Stump but add another patch that fixes the problem for AppArmor too. (Also, all the code repetition here makes me think that there must be a better way to do this and reduce all the boilerplate, but I think it would be better to just make these changes and then look at eliminating the boilerplate afterwards, rather than re-structuring the code (which could create regressions) while adding this new concept on top of it. --- src/security/security_dac.c | 14 -- src/security/security_selinux.c | 14 -- 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 0085982bb1..a2528aeb2d 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1266,7 +1266,12 @@ virSecurityDACSetHostdevLabel(virSecurityManagerPtr mgr, } case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { -g_autoptr(virPCIDevice) pci = virPCIDeviceNew(>addr); +g_autoptr(virPCIDevice) pci = NULL; + +if (!virPCIDeviceExists(>addr)) +break; + +pci = virPCIDeviceNew(>addr); if (!pci) return -1; @@ -1422,7 +1427,12 @@ virSecurityDACRestoreHostdevLabel(virSecurityManagerPtr mgr, } case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { -g_autoptr(virPCIDevice) pci = virPCIDeviceNew(>addr); +g_autoptr(virPCIDevice) pci = NULL; + +if (!virPCIDeviceExists(>addr)) +break; + +pci = virPCIDeviceNew(>addr); if (!pci) return -1; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 99adf08a15..c018c0708a 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -2101,7 +2101,12 @@ virSecuritySELinuxSetHostdevSubsysLabel(virSecurityManagerPtr mgr, } case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { -g_autoptr(virPCIDevice) pci = virPCIDeviceNew(>addr); +g_autoptr(virPCIDevice) pci = NULL; + +if (!virPCIDeviceExists(>addr)) +break; + +pci = virPCIDeviceNew(>addr); if (!pci) return -1; @@ -2329,7 +2334,12 @@ virSecuritySELinuxRestoreHostdevSubsysLabel(virSecurityManagerPtr mgr, } case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: { -g_autoptr(virPCIDevice) pci = virPCIDeviceNew(>addr); +g_autoptr(virPCIDevice) pci = NULL; + +if (!virPCIDeviceExists(>addr)) +break; + +pci = virPCIDeviceNew(>addr); if (!pci) return -1;
Re: [PATCH RESEND 06/20] virhostdev.c: virHostdevGetPCIHostDevice() now reports missing device
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote: Gitlab issue #72 [1] reports that removing SR-IOVs VFs before removing the devices from the running domains can have strange consequences. QEMU might be able to hotunplug the device inside the guest, but Libvirt will not be aware of that, and then the guest is now inconsistent with the domain definition. There's also the possibility of the VFs removal not succeeding while the domain is running but then, as soon as the domain is shutdown, all the VFs are removed. Libvirt can't handle the removal of the PCI devices while trying to reattach the hostdevs, and the Libvirt daemon can be left in an inconsistent state (see [2]). This patch starts to address the issue related in Gitlab #72, most notably the issue described in [2]. When shutting down a domain with SR-IOV hostdevs that got missing, virHostdevReAttachPCIDevices() is failing the whole process and failing to reattach all the PCI devices, including the ones that aren't related to the VFs that went missing. Let's make it more resilient with host changes by changing virHostdevGetPCIHostDevice() to return an exclusive error code '-2' for this case. virHostdevGetPCIHostDeviceList() can then tell when virHostdevGetPCIHostDevice() failed to find the PCI device of a hostdev and continue to make the list of PCI devices. virHostdevReAttachPCIDevices() will now be able to proceed reattaching all other valid PCI devices, at least. The 'ghost hostdevs' will be handled later on. [1] https://gitlab.com/libvirt/libvirt/-/issues/72 [2] https://gitlab.com/libvirt/libvirt/-/issues/72#note_459032148 Signed-off-by: Daniel Henrique Barboza --- src/hypervisor/virhostdev.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c index bd35397f2c..dbba36193b 100644 --- a/src/hypervisor/virhostdev.c +++ b/src/hypervisor/virhostdev.c @@ -220,7 +220,8 @@ virHostdevManagerGetDefault(void) * is returned. * * Returns: 0 on success (@pci might be NULL though), - * -1 otherwise (with error reported). + * -1 otherwise (with error reported), + * -2 PCI device not found. @pci will be NULL */ static int virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev, @@ -235,6 +236,9 @@ virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev, hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) return 0; +if (!virPCIDeviceExists(>addr)) +return -2; You've created a special return code to mean "This is a PCI device, but it isn't present on the host"... + actual = virPCIDeviceNew(>addr); if (!actual) @@ -270,7 +274,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) virDomainHostdevDefPtr hostdev = hostdevs[i]; g_autoptr(virPCIDevice) pci = NULL; -if (virHostdevGetPCIHostDevice(hostdev, ) < 0) +if (virHostdevGetPCIHostDevice(hostdev, ) == -1) return NULL; ...But you haven't actually used it. Will it become necessary later to have this special value? right now a missing device is treated exactly the same as if it isn't a PCI device. I guess I can understand the conceptual desire to return an error of some type in this case though, and there are other places where something similar is done (-2 indicates some type of "odd" error), so I'll let it pass :-) Reviewed-by: Laine Stump if (!pci)