Re: [PATCH RESEND 20/20] virhostdev.c: remove missing PCI devs from hostdev manager
On 2/22/21 4:11 PM, Daniel Henrique Barboza wrote:
> I learned this 20+ years ago and forgot it along the road. Heavy metal
> lyrics and World of Warcraft public chat aren't the best sources for
> learning proper English grammar, so it seems.

I learned Turkish from taxi drivers and random people at the bus
terminal, and that worked out pretty well for me. Food for thought ;-)

> Thanks for the reviews, Laine. I dropped the patch you NACKed and fixed
> everything else following your suggestions. We're in freeze ATM and
> this isn't critical or urgent, so I'll wait for the next release before
> pushing it.

Yeah, sorry again that it took so long. I tend to need constant prodding
to get anything done...
Re: [PATCH RESEND 20/20] virhostdev.c: remove missing PCI devs from hostdev manager
On 2/22/21 1:12 AM, Laine Stump wrote:
> On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote:
>> virHostdevReAttachPCIDevices() is called when we want to re-attach a
>> list of hostdevs back to the host, either on the shutdown path or via
>> a 'virsh detach-device' call. This function always counts on the
>> existence of the device in the host to work, but this can lead to
>> problems.
>>
>> For example, a SR-IOV device can be removed via an admin
>> "echo 0 > /sys/bus/pci/devices//sriov_numvfs", making the kernel fire
>> up an eventfd_signal() to the process, asking for the process to
>> release the device. The result might vary depending on the device
>> driver and OS/arch, but two possible outcomes are:
>>
>> 1) the hypervisor driver will detach the device from the VM, issuing
>> a delete event to Libvirt. This can be observed in QEMU;
>>
>> 2) the 'echo 0 > ...' will hang waiting for the device to be
>> unplugged. This means that the VM process failed/refused to release
>> the hostdev back to the host, and the hostdev will be detached during
>> VM shutdown.
>>
>> Today we don't behave well in either case. We'll fail to remove the
>> PCI device reference from mgr->activePCIHostdevs and
>> mgr->inactivePCIHostdevs because we rely on the existence of the PCI
>> device conf file in the sysfs. Attempting to re-utilize the same
>> device (assuming it is now present back in the host) can result in an
>> error like this:
>>
>> $ ./run tools/virsh start vm1-sriov --console
>> error: Failed to start domain vm1-sriov
>> error: Requested operation is not valid: PCI device :01:00.2 is in
>> use by driver QEMU, domain vm1-sriov
>>
>> For (1), a VM destroy/start cycle is needed to re-use the VF in the
>> guest. For (2), the effect is more nefarious, requiring a Libvirtd
>> daemon restart to use the VF again in any guest.
>>
>> We can make it a bit better by checking, during
>> virHostdevReAttachPCIDevices(), if there is any missing PCI device
>> that will be left behind in the activePCIHostdevs and
>> inactivePCIHostdevs lists. Remove any missing device found from both
>> lists, unconditionally, matching the current state of the host.
>>
>> This change affects the code path in (1) (processDeviceDeletedEvent
>> into qemuDomainRemoveDevice, all the way back to
>> qemuHostdevReAttachPCIDevices) and also in (2) (qemuProcessStop into
>> qemuHostdevReAttachDomainDevices).
>>
>> NB: Although this patch enables the possibility of 'outside Libvirt'
>> SR-IOV hotunplug of PCI devices, if the hypervisor and the PCI driver
>> cope with it, our goal is to mitigate what is still considered an
>> user oopsie. For all
>
> This is one of those odd cases where you use "a" before a word
> starting with a vowel rather than "an". It's that way because the
> *sound* starting the next word is "you" rather than "ooh".

Oh yeah, thanks for pointing this out. I learned this 20+ years ago and
forgot it along the road. Heavy metal lyrics and World of Warcraft
public chat aren't the best sources for learning proper English grammar,
so it seems.

>> supported purposes, the admin must remove the SR-IOV VFs from all
>> running domains before removing the VFs from the host.
>>
>> Resolves: https://gitlab.com/libvirt/libvirt/-/issues/72
>> Signed-off-by: Daniel Henrique Barboza
>> ---
>>  src/hypervisor/virhostdev.c | 38 +
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
>> index c708791eec..ed43733e71 100644
>> --- a/src/hypervisor/virhostdev.c
>> +++ b/src/hypervisor/virhostdev.c
>> @@ -1077,6 +1077,40 @@ virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr,
>>  }
>>
>> +static void
>> +virHostdevDeleteMissingPCIDevices(virHostdevManagerPtr mgr,
>> +                                  virDomainHostdevDefPtr *hostdevs,
>> +                                  int nhostdevs)
>> +{
>> +    size_t i;
>> +
>> +    if (nhostdevs == 0)
>> +        return;
>> +
>> +    virObjectLock(mgr->activePCIHostdevs);
>> +    virObjectLock(mgr->inactivePCIHostdevs);
>> +
>> +    for (i = 0; i < nhostdevs; i++) {
>> +        virDomainHostdevDef *hostdev = hostdevs[i];
>> +        virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
>> +        g_autoptr(virPCIDevice) pci = NULL;
>> +
>> +        if (virHostdevGetPCIHostDevice(hostdev, &pci) != -2)
>> +            continue;
>> +
>> +        /* The PCI device from 'hostdev' does not exist in the host
>> +         * anymore. Delete it from both active and inactive lists to
>> +         * reflect the current host state.
>> +         */
>> +        virPCIDeviceListDel(mgr->activePCIHostdevs, &pcisrc->addr);
>> +        virPCIDeviceListDel(mgr->inactivePCIHostdevs, &pcisrc->addr);
>> +    }
>> +
>> +    virObjectUnlock(mgr->activePCIHostdevs);
>> +    virObjectUnlock(mgr->inactivePCIHostdevs);
>
> It makes me nervous that you're unlocking these lists in the same order
> that you lock them, since that could theoretically lead to a deadlock.
> I haven't even thought beyond this single function to figure out if
> it's even a possibility in this case, but still would feel better if
> you unlocked in the opposite order that you lock.

That was a huge oversight. I've fixed it to unlock in the reverse order.
Re: [PATCH RESEND 20/20] virhostdev.c: remove missing PCI devs from hostdev manager
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote:
> virHostdevReAttachPCIDevices() is called when we want to re-attach a
> list of hostdevs back to the host, either on the shutdown path or via
> a 'virsh detach-device' call. This function always counts on the
> existence of the device in the host to work, but this can lead to
> problems.
>
> For example, a SR-IOV device can be removed via an admin
> "echo 0 > /sys/bus/pci/devices//sriov_numvfs", making the kernel fire
> up an eventfd_signal() to the process, asking for the process to
> release the device. The result might vary depending on the device
> driver and OS/arch, but two possible outcomes are:
>
> 1) the hypervisor driver will detach the device from the VM, issuing a
> delete event to Libvirt. This can be observed in QEMU;
>
> 2) the 'echo 0 > ...' will hang waiting for the device to be
> unplugged. This means that the VM process failed/refused to release
> the hostdev back to the host, and the hostdev will be detached during
> VM shutdown.
>
> Today we don't behave well in either case. We'll fail to remove the
> PCI device reference from mgr->activePCIHostdevs and
> mgr->inactivePCIHostdevs because we rely on the existence of the PCI
> device conf file in the sysfs. Attempting to re-utilize the same
> device (assuming it is now present back in the host) can result in an
> error like this:
>
> $ ./run tools/virsh start vm1-sriov --console
> error: Failed to start domain vm1-sriov
> error: Requested operation is not valid: PCI device :01:00.2 is in use
> by driver QEMU, domain vm1-sriov
>
> For (1), a VM destroy/start cycle is needed to re-use the VF in the
> guest. For (2), the effect is more nefarious, requiring a Libvirtd
> daemon restart to use the VF again in any guest.
>
> We can make it a bit better by checking, during
> virHostdevReAttachPCIDevices(), if there is any missing PCI device
> that will be left behind in the activePCIHostdevs and
> inactivePCIHostdevs lists. Remove any missing device found from both
> lists, unconditionally, matching the current state of the host.
>
> This change affects the code path in (1) (processDeviceDeletedEvent
> into qemuDomainRemoveDevice, all the way back to
> qemuHostdevReAttachPCIDevices) and also in (2) (qemuProcessStop into
> qemuHostdevReAttachDomainDevices).
>
> NB: Although this patch enables the possibility of 'outside Libvirt'
> SR-IOV hotunplug of PCI devices, if the hypervisor and the PCI driver
> cope with it, our goal is to mitigate what is still considered an
> user oopsie. For all

This is one of those odd cases where you use "a" before a word starting
with a vowel rather than "an". It's that way because the *sound*
starting the next word is "you" rather than "ooh".

> supported purposes, the admin must remove the SR-IOV VFs from all
> running domains before removing the VFs from the host.
>
> Resolves: https://gitlab.com/libvirt/libvirt/-/issues/72
> Signed-off-by: Daniel Henrique Barboza
> ---
>  src/hypervisor/virhostdev.c | 38 +
>  1 file changed, 38 insertions(+)
>
> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
> index c708791eec..ed43733e71 100644
> --- a/src/hypervisor/virhostdev.c
> +++ b/src/hypervisor/virhostdev.c
> @@ -1077,6 +1077,40 @@ virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr,
>  }
>
> +static void
> +virHostdevDeleteMissingPCIDevices(virHostdevManagerPtr mgr,
> +                                  virDomainHostdevDefPtr *hostdevs,
> +                                  int nhostdevs)
> +{
> +    size_t i;
> +
> +    if (nhostdevs == 0)
> +        return;
> +
> +    virObjectLock(mgr->activePCIHostdevs);
> +    virObjectLock(mgr->inactivePCIHostdevs);
> +
> +    for (i = 0; i < nhostdevs; i++) {
> +        virDomainHostdevDef *hostdev = hostdevs[i];
> +        virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
> +        g_autoptr(virPCIDevice) pci = NULL;
> +
> +        if (virHostdevGetPCIHostDevice(hostdev, &pci) != -2)
> +            continue;
> +
> +        /* The PCI device from 'hostdev' does not exist in the host
> +         * anymore. Delete it from both active and inactive lists to
> +         * reflect the current host state.
> +         */
> +        virPCIDeviceListDel(mgr->activePCIHostdevs, &pcisrc->addr);
> +        virPCIDeviceListDel(mgr->inactivePCIHostdevs, &pcisrc->addr);
> +    }
> +
> +    virObjectUnlock(mgr->activePCIHostdevs);
> +    virObjectUnlock(mgr->inactivePCIHostdevs);

It makes me nervous that you're unlocking these lists in the same order
that you lock them, since that could theoretically lead to a deadlock. I
haven't even thought beyond this single function to figure out if it's
even a possibility in this case, but still would feel better if you
unlocked in the opposite order that you lock.

With that fixed:

Reviewed-by: Laine Stump

> +}
> +
> +
>  /* @oldStateDir:
>   * For upgrade purpose: see virHostdevRestoreNetConfig
>   */
> @@ -1102,6 +1136,10 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
>      virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name,
>                                       pcidevs, hostdevs,
[PATCH RESEND 20/20] virhostdev.c: remove missing PCI devs from hostdev manager
virHostdevReAttachPCIDevices() is called when we want to re-attach a
list of hostdevs back to the host, either on the shutdown path or via a
'virsh detach-device' call. This function always counts on the existence
of the device in the host to work, but this can lead to problems.

For example, a SR-IOV device can be removed via an admin
"echo 0 > /sys/bus/pci/devices//sriov_numvfs", making the kernel fire up
an eventfd_signal() to the process, asking for the process to release
the device. The result might vary depending on the device driver and
OS/arch, but two possible outcomes are:

1) the hypervisor driver will detach the device from the VM, issuing a
delete event to Libvirt. This can be observed in QEMU;

2) the 'echo 0 > ...' will hang waiting for the device to be unplugged.
This means that the VM process failed/refused to release the hostdev
back to the host, and the hostdev will be detached during VM shutdown.

Today we don't behave well in either case. We'll fail to remove the PCI
device reference from mgr->activePCIHostdevs and
mgr->inactivePCIHostdevs because we rely on the existence of the PCI
device conf file in the sysfs. Attempting to re-utilize the same device
(assuming it is now present back in the host) can result in an error
like this:

$ ./run tools/virsh start vm1-sriov --console
error: Failed to start domain vm1-sriov
error: Requested operation is not valid: PCI device :01:00.2 is in use
by driver QEMU, domain vm1-sriov

For (1), a VM destroy/start cycle is needed to re-use the VF in the
guest. For (2), the effect is more nefarious, requiring a Libvirtd
daemon restart to use the VF again in any guest.

We can make it a bit better by checking, during
virHostdevReAttachPCIDevices(), if there is any missing PCI device that
will be left behind in the activePCIHostdevs and inactivePCIHostdevs
lists. Remove any missing device found from both lists, unconditionally,
matching the current state of the host.

This change affects the code path in (1) (processDeviceDeletedEvent into
qemuDomainRemoveDevice, all the way back to
qemuHostdevReAttachPCIDevices) and also in (2) (qemuProcessStop into
qemuHostdevReAttachDomainDevices).

NB: Although this patch enables the possibility of 'outside Libvirt'
SR-IOV hotunplug of PCI devices, if the hypervisor and the PCI driver
cope with it, our goal is to mitigate what is still considered an user
oopsie. For all supported purposes, the admin must remove the SR-IOV VFs
from all running domains before removing the VFs from the host.

Resolves: https://gitlab.com/libvirt/libvirt/-/issues/72
Signed-off-by: Daniel Henrique Barboza
---
 src/hypervisor/virhostdev.c | 38 +
 1 file changed, 38 insertions(+)

diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index c708791eec..ed43733e71 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -1077,6 +1077,40 @@ virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr,
 }

+static void
+virHostdevDeleteMissingPCIDevices(virHostdevManagerPtr mgr,
+                                  virDomainHostdevDefPtr *hostdevs,
+                                  int nhostdevs)
+{
+    size_t i;
+
+    if (nhostdevs == 0)
+        return;
+
+    virObjectLock(mgr->activePCIHostdevs);
+    virObjectLock(mgr->inactivePCIHostdevs);
+
+    for (i = 0; i < nhostdevs; i++) {
+        virDomainHostdevDef *hostdev = hostdevs[i];
+        virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
+        g_autoptr(virPCIDevice) pci = NULL;
+
+        if (virHostdevGetPCIHostDevice(hostdev, &pci) != -2)
+            continue;
+
+        /* The PCI device from 'hostdev' does not exist in the host
+         * anymore. Delete it from both active and inactive lists to
+         * reflect the current host state.
+         */
+        virPCIDeviceListDel(mgr->activePCIHostdevs, &pcisrc->addr);
+        virPCIDeviceListDel(mgr->inactivePCIHostdevs, &pcisrc->addr);
+    }
+
+    virObjectUnlock(mgr->activePCIHostdevs);
+    virObjectUnlock(mgr->inactivePCIHostdevs);
+}
+
+
 /* @oldStateDir:
  * For upgrade purpose: see virHostdevRestoreNetConfig
  */
@@ -1102,6 +1136,10 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
     virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name,
                                      pcidevs, hostdevs,
                                      nhostdevs, oldStateDir);
+
+    /* Handle the case where PCI devices from the host went missing
+     * during the domain lifetime */
+    virHostdevDeleteMissingPCIDevices(mgr, hostdevs, nhostdevs);
 }
-- 
2.26.2