Re: [PATCH 5/7] conf: parse/format <teaming> element in plain <hostdev>
On 2/11/21 8:57 AM, Laine Stump wrote:

The <teaming> element in <interface> allows pairing two interfaces together as a simple "failover bond" network device in a guest. One of the devices will be the "transient" interface - it will be preferred for all network traffic when it is present, but may be removed when necessary, in particular during migration, when traffic will instead go through the other interface of the pair - the "persistent" interface. As it happens, in the QEMU implementation of this teaming pair (called "virtio failover" in QEMU) the transient interface is always a host network device assigned to the guest using VFIO (aka "hostdev"); the persistent interface is always an emulated virtio NIC.

When support was initially added for <teaming>, it was written to require that the transient/hostdev device be defined using <interface type='hostdev'>; this was done because the virtio failover implementation in QEMU and the virtio guest driver demand that the two interfaces in the pair have matching MAC addresses, and the only way libvirt can guarantee the MAC address of a hostdev network device is to use <interface type='hostdev'>, whose main purpose is to configure the device's MAC address. (Note that <interface type='hostdev'> in turn requires that the network device be an SRIOV VF (Virtual Function), as that is the only type of network device whose MAC address we can set in a way that will survive the device's driver init in the guest.)

It has recently come up that some users are unable to use <interface type='hostdev'> because they are running in a container environment where libvirt doesn't have the necessary privileges or resources to set the VF's MAC address (because setting the VF MAC is done via the same device's PF (Physical Function), and the PF is not exposed to libvirt's container). At the same time, they *are* able to set the VF's MAC address in advance of starting up libvirt in the container. So they could theoretically use the <teaming> feature if libvirt just skipped the "setting the MAC address" part.
Fortunately, that is *exactly* the difference between <interface type='hostdev'> (a "hostdev VF") and <hostdev> (a "plain hostdev" - it could be *any* PCI device; libvirt doesn't know what type of PCI device it is, and doesn't care). But what *is* still needed is for libvirt to provide a small bit of information on the commandline argument for the hostdev, telling QEMU that this device will be part of a team ("failover pair"), and the id of the other device in the pair.

So, what we need to do is add support for the <teaming> element to plain <hostdev>, and that is what this patch does. (Actually, this patch adds parsing/formatting of the <teaming> element in <hostdev>. The next patch will actually wire that into the qemu driver.)

Signed-off-by: Laine Stump
---
 docs/formatdomain.rst                 | 51 +++
 docs/schemas/domaincommon.rng         |  3 +
 src/conf/domain_conf.c                |  5 ++
 src/conf/domain_conf.h                |  1 +
 src/conf/domain_validate.c            | 19 ++
 .../net-virtio-teaming-hostdev.xml    | 48 ++
 .../net-virtio-teaming-hostdev.xml    | 64 +++
 tests/qemuxml2xmltest.c               |  3 +
 8 files changed, 194 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/net-virtio-teaming-hostdev.xml
 create mode 100644 tests/qemuxml2xmloutdata/net-virtio-teaming-hostdev.xml

There are only a few differences between these two files (mostly PCI addresses in the output XML), and none of them is related to this feature. Perhaps make one a symlink to the other and add those diffs to the input XML?

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 2493be595f..eafd6b3396 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -4837,6 +4837,22 @@ support in the hypervisor and the guest network driver).
 ...
+The second interface in this example is referencing a network that is
+a pool of SRIOV VFs (i.e. a "hostdev network"). You could instead
+directly reference an SRIOV VF device:
+
+::
+
+ ...
+
+
+
+
+
+
+
+ ...
+
 The <teaming> element required attribute ``type`` will be set to either ``"persistent"`` to indicate a device that should always be present in the domain, or ``"transient"`` to indicate a device that may periodically be
@@ -4858,6 +4874,41 @@ once migration is completed; while migration is taking place, network traffic will use the virtio NIC.
 (Of course the emulated virtio NIC and the hostdev NIC must be connected to the same subnet for bonding to work properly).
+:since:`Since 7.1.0` The <teaming> element can also be added to a
+plain <hostdev> device.
+
+::
+
+ ...
+
+
+
+
+
+
+
+ ...
+
+This device must be a network device, but not necessarily an SRIOV
+VF. Using plain <hostdev> rather than <interface type='hostdev'/> or <interface type='network'/> is useful if the
+device that will be assigned with VFIO is a standard NIC (not a VF) or
+if libvirt doesn't have the necessary resources and privileges to set
+the VF's MAC
[PATCH 5/7] conf: parse/format <teaming> element in plain <hostdev>
The <teaming> element in <interface> allows pairing two interfaces together as a simple "failover bond" network device in a guest. One of the devices will be the "transient" interface - it will be preferred for all network traffic when it is present, but may be removed when necessary, in particular during migration, when traffic will instead go through the other interface of the pair - the "persistent" interface. As it happens, in the QEMU implementation of this teaming pair (called "virtio failover" in QEMU) the transient interface is always a host network device assigned to the guest using VFIO (aka "hostdev"); the persistent interface is always an emulated virtio NIC.

When support was initially added for <teaming>, it was written to require that the transient/hostdev device be defined using <interface type='hostdev'>; this was done because the virtio failover implementation in QEMU and the virtio guest driver demand that the two interfaces in the pair have matching MAC addresses, and the only way libvirt can guarantee the MAC address of a hostdev network device is to use <interface type='hostdev'>, whose main purpose is to configure the device's MAC address. (Note that <interface type='hostdev'> in turn requires that the network device be an SRIOV VF (Virtual Function), as that is the only type of network device whose MAC address we can set in a way that will survive the device's driver init in the guest.)

It has recently come up that some users are unable to use <interface type='hostdev'> because they are running in a container environment where libvirt doesn't have the necessary privileges or resources to set the VF's MAC address (because setting the VF MAC is done via the same device's PF (Physical Function), and the PF is not exposed to libvirt's container). At the same time, they *are* able to set the VF's MAC address in advance of starting up libvirt in the container. So they could theoretically use the <teaming> feature if libvirt just skipped the "setting the MAC address" part.
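
For orientation, the kind of pairing discussed above might look roughly like the following domain XML once a plain hostdev can carry the teaming information. This is only a sketch: the element and attribute names follow libvirt's documented domain XML, while the network name, MAC address, PCI address, and alias are hypothetical placeholders.

```xml
<devices>
  <!-- persistent half: emulated virtio NIC, always present in the guest -->
  <interface type='network'>
    <source network='default'/>                 <!-- hypothetical network name -->
    <mac address='52:54:00:11:22:33'/>          <!-- hypothetical MAC -->
    <model type='virtio'/>
    <teaming type='persistent'/>
    <alias name='ua-backup0'/>                  <!-- hypothetical user alias -->
  </interface>
  <!-- transient half: plain PCI hostdev; libvirt does not set its MAC,
       so the device must already carry the same MAC as the virtio NIC -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <!-- hypothetical PCI address of the assigned device -->
      <address domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </hostdev>
</devices>
```

The hostdev has no MAC element of its own, which is why whoever sets up the host (or the container's environment) has to configure the matching MAC before the guest starts.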
Fortunately, that is *exactly* the difference between <interface type='hostdev'> (a "hostdev VF") and <hostdev> (a "plain hostdev" - it could be *any* PCI device; libvirt doesn't know what type of PCI device it is, and doesn't care). But what *is* still needed is for libvirt to provide a small bit of information on the commandline argument for the hostdev, telling QEMU that this device will be part of a team ("failover pair"), and the id of the other device in the pair.

So, what we need to do is add support for the <teaming> element to plain <hostdev>, and that is what this patch does. (Actually, this patch adds parsing/formatting of the <teaming> element in <hostdev>. The next patch will actually wire that into the qemu driver.)

Signed-off-by: Laine Stump
---
 docs/formatdomain.rst                 | 51 +++
 docs/schemas/domaincommon.rng         |  3 +
 src/conf/domain_conf.c                |  5 ++
 src/conf/domain_conf.h                |  1 +
 src/conf/domain_validate.c            | 19 ++
 .../net-virtio-teaming-hostdev.xml    | 48 ++
 .../net-virtio-teaming-hostdev.xml    | 64 +++
 tests/qemuxml2xmltest.c               |  3 +
 8 files changed, 194 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/net-virtio-teaming-hostdev.xml
 create mode 100644 tests/qemuxml2xmloutdata/net-virtio-teaming-hostdev.xml

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 2493be595f..eafd6b3396 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -4837,6 +4837,22 @@ support in the hypervisor and the guest network driver).
 ...
+The second interface in this example is referencing a network that is
+a pool of SRIOV VFs (i.e. a "hostdev network"). You could instead
+directly reference an SRIOV VF device:
+
+::
+
+ ...
+
+
+
+
+
+
+
+ ...
+
 The <teaming> element required attribute ``type`` will be set to either ``"persistent"`` to indicate a device that should always be present in the domain, or ``"transient"`` to indicate a device that may periodically be
@@ -4858,6 +4874,41 @@ once migration is completed; while migration is taking place, network traffic will use the virtio NIC.
 (Of course the emulated virtio NIC and the hostdev NIC must be connected to the same subnet for bonding to work properly).
+:since:`Since 7.1.0` The <teaming> element can also be added to a
+plain <hostdev> device.
+
+::
+
+ ...
+
+
+
+
+
+
+
+ ...
+
+This device must be a network device, but not necessarily an SRIOV
+VF. Using plain <hostdev> rather than <interface type='hostdev'/> or <interface type='network'/> is useful if the
+device that will be assigned with VFIO is a standard NIC (not a VF) or
+if libvirt doesn't have the necessary resources and privileges to set
+the VF's MAC address (e.g. if libvirt is running unprivileged, or in a
+container). This of course means that the user (or another
+application) is responsible for setting the MAC address of the device
+in a way such that it will survive guest driver initialization. For
+standard NICs (i.e. not an SRIOV VF)