On a Tuesday in 2021, Laine Stump wrote:
Commit 795e9e05c3 (libvirt-7.7.0) refactored the code in virpci.c and virnetdev.c that gathered lists of the Virtual Functions (VF) of an SRIOV Physical Function (PF) to simplify the code.Unfortunately the simplification made the assumption that a VF's netdev interface name should only be retrieved if the PF had a valid phys_port_id. That is an incorrect assumption - only a small handful of (now previous-generation) Mellanox SRIOV cards actually use phys_port_id (this is for an odd design where there are multiple physical network ports on a single PCI address); all other SRIOV cards (including new Mellanox cards) have a file in sysfs called phys_port_id, but it can't be read, and so the pfPhysPortID string is NULL. The result of this logic error is that virtual networks that are a pool of VFs to be used for macvtap connections will be unable to start, giving an errror like this: VF 0 of SRIOV PF enp130s0f0 couldn't be added to the interface pool because it isn't bound to a network driver - possibly in use elsewhere The simple/quick solution to this is to not imply that "pfPhysPortID == NULL" is the same as "don't fill in the VF's netdev ifname". Instead, add a bool getIfName to the args of virNetDevGetVirtualFunctionsFull() so that we can still get the ifname filled in when pfPhysPortID is NULL. Resolves: https://bugzilla.redhat.com/2025432 Fixes: 795e9e05c3b6b9ef3abe6f6078a6373a136ec23b Signed-off-by: Laine Stump <la...@redhat.com> --- On one hand this is a regression with an apparently simple fix (that has been tested to solve the problem), and it's always good to fix regressions before a release rather than after. On the other hand it has been broken since August, and nobody complained until last week (and that was a QE tester, not an actual end-user), so it seems the bug is in functionality that isn't used much in the field (or at least no downstream with a used of the functionality has made a release since then that was installed by said user). I've grown increasingly wary of pushing anything just before a release, especially when it modifies the args of a function that has multiple definitions for different platforms (although CI is supposed to be thorough enough to catch those types of problems these days). So I'm all for pushing this right after the release, rather than before. But if anyone sees a reason for doing otherwise, we can talk about it in about 10 hours when I sit down at the keyboard again :-).
I agree, pushing after the relase seems safer ;)
P.S. I'm planning to make a followup that eliminates the pfPhysPortID arg completely, but wanted the bugfix to be as stripped down as possible. src/util/virnetdev.c | 2 +- src/util/virpci.c | 20 ++++++++++++-------- src/util/virpci.h | 1 + 3 files changed, 14 insertions(+), 9 deletions(-)
Reviewed-by: Ján Tomko <jto...@redhat.com> Jano
signature.asc
Description: PGP signature