On a Tuesday in 2021, Laine Stump wrote:
Commit 795e9e05c3 (libvirt-7.7.0) refactored the code in virpci.c and
virnetdev.c that gathered lists of the Virtual Functions (VF) of an
SRIOV Physical Function (PF) to simplify the code.

Unfortunately the simplification made the assumption that a VF's
netdev interface name should only be retrieved if the PF had a valid
phys_port_id. That is an incorrect assumption - only a small handful
of (now previous-generation) Mellanox SRIOV cards actually use
phys_port_id (this is for an odd design where there are multiple
physical network ports on a single PCI address); all other SRIOV cards
(including new Mellanox cards) have a file in sysfs called
phys_port_id, but it can't be read, and so the pfPhysPortID string is
NULL.

The result of this logic error is that virtual networks that are a
pool of VFs to be used for macvtap connections will be unable to
start, giving an errror like this:

VF 0 of SRIOV PF enp130s0f0 couldn't be added to the interface pool because it 
isn't bound to a network driver - possibly in use elsewhere

The simple/quick solution to this is to not imply that "pfPhysPortID
== NULL" is the same as "don't fill in the VF's netdev
ifname". Instead, add a bool getIfName to the args of
virNetDevGetVirtualFunctionsFull() so that we can still get the ifname
filled in when pfPhysPortID is NULL.

Resolves: https://bugzilla.redhat.com/2025432
Fixes: 795e9e05c3b6b9ef3abe6f6078a6373a136ec23b
Signed-off-by: Laine Stump <la...@redhat.com>
---

On one hand this is a regression with an apparently simple fix (that
has been tested to solve the problem), and it's always good to fix
regressions before a release rather than after. On the other hand it
has been broken since August, and nobody complained until last week
(and that was a QE tester, not an actual end-user), so it seems the
bug is in functionality that isn't used much in the field (or at least
no downstream with a used of the functionality has made a release
since then that was installed by said user).

I've grown increasingly wary of pushing anything just before a
release, especially when it modifies the args of a function that has
multiple definitions for different platforms (although CI is supposed
to be thorough enough to catch those types of problems these days). So
I'm all for pushing this right after the release, rather than
before. But if anyone sees a reason for doing otherwise, we can talk
about it in about 10 hours when I sit down at the keyboard again :-).


I agree, pushing after the relase seems safer ;)

P.S. I'm planning to make a followup that eliminates the pfPhysPortID
arg completely, but wanted the bugfix to be as stripped down as
possible.

src/util/virnetdev.c |  2 +-
src/util/virpci.c    | 20 ++++++++++++--------
src/util/virpci.h    |  1 +
3 files changed, 14 insertions(+), 9 deletions(-)


Reviewed-by: Ján Tomko <jto...@redhat.com>

Jano

Attachment: signature.asc
Description: PGP signature

Reply via email to