RE: [PATCH] util: Add phys_port_name support on virPCIGetNetName
>-Original Message- >From: Laine Stump >Sent: Thursday, December 17, 2020 5:22 AM >To: libvir-list@redhat.com >Cc: Dmytro Linkin ; Moshe Levi ; >Adrian Chiris >Subject: Re: [PATCH] util: Add phys_port_name support on >virPCIGetNetName > >External email: Use caution opening links or attachments > > >On 12/10/20 11:51 AM, Adrian Chiris wrote: > >> Hi, >> Would be great to get a pair of eyes on this Patch, Thanks! > > >I've looked at it several times and every time would just end up shaking my >head wondering why there isn't one definitive symlink in the VF's sysfs for the >netdev of the physical port. (Part of my lack of response from the last time is >that I didn't know how to respond since I didn't understand why such >roundabout logic should be needed to answer such a basic question, decided I >should look at it again before responding, and then forgot about it :-() > > >Anyway, this time I'm determined to not get up until I've got it figured out >(or >at least understand exactly what I don't have figured out)... > > > >>> -Original Message- >>> From: Dmytro Linkin >>> Sent: Tuesday, October 27, 2020 10:58 AM >>> To: libvir-list@redhat.com >>> Cc: Laine Stump ; Adrian Chiris >>> ; Moshe Levi >>> Subject: Re: [PATCH] util: Add phys_port_name support on >>> virPCIGetNetName >>> >>> On Mon, Sep 28, 2020 at 12:56:12PM +0300, Dmytro Linkin wrote: >>>> On Tue, Sep 22, 2020 at 08:31:15AM -0400, Laine Stump wrote: >>>>> On 8/28/20 6:53 AM, Dmytro Linkin wrote: >>>>>> Current virPCIGetNetName() logic is to get net device name by >>>>>> checking it's phys_port_id, if caller provide it, or by it's index >>>>>> (eg, by it's position at sysfs net directory). This approach >>>>>> worked fine up until linux kernel version 5.8, where NVIDIA >>>>>> Mellanox driver implemented linking of VFs' representors to PCI >>>>>> device in switchdev mode. This mean that device's sysfs net >>>>>> directory will hold >>> multiple net devices. Ex.: >>>>>> $ ls '/sys/bus/pci/devices/:82:00.0/net' >>>>>> ens1f0 eth0 eth1 >>>>>> >>>>>> Most switch devices support phys_port_name instead of >>>>>> phys_port_id, so >>>>>> virPCIGetNetName() will try to get PF name by it's index - 0. The >>>>>> problem here is that the PF nedev entry may not be the first. >>>>>> >>>>>> To fix that, for switch devices, we introduce a new logic to >>>>>> select the PF uplink netdev according to the content of >phys_port_name. >>>>>> Extend >>>>>> virPCIGetNetName() with physPortNameRegex variable to get proper >>>>>> device by it's phys_port_name scheme, for ex., "p[0-9]+$" to get >>>>>> PF, "pf[0-9]+vf[0-9]+$" to get VF or "p1$" to get exact net device. >>>>>> So now >>>>>> virPCIGetNetName() logic work in following sequence: >>>>>> - filter by phys_port_id, if it's provided, >>>>>> or >>>>>> - filter by phys_port_name, if it's regex provided, >>>>>> or >>>>>> - get net device by it's index (position) in sysfs net directory. >>>>>> Also, make getting content of iface sysfs files more generic. >>>>>> >>>>>> Signed-off-by: Dmytro Linkin >>>>>> Reviewed-by: Adrian Chiris >>>>> >>>>> [...] >>>>> >>>>> >>>>>> +/* Represents format of PF's phys_port_name in switchdev mode: >>>>>> + * 'p%u' or 'p%us%u'. New line checked since value is readed from >>>>>> +sysfs >>> file. >>>>>> + */ >>>>>> +# define VIR_PF_PHYS_PORT_NAME_REGEX ((char >>>>>> +*)"(p[0-9]+$)|(p[0-9]+s[0-9]+$)") >>>>>> + >>>>> >>>>> I've come back to look at this patch several times since it was >>>>> posted (sorry for the extreme delay in responding), but just can't >>>>> figure out what it's doing with this regex and why the regex is >>>>> necessary. Not having access to the hardware that it works with is >>>>> a bit of a problem, but perhaps I could get a better idea if you >>>>> gave a full example of sysfs contents? My concern wi
RE: [PATCH] util: Add phys_port_name support on virPCIGetNetName
>-Original Message- >From: Dmytro Linkin >Sent: Tuesday, October 27, 2020 10:58 AM >To: libvir-list@redhat.com >Cc: Laine Stump ; Adrian Chiris ; >Moshe Levi >Subject: Re: [PATCH] util: Add phys_port_name support on >virPCIGetNetName > >On Mon, Sep 28, 2020 at 12:56:12PM +0300, Dmytro Linkin wrote: >> On Tue, Sep 22, 2020 at 08:31:15AM -0400, Laine Stump wrote: >> > On 8/28/20 6:53 AM, Dmytro Linkin wrote: >> > >Current virPCIGetNetName() logic is to get net device name by >> > >checking it's phys_port_id, if caller provide it, or by it's index >> > >(eg, by it's position at sysfs net directory). This approach worked >> > >fine up until linux kernel version 5.8, where NVIDIA Mellanox >> > >driver implemented linking of VFs' representors to PCI device in >> > >switchdev mode. This mean that device's sysfs net directory will hold >multiple net devices. Ex.: >> > > >> > >$ ls '/sys/bus/pci/devices/:82:00.0/net' >> > >ens1f0 eth0 eth1 >> > > >> > >Most switch devices support phys_port_name instead of phys_port_id, >> > >so >> > >virPCIGetNetName() will try to get PF name by it's index - 0. The >> > >problem here is that the PF nedev entry may not be the first. >> > > >> > >To fix that, for switch devices, we introduce a new logic to select >> > >the PF uplink netdev according to the content of phys_port_name. >> > >Extend >> > >virPCIGetNetName() with physPortNameRegex variable to get proper >> > >device by it's phys_port_name scheme, for ex., "p[0-9]+$" to get >> > >PF, "pf[0-9]+vf[0-9]+$" to get VF or "p1$" to get exact net device. >> > >So now >> > >virPCIGetNetName() logic work in following sequence: >> > > - filter by phys_port_id, if it's provided, >> > > or >> > > - filter by phys_port_name, if it's regex provided, >> > > or >> > > - get net device by it's index (position) in sysfs net directory. >> > >Also, make getting content of iface sysfs files more generic. >> > > >> > >Signed-off-by: Dmytro Linkin >> > >Reviewed-by: Adrian Chiris >> > >> > >> > [...] >> > >> > >> > > >> > >+/* Represents format of PF's phys_port_name in switchdev mode: >> > >+ * 'p%u' or 'p%us%u'. New line checked since value is readed from sysfs >file. >> > >+ */ >> > >+# define VIR_PF_PHYS_PORT_NAME_REGEX ((char >> > >+*)"(p[0-9]+$)|(p[0-9]+s[0-9]+$)") >> > >+ >> > >> > >> > I've come back to look at this patch several times since it was >> > posted (sorry for the extreme delay in responding), but just can't >> > figure out what it's doing with this regex and why the regex is >> > necessary. Not having access to the hardware that it works with is a >> > bit of a problem, but perhaps I could get a better idea if you gave >> > a full example of sysfs contents? My concern with using a regex is >> > that it might work just fine when using one method for net device >> > naming, but break if that was changed. Also, it seems >> > counterintuitive that it would be necessary to look for a device >> > with a name matching a specific pattern; why isn't there simply a >> > single symbolic link somewhere in the sysfs tree for the net device >> > that just directly points at its physical port? That would be so >> > much simpler and more reliable (or at least it would give the >> > *perception* of being more reliable). >> > >> >> I'll emphasize that regex is being used for phys_port_name, NOT net >> device name (they are not the same). Anyway, I'll give an example. >> >> Lets look how virPCIGetNetName() currently work. >> At first it try to read phys_port_id of every net device in net >> folder, like: >> $ cd '/sys/bus/pci/devices/:80:02.0/:82:00.0/' >> $ cat net/ens1f0/phys_port_id >> >> Imagine we have single pci dual port nic (I hope named it right), eg. >> net folder holds net devices for both ports, so read operation will be >> successfull and function will return name of first or second port. >> For single port or dual pci nics (or for drivers which didn't >> implemented phys_port_id) read fails and function fallback to picking >> up device by it's index, which not really fine (I'll describe it >> later) but work 'cause net folder usualy contains only one net d