On 27/10/2020 14:03, Ilya Maximets wrote: > On 10/20/20 1:15 PM, Kevin Traynor wrote: >> On 16/09/2020 16:17, Gaetan Rivet wrote: >>> In some cloud topologies, using DPDK VF representors in guest requires >>> configuring a VF before it is assigned to the guest. >>> >>> A first basic option for such configuration is setting the VF MAC >>> address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface >>> table. >>> >>> This option can be used as such: >>> >>> $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \ >>> options:dpdk-vf-mac=00:11:22:33:44:55 >>> >> >> If I got it right, it doesn't seem like it solves any issues by limiting >> to representor, but is just an attempt to limit some of the edge cases >> around bifurcated devices to the devices where the requested >> functionality is really needed now. >> >> This limitation has some costs... >> >> We will now have, mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, >> requested_hwaddr for the user to understand. I think it can be confusing. > > To be clear, 'configured_hwaddr/requested_hwaddr' type of report should not > exist at least in the output of 'get_config()' function. Same applies > to all existing '{configured,requested}_something' reports. > 'get_config()' intended to report things that could be copied and passed > directly > to database. OTOH, 'get_status()' should report what is an actual device > status, i.e. 'dpdk-vf-mac' should be in 'get_config()' and 'configured_hwaddr' > should be in 'get_status()', but it should be named differently, e.g. just > 'dpdk-vf-mac' and the callback it placed in already says that it's actually > 'configured'. In short: > - 'dpdk-vf-mac' from the 'get_config()' should report dev->requested_hwaddr. > - 'dpdk-vf-mac' from the 'get_status()' should report dev->hwaddr.
If an invalid mac is attempted or it fails to be set, this is what the user will see for dpdk-vf-mac: ovs-appctl dpctl/show -> dpdk-vf-mac=00:11:22:33:44:55 ovs-vsctl show -> dpdk-vf-mac=33:11:22:33:44:55 ovs-vsctl get Interface myport status -> dpdk-vf-mac=00:11:22:33:44:55 For completeness, i'll add the failed attempt is logged clearly. I suppose we can tell the user to check the 'mac_in_use', but multiple dpdk-vf-mac entries for the same device with different values is confusing. requested_dpdk-vf-mac configured_dpdk-vf-mac from 'ovs-appctl dpctl/show' would be explicit. At a minimum we should document which one is requested and which one is configured. > Of course, all this should be reported only if device is a representor. > > >> >> 'ovs-vsctl show' will show the dpdk-vf-mac, which is the main one we >> document, but it's not clear if it was successful or not. (yes there is >> a log entry, but it is a separate place). >> >> As it is specific for representors, if we ever want to allow setting of >> mac on any non representor DPDK ports, we have an awkward user >> interface. We would need to make another 'dpdk-mac' type option, or we >> decide then to allow use mac in interface table but that it doesn't work >> for representors, or there are two ways to set for representors etc. >> None of this seems great. >> >> My feeling is that it is making things complicated for the user with >> this many knobs, where we could just have mac and mac_in_use, live with >> the edge cases (as Gaetan pointed out, we already do for MTU) and we >> know as well if we ever need to extend further, the user interface will >> stay simple. Just my 2C, maybe there's a good argument why we can't do >> it like this. > > Setting mac for physical device has lots of split-brain issues and unclear > ways how and when undo changes as we do not know what was original mac or > what to do if there was no mac at all at the start. Moreover, some devices > will not allow setting mac to it's original value (e.g. all zeroes). > One more point is that vf and representor itself are different devices and > these devices could have different mac addresses. For now we agreed that > DPDK api works for representors in a way that it always changes parameters > of a virtual function, not representors', but that is not a very clear thing > for all users (e.g. for DPDK developers), so 'vf' in 'dpdk-vf-mac' > specifically > points to that difference. > That's a good point, it does distinguish and iirc some people made the point that this could be important. > After a long discussion we decided to make this option as specific as > possible, > so users will not try to use it to configure anything else beside mac of a > virtual function. All restrictions and downsides of configuration of vf mac > was/should be documented. > ok, I can see the pros of this type of design too so I won't object, but agree the downsides like the additional fields and ability for user to understand clearly the output from commands above should be documented. > >> >> Few comments on the code as it is now below. >> >> >>> Signed-off-by: Gaetan Rivet <gr...@u256.net> >>> Suggested-by: Ilya Maximets <i.maxim...@ovn.org> >>> Acked-by: Eli Britstein <el...@nvidia.com> >>> --- >>> Documentation/topics/dpdk/phy.rst | 22 ++++++++++++ >>> NEWS | 2 ++ >>> lib/netdev-dpdk.c | 60 +++++++++++++++++++++++++++++++ >>> vswitchd/vswitch.xml | 13 +++++++ >>> 4 files changed, 97 insertions(+) >>> >>> diff --git a/Documentation/topics/dpdk/phy.rst >>> b/Documentation/topics/dpdk/phy.rst >>> index 38e52c8de..73dcb7c28 100644 >>> --- a/Documentation/topics/dpdk/phy.rst >>> +++ b/Documentation/topics/dpdk/phy.rst >>> @@ -379,6 +379,28 @@ an eth device whose mac address is >>> ``00:11:22:33:44:55``:: >>> $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk \ >>> options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55" >>> >>> +Representor specific configuration >>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> + >>> +In some topologies, a VF must be configured before being assigned to a >>> +guest (VM) machine. This configuration is done through VF-specific fields >>> +in the ``options`` column of the Interface table. >>> + >>> +.. important:: >>> + >>> + Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__, >>> + which means that a kernel netdevice remains when Open vSwitch is >>> stopped. >>> + >>> + In such case, any configuration applied to a VF would remain set on the >>> + kernel netdevice, and be inherited from it when Open vSwitch is >>> restarted, >>> + even if the options described in this section are unset from Open >>> vSwitch. >>> + >>> +.. _bifurcated-drivers: >>> http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver >>> + >>> +- Configure the VF MAC address:: >>> + >>> + $ ovs-vsctl set Interface dpdk-rep0 >>> options:dpdk-vf-mac=00:11:22:33:44:55 >>> + >> >> As mentioned above, if for some reason setting the mac fails there will >> be warning in the logs, but 'ovs-vsctl show' will still show the >> dpdk-vf-mac that was not applied and it's quite prominent. Also, there >> are mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, requested_hwaddr. >> >> Suggest to add a small summary here for each one as it relates >> representors and make clear to users, >> 1. how to find the mac the user requested >> 2. how to find the current mac >> >>> Jumbo Frames >>> ------------ >>> >>> diff --git a/NEWS b/NEWS >>> index 2f67d5047..5ece29474 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -24,6 +24,8 @@ v2.14.0 - 17 Aug 2020 >> >> 2.15....hopefully :-) >> >>> to list and change log levels in DPDK components. >>> * Vhost-user Dequeue zero-copy support is deprecated and will be >>> removed >>> in the next release. >>> + * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports, >>> + that allows configuring the MAC address of a VF representor. >>> - Linux datapath: >>> * Support for kernel versions up to 5.5.x. >>> - AF_XDP: >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >>> index 4ef7f78a6..060348369 100644 >>> --- a/lib/netdev-dpdk.c >>> +++ b/lib/netdev-dpdk.c >>> @@ -523,6 +523,9 @@ struct netdev_dpdk { >>> * otherwise interrupt mode is used. */ >>> bool requested_lsc_interrupt_mode; >>> bool lsc_interrupt_mode; >>> + >>> + /* VF configuration. */ >>> + struct eth_addr requested_hwaddr; >>> ); >>> >>> PADDED_MEMBERS(CACHE_LINE_SIZE, >>> @@ -1685,6 +1688,16 @@ out: >>> return ret; >>> } >>> >>> +static bool >>> +dpdk_port_is_representor(struct netdev_dpdk *dev) >>> + OVS_REQUIRES(dev->mutex) >>> +{ >>> + struct rte_eth_dev_info dev_info; >>> + >>> + rte_eth_dev_info_get(dev->port_id, &dev_info); >>> + return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR; >>> +} >>> + >>> static int >>> netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) >>> { >>> @@ -1719,6 +1732,13 @@ netdev_dpdk_get_config(const struct netdev *netdev, >>> struct smap *args) >>> } >>> smap_add(args, "lsc_interrupt_mode", >>> dev->lsc_interrupt_mode ? "true" : "false"); >>> + >>> + if (dpdk_port_is_representor(dev)) { >>> + smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT, >>> + ETH_ADDR_ARGS(dev->requested_hwaddr)); >>> + smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT, >>> + ETH_ADDR_ARGS(dev->hwaddr)); >> >> '*mac*' is used elsewhere (mac, mac_in_use, dpdk-vf-mac), think it's >> better to keep consistent > > Since this is 'config' it should report things in a way they could be passed > to the datbase. So, it should be 'dpdk-vf-mac' and it should report the > value that was requested, i.e. dev->requested_hwaddr. > >> >>> + } >>> } >>> ovs_mutex_unlock(&dev->mutex); >>> >>> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const >>> struct smap *args, >>> {RTE_FC_RX_PAUSE, RTE_FC_FULL } >>> }; >>> const char *new_devargs; >>> + const char *vf_mac; >>> int err = 0; >>> >>> ovs_mutex_lock(&dpdk_mutex); >>> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const >>> struct smap *args, >>> goto out; >>> } >>> >>> + vf_mac = smap_get(args, "dpdk-vf-mac"); >>> + if (vf_mac) { >>> + struct eth_addr mac; >>> + >>> + err = EINVAL; >>> + >>> + if (!dpdk_port_is_representor(dev)) { >>> + VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' " >>> + "but 'options:dpdk-vf-mac' is only supported for " >>> + "VF representors.", >>> + netdev_get_name(netdev), vf_mac); >> >> Warnings here seems more in line with the other configs that cannot be >> applied. > > I agree, if it's not a representor we could just issue a warning and continue. > >> >>> + goto out; >>> + } >>> + if (!eth_addr_from_string(vf_mac, &mac)) { >>> + VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'", >>> + netdev_get_name(netdev), vf_mac); >> >> as above >> >>> + goto out; >>> + } >>> + if (eth_addr_is_multicast(mac)) { >>> + VLOG_ERR_BUF(errp, >>> + "interface '%s': cannot set VF MAC to multicast " >>> + "address", netdev_get_name(netdev)); >> >> as above >> >>> + goto out; >>> + } >>> + if (!eth_addr_equals(dev->requested_hwaddr, mac)) { >>> + dev->requested_hwaddr = mac; >>> + netdev_request_reconfigure(netdev); >>> + } >>> + } >>> + >>> lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false); >>> if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) { >>> dev->requested_lsc_interrupt_mode = lsc_interrupt_mode; >>> @@ -4938,6 +4989,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >>> && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode >>> && dev->rxq_size == dev->requested_rxq_size >>> && dev->txq_size == dev->requested_txq_size >>> + && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr) >>> && dev->socket_id == dev->requested_socket_id >>> && dev->started && !dev->reset_needed) { >>> /* Reconfiguration is unnecessary */ >>> @@ -4969,6 +5021,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >>> dev->txq_size = dev->requested_txq_size; >>> >>> rte_free(dev->tx_q); >>> + >>> + if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) { >> >> Looks like if dpdk-vf-mac is not set, after initialization there will be >> a configured hwaddr, and requested_hwaddr will be all zeros. >> >> If this function is called because of some other config change, then >> hwaddr != requested_hwaddr, and it will attempt to set the mac to the >> requested_hwaddr of all zeros. This may also fail causing the >> reconfigure to fail for other configs items. >> >> (I admit, I hacked a phy port dev_info to say it was a representor, when >> playing with this but the issue seems valid) >> >> >>> + err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr); >>> + if (err) { >>> + goto out; >>> + } >>> + } >>> + >>> err = dpdk_eth_dev_init(dev); >>> if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { >>> netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; >>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml >>> index 81c84927f..c0bf03c95 100644 >>> --- a/vswitchd/vswitch.xml >>> +++ b/vswitchd/vswitch.xml >>> @@ -3280,6 +3280,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 >>> type=patch options:peer=p1 \ >>> descriptors will be used by default. >>> </p> >>> </column> >>> + >>> + <column name="options" key="dpdk-vf-mac"> >>> + <p> >>> + Ethernet address to set for this VF interface. If unset then the >>> + default MAC address is used: >>> + </p> >>> + <ul> >>> + <li>For most drivers, the default MAC address assigned by >>> + their hardware.</li><li>For bifurcated drivers, the MAC currently >>> + used by the kernel netdevice.</li> >> >> nit: the formatting is a little different compared to the other >> <li></li> usage, newline/indent etc. >> >> thanks, >> Kevin. >> >>> + </ul> >>> + <p>This option may only be used with dpdk VF representors.</p> >>> + </column> >>> </group> >>> >>> <group title="EMC (Exact Match Cache) Configuration"> >>> >> >> >> >> >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev