Hi Ilya, Kevin,

Thanks for the remarks, I will fix them.
However I have two questions inline.

On 27/10/20 15:03 +0100, Ilya Maximets wrote:
> On 10/20/20 1:15 PM, Kevin Traynor wrote:
> > On 16/09/2020 16:17, Gaetan Rivet wrote:
> >> In some cloud topologies, using DPDK VF representors in guest requires
> >> configuring a VF before it is assigned to the guest.
> >>
> >> A first basic option for such configuration is setting the VF MAC
> >> address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
> >> table.
> >>
> >> This option can be used as such:
> >>
> >>    $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk 
> >> \
> >>       options:dpdk-vf-mac=00:11:22:33:44:55
> >>
> > 
> > If I got it right, it doesn't seem like it solves any issues by limiting
> > to representor, but is just an attempt to limit some of the edge cases
> > around bifurcated devices to the devices where the requested
> > functionality is really needed now.
> > 
> > This limitation has some costs...
> > 
> > We will now have, mac, mac_in_use, dpdk-vf-mac, configured_hwaddr,
> > requested_hwaddr for the user to understand. I think it can be confusing.
> 
> To be clear, 'configured_hwaddr/requested_hwaddr' type of report should not
> exist at least in the output of 'get_config()' function.  Same applies
> to all existing '{configured,requested}_something' reports.
> 'get_config()' intended to report things that could be copied and passed 
> directly
> to database.  OTOH, 'get_status()' should report what is an actual device
> status, i.e. 'dpdk-vf-mac' should be in 'get_config()' and 'configured_hwaddr'
> should be in 'get_status()', but it should be named differently, e.g. just
> 'dpdk-vf-mac' and the callback it placed in already says that it's actually
> 'configured'.  In short:
>   - 'dpdk-vf-mac' from the 'get_config()' should report dev->requested_hwaddr.
>   - 'dpdk-vf-mac' from the 'get_status()' should report dev->hwaddr.
> Of course, all this should be reported only if device is a representor.
> 
> 

Indeed I had followed the other '{configured|requested}_' fields.

This is not related to this patchset, however I'd like to know how to
proceed to fix those other fields? They were exposed to users possibly,
should there be a deprecation process to warn of the change, or should
they just be fixed directly as the get_config() API was not respected?

> > 
> > 'ovs-vsctl show' will show the dpdk-vf-mac, which is the main one we
> > document, but it's not clear if it was successful or not. (yes there is
> > a log entry, but it is a separate place).
> > 
> > As it is specific for representors, if we ever want to allow setting of
> > mac on any non representor DPDK ports, we have an awkward user
> > interface. We would need to make another 'dpdk-mac' type option, or we
> > decide then to allow use mac in interface table but that it doesn't work
> > for representors, or there are two ways to set for representors etc.
> > None of this seems great.
> > 
> > My feeling is that it is making things complicated for the user with
> > this many knobs, where we could just have mac and mac_in_use, live with
> > the edge cases (as Gaetan pointed out, we already do for MTU) and we
> > know as well if we ever need to extend further, the user interface will
> > stay simple. Just my 2C, maybe there's a good argument why we can't do
> > it like this.
> 
> Setting mac for physical device has lots of split-brain issues and unclear
> ways how and when undo changes as we do not know what was original mac or
> what to do if there was no mac at all at the start.  Moreover, some devices
> will not allow setting mac to it's original value (e.g. all zeroes).
> One more point is that vf and representor itself are different devices and
> these devices could have different mac addresses.  For now we agreed that
> DPDK api works for representors in a way that it always changes parameters
> of a virtual function, not representors', but that is not a very clear thing
> for all users (e.g. for DPDK developers), so 'vf' in 'dpdk-vf-mac' 
> specifically
> points to that difference.
> 
> After a long discussion we decided to make this option as specific as 
> possible,
> so users will not try to use it to configure anything else beside mac of a
> virtual function.  All restrictions and downsides of configuration of vf mac
> was/should be documented.
> 
> 
> > 
> > Few comments on the code as it is now below.
> > 
> > 
> >> Signed-off-by: Gaetan Rivet <gr...@u256.net>
> >> Suggested-by: Ilya Maximets <i.maxim...@ovn.org>
> >> Acked-by: Eli Britstein <el...@nvidia.com>
> >> ---
> >>  Documentation/topics/dpdk/phy.rst | 22 ++++++++++++
> >>  NEWS                              |  2 ++
> >>  lib/netdev-dpdk.c                 | 60 +++++++++++++++++++++++++++++++
> >>  vswitchd/vswitch.xml              | 13 +++++++
> >>  4 files changed, 97 insertions(+)
> >>
> >> diff --git a/Documentation/topics/dpdk/phy.rst 
> >> b/Documentation/topics/dpdk/phy.rst
> >> index 38e52c8de..73dcb7c28 100644
> >> --- a/Documentation/topics/dpdk/phy.rst
> >> +++ b/Documentation/topics/dpdk/phy.rst
> >> @@ -379,6 +379,28 @@ an eth device whose mac address is 
> >> ``00:11:22:33:44:55``::
> >>      $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk 
> >> \
> >>         options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
> >>  
> >> +Representor specific configuration
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +In some topologies, a VF must be configured before being assigned to a
> >> +guest (VM) machine.  This configuration is done through VF-specific fields
> >> +in the ``options`` column of the Interface table.
> >> +
> >> +.. important::
> >> +
> >> +   Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__,
> >> +   which means that a kernel netdevice remains when Open vSwitch is 
> >> stopped.
> >> +
> >> +   In such case, any configuration applied to a VF would remain set on the
> >> +   kernel netdevice, and be inherited from it when Open vSwitch is 
> >> restarted,
> >> +   even if the options described in this section are unset from Open 
> >> vSwitch.
> >> +
> >> +.. _bifurcated-drivers: 
> >> http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver
> >> +
> >> +- Configure the VF MAC address::
> >> +
> >> +    $ ovs-vsctl set Interface dpdk-rep0 
> >> options:dpdk-vf-mac=00:11:22:33:44:55
> >> +
> > 
> > As mentioned above, if for some reason setting the mac fails there will
> > be warning in the logs, but 'ovs-vsctl show' will still show the
> > dpdk-vf-mac that was not applied and it's quite prominent. Also, there
> > are mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, requested_hwaddr.
> > 
> > Suggest to add a small summary here for each one as it relates
> > representors and make clear to users,
> > 1. how to find the mac the user requested
> > 2. how to find the current mac
> > 
> >>  Jumbo Frames
> >>  ------------
> >>  
> >> diff --git a/NEWS b/NEWS
> >> index 2f67d5047..5ece29474 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -24,6 +24,8 @@ v2.14.0 - 17 Aug 2020
> > 
> > 2.15....hopefully :-)
> > 
> >>         to list and change log levels in DPDK components.
> >>       * Vhost-user Dequeue zero-copy support is deprecated and will be 
> >> removed
> >>         in the next release.
> >> +     * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
> >> +       that allows configuring the MAC address of a VF representor.
> >>     - Linux datapath:
> >>       * Support for kernel versions up to 5.5.x.
> >>     - AF_XDP:
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 4ef7f78a6..060348369 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -523,6 +523,9 @@ struct netdev_dpdk {
> >>           * otherwise interrupt mode is used. */
> >>          bool requested_lsc_interrupt_mode;
> >>          bool lsc_interrupt_mode;
> >> +
> >> +        /* VF configuration. */
> >> +        struct eth_addr requested_hwaddr;
> >>      );
> >>  
> >>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >> @@ -1685,6 +1688,16 @@ out:
> >>      return ret;
> >>  }
> >>  
> >> +static bool
> >> +dpdk_port_is_representor(struct netdev_dpdk *dev)
> >> +    OVS_REQUIRES(dev->mutex)
> >> +{
> >> +    struct rte_eth_dev_info dev_info;
> >> +
> >> +    rte_eth_dev_info_get(dev->port_id, &dev_info);
> >> +    return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR;
> >> +}
> >> +
> >>  static int
> >>  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
> >>  {
> >> @@ -1719,6 +1732,13 @@ netdev_dpdk_get_config(const struct netdev *netdev, 
> >> struct smap *args)
> >>          }
> >>          smap_add(args, "lsc_interrupt_mode",
> >>                   dev->lsc_interrupt_mode ? "true" : "false");
> >> +
> >> +        if (dpdk_port_is_representor(dev)) {
> >> +            smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT,
> >> +                            ETH_ADDR_ARGS(dev->requested_hwaddr));
> >> +            smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT,
> >> +                            ETH_ADDR_ARGS(dev->hwaddr));
> > 
> > '*mac*' is used elsewhere (mac, mac_in_use, dpdk-vf-mac), think it's
> > better to keep consistent
> 
> Since this is 'config' it should report things in a way they could be passed
> to the datbase.  So, it should be 'dpdk-vf-mac' and it should report the
> value that was requested, i.e. dev->requested_hwaddr.
> 
> > 
> >> +        }
> >>      }
> >>      ovs_mutex_unlock(&dev->mutex);
> >>  
> >> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
> >> struct smap *args,
> >>          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
> >>      };
> >>      const char *new_devargs;
> >> +    const char *vf_mac;
> >>      int err = 0;
> >>  
> >>      ovs_mutex_lock(&dpdk_mutex);
> >> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
> >> struct smap *args,
> >>          goto out;
> >>      }
> >>  
> >> +    vf_mac = smap_get(args, "dpdk-vf-mac");
> >> +    if (vf_mac) {
> >> +        struct eth_addr mac;
> >> +
> >> +        err = EINVAL;
> >> +
> >> +        if (!dpdk_port_is_representor(dev)) {
> >> +            VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
> >> +                         "but 'options:dpdk-vf-mac' is only supported for 
> >> "
> >> +                         "VF representors.",
> >> +                         netdev_get_name(netdev), vf_mac);
> > 
> > Warnings here seems more in line with the other configs that cannot be
> > applied.
> 
> I agree, if it's not a representor we could just issue a warning and continue.
> 

Your phrasing seems to limit the (err -> warn) change to the
representor filtering part. Should the MAC sanitizing messages below remain
errors?

> > 
> >> +            goto out;
> >> +        }
> >> +        if (!eth_addr_from_string(vf_mac, &mac)) {
> >> +            VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'",
> >> +                         netdev_get_name(netdev), vf_mac);
> > 
> > as above
> > 
> >> +            goto out;
> >> +        }
> >> +        if (eth_addr_is_multicast(mac)) {
> >> +            VLOG_ERR_BUF(errp,
> >> +                         "interface '%s': cannot set VF MAC to multicast "
> >> +                         "address", netdev_get_name(netdev));
> > 
> > as above
> > 
> >> +            goto out;
> >> +        }
> >> +        if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
> >> +            dev->requested_hwaddr = mac;
> >> +            netdev_request_reconfigure(netdev);
> >> +        }
> >> +    }
> >> +
> >>      lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
> >>      if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> >>          dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> >> @@ -4938,6 +4989,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>          && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
> >>          && dev->rxq_size == dev->requested_rxq_size
> >>          && dev->txq_size == dev->requested_txq_size
> >> +        && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
> >>          && dev->socket_id == dev->requested_socket_id
> >>          && dev->started && !dev->reset_needed) {
> >>          /* Reconfiguration is unnecessary */
> >> @@ -4969,6 +5021,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>      dev->txq_size = dev->requested_txq_size;
> >>  
> >>      rte_free(dev->tx_q);
> >> +
> >> +    if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) {
> > 
> > Looks like if dpdk-vf-mac is not set, after initialization there will be
> > a configured hwaddr, and requested_hwaddr will be all zeros.
> > 
> > If this function is called because of some other config change, then
> > hwaddr !=  requested_hwaddr, and it will attempt to set the mac to the
> > requested_hwaddr of all zeros. This may also fail causing the
> > reconfigure to fail for other configs items.
> > 
> > (I admit, I hacked a phy port dev_info to say it was a representor, when
> > playing with this but the issue seems valid)
> > 
> > 
> >> +        err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr);
> >> +        if (err) {
> >> +            goto out;
> >> +        }
> >> +    }
> >> +
> >>      err = dpdk_eth_dev_init(dev);
> >>      if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
> >>          netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
> >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >> index 81c84927f..c0bf03c95 100644
> >> --- a/vswitchd/vswitch.xml
> >> +++ b/vswitchd/vswitch.xml
> >> @@ -3280,6 +3280,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> >> type=patch options:peer=p1 \
> >>            descriptors will be used by default.
> >>          </p>
> >>        </column>
> >> +
> >> +      <column name="options" key="dpdk-vf-mac">
> >> +        <p>
> >> +          Ethernet address to set for this VF interface.  If unset then 
> >> the
> >> +          default MAC address is used:
> >> +        </p>
> >> +        <ul>
> >> +          <li>For most drivers, the default MAC address assigned by
> >> +          their hardware.</li><li>For bifurcated drivers, the MAC 
> >> currently
> >> +          used by the kernel netdevice.</li>
> > 
> > nit: the formatting is a little different compared to the other
> > <li></li> usage, newline/indent etc.
> > 
> > thanks,
> > Kevin.
> > 
> >> +        </ul>
> >> +        <p>This option may only be used with dpdk VF representors.</p>
> >> +      </column>
> >>      </group>
> >>  
> >>      <group title="EMC (Exact Match Cache) Configuration">
> >>
> > 
> > 
> > 
> > 
> > 
> 

-- 
Gaëtan
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to