On 27/10/2020 14:03, Ilya Maximets wrote:
> On 10/20/20 1:15 PM, Kevin Traynor wrote:
>> On 16/09/2020 16:17, Gaetan Rivet wrote:
>>> In some cloud topologies, using DPDK VF representors in guest requires
>>> configuring a VF before it is assigned to the guest.
>>>
>>> A first basic option for such configuration is setting the VF MAC
>>> address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
>>> table.
>>>
>>> This option can be used as such:
>>>
>>>    $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
>>>       options:dpdk-vf-mac=00:11:22:33:44:55
>>>
>>
>> If I got it right, it doesn't seem like it solves any issues by limiting
>> to representor, but is just an attempt to limit some of the edge cases
>> around bifurcated devices to the devices where the requested
>> functionality is really needed now.
>>
>> This limitation has some costs...
>>
>> We will now have, mac, mac_in_use, dpdk-vf-mac, configured_hwaddr,
>> requested_hwaddr for the user to understand. I think it can be confusing.
> 
> To be clear, 'configured_hwaddr/requested_hwaddr' type of report should not
> exist at least in the output of 'get_config()' function.  Same applies
> to all existing '{configured,requested}_something' reports.
> 'get_config()' intended to report things that could be copied and passed 
> directly
> to database.  OTOH, 'get_status()' should report what is an actual device
> status, i.e. 'dpdk-vf-mac' should be in 'get_config()' and 'configured_hwaddr'
> should be in 'get_status()', but it should be named differently, e.g. just
> 'dpdk-vf-mac' and the callback it placed in already says that it's actually
> 'configured'.  In short:
>   - 'dpdk-vf-mac' from the 'get_config()' should report dev->requested_hwaddr.
>   - 'dpdk-vf-mac' from the 'get_status()' should report dev->hwaddr.

If an invalid mac is attempted or it fails to be set, this is what the
user will see for dpdk-vf-mac:

ovs-appctl dpctl/show -> dpdk-vf-mac=00:11:22:33:44:55
ovs-vsctl show -> dpdk-vf-mac=33:11:22:33:44:55
ovs-vsctl get Interface myport status -> dpdk-vf-mac=00:11:22:33:44:55

For completeness, i'll add the failed attempt is logged clearly.

I suppose we can tell the user to check the 'mac_in_use', but multiple
dpdk-vf-mac entries for the same device with different values is confusing.

requested_dpdk-vf-mac configured_dpdk-vf-mac from 'ovs-appctl
dpctl/show' would be explicit. At a minimum we should document which one
is requested and which one is configured.

> Of course, all this should be reported only if device is a representor.
> 
> 
>>
>> 'ovs-vsctl show' will show the dpdk-vf-mac, which is the main one we
>> document, but it's not clear if it was successful or not. (yes there is
>> a log entry, but it is a separate place).
>>
>> As it is specific for representors, if we ever want to allow setting of
>> mac on any non representor DPDK ports, we have an awkward user
>> interface. We would need to make another 'dpdk-mac' type option, or we
>> decide then to allow use mac in interface table but that it doesn't work
>> for representors, or there are two ways to set for representors etc.
>> None of this seems great.
>>
>> My feeling is that it is making things complicated for the user with
>> this many knobs, where we could just have mac and mac_in_use, live with
>> the edge cases (as Gaetan pointed out, we already do for MTU) and we
>> know as well if we ever need to extend further, the user interface will
>> stay simple. Just my 2C, maybe there's a good argument why we can't do
>> it like this.
> 
> Setting mac for physical device has lots of split-brain issues and unclear
> ways how and when undo changes as we do not know what was original mac or
> what to do if there was no mac at all at the start.  Moreover, some devices
> will not allow setting mac to it's original value (e.g. all zeroes).
> One more point is that vf and representor itself are different devices and
> these devices could have different mac addresses.  For now we agreed that

> DPDK api works for representors in a way that it always changes parameters
> of a virtual function, not representors', but that is not a very clear thing
> for all users (e.g. for DPDK developers), so 'vf' in 'dpdk-vf-mac' 
> specifically
> points to that difference.
> 

That's a good point, it does distinguish and iirc some people made
the point that this could be important.

> After a long discussion we decided to make this option as specific as 
> possible,
> so users will not try to use it to configure anything else beside mac of a
> virtual function.  All restrictions and downsides of configuration of vf mac
> was/should be documented.
> 

ok, I can see the pros of this type of design too so I won't object, but
agree the downsides like the additional fields and ability for user to
understand clearly the output from commands above should be documented.

> 
>>
>> Few comments on the code as it is now below.
>>
>>
>>> Signed-off-by: Gaetan Rivet <gr...@u256.net>
>>> Suggested-by: Ilya Maximets <i.maxim...@ovn.org>
>>> Acked-by: Eli Britstein <el...@nvidia.com>
>>> ---
>>>  Documentation/topics/dpdk/phy.rst | 22 ++++++++++++
>>>  NEWS                              |  2 ++
>>>  lib/netdev-dpdk.c                 | 60 +++++++++++++++++++++++++++++++
>>>  vswitchd/vswitch.xml              | 13 +++++++
>>>  4 files changed, 97 insertions(+)
>>>
>>> diff --git a/Documentation/topics/dpdk/phy.rst 
>>> b/Documentation/topics/dpdk/phy.rst
>>> index 38e52c8de..73dcb7c28 100644
>>> --- a/Documentation/topics/dpdk/phy.rst
>>> +++ b/Documentation/topics/dpdk/phy.rst
>>> @@ -379,6 +379,28 @@ an eth device whose mac address is 
>>> ``00:11:22:33:44:55``::
>>>      $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk \
>>>         options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
>>>  
>>> +Representor specific configuration
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +In some topologies, a VF must be configured before being assigned to a
>>> +guest (VM) machine.  This configuration is done through VF-specific fields
>>> +in the ``options`` column of the Interface table.
>>> +
>>> +.. important::
>>> +
>>> +   Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__,
>>> +   which means that a kernel netdevice remains when Open vSwitch is 
>>> stopped.
>>> +
>>> +   In such case, any configuration applied to a VF would remain set on the
>>> +   kernel netdevice, and be inherited from it when Open vSwitch is 
>>> restarted,
>>> +   even if the options described in this section are unset from Open 
>>> vSwitch.
>>> +
>>> +.. _bifurcated-drivers: 
>>> http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver
>>> +
>>> +- Configure the VF MAC address::
>>> +
>>> +    $ ovs-vsctl set Interface dpdk-rep0 
>>> options:dpdk-vf-mac=00:11:22:33:44:55
>>> +
>>
>> As mentioned above, if for some reason setting the mac fails there will
>> be warning in the logs, but 'ovs-vsctl show' will still show the
>> dpdk-vf-mac that was not applied and it's quite prominent. Also, there
>> are mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, requested_hwaddr.
>>
>> Suggest to add a small summary here for each one as it relates
>> representors and make clear to users,
>> 1. how to find the mac the user requested
>> 2. how to find the current mac
>>
>>>  Jumbo Frames
>>>  ------------
>>>  
>>> diff --git a/NEWS b/NEWS
>>> index 2f67d5047..5ece29474 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -24,6 +24,8 @@ v2.14.0 - 17 Aug 2020
>>
>> 2.15....hopefully :-)
>>
>>>         to list and change log levels in DPDK components.
>>>       * Vhost-user Dequeue zero-copy support is deprecated and will be 
>>> removed
>>>         in the next release.
>>> +     * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>>> +       that allows configuring the MAC address of a VF representor.
>>>     - Linux datapath:
>>>       * Support for kernel versions up to 5.5.x.
>>>     - AF_XDP:
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 4ef7f78a6..060348369 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -523,6 +523,9 @@ struct netdev_dpdk {
>>>           * otherwise interrupt mode is used. */
>>>          bool requested_lsc_interrupt_mode;
>>>          bool lsc_interrupt_mode;
>>> +
>>> +        /* VF configuration. */
>>> +        struct eth_addr requested_hwaddr;
>>>      );
>>>  
>>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>>> @@ -1685,6 +1688,16 @@ out:
>>>      return ret;
>>>  }
>>>  
>>> +static bool
>>> +dpdk_port_is_representor(struct netdev_dpdk *dev)
>>> +    OVS_REQUIRES(dev->mutex)
>>> +{
>>> +    struct rte_eth_dev_info dev_info;
>>> +
>>> +    rte_eth_dev_info_get(dev->port_id, &dev_info);
>>> +    return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR;
>>> +}
>>> +
>>>  static int
>>>  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>>>  {
>>> @@ -1719,6 +1732,13 @@ netdev_dpdk_get_config(const struct netdev *netdev, 
>>> struct smap *args)
>>>          }
>>>          smap_add(args, "lsc_interrupt_mode",
>>>                   dev->lsc_interrupt_mode ? "true" : "false");
>>> +
>>> +        if (dpdk_port_is_representor(dev)) {
>>> +            smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT,
>>> +                            ETH_ADDR_ARGS(dev->requested_hwaddr));
>>> +            smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT,
>>> +                            ETH_ADDR_ARGS(dev->hwaddr));
>>
>> '*mac*' is used elsewhere (mac, mac_in_use, dpdk-vf-mac), think it's
>> better to keep consistent
> 
> Since this is 'config' it should report things in a way they could be passed
> to the datbase.  So, it should be 'dpdk-vf-mac' and it should report the
> value that was requested, i.e. dev->requested_hwaddr.
> 
>>
>>> +        }
>>>      }
>>>      ovs_mutex_unlock(&dev->mutex);
>>>  
>>> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
>>> struct smap *args,
>>>          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
>>>      };
>>>      const char *new_devargs;
>>> +    const char *vf_mac;
>>>      int err = 0;
>>>  
>>>      ovs_mutex_lock(&dpdk_mutex);
>>> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
>>> struct smap *args,
>>>          goto out;
>>>      }
>>>  
>>> +    vf_mac = smap_get(args, "dpdk-vf-mac");
>>> +    if (vf_mac) {
>>> +        struct eth_addr mac;
>>> +
>>> +        err = EINVAL;
>>> +
>>> +        if (!dpdk_port_is_representor(dev)) {
>>> +            VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
>>> +                         "but 'options:dpdk-vf-mac' is only supported for "
>>> +                         "VF representors.",
>>> +                         netdev_get_name(netdev), vf_mac);
>>
>> Warnings here seems more in line with the other configs that cannot be
>> applied.
> 
> I agree, if it's not a representor we could just issue a warning and continue.
> 
>>
>>> +            goto out;
>>> +        }
>>> +        if (!eth_addr_from_string(vf_mac, &mac)) {
>>> +            VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'",
>>> +                         netdev_get_name(netdev), vf_mac);
>>
>> as above
>>
>>> +            goto out;
>>> +        }
>>> +        if (eth_addr_is_multicast(mac)) {
>>> +            VLOG_ERR_BUF(errp,
>>> +                         "interface '%s': cannot set VF MAC to multicast "
>>> +                         "address", netdev_get_name(netdev));
>>
>> as above
>>
>>> +            goto out;
>>> +        }
>>> +        if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
>>> +            dev->requested_hwaddr = mac;
>>> +            netdev_request_reconfigure(netdev);
>>> +        }
>>> +    }
>>> +
>>>      lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
>>>      if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>>>          dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>>> @@ -4938,6 +4989,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>          && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>>          && dev->rxq_size == dev->requested_rxq_size
>>>          && dev->txq_size == dev->requested_txq_size
>>> +        && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>>          && dev->socket_id == dev->requested_socket_id
>>>          && dev->started && !dev->reset_needed) {
>>>          /* Reconfiguration is unnecessary */
>>> @@ -4969,6 +5021,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>      dev->txq_size = dev->requested_txq_size;
>>>  
>>>      rte_free(dev->tx_q);
>>> +
>>> +    if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) {
>>
>> Looks like if dpdk-vf-mac is not set, after initialization there will be
>> a configured hwaddr, and requested_hwaddr will be all zeros.
>>
>> If this function is called because of some other config change, then
>> hwaddr !=  requested_hwaddr, and it will attempt to set the mac to the
>> requested_hwaddr of all zeros. This may also fail causing the
>> reconfigure to fail for other configs items.
>>
>> (I admit, I hacked a phy port dev_info to say it was a representor, when
>> playing with this but the issue seems valid)
>>
>>
>>> +        err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr);
>>> +        if (err) {
>>> +            goto out;
>>> +        }
>>> +    }
>>> +
>>>      err = dpdk_eth_dev_init(dev);
>>>      if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
>>>          netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index 81c84927f..c0bf03c95 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -3280,6 +3280,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>>> type=patch options:peer=p1 \
>>>            descriptors will be used by default.
>>>          </p>
>>>        </column>
>>> +
>>> +      <column name="options" key="dpdk-vf-mac">
>>> +        <p>
>>> +          Ethernet address to set for this VF interface.  If unset then the
>>> +          default MAC address is used:
>>> +        </p>
>>> +        <ul>
>>> +          <li>For most drivers, the default MAC address assigned by
>>> +          their hardware.</li><li>For bifurcated drivers, the MAC currently
>>> +          used by the kernel netdevice.</li>
>>
>> nit: the formatting is a little different compared to the other
>> <li></li> usage, newline/indent etc.
>>
>> thanks,
>> Kevin.
>>
>>> +        </ul>
>>> +        <p>This option may only be used with dpdk VF representors.</p>
>>> +      </column>
>>>      </group>
>>>  
>>>      <group title="EMC (Exact Match Cache) Configuration">
>>>
>>
>>
>>
>>
>>
> 





_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to