On 25.04.2019 0:58, Roni Bar Yanai wrote:
> Hi Ilya,
> 
> Please see comment/clarification inline.
> Thanks,
> Roni
> 
>> -----Original Message-----
>> From: Ilya Maximets <i.maxim...@samsung.com>
>> Sent: Wednesday, April 24, 2019 6:12 PM
>> To: Ophir Munk <ophi...@mellanox.com>; ovs-dev@openvswitch.org
>> Cc: Ian Stokes <ian.sto...@intel.com>; Flavio Leitner <f...@sysclose.org>; 
>> Kevin
>> Traynor <ktray...@redhat.com>; Roni Bar Yanai <ron...@mellanox.com>; Finn
>> Christensen <f...@napatech.com>; Ben Pfaff <b...@ovn.org>; Roi Dayan
>> <r...@mellanox.com>; Simon Horman <simon.hor...@netronome.com>; Olga
>> Shern <ol...@mellanox.com>; Asaf Penso <as...@mellanox.com>; Oz Shlomo
>> <o...@mellanox.com>; Paul Blakey <pa...@mellanox.com>
>> Subject: Re: [PATCH] netdev: Dynamic per-port Flow API.
>>
>> On 24.04.2019 17:26, Ophir Munk wrote:
>>> Hi Ilya,
>>> Please find comments inline and at the end of this mail.
>>
>> Hi. Thanks for review.
>> Comments inline.
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>> Regards,
>>> Ophir
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets <i.maxim...@samsung.com>
>>>> Sent: Tuesday, April 23, 2019 7:19 PM
>>>> To: ovs-dev@openvswitch.org
>>>> Cc: Ian Stokes <ian.sto...@intel.com>; Flavio Leitner <f...@sysclose.org>;
>>>> Ophir Munk <ophi...@mellanox.com>; Kevin Traynor
>>>> <ktray...@redhat.com>; Roni Bar Yanai <ron...@mellanox.com>; Finn
>>>> Christensen <f...@napatech.com>; Ben Pfaff <b...@ovn.org>; Roi Dayan
>>>> <r...@mellanox.com>; Simon Horman <simon.hor...@netronome.com>;
>>>> Ilya Maximets <i.maxim...@samsung.com>
>>>> Subject: [PATCH] netdev: Dynamic per-port Flow API.
>>>>
>>>> Current issues with Flow API:
>>>>
>>>> * OVS calls offloading functions regardless of successful
>>>>   flow API initialization. (ex. on init_flow_api failure)
>>>> * Static initilaization of Flow API for a netdev_class forbids
>>>>   having different offloading types for different instances
>>>>   of netdev with the same netdev_class. (ex. different vports in
>>>>   'system' and 'netdev' datapaths at the same time)
>>>>
>>>> Solution:
>>>>
>>>> * Move Flow API from the netdev_class to netdev instance.
>>>> * Make Flow API dynamic, i.e. probe the APIs and choose the
>>>>   suitable one.
>>>>
>>>
>>> I suggest distinguishing between initialization and probing.
>>> The probing you mention is implemented by testing device initialization:
>> init_flow_api().
>>> I suggest adding a new probe() API just for the sake of probing. It will 
>>> check the
>> netdev struct.
>>> I suggest  leaving the init_flow_api() API for HW/drivers initialization.
>>> The HW/driver may fail at initialization, but it does not mean we need to 
>>> probe
>> for a new API in that case.
>>
>> I also thought about separate 'probe()' api, but it seems that probing
>> will be equal to 'init()' in most cases. Checking the 'netdev struct'
>> fro probing is a bad solution as instances of same netdev class could
>> require different offloading apis. Probing a new api on init failure
>> could be useful exactly for this case. Anyway we'll be able to improve
>> the logic of assigning in the future if it will be needed. This is
>> the single point in code where we have information about all the available
>> offloading providers.
>>
>>>
>>>> Side effects:
>>>>
>>>> * Flow API providers localized as possible in their modules.
>>>> * Now we have an ability to make runtime checks. For example,
>>>>   we could check if particular device supports features we
>>>>   need, like if dpdk device supports RSS+MARK action.
>>>>
>>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>>>> ---
>>>>
>>>> Since RFC:
>>>>   * Fixed sparce build.
>>>>   * Some logs turned from INFO to DBG.
>>>>   * Enabled HW offloading on non-Linux systems
>>>>     (For testing with dummy provider).
>>>>
>>>>  lib/automake.mk               |   2 +-
>>>>  lib/dpdk.c                    |   2 +
>>>>  lib/netdev-dpdk.c             |  24 +++-
>>>>  lib/netdev-dpdk.h             |   3 +
>>>>  lib/netdev-dummy.c            |  24 +++-
>>>>  lib/netdev-linux.c            |   3 -
>>>>  lib/netdev-linux.h            |  10 --
>>>>  lib/netdev-offload-provider.h |  99 +++++++++++++++
>>>>  lib/netdev-provider.h         |  67 +----------
>>>>  lib/netdev-rte-offloads.c     |  40 +++++-
>>>>  lib/netdev-rte-offloads.h     |  41 +------
>>>>  lib/netdev-tc-offloads.c      |  39 ++++--
>>>>  lib/netdev-tc-offloads.h      |  44 -------
>>>>  lib/netdev-vport.c            |   6 +-
>>>>  lib/netdev.c                  | 221 +++++++++++++++++++++++++++-------
>>>>  tests/dpif-netdev.at          |   4 +-
>>>>  tests/ofproto-macros.at       |   1 -
>>>>  17 files changed, 398 insertions(+), 232 deletions(-)  create mode 100644
>>>> lib/netdev-offload-provider.h  delete mode 100644 lib/netdev-tc-offloads.h
>>>>
>>>> diff --git a/lib/automake.mk b/lib/automake.mk index cc5dccf39..b9f3b69e7
>>>> 100644
>>>> --- a/lib/automake.mk
>>>> +++ b/lib/automake.mk
>>>> @@ -138,6 +138,7 @@ lib_libopenvswitch_la_SOURCES = \
>>>>    lib/netdev-dpdk.h \
>>>>    lib/netdev-dummy.c \
>>>>    lib/netdev-provider.h \
>>>> +  lib/netdev-offload-provider.h \
>>>
>>> Please keep alphabetical order (switch between the last 2 lines)
>>>
>>>>    lib/netdev-rte-offloads.h \
>>>>    lib/netdev-vport.c \
>>>>    lib/netdev-vport.h \
>>>> @@ -393,7 +394,6 @@ lib_libopenvswitch_la_SOURCES += \
>>>>    lib/netdev-linux.c \
>>>>    lib/netdev-linux.h \
>>>>    lib/netdev-tc-offloads.c \
>>>> -  lib/netdev-tc-offloads.h \
>>>>    lib/netlink-conntrack.c \
>>>>    lib/netlink-conntrack.h \
>>>>    lib/netlink-notifier.c \
>>>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>>>> index dc6171546..6c6298635 100644
>>>> --- a/lib/dpdk.c
>>>> +++ b/lib/dpdk.c
>>>> @@ -34,6 +34,7 @@
>>>>  #include "dirs.h"
>>>>  #include "fatal-signal.h"
>>>>  #include "netdev-dpdk.h"
>>>> +#include "netdev-rte-offloads.h"
>>>>  #include "openvswitch/dynamic-string.h"
>>>>  #include "openvswitch/vlog.h"
>>>>  #include "ovs-numa.h"
>>>> @@ -442,6 +443,7 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>>
>>>>      /* Finally, register the dpdk classes */
>>>>      netdev_dpdk_register();
>>>> +    netdev_dpdk_flow_api_register();
>>>>      return true;
>>>>  }
>>>>
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>>> 47153dc60..c06c9ef81 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -4204,6 +4204,27 @@ unlock:
>>>>      return err;
>>>>  }
>>>>
>>>> +bool
>>>> +netdev_dpdk_flow_api_supported(struct netdev *netdev) {
>>>> +    struct netdev_dpdk *dev;
>>>> +    bool ret = false;
>>>> +
>>>> +    if (!is_dpdk_class(netdev->netdev_class)) {
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    dev = netdev_dpdk_cast(netdev);
>>>> +    ovs_mutex_lock(&dev->mutex);
>>>> +    if (dev->type == DPDK_DEV_ETH) {
>>>> +        /* TODO: Check if we able to offload some minimal flow. */
>>>> +        ret = true;
>>>> +    }
>>>> +    ovs_mutex_unlock(&dev->mutex);
>>>> +out:
>>>> +    return ret;
>>>> +}
>>>> +
>>>>  int
>>>>  netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>>                               struct rte_flow *rte_flow, @@ -4268,8 
>>>> +4289,7 @@
>>>> netdev_dpdk_rte_flow_create(struct netdev *netdev,
>>>>      .get_features = netdev_dpdk_get_features,           \
>>>>      .get_status = netdev_dpdk_get_status,               \
>>>>      .reconfigure = netdev_dpdk_reconfigure,             \
>>>> -    .rxq_recv = netdev_dpdk_rxq_recv,                   \
>>>> -    DPDK_FLOW_OFFLOAD_API
>>>> +    .rxq_recv = netdev_dpdk_rxq_recv
>>>>
>>>>  static const struct netdev_class dpdk_class = {
>>>>      .type = "dpdk",
>>>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index
>>>> 9bbb8d8d6..60631c4f0 100644
>>>> --- a/lib/netdev-dpdk.h
>>>> +++ b/lib/netdev-dpdk.h
>>>> @@ -34,6 +34,9 @@ struct rte_flow_action;
>>>>
>>>>  void netdev_dpdk_register(void);
>>>>  void free_dpdk_buf(struct dp_packet *);
>>>> +
>>>> +bool netdev_dpdk_flow_api_supported(struct netdev *);
>>>> +
>>>>  int
>>>>  netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>>                               struct rte_flow *rte_flow, diff --git 
>>>> a/lib/netdev-dummy.c
>>>> b/lib/netdev-dummy.c index 3f90ffa09..2e2e0c2ab 100644
>>>> --- a/lib/netdev-dummy.c
>>>> +++ b/lib/netdev-dummy.c
>>>> @@ -24,6 +24,7 @@
>>>>  #include "dp-packet.h"
>>>>  #include "dpif-netdev.h"
>>>>  #include "flow.h"
>>>> +#include "netdev-offload-provider.h"
>>>>  #include "netdev-provider.h"
>>>>  #include "netdev-vport.h"
>>>>  #include "odp-util.h"
>>>> @@ -1523,10 +1524,6 @@ exit:
>>>>      return error ? -1 : 0;
>>>>  }
>>>>
>>>> -#define DUMMY_FLOW_OFFLOAD_API                          \
>>>> -    .flow_put = netdev_dummy_flow_put,                  \
>>>> -    .flow_del = netdev_dummy_flow_del
>>>> -
>>>>  #define NETDEV_DUMMY_CLASS_COMMON                       \
>>>>      .run = netdev_dummy_run,                            \
>>>>      .wait = netdev_dummy_wait,                          \
>>>> @@ -1559,8 +1556,7 @@ exit:
>>>>      .rxq_dealloc = netdev_dummy_rxq_dealloc,            \
>>>>      .rxq_recv = netdev_dummy_rxq_recv,                  \
>>>>      .rxq_wait = netdev_dummy_rxq_wait,                  \
>>>> -    .rxq_drain = netdev_dummy_rxq_drain,                \
>>>> -    DUMMY_FLOW_OFFLOAD_API
>>>> +    .rxq_drain = netdev_dummy_rxq_drain
>>>>
>>>>  static const struct netdev_class dummy_class = {
>>>>      NETDEV_DUMMY_CLASS_COMMON,
>>>> @@ -1578,6 +1574,20 @@ static const struct netdev_class
>>>> dummy_pmd_class = {
>>>>      .is_pmd = true,
>>>>      .reconfigure = netdev_dummy_reconfigure  };
>>>> +
>>>> +static int
>>>> +netdev_dummy_offloads_init_flow_api(struct netdev *netdev) {
>>>> +    return is_dummy_class(netdev->netdev_class) ? 0 : -1; }
>>>> +
>>>> +static const struct netdev_flow_api netdev_dummy_offloads = {
>>>> +    .type = "dummy",
>>>> +    .flow_put = netdev_dummy_flow_put,
>>>> +    .flow_del = netdev_dummy_flow_del,
>>>> +    .init_flow_api = netdev_dummy_offloads_init_flow_api,
>>>> +};
>>>> +
>>>>
>>>
>>>>  /* Helper functions. */
>>>>
>>>> @@ -2024,5 +2034,7 @@ netdev_dummy_register(enum dummy_level
>>>> level)
>>>>      netdev_register_provider(&dummy_internal_class);
>>>>      netdev_register_provider(&dummy_pmd_class);
>>>>
>>>> +    netdev_register_flow_api_provider(&netdev_dummy_offloads);
>>>> +
>>>>      netdev_vport_tunnel_register();
>>>>  }
>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 
>>>> f75d73fd3..e4ea94cf9
>>>> 100644
>>>> --- a/lib/netdev-linux.c
>>>> +++ b/lib/netdev-linux.c
>>>> @@ -55,7 +55,6 @@
>>>>  #include "hash.h"
>>>>  #include "openvswitch/hmap.h"
>>>>  #include "netdev-provider.h"
>>>> -#include "netdev-tc-offloads.h"
>>>>  #include "netdev-vport.h"
>>>>  #include "netlink-notifier.h"
>>>>  #include "netlink-socket.h"
>>>> @@ -3321,7 +3320,6 @@ exit:
>>>>
>>>>  const struct netdev_class netdev_linux_class = {
>>>>      NETDEV_LINUX_CLASS_COMMON,
>>>> -    LINUX_FLOW_OFFLOAD_API,
>>>>      .type = "system",
>>>
>>> Please note that before this patch type="system" and type="internal" had
>> LINUX_FLOW_OFFLOAD_API, but type="tap" did not.
>>> With this patch type="tap" is going to be tested with init_flow_api(). If 
>>> the test
>> passes - type="tap" is going to have flow_api which it did not have before.
>>> Was is verified that type="tap" will fail with init_flow_api()?
>>
>> If 'init_flow_api' will be able to create tc qdisc than offloading
>> is supported. If not - init will fail. It's simple. When linux
>> will support tc offloading for tap devices we'll have offloading
>> for free without changing the OVS code.
>>
>>>
>>>>      .construct = netdev_linux_construct,
>>>>      .get_stats = netdev_linux_get_stats, @@ -3341,7 +3339,6 @@ const 
>>>> struct
>>>> netdev_class netdev_tap_class = {
>>>>
>>>>  const struct netdev_class netdev_internal_class = {
>>>>      NETDEV_LINUX_CLASS_COMMON,
>>>> -    LINUX_FLOW_OFFLOAD_API,
>>>>      .type = "internal",
>>>>      .construct = netdev_linux_construct,
>>>>      .get_stats = netdev_internal_get_stats, diff --git 
>>>> a/lib/netdev-linux.h
>>> ...
>>>> +static int
>>>> +netdev_assign_flow_api(struct netdev *netdev) {
>>>> +    struct netdev_registered_flow_api *rfa;
>>>> +
>>>> +    CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
>>>> +        if (!rfa->flow_api->init_flow_api(netdev)) {
>>>> +            ovs_refcount_ref(&rfa->refcnt);
>>>> +            ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>>> +            VLOG_INFO("%s: Assigned flow API '%s'.",
>>>> +                      netdev_get_name(netdev), rfa->flow_api->type);
>>>> +            return 0;
>>>> +        }
>>>
>>> By this logic the first API which successfully passes init_flow_api() 
>>> becomes the
>> flow_api for this netdev.
>>> Assuming there are 2 vport APIs: 1. system datapath vport API 2. netdev
>> datapath vport API (planned to be added soon).
>>> Assuming both vport APIs successfully pass init_flow_api(), then the first 
>>> one to
>> be tested will always be chosen and the second one will never be used.
>>> Are we sure that the system datapath vport will not always be chosen with 
>>> the
>> current netdev_tc_init_flow_api() implementation.
>>> How can we distinguish between a vport created under different datapaths?
>>> This issue must be resolved for this patch.
>>
>> netdev-vport that works in userspace datapath has no backing linux
>> interface --> get_ifindex failure --> no offloading API configured.
>> When the planned offloading in dpdk flow api provider will be
>> implemented it will be chosen.
>>
>> Right now we don't have and don't plan to have natdevs that suitable
>> for more than one flow api provider. If they appear in the future, we
>> could solve this by choosing most suitable provider instead of first
>> one. But this is over-engineering for now and even for a near future.
>>
> This is more a request for a clarification. 
> If I understand correctly,  you expect that there will be one flow api 
> provider for dpdk.

Yes.

> When the api is called we must test the netdev class and then do the needed 
> tests (maybe HW capabilities) so we can tell if we should return true or 
> false.

Correct in general, but you need to test netdev instance, not the netdev class.

> For vport we have multiple subtypes (vxlan,gre..etc), so if we only support 
> sub group we can test the sub type by its name.

All the vport subtypes are different netdev classes.

> What happens if we have both system and netdev datapath (corner case, but 
> still)? If the system is called first and there is no backing linux port then 
> we are OK. 
> What will happen if the netdev offload provider will be called first? do you 
> expect that we do the same test (linux backing port?) in the dpdk 
> implementation? Is this type of test being correct for all vport types? That 
> is, if we don't have a backing linux port (ifindex) it means that the port is 
> part of a netdev bridge?

Yes, you could add simple check:

    if (netdev_vport_is_vxlan(netdev) && netdev_get_ifindex(netdev) < 0) {
        return 0;
    }

This check should be done inside init_flow_api() (not in 
netdev_dpdk_flow_api_supported()).
It's allowed to call functions of different netdev provides inside the
netdev offload provider code.

'netdev_vport_is_vxlan' should be implemented inside netdev-vport.c.


Note that you'll need an additional change to allow offloading for ports
without ifindex:

diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index e0731959d..3c474b18c 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -527,10 +527,6 @@ netdev_ports_insert(struct netdev *netdev, const struct 
dpif_class *dpif_class,
     struct port_to_netdev_data *data;
     int ifindex = netdev_get_ifindex(netdev);
 
-    if (ifindex < 0) {
-        return ENODEV;
-    }
-
     ovs_mutex_lock(&netdev_hmap_mutex);
     if (netdev_ports_lookup(dpif_port->port_no, dpif_class)) {
         ovs_mutex_unlock(&netdev_hmap_mutex);
@@ -541,11 +537,16 @@ netdev_ports_insert(struct netdev *netdev, const struct 
dpif_class *dpif_class,
     data->netdev = netdev_ref(netdev);
     data->dpif_class = dpif_class;
     dpif_port_clone(&data->dpif_port, dpif_port);
-    data->ifindex = ifindex;
+
+    if (ifindex >= 0) {
+        data->ifindex = ifindex;
+        hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex);
+    } else {
+        data->ifindex = -1;
+    }
 
     hmap_insert(&port_to_netdev, &data->portno_node,
                 netdev_ports_hash(dpif_port->port_no, dpif_class));
-    hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex);
     ovs_mutex_unlock(&netdev_hmap_mutex);
 
     netdev_init_flow_api(netdev);
---

It's unrelated to current patch, so, I'll send it separately.

> Basically, there is nothing in the init flow parameters that can tell us if 
> the port should be netdev or system (ifindex is implementation related) 
> because we don't have the context of the caller.
> 
>>>
>>>> +        VLOG_DBG("%s: flow API '%s' is not suitable.",
>>>> +                 netdev_get_name(netdev), rfa->flow_api->type);
>>>> +    }
>>>> +    VLOG_INFO("%s: No suitable flow API found.",
>>>> + netdev_get_name(netdev));
>>>> +
>>>> +    return -1;
>>>
>>> Please return an enumerated positive error.
>>
>> It's an internal helper function. Not the public API.
>> All the functions that visible outside this module returns proper
>> positive error codes.
>>
>>>
>>>> +}
>>>> +
>>>>  int
>>>>  netdev_flow_flush(struct netdev *netdev)  {
>>>> -    const struct netdev_class *class = netdev->netdev_class;
>>>
>>> A few more comments.
>>> 1. This patch changes code related to OVS offload implementation. OVS 
>>> offload
>> must be confirmed with this patch before it is accepted.
>>
>> Sure. And I'll appreciate any help with testing as I'm limited with
>> offloading capable hardware.
>>
>>>
>>> 2. We may suggest to offload "dpdkvhostclient" port type and "vxlan" (under
>> netdev datapath)  port type.
>>> Please confirm the following understanding of the required steps.
>>> 2.1 Need to register two new flow_apis for the two ports types.
>>> 2.2 Need to write the corresponding init_flow_api() to test the relevant 
>>> netdev
>> instances.
>>
>> No, you don't need to implement new flow api provider. I assume that
>> offloading for these netdevs will be implemented via DPDK. So, you
>> need to add the functionality to netdev_rte_offload provider and
>> teach it to return success for these netdevs from the init_flow_api().
>>
>>> 2.3 The question how to distinguish between a vport under system datapath
>> versus vport under netdev datapath is still an open question (would be happy 
>> to
>> learn that it is already resolved).
>>
>> Solved. See above.
>>
>>>
>>> 3. Could you please add inline documentation for the newly added code? It
>> seems the logic is spread is several places so any comments will be helpful.
>>
>> Provider registering API has comments. 'netdev_assign_flow_api' is
>> quiet simple and mostly documented by the log messages inside.
>> I'd like to add some comments if you'll point me functions that needs them.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to