On 25.04.2019 0:58, Roni Bar Yanai wrote: > Hi Ilya, > > Please see comment/clarification inline. > Thanks, > Roni > >> -----Original Message----- >> From: Ilya Maximets <i.maxim...@samsung.com> >> Sent: Wednesday, April 24, 2019 6:12 PM >> To: Ophir Munk <ophi...@mellanox.com>; ovs-dev@openvswitch.org >> Cc: Ian Stokes <ian.sto...@intel.com>; Flavio Leitner <f...@sysclose.org>; >> Kevin >> Traynor <ktray...@redhat.com>; Roni Bar Yanai <ron...@mellanox.com>; Finn >> Christensen <f...@napatech.com>; Ben Pfaff <b...@ovn.org>; Roi Dayan >> <r...@mellanox.com>; Simon Horman <simon.hor...@netronome.com>; Olga >> Shern <ol...@mellanox.com>; Asaf Penso <as...@mellanox.com>; Oz Shlomo >> <o...@mellanox.com>; Paul Blakey <pa...@mellanox.com> >> Subject: Re: [PATCH] netdev: Dynamic per-port Flow API. >> >> On 24.04.2019 17:26, Ophir Munk wrote: >>> Hi Ilya, >>> Please find comments inline and at the end of this mail. >> >> Hi. Thanks for review. >> Comments inline. >> >> Best regards, Ilya Maximets. >> >>> >>> Regards, >>> Ophir >>> >>>> -----Original Message----- >>>> From: Ilya Maximets <i.maxim...@samsung.com> >>>> Sent: Tuesday, April 23, 2019 7:19 PM >>>> To: ovs-dev@openvswitch.org >>>> Cc: Ian Stokes <ian.sto...@intel.com>; Flavio Leitner <f...@sysclose.org>; >>>> Ophir Munk <ophi...@mellanox.com>; Kevin Traynor >>>> <ktray...@redhat.com>; Roni Bar Yanai <ron...@mellanox.com>; Finn >>>> Christensen <f...@napatech.com>; Ben Pfaff <b...@ovn.org>; Roi Dayan >>>> <r...@mellanox.com>; Simon Horman <simon.hor...@netronome.com>; >>>> Ilya Maximets <i.maxim...@samsung.com> >>>> Subject: [PATCH] netdev: Dynamic per-port Flow API. >>>> >>>> Current issues with Flow API: >>>> >>>> * OVS calls offloading functions regardless of successful >>>> flow API initialization. (ex. on init_flow_api failure) >>>> * Static initilaization of Flow API for a netdev_class forbids >>>> having different offloading types for different instances >>>> of netdev with the same netdev_class. (ex. different vports in >>>> 'system' and 'netdev' datapaths at the same time) >>>> >>>> Solution: >>>> >>>> * Move Flow API from the netdev_class to netdev instance. >>>> * Make Flow API dynamic, i.e. probe the APIs and choose the >>>> suitable one. >>>> >>> >>> I suggest distinguishing between initialization and probing. >>> The probing you mention is implemented by testing device initialization: >> init_flow_api(). >>> I suggest adding a new probe() API just for the sake of probing. It will >>> check the >> netdev struct. >>> I suggest leaving the init_flow_api() API for HW/drivers initialization. >>> The HW/driver may fail at initialization, but it does not mean we need to >>> probe >> for a new API in that case. >> >> I also thought about separate 'probe()' api, but it seems that probing >> will be equal to 'init()' in most cases. Checking the 'netdev struct' >> fro probing is a bad solution as instances of same netdev class could >> require different offloading apis. Probing a new api on init failure >> could be useful exactly for this case. Anyway we'll be able to improve >> the logic of assigning in the future if it will be needed. This is >> the single point in code where we have information about all the available >> offloading providers. >> >>> >>>> Side effects: >>>> >>>> * Flow API providers localized as possible in their modules. >>>> * Now we have an ability to make runtime checks. For example, >>>> we could check if particular device supports features we >>>> need, like if dpdk device supports RSS+MARK action. >>>> >>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com> >>>> --- >>>> >>>> Since RFC: >>>> * Fixed sparce build. >>>> * Some logs turned from INFO to DBG. >>>> * Enabled HW offloading on non-Linux systems >>>> (For testing with dummy provider). >>>> >>>> lib/automake.mk | 2 +- >>>> lib/dpdk.c | 2 + >>>> lib/netdev-dpdk.c | 24 +++- >>>> lib/netdev-dpdk.h | 3 + >>>> lib/netdev-dummy.c | 24 +++- >>>> lib/netdev-linux.c | 3 - >>>> lib/netdev-linux.h | 10 -- >>>> lib/netdev-offload-provider.h | 99 +++++++++++++++ >>>> lib/netdev-provider.h | 67 +---------- >>>> lib/netdev-rte-offloads.c | 40 +++++- >>>> lib/netdev-rte-offloads.h | 41 +------ >>>> lib/netdev-tc-offloads.c | 39 ++++-- >>>> lib/netdev-tc-offloads.h | 44 ------- >>>> lib/netdev-vport.c | 6 +- >>>> lib/netdev.c | 221 +++++++++++++++++++++++++++------- >>>> tests/dpif-netdev.at | 4 +- >>>> tests/ofproto-macros.at | 1 - >>>> 17 files changed, 398 insertions(+), 232 deletions(-) create mode 100644 >>>> lib/netdev-offload-provider.h delete mode 100644 lib/netdev-tc-offloads.h >>>> >>>> diff --git a/lib/automake.mk b/lib/automake.mk index cc5dccf39..b9f3b69e7 >>>> 100644 >>>> --- a/lib/automake.mk >>>> +++ b/lib/automake.mk >>>> @@ -138,6 +138,7 @@ lib_libopenvswitch_la_SOURCES = \ >>>> lib/netdev-dpdk.h \ >>>> lib/netdev-dummy.c \ >>>> lib/netdev-provider.h \ >>>> + lib/netdev-offload-provider.h \ >>> >>> Please keep alphabetical order (switch between the last 2 lines) >>> >>>> lib/netdev-rte-offloads.h \ >>>> lib/netdev-vport.c \ >>>> lib/netdev-vport.h \ >>>> @@ -393,7 +394,6 @@ lib_libopenvswitch_la_SOURCES += \ >>>> lib/netdev-linux.c \ >>>> lib/netdev-linux.h \ >>>> lib/netdev-tc-offloads.c \ >>>> - lib/netdev-tc-offloads.h \ >>>> lib/netlink-conntrack.c \ >>>> lib/netlink-conntrack.h \ >>>> lib/netlink-notifier.c \ >>>> diff --git a/lib/dpdk.c b/lib/dpdk.c >>>> index dc6171546..6c6298635 100644 >>>> --- a/lib/dpdk.c >>>> +++ b/lib/dpdk.c >>>> @@ -34,6 +34,7 @@ >>>> #include "dirs.h" >>>> #include "fatal-signal.h" >>>> #include "netdev-dpdk.h" >>>> +#include "netdev-rte-offloads.h" >>>> #include "openvswitch/dynamic-string.h" >>>> #include "openvswitch/vlog.h" >>>> #include "ovs-numa.h" >>>> @@ -442,6 +443,7 @@ dpdk_init__(const struct smap *ovs_other_config) >>>> >>>> /* Finally, register the dpdk classes */ >>>> netdev_dpdk_register(); >>>> + netdev_dpdk_flow_api_register(); >>>> return true; >>>> } >>>> >>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index >>>> 47153dc60..c06c9ef81 100644 >>>> --- a/lib/netdev-dpdk.c >>>> +++ b/lib/netdev-dpdk.c >>>> @@ -4204,6 +4204,27 @@ unlock: >>>> return err; >>>> } >>>> >>>> +bool >>>> +netdev_dpdk_flow_api_supported(struct netdev *netdev) { >>>> + struct netdev_dpdk *dev; >>>> + bool ret = false; >>>> + >>>> + if (!is_dpdk_class(netdev->netdev_class)) { >>>> + goto out; >>>> + } >>>> + >>>> + dev = netdev_dpdk_cast(netdev); >>>> + ovs_mutex_lock(&dev->mutex); >>>> + if (dev->type == DPDK_DEV_ETH) { >>>> + /* TODO: Check if we able to offload some minimal flow. */ >>>> + ret = true; >>>> + } >>>> + ovs_mutex_unlock(&dev->mutex); >>>> +out: >>>> + return ret; >>>> +} >>>> + >>>> int >>>> netdev_dpdk_rte_flow_destroy(struct netdev *netdev, >>>> struct rte_flow *rte_flow, @@ -4268,8 >>>> +4289,7 @@ >>>> netdev_dpdk_rte_flow_create(struct netdev *netdev, >>>> .get_features = netdev_dpdk_get_features, \ >>>> .get_status = netdev_dpdk_get_status, \ >>>> .reconfigure = netdev_dpdk_reconfigure, \ >>>> - .rxq_recv = netdev_dpdk_rxq_recv, \ >>>> - DPDK_FLOW_OFFLOAD_API >>>> + .rxq_recv = netdev_dpdk_rxq_recv >>>> >>>> static const struct netdev_class dpdk_class = { >>>> .type = "dpdk", >>>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index >>>> 9bbb8d8d6..60631c4f0 100644 >>>> --- a/lib/netdev-dpdk.h >>>> +++ b/lib/netdev-dpdk.h >>>> @@ -34,6 +34,9 @@ struct rte_flow_action; >>>> >>>> void netdev_dpdk_register(void); >>>> void free_dpdk_buf(struct dp_packet *); >>>> + >>>> +bool netdev_dpdk_flow_api_supported(struct netdev *); >>>> + >>>> int >>>> netdev_dpdk_rte_flow_destroy(struct netdev *netdev, >>>> struct rte_flow *rte_flow, diff --git >>>> a/lib/netdev-dummy.c >>>> b/lib/netdev-dummy.c index 3f90ffa09..2e2e0c2ab 100644 >>>> --- a/lib/netdev-dummy.c >>>> +++ b/lib/netdev-dummy.c >>>> @@ -24,6 +24,7 @@ >>>> #include "dp-packet.h" >>>> #include "dpif-netdev.h" >>>> #include "flow.h" >>>> +#include "netdev-offload-provider.h" >>>> #include "netdev-provider.h" >>>> #include "netdev-vport.h" >>>> #include "odp-util.h" >>>> @@ -1523,10 +1524,6 @@ exit: >>>> return error ? -1 : 0; >>>> } >>>> >>>> -#define DUMMY_FLOW_OFFLOAD_API \ >>>> - .flow_put = netdev_dummy_flow_put, \ >>>> - .flow_del = netdev_dummy_flow_del >>>> - >>>> #define NETDEV_DUMMY_CLASS_COMMON \ >>>> .run = netdev_dummy_run, \ >>>> .wait = netdev_dummy_wait, \ >>>> @@ -1559,8 +1556,7 @@ exit: >>>> .rxq_dealloc = netdev_dummy_rxq_dealloc, \ >>>> .rxq_recv = netdev_dummy_rxq_recv, \ >>>> .rxq_wait = netdev_dummy_rxq_wait, \ >>>> - .rxq_drain = netdev_dummy_rxq_drain, \ >>>> - DUMMY_FLOW_OFFLOAD_API >>>> + .rxq_drain = netdev_dummy_rxq_drain >>>> >>>> static const struct netdev_class dummy_class = { >>>> NETDEV_DUMMY_CLASS_COMMON, >>>> @@ -1578,6 +1574,20 @@ static const struct netdev_class >>>> dummy_pmd_class = { >>>> .is_pmd = true, >>>> .reconfigure = netdev_dummy_reconfigure }; >>>> + >>>> +static int >>>> +netdev_dummy_offloads_init_flow_api(struct netdev *netdev) { >>>> + return is_dummy_class(netdev->netdev_class) ? 0 : -1; } >>>> + >>>> +static const struct netdev_flow_api netdev_dummy_offloads = { >>>> + .type = "dummy", >>>> + .flow_put = netdev_dummy_flow_put, >>>> + .flow_del = netdev_dummy_flow_del, >>>> + .init_flow_api = netdev_dummy_offloads_init_flow_api, >>>> +}; >>>> + >>>> >>> >>>> /* Helper functions. */ >>>> >>>> @@ -2024,5 +2034,7 @@ netdev_dummy_register(enum dummy_level >>>> level) >>>> netdev_register_provider(&dummy_internal_class); >>>> netdev_register_provider(&dummy_pmd_class); >>>> >>>> + netdev_register_flow_api_provider(&netdev_dummy_offloads); >>>> + >>>> netdev_vport_tunnel_register(); >>>> } >>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index >>>> f75d73fd3..e4ea94cf9 >>>> 100644 >>>> --- a/lib/netdev-linux.c >>>> +++ b/lib/netdev-linux.c >>>> @@ -55,7 +55,6 @@ >>>> #include "hash.h" >>>> #include "openvswitch/hmap.h" >>>> #include "netdev-provider.h" >>>> -#include "netdev-tc-offloads.h" >>>> #include "netdev-vport.h" >>>> #include "netlink-notifier.h" >>>> #include "netlink-socket.h" >>>> @@ -3321,7 +3320,6 @@ exit: >>>> >>>> const struct netdev_class netdev_linux_class = { >>>> NETDEV_LINUX_CLASS_COMMON, >>>> - LINUX_FLOW_OFFLOAD_API, >>>> .type = "system", >>> >>> Please note that before this patch type="system" and type="internal" had >> LINUX_FLOW_OFFLOAD_API, but type="tap" did not. >>> With this patch type="tap" is going to be tested with init_flow_api(). If >>> the test >> passes - type="tap" is going to have flow_api which it did not have before. >>> Was is verified that type="tap" will fail with init_flow_api()? >> >> If 'init_flow_api' will be able to create tc qdisc than offloading >> is supported. If not - init will fail. It's simple. When linux >> will support tc offloading for tap devices we'll have offloading >> for free without changing the OVS code. >> >>> >>>> .construct = netdev_linux_construct, >>>> .get_stats = netdev_linux_get_stats, @@ -3341,7 +3339,6 @@ const >>>> struct >>>> netdev_class netdev_tap_class = { >>>> >>>> const struct netdev_class netdev_internal_class = { >>>> NETDEV_LINUX_CLASS_COMMON, >>>> - LINUX_FLOW_OFFLOAD_API, >>>> .type = "internal", >>>> .construct = netdev_linux_construct, >>>> .get_stats = netdev_internal_get_stats, diff --git >>>> a/lib/netdev-linux.h >>> ... >>>> +static int >>>> +netdev_assign_flow_api(struct netdev *netdev) { >>>> + struct netdev_registered_flow_api *rfa; >>>> + >>>> + CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) { >>>> + if (!rfa->flow_api->init_flow_api(netdev)) { >>>> + ovs_refcount_ref(&rfa->refcnt); >>>> + ovsrcu_set(&netdev->flow_api, rfa->flow_api); >>>> + VLOG_INFO("%s: Assigned flow API '%s'.", >>>> + netdev_get_name(netdev), rfa->flow_api->type); >>>> + return 0; >>>> + } >>> >>> By this logic the first API which successfully passes init_flow_api() >>> becomes the >> flow_api for this netdev. >>> Assuming there are 2 vport APIs: 1. system datapath vport API 2. netdev >> datapath vport API (planned to be added soon). >>> Assuming both vport APIs successfully pass init_flow_api(), then the first >>> one to >> be tested will always be chosen and the second one will never be used. >>> Are we sure that the system datapath vport will not always be chosen with >>> the >> current netdev_tc_init_flow_api() implementation. >>> How can we distinguish between a vport created under different datapaths? >>> This issue must be resolved for this patch. >> >> netdev-vport that works in userspace datapath has no backing linux >> interface --> get_ifindex failure --> no offloading API configured. >> When the planned offloading in dpdk flow api provider will be >> implemented it will be chosen. >> >> Right now we don't have and don't plan to have natdevs that suitable >> for more than one flow api provider. If they appear in the future, we >> could solve this by choosing most suitable provider instead of first >> one. But this is over-engineering for now and even for a near future. >> > This is more a request for a clarification. > If I understand correctly, you expect that there will be one flow api > provider for dpdk.
Yes. > When the api is called we must test the netdev class and then do the needed > tests (maybe HW capabilities) so we can tell if we should return true or > false. Correct in general, but you need to test netdev instance, not the netdev class. > For vport we have multiple subtypes (vxlan,gre..etc), so if we only support > sub group we can test the sub type by its name. All the vport subtypes are different netdev classes. > What happens if we have both system and netdev datapath (corner case, but > still)? If the system is called first and there is no backing linux port then > we are OK. > What will happen if the netdev offload provider will be called first? do you > expect that we do the same test (linux backing port?) in the dpdk > implementation? Is this type of test being correct for all vport types? That > is, if we don't have a backing linux port (ifindex) it means that the port is > part of a netdev bridge? Yes, you could add simple check: if (netdev_vport_is_vxlan(netdev) && netdev_get_ifindex(netdev) < 0) { return 0; } This check should be done inside init_flow_api() (not in netdev_dpdk_flow_api_supported()). It's allowed to call functions of different netdev provides inside the netdev offload provider code. 'netdev_vport_is_vxlan' should be implemented inside netdev-vport.c. Note that you'll need an additional change to allow offloading for ports without ifindex: diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c index e0731959d..3c474b18c 100644 --- a/lib/netdev-offload.c +++ b/lib/netdev-offload.c @@ -527,10 +527,6 @@ netdev_ports_insert(struct netdev *netdev, const struct dpif_class *dpif_class, struct port_to_netdev_data *data; int ifindex = netdev_get_ifindex(netdev); - if (ifindex < 0) { - return ENODEV; - } - ovs_mutex_lock(&netdev_hmap_mutex); if (netdev_ports_lookup(dpif_port->port_no, dpif_class)) { ovs_mutex_unlock(&netdev_hmap_mutex); @@ -541,11 +537,16 @@ netdev_ports_insert(struct netdev *netdev, const struct dpif_class *dpif_class, data->netdev = netdev_ref(netdev); data->dpif_class = dpif_class; dpif_port_clone(&data->dpif_port, dpif_port); - data->ifindex = ifindex; + + if (ifindex >= 0) { + data->ifindex = ifindex; + hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex); + } else { + data->ifindex = -1; + } hmap_insert(&port_to_netdev, &data->portno_node, netdev_ports_hash(dpif_port->port_no, dpif_class)); - hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex); ovs_mutex_unlock(&netdev_hmap_mutex); netdev_init_flow_api(netdev); --- It's unrelated to current patch, so, I'll send it separately. > Basically, there is nothing in the init flow parameters that can tell us if > the port should be netdev or system (ifindex is implementation related) > because we don't have the context of the caller. > >>> >>>> + VLOG_DBG("%s: flow API '%s' is not suitable.", >>>> + netdev_get_name(netdev), rfa->flow_api->type); >>>> + } >>>> + VLOG_INFO("%s: No suitable flow API found.", >>>> + netdev_get_name(netdev)); >>>> + >>>> + return -1; >>> >>> Please return an enumerated positive error. >> >> It's an internal helper function. Not the public API. >> All the functions that visible outside this module returns proper >> positive error codes. >> >>> >>>> +} >>>> + >>>> int >>>> netdev_flow_flush(struct netdev *netdev) { >>>> - const struct netdev_class *class = netdev->netdev_class; >>> >>> A few more comments. >>> 1. This patch changes code related to OVS offload implementation. OVS >>> offload >> must be confirmed with this patch before it is accepted. >> >> Sure. And I'll appreciate any help with testing as I'm limited with >> offloading capable hardware. >> >>> >>> 2. We may suggest to offload "dpdkvhostclient" port type and "vxlan" (under >> netdev datapath) port type. >>> Please confirm the following understanding of the required steps. >>> 2.1 Need to register two new flow_apis for the two ports types. >>> 2.2 Need to write the corresponding init_flow_api() to test the relevant >>> netdev >> instances. >> >> No, you don't need to implement new flow api provider. I assume that >> offloading for these netdevs will be implemented via DPDK. So, you >> need to add the functionality to netdev_rte_offload provider and >> teach it to return success for these netdevs from the init_flow_api(). >> >>> 2.3 The question how to distinguish between a vport under system datapath >> versus vport under netdev datapath is still an open question (would be happy >> to >> learn that it is already resolved). >> >> Solved. See above. >> >>> >>> 3. Could you please add inline documentation for the newly added code? It >> seems the logic is spread is several places so any comments will be helpful. >> >> Provider registering API has comments. 'netdev_assign_flow_api' is >> quiet simple and mostly documented by the log messages inside. >> I'd like to add some comments if you'll point me functions that needs them. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev