On 4/29/22 20:01, Ilya Maximets wrote: > On 4/29/22 05:28, Jianbo Liu wrote: >> On Thu, 2022-04-28 at 14:00 +0200, Ilya Maximets wrote: >>> On 4/28/22 13:49, Ilya Maximets wrote: >>>> On 4/28/22 11:06, Jianbo Liu wrote: >>>>> On Wed, 2022-04-27 at 13:37 +0200, Ilya Maximets wrote: >>>>>> On 4/27/22 04:41, Jianbo Liu wrote: >>>>>>> On Wed, 2022-04-27 at 01:58 +0200, Ilya Maximets wrote: >>>>>>>> On 4/26/22 09:14, Simon Horman wrote: >>>>>>>>> On Tue, Apr 19, 2022 at 01:38:33AM +0000, Jianbo Liu via >>>>>>>>> dev >>>>>>>>> wrote: >>>>>>>>>> On Thu, 2022-04-07 at 09:15 +0000, Jianbo Liu wrote: >>>>>>>>>>> This series is to add support for tc offloading of >>>>>>>>>>> ovs >>>>>>>>>>> metering, and >>>>>>>>>>> enhance OVS to use new kernel feature which offload >>>>>>>>>>> tc >>>>>>>>>>> police >>>>>>>>>>> action to >>>>>>>>>>> hardware. >>>>>>>>>>> To do the offloading, new APIs for meter are added in >>>>>>>>>>> netdev- >>>>>>>>>>> offload, >>>>>>>>>>> and OVS meters are mapped to tc police actions with >>>>>>>>>>> one-to- >>>>>>>>>>> one >>>>>>>>>>> relationship for dpif-netlink. >>>>>>>>>>> >>>>>>>>>>> Notes: >>>>>>>>>>> v3 >>>>>>>>>>> - Add netdev-offload APIs for meter. >>>>>>>>>>> - Move the implementation to lower netdev- >>>>>>>>>>> offload-tc >>>>>>>>>>> layer. >>>>>>>>>>> >>>>>>>>>>> v2 >>>>>>>>>>> - Move tc police parse call from last patch to >>>>>>>>>>> 2nd >>>>>>>>>>> patch. >>>>>>>>>>> 2nd patch is adding the parse lib func and add >>>>>>>>>>> the >>>>>>>>>>> call >>>>>>>>>>> to it in >>>>>>>>>>> that patch. >>>>>>>>>>> Last patch is the put of tc police act. >>>>>>>>>>> In 2nd patch also add empty switch case for tc >>>>>>>>>>> police >>>>>>>>>>> act >>>>>>>>>>> put as >>>>>>>>>>> the impl. >>>>>>>>>>> is done in the last patch when support for put >>>>>>>>>>> is >>>>>>>>>>> being >>>>>>>>>>> added. >>>>>>>>> >>>>>>>>> ... >>>>>>>>> >>>>>>>>>> Friendly ping for review. >>>>>>>>> >>>>>>>>> Hi Jianbo, >>>>>>>>> >>>>>>>>> sorry for the delayed response. >>>>>>>>> >>>>>>>>> My colleagues at Corigine have exercised this patchset >>>>>>>>> and have >>>>>>>>> verified >>>>>>>>> it for our test-cases. >>>>>>>>> >>>>>>>>> Also, I have run the upstream test workflows over them, >>>>>>>>> and >>>>>>>>> they >>>>>>>>> look good. >>>>>>>>> >>>>>>>>> 1. netdev-offload: Add meter offload provider >>>>>>>>> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhorms2%2Fovs%2Factions%2Fruns%2F2220399423&data=05%7C01%7Cjianbol%40nvidia.com%7Cd62f3e1865b2471053e608da290eb982%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637867440481236224%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=UmOUgdVlvQuuCPEqlfgRlei5tZtmAv7liSbfVVaWXgE%3D&reserved=0 >>>>>>>>> 2. tc: Add support parsing tc police action >>>>>>>>> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhorms2%2Fovs%2Factions%2Fruns%2F2220409233&data=05%7C01%7Cjianbol%40nvidia.com%7Cd62f3e1865b2471053e608da290eb982%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637867440481236224%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=3CA7zQl8evhu9RIF6BC5pwiLpASiSLN3jJ7%2B4lANKCo%3D&reserved=0 >>>>>>>>> 3. netdev-linux: Refactor put police action netlink >>>>>>>>> message >>>>>>>>> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhorms2%2Fovs%2Factions%2Fruns%2F2220516986&data=05%7C01%7Cjianbol%40nvidia.com%7Cd62f3e1865b2471053e608da290eb982%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637867440481236224%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Fv0GEkVRMvT5TQo2wxwB%2FB4FMmUu3d2F5mBRONhen1U%3D&reserved=0 >>>>>>>>> 4. netdev-linux: Add functions to manipulate tc police >>>>>>>>> action >>>>>>>>> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhorms2%2Fovs%2Factions%2Fruns%2F2220566774&data=05%7C01%7Cjianbol%40nvidia.com%7Cd62f3e1865b2471053e608da290eb982%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637867440481236224%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Nm0T35Z4q5RlkAAAXsiwOqbVo7Utm4nVZYfEtYul4gk%3D&reserved=0 >>>>>>>>> 5. netdev-offload-tc: Implement and register meter >>>>>>>>> offload API >>>>>>>>> for >>>>>>>>> tc >>>>>>>>> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhorms2%2Fovs%2Factions%2Fruns%2F2220568491&data=05%7C01%7Cjianbol%40nvidia.com%7Cd62f3e1865b2471053e608da290eb982%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637867440481236224%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=f67ByBBmS7sJC8D4kOcgSOTQSAl5NueDo4NhJL2aozU%3D&reserved=0 >>>>>>>>> 6. netdev-offload-tc: Cleanup police actions with >>>>>>>>> reserved >>>>>>>>> indexes >>>>>>>>> on startup >>>>>>>>> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhorms2%2Fovs%2Factions%2Fruns%2F2220569761&data=05%7C01%7Cjianbol%40nvidia.com%7Cd62f3e1865b2471053e608da290eb982%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637867440481236224%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=1iLtBH6tsyoOSH8HoPzg3o1zcqEkBQUVNjcs%2BHg3%2Fvc%3D&reserved=0 >>>>>>>>> 7. netdev-offload-tc: Offloading rules with police >>>>>>>>> actions >>>>>>>>> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhorms2%2Fovs%2Factions%2Fruns%2F2220571039&data=05%7C01%7Cjianbol%40nvidia.com%7Cd62f3e1865b2471053e608da290eb982%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637867440481236224%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=i1%2FUfISCkzpZJmSygf1eT2T%2FlMiDFosrcBrIinYJ5x8%3D&reserved=0 >>>>>>>>> 8. dpif-netlink: Offloading meter to tc police action >>>>>>>>> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhorms2%2Fovs%2Factions%2Fruns%2F2220572904&data=05%7C01%7Cjianbol%40nvidia.com%7Cd62f3e1865b2471053e608da290eb982%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637867440481236224%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=vGH1Em4ddDfnHiayWJXabdyqT7TXNua8awyJoPTD4Fk%3D&reserved=0 >>>>>>>>> >>>>>>>>> I have also reviewed the patches, and with my OVS >>>>>>>>> maintainer >>>>>>>>> hat >>>>>>>>> on, I'm >>>>>>>>> happy to apply them, and plan to do so in the coming days >>>>>>>>> unless >>>>>>>>> there are >>>>>>>>> objections. >>>>>>>> >>>>>>>> Thanks, Simon, for taking care of this patch set! And >>>>>>>> sorry for >>>>>>>> me >>>>>>>> not being >>>>>>>> very responsive. >>>>>>>> >>>>>>>> I had a quick look over the patch set just now and I think >>>>>>>> that >>>>>>>> in >>>>>>>> this >>>>>>>> version most of my main concerns were addressed and it >>>>>>>> looks good >>>>>>>> in >>>>>>>> general. >>>>>>>> >>>>>>>> One thing I didn't understand is why we need to have >>>>>>>> 'struct >>>>>>>> meter_offload_api' >>>>>>>> as a separate entity? I mean, it never used if a normal >>>>>>>> offload >>>>>>>> API >>>>>>>> is >>>>>>>> not initialized and it's tied with the particular offload >>>>>>>> API. >>>>>>>> Can >>>>>>>> these >>>>>>>> callbacks be just part of the normal 'struct >>>>>>>> netdev_flow_api' ? >>>>>>>> Should not be hard to merge meter_offload_api and >>>>>>>> netdev_flow_api >>>>>>>> into one. >>>>>>>> That should also reduce the amount of code. Or am I >>>>>>>> missing >>>>>>>> something? >>>>>>> >>>>>>> Conceptually, meter is used globally, and it does not belong >>>>>>> to any >>>>>>> perticular netdev/flow. All the callbacks in 'struct >>>>>>> netdev_flow_api' >>>>>>> have a parameter related to netdev, but meter callbacks don't >>>>>>> need. >>>>>>> The meter_offload_set/get/del can be part of 'struct >>>>>>> netdev_flow_api', >>>>>>> but meter_offload_init/destory must be placed to elsewhere, >>>>>>> to >>>>>>> avoid >>>>>>> being called more than once. >>>>>>> So, why not add new API for meter? But I'd like to change if >>>>>>> you >>>>>>> insist. What's your decision? >>>>>> >>>>>> I agree that meters do not belong to any particular netdev, >>>>>> that's >>>>>> true. However, netdev_flow_api is also a global entity. All >>>>>> netdevs >>>>>> are using the same instance of the netdev_fow_api. And most of >>>>>> the >>>>>> structures in netdev-offload-tc are global (hash maps, >>>>>> configurations). >>>>>> 'netdev' argument can just be viewed as a guide for these >>>>>> callbacks, >>>>>> like meter_id for the meter offload API, not something >>>>>> necessary. >>>>> >>>>> Usually, we get netdev_flow_api from netdev->flow_api, but >>>>> dpif_class- >>>>>> meter_get/set/del don't have the param of netdev, we must add a >>>>>> helper >>>>> to get any netdev from dpif. The helper is more like: >>>>> >>>>> struct netdev * >>>>> netdev_get_any(const char *dpif_type) >>>>> { >>>>> struct port_to_netdev_data *data; >>>>> struct netdev *dev = NULL; >>>>> >>>>> ovs_rwlock_rdlock(&netdev_hmap_rwlock); >>>>> HMAP_FOR_EACH (data, portno_node, &port_to_netdev) { >>>>> if (netdev_get_dpif_type(data->netdev) == dpif_type) { >>>>> dev = netdev_ref(data->netdev); >>>>> break; >>>>> } >>>>> } >>>>> ovs_rwlock_unlock(&netdev_hmap_rwlock); >>>>> >>>>> return dev; >>>>> } >>>>> >>>>> Do you accept it, or have better solution? >>>> >>>> Hmm. Good point, I missed that. >>>> In fact, a single dpif can have ports with different offload API >>>> initialized. For example, linux ports may use TC, while DPDK ports >>>> can use DPDK's rte_flow while being in the same dpif-netdev >>>> instance. >>>> >>>> So, I think, the correct solution is to add a meter to all flow >>>> APIs >>>> that are initialized for ports within that datapath. Maybe >>>> something >>>> like this (not tested, pseudo-code): >>>> >>>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c >>>> index fb108c0d5..91504ee15 100644 >>>> --- a/lib/netdev-offload.c >>>> +++ b/lib/netdev-offload.c >>>> @@ -79,6 +79,9 @@ struct netdev_registered_flow_api { >>>> /* Number of references: one for the flow_api itself and one >>>> for every >>>> * instance of the netdev that uses it. */ >>>> struct ovs_refcount refcnt; >>>> + >>>> + /* Reference counter for dpif types which has a port with this >>>> flow API. */ >>>> + struct simap used_by_dpif_types; >>>> }; >>>> >>>> static struct netdev_registered_flow_api * >>>> @@ -120,6 +123,7 @@ netdev_register_flow_api_provider(const struct >>>> netdev_flow_api *new_flow_api) >>>> hash_string(new_flow_api->type, 0)); >>>> rfa->flow_api = new_flow_api; >>>> ovs_refcount_init(&rfa->refcnt); >>>> + simap_init(&rfa->used_by_dpif_types); >>>> VLOG_DBG("netdev: flow API '%s' registered.", >>>> new_flow_api->type); >>>> } >>>> ovs_mutex_unlock(&netdev_flow_api_provider_mutex); >>>> @@ -154,6 +158,7 @@ netdev_unregister_flow_api_provider(const char >>>> *type) >>>> } else { >>>> cmap_remove(&netdev_flow_apis, &rfa->cmap_node, >>>> hash_string(rfa->flow_api->type, 0)); >>>> + ovsrcu_postpone(simap_destroy, &rfa->used_by_dpif_types); >>>> ovsrcu_postpone(free, rfa); >>>> error = 0; >>>> } >>>> @@ -182,6 +187,8 @@ netdev_assign_flow_api(struct netdev *netdev) >>>> CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) { >>>> if (!rfa->flow_api->init_flow_api(netdev)) { >>>> ovs_refcount_ref(&rfa->refcnt); >>>> + simap_increase(&rfa->used_by_dpif_types, >>>> + netdev_get_dpif_type(netdev), 1); >>>> ovsrcu_set(&netdev->flow_api, rfa->flow_api); >>>> VLOG_INFO("%s: Assigned flow API '%s'.", >>>> netdev_get_name(netdev), rfa->flow_api- >>>>> type); >>>> @@ -344,6 +351,8 @@ netdev_uninit_flow_api(struct netdev *netdev) >>>> >>>> ovsrcu_set(&netdev->flow_api, NULL); >>>> rfa = netdev_lookup_flow_api(flow_api->type); >>>> + ovs_assert(simap_increase(&rfa->used_by_dpif_types, >>>> + netdev_get_dpif_type(netdev), -1) >= >>>> 0); >>>> ovs_refcount_unref(&rfa->refcnt); >>>> } >>>> >>>> @@ -739,6 +748,20 @@ netdev_ports_get_n_flows(const char >>>> *dpif_type, odp_port_t port_no, >>>> return ret; >>>> } >>>> >>>> + >>>> +void >>>> +netdev_offload_meter_set(const char *dpif_type, ...) >>>> +{ >>>> + struct netdev_registered_flow_api *rfa; >>>> + >>>> + CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) { >>>> + if (simap_get(&rfa->used_by_dpif_types, dpif_type) > 0) { >>>> + rfa->flow_api->meter_set(...); >>>> + } >>>> + } >>>> +} >>>> + >>>> + >>>> odp_port_t >>>> netdev_ifindex_to_odp_port(int ifindex) >>>> { >>>> --- >>>> >>>> ? >>> >>> Thinking more, there is no point in counting netdevs, since they >>> can be added after meters already created. So, I guess, we just >>> have to always add meters to all registered flow API providers, >>> i.e. just: >>> >>> void >>> netdev_offload_meter_set(...) >>> { >>> struct netdev_registered_flow_api *rfa; >>> >>> CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) { >>> rfa->flow_api->meter_set(...); >>> } >>> } >> >> You know I will put this netdev_offload_meter_set in >> dpif_netlink_meter_set. But, in the future, someone may implement >> meter_set for netdev_offload_dpdk, so it will be called as well. It >> looks unreasonable as we want to call linux_tc's meter_set only, >> dpdk_flow_api's meter_set should be called in dpif_netdev. > > meter_set for netdev_offload_dpdk will not be called unless > netdev_offload_dpdk is registered and it will not be registered > unless dpdk is explicitely enabled by the user via database knob. > And if user enables dpdk, they likely want to use userpsace > datapath. So that doesn't sound like a big problem. > > Userspace datapath though is more versatile and it may want > to use several different offload APIs at the same time. And > we don't really know for packets from which ports the meter > will be used. And since they may have different offload APIs > we kind of have to create meters in all registered.
Alternative solution is to not create meters anywhere at all, but only store them globally inside the netdev_offload module. Flow API implementations will look up and create the meter on-demand while creating a flow that is going to use that meter. Not sure how deletions should look like in this case. > > Binding a flow API to a specific datapath doesn't look good to me. > > I guess, this also raises the point of using the same meter > for flows installed in kernel and TC for dpif-netlink, as well > as using the same meter for flows in TC and DPDK with userspace > datapath. > But that problem seems to not have a clear solution as > bandwidth sharing doesn't seem to be possible in such cases. > > Best regards, Ilya Maximets. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev