I use this patch to try to fix BUG, I test several times, it's OK ``` [root@bogon yusur_ovs]# git diff diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index b000aee..3330cb2 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx OVS_GUARDED_BY(meter_mutex) = HMAP_INITIALIZER(&meter_id_to_police_idx); static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex) = HMAP_INITIALIZER(&police_idx_to_meter_id); +static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);
static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx); static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id); @@ -2549,6 +2551,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id, uint32_t rate, burst; bool add_policer; int err; + bool init; + + atomic_read_relaxed(&is_tc_init, &init); + if (!init) + return 0; + else + VLOG_WARN("Do not call meter_set before init."); if (!config->bands || config->n_bands < 1 || config->bands[0].type != OFPMBT13_DROP) { ``` ---- Simon Jones Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 11:13写道: > I found another cause of this BUG: > ``` > In public-ovs code, @netdev_offload_dpdk and @netdev_offload_tc is > register in @netdev_register_flow_api_provider. > The @netdev_register_flow_api_provider is called in init stage, > like @dpdk_init__ and @netdev_initialize. > After register, @netdev_offload_dpdk and @netdev_offload_tc is in > @netdev_flow_apis. > > Then ovs-vswitchd run @bridge_run. > In @bridge_run, call @netdev_assign_flow_api, then > call rfa->flow_api->init_flow_api of all rfa of @netdev_flow_apis. > The rfa is like p0 netdevice of DPDK type, or mip0 netdevice of system > type. > If DPDK type, it's @netdev_offload_dpdk's init_flow_api. > If system type, it's @netdev_offload_tc's init_flow_api. > > Then the add meter command comes, also call @bridge_run. > In @bridge_run, at last call @meter_offload_set, then > call rfa->flow_api->meter_set of all rfa of @netdev_flow_apis. > > For this BUG. > Happens when ovs-vswitchd restart. > As bridge/port/meter is all stored in ovsdb. > If meter configure called before port configure, then rfa->flow_api->meter_set > will be called before rfa->flow_api->init_flow_api. > Then BUG happens. > > ``` > > ---- > Simon Jones > > > Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 10:57写道: > >> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api >> implement in ovs-dpdk, which means only one .meter_set implement in TC. >> ``` >> const struct netdev_flow_api netdev_offload_dpdk = { >> .type = "dpdk_flow_api", >> .flow_put = netdev_offload_dpdk_flow_put, >> .flow_del = netdev_offload_dpdk_flow_del, >> .init_flow_api = netdev_offload_dpdk_init_flow_api, >> .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api, >> .flow_get = netdev_offload_dpdk_flow_get, >> .flow_flush = netdev_offload_dpdk_flow_flush, >> .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover, >> .flow_get_n_flows = netdev_offload_dpdk_get_n_flows, >> }; >> ``` >> That's why public-ovs has NO BUG, because only one .meter_set implement >> in TC. >> >> But I add .meter_set in dpdk linke this: >> ``` >> const struct netdev_flow_api netdev_offload_dpdk = { >> .type = "dpdk_flow_api", >> .flow_put = netdev_offload_dpdk_flow_put, >> .flow_del = netdev_offload_dpdk_flow_del, >> .init_flow_api = netdev_offload_dpdk_init_flow_api, >> .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api, >> .flow_get = netdev_offload_dpdk_flow_get, >> .flow_flush = netdev_offload_dpdk_flow_flush, >> .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover, >> .flow_get_n_flows = netdev_offload_dpdk_get_n_flows, >> .meter_set = netdev_offload_dpdk_meter_set, >> .meter_get = netdev_offload_dpdk_meter_get, >> .meter_del = netdev_offload_dpdk_meter_del, >> }; >> ``` >> >> That's why I have BUG. >> >> So I think I should add some check... >> >> ---- >> Simon Jones >> >> >> Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 10:06写道: >> >>> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api >>> implement in ovs-dpdk, which means only one .meter_set implement in TC. >>> ``` >>> const struct netdev_flow_api netdev_offload_dpdk = { >>> .type = "dpdk_flow_api", >>> .flow_put = netdev_offload_dpdk_flow_put, >>> .flow_del = netdev_offload_dpdk_flow_del, >>> .init_flow_api = netdev_offload_dpdk_init_flow_api, >>> .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api, >>> .flow_get = netdev_offload_dpdk_flow_get, >>> .flow_flush = netdev_offload_dpdk_flow_flush, >>> .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover, >>> .flow_get_n_flows = netdev_offload_dpdk_get_n_flows, >>> }; >>> ``` >>> That's why public-ovs has NO BUG, because only one .meter_set implement >>> in TC. >>> >>> But I add .meter_set in dpdk linke this: >>> ``` >>> const struct netdev_flow_api netdev_offload_dpdk = { >>> .type = "dpdk_flow_api", >>> .flow_put = netdev_offload_dpdk_flow_put, >>> .flow_del = netdev_offload_dpdk_flow_del, >>> .init_flow_api = netdev_offload_dpdk_init_flow_api, >>> .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api, >>> .flow_get = netdev_offload_dpdk_flow_get, >>> .flow_flush = netdev_offload_dpdk_flow_flush, >>> .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover, >>> .flow_get_n_flows = netdev_offload_dpdk_get_n_flows, >>> .meter_set = netdev_offload_dpdk_meter_set, >>> .meter_get = netdev_offload_dpdk_meter_get, >>> .meter_del = netdev_offload_dpdk_meter_del, >>> }; >>> ``` >>> >>> That's why I have BUG. >>> >>> So I think I should add some check... >>> >>> ---- >>> Simon Jones >>> >>> >>> Ilya Maximets <i.maxim...@ovn.org> 于2024年6月14日周五 20:30写道: >>> >>>> On 6/14/24 13:33, Simon Jones wrote: >>>> > ``` >>>> > Date: Fri Jun 14 19:25:43 2024 +0800 >>>> > >>>> > bugfix of meter tc crash. >>>> > >>>> > diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c >>>> > index 9fde5f7a9..d08c5a35f 100644 >>>> > --- a/lib/netdev-offload.c >>>> > +++ b/lib/netdev-offload.c >>>> > @@ -186,7 +186,6 @@ netdev_assign_flow_api(struct netdev *netdev) >>>> > ovsrcu_set(&netdev->flow_api, rfa->flow_api); >>>> > VLOG_INFO("%s: Assigned flow API '%s'.", >>>> > netdev_get_name(netdev), rfa->flow_api->type); >>>> > - return 0; >>>> > } >>>> > VLOG_DBG("%s: flow API '%s' is not suitable.", >>>> > netdev_get_name(netdev), rfa->flow_api->type); >>>> > ``` >>>> > >>>> > ---- >>>> > Simon Jones >>>> > >>>> > >>>> > Simon Jones <batmanu...@gmail.com> 于2024年6月14日周五 19:25写道: >>>> > >>>> >> Maybe reason is this: >>>> >> ``` >>>> >> @netdev_offload_tc and @netdev_offload_dpdk will always be register. >>>> >> Then the @meter_set api will be called. >>>> >> The @meter_set could be called only after @init_flow_api, but the BUG >>>> >> happens when @meter_set is called before @init_flow_api is called. >>>> >> >>>> >> Check these code: >>>> >> >>>> >> static int >>>> >> netdev_assign_flow_api(struct netdev *netdev) >>>> >> { >>>> >> struct netdev_registered_flow_api *rfa; >>>> >> >>>> >> CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) { >>>> >> if (!rfa->flow_api->init_flow_api(netdev)) { >>>> >> ovs_refcount_ref(&rfa->refcnt); >>>> >> ovsrcu_set(&netdev->flow_api, rfa->flow_api); >>>> >> VLOG_INFO("%s: Assigned flow API '%s'.", >>>> >> netdev_get_name(netdev), rfa->flow_api->type); >>>> >> return 0; >>>> >> } >>>> >> VLOG_DBG("%s: flow API '%s' is not suitable.", >>>> >> netdev_get_name(netdev), rfa->flow_api->type); >>>> >> } >>>> >> VLOG_INFO("%s: No suitable flow API found.", >>>> netdev_get_name(netdev)); >>>> >> >>>> >> return -1; >>>> >> } >>>> >> >>>> >> ONLY one type of rfa->flow_api->init_flow_api (DPDK or TC) could be >>>> called. >>>> >> Because after one called, then return 0. >>>> >> ``` >>>> >> >>>> >> So I suggest to just remove 'return 0'. >>>> >>>> Hi, Simon. This is not a correct thing to do. By design, only one >>>> flow API >>>> should be able to accept a particular port type. If two API >>>> implementations >>>> can accept the same port, that would be a bug. Also, initializing more >>>> than >>>> one API will cause resource leaks and potentially incorrect datapath >>>> behavior. >>>> >>>> Since you're using userspace datapath, init_flow_api() from the TC >>>> implementation >>>> should always fail. >>>> >>>> What is your OVS version? >>>> >>>> Best regards, Ilya Maximets. >>>> >>>> >> Like this patch: >>>> >> >>>> >> ---- >>>> >> Simon Jones >>>> >> >>>> >> >>>> >> Simon Jones <batmanu...@gmail.com> 于2024年6月11日周二 17:32写道: >>>> >> >>>> >>> >>>> >>> Hi all, >>>> >>> >>>> >>> I'm using ovs-dpdk with this patch: >>>> >>> ``` >>>> >>> commit 4c226944f7c55c9d6e7c85f7c33c5ce11c35ce54 >>>> >>> Author: Jianbo Liu <jian...@nvidia.com> >>>> >>> Date: Fri Jul 8 03:06:26 2022 +0000 >>>> >>> >>>> >>> netdev-offload-tc: Implement meter offload API for tc >>>> >>> ``` >>>> >>> This patch is for offload flow meter by tc. >>>> >>> >>>> >>> Now I found a bug: ovs crash when add meter openflow. >>>> >>> >>>> >>> 1. How to produce: >>>> >>> (NOTICE: This bug is not always reproducible.) >>>> >>> ``` >>>> >>> Add these commands: >>>> >>> >>>> >>> ovs-ofctl -O OpenFlow13 add-meter br-int >>>> >>> meter=1,kbps,band=type=drop,rate=1000 >>>> >>> ovs-ofctl -O OpenFlow13 add-flow br-int >>>> in_port=\"pf0vf0\",ip,nw_src= >>>> >>> 16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=\ >>>> <http://16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=%5C> >>>> >>> <http://16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=%5C> >>>> >>> "meter:1,output:p0\" >>>> >>> ovs-ofctl -O OpenFlow13 add-flow br-int >>>> >>> >>>> in_port=\"pf0vf0\",udp6,ipv6_src=2001:4860:0:2001::/64,ipv6_dst=2001:0:4137:9350:8000:f12a:b9c8:2815,actions=\"meter:1,output:p0\" >>>> >>> >>>> >>> Then ovs crash, this is core file call trace: >>>> >>> >>>> >>> (gdb) bt >>>> >>> #0 id_pool_alloc_id (pool=0x0, id_=id_@entry=0x7fff180c841c) at >>>> >>> lib/id-pool.c:112 >>>> >>> #1 0x000000000055f0a0 in meter_alloc_police_index >>>> >>> (police_index=0x7fff180c841c) at lib/netdev-offload-tc.c:2530 >>>> >>> #2 meter_tc_set_policer (meter_id=..., config=0x7fff180c8538) at >>>> >>> lib/netdev-offload-tc.c:2567 >>>> >>> #3 0x00000000004af207 in meter_offload_set (meter_id=meter_id@entry >>>> =..., >>>> >>> config=config@entry=0x7fff180c8538) at lib/netdev-offload.c:207 >>>> >>> #4 0x0000000000474c2b in dpif_netdev_meter_set (dpif=<optimized >>>> out>, >>>> >>> meter_id=..., config=0x7fff180c8538) at lib/dpif-netdev.c:9137 >>>> >>> #5 0x000000000048e3fd in dpif_meter_set (dpif=0x28cfd10, >>>> meter_id=..., >>>> >>> config=<optimized out>) at lib/dpif.c:1973 >>>> >>> #6 0x000000000042daf8 in meter_set (ofproto_=0x28cc670, >>>> >>> meter_id=0x7fff180c8590, config=<optimized out>) at >>>> >>> ofproto/ofproto-dpif.c:6751 >>>> >>> #7 0x0000000000423e91 in handle_add_meter (mm=0x7fff180c8530, >>>> >>> ofproto=0x28cc670) at ofproto/ofproto.c:6905 >>>> >>> #8 handle_meter_mod (ofconn=0x29073d0, oh=0x291bd40) at >>>> >>> ofproto/ofproto.c:7013 >>>> >>> #9 0x0000000000426d21 in handle_single_part_openflow >>>> >>> (type=OFPTYPE_METER_MOD, oh=0x291bd40, ofconn=0x29073d0) at >>>> >>> ofproto/ofproto.c:8668 >>>> >>> #10 handle_openflow (ofconn=0x29073d0, msgs=0x7fff180c88c0) at >>>> >>> ofproto/ofproto.c:8843 >>>> >>> #11 0x000000000045b3b4 in ofconn_run (handle_openflow=0x4267b0 >>>> >>> <handle_openflow>, ofconn=0x29073d0) at ofproto/connmgr.c:1329 >>>> >>> #12 connmgr_run (mgr=0x272a200, >>>> handle_openflow=handle_openflow@entry=0x4267b0 >>>> >>> <handle_openflow>) at ofproto/connmgr.c:356 >>>> >>> #13 0x00000000004209be in ofproto_run (p=0x28cc670) at >>>> >>> ofproto/ofproto.c:1964 >>>> >>> #14 0x000000000040da44 in bridge_run__ () at vswitchd/bridge.c:3251 >>>> >>> #15 0x00000000004138ec in bridge_run () at vswitchd/bridge.c:3310 >>>> >>> #16 0x000000000040955d in main (argc=<optimized out>, >>>> argv=<optimized >>>> >>> out>) at vswitchd/ovs-vswitchd.c:129 >>>> >>> >>>> >>> ``` >>>> >>> >>>> >>> 2. Check code: >>>> >>> ``` >>>> >>> Notice >>>> >>> #0 id_pool_alloc_id (pool=0x0, id_=id_@entry=0x7fff180c841c) at >>>> >>> lib/id-pool.c:112 >>>> >>> the @pool param is NULL. >>>> >>> >>>> >>> This is happened ONLY when @netdev_tc_init_flow_api is NOT called. >>>> >>> But in code, ONLY after @netdev_tc_init_flow_api is called, the >>>> >>> @handle_meter_mod could works on netdev with system type. >>>> >>> ``` >>>> >>> >>>> >>> 3. My question: >>>> >>> - How this BUG happen? >>>> >>> - Is there patch to solve this BUG? >>>> >>> >>>> >>> >>>> >>> ---- >>>> >>> Simon Jones >>>> >>> >>>> >> >>>> > _______________________________________________ >>>> > dev mailing list >>>> > d...@openvswitch.org >>>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>>> _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev