I use this patch to try to fix BUG, I test several times, it's OK
```
[root@bogon yusur_ovs]# git diff
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index b000aee..3330cb2 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx
OVS_GUARDED_BY(meter_mutex)
     = HMAP_INITIALIZER(&meter_id_to_police_idx);
 static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex)
     = HMAP_INITIALIZER(&police_idx_to_meter_id);
+static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);

 static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx);
 static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id);
@@ -2549,6 +2551,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id,
     uint32_t rate, burst;
     bool add_policer;
     int err;
+    bool init;
+
+    atomic_read_relaxed(&is_tc_init, &init);
+    if (!init)
+        return 0;
+    else
+        VLOG_WARN("Do not call meter_set before init.");

     if (!config->bands || config->n_bands < 1 ||
         config->bands[0].type != OFPMBT13_DROP) {
```

----
Simon Jones


Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 11:13写道:

> I found another cause of this BUG:
> ```
> In public-ovs code, @netdev_offload_dpdk and @netdev_offload_tc is
> register in @netdev_register_flow_api_provider.
> The @netdev_register_flow_api_provider is called in init stage,
> like @dpdk_init__ and @netdev_initialize.
> After register, @netdev_offload_dpdk and @netdev_offload_tc is in
> @netdev_flow_apis.
>
> Then ovs-vswitchd run @bridge_run.
> In @bridge_run, call @netdev_assign_flow_api, then
> call rfa->flow_api->init_flow_api of all rfa of  @netdev_flow_apis.
> The rfa is like p0 netdevice of DPDK type, or mip0 netdevice of system
> type.
> If DPDK type, it's  @netdev_offload_dpdk's  init_flow_api.
> If system type, it's  @netdev_offload_tc's   init_flow_api.
>
> Then the add meter command comes, also call @bridge_run.
> In  @bridge_run, at last call @meter_offload_set, then
> call rfa->flow_api->meter_set  of all rfa of  @netdev_flow_apis.
>
> For this BUG.
> Happens when ovs-vswitchd restart.
> As bridge/port/meter is all stored in ovsdb.
> If meter configure called before port configure, then rfa->flow_api->meter_set
> will be called before rfa->flow_api->init_flow_api.
> Then BUG happens.
>
> ```
>
> ----
> Simon Jones
>
>
> Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 10:57写道:
>
>> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api
>> implement in ovs-dpdk, which means only one .meter_set implement in TC.
>> ```
>> const struct netdev_flow_api netdev_offload_dpdk = {
>>     .type = "dpdk_flow_api",
>>     .flow_put = netdev_offload_dpdk_flow_put,
>>     .flow_del = netdev_offload_dpdk_flow_del,
>>     .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>     .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>     .flow_get = netdev_offload_dpdk_flow_get,
>>     .flow_flush = netdev_offload_dpdk_flow_flush,
>>     .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover,
>>     .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>> };
>> ```
>> That's why public-ovs has NO BUG, because only one .meter_set implement
>> in TC.
>>
>> But I add .meter_set in dpdk linke this:
>> ```
>> const struct netdev_flow_api netdev_offload_dpdk = {
>>     .type = "dpdk_flow_api",
>>     .flow_put = netdev_offload_dpdk_flow_put,
>>     .flow_del = netdev_offload_dpdk_flow_del,
>>     .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>     .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>     .flow_get = netdev_offload_dpdk_flow_get,
>>     .flow_flush = netdev_offload_dpdk_flow_flush,
>>     .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover,
>>     .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>     .meter_set = netdev_offload_dpdk_meter_set,
>>     .meter_get = netdev_offload_dpdk_meter_get,
>>     .meter_del = netdev_offload_dpdk_meter_del,
>> };
>> ```
>>
>> That's why I have BUG.
>>
>> So I think I should add some check...
>>
>> ----
>> Simon Jones
>>
>>
>> Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 10:06写道:
>>
>>> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api
>>> implement in ovs-dpdk, which means only one .meter_set implement in TC.
>>> ```
>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>>     .type = "dpdk_flow_api",
>>>     .flow_put = netdev_offload_dpdk_flow_put,
>>>     .flow_del = netdev_offload_dpdk_flow_del,
>>>     .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>>     .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>>     .flow_get = netdev_offload_dpdk_flow_get,
>>>     .flow_flush = netdev_offload_dpdk_flow_flush,
>>>     .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover,
>>>     .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>> };
>>> ```
>>> That's why public-ovs has NO BUG, because only one .meter_set implement
>>> in TC.
>>>
>>> But I add .meter_set in dpdk linke this:
>>> ```
>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>>     .type = "dpdk_flow_api",
>>>     .flow_put = netdev_offload_dpdk_flow_put,
>>>     .flow_del = netdev_offload_dpdk_flow_del,
>>>     .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>>     .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>>     .flow_get = netdev_offload_dpdk_flow_get,
>>>     .flow_flush = netdev_offload_dpdk_flow_flush,
>>>     .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover,
>>>     .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>>     .meter_set = netdev_offload_dpdk_meter_set,
>>>     .meter_get = netdev_offload_dpdk_meter_get,
>>>     .meter_del = netdev_offload_dpdk_meter_del,
>>> };
>>> ```
>>>
>>> That's why I have BUG.
>>>
>>> So I think I should add some check...
>>>
>>> ----
>>> Simon Jones
>>>
>>>
>>> Ilya Maximets <i.maxim...@ovn.org> 于2024年6月14日周五 20:30写道:
>>>
>>>> On 6/14/24 13:33, Simon Jones wrote:
>>>> > ```
>>>> > Date:   Fri Jun 14 19:25:43 2024 +0800
>>>> >
>>>> >     bugfix of meter tc crash.
>>>> >
>>>> > diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>>>> > index 9fde5f7a9..d08c5a35f 100644
>>>> > --- a/lib/netdev-offload.c
>>>> > +++ b/lib/netdev-offload.c
>>>> > @@ -186,7 +186,6 @@ netdev_assign_flow_api(struct netdev *netdev)
>>>> >              ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>>> >              VLOG_INFO("%s: Assigned flow API '%s'.",
>>>> >                        netdev_get_name(netdev), rfa->flow_api->type);
>>>> > -            return 0;
>>>> >          }
>>>> >          VLOG_DBG("%s: flow API '%s' is not suitable.",
>>>> >                   netdev_get_name(netdev), rfa->flow_api->type);
>>>> > ```
>>>> >
>>>> > ----
>>>> > Simon Jones
>>>> >
>>>> >
>>>> > Simon Jones <batmanu...@gmail.com> 于2024年6月14日周五 19:25写道:
>>>> >
>>>> >> Maybe reason is this:
>>>> >> ```
>>>> >> @netdev_offload_tc and @netdev_offload_dpdk will always be register.
>>>> >> Then the @meter_set api will be called.
>>>> >> The @meter_set could be called only after @init_flow_api, but the BUG
>>>> >> happens when @meter_set is called before @init_flow_api is called.
>>>> >>
>>>> >> Check these code:
>>>> >>
>>>> >> static int
>>>> >> netdev_assign_flow_api(struct netdev *netdev)
>>>> >> {
>>>> >>     struct netdev_registered_flow_api *rfa;
>>>> >>
>>>> >>     CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
>>>> >>         if (!rfa->flow_api->init_flow_api(netdev)) {
>>>> >>             ovs_refcount_ref(&rfa->refcnt);
>>>> >>             ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>>> >>             VLOG_INFO("%s: Assigned flow API '%s'.",
>>>> >>                       netdev_get_name(netdev), rfa->flow_api->type);
>>>> >>             return 0;
>>>> >>         }
>>>> >>         VLOG_DBG("%s: flow API '%s' is not suitable.",
>>>> >>                  netdev_get_name(netdev), rfa->flow_api->type);
>>>> >>     }
>>>> >>     VLOG_INFO("%s: No suitable flow API found.",
>>>> netdev_get_name(netdev));
>>>> >>
>>>> >>     return -1;
>>>> >> }
>>>> >>
>>>> >> ONLY one type of rfa->flow_api->init_flow_api (DPDK or TC) could be
>>>> called.
>>>> >> Because after one called, then return 0.
>>>> >> ```
>>>> >>
>>>> >> So I suggest to just remove 'return 0'.
>>>>
>>>> Hi, Simon.  This is not a correct thing to do.  By design, only one
>>>> flow API
>>>> should be able to accept a particular port type.  If two API
>>>> implementations
>>>> can accept the same port, that would be a bug.  Also, initializing more
>>>> than
>>>> one API will cause resource leaks and potentially incorrect datapath
>>>> behavior.
>>>>
>>>> Since you're using userspace datapath, init_flow_api() from the TC
>>>> implementation
>>>> should always fail.
>>>>
>>>> What is your OVS version?
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>> >> Like this patch:
>>>> >>
>>>> >> ----
>>>> >> Simon Jones
>>>> >>
>>>> >>
>>>> >> Simon Jones <batmanu...@gmail.com> 于2024年6月11日周二 17:32写道:
>>>> >>
>>>> >>>
>>>> >>> Hi all,
>>>> >>>
>>>> >>> I'm using ovs-dpdk with this patch:
>>>> >>> ```
>>>> >>> commit 4c226944f7c55c9d6e7c85f7c33c5ce11c35ce54
>>>> >>> Author: Jianbo Liu <jian...@nvidia.com>
>>>> >>> Date:   Fri Jul 8 03:06:26 2022 +0000
>>>> >>>
>>>> >>>     netdev-offload-tc: Implement meter offload API for tc
>>>> >>> ```
>>>> >>> This patch is for offload flow meter by tc.
>>>> >>>
>>>> >>> Now I found a bug: ovs crash when add meter openflow.
>>>> >>>
>>>> >>> 1. How to produce:
>>>> >>> (NOTICE: This bug is not always reproducible.)
>>>> >>> ```
>>>> >>> Add these commands:
>>>> >>>
>>>> >>> ovs-ofctl -O OpenFlow13 add-meter br-int
>>>> >>> meter=1,kbps,band=type=drop,rate=1000
>>>> >>> ovs-ofctl -O OpenFlow13 add-flow br-int
>>>> in_port=\"pf0vf0\",ip,nw_src=
>>>> >>> 16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=\
>>>> <http://16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=%5C>
>>>> >>> <http://16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=%5C>
>>>> >>> "meter:1,output:p0\"
>>>> >>> ovs-ofctl -O OpenFlow13 add-flow br-int
>>>> >>>
>>>> in_port=\"pf0vf0\",udp6,ipv6_src=2001:4860:0:2001::/64,ipv6_dst=2001:0:4137:9350:8000:f12a:b9c8:2815,actions=\"meter:1,output:p0\"
>>>> >>>
>>>> >>> Then ovs crash, this is core file call trace:
>>>> >>>
>>>> >>> (gdb) bt
>>>> >>> #0  id_pool_alloc_id (pool=0x0, id_=id_@entry=0x7fff180c841c) at
>>>> >>> lib/id-pool.c:112
>>>> >>> #1  0x000000000055f0a0 in meter_alloc_police_index
>>>> >>> (police_index=0x7fff180c841c) at lib/netdev-offload-tc.c:2530
>>>> >>> #2  meter_tc_set_policer (meter_id=..., config=0x7fff180c8538) at
>>>> >>> lib/netdev-offload-tc.c:2567
>>>> >>> #3  0x00000000004af207 in meter_offload_set (meter_id=meter_id@entry
>>>> =...,
>>>> >>> config=config@entry=0x7fff180c8538) at lib/netdev-offload.c:207
>>>> >>> #4  0x0000000000474c2b in dpif_netdev_meter_set (dpif=<optimized
>>>> out>,
>>>> >>> meter_id=..., config=0x7fff180c8538) at lib/dpif-netdev.c:9137
>>>> >>> #5  0x000000000048e3fd in dpif_meter_set (dpif=0x28cfd10,
>>>> meter_id=...,
>>>> >>> config=<optimized out>) at lib/dpif.c:1973
>>>> >>> #6  0x000000000042daf8 in meter_set (ofproto_=0x28cc670,
>>>> >>> meter_id=0x7fff180c8590, config=<optimized out>) at
>>>> >>> ofproto/ofproto-dpif.c:6751
>>>> >>> #7  0x0000000000423e91 in handle_add_meter (mm=0x7fff180c8530,
>>>> >>> ofproto=0x28cc670) at ofproto/ofproto.c:6905
>>>> >>> #8  handle_meter_mod (ofconn=0x29073d0, oh=0x291bd40) at
>>>> >>> ofproto/ofproto.c:7013
>>>> >>> #9  0x0000000000426d21 in handle_single_part_openflow
>>>> >>> (type=OFPTYPE_METER_MOD, oh=0x291bd40, ofconn=0x29073d0) at
>>>> >>> ofproto/ofproto.c:8668
>>>> >>> #10 handle_openflow (ofconn=0x29073d0, msgs=0x7fff180c88c0) at
>>>> >>> ofproto/ofproto.c:8843
>>>> >>> #11 0x000000000045b3b4 in ofconn_run (handle_openflow=0x4267b0
>>>> >>> <handle_openflow>, ofconn=0x29073d0) at ofproto/connmgr.c:1329
>>>> >>> #12 connmgr_run (mgr=0x272a200,
>>>> handle_openflow=handle_openflow@entry=0x4267b0
>>>> >>> <handle_openflow>) at ofproto/connmgr.c:356
>>>> >>> #13 0x00000000004209be in ofproto_run (p=0x28cc670) at
>>>> >>> ofproto/ofproto.c:1964
>>>> >>> #14 0x000000000040da44 in bridge_run__ () at vswitchd/bridge.c:3251
>>>> >>> #15 0x00000000004138ec in bridge_run () at vswitchd/bridge.c:3310
>>>> >>> #16 0x000000000040955d in main (argc=<optimized out>,
>>>> argv=<optimized
>>>> >>> out>) at vswitchd/ovs-vswitchd.c:129
>>>> >>>
>>>> >>> ```
>>>> >>>
>>>> >>> 2. Check code:
>>>> >>> ```
>>>> >>> Notice
>>>> >>> #0  id_pool_alloc_id (pool=0x0, id_=id_@entry=0x7fff180c841c) at
>>>> >>> lib/id-pool.c:112
>>>> >>> the @pool param is NULL.
>>>> >>>
>>>> >>> This is happened ONLY when @netdev_tc_init_flow_api is NOT called.
>>>> >>> But in code, ONLY after @netdev_tc_init_flow_api is called, the
>>>> >>> @handle_meter_mod could works on netdev with system type.
>>>> >>> ```
>>>> >>>
>>>> >>> 3. My question:
>>>> >>> - How this BUG happen?
>>>> >>> - Is there patch to solve this BUG?
>>>> >>>
>>>> >>>
>>>> >>> ----
>>>> >>> Simon Jones
>>>> >>>
>>>> >>
>>>> > _______________________________________________
>>>> > dev mailing list
>>>> > d...@openvswitch.org
>>>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>>>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to