It's PR: Bugfix of meter_set crash. by batmancn · Pull Request #425 ·
openvswitch/ovs (github.com) <https://github.com/openvswitch/ovs/pull/425>

----
Simon Jones


Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 15:35写道:

> This patch:
> ```
> $ git diff
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index b000aeea8..74fd7c11b 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx
> OVS_GUARDED_BY(meter_mutex)
>      = HMAP_INITIALIZER(&meter_id_to_police_idx);
>  static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex)
>      = HMAP_INITIALIZER(&police_idx_to_meter_id);
> +/* YSK2: if init tc. */
> +static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);
>
>  static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx);
>  static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id);
> @@ -2433,6 +2435,7 @@ netdev_tc_init_flow_api(struct netdev *netdev)
>      }
>
>      VLOG_INFO("added ingress qdisc to %s", netdev_get_name(netdev));
> +    atomic_store_relaxed(&is_tc_init, true);
>
>      return 0;
>  }
> @@ -2549,6 +2552,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id,
>      uint32_t rate, burst;
>      bool add_policer;
>      int err;
> +    bool init;
> +
> +    atomic_read_relaxed(&is_tc_init, &init);
> +    if (!init) {
> +        VLOG_WARN("Do not call meter_set before init.");
> +        return 0;
> +    }
>
>      if (!config->bands || config->n_bands < 1 ||
>          config->bands[0].type != OFPMBT13_DROP) {
> ```
> ----
> Simon Jones
>
>
> Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 15:30写道:
>
>> I use this patch to try to fix BUG, I test several times, it's OK
>> ```
>> [root@bogon yusur_ovs]# git diff
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index b000aee..3330cb2 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx
>> OVS_GUARDED_BY(meter_mutex)
>>      = HMAP_INITIALIZER(&meter_id_to_police_idx);
>>  static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex)
>>      = HMAP_INITIALIZER(&police_idx_to_meter_id);
>> +static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);
>>
>>  static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx);
>>  static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id);
>> @@ -2549,6 +2551,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id,
>>      uint32_t rate, burst;
>>      bool add_policer;
>>      int err;
>> +    bool init;
>> +
>> +    atomic_read_relaxed(&is_tc_init, &init);
>> +    if (!init)
>> +        return 0;
>> +    else
>> +        VLOG_WARN("Do not call meter_set before init.");
>>
>>      if (!config->bands || config->n_bands < 1 ||
>>          config->bands[0].type != OFPMBT13_DROP) {
>> ```
>>
>> ----
>> Simon Jones
>>
>>
>> Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 11:13写道:
>>
>>> I found another cause of this BUG:
>>> ```
>>> In public-ovs code, @netdev_offload_dpdk and @netdev_offload_tc is
>>> register in @netdev_register_flow_api_provider.
>>> The @netdev_register_flow_api_provider is called in init stage,
>>> like @dpdk_init__ and @netdev_initialize.
>>> After register, @netdev_offload_dpdk and @netdev_offload_tc is in
>>> @netdev_flow_apis.
>>>
>>> Then ovs-vswitchd run @bridge_run.
>>> In @bridge_run, call @netdev_assign_flow_api, then
>>> call rfa->flow_api->init_flow_api of all rfa of  @netdev_flow_apis.
>>> The rfa is like p0 netdevice of DPDK type, or mip0 netdevice of system
>>> type.
>>> If DPDK type, it's  @netdev_offload_dpdk's  init_flow_api.
>>> If system type, it's  @netdev_offload_tc's   init_flow_api.
>>>
>>> Then the add meter command comes, also call @bridge_run.
>>> In  @bridge_run, at last call @meter_offload_set, then
>>> call rfa->flow_api->meter_set  of all rfa of  @netdev_flow_apis.
>>>
>>> For this BUG.
>>> Happens when ovs-vswitchd restart.
>>> As bridge/port/meter is all stored in ovsdb.
>>> If meter configure called before port configure, then 
>>> rfa->flow_api->meter_set
>>> will be called before rfa->flow_api->init_flow_api.
>>> Then BUG happens.
>>>
>>> ```
>>>
>>> ----
>>> Simon Jones
>>>
>>>
>>> Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 10:57写道:
>>>
>>>> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api
>>>> implement in ovs-dpdk, which means only one .meter_set implement in TC.
>>>> ```
>>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>>>     .type = "dpdk_flow_api",
>>>>     .flow_put = netdev_offload_dpdk_flow_put,
>>>>     .flow_del = netdev_offload_dpdk_flow_del,
>>>>     .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>>>     .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>>>     .flow_get = netdev_offload_dpdk_flow_get,
>>>>     .flow_flush = netdev_offload_dpdk_flow_flush,
>>>>     .hw_miss_packet_recover =
>>>> netdev_offload_dpdk_hw_miss_packet_recover,
>>>>     .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>>> };
>>>> ```
>>>> That's why public-ovs has NO BUG, because only one .meter_set implement
>>>> in TC.
>>>>
>>>> But I add .meter_set in dpdk linke this:
>>>> ```
>>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>>>     .type = "dpdk_flow_api",
>>>>     .flow_put = netdev_offload_dpdk_flow_put,
>>>>     .flow_del = netdev_offload_dpdk_flow_del,
>>>>     .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>>>     .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>>>     .flow_get = netdev_offload_dpdk_flow_get,
>>>>     .flow_flush = netdev_offload_dpdk_flow_flush,
>>>>     .hw_miss_packet_recover =
>>>> netdev_offload_dpdk_hw_miss_packet_recover,
>>>>     .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>>>     .meter_set = netdev_offload_dpdk_meter_set,
>>>>     .meter_get = netdev_offload_dpdk_meter_get,
>>>>     .meter_del = netdev_offload_dpdk_meter_del,
>>>> };
>>>> ```
>>>>
>>>> That's why I have BUG.
>>>>
>>>> So I think I should add some check...
>>>>
>>>> ----
>>>> Simon Jones
>>>>
>>>>
>>>> Simon Jones <batmanu...@gmail.com> 于2024年6月17日周一 10:06写道:
>>>>
>>>>> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api
>>>>> implement in ovs-dpdk, which means only one .meter_set implement in TC.
>>>>> ```
>>>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>>>>     .type = "dpdk_flow_api",
>>>>>     .flow_put = netdev_offload_dpdk_flow_put,
>>>>>     .flow_del = netdev_offload_dpdk_flow_del,
>>>>>     .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>>>>     .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>>>>     .flow_get = netdev_offload_dpdk_flow_get,
>>>>>     .flow_flush = netdev_offload_dpdk_flow_flush,
>>>>>     .hw_miss_packet_recover =
>>>>> netdev_offload_dpdk_hw_miss_packet_recover,
>>>>>     .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>>>> };
>>>>> ```
>>>>> That's why public-ovs has NO BUG, because only one .meter_set
>>>>> implement in TC.
>>>>>
>>>>> But I add .meter_set in dpdk linke this:
>>>>> ```
>>>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>>>>     .type = "dpdk_flow_api",
>>>>>     .flow_put = netdev_offload_dpdk_flow_put,
>>>>>     .flow_del = netdev_offload_dpdk_flow_del,
>>>>>     .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>>>>     .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>>>>     .flow_get = netdev_offload_dpdk_flow_get,
>>>>>     .flow_flush = netdev_offload_dpdk_flow_flush,
>>>>>     .hw_miss_packet_recover =
>>>>> netdev_offload_dpdk_hw_miss_packet_recover,
>>>>>     .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>>>>     .meter_set = netdev_offload_dpdk_meter_set,
>>>>>     .meter_get = netdev_offload_dpdk_meter_get,
>>>>>     .meter_del = netdev_offload_dpdk_meter_del,
>>>>> };
>>>>> ```
>>>>>
>>>>> That's why I have BUG.
>>>>>
>>>>> So I think I should add some check...
>>>>>
>>>>> ----
>>>>> Simon Jones
>>>>>
>>>>>
>>>>> Ilya Maximets <i.maxim...@ovn.org> 于2024年6月14日周五 20:30写道:
>>>>>
>>>>>> On 6/14/24 13:33, Simon Jones wrote:
>>>>>> > ```
>>>>>> > Date:   Fri Jun 14 19:25:43 2024 +0800
>>>>>> >
>>>>>> >     bugfix of meter tc crash.
>>>>>> >
>>>>>> > diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>>>>>> > index 9fde5f7a9..d08c5a35f 100644
>>>>>> > --- a/lib/netdev-offload.c
>>>>>> > +++ b/lib/netdev-offload.c
>>>>>> > @@ -186,7 +186,6 @@ netdev_assign_flow_api(struct netdev *netdev)
>>>>>> >              ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>>>>> >              VLOG_INFO("%s: Assigned flow API '%s'.",
>>>>>> >                        netdev_get_name(netdev),
>>>>>> rfa->flow_api->type);
>>>>>> > -            return 0;
>>>>>> >          }
>>>>>> >          VLOG_DBG("%s: flow API '%s' is not suitable.",
>>>>>> >                   netdev_get_name(netdev), rfa->flow_api->type);
>>>>>> > ```
>>>>>> >
>>>>>> > ----
>>>>>> > Simon Jones
>>>>>> >
>>>>>> >
>>>>>> > Simon Jones <batmanu...@gmail.com> 于2024年6月14日周五 19:25写道:
>>>>>> >
>>>>>> >> Maybe reason is this:
>>>>>> >> ```
>>>>>> >> @netdev_offload_tc and @netdev_offload_dpdk will always be
>>>>>> register.
>>>>>> >> Then the @meter_set api will be called.
>>>>>> >> The @meter_set could be called only after @init_flow_api, but the
>>>>>> BUG
>>>>>> >> happens when @meter_set is called before @init_flow_api is called.
>>>>>> >>
>>>>>> >> Check these code:
>>>>>> >>
>>>>>> >> static int
>>>>>> >> netdev_assign_flow_api(struct netdev *netdev)
>>>>>> >> {
>>>>>> >>     struct netdev_registered_flow_api *rfa;
>>>>>> >>
>>>>>> >>     CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
>>>>>> >>         if (!rfa->flow_api->init_flow_api(netdev)) {
>>>>>> >>             ovs_refcount_ref(&rfa->refcnt);
>>>>>> >>             ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>>>>> >>             VLOG_INFO("%s: Assigned flow API '%s'.",
>>>>>> >>                       netdev_get_name(netdev),
>>>>>> rfa->flow_api->type);
>>>>>> >>             return 0;
>>>>>> >>         }
>>>>>> >>         VLOG_DBG("%s: flow API '%s' is not suitable.",
>>>>>> >>                  netdev_get_name(netdev), rfa->flow_api->type);
>>>>>> >>     }
>>>>>> >>     VLOG_INFO("%s: No suitable flow API found.",
>>>>>> netdev_get_name(netdev));
>>>>>> >>
>>>>>> >>     return -1;
>>>>>> >> }
>>>>>> >>
>>>>>> >> ONLY one type of rfa->flow_api->init_flow_api (DPDK or TC) could
>>>>>> be called.
>>>>>> >> Because after one called, then return 0.
>>>>>> >> ```
>>>>>> >>
>>>>>> >> So I suggest to just remove 'return 0'.
>>>>>>
>>>>>> Hi, Simon.  This is not a correct thing to do.  By design, only one
>>>>>> flow API
>>>>>> should be able to accept a particular port type.  If two API
>>>>>> implementations
>>>>>> can accept the same port, that would be a bug.  Also, initializing
>>>>>> more than
>>>>>> one API will cause resource leaks and potentially incorrect datapath
>>>>>> behavior.
>>>>>>
>>>>>> Since you're using userspace datapath, init_flow_api() from the TC
>>>>>> implementation
>>>>>> should always fail.
>>>>>>
>>>>>> What is your OVS version?
>>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>>
>>>>>> >> Like this patch:
>>>>>> >>
>>>>>> >> ----
>>>>>> >> Simon Jones
>>>>>> >>
>>>>>> >>
>>>>>> >> Simon Jones <batmanu...@gmail.com> 于2024年6月11日周二 17:32写道:
>>>>>> >>
>>>>>> >>>
>>>>>> >>> Hi all,
>>>>>> >>>
>>>>>> >>> I'm using ovs-dpdk with this patch:
>>>>>> >>> ```
>>>>>> >>> commit 4c226944f7c55c9d6e7c85f7c33c5ce11c35ce54
>>>>>> >>> Author: Jianbo Liu <jian...@nvidia.com>
>>>>>> >>> Date:   Fri Jul 8 03:06:26 2022 +0000
>>>>>> >>>
>>>>>> >>>     netdev-offload-tc: Implement meter offload API for tc
>>>>>> >>> ```
>>>>>> >>> This patch is for offload flow meter by tc.
>>>>>> >>>
>>>>>> >>> Now I found a bug: ovs crash when add meter openflow.
>>>>>> >>>
>>>>>> >>> 1. How to produce:
>>>>>> >>> (NOTICE: This bug is not always reproducible.)
>>>>>> >>> ```
>>>>>> >>> Add these commands:
>>>>>> >>>
>>>>>> >>> ovs-ofctl -O OpenFlow13 add-meter br-int
>>>>>> >>> meter=1,kbps,band=type=drop,rate=1000
>>>>>> >>> ovs-ofctl -O OpenFlow13 add-flow br-int
>>>>>> in_port=\"pf0vf0\",ip,nw_src=
>>>>>> >>> 16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=\
>>>>>> <http://16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=%5C>
>>>>>> >>> <http://16.0.0.0/24,nw_dst=48.0.0.0/24,nw_proto=17,actions=%5C>
>>>>>> >>> "meter:1,output:p0\"
>>>>>> >>> ovs-ofctl -O OpenFlow13 add-flow br-int
>>>>>> >>>
>>>>>> in_port=\"pf0vf0\",udp6,ipv6_src=2001:4860:0:2001::/64,ipv6_dst=2001:0:4137:9350:8000:f12a:b9c8:2815,actions=\"meter:1,output:p0\"
>>>>>> >>>
>>>>>> >>> Then ovs crash, this is core file call trace:
>>>>>> >>>
>>>>>> >>> (gdb) bt
>>>>>> >>> #0  id_pool_alloc_id (pool=0x0, id_=id_@entry=0x7fff180c841c) at
>>>>>> >>> lib/id-pool.c:112
>>>>>> >>> #1  0x000000000055f0a0 in meter_alloc_police_index
>>>>>> >>> (police_index=0x7fff180c841c) at lib/netdev-offload-tc.c:2530
>>>>>> >>> #2  meter_tc_set_policer (meter_id=..., config=0x7fff180c8538) at
>>>>>> >>> lib/netdev-offload-tc.c:2567
>>>>>> >>> #3  0x00000000004af207 in meter_offload_set
>>>>>> (meter_id=meter_id@entry=...,
>>>>>> >>> config=config@entry=0x7fff180c8538) at lib/netdev-offload.c:207
>>>>>> >>> #4  0x0000000000474c2b in dpif_netdev_meter_set (dpif=<optimized
>>>>>> out>,
>>>>>> >>> meter_id=..., config=0x7fff180c8538) at lib/dpif-netdev.c:9137
>>>>>> >>> #5  0x000000000048e3fd in dpif_meter_set (dpif=0x28cfd10,
>>>>>> meter_id=...,
>>>>>> >>> config=<optimized out>) at lib/dpif.c:1973
>>>>>> >>> #6  0x000000000042daf8 in meter_set (ofproto_=0x28cc670,
>>>>>> >>> meter_id=0x7fff180c8590, config=<optimized out>) at
>>>>>> >>> ofproto/ofproto-dpif.c:6751
>>>>>> >>> #7  0x0000000000423e91 in handle_add_meter (mm=0x7fff180c8530,
>>>>>> >>> ofproto=0x28cc670) at ofproto/ofproto.c:6905
>>>>>> >>> #8  handle_meter_mod (ofconn=0x29073d0, oh=0x291bd40) at
>>>>>> >>> ofproto/ofproto.c:7013
>>>>>> >>> #9  0x0000000000426d21 in handle_single_part_openflow
>>>>>> >>> (type=OFPTYPE_METER_MOD, oh=0x291bd40, ofconn=0x29073d0) at
>>>>>> >>> ofproto/ofproto.c:8668
>>>>>> >>> #10 handle_openflow (ofconn=0x29073d0, msgs=0x7fff180c88c0) at
>>>>>> >>> ofproto/ofproto.c:8843
>>>>>> >>> #11 0x000000000045b3b4 in ofconn_run (handle_openflow=0x4267b0
>>>>>> >>> <handle_openflow>, ofconn=0x29073d0) at ofproto/connmgr.c:1329
>>>>>> >>> #12 connmgr_run (mgr=0x272a200,
>>>>>> handle_openflow=handle_openflow@entry=0x4267b0
>>>>>> >>> <handle_openflow>) at ofproto/connmgr.c:356
>>>>>> >>> #13 0x00000000004209be in ofproto_run (p=0x28cc670) at
>>>>>> >>> ofproto/ofproto.c:1964
>>>>>> >>> #14 0x000000000040da44 in bridge_run__ () at
>>>>>> vswitchd/bridge.c:3251
>>>>>> >>> #15 0x00000000004138ec in bridge_run () at vswitchd/bridge.c:3310
>>>>>> >>> #16 0x000000000040955d in main (argc=<optimized out>,
>>>>>> argv=<optimized
>>>>>> >>> out>) at vswitchd/ovs-vswitchd.c:129
>>>>>> >>>
>>>>>> >>> ```
>>>>>> >>>
>>>>>> >>> 2. Check code:
>>>>>> >>> ```
>>>>>> >>> Notice
>>>>>> >>> #0  id_pool_alloc_id (pool=0x0, id_=id_@entry=0x7fff180c841c) at
>>>>>> >>> lib/id-pool.c:112
>>>>>> >>> the @pool param is NULL.
>>>>>> >>>
>>>>>> >>> This is happened ONLY when @netdev_tc_init_flow_api is NOT called.
>>>>>> >>> But in code, ONLY after @netdev_tc_init_flow_api is called, the
>>>>>> >>> @handle_meter_mod could works on netdev with system type.
>>>>>> >>> ```
>>>>>> >>>
>>>>>> >>> 3. My question:
>>>>>> >>> - How this BUG happen?
>>>>>> >>> - Is there patch to solve this BUG?
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> ----
>>>>>> >>> Simon Jones
>>>>>> >>>
>>>>>> >>
>>>>>> > _______________________________________________
>>>>>> > dev mailing list
>>>>>> > d...@openvswitch.org
>>>>>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>
>>>>>>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to