On Fri, Feb 4, 2022 at 8:37 AM Adrian Moreno <amore...@redhat.com> wrote:
>
>
>
> On 1/18/22 16:01, Peng He wrote:
> >  From hepeng:
> > https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
> >
> > also from guohongzhi <guohongz...@huawei.com>:
> > http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
> >
> > also from a discussion about the mixing use of RCU and refcount in the mail
> > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
> >
> > A summary, as quoted from Ilya:
> >
> > "
> > RCU for ofproto was introduced for one
> > and only one reason - to avoid freeing ofproto while rules are still
> > alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
> > rule destruction.").  The goal was to allow using rules without
> > refcounting them within a single grace period.  And that forced us
> > to postpone destruction of the ofproto for a single grace period.
> > Later commit 39c9459355b6 ("Use classifier versioning.") made it
> > possible for rules to be alive for more than one grace period, so
> > the commit made ofproto wait for 2 grace periods by double postponing.
> > As we can see now, that wasn't enough and we have to wait for more
> > than 2 grace periods in certain cases.
> > "
> >
> > In a short, the ofproto should have a longer life time than rule, if
> > the rule lasts for more than 2 grace periods, the ofproto should live
> > longer to ensure rule->ofproto is valid. It's hard to predict how long
> > a ofproto should live, thus we need to use refcount on ofproto to make
> > things easy. The controversial part is that we have already used RCU 
> > postpone
> > to delay ofproto destrution, if we have to add refcount, is it simpler to
> > use just refcount without RCU postpone?
> >
> > IMO, I think going back to the pure refcount solution is more
> > complicated than mixing using both.
> >
> > Gaëtan Rive asks some questions on guohongzhi's v2 patch:
> >
> > during ofproto_rule_create, should we use ofproto_ref
> > or ofproto_try_ref? how can we make sure the ofproto is alive?
> >
> > By using RCU, ofproto has three states:
> >
> > state 1: alive, with refcount >= 1
> > state 2: dying, with refcount == 0, however pointer is valid
> > state 3: died, memory freed, pointer might be dangling.
> >
> > Without using RCU, there is no state 2, thus, we have to be very careful
> > every time we see a ofproto pointer. In contrast, with RCU, we can be sure
> > that it's alive at least in this grace peroid, so we can just check if
> > it is dying by ofproto_try_ref.
> >
> > This shows that by mixing use of RCU and refcount we can save a lot of work
> > worrying if ofproto is dangling.
> >
> > In short, the RCU part makes sure the ofproto is alive when we use it,
> > and the refcount part makes sure it lives longer enough.
> >
> > Also regarding a new patch filed recently, people are now making use
> > of RCU to protect ofproto:
> >
> > https://patchwork.ozlabs.org/project/openvswitch/patch/1638530715-44436-1-git-send-email-wangyunj...@huawei.com/
> >
> > In this patch, I have merged guohongzhi's patch and mine, and fixes
> > accoring to the previous comments.
> >
> > Signed-off-by: Peng He <hepeng.0...@bytedance.com>
> > Signed-off-by: guohongzhi <guohongz...@huawei.com>
> > ---
> >   ofproto/ofproto-dpif-xlate-cache.c |  2 +
> >   ofproto/ofproto-dpif-xlate.c       | 14 ++++---
> >   ofproto/ofproto-dpif.c             | 24 +++++++-----
> >   ofproto/ofproto-provider.h         |  2 +
> >   ofproto/ofproto.c                  | 62 +++++++++++++++++++++++++++---
> >   ofproto/ofproto.h                  |  4 ++
> >   6 files changed, 87 insertions(+), 21 deletions(-)
> >
> > diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
> > b/ofproto/ofproto-dpif-xlate-cache.c
> > index dcc91cb38..9224ee2e6 100644
> > --- a/ofproto/ofproto-dpif-xlate-cache.c
> > +++ b/ofproto/ofproto-dpif-xlate-cache.c
> > @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
> >   {
> >       switch (entry->type) {
> >       case XC_TABLE:
> > +        ofproto_unref(&(entry->table.ofproto->up));
> >           break;
> >       case XC_RULE:
> >           ofproto_rule_unref(&entry->rule->up);
> > @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
> >           free(entry->learn.ofm);
> >           break;
> >       case XC_NORMAL:
> > +        ofproto_unref(&(entry->normal.ofproto->up));
> >           break;
> >       case XC_FIN_TIMEOUT:
> >           /* 'u.fin.rule' is always already held as a XC_RULE, which
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 6fb59e170..129cdf714 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx)
> >           struct xc_entry *entry;
> >
> >           /* Save just enough info to update mac learning table later. */
> > -        entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
> > -        entry->normal.ofproto = ctx->xbridge->ofproto;
> > -        entry->normal.in_port = flow->in_port.ofp_port;
> > -        entry->normal.dl_src = flow->dl_src;
> > -        entry->normal.vlan = vlan;
> > -        entry->normal.is_gratuitous_arp = is_grat_arp;
> > +        if (ofproto_try_ref(&ctx->xbridge->ofproto->up)) {
> > +            entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
> > +            entry->normal.ofproto = ctx->xbridge->ofproto;
> > +            entry->normal.in_port = flow->in_port.ofp_port;
> > +            entry->normal.dl_src = flow->dl_src;
> > +            entry->normal.vlan = vlan;
> > +            entry->normal.is_gratuitous_arp = is_grat_arp;
> > +        }
> >       }
> >
> >       /* Determine output bundle. */
> > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> > index 8143dd965..c0a87456a 100644
> > --- a/ofproto/ofproto-dpif.c
> > +++ b/ofproto/ofproto-dpif.c
> > @@ -4471,12 +4471,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif 
> > *ofproto,
> >                   atomic_add_relaxed(&tbl->n_matched, stats->n_packets, 
> > &orig);
> >               }
> >               if (xcache) {
> > -                struct xc_entry *entry;
> > +                if (ofproto_try_ref(&ofproto->up)) {
> > +                    struct xc_entry *entry;
> >
> > -                entry = xlate_cache_add_entry(xcache, XC_TABLE);
> > -                entry->table.ofproto = ofproto;
> > -                entry->table.id = *table_id;
> > -                entry->table.match = true;
> > +                    entry = xlate_cache_add_entry(xcache, XC_TABLE);
> > +                    entry->table.ofproto = ofproto;
> > +                    entry->table.id = *table_id;
> > +                    entry->table.match = true;
> > +                }
> >               }
> >               return rule;
> >           }
> > @@ -4507,12 +4509,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif 
> > *ofproto,
> >                                  stats->n_packets, &orig);
> >           }
> >           if (xcache) {
> > -            struct xc_entry *entry;
> > +            if (ofproto_try_ref(&ofproto->up)) {
> > +                struct xc_entry *entry;
> >
> > -            entry = xlate_cache_add_entry(xcache, XC_TABLE);
> > -            entry->table.ofproto = ofproto;
> > -            entry->table.id = next_id;
> > -            entry->table.match = (rule != NULL);
> > +                entry = xlate_cache_add_entry(xcache, XC_TABLE);
> > +                entry->table.ofproto = ofproto;
> > +                entry->table.id = next_id;
> > +                entry->table.match = (rule != NULL);
> > +            }
> >           }
> >           if (rule) {
> >               goto out;   /* Match. */
> > diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
> > index 14b909973..ed10b8c76 100644
> > --- a/ofproto/ofproto-provider.h
> > +++ b/ofproto/ofproto-provider.h
> > @@ -143,6 +143,8 @@ struct ofproto {
> >       /* Variable length mf_field mapping. Stores all configured variable 
> > length
> >        * meta-flow fields (struct mf_field) in a switch. */
> >       struct vl_mff_map vl_mff_map;
> > +    /* refcount to this ofproto, holds by rule/group/xlate_caches */
> > +    struct ovs_refcount refcount;
> >   };
> >
> >   void ofproto_init_tables(struct ofproto *, int n_tables);
> > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
> > index 56aeac720..10a22d9ec 100644
> > --- a/ofproto/ofproto.c
> > +++ b/ofproto/ofproto.c
> > @@ -549,6 +549,7 @@ ofproto_create(const char *datapath_name, const char 
> > *datapath_type,
> >
> >       ovs_mutex_init(&ofproto->vl_mff_map.mutex);
> >       cmap_init(&ofproto->vl_mff_map.cmap);
> > +    ovs_refcount_init(&ofproto->refcount);
> >
> >       error = ofproto->ofproto_class->construct(ofproto);
> >       if (error) {
> > @@ -1695,9 +1696,24 @@ ofproto_destroy__(struct ofproto *ofproto)
> >       ofproto->ofproto_class->dealloc(ofproto);
> >   }
> >
> > -/* Destroying rules is doubly deferred, must have 'ofproto' around for 
> > them.
> > - * - 1st we defer the removal of the rules from the classifier
> > - * - 2nd we defer the actual destruction of the rules. */
> > +/* We used to use defer function to wait for two grace periods
> > + * as we assume the rule that holds the ofproto pointer will
> > + * live at most two grace period. Howvever, we found at certain
> > + * cases, this assumption does not stand.
> > + *
> > + * destroying a rule may have to wait multiple grace periods:
> > + * remove_rules_postponed (one grace period)
> > + *       -> remove_rule_rcu
> > + *           -> remove_rule_rcu__
> > + *               -> ofproto_rule_unref -> ref count != 1
> > + *                   -> ... more grace periods.
> > + *                   -> rule_destroy_cb (> 2 grace periods)
> > + *                       -> free
> > + *
> > + * So we have to check the refcount for sure all the rules
> > + * have been destroyed.
> > + *
> > + */
> >   static void
> >   ofproto_destroy_defer__(struct ofproto *ofproto)
> >       OVS_EXCLUDED(ofproto_mutex)
> > @@ -1705,6 +1721,26 @@ ofproto_destroy_defer__(struct ofproto *ofproto)
> >       ovsrcu_postpone(ofproto_destroy__, ofproto);
> >   }
> >
> > +void
> > +ofproto_ref(struct ofproto *ofproto)
> > +{
> > +    ovs_refcount_ref(&ofproto->refcount);
> > +}
> > +
> > +bool
> > +ofproto_try_ref(struct ofproto *ofproto)
> > +{
> > +    return ovs_refcount_try_ref_rcu(&ofproto->refcount);
> > +}
> > +
> > +void
> > +ofproto_unref(struct ofproto *ofproto)
> > +{
> > +    if (ofproto && ovs_refcount_unref(&ofproto->refcount) == 1) {
> > +        ovsrcu_postpone(ofproto_destroy_defer__, ofproto);
> > +    }
> > +}
> > +
> >   void
> >   ofproto_destroy(struct ofproto *p, bool del)
> >       OVS_EXCLUDED(ofproto_mutex)
> > @@ -1736,8 +1772,7 @@ ofproto_destroy(struct ofproto *p, bool del)
> >       p->connmgr = NULL;
> >       ovs_mutex_unlock(&ofproto_mutex);
> >
> > -    /* Destroying rules is deferred, must have 'ofproto' around for them. 
> > */
> > -    ovsrcu_postpone(ofproto_destroy_defer__, p);
> > +    ofproto_unref(p);
> >   }
> >
> >   /* Destroys the datapath with the respective 'name' and 'type'.  With the 
> > Linux
> > @@ -2929,6 +2964,10 @@ ofproto_rule_destroy__(struct rule *rule)
> >       cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr));
> >       rule_actions_destroy(rule_get_actions(rule));
> >       ovs_mutex_destroy(&rule->mutex);
> > +    /* we need to call ofproto_unref first, and thanks to rcu, ofproto is 
> > alive
> > +     * otherwise, group is freed, group->ofproto is invalid
> > +     */
> > +    ofproto_unref(rule->ofproto);
> >       rule->ofproto->ofproto_class->rule_dealloc(rule);
> >   }
> >
> > @@ -3069,6 +3108,10 @@ group_destroy_cb(struct ofgroup *group)
> >                                                   &group->props));
> >       ofputil_bucket_list_destroy(CONST_CAST(struct ovs_list *,
> >                                              &group->buckets));
> > +    /* we need to call ofproto_unref first, and thanks to rcu, ofproto is 
> > alive
> > +     * otherwise, group is freed, group->ofproto is invalid
> > +     */
> > +    ofproto_unref(group->ofproto);
> >       group->ofproto->ofproto_class->group_dealloc(group);
> >   }
> >
> > @@ -5279,6 +5322,11 @@ ofproto_rule_create(struct ofproto *ofproto, struct 
> > cls_rule *cr,
> >           return OFPERR_OFPFMFC_UNKNOWN;
> >       }
> >
> > +    if (!ofproto_try_ref(ofproto)) {
> > +        cls_rule_destroy(cr);
> > +        return OFPERR_OFPFMFC_UNKNOWN;
> > +    }
> > +
> >       /* Initialize base state. */
> >       *CONST_CAST(struct ofproto **, &rule->ofproto) = ofproto;
> >       cls_rule_move(CONST_CAST(struct cls_rule *, &rule->cr), cr);
> > @@ -7345,6 +7393,10 @@ init_group(struct ofproto *ofproto, const struct 
> > ofputil_group_mod *gm,
> >           return OFPERR_OFPGMFC_OUT_OF_GROUPS;
> >       }
> >
> > +    if (!ofproto_try_ref(ofproto)) {
> > +        return OFPERR_OFPFMFC_UNKNOWN;
> > +    }
> > +
> >       *CONST_CAST(struct ofproto **, &(*ofgroup)->ofproto) = ofproto;
> >       *CONST_CAST(uint32_t *, &((*ofgroup)->group_id)) = gm->group_id;
> >       *CONST_CAST(enum ofp11_group_type *, &(*ofgroup)->type) = gm->type;
> > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
> > index b0262da2d..4e15167ab 100644
> > --- a/ofproto/ofproto.h
> > +++ b/ofproto/ofproto.h
> > @@ -563,6 +563,10 @@ int ofproto_port_get_cfm_status(const struct ofproto *,
> >   enum ofputil_table_miss ofproto_table_get_miss_config(const struct 
> > ofproto *,
> >                                                         uint8_t table_id);
> >
> > +void ofproto_ref(struct ofproto *);
> > +void ofproto_unref(struct ofproto *);
> > +bool ofproto_try_ref(struct ofproto *);
> > +
> >   #ifdef  __cplusplus
> >   }
> >   #endif
>
>
> Hi Peng,
>
> Do we still to protect the reference that "struct upcall" holds? i.e:
> https://patchwork.ozlabs.org/project/openvswitch/patch/1638530715-44436-1-git-send-email-wangyunj...@huawei.com/

I think that patch is also needed, in addition to this one.

-M

> Thanks.
> --
> Adrián Moreno
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to