Re: [ovs-dev] [ovs-dev v5] ofproto: add refcount to ofproto to fix ofproto use-after-free

2022-02-12 Thread Peng He
Great review!

Gaëtan Rivet  于2022年2月10日周四 22:01写道:

> On Sat, Feb 5, 2022, at 06:26, Peng He wrote:
> > From hepeng:
> >
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
> >
> > also from guohongzhi :
> >
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
> >
> > also from a discussion about the mixing use of RCU and refcount in the
> mail
> > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
> >
> > A summary, as quoted from Ilya:
> >
> > "
> > RCU for ofproto was introduced for one
> > and only one reason - to avoid freeing ofproto while rules are still
> > alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
> > rule destruction.").  The goal was to allow using rules without
> > refcounting them within a single grace period.  And that forced us
> > to postpone destruction of the ofproto for a single grace period.
> > Later commit 39c9459355b6 ("Use classifier versioning.") made it
> > possible for rules to be alive for more than one grace period, so
> > the commit made ofproto wait for 2 grace periods by double postponing.
> > As we can see now, that wasn't enough and we have to wait for more
> > than 2 grace periods in certain cases.
> > "
> >
> > In a short, the ofproto should have a longer life time than rule, if
> > the rule lasts for more than 2 grace periods, the ofproto should live
> > longer to ensure rule->ofproto is valid. It's hard to predict how long
> > a ofproto should live, thus we need to use refcount on ofproto to make
> > things easy. The controversial part is that we have already used RCU
> postpone
> > to delay ofproto destrution, if we have to add refcount, is it simpler to
> > use just refcount without RCU postpone?
> >
> > IMO, I think going back to the pure refcount solution is more
> > complicated than mixing using both.
> >
> > Gaëtan Rive asks some questions on guohongzhi's v2 patch:
> >
> > during ofproto_rule_create, should we use ofproto_ref
> > or ofproto_try_ref? how can we make sure the ofproto is alive?
> >
> > By using RCU, ofproto has three states:
> >
> > state 1: alive, with refcount >= 1
> > state 2: dying, with refcount == 0, however pointer is valid
> > state 3: died, memory freed, pointer might be dangling.
> >
> > Without using RCU, there is no state 2, thus, we have to be very careful
> > every time we see a ofproto pointer. In contrast, with RCU, we can be
> sure
> > that it's alive at least in this grace peroid, so we can just check if
> > it is dying by ofproto_try_ref.
> >
> > This shows that by mixing use of RCU and refcount we can save a lot of
> work
> > worrying if ofproto is dangling.
> >
> > In short, the RCU part makes sure the ofproto is alive when we use it,
> > and the refcount part makes sure it lives longer enough.
> >
> > In this patch, I have merged guohongzhi's patch and mine, and fixes
> > accoring to the previous comments.
> >
> > v4->v5:
> > * fix the comments, remove the ref to wangyunjian's patch and
> > remove the comments describing the previous ofproto destruction code.
> > * fix group alloc leak issues.
> >
> > Signed-off-by: Peng He 
> > Signed-off-by: guohongzhi 
> > Acked-by: Mike Pattrick 
> > ---
> >  ofproto/ofproto-dpif-xlate-cache.c |  2 ++
> >  ofproto/ofproto-dpif-xlate.c   | 14 
> >  ofproto/ofproto-dpif.c | 24 +++--
> >  ofproto/ofproto-provider.h |  2 ++
> >  ofproto/ofproto.c  | 57 +++---
> >  ofproto/ofproto.h  |  4 +++
> >  6 files changed, 82 insertions(+), 21 deletions(-)
> >
> > diff --git a/ofproto/ofproto-dpif-xlate-cache.c
> > b/ofproto/ofproto-dpif-xlate-cache.c
> > index dcc91cb38..9224ee2e6 100644
> > --- a/ofproto/ofproto-dpif-xlate-cache.c
> > +++ b/ofproto/ofproto-dpif-xlate-cache.c
> > @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
> >  {
> >  switch (entry->type) {
> >  case XC_TABLE:
> > +ofproto_unref(&(entry->table.ofproto->up));
> >  break;
> >  case XC_RULE:
> >  ofproto_rule_unref(>rule->up);
> > @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
> >  free(entry->learn.ofm);
> >  break;
> >  case XC_NORMAL:
> > +ofproto_unref(&(entry->normal.ofproto->up));
> >  break;
> >  case XC_FIN_TIMEOUT:
> >  /* 'u.fin.rule' is always already held as a XC_RULE, which
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 6fb59e170..129cdf714 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx)
> >  struct xc_entry *entry;
> >
> >  /* Save just enough info to update mac learning table later. */
> > -entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
> > -entry->normal.ofproto = 

Re: [ovs-dev] [ovs-dev v5] ofproto: add refcount to ofproto to fix ofproto use-after-free

2022-02-10 Thread Gaëtan Rivet
On Sat, Feb 5, 2022, at 06:26, Peng He wrote:
> From hepeng:
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
>
> also from guohongzhi :
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
>
> also from a discussion about the mixing use of RCU and refcount in the mail
> list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
>
> A summary, as quoted from Ilya:
>
> "
> RCU for ofproto was introduced for one
> and only one reason - to avoid freeing ofproto while rules are still
> alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
> rule destruction.").  The goal was to allow using rules without
> refcounting them within a single grace period.  And that forced us
> to postpone destruction of the ofproto for a single grace period.
> Later commit 39c9459355b6 ("Use classifier versioning.") made it
> possible for rules to be alive for more than one grace period, so
> the commit made ofproto wait for 2 grace periods by double postponing.
> As we can see now, that wasn't enough and we have to wait for more
> than 2 grace periods in certain cases.
> "
>
> In a short, the ofproto should have a longer life time than rule, if
> the rule lasts for more than 2 grace periods, the ofproto should live
> longer to ensure rule->ofproto is valid. It's hard to predict how long
> a ofproto should live, thus we need to use refcount on ofproto to make
> things easy. The controversial part is that we have already used RCU postpone
> to delay ofproto destrution, if we have to add refcount, is it simpler to
> use just refcount without RCU postpone?
>
> IMO, I think going back to the pure refcount solution is more
> complicated than mixing using both.
>
> Gaëtan Rive asks some questions on guohongzhi's v2 patch:
>
> during ofproto_rule_create, should we use ofproto_ref
> or ofproto_try_ref? how can we make sure the ofproto is alive?
>
> By using RCU, ofproto has three states:
>
> state 1: alive, with refcount >= 1
> state 2: dying, with refcount == 0, however pointer is valid
> state 3: died, memory freed, pointer might be dangling.
>
> Without using RCU, there is no state 2, thus, we have to be very careful
> every time we see a ofproto pointer. In contrast, with RCU, we can be sure
> that it's alive at least in this grace peroid, so we can just check if
> it is dying by ofproto_try_ref.
>
> This shows that by mixing use of RCU and refcount we can save a lot of work
> worrying if ofproto is dangling.
>
> In short, the RCU part makes sure the ofproto is alive when we use it,
> and the refcount part makes sure it lives longer enough.
>
> In this patch, I have merged guohongzhi's patch and mine, and fixes
> accoring to the previous comments.
>
> v4->v5:
> * fix the comments, remove the ref to wangyunjian's patch and
> remove the comments describing the previous ofproto destruction code.
> * fix group alloc leak issues.
>
> Signed-off-by: Peng He 
> Signed-off-by: guohongzhi 
> Acked-by: Mike Pattrick 
> ---
>  ofproto/ofproto-dpif-xlate-cache.c |  2 ++
>  ofproto/ofproto-dpif-xlate.c   | 14 
>  ofproto/ofproto-dpif.c | 24 +++--
>  ofproto/ofproto-provider.h |  2 ++
>  ofproto/ofproto.c  | 57 +++---
>  ofproto/ofproto.h  |  4 +++
>  6 files changed, 82 insertions(+), 21 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
> b/ofproto/ofproto-dpif-xlate-cache.c
> index dcc91cb38..9224ee2e6 100644
> --- a/ofproto/ofproto-dpif-xlate-cache.c
> +++ b/ofproto/ofproto-dpif-xlate-cache.c
> @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
>  {
>  switch (entry->type) {
>  case XC_TABLE:
> +ofproto_unref(&(entry->table.ofproto->up));
>  break;
>  case XC_RULE:
>  ofproto_rule_unref(>rule->up);
> @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
>  free(entry->learn.ofm);
>  break;
>  case XC_NORMAL:
> +ofproto_unref(&(entry->normal.ofproto->up));
>  break;
>  case XC_FIN_TIMEOUT:
>  /* 'u.fin.rule' is always already held as a XC_RULE, which
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 6fb59e170..129cdf714 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx)
>  struct xc_entry *entry;
> 
>  /* Save just enough info to update mac learning table later. */
> -entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
> -entry->normal.ofproto = ctx->xbridge->ofproto;
> -entry->normal.in_port = flow->in_port.ofp_port;
> -entry->normal.dl_src = flow->dl_src;
> -entry->normal.vlan = vlan;
> -entry->normal.is_gratuitous_arp = is_grat_arp;
> +if (ofproto_try_ref(>xbridge->ofproto->up)) {
> + 

Re: [ovs-dev] [ovs-dev v5] ofproto: add refcount to ofproto to fix ofproto use-after-free

2022-02-04 Thread 0-day Robot
Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Peng He , guohongzhi 

WARNING: Line has trailing whitespace
#205 FILE: ofproto/ofproto.c:1699:
/* 

Lines checked: 321, Warnings: 2, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-dev v5] ofproto: add refcount to ofproto to fix ofproto use-after-free

2022-02-04 Thread Peng He
From hepeng:
https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473

also from guohongzhi :
http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/

also from a discussion about the mixing use of RCU and refcount in the mail
list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.

A summary, as quoted from Ilya:

"
RCU for ofproto was introduced for one
and only one reason - to avoid freeing ofproto while rules are still
alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
rule destruction.").  The goal was to allow using rules without
refcounting them within a single grace period.  And that forced us
to postpone destruction of the ofproto for a single grace period.
Later commit 39c9459355b6 ("Use classifier versioning.") made it
possible for rules to be alive for more than one grace period, so
the commit made ofproto wait for 2 grace periods by double postponing.
As we can see now, that wasn't enough and we have to wait for more
than 2 grace periods in certain cases.
"

In a short, the ofproto should have a longer life time than rule, if
the rule lasts for more than 2 grace periods, the ofproto should live
longer to ensure rule->ofproto is valid. It's hard to predict how long
a ofproto should live, thus we need to use refcount on ofproto to make
things easy. The controversial part is that we have already used RCU postpone
to delay ofproto destrution, if we have to add refcount, is it simpler to
use just refcount without RCU postpone?

IMO, I think going back to the pure refcount solution is more
complicated than mixing using both.

Gaëtan Rive asks some questions on guohongzhi's v2 patch:

during ofproto_rule_create, should we use ofproto_ref
or ofproto_try_ref? how can we make sure the ofproto is alive?

By using RCU, ofproto has three states:

state 1: alive, with refcount >= 1
state 2: dying, with refcount == 0, however pointer is valid
state 3: died, memory freed, pointer might be dangling.

Without using RCU, there is no state 2, thus, we have to be very careful
every time we see a ofproto pointer. In contrast, with RCU, we can be sure
that it's alive at least in this grace peroid, so we can just check if
it is dying by ofproto_try_ref.

This shows that by mixing use of RCU and refcount we can save a lot of work
worrying if ofproto is dangling.

In short, the RCU part makes sure the ofproto is alive when we use it,
and the refcount part makes sure it lives longer enough.

In this patch, I have merged guohongzhi's patch and mine, and fixes
accoring to the previous comments.

v4->v5:
* fix the comments, remove the ref to wangyunjian's patch and
remove the comments describing the previous ofproto destruction code.
* fix group alloc leak issues.

Signed-off-by: Peng He 
Signed-off-by: guohongzhi 
Acked-by: Mike Pattrick 
---
 ofproto/ofproto-dpif-xlate-cache.c |  2 ++
 ofproto/ofproto-dpif-xlate.c   | 14 
 ofproto/ofproto-dpif.c | 24 +++--
 ofproto/ofproto-provider.h |  2 ++
 ofproto/ofproto.c  | 57 +++---
 ofproto/ofproto.h  |  4 +++
 6 files changed, 82 insertions(+), 21 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
b/ofproto/ofproto-dpif-xlate-cache.c
index dcc91cb38..9224ee2e6 100644
--- a/ofproto/ofproto-dpif-xlate-cache.c
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
 {
 switch (entry->type) {
 case XC_TABLE:
+ofproto_unref(&(entry->table.ofproto->up));
 break;
 case XC_RULE:
 ofproto_rule_unref(>rule->up);
@@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
 free(entry->learn.ofm);
 break;
 case XC_NORMAL:
+ofproto_unref(&(entry->normal.ofproto->up));
 break;
 case XC_FIN_TIMEOUT:
 /* 'u.fin.rule' is always already held as a XC_RULE, which
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 6fb59e170..129cdf714 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx)
 struct xc_entry *entry;
 
 /* Save just enough info to update mac learning table later. */
-entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
-entry->normal.ofproto = ctx->xbridge->ofproto;
-entry->normal.in_port = flow->in_port.ofp_port;
-entry->normal.dl_src = flow->dl_src;
-entry->normal.vlan = vlan;
-entry->normal.is_gratuitous_arp = is_grat_arp;
+if (ofproto_try_ref(>xbridge->ofproto->up)) {
+entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
+entry->normal.ofproto = ctx->xbridge->ofproto;
+entry->normal.in_port = flow->in_port.ofp_port;
+entry->normal.dl_src = flow->dl_src;
+