matchall race (was: Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it)

2016-11-25 Thread Daniel Borkmann

[ Making this a different thread, since unrelated to the other one. ]

On 11/25/2016 07:23 AM, Cong Wang wrote:

On Thu, Nov 24, 2016 at 4:17 PM, Daniel Borkmann  wrote:

[...]


(Btw, matchall is still broken besides this fix. mall_delete() sets the
  RCU_INIT_POINTER(head->filter, NULL), so that a mall_delete() plus
  mall_destroy() combo doesn't free head->filter twice, but doing that is
  racy with mall_classify() where head->filter is dereferenced unchecked.
  Requires additional fix.)


This seems due to matchall only has one filter per tp. But you don't need
to worry since readers never read a freed pointer, right?


The race would be that CPU0 is in tc_ctl_tfilter() with RTM_DELTFILTER and
tcm_handle = 1 request. matchall has only single handle and id of it is 1
(or a prior user-specified one). So in tc_ctl_tfilter() ->get() will find it,
returns pointer as fh. Then we call ->delete(). mall_delete() sets head->filter
to NULL with RCU_INIT_POINTER() (head->filter is not even an RCU pointer btw),
real filter destruction comes via call_rcu(). If we got here, that tp is still
visible and not unlinked from tp chain. So in CPU1 tc_classify() executes
mall_classify(), we fetch the head (tp->root) and the corresponding head->filter
from there, and dereference it next. Probably best would have been to just
return -EOPNOTSUPP for ->delete() like in cls_cgroup case. It probably makes
sense to just defer real destruction to mall_destroy() instead of mall_delete()
if one wants to avoid extra !f test in mall_classify().


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-24 Thread Cong Wang
On Thu, Nov 24, 2016 at 4:17 PM, Daniel Borkmann  wrote:
>
>
> I'm not sure if setting a dummy object for each affected classifier is
> making things better. Are you having this in mind as a target for -net?
>
> We do kfree_rcu() the head (tp->root) and likewise do we kfree_rcu() the
> tp immediately after the callback. The head object's lifetime for such
> classifiers is thus always bound to the tp lifetime. And besides that,
> nothing apart from get() checks whether it's actually really NULL (and
> that check in get() is odd anyway; some cls do it, some don't).
>

Excellent point.

I thought we should exclude any parallel readers when we call destroy(),
you are taking a different approach by observing we only have to exclude
readers when we really free them, this seems fine to me after a second
thought, because the RCU API should take care of races with readers so
as long as we free everything in RCU callback we are good. Hmm...

But I may miss something since I am not an RCU expert.

[...]
>
> (Btw, matchall is still broken besides this fix. mall_delete() sets the
>  RCU_INIT_POINTER(head->filter, NULL), so that a mall_delete() plus
>  mall_destroy() combo doesn't free head->filter twice, but doing that is
>  racy with mall_classify() where head->filter is dereferenced unchecked.
>  Requires additional fix.)

This seems due to matchall only has one filter per tp. But you don't need
to worry since readers never read a freed pointer, right?


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-24 Thread Daniel Borkmann

On 11/24/2016 09:25 PM, David Miller wrote:

From: Cong Wang 
Date: Tue, 22 Nov 2016 11:28:37 -0800


On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko  wrote:

Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:

Hmm, I don't think we want to have such an additional test in fast
path for each and every classifier. Can we think of ways to avoid that?

My question is, since we unlink individual instances from such tp-internal
lists through RCU and release the instance through call_rcu() as well as
the head (tp->root) via kfree_rcu() eventually, against what are we protecting
setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
not respecting grace period?


If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
to null.


We do need to respect the grace period if we touch the globally visible
data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
right place.


Another idea is to assign tp->root to a dummy static cls_fl_head object,
instead of NULL, which we just make sure has an ht.elems value of zero.

This avoids having to touch the fast path and also avoids all of these
complicated changes being discussed wrt. doing things in call_rcu_bh()
or whatever.


I'm not sure if setting a dummy object for each affected classifier is
making things better. Are you having this in mind as a target for -net?

We do kfree_rcu() the head (tp->root) and likewise do we kfree_rcu() the
tp immediately after the callback. The head object's lifetime for such
classifiers is thus always bound to the tp lifetime. And besides that,
nothing apart from get() checks whether it's actually really NULL (and
that check in get() is odd anyway; some cls do it, some don't).

I took a deeper look into the history, and found that this was not always
the case. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic: fix NULL
pointer dereference") moved tp->root initialization into init() routine,
where before it was part of change(), so get() had to deal with head being
NULL back then, so that was indeed a valid case. Also some classify()
callbacks were checking for head being NULL, so destroy would set it to
NULL, f.e. 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg() in packet
classifiers").

For basic, bpf, flow, flower, matchall, I cooked the below diff which
should be usable and small enough for -net.

The remaining pieces from ->destroy() to ->delete() conversion patch from
Cong, we could later on do in -net-next.

Roi, could you check the below for flower with your setup?

(Btw, matchall is still broken besides this fix. mall_delete() sets the
 RCU_INIT_POINTER(head->filter, NULL), so that a mall_delete() plus
 mall_destroy() combo doesn't free head->filter twice, but doing that is
 racy with mall_classify() where head->filter is dereferenced unchecked.
 Requires additional fix.)

 net/sched/cls_basic.c|  4 
 net/sched/cls_bpf.c  |  4 
 net/sched/cls_flow.c |  1 -
 net/sched/cls_flower.c   | 15 +++
 net/sched/cls_matchall.c |  1 -
 5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index eb219b7..5877f60 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -62,9 +62,6 @@ static unsigned long basic_get(struct tcf_proto *tp, u32 
handle)
struct basic_head *head = rtnl_dereference(tp->root);
struct basic_filter *f;

-   if (head == NULL)
-   return 0UL;
-
list_for_each_entry(f, &head->flist, link) {
if (f->handle == handle) {
l = (unsigned long) f;
@@ -109,7 +106,6 @@ static bool basic_destroy(struct tcf_proto *tp, bool force)
tcf_unbind_filter(tp, &f->res);
call_rcu(&f->rcu, basic_delete_filter);
}
-   RCU_INIT_POINTER(tp->root, NULL);
kfree_rcu(head, rcu);
return true;
 }
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index bb1d5a4..0a47ba5 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -292,7 +292,6 @@ static bool cls_bpf_destroy(struct tcf_proto *tp, bool 
force)
call_rcu(&prog->rcu, __cls_bpf_delete_prog);
}

-   RCU_INIT_POINTER(tp->root, NULL);
kfree_rcu(head, rcu);
return true;
 }
@@ -303,9 +302,6 @@ static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 
handle)
struct cls_bpf_prog *prog;
unsigned long ret = 0UL;

-   if (head == NULL)
-   return 0UL;
-
list_for_each_entry(prog, &head->plist, link) {
if (prog->handle == handle) {
ret = (unsigned long) prog;
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index e396723..6575aba 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -596,7 +596,6 @@ static bool flow_destroy(struct tcf_proto *tp, bool force)
list_del_rcu(&f->list);
call_rcu(&f->rcu, flow_destroy_filter);
   

Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-24 Thread David Miller
From: Cong Wang 
Date: Tue, 22 Nov 2016 11:28:37 -0800

> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko  wrote:
>> Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:
>>>Hmm, I don't think we want to have such an additional test in fast
>>>path for each and every classifier. Can we think of ways to avoid that?
>>>
>>>My question is, since we unlink individual instances from such tp-internal
>>>lists through RCU and release the instance through call_rcu() as well as
>>>the head (tp->root) via kfree_rcu() eventually, against what are we 
>>>protecting
>>>setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>>>not respecting grace period?
>>
>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>> to null.
> 
> We do need to respect the grace period if we touch the globally visible
> data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
> right place.

Another idea is to assign tp->root to a dummy static cls_fl_head object,
instead of NULL, which we just make sure has an ht.elems value of zero.

This avoids having to touch the fast path and also avoids all of these
complicated changes being discussed wrt. doing things in call_rcu_bh()
or whatever.


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-23 Thread Cong Wang
On Wed, Nov 23, 2016 at 3:29 AM, Daniel Borkmann  wrote:
>
> Can't we drop the 'force' parameter from tcf_destroy() and related cls
> destroy() callbacks, and change the logic roughly like this:
>
> [...]
> case RTM_DELTFILTER:
> err = tp->ops->delete(tp, fh, &drop_tp);
> if (err == 0) {
> struct tcf_proto *next = rtnl_dereference(tp->next);
>
> tfilter_notify(net, skb, n, tp,
>t->tcm_handle,
>RTM_DELTFILTER, false);
> if (drop_tp) {
> RCU_INIT_POINTER(*back, next);
> tcf_destroy(tp);
> }
> }
> goto errout;
> [...]
>
> This one was the only tcf_destroy() instance with force=false. Why can't
> the prior delete() callback make the decision whether the tp now has no
> further internal filters and thus can be dropped. Afaik, delete() and
> destroy() are protected by RTNL anyway. Thus, we could unlink the tp from
> the list before tcf_destroy(), which should then work with grace period
> as well. Given we remove the setting of tp->root to NULL, any outstanding
> readers for that grace period should either still execute the 'scheduled
> for removal' filter we just dropped, or find an empty list of filters.

This is exactly why I said "the semantic of ->destroy() needs to revise too",
this is a reasonable revision of course, but the change is still large because
we need to move that logic from ->destroy() to ->delete(). I was trying to find
a relatively small fix for -net and -stable, for -net-next we could do
aggressive
change as long as it's necessary. This is why I am still thinking about it,
perhaps there is no quick fix for this bug.


>
>> Hmm, perhaps we really have to switch to a doubly-linked list, that is
>> list_head. I need to double check. And also the semantic of ->destroy()
>> needs to revise too.
>
>
> Can you elaborate why double-linked list? Isn't the tp list always protected
> from modifications via RTNL in control path, and walked via
> rcu_dereference_bh()
> in data path?

At least two benefits we can get from using doubly-linked list:

1) No need to pass a 'prev' pointer if we want to remove tp in a RCU callback,
list_del_rcu(&tp->head) is just enough.

2) No need to worry about RCU pointers because list_head has RCU API's
already, much more readable to me.

Of course, the size of struct tcf_proto will grow a bit, but it doesn't seem to
be a problem.


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-23 Thread Daniel Borkmann

On 11/23/2016 06:24 AM, Cong Wang wrote:

On Tue, Nov 22, 2016 at 3:36 PM, John Fastabend
 wrote:

On 16-11-22 12:41 PM, Daniel Borkmann wrote:

On 11/22/2016 08:28 PM, Cong Wang wrote:

On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko  wrote:

Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:

Hmm, I don't think we want to have such an additional test in fast
path for each and every classifier. Can we think of ways to avoid that?

My question is, since we unlink individual instances from such
tp-internal
lists through RCU and release the instance through call_rcu() as
well as
the head (tp->root) via kfree_rcu() eventually, against what are we
protecting
setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback?
Something
not respecting grace period?


If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
to null.


But that's not really an answer to my question. ;)


We do need to respect the grace period if we touch the globally visible
data structure tp in tcf_destroy(). Therefore Roi's patch is not
fixing the
right place.


I think there may be multiple issues actually.

At the time we go into tc_classify(), from ingress as well as egress side,
we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
we use kfree_rcu() on tp, although we iterate tps (and implicitly inner
filters)
via rcu_dereference_bh() from reader side. Is there a reason why we don't
use call_rcu_bh() variant on destruction for all this instead?


I can't think of any if its all under _bh we can convert the call_rcu to
call_rcu_bh it just needs an audit.


Just looking at cls_bpf and others, what protects
RCU_INIT_POINTER(tp->root,
NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
tcf_destroy() cases. Still active readers under RCU BH can race against
this
(tp->root being NULL), as the commit identified. Only the get() callback
checks
for head against NULL, but both are serialized under rtnl, and the only
place
we call this is tc_ctl_tfilter(). Even if we create a new tp, head
should not
be NULL there, if it was assigned during the init() cb, but contains an
empty
list. (It's different for things like cls_cgroup, though.) So, I'm
wondering
if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead
(unless I'm
missing something obvious)?


Just took a look at this I think there are a couple possible solutions.
The easiest is likely to fix all the call sites so that 'tp' is unlinked
before calling the destroy() handlers AND not doing the NULL set. I only
see one such call site where destroy is called before unlinking at the
moment. This should enforce that after a grace period there is no path
to reach the classifiers because 'tp' is unlinked. Calling destroy
before unlinking 'tp' however could cause a small race between grace
period of 'tp' and grace period of the filter.

Another would be to only call the destroy path from the call_rcu path
of the 'tp' object so that destroy is only ever called after the object
is guaranteed to be unlinked from the tc_filter path.

I think both solutions would be fine.

Cong were you working on one of these? Or do you have another idea?


Yeah, this is basic what I think as well, however, both are hard.
On one hand, we can't detach the tp from the global singly-linked list
before tcf_destroy() since we rely on its return value to make this decision.
On the other hand, it is a singly-linked list, we have to pass in the address
of its previous pointer to rcu callback to remove it, it seems racy as well
since we modify a previous pointer which is still visible globally...


Can't we drop the 'force' parameter from tcf_destroy() and related cls
destroy() callbacks, and change the logic roughly like this:

[...]
case RTM_DELTFILTER:
err = tp->ops->delete(tp, fh, &drop_tp);
if (err == 0) {
struct tcf_proto *next = rtnl_dereference(tp->next);

tfilter_notify(net, skb, n, tp,
   t->tcm_handle,
   RTM_DELTFILTER, false);
if (drop_tp) {
RCU_INIT_POINTER(*back, next);
tcf_destroy(tp);
}
}
goto errout;
[...]

This one was the only tcf_destroy() instance with force=false. Why can't
the prior delete() callback make the decision whether the tp now has no
further internal filters and thus can be dropped. Afaik, delete() and
destroy() are protected by RTNL anyway. Thus, we could unlink the tp from
the list before tcf_destroy(), which should then work with grace period
as well. Given we remove the setting of tp->root to NULL, any outstanding
readers for that grace period should either still execute the 'scheduled
for removal' filter we just dropped, or find an 

Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Cong Wang
On Tue, Nov 22, 2016 at 3:36 PM, John Fastabend
 wrote:
> On 16-11-22 12:41 PM, Daniel Borkmann wrote:
>> On 11/22/2016 08:28 PM, Cong Wang wrote:
>>> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko  wrote:
 Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:
> Hmm, I don't think we want to have such an additional test in fast
> path for each and every classifier. Can we think of ways to avoid that?
>
> My question is, since we unlink individual instances from such
> tp-internal
> lists through RCU and release the instance through call_rcu() as
> well as
> the head (tp->root) via kfree_rcu() eventually, against what are we
> protecting
> setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback?
> Something
> not respecting grace period?

 If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
 to null.
>>
>> But that's not really an answer to my question. ;)
>>
>>> We do need to respect the grace period if we touch the globally visible
>>> data structure tp in tcf_destroy(). Therefore Roi's patch is not
>>> fixing the
>>> right place.
>>
>> I think there may be multiple issues actually.
>>
>> At the time we go into tc_classify(), from ingress as well as egress side,
>> we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
>> everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
>> we use kfree_rcu() on tp, although we iterate tps (and implicitly inner
>> filters)
>> via rcu_dereference_bh() from reader side. Is there a reason why we don't
>> use call_rcu_bh() variant on destruction for all this instead?
>
> I can't think of any if its all under _bh we can convert the call_rcu to
> call_rcu_bh it just needs an audit.
>
>>
>> Just looking at cls_bpf and others, what protects
>> RCU_INIT_POINTER(tp->root,
>> NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
>> tcf_destroy() cases. Still active readers under RCU BH can race against
>> this
>> (tp->root being NULL), as the commit identified. Only the get() callback
>> checks
>> for head against NULL, but both are serialized under rtnl, and the only
>> place
>> we call this is tc_ctl_tfilter(). Even if we create a new tp, head
>> should not
>> be NULL there, if it was assigned during the init() cb, but contains an
>> empty
>> list. (It's different for things like cls_cgroup, though.) So, I'm
>> wondering
>> if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead
>> (unless I'm
>> missing something obvious)?
>
>
> Just took a look at this I think there are a couple possible solutions.
> The easiest is likely to fix all the call sites so that 'tp' is unlinked
> before calling the destroy() handlers AND not doing the NULL set. I only
> see one such call site where destroy is called before unlinking at the
> moment. This should enforce that after a grace period there is no path
> to reach the classifiers because 'tp' is unlinked. Calling destroy
> before unlinking 'tp' however could cause a small race between grace
> period of 'tp' and grace period of the filter.
>
> Another would be to only call the destroy path from the call_rcu path
> of the 'tp' object so that destroy is only ever called after the object
> is guaranteed to be unlinked from the tc_filter path.
>
> I think both solutions would be fine.
>
> Cong were you working on one of these? Or do you have another idea?

Yeah, this is basic what I think as well, however, both are hard.
On one hand, we can't detach the tp from the global singly-linked list
before tcf_destroy() since we rely on its return value to make this decision.
On the other hand, it is a singly-linked list, we have to pass in the address
of its previous pointer to rcu callback to remove it, it seems racy as well
since we modify a previous pointer which is still visible globally...

Hmm, perhaps we really have to switch to a doubly-linked list, that is
list_head. I need to double check. And also the semantic of ->destroy()
needs to revise too.

So yeah, my commit should be blamed. :-/


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread John Fastabend
On 16-11-22 12:41 PM, Daniel Borkmann wrote:
> On 11/22/2016 08:28 PM, Cong Wang wrote:
>> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko  wrote:
>>> Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:
 Hmm, I don't think we want to have such an additional test in fast
 path for each and every classifier. Can we think of ways to avoid that?

 My question is, since we unlink individual instances from such
 tp-internal
 lists through RCU and release the instance through call_rcu() as
 well as
 the head (tp->root) via kfree_rcu() eventually, against what are we
 protecting
 setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback?
 Something
 not respecting grace period?
>>>
>>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>>> to null.
> 
> But that's not really an answer to my question. ;)
> 
>> We do need to respect the grace period if we touch the globally visible
>> data structure tp in tcf_destroy(). Therefore Roi's patch is not
>> fixing the
>> right place.
> 
> I think there may be multiple issues actually.
> 
> At the time we go into tc_classify(), from ingress as well as egress side,
> we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
> everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
> we use kfree_rcu() on tp, although we iterate tps (and implicitly inner
> filters)
> via rcu_dereference_bh() from reader side. Is there a reason why we don't
> use call_rcu_bh() variant on destruction for all this instead?

I can't think of any if its all under _bh we can convert the call_rcu to
call_rcu_bh it just needs an audit.

> 
> Just looking at cls_bpf and others, what protects
> RCU_INIT_POINTER(tp->root,
> NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
> tcf_destroy() cases. Still active readers under RCU BH can race against
> this
> (tp->root being NULL), as the commit identified. Only the get() callback
> checks
> for head against NULL, but both are serialized under rtnl, and the only
> place
> we call this is tc_ctl_tfilter(). Even if we create a new tp, head
> should not
> be NULL there, if it was assigned during the init() cb, but contains an
> empty
> list. (It's different for things like cls_cgroup, though.) So, I'm
> wondering
> if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead
> (unless I'm
> missing something obvious)?


Just took a look at this I think there are a couple possible solutions.
The easiest is likely to fix all the call sites so that 'tp' is unlinked
before calling the destroy() handlers AND not doing the NULL set. I only
see one such call site where destroy is called before unlinking at the
moment. This should enforce that after a grace period there is no path
to reach the classifiers because 'tp' is unlinked. Calling destroy
before unlinking 'tp' however could cause a small race between grace
period of 'tp' and grace period of the filter.

Another would be to only call the destroy path from the call_rcu path
of the 'tp' object so that destroy is only ever called after the object
is guaranteed to be unlinked from the tc_filter path.

I think both solutions would be fine.

Cong were you working on one of these? Or do you have another idea?


> 
>> Also I don't know why you blame my commit, this problem should already
>> exist prior to my commit, probably date back to John's RCU patches.
> 
> It seems so.



Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Daniel Borkmann

On 11/22/2016 08:28 PM, Cong Wang wrote:

On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko  wrote:

Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:

Hmm, I don't think we want to have such an additional test in fast
path for each and every classifier. Can we think of ways to avoid that?

My question is, since we unlink individual instances from such tp-internal
lists through RCU and release the instance through call_rcu() as well as
the head (tp->root) via kfree_rcu() eventually, against what are we protecting
setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
not respecting grace period?


If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
to null.


But that's not really an answer to my question. ;)


We do need to respect the grace period if we touch the globally visible
data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
right place.


I think there may be multiple issues actually.

At the time we go into tc_classify(), from ingress as well as egress side,
we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
we use kfree_rcu() on tp, although we iterate tps (and implicitly inner filters)
via rcu_dereference_bh() from reader side. Is there a reason why we don't
use call_rcu_bh() variant on destruction for all this instead?

Just looking at cls_bpf and others, what protects RCU_INIT_POINTER(tp->root,
NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
tcf_destroy() cases. Still active readers under RCU BH can race against this
(tp->root being NULL), as the commit identified. Only the get() callback checks
for head against NULL, but both are serialized under rtnl, and the only place
we call this is tc_ctl_tfilter(). Even if we create a new tp, head should not
be NULL there, if it was assigned during the init() cb, but contains an empty
list. (It's different for things like cls_cgroup, though.) So, I'm wondering
if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead (unless I'm
missing something obvious)?


Also I don't know why you blame my commit, this problem should already
exist prior to my commit, probably date back to John's RCU patches.


It seems so.


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Cong Wang
On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko  wrote:
> Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:
>>Hmm, I don't think we want to have such an additional test in fast
>>path for each and every classifier. Can we think of ways to avoid that?
>>
>>My question is, since we unlink individual instances from such tp-internal
>>lists through RCU and release the instance through call_rcu() as well as
>>the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>>setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>>not respecting grace period?
>
> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
> to null.

We do need to respect the grace period if we touch the globally visible
data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
right place.

Also I don't know why you blame my commit, this problem should already
exist prior to my commit, probably date back to John's RCU patches.

I am working on a patch.


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Jiri Pirko
Tue, Nov 22, 2016 at 04:37:42PM CET, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Tue, 22 Nov 2016 15:48:44 +0100
>
>> Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>>>tp->root is being allocated in init() time and kfreed in destroy()
>>>however it is being dereferenced in classify() path.
>>>
>>>We could be in classify() path after destroy() was called and thus 
>>>tp->root is null. Verifying if tp->root is null in classify() path 
>>>is enough because it's being freed with kfree_rcu() and classify() 
>>>path is under rcu_read_lock().
>>>
>>>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>>>Signed-off-by: Roi Dayan 
>>>Cc: Cong Wang 
>> 
>> This is correct
>> 
>> Reviewed-by: Jiri Pirko 
>> 
>> The other way to fix this would be to move tp->ops->destroy call to
>> call_rcu phase. That would require bigger changes though. net-next
>> perhaps?
>
>This patch is targetted at net-next as per Subj.

Oh, right, then it should be fixed so the tp->head could be never null


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Jiri Pirko
Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:
>[ + John ]
>
>On 11/22/2016 03:48 PM, Jiri Pirko wrote:
>> Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>> > tp->root is being allocated in init() time and kfreed in destroy()
>> > however it is being dereferenced in classify() path.
>> > 
>> > We could be in classify() path after destroy() was called and thus
>> > tp->root is null. Verifying if tp->root is null in classify() path
>> > is enough because it's being freed with kfree_rcu() and classify()
>> > path is under rcu_read_lock().
>> > 
>> > Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are 
>> > gone")
>> > Signed-off-by: Roi Dayan 
>> > Cc: Cong Wang 
>> 
>> This is correct
>> 
>> Reviewed-by: Jiri Pirko 
>> 
>> The other way to fix this would be to move tp->ops->destroy call to
>> call_rcu phase. That would require bigger changes though. net-next
>> perhaps?
>
>Hmm, I don't think we want to have such an additional test in fast
>path for each and every classifier. Can we think of ways to avoid that?
>
>My question is, since we unlink individual instances from such tp-internal
>lists through RCU and release the instance through call_rcu() as well as
>the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>not respecting grace period?

If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
to null.


>
>The only thing that actually checks if tp->root is NULL right now is the
>get() callback. Is that the reason why tp->root is RCU'ified? John?
>
>Thanks,
>Daniel
>
>> > Hi Cong, all
>> > 
>> > As stated above, the issue was introduced with commit 1e052be69d04 
>> > ("net_sched: destroy
>> > proto tp when all filters are gone"). This patch provides a fix only for 
>> > cls_flower where
>> > I succeeded in reproducing the issue. Cong, if you can/want to come up 
>> > with a fix that
>> > will be applicable for all the others classifiners, I am fine with that.
>> > 
>> > Thanks,
>> > Roi
>> > 
>> > 
>> > net/sched/cls_flower.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> > 
>> > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> > index e8dd09a..88a26c4 100644
>> > --- a/net/sched/cls_flower.c
>> > +++ b/net/sched/cls_flower.c
>> > @@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const 
>> > struct tcf_proto *tp,
>> >struct fl_flow_key skb_mkey;
>> >struct ip_tunnel_info *info;
>> > 
>> > -  if (!atomic_read(&head->ht.nelems))
>> > +  if (!head || !atomic_read(&head->ht.nelems))
>> >return -1;
>> > 
>> >fl_clear_masked_range(&skb_key, &head->mask);
>> > --
>> > 2.7.4
>> > 
>


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Daniel Borkmann

[ + John ]

On 11/22/2016 03:48 PM, Jiri Pirko wrote:

Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:

tp->root is being allocated in init() time and kfreed in destroy()
however it is being dereferenced in classify() path.

We could be in classify() path after destroy() was called and thus
tp->root is null. Verifying if tp->root is null in classify() path
is enough because it's being freed with kfree_rcu() and classify()
path is under rcu_read_lock().

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Signed-off-by: Roi Dayan 
Cc: Cong Wang 


This is correct

Reviewed-by: Jiri Pirko 

The other way to fix this would be to move tp->ops->destroy call to
call_rcu phase. That would require bigger changes though. net-next
perhaps?


Hmm, I don't think we want to have such an additional test in fast
path for each and every classifier. Can we think of ways to avoid that?

My question is, since we unlink individual instances from such tp-internal
lists through RCU and release the instance through call_rcu() as well as
the head (tp->root) via kfree_rcu() eventually, against what are we protecting
setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
not respecting grace period?

The only thing that actually checks if tp->root is NULL right now is the
get() callback. Is that the reason why tp->root is RCU'ified? John?

Thanks,
Daniel


Hi Cong, all

As stated above, the issue was introduced with commit 1e052be69d04 ("net_sched: 
destroy
proto tp when all filters are gone"). This patch provides a fix only for 
cls_flower where
I succeeded in reproducing the issue. Cong, if you can/want to come up with a 
fix that
will be applicable for all the others classifiners, I am fine with that.

Thanks,
Roi


net/sched/cls_flower.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8dd09a..88a26c4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct fl_flow_key skb_mkey;
struct ip_tunnel_info *info;

-   if (!atomic_read(&head->ht.nelems))
+   if (!head || !atomic_read(&head->ht.nelems))
return -1;

fl_clear_masked_range(&skb_key, &head->mask);
--
2.7.4





Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread David Miller
From: Jiri Pirko 
Date: Tue, 22 Nov 2016 15:48:44 +0100

> Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>>tp->root is being allocated in init() time and kfreed in destroy()
>>however it is being dereferenced in classify() path.
>>
>>We could be in classify() path after destroy() was called and thus 
>>tp->root is null. Verifying if tp->root is null in classify() path 
>>is enough because it's being freed with kfree_rcu() and classify() 
>>path is under rcu_read_lock().
>>
>>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>>Signed-off-by: Roi Dayan 
>>Cc: Cong Wang 
> 
> This is correct
> 
> Reviewed-by: Jiri Pirko 
> 
> The other way to fix this would be to move tp->ops->destroy call to
> call_rcu phase. That would require bigger changes though. net-next
> perhaps?

This patch is targetted at net-next as per Subj.


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Jiri Pirko
Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>tp->root is being allocated in init() time and kfreed in destroy()
>however it is being dereferenced in classify() path.
>
>We could be in classify() path after destroy() was called and thus 
>tp->root is null. Verifying if tp->root is null in classify() path 
>is enough because it's being freed with kfree_rcu() and classify() 
>path is under rcu_read_lock().
>
>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>Signed-off-by: Roi Dayan 
>Cc: Cong Wang 

This is correct

Reviewed-by: Jiri Pirko 

The other way to fix this would be to move tp->ops->destroy call to
call_rcu phase. That would require bigger changes though. net-next
perhaps?



>---
>
>Hi Cong, all
>
>As stated above, the issue was introduced with commit 1e052be69d04 
>("net_sched: destroy 
>proto tp when all filters are gone"). This patch provides a fix only for 
>cls_flower where 
>I succeeded in reproducing the issue. Cong, if you can/want to come up with a 
>fix that
>will be applicable for all the others classifiners, I am fine with that.
>
>Thanks,
>Roi
>
>
> net/sched/cls_flower.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index e8dd09a..88a26c4 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
>tcf_proto *tp,
>   struct fl_flow_key skb_mkey;
>   struct ip_tunnel_info *info;
> 
>-  if (!atomic_read(&head->ht.nelems))
>+  if (!head || !atomic_read(&head->ht.nelems))
>   return -1;
> 
>   fl_clear_masked_range(&skb_key, &head->mask);
>-- 
>2.7.4
>


[PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Roi Dayan
tp->root is being allocated in init() time and kfreed in destroy()
however it is being dereferenced in classify() path.

We could be in classify() path after destroy() was called and thus 
tp->root is null. Verifying if tp->root is null in classify() path 
is enough because it's being freed with kfree_rcu() and classify() 
path is under rcu_read_lock().

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Signed-off-by: Roi Dayan 
Cc: Cong Wang 
---

Hi Cong, all

As stated above, the issue was introduced with commit 1e052be69d04 ("net_sched: 
destroy 
proto tp when all filters are gone"). This patch provides a fix only for 
cls_flower where 
I succeeded in reproducing the issue. Cong, if you can/want to come up with a 
fix that
will be applicable for all the others classifiners, I am fine with that.

Thanks,
Roi


 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8dd09a..88a26c4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct fl_flow_key skb_mkey;
struct ip_tunnel_info *info;
 
-   if (!atomic_read(&head->ht.nelems))
+   if (!head || !atomic_read(&head->ht.nelems))
return -1;
 
fl_clear_masked_range(&skb_key, &head->mask);
-- 
2.7.4