Re: [PATCH net 1/1] net sched actions: fix refcnt when GETing of action after bind

2017-01-22 Thread Jamal Hadi Salim

On 17-01-20 01:20 AM, Cong Wang wrote:

On Wed, Jan 18, 2017 at 3:33 AM, Jamal Hadi Salim  wrote:

On 17-01-17 01:17 PM, Cong Wang wrote:





I did.
The issue there (after your original patch) was destroy() would
decrement the refcount to zero and a GET was essentially translated
to a DEL. Incrementing the refcount earlier protected against that
assuming destroy was going to decrement it.
However, when an action is bound the destroy() doesnt decrement
the refcnt. So the refcnt keeps going up forever (and therefore
deleting fails in the future). So we cant use destroy() as is.


Hmm, tcf_action_destroy() should not touch the refcnt at all in this case,
right? Since the refcnt here is not for readers in kernel code but for
user-space. We mix the use of this refcnt, which leads to problems.

Your patch is not correct either for DEL, tcf_action_destroy() is not
needed to call again after tcf_del_notify() fails, right? Probably
it is not needed at all:


Cong,
Please proceed to separate del from get. The trickery is biting us.
Also - run those tests i had in my patch. There is a difference
between the bound vs not-bound use cases.

cheers,
jamal


Re: [PATCH net 1/1] net sched actions: fix refcnt when GETing of action after bind

2017-01-19 Thread Cong Wang
On Wed, Jan 18, 2017 at 3:33 AM, Jamal Hadi Salim  wrote:
> On 17-01-17 01:17 PM, Cong Wang wrote:
>>
>> Why this check for RTM_GETACTION? It does not make sense
>> at least for the error case, that is, when tcf_action_get_1() fails
>> in the middle of the loop, all the previous ones should be destroyed
>> no matter it's GETACTION or DELACTION...
>>
>> Also, for the non-error case of GETACTION, they should be
>> destroyed too after dumping to user-space, otherwise there is no
>> way to use them since 'actions' is local to that function.
>>
>> I thought in commit aecc5cefc389 you grab that refcnt on purpose.
>>
>
> I did.
> The issue there (after your original patch) was destroy() would
> decrement the refcount to zero and a GET was essentially translated
> to a DEL. Incrementing the refcount earlier protected against that
> assuming destroy was going to decrement it.
> However, when an action is bound the destroy() doesnt decrement
> the refcnt. So the refcnt keeps going up forever (and therefore
> deleting fails in the future). So we cant use destroy() as is.

Hmm, tcf_action_destroy() should not touch the refcnt at all in this case,
right? Since the refcnt here is not for readers in kernel code but for
user-space. We mix the use of this refcnt, which leads to problems.

Your patch is not correct either for DEL, tcf_action_destroy() is not
needed to call again after tcf_del_notify() fails, right? Probably
it is not needed at all:

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index e10456ef6f..82eed18 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -907,13 +907,8 @@ tca_action_gd(struct net *net, struct nlattr
*nla, struct nlmsghdr *n,
ret = act_get_notify(net, portid, n, &actions, event);
else { /* delete */
ret = tcf_del_notify(net, n, &actions, portid);
-   if (ret)
-   goto err;
-   return ret;
}
 err:
-   if (event != RTM_GETACTION)
-   tcf_action_destroy(&actions, 0);
return ret;
 }


Re: [PATCH net 1/1] net sched actions: fix refcnt when GETing of action after bind

2017-01-18 Thread Jamal Hadi Salim

On 17-01-17 01:17 PM, Cong Wang wrote:

On Sun, Jan 15, 2017 at 7:14 AM, Jamal Hadi Salim  wrote:

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 2095c83..e10456ef6f 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -900,8 +900,6 @@ static int tca_action_flush(struct net *net, struct nlattr 
*nla,
goto err;
}
act->order = i;
-   if (event == RTM_GETACTION)
-   act->tcfa_refcnt++;
list_add_tail(&act->list, &actions);
}

@@ -914,7 +912,8 @@ static int tca_action_flush(struct net *net, struct nlattr 
*nla,
return ret;
}
 err:
-   tcf_action_destroy(&actions, 0);
+   if (event != RTM_GETACTION)
+   tcf_action_destroy(&actions, 0);


Why this check for RTM_GETACTION? It does not make sense
at least for the error case, that is, when tcf_action_get_1() fails
in the middle of the loop, all the previous ones should be destroyed
no matter it's GETACTION or DELACTION...

Also, for the non-error case of GETACTION, they should be
destroyed too after dumping to user-space, otherwise there is no
way to use them since 'actions' is local to that function.

I thought in commit aecc5cefc389 you grab that refcnt on purpose.



I did.
The issue there (after your original patch) was destroy() would
decrement the refcount to zero and a GET was essentially translated
to a DEL. Incrementing the refcount earlier protected against that
assuming destroy was going to decrement it.
However, when an action is bound the destroy() doesnt decrement
the refcnt. So the refcnt keeps going up forever (and therefore
deleting fails in the future). So we cant use destroy() as is.

We need another function to cover the scenario you are
describing. I had thought of using cleanup_a() for GET and leave
the refcount alone. But that is insufficient incase the module
refcounts went up. Maybe it is better to refactor get/del
to have different code paths (instead of the clever code reuse).


cheers,
jamal







Re: [PATCH net 1/1] net sched actions: fix refcnt when GETing of action after bind

2017-01-17 Thread Cong Wang
On Sun, Jan 15, 2017 at 7:14 AM, Jamal Hadi Salim  wrote:
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 2095c83..e10456ef6f 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -900,8 +900,6 @@ static int tca_action_flush(struct net *net, struct 
> nlattr *nla,
> goto err;
> }
> act->order = i;
> -   if (event == RTM_GETACTION)
> -   act->tcfa_refcnt++;
> list_add_tail(&act->list, &actions);
> }
>
> @@ -914,7 +912,8 @@ static int tca_action_flush(struct net *net, struct 
> nlattr *nla,
> return ret;
> }
>  err:
> -   tcf_action_destroy(&actions, 0);
> +   if (event != RTM_GETACTION)
> +   tcf_action_destroy(&actions, 0);

Why this check for RTM_GETACTION? It does not make sense
at least for the error case, that is, when tcf_action_get_1() fails
in the middle of the loop, all the previous ones should be destroyed
no matter it's GETACTION or DELACTION...

Also, for the non-error case of GETACTION, they should be
destroyed too after dumping to user-space, otherwise there is no
way to use them since 'actions' is local to that function.

I thought in commit aecc5cefc389 you grab that refcnt on purpose.


Re: [PATCH net 1/1] net sched actions: fix refcnt when GETing of action after bind

2017-01-16 Thread David Miller
From: Jamal Hadi Salim 
Date: Sun, 15 Jan 2017 10:14:06 -0500

> From: Jamal Hadi Salim 
 ...
> Fixes: aecc5cefc389 ("net sched actions: fix GETing actions")
> Signed-off-by: Jamal Hadi Salim 

Applied and queued up for -stable, thanks Jamal.


[PATCH net 1/1] net sched actions: fix refcnt when GETing of action after bind

2017-01-15 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Demonstrating the issue:

.. add a drop action
$sudo $TC actions add action drop index 10

.. retrieve it
$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 2 bind 0 installed 29 sec used 29 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

... bug 1 above: reference is two.
Reference is actually 1 but we forget to subtract 1.

... do a GET again and we see the same issue
try a few times and nothing changes
~$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 2 bind 0 installed 31 sec used 31 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

... lets try to bind the action to a filter..
$ sudo $TC qdisc add dev lo ingress
$ sudo $TC filter add dev lo parent : protocol ip prio 1 \
  u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

... and now a few GETs:
$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 3 bind 1 installed 204 sec used 204 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 4 bind 1 installed 206 sec used 206 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 5 bind 1 installed 235 sec used 235 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

 as can be observed the reference count keeps going up.

After the fix

$ sudo $TC actions add action drop index 10
$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 1 bind 0 installed 4 sec used 4 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 1 bind 0 installed 6 sec used 6 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC qdisc add dev lo ingress
$ sudo $TC filter add dev lo parent : protocol ip prio 1 \
  u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 2 bind 1 installed 32 sec used 32 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
 random type none pass val 0
 index 10 ref 2 bind 1 installed 33 sec used 33 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Fixes: aecc5cefc389 ("net sched actions: fix GETing actions")

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_api.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 2095c83..e10456ef6f 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -900,8 +900,6 @@ static int tca_action_flush(struct net *net, struct nlattr 
*nla,
goto err;
}
act->order = i;
-   if (event == RTM_GETACTION)
-   act->tcfa_refcnt++;
list_add_tail(&act->list, &actions);
}
 
@@ -914,7 +912,8 @@ static int tca_action_flush(struct net *net, struct nlattr 
*nla,
return ret;
}
 err:
-   tcf_action_destroy(&actions, 0);
+   if (event != RTM_GETACTION)
+   tcf_action_destroy(&actions, 0);
return ret;
 }
 
-- 
1.9.1