Re: [PATCH net-next 17/17] net: sched: unlock rules update API

2018-11-14 Thread Vlad Buslov


On Wed 14 Nov 2018 at 06:44, Jiri Pirko  wrote:
> Tue, Nov 13, 2018 at 02:46:54PM CET, vla...@mellanox.com wrote:
>>On Mon 12 Nov 2018 at 17:30, David Miller  wrote:
>>> From: Vlad Buslov 
>>> Date: Mon, 12 Nov 2018 09:55:46 +0200
>>>
>>>> Register netlink protocol handlers for message types RTM_NEWTFILTER,
>>>> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set the rtnl_held variable
>>>> that tracks rtnl mutex state to false by default.
>>>
>>> This whole conditional locking mechanism is really not clean and makes
>>> this code so much harder to understand and audit.
>>>
>>> Please improve the code so that this kind of construct is not needed.
>>>
>>> Thank you.
>>
>>Hi David,
>>
>>I considered several approaches to this problem and decided that this
>>one is the most straightforward to implement. I understand your concern
>>and agree that this code is not the easiest to understand. I can suggest
>>several possible solutions that do not require this kind of elaborate
>>locking mechanism in the cls API, but they have their own drawbacks:
>>
>>1. Convert all qdiscs and classifiers to support unlocked execution,
>>like we did for actions. However, according to my experience with
>>converting the flower classifier, classifiers require much more code
>>than actions. I would estimate it to be more work than the whole current
>>unlocking effort (a hundred-plus patches). Also, authors of some of them
>>might be unhappy with such intrusive changes. I don't think this
>>approach is realistic.
>>
>>2. Somehow determine if rtnl is needed at the beginning of the cls API
>>rule update functions. Currently, this is not possible because locking
>>requirements are determined by the qdisc_class_ops and tcf_proto_ops
>>'flags' fields, which requires the code to first do the whole ops lookup
>>sequence. However, instead of keeping 'flags' in the ops struct, I can
>>put it in some kind of hash table or array that maps the
>>qdisc/classifier type string to flags, so it will be possible to
>>determine locking requirements by just parsing the netlink message and
>>obtaining the flags by qdisc/classifier type. I do not consider this a
>>pretty solution either, but maybe you have a different opinion.
>
> I think you will have to do 2. or some modification. Can't you just
> check for cls ability to run unlocked early on in tc_new_tfilter()?
> You would call tcf_proto_locking_check(nla_data(tca[TCA_KIND]), ...),
> which would do tcf_proto_lookup_ops() for ops and check the flags?

I guess that would work. However, such a solution requires calling
tcf_proto_lookup_ops(), which iterates over the tcf_proto_base list and
calls strcmp() for each proto, on every rule update call. That is why I
suggested using some kind of optimized data structure for that purpose
in my first reply. Dunno if such a lookup will significantly impact rule
update performance. We don't have that many classifiers and their names
are short, so I guess not?
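
For illustration, a hypothetical sketch of such an early check (the
helper name tcf_proto_requires_rtnl() is an assumption; it reuses
tcf_proto_lookup_ops() and the TCF_PROTO_OPS_DOIT_UNLOCKED flag
introduced later in this series, so the only added cost is one extra
lookup per rule update request):

static bool tcf_proto_requires_rtnl(const char *kind,
                                    struct netlink_ext_ack *extack)
{
        const struct tcf_proto_ops *ops;

        /* Existing lookup: walks tcf_proto_base and strcmp()s each kind,
         * taking a module reference on success.
         */
        ops = tcf_proto_lookup_ops(kind, false, extack);
        if (IS_ERR(ops))
                return true; /* conservative: take rtnl for unknown kinds */

        module_put(ops->owner);
        return !(ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED);
}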

>
>
>>
>>3. Anything you can suggest? I might be missing something simple that
>>you would consider more elegant solution to this problem.
>>
>>Thanks,
>>Vlad
>>



Re: [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path

2018-11-13 Thread Vlad Buslov


On Tue 13 Nov 2018 at 13:40, Stefano Brivio  wrote:
> On Tue, 13 Nov 2018 13:25:52 +
> Vlad Buslov  wrote:
>
>> On Tue 13 Nov 2018 at 09:40, Stefano Brivio  wrote:
>> > Hi Vlad,
>> >
>> > On Mon, 12 Nov 2018 09:55:45 +0200
>> > Vlad Buslov  wrote:
>> >  
>> >> @@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct 
>> >> work_struct *work)
>> >>   rtnl_unlock();
>> >>  }
>> >>  
>> >> +/* Helper function to lock rtnl mutex when specified condition is true 
>> >> and mutex
>> >> + * hasn't been locked yet. Will set rtnl_held to 'true' before taking 
>> >> rtnl lock.
>> >> + * Note that this function does nothing if rtnl is already held. This is
>> >> + * intended to be used by cls API rules update API when multiple 
>> >> conditions
>> >> + * could require rtnl lock and its state needs to be tracked to prevent 
>> >> trying
>> >> + * to obtain lock multiple times.
>> >> + */
>> >> +
>> >> +static void tcf_require_rtnl(bool cond, bool *rtnl_held)
>> >> +{
>> >> + if (!*rtnl_held && cond) {
>> >> + *rtnl_held = true;
>> >> + rtnl_lock();
>> >> + }
>> >> +}  
>> >
>> > I guess calls to this function are supposed to be serialised. If that's
>> > the case (which is my tentative understanding so far), I would indicate
>> > that in the comment.
>> >
>> > If that's not the case, you would be introducing a race I guess.
>> >
>> > Same applies to tcf_block_release() from 17/17.  
>> 
>> Hi Stefano,
>> 
>> Thank you for reviewing my code!
>> 
>> I did not intend for this function to be serialized. First argument to
>> tcf_require_rtnl() is passed by value, and second argument is always a
>> pointer to local stack-allocated value of the caller.
>
> Yes, sorry, I haven't been terribly clear, that's what I meant by
> serialised: it won't be called concurrently with the same *rtnl_held.
>
> Perhaps the risk that somebody uses it that way is close to zero, so
> I'm not even too sure this is worth a comment, but if you can come up
> with a concise way of saying this, that would be nice.

I considered my comment that the function "Will set rtnl_held to 'true'
before taking rtnl lock" a red flag warning callers not to pass a
pointer to a variable that can be accessed concurrently. I guess I can
add an additional sentence to explicitly warn potential users. Or I can
just move the rtnl_held assignment in both functions so it is performed
while holding the rtnl mutex. I implemented it the way I did as an
overzealous optimization, but realistically the price of an assignment
is negligible in this case. Suggestions are welcome!
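
For concreteness, a minimal sketch of that second option (same helper,
with the flag update moved under the mutex; equivalent for a stack-local
rtnl_held, but the flag can no longer be observed as 'true' before the
lock is actually held):

static void tcf_require_rtnl(bool cond, bool *rtnl_held)
{
        if (!*rtnl_held && cond) {
                rtnl_lock();
                /* flag is now only updated while rtnl is held */
                *rtnl_held = true;
        }
}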

>
>> Same applies to tcf_block_release() - its arguments are Qdisc and block
>> which support concurrency-safe reference counting, and pointer to local
>> variable rtnl_held, which is not accessible to concurrent users.
>
> Same there.
>
>> What is the race in these cases? Am I missing something?
>
> No, no race then. My only concern was:
>
> thread A:                          thread B:
> - x = false;
> - tcf_require_rtnl(true, &x);      - tcf_require_rtnl(true, &x);
>   - if (!*x && true)                 - if (!*x && true)
>     - *x = true;
>     - rtnl_lock()                      - *x = true;
>                                        - rtnl_lock()
>
> but this cannot happen as you explained.



Re: [PATCH net-next 17/17] net: sched: unlock rules update API

2018-11-13 Thread Vlad Buslov
On Mon 12 Nov 2018 at 17:30, David Miller  wrote:
> From: Vlad Buslov 
> Date: Mon, 12 Nov 2018 09:55:46 +0200
>
>> Register netlink protocol handlers for message types RTM_NEWTFILTER,
>> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set the rtnl_held variable
>> that tracks rtnl mutex state to false by default.
>
> This whole conditional locking mechanism is really not clean and makes
> this code so much harder to understand and audit.
>
> Please improve the code so that this kind of construct is not needed.
>
> Thank you.

Hi David,

I considered several approaches to this problem and decided that this
one is the most straightforward to implement. I understand your concern
and agree that this code is not the easiest to understand. I can suggest
several possible solutions that do not require this kind of elaborate
locking mechanism in the cls API, but they have their own drawbacks:

1. Convert all qdiscs and classifiers to support unlocked execution,
like we did for actions. However, according to my experience with
converting the flower classifier, classifiers require much more code
than actions. I would estimate it to be more work than the whole current
unlocking effort (a hundred-plus patches). Also, authors of some of them
might be unhappy with such intrusive changes. I don't think this
approach is realistic.

2. Somehow determine if rtnl is needed at the beginning of the cls API
rule update functions. Currently, this is not possible because locking
requirements are determined by the qdisc_class_ops and tcf_proto_ops
'flags' fields, which requires the code to first do the whole ops lookup
sequence. However, instead of keeping 'flags' in the ops struct, I can
put it in some kind of hash table or array that maps the
qdisc/classifier type string to flags, so it will be possible to
determine locking requirements by just parsing the netlink message and
obtaining the flags by qdisc/classifier type. I do not consider this a
pretty solution either, but maybe you have a different opinion.

3. Anything you can suggest? I might be missing something simple that
you would consider a more elegant solution to this problem.

Thanks,
Vlad



Re: [PATCH net-next 02/17] net: sched: protect block state with spinlock

2018-11-13 Thread Vlad Buslov


On Tue 13 Nov 2018 at 10:07, Stefano Brivio  wrote:
> Vlad,
>
> On Mon, 12 Nov 2018 09:28:59 -0800 (PST)
> David Miller  wrote:
>
>> From: Vlad Buslov 
>> Date: Mon, 12 Nov 2018 09:55:31 +0200
>> 
>> > +#define ASSERT_BLOCK_LOCKED(block) \
>> > +  WARN_ONCE(!spin_is_locked(&(block)->lock),  \
>> > +"BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)  
>> 
>> spin_is_locked() is not usable for assertions.
>
> See also b86077207d0c ("igbvf: Replace spin_is_locked() with
> lockdep").

Stefano,

Thanks for the tip. I will check it out.
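
For reference, a sketch of what the lockdep-based assertion could look
like (following the approach of commit b86077207d0c referenced above;
lockdep_assert_held() compiles away without CONFIG_LOCKDEP and, unlike
spin_is_locked(), checks that the current context holds the lock):

#define ASSERT_BLOCK_LOCKED(block) \
        lockdep_assert_held(&(block)->lock)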

Vlad.



Re: [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path

2018-11-13 Thread Vlad Buslov


On Tue 13 Nov 2018 at 09:40, Stefano Brivio  wrote:
> Hi Vlad,
>
> On Mon, 12 Nov 2018 09:55:45 +0200
> Vlad Buslov  wrote:
>
>> @@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct work_struct 
>> *work)
>>  rtnl_unlock();
>>  }
>>  
>> +/* Helper function to lock rtnl mutex when specified condition is true and 
>> mutex
>> + * hasn't been locked yet. Will set rtnl_held to 'true' before taking rtnl 
>> lock.
>> + * Note that this function does nothing if rtnl is already held. This is
>> + * intended to be used by cls API rules update API when multiple conditions
>> + * could require rtnl lock and its state needs to be tracked to prevent 
>> trying
>> + * to obtain lock multiple times.
>> + */
>> +
>> +static void tcf_require_rtnl(bool cond, bool *rtnl_held)
>> +{
>> +if (!*rtnl_held && cond) {
>> +*rtnl_held = true;
>> +rtnl_lock();
>> +}
>> +}
>
> I guess calls to this function are supposed to be serialised. If that's
> the case (which is my tentative understanding so far), I would indicate
> that in the comment.
>
> If that's not the case, you would be introducing a race I guess.
>
> Same applies to tcf_block_release() from 17/17.

Hi Stefano,

Thank you for reviewing my code!

I did not intend for this function to be serialized. First argument to
tcf_require_rtnl() is passed by value, and second argument is always a
pointer to local stack-allocated value of the caller. Same applies to
tcf_block_release() - its arguments are Qdisc and block which support
concurrency-safe reference counting, and pointer to local variable
rtnl_held, which is not accessible to concurrent users.

What is the race in these cases? Am I missing something?

Vlad


Re: [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue

2018-11-13 Thread Vlad Buslov


On Mon 12 Nov 2018 at 17:28, David Miller  wrote:
> From: Vlad Buslov 
> Date: Mon, 12 Nov 2018 09:55:30 +0200
>
>> +void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>> +  struct tcf_proto *tp_head)
>> +{
>> +xchg(&miniqp->tp_head, tp_head);
>
> If you are not checking the return value of xchg(), then this is
> simply a store with optionally a memory barrier of some sort
> either before or after.

That was my intention. What would be a better way to atomically
reset a pointer? Should I just change this line to explicit
assignment+barrier?
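
For discussion, a sketch of the explicit-assignment alternative,
assuming tp_head is an __rcu pointer that readers only dereference under
RCU, so that the release barrier of rcu_assign_pointer() replaces the
barrier implied by the value-ignoring xchg():

        rcu_assign_pointer(miniqp->tp_head, tp_head);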


[PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock

2018-11-11 Thread Vlad Buslov
Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with a single global rtnl lock, which removes any
possibility of parallelism. This patch set is the third step to remove
the rtnl lock dependency from the TC rules update path.

Recently, a new rtnl registration flag, RTNL_FLAG_DOIT_UNLOCKED, was
added. Handlers registered with this flag are called without the RTNL
taken. The end goal is to have the rule update handlers (RTM_NEWTFILTER,
RTM_DELTFILTER, etc.) registered with the UNLOCKED flag to allow
parallel execution. However, there is no intention to completely remove
or split the rtnl lock itself. This patch set addresses specific
problems in the implementation of the classifiers API that prevent its
control path from being executed concurrently, and completes the
refactoring of cls API rules update handlers by removing the rtnl lock
dependency from code that handles chains and classifiers. Rules update
handlers are registered with the RTNL_FLAG_DOIT_UNLOCKED flag.

This patch set substitutes the global rtnl lock dependency on the rules
update path in the cls API by extending its data structures with the
following locks:
- tcf_block with a 'lock' spinlock. It is used to protect block state
  and the lifetime-management fields of chains on the block
  (chain->refcnt, chain->action_refcnt, chain->explicitly_created, etc.).
- tcf_chain with a 'filter_chain_lock' spinlock, which is used to
  protect the list of classifier instances attached to the chain.
- tcf_proto with a 'lock' spinlock that is intended to be used to
  synchronize access to classifiers that support unlocked execution.

Chain0 head change callbacks can sleep and so cannot be protected by the
block spinlock. To solve this issue, the sleeping miniqp swap function
(used as the head change callback by the ingress and clsact Qdiscs) is
refactored to offload sleeping operations to a workqueue. A new ordered
workqueue 'tc_proto_workqueue' is created in cls_api to be used by
miniqp and for tcf_proto deallocation, which is also moved to the
workqueue to prevent deallocation of tp's that are still in use by the
block. Performing both the miniqp swap and tp deallocation on the same
ordered workqueue ensures that any pending head change requests
involving a tp are completed before that tp is deallocated.
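
A minimal sketch of that ordering trick (the workqueue variable and init
function names are assumptions; tc_queue_proto_work() is the helper used
in the patches below). An ordered workqueue executes at most one work
item at a time, in queueing order, so a miniqp swap queued before a tp
destroy work is guaranteed to have finished by the time the tp is freed:

static struct workqueue_struct *tc_proto_wq;

static int __init tc_proto_wq_init(void)
{
        /* ordered: max_active is 1, items run strictly in queue order */
        tc_proto_wq = alloc_ordered_workqueue("tc_proto_workqueue", 0);
        return tc_proto_wq ? 0 : -ENOMEM;
}

static void tc_queue_proto_work(struct work_struct *work)
{
        queue_work(tc_proto_wq, work);
}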

Classifiers are extended with reference counting to accommodate parallel
access by the unlocked cls API. The classifier ops structure is extended
with an additional 'put' function to allow reference counting of filters
and is intended to be used by classifiers that implement the
rtnl-unlocked API. Users of classifiers and individual filter instances
are modified to always hold a reference while working with them.

Classifiers that support unlocked execution still need to know the
status of the rtnl lock, so their API is extended with an additional
'rtnl_held' argument that is used to indicate whether the caller holds
the rtnl lock.

Vlad Buslov (17):
  net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  net: sched: protect block state with spinlock
  net: sched: refactor tc_ctl_chain() to use block->lock
  net: sched: protect block->chain0 with block->lock
  net: sched: traverse chains in block with tcf_get_next_chain()
  net: sched: protect chain template accesses with block lock
  net: sched: lock the chain when accessing filter_chain list
  net: sched: introduce reference counting for tcf proto
  net: sched: traverse classifiers in chain with tcf_get_next_proto()
  net: sched: refactor tp insert/delete for concurrent execution
  net: sched: prevent insertion of new classifiers during chain flush
  net: sched: track rtnl lock status when validating extensions
  net: sched: extend proto ops with 'put' callback
  net: sched: extend proto ops to support unlocked classifiers
  net: sched: add flags to Qdisc class ops struct
  net: sched: conditionally take rtnl lock on rules update path
  net: sched: unlock rules update API

 include/net/pkt_cls.h |   6 +-
 include/net/sch_generic.h |  73 +++-
 net/sched/cls_api.c   | 919 +-
 net/sched/cls_basic.c |  14 +-
 net/sched/cls_bpf.c   |  15 +-
 net/sched/cls_cgroup.c|  13 +-
 net/sched/cls_flow.c  |  15 +-
 net/sched/cls_flower.c|  16 +-
 net/sched/cls_fw.c|  15 +-
 net/sched/cls_matchall.c  |  16 +-
 net/sched/cls_route.c |  14 +-
 net/sched/cls_rsvp.h  |  16 +-
 net/sched/cls_tcindex.c   |  17 +-
 net/sched/cls_u32.c   |  14 +-
 net/sched/sch_api.c   |  10 +-
 net/sched/sch_generic.c   |  37 +-
 net/sched/sch_ingress.c   |   4 +
 17 files changed, 955 insertions(+), 259 deletions(-)

-- 
2.7.5



[PATCH net-next 08/17] net: sched: introduce reference counting for tcf proto

2018-11-11 Thread Vlad Buslov
Add a reference counter to tcf proto. Use it to manage the tcf proto
life cycle in the cls API.

Implement helper get/put functions for tcf proto and use them to modify
the cls API to always take a reference to a tcf proto while using it.
This change allows the proto to be modified concurrently, instead of
relying on the rtnl lock for protection.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c   | 44 +---
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 82fa23da4969..1015e3491187 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -318,6 +318,7 @@ struct tcf_proto {
void*data;
const struct tcf_proto_ops  *ops;
struct tcf_chain*chain;
+   refcount_t  refcnt;
struct rcu_head rcu;
struct work_struct  work;
 };
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 8f5dfa3ffb1c..6cbdb28017d3 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -197,6 +197,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, 
u32 protocol,
tp->prio = prio;
tp->chain = chain;
INIT_WORK(&tp->work, tcf_proto_destroy_work);
+   refcount_set(&tp->refcnt, 1);
 
err = tp->ops->init(tp);
if (err) {
@@ -210,12 +211,24 @@ static struct tcf_proto *tcf_proto_create(const char 
*kind, u32 protocol,
return ERR_PTR(err);
 }
 
+static void tcf_proto_get(struct tcf_proto *tp)
+{
+   refcount_inc(&tp->refcnt);
+}
+
 static void tcf_proto_destroy(struct tcf_proto *tp,
  struct netlink_ext_ack *extack)
 {
tc_queue_proto_work(&tp->work);
 }
 
+static void tcf_proto_put(struct tcf_proto *tp,
+ struct netlink_ext_ack *extack)
+{
+   if (refcount_dec_and_test(&tp->refcnt))
+   tcf_proto_destroy(tp, extack);
+}
+
 #define ASSERT_BLOCK_LOCKED(block) \
WARN_ONCE(!spin_is_locked(&(block)->lock),  \
  "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)
@@ -458,13 +471,13 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 
spin_lock(&chain->filter_chain_lock);
tp = tcf_chain_dereference(chain->filter_chain, chain);
+   RCU_INIT_POINTER(chain->filter_chain, NULL);
tcf_chain0_head_change(chain, NULL);
spin_unlock(&chain->filter_chain_lock);
 
while (tp) {
-   RCU_INIT_POINTER(chain->filter_chain, tp->next);
-   tcf_proto_destroy(tp, NULL);
-   tp = rtnl_dereference(chain->filter_chain);
+   tcf_proto_put(tp, NULL);
+   tp = tp->next;
}
 }
 
@@ -1275,9 +1288,9 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 {
if (*chain_info->pprev == chain->filter_chain)
tcf_chain0_head_change(chain, tp);
+   tcf_proto_get(tp);
RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
rcu_assign_pointer(*chain_info->pprev, tp);
-   tcf_chain_hold(chain);
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -1315,7 +1328,12 @@ static struct tcf_proto *tcf_chain_tp_find(struct 
tcf_chain *chain,
}
}
chain_info->pprev = pprev;
-   chain_info->next = tp ? tp->next : NULL;
+   if (tp) {
+   chain_info->next = tp->next;
+   tcf_proto_get(tp);
+   } else {
+   chain_info->next = NULL;
+   }
return tp;
 }
 
@@ -1590,6 +1608,12 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
 errout:
-   if (chain)
-   tcf_chain_put(chain);
+   if (chain) {
+   if (tp && !IS_ERR(tp))
+   tcf_proto_put(tp, NULL);
+   if (!tp_created)
+   tcf_chain_put(chain);
+   }
tcf_block_release(q, block);
if (err == -EAGAIN)
/* Replay the request. */
@@ -1720,8 +1744,11 @@ static int tc_del_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
}
 
 errout:
-   if (chain)
+   if (chain) {
+   if (tp && !IS_ERR(tp))
+   tcf_proto_put(tp, NULL);
tcf_chain_put(chain);
+   }
tcf_block_release(q, block);
return err;
 
@@ -1812,8 +1839,11 @@ static int tc_get_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
}
 
 errout:
-   if (chain)
+   if (chain) {
+   if (tp && !IS_ERR(tp))
+   tcf_proto_put(tp, NULL);
tcf_chain_put(chain);
+   }
tcf_block_release(q, block);
return err;
 }
-- 
2.7.5



[PATCH net-next 10/17] net: sched: refactor tp insert/delete for concurrent execution

2018-11-11 Thread Vlad Buslov
Implement a unique insertion function that atomically attaches a tcf
proto to a chain after verifying that no other tcf proto with the
specified priority exists. Implement a delete function that verifies
that the tp is actually empty before deleting it. Use these functions to
refactor the cls API to account for concurrent tp and rule updates
instead of relying on the rtnl lock. Add a new 'deleting' flag to tcf
proto. Use it to restart the search when iterating over tp's on a chain
to prevent accessing a potentially invalid tp->next pointer.

Extend tcf proto with a spinlock that is intended to be used to protect
its data from concurrent modification instead of relying on the rtnl
mutex. Use it to protect the 'deleting' flag. Add lockdep macros to
validate that the lock is held when accessing protected fields.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  18 +
 net/sched/cls_api.c   | 183 +++---
 2 files changed, 176 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 1015e3491187..4809eca41f95 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -318,6 +318,11 @@ struct tcf_proto {
void*data;
const struct tcf_proto_ops  *ops;
struct tcf_chain*chain;
+   /* Lock protects tcf_proto shared state and can be used by unlocked
+* classifiers to protect their private data.
+*/
+   spinlock_t  lock;
+   booldeleting;
refcount_t  refcnt;
struct rcu_head rcu;
struct work_struct  work;
@@ -379,16 +384,29 @@ static inline bool lockdep_tcf_chain_is_locked(struct 
tcf_chain *chain)
 {
return lockdep_is_held(&chain->filter_chain_lock);
 }
+
+static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
+{
+   return lockdep_is_held(&tp->lock);
+}
 #else
 static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
 {
return true;
 }
+
+static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
+{
+   return true;
+}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 
 #define tcf_chain_dereference(p, chain) \
rcu_dereference_protected(p, lockdep_tcf_chain_is_locked(chain))
 
+#define tcf_proto_dereference(p, tp)   \
+   rcu_dereference_protected(p, lockdep_tcf_proto_is_locked(tp))
+
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 {
if (*flags & TCA_CLS_FLAGS_IN_HW)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 02130f8d89e1..3ce244fbfb4d 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -197,6 +197,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, 
u32 protocol,
tp->prio = prio;
tp->chain = chain;
INIT_WORK(&tp->work, tcf_proto_destroy_work);
+   spin_lock_init(&tp->lock);
refcount_set(&tp->refcnt, 1);
 
err = tp->ops->init(tp);
@@ -229,6 +230,49 @@ static void tcf_proto_put(struct tcf_proto *tp,
tcf_proto_destroy(tp, extack);
 }
 
+static int walker_noop(struct tcf_proto *tp, void *d, struct tcf_walker *arg)
+{
+   return -1;
+}
+
+static bool tcf_proto_is_empty(struct tcf_proto *tp)
+{
+   struct tcf_walker walker = { .fn = walker_noop, };
+
+   if (tp->ops->walk) {
+   tp->ops->walk(tp, &walker);
+   return !walker.stop;
+   }
+   return true;
+}
+
+static bool tcf_proto_check_delete(struct tcf_proto *tp)
+{
+   spin_lock(&tp->lock);
+   if (tcf_proto_is_empty(tp))
+   tp->deleting = true;
+   spin_unlock(&tp->lock);
+   return tp->deleting;
+}
+
+static void tcf_proto_mark_delete(struct tcf_proto *tp)
+{
+   spin_lock(&tp->lock);
+   tp->deleting = true;
+   spin_unlock(&tp->lock);
+}
+
+static bool tcf_proto_is_deleting(struct tcf_proto *tp)
+{
+   bool deleting;
+
+   spin_lock(&tp->lock);
+   deleting = tp->deleting;
+   spin_unlock(&tp->lock);
+
+   return deleting;
+}
+
 #define ASSERT_BLOCK_LOCKED(block) \
WARN_ONCE(!spin_is_locked(&(block)->lock),  \
  "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)
@@ -731,13 +775,27 @@ EXPORT_SYMBOL(tcf_get_next_chain);
 static struct tcf_proto *
 __tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
 {
+   u32 prio = 0;
+
ASSERT_RTNL();
spin_lock(&chain->filter_chain_lock);
 
-   if (!tp)
+   if (!tp) {
tp = tcf_chain_dereference(chain->filter_chain, chain);
-   else
+   } else if (tcf_proto_is_deleting(tp)) {
+   /* 'deleting' flag is set and chain->filter_chain_lock was
+* unlocked, which means next pointer could 

[PATCH net-next 13/17] net: sched: extend proto ops with 'put' callback

2018-11-11 Thread Vlad Buslov
Add an optional tp->ops->put() API to be implemented for filter
reference counting. This new function is called by the cls API to
release the filter reference for filters returned by the
tp->ops->change() or tp->ops->get() functions. Implement a tfilter_put()
helper that calls tp->ops->put() only for classifiers that implement it.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c   | 12 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 100368594524..24103b7282bd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -273,6 +273,7 @@ struct tcf_proto_ops {
   struct netlink_ext_ack *extack);
 
void*   (*get)(struct tcf_proto*, u32 handle);
+   void(*put)(struct tcf_proto *tp, void *f);
int (*change)(struct net *net, struct sk_buff *,
struct tcf_proto*, unsigned long,
u32 handle, struct nlattr **,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0949502e31b9..7f65ed84b5e5 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1642,6 +1642,12 @@ static void tfilter_notify_chain(struct net *net, struct 
sk_buff *oskb,
   q, parent, NULL, event, false);
 }
 
+static void tfilter_put(struct tcf_proto *tp, void *fh)
+{
+   if (tp->ops->put && fh)
+   tp->ops->put(tp, fh);
+}
+
 static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
  struct netlink_ext_ack *extack)
 {
@@ -1784,6 +1790,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
goto errout;
}
} else if (n->nlmsg_flags & NLM_F_EXCL) {
+   tfilter_put(tp, fh);
NL_SET_ERR_MSG(extack, "Filter already exists");
err = -EEXIST;
goto errout;
@@ -1798,9 +1805,11 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
  n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE 
: TCA_ACT_REPLACE,
  extack);
-   if (err == 0)
+   if (err == 0) {
tfilter_notify(net, skb, n, tp, block, q, parent, fh,
   RTM_NEWTFILTER, false);
+   tfilter_put(tp, fh);
+   }
 
 errout:
if (err && tp_created)
@@ -2030,6 +2039,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
NL_SET_ERR_MSG(extack, "Failed to send filter notify 
message");
}
 
+   tfilter_put(tp, fh);
 errout:
if (chain) {
if (tp && !IS_ERR(tp))
-- 
2.7.5



[PATCH net-next 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto()

2018-11-11 Thread Vlad Buslov
All users of chain->filter_chain rely on rtnl lock and assume that no new
classifier instances are added when traversing the list. Use
tcf_get_next_proto() to traverse the filters list without relying on the
rtnl mutex. This function iterates over classifiers by taking a reference
to the current iterator classifier only and doesn't assume external
synchronization of the filters list.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/pkt_cls.h |  2 ++
 net/sched/cls_api.c   | 70 +++
 net/sched/sch_api.c   |  4 +--
 3 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 8396d51e78a1..727cb99da5d4 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -45,6 +45,8 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block 
*block,
 void tcf_chain_put_by_act(struct tcf_chain *chain);
 struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
 struct tcf_chain *chain);
+struct tcf_proto *tcf_get_next_proto(struct tcf_chain *chain,
+struct tcf_proto *tp);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 6cbdb28017d3..02130f8d89e1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -728,6 +728,45 @@ tcf_get_next_chain(struct tcf_block *block, struct 
tcf_chain *chain)
 }
 EXPORT_SYMBOL(tcf_get_next_chain);
 
+static struct tcf_proto *
+__tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
+{
+   ASSERT_RTNL();
+   spin_lock(&chain->filter_chain_lock);
+
+   if (!tp)
+   tp = tcf_chain_dereference(chain->filter_chain, chain);
+   else
+   tp = tcf_chain_dereference(tp->next, chain);
+
+   if (tp)
+   tcf_proto_get(tp);
+
+   spin_unlock(&chain->filter_chain_lock);
+
+   return tp;
+}
+
+/* Function to be used by all clients that want to iterate over all tp's on
+ * chain. Users of this function must be tolerant to concurrent tp
+ * insertion/deletion or ensure that no concurrent chain modification is
+ * possible. Note that all netlink dump callbacks cannot guarantee to provide
+ * consistent dump because rtnl lock is released each time skb is filled with
+ * data and sent to user-space.
+ */
+
+struct tcf_proto *
+tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
+{
+   struct tcf_proto *tp_next = __tcf_get_next_proto(chain, tp);
+
+   if (tp)
+   tcf_proto_put(tp, NULL);
+
+   return tp_next;
+}
+EXPORT_SYMBOL(tcf_get_next_proto);
+
 static void tcf_block_flush_all_chains(struct tcf_block *block)
 {
struct tcf_chain *chain;
@@ -1100,7 +1139,7 @@ tcf_block_playback_offloads(struct tcf_block *block, 
tc_setup_cb_t *cb,
struct netlink_ext_ack *extack)
 {
struct tcf_chain *chain, *chain_prev;
-   struct tcf_proto *tp;
+   struct tcf_proto *tp, *tp_prev;
int err;
 
for (chain = __tcf_get_next_chain(block, NULL);
@@ -1108,8 +1147,10 @@ tcf_block_playback_offloads(struct tcf_block *block, 
tc_setup_cb_t *cb,
 chain_prev = chain,
 chain = __tcf_get_next_chain(block, chain),
 tcf_chain_put(chain_prev)) {
-   for (tp = rtnl_dereference(chain->filter_chain); tp;
-tp = rtnl_dereference(tp->next)) {
+   for (tp = __tcf_get_next_proto(chain, NULL); tp;
+tp_prev = tp,
+tp = __tcf_get_next_proto(chain, tp),
+tcf_proto_put(tp_prev, NULL)) {
if (tp->ops->reoffload) {
err = tp->ops->reoffload(tp, add, cb, cb_priv,
 extack);
@@ -1126,6 +1167,7 @@ tcf_block_playback_offloads(struct tcf_block *block, 
tc_setup_cb_t *cb,
return 0;
 
 err_playback_remove:
+   tcf_proto_put(tp, NULL);
tcf_chain_put(chain);
tcf_block_playback_offloads(block, cb, cb_priv, false, offload_in_use,
extack);
@@ -1449,8 +1491,8 @@ static void tfilter_notify_chain(struct net *net, struct 
sk_buff *oskb,
 {
struct tcf_proto *tp;
 
-   for (tp = rtnl_dereference(chain->filter_chain);
-tp; tp = rtnl_dereference(tp->next))
+   for (tp = tcf_get_next_proto(chain, NULL);
+tp; tp = tcf_get_next_proto(chain, tp))
tfilter_notify(net, oskb, n, tp, block,
   q, parent, NULL, event, false);
 }
@@ -1875,11 +1917,15 @@ static bool tcf_chain_dump(struct tcf_chain *chain, 
struct Qdisc *q, u32 parent,
struct net *net = sock_net(skb->sk);

[PATCH net-next 11/17] net: sched: prevent insertion of new classifiers during chain flush

2018-11-11 Thread Vlad Buslov
Extend tcf_chain with a 'flushing' flag. Use the flag to prevent
insertion of new classifier instances when chain flushing is in progress
in order to prevent a resource leak when a tcf_proto is created by
unlocked users concurrently.

Return an EAGAIN error from tcf_chain_tp_insert_unique() to restart
tc_new_tfilter() and look up the chain/proto again.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c   | 35 ++-
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4809eca41f95..100368594524 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -353,6 +353,7 @@ struct tcf_chain {
unsigned int refcnt;
unsigned int action_refcnt;
bool explicitly_created;
+   bool flushing;
const struct tcf_proto_ops *tmplt_ops;
void *tmplt_priv;
 };
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3ce244fbfb4d..d4f763525412 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -483,9 +483,12 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool 
by_act,
spin_unlock(&block->lock);
 
/* The last dropped non-action reference will trigger notification. */
-   if (is_last && !by_act)
+   if (is_last && !by_act) {
tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain_index,
   block, NULL, 0, 0, false);
+   /* Last reference to chain, no need to lock. */
+   chain->flushing = false;
+   }
 
if (refcnt == 0) {
tc_chain_tmplt_del(tmplt_ops, tmplt_priv);
@@ -517,6 +520,7 @@ static void tcf_chain_flush(struct tcf_chain *chain)
tp = tcf_chain_dereference(chain->filter_chain, chain);
RCU_INIT_POINTER(chain->filter_chain, NULL);
tcf_chain0_head_change(chain, NULL);
+   chain->flushing = true;
+   spin_unlock(&chain->filter_chain_lock);
 
while (tp) {
@@ -1382,15 +1386,20 @@ static struct tcf_proto *tcf_chain_tp_prev(struct 
tcf_chain *chain,
return tcf_chain_dereference(*chain_info->pprev, chain);
 }
 
-static void tcf_chain_tp_insert(struct tcf_chain *chain,
-   struct tcf_chain_info *chain_info,
-   struct tcf_proto *tp)
+static int tcf_chain_tp_insert(struct tcf_chain *chain,
+  struct tcf_chain_info *chain_info,
+  struct tcf_proto *tp)
 {
+   if (chain->flushing)
+   return -EAGAIN;
+
if (*chain_info->pprev == chain->filter_chain)
tcf_chain0_head_change(chain, tp);
tcf_proto_get(tp);
RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
rcu_assign_pointer(*chain_info->pprev, tp);
+
+   return 0;
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -1421,18 +1430,22 @@ static struct tcf_proto 
*tcf_chain_tp_insert_unique(struct tcf_chain *chain,
 {
struct tcf_chain_info chain_info;
struct tcf_proto *tp;
+   int err = 0;
 
spin_lock(&chain->filter_chain_lock);
 
tp = tcf_chain_tp_find(chain, &chain_info,
   protocol, prio, false);
if (!tp)
-   tcf_chain_tp_insert(chain, &chain_info, tp_new);
+   err = tcf_chain_tp_insert(chain, &chain_info, tp_new);
spin_unlock(&chain->filter_chain_lock);
 
if (tp) {
tcf_proto_destroy(tp_new, NULL);
tp_new = tp;
+   } else if (err) {
+   tcf_proto_destroy(tp_new, NULL);
+   tp_new = ERR_PTR(err);
}
 
return tp_new;
@@ -1743,11 +1756,15 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
 
tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio);
 
-   /* tp insert function can return another tp instance, if it was
-* created concurrently.
-*/
-   if (tp == tp_new)
+   if (IS_ERR(tp)) {
+   err = PTR_ERR(tp);
+   goto errout;
+   } else if (tp == tp_new) {
+   /* tp insert function can return another tp instance, if
+* it was created concurrently.
+*/
tp_created = 1;
+   }
} else {
spin_unlock(>filter_chain_lock);
}
-- 
2.7.5



[PATCH net-next 17/17] net: sched: unlock rules update API

2018-11-11 Thread Vlad Buslov
Register netlink protocol handlers for message types RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set the rtnl_held variable
that tracks rtnl mutex state to false by default.

Modify tcf_block_release() to release the rtnl lock if it was taken
before. Move the code that releases the block and qdisc into a new
function __tcf_block_release() that is used internally by the regular
block release and by the chain update function, which is not unlocked
and doesn't need to release rtnl.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 43 ++-
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 848f148f1019..a23aeac8ea4e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1024,13 +1024,28 @@ static struct tcf_block *tcf_block_find(struct net 
*net, struct Qdisc **q,
return ERR_PTR(err);
 }
 
-static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
+static void __tcf_block_release(struct Qdisc *q, struct tcf_block *block,
+   bool rtnl_held)
 {
if (!IS_ERR_OR_NULL(block))
tcf_block_refcnt_put(block);
 
-   if (q)
-   qdisc_put(q);
+   if (q) {
+   if (rtnl_held)
+   qdisc_put(q);
+   else
+   qdisc_put_unlocked(q);
+   }
+}
+
+static void tcf_block_release(struct Qdisc *q, struct tcf_block *block,
+ bool *rtnl_held)
+{
+   if (*rtnl_held) {
+   rtnl_unlock();
+   *rtnl_held = false;
+   }
+   __tcf_block_release(q, block, false);
 }
 
 struct tcf_block_owner_item {
@@ -1706,7 +1721,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
void *fh;
int err;
int tp_created;
-   bool rtnl_held = true;
+   bool rtnl_held = false;
 
if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
return -EPERM;
@@ -1865,7 +1880,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
if (!tp_created)
tcf_chain_put(chain);
}
-   tcf_block_release(q, block);
+   tcf_block_release(q, block, &rtnl_held);
if (err == -EAGAIN) {
/* Take rtnl lock in case EAGAIN is caused by concurrent flush
 * of target chain.
@@ -1899,7 +1914,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
unsigned long cl = 0;
void *fh = NULL;
int err;
-   bool rtnl_held = true;
+   bool rtnl_held = false;
 
if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
return -EPERM;
@@ -2011,7 +2026,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
tcf_proto_put(tp, NULL);
tcf_chain_put(chain);
}
-   tcf_block_release(q, block);
+   tcf_block_release(q, block, &rtnl_held);
return err;
 
 errout_locked:
@@ -2037,7 +2052,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
unsigned long cl = 0;
void *fh = NULL;
int err;
-   bool rtnl_held = true;
+   bool rtnl_held = false;
 
err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, rtm_tca_policy, extack);
if (err < 0)
@@ -2112,7 +2127,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
tcf_proto_put(tp, NULL);
tcf_chain_put(chain);
}
-   tcf_block_release(q, block);
+   tcf_block_release(q, block, &rtnl_held);
return err;
 }
 
@@ -2561,7 +2576,7 @@ static int tc_ctl_chain(struct sk_buff *skb, struct 
nlmsghdr *n,
 errout:
tcf_chain_put(chain);
 errout_block:
-   tcf_block_release(q, block);
+   __tcf_block_release(q, block, true);
if (err == -EAGAIN)
/* Replay the request. */
goto replay;
@@ -2899,10 +2914,12 @@ static int __init tc_filter_init(void)
if (err)
goto err_register_pernet_subsys;
 
-   rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, 0);
-   rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, 0);
+   rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
+ RTNL_FLAG_DOIT_UNLOCKED);
+   rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
+ RTNL_FLAG_DOIT_UNLOCKED);
rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter,
- tc_dump_tfilter, 0);
+ tc_dump_tfilter, RTNL_FLAG_DOIT_UNLOCKED);
rtnl_register(PF_UNSPEC, RTM_NEWCHAIN, tc_ctl_chain, NULL, 0);
rtnl_register(PF_UNSPEC, RTM_DELCHAIN, tc_ctl_chain, NULL, 0);
rtnl_register(PF_UNSPEC, RTM_GETCHAIN, tc_ctl_chain,
-- 
2.7.5



[PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path

2018-11-11 Thread Vlad Buslov
As a preparation for registering the rules update netlink handlers as
unlocked, conditionally take rtnl in the following cases:
- The parent qdisc doesn't support unlocked execution.
- The requested classifier type doesn't support unlocked execution.
- The user requested to flush the whole chain using the old filter
update API, instead of the new chains API.

Add a helper function tcf_require_rtnl() to only lock rtnl when the
specified condition is true and the lock hasn't been taken already.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 74 +
 1 file changed, 63 insertions(+), 11 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1956c5df5f89..848f148f1019 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct work_struct 
*work)
rtnl_unlock();
 }
 
+/* Helper function to lock rtnl mutex when specified condition is true and 
mutex
+ * hasn't been locked yet. Will set rtnl_held to 'true' before taking rtnl 
lock.
+ * Note that this function does nothing if rtnl is already held. This is
+ * intended to be used by cls API rules update API when multiple conditions
+ * could require rtnl lock and its state needs to be tracked to prevent trying
+ * to obtain lock multiple times.
+ */
+
+static void tcf_require_rtnl(bool cond, bool *rtnl_held)
+{
+   if (!*rtnl_held && cond) {
+   *rtnl_held = true;
+   rtnl_lock();
+   }
+}
+
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
  u32 prio, struct tcf_chain *chain,
- bool rtnl_held,
+ bool *rtnl_held,
  struct netlink_ext_ack *extack)
 {
struct tcf_proto *tp;
@@ -191,7 +207,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, 
u32 protocol,
if (!tp)
return ERR_PTR(-ENOBUFS);
 
-   tp->ops = tcf_proto_lookup_ops(kind, rtnl_held, extack);
+   tp->ops = tcf_proto_lookup_ops(kind, *rtnl_held, extack);
if (IS_ERR(tp->ops)) {
err = PTR_ERR(tp->ops);
goto errout;
@@ -204,6 +220,8 @@ static struct tcf_proto *tcf_proto_create(const char *kind, 
u32 protocol,
spin_lock_init(>lock);
refcount_set(>refcnt, 1);
 
+   tcf_require_rtnl(!(tp->ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED),
+rtnl_held);
err = tp->ops->init(tp);
if (err) {
module_put(tp->ops->owner);
@@ -888,6 +906,7 @@ static void tcf_block_refcnt_put(struct tcf_block *block)
 static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
u32 *parent, unsigned long *cl,
int ifindex, u32 block_index,
+   bool *rtnl_held,
struct netlink_ext_ack *extack)
 {
struct tcf_block *block;
@@ -953,6 +972,12 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
 */
rcu_read_unlock();
 
+   /* Take rtnl mutex if qdisc doesn't support unlocked
+* execution.
+*/
+   tcf_require_rtnl(!(cops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED),
+rtnl_held);
+
/* Do we search for filter, attached to class? */
if (TC_H_MIN(*parent)) {
*cl = cops->find(*q, *parent);
@@ -990,7 +1015,10 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
rcu_read_unlock();
 errout_qdisc:
if (*q) {
-   qdisc_put(*q);
+   if (*rtnl_held)
+   qdisc_put(*q);
+   else
+   qdisc_put_unlocked(*q);
*q = NULL;
}
return ERR_PTR(err);
@@ -1678,7 +1706,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
void *fh;
int err;
int tp_created;
-   bool rtnl_held;
+   bool rtnl_held = true;
 
if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
return -EPERM;
@@ -1697,7 +1725,6 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
parent = t->tcm_parent;
tp = NULL;
cl = 0;
-   rtnl_held = true;
 
if (prio == 0) {
/* If no priority is provided by the user,
@@ -1715,7 +1742,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
/* Find head of filter chain. */
 
block = tcf_block_find(net, &q, &parent, &cl,
-  t->tcm_ifindex, t->tcm_block_index, extack);
+ 

[PATCH net-next 05/17] net: sched: traverse chains in block with tcf_get_next_chain()

2018-11-11 Thread Vlad Buslov
All users of block->chain_list rely on rtnl lock and assume that no new
chains are added when traversing the list. Use tcf_get_next_chain() to
traverse the chain list without relying on the rtnl mutex. This function
iterates over chains by taking a reference to the current iterator chain
only and doesn't assume external synchronization of the chain list.

Don't take a reference to all chains in the block when flushing; use
tcf_get_next_chain() to safely iterate over the chain list instead.
Remove tcf_block_put_all_chains(), which is no longer used.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/pkt_cls.h |  2 ++
 net/sched/cls_api.c   | 96 +--
 net/sched/sch_api.c   |  4 ++-
 3 files changed, 76 insertions(+), 26 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 72ffb3120ced..8396d51e78a1 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -43,6 +43,8 @@ bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
   u32 chain_index);
 void tcf_chain_put_by_act(struct tcf_chain *chain);
+struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
+struct tcf_chain *chain);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index beffcc2ab1fa..bc4666985d4d 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -646,28 +646,62 @@ static struct tcf_block *tcf_block_refcnt_get(struct net 
*net, u32 block_index)
return block;
 }
 
-static void tcf_block_flush_all_chains(struct tcf_block *block)
+static struct tcf_chain *
+__tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 {
-   struct tcf_chain *chain;
+   spin_lock(&block->lock);
+   if (chain)
+   chain = list_is_last(&chain->list, &block->chain_list) ?
+   NULL : list_next_entry(chain, list);
+   else
+   chain = list_first_entry_or_null(&block->chain_list,
+struct tcf_chain, list);
 
-   /* Hold a refcnt for all chains, so that they don't disappear
-* while we are iterating.
-*/
-   list_for_each_entry(chain, &block->chain_list, list)
+   /* skip all action-only chains */
+   while (chain && tcf_chain_held_by_acts_only(chain))
+   chain = list_is_last(&chain->list, &block->chain_list) ?
+   NULL : list_next_entry(chain, list);
+
+   if (chain)
tcf_chain_hold(chain);
+   spin_unlock(&block->lock);
 
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
+   return chain;
 }
 
-static void tcf_block_put_all_chains(struct tcf_block *block)
+/* Function to be used by all clients that want to iterate over all chains on
+ * block. It properly obtains block->lock and takes reference to chain before
+ * returning it. Users of this function must be tolerant to concurrent chain
+ * insertion/deletion or ensure that no concurrent chain modification is
+ * possible. Note that all netlink dump callbacks cannot guarantee to provide
+ * consistent dump because rtnl lock is released each time skb is filled with
+ * data and sent to user-space.
+ */
+
+struct tcf_chain *
+tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 {
-   struct tcf_chain *chain, *tmp;
+   struct tcf_chain *chain_next = __tcf_get_next_chain(block, chain);
 
-   /* At this point, all the chains should have refcnt >= 1. */
-   list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-   tcf_chain_put_explicitly_created(chain);
+   if (chain)
tcf_chain_put(chain);
+
+   return chain_next;
+}
+EXPORT_SYMBOL(tcf_get_next_chain);
+
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain;
+
+   /* Last reference to block. At this point chains cannot be added or
+* removed concurrently.
+*/
+   for (chain = tcf_get_next_chain(block, NULL);
+chain;
+chain = tcf_get_next_chain(block, chain)) {
+   tcf_chain_put_explicitly_created(chain);
+   tcf_chain_flush(chain);
}
 }
 
@@ -686,8 +720,6 @@ static void __tcf_block_put(struct tcf_block *block, struct 
Qdisc *q,
spin_unlock(&block->lock);
if (tcf_block_shared(block))
tcf_block_remove(block, block->net);
-   if (!free_block)
-   tcf_block_flush_all_chains(block);
 
if (q)
tcf_block_offload_unbind(block, q, ei);
@@ -695,7 +727,7 @@ static void __tcf_block_put(struct tcf_block *bl

[PATCH net-next 15/17] net: sched: add flags to Qdisc class ops struct

2018-11-11 Thread Vlad Buslov
Extend Qdisc_class_ops with a flags field. Create an enum to hold
possible class ops flag values. Add the first class ops flag value,
QDISC_CLASS_OPS_DOIT_UNLOCKED, to indicate that class ops functions can
be called without taking the rtnl lock.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7b158d6fae85..c97300f5ea5c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -175,6 +175,7 @@ static inline int qdisc_avail_bulklimit(const struct 
netdev_queue *txq)
 }
 
 struct Qdisc_class_ops {
+   unsigned intflags;
/* Child qdisc manipulation */
struct netdev_queue *   (*select_queue)(struct Qdisc *, struct tcmsg *);
int (*graft)(struct Qdisc *, unsigned long cl,
@@ -206,6 +207,13 @@ struct Qdisc_class_ops {
struct gnet_dump *);
 };
 
+/* Qdisc_class_ops flag values */
+
+/* Implements API that doesn't require rtnl lock */
+enum qdisc_class_ops_flags {
+   QDISC_CLASS_OPS_DOIT_UNLOCKED = 1,
+};
+
 struct Qdisc_ops {
struct Qdisc_ops*next;
const struct Qdisc_class_ops*cl_ops;
-- 
2.7.5



[PATCH net-next 02/17] net: sched: protect block state with spinlock

2018-11-11 Thread Vlad Buslov
Currently, tcf_block doesn't use any synchronization mechanism to
protect the code that manages the lifetime of its chains.
block->chain_list and multiple variables in tcf_chain that control its
lifetime assume external synchronization provided by the global rtnl
lock. Converting chain reference counting to atomic reference counters
is not possible because the cls API uses multiple counters and flags to
control chain lifetime, so all of them must be synchronized in the chain
get/put code.

Use a single per-block lock to protect block data and manage the
lifetime of all chains on the block. Always take block->lock when
accessing chain_list. Chain get and put modify chain lifetime-management
data and the parent block's chain_list, so take the lock in these
functions. Verify block->lock state with assertions in functions that
expect to be called with the lock taken and are called from multiple
places. Take block->lock when accessing filter_chain_list.

block->lock is a spinlock, which means blocking functions like
classifier ops callbacks cannot be called while holding it. Rearrange
the chain get and put functions to only access protected chain data
while holding the block lock and move blocking calls outside the
critical section:
- Check if the chain was explicitly created inside the put function
  while holding the block lock. Add an additional argument to
  __tcf_chain_put() to only put an explicitly created chain.
- Rearrange code to only access the chain reference counter and chain
  action reference counter while holding the block lock.
- Split the tcf_chain_destroy() helper into two functions: one that
  requires block->lock, and another that needs to call sleeping
  functions and can be executed after the lock is released. The first
  helper is used to detach the chain from the block and make it
  inaccessible to concurrent users; the second actually deallocates the
  chain memory (and the parent block, if applicable).

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  4 ++
 net/sched/cls_api.c   | 93 +++
 2 files changed, 81 insertions(+), 16 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 29118bbd528f..5f4fc28fc77a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -350,6 +350,10 @@ struct tcf_chain {
 };
 
 struct tcf_block {
+   /* Lock protects tcf_block and lifetime-management data of chains
+* attached to the block (refcnt, action_refcnt, explicitly_created).
+*/
+   spinlock_t lock;
struct list_head chain_list;
u32 index; /* block index for shared blocks */
refcount_t refcnt;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 17c1691bf0c0..df3326dd33ef 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -216,6 +216,10 @@ static void tcf_proto_destroy(struct tcf_proto *tp,
tc_queue_proto_work(>work);
 }
 
+#define ASSERT_BLOCK_LOCKED(block) \
+   WARN_ONCE(!spin_is_locked(&(block)->lock),  \
+ "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)
+
 struct tcf_filter_chain_list_item {
struct list_head list;
tcf_chain_head_change_t *chain_head_change;
@@ -227,7 +231,9 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block 
*block,
 {
struct tcf_chain *chain;
 
-   chain = kzalloc(sizeof(*chain), GFP_KERNEL);
+   ASSERT_BLOCK_LOCKED(block);
+
+   chain = kzalloc(sizeof(*chain), GFP_ATOMIC);
if (!chain)
return NULL;
list_add_tail(&chain->list, &block->chain_list);
@@ -258,13 +264,29 @@ static void tcf_chain0_head_change(struct tcf_chain 
*chain,
tcf_chain_head_change_item(item, tp_head);
 }
 
-static void tcf_chain_destroy(struct tcf_chain *chain)
+/* Returns true if block can be safely freed. */
+
+static bool tcf_chain_detach(struct tcf_chain *chain)
 {
struct tcf_block *block = chain->block;
 
+   ASSERT_BLOCK_LOCKED(block);
+
list_del(&chain->list);
if (!chain->index)
block->chain0.chain = NULL;
+
+   if (list_empty(&block->chain_list) &&
+   refcount_read(&block->refcnt) == 0)
+   return true;
+
+   return false;
+}
+
+static void tcf_chain_destroy(struct tcf_chain *chain, bool free_block)
+{
+   struct tcf_block *block = chain->block;
+
kfree(chain);
-   if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
+   if (free_block)
kfree_rcu(block, rcu);
@@ -272,11 +294,15 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
 
 static void tcf_chain_hold(struct tcf_chain *chain)
 {
+   ASSERT_BLOCK_LOCKED(chain->block);
+
++chain->refcnt;
 }
 
 static bool tcf_chain_held_by_acts_only(struct tcf_chain *chain)
 {
+   ASSERT_BLOCK_LOCKED(chain->block);
+
/* In case all the references are action references, this
 * chain should not

[PATCH net-next 14/17] net: sched: extend proto ops to support unlocked classifiers

2018-11-11 Thread Vlad Buslov
Add a 'rtnl_held' flag to the tcf proto change, delete, destroy, dump,
and walk functions to track rtnl lock status. This allows classifiers to
release the rtnl lock when necessary and to pass the rtnl lock status to
extensions and driver offload callbacks.

Add a flags field to tcf proto ops. Add a flag value to indicate that
the classifier doesn't require the rtnl lock.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h | 17 +++---
 net/sched/cls_api.c   | 85 ---
 net/sched/cls_basic.c | 12 ---
 net/sched/cls_bpf.c   | 12 ---
 net/sched/cls_cgroup.c| 11 +++---
 net/sched/cls_flow.c  | 13 +---
 net/sched/cls_flower.c| 13 +---
 net/sched/cls_fw.c| 13 +---
 net/sched/cls_matchall.c  | 13 +---
 net/sched/cls_route.c | 12 ---
 net/sched/cls_rsvp.h  | 13 +---
 net/sched/cls_tcindex.c   | 15 +
 net/sched/cls_u32.c   | 12 ---
 net/sched/sch_api.c   |  2 +-
 14 files changed, 145 insertions(+), 98 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 24103b7282bd..7b158d6fae85 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -269,7 +269,7 @@ struct tcf_proto_ops {
const struct tcf_proto *,
struct tcf_result *);
int (*init)(struct tcf_proto*);
-   void(*destroy)(struct tcf_proto *tp,
+   void(*destroy)(struct tcf_proto *tp, bool rtnl_held,
   struct netlink_ext_ack *extack);
 
void*   (*get)(struct tcf_proto*, u32 handle);
@@ -277,12 +277,13 @@ struct tcf_proto_ops {
int (*change)(struct net *net, struct sk_buff *,
struct tcf_proto*, unsigned long,
u32 handle, struct nlattr **,
-   void **, bool,
+   void **, bool, bool,
struct netlink_ext_ack *);
int (*delete)(struct tcf_proto *tp, void *arg,
- bool *last,
+ bool *last, bool rtnl_held,
  struct netlink_ext_ack *);
-   void(*walk)(struct tcf_proto*, struct tcf_walker 
*arg);
+   void(*walk)(struct tcf_proto *tp,
+   struct tcf_walker *arg, bool rtnl_held);
int (*reoffload)(struct tcf_proto *tp, bool add,
 tc_setup_cb_t *cb, void *cb_priv,
 struct netlink_ext_ack *extack);
@@ -295,12 +296,18 @@ struct tcf_proto_ops {
 
/* rtnetlink specific */
int (*dump)(struct net*, struct tcf_proto*, void *,
-   struct sk_buff *skb, struct tcmsg*);
+   struct sk_buff *skb, struct tcmsg*,
+   bool);
int (*tmplt_dump)(struct sk_buff *skb,
  struct net *net,
  void *tmplt_priv);
 
struct module   *owner;
+   int flags;
+};
+
+enum tcf_proto_ops_flags {
+   TCF_PROTO_OPS_DOIT_UNLOCKED = 1,
 };
 
 struct tcf_proto {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 7f65ed84b5e5..1956c5df5f89 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -60,7 +60,8 @@ static const struct tcf_proto_ops 
*__tcf_proto_lookup_ops(const char *kind)
 }
 
 static const struct tcf_proto_ops *
-tcf_proto_lookup_ops(const char *kind, struct netlink_ext_ack *extack)
+tcf_proto_lookup_ops(const char *kind, bool rtnl_held,
+struct netlink_ext_ack *extack)
 {
const struct tcf_proto_ops *ops;
 
@@ -68,9 +69,11 @@ tcf_proto_lookup_ops(const char *kind, struct 
netlink_ext_ack *extack)
if (ops)
return ops;
 #ifdef CONFIG_MODULES
-   rtnl_unlock();
+   if (rtnl_held)
+   rtnl_unlock();
request_module("cls_%s", kind);
-   rtnl_lock();
+   if (rtnl_held)
+   rtnl_lock();
ops = __tcf_proto_lookup_ops(kind);
/* We dropped the RTNL semaphore in order to perform
 * the module load. So, even if we succeeded in loading
@@ -168,7 +171,7 @@ static void tcf_proto_destroy_work(struct work_struct *work)
 
rtnl_lock();
 
-   tp->ops->destroy(tp, NULL);
+   tp->ops->destroy(tp, true, NULL);
module_put(tp->ops->owner);
kfree_rcu(tp, rcu);
tcf

[PATCH net-next 04/17] net: sched: protect block->chain0 with block->lock

2018-11-11 Thread Vlad Buslov
In order to remove dependency on rtnl lock, use block->lock to protect
chain0 struct from concurrent modification. Rearrange code in chain0
callback add and del functions to only access chain0 when block->lock is
held.
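
Note that a plain spinlock around the chain_head_change callbacks is only
safe because patch 01 of this series made the head change path for
ingress/clsact non-sleeping (the sleeping part was deferred to a
workqueue). The add-side pattern, condensed from the diff below:

	spin_lock(&block->lock);
	chain0 = block->chain0.chain;	/* chain0 read only under block->lock */
	if (chain0 && chain0->filter_chain)
		tcf_chain_head_change_item(item, chain0->filter_chain);
	list_add(&item->list, &block->chain0.filter_chain_list);
	spin_unlock(&block->lock);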

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 146a02094905..beffcc2ab1fa 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -260,8 +260,11 @@ static void tcf_chain0_head_change(struct tcf_chain *chain,
 
if (chain->index)
return;
+
+	spin_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list)
 		tcf_chain_head_change_item(item, tp_head);
+	spin_unlock(&block->lock);
 }
 
 /* Returns true if block can be safely freed. */
@@ -519,8 +522,8 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
  struct tcf_block_ext_info *ei,
  struct netlink_ext_ack *extack)
 {
-   struct tcf_chain *chain0 = block->chain0.chain;
struct tcf_filter_chain_list_item *item;
+   struct tcf_chain *chain0;
 
item = kmalloc(sizeof(*item), GFP_KERNEL);
if (!item) {
@@ -529,9 +532,14 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
}
item->chain_head_change = ei->chain_head_change;
item->chain_head_change_priv = ei->chain_head_change_priv;
+
+	spin_lock(&block->lock);
+	chain0 = block->chain0.chain;
 	if (chain0 && chain0->filter_chain)
 		tcf_chain_head_change_item(item, chain0->filter_chain);
 	list_add(&item->list, &block->chain0.filter_chain_list);
+	spin_unlock(&block->lock);
+
return 0;
 }
 
@@ -539,20 +547,23 @@ static void
 tcf_chain0_head_change_cb_del(struct tcf_block *block,
  struct tcf_block_ext_info *ei)
 {
-   struct tcf_chain *chain0 = block->chain0.chain;
struct tcf_filter_chain_list_item *item;
 
+	spin_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list) {
 		if ((!ei->chain_head_change && !ei->chain_head_change_priv) ||
 		    (item->chain_head_change == ei->chain_head_change &&
 		     item->chain_head_change_priv == ei->chain_head_change_priv)) {
-			if (chain0)
+			if (block->chain0.chain)
 				tcf_chain_head_change_item(item, NULL);
 			list_del(&item->list);
+			spin_unlock(&block->lock);
+
 			kfree(item);
 			return;
 		}
 	}
+	spin_unlock(&block->lock);
WARN_ON(1);
 }
 
-- 
2.7.5



[PATCH net-next 06/17] net: sched: protect chain template accesses with block lock

2018-11-11 Thread Vlad Buslov
When cls API is called without protection of rtnl lock, parallel
modification of chain is possible, which means that chain template can be
changed concurrently in certain circumstances. For example, when chain is
'deleted' by new user-space chain API, the chain might continue to be used
if it is referenced by actions, and can be 're-created' by the user. In
such case same chain structure is reused and its template is changed. To
protect from described scenario, cache chain template while holding block
lock. Introduce standalone tc_chain_notify_delete() function that works
with cached template values, instead of chains themselves.
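
The core of the change is a snapshot pattern: copy the template fields
while holding block->lock, then operate on the copies after dropping it,
so a concurrent 're-create' of the chain cannot change them underneath the
notifier. Condensed from the diff below:

	spin_lock(&block->lock);
	tmplt_ops   = chain->tmplt_ops;
	tmplt_priv  = chain->tmplt_priv;
	chain_index = chain->index;
	refcnt = --chain->refcnt;
	spin_unlock(&block->lock);

	/* the cached values stay valid even if the chain struct is reused */
	if (is_last && !by_act)
		tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain_index,
				       block, NULL, 0, 0, false);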

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 73 +
 1 file changed, 57 insertions(+), 16 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bc4666985d4d..bd0dac35e26b 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -381,14 +381,22 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block 
*block, u32 chain_index)
 }
 EXPORT_SYMBOL(tcf_chain_get_by_act);
 
-static void tc_chain_tmplt_del(struct tcf_chain *chain);
+static void tc_chain_tmplt_del(const struct tcf_proto_ops *tmplt_ops,
+  void *tmplt_priv);
+static int tc_chain_notify_delete(const struct tcf_proto_ops *tmplt_ops,
+ void *tmplt_priv, u32 chain_index,
+ struct tcf_block *block, struct sk_buff *oskb,
+ u32 seq, u16 flags, bool unicast);
 
 static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
bool explicitly_created)
 {
struct tcf_block *block = chain->block;
+   const struct tcf_proto_ops *tmplt_ops;
bool is_last, free_block = false;
unsigned int refcnt;
+   void *tmplt_priv;
+   u32 chain_index;
 
	spin_lock(&block->lock);
if (explicitly_created) {
@@ -408,16 +416,21 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool 
by_act,
 */
refcnt = --chain->refcnt;
is_last = refcnt - chain->action_refcnt == 0;
+   tmplt_ops = chain->tmplt_ops;
+   tmplt_priv = chain->tmplt_priv;
+   chain_index = chain->index;
+
if (refcnt == 0)
free_block = tcf_chain_detach(chain);
	spin_unlock(&block->lock);
 
/* The last dropped non-action reference will trigger notification. */
if (is_last && !by_act)
-   tc_chain_notify(chain, NULL, 0, 0, RTM_DELCHAIN, false);
+   tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain_index,
+  block, NULL, 0, 0, false);
 
if (refcnt == 0) {
-   tc_chain_tmplt_del(chain);
+   tc_chain_tmplt_del(tmplt_ops, tmplt_priv);
tcf_chain_destroy(chain, free_block);
}
 }
@@ -1947,8 +1960,10 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
return skb->len;
 }
 
-static int tc_chain_fill_node(struct tcf_chain *chain, struct net *net,
- struct sk_buff *skb, struct tcf_block *block,
+static int tc_chain_fill_node(const struct tcf_proto_ops *tmplt_ops,
+ void *tmplt_priv, u32 chain_index,
+ struct net *net, struct sk_buff *skb,
+ struct tcf_block *block,
  u32 portid, u32 seq, u16 flags, int event)
 {
unsigned char *b = skb_tail_pointer(skb);
@@ -1957,8 +1972,8 @@ static int tc_chain_fill_node(struct tcf_chain *chain, 
struct net *net,
struct tcmsg *tcm;
void *priv;
 
-   ops = chain->tmplt_ops;
-   priv = chain->tmplt_priv;
+   ops = tmplt_ops;
+   priv = tmplt_priv;
 
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
if (!nlh)
@@ -1976,7 +1991,7 @@ static int tc_chain_fill_node(struct tcf_chain *chain, 
struct net *net,
tcm->tcm_block_index = block->index;
}
 
-   if (nla_put_u32(skb, TCA_CHAIN, chain->index))
+   if (nla_put_u32(skb, TCA_CHAIN, chain_index))
goto nla_put_failure;
 
if (ops) {
@@ -2007,7 +2022,8 @@ static int tc_chain_notify(struct tcf_chain *chain, 
struct sk_buff *oskb,
if (!skb)
return -ENOBUFS;
 
-   if (tc_chain_fill_node(chain, net, skb, block, portid,
+   if (tc_chain_fill_node(chain->tmplt_ops, chain->tmplt_priv,
+  chain->index, net, skb, block, portid,
   seq, flags, event) <= 0) {
kfree_skb(skb);
return -EINVAL;
@@ -2019,6 +2035,31 @@ static int tc_chain_notify(struct tcf_chain *chain, 
struct sk_buff *oskb,
return rtnetlink_send(skb, net, portid, RTNLGRP_TC, flags & NLM_F_ECHO);

[PATCH net-next 12/17] net: sched: track rtnl lock status when validating extensions

2018-11-11 Thread Vlad Buslov
Actions API is already updated to not rely on rtnl lock for
synchronization. However, it needs to be provided with the rtnl lock status
when called from the classifiers API in order to be able to correctly
release the lock when loading a kernel module.

Extend extension validation function with 'rtnl_held' flag which is passed
to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls
API. No classifier is currently updated to support unlocked execution, so
pass hardcoded 'true' flag parameter value.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/pkt_cls.h| 2 +-
 net/sched/cls_api.c  | 9 +
 net/sched/cls_basic.c| 2 +-
 net/sched/cls_bpf.c  | 3 ++-
 net/sched/cls_cgroup.c   | 2 +-
 net/sched/cls_flow.c | 2 +-
 net/sched/cls_flower.c   | 3 ++-
 net/sched/cls_fw.c   | 2 +-
 net/sched/cls_matchall.c | 3 ++-
 net/sched/cls_route.c| 2 +-
 net/sched/cls_rsvp.h | 3 ++-
 net/sched/cls_tcindex.c  | 2 +-
 net/sched/cls_u32.c  | 2 +-
 13 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 727cb99da5d4..1f1b06311622 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -381,7 +381,7 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
 
 int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
  struct nlattr **tb, struct nlattr *rate_tlv,
- struct tcf_exts *exts, bool ovr,
+ struct tcf_exts *exts, bool ovr, bool rtnl_held,
  struct netlink_ext_ack *extack);
 void tcf_exts_destroy(struct tcf_exts *exts);
 void tcf_exts_change(struct tcf_exts *dst, struct tcf_exts *src);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d4f763525412..0949502e31b9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2613,7 +2613,7 @@ EXPORT_SYMBOL(tcf_exts_destroy);
 
 int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr 
**tb,
  struct nlattr *rate_tlv, struct tcf_exts *exts, bool ovr,
- struct netlink_ext_ack *extack)
+ bool rtnl_held, struct netlink_ext_ack *extack)
 {
 #ifdef CONFIG_NET_CLS_ACT
{
@@ -2623,7 +2623,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto 
*tp, struct nlattr **tb,
if (exts->police && tb[exts->police]) {
act = tcf_action_init_1(net, tp, tb[exts->police],
rate_tlv, "police", ovr,
-   TCA_ACT_BIND, true, extack);
+   TCA_ACT_BIND, rtnl_held,
+   extack);
if (IS_ERR(act))
return PTR_ERR(act);
 
@@ -2635,8 +2636,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto 
*tp, struct nlattr **tb,
 
err = tcf_action_init(net, tp, tb[exts->action],
  rate_tlv, NULL, ovr, TCA_ACT_BIND,
- exts->actions, _size, true,
- extack);
+ exts->actions, _size,
+ rtnl_held, extack);
if (err < 0)
return err;
exts->nr_actions = err;
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 6a5dce8baf19..90e44888f85d 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -148,7 +148,7 @@ static int basic_set_parms(struct net *net, struct 
tcf_proto *tp,
 {
int err;
 
-   err = tcf_exts_validate(net, tp, tb, est, >exts, ovr, extack);
+   err = tcf_exts_validate(net, tp, tb, est, >exts, ovr, true, extack);
if (err < 0)
return err;
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index fa6fe2fe0f32..9c0ac7c23ad7 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -417,7 +417,8 @@ static int cls_bpf_set_parms(struct net *net, struct 
tcf_proto *tp,
if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf))
return -EINVAL;
 
-   ret = tcf_exts_validate(net, tp, tb, est, >exts, ovr, extack);
+   ret = tcf_exts_validate(net, tp, tb, est, >exts, ovr, true,
+   extack);
if (ret < 0)
return ret;
 
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 3bc01bdde165..663ee1c6d606 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -110,7 +110,7 @@ static int cls_cgroup_change(struct net *net, struct 
sk_buff *in_skb,
goto errout;
 
err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], >exts, ovr,
-   

[PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue

2018-11-11 Thread Vlad Buslov
As a part of the effort to remove dependency on rtnl lock, cls API is being
converted to use fine-grained locking mechanisms instead of global rtnl
lock. However, chain_head_change callback for ingress Qdisc is a sleeping
function and cannot be executed while holding a spinlock.

Extend cls API with new workqueue intended to be used for tcf_proto
lifetime management. Modify tcf_proto_destroy() to deallocate proto
asynchronously on workqueue in order to ensure that all chain_head_change
callbacks involving the proto complete before it is freed. Convert
mini_qdisc_pair_swap(), that is used as a chain_head_change callback for
ingress and clsact Qdiscs, to use a workqueue.
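
The mini-qdisc half of the change is truncated from the diff below; its
shape is roughly the following sketch (details such as the exact atomics
are from memory and may differ): the callback itself no longer sleeps, it
records the requested head and defers the RCU-synchronizing swap to the
work item, which runs on tc_proto_wq:

	void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
				  struct tcf_proto *tp_head)
	{
		xchg(&miniqp->tp_head, tp_head);	/* record requested head */
		tc_queue_proto_work(&miniqp->work);	/* sleeping swap runs later */
	}

The ordering guarantee comes from flush_workqueue() in
unregister_tcf_proto_ops(), which ensures every queued destroy has
finished before a classifier module can unload.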

Signed-off-by: Vlad Buslov 
Suggested-by: Jiri Pirko 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  6 ++
 net/sched/cls_api.c   | 48 +--
 net/sched/sch_generic.c   | 37 +---
 net/sched/sch_ingress.c   |  4 
 4 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4d736427a4cb..29118bbd528f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -319,6 +319,7 @@ struct tcf_proto {
const struct tcf_proto_ops  *ops;
struct tcf_chain*chain;
struct rcu_head rcu;
+   struct work_struct  work;
 };
 
 struct qdisc_skb_cb {
@@ -397,6 +398,8 @@ tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
}
 }
 
+bool tc_queue_proto_work(struct work_struct *work);
+
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 {
struct qdisc_skb_cb *qcb;
@@ -1148,12 +1151,15 @@ struct mini_Qdisc_pair {
struct mini_Qdisc miniq1;
struct mini_Qdisc miniq2;
struct mini_Qdisc __rcu **p_miniq;
+   struct tcf_proto *tp_head;
+   struct work_struct work;
 };
 
 void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
  struct tcf_proto *tp_head);
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
  struct mini_Qdisc __rcu **p_miniq);
+void mini_qdisc_pair_cleanup(struct mini_Qdisc_pair *miniqp);
 
 static inline void skb_tc_reinsert(struct sk_buff *skb, struct tcf_result *res)
 {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f427a1e00e7e..17c1691bf0c0 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -107,12 +107,14 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops)
 EXPORT_SYMBOL(register_tcf_proto_ops);
 
 static struct workqueue_struct *tc_filter_wq;
+static struct workqueue_struct *tc_proto_wq;
 
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops)
 {
struct tcf_proto_ops *t;
int rc = -ENOENT;
 
+   flush_workqueue(tc_proto_wq);
/* Wait for outstanding call_rcu()s, if any, from a
 * tcf_proto_ops's destroy() handler.
 */
@@ -139,6 +141,12 @@ bool tcf_queue_work(struct rcu_work *rwork, work_func_t 
func)
 }
 EXPORT_SYMBOL(tcf_queue_work);
 
+bool tc_queue_proto_work(struct work_struct *work)
+{
+   return queue_work(tc_proto_wq, work);
+}
+EXPORT_SYMBOL(tc_queue_proto_work);
+
 /* Select new prio value from the range, managed by kernel. */
 
 static inline u32 tcf_auto_prio(struct tcf_proto *tp)
@@ -151,6 +159,23 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp)
return TC_H_MAJ(first);
 }
 
+static void tcf_chain_put(struct tcf_chain *chain);
+
+static void tcf_proto_destroy_work(struct work_struct *work)
+{
+   struct tcf_proto *tp = container_of(work, struct tcf_proto, work);
+   struct tcf_chain *chain = tp->chain;
+
+   rtnl_lock();
+
+   tp->ops->destroy(tp, NULL);
+   module_put(tp->ops->owner);
+   kfree_rcu(tp, rcu);
+   tcf_chain_put(chain);
+
+   rtnl_unlock();
+}
+
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
  u32 prio, struct tcf_chain *chain,
  struct netlink_ext_ack *extack)
@@ -171,6 +196,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, 
u32 protocol,
tp->protocol = protocol;
tp->prio = prio;
tp->chain = chain;
+	INIT_WORK(&tp->work, tcf_proto_destroy_work);
 
err = tp->ops->init(tp);
if (err) {
@@ -187,9 +213,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, 
u32 protocol,
 static void tcf_proto_destroy(struct tcf_proto *tp,
  struct netlink_ext_ack *extack)
 {
-   tp->ops->destroy(tp, extack);
-   module_put(tp->ops->owner);
-   kfree_rcu(tp, rcu);
+	tc_queue_proto_work(&tp->work);
 }
 
 struct tcf_filter_chain_list_item {
@@ -361,7 +385,6 @@ static void tcf_chain_flush(struct tcf_chain *chain)
RCU_INIT_POINTER(chain->filter_chain, tp

[PATCH net-next 07/17] net: sched: lock the chain when accessing filter_chain list

2018-11-11 Thread Vlad Buslov
Always lock chain when accessing filter_chain list, instead of relying on
rtnl lock. Dereference filter_chain with tcf_chain_dereference() lockdep
macro to verify that all users of chain_list have the lock taken.

Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
all necessary code while holding chain lock in order to prevent
invalidation of chain_info structure by potential concurrent change.
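
The tcf_chain_dereference() macro gives the new lock the same lockdep
coverage that rtnl_dereference() gives rtnl: with CONFIG_PROVE_LOCKING it
verifies at the call site that filter_chain_lock is actually held.
Typical use, roughly:

	spin_lock(&chain->filter_chain_lock);
	tp = tcf_chain_dereference(chain->filter_chain, chain);
	/* ... insert/remove filters; lockdep complains if the lock
	 * is not held ...
	 */
	spin_unlock(&chain->filter_chain_lock);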

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  17 
 net/sched/cls_api.c   | 103 +-
 2 files changed, 83 insertions(+), 37 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5f4fc28fc77a..82fa23da4969 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -338,6 +338,8 @@ struct qdisc_skb_cb {
 typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv);
 
 struct tcf_chain {
+   /* Protects filter_chain and filter_chain_list. */
+   spinlock_t filter_chain_lock;
struct tcf_proto __rcu *filter_chain;
struct list_head list;
struct tcf_block *block;
@@ -371,6 +373,21 @@ struct tcf_block {
struct rcu_head rcu;
 };
 
+#ifdef CONFIG_PROVE_LOCKING
+static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
+{
+	return lockdep_is_held(&chain->filter_chain_lock);
+}
+#else
+static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
+{
+	return true;
+}
+#endif /* #ifdef CONFIG_PROVE_LOCKING */
+
+#define tcf_chain_dereference(p, chain)					\
+	rcu_dereference_protected(p, lockdep_tcf_chain_is_locked(chain))
+
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 {
if (*flags & TCA_CLS_FLAGS_IN_HW)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bd0dac35e26b..8f5dfa3ffb1c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -237,6 +237,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block 
*block,
if (!chain)
return NULL;
list_add_tail(>list, >chain_list);
+   spin_lock_init(>filter_chain_lock);
chain->block = block;
chain->index = chain_index;
chain->refcnt = 1;
@@ -453,9 +454,13 @@ static void tcf_chain_put_explicitly_created(struct 
tcf_chain *chain)
 
 static void tcf_chain_flush(struct tcf_chain *chain)
 {
-   struct tcf_proto *tp = rtnl_dereference(chain->filter_chain);
+   struct tcf_proto *tp;
 
+	spin_lock(&chain->filter_chain_lock);
+	tp = tcf_chain_dereference(chain->filter_chain, chain);
 	tcf_chain0_head_change(chain, NULL);
+	spin_unlock(&chain->filter_chain_lock);
+
while (tp) {
RCU_INIT_POINTER(chain->filter_chain, tp->next);
tcf_proto_destroy(tp, NULL);
@@ -548,8 +553,15 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 
 	spin_lock(&block->lock);
 	chain0 = block->chain0.chain;
-	if (chain0 && chain0->filter_chain)
-		tcf_chain_head_change_item(item, chain0->filter_chain);
+	if (chain0) {
+		struct tcf_proto *tp_head;
+
+		spin_lock(&chain0->filter_chain_lock);
+		tp_head = tcf_chain_dereference(chain0->filter_chain, chain0);
+		if (tp_head)
+			tcf_chain_head_change_item(item, tp_head);
+		spin_unlock(&chain0->filter_chain_lock);
+	}
 	list_add(&item->list, &block->chain0.filter_chain_list);
 	spin_unlock(&block->lock);
 
@@ -1251,9 +1263,10 @@ struct tcf_chain_info {
struct tcf_proto __rcu *next;
 };
 
-static struct tcf_proto *tcf_chain_tp_prev(struct tcf_chain_info *chain_info)
+static struct tcf_proto *tcf_chain_tp_prev(struct tcf_chain *chain,
+  struct tcf_chain_info *chain_info)
 {
-   return rtnl_dereference(*chain_info->pprev);
+   return tcf_chain_dereference(*chain_info->pprev, chain);
 }
 
 static void tcf_chain_tp_insert(struct tcf_chain *chain,
@@ -1262,7 +1275,7 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 {
if (*chain_info->pprev == chain->filter_chain)
tcf_chain0_head_change(chain, tp);
-   RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain_info));
+   RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
rcu_assign_pointer(*chain_info->pprev, tp);
tcf_chain_hold(chain);
 }
@@ -1271,7 +1284,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
struct tcf_chain_info *chain_info,
struct tcf_proto *tp)
 {
-   struct tcf_proto *next = rtnl_dereference(chain_info->next);
+   struct tcf_proto *next = tcf_chain_dereference(chain_info->next, chain);
 
if (tp == chain->filter_chain)
  

[PATCH net-next 03/17] net: sched: refactor tc_ctl_chain() to use block->lock

2018-11-11 Thread Vlad Buslov
In order to remove dependency on rtnl lock, modify chain API to use
block->lock to protect chain from concurrent modification. Rearrange
tc_ctl_chain() code to call tcf_chain_hold() while holding block->lock.
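
The point of the rearrangement is a lookup-and-hold done under a single
lock acquisition: without it, the chain returned by tcf_chain_lookup()
could be freed by a concurrent deleter before tcf_chain_hold() runs.
Condensed from the diff below:

	spin_lock(&block->lock);
	chain = tcf_chain_lookup(block, chain_index);
	if (chain)
		tcf_chain_hold(chain);	/* refcnt taken before anyone can free it */
	spin_unlock(&block->lock);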

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index df3326dd33ef..146a02094905 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2047,6 +2047,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct 
nlmsghdr *n,
err = -EINVAL;
goto errout_block;
}
+
+	spin_lock(&block->lock);
chain = tcf_chain_lookup(block, chain_index);
if (n->nlmsg_type == RTM_NEWCHAIN) {
if (chain) {
@@ -2058,41 +2060,49 @@ static int tc_ctl_chain(struct sk_buff *skb, struct 
nlmsghdr *n,
} else {
NL_SET_ERR_MSG(extack, "Filter chain already 
exists");
err = -EEXIST;
-   goto errout_block;
+   goto errout_block_locked;
}
} else {
if (!(n->nlmsg_flags & NLM_F_CREATE)) {
NL_SET_ERR_MSG(extack, "Need both RTM_NEWCHAIN 
and NLM_F_CREATE to create a new chain");
err = -ENOENT;
-   goto errout_block;
+   goto errout_block_locked;
}
chain = tcf_chain_create(block, chain_index);
if (!chain) {
NL_SET_ERR_MSG(extack, "Failed to create filter 
chain");
err = -ENOMEM;
-   goto errout_block;
+   goto errout_block_locked;
}
}
} else {
if (!chain || tcf_chain_held_by_acts_only(chain)) {
NL_SET_ERR_MSG(extack, "Cannot find specified filter 
chain");
err = -EINVAL;
-   goto errout_block;
+   goto errout_block_locked;
}
tcf_chain_hold(chain);
}
 
+   if (n->nlmsg_type == RTM_NEWCHAIN) {
+   /* Modifying chain requires holding parent block lock. In case
+* the chain was successfully added, take a reference to the
+* chain. This ensures that an empty chain does not disappear at
+* the end of this function.
+*/
+   tcf_chain_hold(chain);
+   chain->explicitly_created = true;
+   }
+	spin_unlock(&block->lock);
+
switch (n->nlmsg_type) {
case RTM_NEWCHAIN:
err = tc_chain_tmplt_add(chain, net, tca, extack);
-   if (err)
+   if (err) {
+   tcf_chain_put_explicitly_created(chain);
goto errout;
-   /* In case the chain was successfully added, take a reference
-* to the chain. This ensures that an empty chain
-* does not disappear at the end of this function.
-*/
-   tcf_chain_hold(chain);
-   chain->explicitly_created = true;
+   }
+
tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL,
RTM_NEWCHAIN, false);
break;
@@ -2127,6 +2137,10 @@ static int tc_ctl_chain(struct sk_buff *skb, struct 
nlmsghdr *n,
/* Replay the request. */
goto replay;
return err;
+
+errout_block_locked:
+	spin_unlock(&block->lock);
+   goto errout_block;
 }
 
 /* called with RTNL */
-- 
2.7.5



[PATCH iproute2-next] libnetlink: fix use-after-free of message buf

2018-10-08 Thread Vlad Buslov
In the __rtnl_talk_iov() main loop, err is a pointer into the dynamically
allocated 'buf' that is used to store netlink messages. If the netlink
message is an error message, buf is deallocated before returning with an
error code. However, on return err->error is checked one more time to
generate the return value, after the memory err points to has already been
freed. Save the error code in a temporary variable and use that variable to
generate the return value.
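
The bug class is generic enough to illustrate in isolation (a minimal
self-contained sketch, not libnetlink code; 'struct msg' and 'handle' are
invented names):

	#include <stdlib.h>

	struct msg { int error; };

	static int handle(char *buf)
	{
		struct msg *err = (struct msg *)buf;
		int error = err->error;	/* copy out while buf is still valid */

		free(buf);
		return error ? -1 : 0;	/* reading err->error here would be a
					 * use-after-free, as in the old code */
	}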

Signed-off-by: Vlad Buslov 
---
 lib/libnetlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index f8b8fbfd0010..bc8338052e17 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -802,6 +802,7 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct 
iovec *iov,
 
if (h->nlmsg_type == NLMSG_ERROR) {
 			struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(h);
+   int error = err->error;
 
if (l < sizeof(struct nlmsgerr)) {
fprintf(stderr, "ERROR truncated\n");
@@ -825,7 +826,7 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct 
iovec *iov,
else
free(buf);
 
-   return err->error ? -i : 0;
+   return error ? -i : 0;
}
 
if (answer) {
-- 
2.7.5



[PATCH iproute2-next v2] tc: flower: expose hardware offload count

2018-10-03 Thread Vlad Buslov
Recently flower classifier was updated to expose count of devices that
filter is offloaded to. Add support to print this counter as 'in_hw_count'.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
Changes from V1 to V2:
- Change print format string to "%u"

 tc/f_flower.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 59e5f572c542..ab7ea3e32f69 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1585,8 +1585,16 @@ static int flower_print_opt(struct filter_util *qu, FILE 
*f,
if (flags & TCA_CLS_FLAGS_SKIP_SW)
print_bool(PRINT_ANY, "skip_sw", "\n  skip_sw", true);
 
-   if (flags & TCA_CLS_FLAGS_IN_HW)
+   if (flags & TCA_CLS_FLAGS_IN_HW) {
print_bool(PRINT_ANY, "in_hw", "\n  in_hw", true);
+
+   if (tb[TCA_FLOWER_IN_HW_COUNT]) {
+			__u32 count = rta_getattr_u32(tb[TCA_FLOWER_IN_HW_COUNT]);
+
+   print_uint(PRINT_ANY, "in_hw_count",
+  " in_hw_count %u", count);
+   }
+   }
else if (flags & TCA_CLS_FLAGS_NOT_IN_HW)
print_bool(PRINT_ANY, "not_in_hw", "\n  not_in_hw", 
true);
}
-- 
2.7.5



Re: [PATCH iproute2-next] tc: flower: expose hardware offload count

2018-10-03 Thread Vlad Buslov


On Wed 03 Oct 2018 at 16:08, Davide Caratti  wrote:
> On Wed, 2018-10-03 at 18:29 +0300, Vlad Buslov wrote:
>> Recently flower classifier was updated to expose count of devices that
>> filter is offloaded to. Add support to print this counter as 'in_hw_count'.
>> 
>> Signed-off-by: Vlad Buslov 
>> Acked-by: Jiri Pirko 
>> ---
>>  tc/f_flower.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>> 
>> diff --git a/tc/f_flower.c b/tc/f_flower.c
>> index 59e5f572c542..cbacc664d397 100644
>
> hello Vlad!
>
>> --- a/tc/f_flower.c
>> +++ b/tc/f_flower.c
>> @@ -1585,8 +1585,16 @@ static int flower_print_opt(struct filter_util *qu, 
>> FILE *f,
>>  if (flags & TCA_CLS_FLAGS_SKIP_SW)
>>  print_bool(PRINT_ANY, "skip_sw", "\n  skip_sw", true);
>>  
>> -if (flags & TCA_CLS_FLAGS_IN_HW)
>> +if (flags & TCA_CLS_FLAGS_IN_HW) {
>>  print_bool(PRINT_ANY, "in_hw", "\n  in_hw", true);
>> +
>> +if (tb[TCA_FLOWER_IN_HW_COUNT]) {
>> +__u32 count = 
>> rta_getattr_u32(tb[TCA_FLOWER_IN_HW_COUNT]);
>> +
>> +print_uint(PRINT_ANY, "in_hw_count",
>> +   " in_hw_count %d", count);
> ^^ maybe using %u in the format is better?
>
> thanks!

Hello Davide!

Sure. I'll send V2 with "%u".

Thanks,
Vlad


[PATCH iproute2-next] tc: flower: expose hardware offload count

2018-10-03 Thread Vlad Buslov
Recently flower classifier was updated to expose count of devices that
filter is offloaded to. Add support to print this counter as 'in_hw_count'.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 tc/f_flower.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 59e5f572c542..cbacc664d397 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1585,8 +1585,16 @@ static int flower_print_opt(struct filter_util *qu, FILE 
*f,
if (flags & TCA_CLS_FLAGS_SKIP_SW)
print_bool(PRINT_ANY, "skip_sw", "\n  skip_sw", true);
 
-   if (flags & TCA_CLS_FLAGS_IN_HW)
+   if (flags & TCA_CLS_FLAGS_IN_HW) {
print_bool(PRINT_ANY, "in_hw", "\n  in_hw", true);
+
+   if (tb[TCA_FLOWER_IN_HW_COUNT]) {
+			__u32 count = rta_getattr_u32(tb[TCA_FLOWER_IN_HW_COUNT]);
+
+   print_uint(PRINT_ANY, "in_hw_count",
+  " in_hw_count %d", count);
+   }
+   }
else if (flags & TCA_CLS_FLAGS_NOT_IN_HW)
print_bool(PRINT_ANY, "not_in_hw", "\n  not_in_hw", 
true);
}
-- 
2.7.5



Re: [Patch net-next] net_sched: fix an extack message in tcf_block_find()

2018-09-30 Thread Vlad Buslov


On Fri 28 Sep 2018 at 17:03, Cong Wang  wrote:
> On Fri, Sep 28, 2018 at 4:36 AM Vlad Buslov  wrote:
>>
>> On Thu 27 Sep 2018 at 20:42, Cong Wang  wrote:
>> > It is clearly a copy-n-paste.
>> >
>> > Signed-off-by: Cong Wang 
>> > ---
>> >  net/sched/cls_api.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> > index 3de47e99b788..8dd7f8af6d54 100644
>> > --- a/net/sched/cls_api.c
>> > +++ b/net/sched/cls_api.c
>> > @@ -655,7 +655,7 @@ static struct tcf_block *tcf_block_find(struct net 
>> > *net, struct Qdisc **q,
>> >
>> >   *q = qdisc_refcount_inc_nz(*q);
>> >   if (!*q) {
>> > - NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't 
>> > exists");
>> > + NL_SET_ERR_MSG(extack, "Can't increase Qdisc 
>> > refcount");
>> >   err = -EINVAL;
>> >   goto errout_rcu;
>> >   }
>>
>> Is there a benefit in exposing this info to user?
>
> Depends on what user you mean here. For kernel developers, yes,
> this is useful. For others, no.
>
>
>> For all intents and purposes Qdisc is gone at that point.
>
> I don't want to be a language lawyer, but there is a difference between
> "it doesn't exist" and "it exists but being removed". The errno EINVAL
> betrays what you said too, it must be ENOENT to mach "Qdisc is gone".
>
> I don't want to waste my time on this any more. Let's just drop it.
>
> I really don't care, do you?

I'm asked the question in order to improve error messages in my future
patches, not because I care about this particular string.


Re: [Patch net-next] net_sched: fix an extack message in tcf_block_find()

2018-09-28 Thread Vlad Buslov
On Thu 27 Sep 2018 at 20:42, Cong Wang  wrote:
> It is clearly a copy-n-paste.
>
> Signed-off-by: Cong Wang 
> ---
>  net/sched/cls_api.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 3de47e99b788..8dd7f8af6d54 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -655,7 +655,7 @@ static struct tcf_block *tcf_block_find(struct net *net, 
> struct Qdisc **q,
>  
>   *q = qdisc_refcount_inc_nz(*q);
>   if (!*q) {
> - NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
> + NL_SET_ERR_MSG(extack, "Can't increase Qdisc refcount");
>   err = -EINVAL;
>   goto errout_rcu;
>   }

Is there a benefit in exposing this info to user?
For all intents and purposes Qdisc is gone at that point.


[PATCH net-next v3 07/10] net: sched: implement functions to put and flush all chains

2018-09-24 Thread Vlad Buslov
Extract the code that flushes and puts all chains on a tcf block into two
standalone functions, to be shared with the functions that locklessly
get/put a reference to the block.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 55 +
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 90843b6a8fa9..cb7422af5e51 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -527,6 +527,31 @@ static struct tcf_block *tcf_block_lookup(struct net *net, 
u32 block_index)
	return idr_find(&tn->idr, block_index);
 }
 
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain;
+
+   /* Hold a refcnt for all chains, so that they don't disappear
+* while we are iterating.
+*/
+	list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_hold(chain);
+
+	list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_flush(chain);
+}
+
+static void tcf_block_put_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain, *tmp;
+
+   /* At this point, all the chains should have refcnt >= 1. */
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+   tcf_chain_put_explicitly_created(chain);
+   tcf_chain_put(chain);
+   }
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -786,8 +811,6 @@ EXPORT_SYMBOL(tcf_block_get);
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
   struct tcf_block_ext_info *ei)
 {
-   struct tcf_chain *chain, *tmp;
-
if (!block)
return;
tcf_chain0_head_change_cb_del(block, ei);
@@ -804,32 +827,14 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
 
if (tcf_block_shared(block))
tcf_block_remove(block, block->net);
-
-   if (!free_block) {
-   /* Hold a refcnt for all chains, so that they don't
-* disappear while we are iterating.
-*/
-		list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_hold(chain);
-
-		list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
-   }
-
+   if (!free_block)
+   tcf_block_flush_all_chains(block);
tcf_block_offload_unbind(block, q, ei);
 
-   if (free_block) {
+   if (free_block)
kfree(block);
-   } else {
-   /* At this point, all the chains should have
-* refcnt >= 1.
-*/
-		list_for_each_entry_safe(chain, tmp, &block->chain_list,
-					 list) {
-   tcf_chain_put_explicitly_created(chain);
-   tcf_chain_put(chain);
-   }
-   }
+   else
+   tcf_block_put_all_chains(block);
} else {
tcf_block_offload_unbind(block, q, ei);
}
-- 
2.7.5



[PATCH net-next v3 08/10] net: sched: protect block idr with spinlock

2018-09-24 Thread Vlad Buslov
Protect block idr access with spinlock, instead of relying on rtnl lock.
Take tn->idr_lock spinlock during block insertion and removal.
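
The less obvious part of the diff is the idr_preload()/GFP_NOWAIT pair:
idr_alloc_u32() may need to allocate internal nodes, and a GFP_KERNEL
allocation can sleep, which is illegal under a spinlock. Preloading with
GFP_KERNEL before taking the lock and allocating with GFP_NOWAIT inside it
keeps the allocation reliable without sleeping:

	idr_preload(GFP_KERNEL);	/* may sleep: done before taking the lock */
	spin_lock(&tn->idr_lock);
	err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
			    GFP_NOWAIT);	/* must not sleep under the lock */
	spin_unlock(&tn->idr_lock);
	idr_preload_end();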

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cb7422af5e51..5a21888d0ee9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -473,6 +473,7 @@ tcf_chain0_head_change_cb_del(struct tcf_block *block,
 }
 
 struct tcf_net {
+   spinlock_t idr_lock; /* Protects idr */
struct idr idr;
 };
 
@@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, 
struct net *net,
struct netlink_ext_ack *extack)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
+   int err;
+
+   idr_preload(GFP_KERNEL);
+	spin_lock(&tn->idr_lock);
+	err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
+			    GFP_NOWAIT);
+	spin_unlock(&tn->idr_lock);
+   idr_preload_end();
 
-	return idr_alloc_u32(&tn->idr, block, &block->index, block->index,
-			     GFP_KERNEL);
+   return err;
 }
 
 static void tcf_block_remove(struct tcf_block *block, struct net *net)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+	spin_lock(&tn->idr_lock);
 	idr_remove(&tn->idr, block->index);
+	spin_unlock(&tn->idr_lock);
 }
 
 static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
@@ -2278,6 +2288,7 @@ static __net_init int tcf_net_init(struct net *net)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+	spin_lock_init(&tn->idr_lock);
 	idr_init(&tn->idr);
return 0;
 }
-- 
2.7.5



[PATCH net-next v3 10/10] net: sched: use reference counting for tcf blocks on rules update

2018-09-24 Thread Vlad Buslov
In order to remove dependency on rtnl lock on rules update path, always
take reference to block while using it on rules update path. Change
tcf_block_get() error handling to properly release block with reference
counting, instead of just destroying it, in order to accommodate potential
concurrent users.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 49e3c10532ad..3de47e99b788 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -622,7 +622,7 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
int err = 0;
 
if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, block_index);
+   block = tcf_block_refcnt_get(net, block_index);
if (!block) {
NL_SET_ERR_MSG(extack, "Block of given index was not 
found");
return ERR_PTR(-EINVAL);
@@ -702,6 +702,14 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
err = -EOPNOTSUPP;
goto errout_qdisc;
}
+
+   /* Always take reference to block in order to support execution
+* of rules update path of cls API without rtnl lock. Caller
+* must release block when it is finished using it. 'if' block
+* of this conditional obtain reference to block by calling
+* tcf_block_refcnt_get().
+*/
+		refcount_inc(&block->refcnt);
}
 
return block;
@@ -716,6 +724,9 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
 
 static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
 {
+   if (!IS_ERR_OR_NULL(block))
+   tcf_block_refcnt_put(block);
+
if (q)
qdisc_put(q);
 }
@@ -785,21 +796,16 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
 {
struct net *net = qdisc_net(q);
struct tcf_block *block = NULL;
-   bool created = false;
int err;
 
-   if (ei->block_index) {
+   if (ei->block_index)
/* block_index not 0 means the shared block is requested */
-   block = tcf_block_lookup(net, ei->block_index);
-   if (block)
-			refcount_inc(&block->refcnt);
-   }
+   block = tcf_block_refcnt_get(net, ei->block_index);
 
if (!block) {
block = tcf_block_create(net, q, ei->block_index, extack);
if (IS_ERR(block))
return PTR_ERR(block);
-   created = true;
if (tcf_block_shared(block)) {
err = tcf_block_insert(block, net, extack);
if (err)
@@ -829,14 +835,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
 err_chain0_head_change_cb_add:
tcf_block_owner_del(block, q, ei->binder_type);
 err_block_owner_add:
-   if (created) {
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, net);
 err_block_insert:
-   kfree_rcu(block, rcu);
-   } else {
-		refcount_dec(&block->refcnt);
-   }
+   tcf_block_refcnt_put(block);
return err;
 }
 EXPORT_SYMBOL(tcf_block_get_ext);
@@ -1730,7 +1730,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
return err;
 
if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, tcm->tcm_block_index);
+   block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
if (!block)
goto out;
/* If we work with block index, q is NULL and parent value
@@ -1789,6 +1789,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
}
}
 
+   if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+   tcf_block_refcnt_put(block);
cb->args[0] = index;
 
 out:
@@ -2055,7 +2057,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct 
netlink_callback *cb)
return err;
 
if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, tcm->tcm_block_index);
+   block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
if (!block)
goto out;
/* If we work with block index, q is NULL and parent value
@@ -2122,6 +2124,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct 
netlink_callback *cb)
index++;
}
 
+   if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+   

[PATCH net-next v3 09/10] net: sched: implement tcf_block_refcnt_{get|put}()

2018-09-24 Thread Vlad Buslov
Implement get/put functions for blocks that only take/release the reference
and perform deallocation. These functions are intended to be used by the
unlocked rules update path to always hold a reference to the block while
working with it. They rely on the new fine-grained locking mechanisms
introduced in previous patches in this set, instead of relying on the
global protection provided by rtnl lock.

Extract code that is common with tcf_block_detach_ext() into common
function __tcf_block_put().

Extend tcf_block with rcu to allow safe deallocation when it is accessed
concurrently.
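
From the point of view of a future unlocked caller, the usage is the usual
lookup-then-pin idiom (a hypothetical caller, names invented for
illustration):

	static int update_rule_sketch(struct net *net, u32 block_index)
	{
		struct tcf_block *block;

		block = tcf_block_refcnt_get(net, block_index);
		if (!block)
			return -ENOENT;	/* not found, or refcnt already hit zero */

		/* ... modify filters; block cannot be freed underneath us ... */

		tcf_block_refcnt_put(block);
		return 0;
	}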

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c   | 74 ---
 2 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4a86f4d33f07..7a6b71ee5433 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -357,6 +357,7 @@ struct tcf_block {
struct tcf_chain *chain;
struct list_head filter_chain_list;
} chain0;
+   struct rcu_head rcu;
 };
 
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 5a21888d0ee9..49e3c10532ad 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -241,7 +241,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
block->chain0.chain = NULL;
kfree(chain);
	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
-   kfree(block);
+   kfree_rcu(block, rcu);
 }
 
 static void tcf_chain_hold(struct tcf_chain *chain)
@@ -537,6 +537,19 @@ static struct tcf_block *tcf_block_lookup(struct net *net, 
u32 block_index)
return idr_find(>idr, block_index);
 }
 
+static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index)
+{
+   struct tcf_block *block;
+
+   rcu_read_lock();
+   block = tcf_block_lookup(net, block_index);
+	if (block && !refcount_inc_not_zero(&block->refcnt))
+   block = NULL;
+   rcu_read_unlock();
+
+   return block;
+}
+
 static void tcf_block_flush_all_chains(struct tcf_block *block)
 {
struct tcf_chain *chain;
@@ -562,6 +575,40 @@ static void tcf_block_put_all_chains(struct tcf_block 
*block)
}
 }
 
+static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
+   struct tcf_block_ext_info *ei)
+{
+	if (refcount_dec_and_test(&block->refcnt)) {
+   /* Flushing/putting all chains will cause the block to be
+* deallocated when last chain is freed. However, if chain_list
+* is empty, block has to be manually deallocated. After block
+* reference counter reached 0, it is no longer possible to
+* increment it or add new chains to block.
+*/
+		bool free_block = list_empty(&block->chain_list);
+
+   if (tcf_block_shared(block))
+   tcf_block_remove(block, block->net);
+   if (!free_block)
+   tcf_block_flush_all_chains(block);
+
+   if (q)
+   tcf_block_offload_unbind(block, q, ei);
+
+   if (free_block)
+   kfree_rcu(block, rcu);
+   else
+   tcf_block_put_all_chains(block);
+   } else if (q) {
+   tcf_block_offload_unbind(block, q, ei);
+   }
+}
+
+static void tcf_block_refcnt_put(struct tcf_block *block)
+{
+   __tcf_block_put(block, NULL, NULL);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -786,7 +833,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
if (tcf_block_shared(block))
tcf_block_remove(block, net);
 err_block_insert:
-   kfree(block);
+   kfree_rcu(block, rcu);
} else {
-		refcount_dec(&block->refcnt);
}
@@ -826,28 +873,7 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_chain0_head_change_cb_del(block, ei);
tcf_block_owner_del(block, q, ei->binder_type);
 
-	if (refcount_dec_and_test(&block->refcnt)) {
-   /* Flushing/putting all chains will cause the block to be
-* deallocated when last chain is freed. However, if chain_list
-* is empty, block has to be manually deallocated. After block
-* reference counter reached 0, it is no longer possible to
-* increment it or add new chains to block.
-*/
-		bool free_block = list_empty(&block->chain_list);
-
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, block->net);
-   if (!free_block)
-

[PATCH net-next v3 03/10] net: sched: extend Qdisc with rcu

2018-09-24 Thread Vlad Buslov
Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if Qdisc reference counter reached zero and is
intended to be used as optimization.
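
The optimization hinges on refcount_dec_and_rtnl_lock(), which decrements
the counter and takes rtnl only when the counter actually drops to zero,
so putting a still-referenced Qdisc never touches the mutex at all.
Condensed from the diff below, with the two paths annotated:

	void qdisc_put_unlocked(struct Qdisc *qdisc)
	{
		if (qdisc->flags & TCQ_F_BUILTIN ||
		    !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
			return;		/* fast path: rtnl never taken */

		qdisc_destroy(qdisc);	/* slow path: runs with rtnl held */
		rtnl_unlock();
	}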

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/linux/rtnetlink.h |  5 +
 include/net/pkt_sched.h   |  1 +
 include/net/sch_generic.h |  2 ++
 net/sched/sch_api.c   | 18 ++
 net/sched/sch_generic.c   | 25 -
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 9cdd76348d9a..bb9cb84114c1 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -85,6 +85,11 @@ static inline struct netdev_queue *dev_ingress_queue(struct 
net_device *dev)
return rtnl_dereference(dev->ingress_queue);
 }
 
+static inline struct netdev_queue *dev_ingress_queue_rcu(struct net_device *dev)
+{
+   return rcu_dereference(dev->ingress_queue);
+}
+
 struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
 
 #ifdef CONFIG_NET_INGRESS
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 7dc769e5452b..a16fbe9a2a67 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -102,6 +102,7 @@ int qdisc_set_default(const char *id);
 void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
struct nlattr *tab,
struct netlink_ext_ack *extack);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fadb1a4d4ee8..091b40c198ff 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -105,6 +105,7 @@ struct Qdisc {
 
	spinlock_t		busylock ____cacheline_aligned_in_smp;
spinlock_t  seqlock;
+   struct rcu_head rcu;
 };
 
 static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
@@ -555,6 +556,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue 
*dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
+void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2096138c4bf6..22e9799e5b69 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -314,6 +314,24 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 
handle)
return q;
 }
 
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
+{
+   struct netdev_queue *nq;
+   struct Qdisc *q;
+
+   if (!handle)
+   return NULL;
+   q = qdisc_match_from_root(dev->qdisc, handle);
+   if (q)
+   goto out;
+
+   nq = dev_ingress_queue_rcu(dev);
+   if (nq)
+   q = qdisc_match_from_root(nq->qdisc_sleeping, handle);
+out:
+   return q;
+}
+
 static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 {
unsigned long cl;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 3e7696f3e053..531fac1d2875 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -941,6 +941,13 @@ void qdisc_free(struct Qdisc *qdisc)
kfree((char *) qdisc - qdisc->padded);
 }
 
+void qdisc_free_cb(struct rcu_head *head)
+{
+   struct Qdisc *q = container_of(head, struct Qdisc, rcu);
+
+   qdisc_free(q);
+}
+
 static void qdisc_destroy(struct Qdisc *qdisc)
 {
const struct Qdisc_ops  *ops = qdisc->ops;
@@ -970,7 +977,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
kfree_skb_list(skb);
}
 
-   qdisc_free(qdisc);
+	call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
 void qdisc_put(struct Qdisc *qdisc)
@@ -983,6 +990,22 @@ void qdisc_put(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_put);
 
+/* Version of qdisc_put() that is called with rtnl mutex unlocked.
+ * Intended to be used as optimization, this function only takes rtnl lock if
+ * qdisc reference counter reached zero.
+ */
+
+void qdisc_put_unlocked(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN ||
+	    !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
+   return;
+
+   qdisc_destroy(qdisc);
+   rtnl_unlock();
+}
+EXPORT_SYMBOL(qdisc_put_unlocked);
+
 /* Attach toplevel qdisc to

[PATCH net-next v3 06/10] net: sched: change tcf block reference counter type to refcount_t

2018-09-24 Thread Vlad Buslov
As a preparation for removing rtnl lock dependency from rules update path,
change tcf block reference counter type to refcount_t to allow modification
by concurrent users.

In block put function perform decrement and check reference counter once to
accommodate concurrent modification by unlocked users. After this change
tcf_chain_put at the end of block put function is called with
block->refcnt==0 and will deallocate block after the last chain is
released, so there is no need to manually deallocate block in this case.
However, if block reference counter reached 0 and there are no chains to
release, block must still be deallocated manually.
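
The reason check-then-decrement had to become a single
refcount_dec_and_test() is worth spelling out: with concurrent putters the
test and the decrement race, e.g. two CPUs can both read refcnt == 2, both
skip the teardown branch, and the block leaks. Roughly (teardown() stands
in for the real cleanup):

	/* old, rtnl-serialized shape -- broken once callers run concurrently */
	if (block->refcnt == 1)
		teardown(block);
	block->refcnt--;

	/* atomic shape: exactly one caller observes the 1 -> 0 transition */
	if (refcount_dec_and_test(&block->refcnt))
		teardown(block);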

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  2 +-
 net/sched/cls_api.c   | 59 ---
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 43b17f82d8ee..4a86f4d33f07 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -345,7 +345,7 @@ struct tcf_chain {
 struct tcf_block {
struct list_head chain_list;
u32 index; /* block index for shared blocks */
-   unsigned int refcnt;
+   refcount_t refcnt;
struct net *net;
struct Qdisc *q;
struct list_head cb_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c33636e7b431..90843b6a8fa9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -240,7 +240,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
if (!chain->index)
block->chain0.chain = NULL;
kfree(chain);
-	if (list_empty(&block->chain_list) && block->refcnt == 0)
+	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
kfree(block);
 }
 
@@ -510,7 +510,7 @@ static struct tcf_block *tcf_block_create(struct net *net, 
struct Qdisc *q,
 	INIT_LIST_HEAD(&block->owner_list);
 	INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
-   block->refcnt = 1;
+	refcount_set(&block->refcnt, 1);
block->net = net;
block->index = block_index;
 
@@ -710,7 +710,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
/* block_index not 0 means the shared block is requested */
block = tcf_block_lookup(net, ei->block_index);
if (block)
-   block->refcnt++;
+			refcount_inc(&block->refcnt);
}
 
if (!block) {
@@ -753,7 +753,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
 err_block_insert:
kfree(block);
} else {
-   block->refcnt--;
+		refcount_dec(&block->refcnt);
}
return err;
 }
@@ -793,34 +793,45 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_chain0_head_change_cb_del(block, ei);
tcf_block_owner_del(block, q, ei->binder_type);
 
-   if (block->refcnt == 1) {
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, block->net);
-
-   /* Hold a refcnt for all chains, so that they don't disappear
-* while we are iterating.
+	if (refcount_dec_and_test(&block->refcnt)) {
+   /* Flushing/putting all chains will cause the block to be
+* deallocated when last chain is freed. However, if chain_list
+* is empty, block has to be manually deallocated. After block
+* reference counter reached 0, it is no longer possible to
+* increment it or add new chains to block.
 */
-	list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_hold(chain);
+		bool free_block = list_empty(&block->chain_list);
 
-	list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
-   }
+   if (tcf_block_shared(block))
+   tcf_block_remove(block, block->net);
 
-   tcf_block_offload_unbind(block, q, ei);
+   if (!free_block) {
+   /* Hold a refcnt for all chains, so that they don't
+* disappear while we are iterating.
+*/
+			list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_hold(chain);
 
-   if (block->refcnt == 1) {
-   /* At this point, all the chains should have refcnt >= 1. */
-		list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-   tcf_chain_put_explicitly_created(chain);
-   tcf_chain_put(chain);
+			list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_flush(chain);
}
 
-   block-

[PATCH net-next v3 02/10] net: sched: rename qdisc_destroy() to qdisc_put()

2018-09-24 Thread Vlad Buslov
Current implementation of qdisc_destroy() decrements Qdisc reference
counter and only actually destroy Qdisc if reference counter value reached
zero. Rename qdisc_destroy() to qdisc_put() in order for it to better
describe the way in which this function is currently implemented and used.

Extract code that deallocates Qdisc into new private qdisc_destroy()
function. It is intended to be shared between regular qdisc_put() and its
unlocked version that is introduced in next patch in this series.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  2 +-
 net/sched/sch_api.c   |  6 +++---
 net/sched/sch_atm.c   |  2 +-
 net/sched/sch_cbq.c   |  2 +-
 net/sched/sch_cbs.c   |  2 +-
 net/sched/sch_drr.c   |  4 ++--
 net/sched/sch_dsmark.c|  2 +-
 net/sched/sch_fifo.c  |  2 +-
 net/sched/sch_generic.c   | 23 ++-
 net/sched/sch_hfsc.c  |  2 +-
 net/sched/sch_htb.c   |  4 ++--
 net/sched/sch_mq.c|  4 ++--
 net/sched/sch_mqprio.c|  4 ++--
 net/sched/sch_multiq.c|  6 +++---
 net/sched/sch_netem.c |  2 +-
 net/sched/sch_prio.c  |  6 +++---
 net/sched/sch_qfq.c   |  4 ++--
 net/sched/sch_red.c   |  4 ++--
 net/sched/sch_sfb.c   |  4 ++--
 net/sched/sch_tbf.c   |  4 ++--
 20 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d326fd553b58..fadb1a4d4ee8 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -554,7 +554,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
-void qdisc_destroy(struct Qdisc *qdisc);
+void qdisc_put(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 411c40344b77..2096138c4bf6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -920,7 +920,7 @@ static void notify_and_destroy(struct net *net, struct 
sk_buff *skb,
qdisc_notify(net, skb, n, clid, old, new);
 
if (old)
-   qdisc_destroy(old);
+   qdisc_put(old);
 }
 
 /* Graft qdisc "new" to class "classid" of qdisc "parent" or
@@ -973,7 +973,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc 
*parent,
qdisc_refcount_inc(new);
 
if (!ingress)
-   qdisc_destroy(old);
+   qdisc_put(old);
}
 
 skip:
@@ -1561,7 +1561,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct 
nlmsghdr *n,
err = qdisc_graft(dev, p, skb, n, clid, q, NULL, extack);
if (err) {
if (q)
-   qdisc_destroy(q);
+   qdisc_put(q);
return err;
}
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index cd49afca9617..d714d3747bcb 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -150,7 +150,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
pr_debug("atm_tc_put: destroying\n");
list_del_init(>list);
pr_debug("atm_tc_put: qdisc %p\n", flow->q);
-   qdisc_destroy(flow->q);
+   qdisc_put(flow->q);
tcf_block_put(flow->block);
if (flow->sock) {
pr_debug("atm_tc_put: f_count %ld\n",
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f42025d53cfe..4dc05409e3fb 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1418,7 +1418,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct 
cbq_class *cl)
WARN_ON(cl->filters);
 
tcf_block_put(cl->block);
-   qdisc_destroy(cl->q);
+   qdisc_put(cl->q);
qdisc_put_rtab(cl->R_tab);
	gen_kill_estimator(&cl->rate_est);
if (cl != >link)
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index e26a24017faa..e689e11b6d0f 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -379,7 +379,7 @@ static void cbs_destroy(struct Qdisc *sch)
cbs_disable_offload(dev, q);
 
if (q->qdisc)
-   qdisc_destroy(q->qdisc);
+   qdisc_put(q->qdisc);
 }
 
 static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index e0b0cf8a9939..cdebaed0f8cf 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -134,7 +134,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, 
u32 parentid,
tca[TCA_RATE]);
if (err) {
NL_SET_ERR_MSG(extack, "Fail

[PATCH net-next v3 04/10] net: sched: add helper function to take reference to Qdisc

2018-09-24 Thread Vlad Buslov
Implement function to take reference to Qdisc that relies on rcu read lock
instead of rtnl mutex. Function only takes reference to Qdisc if reference
counter isn't zero. Intended to be used by unlocked cls API.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 091b40c198ff..43b17f82d8ee 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -115,6 +115,19 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
	refcount_inc(&qdisc->refcnt);
 }
 
+/* Intended to be used by unlocked users, when concurrent qdisc release is
+ * possible.
+ */
+
+static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN)
+   return qdisc;
+	if (refcount_inc_not_zero(&qdisc->refcnt))
+   return qdisc;
+   return NULL;
+}
+
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
if (qdisc->flags & TCQ_F_NOLOCK)
-- 
2.7.5
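
For reference, the caller-side pattern this helper enables looks roughly
like the sketch below (illustrative only; it assumes qdisc_lookup_rcu()
from a later patch in this series, and elides error handling):

        rcu_read_lock();
        q = qdisc_lookup_rcu(dev, handle);
        if (q)
                q = qdisc_refcount_inc_nz(q); /* NULL if refcnt hit zero */
        rcu_read_unlock();

        if (q) {
                /* q is now pinned and safe to use outside the rcu section */
                qdisc_put(q);
        }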



[PATCH net-next v3 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock

2018-09-24 Thread Vlad Buslov
As a preparation for removing rtnl lock dependency from rules update path,
use Qdisc rcu and reference counting capabilities instead of relying on
rtnl lock while working with Qdiscs. Create new tcf_block_release()
function, and use it to free resources taken by tcf_block_find().
Currently, this function only releases Qdisc and it is extended in next
patches in this series.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 79 +++--
 1 file changed, 64 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0a75cb2e5e7b..c33636e7b431 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -537,6 +537,7 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
struct netlink_ext_ack *extack)
 {
struct tcf_block *block;
+   int err = 0;
 
if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
block = tcf_block_lookup(net, block_index);
@@ -548,55 +549,93 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
const struct Qdisc_class_ops *cops;
struct net_device *dev;
 
+   rcu_read_lock();
+
/* Find link */
-   dev = __dev_get_by_index(net, ifindex);
-   if (!dev)
+   dev = dev_get_by_index_rcu(net, ifindex);
+   if (!dev) {
+   rcu_read_unlock();
return ERR_PTR(-ENODEV);
+   }
 
/* Find qdisc */
if (!*parent) {
*q = dev->qdisc;
*parent = (*q)->handle;
} else {
-   *q = qdisc_lookup(dev, TC_H_MAJ(*parent));
+   *q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
if (!*q) {
NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't 
exists");
-   return ERR_PTR(-EINVAL);
+   err = -EINVAL;
+   goto errout_rcu;
}
}
 
+   *q = qdisc_refcount_inc_nz(*q);
+   if (!*q) {
+   NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+   err = -EINVAL;
+   goto errout_rcu;
+   }
+
/* Is it classful? */
cops = (*q)->ops->cl_ops;
if (!cops) {
NL_SET_ERR_MSG(extack, "Qdisc not classful");
-   return ERR_PTR(-EINVAL);
+   err = -EINVAL;
+   goto errout_rcu;
}
 
if (!cops->tcf_block) {
NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-   return ERR_PTR(-EOPNOTSUPP);
+   err = -EOPNOTSUPP;
+   goto errout_rcu;
}
 
+   /* At this point we know that qdisc is not noop_qdisc,
+* which means that qdisc holds a reference to net_device
+* and we hold a reference to qdisc, so it is safe to release
+* rcu read lock.
+*/
+   rcu_read_unlock();
+
/* Do we search for filter, attached to class? */
if (TC_H_MIN(*parent)) {
*cl = cops->find(*q, *parent);
if (*cl == 0) {
NL_SET_ERR_MSG(extack, "Specified class doesn't 
exist");
-   return ERR_PTR(-ENOENT);
+   err = -ENOENT;
+   goto errout_qdisc;
}
}
 
/* And the last stroke */
block = cops->tcf_block(*q, *cl, extack);
-   if (!block)
-   return ERR_PTR(-EINVAL);
+   if (!block) {
+   err = -EINVAL;
+   goto errout_qdisc;
+   }
if (tcf_block_shared(block)) {
NL_SET_ERR_MSG(extack, "This filter block is shared. 
Please use the block index to manipulate the filters");
-   return ERR_PTR(-EOPNOTSUPP);
+   err = -EOPNOTSUPP;
+   goto errout_qdisc;
}
}
 
return block;
+
+errout_rcu:
+   rcu_read_unlock();
+errout_qdisc:
+   if (*q)
+   qdisc_put(*q);
+   return ERR_PTR(err);
+}
+
+static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
+{
+   if (q)
+   qdisc_put(q);
 }
 
 struct tcf_block_owner_item {
@@ -1332,6 +1371,7 @@ static int tc
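
A minimal sketch of how a rule update handler pairs the two calls after
this patch (names follow the series; netlink parsing and error handling
are elided):

        block = tcf_block_find(net, &q, &parent, &cl,
                               t->tcm_ifindex, t->tcm_block_index, extack);
        if (IS_ERR(block))
                return PTR_ERR(block);

        /* ... create/delete/get the filter ... */

        tcf_block_release(q, block);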

[PATCH net-next v3 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock

2018-09-24 Thread Vlad Buslov
Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
Handlers registered with this flag are called without RTNL taken. End
goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) to be registered with UNLOCKED flag to allow parallel execution.
However, there is no intention to completely remove or split rtnl lock
itself. This patch set addresses specific problems in implementation of
classifiers API that prevent its control path from being executed
concurrently. Additional changes are required to refactor classifiers
API and individual classifiers for parallel execution. This patch set
lays groundwork to eventually register rule update handlers as
rtnl-unlocked by modifying code in cls API that works with Qdiscs and
blocks. Following patch set does the same for chains and classifiers.

The goal of this change is to refactor tcf_block_find() and its
dependencies to allow concurrent execution:
- Extend Qdisc API with rcu to lookup and take reference to Qdisc
  without relying on rtnl lock.
- Extend tcf_block with atomic reference counting and rcu.
- Always take reference to tcf_block while working with it.
- Implement tcf_block_release() to release resources obtained by
  tcf_block_find()
- Create infrastructure to allow registering Qdiscs with class ops that
  do not require the caller to hold rtnl lock.

All three netlink rule update handlers use tcf_block_find() to lookup
Qdisc and block, and this patch set introduces additional means of
synchronization to substitute rtnl lock in cls API.

Some functions in cls and sch APIs have historic names that no longer
clearly describe their intent. In order not to make this code even more
confusing when introducing their concurrency-friendly versions, rename
these functions to describe actual implementation.

Changes from V2 to V3:
- Patch 1:
  - Explicitly include refcount.h in rtnetlink.h.
- Patch 3:
  - Move rcu_head field to the end of struct Qdisc.
  - Rearrange local variable declarations in qdisc_lookup_rcu().
- Patch 5:
  - Remove tcf_qdisc_put() and inline its content to callers.

Changes from V1 to V2:
- Rebase on latest net-next.
- Patch 8 - remove.
- Patch 9 - fold into patch 11.
- Patch 11:
  - Rename tcf_block_{get|put}() to tcf_block_refcnt_{get|put}().
- Patch 13 - remove.

Vlad Buslov (10):
  net: core: netlink: add helper refcount dec and lock function
  net: sched: rename qdisc_destroy() to qdisc_put()
  net: sched: extend Qdisc with rcu
  net: sched: add helper function to take reference to Qdisc
  net: sched: use Qdisc rcu API instead of relying on rtnl lock
  net: sched: change tcf block reference counter type to refcount_t
  net: sched: implement functions to put and flush all chains
  net: sched: protect block idr with spinlock
  net: sched: implement tcf_block_refcnt_{get|put}()
  net: sched: use reference counting for tcf blocks on rules update

 include/linux/rtnetlink.h |   7 ++
 include/net/pkt_sched.h   |   1 +
 include/net/sch_generic.h |  20 +++-
 net/core/rtnetlink.c  |   6 ++
 net/sched/cls_api.c   | 242 +-
 net/sched/sch_api.c   |  24 -
 net/sched/sch_atm.c   |   2 +-
 net/sched/sch_cbq.c   |   2 +-
 net/sched/sch_cbs.c   |   2 +-
 net/sched/sch_drr.c   |   4 +-
 net/sched/sch_dsmark.c|   2 +-
 net/sched/sch_fifo.c  |   2 +-
 net/sched/sch_generic.c   |  48 +++--
 net/sched/sch_hfsc.c  |   2 +-
 net/sched/sch_htb.c   |   4 +-
 net/sched/sch_mq.c|   4 +-
 net/sched/sch_mqprio.c|   4 +-
 net/sched/sch_multiq.c|   6 +-
 net/sched/sch_netem.c |   2 +-
 net/sched/sch_prio.c  |   6 +-
 net/sched/sch_qfq.c   |   4 +-
 net/sched/sch_red.c   |   4 +-
 net/sched/sch_sfb.c   |   4 +-
 net/sched/sch_tbf.c   |   4 +-
 24 files changed, 294 insertions(+), 112 deletions(-)

-- 
2.7.5



[PATCH net-next v3 01/10] net: core: netlink: add helper refcount dec and lock function

2018-09-24 Thread Vlad Buslov
Rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference counted objects that rely on
rtnl lock cannot use the refcount helper function that atomically
decrements the reference and obtains the mutex.

This patch implements simple wrapper function around refcount_dec_and_lock
that obtains rtnl lock if reference counter value reached 0.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/linux/rtnetlink.h | 2 ++
 net/core/rtnetlink.c  | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5225832bd6ff..9cdd76348d9a 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 extern int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, u32 
group, int echo);
@@ -34,6 +35,7 @@ extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
 extern int rtnl_lock_killable(void);
+extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
 extern wait_queue_head_t netdev_unregistering_wq;
 extern struct rw_semaphore pernet_ops_rwsem;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 80a7e18c65fb..35162e1b06ad 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -130,6 +130,12 @@ int rtnl_is_locked(void)
 }
 EXPORT_SYMBOL(rtnl_is_locked);
 
+bool refcount_dec_and_rtnl_lock(refcount_t *r)
+{
+   return refcount_dec_and_mutex_lock(r, &rtnl_mutex);
+}
+EXPORT_SYMBOL(refcount_dec_and_rtnl_lock);
+
 #ifdef CONFIG_PROVE_LOCKING
 bool lockdep_rtnl_is_held(void)
 {
-- 
2.7.5
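
An illustrative use of the helper (the series applies it to Qdiscs in a
later patch; 'obj' is a hypothetical refcounted object, not kernel code):

        void obj_put_unlocked(struct obj *o)
        {
                if (!refcount_dec_and_rtnl_lock(&o->refcnt))
                        return;         /* not the last reference */

                obj_destroy(o);         /* teardown runs with rtnl held */
                rtnl_unlock();
        }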



Re: [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock

2018-09-20 Thread Vlad Buslov


On Wed 19 Sep 2018 at 22:09, Cong Wang  wrote:
> On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov  wrote:
>> @@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, 
>> struct net *net,
>> struct netlink_ext_ack *extack)
>>  {
>> struct tcf_net *tn = net_generic(net, tcf_net_id);
>> +   int err;
>> +
>> +   idr_preload(GFP_KERNEL);
>> +   spin_lock(&tn->idr_lock);
>> +   err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
>> +   GFP_NOWAIT);
>
>
> Why GFP_NOWAIT rather than GFP_ATOMIC here?

I checked how idr_preload is used in the kernel, and in most places the
following allocation uses GFP_NOWAIT (including idr-test.c). Do you
suggest I should change it to GFP_ATOMIC?
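
For context, the pattern from the patch with the allocation flags spelled
out (annotations are mine): GFP_NOWAIT consumes the nodes preallocated by
idr_preload() and never sleeps; GFP_ATOMIC would additionally dip into
the emergency reserves, which the preload makes unnecessary.

        idr_preload(GFP_KERNEL);        /* may sleep: preallocate idr nodes */
        spin_lock(&tn->idr_lock);       /* no sleeping from here on */
        err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
                            GFP_NOWAIT);/* consume the preload, never sleep */
        spin_unlock(&tn->idr_lock);
        idr_preload_end();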


Re: [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock

2018-09-20 Thread Vlad Buslov


On Wed 19 Sep 2018 at 22:04, Cong Wang  wrote:
> On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov  wrote:
>> +static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
>> +{
>> +   if (!q)
>> +   return;
>> +
>> +   if (rtnl_held)
>> +   qdisc_put(q);
>> +   else
>> +   qdisc_put_unlocked(q);
>> +}
>
> This is very ugly. You should know whether RTNL is held or
> not when calling it.
>
> What's more, all of your code passes true, so why do you
> need a parameter for rtnl_held?

It passes true because currently rule update handlers are still registered
as locked. This is a preparation for the next patch set, where this will be
changed to a proper variable that depends on qdisc and classifier type.


[PATCH net-next v2 04/10] net: sched: add helper function to take reference to Qdisc

2018-09-17 Thread Vlad Buslov
Implement function to take reference to Qdisc that relies on rcu read lock
instead of rtnl mutex. Function only takes reference to Qdisc if reference
counter isn't zero. Intended to be used by unlocked cls API.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8a0d1523d11b..9a295e690efe 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -115,6 +115,19 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
refcount_inc(&qdisc->refcnt);
 }
 
+/* Intended to be used by unlocked users, when concurrent qdisc release is
+ * possible.
+ */
+
+static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN)
+   return qdisc;
+   if (refcount_inc_not_zero(&qdisc->refcnt))
+   return qdisc;
+   return NULL;
+}
+
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
if (qdisc->flags & TCQ_F_NOLOCK)
-- 
2.7.5



[PATCH net-next v2 10/10] net: sched: use reference counting for tcf blocks on rules update

2018-09-17 Thread Vlad Buslov
In order to remove dependency on rtnl lock on rules update path, always
take reference to block while using it on rules update path. Change
tcf_block_get() error handling to properly release block with reference
counting, instead of just destroying it, in order to accommodate potential
concurrent users.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0a7a3ace2da9..6a3eec5dbdf1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -633,7 +633,7 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
int err = 0;
 
if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, block_index);
+   block = tcf_block_refcnt_get(net, block_index);
if (!block) {
NL_SET_ERR_MSG(extack, "Block of given index was not 
found");
return ERR_PTR(-EINVAL);
@@ -713,6 +713,14 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
err = -EOPNOTSUPP;
goto errout_qdisc;
}
+
+   /* Always take reference to block in order to support execution
+* of rules update path of cls API without rtnl lock. Caller
+* must release block when it is finished using it. 'if' block
+* of this conditional obtain reference to block by calling
+* tcf_block_refcnt_get().
+*/
+   refcount_inc(&block->refcnt);
}
 
return block;
@@ -726,6 +734,8 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
 
 static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
 {
+   if (!IS_ERR_OR_NULL(block))
+   tcf_block_refcnt_put(block);
tcf_qdisc_put(q, true);
 }
 
@@ -794,21 +804,16 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
 {
struct net *net = qdisc_net(q);
struct tcf_block *block = NULL;
-   bool created = false;
int err;
 
-   if (ei->block_index) {
+   if (ei->block_index)
/* block_index not 0 means the shared block is requested */
-   block = tcf_block_lookup(net, ei->block_index);
-   if (block)
-   refcount_inc(&block->refcnt);
-   }
+   block = tcf_block_refcnt_get(net, ei->block_index);
 
if (!block) {
block = tcf_block_create(net, q, ei->block_index, extack);
if (IS_ERR(block))
return PTR_ERR(block);
-   created = true;
if (tcf_block_shared(block)) {
err = tcf_block_insert(block, net, extack);
if (err)
@@ -838,14 +843,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
 err_chain0_head_change_cb_add:
tcf_block_owner_del(block, q, ei->binder_type);
 err_block_owner_add:
-   if (created) {
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, net);
 err_block_insert:
-   kfree_rcu(block, rcu);
-   } else {
-   refcount_dec(&block->refcnt);
-   }
+   tcf_block_refcnt_put(block);
return err;
 }
 EXPORT_SYMBOL(tcf_block_get_ext);
@@ -1739,7 +1738,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
return err;
 
if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, tcm->tcm_block_index);
+   block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
if (!block)
goto out;
/* If we work with block index, q is NULL and parent value
@@ -1798,6 +1797,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
}
}
 
+   if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+   tcf_block_refcnt_put(block);
cb->args[0] = index;
 
 out:
@@ -2062,7 +2063,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct 
netlink_callback *cb)
return err;
 
if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, tcm->tcm_block_index);
+   block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
if (!block)
goto out;
/* If we work with block index, q is NULL and parent value
@@ -2129,6 +2130,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct 
netlink_callback *cb)
index++;
}
 
+   if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+   tcf_block_refcnt_put(block);
cb->args[0] = index;
 
 out:
-- 
2.7.5
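
Condensed, the caller pattern this patch establishes for the dump
handlers looks as follows (names from the patch; the chain/filter walk
is elided):

        if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
                block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
                if (!block)
                        goto out;
        }

        /* ... walk chains and filters; block cannot be freed under us ... */

        if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
                tcf_block_refcnt_put(block);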



[PATCH net-next v2 07/10] net: sched: implement functions to put and flush all chains

2018-09-17 Thread Vlad Buslov
Extract code that flushes and puts all chains on tcf block into two
standalone functions to be shared with functions that locklessly get/put
reference to block.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 55 +
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c3c7d4e2f84c..58b2d8443f6a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -538,6 +538,31 @@ static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
qdisc_put_unlocked(q);
 }
 
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain;
+
+   /* Hold a refcnt for all chains, so that they don't disappear
+* while we are iterating.
+*/
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_hold(chain);
+
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_flush(chain);
+}
+
+static void tcf_block_put_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain, *tmp;
+
+   /* At this point, all the chains should have refcnt >= 1. */
+   list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+   tcf_chain_put_explicitly_created(chain);
+   tcf_chain_put(chain);
+   }
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,8 +820,6 @@ EXPORT_SYMBOL(tcf_block_get);
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
   struct tcf_block_ext_info *ei)
 {
-   struct tcf_chain *chain, *tmp;
-
if (!block)
return;
tcf_chain0_head_change_cb_del(block, ei);
@@ -813,32 +836,14 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
 
if (tcf_block_shared(block))
tcf_block_remove(block, block->net);
-
-   if (!free_block) {
-   /* Hold a refcnt for all chains, so that they don't
-* disappear while we are iterating.
-*/
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_hold(chain);
-
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
-   }
-
+   if (!free_block)
+   tcf_block_flush_all_chains(block);
tcf_block_offload_unbind(block, q, ei);
 
-   if (free_block) {
+   if (free_block)
kfree(block);
-   } else {
-   /* At this point, all the chains should have
-* refcnt >= 1.
-*/
-   list_for_each_entry_safe(chain, tmp, &block->chain_list,
-list) {
-   tcf_chain_put_explicitly_created(chain);
-   tcf_chain_put(chain);
-   }
-   }
+   else
+   tcf_block_put_all_chains(block);
} else {
tcf_block_offload_unbind(block, q, ei);
}
-- 
2.7.5
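
Why the flush helper needs two passes, as an annotated sketch of the code
above: flushing a chain can drop filters whose actions hold the last
reference to another chain on the same list, so every chain is pinned
before any of them is flushed:

        list_for_each_entry(chain, &block->chain_list, list)
                tcf_chain_hold(chain);  /* pass 1: pin every chain */

        list_for_each_entry(chain, &block->chain_list, list)
                tcf_chain_flush(chain); /* pass 2: now safe to flush */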



[PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function

2018-09-17 Thread Vlad Buslov
Rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference counted objects that rely on
rtnl lock cannot use the refcount helper function that atomically
decrements the reference and obtains the mutex.

This patch implements simple wrapper function around refcount_dec_and_lock
that obtains rtnl lock if reference counter value reached 0.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/linux/rtnetlink.h | 1 +
 net/core/rtnetlink.c  | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5225832bd6ff..dffbf665c086 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -34,6 +34,7 @@ extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
 extern int rtnl_lock_killable(void);
+extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
 extern wait_queue_head_t netdev_unregistering_wq;
 extern struct rw_semaphore pernet_ops_rwsem;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e4ae0319e189..ad9d1493cb27 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -130,6 +130,12 @@ int rtnl_is_locked(void)
 }
 EXPORT_SYMBOL(rtnl_is_locked);
 
+bool refcount_dec_and_rtnl_lock(refcount_t *r)
+{
+   return refcount_dec_and_mutex_lock(r, &rtnl_mutex);
+}
+EXPORT_SYMBOL(refcount_dec_and_rtnl_lock);
+
 #ifdef CONFIG_PROVE_LOCKING
 bool lockdep_rtnl_is_held(void)
 {
-- 
2.7.5



[PATCH net-next v2 09/10] net: sched: implement tcf_block_refcnt_{get|put}()

2018-09-17 Thread Vlad Buslov
Implement get/put function for blocks that only take/release the reference
and perform deallocation. These functions are intended to be used by
unlocked rules update path to always hold reference to block while working
with it. They use on new fine-grained locking mechanisms introduced in
previous patches in this set, instead of relying on global protection
provided by rtnl lock.

Extract code that is common with tcf_block_detach_ext() into common
function __tcf_block_put().

Extend tcf_block with rcu to allow safe deallocation when it is accessed
concurrently.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c   | 74 ---
 2 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 45fee65468d0..931fcdadf64a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -357,6 +357,7 @@ struct tcf_block {
struct tcf_chain *chain;
struct list_head filter_chain_list;
} chain0;
+   struct rcu_head rcu;
 };
 
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 924723fb74f6..0a7a3ace2da9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -241,7 +241,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
block->chain0.chain = NULL;
kfree(chain);
-   if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
-   kfree(block);
+   kfree_rcu(block, rcu);
 }
 
 static void tcf_chain_hold(struct tcf_chain *chain)
@@ -537,6 +537,19 @@ static struct tcf_block *tcf_block_lookup(struct net *net, 
u32 block_index)
return idr_find(&tn->idr, block_index);
 }
 
+static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index)
+{
+   struct tcf_block *block;
+
+   rcu_read_lock();
+   block = tcf_block_lookup(net, block_index);
+   if (block && !refcount_inc_not_zero(&block->refcnt))
+   block = NULL;
+   rcu_read_unlock();
+
+   return block;
+}
+
 static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
 {
if (!q)
@@ -573,6 +586,40 @@ static void tcf_block_put_all_chains(struct tcf_block 
*block)
}
 }
 
+static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
+   struct tcf_block_ext_info *ei)
+{
+   if (refcount_dec_and_test(&block->refcnt)) {
+   /* Flushing/putting all chains will cause the block to be
+* deallocated when last chain is freed. However, if chain_list
+* is empty, block has to be manually deallocated. After block
+* reference counter reached 0, it is no longer possible to
+* increment it or add new chains to block.
+*/
+   bool free_block = list_empty(&block->chain_list);
+
+   if (tcf_block_shared(block))
+   tcf_block_remove(block, block->net);
+   if (!free_block)
+   tcf_block_flush_all_chains(block);
+
+   if (q)
+   tcf_block_offload_unbind(block, q, ei);
+
+   if (free_block)
+   kfree_rcu(block, rcu);
+   else
+   tcf_block_put_all_chains(block);
+   } else if (q) {
+   tcf_block_offload_unbind(block, q, ei);
+   }
+}
+
+static void tcf_block_refcnt_put(struct tcf_block *block)
+{
+   __tcf_block_put(block, NULL, NULL);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,7 +842,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
if (tcf_block_shared(block))
tcf_block_remove(block, net);
 err_block_insert:
-   kfree(block);
+   kfree_rcu(block, rcu);
} else {
refcount_dec(&block->refcnt);
}
@@ -835,28 +882,7 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_chain0_head_change_cb_del(block, ei);
tcf_block_owner_del(block, q, ei->binder_type);
 
-   if (refcount_dec_and_test(&block->refcnt)) {
-   /* Flushing/putting all chains will cause the block to be
-* deallocated when last chain is freed. However, if chain_list
-* is empty, block has to be manually deallocated. After block
-* reference counter reached 0, it is no longer possible to
-* increment it or add new chains to block.
-*/
-   bool free_block = list_empty(&block->chain_list);
-
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, block->net);
-   if (!free_block)
-   tcf_block_flush_all_chains(block)

[PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu

2018-09-17 Thread Vlad Buslov
Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if Qdisc reference counter reached zero and is
intended to be used as optimization.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/linux/rtnetlink.h |  5 +
 include/net/pkt_sched.h   |  1 +
 include/net/sch_generic.h |  2 ++
 net/sched/sch_api.c   | 18 ++
 net/sched/sch_generic.c   | 25 -
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dffbf665c086..d3dff3e41e6c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -84,6 +84,11 @@ static inline struct netdev_queue *dev_ingress_queue(struct 
net_device *dev)
return rtnl_dereference(dev->ingress_queue);
 }
 
+static inline struct netdev_queue *dev_ingress_queue_rcu(struct net_device 
*dev)
+{
+   return rcu_dereference(dev->ingress_queue);
+}
+
 struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
 
 #ifdef CONFIG_NET_INGRESS
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 7dc769e5452b..a16fbe9a2a67 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -102,6 +102,7 @@ int qdisc_set_default(const char *id);
 void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
struct nlattr *tab,
struct netlink_ext_ack *extack);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fadb1a4d4ee8..8a0d1523d11b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -90,6 +90,7 @@ struct Qdisc {
struct gnet_stats_queue __percpu *cpu_qstats;
int padded;
refcount_t  refcnt;
+   struct rcu_head rcu;
 
/*
 * For performance sake on SMP, we put highly modified fields at the end
@@ -555,6 +556,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue 
*dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
+void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2096138c4bf6..070bed39155b 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -314,6 +314,24 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 
handle)
return q;
 }
 
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
+{
+   struct Qdisc *q;
+   struct netdev_queue *nq;
+
+   if (!handle)
+   return NULL;
+   q = qdisc_match_from_root(dev->qdisc, handle);
+   if (q)
+   goto out;
+
+   nq = dev_ingress_queue_rcu(dev);
+   if (nq)
+   q = qdisc_match_from_root(nq->qdisc_sleeping, handle);
+out:
+   return q;
+}
+
 static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 {
unsigned long cl;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 3e7696f3e053..531fac1d2875 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -941,6 +941,13 @@ void qdisc_free(struct Qdisc *qdisc)
kfree((char *) qdisc - qdisc->padded);
 }
 
+void qdisc_free_cb(struct rcu_head *head)
+{
+   struct Qdisc *q = container_of(head, struct Qdisc, rcu);
+
+   qdisc_free(q);
+}
+
 static void qdisc_destroy(struct Qdisc *qdisc)
 {
const struct Qdisc_ops  *ops = qdisc->ops;
@@ -970,7 +977,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
kfree_skb_list(skb);
}
 
-   qdisc_free(qdisc);
+   call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
 void qdisc_put(struct Qdisc *qdisc)
@@ -983,6 +990,22 @@ void qdisc_put(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_put);
 
+/* Version of qdisc_put() that is called with rtnl mutex unlocked.
+ * Intended to be used as optimization, this function only takes rtnl lock if
+ * qdisc reference counter reached zero.
+ */
+
+void qdisc_put_unlocked(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN ||
+   !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
+   return;
+
+   qdisc_destroy(qdisc);
+   rtnl_unlock();
+}
+EXPORT_SYMBOL(qdisc_put_u

[PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock

2018-09-17 Thread Vlad Buslov
As a preparation for removing rtnl lock dependency from rules update path,
use Qdisc rcu and reference counting capabilities instead of relying on
rtnl lock while working with Qdiscs. Create new tcf_block_release()
function, and use it to free resources taken by tcf_block_find().
Currently, this function only releases Qdisc and it is extended in next
patches in this series.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 88 -
 1 file changed, 73 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1a67af8a6e8c..cfa4a02a6a1a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -527,6 +527,17 @@ static struct tcf_block *tcf_block_lookup(struct net *net, 
u32 block_index)
return idr_find(&tn->idr, block_index);
 }
 
+static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
+{
+   if (!q)
+   return;
+
+   if (rtnl_held)
+   qdisc_put(q);
+   else
+   qdisc_put_unlocked(q);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -537,6 +548,7 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
struct netlink_ext_ack *extack)
 {
struct tcf_block *block;
+   int err = 0;
 
if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
block = tcf_block_lookup(net, block_index);
@@ -548,55 +560,91 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
const struct Qdisc_class_ops *cops;
struct net_device *dev;
 
+   rcu_read_lock();
+
/* Find link */
-   dev = __dev_get_by_index(net, ifindex);
-   if (!dev)
+   dev = dev_get_by_index_rcu(net, ifindex);
+   if (!dev) {
+   rcu_read_unlock();
return ERR_PTR(-ENODEV);
+   }
 
/* Find qdisc */
if (!*parent) {
*q = dev->qdisc;
*parent = (*q)->handle;
} else {
-   *q = qdisc_lookup(dev, TC_H_MAJ(*parent));
+   *q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
if (!*q) {
NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't 
exists");
-   return ERR_PTR(-EINVAL);
+   err = -EINVAL;
+   goto errout_rcu;
}
}
 
+   *q = qdisc_refcount_inc_nz(*q);
+   if (!*q) {
+   NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+   err = -EINVAL;
+   goto errout_rcu;
+   }
+
/* Is it classful? */
cops = (*q)->ops->cl_ops;
if (!cops) {
NL_SET_ERR_MSG(extack, "Qdisc not classful");
-   return ERR_PTR(-EINVAL);
+   err = -EINVAL;
+   goto errout_rcu;
}
 
if (!cops->tcf_block) {
NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-   return ERR_PTR(-EOPNOTSUPP);
+   err = -EOPNOTSUPP;
+   goto errout_rcu;
}
 
+   /* At this point we know that qdisc is not noop_qdisc,
+* which means that qdisc holds a reference to net_device
+* and we hold a reference to qdisc, so it is safe to release
+* rcu read lock.
+*/
+   rcu_read_unlock();
+
/* Do we search for filter, attached to class? */
if (TC_H_MIN(*parent)) {
*cl = cops->find(*q, *parent);
if (*cl == 0) {
NL_SET_ERR_MSG(extack, "Specified class doesn't 
exist");
-   return ERR_PTR(-ENOENT);
+   err = -ENOENT;
+   goto errout_qdisc;
}
}
 
/* And the last stroke */
block = cops->tcf_block(*q, *cl, extack);
-   if (!block)
-   return ERR_PTR(-EINVAL);
+   if (!block) {
+   err = -EINVAL;
+   goto errout_qdisc;
+   }
if (tcf_block_shared(block)) {
NL_SET_ERR_MSG(extack, "This filter block is shared. 
Please use the block index to manipulate the filters");
-   return ERR_PTR(-EOPNOTSUPP);
+   err = -EOPNOTSUPP;
+

[PATCH net-next v2 02/10] net: sched: rename qdisc_destroy() to qdisc_put()

2018-09-17 Thread Vlad Buslov
Current implementation of qdisc_destroy() decrements Qdisc reference
counter and only actually destroy Qdisc if reference counter value reached
zero. Rename qdisc_destroy() to qdisc_put() in order for it to better
describe the way in which this function currently implemented and used.

Extract code that deallocates Qdisc into new private qdisc_destroy()
function. It is intended to be shared between regular qdisc_put() and its
unlocked version that is introduced in next patch in this series.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  2 +-
 net/sched/sch_api.c   |  6 +++---
 net/sched/sch_atm.c   |  2 +-
 net/sched/sch_cbq.c   |  2 +-
 net/sched/sch_cbs.c   |  2 +-
 net/sched/sch_drr.c   |  4 ++--
 net/sched/sch_dsmark.c|  2 +-
 net/sched/sch_fifo.c  |  2 +-
 net/sched/sch_generic.c   | 23 ++-
 net/sched/sch_hfsc.c  |  2 +-
 net/sched/sch_htb.c   |  4 ++--
 net/sched/sch_mq.c|  4 ++--
 net/sched/sch_mqprio.c|  4 ++--
 net/sched/sch_multiq.c|  6 +++---
 net/sched/sch_netem.c |  2 +-
 net/sched/sch_prio.c  |  6 +++---
 net/sched/sch_qfq.c   |  4 ++--
 net/sched/sch_red.c   |  4 ++--
 net/sched/sch_sfb.c   |  4 ++--
 net/sched/sch_tbf.c   |  4 ++--
 20 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d326fd553b58..fadb1a4d4ee8 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -554,7 +554,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
-void qdisc_destroy(struct Qdisc *qdisc);
+void qdisc_put(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 411c40344b77..2096138c4bf6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -920,7 +920,7 @@ static void notify_and_destroy(struct net *net, struct 
sk_buff *skb,
qdisc_notify(net, skb, n, clid, old, new);
 
if (old)
-   qdisc_destroy(old);
+   qdisc_put(old);
 }
 
 /* Graft qdisc "new" to class "classid" of qdisc "parent" or
@@ -973,7 +973,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc 
*parent,
qdisc_refcount_inc(new);
 
if (!ingress)
-   qdisc_destroy(old);
+   qdisc_put(old);
}
 
 skip:
@@ -1561,7 +1561,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct 
nlmsghdr *n,
err = qdisc_graft(dev, p, skb, n, clid, q, NULL, extack);
if (err) {
if (q)
-   qdisc_destroy(q);
+   qdisc_put(q);
return err;
}
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index cd49afca9617..d714d3747bcb 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -150,7 +150,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
pr_debug("atm_tc_put: destroying\n");
list_del_init(&flow->list);
pr_debug("atm_tc_put: qdisc %p\n", flow->q);
-   qdisc_destroy(flow->q);
+   qdisc_put(flow->q);
tcf_block_put(flow->block);
if (flow->sock) {
pr_debug("atm_tc_put: f_count %ld\n",
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f42025d53cfe..4dc05409e3fb 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1418,7 +1418,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct 
cbq_class *cl)
WARN_ON(cl->filters);
 
tcf_block_put(cl->block);
-   qdisc_destroy(cl->q);
+   qdisc_put(cl->q);
qdisc_put_rtab(cl->R_tab);
gen_kill_estimator(&cl->rate_est);
if (cl != &q->link)
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index e26a24017faa..e689e11b6d0f 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -379,7 +379,7 @@ static void cbs_destroy(struct Qdisc *sch)
cbs_disable_offload(dev, q);
 
if (q->qdisc)
-   qdisc_destroy(q->qdisc);
+   qdisc_put(q->qdisc);
 }
 
 static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index e0b0cf8a9939..cdebaed0f8cf 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -134,7 +134,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, 
u32 parentid,
tca[TCA_RATE]);
if (err) {
NL_SET_ERR_MSG(extack, "Fail

[PATCH net-next v2 08/10] net: sched: protect block idr with spinlock

2018-09-17 Thread Vlad Buslov
Protect block idr access with spinlock, instead of relying on rtnl lock.
Take tn->idr_lock spinlock during block insertion and removal.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 58b2d8443f6a..924723fb74f6 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -473,6 +473,7 @@ tcf_chain0_head_change_cb_del(struct tcf_block *block,
 }
 
 struct tcf_net {
+   spinlock_t idr_lock; /* Protects idr */
struct idr idr;
 };
 
@@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, 
struct net *net,
struct netlink_ext_ack *extack)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
+   int err;
+
+   idr_preload(GFP_KERNEL);
+   spin_lock(&tn->idr_lock);
+   err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
+   GFP_NOWAIT);
+   spin_unlock(&tn->idr_lock);
+   idr_preload_end();
 
-   return idr_alloc_u32(&tn->idr, block, &block->index, block->index,
-GFP_KERNEL);
+   return err;
 }
 
 static void tcf_block_remove(struct tcf_block *block, struct net *net)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+   spin_lock(&tn->idr_lock);
idr_remove(&tn->idr, block->index);
+   spin_unlock(&tn->idr_lock);
 }
 
 static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
@@ -2285,6 +2295,7 @@ static __net_init int tcf_net_init(struct net *net)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+   spin_lock_init(&tn->idr_lock);
idr_init(&tn->idr);
return 0;
 }
-- 
2.7.5



[PATCH net-next v2 06/10] net: sched: change tcf block reference counter type to refcount_t

2018-09-17 Thread Vlad Buslov
As a preparation for removing rtnl lock dependency from rules update path,
change tcf block reference counter type to refcount_t to allow modification
by concurrent users.

In block put function perform decrement and check reference counter once to
accommodate concurrent modification by unlocked users. After this change
tcf_chain_put at the end of block put function is called with
block->refcnt==0 and will deallocate block after the last chain is
released, so there is no need to manually deallocate block in this case.
However, if block reference counter reached 0 and there are no chains to
release, block must still be deallocated manually.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  2 +-
 net/sched/cls_api.c   | 59 ---
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9a295e690efe..45fee65468d0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -345,7 +345,7 @@ struct tcf_chain {
 struct tcf_block {
struct list_head chain_list;
u32 index; /* block index for shared blocks */
-   unsigned int refcnt;
+   refcount_t refcnt;
struct net *net;
struct Qdisc *q;
struct list_head cb_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cfa4a02a6a1a..c3c7d4e2f84c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -240,7 +240,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
if (!chain->index)
block->chain0.chain = NULL;
kfree(chain);
-   if (list_empty(&block->chain_list) && block->refcnt == 0)
+   if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
kfree(block);
 }
 
@@ -510,7 +510,7 @@ static struct tcf_block *tcf_block_create(struct net *net, 
struct Qdisc *q,
INIT_LIST_HEAD(&block->owner_list);
INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
-   block->refcnt = 1;
+   refcount_set(&block->refcnt, 1);
block->net = net;
block->index = block_index;
 
@@ -719,7 +719,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
/* block_index not 0 means the shared block is requested */
block = tcf_block_lookup(net, ei->block_index);
if (block)
-   block->refcnt++;
+   refcount_inc(&block->refcnt);
}
 
if (!block) {
@@ -762,7 +762,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
 err_block_insert:
kfree(block);
} else {
-   block->refcnt--;
+   refcount_dec(&block->refcnt);
}
return err;
 }
@@ -802,34 +802,45 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_chain0_head_change_cb_del(block, ei);
tcf_block_owner_del(block, q, ei->binder_type);
 
-   if (block->refcnt == 1) {
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, block->net);
-
-   /* Hold a refcnt for all chains, so that they don't disappear
-* while we are iterating.
+   if (refcount_dec_and_test(&block->refcnt)) {
+   /* Flushing/putting all chains will cause the block to be
+* deallocated when last chain is freed. However, if chain_list
+* is empty, block has to be manually deallocated. After block
+* reference counter reached 0, it is no longer possible to
+* increment it or add new chains to block.
 */
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_hold(chain);
+   bool free_block = list_empty(&block->chain_list);

-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
-   }
+   if (tcf_block_shared(block))
+   tcf_block_remove(block, block->net);
 
-   tcf_block_offload_unbind(block, q, ei);
+   if (!free_block) {
+   /* Hold a refcnt for all chains, so that they don't
+* disappear while we are iterating.
+*/
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_hold(chain);
 
-   if (block->refcnt == 1) {
-   /* At this point, all the chains should have refcnt >= 1. */
-   list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-   tcf_chain_put_explicitly_created(chain);
-   tcf_chain_put(chain);
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_flush(chain);
}
 
-   block-
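
The reason for performing the decrement and test exactly once, as a
sketch (not kernel code): with unlocked concurrent users a separate read
followed by a decrement can either miss the transition to zero or tear
the block down twice, while refcount_dec_and_test() guarantees exactly
one caller observes the transition:

        if (refcount_dec_and_test(&block->refcnt))
                teardown(block);        /* no other reference can exist here */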

[PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock

2018-09-17 Thread Vlad Buslov
Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
Handlers registered with this flag are called without RTNL taken. End
goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) to be registered with UNLOCKED flag to allow parallel execution.
However, there is no intention to completely remove or split rtnl lock
itself. This patch set addresses specific problems in implementation of
classifiers API that prevent its control path from being executed
concurrently. Additional changes are required to refactor classifiers
API and individual classifiers for parallel execution. This patch set
lays groundwork to eventually register rule update handlers as
rtnl-unlocked by modifying code in cls API that works with Qdiscs and
blocks. Following patch set does the same for chains and classifiers.

The goal of this change is to refactor tcf_block_find() and its
dependencies to allow concurrent execution:
- Extend Qdisc API with rcu to lookup and take reference to Qdisc
  without relying on rtnl lock.
- Extend tcf_block with atomic reference counting and rcu.
- Always take reference to tcf_block while working with it.
- Implement tcf_block_release() to release resources obtained by
  tcf_block_find()
- Create infrastructure to allow registering Qdiscs with class ops that
  do not require the caller to hold rtnl lock.

All three netlink rule update handlers use tcf_block_find() to lookup
Qdisc and block, and this patch set introduces additional means of
synchronization to substitute rtnl lock in cls API.

Some functions in cls and sch APIs have historic names that no longer
clearly describe their intent. In order not to make this code even more
confusing when introducing their concurrency-friendly versions, rename
these functions to describe actual implementation.

Changes from V1 to V2:
- Rebase on latest net-next.
- Patch 8 - remove.
- Patch 9 - fold into patch 11.
- Patch 11:
  - Rename tcf_block_{get|put}() to tcf_block_refcnt_{get|put}().
- Patch 13 - remove.

Vlad Buslov (10):
  net: core: netlink: add helper refcount dec and lock function
  net: sched: rename qdisc_destroy() to qdisc_put()
  net: sched: extend Qdisc with rcu
  net: sched: add helper function to take reference to Qdisc
  net: sched: use Qdisc rcu API instead of relying on rtnl lock
  net: sched: change tcf block reference counter type to refcount_t
  net: sched: implement functions to put and flush all chains
  net: sched: protect block idr with spinlock
  net: sched: implement tcf_block_refcnt_{get|put}()
  net: sched: use reference counting for tcf blocks on rules update

 include/linux/rtnetlink.h |   6 ++
 include/net/pkt_sched.h   |   1 +
 include/net/sch_generic.h |  20 +++-
 net/core/rtnetlink.c  |   6 ++
 net/sched/cls_api.c   | 250 +-
 net/sched/sch_api.c   |  24 -
 net/sched/sch_atm.c   |   2 +-
 net/sched/sch_cbq.c   |   2 +-
 net/sched/sch_cbs.c   |   2 +-
 net/sched/sch_drr.c   |   4 +-
 net/sched/sch_dsmark.c|   2 +-
 net/sched/sch_fifo.c  |   2 +-
 net/sched/sch_generic.c   |  48 +++--
 net/sched/sch_hfsc.c  |   2 +-
 net/sched/sch_htb.c   |   4 +-
 net/sched/sch_mq.c|   4 +-
 net/sched/sch_mqprio.c|   4 +-
 net/sched/sch_multiq.c|   6 +-
 net/sched/sch_netem.c |   2 +-
 net/sched/sch_prio.c  |   6 +-
 net/sched/sch_qfq.c   |   4 +-
 net/sched/sch_red.c   |   4 +-
 net/sched/sch_sfb.c   |   4 +-
 net/sched/sch_tbf.c   |   4 +-
 24 files changed, 301 insertions(+), 112 deletions(-)

-- 
2.7.5



Re: [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock

2018-09-14 Thread Vlad Buslov


On Thu 13 Sep 2018 at 17:13, Cong Wang  wrote:
> On Wed, Sep 12, 2018 at 1:51 AM Vlad Buslov  wrote:
>>
>>
>> On Fri 07 Sep 2018 at 19:12, Cong Wang  wrote:
>> > On Fri, Sep 7, 2018 at 6:52 AM Vlad Buslov  wrote:
>> >>
>> >> Action API was changed to work with actions and action_idr in concurrency
>> >> safe manner, however tcf_del_walker() still uses actions without taking a
>> >> reference or idrinfo->lock first, and deletes them directly, disregarding
>> >> possible concurrent delete.
>> >>
>> >> Add tc_action_wq workqueue to action API. Implement
>> >> tcf_idr_release_unsafe() that assumes external synchronization by caller
>> >> and delays blocking action cleanup part to tc_action_wq workqueue. Extend
>> >> tcf_action_cleanup() with 'async' argument to indicate that function 
>> >> should
>> >> free action asynchronously.
>> >
>> > Where exactly is blocking in tcf_action_cleanup()?
>> >
>> > From your code, it looks like free_tcf(), but from my observation,
>> > the only blocking function inside is tcf_action_goto_chain_fini()
>> > which calls __tcf_chain_put(). But, __tcf_chain_put() is blocking
>> > _ONLY_ when tc_chain_notify() is called, for tc action it is never
>> > called.
>> >
>> > So, what else is blocking?
>>
>> __tcf_chain_put() calls tc_chain_tmplt_del(), which calls
>> ops->tmplt_destroy(). This last function uses hw offload API, which is
>> blocking.
>
> Good to know.
>
> Can we just make ops->tmplt_destroy() to use workqueue?
> Making tc action to workqueue seems overkill, for me.

How about changing tcf_chain_put_by_act() to use tc_filter_wq, instead
of directly calling __tcf_chain_put()? IMO it is a better solution
because it benefits all classifiers, instead of requiring every
classifier with templates support to implement non-blocking
ops->tmplt_destroy().
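
A self-contained sketch of the "defer the blocking part of a put to a
workqueue" pattern under discussion; all names below are hypothetical and
do not correspond to the eventual kernel code:

        struct deferred_obj {
                refcount_t refcnt;
                struct work_struct put_work;    /* INIT_WORK() at alloc time */
        };

        static void deferred_obj_put_work(struct work_struct *work)
        {
                struct deferred_obj *o = container_of(work, struct deferred_obj,
                                                      put_work);

                /* blocking cleanup (e.g. hw offload calls) runs here */
                kfree(o);
        }

        static void deferred_obj_put(struct deferred_obj *o)
        {
                if (refcount_dec_and_test(&o->refcnt))
                        queue_work(system_wq, &o->put_work);
        }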


Re: [PATCH net-next 08/13] net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()

2018-09-14 Thread Vlad Buslov


On Thu 13 Sep 2018 at 17:21, Cong Wang  wrote:
> On Wed, Sep 12, 2018 at 1:24 AM Vlad Buslov  wrote:
>>
>>
>> On Fri 07 Sep 2018 at 20:09, Cong Wang  wrote:
>> > On Thu, Sep 6, 2018 at 12:59 AM Vlad Buslov  wrote:
>> >>
>> >> Functions tcf_block_get{_ext}() and tcf_block_put{_ext}() actually
>> >> attach/detach block to specific Qdisc besides just taking/putting
>> >> reference. Rename them according to their purpose.
>> >
>> > Where exactly does it attach to?
>> >
>> > Each qdisc provides a pointer to a pointer of a block, like
>> > >block. It is where the result is saved to. It takes a parameter
>> > of Qdisc* merely for read-only purpose.
>>
>> tcf_block_attach_ext() passes qdisc parameter to tcf_block_owner_add()
>> which saves qdisc to new tcf_block_owner_item and adds the item to
>> block's owner list. I proposed several naming options for these
>> functions to Jiri on internal review and he suggested "attach" as better
>> option.
>
> But that is merely item->q = q, this is why I said it is read-only,
> hard to claim this is attaching.
>
>
>>
>> >
>> > So, renaming it to *attach() is even confusing, at least not
>> > any better. Please find other names or leave them as they are.
>>
>> What would you recommend?
>
> I don't know, perhaps "acquire"?
>
> Or, leaving tcf_block_get() as it is but rename your refcnt
> increment function to be something like tcf_block_refcnt_get()?

Cong, I'm okay with both options.

Jiri, which naming would you prefer?


Re: [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock

2018-09-12 Thread Vlad Buslov


On Fri 07 Sep 2018 at 19:12, Cong Wang  wrote:
> On Fri, Sep 7, 2018 at 6:52 AM Vlad Buslov  wrote:
>>
>> Action API was changed to work with actions and action_idr in concurrency
>> safe manner, however tcf_del_walker() still uses actions without taking a
>> reference or idrinfo->lock first, and deletes them directly, disregarding
>> possible concurrent delete.
>>
>> Add tc_action_wq workqueue to action API. Implement
>> tcf_idr_release_unsafe() that assumes external synchronization by caller
>> and delays blocking action cleanup part to tc_action_wq workqueue. Extend
>> tcf_action_cleanup() with 'async' argument to indicate that function should
>> free action asynchronously.
>
> Where exactly is blocking in tcf_action_cleanup()?
>
> From your code, it looks like free_tcf(), but from my observation,
> the only blocking function inside is tcf_action_goto_chain_fini()
> which calls __tcf_chain_put(). But, __tcf_chain_put() is blocking
> _ONLY_ when tc_chain_notify() is called, for tc action it is never
> called.
>
> So, what else is blocking?

__tcf_chain_put() calls tc_chain_tmplt_del(), which calls
ops->tmplt_destroy(). This last function uses hw offload API, which is
blocking.


Re: [PATCH net-next 13/13] net: sched: add flags to Qdisc class ops struct

2018-09-12 Thread Vlad Buslov


On Fri 07 Sep 2018 at 19:50, Cong Wang  wrote:
> On Thu, Sep 6, 2018 at 12:59 AM Vlad Buslov  wrote:
>>
>> Extend Qdisc_class_ops with flags. Create enum to hold possible class ops
>> flag values. Add first class ops flags value QDISC_CLASS_OPS_DOIT_UNLOCKED
>> to indicate that class ops functions can be called without taking rtnl
>> lock.
>
> We don't add anything that is not used.
>
> This is the last patch in this series, so I am pretty sure you split
> it in a wrong way, it certainly belongs to next series, not this series.

Will do.

Thank you for reviewing my code!


Re: [PATCH net-next 09/13] net: sched: extend tcf_block with rcu

2018-09-12 Thread Vlad Buslov


On Fri 07 Sep 2018 at 19:52, Cong Wang  wrote:
> On Thu, Sep 6, 2018 at 12:59 AM Vlad Buslov  wrote:
>>
>> Extend tcf_block with rcu to allow safe deallocation when it is accessed
>> concurrently.
>
> This sucks, please fold this patch into where you call rcu_read_lock()
> on tcf block.
>
> This patch _alone_ is apparently not complete. This is not how
> we split patches.

Got it.


Re: [PATCH net-next 08/13] net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()

2018-09-12 Thread Vlad Buslov


On Fri 07 Sep 2018 at 20:09, Cong Wang  wrote:
> On Thu, Sep 6, 2018 at 12:59 AM Vlad Buslov  wrote:
>>
>> Functions tcf_block_get{_ext}() and tcf_block_put{_ext}() actually
>> attach/detach block to specific Qdisc besides just taking/putting
>> reference. Rename them according to their purpose.
>
> Where exactly does it attach to?
>
> Each qdisc provides a pointer to a pointer of a block, like
> &q->block. It is where the result is saved to. It takes a parameter
> of Qdisc* merely for read-only purpose.

tcf_block_attach_ext() passes qdisc parameter to tcf_block_owner_add()
which saves qdisc to new tcf_block_owner_item and adds the item to
block's owner list. I proposed several naming options for these
functions to Jiri on internal review and he suggested "attach" as better
option.

>
> So, renaming it to *attach() is even confusing, at least not
> any better. Please find other names or leave them as they are.

What would you recommend?



[PATCH net-next v2] net: sched: cls_flower: dump offload count value

2018-09-07 Thread Vlad Buslov
Change flower in_hw_count type to fixed-size u32 and dump it as
TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
blocks and re-offload functionality.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h| 2 +-
 include/uapi/linux/pkt_cls.h | 2 ++
 net/sched/cls_flower.c   | 5 -
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6d00093f35e..d68ac55539a5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -362,7 +362,7 @@ static inline void tcf_block_offload_dec(struct tcf_block 
*block, u32 *flags)
 }
 
 static inline void
-tc_cls_offload_cnt_update(struct tcf_block *block, unsigned int *cnt,
+tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
  u32 *flags, bool add)
 {
if (add) {
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index be382fb0592d..401d0c1e612d 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -483,6 +483,8 @@ enum {
TCA_FLOWER_KEY_ENC_OPTS,
TCA_FLOWER_KEY_ENC_OPTS_MASK,
 
+   TCA_FLOWER_IN_HW_COUNT,
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 6fd9bdd93796..4b8dd37dd4f8 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -98,7 +98,7 @@ struct cls_fl_filter {
struct list_head list;
u32 handle;
u32 flags;
-   unsigned int in_hw_count;
+   u32 in_hw_count;
struct rcu_work rwork;
struct net_device *hw_dev;
 };
@@ -1880,6 +1880,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, 
void *fh,
if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
goto nla_put_failure;
 
+   if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
+   goto nla_put_failure;
+
if (tcf_exts_dump(skb, &f->exts))
goto nla_put_failure;
 
-- 
2.7.5
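
On the receiving side, a dump consumer would read the new attribute
roughly like this (a sketch using libnl-style accessors; not part of
the patch):

	if (tb[TCA_FLOWER_IN_HW_COUNT])
		in_hw_count = nla_get_u32(tb[TCA_FLOWER_IN_HW_COUNT]);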



[PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock

2018-09-07 Thread Vlad Buslov
Action API was changed to work with actions and action_idr in concurrency
safe manner, however tcf_del_walker() still uses actions without taking a
reference or idrinfo->lock first, and deletes them directly, disregarding
possible concurrent delete.

Add tc_action_wq workqueue to action API. Implement
tcf_idr_release_unsafe() that assumes external synchronization by caller
and delays blocking action cleanup part to tc_action_wq workqueue. Extend
tcf_action_cleanup() with 'async' argument to indicate that function should
free action asynchronously.

Change tcf_del_walker() to take idrinfo->lock while iterating over actions
and use new tcf_idr_release_unsafe() to release them while holding the
lock.

Signed-off-by: Vlad Buslov 
---
 include/net/act_api.h |  1 +
 net/sched/act_api.c   | 73 ---
 2 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c6f195b3c706..4c5117bc4afb 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -38,6 +38,7 @@ struct tc_action {
struct gnet_stats_queue __percpu *cpu_qstats;
struct tc_cookie __rcu *act_cookie;
struct tcf_chain *goto_chain;
+   struct work_struct  work;
 };
 #define tcf_index  common.tcfa_index
 #define tcf_refcnt common.tcfa_refcnt
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 6f118d62c731..4ad9062c34b3 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -90,13 +90,38 @@ static void free_tcf(struct tc_action *p)
kfree(p);
 }
 
-static void tcf_action_cleanup(struct tc_action *p)
+static void tcf_action_free(struct tc_action *p)
+{
+   gen_kill_estimator(&p->tcfa_rate_est);
+   free_tcf(p);
+}
+
+static void tcf_action_free_work(struct work_struct *work)
+{
+   struct tc_action *p = container_of(work,
+  struct tc_action,
+  work);
+
+   tcf_action_free(p);
+}
+
+static struct workqueue_struct *tc_action_wq;
+
+static bool tcf_action_queue_work(struct work_struct *work, work_func_t func)
+{
+   INIT_WORK(work, func);
+   return queue_work(tc_action_wq, work);
+}
+
+static void tcf_action_cleanup(struct tc_action *p, bool async)
 {
if (p->ops->cleanup)
p->ops->cleanup(p);
 
-   gen_kill_estimator(&p->tcfa_rate_est);
-   free_tcf(p);
+   if (async)
+   tcf_action_queue_work(&p->work, tcf_action_free_work);
+   else
+   tcf_action_free(p);
 }
 
 static int __tcf_action_put(struct tc_action *p, bool bind)
@@ -109,7 +134,7 @@ static int __tcf_action_put(struct tc_action *p, bool bind)
idr_remove(&idrinfo->action_idr, p->tcfa_index);
spin_unlock(&idrinfo->lock);
 
-   tcf_action_cleanup(p);
+   tcf_action_cleanup(p, false);
return 1;
}
 
@@ -147,6 +172,24 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool 
strict)
 }
 EXPORT_SYMBOL(__tcf_idr_release);
 
+/* Release idr without obtaining idrinfo->lock. Caller must prevent any
+ * concurrent modifications of idrinfo->action_idr!
+ */
+
+static int tcf_idr_release_unsafe(struct tc_action *p)
+{
+   if (atomic_read(&p->tcfa_bindcnt) > 0)
+   return -EPERM;
+
+   if (refcount_dec_and_test(&p->tcfa_refcnt)) {
+   idr_remove(&p->idrinfo->action_idr, p->tcfa_index);
+   tcf_action_cleanup(p, true);
+   return ACT_P_DELETED;
+   }
+
+   return 0;
+}
+
 static size_t tcf_action_shared_attrs_size(const struct tc_action *act)
 {
struct tc_cookie *act_cookie;
@@ -262,20 +305,25 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, 
struct sk_buff *skb,
if (nla_put_string(skb, TCA_KIND, ops->kind))
goto nla_put_failure;
 
+   spin_lock(&idrinfo->lock);
idr_for_each_entry_ul(idr, p, id) {
-   ret = __tcf_idr_release(p, false, true);
+   ret = tcf_idr_release_unsafe(p);
if (ret == ACT_P_DELETED) {
module_put(ops->owner);
n_i++;
} else if (ret < 0) {
-   goto nla_put_failure;
+   goto nla_put_failure_locked;
}
}
+   spin_unlock(&idrinfo->lock);
+
if (nla_put_u32(skb, TCA_FCNT, n_i))
goto nla_put_failure;
nla_nest_end(skb, nest);
 
return n_i;
+nla_put_failure_locked:
+   spin_unlock(&idrinfo->lock);
 nla_put_failure:
nla_nest_cancel(skb, nest);
return ret;
@@ -341,7 +389,7 @@ static int tcf_idr_delete_index(struct tcf_idrinfo 
*idrinfo, u32 index)
p->tcfa_index));
spin_unlock(&idrinfo->lock);
 
-   tcf_action_cleanup(p);
+ 

Re: [PATCH net-next] net: sched: cls_flower: dump offload count value

2018-09-07 Thread Vlad Buslov


On Fri 07 Sep 2018 at 09:11, Jakub Kicinski  
wrote:
> On Thu,  6 Sep 2018 18:37:23 +0300, Vlad Buslov wrote:
>> Change flower in_hw_count type to fixed-size u32 and dump it as
>> TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
>> blocks and re-offload functionality.
>> 
>> Signed-off-by: Vlad Buslov 
>> Acked-by: Jiri Pirko 
>> ---
>>  include/net/sch_generic.h| 2 +-
>>  include/uapi/linux/pkt_cls.h | 2 ++
>>  net/sched/cls_flower.c   | 5 -
>>  3 files changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index a6d00093f35e..d68ac55539a5 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -362,7 +362,7 @@ static inline void tcf_block_offload_dec(struct 
>> tcf_block *block, u32 *flags)
>>  }
>>  
>>  static inline void
>> -tc_cls_offload_cnt_update(struct tcf_block *block, unsigned int *cnt,
>> +tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
>>u32 *flags, bool add)
>>  {
>>  if (add) {
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index be382fb0592d..2824fb7ed1c9 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -483,6 +483,8 @@ enum {
>>  TCA_FLOWER_KEY_ENC_OPTS,
>>  TCA_FLOWER_KEY_ENC_OPTS_MASK,
>>  
>> +TCA_FLOWER_IN_HW_COUNT, /* be32 */
>
> Why be32?

This comment is wrong.
Thanks for catching this.

>
> I wish there was a good way to share this attribute between
> classifiers :( 
>
>>  __TCA_FLOWER_MAX,
>>  };
>>  
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 6fd9bdd93796..4b8dd37dd4f8 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -98,7 +98,7 @@ struct cls_fl_filter {
>>  struct list_head list;
>>  u32 handle;
>>  u32 flags;
>> -unsigned int in_hw_count;
>> +u32 in_hw_count;
>>  struct rcu_work rwork;
>>  struct net_device *hw_dev;
>>  };
>> @@ -1880,6 +1880,9 @@ static int fl_dump(struct net *net, struct tcf_proto 
>> *tp, void *fh,
>>  if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
>>  goto nla_put_failure;
>>  
>> +if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
>> +goto nla_put_failure;
>> +
>>  if (tcf_exts_dump(skb, &f->exts))
>>  goto nla_put_failure;
>>  



[PATCH net-next] net: sched: cls_flower: dump offload count value

2018-09-06 Thread Vlad Buslov
Change flower in_hw_count type to fixed-size u32 and dump it as
TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
blocks and re-offload functionality.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h| 2 +-
 include/uapi/linux/pkt_cls.h | 2 ++
 net/sched/cls_flower.c   | 5 -
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6d00093f35e..d68ac55539a5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -362,7 +362,7 @@ static inline void tcf_block_offload_dec(struct tcf_block 
*block, u32 *flags)
 }
 
 static inline void
-tc_cls_offload_cnt_update(struct tcf_block *block, unsigned int *cnt,
+tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
  u32 *flags, bool add)
 {
if (add) {
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index be382fb0592d..2824fb7ed1c9 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -483,6 +483,8 @@ enum {
TCA_FLOWER_KEY_ENC_OPTS,
TCA_FLOWER_KEY_ENC_OPTS_MASK,
 
+   TCA_FLOWER_IN_HW_COUNT, /* be32 */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 6fd9bdd93796..4b8dd37dd4f8 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -98,7 +98,7 @@ struct cls_fl_filter {
struct list_head list;
u32 handle;
u32 flags;
-   unsigned int in_hw_count;
+   u32 in_hw_count;
struct rcu_work rwork;
struct net_device *hw_dev;
 };
@@ -1880,6 +1880,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, 
void *fh,
if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
goto nla_put_failure;
 
+   if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
+   goto nla_put_failure;
+
if (tcf_exts_dump(skb, >exts))
goto nla_put_failure;
 
-- 
2.7.5



Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete

2018-09-06 Thread Vlad Buslov


On Wed 05 Sep 2018 at 20:32, Cong Wang  wrote:
> On Wed, Sep 5, 2018 at 12:05 AM Vlad Buslov  wrote:
>>
>>
>> On Tue 04 Sep 2018 at 22:41, Cong Wang  wrote:
>> > On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov  wrote:
>> >>
>> >>
>> >> On Mon 03 Sep 2018 at 18:50, Cong Wang  wrote:
>> >> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov  wrote:
>> >> >>
>> >> >> Action API was changed to work with actions and action_idr in 
>> >> >> concurrency
>> >> >> safe manner, however tcf_del_walker() still uses actions without taking
>> >> >> reference to them first and deletes them directly, disregarding 
>> >> >> possible
>> >> >> concurrent delete.
>> >> >>
>> >> >> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't 
>> >> >> require
>> >> >> caller to hold reference to action and accepts action id as argument,
>> >> >> instead of direct action pointer.
>> >> >
>> >> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
>> >> > tcf_dump_walker() already does.
>> >>
>> >> Because tcf_del_walker() calls __tcf_idr_release(), which take
>> >> idrinfo->lock itself (deadlock). It also calls sleeping functions like
>> >
>> > Deadlock can be easily resolved by moving the lock out.
>> >
>> >
>> >> tcf_action_goto_chain_fini(), so just implementing function that
>> >> releases action without taking idrinfo->lock is not enough.
>> >
>> > Sleeping can be resolved either by making it atomic or
>> > deferring it to a work queue.
>> >
>> > None of your arguments here is a blocker to locking
>> > idrinfo->lock. You really should focus on if it is really
>> > necessary to lock idrinfo->lock in tcf_del_walker(), rather
>> > than these details.
>> >
>> > For me, if you need idrinfo->lock for dump walker, you must
>> > need it for delete walker too, because deletion is a writer
>> > which should require stronger protection than the dumper,
>> > which merely a reader.
>>
>> I don't get how it is necessary. Dump walker uses pointers to actions
>> directly, and in order to be concurrency-safe it must either hold the
>
> It uses the pointer in a read-only way, what you said doesn't change
> the fact that it is a reader. And, like other readers, it may not need
> to lock at all, which is a different topic.
>
>
>> lock or obtain reference to action. Note that del walker doesn't use the
>> action pointer, it only passed action id to tcf_idr_delete_index()
>> function, which does all the necessary locking and can deal with
>> potential concurrency issues (concurrent delete, etc.). This approach
>> also benefits from code reuse from other code paths that delete actions,
>> instead of implementing its own.
>
> Look at the difference below.
>
> With your change:
>
> idr_for_each_entry_ul{
>spin_lock(&idrinfo->lock);
>idr_remove();
>spin_unlock(&idrinfo->lock);
> }
>
> With what I suggest:
>
> spin_lock(&idrinfo->lock);
> idr_for_each_entry_ul{
>idr_remove();
> }
> spin_unlock(&idrinfo->lock);
>
> Isn't a concurrent tcf_idr_check_alloc() able to livelock here with
> your change?
>
> idr_for_each_entry_ul{
>spin_lock(&idrinfo->lock);
>idr_remove();
>spin_unlock(&idrinfo->lock);
>   // tcf_idr_check_alloc() jumps in,
>  // allocates next ID which can be found
>   // by idr_get_next_ul()
> } // the whole loop goes _literately_ infinite...

idr_for_each_entry_ul traverses idr entries in ascending order of
identifiers, so an infinite livelock like this is not possible because
the loop never goes back to newly added entries with an id lower than
the current one.
>
> Also, idr_for_each_entry_ul() is supposed to be protected either
> by RCU or idrinfo->lock, no? With your change or without any change,
> it doesn't even have any lock after removing RTNL?

After reading this comment I checked the actual idr implementation and I
think you are right. Even though the idr_for_each_entry_ul() macro (and
the idr_get_next_ul() function it uses to iterate over idr entries)
doesn't document any locking requirements (which is why this patch
doesn't take any locks), its implementation does seem to require
external synchronization.

Do you suggest I should just hold idrinfo->lock for the whole duration
of the del_walker loop, or play nicely with potential concurrent users
and take/release it per action?

Thanks,
Vlad
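
For reference, a minimal sketch of the whole-loop option (this is
roughly what the v2 respin earlier in this archive ends up doing;
tcf_idr_release_unsafe() stands for a release helper that neither
sleeps nor re-takes idrinfo->lock):

	spin_lock(&idrinfo->lock);
	idr_for_each_entry_ul(idr, p, id) {
		/* must not sleep or take idrinfo->lock again */
		ret = tcf_idr_release_unsafe(p);
		if (ret < 0)
			break;
	}
	spin_unlock(&idrinfo->lock);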



Re: [PATCH net-next 03/13] net: sched: extend Qdisc with rcu

2018-09-06 Thread Vlad Buslov


On Thu 06 Sep 2018 at 08:39, Kirill Tkhai  wrote:
> On 06.09.2018 11:30, Eric Dumazet wrote:
>> 
>> 
>> On 09/06/2018 12:58 AM, Vlad Buslov wrote:
>> 
>> ...
>> 
>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>> index 18e22a5a6550..239c73f29471 100644
>>> --- a/include/net/sch_generic.h
>>> +++ b/include/net/sch_generic.h
>>> @@ -90,6 +90,7 @@ struct Qdisc {
>>> struct gnet_stats_queue __percpu *cpu_qstats;
>>> int padded;
>>> refcount_t  refcnt;
>>> +   struct rcu_head rcu;
>>>  
>>> /*
>>>  * For performance sake on SMP, we put highly modified fields at the end
>> 
>> Probably better to move this at the end of struct Qdisc,
>> not risking unexpected performance regressions in fast path.
>
> Do you mean regressions on UP? On SMP it looks like this field
> fits in the unused gap created by:
>
>   struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
>
> Kirill

Hi Eric, Kirill

I intentionally put rcu_head here in order for it not to be in same
cache line with "highly modified fields" (according to comment).
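
A sketch of the placement being discussed (abridged struct; only fields
relevant to the argument are shown, layout as in the quoted diff):

struct Qdisc {
	/* ... mostly read-only configuration fields ... */
	int			padded;
	refcount_t		refcnt;
	struct rcu_head		rcu;	/* cold: touched only at destruction */

	/* highly modified fields are kept at the end, cacheline-aligned: */
	struct sk_buff_head	gso_skb ____cacheline_aligned_in_smp;
	/* ... */
};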


[PATCH net-next 06/13] net: sched: change tcf block reference counter type to refcount_t

2018-09-06 Thread Vlad Buslov
As a preparation for removing rtnl lock dependency from rules update path,
change tcf block reference counter type to refcount_t to allow modification
by concurrent users.

In block put function perform decrement and check reference counter once to
accommodate concurrent modification by unlocked users. After this change
tcf_chain_put at the end of block put function is called with
block->refcnt==0 and will deallocate block after the last chain is
released, so there is no need to manually deallocate block in this case.
However, if block reference counter reached 0 and there are no chains to
release, block must still be deallocated manually.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  2 +-
 net/sched/cls_api.c   | 59 ---
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f878afa58be4..825e2bf6c5c3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -345,7 +345,7 @@ struct tcf_chain {
 struct tcf_block {
struct list_head chain_list;
u32 index; /* block index for shared blocks */
-   unsigned int refcnt;
+   refcount_t refcnt;
struct net *net;
struct Qdisc *q;
struct list_head cb_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cfa4a02a6a1a..c3c7d4e2f84c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -240,7 +240,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
if (!chain->index)
block->chain0.chain = NULL;
kfree(chain);
-   if (list_empty(&block->chain_list) && block->refcnt == 0)
+   if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
kfree(block);
 }
 
@@ -510,7 +510,7 @@ static struct tcf_block *tcf_block_create(struct net *net, 
struct Qdisc *q,
INIT_LIST_HEAD(&block->owner_list);
INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
-   block->refcnt = 1;
+   refcount_set(&block->refcnt, 1);
block->net = net;
block->index = block_index;
 
@@ -719,7 +719,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
/* block_index not 0 means the shared block is requested */
block = tcf_block_lookup(net, ei->block_index);
if (block)
-   block->refcnt++;
+   refcount_inc(&block->refcnt);
}
 
if (!block) {
@@ -762,7 +762,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
 err_block_insert:
kfree(block);
} else {
-   block->refcnt--;
+   refcount_dec(&block->refcnt);
}
return err;
 }
@@ -802,34 +802,45 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_chain0_head_change_cb_del(block, ei);
tcf_block_owner_del(block, q, ei->binder_type);
 
-   if (block->refcnt == 1) {
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, block->net);
-
-   /* Hold a refcnt for all chains, so that they don't disappear
-* while we are iterating.
+   if (refcount_dec_and_test(&block->refcnt)) {
+   /* Flushing/putting all chains will cause the block to be
+* deallocated when last chain is freed. However, if chain_list
+* is empty, block has to be manually deallocated. After block
+* reference counter reached 0, it is no longer possible to
+* increment it or add new chains to block.
 */
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_hold(chain);
+   bool free_block = list_empty(&block->chain_list);
 
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
-   }
+   if (tcf_block_shared(block))
+   tcf_block_remove(block, block->net);
 
-   tcf_block_offload_unbind(block, q, ei);
+   if (!free_block) {
+   /* Hold a refcnt for all chains, so that they don't
+* disappear while we are iterating.
+*/
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_hold(chain);
 
-   if (block->refcnt == 1) {
-   /* At this point, all the chains should have refcnt >= 1. */
-   list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-   tcf_chain_put_explicitly_created(chain);
-   tcf_chain_put(chain);
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_flush(chain);
}
 
-   block-

[PATCH net-next 01/13] net: core: netlink: add helper refcount dec and lock function

2018-09-06 Thread Vlad Buslov
Rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference-counted objects that rely on
rtnl lock cannot use the refcount helper function that atomically
decrements the reference counter and obtains the mutex.

This patch implements simple wrapper function around refcount_dec_and_lock
that obtains rtnl lock if reference counter value reached 0.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/linux/rtnetlink.h | 1 +
 net/core/rtnetlink.c  | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5225832bd6ff..dffbf665c086 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -34,6 +34,7 @@ extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
 extern int rtnl_lock_killable(void);
+extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
 extern wait_queue_head_t netdev_unregistering_wq;
 extern struct rw_semaphore pernet_ops_rwsem;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 60c928894a78..4ea0b1413076 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -130,6 +130,12 @@ int rtnl_is_locked(void)
 }
 EXPORT_SYMBOL(rtnl_is_locked);
 
+bool refcount_dec_and_rtnl_lock(refcount_t *r)
+{
+   return refcount_dec_and_mutex_lock(r, &rtnl_mutex);
+}
+EXPORT_SYMBOL(refcount_dec_and_rtnl_lock);
+
 #ifdef CONFIG_PROVE_LOCKING
 bool lockdep_rtnl_is_held(void)
 {
-- 
2.7.5
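
A usage sketch of the new helper (the object and teardown function are
hypothetical; patch 03 in this series applies the same pattern to Qdisc):

static void example_put_unlocked(struct example_obj *obj)
{
	/* Take rtnl only if this was the last reference. */
	if (!refcount_dec_and_rtnl_lock(&obj->refcnt))
		return;

	example_destroy(obj);	/* teardown that must run under rtnl */
	rtnl_unlock();
}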



[PATCH net-next 04/13] net: sched: add helper function to take reference to Qdisc

2018-09-06 Thread Vlad Buslov
Implement function to take reference to Qdisc that relies on rcu read lock
instead of rtnl mutex. Function only takes reference to Qdisc if reference
counter isn't zero. Intended to be used by unlocked cls API.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 239c73f29471..f878afa58be4 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -115,6 +115,19 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
refcount_inc(&qdisc->refcnt);
 }
 
+/* Intended to be used by unlocked users, when concurrent qdisc release is
+ * possible.
+ */
+
+static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN)
+   return qdisc;
+   if (refcount_inc_not_zero(&qdisc->refcnt))
+   return qdisc;
+   return NULL;
+}
+
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
if (qdisc->flags & TCQ_F_NOLOCK)
-- 
2.7.5
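
A sketch of the intended lookup pattern (hypothetical caller; patch 05
in this series does this for real in tcf_block_find()):

	rcu_read_lock();
	q = qdisc_lookup_rcu(dev, TC_H_MAJ(parent));	/* from patch 03 */
	if (q)
		q = qdisc_refcount_inc_nz(q);	/* NULL if refcnt already hit zero */
	rcu_read_unlock();

	if (q) {
		/* q cannot be freed under us; release with qdisc_put_unlocked() */
	}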



[PATCH net-next 10/13] net: sched: protect block idr with spinlock

2018-09-06 Thread Vlad Buslov
Protect block idr access with spinlock, instead of relying on rtnl lock.
Take tn->idr_lock spinlock during block insertion and removal.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 502b2da8a885..f06aa9313a58 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -473,6 +473,7 @@ tcf_chain0_head_change_cb_del(struct tcf_block *block,
 }
 
 struct tcf_net {
+   spinlock_t idr_lock; /* Protects idr */
struct idr idr;
 };
 
@@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, 
struct net *net,
struct netlink_ext_ack *extack)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
+   int err;
+
+   idr_preload(GFP_KERNEL);
+   spin_lock(&tn->idr_lock);
+   err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
+   GFP_NOWAIT);
+   spin_unlock(&tn->idr_lock);
+   idr_preload_end();
 
-   return idr_alloc_u32(&tn->idr, block, &block->index, block->index,
-GFP_KERNEL);
+   return err;
 }
 
 static void tcf_block_remove(struct tcf_block *block, struct net *net)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+   spin_lock(&tn->idr_lock);
idr_remove(&tn->idr, block->index);
+   spin_unlock(&tn->idr_lock);
 }
 
 static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
@@ -2284,6 +2294,7 @@ static __net_init int tcf_net_init(struct net *net)
 {
struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+   spin_lock_init(&tn->idr_lock);
idr_init(&tn->idr);
return 0;
 }
-- 
2.7.5
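
The preload idiom used above, in isolation (a generic sketch, not code
from the patch): the GFP_KERNEL allocation happens before the spinlock
is taken, so the allocation inside the critical section can use
GFP_NOWAIT without failing needlessly.

	idr_preload(GFP_KERNEL);	/* may sleep: preallocates idr nodes */
	spin_lock(&tn->idr_lock);
	err = idr_alloc_u32(&tn->idr, ptr, &index, max_id, GFP_NOWAIT);
	spin_unlock(&tn->idr_lock);
	idr_preload_end();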



[PATCH net-next 00/13] Refactor classifier API to work with Qdisc/blocks without rtnl lock

2018-09-06 Thread Vlad Buslov
Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
Handlers registered with this flag are called without RTNL taken. End
goal is to have rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) registered with the UNLOCKED flag to allow parallel execution.
However, there is no intention to completely remove or split rtnl lock
itself. This patch set addresses specific problems in implementation of
classifiers API that prevent its control path from being executed
concurrently. Additional changes are required to refactor classifiers
API and individual classifiers for parallel execution. This patch set
lays groundwork to eventually register rule update handlers as
rtnl-unlocked by modifying code in cls API that works with Qdiscs and
blocks. Following patch set does the same for chains and classifiers.

The goal of this change is to refactor tcf_block_find() and its
dependencies to allow concurrent execution:
- Extend Qdisc API with rcu to lookup and take reference to Qdisc
  without relying on rtnl lock.
- Extend tcf_block with atomic reference counting and rcu.
- Always take reference to tcf_block while working with it.
- Implement tcf_block_release() to release resources obtained by
  tcf_block_find()
- Create infrastructure to allow registering Qdiscs with class ops that
  do not require the caller to hold rtnl lock.

All three netlink rule update handlers use tcf_block_find() to lookup
Qdisc and block, and this patch set introduces additional means of
synchronization to substitute rtnl lock in cls API.

Some functions in cls and sch APIs have historic names that no longer
clearly describe their intent. In order not to make this code even more
confusing when introducing their concurrency-friendly versions, rename
these functions to describe actual implementation.

Vlad Buslov (13):
  net: core: netlink: add helper refcount dec and lock function
  net: sched: rename qdisc_destroy() to qdisc_put()
  net: sched: extend Qdisc with rcu
  net: sched: add helper function to take reference to Qdisc
  net: sched: use Qdisc rcu API instead of relying on rtnl lock
  net: sched: change tcf block reference counter type to refcount_t
  net: sched: implement functions to put and flush all chains
  net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()
  net: sched: extend tcf_block with rcu
  net: sched: protect block idr with spinlock
  net: sched: implement tcf_block_get() and tcf_block_put()
  net: sched: use reference counting for tcf blocks on rules update
  net: sched: add flags to Qdisc class ops struct

 include/linux/rtnetlink.h |   6 +
 include/net/pkt_cls.h |  36 +++---
 include/net/pkt_sched.h   |   1 +
 include/net/sch_generic.h |  28 -
 net/core/rtnetlink.c  |   6 +
 net/sched/cls_api.c   | 281 --
 net/sched/sch_api.c   |  24 +++-
 net/sched/sch_atm.c   |  14 +--
 net/sched/sch_cake.c  |   4 +-
 net/sched/sch_cbq.c   |  15 +--
 net/sched/sch_cbs.c   |   2 +-
 net/sched/sch_drr.c   |   8 +-
 net/sched/sch_dsmark.c|   6 +-
 net/sched/sch_fifo.c  |   2 +-
 net/sched/sch_fq_codel.c  |   4 +-
 net/sched/sch_generic.c   |  48 ++--
 net/sched/sch_hfsc.c  |  13 ++-
 net/sched/sch_htb.c   |  17 +--
 net/sched/sch_ingress.c   |  15 +--
 net/sched/sch_mq.c|   4 +-
 net/sched/sch_mqprio.c|   4 +-
 net/sched/sch_multiq.c|  10 +-
 net/sched/sch_netem.c |   2 +-
 net/sched/sch_prio.c  |  10 +-
 net/sched/sch_qfq.c   |   8 +-
 net/sched/sch_red.c   |   4 +-
 net/sched/sch_sfb.c   |   8 +-
 net/sched/sch_sfq.c   |   4 +-
 net/sched/sch_tbf.c   |   4 +-
 29 files changed, 394 insertions(+), 194 deletions(-)

-- 
2.7.5



[PATCH net-next 03/13] net: sched: extend Qdisc with rcu

2018-09-06 Thread Vlad Buslov
Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if the Qdisc reference counter reaches zero and is
intended to be used as an optimization.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/linux/rtnetlink.h |  5 +
 include/net/pkt_sched.h   |  1 +
 include/net/sch_generic.h |  2 ++
 net/sched/sch_api.c   | 18 ++
 net/sched/sch_generic.c   | 25 -
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dffbf665c086..d3dff3e41e6c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -84,6 +84,11 @@ static inline struct netdev_queue *dev_ingress_queue(struct 
net_device *dev)
return rtnl_dereference(dev->ingress_queue);
 }
 
+static inline struct netdev_queue *dev_ingress_queue_rcu(struct net_device 
*dev)
+{
+   return rcu_dereference(dev->ingress_queue);
+}
+
 struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
 
 #ifdef CONFIG_NET_INGRESS
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 7dc769e5452b..a16fbe9a2a67 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -102,6 +102,7 @@ int qdisc_set_default(const char *id);
 void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
struct nlattr *tab,
struct netlink_ext_ack *extack);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 18e22a5a6550..239c73f29471 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -90,6 +90,7 @@ struct Qdisc {
struct gnet_stats_queue __percpu *cpu_qstats;
int padded;
refcount_t  refcnt;
+   struct rcu_head rcu;
 
/*
 * For performance sake on SMP, we put highly modified fields at the end
@@ -555,6 +556,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue 
*dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
+void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 836b32e6e8e8..8854c9b674b8 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -315,6 +315,24 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 
handle)
return q;
 }
 
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
+{
+   struct Qdisc *q;
+   struct netdev_queue *nq;
+
+   if (!handle)
+   return NULL;
+   q = qdisc_match_from_root(dev->qdisc, handle);
+   if (q)
+   goto out;
+
+   nq = dev_ingress_queue_rcu(dev);
+   if (nq)
+   q = qdisc_match_from_root(nq->qdisc_sleeping, handle);
+out:
+   return q;
+}
+
 static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 {
unsigned long cl;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index bb778485ed88..2176fe9db750 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -941,6 +941,13 @@ void qdisc_free(struct Qdisc *qdisc)
kfree((char *) qdisc - qdisc->padded);
 }
 
+void qdisc_free_cb(struct rcu_head *head)
+{
+   struct Qdisc *q = container_of(head, struct Qdisc, rcu);
+
+   qdisc_free(q);
+}
+
 static void qdisc_destroy(struct Qdisc *qdisc)
 {
const struct Qdisc_ops  *ops = qdisc->ops;
@@ -970,7 +977,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
kfree_skb_list(skb);
}
 
-   qdisc_free(qdisc);
+   call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
 void qdisc_put(struct Qdisc *qdisc)
@@ -983,6 +990,22 @@ void qdisc_put(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_put);
 
+/* Version of qdisc_put() that is called with rtnl mutex unlocked.
+ * Intended to be used as an optimization, this function only takes rtnl
+ * lock if the qdisc reference counter reaches zero.
+ */
+
+void qdisc_put_unlocked(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN ||
+   !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
+   return;
+
+   qdisc_destroy(qdisc);
+   rtnl_unlock();
+}
EXPORT_SYMBOL(qdisc_put_unlocked);

[PATCH net-next 07/13] net: sched: implement functions to put and flush all chains

2018-09-06 Thread Vlad Buslov
Extract the code that flushes and puts all chains on a tcf block into
two standalone functions, to be shared with the functions that
locklessly get/put a reference to the block.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 55 +
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c3c7d4e2f84c..58b2d8443f6a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -538,6 +538,31 @@ static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
qdisc_put_unlocked(q);
 }
 
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain;
+
+   /* Hold a refcnt for all chains, so that they don't disappear
+* while we are iterating.
+*/
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_hold(chain);
+
+   list_for_each_entry(chain, &block->chain_list, list)
+   tcf_chain_flush(chain);
+}
+
+static void tcf_block_put_all_chains(struct tcf_block *block)
+{
+   struct tcf_chain *chain, *tmp;
+
+   /* At this point, all the chains should have refcnt >= 1. */
+   list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+   tcf_chain_put_explicitly_created(chain);
+   tcf_chain_put(chain);
+   }
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,8 +820,6 @@ EXPORT_SYMBOL(tcf_block_get);
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
   struct tcf_block_ext_info *ei)
 {
-   struct tcf_chain *chain, *tmp;
-
if (!block)
return;
tcf_chain0_head_change_cb_del(block, ei);
@@ -813,32 +836,14 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
 
if (tcf_block_shared(block))
tcf_block_remove(block, block->net);
-
-   if (!free_block) {
-   /* Hold a refcnt for all chains, so that they don't
-* disappear while we are iterating.
-*/
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_hold(chain);
-
-   list_for_each_entry(chain, &block->chain_list, list)
-   tcf_chain_flush(chain);
-   }
-
+   if (!free_block)
+   tcf_block_flush_all_chains(block);
tcf_block_offload_unbind(block, q, ei);
 
-   if (free_block) {
+   if (free_block)
kfree(block);
-   } else {
-   /* At this point, all the chains should have
-* refcnt >= 1.
-*/
-   list_for_each_entry_safe(chain, tmp, &block->chain_list,
-list) {
-   tcf_chain_put_explicitly_created(chain);
-   tcf_chain_put(chain);
-   }
-   }
+   else
+   tcf_block_put_all_chains(block);
} else {
tcf_block_offload_unbind(block, q, ei);
}
-- 
2.7.5



[PATCH net-next 11/13] net: sched: implement tcf_block_get() and tcf_block_put()

2018-09-06 Thread Vlad Buslov
Implement get/put function for blocks that only take/release the reference
and perform deallocation. These functions are intended to be used by
unlocked rules update path to always hold reference to block while working
with it. They use on new fine-grained locking mechanisms introduced in
previous patches in this set, instead of relying on global protection
provided by rtnl lock.

Extract code that is common with tcf_block_detach_ext() into common
function __tcf_block_put().

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 70 -
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f06aa9313a58..5d9f91331d26 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -537,6 +537,19 @@ static struct tcf_block *tcf_block_lookup(struct net *net, 
u32 block_index)
return idr_find(&tn->idr, block_index);
 }
 
+static struct tcf_block *tcf_block_get(struct net *net, u32 block_index)
+{
+   struct tcf_block *block;
+
+   rcu_read_lock();
+   block = tcf_block_lookup(net, block_index);
+   if (block && !refcount_inc_not_zero(&block->refcnt))
+   block = NULL;
+   rcu_read_unlock();
+
+   return block;
+}
+
 static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
 {
if (!q)
@@ -573,6 +586,40 @@ static void tcf_block_put_all_chains(struct tcf_block 
*block)
}
 }
 
+static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
+   struct tcf_block_ext_info *ei)
+{
+   if (refcount_dec_and_test(&block->refcnt)) {
+   /* Flushing/putting all chains will cause the block to be
+* deallocated when last chain is freed. However, if chain_list
+* is empty, block has to be manually deallocated. After block
+* reference counter reached 0, it is no longer possible to
+* increment it or add new chains to block.
+*/
+   bool free_block = list_empty(&block->chain_list);
+
+   if (tcf_block_shared(block))
+   tcf_block_remove(block, block->net);
+   if (!free_block)
+   tcf_block_flush_all_chains(block);
+
+   if (q)
+   tcf_block_offload_unbind(block, q, ei);
+
+   if (free_block)
+   kfree_rcu(block, rcu);
+   else
+   tcf_block_put_all_chains(block);
+   } else if (q) {
+   tcf_block_offload_unbind(block, q, ei);
+   }
+}
+
+static void tcf_block_put(struct tcf_block *block)
+{
+   __tcf_block_put(block, NULL, NULL);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -835,28 +882,7 @@ void tcf_block_detach_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_chain0_head_change_cb_del(block, ei);
tcf_block_owner_del(block, q, ei->binder_type);
 
-   if (refcount_dec_and_test(&block->refcnt)) {
-   /* Flushing/putting all chains will cause the block to be
-* deallocated when last chain is freed. However, if chain_list
-* is empty, block has to be manually deallocated. After block
-* reference counter reached 0, it is no longer possible to
-* increment it or add new chains to block.
-*/
-   bool free_block = list_empty(&block->chain_list);
-
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, block->net);
-   if (!free_block)
-   tcf_block_flush_all_chains(block);
-   tcf_block_offload_unbind(block, q, ei);
-
-   if (free_block)
-   kfree_rcu(block, rcu);
-   else
-   tcf_block_put_all_chains(block);
-   } else {
-   tcf_block_offload_unbind(block, q, ei);
-   }
+   __tcf_block_put(block, q, ei);
 }
 EXPORT_SYMBOL(tcf_block_detach_ext);
 
-- 
2.7.5
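
A sketch of how the pair is meant to be used on an rtnl-unlocked path
(hypothetical caller; patch 12 in this series applies this to the dump
handlers):

	block = tcf_block_get(net, block_index);	/* takes a reference, or NULL */
	if (!block)
		return -EINVAL;

	/* ... work with the block without holding rtnl ... */

	tcf_block_put(block);	/* may free the block on last reference */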



[PATCH net-next 08/13] net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()

2018-09-06 Thread Vlad Buslov
Functions tcf_block_get{_ext}() and tcf_block_put{_ext}() actually
attach/detach block to specific Qdisc besides just taking/putting
reference. Rename them according to their purpose.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/pkt_cls.h| 36 ++--
 net/sched/cls_api.c  | 31 +++
 net/sched/sch_atm.c  | 12 ++--
 net/sched/sch_cake.c |  4 ++--
 net/sched/sch_cbq.c  | 13 +++--
 net/sched/sch_drr.c  |  4 ++--
 net/sched/sch_dsmark.c   |  4 ++--
 net/sched/sch_fq_codel.c |  4 ++--
 net/sched/sch_hfsc.c | 11 ++-
 net/sched/sch_htb.c  | 13 +++--
 net/sched/sch_ingress.c  | 15 ---
 net/sched/sch_multiq.c   |  4 ++--
 net/sched/sch_prio.c |  4 ++--
 net/sched/sch_qfq.c  |  4 ++--
 net/sched/sch_sfb.c  |  4 ++--
 net/sched/sch_sfq.c  |  4 ++--
 16 files changed, 85 insertions(+), 82 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 75a3f3fdb359..9c11f8d83c1c 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -44,15 +44,15 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block 
*block,
   u32 chain_index);
 void tcf_chain_put_by_act(struct tcf_chain *chain);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
-int tcf_block_get(struct tcf_block **p_block,
- struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
- struct netlink_ext_ack *extack);
-int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
- struct tcf_block_ext_info *ei,
- struct netlink_ext_ack *extack);
-void tcf_block_put(struct tcf_block *block);
-void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
-  struct tcf_block_ext_info *ei);
+int tcf_block_attach(struct tcf_block **p_block,
+struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+struct netlink_ext_ack *extack);
+int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
+struct tcf_block_ext_info *ei,
+struct netlink_ext_ack *extack);
+void tcf_block_detach(struct tcf_block *block);
+void tcf_block_detach_ext(struct tcf_block *block, struct Qdisc *q,
+ struct tcf_block_ext_info *ei);
 
 static inline bool tcf_block_shared(struct tcf_block *block)
 {
@@ -92,28 +92,28 @@ int tcf_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
 
 #else
 static inline
-int tcf_block_get(struct tcf_block **p_block,
- struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
- struct netlink_ext_ack *extack)
+int tcf_block_attach(struct tcf_block **p_block,
+struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+struct netlink_ext_ack *extack)
 {
return 0;
 }
 
 static inline
-int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
- struct tcf_block_ext_info *ei,
- struct netlink_ext_ack *extack)
+int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
+struct tcf_block_ext_info *ei,
+struct netlink_ext_ack *extack)
 {
return 0;
 }
 
-static inline void tcf_block_put(struct tcf_block *block)
+static inline void tcf_block_detach(struct tcf_block *block)
 {
 }
 
 static inline
-void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
-  struct tcf_block_ext_info *ei)
+void tcf_block_detach_ext(struct tcf_block *block, struct Qdisc *q,
+ struct tcf_block_ext_info *ei)
 {
 }
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 58b2d8443f6a..f11da74dd339 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -731,9 +731,9 @@ static void tcf_block_owner_del(struct tcf_block *block,
WARN_ON(1);
 }
 
-int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
- struct tcf_block_ext_info *ei,
- struct netlink_ext_ack *extack)
+int tcf_block_attach_ext(struct tcf_block **p_block, struct Qdisc *q,
+struct tcf_block_ext_info *ei,
+struct netlink_ext_ack *extack)
 {
struct net *net = qdisc_net(q);
struct tcf_block *block = NULL;
@@ -791,7 +791,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct 
Qdisc *q,
}
return err;
 }
-EXPORT_SYMBOL(tcf_block_get_ext);
+EXPORT_SYMBOL(tcf_block_attach_ext);
 
 static void tcf_chain_head_change_dflt(struct tcf_proto *tp_head, void *priv)
 {
@@ -800,9 +800,9 @@ static void tcf_chain_head_change_dflt(struct tcf_proto 
*tp_head, void *priv)
rcu_assign_pointer(*p_filter_chain, tp_head);
 }
 
-int tcf_block_get(struct tcf_block **p_block,
- struct

[PATCH net-next 02/13] net: sched: rename qdisc_destroy() to qdisc_put()

2018-09-06 Thread Vlad Buslov
Current implementation of qdisc_destroy() decrements Qdisc reference
counter and only actually destroy Qdisc if reference counter value reached
zero. Rename qdisc_destroy() to qdisc_put() in order for it to better
describe the way in which this function currently implemented and used.

Extract code that deallocates Qdisc into new private qdisc_destroy()
function. It is intended to be shared between regular qdisc_put() and its
unlocked version that is introduced in next patch in this series.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h |  2 +-
 net/sched/sch_api.c   |  6 +++---
 net/sched/sch_atm.c   |  2 +-
 net/sched/sch_cbq.c   |  2 +-
 net/sched/sch_cbs.c   |  2 +-
 net/sched/sch_drr.c   |  4 ++--
 net/sched/sch_dsmark.c|  2 +-
 net/sched/sch_fifo.c  |  2 +-
 net/sched/sch_generic.c   | 23 ++-
 net/sched/sch_hfsc.c  |  2 +-
 net/sched/sch_htb.c   |  4 ++--
 net/sched/sch_mq.c|  4 ++--
 net/sched/sch_mqprio.c|  4 ++--
 net/sched/sch_multiq.c|  6 +++---
 net/sched/sch_netem.c |  2 +-
 net/sched/sch_prio.c  |  6 +++---
 net/sched/sch_qfq.c   |  4 ++--
 net/sched/sch_red.c   |  4 ++--
 net/sched/sch_sfb.c   |  4 ++--
 net/sched/sch_tbf.c   |  4 ++--
 20 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6d00093f35e..18e22a5a6550 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -554,7 +554,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
-void qdisc_destroy(struct Qdisc *qdisc);
+void qdisc_put(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
   unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 98541c6399db..836b32e6e8e8 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -921,7 +921,7 @@ static void notify_and_destroy(struct net *net, struct 
sk_buff *skb,
qdisc_notify(net, skb, n, clid, old, new);
 
if (old)
-   qdisc_destroy(old);
+   qdisc_put(old);
 }
 
 /* Graft qdisc "new" to class "classid" of qdisc "parent" or
@@ -974,7 +974,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc 
*parent,
qdisc_refcount_inc(new);
 
if (!ingress)
-   qdisc_destroy(old);
+   qdisc_put(old);
}
 
 skip:
@@ -1568,7 +1568,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct 
nlmsghdr *n,
err = qdisc_graft(dev, p, skb, n, clid, q, NULL, extack);
if (err) {
if (q)
-   qdisc_destroy(q);
+   qdisc_put(q);
return err;
}
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index cd49afca9617..d714d3747bcb 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -150,7 +150,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
pr_debug("atm_tc_put: destroying\n");
list_del_init(>list);
pr_debug("atm_tc_put: qdisc %p\n", flow->q);
-   qdisc_destroy(flow->q);
+   qdisc_put(flow->q);
tcf_block_put(flow->block);
if (flow->sock) {
pr_debug("atm_tc_put: f_count %ld\n",
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f42025d53cfe..4dc05409e3fb 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1418,7 +1418,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct 
cbq_class *cl)
WARN_ON(cl->filters);
 
tcf_block_put(cl->block);
-   qdisc_destroy(cl->q);
+   qdisc_put(cl->q);
qdisc_put_rtab(cl->R_tab);
gen_kill_estimator(&cl->rate_est);
if (cl != &q->link)
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index e26a24017faa..e689e11b6d0f 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -379,7 +379,7 @@ static void cbs_destroy(struct Qdisc *sch)
cbs_disable_offload(dev, q);
 
if (q->qdisc)
-   qdisc_destroy(q->qdisc);
+   qdisc_put(q->qdisc);
 }
 
 static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index e0b0cf8a9939..cdebaed0f8cf 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -134,7 +134,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, 
u32 parentid,
tca[TCA_RATE]);
if (err) {
NL_SET_ERR_MSG(extack, "Fail

[PATCH net-next 13/13] net: sched: add flags to Qdisc class ops struct

2018-09-06 Thread Vlad Buslov
Extend Qdisc_class_ops with flags. Create enum to hold possible class ops
flag values. Add first class ops flags value QDISC_CLASS_OPS_DOIT_UNLOCKED
to indicate that class ops functions can be called without taking rtnl
lock.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2b87b47c49f6..bc4082961726 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -174,6 +174,7 @@ static inline int qdisc_avail_bulklimit(const struct 
netdev_queue *txq)
 }
 
 struct Qdisc_class_ops {
+   unsigned int flags;
/* Child qdisc manipulation */
struct netdev_queue *   (*select_queue)(struct Qdisc *, struct tcmsg *);
int (*graft)(struct Qdisc *, unsigned long cl,
@@ -205,6 +206,13 @@ struct Qdisc_class_ops {
struct gnet_dump *);
 };
 
+/* Qdisc_class_ops flag values */
+
+/* Implements API that doesn't require rtnl lock */
+enum qdisc_class_ops_flags {
+   QDISC_CLASS_OPS_DOIT_UNLOCKED = 1,
+};
+
 struct Qdisc_ops {
struct Qdisc_ops *next;
const struct Qdisc_class_ops *cl_ops;
-- 
2.7.5
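
A hypothetical example of a qdisc advertising the new flag (no qdisc
sets it in this patch; the names are illustrative only):

static const struct Qdisc_class_ops example_class_ops = {
	.flags		= QDISC_CLASS_OPS_DOIT_UNLOCKED,
	.find		= example_find,
	.tcf_block	= example_tcf_block,
	/* remaining callbacks omitted */
};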



[PATCH net-next 12/13] net: sched: use reference counting for tcf blocks on rules update

2018-09-06 Thread Vlad Buslov
In order to remove dependency on rtnl lock on rules update path, always
take reference to block while using it on rules update path. Change
tcf_block_get() error handling to properly release block with reference
counting, instead of just destroying it, in order to accommodate potential
concurrent users.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 5d9f91331d26..8808818a1d24 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -633,7 +633,7 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
int err = 0;
 
if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, block_index);
+   block = tcf_block_get(net, block_index);
if (!block) {
NL_SET_ERR_MSG(extack, "Block of given index was not 
found");
return ERR_PTR(-EINVAL);
@@ -713,6 +713,14 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
err = -EOPNOTSUPP;
goto errout_qdisc;
}
+
+   /* Always take reference to block in order to support execution
+* of rules update path of cls API without rtnl lock. Caller
+* must release block when it is finished using it. The 'if' block
+* of this conditional obtains reference to block by calling
+* tcf_block_get().
+*/
+   refcount_inc(&block->refcnt);
}
 
return block;
@@ -726,6 +734,8 @@ static struct tcf_block *tcf_block_find(struct net *net, 
struct Qdisc **q,
 
 static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
 {
+   if (!IS_ERR_OR_NULL(block))
+   tcf_block_put(block);
tcf_qdisc_put(q, true);
 }
 
@@ -794,21 +804,16 @@ int tcf_block_attach_ext(struct tcf_block **p_block, 
struct Qdisc *q,
 {
struct net *net = qdisc_net(q);
struct tcf_block *block = NULL;
-   bool created = false;
int err;
 
-   if (ei->block_index) {
+   if (ei->block_index)
/* block_index not 0 means the shared block is requested */
-   block = tcf_block_lookup(net, ei->block_index);
-   if (block)
-   refcount_inc(&block->refcnt);
-   }
+   block = tcf_block_get(net, ei->block_index);
 
if (!block) {
block = tcf_block_create(net, q, ei->block_index, extack);
if (IS_ERR(block))
return PTR_ERR(block);
-   created = true;
if (tcf_block_shared(block)) {
err = tcf_block_insert(block, net, extack);
if (err)
@@ -838,14 +843,8 @@ int tcf_block_attach_ext(struct tcf_block **p_block, 
struct Qdisc *q,
 err_chain0_head_change_cb_add:
tcf_block_owner_del(block, q, ei->binder_type);
 err_block_owner_add:
-   if (created) {
-   if (tcf_block_shared(block))
-   tcf_block_remove(block, net);
 err_block_insert:
-   kfree_rcu(block, rcu);
-   } else {
-   refcount_dec(&block->refcnt);
-   }
+   tcf_block_put(block);
return err;
 }
 EXPORT_SYMBOL(tcf_block_attach_ext);
@@ -1738,7 +1737,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
return err;
 
if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, tcm->tcm_block_index);
+   block = tcf_block_get(net, tcm->tcm_block_index);
if (!block)
goto out;
/* If we work with block index, q is NULL and parent value
@@ -1797,6 +1796,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
}
}
 
+   if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+   tcf_block_put(block);
cb->args[0] = index;
 
 out:
@@ -2061,7 +2062,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct 
netlink_callback *cb)
return err;
 
if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-   block = tcf_block_lookup(net, tcm->tcm_block_index);
+   block = tcf_block_get(net, tcm->tcm_block_index);
if (!block)
goto out;
/* If we work with block index, q is NULL and parent value
@@ -2128,6 +2129,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct 
netlink_callback *cb)
index++;
}
 
+   if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+   tcf_block_put(block);
cb->args[0] = index;
 
 out:
-- 
2.7.5



[PATCH net-next 09/13] net: sched: extend tcf_block with rcu

2018-09-06 Thread Vlad Buslov
Extend tcf_block with rcu to allow safe deallocation when it is accessed
concurrently.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 include/net/sch_generic.h | 1 +
 net/sched/cls_api.c   | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 825e2bf6c5c3..2b87b47c49f6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -357,6 +357,7 @@ struct tcf_block {
struct tcf_chain *chain;
struct list_head filter_chain_list;
} chain0;
+   struct rcu_head rcu;
 };
 
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f11da74dd339..502b2da8a885 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -241,7 +241,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
block->chain0.chain = NULL;
kfree(chain);
if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
-   kfree(block);
+   kfree_rcu(block, rcu);
 }
 
 static void tcf_chain_hold(struct tcf_chain *chain)
@@ -785,7 +785,7 @@ int tcf_block_attach_ext(struct tcf_block **p_block, struct 
Qdisc *q,
if (tcf_block_shared(block))
tcf_block_remove(block, net);
 err_block_insert:
-   kfree(block);
+   kfree_rcu(block, rcu);
} else {
refcount_dec(&block->refcnt);
}
@@ -841,7 +841,7 @@ void tcf_block_detach_ext(struct tcf_block *block, struct 
Qdisc *q,
tcf_block_offload_unbind(block, q, ei);
 
if (free_block)
-   kfree(block);
+   kfree_rcu(block, rcu);
else
tcf_block_put_all_chains(block);
} else {
-- 
2.7.5
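
What the rcu protection enables is roughly the following lockless
lookup (a sketch; patch 11 in this series adds it as tcf_block_get()):

	rcu_read_lock();
	block = idr_find(&tn->idr, block_index);
	if (block && !refcount_inc_not_zero(&block->refcnt))
		block = NULL;	/* being freed concurrently; kfree_rcu() delays the free */
	rcu_read_unlock();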



[PATCH net-next 05/13] net: sched: use Qdisc rcu API instead of relying on rtnl lock

2018-09-06 Thread Vlad Buslov
As a preparation for removing rtnl lock dependency from rules update path,
use Qdisc rcu and reference counting capabilities instead of relying on
rtnl lock while working with Qdiscs. Create new tcf_block_release()
function, and use it to free resources taken by tcf_block_find().
Currently, this function only releases Qdisc and it is extended in next
patches in this series.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 net/sched/cls_api.c | 88 -
 1 file changed, 73 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1a67af8a6e8c..cfa4a02a6a1a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -527,6 +527,17 @@ static struct tcf_block *tcf_block_lookup(struct net *net, 
u32 block_index)
return idr_find(>idr, block_index);
 }
 
+static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
+{
+   if (!q)
+   return;
+
+   if (rtnl_held)
+   qdisc_put(q);
+   else
+   qdisc_put_unlocked(q);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -537,6 +548,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
struct netlink_ext_ack *extack)
 {
struct tcf_block *block;
+   int err = 0;
 
if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
block = tcf_block_lookup(net, block_index);
@@ -548,55 +560,91 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
const struct Qdisc_class_ops *cops;
struct net_device *dev;
 
+   rcu_read_lock();
+
/* Find link */
-   dev = __dev_get_by_index(net, ifindex);
-   if (!dev)
+   dev = dev_get_by_index_rcu(net, ifindex);
+   if (!dev) {
+   rcu_read_unlock();
return ERR_PTR(-ENODEV);
+   }
 
/* Find qdisc */
if (!*parent) {
*q = dev->qdisc;
*parent = (*q)->handle;
} else {
-   *q = qdisc_lookup(dev, TC_H_MAJ(*parent));
+   *q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
if (!*q) {
NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't 
exists");
-   return ERR_PTR(-EINVAL);
+   err = -EINVAL;
+   goto errout_rcu;
}
}
 
+   *q = qdisc_refcount_inc_nz(*q);
+   if (!*q) {
+   NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+   err = -EINVAL;
+   goto errout_rcu;
+   }
+
/* Is it classful? */
cops = (*q)->ops->cl_ops;
if (!cops) {
NL_SET_ERR_MSG(extack, "Qdisc not classful");
-   return ERR_PTR(-EINVAL);
+   err = -EINVAL;
+   goto errout_rcu;
}
 
if (!cops->tcf_block) {
NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-   return ERR_PTR(-EOPNOTSUPP);
+   err = -EOPNOTSUPP;
+   goto errout_rcu;
}
 
+   /* At this point we know that qdisc is not noop_qdisc,
+* which means that qdisc holds a reference to net_device
+* and we hold a reference to qdisc, so it is safe to release
+* rcu read lock.
+*/
+   rcu_read_unlock();
+
/* Do we search for filter, attached to class? */
if (TC_H_MIN(*parent)) {
*cl = cops->find(*q, *parent);
if (*cl == 0) {
NL_SET_ERR_MSG(extack, "Specified class doesn't 
exist");
-   return ERR_PTR(-ENOENT);
+   err = -ENOENT;
+   goto errout_qdisc;
}
}
 
/* And the last stroke */
block = cops->tcf_block(*q, *cl, extack);
-   if (!block)
-   return ERR_PTR(-EINVAL);
+   if (!block) {
+   err = -EINVAL;
+   goto errout_qdisc;
+   }
if (tcf_block_shared(block)) {
NL_SET_ERR_MSG(extack, "This filter block is shared. 
Please use the block index to manipulate the filters");
-   return ERR_PTR(-EOPNOTSUPP);
+   err = -EOPNOTSUPP;
+

Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete

2018-09-05 Thread Vlad Buslov


On Tue 04 Sep 2018 at 22:41, Cong Wang  wrote:
> On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov  wrote:
>>
>>
>> On Mon 03 Sep 2018 at 18:50, Cong Wang  wrote:
>> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov  wrote:
>> >>
>> >> Action API was changed to work with actions and action_idr in concurrency
>> >> safe manner, however tcf_del_walker() still uses actions without taking
>> >> reference to them first and deletes them directly, disregarding possible
>> >> concurrent delete.
>> >>
>> >> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
>> >> caller to hold reference to action and accepts action id as argument,
>> >> instead of direct action pointer.
>> >
>> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
>> > tcf_dump_walker() already does.
>>
>> Because tcf_del_walker() calls __tcf_idr_release(), which takes
>> idrinfo->lock itself (deadlock). It also calls sleeping functions like
>
> Deadlock can be easily resolved by moving the lock out.
>
>
>> tcf_action_goto_chain_fini(), so just implementing a function that
>> releases the action without taking idrinfo->lock is not enough.
>
> Sleeping can be resolved either by making it atomic or
> deferring it to a work queue.
>
> None of your arguments here is a blocker to locking
> idrinfo->lock. You really should focus on if it is really
> necessary to lock idrinfo->lock in tcf_del_walker(), rather
> than these details.
>
> For me, if you need idrinfo->lock for dump walker, you must
> need it for delete walker too, because deletion is a writer
> which should require stronger protection than the dumper,
> which merely a reader.

I don't see why it is necessary. The dump walker uses pointers to actions
directly, and in order to be concurrency-safe it must either hold the
lock or obtain a reference to the action. Note that the del walker doesn't
use the action pointer; it only passes the action id to
tcf_idr_delete_index(), which does all the necessary locking and can deal
with potential concurrency issues (concurrent delete, etc.). This approach
also benefits from reusing the code of other paths that delete actions,
instead of implementing its own.
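
To illustrate the difference, here is a rough sketch of the two loops
(simplified, not the exact kernel code):

	/* dump walker: p is dereferenced, so idrinfo->lock (or a taken
	 * reference) must protect every access to the action
	 */
	idr_for_each_entry_ul(idr, p, id) {
		if (tcf_action_dump_1(skb, p, 0, 0) < 0)
			break;
	}

	/* del walker: p itself is never dereferenced; only the index is
	 * forwarded, and tcf_idr_delete_index() takes idrinfo->lock and
	 * copes with a concurrent delete internally
	 */
	idr_for_each_entry_ul(idr, p, id) {
		ret = tcf_idr_delete_index(idrinfo, id);
		if (ret == ACT_P_DELETED)
			n_i++;
	}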



[PATCH net-next v2] net: sched: action_ife: take reference to meta module

2018-09-03 Thread Vlad Buslov
 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb 
cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
[  646.791474] RSP: 002b:7ffdef7d6b58 EFLAGS: 0246 ORIG_RAX: 
002e
[  646.799721] RAX: ffda RBX: 0024 RCX: 7f4163872150
[  646.807240] RDX:  RSI: 7ffdef7d6bd0 RDI: 0003
[  646.814760] RBP: 5b8b9482 R08: 0001 R09: 
[  646.822286] R10: 05e7 R11: 0246 R12: 7ffdef7dad20
[  646.829807] R13:  R14:  R15: 00679bc0
[  646.837360] irq event stamp: 6083
[  646.841043] hardirqs last  enabled at (6081): [] 
__call_rcu+0x17d/0x500
[  646.849882] hardirqs last disabled at (6083): [] 
trace_hardirqs_off_thunk+0x1a/0x1c
[  646.859775] softirqs last  enabled at (5968): [] 
__do_softirq+0x4a1/0x6ee
[  646.868784] softirqs last disabled at (6082): [] 
tcf_ife_cleanup+0x39/0x200 [act_ife]
[  646.878845] ---[ end trace b1b8c12ffe51e657 ]---

Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock")
Signed-off-by: Vlad Buslov 
---

Changes V1->V2:
- fold constants into helper function

 net/sched/act_ife.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 19454146f60d..ffd77e3fc2b6 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -326,6 +326,20 @@ static int __add_metainfo(const struct tcf_meta_ops *ops,
return ret;
 }
 
+static int add_metainfo_and_get_ops(const struct tcf_meta_ops *ops,
+   struct tcf_ife_info *ife, u32 metaid,
+   bool exists)
+{
+   int ret;
+
+   if (!try_module_get(ops->owner))
+   return -ENOENT;
+   ret = __add_metainfo(ops, ife, metaid, NULL, 0, true, exists);
+   if (ret)
+   module_put(ops->owner);
+   return ret;
+}
+
 static int add_metainfo(struct tcf_ife_info *ife, u32 metaid, void *metaval,
int len, bool exists)
 {
@@ -349,7 +363,7 @@ static int use_all_metadata(struct tcf_ife_info *ife, bool exists)
 
read_lock(&ife_mod_lock);
list_for_each_entry(o, &ifeoplist, list) {
-   rc = __add_metainfo(o, ife, o->metaid, NULL, 0, true, exists);
+   rc = add_metainfo_and_get_ops(o, ife, o->metaid, exists);
if (rc == 0)
installed += 1;
}
-- 
2.7.5



Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete

2018-09-03 Thread Vlad Buslov


On Mon 03 Sep 2018 at 18:50, Cong Wang  wrote:
> On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov  wrote:
>>
>> Action API was changed to work with actions and action_idr in concurrency
>> safe manner, however tcf_del_walker() still uses actions without taking
>> reference to them first and deletes them directly, disregarding possible
>> concurrent delete.
>>
>> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
>> caller to hold reference to action and accepts action id as argument,
>> instead of direct action pointer.
>
> Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
> tcf_dump_walker() already does.

Because tcf_del_walker() calls __tcf_idr_release(), which takes
idrinfo->lock itself (deadlock). It also calls sleeping functions like
tcf_action_goto_chain_fini(), so just implementing a function that
releases the action without taking idrinfo->lock is not enough.


Re: [PATCH net-next] net: sched: action_ife: take reference to meta module

2018-09-03 Thread Vlad Buslov


On Mon 03 Sep 2018 at 17:03, Cong Wang  wrote:
> On Mon, Sep 3, 2018 at 12:10 AM Vlad Buslov  wrote:
>>
>> Recent refactoring of add_metainfo() caused use_all_metadata() to add
>> metainfo to ife action metalist without taking reference to module. This
>> causes warning in module_put called from ife action cleanup function.
>>
>> Implement add_metainfo_and_get_ops() function that returns with reference
>> to module taken if metainfo was added successfully, and call it from
>> use_all_metadata(), instead of calling __add_metainfo() directly.
>
> Good catch!
>
> I thought every entry in ifeoplist must hold a refcnt to its module, looks
> like I was wrong.
>
>
>> read_lock(_mod_lock);
>> list_for_each_entry(o, , list) {
>> -   rc = __add_metainfo(o, ife, o->metaid, NULL, 0, true, exists);
>> +   rc = add_metainfo_and_get_ops(o, ife, o->metaid, NULL, 0, true,
>> + exists);
>> if (rc == 0)
>> installed += 1;
>
> I am afraid you have to rollback on failure inside this loop, that is,
> releasing all previous module refcnt properly on error.

Do I? This function looks like it is explicitly designed to succeed if at
least one metainfo was successfully added, and this is how it was
originally implemented before the deadlock fix refactoring.
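
As I read it, the intended semantics are roughly the following (a sketch
simplified from the actual loop; locking and unwind omitted):

	/* partial success is fine: count what was installed and fail
	 * only when nothing could be added
	 */
	installed = 0;
	list_for_each_entry(o, &ifeoplist, list) {
		rc = add_metainfo_and_get_ops(o, ife, o->metaid, NULL, 0,
					      true, exists);
		if (rc == 0)
			installed += 1;
	}
	if (!installed)
		return -EINVAL;	/* sketch: no metainfo was added */
	return 0;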


[PATCH net-next] net: sched: action_ife: take reference to meta module

2018-09-03 Thread Vlad Buslov
 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb 
cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
[  646.791474] RSP: 002b:7ffdef7d6b58 EFLAGS: 0246 ORIG_RAX: 
002e
[  646.799721] RAX: ffda RBX: 0024 RCX: 7f4163872150
[  646.807240] RDX:  RSI: 7ffdef7d6bd0 RDI: 0003
[  646.814760] RBP: 5b8b9482 R08: 0001 R09: 
[  646.822286] R10: 05e7 R11: 0246 R12: 7ffdef7dad20
[  646.829807] R13:  R14:  R15: 00679bc0
[  646.837360] irq event stamp: 6083
[  646.841043] hardirqs last  enabled at (6081): [] 
__call_rcu+0x17d/0x500
[  646.849882] hardirqs last disabled at (6083): [] 
trace_hardirqs_off_thunk+0x1a/0x1c
[  646.859775] softirqs last  enabled at (5968): [] 
__do_softirq+0x4a1/0x6ee
[  646.868784] softirqs last disabled at (6082): [] 
tcf_ife_cleanup+0x39/0x200 [act_ife]
[  646.878845] ---[ end trace b1b8c12ffe51e657 ]---

Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock")
Signed-off-by: Vlad Buslov 
---
 net/sched/act_ife.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 19454146f60d..d8ea10259934 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -326,6 +326,21 @@ static int __add_metainfo(const struct tcf_meta_ops *ops,
return ret;
 }
 
+static int add_metainfo_and_get_ops(const struct tcf_meta_ops *ops,
+   struct tcf_ife_info *ife, u32 metaid,
+   void *metaval, int len, bool atomic,
+   bool exists)
+{
+   int ret;
+
+   if (!try_module_get(ops->owner))
+   return -ENOENT;
+   ret = __add_metainfo(ops, ife, metaid, metaval, len, false, exists);
+   if (ret)
+   module_put(ops->owner);
+   return ret;
+}
+
 static int add_metainfo(struct tcf_ife_info *ife, u32 metaid, void *metaval,
int len, bool exists)
 {
@@ -349,7 +364,8 @@ static int use_all_metadata(struct tcf_ife_info *ife, bool exists)
 
read_lock(&ife_mod_lock);
list_for_each_entry(o, &ifeoplist, list) {
-   rc = __add_metainfo(o, ife, o->metaid, NULL, 0, true, exists);
+   rc = add_metainfo_and_get_ops(o, ife, o->metaid, NULL, 0, true,
+ exists);
if (rc == 0)
installed += 1;
}
-- 
2.7.5



[PATCH net-next] net: sched: act_nat: remove dependency on rtnl lock

2018-09-03 Thread Vlad Buslov
According to the new locking rule, we have to take tcf_lock for both
->init() and ->dump(), as RTNL will be removed.

Use tcf spinlock to protect private nat action data from concurrent
modification during dump. (nat init already uses tcf spinlock when changing
action state)

Signed-off-by: Vlad Buslov 
---
 net/sched/act_nat.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index d98f33fdffe2..c5c1e23add77 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -256,28 +256,31 @@ static int tcf_nat_dump(struct sk_buff *skb, struct tc_action *a,
unsigned char *b = skb_tail_pointer(skb);
struct tcf_nat *p = to_tcf_nat(a);
struct tc_nat opt = {
-   .old_addr = p->old_addr,
-   .new_addr = p->new_addr,
-   .mask = p->mask,
-   .flags= p->flags,
-
.index= p->tcf_index,
-   .action   = p->tcf_action,
.refcnt   = refcount_read(&p->tcf_refcnt) - ref,
.bindcnt  = atomic_read(&p->tcf_bindcnt) - bind,
};
struct tcf_t t;
 
+   spin_lock_bh(&p->tcf_lock);
+   opt.old_addr = p->old_addr;
+   opt.new_addr = p->new_addr;
+   opt.mask = p->mask;
+   opt.flags = p->flags;
+   opt.action = p->tcf_action;
+
if (nla_put(skb, TCA_NAT_PARMS, sizeof(opt), &opt))
goto nla_put_failure;

tcf_tm_dump(&t, &p->tcf_tm);
if (nla_put_64bit(skb, TCA_NAT_TM, sizeof(t), &t, TCA_NAT_PAD))
goto nla_put_failure;
+   spin_unlock_bh(&p->tcf_lock);
 
return skb->len;
 
 nla_put_failure:
+   spin_unlock_bh(&p->tcf_lock);
nlmsg_trim(skb, b);
return -1;
 }
-- 
2.7.5



[PATCH net-next] net: sched: act_skbedit: remove dependency on rtnl lock

2018-09-03 Thread Vlad Buslov
According to the new locking rule, we have to take tcf_lock for both
->init() and ->dump(), as RTNL will be removed.

Use tcf lock to protect skbedit action struct private data from concurrent
modification in init and dump. Use rcu swap operation to reassign the params
pointer under protection of tcf lock. (The old params value is not used by
init, so there is no need for a standalone rcu dereference step.)

Remove rtnl lock assertion that is no longer required.

Signed-off-by: Vlad Buslov 
---
 net/sched/act_skbedit.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index b6263704ea57..64dba3708fce 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -99,7 +99,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
struct netlink_ext_ack *extack)
 {
struct tc_action_net *tn = net_generic(net, skbedit_net_id);
-   struct tcf_skbedit_params *params_old, *params_new;
+   struct tcf_skbedit_params *params_new;
struct nlattr *tb[TCA_SKBEDIT_MAX + 1];
struct tc_skbedit *parm;
struct tcf_skbedit *d;
@@ -187,8 +187,6 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
}
}
 
-   ASSERT_RTNL();
-
params_new = kzalloc(sizeof(*params_new), GFP_KERNEL);
if (unlikely(!params_new)) {
if (ret == ACT_P_CREATED)
@@ -210,11 +208,13 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
if (flags & SKBEDIT_F_MASK)
params_new->mask = *mask;
 
+   spin_lock_bh(&d->tcf_lock);
d->tcf_action = parm->action;
-   params_old = rtnl_dereference(d->params);
-   rcu_assign_pointer(d->params, params_new);
-   if (params_old)
-   kfree_rcu(params_old, rcu);
+   rcu_swap_protected(d->params, params_new,
+  lockdep_is_held(&d->tcf_lock));
+   spin_unlock_bh(&d->tcf_lock);
+   if (params_new)
+   kfree_rcu(params_new, rcu);
 
if (ret == ACT_P_CREATED)
tcf_idr_insert(tn, *a);
@@ -231,12 +231,14 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct 
tc_action *a,
.index   = d->tcf_index,
.refcnt  = refcount_read(&d->tcf_refcnt) - ref,
.bindcnt = atomic_read(&d->tcf_bindcnt) - bind,
-   .action  = d->tcf_action,
};
u64 pure_flags = 0;
struct tcf_t t;
 
-   params = rtnl_dereference(d->params);
+   spin_lock_bh(&d->tcf_lock);
+   params = rcu_dereference_protected(d->params,
+  lockdep_is_held(&d->tcf_lock));
+   opt.action = d->tcf_action;
 
if (nla_put(skb, TCA_SKBEDIT_PARMS, sizeof(opt), &opt))
goto nla_put_failure;
@@ -264,9 +266,12 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
tcf_tm_dump(&t, &d->tcf_tm);
if (nla_put_64bit(skb, TCA_SKBEDIT_TM, sizeof(t), &t, TCA_SKBEDIT_PAD))
goto nla_put_failure;
+   spin_unlock_bh(&d->tcf_lock);
+
return skb->len;
 
 nla_put_failure:
+   spin_unlock_bh(&d->tcf_lock);
nlmsg_trim(skb, b);
return -1;
 }
-- 
2.7.5



[PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete

2018-09-03 Thread Vlad Buslov
Action API was changed to work with actions and action_idr in concurrency
safe manner, however tcf_del_walker() still uses actions without taking
reference to them first and deletes them directly, disregarding possible
concurrent delete.

Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
caller to hold reference to action and accepts action id as argument,
instead of direct action pointer.

Modify tcf_idr_delete_index() to return ACT_P_DELETED instead of 0 when the
action was deleted successfully. This is necessary to allow
tcf_del_walker() to count deleted actions.

Signed-off-by: Vlad Buslov 
---
 net/sched/act_api.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 398c752ff529..d593114e7d2f 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -246,6 +246,8 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
goto done;
 }
 
+static int tcf_idr_delete_index(struct tcf_idrinfo *idrinfo, u32 index);
+
 static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
  const struct tc_action_ops *ops)
 {
@@ -263,13 +265,11 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
goto nla_put_failure;
 
idr_for_each_entry_ul(idr, p, id) {
-   ret = __tcf_idr_release(p, false, true);
-   if (ret == ACT_P_DELETED) {
-   module_put(ops->owner);
+   ret = tcf_idr_delete_index(idrinfo, id);
+   if (ret == ACT_P_DELETED)
n_i++;
-   } else if (ret < 0) {
+   else if (ret < 0)
goto nla_put_failure;
-   }
}
if (nla_put_u32(skb, TCA_FCNT, n_i))
goto nla_put_failure;
@@ -343,7 +343,7 @@ static int tcf_idr_delete_index(struct tcf_idrinfo *idrinfo, u32 index)
 
tcf_action_cleanup(p);
module_put(owner);
-   return 0;
+   return ACT_P_DELETED;
}
ret = 0;
} else {
-- 
2.7.5



[PATCH net-next] net: sched: null actions array pointer before releasing action

2018-09-03 Thread Vlad Buslov
 kfree+0xf4/0x2f0
[  807.830080]  __tcf_action_put+0x5a/0xb0
[  807.834281]  tcf_action_put_many+0x46/0x70
[  807.838747]  tca_action_gd+0x232/0xc40
[  807.842862]  tc_ctl_action+0x215/0x230
[  807.846977]  rtnetlink_rcv_msg+0x56a/0x6d0
[  807.851444]  netlink_rcv_skb+0x18d/0x200
[  807.855731]  netlink_unicast+0x2d0/0x370
[  807.860021]  netlink_sendmsg+0x3b9/0x6a0
[  807.864312]  sock_sendmsg+0x6b/0x80
[  807.868166]  ___sys_sendmsg+0x4a1/0x520
[  807.872372]  __sys_sendmsg+0xd7/0x150
[  807.876401]  do_syscall_64+0x72/0x2c0
[  807.880431]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  807.887704] The buggy address belongs to the object at 88033e636000
which belongs to the cache kmalloc-256 of size 256
[  807.900909] The buggy address is located 0 bytes inside of
256-byte region [88033e636000, 88033e636100)
[  807.913155] The buggy address belongs to the page:
[  807.918322] page:ea000cf98d80 count:1 mapcount:0 
mapping:88036f80ee00 index:0x0 compound_mapcount: 0
[  807.928831] flags: 0x5fff808100(slab|head)
[  807.933647] raw: 005fff808100 ea000db44f00 00040004 
88036f80ee00
[  807.942050] raw:  80190019 0001 

[  807.950456] page dumped because: kasan: bad access detected

[  807.958240] Memory state around the buggy address:
[  807.963405]  88033e635f00: fc fc fc fc fb fb fb fb fb fb fb fc fc fc fc 
fb
[  807.971288]  88033e635f80: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc 
fc
[  807.979166] >88033e636000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  807.994882]^
[  807.998477]  88033e636080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  808.006352]  88033e636100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb 
fb
[  808.014230] 
==
[  808.022108] Disabling lock debugging due to kernel taint

Fixes: edfaf94fa705 ("net_sched: improve and refactor tcf_action_put_many()")
Signed-off-by: Vlad Buslov 
---
 net/sched/act_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 398c752ff529..b8540bcaa8fc 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1175,6 +1175,7 @@ static int tcf_action_delete(struct net *net, struct tc_action *actions[])
struct tcf_idrinfo *idrinfo = a->idrinfo;
u32 act_index = a->tcfa_index;
 
+   actions[i] = NULL;
if (tcf_action_put(a)) {
/* last reference, action was deleted concurrently */
module_put(ops->owner);
@@ -1186,7 +1187,6 @@ static int tcf_action_delete(struct net *net, struct tc_action *actions[])
if (ret < 0)
return ret;
}
-   actions[i] = NULL;
}
return 0;
 }
-- 
2.7.5



Re: [Patch net-next] net_sched: add missing tcf_lock for act_connmark

2018-08-30 Thread Vlad Buslov
On Wed 29 Aug 2018 at 17:15, Cong Wang  wrote:
> According to the new locking rule, we have to take tcf_lock
> for both ->init() and ->dump(), as RTNL will be removed.
> However, it is missing for act_connmark.

Thank you for finding and fixing this!

>
> Cc: Vlad Buslov 
> Signed-off-by: Cong Wang 
> ---
>  net/sched/act_connmark.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
> index e869c0ee63c8..8475913f2070 100644
> --- a/net/sched/act_connmark.c
> +++ b/net/sched/act_connmark.c
> @@ -143,8 +143,10 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
>   return -EEXIST;
>   }
>   /* replacing action and zone */
> + spin_lock_bh(&ci->tcf_lock);
>   ci->tcf_action = parm->action;
>   ci->zone = parm->zone;
> + spin_unlock_bh(&ci->tcf_lock);
>   ret = 0;
>   }
>  
> @@ -156,16 +158,16 @@ static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
>  {
>   unsigned char *b = skb_tail_pointer(skb);
>   struct tcf_connmark_info *ci = to_connmark(a);
> -
>   struct tc_connmark opt = {
>   .index   = ci->tcf_index,
> .refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
> .bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
> - .action  = ci->tcf_action,
> - .zone   = ci->zone,
>   };
>   struct tcf_t t;
>  
> + spin_lock_bh(&ci->tcf_lock);
> + opt.action = ci->tcf_action;
> + opt.zone = ci->zone;
> if (nla_put(skb, TCA_CONNMARK_PARMS, sizeof(opt), &opt))
>   goto nla_put_failure;
>  
> @@ -173,9 +175,12 @@ static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
> if (nla_put_64bit(skb, TCA_CONNMARK_TM, sizeof(t), &t,
> TCA_CONNMARK_PAD))
>   goto nla_put_failure;
> + spin_unlock_bh(&ci->tcf_lock);
>  
>   return skb->len;
> +
>  nla_put_failure:
> + spin_unlock_bh(&ci->tcf_lock);
>   nlmsg_trim(skb, b);
>   return -1;
>  }



[PATCH net-next] net: sched: always disable bh when taking tcf_lock

2018-08-14 Thread Vlad Buslov
 
[  105.914997] R10:  R11:  R12: 0004
[  105.922487] R13: b6636140 R14: b66362d8 R15: 00188d36091b
[  105.929988]  ? trace_hardirqs_on_caller+0x141/0x2d0
[  105.935232]  do_idle+0x28e/0x320
[  105.938817]  ? arch_cpu_idle_exit+0x40/0x40
[  105.943361]  ? mark_lock+0x8c1/0x980
[  105.947295]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[  105.952619]  cpu_startup_entry+0xc2/0xd0
[  105.956900]  ? cpu_in_idle+0x20/0x20
[  105.960830]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[  105.966146]  ? trace_hardirqs_on_caller+0x141/0x2d0
[  105.971391]  start_secondary+0x2b5/0x360
[  105.975669]  ? set_cpu_sibling_map+0x1330/0x1330
[  105.980654]  secondary_startup_64+0xa5/0xb0

Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding a possible irq lock inversion dependency between tcf_lock
and psample_groups_lock, which is taken while holding tcf_lock in sample init:

[  162.108959]  Possible interrupt unsafe locking scenario:

[  162.116386]        CPU0                    CPU1
[  162.121277]        ----                    ----
[  162.126162]   lock(psample_groups_lock);
[  162.130447]                                local_irq_disable();
[  162.136772]                                lock(&(&p->tcfa_lock)->rlock);
[  162.143957]                                lock(psample_groups_lock);
[  162.150813]   <Interrupt>
[  162.153808]     lock(&(&p->tcfa_lock)->rlock);
[  162.158608]
*** DEADLOCK ***

In order to prevent the potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract the call to psample_group_get() from the
tcf_lock-protected section in the sample action init function.

Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl 
lock")
Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
Signed-off-by: Vlad Buslov 
---
 net/sched/act_bpf.c| 10 +-
 net/sched/act_csum.c   | 10 +-
 net/sched/act_gact.c   | 10 +-
 net/sched/act_mirred.c | 16 
 net/sched/act_sample.c | 25 ++---
 net/sched/act_tunnel_key.c | 10 +-
 net/sched/act_vlan.c   | 10 +-
 7 files changed, 47 insertions(+), 44 deletions(-)

diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 9e8a33f9fee3..3d56ab69614c 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -147,7 +147,7 @@ static int tcf_bpf_dump(struct sk_buff *skb, struct tc_action *act,
struct tcf_t tm;
int ret;
 
-   spin_lock(&prog->tcf_lock);
+   spin_lock_bh(&prog->tcf_lock);
opt.action = prog->tcf_action;
if (nla_put(skb, TCA_ACT_BPF_PARMS, sizeof(opt), &opt))
goto nla_put_failure;
@@ -164,11 +164,11 @@ static int tcf_bpf_dump(struct sk_buff *skb, struct tc_action *act,
  TCA_ACT_BPF_PAD))
goto nla_put_failure;
 
-   spin_unlock(&prog->tcf_lock);
+   spin_unlock_bh(&prog->tcf_lock);
return skb->len;
 
 nla_put_failure:
-   spin_unlock(&prog->tcf_lock);
+   spin_unlock_bh(&prog->tcf_lock);
nlmsg_trim(skb, tp);
return -1;
 }
@@ -340,7 +340,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
prog = to_bpf(*act);
 
-   spin_lock(&prog->tcf_lock);
+   spin_lock_bh(&prog->tcf_lock);
if (res != ACT_P_CREATED)
tcf_bpf_prog_fill_cfg(prog, &cfg);
 
@@ -352,7 +352,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
prog->tcf_action = parm->action;
rcu_assign_pointer(prog->filter, cfg.filter);
-   spin_unlock(&prog->tcf_lock);
+   spin_unlock_bh(&prog->tcf_lock);
 
if (res == ACT_P_CREATED) {
tcf_idr_insert(tn, *act);
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index f01c59ba6d12..422fdcd9b196 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -96,11 +96,11 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla,
}
params_new->update_flags = parm->update_flags;
 
-   spin_lock(&p->tcf_lock);
+   spin_lock_bh(&p->tcf_lock);
p->tcf_action = parm->action;
rcu_swap_protected(p->params, params_new,
   lockdep_is_held(&p->tcf_lock));
-   spin_unlock(&p->tcf_lock);
+   spin_unlock_bh(&p->tcf_lock);
 
if (params_new)
kfree_rcu(params_new, rcu);
@@ -604,7 +604,7 @@ static int 

Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-14 Thread Vlad Buslov


On Mon 13 Aug 2018 at 23:18, Cong Wang  wrote:
> Hi, Vlad,
>
> Could you help to test my fixes?
>
> I just pushed them into my own git repo:
> https://github.com/congwang/linux/commits/net-sched-fixes
>
> Particularly, this is the revert:
> https://github.com/congwang/linux/commit/b3f51c4ab8272cc8d3244848e528fce1426c4659
> and this is my fix for the lockdep warning you reported:
> https://github.com/congwang/linux/commit/ecadcde94919183e9f0d5bc376f05e731baf2661
>
> I don't have environment to test ife modules.

Hi Cong,

I've run the test with your patch applied and couldn't reproduce the
lockdep warning.

>
> BTW, this is the fix for the deadlock I spotted:
> https://github.com/congwang/linux/commit/44f3d7f5b6ed2d4a46177e6c658fa23b76141afa
>
> Thanks!



[PATCH net-next] net: sched: act_ife: always release ife action on init error

2018-08-14 Thread Vlad Buslov
Action init API was changed to always take a reference to the action, even
when overwriting an existing action. Substitute the conditional action
release, which was executed only if the action was newly created, with an
unconditional release in tcf_ife_init() error handling code, to prevent a
double free or memory leak in case of overwrite.

Fixes: 4e8ddd7f1758 ("net: sched: don't release reference on action overwrite")
Reported-by: Cong Wang 
Signed-off-by: Vlad Buslov 
---
 net/sched/act_ife.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 5d200495e467..c524edcad900 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -551,9 +551,6 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
   NULL, NULL);
if (err) {
 metadata_parse_err:
-   if (ret == ACT_P_CREATED)
-   tcf_idr_release(*a, bind);
-
if (exists)
spin_unlock_bh(&ife->tcf_lock);
tcf_idr_release(*a, bind);
@@ -574,11 +571,10 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 */
err = use_all_metadata(ife);
if (err) {
-   if (ret == ACT_P_CREATED)
-   tcf_idr_release(*a, bind);
-
if (exists)
spin_unlock_bh(&ife->tcf_lock);
+   tcf_idr_release(*a, bind);
+
kfree(p);
return err;
}
-- 
2.7.5



Re: [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite

2018-08-14 Thread Vlad Buslov
On Mon 13 Aug 2018 at 23:00, Cong Wang  wrote:
> On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov  wrote:
>> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
>> index 89a761395c94..acea3feae762 100644
>> --- a/net/sched/act_ife.c
>> +++ b/net/sched/act_ife.c
> ...
>> @@ -548,6 +546,8 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
>>
>> if (exists)
>> spin_unlock_bh(&ife->tcf_lock);
>> +   tcf_idr_release(*a, bind);
>> +
>> kfree(p);
>> return err;
>> }
>
> With this change, you seem release it twice when nla_parse_nested() fails
> for ACT_P_CREATED case...?

Thank you, great catch!

>
> Looks like what you want is the following?
>
> if (err) {
> tcf_idr_release(*a, bind);
> kfree(p);
> return err;
> }

Yes. Sending the fix.


Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Vlad Buslov


On Mon 13 Aug 2018 at 17:23, Jamal Hadi Salim  wrote:
> On 2018-08-13 1:20 p.m., Vlad Buslov wrote:
>> Lockdep reports deadlock for following locking scenario in ife action:
>> 
>> Task one:
>> 1) Executes ife action update.
>> 2) Takes tcfa_lock.
>> 3) Waits on ife_mod_lock which is already taken by task two.
>> 
>> Task two:
>> 
>> 1) Executes any path that obtains ife_mod_lock without disabling bh (any
>> path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like
>> loading a meta module, or creating new action.
>> 2) Takes ife_mod_lock.
>> 3) Task is preempted by rate estimator timer.
>> 4) Timer callback waits on tcfa_lock which is taken by task one.
>> 
>> In described case tasks deadlock because they take same two locks in
>> different order. To prevent potential deadlock reported by lockdep, always
>> disable bh when obtaining ife_mod_lock.
>> 
>
> Looks like your recent changes on net-next exposed this.

It's because I've recently expanded my private tests to create all kinds
of actions with estimators.

>
> Acked-by: Jamal Hadi Salim 
>
> cheers,
> jamal



Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Vlad Buslov
Hi David,

Is it okay to submit a fix for an issue I uncovered when testing actions
with estimators, or should I resubmit to net when net-next is moved?

Thanks,
Vlad


[PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-13 Thread Vlad Buslov
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  509.008722]SOFTIRQ-ON-W at:\
[  509.012242] _raw_write_lock+0x2c/0x40
[  509.018013] register_ife_op+0x118/0x2c0 [act_ife]
[  509.024841] do_one_initcall+0xf7/0x4d9
[  509.030720] do_init_module+0x18b/0x44e
[  509.036604] load_module+0x4167/0x5730
[  509.042397] __do_sys_finit_module+0x16d/0x1a0
[  509.048865] do_syscall_64+0x7a/0x3f0
[  509.054551] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  509.061636]SOFTIRQ-ON-R at:
[  509.065145] _raw_read_lock+0x2f/0x40
[  509.070854] find_ife_oplist+0x1e/0xc0 [act_ife]
[  509.077515] tcf_ife_init+0x82f/0xf40 [act_ife]
[  509.084051] tcf_action_init_1+0x510/0x750
[  509.090172] tcf_action_init+0x1e8/0x340
[  509.096124] tcf_action_add+0xc5/0x240
[  509.101891] tc_ctl_action+0x203/0x2a0
[  509.107671] rtnetlink_rcv_msg+0x5bd/0x7b0
[  509.113811] netlink_rcv_skb+0x184/0x220
[  509.119768] netlink_unicast+0x31b/0x460
[  509.125716] netlink_sendmsg+0x3fb/0x840
[  509.131668] sock_sendmsg+0x7b/0xd0
[  509.137167] ___sys_sendmsg+0x4c6/0x610
[  509.143010] __sys_sendmsg+0xd7/0x150
[  509.148718] do_syscall_64+0x7a/0x3f0
[  509.154443] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  509.161533]INITIAL USE at:
[  509.164956]_raw_read_lock+0x2f/0x40
[  509.170574]find_ife_oplist+0x1e/0xc0 [act_ife]
[  509.177134]tcf_ife_init+0x82f/0xf40 [act_ife]
[  509.183619]tcf_action_init_1+0x510/0x750
[  509.189674]tcf_action_init+0x1e8/0x340
[  509.195534]tcf_action_add+0xc5/0x240
[  509.201229]tc_ctl_action+0x203/0x2a0
[  509.206920]rtnetlink_rcv_msg+0x5bd/0x7b0
[  509.212936]netlink_rcv_skb+0x184/0x220
[  509.218818]netlink_unicast+0x31b/0x460
[  509.224699]netlink_sendmsg+0x3fb/0x840
[  509.230581]sock_sendmsg+0x7b/0xd0
[  509.235984]___sys_sendmsg+0x4c6/0x610
[  509.241791]__sys_sendmsg+0xd7/0x150
[  509.247425]do_syscall_64+0x7a/0x3f0
[  509.253007]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  509.259975]  }
[  509.261998]  ... key  at: [] 
ife_mod_lock+0x18/0x8dc0 [act_ife]
[  509.271569]  ... acquired at:
[  509.274912]_raw_read_lock+0x2f/0x40
[  509.279134]find_ife_oplist+0x1e/0xc0 [act_ife]
[  509.284324]tcf_ife_init+0x82f/0xf40 [act_ife]
[  509.289425]tcf_action_init_1+0x510/0x750
[  509.294068]tcf_action_init+0x1e8/0x340
[  509.298553]tcf_action_add+0xc5/0x240
[  509.302854]tc_ctl_action+0x203/0x2a0
[  509.307153]rtnetlink_rcv_msg+0x5bd/0x7b0
[  509.311805]netlink_rcv_skb+0x184/0x220
[  509.316282]netlink_unicast+0x31b/0x460
[  509.320769]netlink_sendmsg+0x3fb/0x840
[  509.325248]sock_sendmsg+0x7b/0xd0
[  509.329290]___sys_sendmsg+0x4c6/0x610
[  509.333687]__sys_sendmsg+0xd7/0x150
[  509.337902]do_syscall_64+0x7a/0x3f0
[  509.342116]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  509.349601]
   stack backtrace:
[  509.354663] CPU: 6 PID: 5460 Comm: tc Not tainted 4.18.0-rc8+ #646
[  509.361216] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 
03/30/2017

Fixes: ef6980b6becb ("introduce IFE action")
Signed-off-by: Vlad Buslov 
---
 net/sched/act_ife.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 5d200495e467..fdb928ca81bb 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -167,16 +167,16 @@ static struct tcf_meta_ops *find_ife_oplist(u16 metaid)
 {
struct tcf_meta_ops *o;
 
-   read_lock(&ife_mod_lock);
+   read_lock_bh(&ife_mod_lock);
list_for_each_entry(o, &ifeoplist, list) {
if (o->metaid == metaid) {
if (!try_module_get(o->owner))
o = NULL;
-   read_unlock(&ife_mod_lock);
+   read_unlock_bh(&ife_mod_lock);
return o;
}
}
-   read_unlock(&ife_mod_lock);
+   read_unlock_bh(&ife_mod_lock);
 
return NULL;
 }
@@ -190,12 +190,12 @@ int register_ife_op(struct tcf_meta_ops *mops)
!mops->get || !mops->alloc)
return -EINVAL;
 
-   write_lock(&ife_mod_lock);
+   write_lock_bh(&ife_mod_lock);
 
list_for_each_entry(m, &ifeoplist, 

Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action

2018-08-13 Thread Vlad Buslov


On Fri 10 Aug 2018 at 21:45, Cong Wang  wrote:
> On Fri, Aug 10, 2018 at 3:29 AM Vlad Buslov  wrote:
>>
>> Approach you suggest is valid, but has its own trade-offs:
>>
>> - As you noted, lock granularity becomes coarse-grained due to per-netns
>> scope.
>
> Sure, you acquire idrinfo->lock too, the only difference is how long
> you take it.
>
> The bottleneck of your approach is the same, also you take idrinfo->lock
> twice, so the contention is heavier.
>
>
>>
>> - I am not sure it is possible to call idr_replace() without obtaining
>> idrinfo->lock in this particular case. Concurrent delete of action with
>> same id is possible and, according to idr_replace() description,
>> unlocked execution is not supported for such use-case:
>
> But we can hold its refcnt before releasing idrinfo->lock, so
> idr_replace() can't race with concurrent delete.

Yes, for the concurrent delete case I agree. An action is removed from the
idr only when its last reference is released and, in case of an existing
action update, init holds a reference.

What about the case when multiple tasks race to update the same existing
action? I assume idr_replace() can be used there, but what would the
algorithm do if init replaced some other action, and not the action it
actually copied before calling idr_replace()?

>
>
>>
>> - High rate or replace request will generate a lot of unnecessary memory
>> allocations and deallocations.
>>
>
> Yes, this is literally how RCU works, always allocate and copy,
> release upon error.
>
> Also, if this is really a problem, we have SLAB_TYPESAFE_BY_RCU
> too. ;)

Current action update implementation is in-place, so there is no "copy"
stage, besides members of some actions that are RCU-pointers. But I
guess it makes sense if your goal is to refactor all actions to be
updated with RCU.
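
If we went that way, I would expect the update path to look roughly like
this (just a sketch to show the ordering; error handling, the variable
size of actions and RCU-deferred freeing are all glossed over):

	/* pin the existing action so a concurrent delete cannot free it */
	spin_lock(&idrinfo->lock);
	p_old = idr_find(&idrinfo->action_idr, index);
	refcount_inc(&p_old->tcfa_refcnt);
	spin_unlock(&idrinfo->lock);

	/* copy, modify, then publish the copy; concurrent readers see
	 * either the old or the new version, never a half-updated one
	 */
	p_new = kmemdup(p_old, sizeof(*p_old), GFP_KERNEL);
	/* ... apply the requested changes to p_new ... */
	idr_replace(&idrinfo->action_idr, p_new, index);

	tcf_action_put(p_old);	/* drop our pin on the old copy */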


[PATCH net-next v2 10/15] net: sched: act_tunnel_key: remove dependency on rtnl lock

2018-08-10 Thread Vlad Buslov
Use tcf lock to protect tunnel key action struct private data from
concurrent modification in init and dump. Use rcu swap operation to
reassign the params pointer under protection of tcf lock. (The old params
value is not used by init, so there is no need for a standalone rcu
dereference step.)

Remove rtnl lock assertion that is no longer required.

Signed-off-by: Vlad Buslov 
---
 net/sched/act_tunnel_key.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index d42d9e112789..ba2ae9f75ef5 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -204,7 +204,6 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 {
struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
struct nlattr *tb[TCA_TUNNEL_KEY_MAX + 1];
-   struct tcf_tunnel_key_params *params_old;
struct tcf_tunnel_key_params *params_new;
struct metadata_dst *metadata = NULL;
struct tc_tunnel_key *parm;
@@ -346,24 +345,22 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 
t = to_tunnel_key(*a);
 
-   ASSERT_RTNL();
params_new = kzalloc(sizeof(*params_new), GFP_KERNEL);
if (unlikely(!params_new)) {
tcf_idr_release(*a, bind);
NL_SET_ERR_MSG(extack, "Cannot allocate tunnel key parameters");
return -ENOMEM;
}
-
-   params_old = rtnl_dereference(t->params);
-
-   t->tcf_action = parm->action;
params_new->tcft_action = parm->t_action;
params_new->tcft_enc_metadata = metadata;
 
-   rcu_assign_pointer(t->params, params_new);
-
-   if (params_old)
-   kfree_rcu(params_old, rcu);
+   spin_lock(&t->tcf_lock);
+   t->tcf_action = parm->action;
+   rcu_swap_protected(t->params, params_new,
+  lockdep_is_held(&t->tcf_lock));
+   spin_unlock(&t->tcf_lock);
+   if (params_new)
+   kfree_rcu(params_new, rcu);
 
if (ret == ACT_P_CREATED)
tcf_idr_insert(tn, *a);
@@ -485,12 +482,13 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
.index= t->tcf_index,
.refcnt   = refcount_read(&t->tcf_refcnt) - ref,
.bindcnt  = atomic_read(&t->tcf_bindcnt) - bind,
-   .action   = t->tcf_action,
};
struct tcf_t tm;
 
-   params = rtnl_dereference(t->params);
-
+   spin_lock(&t->tcf_lock);
+   params = rcu_dereference_protected(t->params,
+  lockdep_is_held(&t->tcf_lock));
+   opt.action   = t->tcf_action;
opt.t_action = params->tcft_action;
 
if (nla_put(skb, TCA_TUNNEL_KEY_PARMS, sizeof(opt), &opt))
@@ -522,10 +520,12 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
if (nla_put_64bit(skb, TCA_TUNNEL_KEY_TM, sizeof(tm),
  &tm, TCA_TUNNEL_KEY_PAD))
goto nla_put_failure;
+   spin_unlock(&t->tcf_lock);
 
return skb->len;
 
 nla_put_failure:
+   spin_unlock(&t->tcf_lock);
nlmsg_trim(skb, b);
return -1;
 }
-- 
2.7.5


