Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On 16-09-11 11:12 PM, Cong Wang wrote: > On Fri, Sep 9, 2016 at 8:52 AM, John Fastabend> wrote: >> On 16-09-08 10:26 PM, Cong Wang wrote: >>> On Thu, Sep 8, 2016 at 8:51 AM, Eric Dumazet wrote: On Thu, 2016-09-08 at 08:47 -0700, John Fastabend wrote: > Works for me. FWIW I find this plenty straightforward and don't really > see the need to make the hash table itself rcu friendly. > > Acked-by: John Fastabend > Yes, it seems this hash table is used in control path, with RTNL held anyway. >>> >>> Seriously? You never read hashtable in fast path?? I think you need >>> to wake up. >>> >> >> But the actions use refcnt'ing and should never be decremented to zero >> as long as they can still be referenced by an active filter. If each >> action handles its parameters like mirred/gact then I don't see why its >> necessary. > > This is correct, by "read" I meant "dereference", the tc actions > are now permanently stored in hashtable directly, so "reading" > a tc action is reading from hashtable. > > Sorry if this wasn't clear. > OK, but with the current code there is no need to protect the hash table with an RCU semantics. The ref counting ensures the hash table entries are always available and any 'replace' commands on actions are handled internally in the action itself with rcu_assign_pointer replacing old params with new params. The fast path readers will always read a consistent set of parameters in this scheme. So no need for an rcu hash table. The reference count though should likely be atomic because not all increments/decrements are protected by RTNL lock. .John
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Fri, Sep 9, 2016 at 8:52 AM, John Fastabendwrote: > On 16-09-08 10:26 PM, Cong Wang wrote: >> On Thu, Sep 8, 2016 at 8:51 AM, Eric Dumazet wrote: >>> On Thu, 2016-09-08 at 08:47 -0700, John Fastabend wrote: >>> Works for me. FWIW I find this plenty straightforward and don't really see the need to make the hash table itself rcu friendly. Acked-by: John Fastabend >>> >>> Yes, it seems this hash table is used in control path, with RTNL held >>> anyway. >> >> Seriously? You never read hashtable in fast path?? I think you need >> to wake up. >> > > But the actions use refcnt'ing and should never be decremented to zero > as long as they can still be referenced by an active filter. If each > action handles its parameters like mirred/gact then I don't see why its > necessary. This is correct, by "read" I meant "dereference", the tc actions are now permanently stored in hashtable directly, so "reading" a tc action is reading from hashtable. Sorry if this wasn't clear.
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Fri, Sep 9, 2016 at 5:23 AM, Eric Dumazetwrote: > On Thu, 2016-09-08 at 22:24 -0700, Cong Wang wrote: >> On Thu, Sep 8, 2016 at 8:35 AM, Eric Dumazet wrote: >> > From: Eric Dumazet >> > >> > As reported by Cong Wang, I was lazy when I did initial RCU conversion >> > of tc_mirred, as I thought I could avoid allocation/freeing of a >> > parameter block. >> >> Quote from Eric Dumazet: >> >> https://www.mail-archive.com/netdev@vger.kernel.org/msg115482.html >> >> >> Well, I added a READ_ONCE() to read tcf_action once. >> >> Adding rcu here would mean adding a pointer and extra cache line, to >> deref the values. >> >> IMHO the race here has no effect . You either read the old or new value. >> >> >> Me with facepalm... ;-) > > > Point is still valid. Show me a real case where it was a serious > problem, instead of simply theoretical. > > tc_mirred + ifb patches allowed us to reach a milestone, removing the > last contended spinlocks, and you are catching up with this one year > later. > > I wont backport this fix in Google prod kernels, because there is > absolutely no way we need it, and the extra memory cache line might hurt > latencies. > > Since you did not write a fix on your side since June 17th, I presume > you do not care that much. Sounds like I were the author of this patch Why are you questioning your own patch? Did I ask you to care about it? ;-) Please drop this patch. Thanks.
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On 16-09-08 10:26 PM, Cong Wang wrote: > On Thu, Sep 8, 2016 at 8:51 AM, Eric Dumazetwrote: >> On Thu, 2016-09-08 at 08:47 -0700, John Fastabend wrote: >> >>> Works for me. FWIW I find this plenty straightforward and don't really >>> see the need to make the hash table itself rcu friendly. >>> >>> Acked-by: John Fastabend >>> >> >> Yes, it seems this hash table is used in control path, with RTNL held >> anyway. > > Seriously? You never read hashtable in fast path?? I think you need > to wake up. > But the actions use refcnt'ing and should never be decremented to zero as long as they can still be referenced by an active filter. If each action handles its parameters like mirred/gact then I don't see why its necessary. I believe though that the refcnt needs to be fixed a bit though most likely by making it atomic. I original assumed it was protected by RTNL lock but because its getting decremented from rcu callback this is not true. .John
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Thu, 2016-09-08 at 22:59 -0700, Cong Wang wrote: > On Thu, Sep 8, 2016 at 10:48 PM, Alexei Starovoitov >wrote: > > > > imo the deliberate small race in Eric's initial lock removal in mirred > > was a good design choice. I think he did this patch only to > > silence the complains with the code instead of arguing with words. > > imo the initial code is good as-is. This patch is also a nice improvement, > > but certainly not mandatory. 'face palm' is unnecessary. > > LOL, to summarize your words: > > 1) not changing the current code is good > 2) changing the current code is good too Only because I was tired of your rants. I wont backport this patch to Google prod kernels, I already spent too much time.
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Thu, 2016-09-08 at 22:24 -0700, Cong Wang wrote: > On Thu, Sep 8, 2016 at 8:35 AM, Eric Dumazetwrote: > > From: Eric Dumazet > > > > As reported by Cong Wang, I was lazy when I did initial RCU conversion > > of tc_mirred, as I thought I could avoid allocation/freeing of a > > parameter block. > > Quote from Eric Dumazet: > > https://www.mail-archive.com/netdev@vger.kernel.org/msg115482.html > > > Well, I added a READ_ONCE() to read tcf_action once. > > Adding rcu here would mean adding a pointer and extra cache line, to > deref the values. > > IMHO the race here has no effect . You either read the old or new value. > > > Me with facepalm... ;-) Point is still valid. Show me a real case where it was a serious problem, instead of simply theoretical. tc_mirred + ifb patches allowed us to reach a milestone, removing the last contended spinlocks, and you are catching up with this one year later. I wont backport this fix in Google prod kernels, because there is absolutely no way we need it, and the extra memory cache line might hurt latencies. Since you did not write a fix on your side since June 17th, I presume you do not care that much.
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Thu, 2016-09-08 at 22:26 -0700, Cong Wang wrote: > On Thu, Sep 8, 2016 at 8:51 AM, Eric Dumazetwrote: > > On Thu, 2016-09-08 at 08:47 -0700, John Fastabend wrote: > > > >> Works for me. FWIW I find this plenty straightforward and don't really > >> see the need to make the hash table itself rcu friendly. > >> > >> Acked-by: John Fastabend > >> > > > > Yes, it seems this hash table is used in control path, with RTNL held > > anyway. > > Seriously? You never read hashtable in fast path?? I think you need > to wake up. I seriously consider to ignore your mails. Since you write patches with no changelog, expecting us to read your mind, you definitely don't need us.
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Thu, Sep 8, 2016 at 10:48 PM, Alexei Starovoitovwrote: > > imo the deliberate small race in Eric's initial lock removal in mirred > was a good design choice. I think he did this patch only to > silence the complains with the code instead of arguing with words. > imo the initial code is good as-is. This patch is also a nice improvement, > but certainly not mandatory. 'face palm' is unnecessary. LOL, to summarize your words: 1) not changing the current code is good 2) changing the current code is good too
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Thu, Sep 08, 2016 at 10:24:32PM -0700, Cong Wang wrote: > On Thu, Sep 8, 2016 at 8:35 AM, Eric Dumazetwrote: > > From: Eric Dumazet > > > > As reported by Cong Wang, I was lazy when I did initial RCU conversion > > of tc_mirred, as I thought I could avoid allocation/freeing of a > > parameter block. > > Quote from Eric Dumazet: > > https://www.mail-archive.com/netdev@vger.kernel.org/msg115482.html > > > Well, I added a READ_ONCE() to read tcf_action once. > > Adding rcu here would mean adding a pointer and extra cache line, to > deref the values. > > IMHO the race here has no effect . You either read the old or new value. > > > Me with facepalm... ;-) imo the deliberate small race in Eric's initial lock removal in mirred was a good design choice. I think he did this patch only to silence the complains with the code instead of arguing with words. imo the initial code is good as-is. This patch is also a nice improvement, but certainly not mandatory. 'face palm' is unnecessary.
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Thu, Sep 8, 2016 at 8:51 AM, Eric Dumazetwrote: > On Thu, 2016-09-08 at 08:47 -0700, John Fastabend wrote: > >> Works for me. FWIW I find this plenty straightforward and don't really >> see the need to make the hash table itself rcu friendly. >> >> Acked-by: John Fastabend >> > > Yes, it seems this hash table is used in control path, with RTNL held > anyway. Seriously? You never read hashtable in fast path?? I think you need to wake up.
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Thu, Sep 8, 2016 at 8:35 AM, Eric Dumazetwrote: > From: Eric Dumazet > > As reported by Cong Wang, I was lazy when I did initial RCU conversion > of tc_mirred, as I thought I could avoid allocation/freeing of a > parameter block. Quote from Eric Dumazet: https://www.mail-archive.com/netdev@vger.kernel.org/msg115482.html Well, I added a READ_ONCE() to read tcf_action once. Adding rcu here would mean adding a pointer and extra cache line, to deref the values. IMHO the race here has no effect . You either read the old or new value. Me with facepalm... ;-)
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On Thu, 2016-09-08 at 08:47 -0700, John Fastabend wrote: > Works for me. FWIW I find this plenty straightforward and don't really > see the need to make the hash table itself rcu friendly. > > Acked-by: John Fastabend> Yes, it seems this hash table is used in control path, with RTNL held anyway. Thanks for reviewing John.
Re: [PATCH net] net_sched: act_mirred: full rcu conversion
On 16-09-08 08:35 AM, Eric Dumazet wrote: > From: Eric Dumazet> > As reported by Cong Wang, I was lazy when I did initial RCU conversion > of tc_mirred, as I thought I could avoid allocation/freeing of a > parameter block. > > Use an intermediate object so that fast path can get a consistent > snapshot of multiple variables (dev, action, eaction, ok_push) > > Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path") > Signed-off-by: Eric Dumazet > Reported-by: Cong Wang > Cc: John Fastabend > Cc: Jamal Hadi Salim > Cc: Hadar Hen Zion > Cc: Amir Vadai > --- Works for me. FWIW I find this plenty straightforward and don't really see the need to make the hash table itself rcu friendly. Acked-by: John Fastabend
[PATCH net] net_sched: act_mirred: full rcu conversion
From: Eric DumazetAs reported by Cong Wang, I was lazy when I did initial RCU conversion of tc_mirred, as I thought I could avoid allocation/freeing of a parameter block. Use an intermediate object so that fast path can get a consistent snapshot of multiple variables (dev, action, eaction, ok_push) Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path") Signed-off-by: Eric Dumazet Reported-by: Cong Wang Cc: John Fastabend Cc: Jamal Hadi Salim Cc: Hadar Hen Zion Cc: Amir Vadai --- include/net/tc_act/tc_mirred.h | 12 +- net/sched/act_mirred.c | 54 +-- 2 files changed, 47 insertions(+), 19 deletions(-) diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h index 62770add15bd..a613b4b9c56e 100644 --- a/include/net/tc_act/tc_mirred.h +++ b/include/net/tc_act/tc_mirred.h @@ -4,12 +4,20 @@ #include #include + +struct tcf_mirred_params { + struct net_device *dev; + int action; + int eaction; + int ok_push; + struct rcu_head rcu; +}; + struct tcf_mirred { struct tc_actioncommon; int tcfm_eaction; int tcfm_ifindex; - int tcfm_ok_push; - struct net_device __rcu *tcfm_dev; + struct tcf_mirred_params __rcu *params; struct list_headtcfm_list; }; #define to_mirred(a) ((struct tcf_mirred *)a) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 6038c85d92f5..1dc0b6036180 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -36,15 +36,16 @@ static DEFINE_SPINLOCK(mirred_list_lock); static void tcf_mirred_release(struct tc_action *a, int bind) { struct tcf_mirred *m = to_mirred(a); - struct net_device *dev; + struct tcf_mirred_params *params; /* We could be called either in a RCU callback or with RTNL lock held. */ spin_lock_bh(_list_lock); list_del(>tcfm_list); - dev = rcu_dereference_protected(m->tcfm_dev, 1); - if (dev) - dev_put(dev); + params = rcu_dereference_protected(m->params, 1); + if (params->dev) + dev_put(params->dev); spin_unlock_bh(_list_lock); + kfree_rcu(params, rcu); } static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = { @@ -60,6 +61,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, mirred_net_id); struct nlattr *tb[TCA_MIRRED_MAX + 1]; + struct tcf_mirred_params *params_new; + struct tcf_mirred_params *params_old; struct tc_mirred *parm; struct tcf_mirred *m; struct net_device *dev; @@ -124,21 +127,33 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla, tcf_hash_release(*a, bind); if (!ovr) return -EEXIST; + ret = 0; } m = to_mirred(*a); ASSERT_RTNL(); - m->tcf_action = parm->action; - m->tcfm_eaction = parm->eaction; + params_new = kzalloc(sizeof(*params_new), GFP_KERNEL); + if (unlikely(!params_new)) { + if (ret == ACT_P_CREATED) + tcf_hash_release(*a, bind); + return -ENOMEM; + } + params_old = rtnl_dereference(m->params); + params_new->action = m->tcf_action = parm->action; + params_new->eaction = m->tcfm_eaction = parm->eaction; if (dev != NULL) { m->tcfm_ifindex = parm->ifindex; - if (ret != ACT_P_CREATED) - dev_put(rcu_dereference_protected(m->tcfm_dev, 1)); dev_hold(dev); - rcu_assign_pointer(m->tcfm_dev, dev); - m->tcfm_ok_push = ok_push; + params_new->dev = dev; + params_new->ok_push = ok_push; } + rcu_assign_pointer(m->params, params_new); + if (params_old) { + if (params_old->dev) + dev_put(params_old->dev); + kfree_rcu(params_old, rcu); + } if (ret == ACT_P_CREATED) { spin_lock_bh(_list_lock); list_add(>tcfm_list, _list); @@ -153,6 +168,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { struct tcf_mirred *m = to_mirred(a); + struct tcf_mirred_params *params; struct net_device *dev; struct sk_buff *skb2; int retval, err; @@ -162,8 +178,9 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,