Inefficient call to ipv6_chk_acast_addr_src in icmp6_send
Hi, The call to ipv6_chk_acast_addr_src in icmp6_send can be pretty costly on systems with a lot of net_devices since it can end up looping through all net_devices in a net namespace searching for an anycast address. A few thousand icmp6 error packets can end up consuming a whole CPU. I am thinking of fixing this by adding a hash table along the lines of inet6_addr_lst, providing a fast lookup for anycast addresses. Is that the right way to go? Thanks, Salam
Re: Null pointer dereference in tg3_poll_work running linux-3.4
On Tue, Mar 28, 2017 at 11:53 AM, Michael Chanwrote: > I don't remember any bug fixes related to this logic. One possibility > is that there is DMA corruption and we are getting a bad opaque value. > If you have the core dump, it will be useful to look at the receive > completion ring. These opaque values are simple index values and > should be in sequence. Unfortunately I don't have a core dump, this has happened around 5 times on boxes that have been up between 2 and 7 months. So it seems extremely rare. So far, the only information I have is the stack trace.
Null pointer dereference in tg3_poll_work running linux-3.4
Hi, We've seen a very rare kernel panic in tg3_poll_work on hardware running linux-3.4. I haven't seen any upstream patches that seem to fix this issue in the tg3 driver. The disassembly shows that the panic is happening in tg3_rx which is inlined into tg3_poll_work. In the code below, the "data" pointer seem to be Null, tg3_recycle_rx(tnapi, tpr, opaque_key, desc_idx, *post_ptr); skb = netdev_alloc_skb(tp->dev, len + TG3_RAW_IP_ALIGN); if (skb == NULL) goto drop_it_no_recycle; skb_reserve(skb, TG3_RAW_IP_ALIGN); pci_dma_sync_single_for_cpu(tp->pdev, dma_addr, len, PCI_DMA_FROMDEVICE); memcpy(skb->data, data + TG3_RX_OFFSET(tp), len); pci_dma_sync_single_for_device(tp->pdev, dma_addr, len, PCI_DMA_FROMDEVICE); I am wondering if anyone has seen this before or if it was fixed and I missed the patch for it. If not, any ideas on how we could end up with data being null? I don't have a reproduction scenario for this one. Thanks, Salam
Re: [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
On Fri, Feb 5, 2016 at 8:04 AM, Sergei Shtylyov <sergei.shtyl...@cogentembedded.com> wrote: > On 02/05/2016 02:35 AM, Salam Noureddine wrote: >> >> if (event == NETDEV_UNREGISTER) { >> - fib_disable_ip(dev, event, true); >> + if (fib_sync_down_dev(dev, event, true)) >> + net->ipv4.needs_fib_flush = true; >> rt_flush_dev(dev); >> return NOTIFY_DONE; >> } >> >> + if (event == NETDEV_UNREGISTER_BATCH || event == >> NETDEV_DOWN_BATCH) { >> + if (net->ipv4.needs_fib_flush) { >> + fib_flush(net); >> + net->ipv4.needs_fib_flush = false; >> + } >> + rt_cache_flush(net); >> + arp_ifdown_all(); >> + return NOTIFY_DONE; >> + } >> + > > >I'd convert to *switch* the above 2 *if*'s... > > [...] > > MBR, Sergei > I could do that. Thanks, Salam
Re: [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
On Sat, Feb 6, 2016 at 10:58 AM, Julian Anastasovwrote: > > I now see that we should split the loop > here, so that NETDEV_DOWN_BATCH is called only once per net: > > bool down = false; > > for_each_netdev(net, dev) { > if (dev == last) > break; > > if (dev->flags & IFF_UP) { > call_netdevice_notifier(nb, NETDEV_GOING_DOWN, > dev); > call_netdevice_notifier(nb, NETDEV_DOWN, dev); > down = true; > } > } > > rt_cache_flush and arp_ifdown_all will be called > on NETDEV_UNREGISTER_BATCH, so use 'down' flag: > > if (down) > call_netdevice_notifier(nb, NETDEV_DOWN_BATCH, > net->loopback_dev); > for_each_netdev(net, dev) { > if (dev == last) > goto outroll; > call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev); > } > call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, > net->loopback_dev); > > >> } >> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev); >> } >> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, >> + net->loopback_dev); >> } >> >> outroll: >> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last); >> raw_notifier_chain_unregister(_chain, nb); >> goto unlock; >> } I am not sure we need to worry too much about optimizing the failure code path to justify splitting into two loops. >> static void rollback_registered_many(struct list_head *head) >> { >> list_for_each_entry(dev, head, unreg_list) { >> - struct sk_buff *skb = NULL; >> - >> /* Shutdown queueing discipline. */ >> dev_shutdown(dev); >> >> @@ -6475,6 +6497,20 @@ static void rollback_registered_many(struct list_head >> *head) >> this device. They should clean all the things. >> */ >> call_netdevice_notifiers(NETDEV_UNREGISTER, dev); >> + } >> + >> + /* call batch notifiers which act on net namespaces */ >> + list_for_each_entry(dev, head, unreg_list) { >> + net_add_event_list(_head, dev_net(dev)); > > Looks like we can move the above net_add_event_list with > the comment into the previous loop after NETDEV_UNREGISTER, > we will save some cycles. > I didn't move it into the previous loop because the NETDEV_UNREGISTER notifier can end up calling rollback_registered_many (for example the vlan driver unregistering all vlans on top of a device) in which case we would be using the event_list in the net namespace. > Julian Anastasov Thanks, Salam
Re: [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown
Forgot to mention the rtnl_lock hold time gain with these changes. I got the following benchmark results on one of our switches. Without this patch, deleting 1k interfaces with 100k routes in the fib held the rtnl_lock for 13 seconds. With the patch, rtnl_lock hold time went down to 5 seconds. The gain is even more pronounced with 512k routes in the FIB. In this case, without the patch, rtnl_lock was held for 36 seconds and with the patch it was held for 5.5 seconds. On Thu, Feb 4, 2016 at 3:35 PM, Salam Noureddine <nouredd...@arista.com> wrote: > Added changes suggested by Julian Anastasov in version 2. > > fib_flush walks the whole fib in a net_namespace and is called for > each net_device being closed or unregistered. This can be very expensive > when dealing with 100k or more routes in the fib and removal of a lot > of interfaces. These four patches deal with this issue by calling fib_flush > just once for each net namespace and introduce a new function arp_ifdown_all > that does a similar optimization for the neighbour table. > > The benchmark tests were run on linux-3.18. > > Salam Noureddine (4): > net: add event_list to struct net and provide utility functions > net: dev: add batching to net_device notifiers > net: core: introduce neigh_ifdown_all for all down interfaces > net: fib: avoid calling fib_flush for each device when doing batch > close and unregister > > include/linux/netdevice.h | 2 ++ > include/net/arp.h | 1 + > include/net/neighbour.h | 1 + > include/net/net_namespace.h | 22 + > include/net/netns/ipv4.h| 1 + > net/core/dev.c | 48 > - > net/core/neighbour.c| 38 --- > net/core/net_namespace.c| 1 + > net/ipv4/arp.c | 4 > net/ipv4/fib_frontend.c | 16 +-- > 10 files changed, 120 insertions(+), 14 deletions(-) > > -- > 1.8.1.4 >
[PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces
This cleans up neighbour entries for all interfaces in the down state, avoiding walking the whole neighbour table for each interface being brought down. Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/arp.h | 1 + include/net/neighbour.h | 1 + net/core/neighbour.c| 38 +++--- net/ipv4/arp.c | 4 4 files changed, 37 insertions(+), 7 deletions(-) diff --git a/include/net/arp.h b/include/net/arp.h index 5e0f891..0efee66 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip, const unsigned char *src_hw, const unsigned char *th); int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir); void arp_ifdown(struct net_device *dev); +void arp_ifdown_all(void); struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, diff --git a/include/net/neighbour.h b/include/net/neighbour.h index 8b68384..8785d7b 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags); void __neigh_set_probe_once(struct neighbour *neigh); void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev); int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev); +int neigh_ifdown_all(struct neigh_table *tbl); int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb); int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb); int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb); diff --git a/net/core/neighbour.c b/net/core/neighbour.c index f18ae91..bfbd97a 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -54,7 +54,8 @@ do { \ static void neigh_timer_handler(unsigned long arg); static void __neigh_notify(struct neighbour *n, int type, int flags); static void neigh_update_notify(struct neighbour *neigh); -static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev); +static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev, +bool all_down); #ifdef CONFIG_PROC_FS static const struct file_operations neigh_stat_seq_fops; @@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list) } } -static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) +static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev, + bool all_down) { int i; struct neigh_hash_table *nht; @@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) np = >next; continue; } + if (!dev && n->dev && all_down) { + if (n->dev->flags & IFF_UP) { + np = >next; + continue; + } + } rcu_assign_pointer(*np, rcu_dereference_protected(n->next, lockdep_is_held(>lock))); @@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev) { write_lock_bh(>lock); - neigh_flush_dev(tbl, dev); + neigh_flush_dev(tbl, dev, false); write_unlock_bh(>lock); } EXPORT_SYMBOL(neigh_changeaddr); @@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr); int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) { write_lock_bh(>lock); - neigh_flush_dev(tbl, dev); - pneigh_ifdown(tbl, dev); + neigh_flush_dev(tbl, dev, false); + pneigh_ifdown(tbl, dev, false); write_unlock_bh(>lock); del_timer_sync(>proxy_timer); @@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) } EXPORT_SYMBOL(neigh_ifdown); +int neigh_ifdown_all(struct neigh_table *tbl) +{ + write_lock_bh(>lock); + neigh_flush_dev(tbl, NULL, true); + pneigh_ifdown(tbl, NULL, true); + write_unlock_bh(>lock); + + del_timer_sync(>proxy_timer); + pneigh_queue_purge(>proxy_queue); + return 0; +} +EXPORT_SYMBOL(neigh_ifdown_all); + static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev) { struct neighbour *n = NULL; @@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *pkey, return -ENOENT; } -static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
[PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions
Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/net_namespace.h | 22 ++ net/core/net_namespace.c| 1 + 2 files changed, 23 insertions(+) diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 4089abc..6dbc0b2 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -58,6 +58,7 @@ struct net { struct list_headlist; /* list of network namespaces */ struct list_headcleanup_list; /* namespaces on death row */ struct list_headexit_list; /* Use only net_mutex */ + struct list_headevent_list; /* net_device notifier list */ struct user_namespace *user_ns; /* Owning user namespace */ spinlock_t nsid_lock; @@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net) atomic_inc(>fnhe_genid); } +#ifdef CONFIG_NET_NS +static inline void net_add_event_list(struct list_head *head, struct net *net) +{ + if (list_empty(>event_list)) + list_add_tail(>event_list, head); +} + +static inline void net_del_event_list(struct net *net) +{ + list_del_init(>event_list); +} +#else +static inline void net_add_event_list(struct list_head *head, struct net *net) +{ +} + +static inline void net_del_event_list(struct net *net) +{ +} +#endif + #endif /* __NET_NET_NAMESPACE_H */ diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 2c2eb1b..58e84ce 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -282,6 +282,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns) net->user_ns = user_ns; idr_init(>netns_ids); spin_lock_init(>nsid_lock); + INIT_LIST_HEAD(>event_list); list_for_each_entry(ops, _list, list) { error = ops_init(ops, net); -- 1.8.1.4
[PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers
This can be used to optimize bringing down and unregsitering net_devices by running certain cleanup operations only on the net namespace instead of on each net_device. Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/linux/netdevice.h | 2 ++ net/core/dev.c| 48 ++- 2 files changed, 45 insertions(+), 5 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c20b814..1b12269 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2183,6 +2183,8 @@ struct netdev_lag_lower_state_info { #define NETDEV_BONDING_INFO0x0019 #define NETDEV_PRECHANGEUPPER 0x001A #define NETDEV_CHANGELOWERSTATE0x001B +#define NETDEV_UNREGISTER_BATCH0x001C +#define NETDEV_DOWN_BATCH 0x001D int register_netdevice_notifier(struct notifier_block *nb); int unregister_netdevice_notifier(struct notifier_block *nb); diff --git a/net/core/dev.c b/net/core/dev.c index 914b4a2..dbd8995 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1439,6 +1439,8 @@ static int __dev_close(struct net_device *dev) int dev_close_many(struct list_head *head, bool unlink) { struct net_device *dev, *tmp; + struct net *net, *net_tmp; + LIST_HEAD(net_head); /* Remove the devices that don't need to be closed */ list_for_each_entry_safe(dev, tmp, head, close_list) @@ -1447,13 +1449,22 @@ int dev_close_many(struct list_head *head, bool unlink) __dev_close_many(head); - list_for_each_entry_safe(dev, tmp, head, close_list) { + list_for_each_entry(dev, head, close_list) { rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL); call_netdevice_notifiers(NETDEV_DOWN, dev); + } + + list_for_each_entry_safe(dev, tmp, head, close_list) { + net_add_event_list(_head, dev_net(dev)); if (unlink) list_del_init(>close_list); } + list_for_each_entry_safe(net, net_tmp, _head, event_list) { + call_netdevice_notifiers(NETDEV_DOWN_BATCH, net->loopback_dev); + net_del_event_list(net); + } + return 0; } EXPORT_SYMBOL(dev_close_many); @@ -1572,12 +1583,17 @@ rollback: call_netdevice_notifier(nb, NETDEV_GOING_DOWN, dev); call_netdevice_notifier(nb, NETDEV_DOWN, dev); + call_netdevice_notifier(nb, NETDEV_DOWN_BATCH, + dev); } call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev); } + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, + net->loopback_dev); } outroll: + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last); raw_notifier_chain_unregister(_chain, nb); goto unlock; } @@ -1614,9 +1630,13 @@ int unregister_netdevice_notifier(struct notifier_block *nb) call_netdevice_notifier(nb, NETDEV_GOING_DOWN, dev); call_netdevice_notifier(nb, NETDEV_DOWN, dev); + call_netdevice_notifier(nb, NETDEV_DOWN_BATCH, + dev); } call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev); } + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, + net->loopback_dev); } unlock: rtnl_unlock(); @@ -6187,10 +6207,12 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags, rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC); if (changes & IFF_UP) { - if (dev->flags & IFF_UP) + if (dev->flags & IFF_UP) { call_netdevice_notifiers(NETDEV_UP, dev); - else + } else { call_netdevice_notifiers(NETDEV_DOWN, dev); + call_netdevice_notifiers(NETDEV_DOWN_BATCH, dev); + } } if (dev->flags & IFF_UP && @@ -6427,7 +6449,9 @@ static void net_set_todo(struct net_device *dev) static void rollback_registered_many(struct list_head *head) { struct net_device *dev, *tmp; + struct net *net, *net_tmp; LIST_HEAD(close_head); + LIST_HEAD(net_head); BUG_ON(dev_boot_phase); ASSERT_RTNL(); @@ -6465,8 +6489,6 @@ static void rollback_registered_many(struct list_head *head) synchronize_net(); list_for_each_entry(dev, head, unreg_list) { - struct
[PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
Call fib_flush at the end when closing or unregistering multiple devices. This can save walking the fib many times and greatly reduce rtnl_lock hold time when unregistering many devices with a fib having hundreds of thousands of routes. Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/netns/ipv4.h | 1 + net/ipv4/fib_frontend.c | 16 ++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index d75be32..d59a078 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -111,5 +111,6 @@ struct netns_ipv4 { #endif #endif atomic_trt_genid; + boolneeds_fib_flush; }; #endif diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 4734475..808426e 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo unsigned int flags; if (event == NETDEV_UNREGISTER) { - fib_disable_ip(dev, event, true); + if (fib_sync_down_dev(dev, event, true)) + net->ipv4.needs_fib_flush = true; rt_flush_dev(dev); return NOTIFY_DONE; } + if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) { + if (net->ipv4.needs_fib_flush) { + fib_flush(net); + net->ipv4.needs_fib_flush = false; + } + rt_cache_flush(net); + arp_ifdown_all(); + return NOTIFY_DONE; + } + in_dev = __in_dev_get_rtnl(dev); if (!in_dev) return NOTIFY_DONE; @@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo rt_cache_flush(net); break; case NETDEV_DOWN: - fib_disable_ip(dev, event, false); + if (fib_sync_down_dev(dev, event, false)) + net->ipv4.needs_fib_flush = true; break; case NETDEV_CHANGE: flags = dev_get_flags(dev); -- 1.8.1.4
[PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown
Added changes suggested by Julian Anastasov in version 2. fib_flush walks the whole fib in a net_namespace and is called for each net_device being closed or unregistered. This can be very expensive when dealing with 100k or more routes in the fib and removal of a lot of interfaces. These four patches deal with this issue by calling fib_flush just once for each net namespace and introduce a new function arp_ifdown_all that does a similar optimization for the neighbour table. The benchmark tests were run on linux-3.18. Salam Noureddine (4): net: add event_list to struct net and provide utility functions net: dev: add batching to net_device notifiers net: core: introduce neigh_ifdown_all for all down interfaces net: fib: avoid calling fib_flush for each device when doing batch close and unregister include/linux/netdevice.h | 2 ++ include/net/arp.h | 1 + include/net/neighbour.h | 1 + include/net/net_namespace.h | 22 + include/net/netns/ipv4.h| 1 + net/core/dev.c | 48 - net/core/neighbour.c| 38 --- net/core/net_namespace.c| 1 + net/ipv4/arp.c | 4 net/ipv4/fib_frontend.c | 16 +-- 10 files changed, 120 insertions(+), 14 deletions(-) -- 1.8.1.4
Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers
On Wed, Feb 3, 2016 at 12:08 AM, Julian Anastasovwrote: > Aha, I see, it is after NETDEV_UNREGISTER but may be > the above loop should be changed to two loops so that > NETDEV_UNREGISTER_BATCH is called exactly after all > NETDEV_UNREGISTER and before all dev_*_flush and > ndo_uninit calls to avoid any risks. I mean: > > synchronize_net(); > > First part of loop: > > list_for_each_entry(dev, head, unreg_list) { > /* Shutdown queueing discipline. */ > dev_shutdown(dev); > > /* Notify protocols, that we are about to destroy >this device. They should clean all the things. > */ > call_netdevice_notifiers(NETDEV_UNREGISTER, dev); > } > > This is the same NETDEV_UNREGISTER_BATCH logic: > > + list_for_each_entry(dev, head, unreg_list) { > + net_add_event_list(_head, dev_net(dev)); > + } > + list_for_each_entry_safe(net, net_tmp, _head, event_list) { > + call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, > +net->loopback_dev); > + net_del_event_list(net); > + } > > Second part of the loop: > > list_for_each_entry(dev, head, unreg_list) { > struct sk_buff *skb = NULL; > > if (!dev->rtnl_link_ops || > ... > > Regards > > -- > Julian Anastasov You're right. It is probably safer to organize the code the way you said. Will change that, Thanks! Salam
Re: [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions
On Tue, Feb 2, 2016 at 12:01 PM, Julian Anastasovwrote: >> +#ifdef CONFIG_NET_NS >> +static inline void net_add_event_list(struct list_head *head, struct net >> *net) >> +{ >> + if (!list_empty(>event_list)) > > Above check looks inverted, it works may be > because INIT_LIST_HEAD(>event_list) is missing. > >> + list_add_tail(>event_list, head); >> +} >> + Thanks for catching this! I ran my benchmark again with the corrected check and I still get the same benefits with respect to rtnl_lock hold time. Salam
Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers
On Tue, Feb 2, 2016 at 12:55 PM, Julian Anastasov <j...@ssi.bg> wrote: > > Hello, > > On Mon, 1 Feb 2016, Salam Noureddine wrote: > >> @@ -1572,8 +1582,12 @@ rollback: >> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev); >> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, >> + dev); > > If the rule is once per net, the above call... > >> } > > should be here: > > call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, > net->loopback_dev); > > and also once after outroll label?: That's a good optimization to add. I was mostly focusing on the device unregister path. > > call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last); > >> } >> >> @@ -1614,8 +1628,12 @@ int unregister_netdevice_notifier(struct >> notifier_block *nb) >> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev); >> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, >> + dev); > > Above call... > >> } > > should be here, for net->loopback_dev? > Also, is it ok to call NETDEV_DOWN_BATCH many times, as result, > sometimes after NETDEV_UNREGISTER? Same here, I can add this optimization. I think it is fine to call the BATCH notifiers for every interface. It is just better to do it for many interfaces at the same time. >> + list_for_each_entry_safe(net, net_tmp, _head, event_list) { >> + call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, >> + net->loopback_dev); >> + net_del_event_list(net); >> + } >> + > > NETDEV_UNREGISTER* should not be called before > following synchronize_net and NETDEV_UNREGISTER. May be > we should split the loop: loop (dev_shutdown+NETDEV_UNREGISTER) > followed by above NETDEV_UNREGISTER_BATCH then again the > loop for all remaining calls > >> synchronize_net(); >> >> list_for_each_entry(dev, head, unreg_list) > > Regards > > -- > Julian Anastasov <j...@ssi.bg> The call to NETDEV_UNREGISTER_BATCH is actually after NETDEV_UNREGISTER, it seems the other way around in the patch because it is showing part of netdev_wait_allrefs and not rollback_registered_may. Thanks, Salam
[PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown
fib_flush walks the whole fib in a net_namespace and is called for each net_device being closed or unregistered. This can be very expensive when dealing with 100k or more routes in the fib and removal of a lot of interfaces. These four patches deal with this issue by calling fib_flush just once for each net namespace and introduce a new function arp_ifdown_all that does a similar optimization for the neighbour table. I got the following benchmark results on one of our switches. Without this patch, deleting 1k interfaces with 100k routes in the fib held the rtnl_lock for 13 seconds. With the patch, rtnl_lock hold time went down to 5 seconds. The gain is even more pronounced with 512k routes in the FIB. In this case, without the patch, rtnl_lock was held for 36 seconds and with the patch it was held for 5.5 seconds. Salam Noureddine (4): net: add event_list to struct net and provide utility functions net: dev: add batching to net_device notifiers net: core: introduce neigh_ifdown_all for all down interfaces net: fib: avoid calling fib_flush for each device when doing batch close and unregister include/linux/netdevice.h | 2 ++ include/net/arp.h | 1 + include/net/neighbour.h | 1 + include/net/net_namespace.h | 22 ++ include/net/netns/ipv4.h| 1 + net/core/dev.c | 39 --- net/core/neighbour.c| 38 +++--- net/ipv4/arp.c | 4 net/ipv4/fib_frontend.c | 16 ++-- 9 files changed, 112 insertions(+), 12 deletions(-) -- 1.8.1.4
[PATCH net-next 2/4] net: dev: add batching to net_device notifiers
This can be used to optimize bringing down and unregsitering net_devices by running certain cleanup operations only on the net namespace instead of on each net_device. Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/linux/netdevice.h | 2 ++ net/core/dev.c| 39 --- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c20b814..1b12269 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2183,6 +2183,8 @@ struct netdev_lag_lower_state_info { #define NETDEV_BONDING_INFO0x0019 #define NETDEV_PRECHANGEUPPER 0x001A #define NETDEV_CHANGELOWERSTATE0x001B +#define NETDEV_UNREGISTER_BATCH0x001C +#define NETDEV_DOWN_BATCH 0x001D int register_netdevice_notifier(struct notifier_block *nb); int unregister_netdevice_notifier(struct notifier_block *nb); diff --git a/net/core/dev.c b/net/core/dev.c index 914b4a2..77410a3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1439,11 +1439,16 @@ static int __dev_close(struct net_device *dev) int dev_close_many(struct list_head *head, bool unlink) { struct net_device *dev, *tmp; + struct net *net, *net_tmp; + LIST_HEAD(net_head); /* Remove the devices that don't need to be closed */ - list_for_each_entry_safe(dev, tmp, head, close_list) + list_for_each_entry_safe(dev, tmp, head, close_list) { if (!(dev->flags & IFF_UP)) list_del_init(>close_list); + else + net_add_event_list(_head, dev_net(dev)); + } __dev_close_many(head); @@ -1454,6 +1459,11 @@ int dev_close_many(struct list_head *head, bool unlink) list_del_init(>close_list); } + list_for_each_entry_safe(net, net_tmp, _head, event_list) { + call_netdevice_notifiers(NETDEV_DOWN_BATCH, net->loopback_dev); + net_del_event_list(net); + } + return 0; } EXPORT_SYMBOL(dev_close_many); @@ -1572,8 +1582,12 @@ rollback: call_netdevice_notifier(nb, NETDEV_GOING_DOWN, dev); call_netdevice_notifier(nb, NETDEV_DOWN, dev); + call_netdevice_notifier(nb, NETDEV_DOWN_BATCH, + dev); } call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev); + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, + dev); } } @@ -1614,8 +1628,12 @@ int unregister_netdevice_notifier(struct notifier_block *nb) call_netdevice_notifier(nb, NETDEV_GOING_DOWN, dev); call_netdevice_notifier(nb, NETDEV_DOWN, dev); + call_netdevice_notifier(nb, NETDEV_DOWN_BATCH, + dev); } call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev); + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, + dev); } } unlock: @@ -6187,10 +6205,12 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags, rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC); if (changes & IFF_UP) { - if (dev->flags & IFF_UP) + if (dev->flags & IFF_UP) { call_netdevice_notifiers(NETDEV_UP, dev); - else + } else { call_netdevice_notifiers(NETDEV_DOWN, dev); + call_netdevice_notifiers(NETDEV_DOWN_BATCH, dev); + } } if (dev->flags & IFF_UP && @@ -6427,7 +6447,9 @@ static void net_set_todo(struct net_device *dev) static void rollback_registered_many(struct list_head *head) { struct net_device *dev, *tmp; + struct net *net, *net_tmp; LIST_HEAD(close_head); + LIST_HEAD(net_head); BUG_ON(dev_boot_phase); ASSERT_RTNL(); @@ -6504,6 +6526,15 @@ static void rollback_registered_many(struct list_head *head) #endif } + list_for_each_entry(dev, head, unreg_list) { + net_add_event_list(_head, dev_net(dev)); + } + list_for_each_entry_safe(net, net_tmp, _head, event_list) { + call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, +net->loopback_dev); + net_del_event_list(net); + } + synchronize_net(); list_for_each
[PATCH net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
Call fib_flush at the end when closing or unregistering multiple devices. This can save walking the fib many times and greatly reduce rtnl_lock hold time when unregistering many devices with a fib having hundreds of thousands of routes. Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/netns/ipv4.h | 1 + net/ipv4/fib_frontend.c | 16 ++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index d75be32..d59a078 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -111,5 +111,6 @@ struct netns_ipv4 { #endif #endif atomic_trt_genid; + boolneeds_fib_flush; }; #endif diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 4734475..808426e 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo unsigned int flags; if (event == NETDEV_UNREGISTER) { - fib_disable_ip(dev, event, true); + if (fib_sync_down_dev(dev, event, true)) + net->ipv4.needs_fib_flush = true; rt_flush_dev(dev); return NOTIFY_DONE; } + if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) { + if (net->ipv4.needs_fib_flush) { + fib_flush(net); + net->ipv4.needs_fib_flush = false; + } + rt_cache_flush(net); + arp_ifdown_all(); + return NOTIFY_DONE; + } + in_dev = __in_dev_get_rtnl(dev); if (!in_dev) return NOTIFY_DONE; @@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo rt_cache_flush(net); break; case NETDEV_DOWN: - fib_disable_ip(dev, event, false); + if (fib_sync_down_dev(dev, event, false)) + net->ipv4.needs_fib_flush = true; break; case NETDEV_CHANGE: flags = dev_get_flags(dev); -- 1.8.1.4
[PATCH net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces
This cleans up neighbour entries for all interfaces in the down state, avoiding walking the whole neighbour table for each interface being brought down. Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/arp.h | 1 + include/net/neighbour.h | 1 + net/core/neighbour.c| 38 +++--- net/ipv4/arp.c | 4 4 files changed, 37 insertions(+), 7 deletions(-) diff --git a/include/net/arp.h b/include/net/arp.h index 5e0f891..0efee66 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip, const unsigned char *src_hw, const unsigned char *th); int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir); void arp_ifdown(struct net_device *dev); +void arp_ifdown_all(void); struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, diff --git a/include/net/neighbour.h b/include/net/neighbour.h index 8b68384..8785d7b 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags); void __neigh_set_probe_once(struct neighbour *neigh); void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev); int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev); +int neigh_ifdown_all(struct neigh_table *tbl); int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb); int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb); int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb); diff --git a/net/core/neighbour.c b/net/core/neighbour.c index f18ae91..bfbd97a 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -54,7 +54,8 @@ do { \ static void neigh_timer_handler(unsigned long arg); static void __neigh_notify(struct neighbour *n, int type, int flags); static void neigh_update_notify(struct neighbour *neigh); -static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev); +static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev, +bool all_down); #ifdef CONFIG_PROC_FS static const struct file_operations neigh_stat_seq_fops; @@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list) } } -static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) +static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev, + bool all_down) { int i; struct neigh_hash_table *nht; @@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) np = >next; continue; } + if (!dev && n->dev && all_down) { + if (n->dev->flags & IFF_UP) { + np = >next; + continue; + } + } rcu_assign_pointer(*np, rcu_dereference_protected(n->next, lockdep_is_held(>lock))); @@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev) { write_lock_bh(>lock); - neigh_flush_dev(tbl, dev); + neigh_flush_dev(tbl, dev, false); write_unlock_bh(>lock); } EXPORT_SYMBOL(neigh_changeaddr); @@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr); int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) { write_lock_bh(>lock); - neigh_flush_dev(tbl, dev); - pneigh_ifdown(tbl, dev); + neigh_flush_dev(tbl, dev, false); + pneigh_ifdown(tbl, dev, false); write_unlock_bh(>lock); del_timer_sync(>proxy_timer); @@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) } EXPORT_SYMBOL(neigh_ifdown); +int neigh_ifdown_all(struct neigh_table *tbl) +{ + write_lock_bh(>lock); + neigh_flush_dev(tbl, NULL, true); + pneigh_ifdown(tbl, NULL, true); + write_unlock_bh(>lock); + + del_timer_sync(>proxy_timer); + pneigh_queue_purge(>proxy_queue); + return 0; +} +EXPORT_SYMBOL(neigh_ifdown_all); + static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev) { struct neighbour *n = NULL; @@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *pkey, return -ENOENT; } -static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
[PATCH net-next 1/4] net: add event_list to struct net and provide utility functions
Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/net_namespace.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 4089abc..4cf47de 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -58,6 +58,7 @@ struct net { struct list_headlist; /* list of network namespaces */ struct list_headcleanup_list; /* namespaces on death row */ struct list_headexit_list; /* Use only net_mutex */ + struct list_headevent_list; /* net_device notifier list */ struct user_namespace *user_ns; /* Owning user namespace */ spinlock_t nsid_lock; @@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net) atomic_inc(>fnhe_genid); } +#ifdef CONFIG_NET_NS +static inline void net_add_event_list(struct list_head *head, struct net *net) +{ + if (!list_empty(>event_list)) + list_add_tail(>event_list, head); +} + +static inline void net_del_event_list(struct net *net) +{ + list_del_init(>event_list); +} +#else +static inline void net_add_event_list(struct list_head *head, struct net *net) +{ +} + +static inline void net_del_event_list(struct net *net) +{ +} +#endif + #endif /* __NET_NET_NAMESPACE_H */ -- 1.8.1.4
[PATCH net-next 1/4] net: add event_list to struct net and provide utility functions
Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/net_namespace.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 4089abc..4cf47de 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -58,6 +58,7 @@ struct net { struct list_headlist; /* list of network namespaces */ struct list_headcleanup_list; /* namespaces on death row */ struct list_headexit_list; /* Use only net_mutex */ + struct list_headevent_list; /* net_device notifier list */ struct user_namespace *user_ns; /* Owning user namespace */ spinlock_t nsid_lock; @@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net) atomic_inc(>fnhe_genid); } +#ifdef CONFIG_NET_NS +static inline void net_add_event_list(struct list_head *head, struct net *net) +{ + if (!list_empty(>event_list)) + list_add_tail(>event_list, head); +} + +static inline void net_del_event_list(struct net *net) +{ + list_del_init(>event_list); +} +#else +static inline void net_add_event_list(struct list_head *head, struct net *net) +{ +} + +static inline void net_del_event_list(struct net *net) +{ +} +#endif + #endif /* __NET_NET_NAMESPACE_H */ -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister
Call fib_flush at the end when closing or unregistering multiple devices. This can save walking the fib many times and greatly reduce rtnl_lock hold time when unregistering many devices with a fib having hundreds of thousands of routes. Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/netns/ipv4.h | 1 + net/ipv4/fib_frontend.c | 16 ++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index d75be32..d59a078 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -111,5 +111,6 @@ struct netns_ipv4 { #endif #endif atomic_trt_genid; + boolneeds_fib_flush; }; #endif diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 4734475..808426e 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo unsigned int flags; if (event == NETDEV_UNREGISTER) { - fib_disable_ip(dev, event, true); + if (fib_sync_down_dev(dev, event, true)) + net->ipv4.needs_fib_flush = true; rt_flush_dev(dev); return NOTIFY_DONE; } + if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) { + if (net->ipv4.needs_fib_flush) { + fib_flush(net); + net->ipv4.needs_fib_flush = false; + } + rt_cache_flush(net); + arp_ifdown_all(); + return NOTIFY_DONE; + } + in_dev = __in_dev_get_rtnl(dev); if (!in_dev) return NOTIFY_DONE; @@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo rt_cache_flush(net); break; case NETDEV_DOWN: - fib_disable_ip(dev, event, false); + if (fib_sync_down_dev(dev, event, false)) + net->ipv4.needs_fib_flush = true; break; case NETDEV_CHANGE: flags = dev_get_flags(dev); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown
fib_flush walks the whole fib in a net_namespace and is called for each net_device being closed or unregistered. This can be very expensive when dealing with 100k or more routes in the fib and removal of a lot of interfaces. These four patches deal with this issue by calling fib_flush just once for each net namespace and introduce a new function arp_ifdown_all that does a similar optimization for the neighbour table. Salam Noureddine (4): net: add event_list to struct net and provide utility functions net: dev: add batching to net_device notifiers net: core: introduce neigh_ifdown_all for all down interfaces net: fib: avoid calling fib_flush for each device when doing batch close and unregister include/linux/netdevice.h | 2 ++ include/net/arp.h | 1 + include/net/neighbour.h | 1 + include/net/net_namespace.h | 22 ++ include/net/netns/ipv4.h| 1 + net/core/dev.c | 39 --- net/core/neighbour.c| 38 +++--- net/ipv4/arp.c | 4 net/ipv4/fib_frontend.c | 16 ++-- 9 files changed, 112 insertions(+), 12 deletions(-) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces
This cleans up neighbour entries for all interfaces in the down state, avoiding walking the whole neighbour table for each interface being brought down. Signed-off-by: Salam Noureddine <nouredd...@arista.com> --- include/net/arp.h | 1 + include/net/neighbour.h | 1 + net/core/neighbour.c| 38 +++--- net/ipv4/arp.c | 4 4 files changed, 37 insertions(+), 7 deletions(-) diff --git a/include/net/arp.h b/include/net/arp.h index 5e0f891..0efee66 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip, const unsigned char *src_hw, const unsigned char *th); int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir); void arp_ifdown(struct net_device *dev); +void arp_ifdown_all(void); struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, diff --git a/include/net/neighbour.h b/include/net/neighbour.h index 8b68384..8785d7b 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags); void __neigh_set_probe_once(struct neighbour *neigh); void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev); int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev); +int neigh_ifdown_all(struct neigh_table *tbl); int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb); int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb); int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb); diff --git a/net/core/neighbour.c b/net/core/neighbour.c index f18ae91..bfbd97a 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -54,7 +54,8 @@ do { \ static void neigh_timer_handler(unsigned long arg); static void __neigh_notify(struct neighbour *n, int type, int flags); static void neigh_update_notify(struct neighbour *neigh); -static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev); +static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev, +bool all_down); #ifdef CONFIG_PROC_FS static const struct file_operations neigh_stat_seq_fops; @@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list) } } -static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) +static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev, + bool all_down) { int i; struct neigh_hash_table *nht; @@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) np = >next; continue; } + if (!dev && n->dev && all_down) { + if (n->dev->flags & IFF_UP) { + np = >next; + continue; + } + } rcu_assign_pointer(*np, rcu_dereference_protected(n->next, lockdep_is_held(>lock))); @@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev) { write_lock_bh(>lock); - neigh_flush_dev(tbl, dev); + neigh_flush_dev(tbl, dev, false); write_unlock_bh(>lock); } EXPORT_SYMBOL(neigh_changeaddr); @@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr); int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) { write_lock_bh(>lock); - neigh_flush_dev(tbl, dev); - pneigh_ifdown(tbl, dev); + neigh_flush_dev(tbl, dev, false); + pneigh_ifdown(tbl, dev, false); write_unlock_bh(>lock); del_timer_sync(>proxy_timer); @@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) } EXPORT_SYMBOL(neigh_ifdown); +int neigh_ifdown_all(struct neigh_table *tbl) +{ + write_lock_bh(>lock); + neigh_flush_dev(tbl, NULL, true); + pneigh_ifdown(tbl, NULL, true); + write_unlock_bh(>lock); + + del_timer_sync(>proxy_timer); + pneigh_queue_purge(>proxy_queue); + return 0; +} +EXPORT_SYMBOL(neigh_ifdown_all); + static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev) { struct neighbour *n = NULL; @@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *pkey, return -ENOENT; } -static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
Re: [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown
On Mon, Jan 4, 2016 at 4:35 PM, Eric W. Biedermanwrote: > Two things would be very valuable with this patchset. > > Some numbers on how much your changes have improved the code in the case > you care about. I suspect the improvements are not subtle so this > should not be hard. > > Can you please provide a justification for event_list. Just skimming > through it appears that event_list because a duplicate of the list of > batched network devices that are passed to dev_close_many and friends. > If the list is actually a duplicate it appears foolish to create it. > > Eric The performance test I ran tries to unregister 1000 dummy interfaces with 512K routes in the fib. Without the patch I could unregister 35 interfaces per second and with the patch it jumped to 620 interfaces per second. 512K is a lot of routes but I am assuming we would get a good improvement even with 100K routes in the fib. I am using event_list to put all the net namespaces in the current net_device batch on a list and only call the NETDEV_UNREGISTER_BATCH on those namespaces. It would be possible to just call the notifier for NETDEV_DOWN/UNREGISTER_BATCH for all the devices on the list and rely on the needs_fib_flush flag to only call fib_flush once per namespace but it seems like a waste to me. Salam -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html