Inefficient call to ipv6_chk_acast_addr_src in icmp6_send

2018-05-25 Thread Salam Noureddine
Hi,

The call to ipv6_chk_acast_addr_src in icmp6_send can be pretty costly on
systems with a lot of net_devices since it can end up looping through all
net_devices in a net namespace searching for an anycast address. A few
thousand icmp6 error packets can end up consuming a whole CPU.
I am thinking of fixing this by adding a hash table along the lines of
inet6_addr_lst,
providing a fast lookup for anycast addresses. Is that the right way to go?

Thanks,

Salam


Re: Null pointer dereference in tg3_poll_work running linux-3.4

2017-03-28 Thread Salam Noureddine
On Tue, Mar 28, 2017 at 11:53 AM, Michael Chan
 wrote:

> I don't remember any bug fixes related to this logic.  One possibility
> is that there is DMA corruption and we are getting a bad opaque value.
> If you have the core dump, it will be useful to look at the receive
> completion ring.   These opaque values are simple index values and
> should be in sequence.

Unfortunately I don't have a core dump, this has happened around 5
times on boxes that have been up between 2 and 7 months. So it
seems extremely rare. So far, the only information I have is the
stack trace.


Null pointer dereference in tg3_poll_work running linux-3.4

2017-03-28 Thread Salam Noureddine
Hi,

We've seen a very rare kernel panic in tg3_poll_work on hardware
running linux-3.4.
I haven't seen any upstream patches that seem to fix this issue in the
tg3 driver.
The disassembly shows that the panic is happening in tg3_rx which is
inlined into
tg3_poll_work. In the code below, the "data" pointer seem to be Null,

tg3_recycle_rx(tnapi, tpr, opaque_key,
   desc_idx, *post_ptr);

skb = netdev_alloc_skb(tp->dev,
   len + TG3_RAW_IP_ALIGN);

if (skb == NULL)
goto drop_it_no_recycle;

skb_reserve(skb, TG3_RAW_IP_ALIGN);
pci_dma_sync_single_for_cpu(tp->pdev,
dma_addr, len, PCI_DMA_FROMDEVICE);
memcpy(skb->data,
   data + TG3_RX_OFFSET(tp),
   len);

pci_dma_sync_single_for_device(tp->pdev, dma_addr, len,
PCI_DMA_FROMDEVICE);

I am wondering if anyone has seen this before or if it was fixed and I
missed the patch for it. If not,
any ideas on how we could end up with data being null? I don't have a
reproduction scenario for
this one.

Thanks,

Salam


Re: [PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister

2016-02-06 Thread Salam Noureddine
On Fri, Feb 5, 2016 at 8:04 AM, Sergei Shtylyov
<sergei.shtyl...@cogentembedded.com> wrote:
> On 02/05/2016 02:35 AM, Salam Noureddine wrote:

>>
>> if (event == NETDEV_UNREGISTER) {
>> -   fib_disable_ip(dev, event, true);
>> +   if (fib_sync_down_dev(dev, event, true))
>> +   net->ipv4.needs_fib_flush = true;
>> rt_flush_dev(dev);
>> return NOTIFY_DONE;
>> }
>>
>> +   if (event == NETDEV_UNREGISTER_BATCH || event ==
>> NETDEV_DOWN_BATCH) {
>> +   if (net->ipv4.needs_fib_flush) {
>> +   fib_flush(net);
>> +   net->ipv4.needs_fib_flush = false;
>> +   }
>> +   rt_cache_flush(net);
>> +   arp_ifdown_all();
>> +   return NOTIFY_DONE;
>> +   }
>> +
>
>
>I'd convert to *switch* the above 2 *if*'s...
>
> [...]
>
> MBR, Sergei
>

I could do that.

Thanks,

Salam


Re: [PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers

2016-02-06 Thread Salam Noureddine
On Sat, Feb 6, 2016 at 10:58 AM, Julian Anastasov  wrote:

>
> I now see that we should split the loop
> here, so that NETDEV_DOWN_BATCH is called only once per net:
>
> bool down = false;
>
> for_each_netdev(net, dev) {
> if (dev == last)
> break;
>
> if (dev->flags & IFF_UP) {
> call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
> dev);
> call_netdevice_notifier(nb, NETDEV_DOWN, dev);
> down = true;
> }
> }
>
> rt_cache_flush and arp_ifdown_all will be called
> on NETDEV_UNREGISTER_BATCH, so use 'down' flag:
>
> if (down)
> call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
> net->loopback_dev);
> for_each_netdev(net, dev) {
> if (dev == last)
> goto outroll;
> call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> }
> call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> net->loopback_dev);
>
>
>>   }
>>   call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>>   }
>> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>> + net->loopback_dev);
>>   }
>>
>>  outroll:
>> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
>>   raw_notifier_chain_unregister(_chain, nb);
>>   goto unlock;
>>  }

I am not sure we need to worry too much about optimizing the failure
code path to justify splitting into two loops.


>>  static void rollback_registered_many(struct list_head *head)
>>  {

>>   list_for_each_entry(dev, head, unreg_list) {
>> - struct sk_buff *skb = NULL;
>> -
>>   /* Shutdown queueing discipline. */
>>   dev_shutdown(dev);
>>
>> @@ -6475,6 +6497,20 @@ static void rollback_registered_many(struct list_head 
>> *head)
>>  this device. They should clean all the things.
>>   */
>>   call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>> + }
>> +
>> + /* call batch notifiers which act on net namespaces */
>> + list_for_each_entry(dev, head, unreg_list) {
>> + net_add_event_list(_head, dev_net(dev));
>
> Looks like we can move the above net_add_event_list with
> the comment into the previous loop after NETDEV_UNREGISTER,
> we will save some cycles.
>

I didn't move it into the previous loop because the NETDEV_UNREGISTER
notifier can
end up calling rollback_registered_many (for example the vlan driver
unregistering
all vlans on top of a device) in which case we would be using the
event_list in the net
namespace.

> Julian Anastasov 

Thanks,

Salam


Re: [PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown

2016-02-05 Thread Salam Noureddine
Forgot to mention the rtnl_lock hold time gain with these changes.

I got the following benchmark results on one of our switches.
Without this patch, deleting 1k interfaces with 100k routes in the fib held
the rtnl_lock for 13 seconds. With the patch, rtnl_lock hold time went down
to 5 seconds. The gain is even more pronounced with 512k routes in the FIB.
In this case, without the patch, rtnl_lock was held for 36 seconds and with
the patch it was held for 5.5 seconds.

On Thu, Feb 4, 2016 at 3:35 PM, Salam Noureddine <nouredd...@arista.com> wrote:
> Added changes suggested by Julian Anastasov in version 2.
>
> fib_flush walks the whole fib in a net_namespace and is called for
> each net_device being closed or unregistered. This can be very expensive
> when dealing with 100k or more routes in the fib and removal of a lot
> of interfaces. These four patches deal with this issue by calling fib_flush
> just once for each net namespace and introduce a new function arp_ifdown_all
> that does a similar optimization for the neighbour table.
>
> The benchmark tests were run on linux-3.18.
>
> Salam Noureddine (4):
>   net: add event_list to struct net and provide utility functions
>   net: dev: add batching to net_device notifiers
>   net: core: introduce neigh_ifdown_all for all down interfaces
>   net: fib: avoid calling fib_flush for each device when doing batch
> close and unregister
>
>  include/linux/netdevice.h   |  2 ++
>  include/net/arp.h   |  1 +
>  include/net/neighbour.h |  1 +
>  include/net/net_namespace.h | 22 +
>  include/net/netns/ipv4.h|  1 +
>  net/core/dev.c  | 48 
> -
>  net/core/neighbour.c| 38 ---
>  net/core/net_namespace.c|  1 +
>  net/ipv4/arp.c  |  4 
>  net/ipv4/fib_frontend.c | 16 +--
>  10 files changed, 120 insertions(+), 14 deletions(-)
>
> --
> 1.8.1.4
>


[PATCH v2 net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces

2016-02-04 Thread Salam Noureddine
This cleans up neighbour entries for all interfaces in the down
state, avoiding walking the whole neighbour table for each interface
being brought down.

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/arp.h   |  1 +
 include/net/neighbour.h |  1 +
 net/core/neighbour.c| 38 +++---
 net/ipv4/arp.c  |  4 
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..0efee66 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip,
  const unsigned char *src_hw, const unsigned char *th);
 int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir);
 void arp_ifdown(struct net_device *dev);
+void arp_ifdown_all(void);
 
 struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
   struct net_device *dev, __be32 src_ip,
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 8b68384..8785d7b 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, 
u8 new, u32 flags);
 void __neigh_set_probe_once(struct neighbour *neigh);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+int neigh_ifdown_all(struct neigh_table *tbl);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..bfbd97a 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -54,7 +54,8 @@ do {  \
 static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+bool all_down);
 
 #ifdef CONFIG_PROC_FS
 static const struct file_operations neigh_stat_seq_fops;
@@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list)
}
 }
 
-static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+   bool all_down)
 {
int i;
struct neigh_hash_table *nht;
@@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, 
struct net_device *dev)
np = >next;
continue;
}
+   if (!dev && n->dev && all_down) {
+   if (n->dev->flags & IFF_UP) {
+   np = >next;
+   continue;
+   }
+   }
rcu_assign_pointer(*np,
   rcu_dereference_protected(n->next,
lockdep_is_held(>lock)));
@@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct 
net_device *dev)
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(>lock);
-   neigh_flush_dev(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
write_unlock_bh(>lock);
 }
 EXPORT_SYMBOL(neigh_changeaddr);
@@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(>lock);
-   neigh_flush_dev(tbl, dev);
-   pneigh_ifdown(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
+   pneigh_ifdown(tbl, dev, false);
write_unlock_bh(>lock);
 
del_timer_sync(>proxy_timer);
@@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct 
net_device *dev)
 }
 EXPORT_SYMBOL(neigh_ifdown);
 
+int neigh_ifdown_all(struct neigh_table *tbl)
+{
+   write_lock_bh(>lock);
+   neigh_flush_dev(tbl, NULL, true);
+   pneigh_ifdown(tbl, NULL, true);
+   write_unlock_bh(>lock);
+
+   del_timer_sync(>proxy_timer);
+   pneigh_queue_purge(>proxy_queue);
+   return 0;
+}
+EXPORT_SYMBOL(neigh_ifdown_all);
+
 static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct 
net_device *dev)
 {
struct neighbour *n = NULL;
@@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, 
const void *pkey,
return -ENOENT;
 }
 
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)

[PATCH v2 net-next 1/4] net: add event_list to struct net and provide utility functions

2016-02-04 Thread Salam Noureddine

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/net_namespace.h | 22 ++
 net/core/net_namespace.c|  1 +
 2 files changed, 23 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..6dbc0b2 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -58,6 +58,7 @@ struct net {
struct list_headlist;   /* list of network namespaces */
struct list_headcleanup_list;   /* namespaces on death row */
struct list_headexit_list;  /* Use only net_mutex */
+   struct list_headevent_list; /* net_device notifier list */
 
struct user_namespace   *user_ns;   /* Owning user namespace */
spinlock_t  nsid_lock;
@@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net)
atomic_inc(>fnhe_genid);
 }
 
+#ifdef CONFIG_NET_NS
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+   if (list_empty(>event_list))
+   list_add_tail(>event_list, head);
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+   list_del_init(>event_list);
+}
+#else
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+}
+#endif
+
 #endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..58e84ce 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -282,6 +282,7 @@ static __net_init int setup_net(struct net *net, struct 
user_namespace *user_ns)
net->user_ns = user_ns;
idr_init(>netns_ids);
spin_lock_init(>nsid_lock);
+   INIT_LIST_HEAD(>event_list);
 
list_for_each_entry(ops, _list, list) {
error = ops_init(ops, net);
-- 
1.8.1.4



[PATCH v2 net-next 2/4] net: dev: add batching to net_device notifiers

2016-02-04 Thread Salam Noureddine
This can be used to optimize bringing down and unregsitering
net_devices by running certain cleanup operations only on the
net namespace instead of on each net_device.

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c| 48 ++-
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20b814..1b12269 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2183,6 +2183,8 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_BONDING_INFO0x0019
 #define NETDEV_PRECHANGEUPPER  0x001A
 #define NETDEV_CHANGELOWERSTATE0x001B
+#define NETDEV_UNREGISTER_BATCH0x001C
+#define NETDEV_DOWN_BATCH  0x001D
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 914b4a2..dbd8995 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1439,6 +1439,8 @@ static int __dev_close(struct net_device *dev)
 int dev_close_many(struct list_head *head, bool unlink)
 {
struct net_device *dev, *tmp;
+   struct net *net, *net_tmp;
+   LIST_HEAD(net_head);
 
/* Remove the devices that don't need to be closed */
list_for_each_entry_safe(dev, tmp, head, close_list)
@@ -1447,13 +1449,22 @@ int dev_close_many(struct list_head *head, bool unlink)
 
__dev_close_many(head);
 
-   list_for_each_entry_safe(dev, tmp, head, close_list) {
+   list_for_each_entry(dev, head, close_list) {
rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
call_netdevice_notifiers(NETDEV_DOWN, dev);
+   }
+
+   list_for_each_entry_safe(dev, tmp, head, close_list) {
+   net_add_event_list(_head, dev_net(dev));
if (unlink)
list_del_init(>close_list);
}
 
+   list_for_each_entry_safe(net, net_tmp, _head, event_list) {
+   call_netdevice_notifiers(NETDEV_DOWN_BATCH, net->loopback_dev);
+   net_del_event_list(net);
+   }
+
return 0;
 }
 EXPORT_SYMBOL(dev_close_many);
@@ -1572,12 +1583,17 @@ rollback:
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+   call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+   dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
}
+   call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+   net->loopback_dev);
}
 
 outroll:
+   call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
raw_notifier_chain_unregister(_chain, nb);
goto unlock;
 }
@@ -1614,9 +1630,13 @@ int unregister_netdevice_notifier(struct notifier_block 
*nb)
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+   call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+   dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
}
+   call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+   net->loopback_dev);
}
 unlock:
rtnl_unlock();
@@ -6187,10 +6207,12 @@ void __dev_notify_flags(struct net_device *dev, 
unsigned int old_flags,
rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
 
if (changes & IFF_UP) {
-   if (dev->flags & IFF_UP)
+   if (dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_UP, dev);
-   else
+   } else {
call_netdevice_notifiers(NETDEV_DOWN, dev);
+   call_netdevice_notifiers(NETDEV_DOWN_BATCH, dev);
+   }
}
 
if (dev->flags & IFF_UP &&
@@ -6427,7 +6449,9 @@ static void net_set_todo(struct net_device *dev)
 static void rollback_registered_many(struct list_head *head)
 {
struct net_device *dev, *tmp;
+   struct net *net, *net_tmp;
LIST_HEAD(close_head);
+   LIST_HEAD(net_head);
 
BUG_ON(dev_boot_phase);
ASSERT_RTNL();
@@ -6465,8 +6489,6 @@ static void rollback_registered_many(struct list_head 
*head)
synchronize_net();
 
list_for_each_entry(dev, head, unreg_list) {
-   struct

[PATCH v2 net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister

2016-02-04 Thread Salam Noureddine
Call fib_flush at the end when closing or unregistering multiple
devices. This can save walking the fib many times and greatly
reduce rtnl_lock hold time when unregistering many devices with
a fib having hundreds of thousands of routes.

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/netns/ipv4.h |  1 +
 net/ipv4/fib_frontend.c  | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d75be32..d59a078 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -111,5 +111,6 @@ struct netns_ipv4 {
 #endif
 #endif
atomic_trt_genid;
+   boolneeds_fib_flush;
 };
 #endif
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4734475..808426e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block 
*this, unsigned long event, vo
unsigned int flags;
 
if (event == NETDEV_UNREGISTER) {
-   fib_disable_ip(dev, event, true);
+   if (fib_sync_down_dev(dev, event, true))
+   net->ipv4.needs_fib_flush = true;
rt_flush_dev(dev);
return NOTIFY_DONE;
}
 
+   if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
+   if (net->ipv4.needs_fib_flush) {
+   fib_flush(net);
+   net->ipv4.needs_fib_flush = false;
+   }
+   rt_cache_flush(net);
+   arp_ifdown_all();
+   return NOTIFY_DONE;
+   }
+
in_dev = __in_dev_get_rtnl(dev);
if (!in_dev)
return NOTIFY_DONE;
@@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, 
unsigned long event, vo
rt_cache_flush(net);
break;
case NETDEV_DOWN:
-   fib_disable_ip(dev, event, false);
+   if (fib_sync_down_dev(dev, event, false))
+   net->ipv4.needs_fib_flush = true;
break;
case NETDEV_CHANGE:
flags = dev_get_flags(dev);
-- 
1.8.1.4



[PATCH v2 net-next 0/4] batch calls to fib_flush and arp_ifdown

2016-02-04 Thread Salam Noureddine
Added changes suggested by Julian Anastasov in version 2.

fib_flush walks the whole fib in a net_namespace and is called for
each net_device being closed or unregistered. This can be very expensive
when dealing with 100k or more routes in the fib and removal of a lot
of interfaces. These four patches deal with this issue by calling fib_flush
just once for each net namespace and introduce a new function arp_ifdown_all
that does a similar optimization for the neighbour table.

The benchmark tests were run on linux-3.18.

Salam Noureddine (4):
  net: add event_list to struct net and provide utility functions
  net: dev: add batching to net_device notifiers
  net: core: introduce neigh_ifdown_all for all down interfaces
  net: fib: avoid calling fib_flush for each device when doing batch
close and unregister

 include/linux/netdevice.h   |  2 ++
 include/net/arp.h   |  1 +
 include/net/neighbour.h |  1 +
 include/net/net_namespace.h | 22 +
 include/net/netns/ipv4.h|  1 +
 net/core/dev.c  | 48 -
 net/core/neighbour.c| 38 ---
 net/core/net_namespace.c|  1 +
 net/ipv4/arp.c  |  4 
 net/ipv4/fib_frontend.c | 16 +--
 10 files changed, 120 insertions(+), 14 deletions(-)

-- 
1.8.1.4



Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers

2016-02-03 Thread Salam Noureddine
On Wed, Feb 3, 2016 at 12:08 AM, Julian Anastasov  wrote:


> Aha, I see, it is after NETDEV_UNREGISTER but may be
> the above loop should be changed to two loops so that
> NETDEV_UNREGISTER_BATCH is called exactly after all
> NETDEV_UNREGISTER and before all dev_*_flush and
> ndo_uninit calls to avoid any risks. I mean:
>
> synchronize_net();
>
> First part of loop:
>
> list_for_each_entry(dev, head, unreg_list) {
> /* Shutdown queueing discipline. */
> dev_shutdown(dev);
>
> /* Notify protocols, that we are about to destroy
>this device. They should clean all the things.
> */
> call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
> }
>
> This is the same NETDEV_UNREGISTER_BATCH logic:
>
> +   list_for_each_entry(dev, head, unreg_list) {
> +   net_add_event_list(_head, dev_net(dev));
> +   }
> +   list_for_each_entry_safe(net, net_tmp, _head, event_list) {
> +   call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
> +net->loopback_dev);
> +   net_del_event_list(net);
> +   }
>
> Second part of the loop:
>
> list_for_each_entry(dev, head, unreg_list) {
> struct sk_buff *skb = NULL;
>
> if (!dev->rtnl_link_ops ||
> ...
>
> Regards
>
> --
> Julian Anastasov 

You're right. It is probably safer to organize the code the way you said.
Will change that, Thanks!

Salam


Re: [PATCH net-next 1/4] net: add event_list to struct net and provide utility functions

2016-02-02 Thread Salam Noureddine
On Tue, Feb 2, 2016 at 12:01 PM, Julian Anastasov  wrote:

>> +#ifdef CONFIG_NET_NS
>> +static inline void net_add_event_list(struct list_head *head, struct net 
>> *net)
>> +{
>> + if (!list_empty(>event_list))
>
> Above check looks inverted, it works may be
> because INIT_LIST_HEAD(>event_list) is missing.
>
>> + list_add_tail(>event_list, head);
>> +}
>> +

Thanks for catching this! I ran my benchmark again with the corrected check
and I still get the same benefits with respect to rtnl_lock hold time.

Salam


Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers

2016-02-02 Thread Salam Noureddine
On Tue, Feb 2, 2016 at 12:55 PM, Julian Anastasov <j...@ssi.bg> wrote:
>
> Hello,
>
> On Mon, 1 Feb 2016, Salam Noureddine wrote:
>
>> @@ -1572,8 +1582,12 @@ rollback:

>>   call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>> + dev);
>
> If the rule is once per net, the above call...
>
>>   }
>
> should be here:
>
> call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> net->loopback_dev);
>
> and also once after outroll label?:

That's a good optimization to add. I was mostly focusing on the device
unregister path.

>
> call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH, last);
>
>>   }
>>
>> @@ -1614,8 +1628,12 @@ int unregister_netdevice_notifier(struct 
>> notifier_block *nb)

>>   call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
>> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
>> + dev);
>
> Above call...
>
>>   }
>
> should be here, for net->loopback_dev?
> Also, is it ok to call NETDEV_DOWN_BATCH many times, as result,
> sometimes after NETDEV_UNREGISTER?

Same here, I can add this optimization. I think it is fine to call the
BATCH notifiers
for every interface. It is just better to do it for many interfaces at
the same time.


>> + list_for_each_entry_safe(net, net_tmp, _head, event_list) {
>> + call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
>> +  net->loopback_dev);
>> + net_del_event_list(net);
>> + }
>> +
>
> NETDEV_UNREGISTER* should not be called before
> following synchronize_net and NETDEV_UNREGISTER. May be
> we should split the loop: loop (dev_shutdown+NETDEV_UNREGISTER)
> followed by above NETDEV_UNREGISTER_BATCH then again the
> loop for all remaining calls
>
>>   synchronize_net();
>>
>>   list_for_each_entry(dev, head, unreg_list)
>
> Regards
>
> --
> Julian Anastasov <j...@ssi.bg>

The call to NETDEV_UNREGISTER_BATCH is actually after NETDEV_UNREGISTER,
it seems the other way around in the patch because it is showing part
of netdev_wait_allrefs
and not rollback_registered_may.

Thanks,

Salam


[PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown

2016-02-01 Thread Salam Noureddine
fib_flush walks the whole fib in a net_namespace and is called for
each net_device being closed or unregistered. This can be very expensive
when dealing with 100k or more routes in the fib and removal of a lot
of interfaces. These four patches deal with this issue by calling fib_flush
just once for each net namespace and introduce a new function arp_ifdown_all
that does a similar optimization for the neighbour table.

I got the following benchmark results on one of our switches.
Without this patch, deleting 1k interfaces with 100k routes in the fib held
the rtnl_lock for 13 seconds. With the patch, rtnl_lock hold time went down
to 5 seconds. The gain is even more pronounced with 512k routes in the FIB.
In this case, without the patch, rtnl_lock was held for 36 seconds and with
the patch it was held for 5.5 seconds.

Salam Noureddine (4):
  net: add event_list to struct net and provide utility functions
  net: dev: add batching to net_device notifiers
  net: core: introduce neigh_ifdown_all for all down interfaces
  net: fib: avoid calling fib_flush for each device when doing batch
close and unregister

 include/linux/netdevice.h   |  2 ++
 include/net/arp.h   |  1 +
 include/net/neighbour.h |  1 +
 include/net/net_namespace.h | 22 ++
 include/net/netns/ipv4.h|  1 +
 net/core/dev.c  | 39 ---
 net/core/neighbour.c| 38 +++---
 net/ipv4/arp.c  |  4 
 net/ipv4/fib_frontend.c | 16 ++--
 9 files changed, 112 insertions(+), 12 deletions(-)

-- 
1.8.1.4



[PATCH net-next 2/4] net: dev: add batching to net_device notifiers

2016-02-01 Thread Salam Noureddine
This can be used to optimize bringing down and unregsitering
net_devices by running certain cleanup operations only on the
net namespace instead of on each net_device.

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c| 39 ---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20b814..1b12269 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2183,6 +2183,8 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_BONDING_INFO0x0019
 #define NETDEV_PRECHANGEUPPER  0x001A
 #define NETDEV_CHANGELOWERSTATE0x001B
+#define NETDEV_UNREGISTER_BATCH0x001C
+#define NETDEV_DOWN_BATCH  0x001D
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 914b4a2..77410a3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1439,11 +1439,16 @@ static int __dev_close(struct net_device *dev)
 int dev_close_many(struct list_head *head, bool unlink)
 {
struct net_device *dev, *tmp;
+   struct net *net, *net_tmp;
+   LIST_HEAD(net_head);
 
/* Remove the devices that don't need to be closed */
-   list_for_each_entry_safe(dev, tmp, head, close_list)
+   list_for_each_entry_safe(dev, tmp, head, close_list) {
if (!(dev->flags & IFF_UP))
list_del_init(>close_list);
+   else
+   net_add_event_list(_head, dev_net(dev));
+   }
 
__dev_close_many(head);
 
@@ -1454,6 +1459,11 @@ int dev_close_many(struct list_head *head, bool unlink)
list_del_init(>close_list);
}
 
+   list_for_each_entry_safe(net, net_tmp, _head, event_list) {
+   call_netdevice_notifiers(NETDEV_DOWN_BATCH, net->loopback_dev);
+   net_del_event_list(net);
+   }
+
return 0;
 }
 EXPORT_SYMBOL(dev_close_many);
@@ -1572,8 +1582,12 @@ rollback:
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+   call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+   dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
+   call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+   dev);
}
}
 
@@ -1614,8 +1628,12 @@ int unregister_netdevice_notifier(struct notifier_block 
*nb)
call_netdevice_notifier(nb, NETDEV_GOING_DOWN,
dev);
call_netdevice_notifier(nb, NETDEV_DOWN, dev);
+   call_netdevice_notifier(nb, NETDEV_DOWN_BATCH,
+   dev);
}
call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
+   call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
+   dev);
}
}
 unlock:
@@ -6187,10 +6205,12 @@ void __dev_notify_flags(struct net_device *dev, 
unsigned int old_flags,
rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
 
if (changes & IFF_UP) {
-   if (dev->flags & IFF_UP)
+   if (dev->flags & IFF_UP) {
call_netdevice_notifiers(NETDEV_UP, dev);
-   else
+   } else {
call_netdevice_notifiers(NETDEV_DOWN, dev);
+   call_netdevice_notifiers(NETDEV_DOWN_BATCH, dev);
+   }
}
 
if (dev->flags & IFF_UP &&
@@ -6427,7 +6447,9 @@ static void net_set_todo(struct net_device *dev)
 static void rollback_registered_many(struct list_head *head)
 {
struct net_device *dev, *tmp;
+   struct net *net, *net_tmp;
LIST_HEAD(close_head);
+   LIST_HEAD(net_head);
 
BUG_ON(dev_boot_phase);
ASSERT_RTNL();
@@ -6504,6 +6526,15 @@ static void rollback_registered_many(struct list_head 
*head)
 #endif
}
 
+   list_for_each_entry(dev, head, unreg_list) {
+   net_add_event_list(_head, dev_net(dev));
+   }
+   list_for_each_entry_safe(net, net_tmp, _head, event_list) {
+   call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
+net->loopback_dev);
+   net_del_event_list(net);
+   }
+
synchronize_net();
 
list_for_each

[PATCH net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister

2016-02-01 Thread Salam Noureddine
Call fib_flush at the end when closing or unregistering multiple
devices. This can save walking the fib many times and greatly
reduce rtnl_lock hold time when unregistering many devices with
a fib having hundreds of thousands of routes.

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/netns/ipv4.h |  1 +
 net/ipv4/fib_frontend.c  | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d75be32..d59a078 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -111,5 +111,6 @@ struct netns_ipv4 {
 #endif
 #endif
atomic_trt_genid;
+   boolneeds_fib_flush;
 };
 #endif
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4734475..808426e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block 
*this, unsigned long event, vo
unsigned int flags;
 
if (event == NETDEV_UNREGISTER) {
-   fib_disable_ip(dev, event, true);
+   if (fib_sync_down_dev(dev, event, true))
+   net->ipv4.needs_fib_flush = true;
rt_flush_dev(dev);
return NOTIFY_DONE;
}
 
+   if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
+   if (net->ipv4.needs_fib_flush) {
+   fib_flush(net);
+   net->ipv4.needs_fib_flush = false;
+   }
+   rt_cache_flush(net);
+   arp_ifdown_all();
+   return NOTIFY_DONE;
+   }
+
in_dev = __in_dev_get_rtnl(dev);
if (!in_dev)
return NOTIFY_DONE;
@@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, 
unsigned long event, vo
rt_cache_flush(net);
break;
case NETDEV_DOWN:
-   fib_disable_ip(dev, event, false);
+   if (fib_sync_down_dev(dev, event, false))
+   net->ipv4.needs_fib_flush = true;
break;
case NETDEV_CHANGE:
flags = dev_get_flags(dev);
-- 
1.8.1.4



[PATCH net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces

2016-02-01 Thread Salam Noureddine
This cleans up neighbour entries for all interfaces in the down
state, avoiding walking the whole neighbour table for each interface
being brought down.

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/arp.h   |  1 +
 include/net/neighbour.h |  1 +
 net/core/neighbour.c| 38 +++---
 net/ipv4/arp.c  |  4 
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..0efee66 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip,
  const unsigned char *src_hw, const unsigned char *th);
 int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir);
 void arp_ifdown(struct net_device *dev);
+void arp_ifdown_all(void);
 
 struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
   struct net_device *dev, __be32 src_ip,
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 8b68384..8785d7b 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, 
u8 new, u32 flags);
 void __neigh_set_probe_once(struct neighbour *neigh);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+int neigh_ifdown_all(struct neigh_table *tbl);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..bfbd97a 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -54,7 +54,8 @@ do {  \
 static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+bool all_down);
 
 #ifdef CONFIG_PROC_FS
 static const struct file_operations neigh_stat_seq_fops;
@@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list)
}
 }
 
-static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+   bool all_down)
 {
int i;
struct neigh_hash_table *nht;
@@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, 
struct net_device *dev)
np = >next;
continue;
}
+   if (!dev && n->dev && all_down) {
+   if (n->dev->flags & IFF_UP) {
+   np = >next;
+   continue;
+   }
+   }
rcu_assign_pointer(*np,
   rcu_dereference_protected(n->next,
lockdep_is_held(>lock)));
@@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct 
net_device *dev)
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(>lock);
-   neigh_flush_dev(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
write_unlock_bh(>lock);
 }
 EXPORT_SYMBOL(neigh_changeaddr);
@@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(>lock);
-   neigh_flush_dev(tbl, dev);
-   pneigh_ifdown(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
+   pneigh_ifdown(tbl, dev, false);
write_unlock_bh(>lock);
 
del_timer_sync(>proxy_timer);
@@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct 
net_device *dev)
 }
 EXPORT_SYMBOL(neigh_ifdown);
 
+int neigh_ifdown_all(struct neigh_table *tbl)
+{
+   write_lock_bh(>lock);
+   neigh_flush_dev(tbl, NULL, true);
+   pneigh_ifdown(tbl, NULL, true);
+   write_unlock_bh(>lock);
+
+   del_timer_sync(>proxy_timer);
+   pneigh_queue_purge(>proxy_queue);
+   return 0;
+}
+EXPORT_SYMBOL(neigh_ifdown_all);
+
 static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct 
net_device *dev)
 {
struct neighbour *n = NULL;
@@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, 
const void *pkey,
return -ENOENT;
 }
 
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)

[PATCH net-next 1/4] net: add event_list to struct net and provide utility functions

2016-02-01 Thread Salam Noureddine

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/net_namespace.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..4cf47de 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -58,6 +58,7 @@ struct net {
struct list_headlist;   /* list of network namespaces */
struct list_headcleanup_list;   /* namespaces on death row */
struct list_headexit_list;  /* Use only net_mutex */
+   struct list_headevent_list; /* net_device notifier list */
 
struct user_namespace   *user_ns;   /* Owning user namespace */
spinlock_t  nsid_lock;
@@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net)
atomic_inc(>fnhe_genid);
 }
 
+#ifdef CONFIG_NET_NS
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+   if (!list_empty(>event_list))
+   list_add_tail(>event_list, head);
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+   list_del_init(>event_list);
+}
+#else
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+}
+#endif
+
 #endif /* __NET_NET_NAMESPACE_H */
-- 
1.8.1.4



[PATCH net-next 1/4] net: add event_list to struct net and provide utility functions

2016-01-04 Thread Salam Noureddine

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/net_namespace.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..4cf47de 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -58,6 +58,7 @@ struct net {
struct list_headlist;   /* list of network namespaces */
struct list_headcleanup_list;   /* namespaces on death row */
struct list_headexit_list;  /* Use only net_mutex */
+   struct list_headevent_list; /* net_device notifier list */
 
struct user_namespace   *user_ns;   /* Owning user namespace */
spinlock_t  nsid_lock;
@@ -380,4 +381,25 @@ static inline void fnhe_genid_bump(struct net *net)
atomic_inc(>fnhe_genid);
 }
 
+#ifdef CONFIG_NET_NS
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+   if (!list_empty(>event_list))
+   list_add_tail(>event_list, head);
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+   list_del_init(>event_list);
+}
+#else
+static inline void net_add_event_list(struct list_head *head, struct net *net)
+{
+}
+
+static inline void net_del_event_list(struct net *net)
+{
+}
+#endif
+
 #endif /* __NET_NET_NAMESPACE_H */
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 4/4] net: fib: avoid calling fib_flush for each device when doing batch close and unregister

2016-01-04 Thread Salam Noureddine
Call fib_flush at the end when closing or unregistering multiple
devices. This can save walking the fib many times and greatly
reduce rtnl_lock hold time when unregistering many devices with
a fib having hundreds of thousands of routes.

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/netns/ipv4.h |  1 +
 net/ipv4/fib_frontend.c  | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d75be32..d59a078 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -111,5 +111,6 @@ struct netns_ipv4 {
 #endif
 #endif
atomic_trt_genid;
+   boolneeds_fib_flush;
 };
 #endif
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4734475..808426e 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1161,11 +1161,22 @@ static int fib_netdev_event(struct notifier_block 
*this, unsigned long event, vo
unsigned int flags;
 
if (event == NETDEV_UNREGISTER) {
-   fib_disable_ip(dev, event, true);
+   if (fib_sync_down_dev(dev, event, true))
+   net->ipv4.needs_fib_flush = true;
rt_flush_dev(dev);
return NOTIFY_DONE;
}
 
+   if (event == NETDEV_UNREGISTER_BATCH || event == NETDEV_DOWN_BATCH) {
+   if (net->ipv4.needs_fib_flush) {
+   fib_flush(net);
+   net->ipv4.needs_fib_flush = false;
+   }
+   rt_cache_flush(net);
+   arp_ifdown_all();
+   return NOTIFY_DONE;
+   }
+
in_dev = __in_dev_get_rtnl(dev);
if (!in_dev)
return NOTIFY_DONE;
@@ -1182,7 +1193,8 @@ static int fib_netdev_event(struct notifier_block *this, 
unsigned long event, vo
rt_cache_flush(net);
break;
case NETDEV_DOWN:
-   fib_disable_ip(dev, event, false);
+   if (fib_sync_down_dev(dev, event, false))
+   net->ipv4.needs_fib_flush = true;
break;
case NETDEV_CHANGE:
flags = dev_get_flags(dev);
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown

2016-01-04 Thread Salam Noureddine
fib_flush walks the whole fib in a net_namespace and is called for
each net_device being closed or unregistered. This can be very expensive
when dealing with 100k or more routes in the fib and removal of a lot
of interfaces. These four patches deal with this issue by calling fib_flush
just once for each net namespace and introduce a new function arp_ifdown_all
that does a similar optimization for the neighbour table.

Salam Noureddine (4):
  net: add event_list to struct net and provide utility functions
  net: dev: add batching to net_device notifiers
  net: core: introduce neigh_ifdown_all for all down interfaces
  net: fib: avoid calling fib_flush for each device when doing batch
close and unregister

 include/linux/netdevice.h   |  2 ++
 include/net/arp.h   |  1 +
 include/net/neighbour.h |  1 +
 include/net/net_namespace.h | 22 ++
 include/net/netns/ipv4.h|  1 +
 net/core/dev.c  | 39 ---
 net/core/neighbour.c| 38 +++---
 net/ipv4/arp.c  |  4 
 net/ipv4/fib_frontend.c | 16 ++--
 9 files changed, 112 insertions(+), 12 deletions(-)

-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 3/4] net: core: introduce neigh_ifdown_all for all down interfaces

2016-01-04 Thread Salam Noureddine
This cleans up neighbour entries for all interfaces in the down
state, avoiding walking the whole neighbour table for each interface
being brought down.

Signed-off-by: Salam Noureddine <nouredd...@arista.com>
---
 include/net/arp.h   |  1 +
 include/net/neighbour.h |  1 +
 net/core/neighbour.c| 38 +++---
 net/ipv4/arp.c  |  4 
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..0efee66 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -43,6 +43,7 @@ void arp_send(int type, int ptype, __be32 dest_ip,
  const unsigned char *src_hw, const unsigned char *th);
 int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir);
 void arp_ifdown(struct net_device *dev);
+void arp_ifdown_all(void);
 
 struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
   struct net_device *dev, __be32 src_ip,
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 8b68384..8785d7b 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -318,6 +318,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, 
u8 new, u32 flags);
 void __neigh_set_probe_once(struct neighbour *neigh);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+int neigh_ifdown_all(struct neigh_table *tbl);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_direct_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f18ae91..bfbd97a 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -54,7 +54,8 @@ do {  \
 static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
+static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+bool all_down);
 
 #ifdef CONFIG_PROC_FS
 static const struct file_operations neigh_stat_seq_fops;
@@ -192,7 +193,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list)
}
 }
 
-static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+   bool all_down)
 {
int i;
struct neigh_hash_table *nht;
@@ -210,6 +212,12 @@ static void neigh_flush_dev(struct neigh_table *tbl, 
struct net_device *dev)
np = >next;
continue;
}
+   if (!dev && n->dev && all_down) {
+   if (n->dev->flags & IFF_UP) {
+   np = >next;
+   continue;
+   }
+   }
rcu_assign_pointer(*np,
   rcu_dereference_protected(n->next,
lockdep_is_held(>lock)));
@@ -245,7 +253,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct 
net_device *dev)
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(>lock);
-   neigh_flush_dev(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
write_unlock_bh(>lock);
 }
 EXPORT_SYMBOL(neigh_changeaddr);
@@ -253,8 +261,8 @@ EXPORT_SYMBOL(neigh_changeaddr);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(>lock);
-   neigh_flush_dev(tbl, dev);
-   pneigh_ifdown(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
+   pneigh_ifdown(tbl, dev, false);
write_unlock_bh(>lock);
 
del_timer_sync(>proxy_timer);
@@ -263,6 +271,19 @@ int neigh_ifdown(struct neigh_table *tbl, struct 
net_device *dev)
 }
 EXPORT_SYMBOL(neigh_ifdown);
 
+int neigh_ifdown_all(struct neigh_table *tbl)
+{
+   write_lock_bh(>lock);
+   neigh_flush_dev(tbl, NULL, true);
+   pneigh_ifdown(tbl, NULL, true);
+   write_unlock_bh(>lock);
+
+   del_timer_sync(>proxy_timer);
+   pneigh_queue_purge(>proxy_queue);
+   return 0;
+}
+EXPORT_SYMBOL(neigh_ifdown_all);
+
 static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct 
net_device *dev)
 {
struct neighbour *n = NULL;
@@ -645,7 +666,8 @@ int pneigh_delete(struct neigh_table *tbl, struct net *net, 
const void *pkey,
return -ENOENT;
 }
 
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)

Re: [PATCH net-next 0/4] batch calls to fib_flush and arp_ifdown

2016-01-04 Thread Salam Noureddine
On Mon, Jan 4, 2016 at 4:35 PM, Eric W. Biederman  wrote:

> Two things would be very valuable with this patchset.
>
> Some numbers on how much your changes have improved the code in the case
> you care about.  I suspect the improvements are not subtle so this
> should not be hard.
>
> Can you please provide a justification for event_list.  Just skimming
> through it appears that event_list because a duplicate of the list of
> batched network devices that are passed to dev_close_many and friends.
> If the list is actually a duplicate it appears foolish to create it.
>
> Eric

The performance test I ran tries to unregister 1000 dummy interfaces
with 512K routes in the fib.
Without the patch I could unregister 35 interfaces per second and with
the patch it jumped to 620
interfaces per second. 512K is a lot of routes but I am assuming we
would get a good improvement
even with 100K routes in the fib.

I am using event_list to put all the net namespaces in the current
net_device batch on a list and only
call the NETDEV_UNREGISTER_BATCH on those namespaces. It would be
possible to just call the
notifier for NETDEV_DOWN/UNREGISTER_BATCH for all the devices on the
list and rely on the
needs_fib_flush flag to only call fib_flush once per namespace but it
seems like a waste to me.

Salam
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html