Re: [Bridge] [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Nikolay Aleksandrov
On 12/12/17 20:02, Stephen Hemminger wrote:
> On Tue, 12 Dec 2017 16:02:50 +0200
> Nikolay Aleksandrov  wrote:
> 
>> +memcpy(__entry->addr, f->key.addr.addr, ETH_ALEN);
> 
> Maybe use ether_addr_copy() here?
> 

This is an unrelated cleanup, the code in question was already like that. I can 
post
a separate patch to turn these into ether_addr_copy().
 


Re: [Bridge] [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Nikolay Aleksandrov
On 12/12/17 20:07, Stephen Hemminger wrote:
> On Tue, 12 Dec 2017 16:02:50 +0200
> Nikolay Aleksandrov  wrote:
> 
>> Before this patch the bridge used a fixed 256 element hash table which
>> was fine for small use cases (in my tests it starts to degrade
>> above 1000 entries), but it wasn't enough for medium or large
>> scale deployments. Modern setups have thousands of participants in a
>> single bridge, even only enabling vlans and adding a few thousand vlan
>> entries will cause a few thousand fdbs to be automatically inserted per
>> participating port. So we need to scale the fdb table considerably to
>> cope with modern workloads, and this patch converts it to use a
>> rhashtable for its operations thus improving the bridge scalability.
>> Tests show the following results (10 runs each), at up to 1000 entries
>> rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
>> is 2 times faster and at 3 it is 50 times faster.
>> Obviously this happens because of the properties of the two constructs
>> and is expected, rhashtable keeps pretty much a constant time even with
>> 1000 entries (tested), while the fixed hash table struggles
>> considerably even above 1.
>> As a side effect this also reduces the net_bridge struct size from 3248
>> bytes to 1344 bytes. Also note that the key struct is 8 bytes.
>>
>> Signed-off-by: Nikolay Aleksandrov 
>> ---
> 
> Thanks for doing this, it was on my list of things that never get done.
> 
> Some downsides:
>  * size of the FDB entry gets larger.

It does not, due to smp alignment of the write-heavy members we had a large
hole between cache line 1 and 2, the new 8 bytes fit perfectly and there are
still bytes left to use.

>  * you lost the ability to salt the hash (and rekey) which is important
>for DDoS attacks

The hash is always salted (property of rhashtable) and in fact is better because
now the salt is generated for each rhashtable separately rather than having 1 
global
salt for all bridge devices.

>  * being slower for small (<10 entries) also matters and is is a common
>use case for containers.

I think they're pretty comparable in speed, the difference is negligible IMO.




Re: [Bridge] [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Stephen Hemminger
On Tue, 12 Dec 2017 16:02:50 +0200
Nikolay Aleksandrov  wrote:

> Before this patch the bridge used a fixed 256 element hash table which
> was fine for small use cases (in my tests it starts to degrade
> above 1000 entries), but it wasn't enough for medium or large
> scale deployments. Modern setups have thousands of participants in a
> single bridge, even only enabling vlans and adding a few thousand vlan
> entries will cause a few thousand fdbs to be automatically inserted per
> participating port. So we need to scale the fdb table considerably to
> cope with modern workloads, and this patch converts it to use a
> rhashtable for its operations thus improving the bridge scalability.
> Tests show the following results (10 runs each), at up to 1000 entries
> rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
> is 2 times faster and at 3 it is 50 times faster.
> Obviously this happens because of the properties of the two constructs
> and is expected, rhashtable keeps pretty much a constant time even with
> 1000 entries (tested), while the fixed hash table struggles
> considerably even above 1.
> As a side effect this also reduces the net_bridge struct size from 3248
> bytes to 1344 bytes. Also note that the key struct is 8 bytes.
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---

Thanks for doing this, it was on my list of things that never get done.

Some downsides:
 * size of the FDB entry gets larger.
 * you lost the ability to salt the hash (and rekey) which is important
   for DDoS attacks
 * being slower for small (<10 entries) also matters and is is a common
   use case for containers.




Re: [Bridge] [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Stephen Hemminger
On Tue, 12 Dec 2017 16:02:50 +0200
Nikolay Aleksandrov  wrote:

> + memcpy(__entry->addr, f->key.addr.addr, ETH_ALEN);

Maybe use ether_addr_copy() here?


[Bridge] [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Nikolay Aleksandrov
Before this patch the bridge used a fixed 256 element hash table which
was fine for small use cases (in my tests it starts to degrade
above 1000 entries), but it wasn't enough for medium or large
scale deployments. Modern setups have thousands of participants in a
single bridge, even only enabling vlans and adding a few thousand vlan
entries will cause a few thousand fdbs to be automatically inserted per
participating port. So we need to scale the fdb table considerably to
cope with modern workloads, and this patch converts it to use a
rhashtable for its operations thus improving the bridge scalability.
Tests show the following results (10 runs each), at up to 1000 entries
rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
is 2 times faster and at 3 it is 50 times faster.
Obviously this happens because of the properties of the two constructs
and is expected, rhashtable keeps pretty much a constant time even with
1000 entries (tested), while the fixed hash table struggles
considerably even above 1.
As a side effect this also reduces the net_bridge struct size from 3248
bytes to 1344 bytes. Also note that the key struct is 8 bytes.

Signed-off-by: Nikolay Aleksandrov 
---
After this I'll post patches for the per-port fdb limit option. Later we
can get rid of hash_lock altogether though that requires much more
careful changes.

 include/trace/events/bridge.h |   4 +-
 net/bridge/br_device.c|  10 ++
 net/bridge/br_fdb.c   | 392 --
 net/bridge/br_private.h   |  16 +-
 net/bridge/br_switchdev.c |   8 +-
 5 files changed, 211 insertions(+), 219 deletions(-)

diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
index 1bee3e7fdf32..8ea966448b58 100644
--- a/include/trace/events/bridge.h
+++ b/include/trace/events/bridge.h
@@ -82,8 +82,8 @@ TRACE_EVENT(fdb_delete,
TP_fast_assign(
__assign_str(br_dev, br->dev->name);
__assign_str(dev, f->dst ? f->dst->dev->name : "null");
-   memcpy(__entry->addr, f->addr.addr, ETH_ALEN);
-   __entry->vid = f->vlan_id;
+   memcpy(__entry->addr, f->key.addr.addr, ETH_ALEN);
+   __entry->vid = f->key.vlan_id;
),
 
TP_printk("br_dev %s dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index af5b8c87f590..1285ca30ab0a 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -125,9 +125,16 @@ static int br_dev_init(struct net_device *dev)
if (!br->stats)
return -ENOMEM;
 
+   err = br_fdb_hash_init(br);
+   if (err) {
+   free_percpu(br->stats);
+   return err;
+   }
+
err = br_vlan_init(br);
if (err) {
free_percpu(br->stats);
+   br_fdb_hash_fini(br);
return err;
}
 
@@ -135,6 +142,7 @@ static int br_dev_init(struct net_device *dev)
if (err) {
free_percpu(br->stats);
br_vlan_flush(br);
+   br_fdb_hash_fini(br);
}
br_set_lockdep_class(dev);
 
@@ -148,6 +156,7 @@ static void br_dev_uninit(struct net_device *dev)
br_multicast_dev_del(br);
br_multicast_uninit_stats(br);
br_vlan_flush(br);
+   br_fdb_hash_fini(br);
free_percpu(br->stats);
 }
 
@@ -416,6 +425,7 @@ void br_dev_setup(struct net_device *dev)
br->dev = dev;
spin_lock_init(>lock);
INIT_LIST_HEAD(>port_list);
+   INIT_HLIST_HEAD(>fdb_list);
spin_lock_init(>hash_lock);
 
br->bridge_id.prio[0] = 0x80;
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 4ea5c8bbe286..dc87fbc9a23b 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -28,14 +28,20 @@
 #include 
 #include "br_private.h"
 
+static const struct rhashtable_params br_fdb_rht_params = {
+   .head_offset = offsetof(struct net_bridge_fdb_entry, rhnode),
+   .key_offset = offsetof(struct net_bridge_fdb_entry, key),
+   .key_len = sizeof(struct net_bridge_fdb_key),
+   .automatic_shrinking = true,
+   .locks_mul = 1,
+};
+
 static struct kmem_cache *br_fdb_cache __read_mostly;
 static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
  const unsigned char *addr, u16 vid);
 static void fdb_notify(struct net_bridge *br,
   const struct net_bridge_fdb_entry *, int);
 
-static u32 fdb_salt __read_mostly;
-
 int __init br_fdb_init(void)
 {
br_fdb_cache = kmem_cache_create("bridge_fdb_cache",
@@ -45,7 +51,6 @@ int __init br_fdb_init(void)
if (!br_fdb_cache)
return -ENOMEM;
 
-   get_random_bytes(_salt, sizeof(fdb_salt));
return 0;
 }
 
@@ -54,6 +59,15 @@ void br_fdb_fini(void)
kmem_cache_destroy(br_fdb_cache);
 }
 
+int br_fdb_hash_init(struct