On 12/12/17 20:07, Stephen Hemminger wrote:
> On Tue, 12 Dec 2017 16:02:50 +0200
> Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:
> 
>> Before this patch the bridge used a fixed 256 element hash table which
>> was fine for small use cases (in my tests it starts to degrade
>> above 1000 entries), but it wasn't enough for medium or large
>> scale deployments. Modern setups have thousands of participants in a
>> single bridge, even only enabling vlans and adding a few thousand vlan
>> entries will cause a few thousand fdbs to be automatically inserted per
>> participating port. So we need to scale the fdb table considerably to
>> cope with modern workloads, and this patch converts it to use a
>> rhashtable for its operations thus improving the bridge scalability.
>> Tests show the following results (10 runs each), at up to 1000 entries
>> rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
>> is 2 times faster and at 30000 it is 50 times faster.
>> Obviously this happens because of the properties of the two constructs
>> and is expected, rhashtable keeps pretty much a constant time even with
>> 10000000 entries (tested), while the fixed hash table struggles
>> considerably even above 10000.
>> As a side effect this also reduces the net_bridge struct size from 3248
>> bytes to 1344 bytes. Also note that the key struct is 8 bytes.
>>
>> Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
>> ---
> 
> Thanks for doing this, it was on my list of things that never get done.
> 
> Some downsides:
>  * size of the FDB entry gets larger.

It does not. Because the write-heavy members are SMP cache-line aligned, we
had a large hole between cache lines 1 and 2; the new 8 bytes fit into it and
there are still bytes left to use.
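
To illustrate the point about the hole, here is a minimal userspace sketch. It
assumes 64-byte cache lines, and the structs entry_before/entry_after and all
of their fields are hypothetical stand-ins, not the real net_bridge_fdb_entry
layout. It only shows that adding one pointer-sized member ahead of a
cache-line-aligned member does not have to grow the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical entry without the extra hash-table linkage. */
struct entry_before {
	void *next, **pprev;     /* list linkage */
	void *dst;               /* owning port */
	uint8_t addr[6];         /* MAC address */
	uint16_t vlan_id;
	uint8_t flags;
	/* padding hole up to the next cache line sits here */
	_Alignas(CACHE_LINE) unsigned long updated; /* write-heavy */
	unsigned long used;                         /* write-heavy */
};

/* Same entry with one extra pointer (models the new hash node). */
struct entry_after {
	void *rhnode;            /* the new pointer-sized member */
	void *next, **pprev;
	void *dst;
	uint8_t addr[6];
	uint16_t vlan_id;
	uint8_t flags;
	/* the hole is smaller now, but still present */
	_Alignas(CACHE_LINE) unsigned long updated;
	unsigned long used;
};

/* The new member is absorbed by the hole: total size is unchanged. */
static_assert(sizeof(struct entry_before) == sizeof(struct entry_after),
	      "extra pointer should fit into the alignment hole");
```

On the real struct, pahole is the usual way to inspect such holes.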

>  * you lost the ability to salt the hash (and rekey) which is important
>    for DDoS attacks

The hash is always salted (a property of rhashtable), and it is in fact better
now because the salt is generated for each rhashtable separately rather than
having one global salt for all bridge devices.
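
A rough userspace sketch of what a per-table salt means, not the rhashtable
code itself: struct table, seeded_hash() and bucket_of() are hypothetical
names, and the FNV-1a style mixer is a stand-in for whatever hash function the
table is configured with. Each table carries its own seed, so the same
8-byte key (MAC + vlan) hashes differently in different tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-byte key: MAC address plus vlan id (no padding). */
struct fdb_key {
	uint8_t addr[6];
	uint16_t vlan_id;
};

static_assert(sizeof(struct fdb_key) == 8, "key is expected to be 8 bytes");

/* Hypothetical table header: every table has its own seed. */
struct table {
	uint32_t seed;      /* would come from a random source at creation */
	uint32_t nbuckets;  /* must be a power of two */
};

/* FNV-1a style mixer with the seed folded into the initial state. */
static uint32_t seeded_hash(const struct fdb_key *k, uint32_t seed)
{
	const uint8_t *p = (const uint8_t *)k;
	uint32_t h = 2166136261u ^ seed;

	for (size_t i = 0; i < sizeof(*k); i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Map a key to a bucket index in this particular table. */
static uint32_t bucket_of(const struct table *t, const struct fdb_key *k)
{
	return seeded_hash(k, t->seed) & (t->nbuckets - 1);
}
```

With one global salt, an attacker who learns it can target every bridge at
once; with a seed per table that knowledge does not carry over.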

>  * being slower for small (<10 entries) also matters and is a common
>    use case for containers.

I think they're pretty comparable in speed; the difference is negligible IMO.

