> > > with 16384 hashbuckets and a maximum of 131072 tracked connections it took
> > > 7.5 seconds to perform 1 million lookups in the hashtable (using
> > > __ip_conntrack_find from userspace).
> > 
> > Was this 1 million lookups to a random hash bucket each, with guaranteed
> > "no match"? Assuming this in the following...
> 
> yes that was a lookup of a connection not present in the conntrack.

For a single one? That would mean the one hash chain has a good chance
of being in processor cache, which is not the real-world situation.

> > That's 8 connections per bucket, resulting in 8 million "pointer chasings"
> > during table lookup, i.e. about 937ns per chasing.
> 
> and compare of tuple...

Yes, but the comparison itself would be insignificant vs. the time to
bring the conntrack elements from main memory to cache - unless you
were cache-hot.
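
To make the arithmetic behind the 937ns figure explicit, here is a rough
userspace sketch. It assumes the table was full (131072 entries) and that
every lookup was a miss walking one chain of average length; the function
name is made up for illustration.

```c
#include <assert.h>

/* Average cost in nanoseconds of one chain step, assuming every lookup
 * is a miss that walks a chain of average length (entries / buckets). */
double ns_per_chain_step(double total_seconds, unsigned long lookups,
                         unsigned long entries, unsigned long buckets)
{
    double avg_chain, steps;

    assert(lookups > 0 && buckets > 0 && entries >= buckets);
    avg_chain = (double)entries / (double)buckets;  /* 131072/16384 = 8 */
    steps = (double)lookups * avg_chain;            /* 8 million steps */
    return total_seconds * 1e9 / steps;             /* seconds to ns */
}
```

With the numbers from this thread, ns_per_chain_step(7.5, 1000000,
131072, 16384) gives 937.5.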

> I expect the lookups to be quite fast if the chain length isn't too long
> as all we do is a hash of the tuple and then go do a linear search in the
> hashbucket.

Again, under normal operation, you cannot expect the hash chains to
be in CPU cache. Each "step" in searching the chain will thus incur
a main memory round trip. This makes hashing with chaining "bad" even
for moderately small average chain lengths.
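
A minimal model of such a chain walk, to show where the round trips come
from. This is a simplified userspace sketch, not the kernel's code: struct
node, chain_find and chain_build are hypothetical names, and the integer
key stands in for the tuple. In the real table the entries are scattered
across memory, so each n->next dereference is a potential cache miss.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry; not the kernel layout. */
struct node {
    struct node *next;
    unsigned int key;   /* stands in for the tuple */
};

/* Walk one chain looking for key. *steps receives the number of nodes
 * dereferenced, i.e. the number of potential main memory round trips. */
struct node *chain_find(struct node *head, unsigned int key,
                        unsigned int *steps)
{
    struct node *n;

    assert(steps != NULL);
    *steps = 0;
    for (n = head; n != NULL; n = n->next) {
        (*steps)++;
        if (n->key == key)
            return n;
    }
    return NULL;        /* a miss has visited the whole chain */
}

/* Link an array of nodes into one chain with keys 0..count-1. */
struct node *chain_build(struct node *nodes, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        nodes[i].key = (unsigned int)i;
        nodes[i].next = (i + 1 < count) ? &nodes[i + 1] : NULL;
    }
    return count ? &nodes[0] : NULL;
}
```

On a chain of 8 nodes, a lookup for a key that is not present reports 8
steps; a hit on the first node reports 1.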

> Conntrack uses the standard LIST_FIND macro that's used all
> over the kernel and that should be quite fast. 

That's not the way to look at these things, really. Superfluous code
is superfluous code, and on each individual lookup, only one of the
entries in a list will match - all others are superfluously looked at.

Hashing with chaining is fine, but for high performance, you want the
chains only as a fallback for the occasional hash collision. The "planned"
oversubscription of the ip_conntrack hash table (1:8 hashsize/conntrack_max)
does not perform well when the number of tracked connections approaches
conntrack_max. This will become more apparent as more people try to use
conntracking at the line rate their hardware permits.

On machines where I expect many connections, I'd use a hashsize
near the number of expected connections, and make conntrack_max
only about two times that value.
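
As a rough sketch of that rule of thumb (struct ct_sizing and both
function names are hypothetical, chosen only for this example):

```c
#include <assert.h>

/* Hypothetical container for the two tunables discussed here. */
struct ct_sizing {
    unsigned long hashsize;
    unsigned long conntrack_max;
};

/* Rule of thumb from above: one bucket per expected connection,
 * and a hard limit of about twice that. */
struct ct_sizing ct_suggest_sizing(unsigned long expected_conns)
{
    struct ct_sizing s;

    assert(expected_conns > 0);
    s.hashsize = expected_conns;
    s.conntrack_max = 2 * expected_conns;
    return s;
}

/* Average chain length when the table holds the given number of entries. */
double ct_avg_chain(unsigned long entries, unsigned long hashsize)
{
    assert(hashsize > 0);
    return (double)entries / (double)hashsize;
}
```

For 65536 expected connections this yields hashsize 65536 and
conntrack_max 131072, so the average chain length is 2 even when the
table is completely full, versus 8 with 16384 buckets.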

Note that an additional hash bucket costs 4 bytes; a single conntrack
entry costs about 100 times as much memory.
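
A back-of-the-envelope sketch of that comparison. The 4 bytes per bucket
is the figure given above; the 400 bytes per entry is only an assumption
derived from "about 100 times", not a measured struct size, and the
function names are made up.

```c
#include <assert.h>
#include <limits.h>

#define BUCKET_BYTES 4UL    /* per-bucket cost stated above */
#define ENTRY_BYTES  400UL  /* assumed: roughly 100 times a bucket */

/* Memory taken by the bucket array alone. */
unsigned long ct_bucket_mem(unsigned long hashsize)
{
    assert(hashsize <= ULONG_MAX / BUCKET_BYTES);   /* no overflow */
    return hashsize * BUCKET_BYTES;
}

/* Memory taken by the entries when the table is full. */
unsigned long ct_entry_mem(unsigned long conntrack_max)
{
    assert(conntrack_max <= ULONG_MAX / ENTRY_BYTES);
    return conntrack_max * ENTRY_BYTES;
}
```

With conntrack_max at 131072, going from 16384 to 65536 buckets adds
192 KiB of bucket memory (64 KiB to 256 KiB), while the entries of a
full table take about 50 MiB under the assumption above.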

How does the core team feel about this issue? I hereby suggest changing
the default calculation to have hashsize == conntrack_max/2. Were there
good reasons to do it differently?

<soap-box-stepdown/>

best regards
  Patrick
