> > > with 16384 hashbuckets and a maximum of 131072 tracked connections
> > > it took 7.5 seconds to perform 1 million lookups in the hashtable
> > > (using __ip_conntrack_find from userspace).
> >
> > Was this 1 million lookups to a random hash bucket each, with
> > guaranteed "no match"? Assuming this in the following...
>
> yes, that was a lookup of a connection not present in the conntrack.
For a single one? That would mean the one hash chain has a good chance
of being in processor cache - not the real-world situation.

> > That's 8 connections per bucket, resulting in 8 million "pointer
> > chasings" during table lookup, i.e. about 937ns per chasing.
>
> and compare of tuple...

Yes, but the comparison itself would be insignificant vs. the time to
bring the conntrack elements from main memory to cache - unless you
were cache-hot.

> I expect the lookups to be quite fast if the chain length isn't too
> long, as all we do is a hash of the tuple and then a linear search in
> the hash bucket.

Again, under normal operation you cannot expect the hash chains to be
in CPU cache. Each "step" in searching the chain will thus incur a main
memory round trip. This makes hashing with chaining "bad" even for
moderately small average chain lengths.

> Conntrack uses the standard LIST_FIND macro that's used all over the
> kernel, and that should be quite fast.

That's not the way to look at these things, really. Superfluous code is
superfluous code: on each individual lookup, only one of the entries in
a list will match - all others are looked at superfluously. Hashing
with chaining is fine, but for high performance you want the chains
only as a backstop for the occasional hash collision.

The "planned" oversubscription of the ip_conntrack hash table (a 1:8
hashsize/conntrack_max ratio) does not perform well when the number of
tracked connections nears conntrack_max. This will become more apparent
as more people try to use conntracking at the line rate their hardware
permits.

On machines where I expect many connections, I'd use a hashsize near
the number of expected connections, and make conntrack_max only about
two times that value. Note that an additional hash bucket costs 4
bytes; a single conntrack entry costs about 100 times that.

How does the core team feel about this issue? I hereby suggest changing
the default calculation to have hashsize == conntrack_max/2.
Were there good reasons to do otherwise?

<soap-box-stepdown/>

best regards
  Patrick