Hi Harald,

> > On machines where I expect many connections, I'd use a hashsize
> > near the number of expected connections, and make conntrack_max
> > only about two times that value.
>
> But this obviously only helps if the hash function is distributing
> the conntrack entries equally among the hash buckets. I wouldn't be
> so sure if this really does happen when the hash becomes wider than a
> certain point.
In theory, a good hash function would distribute well independent of
the size of the bucket array. In practice, theory is theory... I agree
that the hash function needs scrutiny.

Do you (or somebody else here) have a good collection of real-world
/proc/net/ip_conntrack excerpts, maybe from the development of
ctnetlink? I'll cook up a "hash occupation simulator" for user level,
where you can pipe in a conntrack table and get reports about the
distribution of chain sizes. As I don't run real-world conntracking
firewalls with lots of connections (I'm using the stuff almost
exclusively on servers), I need the help of you all to get good input
test data here.

> > How does the core team feel about this issue? I hereby suggest changing
> > the default calculation to have hashsize == conntrack_max/2. Were there
> > good reasons to do differently?
>
> This would be fine with me, but rather than just blindly doing that,
> I'd be more interested in how good our hash function is with real world
> traffic.

Yep.

> And real-world traffic usually means narrow source ip ranges
> (because most people firewall a couple of Class-C's) and narrow source
> port ranges (let's assume lots of users aren't causing too many connections
> and thus the source port range stays close to the startup default port (32k?))
> The destination ports are most definitely also not very distributed, since
> most people will do the same services (http, ftp, smtp, or whatever is used
> from within this organization).

These are all important constraints. However, the problem is not
insoluble. A good cryptographic hash (not that I'd want SHA or MD5 for
this purpose :-) wouldn't care about such issues. The art is to find a
fast hash function that is still not very sensitive to such regularities
in its inputs.

best regards
Patrick
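A minimal sketch of the kind of user-level "hash occupation simulator" described above, in Python. It assumes the /proc/net/ip_conntrack line format `tcp 6 431999 ESTABLISHED src=... dst=... sport=... dport=... src=... ...`, hashes only the original-direction tuple of each entry (a simplification), and compares two hypothetical hash functions: a plain additive sum and one built from MurmurHash3's 32-bit finalizer. Neither is the kernel's actual conntrack hash; the script name `ct_hashsim.py` and the default of 8192 buckets are likewise made up for illustration.

```python
import sys
import ipaddress
from collections import Counter


def parse_tuple(line):
    """Return (protonum, src, dst, sport, dport) for the original direction
    of one /proc/net/ip_conntrack line, or None if it can't be parsed.

    Assumed format: "tcp 6 431999 ESTABLISHED src=A dst=B sport=P dport=Q
    src=B dst=A sport=Q dport=P ... use=1", where the first src/dst/sport/
    dport fields describe the original direction.
    """
    fields = line.split()
    if len(fields) < 2 or not fields[1].isdigit():
        return None
    first = {}
    for field in fields[2:]:
        key, sep, value = field.partition("=")
        if sep and key not in first:
            first[key] = value
    if "src" not in first or "dst" not in first:
        return None
    try:
        src = int(ipaddress.IPv4Address(first["src"]))
        dst = int(ipaddress.IPv4Address(first["dst"]))
        # ICMP entries carry type/code/id instead of ports; use 0 for both.
        sport = int(first.get("sport", 0))
        dport = int(first.get("dport", 0))
    except ValueError:
        return None
    return int(fields[1]), src, dst, sport, dport


def additive_hash(t, size):
    # Hypothetical cheap hash: just sum the tuple fields.
    # Clustered addresses and ports produce clustered sums.
    return sum(t) % size


def fmix32(h):
    # MurmurHash3's 32-bit finalizer: the right shifts let high input bits
    # influence the low bits that "% size" keeps.
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def mixing_hash(t, size):
    # Hypothetical alternative: feed each field through fmix32 in turn.
    h = 0
    for word in t:
        h = fmix32(h ^ (word & 0xFFFFFFFF))
    return h % size


def chain_report(tuples, size, hash_fn):
    """Histogram of chain lengths: {chain length: number of buckets}."""
    chains = Counter(hash_fn(t, size) for t in tuples)  # bucket -> length
    histogram = Counter(chains.values())
    histogram[0] = size - len(chains)  # buckets that stayed empty
    return histogram


def main():
    # Bucket count from the command line; 8192 is an arbitrary default.
    size = int(sys.argv[1]) if len(sys.argv) > 1 else 8192
    tuples = [t for t in map(parse_tuple, sys.stdin) if t is not None]
    for name, hash_fn in (("additive", additive_hash), ("mixing", mixing_hash)):
        histogram = chain_report(tuples, size, hash_fn)
        print(f"{name}: {len(tuples)} entries, {size} buckets, "
              f"longest chain {max(histogram)}")
        for length in sorted(histogram):
            print(f"  length {length:3d}: {histogram[length]} buckets")


# Only run when a table is piped in, e.g.
#   python3 ct_hashsim.py 8192 < saved_conntrack.txt
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Rerunning the script on the same saved table with different bucket counts shows directly whether chains stay short as the hash gets wider, which is the question raised about hashsize == conntrack_max/2. Further candidate hash functions can be added to the tuple in `main()` for comparison.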