Hi Harald,

> > On machines where I expect many connections, I'd use a hashsize
> > near the number of expected connections, and make conntrack_max
> > only about two times that value.
> 
> But this obviously only helps if the hash function is distributing
> the conntrack entries equally among the hash buckets.  I wouldn't be 
> so sure if this really does happen when the hash becomes wider than a
> certain point.

In theory, a good hash function would distribute well independent of
the size of the bucket array.

In practice, theory is theory...

I agree that the hash function needs scrutiny. Do you (or somebody else
here) have a good collection of real-world /proc/net/ip_conntrack excerpts,
maybe coming from the development of ctnetlink? I'll cook up a "hash
occupation simulator" for user level, where you can pipe in a conntrack
table, and get reports about the distribution of chain sizes.

As I don't run real-world conntracking firewalls with lots of connections
(I'm using the stuff almost exclusively on servers), I need the help of
all of you to get good input test data here.
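To make it a bit more concrete, here is a rough sketch of the kind of tool
I have in mind (the parsing and the placeholder hash are deliberately
simplified; the point is that hash_tuple() gets replaced by whatever
candidate function we want to evaluate, including a copy of the kernel's
current conntrack hash):

/*
 * Rough sketch of the "hash occupation simulator": pipe in
 * /proc/net/ip_conntrack on stdin, get a report of how the entries
 * spread over HASHSIZE buckets.  hash_tuple() below is just a dumb
 * placeholder -- plug in the candidate function under test.
 */
#include <stdio.h>
#include <string.h>

#define HASHSIZE 8192		/* value under test */

static unsigned int bucket[HASHSIZE];

/* placeholder hash over the original-direction tuple */
static unsigned int hash_tuple(unsigned int src, unsigned int dst,
			       unsigned int sport, unsigned int dport,
			       unsigned int proto)
{
	return (src + dst + sport + dport + proto) % HASHSIZE;
}

int main(void)
{
	char line[1024];
	unsigned int entries = 0, maxlen = 0, len, i, n;

	while (fgets(line, sizeof(line), stdin)) {
		unsigned int s[4], d[4], sport, dport, proto, src, dst, h;
		char *p = strstr(line, "src=");

		/* second field of each line is the protocol number */
		if (!p || sscanf(line, "%*s %u", &proto) != 1)
			continue;
		/* entries without ports (e.g. icmp) are simply skipped */
		if (sscanf(p, "src=%u.%u.%u.%u dst=%u.%u.%u.%u "
			      "sport=%u dport=%u",
			   &s[0], &s[1], &s[2], &s[3],
			   &d[0], &d[1], &d[2], &d[3], &sport, &dport) != 10)
			continue;

		src = s[0] << 24 | s[1] << 16 | s[2] << 8 | s[3];
		dst = d[0] << 24 | d[1] << 16 | d[2] << 8 | d[3];
		h = hash_tuple(src, dst, sport, dport, proto);

		if (++bucket[h] > maxlen)
			maxlen = bucket[h];
		entries++;
	}

	/* how many buckets carry chains of length 0, 1, 2, ... */
	for (len = 0; len <= maxlen; len++) {
		for (n = 0, i = 0; i < HASHSIZE; i++)
			if (bucket[i] == len)
				n++;
		printf("chains of length %u: %u buckets\n", len, n);
	}
	printf("%u entries, %u buckets\n", entries, HASHSIZE);
	return 0;
}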

> > How does the core team feel about this issue? I hereby suggest changing
> > the default calculation to have hashsize == conntrack_max/2. Were there
> > good reasons to do different?
> 
> This would be fine with me, but rather than just blindly doing that,
> I'd be more interested in how good our hash function is with real world
> traffic.

Jep.

> And real-world traffic usually means narrow source ip ranges
> (because most people firewall a couple of Class-C's) and narrow source
> port ranges (let's assume lots of users aren't causing too many connections
> and thus the source port range stays close to the startup default port (32k?))
> The destination ports are most definitely also not very distributed, since
> most people will do the same services (http, ftp, smtp, or whatever is used
> from within this organization).

These are all important constraints. However, the problem is not insoluble.
A good cryptographic hash (not that I'd want to use SHA or MD5 for this
purpose :-) simply wouldn't care about such input patterns. The art is to
find a fast hash function whose distribution still does not suffer from
such structured inputs.
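To illustrate what I mean by "not sensitive to the inputs" (just a sketch,
the constant and shift are placeholders, not a proposal): a purely
xor-based hash degenerates badly when addresses and ports only vary in a
few low bits, while even a cheap multiplicative mix spreads such tuples
much better.

/* naive: with narrow address and port ranges only a handful of low
 * bits ever change, so most entries pile up in a few buckets */
static unsigned int hash_naive(unsigned int src, unsigned int dst,
			       unsigned int sport, unsigned int dport,
			       unsigned int proto)
{
	return (src ^ dst ^ sport ^ dport ^ proto) % HASHSIZE;
}

/* cheap mixing: multiply by a large odd constant (2^32 / golden
 * ratio) so low-bit differences propagate into the high bits, then
 * take the bucket index from the upper half */
static unsigned int hash_mixed(unsigned int src, unsigned int dst,
			       unsigned int sport, unsigned int dport,
			       unsigned int proto)
{
	unsigned int h = src ^ dst ^ (sport << 16 | dport) ^ proto;

	h *= 0x9e3779b9;
	return (h >> 16) % HASHSIZE;
}

Either of these can be dropped into the simulator sketch above in place of
hash_tuple(). Whether something this simple is good enough, or whether we
need a real multi-round mix like Jenkins' lookup2, is exactly what the
simulator should tell us once we have real-world data to feed it.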

best regards
  Patrick
