On Thu, Apr 30, 2015 at 01:11:44PM +0300, Alberto Garcia wrote:
> By using a hash of the offset as the starting point, lookups are
> faster and independent from the array size.
>
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
...
> +    i = lookup_index = (offset / c->table_size * 4) % c->size;
The hash function is very weak.  It's better than always starting with
i=0, but it mixes input bits very poorly.  Have you considered an
integer hash function so that more input bits affect the output?

http://burtleburtle.net/bob/hash/integer.html

Regarding collisions, the multiply-by-4 effectively reduces the hash
space by a factor of 4.  This *increases* the chance of collisions in
the hope that the extra slots will prevent chains from forming.

Using an integer hash function ought to be more resistant to input
value patterns and eliminate the need for multiply-by-4, because
chains only form when the table load is high.

Finally, the hash optimizes for the sparsely occupied table scenario.
If the table is loaded (dense) then there will be chains in any case
and a full search will be necessary.

Feel free to leave the hash function as-is; I don't think it's a huge
issue either way.

Stefan