On Thu, Apr 30, 2015 at 01:11:44PM +0300, Alberto Garcia wrote:
> By using a hash of the offset as the starting point, lookups are
> faster and independent from the array size.
>
> The hash is computed using the cluster number of the table, multiplied
> by 4 to make it perform better when there are collisions.
...
> +    i = lookup_index = (offset / c->table_size * 4) % c->size;
The hash function is very weak.  It's better than always starting with
i=0, but it mixes input bits very poorly.  Have you considered an
integer hash function so that more input bits affect the output?

http://burtleburtle.net/bob/hash/integer.html

Regarding collisions, the multiply-by-4 effectively reduces the hash
space by a factor of 4.  This *increases* the chance of collisions in
the hope that the extra slots will prevent chains from forming.

Using an integer hash function ought to be more resistant to input
value patterns and eliminate the need for multiply-by-4, because
chains only form when the table load is high.

Finally, the hash optimizes for the sparsely occupied table scenario.
If the table is loaded (dense) then there will be chains in any case
and a full search will be necessary.

Feel free to leave the hash function as-is; I don't think it's a huge
issue either way.

Stefan