Currently, each entry in a D hashtable stores the hash of the key for efficiency purposes.

There is a bit of redundancy: as soon as you entered a hash bucket you know that the hashkey % tablesize is a specific number, call it k. But k is not enough information to restore the hash, so the actual hash gets stored for each entry.

I'm thinking of exploiting that redundancy by storing the hash multiplier, not the hash value. Instead of storing h in each slot, store m = h / tablesize. Then you can restore the actual h by writing:

restored_h = m * tablesize + k;

k is known for each given bucket (it's the index of the bucket) and m is what gets stored per entry.

What's the advantage of this approach? Well the advantage is that m is a small number. Any hash function will try to disperse the hash value as much as possible between the 32 available bits. But m will be a smaller number, and therefore will be more grouped and will engender fewer false pointers.

Would this help?


Andrei

Reply via email to