Tim Peters added the comment:

The attached hashsim.py pretty much convinces me there's nothing left to be 
gained here.  It shows that the current scheme essentially achieves the 
performance of theoretical "uniform hashing" for string keys, for both 
successful and unsuccessful searches, as measured by the average number of 
probes needed.  Uniform hashing is the head scheme in which, for each key, the 
probe sequence is chosen uniformly at random from the set of all table slot 
permutations.  Its advantage is that it's feasible to analyze - its 
disadvantage is that it can't actually be sanely implemented ;-)

Back when the current scheme was implemented, string keys sometimes behaved 
"better than random", for reasons explained in the code comments.  The string 
hash now is more like a high-quality PRNG, so the performance of uniform 
hashing is the best that can be hoped for.  Dicts with int keys can still enjoy 
better-than-random performance.

Back then I was using a 32-bit box, and I'm pleased to see that PERTURB_SHIFT=5 
still appears to work well on a 64-bit box (note:  to a first approximation you 
can emulate a 32-bit box on a 64-bit box by setting M64 in the code to a 32-bit 
mask).  But the choice appears far less touchy on a 64-bit box, and it may help 
a little to increase PERTURB_SHIFT.  Not so much in the average case, but for 
pathological cases, like int keys all of which have a lot of trailing zeroes.  
Before the bits that differ get shifted down far enough to matter, they all 
follow the same probe sequence.  Increasing PERTURB_SHIFT would reduce the 
number of iterations needed before that happens.  But I'd wait until someone 
complains before churning the code just for that ;-)

----------
Added file: http://bugs.python.org/file46957/hashsim.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30671>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to