Eric Appelt added the comment:

Ugh... so I think I made a mental error (i.e. confused myself) in the last 
comment. The result looked a bit "to good to be true" and I think that there is 
at least one error in my thinking about the problem.

I tried testing with the width set to 2 and then 1 as a check and noticed that 
even without "widening" the problem was still fixed and the hash distributions 
matched that of pseudorandom numbers.

It turns out that just running the XORed result of the shuffled entry hashes 
through the hashing algorithm gets rid of any bit patterns picked up through 
the process. Currently there is an LCG that is used to disperse patterns but I 
don't think it really helps the problems arising in these tests:

    hash = hash * 69069U + 907133923UL;

I'm not attaching any more plots as the result can be described in words, but 
when the LCG applied to the hash after XORing all entries is replaced with a 
hashing of the result using the standard python hashing algorithm, the test 
problem goes away. Specifically, when 128 distinct sets are hashed, the 
resulting hashes have a distribution of unique values across their last 7 
digits matches the distribution from pseudorandom numbers.

The fix is implemented in a small patch that I have attached.

----------
keywords: +patch
Added file: http://bugs.python.org/file45283/setobject.c.2.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26163>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to