Eric Appelt added the comment:
Ugh... so I think I made a mental error (i.e. confused myself) in the last
comment. The result looked a bit "to good to be true" and I think that there is
at least one error in my thinking about the problem.
I tried testing with the width set to 2 and then 1 as a check and noticed that
even without "widening" the problem was still fixed and the hash distributions
matched that of pseudorandom numbers.
It turns out that just running the XORed result of the shuffled entry hashes
through the hashing algorithm gets rid of any bit patterns picked up through
the process. Currently there is an LCG that is used to disperse patterns but I
don't think it really helps the problems arising in these tests:
hash = hash * 69069U + 907133923UL;
I'm not attaching any more plots as the result can be described in words, but
when the LCG applied to the hash after XORing all entries is replaced with a
hashing of the result using the standard python hashing algorithm, the test
problem goes away. Specifically, when 128 distinct sets are hashed, the
resulting hashes have a distribution of unique values across their last 7
digits matches the distribution from pseudorandom numbers.
The fix is implemented in a small patch that I have attached.
----------
keywords: +patch
Added file: http://bugs.python.org/file45283/setobject.c.2.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue26163>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com