Serhiy Storchaka <[email protected]> added the comment:
As far as I understand, the problem is that the value obtained by XORing the
hashes of the elements has a nonuniform distribution. Further transformations,
whether linear or nonlinear, as well as mixing in the length, don't change the
fact that the result of the XOR carries less information than the number of
bits in the hash word.
The mathematically strong way of computing the hash of a frozenset is:
hash(tuple(sorted(hash(x) for x in frozenset)))
Store all hash values in an array, sort them to get rid of the ordering, and
compute a hash of all the concatenated hashes.
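A minimal sketch of that approach (the helper name strong_frozenset_hash is
mine, purely for illustration):

def strong_frozenset_hash(fs):
    # Hash every element, sort the hashes to remove any dependence on
    # iteration order, then hash the resulting tuple.
    return hash(tuple(sorted(hash(x) for x in fs)))

print(strong_frozenset_hash(frozenset(range(10))))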
But this algorithm requires linear memory and has superlinear complexity. For
practical purposes it is enough to make the difference between the resulting
distribution and the uniform distribution small enough. Maybe the following
algorithm could help:
buckets = [0] * N
for x in frozenset:
    h = hash(x)
    i = shuffle_bits1(h) % N
    buckets[i] ^= shuffle_bits2(h)
return hash(tuple(buckets))
where shuffle_bits1() and shuffle_bits2() are functions that shuffle bits in
different ways.
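A runnable sketch of that idea follows. The concrete shuffle_bits1() and
shuffle_bits2() below are placeholders I made up (simple multiply-and-fold
mixers with arbitrary odd constants), and N = 8 is likewise arbitrary; they
only illustrate the bucketing scheme, not a concrete proposal.

MASK = (1 << 64) - 1   # assume a 64-bit hash word

def shuffle_bits1(h):
    # placeholder mixer: multiply by an odd constant, fold high bits down
    h = (h * 0x9E3779B97F4A7C15) & MASK
    return h ^ (h >> 32)

def shuffle_bits2(h):
    # a second, differently parameterized placeholder mixer
    h = (h * 0xC2B2AE3D27D4EB4F) & MASK
    return h ^ (h >> 29)

def bucketed_frozenset_hash(fs, N=8):
    # Scatter the element hashes over N buckets; XOR only combines hashes
    # that land in the same bucket, and the final tuple hash mixes the
    # buckets together.
    buckets = [0] * N
    for x in fs:
        h = hash(x) & MASK
        i = shuffle_bits1(h) % N
        buckets[i] ^= shuffle_bits2(h)
    return hash(tuple(buckets))

print(bucketed_frozenset_hash(frozenset(range(10))))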
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue26163>
_______________________________________