Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:

As far as I understand, the problem is that the value obtained by XORing the hashes 
of the elements has a nonuniform distribution. Further transformations, whether linear 
or nonlinear, as well as mixing in the length, don't change the fact that the result 
of the XOR carries less information than the number of bits in the hash word. 
The mathematically strong way of computing the hash of a frozenset fs is:

    hash(tuple(sorted(hash(x) for x in fs)))

Store all hash values in an array, sort them to get rid of the ordering, and 
hash the concatenation of all the hashes.
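As a purely illustrative comparison (not the actual set hashing code; it relies on 
small ints hashing to themselves), a plain XOR of element hashes collides for two 
different sets, while the sorted-tuple approach keeps them apart:

    def xor_hash(fs):
        # naive combination: XOR of the element hashes
        h = 0
        for x in fs:
            h ^= hash(x)
        return h

    def sorted_tuple_hash(fs):
        # order-independent, but keeps all the information of the element hashes
        return hash(tuple(sorted(hash(x) for x in fs)))

    a = frozenset({1, 2, 3})
    b = frozenset({0, 1, 2, 3})
    print(xor_hash(a) == xor_hash(b))                    # True  -- a collision
    print(sorted_tuple_hash(a) == sorted_tuple_hash(b))  # False -- distinguished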

But this algorithm requires linear memory and has superlinear time complexity. For 
practical purposes it is enough to make the deviation of the distribution from 
the uniform distribution small enough. Maybe the following algorithm could help:

    buckets = [0] * N
    for x in fs:
        h = hash(x)
        # one shuffle selects the bucket, a different shuffle is mixed into it
        i = shuffle_bits1(h) % N
        buckets[i] ^= shuffle_bits2(h)
    return hash(tuple(buckets))

where shuffle_bits1() and shuffle_bits2() are functions that shuffle bits in 
different ways.
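
For concreteness, here is a rough sketch of how that could look. The name 
bucketed_hash, the constants, and the 64-bit masking are my own assumptions for 
illustration, not a concrete proposal for shuffle_bits1/shuffle_bits2:

    def shuffle_bits1(h):
        # hypothetical mixer: multiply by an arbitrary odd 64-bit constant
        # and fold the high bits down
        h = (h * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
        return h ^ (h >> 32)

    def shuffle_bits2(h):
        # a different hypothetical mixer: rotate, then multiply by another
        # arbitrary odd constant
        h = ((h << 23) | (h >> 41)) & 0xFFFFFFFFFFFFFFFF
        return (h * 0xC2B2AE3D27D4EB4F) & 0xFFFFFFFFFFFFFFFF

    def bucketed_hash(fs, N=8):
        buckets = [0] * N
        for x in fs:
            h = hash(x) & 0xFFFFFFFFFFFFFFFF   # fixed 64-bit view of the hash
            buckets[shuffle_bits1(h) % N] ^= shuffle_bits2(h)
        return hash(tuple(buckets))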

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue26163>
_______________________________________