Tim Peters <t...@python.org> added the comment:

@jdemeyer, please define exactly what you mean by "Bernstein hash".  Bernstein 
has authored many hashes, and none on his current hash page could possibly be 
called "simple":

    https://cr.yp.to/hash.html

If you're talking about the teensy 

    hash = hash * 33 + c;

thing from now-ancient C folklore, Bernstein is claimed to have replaced 
addition with xor in that later:

    http://www.cse.yorku.ca/~oz/hash.html

In any case, Python _used_ to do plain

    x = (1000003 * x) ^ y;
    
but changed to the heart of the current scheme 14 years ago to address Source 
Forge bug #942952:
    
https://github.com/python/cpython/commit/41bd02256f5a2348d2de3d6e5fdfcaeb2fcaaebc#diff-cde4fb3109c243b2c2badb10eeab7fcd

The Source Forge bug reports no longer appear to exist.  If you care enough, 
dig through the messages with subject:

    [ python-Bugs-942952 ] Weakness in tuple hash

here:

    https://mail.python.org/pipermail/python-bugs-list/2004-May/subject.html

The thrust was that the simpler approach caused, in real life, hashes of nested 
tuples of the form

    (X, (X, Y))

to return hash(Y) - the hash(X) parts across calls magically xor'ed out.

It was also systematically the case that

   (X, (Y, Z))

and

   (Y, (X, Z))

hashed to the same thing.

So the complications were added for a reason, to address the 
showed-up-in-real-life pathological cases discussed in the bug report already 
linked to.  Since I don't believe we've had other reports of real-life 
pathologies since, nobody is going to change this again without truly 
compelling reasons.  Which I haven't seen here, so I'm closing this report 
again.

BTW, as also explained in that report, the goal of Python's hash functions is 
to minimize collisions in dicts in real life.  We generally couldn't care less 
whether they "act random", or are hard to out-guess.  For example,

     assert all(hash(i) == i for i in range(1000000))

has always succeeded.

----------
resolution:  -> rejected
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to