Raymond Hettinger <[email protected]> added the comment:
ISTM the collision of "hash((1,0,0)) == hash((1,-2,-2))" also stems from
integers hashing to themselves and that twos-complement arithmetic relation, -x
== ~x+1. Further, that example specifically exploits that "hash(-1) ==
hash(-2)" due to how error codes. In a way, tuple hashing is incidental.
Occasional collisions aren't really a big deal -- it is normal for dicts to
have some collisions and to resolve them. I would be more concerned is
non-contrived datasets had many elements that gave exactly the same hash value
resulting in a pile-up in one place.
FWIW, the current algorithm also has some advantages that we don't want to
lose. In particular, the combination of the int hash and tuple hash are good at
producing distinct values for non-negative numbers:
>>> from itertools import product
>>> len(set(map(hash, product(range(100), repeat=4))))
100000000
The current hash is in the pretty-darned-good category and so we should be
disinclined to change battle-tested code that is known to work well across a
broad range of use cases.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com