Tim Peters <[email protected]> added the comment:
Has anyone figured out the real source of the degeneration when mixing in
negative integers? I have not. XOR always permutes the hash range - it's
one-to-one. No possible outputs are lost, and XOR with a negative int isn't
"obviously degenerate" in general:
>>> for i in range(-16, 17):
... print(i, i ^ -5)
-16 11
-15 10
-14 9
-13 8
-12 15
-11 14
-10 13
-9 12
-8 3
-7 2
-6 1
-5 0
-4 7
-3 6
-2 5
-1 4
0 -5
1 -6
2 -7
3 -8
4 -1
5 -2
6 -3
7 -4
8 -13
9 -14
10 -15
11 -16
12 -9
13 -10
14 -11
15 -12
16 -21
Clear enough? "Distinct integers in, distinct integers out", but with the twist
that the sign bit flips. That's _seemingly_ minor to me. Yet it has massive
effects on tuple hash distribution. Here's a function to show that:
def doit(cands, repeat=4):
from collections import Counter
from itertools import product
print(len(cands), "cands **", repeat, "=", len(cands)**repeat)
c1 = Counter(map(hash, product(cands, repeat=repeat)))
print(len(c1), "distinct hash codes")
c2 = Counter(c1.values())
for dups in sorted(c2):
print(dups, c2[dups])
Then an example we're proud of:
>>> doit(range(100))
100 cands ** 4 = 100000000
100000000 distinct hash codes
1 100000000
Much the same, mixing in negative ints, but _excluding_ the problematic -1 and
-2 inputs:
>>> cands = list(range(-50, 51))
>>> cands.remove(-1) # hashes to -2
>>> cands.remove(-2) # for j odd, j ^ -2 == -j
>>> doit(cands)
99 cands ** 4 = 96059601
15736247 distinct hash codes
1 70827
2 1005882
3 228578
4 5000424
5 19728
6 1047762
8 8363046
What accounts for such a massive difference? It's not merely that we're using
negative ints:
>>> doit(range(-101, -1)) # leave -2 in, for a change
100 cands ** 4 = 100000000
100000000 distinct hash codes
1 100000000
So, on their own, negative ints are as spectacularly well-behaved in this range
as positive ints, and including -2 creates no problems.
I suspect it's related to that x ^ -(x + 1) == -1, but not in a way obvious
enough to be ... well, obvious ;-)
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com