Tim Peters <t...@python.org> added the comment:
Has anyone figured out the real source of the degeneration when mixing in negative integers? I have not. XOR always permutes the hash range - it's one-to-one. No possible outputs are lost, and XOR with a negative int isn't "obviously degenerate" in general: >>> for i in range(-16, 17): ... print(i, i ^ -5) -16 11 -15 10 -14 9 -13 8 -12 15 -11 14 -10 13 -9 12 -8 3 -7 2 -6 1 -5 0 -4 7 -3 6 -2 5 -1 4 0 -5 1 -6 2 -7 3 -8 4 -1 5 -2 6 -3 7 -4 8 -13 9 -14 10 -15 11 -16 12 -9 13 -10 14 -11 15 -12 16 -21 Clear enough? "Distinct integers in, distinct integers out", but with the twist that the sign bit flips. That's _seemingly_ minor to me. Yet it has massive effects on tuple hash distribution. Here's a function to show that: def doit(cands, repeat=4): from collections import Counter from itertools import product print(len(cands), "cands **", repeat, "=", len(cands)**repeat) c1 = Counter(map(hash, product(cands, repeat=repeat))) print(len(c1), "distinct hash codes") c2 = Counter(c1.values()) for dups in sorted(c2): print(dups, c2[dups]) Then an example we're proud of: >>> doit(range(100)) 100 cands ** 4 = 100000000 100000000 distinct hash codes 1 100000000 Much the same, mixing in negative ints, but _excluding_ the problematic -1 and -2 inputs: >>> cands = list(range(-50, 51)) >>> cands.remove(-1) # hashes to -2 >>> cands.remove(-2) # for j odd, j ^ -2 == -j >>> doit(cands) 99 cands ** 4 = 96059601 15736247 distinct hash codes 1 70827 2 1005882 3 228578 4 5000424 5 19728 6 1047762 8 8363046 What accounts for such a massive difference? It's not merely that we're using negative ints: >>> doit(range(-101, -1)) # leave -2 in, for a change 100 cands ** 4 = 100000000 100000000 distinct hash codes 1 100000000 So, on their own, negative ints are as spectacularly well-behaved in this range as positive ints, and including -2 creates no problems. I suspect it's related to that x ^ -(x + 1) == -1, but not in a way obvious enough to be ... well, obvious ;-) ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue34751> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com