Jeroen Demeyer <j.deme...@ugent.be> added the comment:

> the made-up hacks Python used to worm around a class of gross flaws its prior 
> DJBX33X approach suffered, taking DJBX33X out of its original context and 
> applying it in an area it wasn't designed for.

But we know why the DJBX33A hash didn't work (nested tuples), so we can think 
of the best way to solve that. Python messes with the multiplier, which makes 
it quite a different hash. Surely, if you believe that the precise choice of 
multiplier matters a lot, then you should also agree that arbitrarily changing 
the multiplier in the loop is a bad idea.

My proposal instead is to keep the structure of the DJBX33A hash but change the 
hash of the individual items to be hashed. That's a much less invasive change 
to the known algorithm.
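
To make the structural point concrete, here is a minimal sketch, not the actual patch. It models the plain DJBX33A loop with a fixed multiplier and then applies the same loop to transformed item hashes. The names MASK, djbx33a, scramble and djbx33a_scrambled are made up for this illustration, 64-bit wraparound is an assumption, and scramble() is a hypothetical placeholder rather than the transformation proposed in this issue.

```python
MASK = (1 << 64) - 1  # assumption: model 64-bit unsigned wraparound


def djbx33a(item_hashes, multiplier=33, start=5381):
    """Classic DJBX33A loop: h = h * 33 + y, with a fixed multiplier."""
    h = start
    for y in item_hashes:
        h = (h * multiplier + y) & MASK
    return h


def scramble(y):
    """Hypothetical per-item transformation, for illustration only.

    It folds the high half of y into the low half. It is NOT the
    transformation proposed in this issue.
    """
    y &= MASK
    return y ^ (y >> 32)


def djbx33a_scrambled(item_hashes):
    """Same loop and same fixed multiplier; only the inputs change."""
    return djbx33a(scramble(y) for y in item_hashes)
```

The loop itself is untouched in the second variant: the multiplier stays constant and only the values fed into it differ.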

Finally, something that I haven't mentioned here: an additional advantage of my 
approach is that the high-order bits of the items have more influence on the 
resulting hash:

BEFORE:
>>> L = [n << 60 for n in range(100)]
>>> T = [(a, b, c) for a in L for b in L for c in L]
>>> len(set(hash(x) for x in T))
500000

AFTER:
>>> L = [n << 60 for n in range(100)]
>>> T = [(a, b, c) for a in L for b in L for c in L]
>>> len(set(hash(x) for x in T))
1000000

Again, I'm not claiming that this is a major issue. It is just additional 
evidence that my new hash might actually be slightly better than the existing 
hash.
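
For anyone who wants to repeat the experiment against a different build or a candidate hash function, it can be wrapped in a small helper. This is only a sketch: the function name count_distinct_hashes and its parameters are made up here, and the count it returns depends on the interpreter it runs on.

```python
from itertools import product


def count_distinct_hashes(n=100, shift=60, width=3, tuple_hash=hash):
    """Count distinct hashes over all width-tuples of the values i << shift.

    tuple_hash defaults to the built-in hash(); pass another callable to
    try a candidate tuple hash on the same inputs.
    """
    values = [i << shift for i in range(n)]
    return len({tuple_hash(t) for t in product(values, repeat=width)})
```

With the defaults this hashes the same 100**3 tuples as the transcript above, so a result of 1000000 means no collisions at all.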

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________