Tim Peters <[email protected]> added the comment:
We need to worry about timing too :-( I'm using this as a test. It's very
heavy on using 3-tuples of little ints as dict keys. Getting hash codes for
ints is relatively cheap, and there's not much else going on, so this is
intended to be very sensitive to changes in the speed of tuple hashing:
def doit():
from time import perf_counter as now
from itertools import product
s = now()
d = dict.fromkeys(product(range(150), repeat=3), 0)
for k in d:
d[k] += 1
for k in d:
d[k] *= 2
f = now()
return f - s
I run that five times in a loop on a mostly quiet box, and pick the smallest
time as "the result".
Compared to current master, a 64-bit release build under Visual Studio takes
20.7% longer. Ouch! That's a real hit.
Fiddling the code a bit (see the PR) to convince Visual Studio to emit a rotate
instruction instead of some shifts and an add reduces that to 19.3% longer. A
little better.
The variant I discussed last time (add in the length at the start, and get rid
of all the post-loop avalanche code) reduces it to 8.88% longer. The avalanche
code is fixed overhead independent of tuple length, so losing it is more
valuable (for relative speed) the shorter the tuple.
I can't speed it more. These high-quality hashes have unforgiving long
critical paths, and every operation appears crucial to their good results in
_some_ test we already have. But "long" is relative to our current tuple hash,
which does relatively little to scramble bits, and never "propagates right" at
all. In its loop, it does one multiply nothing waits on, and increments the
multplier for the next iteration while the multiply is going on.
Note: "the real" xxHash runs 4 independent streams, but it only has to read 8
bytes to get the next value to fold in. That can go on - in a single thread -
while other streams are doing arithmetic (ILP). We pretty much have to "stop
the world" to call PyObject_Hash() instead.
We could call that 4 times in a row and _then_ start arithmetic. But most
tuples that get hashed are probably less than 4 long, and the code to mix
stream results together at the end is another bottleneck. My first stab at
trying to split it into 2 streams ran substantially slower on realistic-length
(i.e., small) tuples - so that was also my last stab ;-)
I can live with the variant. When I realized we never "propagate right" now, I
agreed with Jeroen that the tuple hash is fundamentally "broken", despite that
people hadn't reported it as such yet, and despite that that flaw had
approximately nothing to do with the issue this bug report was opened about.
Switching to "a real" hash will avoid a universe of bad cases, and xxHash
appears to be as cheap as a hash without glaringly obvious weaknesses gets:
two multiplies, an add, and a rotate.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com