Tim Peters <[email protected]> added the comment:
Thinking about the way xxHash works prompted me to try this initialization
change:
x = 0x345678UL + (Py_uhash_t)len; // adding in the length is new
That was a pure win in my tests, slashing the number of collisions in the new
tuple test, 32-bit build, from 51 to 17. In the 64-bit build, cut from 11 to
10, but when looking at the high 32 hash bits instead from 27 to 15. There
were also small improvements in other 32-bit build collision stats.
Full-blown xxHash (& SeaHash, and murmurhash, and ...) also fold in the length,
but not inside their core loops. It's useful info!
But they fold it in _after_ their core loops, not before. They're apparently
worried about this: if their internal state ever reaches 0, that's a fixed
point so long as all remaining incoming data is zeroes (in Python, e.g.,
picture adding one or more trailing integer or float zeroes or ..., which hash
to 0). Folding in the length at the end snaps it slightly out of zeroland.
I don't really care about that. Folding in the length at the start quickly
leads to large differences across iterations for tuples of different lengths
(thanks to repeated mulitiplication and "propagate right" steps), which is
especially helpful in things like the nested tuple tests where there's almost
as much variation in tuple lengths as there is in the few little integers
bottom-level tuples contain.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com