Tim Peters <[email protected]> added the comment:
So you don't know of any directly relevant research either. "Offhand I can't
see anything wrong" is better than nothing, but very far from "and we know it
will be OK because [see references 1 and 2]".
That Bernstein's DJBX33A has been widely used inspires no confidence
whatsoever. As already explained, Python _did_ use DJBX33X (with a different
multiplier), and it systematically screwed up catastrophically in some nested
tuple cases already spelled out. Bernstein himself moved to DJBX33X (using xor
instead of addition), and that would have screwed up too on a mix of smallish
integers of both signs.
What Python does now, except for changing the multiplier, is essentially
version 1a of the _also_ very widely used - but more recent - Fowler/Noll/Vo
hash family[1]:
# Version 1a, recommended over version 1 (which does * before ^).
hash = offset_basis
for each octet_of_data to be hashed
hash = hash xor octet_of_data
hash = hash * FNV_prime
return hash
They did extensive studies, and very deliberately use a prime multiplier
subject to a number of other constraints. Not based on "offhand I can't see
why not", but based on analysis and running extensive tests.
But, same as with Bernstein's variants (which predate FNV's work on picking
"good" multipliers), they were developed in the context of hashing sequences of
unsigned 8-bit integers. There's no a priori reason I can see to expect them
to work well in contexts other than that. Indeed, FNV puts special constraints
on the last 8 bits of the primes they pick, presumably because they're only
concerned with hashing sequences of 8-bit values.
FYI, for 32-bit hashes they use multiplier 16777619, and for 64-bit
1099511628211.
In the absence of coherent analysis directly relevant to what Python actually
needs here, we're all flying blind. What we do have is over a decade of
positive real-world experience with the made-up hacks Python used to worm
around a class of gross flaws its prior DJBX33X approach suffered, taking
DJBX33X out of its original context and applying it in an area it wasn't
designed for. Which made it close to FNV-1a, but also in a context _that_
wasn't designed for.
However, it _is_ structurally close to FNV-1a, and using prime multipliers was
a big deal to them. "But offhand I don't see why" isn't a good enough reason
to abandon that. To the contrary, in the absence of _proof_ that it doesn't
actually matter, I'd be happiest using the same multipliers (given above) and
initial values "the standard" FNV-1a algorithms use.
[1] http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-1a
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com