Tim Peters <t...@python.org> added the comment:

Jeroen, thanks for helping us fly slightly less blind! ;-) It's a lot of work.

I'd say you may as well pick a prime.  It's folklore, but "a reason" is that 
you've discovered that regular bit patterns in multipliers can hurt, and 
sticking to primes eliminates worlds of simple & subtle bit patterns.

Another bit of folklore:  pick a multiplier such that for each byte, neither it 
nor its complement are close in value to any other byte of the multiplier.  
Which may seem more valuable for byte-oriented hashes, yet the byte-oriented 
FNV violates it spectacularly:  all FNV multipliers have _no_ bits set higher 
than 2**8 except for their leading bit.  So the longer the FNV multiplier, the 
more all-0 bytes it contains.

Which appears to be a deliberate choice to limit how quickly each new input 
byte propagates to other lower-order state bits across iterations.  The DJB33 
algorithms accomplish that by using a tiny multiplier (relative to the output 
hash width) instead.

As I explained in other msgs here, treating tuple component hashes as sequences 
of bytes seems to deliver excellent results with the unaltered byte-oriented 
versions of FNV-1[a] and DJBX33[AX] - too good for anything in my test 
collection to detect anything even suspicious, let alone "a problem" (although 
it would be easy to contrive problem cases for 33[AX] - 33 is way "too small" 
to avoid collisions even across tuples with a single component).

So part of our challenge appears to me to be that we're trying instead to 
produce a hash from inputs of the _same_ bit width.  DJB/FNV were designed to 
produce hashes of width at least 4 times their inputs' bit widths.  Each input 
has a relatively tiny effect on the internal state for them.  For us, each 
input can change the entire internal state.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to