Tim Peters <t...@python.org> added the comment:

> when you do t ^= t << 7, then you are not changing
> the lower 7 bits at all.

I want to leave low-order hash bits alone.  That's deliberate.

The most important tuple component types, for tuples that are hashable, are 
strings and contiguous ranges of "not huge" ints.  The current string hash 
works hard to "act randomly" in every bit position - there's no point in trying 
to "improve" anything about the hashes it already produces.

In contrast, the hashes of any contiguous range of N not-huge integers 
(excluding -1) are already, by construction, guaranteed to have no collisions 
at all in their low-order (roughly) log2(N) bits.  They can't be improved in 
this respect, because they're already optimal in this respect.

So, if anything, I'd look at increasing the left shift rather than reducing it. 
 The low bits are already un-improvable in the most important cases.


> So applications using hash(x) % 128 will still see
> all the problems that we are trying to fix.

?  The only "problem" I've seen identified was in mixing positive and negative 
integers.  Here on a 32-build with the FNV-1a 32-bit multiplier:

>> from itertools import product
>> from collections import Counter
>> cands = list(range(-50, 51))
>> cands.remove(-1)
>> c = Counter()
>> for t in product(cands, repeat=4):
..     c[hash(t) & 0x7f] += 1
>>> len(c)
128

So all 128 lower 7-bit patterns showed up.  And they're quite evenly 
distributed:

>>> c2 = Counter(c.values())
>>> for k in sorted(c2):
...     print(k, c2[k])
...
781202 1
781207 1
781209 2
781212 1
781213 2
781214 1
781215 4
781221 3
781222 1
781225 3
781226 1
781227 3
781228 2
781229 2
781230 1
781231 1
781232 2
781233 3
781234 1
781235 4
781236 2
781237 1
781238 2
781240 5
781241 6
781242 1
781243 1
781244 1
781245 1
781247 1
781248 2
781251 2
781252 4
781253 3
781254 5
781255 2
781256 2
781257 3
781259 2
781260 1
781261 1
781262 1
781263 2
781264 4
781265 2
781266 1
781267 1
781268 4
781269 1
781270 1
781271 2
781272 1
781274 2
781275 1
781276 1
781278 1
781280 1
781281 2
781282 2
781285 1
781286 2
781288 1
781295 1
781297 2
781301 1
781302 1
781304 1
781307 1

> With the standard FNV multiplier on 64 bits, I did
> get collisions while testing.

When testing what, specifically?  And the standard 32-bit FNV multiplier, or 
the standard 64-bit FNV multiplier?

> Instead, I chose 3**41 as multiplier. But of course,
> there are still plenty of bikeshedding opportunities
> for the multiplier...

Not for me.  If the universally used 64-bit FNV multiplier can't be used in the 
context of Python's tuple hash fiddled to use an almost-pure form of FNV-1a, 
then that approach is dead to me.  Needing to pick multipliers out of thin air 
instead implies the theory underlying FNV-1a doesn't transfer to this context.

About which I have no current opinion.  It may or may not.  Since we're flying 
blind, I'm just willing to _assume_ it does until proven otherwise by testing.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to