Tim Peters <t...@python.org> added the comment:

Something that may be slightly worth pursuing:  in

    j = (5*j + 1) % 2**power

to get a full-period sequence hitting every integer in range(2**power) exactly 
once, the multiplier (5) is a bit delicate (it has to be congruent to 1 mod 4), 
but the addend (1) only needs to be odd.  When the hash code has a long string 
of high-order 1 bits, the lower bits of `perturb` are all ones repeatedly 
across iterations, and a string of `power` one bits is congruent to -1 mod 
2**power, cancelling out the +1.

Which people stumbled into here:  all negative ints with small absolute value 
_do_ have a long string of high-order 1 bits (as did also my contrived 2**60 - 
2).

If we changed the addend to, say, +3 instead, it would still be possible to 
contrive cases as bad, but you'd be unlikely to stumble into them by accident, 
and they would be sensitive to the value of `power` (the table's current size). 
 For example, for a table size of 8 (the smallest we use), only ints ending 
with bits 101 are congruent to -3 mod 8.  So maximally "bad" hash codes would 
be of the form (bits, where "x" is "doesn't matter", and grouped into chunks of 
PERTURB_SHIFT (5) bits from the right):

    ... xx101 xx101 ... xx101 xx101 xxxxx
 
Provided changing +1 to +3 didn't slow anything down (you never know until you 
try!), that would be fine by me.  But I'm not bothered by the current behavior, 
so I won't be pursuing it.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38105>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to