Tim Peters <[email protected]> added the comment:
Something that may be slightly worth pursuing: in
j = (5*j + 1) % 2**power
to get a full-period sequence hitting every integer in range(2**power) exactly
once, the multiplier (5) is a bit delicate (it has to be congruent to 1 mod 4),
but the addend (1) only needs to be odd. When the hash code has a long string
of high-order 1 bits, the lower bits of `perturb` are all ones repeatedly
across iterations, and a string of `power` one bits is congruent to -1 mod
2**power, cancelling out the +1.
Which people stumbled into here: all negative ints with small absolute value
_do_ have a long string of high-order 1 bits (as did also my contrived 2**60 -
2).
If we changed the addend to, say, +3 instead, it would still be possible to
contrive cases as bad, but you'd be unlikely to stumble into them by accident,
and they would be sensitive to the value of `power` (the table's current size).
For example, for a table size of 8 (the smallest we use), only ints ending
with bits 101 are congruent to -3 mod 8. So maximally "bad" hash codes would
be of the form (bits, where "x" is "doesn't matter", and grouped into chunks of
PERTURB_SHIFT (5) bits from the right):
... xx101 xx101 ... xx101 xx101 xxxxx
Provided changing +1 to +3 didn't slow anything down (you never know until you
try!), that would be fine by me. But I'm not bothered by the current behavior,
so I won't be pursuing it.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue38105>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com