Re: [Python-ideas] Reducing collisions in small dicts/sets

MRAB Mon, 26 Jun 2017 11:52:47 -0700

On 2017-06-26 19:21, Tim Peters wrote:

Some explanations and cautions.  An advantage of sticking with pure
Python instead of C is that spare moments are devoted to investigating
the problem instead of thrashing with micro-optimizations ;-)


Why does the current scheme suffer for small tables?  With hindsight
it's pretty obvious:  it can visit table slots more than once (the
other schemes cannot), and while the odds of that happening are tiny
in large tables, they're high for small tables.

[snip]

So where does that leave us?  I don't know.  All schemes have good &
bad points ;-)  I haven't yet thought of a cheap way to compute an
`inc` for double-hashing that isn't vulnerable to bad behavior for
_some_ easily constructed set of int keys.  If you forget "cheap",
it's easy; e.g.,

     random.seed(h)
     inc = random.choice(range(1, mask + 1, 2))

Regardless, I'll attach the current version of the code.

If the current scheme suffers only for small tables, couldn't you use analternative scheme only for small tables?

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Reducing collisions in small dicts/sets

Reply via email to