New submission from Dmitry Rubanovich:

lookdict_index() (and the rest of the files in dictobject.c) are using 
unnecessarily complicated perturb mechanism.  And, in fact, it's slower than 
the simpler case.

Instead of this:

for (size_t perturb = hash;;) {
    perturb >>= PERTURB_SHIFT;
    i = mask & ((i << 2) + i + perturb + 1);
....

it should do this:

for (size_t perturb = hash;;) {
    i = mask & ((i << 1) + perturb + 1);
    perturb >>= PERTURB_SHIFT;
....

This would not only save an instruction (a minor issue), but it would also 
reduce collisions.

I've attached a file which calculates frequencies of collisions for 
demonstration purposes.  It shows that the calculation, as it stands right now, 
does not create a 1-1 mapping even on the 1st iteration through the loop.  
Moving PERTURB_SHIFT to the line before the calculation does reduce the density 
of the collision space.  But using the calculation, which I proposed, 
eliminates collisions on the 1st iteration completely and reduces it on most 
subsequent iterations.

----------
components: Interpreter Core
messages: 296072
nosy: Dmitry Rubanovich, inada.naoki, rhettinger, serhiy.storchaka, xiang.zhang
priority: normal
severity: normal
status: open
title: dict: simplify and improve lookup function
type: enhancement
versions: Python 3.7
Added file: http://bugs.python.org/file46953/collisions_count.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30671>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to