New submission from Raymond Hettinger:

The dictresize() method unnecessarily copies the keys, values, and hashes as 
well as making insertions in the index table.   Only the latter step is 
necessary or desirable.

Here in the pure Python code for resizing taking from the original 
proof-of-concept code at https://code.activestate.com/recipes/578375

    def _resize(self, n):
        '''Reindex the existing hash/key/value entries.
           Entries do not get moved, they only get new indices.
           No calls are made to hash() or __eq__().

        '''
        n = 2 ** n.bit_length()                     # round-up to power-of-two
        self.indices = self._make_index(n)
        for index, hashvalue in enumerate(self.hashlist):
            for i in Dict._gen_probes(hashvalue, n-1):
                if self.indices[i] == FREE:
                    break
            self.indices[i] = index
        self.filled = self.used

And here is a rough sketch of what it would look like in the C code (very 
rough, not yet compileable):

static void
insert_indices_clean(PyDictObject *mp, Py_hash_t hash)
{
    size_t i, perturb;
    PyDictKeysObject *k = mp->ma_keys;
    size_t mask = (size_t)DK_SIZE(k)-1;

    i = hash & mask;
    for (perturb = hash; dk_get_index(k, i) != DKIX_EMPTY;
         perturb >>= PERTURB_SHIFT) {
        i = mask & ((i << 2) + i + perturb + 1);
    }
    dk_set_index(k, i, k->dk_nentries);
}

static int
dictresize(PyDictObject *mp, Py_ssize_t minused)
{
    Py_ssize_t i, newsize;
    PyDictKeyEntry *ep0;

    /* Find the smallest table size > minused. */
    for (newsize = PyDict_MINSIZE;
         newsize <= minused && newsize > 0;
         newsize <<= 1)
        ;
    if (newsize <= 0) {
        PyErr_NoMemory();
        return -1;
    }

    /* Resize and zero-out the indices array */
    realloc(dk->dk_indices, es * newsize);
    memset(&dk->dk_indices.as_1[0], 0xff, es * size);
    dk->dk_size = size;

    /* Loop over hashes, skipping NULLs, inserting new indices */
    for (i = 0; i < mp->dk_nentries; i++) {
        PyDictKeyEntry *ep = &ep0[i];
        if (ep->me_value != NULL) {
            insert_indices_clean(mp, ep->me_hash);
        }
    }
    return 0;
}

----------
components: Interpreter Core
messages: 276921
nosy: methane, rhettinger, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Compact dict resizing is doing too much work
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28199>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to