Hi,

On 12/04/2019 2:44 pm, Inada Naoki wrote:
Hi, all.

I propose adding a new method: dict.with_values(iterable)

You can already do something like this, if memory saving is the main concern. This should work on all versions from 3.3.


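def shared_keys_dict_maker(keys):
    keys = tuple(keys)  # accept any iterable; keys is traversed more than once
    class C: pass
    # Setting attributes on an instance gives it a key-sharing __dict__ (PEP 412).
    instance = C()
    for key in keys:
        setattr(instance, key, None)
    prototype = instance.__dict__
    def maker(values):
        # The copy is meant to keep the shared key table; update() only fills in values.
        result = prototype.copy()
        result.update(zip(keys, values))
        return result
    return maker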

>>> import sys
>>> m = shared_keys_dict_maker(('a', 'b'))

>>> d1 = {'a':1, 'b':2}
>>> print(sys.getsizeof(d1))
248

>>> d2 = m((1,2))
>>> print(sys.getsizeof(d2))
120

>>> d3 = m((None,"Hi"))
>>> print(sys.getsizeof(d3))
120




# Motivation

Python is used to handle data.
While dict is not an efficient way to handle many records, it is still
a convenient one.

When creating many dicts with the same keys, dict needs to
look up its internal hash table while inserting each key.

This is a costly operation.  If we can reuse the existing keys of a dict,
we can skip this insertion cost.

Additionally, we have the "Key-Sharing Dictionary" (PEP 412).
When all keys are strings, many dicts can share one key table.
This reduces memory consumption.

This might be usable for:

* csv.DictReader
* namedtuple._asdict()
* DB-API 2.0 implementations (e.g. DictCursor of mysqlclient-python)
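
As a rough illustration of the pattern these use cases share, here is a minimal sketch of a reader that builds one dict per CSV row from a single header. read_dict_rows is a hypothetical helper, not part of the csv module. It uses plain dict(zip(...)), so every row hashes and inserts every key again, which is the repeated work the proposal aims to skip.

```python
import csv
import io

def read_dict_rows(text):
    """Yield one dict per data row, keyed by the header row (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    header = tuple(next(reader))
    for row in reader:
        if len(row) != len(header):
            raise ValueError("row length does not match header")
        # Each dict is built from scratch, so every key is hashed and inserted again.
        yield dict(zip(header, row))

rows = list(read_dict_rows("name,qty\napple,3\npear,5\n"))
print(rows)
```

All dicts produced this way have the same keys in the same order, which is the situation where sharing one key table could pay off.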


# Draft implementation

pull request: https://github.com/python/cpython/pull/12802

with_values(self, iterable, /)
     Create a new dictionary with keys from this dict and values from iterable.

     If the length of iterable differs from len(self), ValueError is raised.
     This method does not support dict subclasses.
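
To make the described semantics concrete, here is a minimal pure-Python sketch. The free function with_values below is a hypothetical stand-in written only from the description above, not the code in the pull request. It reproduces the resulting dict and the ValueError on a length mismatch, but none of the speed or key-sharing benefits.

```python
def with_values(keys_dict, iterable):
    """Pure-Python stand-in for the proposed dict.with_values (hypothetical)."""
    values = list(iterable)
    if len(values) != len(keys_dict):
        raise ValueError("length of iterable must equal len(keys_dict)")
    # Pair the keys, in insertion order, with the supplied values.
    return dict(zip(keys_dict, values))

keys = dict.fromkeys("abc")
d = with_values(keys, range(3))
print(d)
```

The input is materialized with list() first so that the length check also works for iterators and generators.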


## Memory usage (Key-Sharing dict)

>>> import sys
>>> keys = tuple("abcdefg")
>>> keys
('a', 'b', 'c', 'd', 'e', 'f', 'g')
>>> d = dict(zip(keys, range(7)))
>>> d
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}
>>> sys.getsizeof(d)
360

>>> keys = dict.fromkeys("abcdefg")
>>> d = keys.with_values(range(7))
>>> d
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}
>>> sys.getsizeof(d)
144

## Speed

$ ./python -m perf timeit -o zip_dict.json -s 'keys = tuple("abcdefg"); values=[*range(7)]' 'dict(zip(keys, values))'

$ ./python -m perf timeit -o with_values.json -s 'keys = dict.fromkeys("abcdefg"); values=[*range(7)]' 'keys.with_values(values)'

$ ./python -m perf compare_to zip_dict.json with_values.json
Mean +- std dev: [zip_dict] 935 ns +- 9 ns -> [with_values] 109 ns +- 2 ns: 8.59x faster (-88%)
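
For readers without a patched build, a similar comparison can be sketched with the standard timeit module. This is only an approximation under stated assumptions: with_values does not exist in released Python, so the second function times a copy-and-update of a prototype dict instead, and the function names are made up for this example. The figures it prints depend on the machine and Python version and are not comparable to the numbers above.

```python
import timeit

keys = tuple("abcdefg")
values = list(range(7))
prototype = dict.fromkeys(keys)  # ordinary dict used as a template

def build_with_zip():
    # Baseline from the benchmark above: insert every key from scratch.
    return dict(zip(keys, values))

def build_from_prototype():
    # Stand-in for with_values: copy existing keys, then overwrite the values.
    result = prototype.copy()
    result.update(zip(keys, values))
    return result

runs = 100_000
for func in (build_with_zip, build_from_prototype):
    seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e9:.0f} ns per call")
```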


What do you think?
Any comments are appreciated.

Regards,
