[Python-ideas] Re: Experimenting with dict performance, and an immutable dict

Guido van Rossum Tue, 21 Jul 2020 21:54:15 -0700

That seems pretty clear — presumably it follows the lead of frozenset and
tuple.


On Tue, Jul 21, 2020 at 21:45 Todd <toddr...@gmail.com> wrote:

> What, exactly, is frozen?  My understanding is that one problem with
> frozen dicts in the past is deciding exactly what is mutable and what is
> immutable.  Can you change what object a key maps to so long as the set of
> keys stay the same?  Can you modify the contents of mutable object that is
> a value?
>
> On Tue, Jul 21, 2020 at 6:30 PM Marco Sulla <marco.sulla.pyt...@gmail.com>
> wrote:
>
>> Let me first say that the code and discussion, as per title, is also
>> about possible performance improvements to the dict base type.
>>
>> TL;DR I implemented a frozendict using CPython 3.9 code. It seems that an
>> immutable dict *could* be faster than dict in some cases. Furthermore, some
>> optimization could be applied to dict too.
>>
>> Long explaining:
>>
>> Since now I have some time, I decided to experiment a little with the
>> creation of an immutable dict in Python.
>>
>> Unluckily, I started this experiment many months ago, so the CPython code
>> I used is old. Maybe some or all of my considerations are outdated.
>>
>> Initially, I wrote a quick and dirty implementation:
>>
>> https://github.com/Marco-Sulla/cpython/commit/fde4e6d236b19636063f8afedea8c50278205334
>>
>> The code was very simple, and the performance was identical to dict. So
>> in theory, adding a frozendict to CPython is not hard. But there's more.
>>
>> After the first implementation, I started to try to improve the
>> performance of frozendict. The result of the improvements are here:
>>
>>
>> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.txt
>>
>> For benchmarks, I used simply timeit, with autorange and repeat and, as
>> suggested in the module documentation, I got the minimum of the results.
>> Here is the code:
>>
>>
>> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py
>>
>> I have not tested with an optimized build, since optimization data is
>> collected using the unit tests, and I didn't write tests for frozendict in
>> the official CPython test framework.
>>
>> The tests and benchmarks were done using CPython 3.9a0. CPU and other pc
>> resources were not restricted using pyperf or similar tools, to see the
>> "real" speed. CPython was compiled using gcc and g++.
>>
>> In benchmarks, I compared methods and operators using dict and
>> frozendict. The benchmark uses dicts with all integers and dicts with all
>> strings. Furthermore, I tested dicts with size 8 (the most optimized size
>> in CPython) and 1000 elements (maybe too much, but I wanted to see how they
>> perform with a high number of elements).
>>
>> Every benchmark has a line in the output. The Name describes the
>> benchmarked method, operator or code snippet. Size is the size of the dict,
>> 8 or 1000. Keys are the keys type, str or int. Type is the dictionary type,
>> dict or frozendict.
>>
>> In Name, the "o" represents the object itself (dict or frozendict). "d",
>> in benchmark with dict, is "o"; in benchmarks with frozendict is an
>> equivalent instance of type dict.
>>
>> Some consideration:
>>
>> 1. frozendict is very fast, as expected, at copying. But it's also faster
>> at creation, using a (not frozen) dict, kwargs or a sequence2. Speedups
>> range from 20% to 45%.
>> 2. frozendict is also a bit faster when you iterate over it, especially
>> over values, where is ~15% faster
>> 3. hash seems really fast, but this is because it's cached the first time
>> hash() is invoked
>> 4. where frozendict is slower is when you unpickle it and with fromkeys
>> and repr. This is because I wrote a very naif implementation of these
>> methods, without optimizing them. The other methods have a comparable speed.
>>
>> Here is the code:
>> https://github.com/Marco-Sulla/cpython
>>
>> Here is the diff between the CPython code and my commits:
>> https://github.com/python/cpython/compare/master...Marco-Sulla:master
>>
>> About code
>>
>> I coded the implementation doing a simple copy/paste of the existing dict
>> functions, modifying their code and renaming them. This way I'm sure dict
>> continues to work as before, and I can compare the speed gain.
>>
>> Some of the optimization I adopted can be implemented in `dict` too. For
>> example, instead of creating an empty dict and resizing it, I create it
>> with the "maximum" size and I fill it. It seems to work, even if I did not
>> explore the possibility that a mutable object can change while a frozendict
>> creation is in progress.
>>
>> Some problems with optimizing dict and maintaining a frozendict:
>>
>> 1. duplication of code. To gain a significant boost, I had to copy and
>> paste a lot of code. Functions can be remerged again, but maybe the speedup
>> will be reduced.
>> 2. split vs combined dicts. As I wrote, split dicts seem to be faster in
>> reading than combined dicts. For example, iterating over values is faster
>> with a split dict, as expected.
>> But writing was not tested; furthermore, some of the optimizations can be
>> adopted for dicts too, so the convenience of a split table can be lowered.
>> dict continues to maintain both split and combined tables, so this could
>> be not a problem. But the code could be less and more fast if only a table
>> layout is supported
>> 3. the CPython code I used is old, so some of the improvements I adopted
>> could be already implemented
>>
>> About frozendict
>>
>> Apart the considerations done in the [PEP 416](
>> https://www.python.org/dev/peps/pep-0416/), that was rejected since
>> there was little gain from its implementation, I think that frozendict can
>> be useful as a substitute of MappingProxyType, that is really slow.
>> MappingProxyType is not much used, but it's present in critical parts of
>> CPython code, for example in _sre. We have to see if a mapping proxy type
>> *can* be substituted with an immutable map in some critical part of CPython
>> code.
>>
>> Furthermore, frozendicts could be used for implementing "immutable"
>> classes and modules, and can be used as a faster dict if its content does
>> not change.
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/K7CRVW6O7RO6DT3JIG3OAJCAVCA5CNTN/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/GAJHZEBOU5BGYJAUNQP5V66KSS6HHSQD/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
--Guido (mobile)

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YPQRFJIHHQQAVYN5AZ53HONUJJRKEV5A/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Experimenting with dict performance, and an immutable dict

Reply via email to