On Tue, Nov 9, 2021 at 9:00 PM Steven D'Aprano <st...@pearwood.info> wrote:

> Sorting dicts has been discussed on the Python-Ideas mailing list, it is
> too hard and expensive to justify for the limited use-cases for it. If
> you want to sort a dict, you are best to sort the dict's keys, then
> create a new dict. Or possibly use a dedicated Sorted Mapping type like
> a red-black tree or similar.
>

There are several implementations of key-order sorted dicts available in
Python, and more than a few of them are well tested.  I am the maintainer
of one of them: https://pypi.org/project/treap/ which I ported from Java.
I also did a performance comparison among several of them a few years back.

Red-black trees are popular, perhaps especially among Java developers, but
I've never understood why.  They aren't the fastest.  Among traditional
tree-based key-sorted dicts, treaps are frequently fastest.

However, SortedDict is not uncommonly faster than treaps - I believe this
is because SortedDict is very good at maximizing locality of reference.
Traditional trees are almost always going to do an entire cache line hit
for every node in large trees, even if those nodes are "next to each other"
when sorted/traversed.  SortedDicts put keys that are nearly equal, in
nearly the same part of memory - so multiple values can be retrieved with a
single cache line hit.

Sorting a dict's keys inside a loop tends to give O(n^2) algorithms, and
sometimes even O(n^2 * logn).  This is not good.  A traditional tree should
give O(nlogn) algorithms under similar circumstances, and although my gut
is telling me SortedDict is similar the presentation linked below suggests
a different (though still better than sorting in a loop) runtime for
SortedDict.

It's true that key-sorted dicts are not all that common.  Their most common
use is probably for implementing finite caches - evictions can benefit from
ordered keys.  However, we already have functools.lru_cache.

Here's my most recent performance comparison:
https://stromberg.dnsalias.org/~strombrg/sorted-dictionary-comparison/Datastructure%20comparison.pdf

Here's a PyCon 2016 presentation about SortedContainers, that includes
SortedDict:
https://www.youtube.com/watch?v=7z2Ki44Vs4E

I think it would make sense to include at least one key-sorted dict in
CPython, and feel that SortedDict would not be a bad way to go.  Neither
would treap.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VNK4VPGA4LJH7RMR4LCCACEG2WNDBBFO/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to