[Python-ideas] IndexableDict? [was: dict_items.getitem?]

Stephen J. Turnbull Mon, 11 Oct 2021 03:58:55 -0700

Alex Waygood writes:

 > > Should `dict.items()` be indexable now that dicts are ordered? I
 > > say yes. Why shouldn't it?

Read on, MacDuff!

> Would there be a way to ensure that this had the same time
> complexity as indexing of sequences?

Depends on what you mean by "same". I wrote the following earlier,
but apparently didn't send it, if I did, sorry for the duplication.
It's been a while since I looked at the implementation, so I may have
missed something. Caveat coder!

Earlier I wrote:

> The fact that [dicts are] ubiquitous and ordered in Python 3 makes
> "indexing into dicts" a problem to be solved, but not by
> itertools.first, IMO.

I imagine the following has been suggested already somewhere in these
threads, but I don't recall it.

To recap, now that dicts are ordered (since a few versions back in
Python 3), the idea of indexing directly into dicts or dict views
makes conceptual sense. The reason that this is not practically
obvious to implement is that dicts and their views use an internal
hash table whose values are indicies into an array of items (key-value
pairs). The latter array is in insertion order[1], but the active
ones are not necessarily contiguous. "Holes" happen only when a key
is deleted, if I understand correctly. The implementation of
__delitem__ replaces the index with a marker that is skipped in
iteration. If the dict needs to be grown, only the active entries are
rehashed, so compaction already happens already happens at that time.
However, the table resize strategy is such that resizing takes place
with geometrically increasing sizes, giving amortized O(1) for
insertions.

So the barrier to indexing into dicts is that in this implementation
accessing and mutating dict items is O(1) (this is why dicts are
attractive data structures), but adding the indexing capability
requires that *some* fundamental operation become O(n). There are
these possibilities: del could always compact the underlying data
structure, the indexing operation could compact (and then be O(1)
until the next del), or the indexing operation could just be linear
search (and so O(n)). I would use del, because del is relatively
infrequent in my experience, and indexing compacting as-needed would
add overhead to all indexing and view operations (checking whether the
underlying structure is currently compact). Linear search is just
inefficient.

On the other hand, if you are going to try to add this to the builtin
dict, compacting as-needed in the indexing operation is the way to go.
Unless del is frequent, this is O(n) with a small coefficient. It
adds a tiny bit of overhead to del, but doesn't impact other core dict
functionality at all to my knowledge. However, there are comments in
the source about "shared tables", whose design and implementation I
did not study. That might require some additional complexity, or
perhaps shared tables would have to be omitted from the IndexableDict
implementation. There are probably other complexities, too, but I'm
pretty sure the basic strategy is sound.

Since you do need to compact to turn this structure into a contiguous
array, something needs to be O(n).

Deriving such an IndexableDict from dict is conceptually simple:
1. Break out the __compact method (this needs to be done in C, of
course).
2. Override __delitem__ with a version that calls __compact after
calling super's __delitem__.
3. Document that del is O(n).
4. Add an index API. I don't think overriding __getitem__ and
__setitem__ is a good idea, but YMMV -- there are already
specialized implementations of dict that only accept strs as
keys, so isinstance(key, int) would work for such applications.

I'm pessimistic that it's worth putting such a class into the stdlib,
but that depends on people coming forward with compelling use cases.

I'm sure there will be extremely strong pushback on putting the
feature into dict itself: dict accounts for a fraction of CPU cycles
used by Python that's visible to the naked eye. Even additional
complexity that might make it difficult to further optimize this
desigh in the future would be opposed, I believe.

Footnotes:
[1] Note that a deleted key is not remembered. If you delete the
first key of several, then reinsert it, it is now ordered last. AFAIK,
this means that the dict semantics of IndexedDict are truly dict; only
the performance is degraded.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/CS2DWPCXMND3VA3534I242Z7EMYOZ3JG/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] IndexableDict? [was: dict_items.__getitem__?]

Reply via email to

[Python-ideas] IndexableDict? [was: dict_items.getitem?]