Alex Waygood writes: > > Should `dict.items()` be indexable now that dicts are ordered? I > > say yes. Why shouldn't it?
Read on, MacDuff! > Would there be a way to ensure that this had the same time > complexity as indexing of sequences? Depends on what you mean by "same". I wrote the following earlier, but apparently didn't send it, if I did, sorry for the duplication. It's been a while since I looked at the implementation, so I may have missed something. Caveat coder! Earlier I wrote: > The fact that [dicts are] ubiquitous and ordered in Python 3 makes > "indexing into dicts" a problem to be solved, but not by > itertools.first, IMO. I imagine the following has been suggested already somewhere in these threads, but I don't recall it. To recap, now that dicts are ordered (since a few versions back in Python 3), the idea of indexing directly into dicts or dict views makes conceptual sense. The reason that this is not practically obvious to implement is that dicts and their views use an internal hash table whose values are indicies into an array of items (key-value pairs). The latter array is in insertion order[1], but the active ones are not necessarily contiguous. "Holes" happen only when a key is deleted, if I understand correctly. The implementation of __delitem__ replaces the index with a marker that is skipped in iteration. If the dict needs to be grown, only the active entries are rehashed, so compaction already happens already happens at that time. However, the table resize strategy is such that resizing takes place with geometrically increasing sizes, giving amortized O(1) for insertions. So the barrier to indexing into dicts is that in this implementation accessing and mutating dict items is O(1) (this is why dicts are attractive data structures), but adding the indexing capability requires that *some* fundamental operation become O(n). There are these possibilities: del could always compact the underlying data structure, the indexing operation could compact (and then be O(1) until the next del), or the indexing operation could just be linear search (and so O(n)). I would use del, because del is relatively infrequent in my experience, and indexing compacting as-needed would add overhead to all indexing and view operations (checking whether the underlying structure is currently compact). Linear search is just inefficient. On the other hand, if you are going to try to add this to the builtin dict, compacting as-needed in the indexing operation is the way to go. Unless del is frequent, this is O(n) with a small coefficient. It adds a tiny bit of overhead to del, but doesn't impact other core dict functionality at all to my knowledge. However, there are comments in the source about "shared tables", whose design and implementation I did not study. That might require some additional complexity, or perhaps shared tables would have to be omitted from the IndexableDict implementation. There are probably other complexities, too, but I'm pretty sure the basic strategy is sound. Since you do need to compact to turn this structure into a contiguous array, something needs to be O(n). Deriving such an IndexableDict from dict is conceptually simple: 1. Break out the __compact method (this needs to be done in C, of course). 2. Override __delitem__ with a version that calls __compact after calling super's __delitem__. 3. Document that del is O(n). 4. Add an index API. I don't think overriding __getitem__ and __setitem__ is a good idea, but YMMV -- there are already specialized implementations of dict that only accept strs as keys, so isinstance(key, int) would work for such applications. I'm pessimistic that it's worth putting such a class into the stdlib, but that depends on people coming forward with compelling use cases. I'm sure there will be extremely strong pushback on putting the feature into dict itself: dict accounts for a fraction of CPU cycles used by Python that's visible to the naked eye. Even additional complexity that might make it difficult to further optimize this desigh in the future would be opposed, I believe. Footnotes: [1] Note that a deleted key is not remembered. If you delete the first key of several, then reinsert it, it is now ordered last. AFAIK, this means that the dict semantics of IndexedDict are truly dict; only the performance is degraded. _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CS2DWPCXMND3VA3534I242Z7EMYOZ3JG/ Code of Conduct: http://python.org/psf/codeofconduct/