[Larry]
> "I don't care about performance" is not because I'm aching for Python to
> run my code slowly.  It's because I'm 100% confident that the Python
> community will lovingly optimize the implementation.

I'm not ;-)

>  So when I have my language designer hat on, I really don't concern myself
> with performance.  As I thought I said earlier in the thread, I think we 
> should
> figure out the semantics we want first, and then we figure out how to make it 
> fast.

Pretty much the opposite of how Python started.  The original lists
and dicts were deliberately designed to have dirt simple
implementations, in reaction against ABC's "theoretically optimal"
data structures that were a nightmare to maintain, and had so much
overhead to support "optimality" in all cases that they were, in fact,
much slower in common cases.

There isn't magic waiting to be uncovered here:  if you want O(1)
deletion at arbitrary positions in an ordered sequence, a doubly
linked list is _the_ way to do it.  That's why, e.g., Raymond said he
still uses a doubly linked list, instead of an ordered dict, in the
LRU cache implementation.  If that isn't clear, a cache can be
maintained in least-to-most recently accessed order with an ordered
dict like so:

    if key in cache:
        cached_value = cache.pop(key)  # remove key
    else:
        cached_value = compute(key)  # cache miss: compute the value
    assert key not in cache
    cache[key] = cached_value # most recently used at the end now
    return cached_value

and deleting the least recently used is just "del
cache[next(iter(cache))]" (off topic, just noting this is a fine use
for the "first()" function that's the subject of a different thread).
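To make that fragment runnable end to end, here's a sketch of a whole LRU cache built that way on a plain insertion-ordered dict (the `compute` callable and `maxsize` parameter are my own framing, not any stdlib API):

```python
def make_lru_cache(compute, maxsize=128):
    """LRU cache keeping entries in least-to-most recently used order
    in a plain dict; `compute(key)` is called on a miss."""
    cache = {}

    def lookup(key):
        if key in cache:
            cached_value = cache.pop(key)  # remove key
        else:
            cached_value = compute(key)
            if len(cache) >= maxsize:
                # evict the least recently used: the first key in the dict
                del cache[next(iter(cache))]
        assert key not in cache
        cache[key] = cached_value  # most recently used at the end now
        return cached_value

    return lookup
```

Note the hit path is pop-then-reinsert: moving a key to the end of a dict requires deleting and re-adding it, which is exactly the deletion-heavy access pattern under discussion.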

We _could_ structure an ordered dict's hash+key+value records as a
doubly linked list instead (and avoid the need for O(N)
reorganizations).  But then we piss away much of the memory savings
(to store the new links) that was the _point_ of compact dicts to
begin with.
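To make the trade-off concrete, here's a toy sketch (my own illustration, nothing like CPython's actual C layout) of what each record would have to carry if the hash+key+value entries were threaded on a doubly linked list - two extra pointer-sized fields per entry, which is exactly the memory the compact layout was designed to save:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Entry:
    # what a compact dict stores per record today
    hash: int
    key: Any
    value: Any
    # what doubly linking the records would add: two more
    # pointer-sized fields on every entry
    prev: Optional["Entry"] = None
    next: Optional["Entry"] = None

def unlink(e: Entry) -> None:
    # the payoff: O(1) deletion at an arbitrary position,
    # by splicing the entry out of the chain
    if e.prev is not None:
        e.prev.next = e.next
    if e.next is not None:
        e.next.prev = e.prev
    e.prev = e.next = None
```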

So there was a compromise.  No links, deletion on its own is O(1), but
can periodically require O(N) under-the-covers table reorganizations
to squash out the holes.  "Suitable enough" for ordered dicts, but
_thought_ to be unsuitable for ordered sets (which appear to have
higher rates of mixing deletions with insertions - the LRU cache
example being an exception).
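A toy model of that compromise may help (my own sketch, not CPython's implementation, and the compaction trigger here is an arbitrary "half the slots are holes" rule): deletion just punches a hole in the insertion-ordered entries array, and when holes pile up a single O(N) pass squashes them out:

```python
class CompactOrderedSet:
    """Toy model of the compact-layout compromise: O(1) deletion
    leaves a hole; when holes dominate, one O(N) rebuild removes them."""

    def __init__(self):
        self.entries = []   # insertion-ordered slots; None marks a hole
        self.index = {}     # element -> slot in self.entries
        self.holes = 0

    def add(self, elem):
        if elem not in self.index:
            self.index[elem] = len(self.entries)
            self.entries.append(elem)

    def discard(self, elem):
        slot = self.index.pop(elem, None)
        if slot is not None:
            self.entries[slot] = None  # O(1): just punch a hole
            self.holes += 1
            if self.holes * 2 >= len(self.entries):
                self._compact()        # periodic O(N) reorganization

    def _compact(self):
        self.entries = [e for e in self.entries if e is not None]
        self.index = {e: i for i, e in enumerate(self.entries)}
        self.holes = 0

    def __iter__(self):
        return (e for e in self.entries if e is not None)
```

Mixing deletions with insertions at a high rate keeps tripping that O(N) rebuild, which is why the scheme was thought a worse fit for sets than for dicts.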

But there are also other optimizations in the current set
implementation, so "fine, add the doubly linked list to sets but not
to dicts" is only part of it.

Which may or may not be possible to match, let alone beat, in an
ordered set implementation.  A practical barrier now is that Python is
too mature to bank on loving optimizations _after_ a change to a core
feature is released.  It's going to need a highly polished
implementation first.

I certainly don't object to anyone trying, but it won't be me ;-)
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UXWGNLGC3BG2XSMYN5H73FVKP3O2XUUC/
Code of Conduct: http://python.org/psf/codeofconduct/