[Raymond]
> ...
> * The ordering we have for dicts uses a hash table that indexes into a 
> sequence.
> That works reasonably well for typical dict operations but is unsuitable for 
> set
> operations where some common use cases make interspersed additions
> and deletions (that is why the LRU cache still uses a cheaply updated doubly-l
> linked list rather that deleting and reinserting dict entries).

I'm going to take a stab at fleshing that out a bit:  ideally, an
ordered dict stores hash+key+value records in a contiguous array with
no "holes".  That's ("no holes") where the memory savings comes from.
The "holes" are in the hash table, which only hold indices _into_ the
contiguous array (and the smallest width of C integer indices
sufficient to get to every slot in the array).

"The problem" is that deletions leave _two_ kinds of holes:  one in
the hash table, and another in the contiguous vector.  The latter
holes cannot be filled with subsequent new hash+key+value records
because that would break insertion order.

So in an app that mixes additions and deletions, the ordered dict
needs to be massively rearranged at times to squash out the holes left
behind by deletions, effectively moving all the holes to "the end",
where they can again be used to reflect insertion order.

Unordered dicts never had to be rearranged unless the total size
changed "a lot", and that remains true of the set implementation.  But
in apps mixing adds and deletes, ordered dicts can need massive
internal rearrangement at times even if the total size never changes
by more than 1.

Algorithms doing a lot of mixing of adds and deletes seem a lot more
common for sets than for dicts, and so the ordered dict's basic
implementation _approach_ is a lot less suitable for sets.  Or, at
least, that's my best attempt to flesh out Raymond's telegraphic
thinking there.

Note:  the holes left by deletions _wouldn't_ be "a problem" _except_
for maintaining insertion order.  If we were only after the memory
savings, then on deletion "the last" record in the contiguous array
could be moved into the hole at once, leaving the array hole-free
again.  But that also changes the order.  IIRC, that's what the
original "compact dict" _did_ do.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QQXSSJWHKTUNHSMSHVM7XLMDBMUV7BDX/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to