[Guido]
> ...
> the language should not disappoint them, optimization opportunities be damned.

I would like to distinguish between two kinds of "optimization
opportunities":  theoretical ones that may or may not be exploited
some day, and those that CPython has _already_ exploited.

That is, we don't have a blank slate here.  As Raymond said, the set
implementation already diverged from the dict implementation in
fundamental ways for "go fast" reasons.  How much are people willing
to see set operations slow down to get this new feature?

For me, "none" ;-)  Really.  I have no particular use for "ordered"
sets, but have set-slinging code that benefits _today_ from the "go
fast" work Raymond did for them.

Analogy:  it was always obvious that list.sort() is "better" stable
than not, but I ended up crafting a non-stable samplesort variant that
ran faster than any stable sort anyone tried to write.  For years.  So
we stuck with that, to avoid slowing down sorting across releases.

The stable "timsort" finally managed to be _as_ fast as the older
samplesort in almost all cases, and was very much faster in many
real-world cases.  "Goes faster" was the thing that really sold it,
so much so that its increased memory use didn't count for much
against it.

Kinda similarly, "ordered dicts" were (as has already been pointed
out) originally introduced as a memory optimization ("compact" dicts),
where the "ordered" part fell out more-or-less for free.  The same
approach wouldn't save memory for sets (or so I'm told), so the
original motivation for compact dicts doesn't carry over.
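
To make "compact" concrete, here's a rough Python sketch of the
layout -- mine, not CPython's actual C implementation, and it omits
resizing and deletion entirely.  The point is just the shape:  the
sparse table holds only small integers, the dense list holds the real
entries, and iterating the dense list gives insertion order for free:

    # Hypothetical sketch of the compact-dict layout (no resizing,
    # no deletion).  The sparse table needs only a few bytes per
    # slot; the full entries live densely, in insertion order.
    class CompactDict:
        def __init__(self, size=64):
            self._indices = [None] * size   # sparse: slot -> entry index
            self._entries = []              # dense: (key, value) pairs

        def __setitem__(self, key, value):
            slot = hash(key) % len(self._indices)
            while self._indices[slot] is not None:  # linear probing
                i = self._indices[slot]
                if self._entries[i][0] == key:
                    self._entries[i] = (key, value)  # overwrite in place
                    return
                slot = (slot + 1) % len(self._indices)
            self._indices[slot] = len(self._entries)
            self._entries.append((key, value))

        def __iter__(self):
            # Iterating the dense list *is* insertion order.
            return (key for key, _ in self._entries)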

So I'm +1 on ordered sets if and only if there's an implementation
that's at least as fast as what we already have.  If not now, then,
as happened with ordered dicts, offer a slower OrderedSet type in the
`collections` module for those who really want it, and wait for magic
;-)
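
For anyone who wants the semantics today, a bare-bones version can
already be faked on top of a dict's insertion order.  A hypothetical
sketch (the name and the dict-backed approach are mine, not a
proposal for what a real collections.OrderedSet should look like),
slower and fatter than the real set type -- which is exactly the
trade-off at issue:

    # Hypothetical sketch: an insertion-ordered "set" backed by a
    # dict, which already preserves insertion order in CPython 3.7+.
    class OrderedSet:
        def __init__(self, iterable=()):
            self._d = dict.fromkeys(iterable)

        def add(self, elem):
            self._d[elem] = None        # no-op if elem is already present

        def discard(self, elem):
            self._d.pop(elem, None)

        def __contains__(self, elem):
            return elem in self._d

        def __iter__(self):
            return iter(self._d)        # dict order == insertion order

        def __len__(self):
            return len(self._d)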

BTW, what should

    {1, 2} | {3, 4, 5, 6, 7}

return as ordered sets?  Beats me.  The imbalance between the
operands' cardinalities can be arbitrarily large, and "first make a
copy of the larger one, then loop over the smaller one" is the obvious
way to implement union so as to minimize the number of lookups needed.
The speed penalty for instead requiring, e.g., that the left operand's
elements precede all of the right operand's elements in the result's
insertion order can be arbitrarily large.
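
To make the trade-off concrete, here are sketches (mine, not
CPython's code) of the two unions side by side, using a dict to stand
in for an ordered set:

    # The fast union copies the larger operand in bulk and loops over
    # the smaller, so membership work scales with min(len(a), len(b)).
    def fast_union(a, b):
        big, small = (a, b) if len(a) >= len(b) else (b, a)
        result = set(big)
        for elem in small:
            result.add(elem)
        return result

    # An order-preserving union has no such freedom: if the left
    # operand's elements must come first, it must scan the entire
    # right operand, paying len(b) lookups even when len(a) == 2 and
    # len(b) == 10**9.
    def ordered_union(a, b):
        result = dict.fromkeys(a)       # left operand's order first
        for elem in b:                  # then the right's new elements
            result.setdefault(elem)
        return list(result)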

The concept of "insertion order" just doesn't make well-defined sense
to me for any operation that produces a set from multiple input sets,
unless it means no more than "whatever order happens to be used
internally to add elements to the result set".  Dicts don't have
anything like that; dict.update comes closest, but there the result
is defined by mutating the dict via a one-at-a-time loop over the
argument dict.
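
To spell out what that one-at-a-time loop implies for order (this is
actual CPython 3.7+ behavior, not a sketch):

    >>> d = {1: 'a', 2: 'b'}
    >>> d.update({2: 'B', 3: 'c'})  # loops over the argument's items
    >>> d                           # key 2 keeps its original position;
    {1: 'a', 2: 'B', 3: 'c'}        # only the new key 3 goes to the end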