On Thu, Nov 11, 2021 at 4:30 AM Paul Moore <p.f.mo...@gmail.com> wrote:

> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <anto...@python.org> wrote:
> >
> > On Wed, 10 Nov 2021 21:12:17 -0600
> > Tim Peters <tim.pet...@gmail.com> wrote:
> > > [Bob Fang <boyuf...@bytedance.com>]
> > > > This is a modest proposal to consider having sorted containers
> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
> library.
> > >
> > > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> > >
> > > It's first-rate code in all respects, including that it's a fine
> > > example _of_ Python programming (it's not written in C - in Python).
> >
> > Agreed with Tim.  This is a perfect example of some basic and perennial
> > facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>

A couple 2 cents and IMHO.

-----
The tldr is not so much "don't put it in the standard lib", but that the
stdlib would be better having particular implementations instead of a
one-size-fits-all e.g. SortedList.

Should the stdlib have e.g. SortedList? Probably not, because the use cases
of such data types are too niche to a one-size-fits-all implementation, and
there are too many implementations with too many of their own settings.
Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
because those are somewhat fundamental data structures and implementing,
testing, and verifying them is very much non-trivial. While niche, having
them at your immediate disposal is very powerful.
-----

Last year, for fun, after wishing there was a SortedSet in the standard
lib, I ended up implementing a Red-Black Tree and BTree based sorted
dictionary/set[1]. After then trying to use them for my use case[2], I
found that, in order to fully and truly exploit their benefits, the basic
Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
would let me, e.g. binary search to a particular spot and then iterate, or
give me a range between two points, etc. Such apis tend to be pretty
specialized to exploit their under data structure (which isn't to say an
abstract API can't be created, see Java's NavigableMap, but that level of
abstraction deserves careful consideration).

In the end, it was a fun exercise, but in practice a dictionary and
sorted() got me 90% of the way there and sufficed. Optimizing that last 10%
wasn't worth the effort.

Anyways, I came to two particular, IMHO, conclusions:
1. Sorted collections are very niche. I *could* have used one, but probably
would have spent as much time fiddling with them as using a dict/sorted().
2. If you're in a niche use case, especially for performance, then
abstractions aren't really helpful. Using e.g. a BTree and knowing the
particulars of the implementation are going to be more helpful than having
an abstract SortedList and working out how to use its special APIs for
particular actions.

This basically echos Christopher Baker's sentiment of "whats the use case"
and "why doesn't a sorted() call at the end suffice" to some extent. IME,
Python's sort algorithm is just really good, so it's hard to beat simply
calling sorted() when you're done processing in both runtime performance
and effort-spent-optimizing. I struggle to think of or remember a
particular case where dropping in a sorted collection would be a clear and
obvious win. Maybe a more memory constrained case where sorted() creating a
new list is too much? IDK.

[1] As an aside, RB didn't perform nearly as well as sortedcontainers for
the majority of cases, but the naive BTree was about on par. IIRC,
sortedcollections is basically a specialized BTree.
[2] A wish that came about because I had sorted sets of exact/prefix
strings of 1,000 to 10,000 elements and needed them to interact in various
ways and wanted to preserve their sortedness.

-----

re: "Is this particular one clearly the “best”[*]?"

Performance wise, it's probably about the best insofar as a BTree of
depth=2 and high fanout (10,000 or something? I forget what
sortedcontainers defaults to) is. In my limited experience, it's docs and
claims about its performance were pretty accurate[3].

API wise, it has its share of idiosyncrasies like anything else. e.g.
SortedList claims to implement MutableSequence, but actually raises errors
when calling e.g. append(), which is a bit misleading / wrong to some
extent (to be fair, given MutableSequence's api contract, it's not like
there's particularly clear/right/good choice one can make here).

(disclaimer: i've minimal experience with sortedcontainers; when I looked
at it, it was really only for comparing my own toy BTree/RBT
behavior/performance with a known implementation)

[3] The exception being a claim that its (supposedly) better cache locality
is a significant factor. I think the underlying BTree algorithm is the
dominant factor. But someone who actually knows how to measure cache
locality would be a better judge of that than me, who doesn't know how.


> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HVZCFPKI7TY6WVLFK43KLVVDTEPIXFFQ/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to