Serhiy Storchaka wrote:
> From my C++ and Java experience, hashtable-based containers are much
> more useful than tree-based containers. They are more compact and fast.
> In most cases the only reason of using sorted container is supporting
> deterministic iteration order, but often it is enough to sort data only
> for output. 

The concern is not constant speed factor but asymptotic complexity.
Inserting or deleting in a tree-based sorted collection is O(log n),
but lists are O(n): the bisect is O(log n) but moving trailing list elements
is O(n). It's memmove fast but dominates as n gets large.

There are many algorithms that are hard to implement as fast as they
can be without a tree. Something like rolling median would be a good
test case, e.g. "for a list N containing numbers, produce a list M of
length len(N) - 1 where M[i] == statistics.median(N[:i+1])". I think the
best you can do with the standard library and without writing a
balanced binary tree from scratch is O(n²) (if you can beat this I'd like
to know how!). With a tree-based sorted list it's O(n log n).

Dicts and sets cannot maintain an arbitrary order, except that
OrderedDict.move_to_end() very occasionally turns out to be all you
need (and even then, I find it tricky to reason about and apply
correctly in an algorithm: it requires you to maintain an invariant,
where SortedList maintains the sorted invariant for you).

> There is no such need
> of tree-based implementation. It is not needed in Python core, and is
> not needed for most users. So it is good to keep it in third-party
> libraries. Maintaining it in the stdlib has a cost and the benefit/cost
> value would be low.

The situation where this bites my firm the most is in interviews. This is
exactly the situation where we're asking candidates to demonstrate
their CS knowledge and produce an algorithm that has low asymptotic
complexity.

We use various services including HackerRank, which provide a
sandboxed Python environment that cannot use PyPI. There are quite
a few services in this kind of Educational+Challenge space including
some that execute Python in the browser using Brython etc.

My team believes that candidates with a Java background find the
challenge easier than candidates with a Python background, because
they just use NavigableSet and move on. I find that kind of embarrassing.
It requires us to apologise for Python's deficiency and fudge how we
score the question, which makes it hard to know we're providing a level
playing field.

(I would argue that we should not ask those kinds of questions or not
use these sandboxed interview environments but institutional friction
makes that magnitude of change very difficult.)

Putting something in the standard library means that it will, in due
course, appear in HackerRank and other services we use, and we can
expect candidates to be familiar with it and incorporate it in their
solutions.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VTOYBWO5OWXID4OIH5FWCKWSWRDXKSTB/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to