Tim Peters <[email protected]> added the comment:
To be concrete, here's an implementation of a full-blown, stable lexicographic
sort based on "bucket refinement". `xs` is a list of sequences to be sorted in
lexicographic order. The types of the sequences don't matter (lists, tuples,
strings, ...). Indeed, since list elements are never compared against each
other directly, they don't even have to be the same sequence type.
This is already faster in pure Python than list.sort() for cases like:
    xs = [random.choices(range(100000), k=random.randrange(5, 30))
          for i in range(1000000)]
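A minimal timing sketch of that comparison (illustrative only; lexisort() is
the function given at the end of this message):

import random
import time

xs = [random.choices(range(100000), k=random.randrange(5, 30))
      for i in range(1000000)]
ys = xs[:]

t0 = time.perf_counter()
lexisort(xs)         # the bucket-refinement sort below
t1 = time.perf_counter()
ys.sort()            # CPython's built-in sort, for comparison
t2 = time.perf_counter()

assert xs == ys      # both yield the same lexicographic order
print(f"lexisort   {t1 - t0:.2f}s")
print(f"list.sort  {t2 - t1:.2f}s")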
However, for cases like strings of the form
    'x' * 10_000 + str(i)
it's very much slower than list.sort(), despite that it "should be" very much
faster. That appears mostly due to the:
    t.sort(key=lambda x: x[k])
    xs[lo : hi] = t
lines. list.sort() doesn't accept `lo` and `hi` arguments, so sorting _just_ a
slice requires copying that slice into a temp list, sorting the temp, then
copying the sorted temp back in. So dealing with a single `x` position in the
string prefixes effectively requires physically copying the entire list twice
over - mad overhead to copy the list 20 thousand times.
While the need doesn't come up all that often, I'd support adding optional `lo`
and `hi` arguments to `list.sort()`. This isn't the first time the lack has
turned a slick approach into a timing disaster for me ;-)
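Spelled out, the dance the current API forces is just this (sort_slice is a
name used here purely for illustration):

def sort_slice(xs, lo, hi, key=None):
    # Sorting just xs[lo:hi] today: copy the slice out, sort the temp,
    # copy it back in - two full copies of the slice per call.
    t = xs[lo:hi]
    t.sort(key=key)
    xs[lo:hi] = t

# With hypothetical lo/hi arguments - say xs.sort(key=key, lo=lo, hi=hi) -
# both copies would go away; list.sort() has no such arguments today.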
About list.sort()'s merge strategy, I'm glad to be rid of the original. It's
not just correctness, it's the effort of _reasoning_ about its consequences. It
was about a full decade before the first proof was published that it really
is a worst-case O(N log N) sort. listsort.txt didn't contain a proof because
the only proof attempts I had were so complicated not even _I_ found them
compelling ;-)
Vincent Jugé in particular started at the other end, looking for a merge
strategy that made proof straightforward instead of Byzantine. It's
straightforward under the "powersort" strategy too, although it relies on "well
known" results about approximations to optimal binary search trees.
def lexisort(xs):
    # Stack of (lo, hi, k) "buckets": the slice xs[lo:hi] still needs to be
    # sorted on position k; everything in it already agrees on positions < k.
    buckets = [(0, len(xs), 0)]
    while buckets:
        lo, hi, k = buckets.pop()
        # Sequences exhausted at position k sort first; move them to the
        # front (stably), collecting the rest in t.
        t = []
        for i in range(lo, hi):
            x = xs[i]
            if k >= len(x):
                xs[lo] = x
                lo += 1
            else:
                t.append(x)
        # Stable sort of the remainder on element k alone.
        t.sort(key=lambda x: x[k])
        xs[lo : hi] = t
        # Each run of equal k-th elements longer than 1 becomes a new
        # bucket, to be refined on position k+1.
        while lo < hi:
            pivotk = xs[lo][k]
            i = lo + 1
            while i < hi and xs[i][k] == pivotk:
                i += 1
            if i - lo > 1:
                buckets.append((lo, i, k + 1))
            lo = i
        assert lo == hi
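For example, a quick sanity check with mixed sequence types (illustration
only):

import random

data = []
for i in range(10_000):
    seq = random.choices(range(50), k=random.randrange(0, 8))
    data.append(random.choice([list, tuple])(seq))   # mix lists and tuples

expected = sorted(data, key=tuple)   # normalize so lists and tuples compare
lexisort(data)
assert [tuple(x) for x in data] == [tuple(x) for x in expected]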
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue45530>
_______________________________________