Tim Peters <[email protected]> added the comment:
Thank you, Vincent! I very much enjoyed - and appreciated - your paper I
referenced at the start. Way back when, I thought I had a proof of O(N log N),
but never wrote it up because some details weren't convincing - even to me ;-)
. Then I had to move on to other things, and never got back to it. I was
probably mistaken. So it was delightful to see your elegant proof of that, and
even more.
I'm attaching a new runstack.py that includes code modeling your new adaptive
Shivers strategy. Like all these approximations, it has its own "bad cases" on
smallish inputs. Based on the specific cases this code runs, powersort remains
in a category of its own, with timsort, twomerge, and shivers roughly tied on
_average_ merge cost.
Part of the reason, I suspect: powersort is the only strategy here that's
always optimal for 3-run cases. Which it buys at the cost of never merging the
run just discovered (unless it's the last run in the array). For example, in
31, 16, 1
twomerge and shivers merge 31 and 16 first, before seeing 1, which is "far
from" optimal. timsort and powersort merge 16 and 1 first, although that's by
design in powersort and by luck in timsort. In
16, 31, 1
only powersort does the best thing (note: runstack.py doesn't model peeksort;
I'm sure it would merge 31 and 1 first on that too).
I care about small-number-of-runs because real-life Python lists aren't
asymptotic in nature ;-) People deliberately construct lists with a small
number of runs now before sorting, because they know Python's sort can exploit
that. For many other cases, all runs are artificially forced to length
`minrun`, and then any scheme at all that approximates balanced merging is
about as good as any other.
What I can't know before we time things is whether order-of-merging makes a
measurable difference in real life, and whether powersort's possible delay in
merging the most recent run has bad cache effects that overwhelm its smaller
"merge cost".
In any case, I'm keen to replace timsort's merge-order strategy with
_something_ that's much easier to analyze. The makes your Shivers variant and
powersort the only two real contenders now. It seems quite remarkable that the
Shivers variant has such good asymptotics! It's certainly the easiest to
modify Python's C code to implement (straightforward edits in a single
function).
----------
Added file: https://bugs.python.org/file47818/runstack.py
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34561>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com