On Friday, 19 April 2013 at 22:56:00 UTC, bearophile wrote:
So, why isn't TimSort the default?
Probably because someone thinks that "on average" the other
sort is faster.
In theory the other should be faster, because if you relax the
constraints of the sort being stable I think in theory you
should be able to write a little faster sorting algorithm (I
don't know if this is true).
I'm just throwing my 2c in, but I think TimSort ought to be the
default. It's simply the safer option. Since worst-case
performance can be designed (and it can be designed unless the
pivots are, at least, pseudo-randomly chosen), there is a very
real risk of the default sort being used in a way that can be
compromised by an attacker. From this perspective, seems to be
like TimSort being default would support the design goal #5 of D,
"Make doing things the right way easier than the wrong way."
Plus, TimSort seems to be faster in most cases in my attempts.
The only cases I could find that it wasn't faster is when I could
guarantee that the data I was passing in had no order to it. In
cases where you suspect (but can't guarantee) data is sorted when
passed in (think fetching from a web API and it gives you sorted
data, but the docs don't say it should), calling TimSort is
nearly as fast as calling an "isSorted" ... So, my recommendation
is to just call TimSort and don't worry about the extra branching
code ([checking sortedness, if not sorted, call sort] vs [call
sort]). This makes it so the "fast" code is also very convenient
to write.
TBH, though, I think the reason that TimSort is not the default
is because it was added only semi-recently. The old stable sort
was not nearly as fast as the unstable sort (and, in fact, IIRC,
it didn't work properly when I tried it). I doubt that someone
intentionally said that quicksort was faster than TimSort on
average ... it's just that no one thought to change the default
when TimSort was added.
Maybe an enhancement request could be made on this?