2009/11/9 Michael Alipio <daem0n...@yahoo.com>:
> Hi,
>
> i'm planning to sort an input file (which was File::Slurp'ed, most likely 
> megabyte-sized file) in various ways. I did some readings and learned several 
> methods
> that people have come up with in recent years. So to summarize, the default 
> sort is fast (uses quick sort),

If default sort is quicksort, I can't find the docs that say so. They
seem to imply (but don't state explicitly) that default sort is
whatever was measured to be fastest on that platform, be it quicksort
or mergesort. See perldoc -f sort (function) and perldoc sort
(pragma).

>explicit (using sub) is a bit slower,

What does this mean? You can write anything, including any sorting
algorithm, in a sub.

> other method
> that uses caching is faster. Then there's Schwartzian Transform and a packed 
> version by Guttman.

ST is not a sorting algorithm. It's a technique to improve the
efficiency of another sorting algorithm by reducing the cost of the
comparison function.

> Seems like everything is clear. Guttman is the fastest,
> until I went to cpan. Found Sort::Key, which claims to be the fastest, even 
> faster that ST, GRT. Now, before someone says, why not try each one and
> see for yourself (which doing such could be another subject for me to learn), 
> my question is this: if such faster sorting algorithms exist, why don't they 
> just replace the default sort function in Perl?

The default sort function is based on the classical sorting paradigm
of having a comparison function and trying to do as few comparisons as
possible. Any replacement for it must be a drop-in replacement in
order to not break existing code. In fact as perldoc -f sort shows,
there have been changes to the default sort over the years: perl 5.6
sort is quicksort, perl 5.7 sort is mergesort, and perl 5.8 and later
has a sort which doesn't seem specified but which can be controlled by
the 'use sort' pragma. The perl team are perfectly capable of
improving efficiency and functionality if they see a need to.

ST works on comparison-based sorting algorithms to make the comparison
function cheaper by precaching as much of the work as possible.
Whether this is more efficient or not depends on how expensive the
comparison is and how much work can be extracted into an ST. If you
need an ST, it's not because perl's sort() is inefficient; it is
because the comparison function you gave it is inefficient. There is
no improvement that the perl authors can ever make to sort() which
will make an ST unnecessary.

Sort::Key has a different interface: instead of a comparison function,
you give it a key-generating function. As a result, it is able to do
the precaching work itself, so no ST would be necessary. This means,
however, that it is not a drop-in replacement for sort(); to use
Sort::Key you have to rewrite code. Perl will always need a sort()
function to maintain backward compatibility and because every sort can
be expressed in terms of comparison functions.

As for whether or why Sort::Key is faster, I can't say because its
perldoc doesn't seem to give away its algorithm. One possible
algorithm it uses is radix sort, which is O(n) provided the calculated
keys are of bounded size -- which works for natural integers, but
fails for bignums.

> And for the classical question, given my situation (in combination with 
> File::Slurp), which is fastest sort method? (I hope somebody includes this in 
> perlfaq in the future).

I'm sorry to say, but this really will depend on your system. I don't
know how Sort::Key works, but regarding all the other sorts, you'd
have to measure it depending on your data and your system.

Do you need the fastest possible sort?

Philip

PS your email client has a very long line length, causing my quoting
above to go somewhat haywire. I'd recommend setting it to something
like 74.

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/


Reply via email to