2009/11/9 Michael Alipio <daem0n...@yahoo.com>: > Hi, > > i'm planning to sort an input file (which was File::Slurp'ed, most likely > megabyte-sized file) in various ways. I did some readings and learned several > methods > that people have come up with in recent years. So to summarize, the default > sort is fast (uses quick sort),
If default sort is quicksort, I can't find the docs that say so. They seem to imply (but don't state explicitly) that default sort is whatever was measured to be fastest on that platform, be it quicksort or mergesort. See perldoc -f sort (function) and perldoc sort (pragma). >explicit (using sub) is a bit slower, What does this mean? You can write anything, including any sorting algorithm, in a sub. > other method > that uses caching is faster. Then there's Schwartzian Transform and a packed > version by Guttman. ST is not a sorting algorithm. It's a technique to improve the efficiency of another sorting algorithm by reducing the cost of the comparison function. > Seems like everything is clear. Guttman is the fastest, > until I went to cpan. Found Sort::Key, which claims to be the fastest, even > faster that ST, GRT. Now, before someone says, why not try each one and > see for yourself (which doing such could be another subject for me to learn), > my question is this: if such faster sorting algorithms exist, why don't they > just replace the default sort function in Perl? The default sort function is based on the classical sorting paradigm of having a comparison function and trying to do as few comparisons as possible. Any replacement for it must be a drop-in replacement in order to not break existing code. In fact as perldoc -f sort shows, there have been changes to the default sort over the years: perl 5.6 sort is quicksort, perl 5.7 sort is mergesort, and perl 5.8 and later has a sort which doesn't seem specified but which can be controlled by the 'use sort' pragma. The perl team are perfectly capable of improving efficiency and functionality if they see a need to. ST works on comparison-based sorting algorithms to make the comparison function cheaper by precaching as much of the work as possible. Whether this is more efficient or not depends on how expensive the comparison is and how much work can be extracted into an ST. If you need an ST, it's not because perl's sort() is inefficient; it is because the comparison function you gave it is inefficient. There is no improvement that the perl authors can ever make to sort() which will make an ST unnecessary. Sort::Key has a different interface: instead of a comparison function, you give it a key-generating function. As a result, it is able to do the precaching work itself, so no ST would be necessary. This means, however, that it is not a drop-in replacement for sort(); to use Sort::Key you have to rewrite code. Perl will always need a sort() function to maintain backward compatibility and because every sort can be expressed in terms of comparison functions. As for whether or why Sort::Key is faster, I can't say because its perldoc doesn't seem to give away its algorithm. One possible algorithm it uses is radix sort, which is O(n) provided the calculated keys are of bounded size -- which works for natural integers, but fails for bignums. > And for the classical question, given my situation (in combination with > File::Slurp), which is fastest sort method? (I hope somebody includes this in > perlfaq in the future). I'm sorry to say, but this really will depend on your system. I don't know how Sort::Key works, but regarding all the other sorts, you'd have to measure it depending on your data and your system. Do you need the fastest possible sort? Philip PS your email client has a very long line length, causing my quoting above to go somewhat haywire. I'd recommend setting it to something like 74. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/