ok: > > On 11 Mar 2008, at 12:27 pm, Krzysztof Skrzętnicki wrote: > > >I've written little framework to work on. See sortbench.hs and > >sortbench.py attachments. > >Furthermore, I checked Yhc's implementation of sort: it is indeed > >very fast: > > I took his earlier code and plugged yhc's sort into it. > Compiling with ghc -O2 using GHC 6.8.2, I found the yhc code (basically > variant of natural merge) to be considerably slower than some of the > alternatives. > > There is a pretty obvious way to speed up the YHC code which you would > expect to provide nearly a factor of two speedup, and with the random > integer data, it does. > > However, there is one simple but extremely important point which must be > considered in evaluating a sorting routine: the library 'sort' function > is, or should be, a *general-purpose* sort. It should be useful with > any > data type which is an instance of Ord or for which you can write a `cmp` > function, and it should work at least as well with already-sorted input > as with randomised input. quicksort (whose original reason for > existence > was to sort on a machine whose memory would disgrace today's > wristwatches) > is well known for doing deceptively well on randomised integer > sequences. > > When I run Krzystztof's test harness (which I have currently brought > up to > 25 different sorting functions) with a list of the form [1..N] instead > of > a random list, suddenly all the variants of merge sort come out ahead of > all the variants of quick sort. In fact his best version of quicksort, > qsort_iv, comes out fully 1155 times slower than the YHC algorithm on a > list of 10,000 ordered integers. That can be improved by spending a bit > of effort on choosing a good pivot, but then the quicksort algorithms > are > no longer so competitive for randomised inputs. > > The classic "Engineering a Quicksort" paper by Bentley and McIlroy from > Software : Practice & Experience recommends a whole bunch of > distribution > shapes (one run, two runs, sawtooth, organ pipes, random, ...) that > should > be benchmarked before drawing too many conclusions. > > It is also wise to try more than one data type. How do the different > algorithms compare on random samples from a Scrabble dictionary? (Why > that particular question? Because I mean to try it.) > > Right now, I remain happy with merge sort, because it is never > mysteriously > several thousand times slower than expected.
Do you have these tests bundled up in a repository? It would be very useful to drop these into the base library testsuite, so we can point to this when needed. -- Don _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe