On Jun 27, 2007, at 5:29 PM, Andrew Lentvorski wrote:
> Christopher Smith wrote:
>> Are we constructing or reusing lists?
>>
>> Haskell is a purely functional language. :-)
>
> Ignoring the obvious non sequitur ...
It's not a non sequitur. In a functional language you are always
*defining* a new list, and whether a copy is made or not isn't really
specified by the source code. Since you can't specify state, you
can't specify copying.
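A tiny illustration of the point (my example, not from the thread): "defining" a new list needn't copy anything, because prepending can share the entire old list.

```haskell
-- Hypothetical example: ys is a new list, but the runtime is free to
-- share all of xs rather than copy it; the source code says nothing
-- about copies either way.
xs :: [Int]
xs = [1, 2, 3]

ys :: [Int]
ys = 0 : xs   -- O(1) work: one new cons cell, tail shared with xs
```

In GHCi, `ys` evaluates to `[0,1,2,3]`; whether the cells of `xs` are shared or duplicated is the runtime's business, not the program's.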
> Unfortunately, I'm pretty sure that the algorithm is O(n^2) time
> *and* O(n^2) space in *all* cases because of the requirement to
> copy the lists around.
No such requirement exists (and yes, the wiki page suggests
otherwise, but that's not exactly what it is saying), and even if it
did, I don't see how that would do anything beyond growing the
constants on the time side and making the space side grow just as
much as the time side (i.e. be O(n log n) for the common case).
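For context, the "textbook" definition the thread is arguing about (qsort1 on the linked page) is essentially the following; this is my transcription, so it may differ in detail from the page:

```haskell
-- Naive functional quicksort: two passes over xs per level, plus
-- list concatenation of the recursive results. Clear, but slower by
-- constant factors than the accumulator version quoted later in the
-- thread.
qsort1 :: Ord a => [a] -> [a]
qsort1 []     = []
qsort1 (x:xs) = qsort1 lesser ++ [x] ++ qsort1 greater
  where
    lesser  = [y | y <- xs, y < x]
    greater = [y | y <- xs, y >= x]
```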
> This reference seems to agree:
>
> http://en.literateprograms.org/Quicksort_(Haskell)
Stewart should look at that link as it addresses most of his issues,
although it doesn't use QuickCheck to do unit testing.
> and needs to use this code to get O(n log n) time with O(n) space:
This isn't needed to get this level of big-O performance (not
surprising, since that is a function of the algorithm, not the code).
Rather, it is necessary to get better performance out of the code.
Indeed, if it were, you'd expect qsort3 to be thousands of times
(100000/log(100000) times) faster than qsort1 and to consume
1/100000th the space when sorting 100,000 elements (well, I guess it
is conceivable that qsort3's constant factor could be substantially
worse, but there is no reason to think that this is true, at least
for running time).
> qsort3' (x:xs) y = part xs [] [x] []
>     where
>         part [] l e g = qsort3' l (e ++ (qsort3' g y))
>         part (z:zs) l e g
>             | z > x = part zs l e (z:g)
>             | z < x = part zs (z:l) e g
>             | otherwise = part zs l (z:e) g
>
> Doesn't look so clean or obvious anymore.
Perhaps I'm insane, but this doesn't seem any harder to read. I'll
agree that coming up with it in the first place requires some
knowledge of standard optimization tricks (namely the time-honored
tradition of avoiding concatenation), but reading it isn't that
difficult.
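As quoted above, qsort3' is missing its wrapper and base case; here is a complete, runnable version (the first two definitions are my reconstruction of the obvious missing pieces):

```haskell
-- Accumulator-based quicksort: y is the already-sorted suffix of the
-- result, and part splits xs into l (< pivot), e (== pivot), and
-- g (> pivot) in a single pass, avoiding repeated concatenation of
-- the recursive results.
qsort3 :: Ord a => [a] -> [a]
qsort3 xs = qsort3' xs []

qsort3' :: Ord a => [a] -> [a] -> [a]
qsort3' []     y = y
qsort3' (x:xs) y = part xs [] [x] []
  where
    part [] l e g = qsort3' l (e ++ qsort3' g y)
    part (z:zs) l e g
      | z > x     = part zs l e (z:g)
      | z < x     = part zs (z:l) e g
      | otherwise = part zs l (z:e) g
```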
> In fact, what you described is the *classic* pitfall of functional
> languages. It is very easy to write code that performs absolutely
> horribly because of all the implied copying.
Yes, although runtimes can and do figure out how to minimize this,
which is why you don't see the "slow" version being that much worse
than the "fast" version. The same is not true with other languages.
> In this instance, someone spent the extra time to do a
> functional-style mergesort that would be matched in performance
> (and better in space!) by someone who wrote a stupid shell sort
> (O(n^2) time, O(n) space) in imperative style.
a) They didn't do a mergesort. The GHC List.sort function does a
merge sort, which tends to be the preferred way to do a stable sort
(for example, std::sort is typically quicksort-based while
std::stable_sort is a mergesort) for obvious reasons.
b) A "stupid" shell sort is theoretically faster than quicksort in
the worst case, and not much worse on average. Quicksort tends to be
faster in practice because real-world machines don't follow the
abstract computing model. Similarly, it is not clear that even the
original qsort implementation posted would necessarily perform more
slowly on average than "a stupid shell sort in imperative style".
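For reference, the stable list merge sort alluded to in (a) can be sketched as follows; this is a simplified halving version (GHC's actual List.sort merges ascending runs), and the names are mine:

```haskell
-- Stable merge sort: on ties, merge takes from the left list first,
-- so equal elements keep their original relative order.
msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort front) (msort back)
  where
    (front, back) = splitAt (length xs `div` 2) xs
    merge [] ys   = ys
    merge xs' []  = xs'
    merge (x:xs') (y:ys)
      | x <= y    = x : merge xs' (y:ys)
      | otherwise = y : merge (x:xs') ys
```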
> Worse, a naive Haskell programmer would actually have been more
> successful with writing a shell sort even in Haskell (which would
> give the expected O(n^2) time, O(n) space)!
This is also far from clear to me.
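For what it's worth, an imperative-style shell sort can be written in Haskell with mutable ST arrays; this sketch is entirely my illustration, specialized to Int for simplicity and using a simple halving gap sequence (which gives the O(n^2) worst case mentioned above, with O(n) space):

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, getElems, newListArray, readArray, writeArray)

-- Shell sort with halving gaps: gapped insertion-sort passes over a
-- mutable array, shifting larger elements right within each gap chain.
shellSort :: [Int] -> [Int]
shellSort xs = runST $ do
    arr <- newListArray (0, n - 1) xs :: ST s (STArray s Int Int)
    forM_ gaps $ \gap ->
      forM_ [gap .. n - 1] $ \i -> do
        v <- readArray arr i
        -- Walk back through the gap chain, shifting elements > v right.
        let shiftFrom j
              | j >= gap = do
                  u <- readArray arr (j - gap)
                  if u > v
                    then writeArray arr j u >> shiftFrom (j - gap)
                    else return j
              | otherwise = return j
        j <- shiftFrom i
        writeArray arr j v
    getElems arr
  where
    n    = length xs
    gaps = takeWhile (> 0) (iterate (`div` 2) (n `div` 2))
```

Whether a "naive" Haskell programmer would find this easier than the list-based quicksort is, of course, exactly what's in dispute.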
> Looking "just like the textbook algorithm" and then executing in
> the wrong O() bounds is hardly a good example of where functional
> programming shines.
Again, it is not clear to me at all that it would perform in the
wrong O() bounds relative to the equivalent pseudocode.
--Chris
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg