On Thu, 23 Sep 1999, Manuel M. T. Chakravarty wrote:

> [EMAIL PROTECTED] (Marcin 'Qrczak' Kowalczyk) wrote,
>
> > S.D.Mechveliani <[EMAIL PROTECTED]> writes:
> > > So far, no clear program example appeared in this list to demonstrate
> > > Haskell's inefficiency in comparison to other languages.
> > 
> > I have not done benchmarking myself yet, but in
> > <http://www.brookes.ac.uk/~p0071749/papers/bridging.ps.gz>
> > they describe an algorithm for text formatting.
> > 
> >           | lines | chars | size(KB) | time(s) | memory(KB) |
> > ----------+-------+-------+----------+---------+------------+
> >  Haskell  |   163 |  4640 |      410 |    45.2 |       4505 |
> >  Modula-2 |   551 | 10005 |       74 |    18.6 |        356 |
> >  C++      |   389 |  5832 |      328 |     3.9 |        368 |
> > 
> > It is not quite fair because in Modula-2 and C++ all data structures
> > were of fixed size, but...
> 
> This may lead to a number of different conclusions:
> * Haskell is hopeless,
> * the author of the program has no clue about Haskell,
> * the Haskell compiler is hopeless,
> * the Haskell interpreter is really only an interpreter, or
> * the author forgot to pass -O when calling the Haskell
>   compiler.

The original URL cited by Kowalczyk seems to be dead, but you can get
what I believe is an incarnation of the same work in:

    http://www.brookes.ac.uk/~p0071749/publications.html#para

Please note that this paper does not actually concentrate on a "Haskell
vs. the world" comparison, but on the derivation of a new, linear-time
paragraph-formatting algorithm.  The date on it was September 9, 1999, so
this is pretty much "hot off the press".
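For anyone who hasn't seen the problem before, here is the naive greedy
line-filler in Haskell -- emphatically *not* the paper's linear-time
optimum-fit algorithm, just the baseline formulation of the problem it
improves on (the names are mine):

```haskell
-- Naive greedy paragraph filling: pack words into lines of at most
-- w characters, counting one space between adjacent words, breaking
-- as late as possible.  Illustrative only; the paper derives a
-- linear-time algorithm for the *optimal* (minimum-waste) fit.
fill :: Int -> [String] -> [[String]]
fill _ []     = []
fill w (x:xs) = go [x] (length x) xs
  where
    go line _   []     = [reverse line]
    go line len (y:ys)
      | len + 1 + length y <= w = go (y:line) (len + 1 + length y) ys
      | otherwise               = reverse line : fill w (y:ys)
```

E.g. fill 10 (words "a bb ccc dddd") gives [["a","bb","ccc"],["dddd"]].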

This version of the paper drops the Modula-2 line and the memory size
column; the rest of the new table would look like this:

                     | lines | chars | size(KB) | time(s)  |
---------------------+-------+-------+----------+----------+
Haskell ghc 4.01     |   183 |  4676 |      453 |     4.72 |
Haskell hbc 0.9999.4 |   183 |  4676 |      196 |    10.34 |
C++ g++ 2.90.27      |   416 |  6310 |        8 |     0.43 |

The C++ code was a direct (by hand) translation of the Haskell, except
that they did things the way a C++ programmer would do them: using
destructive array updates, and data structures that were of fixed size
for a given problem.  The code size of the C++ version suggests it was
dynamically linked to the usual stuff.
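Worth noting that the destructive-array style isn't off limits in Haskell
either: with the ST monad you can allocate a fixed-size table and update
it in place, much as the hand-translated C++ does.  A generic sketch with
made-up names (not the paper's actual cost table, and using module names
from current GHC libraries):

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, (!))

-- Fill a fixed-size table destructively inside runSTUArray, then
-- freeze it to a pure unboxed array.  Here the table just holds
-- prefix sums 0, 1, 3, 6, ... -- a stand-in for whatever per-line
-- cost table a formatter would maintain.
prefixSums :: Int -> UArray Int Int
prefixSums n = runSTUArray $ do
  arr <- newArray (0, n) 0
  forM_ [1 .. n] $ \i -> do
    prev <- readArray arr (i - 1)
    writeArray arr i (prev + i)   -- in-place update, no copying
  return arr
```

So the gap isn't that Haskell *can't* express the imperative version;
it's that the idiomatic list-based version is what people write.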

The text they formatted was the ASCII version of _Far from the Madding
Crowd_; clearly a problem of decent size.

You can probably learn a lot more from reading the whole paper, and
presumably from playing with the code.  But that's still a factor of 10
speed difference between the ghc version and the C++ version.  This is
actually a bit encouraging, I think, but does point out some possible room
for improvement.  If I had to make a wild guess, I'd bet the major source
of performance loss is in the haskell data structures used.  While the
pedagogical niceness of defining String to be [Char] is obvious, I know
that (at least in Hugs98) this can really ruin your day.  
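To see why [Char] hurts: every character is a separate cons cell plus a
boxed Char, so any traversal chases pointers all over the heap.  A packed
array keeps the characters contiguous and indexable in O(1).  A rough
sketch (the names are mine; the standard libraries don't ship a
packed-string type):

```haskell
import Data.Array (Array, listArray, (!))

-- A String of n characters costs n cons cells plus n boxed Chars,
-- and indexing it is O(n).  Packing into an immutable array keeps
-- the characters in one contiguous block with O(1) indexing.
packString :: String -> Array Int Char
packString s = listArray (0, length s - 1) s

-- O(1) lookup into the packed form.
charAt :: Array Int Char -> Int -> Char
charAt = (!)
```

For text-bashing workloads, avoiding the list representation in the
inner loops is probably where most of the factor of 10 lives.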

(Some of you might like to try doing:

  strace -c your_favorite_standalone_hugs_texthack

and note what you see.  Then write the equivalent 5-line perl script (:-))
and do the strace exercise again.  Ouch.  But since the next version of
Hugs98 is apparently due out imminently, this situation may well have
improved in the new version.)

But, man, things are getting *this close*: if it were really true that all
ghc needs right now to get in line with C++ on text-formatting problems is
a kicking string library and a way for the compiler to know when to use
it, then the arguments usually made in Haskell's favor become much more
compelling to a far larger audience.

jking



