George Russell wrote:
> I hope I will not tread on too many corns if I say that a complete
> rewrite of GHC's parser (at least) is long overdue.

Well, it *has* already been rewritten not so long ago, and if Simon M
doesn't get into a masochistic mood, it won't happen again soon...  :-}

> It is really appalling that
>   (a) there is no error-correction.  (I presume this is because the
>       parser tries to be pure and so all errors result in the
>       non-returning "error" function.)

That's not the case: Lexing and parsing are completely monadic and
there is no "error".
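To make the point concrete, here is a minimal sketch (not GHC's actual code, all names invented) of how a monadic parser reports failure as a value in the monad instead of calling the non-returning "error" function:

```haskell
-- The parse monad threads the remaining input and reports failure
-- as a Left value, so Prelude's 'error' is never needed.
newtype P a = P { runP :: String -> Either String (a, String) }

instance Functor P where
  fmap f (P p) = P $ \s -> fmap (\(a, s') -> (f a, s')) (p s)

instance Applicative P where
  pure a = P $ \s -> Right (a, s)
  P pf <*> P pa = P $ \s -> do (f, s')  <- pf s
                               (a, s'') <- pa s'
                               Right (f a, s'')

instance Monad P where
  P p >>= k = P $ \s -> do (a, s') <- p s
                           runP (k a) s'

-- Signal a parse error monadically instead of aborting the program.
parseError :: String -> P a
parseError msg = P $ \_ -> Left msg

-- Consume one character, failing gracefully at end of input.
item :: P Char
item = P $ \s -> case s of
  []     -> Left "unexpected end of input"
  (c:cs) -> Right (c, cs)

main :: IO ()
main = do
  print (runP item "ab")                  -- Right ('a',"b")
  print (runP (parseError "boom") "ab")   -- Left "boom"
```

The caller inspects the `Either` result and can report or recover; nothing ever aborts the whole compiler.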

Regarding good error messages and automatic error correction: you
basically have two options. The first is to add Yacc-style error
productions generously and wisely throughout the grammar by hand
(IIRC, gcc does this). This way you can give very reasonable error
messages, but it will probably take you aeons to catch most errors.
There have been extensive studies, with the net result that there is
no such thing as a "typical error": beginners, for example, make
errors quite different from the ones very skilled people make. So you
would need an extensive collection of incorrect Haskell programs from
a large variety of users.
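A hedged sketch (all names invented) of this first option in combinator form: hand-placed recovery points, the panic-mode cousin of a Yacc error production. On a malformed statement we skip to the synchronizing token `;`, emit a diagnostic, and resume:

```haskell
import Data.Either (partitionEithers)

-- Split the input at the synchronizing token ';'. This is the
-- hand-placed recovery point: no matter how broken one statement is,
-- parsing resumes cleanly at the next ';'.
splitStmts :: String -> [String]
splitStmts [] = []
splitStmts s  = let (stmt, rest) = break (== ';') s
                in stmt : splitStmts (drop 1 rest)

-- A toy "statement" is valid iff it contains an '='.
parseStmt :: String -> Either String String
parseStmt s
  | '=' `elem` s = Right s
  | otherwise    = Left ("parse error before `;': " ++ s)

main :: IO ()
main = do
  let (errs, ok) =
        partitionEithers (map parseStmt (splitStmts "a=1;oops!!;b=2;"))
  mapM_ putStrLn errs   -- one diagnostic for the bad statement
  print ok              -- the statements we still managed to parse
</code>
```

The catch mentioned above applies: every such recovery point is chosen by hand, and covering "most" real-world errors this way takes a huge corpus of broken programs to tune against.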

The other option is the automatic construction of an error-correcting
parser from a grammar. There have been plenty of algorithms for this
over the past decades, so you only have to pick one, e.g.

@Article{         roehrich:methods,
  author        = {Johannes R{\"o}hrich},
  title         = {Methods for the Automatic Construction of Error
                   Correcting Parsers},
  journal       = {Acta Informatica},
  year          = 1980,
  volume        = 13,
  number        = 2,
  pages         = {115--139}
}

This one works very well and has
<Advertising>
   been implemented as part of my diploma thesis a few years ago
   http://www.informatik.uni-muenchen.de/~Sven.Panne/eagle.ps.gz
   (sorry, in German only)
</Advertising>
The downside: for good error recovery, tokens have to be inserted
automatically, so you have to supply/guess sensible default attributes
for them if you want to continue with semantic analysis later. And
that is *extremely* hard: e.g. what identifier name should be used
when the parser decides that an identifier token has to be inserted?
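A tiny sketch of that last problem (token type and names are made up for illustration): the corrector can insert the identifier *token* mechanically, but it must invent the *attribute* (the name) out of thin air.

```haskell
-- Hypothetical token type for a toy language.
data Token = TIdent String | TEquals | TInt Int
  deriving (Eq, Show)

-- Recovery step: if the stream should have started with an
-- identifier but doesn't, insert one. "<inserted>" is a made-up
-- default attribute; later semantic analysis will trip over it
-- (unbound name, bogus type errors, etc.).
recoverIdent :: [Token] -> [Token]
recoverIdent ts@(TIdent _ : _) = ts
recoverIdent ts                = TIdent "<inserted>" : ts

main :: IO ()
main = print (recoverIdent [TEquals, TInt 1])
```

Whatever placeholder you pick, every later phase that looks at the name has to special-case it, which is exactly why continuing with semantic analysis after correction is so hard.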

>   (b) locations of errors are often several lines away from the
>       actual error.

Actually, they can be *arbitrarily* many lines away, or even in another
module...

>   (c) the parser is so slow.

I would be highly surprised if GHC's parser took more than a few
percent of the total compilation time for a typical module.

> It is perfectly possible to have error-correcting efficient parsers
> written for and in functional languages which keep location
> information throughout, provided you don't mind using impure features.
> [...]

It's not a matter of pure vs. impure: given any error correction
strategy, you can easily construct an example where the "correction"
makes you roll on the floor, laughing out loud... (OK, I've read a more
formal version of this theorem, but I can't find it now :-)

My personal opinion: Given the long compilation/linking times for a
non-trivial project, I simply don't care if compilation aborts after
a fraction of a second. And the "recovery" I've seen so far, e.g.
from some C++ compilers, is definitely not worth any effort.

Cheers,
   Sven "Santa" Panne
-- 
Sven Panne                                        Tel.: +49/89/2178-2235
LMU, Institut fuer Informatik                     FAX : +49/89/2178-2211
LFE Programmier- und Modellierungssprachen              Oettingenstr. 67
mailto:[EMAIL PROTECTED]            D-80538 Muenchen
http://www.informatik.uni-muenchen.de/~Sven.Panne
