I saw the usefulness/need of a Haskell parser for various applications
sometime ago as well. Hence I had a student write one for me. Before you
hold your breath let me state first that it is currently a
pre-alpha-alpha version.
The parser is really built in Haskell (not with parser generators ;-)
using the parser combinators of Graham Hutton and Erik Meijer, including
their extension for handling the offside rule. I think that parser
combinators are more flexible than parser generators (nicest of all: the
parsing functions have the same structure as the syntax definition of
the Haskell report). The problems of the current implementation are:
* it is much too slow
* it doesn't produce any error messages but just fails
* it is hardly tested
* it is documented in German
However, there exists a lot of literature and I have some ideas of my
own on how to attack the first two problems. I really want to see how
far I can get with this approach.
Two remarks about writing a Haskell parser of wide-range use:
I don't consider it a good idea to have the parser directly construct
the algebraic data structure of the abstract syntax tree. You never know
the needs of user of the parser. Hence our parser constructs the
abstract syntax tree by using functions, not data constructors. These
functions are defined in another module. Currently in the only such
module we have, all functions are defined as data constructors of the
algebraic data types which are defined in a third module. For defining
these algebraic data types we did use GHC's HsSyn as model (substituting
our parser for GHC's would be a really good test).
There is still the problem of which information should be passed to the
functions that construct the syntax tree. Source location information is
already collected because of the off-side rule. So every syntactic
construct is told its start location. However, should it also know its
end location?
Some applications may require parsing comments as well.
There is also the problem of how to handle fixities of imported
operators. GHC takes the approach of first parsing all operators as
being left-associative with identical priority. After all imports are
processed (and hence fixities are known), another compiler phase
modifies the syntax tree accordingly.
Laziness would actually allow the neat trick of calling the (complete)
parser function with a table of fixities, which is calculated from the
syntax tree which is produced by the parser. I just fear that this
method may lead to an obscure order of error messages.
Our parser project is currently resting, but I have hope that it will
continue in January.
Olaf
PS: Sven, you did not mention Rojemos Haskell compiler nhc. The parser
is written in pure Haskell with parser combinators. See `Efficient
parsing combinators' at http://www.cs.chalmers.se/~rojemo/thesis.html
--
OLAF CHITIL, Lehrstuhl fuer Informatik II, RWTH Aachen, 52056 Aachen,
Germany
Tel: (+49/0)241/80-21212; Fax: (+49/0)241/8888-217
URL: http://www-i2.informatik.rwth-aachen.de/~chitil/