Re: Let's stop parser Hell

David Piepgrass Sat, 07 Jul 2012 11:25:47 -0700

Note that PEG does not require packrat parsing, even though it was developed to use it. I think it's a historical 'accident' that put the two together: Bryan Ford's thesis used the two together.

Interesting. After trying to use ANTLR-C# several years back, I got disillusioned because nobody was interested in fixing the bugs in it (ANTLR's author is a Java guy first and foremost) and the required libraries didn't come with source code or a license (wtf.)

So, for a while I was thinking about how I might make my own parser generator that was "better" than ANTLR. I liked the syntax of PEG descriptions, but I was concerned about the performance hit of packrat and, besides, I already liked the syntax and flexibility of ANTLR. So my idea was to make something that was LL(k) and mixed the syntax of ANTLR and PEG while using more sane (IMO) semantics than ANTLR did at the time (I've no idea if ANTLR 3 still uses the same semantics today...) All of this is 'water under the bridge' now, but I hand-wrote a lexer to help me plan out how my parser-generator would produce code. The output code was to be both more efficient and significantly more readable than ANTLR's output. I didn't get around to writing the parser-generator itself but I'll have a look back at my handmade lexer for inspiration.

However, as I found a few hours ago, Packrat parsing (typically used to handle PEG) has serious disadvantages: it complicates debugging because of frequent backtracking, it has problems with error recovery, and it typically disallows adding actions with side effects (because of the possibility of backtracking). These are important enough to reconsider my plans of using Pegged. I will try to analyze whether the issues are so fundamental that I (or somebody else) will have to create an ANTLR-like parser instead, or whether it is possible to introduce changes into Pegged that would fix these problems.

I don't like the sound of this either. Even if PEGs were fast, difficulty in debugging, error handling, etc. would give me pause. I insist on well-rounded tools. For example, even though LALR(1) may be the fastest type of parser (is it?), I prefer not to use it due to its inflexibility (it just doesn't like some reasonable grammars), and the fact that the generated code is totally unreadable and hard to debug (mind you, when I learned LALR in school I found that it is possible to visualize how it works in a pretty intuitive way--but debuggers won't do that for you.)
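To make the side-effect concern from the quoted message concrete, here is a toy sketch in Python (not D, and not how Pegged or any real packrat parser is implemented). All names (`literal`, `action`, `sequence`, `choice`, `log`) are made up for illustration. It shows an action firing inside an alternative that is later abandoned by backtracking:

```python
# Toy PEG-style combinators with backtracking. All names are hypothetical.
log = []  # records the side effects performed by semantic actions

def literal(text):
    """Match `text` at `pos`; return the new position, or None on failure."""
    def parse(src, pos):
        return pos + len(text) if src.startswith(text, pos) else None
    return parse

def action(parser, message):
    """Run `parser` and, on success, perform a side effect right away."""
    def parse(src, pos):
        end = parser(src, pos)
        if end is not None:
            log.append(message)  # fires even if the enclosing sequence fails later
        return end
    return parse

def sequence(*parsers):
    """Match every parser in order; fail as soon as one of them fails."""
    def parse(src, pos):
        for p in parsers:
            pos = p(src, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: try each alternative from the same start position."""
    def parse(src, pos):
        for p in parsers:
            end = p(src, pos)
            if end is not None:
                return end
        return None
    return parse

# stmt <- "let" " x" ";"  /  "let" " y"
stmt = choice(
    sequence(action(literal("let"), "declared (alt 1)"), literal(" x"), literal(";")),
    sequence(action(literal("let"), "declared (alt 2)"), literal(" y")),
)

result = stmt("let y", 0)
print(result, log)
```

The parse succeeds through the second alternative, yet `log` also contains the entry from the first alternative, which was tried and dropped. This is why backtracking parsers tend to restrict actions to pure functions or delay them until the parse is known to succeed.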

While PEGs are clearly far more flexible than LALR and probably more flexible than LL(k), I am a big fan of old-fashioned recursive descent because it's very flexible (easy to insert actions during parsing, and it's possible to use custom parsing code in certain places, if necessary*) and the parser generator's output is potentially very straightforward to understand and debug. In my mind, the main reason you want to use a parser generator instead of hand-coding is convenience, e.g. (1) to compress the grammar down so you can see it clearly, (2) to have the PG compute the first-sets and follow-sets for you, (3) to get reasonably automatic error handling.

* (If the language you want to parse is well-designed, you'll probably not need much custom parsing. But it's a nice thing to offer in a general-purpose parser generator.)
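For readers who haven't written one, here is a minimal sketch of the recursive-descent style described above, in Python rather than D. The grammar, the `Parser` class and the `evaluate` helper are all invented for illustration. Each grammar rule becomes one method, the actions (here, arithmetic) sit inline, and the `peek()` checks are the hand-computed first-sets that a parser generator would otherwise derive for you:

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    # expr <- term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs  # inline action
        return value

    # term <- factor (('*' | '/') factor)*
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer division
        return value

    # factor <- NUMBER | '(' expr ')'
    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"expected number or '(', got {tok!r}")

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value

print(evaluate("2 + 3 * (4 - 1)"))
```

Because every rule is an ordinary method, a breakpoint in `factor` shows the call stack as a path through the grammar, and any method could be replaced with custom parsing code without touching the others.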

I'm not totally sure yet how to support good error messages, efficiency and straightforward output at the same time, but by the power of D I'm sure I could think of something...

I would like to submit another approach to parsing that I dare say is my favorite, even though I have hardly used it at all yet. ANTLR offers something called "tree parsing" that is extremely cool. It parses trees instead of linear token streams, and produces other trees as output. I don't have a good sense of how tree parsing works, but I think that some kind of tree-based parser generator could become the basis for a very flexible and easy-to-understand D front-end. If a PG operates on trees instead of linear token streams, I have a sneaking suspicion that it could revolutionize how a compiler front-end works.

Why? Because right now parsers operate just once, on the user's input, and from there you manipulate the AST with "ordinary" code. But if you have a tree parser, you can routinely manipulate and transform parts of the tree with a sequence of independent parsers and grammars. Thus, parsers would replace a lot of things for which you would otherwise use a visitor pattern, or something. I think I'll try to sketch out this idea in more detail later.
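As a very rough illustration of the "sequence of independent passes over a tree" idea (this is not ANTLR's tree-grammar mechanism, just hand-written Python with hypothetical names), an AST can be modelled as nested tuples and each pass as a small rule that matches a node shape and returns a replacement:

```python
def rewrite(tree, rule):
    """Apply `rule` bottom-up: rewrite the children first, then the node itself."""
    if isinstance(tree, tuple):
        tree = (tree[0],) + tuple(rewrite(child, rule) for child in tree[1:])
    return rule(tree)

def desugar_neg(node):
    # ("neg", x)  ->  ("sub", 0, x)
    if isinstance(node, tuple) and node[0] == "neg":
        return ("sub", 0, node[1])
    return node

def fold_constants(node):
    # ("add" | "sub" | "mul", int, int)  ->  int
    ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    if (isinstance(node, tuple) and node[0] in ops and len(node) == 3
            and all(isinstance(child, int) for child in node[1:])):
        return ops[node[0]](node[1], node[2])
    return node

def run_passes(tree, passes):
    """Run each pass over the whole tree, feeding the output to the next pass."""
    for rule in passes:
        tree = rewrite(tree, rule)
    return tree

ast = ("add", ("mul", 2, 3), ("neg", "x"))
result = run_passes(ast, [desugar_neg, fold_constants])
print(result)
```

Each pass knows only the node shapes it cares about, so passes can be added, removed or reordered independently. A tree-based parser generator would presumably let those shapes be written as grammar rules instead of `isinstance` checks.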
