On Saturday, 7 July 2012 at 20:39:18 UTC, Roman D. Boiko wrote:
On Saturday, 7 July 2012 at 20:26:07 UTC, David Piepgrass wrote:
I'd like to add that if we give tree parsing first-class treatment, I believe the most logical approach to parsing has three or more stages instead of the traditional two (lex+parse):

1. Lexer
2. Tree-ification
3. Parsing to AST (which may itself use multiple stages, e.g. parse the declarations first, then parse function bodies later)

The new stage two simply groups things that are in parenthesis and braces. So an input stream such as the following:

I bet that after stage 2 you would have performed almost the same amount of work (in other words, spent almost the same time) as you would if you did full parsing. After that you would need to iterate the whole tree (possibly multiple times), modify (or recreate if the AST is immutable) its nodes, etc. Altogether this might be a lot of overhead.

My opinion is that tree manipulation is something that should be available to clients of parser-as-a-library or even of parser+semantic analyzer, but not necessarily advantageous for parser itself.

Hmm, you've got a good point there, although simple tree-ification is clearly less work than standard parsing, since statements like "auto x = y + z;" would be quickly "blitted" into the same node in phase 2, but would become multiple separate nodes in phase 3.

What I like about it is not its performance, but how it matches the way we think about languages. Humans tend to see overall structure first, and examine the fine details later. The tree parsing approach is similarly nonlinear and can be modularized in a way that might be more intuitive than traditional EBNF.

On the other hand, one could argue it is /too/ flexible, admitting so many different approaches to parsing that a front-end based on this approach is not necessarily intuitive to follow; and of course, not using a standard EBNF-type grammar could be argued to be bad.

Still... it's a fun concept, and even if the initial parsing ends up using the good-old lex-parse approach, semantic analysis and lowering can benefit from a tree parser. Tree parsing, of course, is just a generalization of linear parsing and so a tree parser generator (TPG) could work equally well for flat input.

Reply via email to