On Saturday, 7 July 2012 at 20:39:18 UTC, Roman D. Boiko wrote:
On Saturday, 7 July 2012 at 20:26:07 UTC, David Piepgrass wrote:
I'd like to add that if we give tree parsing first-class
treatment, I believe the most logical approach to parsing has
three or more stages instead of the traditional two
(lex+parse):
1. Lexer
2. Tree-ification
3. Parsing to AST (which may itself use multiple stages, e.g.
parse the declarations first, then parse function bodies later)
The new stage two simply groups things that are in parenthesis
and braces. So an input stream such as the following:
I bet that after stage 2 you would have performed almost the
same amount of work (in other words, spent almost the same
time) as you would if you did full parsing. After that you
would need to iterate the whole tree (possibly multiple times),
modify (or recreate if the AST is immutable) its nodes, etc.
Altogether this might be a lot of overhead.
My opinion is that tree manipulation is something that should
be available to clients of parser-as-a-library or even of
parser+semantic analyzer, but not necessarily advantageous for
parser itself.
Hmm, you've got a good point there, although simple
tree-ification is clearly less work than standard parsing, since
statements like "auto x = y + z;" would be quickly "blitted" into
the same node in phase 2, but would become multiple separate
nodes in phase 3.
What I like about it is not its performance, but how it matches
the way we think about languages. Humans tend to see overall
structure first, and examine the fine details later. The tree
parsing approach is similarly nonlinear and can be modularized in
a way that might be more intuitive than traditional EBNF.
On the other hand, one could argue it is /too/ flexible,
admitting so many different approaches to parsing that a
front-end based on this approach is not necessarily intuitive to
follow; and of course, not using a standard EBNF-type grammar
could be argued to be bad.
Still... it's a fun concept, and even if the initial parsing ends
up using the good-old lex-parse approach, semantic analysis and
lowering can benefit from a tree parser. Tree parsing, of course,
is just a generalization of linear parsing and so a tree parser
generator (TPG) could work equally well for flat input.