I have been working on my own TPDL parser (I make the distinction from PEG
because while it works in approximately the same way, it does not support the
fancy Regular-Expression-like syntax, and right now I am working on a token
level rather than a character level) in C (imagine that!) as part of my
summer goal of building a compiler. The parser is working very well, I have
it support left recursion directly (per a paper listen on Bryan's website),
and I am all-around very happy with it.
I wanted to mention one of the way in which I have been building an AST from
my parse tree while the tree itself is being constructed (a more appropriate
term might be: reduced parse tree). I use one rule within the parser and also
two operators within the grammar. One of these operators is already well-
understood and has been mentioned on this list several times.
My two operators are:
i) A non-excludable operator (-), which requires that a non/terminal appear
in the reduced parse tree. If no such operator is used, then terminals are
*not* added to the tree as leafs, and non-terminals, where only one child
exists, are replaced with their child (my automatic reduction rule).
ii) A raise children operator (^), which puts the derivations of a particular
non-terminal in a production in place of the non-terminal. This allows me to
collect lists of nodes.
Note: an identifier surrounded with '<' and '>' signals a token. The empty
'<>' represents the empty token (epsilon transition). The notation I use for
my grammars is fairly straightforward.
Here are some simple things I am able to do with my one rule and two
operators:
Things
: Thing ^Things
: <>
;
ListOfThings
: -Things
;
When applied with 'ListOfThings' as the starting production, this grammar
will result in a tree of 'Thing' nodes, rooted at a 'Things' node. Note:
'ListOfThings' will not appear in the tree.
One normally expects terminals / tokens to appear as the leaves of the tree;
however, with these rules we can force non-terminals to appear as leaves. In
my case, this has helpful as a flag. For example, if a certain token appears
in the terminal stream then I can have a non-terminal appear as a leaf node
in the parse tree; however, if the token does not appear then no such non-
terminal will appear.
Rule
: RuleFlag -<non_terminal>
: RuleFlag -<terminal>
: RuleFlag -<epsilon>
;
RuleFlag
: -NonExcludable
: -Subsumable
: <>
;
NonExcludable : <dash> ;
Subsumable : <up_arrow> ;
The above grammar is actually describing the syntax of the grammar itself,
in this case, how a non/terminal looks in the right-hand-side of a
production. Subsumable might not be the most appropriate word; however, it
is what I thought of at the time. In the above example, if a '-' appears
before a non_terminal token then a 'NonExcludable' non-terminal node will
appear as a leaf in the parse tree.
I was wondering if anyone has done similar things, other than to have a
similar operator to my non-excludable (-) one. Also, does anyone have any
thoughts on my approach; am I possibly missing an obvious pitfalls?
_______________________________________________
PEG mailing list
[email protected]
https://lists.csail.mit.edu/mailman/listinfo/peg