Hi Bryan,
I'm not sure if this is quite the answer you were looking for.
In the latest version of my PEG parsing library I tried a number of hacks
(like allowing tree-decomposition syntactic actions, and special rules to
express binary recursion), but in the end I abandoned them all as being
too complicated to explain to outsiders. I decided to go back to basics,
where each rule literally corresponds to a matching algorithm. This seems
more in the spirit of a PEG. As a result, all of the left-recursion
examples simply lead to a stack overflow.
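To make the stack overflow concrete, here is a toy sketch in Python (not
Jigsaw's actual code) of what "rule as matching algorithm" means for a
left-recursive rule:

```python
def parse_S(text, pos):
    # Direct translation of the left-recursive rule: S <- S 'a' / 'a'
    # The first alternative begins by matching S again at the same
    # position, so the recursion never consumes input and never bottoms
    # out: a stack overflow (RecursionError in Python).
    end = parse_S(text, pos)              # left-recursive call
    if end is not None and text[end:end + 1] == 'a':
        return end + 1
    if text[pos:pos + 1] == 'a':          # second alternative: 'a'
        return pos + 1
    return None
```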
I find that expressing left-recursive rules iteratively rather than
recursively leads to easier-to-understand grammars, even though the
resulting parse tree is a pain to work with. To deal with the ugly parse
trees I simply apply a tree transformation that converts them into the
desired form.
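For example (a hand-rolled Python sketch, not Jigsaw's API), the
left-recursive rule E <- E '+' num / num can be written iteratively as
E <- num ('+' num)*, which matches the same strings but yields a flat
list of operands; a left fold then rebuilds the left-leaning tree the
left-recursive rule would have produced:

```python
import re
from functools import reduce

def parse_E(text):
    # Iterative form: E <- num ('+' num)*
    # Yields a flat list of operands instead of a left-leaning tree.
    # (Toy lexer: assumes well-formed input like "1+2+3".)
    return [int(x) for x in re.findall(r'\d+', text)]

def left_fold(operands):
    # Transform the flat list into the left-associative tree that the
    # left-recursive rule E <- E '+' num / num would have produced.
    return reduce(lambda left, right: ('+', left, right), operands)

tree = left_fold(parse_E("1+2+3"))
# ('+', ('+', 1, 2), 3), i.e. (1 + 2) + 3
```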
So for a concrete example, here is a snippet from a JavaScript grammar
in my latest parsing library (Jigsaw):
public static Rule LeafExpr = Node(ParenExpr | NewExpr | Function | Literal | Identifier);
public static Rule PrefixExpr = Node(PrefixOp + Recursive(() => UnaryExpr));
public static Rule UnaryExpr = (PrefixExpr | LeafExpr) + WS;
public static Rule PostfixOp = Field | Index | ArgList;
public static Rule PostfixExpr = Node(UnaryExpr + WS + ZeroOrMore(PostfixOp));
Here is a snippet from the "Node" rewriting algorithm that transforms a
"PostfixExpr" node into something resembling the parse tree a
left-recursive rule would have produced.
case "PostfixExpr":
{
    Debug.Assert(n.Count != 0);
    // Is it really a postfix expression? If not, return the sub-expression.
    if (n.Count == 1)
        return n.Nodes[0];
    var last = n.Nodes.Last();
    switch (last.Label)
    {
        case "Field":
            return LeftGroup(n, "FieldExpr");
        case "Index":
            return LeftGroup(n, "IndexExpr");
        case "ArgList":
            return LeftGroup(n, "CallExpr");
    }
}
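I haven't shown LeftGroup itself, but here is a guess at its shape,
sketched in Python (the names and the single fixed label are my
assumptions for illustration, not Jigsaw's actual code):

```python
def left_group(children, label):
    # Hypothetical reconstruction of a LeftGroup-style transform: a
    # flat node [expr, op1, op2, ...] becomes the left-nested tree
    # (label, (label, expr, op1), op2), as a left-recursive rule would
    # have built it. The real code presumably re-dispatches on each
    # operator's label rather than using one fixed label as done here.
    tree = children[0]
    for op in children[1:]:
        tree = (label, tree, op)
    return tree

# A flat PostfixExpr for something like "f(x)[i]":
left_group(['f', 'ArgList(x)', 'Index(i)'], 'CallExpr')
# ('CallExpr', ('CallExpr', 'f', 'ArgList(x)'), 'Index(i)')
```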
Cheers,
Christopher Diggins
http://code.google.com/p/jigsaw-library
On Sun, Oct 9, 2011 at 4:39 PM, Bryan Ford <[email protected]> wrote:
> Left recursion in PEGs indeed seems like an interesting can of worms. For
> those interested, I'm wondering how a few example grammars behave under your
> preferred left-recursive parsing technique, and how you think they should
> behave.
>
> First, a trivially evil left-recursive grammar:
>
> S <- S
>
> For example, does your parser detect and reject this somehow, or does it
> behave the same as 'S <- f'? (I hope it doesn't result in an infinite loop
> at runtime anyway. :) )
>
> Now a grammar that's weird, not necessarily evil, in a slightly more subtle
> way:
>
> S <- S / a
>
> Does this behave the same as 'S <- a', or do something else? How should it
> behave?
>
> Cranking up the evilness factor one notch with a mutually left-recursive
> grammar…
>
> S <- T / a
> T <- S / &a
>
> Given the input string "a", does this behave the same as 'S <- a'
> (succeeding and consuming) or the same as 'S <- &a' (succeeding but
> consuming no input)? Do S and T behave the same way or differently? Should
> they?
>
> Now, another grammar that's not necessarily evil but strange in a slightly
> different way:
>
> S <- Saa / a /
>
> Given the input string 'aaaa', for example, does/should this grammar
> consume just 3 or all 4 a's, or does it do something else? What should it
> do?
>
> Cheers,
> Bryan
_______________________________________________
PEG mailing list
[email protected]
https://lists.csail.mit.edu/mailman/listinfo/peg