As is well known, PEGs have problems with expressing some grammars like
Expr <- Sum / Diff / Num
Sum <- Expr, '+', Num
Diff <- Expr, '-', Num
This uses left recursion which is not a straightforward concept and is hard to
formalize and hard to implement in a parser. Of course, it is possible to write
Expr <- Num, ('+'/'-', Num)*
but this will not give (Expr (Sum (Diff '1' '2') '3') as a parse tree for
'1-2+3'.
I have an idea how to replace left recursion by something more straightforward.
Sometimes e_0 followed by e_1 can get some additional semantic meaning. For
example, in PEG's own grammar Primary followed by '?' becomes Optional, Primary
followed by '*' becomes ZeroOrMore, Primary followed by '+' becomes OneOrMore.
I will call such a rule a "gulper rule" because it "gulps" an expression to
which it is applied.
We can write a series of gulper rules as follows:
e_0 (?) id_1: e_1 / id_2: e_2 / ... / id_n: e_n
When e_0 is followed by e_1 then the sequence is interpreted as id_1. When e_0
is not followed by e_1 and is followed by e_2 then the sequence is interpreted
as id_2 and so on. When e_0 is not followed by any of e_1,...,e_n then e_0 is
just e_0.
Of course, these one-shot gulper rules are no more than syntactic sugar for
ordinary PEG rules.
id_0 <- e_0 (?) id_1: e_1 / id_2: e_2 / ... / id_n: e_n
is essentially equivalent to
id_0 <- id_1 / id_2 / ... / id_n / e_0
id_1 <- e_1, e_0
...
id_n <- e_n, e_0
More important is that some cases of left recursion can be replaced by
repetitive gulper rules. We can write such rules as follows:
e_0 (*) id_1: e_1 / id_2: e_2 / ... / id_n: e_n
For example, for sums and differences the grammar can be
Expr <- Num (*) Sum: '+', Num / Diff: '-', Num.
Such extended PEGs should not produce any serious problems with top-down
parsing.
-----------------------------
Alexander Tsyplakov
_______________________________________________
PEG mailing list
[email protected]
https://lists.csail.mit.edu/mailman/listinfo/peg