> > We have discussed (and proved) this correspondence here:
> https://www.sciencedirect.com/science/article/pii/S0167642314000276
> https://arxiv.org/abs/1304.3177
Nice, that is good to know!

> I think it would be valuable if you could present in more detail which are
> these problematic patterns,
> this would help users of PEG-based tools to avoid them.

The most obvious ones are the "layered" right-associative and left-associative encodings of binary expressions:

E ::= S + E | S - E | S
S ::= N * S | N / S | N
N ::= [0-9]*

E ::= E + S | E - S | S
S ::= S * N | S / N | N
N ::= [0-9]*

Some details are in this paper (https://norswap.com/pubs/sle2015.pdf), in section 2, under "Performance issues in non-memoizing parsers".

Basically, if you want to parse a number as an expression, you have to consider each alternative in E, and for each of those you have to consider each alternative in S, etc. Hence the parse time is exponential in the number of operators (but not in the size of the input!) -- that's pretty slow. If you take such a naive encoding without memoization in a language like Java, you're doing maybe 2^12 parsing expression calls to parse a single number. Since expressions are ubiquitous, this really kills performance.

The left-associative encoding is only possible in parsers that support it, but the problem is just the same if rules aren't memoized.

The problem with these patterns is that this is how it's typically done in CFGs. And this is what produces the correct parse trees, compared to the "idiomatic" solution that uses the Kleene star operator.

I thought I had more examples, but that seems to be it, actually. I have another case of slowdown, but it's specific to the left-recursion handling algorithm: basically, when multiple left-recursive cycles share nodes (parsing expressions), it may happen that multiple parsing expressions are marked as left-recursive within the same cycle. While this produces correct results, it incurs a slowdown of a similar nature to the one above.
Nicolas LAURENT

On Mon, 13 May 2019 at 13:59, Sérgio Medeiros <sqmedei...@gmail.com> wrote:

> On Sat, May 11, 2019 at 6:09 PM Nicolas Laurent <
> nicolas.laur...@uclouvain.be> wrote:
>
> > Without memoization, a LL(1) language trivially admits an O(n) PEG
> grammar, which should generally look very much like the LL(1) grammar (I
> think they might even be provably identical, but I haven't thought this
> through, so don't quote me on this).
>
> We have discussed (and proved) this correspondence here:
> https://www.sciencedirect.com/science/article/pii/S0167642314000276
> https://arxiv.org/abs/1304.3177
>
> > - There exists a set of problematic grammar patterns that have
> non-exponential complexity but are very slow nonetheless. These all have to
> do with parsing things that look like binary operators. Things are even
> worse when some kind of left-recursion support is present. Without entering
> the details (which I can supply if requested), you get into a bad
> backtracking pattern and you incur things like O(2^(num of operators)) to
> parse a "primitive expression". So to ground this, in Java this is ~ 2^10
> parser invocation to parse a mere integer. It's easy to fall into these bad
> patterns, because some of them are the straight transposition of how
> operators are handled in CFGs. My own solution is to supply explicit parser
> combinators to enable parsing left- / right-associative expressions.
>
> > What I'm getting at here is that it seems that if you map out the
> potential pitfalls that give bad performance and supply tools to replace
> them, the risk of accidentally incurring catastrophic performance reduces a
> lot.
>
> > Static (and potentially heuristic) analysis of grammars could also help
> a lot here.
>
> I think it would be valuable if you could present in more detail which are
> these problematic patterns,
> this would help users of PEG-based tools to avoid them.
>
> --
> Sérgio
>
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg