> > We have discussed (and proved) this correspondence here:
> https://www.sciencedirect.com/science/article/pii/S0167642314000276
> https://arxiv.org/abs/1304.3177
Nice, that is good to know!

> I think it would be valuable if you could present in more detail which are
> these problematic patterns,
> this would help users of PEG-based tools to avoid them.

The most obvious ones are the "layered" right-associative and left-associative encodings of binary expressions:

E ::= S + E | S - E | S
S ::= N * S | N / S | N
N ::= [0-9]*

E ::= E + S | E - S | S
S ::= S * N | S / N | N
N ::= [0-9]*

Some details are in this paper (https://norswap.com/pubs/sle2015.pdf), in section 2, under "Performance issues in non-memoizing parsers".

Basically, if you want to parse a number as an expression, you have to consider each alternative in E, and for each of those you have to consider each alternative in S, etc. Hence the parse time is exponential in the number of operators (but not in the size of the input!) -- that's pretty slow. If you take such a naive encoding without memoization in a language like Java, you're doing maybe 2^12 parsing expression calls to parse a single number. Since expressions are ubiquitous, this really kills performance.

The left-associative encoding is only possible in parsers that support it, but the problem is just the same if rules aren't memoized.

The problem with these patterns is that this is how it's typically done in CFGs. And this is what produces the correct parse trees, compared to the "idiomatic" solution that uses the Kleene star operator.

I thought I had more examples, but that seems to be it, actually. I have another case of slowdown, but it's specific to the left-recursion handling algorithm: basically, when multiple left-recursive cycles share nodes (parsing expressions), it may happen that multiple parsing expressions are marked as left-recursive within the same cycle. While this produces correct results, it incurs a slowdown of a similar nature to the one above.
Nicolas LAURENT

On Mon, 13 May 2019 at 13:59, Sérgio Medeiros <sqmedei...@gmail.com> wrote:

> On Sat, May 11, 2019 at 6:09 PM Nicolas Laurent <
> nicolas.laur...@uclouvain.be> wrote:
>
> > Without memoization, a LL(1) language trivially admits an O(n) PEG
> grammar, which should generally look very much like the LL(1) grammar (I
> think they might even be provably identical, but I haven't thought this
> through, so don't quote me on this).
>
> We have discussed (and proved) this correspondence here:
> https://www.sciencedirect.com/science/article/pii/S0167642314000276
> https://arxiv.org/abs/1304.3177
>
> > - There exists a set of problematic grammar patterns that have
> non-exponential complexity but are very slow nonetheless. These all have to
> do with parsing things that look like binary operators. Things are even
> worse when some kind of left-recursion support is present. Without entering
> the details (which I can supply if requested), you get into a bad
> backtracking pattern and you incur things like O(2^(num of operators)) to
> parse a "primitive expression". So to ground this, in Java this is ~ 2^10
> parser invocation to parse a mere integer. It's easy to fall into these bad
> patterns, because some of them are the straight transposition of how
> operators are handled in CFGs. My own solution is to supply explicit parser
> combinators to enable parsing left- / right-associative expressions.
>
> > What I'm getting at here is that it seems that if you map out the
> potential pitfalls that give bad performance and supply tools to replace
> them, the risk of accidentally incurring catastrophic performance reduces a
> lot.
>
> > Static (and potentially heuristic) analysis of grammars could also help
> a lot here.
>
> I think it would be valuable if you could present in more detail which are
> these problematic patterns,
> this would help users of PEG-based tools to avoid them.
>
> --
> Sérgio
>
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg