Re: [PEG] Ambiguity

Robert Grimm Wed, 18 Oct 2006 14:36:18 -0700

Sure. I went back to our emails and to the CVS logs.

From that, I find that Terence really discovered only one ambiguity in the grammar (Terence, please check this as well). The production for DeclarationOrStatement reads:


transient GNode DeclarationOrStatement = /* Choice */
  <Declaration> Declaration
  / <Statement> Statement
  ;

But a declaration may be a block (think block of code directly nested inside a class declaration) and Statement may also be a block (think nested block of code). As a result, either alternative matches exactly the same syntax and, depending on the order of the two alternatives, you either get a BlockDeclaration AST node or a Block AST node (with the production as shown it's the BlockDeclaration).
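To make the ordering effect concrete, here is a minimal Java sketch of PEG ordered choice. It assumes a toy recognizer that accepts a flat "{ ... }" block and uses it for both alternatives; the class ChoiceOrderDemo, its Recognizer interface, and the "no match" result are hypothetical and not part of xtc. Whichever alternative is listed first wins, and the parser never reports that the other one would also have matched:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of PEG ordered choice; all names are hypothetical stand-ins, not xtc classes.
class ChoiceOrderDemo {
    // A recognizer returns the index just past its match, or -1 on failure.
    interface Recognizer {
        int match(String in, int pos);
    }

    // Toy recognizer for a flat "{ ... }" block (no nesting), used for both alternatives.
    static int block(String in, int pos) {
        if (pos >= in.length() || in.charAt(pos) != '{') return -1;
        int close = in.indexOf('}', pos);
        return close < 0 ? -1 : close + 1;
    }

    // Ordered choice: try alternatives in insertion order and commit to the first success.
    static String choose(Map<String, Recognizer> alts, String in) {
        for (Map.Entry<String, Recognizer> alt : alts.entrySet()) {
            if (alt.getValue().match(in, 0) >= 0) return alt.getKey();
        }
        return "no match";
    }

    // Mirrors DeclarationOrStatement: <Declaration> first, then <Statement>.
    static String declarationFirst(String in) {
        Map<String, Recognizer> alts = new LinkedHashMap<>();
        alts.put("BlockDeclaration", ChoiceOrderDemo::block); // a declaration may be a block
        alts.put("Block", ChoiceOrderDemo::block);            // so may a statement
        return choose(alts, in);
    }

    // The same two alternatives with their order swapped.
    static String statementFirst(String in) {
        Map<String, Recognizer> alts = new LinkedHashMap<>();
        alts.put("Block", ChoiceOrderDemo::block);
        alts.put("BlockDeclaration", ChoiceOrderDemo::block);
        return choose(alts, in);
    }

    public static void main(String[] args) {
        String src = "{ count++; }";
        System.out.println(declarationFirst(src)); // BlockDeclaration
        System.out.println(statementFirst(src));   // Block
    }
}
```

Swapping the two entries changes the AST node type without any warning, which is exactly why this kind of ambiguity is easy to miss in a PEG.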

Terence also found a few bugs: The grammar didn't support assertions, empty declarations, and types as primary expressions (to enable .class expressions).

Going through CVS logs for the C, Java, and Rats! grammars, I found the following other problems:

Wide string literals in C are written as L"text" and the C grammar originally tried to recognize identifiers before literals, with the result that L"text" would be parsed as identifier "L" followed by regular string literal "text". I discovered that very quickly once I started regression testing xtc's C type checker, i.e., actually used wide string literals.

An integer literal can be a prefix of a floating point literal in C & Java and the Java grammar originally tried to recognize integer literals before floating point literals. Again, I discovered that very quickly once I tested code containing floating point operations. I also avoided that bug when writing the C grammar a while later.
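Both of these bugs come from the same property: a PEG lexer commits to the first token alternative that matches, not the longest one. A rough Java sketch, assuming simplified regular expressions for identifiers, (wide) string literals, and integer and floating-point literals; TokenOrderDemo and its rules are hypothetical and much cruder than xtc's actual lexical productions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a lexer whose token alternatives are tried in order, PEG-style.
// Token names and regular expressions are simplified, hypothetical stand-ins.
class TokenOrderDemo {
    static final Map.Entry<String, Pattern> IDENTIFIER =
        Map.entry("Identifier", Pattern.compile("[A-Za-z_][A-Za-z0-9_]*"));
    static final Map.Entry<String, Pattern> WIDE_STRING =
        Map.entry("WideString", Pattern.compile("L\"[^\"]*\""));
    static final Map.Entry<String, Pattern> STRING =
        Map.entry("String", Pattern.compile("\"[^\"]*\""));
    static final Map.Entry<String, Pattern> INTEGER =
        Map.entry("Integer", Pattern.compile("[0-9]+"));
    static final Map.Entry<String, Pattern> FLOAT =
        Map.entry("Float", Pattern.compile("[0-9]*\\.[0-9]+"));

    // At each position, commit to the first alternative that matches (no longest-match rule).
    static List<String> tokenize(String in, List<Map.Entry<String, Pattern>> alts) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> alt : alts) {
                Matcher m = alt.getValue().matcher(in).region(pos, in.length());
                if (m.lookingAt()) {
                    tokens.add(alt.getKey() + ":" + m.group());
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) throw new IllegalArgumentException("no token at offset " + pos);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Identifiers before literals: L"text" splits into two tokens.
        System.out.println(tokenize("L\"text\"", List.of(IDENTIFIER, WIDE_STRING, STRING)));
        // prints [Identifier:L, String:"text"]
        System.out.println(tokenize("L\"text\"", List.of(WIDE_STRING, IDENTIFIER, STRING)));
        // prints [WideString:L"text"]
        // Integers before floats: 3.14 splits into 3 and .14.
        System.out.println(tokenize("3.14", List.of(INTEGER, FLOAT)));
        // prints [Integer:3, Float:.14]
        System.out.println(tokenize("3.14", List.of(FLOAT, INTEGER)));
        // prints [Float:3.14]
    }
}
```

In both cases, listing the alternative that can be the longer match first is enough, which is the "reorder" remedy.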

A more subtle and complicated problem occurred in the Rats! grammar: Originally, Rats! didn't have a module system and didn't support the addition/removal of alternatives in top-level choices. When I added that feature, the parser suddenly didn't work anymore. The problem was an unfortunate interaction between productions recognizing the empty input and greediness. In more detail, sequences may be empty and an ordered choice is composed of one or more sequences separated by a slash "/":


OrderedChoice Choice =
   s:Sequence ss:( void:"/":Symbol Sequence )*
      { yyValue = new OrderedChoice(new Pair(s, ss).list()); }
   ;

Sequence Sequence =
    n:SequenceName? l:Voided*
      { yyValue = new Sequence(n, l.list()); }
   ;

(I'm giving you the old definition).

Adding new alternatives before another, named alternative is expressed as:


    choice:Choice "/":Symbol s:SequenceName "...":Symbol

E.g., to inject Java's unsigned right-shift assignment ">>>=" into the production for symbols common to C and Java, I write:


String SymbolCharacters +=
    <TripleGreaterEqual>  ">>>="
  / <GreaterGreaterEqual> ...
  ;

But since sequences may be empty and choices are greedy, the "/ <GreaterGreaterEqual>" is consumed by the choice in the above expression, which makes the expression fail on the ellipsis "...". The solution is to add a syntactic predicate to the definition of sequence:


Sequence Sequence =
   !Ellipsis n:SequenceName? l:Voided*
      { yyValue = new Sequence(n, l.list()); }
   ;
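The failure and the fix can be reproduced with a small hand-written PEG parser in Java. This is a sketch under several assumptions: the input is pre-split into tokens, Voided is reduced to quoted literals, and, because the message does not show how Ellipsis is defined, the sketch assumes the predicate looks past an optional sequence name to the "..." (otherwise it could not reject "<GreaterGreaterEqual> ..."). EllipsisDemo and its methods are hypothetical:

```java
import java.util.List;

// Sketch of the Choice/Sequence productions as a PEG parser over pre-split tokens.
// Hypothetical code; the lookahead used for !Ellipsis is an assumption.
class EllipsisDemo {
    private final List<String> toks;
    private final boolean withPredicate;

    EllipsisDemo(List<String> toks, boolean withPredicate) {
        this.toks = toks;
        this.withPredicate = withPredicate;
    }

    private boolean at(int pos, String s) { return pos < toks.size() && toks.get(pos).equals(s); }
    private boolean isName(int pos) { return pos < toks.size() && toks.get(pos).startsWith("<"); }
    private boolean isLiteral(int pos) { return pos < toks.size() && toks.get(pos).startsWith("\""); }

    // Assumed Ellipsis lookahead: an optional sequence name followed by "...".
    private boolean ellipsisAhead(int pos) {
        return at(isName(pos) ? pos + 1 : pos, "...");
    }

    // Sequence = [!Ellipsis] n:SequenceName? l:Voided* ; may match the empty input.
    int sequence(int pos) {
        if (withPredicate && ellipsisAhead(pos)) return -1; // predicate fails, nothing consumed
        if (isName(pos)) pos++;
        while (isLiteral(pos)) pos++;
        return pos;
    }

    // Choice = Sequence ( "/" Sequence )* ; the repetition is greedy.
    int choice(int pos) {
        int p = sequence(pos);
        if (p < 0) return -1;
        while (at(p, "/")) {
            int q = sequence(p + 1);
            if (q < 0) break; // this iteration fails, so the "/" stays unconsumed
            p = q;
        }
        return p;
    }

    // choice:Choice "/" s:SequenceName "...", required to consume all tokens.
    boolean addBefore() {
        int p = choice(0);
        if (p < 0 || !at(p, "/") || !isName(p + 1)) return false;
        return at(p + 2, "...") && p + 3 == toks.size();
    }

    public static void main(String[] args) {
        // <TripleGreaterEqual> ">>>=" / <GreaterGreaterEqual> ...
        List<String> toks = List.of("<TripleGreaterEqual>", "\">>>=\"", "/", "<GreaterGreaterEqual>", "...");
        System.out.println(new EllipsisDemo(toks, false).addBefore()); // false
        System.out.println(new EllipsisDemo(toks, true).addBefore());  // true
    }
}
```

Without the predicate, choice() swallows "/ <GreaterGreaterEqual>" as a named, empty sequence and stops right before "...", so the expected "/" is missing. With it, the second iteration of the repetition fails and the "/" is left for the extension syntax.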

I have had at least one user report where the interaction between empty inputs and greediness caused infinite recursions, because the repeated expression recognized the empty input.
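A minimal Java sketch of why a repeated expression that accepts the empty input is a problem: the repetition keeps succeeding at the same position. Whether that surfaces as a hang or as unbounded recursion depends on how repetition is implemented; this sketch uses a loop and adds a hypothetical guard that reports the zero-progress case instead of spinning. NullableStarDemo and the guard are illustrations, not a description of what Rats! does:

```java
import java.util.function.IntUnaryOperator;

// Sketch of a greedy PEG repetition e*, with a hypothetical zero-progress guard.
class NullableStarDemo {
    // Returns the position after e*, where e maps a position to its end position or -1.
    static int star(IntUnaryOperator e, int pos) {
        while (true) {
            int next = e.applyAsInt(pos);
            if (next < 0) return pos; // e failed: the repetition ends here
            if (next == pos) {
                // e matched the empty input; repeating it would never advance
                throw new IllegalStateException("repeated expression matched empty input at " + pos);
            }
            pos = next;
        }
    }

    // 'a': consumes one 'a' or fails.
    static IntUnaryOperator charA(String in) {
        return pos -> pos < in.length() && in.charAt(pos) == 'a' ? pos + 1 : -1;
    }

    // 'a'?: like 'a', but succeeds on the empty input instead of failing.
    static IntUnaryOperator optionalA(String in) {
        IntUnaryOperator a = charA(in);
        return pos -> {
            int next = a.applyAsInt(pos);
            return next < 0 ? pos : next;
        };
    }

    public static void main(String[] args) {
        System.out.println(star(charA("aab"), 0)); // 2
        try {
            star(optionalA("aab"), 0);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Here ('a')* stops cleanly at the "b", while ('a'?)* would keep matching nothing at offset 2; the real fix is to make the repeated expression non-nullable, for example with a predicate.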

In summary: Yes, there can be subtle ordering problems, but they usually show up quickly during testing. And for the really hard ones, you have to hope for Terence's sharp eyes and mind. :) Also, a more subtle class of problems can be caused by interactions between (legal) empty inputs and greediness. At the same time, once discovered, the solutions are simple: reorder in the former case and add a predicate in the latter case.

To put this in perspective: I have translated my Java grammar to Scott McPeak's Elkhound. I got 18 shift-reduce and 3 reduce-reduce conflicts. I then had to test across a range of source files to find out that only 2 of these conflicts were real conflicts requiring explicit disambiguation. Additionally, I had an extremely frustrating day trying to translate the token-level productions into flex rules, with the end result being considerably less precise than Rats!.

Don't get me wrong: I like Elkhound. It is a good tool with very nice insights into GLR parsing behind it. But it too presents some subtle debugging issues...


Robert


On Oct 18, 2006, at 4:22 PM, Sylvain Schmitz wrote:

Hi all,

Robert Grimm wrote:
> I agree with Terence in that ambiguity vs. ordering is a trade-off. With CFGs you may get unnecessary ambiguity and with PEGs you may get subtle ordering errors. As Terence points out, I had a (very) few in my Java grammar.

I'd love to know what the precise errors were. Would you mind sharing them with the list? Did you ever find a really bad problem where no ordering would work and you had to rewrite a larger part of the grammar?

> The key difference, however, is that CFGs are only closed under composition if you use GLR parsing, while PEGs are always closed under composition. As a result, providing modularity for PEGs is simpler and faster than for CFGs. That closes the deal for me...
--
Cheers,

  Sylvain



_______________________________________________
PEG mailing list
[EMAIL PROTECTED]
https://lists.csail.mit.edu/mailman/listinfo/peg
