On Tue, 31 Mar 2015 13:17:41 -0700, drf...@pobox.com wrote:
> OS: Ubuntu 14.04 LTS on VirtualBox
> Host OS: Windows 8
> Rakudo version: Current as of 25/03/2015
>
> This is a simple parser for function argument syntax.
>
> With this there are two surprising behaviors for the price of one. The
> first is in the token TOP. As the script stands, the test passes.
>
> token <term> expands to either a <compound-term> or an <integer>, and of
> course the first alternative matches, as this trace shows:
>
> --cut here--
> 「foo(1)」
>  term => 「foo(1)」
>   compound-term => 「foo(1)」
>    atom => 「foo」
>    argument-list => 「1」
>     integer => 「1」
> --cut here--
>
> The commented-out line should simply bypass expanding <term> into
> <compound-term>, but instead parsing fails. Note that it's using the
> same quantifiers in both cases.
>
> The other waterbed-style issue is in the second set of commented-out
> lines.
>
> Just like above, the uncommented line works correctly, and expands to
> the match tree shown above. However, if you write out the <integer>* %
> ',' inline in the compound-term directly, the match fails. Since actions
> don't run on a failed parse (a good thing from the point of view of side
> effects) I don't have much of a way to debug the situation, but I'll
> look at parser internals later. Something like a regex debugger and/or
> REPL would be an excellent idea, and I've already started binding Linux
> libreadline in perl6.
>
> Anyway, thoughts for consideration. I'm not certain why the behavior
> manifests itself, but I'm going to spend some time poking around.
>
> --cut here--
> use v6;
> grammar Bug {
>
>     #token TOP { <compound-term>* % \n }
>     token TOP { <term>* % \n }
>
>     token term {
>         <compound-term>
>       | <integer>
>     }
>
>     token atom { <[a..z]>+ }
>     token integer { <[0..9]>+ }
>
>     token argument-list { <integer>* % ',' }
>
>     token compound-term {
>         #<atom> '(' <integer>* % ',' ')' # This term should be the expanded form of
>         <atom> '(' <argument-list> ')' # This term here, yet the above generates an error.
>     }
> }
>
> use Test;
> ok Bug.parse('foo(1)');
> --cut here--

Update: I was only able to replicate this when keeping the first comment and uncommenting the second one. No other combination failed the parse.

The failing case golfs down to:

$ perl6 -e 'grammar Bug { token TOP { f "(" 1* % "," ")" | 1 } }; Bug.parse("f(1)"); $/.say;'
Nil

$ perl6 -e 'grammar NoBug { token TOP { f "(" <b> ")" | 1 }; token b { 1* % "," } }; NoBug.parse("f(1)"); $/.say;'
「f(1)」
 b => 「1」

A sequence point will "work around" the bug:

$ perl6 -e 'grammar Bug { token TOP { f "(" 1* % "," {} ")" | 1 } }; Bug.parse("f(1)"); $/.say;'
「f(1)」

...so will using 1+ instead of 1*.

This would make me suspect that this was merely a problem with backtracking... ISTR there was some sort of issue when using patterns that can match 0 chars. However, I tried changing <b> to a regex instead of a token to remove the ratchet, and that failed to reproduce the problem.

So -^o^-