[perl #124220] [BUG] Grammar waterbed-style issue

Brian S. Julin via RT Thu, 07 Sep 2017 18:00:10 -0700

On Tue, 31 Mar 2015 13:17:41 -0700, drf...@pobox.com wrote:
> OS: Ubuntu 14.04 LTS on VirtualBox
> Host OS: Windows 8
> Rakudo version: Current as of 25/03/2015
> 
> This is a simple parser for function argument syntax.
> 
> With this there are two surprising behaviors for the price of one. The 
> first is in the token TOP. As the script stands, the test passes.
> 
> token <term> expands to either a <compound-term> or an <integer>, and of 
> course the first alternative matches, as this trace shows:
> 
> --cut here--
> ｢foo(1)｣
>   term => ｢foo(1)｣
>    compound-term => ｢foo(1)｣
>     atom => ｢foo｣
>     argument-list => ｢1｣
>      integer => ｢1｣
> --cut here--
> 
> The commented-out line should simply bypass expanding <term> into 
> <compound-term>, but instead parsing fails. Note that it's using the 
> same quantifiers in both cases.
> 
> The other waterbed-style issue is in the second set of commented-out 
> lines.
> 
> Just like above, the uncommented line works correctly, and expands to 
> the match tree shown above. However, if you write out the <integer>* % 
> ',' inline in the compound-term directly, the match fails. Since actions 
> don't run on a failed parse (a good thing from the point of view of side 
> effects) I don't have much of a way to debug the situation, but I'll 
> look at parser internals later. Something like a regex debugger and/or 
> REPL would be an excellent idea, and I've already started binding Linux 
> libreadline in perl6.
> 
> Anyway, thoughts for consideration. I'm not certain why the behavior 
> manifests itself, but I'm going to spend some time poking around.
> 
> --cut here--
> use v6;
> grammar Bug {
> 
>    #token TOP { <compound-term>* % \n }
>    token TOP { <term>* % \n }
> 
>    token term {
>      <compound-term>
>    | <integer>
>    }
> 
>    token atom { <[a..z]>+ }
>    token integer { <[0..9]>+ }
> 
>    token argument-list { <integer>* % ',' }
> 
>    token compound-term {
>      #<atom> '(' <integer>* % ',' ')' # This term should be the expanded form of
>      <atom> '(' <argument-list> ')' # this term here, yet the above generates an error.
>    }
> }
> 
> use Test;
> ok Bug.parse('foo(1)');
> --cut here--


Update:

I was only able to replicate this when leaving the first commented-out
line (the alternate TOP) as a comment and uncommenting the second one
(the inline form in compound-term).  No other combination failed the
parse.

The failing case golfs down to:

$ perl6 -e 'grammar Bug { token TOP { f "(" 1* % "," ")" | 1 } }; Bug.parse("f(1)"); $/.say;'
Nil
$ perl6 -e 'grammar NoBug { token TOP { f "(" <b> ")" | 1 }; token b { 1* % "," } }; NoBug.parse("f(1)"); $/.say;'
｢f(1)｣
 b => ｢1｣

A sequence point will "work around" the bug:

$ perl6 -e 'grammar Bug { token TOP { f "(" 1* % "," {} ")" | 1 } }; Bug.parse("f(1)"); $/.say;'
｢f(1)｣

...so will using 1+ instead of 1*.  This would make me suspect that
this was merely a problem with backtracking... ISTR there was some sort
of issue when using patterns that can match 0 chars.

However, I tried changing <b> to a regex instead of a token to remove
the ratchet, and that failed to reproduce the problem.  So -^o^-
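
For anyone who wants to re-check these on their own build, here is a
rough sketch that puts the golfed variants above side by side in one
script.  The grammar names are labels made up for this sketch and are
not from the original report, and the results may well differ between
Rakudo versions, so no expected output is shown.

```raku
use v6;

# Grammar names are hypothetical labels for this sketch only.

# Reported failing case: "1* % ','" written inline in TOP.
grammar InlineStar   { token TOP { f "(" 1* % "," ")" | 1 } }

# Reported passing case: the same pattern factored out into a subrule.
grammar ViaSubrule   { token TOP { f "(" <b> ")" | 1 }; token b { 1* % "," } }

# Reported workaround: an empty code block acting as a sequence point.
grammar WithSeqPoint { token TOP { f "(" 1* % "," {} ")" | 1 } }

# Reported workaround: 1+ instead of 1*.
grammar InlinePlus   { token TOP { f "(" 1+ % "," ")" | 1 } }

# Run each variant against the same input and print whether it matched.
for InlineStar, ViaSubrule, WithSeqPoint, InlinePlus -> $g {
    say $g.^name, ': ', ($g.parse('f(1)') ?? 'parsed' !! 'failed');
}
```

If only the first variant reports a failure, that matches what is
described above.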
