On Mon Feb 18 14:00:49 2008, particle wrote:
> in rakudo's perl6doc parser
> (languages/perl6/src/utils/perl6doc/grammar.pg), i have the following:
>
> token pod_delimited_block {
>     ^^ '=' <.unsp>? 'begin' <.ws> <block_type> <pod_option>* \n
>     .*?
>     ^^ '=' <.unsp>? 'end' <.ws> $<block_type> \N*
>     {*}
> }
>
> i'd like to capture '.*?' either via an alias or better, via a
> subrule. however, modifying the grammar to something that will
> capture, like
>     (.*?)
> or
>     $<body>=[.*?]
> or
>     <some_subrule>
>
> causes the match to fail. smells like a pge bug to me.

Turns out that this isn't a bug, although it is a somewhat unexpected artifact of :ratchet.

When :ratchet is active within a regex (as would be the case for 'token' or 'rule'), then placing a grouping construct around .*? effectively makes it non-backtracking. Or, to be more precise, the grouping construct doesn't have an explicit quantifier on it (even though the thing it contains does have one), and thus once the group matches something then :ratchet prevents us from backtracking into it.

So, in this specific instance of a token (i.e., :ratchet is in effect), the expression C<< .*? >> performs backtracking and will eagerly match any sequence, but C<< (.*?) >> and C<< [.*?] >> always match exactly the null string because there is an assumed "cut" operation after the parens or brackets.

There was a short discussion on IRC about possibly changing this to be somewhat less surprising, but I think we concluded that the current behavior is the "least bad" one for now.

Closing ticket.

Pm
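
For anyone who finds this ticket later, here is a rough illustration of the difference described above. It is only a toy model written in Python, not PGE's actual implementation: the function match_block and its cut_after_body flag are hypothetical names invented for this sketch, and the opener and closer are plain strings rather than the real subrules.

```python
def match_block(text, opener, closer, cut_after_body):
    """Toy model of `opener BODY closer`, where BODY is a frugal `.*?`.

    Hypothetical helper for illustration only; this is not PGE code.
    cut_after_body=False models a bare `.*?`, which can be re-entered and grown.
    cut_after_body=True models `(.*?)` or `[.*?]` under :ratchet: the group
    commits to its first (empty) match and is never re-entered.
    Returns the body text, or None if the overall match fails.
    """
    if not text.startswith(opener):
        return None
    body_start = len(opener)
    # Frugal matching tries the shortest body first, then grows it one
    # character at a time.
    for body_len in range(len(text) - body_start + 1):
        body_end = body_start + body_len
        if text.startswith(closer, body_end):
            return text[body_start:body_end]
        if cut_after_body:
            # The group already matched the null string; the assumed cut
            # forbids backtracking into it to try a longer body.
            return None
    return None


pod = "=begin pod\nsome text\n=end pod"
print(match_block(pod, "=begin pod\n", "=end pod", cut_after_body=False))
print(match_block(pod, "=begin pod\n", "=end pod", cut_after_body=True))
```

With the cut in place the group can only ever contribute the null string, so the overall match succeeds only when the closer immediately follows the opener. That corresponds to the failure reported in the original message.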