On Mon Feb 18 14:00:49 2008, particle wrote:
> in rakudo's perl6doc parser
> (languages/perl6/src/utils/perl6doc/grammar.pg), i have the following:
> 
>   token pod_delimited_block {
>       ^^ '=' <.unsp>? 'begin' <.ws> <block_type> <pod_option>* \n
>       .*?
>       ^^ '=' <.unsp>? 'end'   <.ws> $<block_type> \N*
>       {*}
>   }
> 
> i'd like to capture '.*?' either via an alias or better, via a
> subrule. however, modifying the grammar to something that will
> capture, like
>   (.*?)
> or
>   $<body>=[.*?]
> or
>   <some_subrule>
> 
> causes the match to fail. smells like a pge bug to me.

Turns out that this isn't a bug, although it is a somewhat unexpected
artifact of :ratchet.  When :ratchet is active within a regex (as is
the case for 'token' and 'rule'), placing a grouping construct
around .*? effectively makes it non-backtracking.  More precisely,
the group itself carries no explicit quantifier (even though the
thing it contains does), so once the group has matched something,
:ratchet prevents us from backtracking into it.

So, in this specific instance of a token (i.e., :ratchet is in
effect), the bare expression C<< .*? >> still backtracks: it matches
minimally and then extends one character at a time until the rest
of the pattern succeeds.  But C<< (.*?) >> and C<< [.*?] >>
always match exactly the null string, because there is an
assumed "cut" operation after the parens or brackets.
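
For anyone who wants to see the effect outside of PGE, here is a rough
analogy in Python's re module.  It is only a sketch and not PGE itself:
the sample text and pattern names are made up for illustration, and the
"cut" is emulated with a lookahead plus backreference (lookaheads in
Python's re are atomic), on the assumption that this behaves like the
implied cut described above.

```python
import re

# Hypothetical sample input, loosely modelled on a delimited pod block.
text = "=begin pod\nsome body text\n=end pod\n"
empty_block = "=begin pod\n=end pod\n"

# Bare lazy quantifier: the engine may extend .*? one character at a
# time until the closing delimiter matches.
bare = re.compile(r"^=begin (\w+)\n.*?^=end \1", re.M | re.S)

# Emulated cut after the group: the lookahead commits to the first thing
# .*? matched (the empty string) and cannot be backtracked into, so the
# closing delimiter must follow the opening line immediately.
cut = re.compile(r"^=begin (\w+)\n(?=(.*?))\2^=end \1", re.M | re.S)

bare_match = bare.search(text)
cut_match = cut.search(text)
cut_empty_match = cut.search(empty_block)

print("bare matches block with body:", bare_match is not None)
print("cut matches block with body: ", cut_match is not None)
print("cut matches empty block:     ", cut_empty_match is not None)
```

The pattern with the emulated cut only succeeds when the body really is
empty, which mirrors why the grammar stopped matching once the body was
wrapped in a group.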

There was a short discussion on IRC about possibly changing this
to be somewhat less surprising, but I think we concluded that
the current behavior is the "least bad" one for now.

Closing ticket.

Pm
