On Sun, May 29, 2005 at 12:52:25PM -0400, Jeff 'japhy' Pinyan wrote:
> I'm curious if <commit> and <cut> "capture" anything.  They don't start 
> with '?', so following the guidelines, it would appear they capture, but 
> that doesn't make sense.  Should they be written as <?commit> and <?cut>, 
> or is the fact that they capture silently ignored because they're not 
> consuming anything?
> 
> Same thing with <null> and <prior>.  And with <after P> and <before P>. 
> It should be assumed that <!after P> doesn't capture because it can only 
> capture if P matches, in which case <!after P> fails.
> 
> So, what's the deal?

I'm not the language designer, but FWIW here is my interpretation.

First, we have to remember that "capture" now means more than just
grabbing characters from a string -- it also generates a successful 
match and a corresponding match object.  Thus, even though <after>, 
<before>, <commit>, <cut>, and <null> are zero-width assertions,
maybe they should still produce a corresponding match object 
indicating a successful match.  This might end up being useful in 
alternations or other rule structures:

    m/ [ abc <commit> def | ab ]/ ;
    if $<commit> { say "we found 'abcdef'"; }

    m/ [ abc | def <null> ]/;
    if $<null> { say "we found 'def'"; }
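
For comparison only, the "zero-width capture as a marker" idea can be
sketched in Python's re module. This is an analogue, not Perl 6
semantics: Python has no <commit>, so an empty named group (here
called "commit", a name chosen purely for illustration) stands in for
it, and it does not reproduce <commit>'s effect on backtracking. The
helper branch_taken is hypothetical.

```python
import re

# An empty named group consumes nothing, but it still records whether
# the branch containing it took part in the successful match.
PATTERN = re.compile(r"(?:abc(?P<commit>)def|ab)")

def branch_taken(text):
    """Return which alternative matched at the start of text, or None."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # group() yields "" when the empty group participated in the match,
    # and None when the other alternative matched instead.
    return "abcdef" if m.group("commit") is not None else "ab"
```

The test has to be "is not None" rather than plain truthiness, since
the captured string is empty whenever the marker is reached.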

I don't *know* that this would be useful, and certainly there are
other ways to achieve the same results, but keeping the same
capture semantics for zero-length assertions seems to work 
out okay.  Of course, to avoid the generation of the match objects 
one can use <?commit>, <?cut>, <?null>, etc.  I suspect that for the 
majority of cases the choice of <commit> vs. <?commit> isn't going to 
make a whole lot of difference, and for the places where it does make 
a difference it's nice to preserve the interpretation being used by 
other subrules.

Things could be a bit interesting from a performance/optimization
perspective; conceivably an optimizer could do a lot better for the
common case if we somehow declared that <null>, <commit>, <cut>, etc. 
never capture.  But I think the execution cost of capturing vs. 
non-capturing in PGE is minimal relative to other considerations,
so it's a bit premature to try to optimize there.  Overall I think
we'll be better off keeping things consistent for programmers at
the language level, and then building better/smarter optimizers into 
the pattern matching engine to handle the common cases.

Pm
