On Sun, May 29, 2005 at 12:52:25PM -0400, Jeff 'japhy' Pinyan wrote:
> I'm curious if <commit> and <cut> "capture" anything. They don't start
> with '?', so following the guidelines, it would appear they capture, but
> that doesn't make sense. Should they be written as <?commit> and <?cut>,
> or is the fact that they capture silently ignored because they're not
> consuming anything?
>
> Same thing with <null> and <prior>. And with <after P> and <before P>.
> It should be assumed that <!after P> doesn't capture because it can only
> capture if P matches, in which case <!after P> fails.
>
> So, what's the deal?

I'm not the language designer, but FWIW here is my interpretation.

First, we have to remember that "capture" now means more than just
grabbing characters from a string -- it also generates a successful
match and a corresponding match object. Thus, even though <after>,
<before>, <commit>, <cut>, and <null> are zero width assertions, maybe
they should still produce a corresponding match object indicating a
successful match. This might end up being useful in alternations or
other rule structures:

    m/ [ abc <commit> def | ab ]/ ;
    if $<commit> { say "we found 'abcdef'"; }

    m/ [ abc | def <null> ]/;
    if $<null> { say "we found 'def'"; }

I don't *know* that this would be useful, and certainly there are
other ways to achieve the same results, but keeping the same capture
semantics for zero-length assertions seems to work out okay.

Of course, to avoid the generation of the match objects one can use
<?commit>, <?cut>, <?null>, etc. I suspect that for the majority of
cases the choice of <commit> vs. <?commit> isn't going to make a whole
lot of difference, and for the places where it does make a difference
it's nice to preserve the interpretation being used by other subrules.

Things could be a bit interesting from a performance/optimization
perspective; conceivably an optimizer could do a lot better for the
common case if we somehow declared that <null>, <commit>, <cut>, etc.
never capture. But I think the execution cost of capturing vs.
non-capturing in PGE is minimal relative to other considerations, so
we're a bit premature to try to optimize there. Overall I think we'll
be better off keeping things consistent for programmers at the
language level, and then build better/smarter optimizers into the
pattern matching engine to handle the common cases.

Pm
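
For readers without a Perl 6 rules engine at hand, the "zero-width capture as a flag" idea in the first example can be roughly approximated with Python's re module. This is only an analogy under stated assumptions: an empty named group stands in for a capturing <commit>, it has none of <commit>'s effect on backtracking, and the names PATTERN and took_long_branch are hypothetical, introduced just for this sketch.

```python
import re

# Assumption: the empty named group (?P<commit>) plays the role of a
# capturing zero-width subrule. It consumes nothing, but it is recorded
# as "participated" only when the first alternative is the one that matched.
# Unlike Perl 6's <commit>, it does not restrict backtracking in any way.
PATTERN = re.compile(r"(?:abc(?P<commit>)def|ab)")


def took_long_branch(text):
    """Return True if text matched via the 'abc ... def' alternative."""
    m = PATTERN.match(text)
    # group("commit") is "" when the group participated, None otherwise.
    return m is not None and m.group("commit") is not None


if __name__ == "__main__":
    for s in ("abcdef", "ab"):
        if took_long_branch(s):
            print("we found 'abcdef'")
        else:
            print("matched some other way (or not at all):", s)
```

The point mirrors the one made above: the group captures an empty string, yet its presence or absence in the match result still tells you which alternative succeeded.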