Here's some more commentary on draft zero of the capturing semantics
(thanks, Damian!), based partly on PGE's current implementation.
On Mon, May 09, 2005 at 10:51:53PM +1000, Damian Conway wrote:
> [...]
> =head2 Nested subpattern captures
> [...]
> There may also be shortcuts for accessing nested components of a subpattern,
> specifically:
>
> # Perl 6...
> #
> #  $1----------------------------- $2---------- $3---------------------
> #  |     $1.1- $1.2------------- | |          | |     $3.1- $3.2----- |
> #  |     |   | |        $1.2.1 | | |          | |     |   | |       | |
> #  |     |   | |         |   | | | |          | |     |   | |       | |
> m/ ( The (\S+) (guy|gal|g(\S+) ) ) (sees|calls) ( the (\S+) (gal|guy) ) /;
>
> but this has not yet been decided.
After thinking about this a bit, I'm hoping we don't do this, at least not
initially. I'm not sure there's much advantage to C< $1.1 > over
C< $1[0] >, and one starts to wonder how things like $1.$j.2 and
$1[$j].2 would be interpreted.
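For comparison, Perl 5 (and Python's C<re> module, which follows the same convention) numbers nested groups flatly, in order of their opening parentheses. The Python sketch below is only an illustration, not Perl 6 or PGE. The sample sentences and the C<NESTED_TO_FLAT> table are assumptions added here, showing which flat group each proposed dotted name would correspond to in Damian's pattern.

```python
import re

# Damian's example pattern, written with literal spaces instead of Perl 6's
# whitespace-insensitive layout (an assumption made for this illustration).
PATTERN = re.compile(
    r"(The (\S+) (guy|gal|g(\S+))) (sees|calls) (the (\S+) (gal|guy))"
)

# Hypothetical mapping from the proposed nested names to the flat group
# numbers that Perl 5 or Python assign to the same parentheses.
NESTED_TO_FLAT = {
    "$1": 1, "$1.1": 2, "$1.2": 3, "$1.2.1": 4,
    "$2": 5,
    "$3": 6, "$3.1": 7, "$3.2": 8,
}


def nested_captures(text):
    """Return {nested name: captured text or None}, or None if no match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    return {name: m.group(n) for name, n in NESTED_TO_FLAT.items()}


if __name__ == "__main__":
    caps = nested_captures("The tall gent sees the short gal")
    for name, value in caps.items():
        print(f"{name:7} {value!r}")
```

With "gent" the third alternative matches, so C<$1.2.1> corresponds to flat group 4 ("ent"). With "guy" that group is simply C<None>.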
> =head2 Quantified subpattern captures
> [...]
> If a subpattern is directly quantified using the C<?> or C<??> quantifier,
> it produces a single C<Match> object. That object is "successful" if the
> subpattern did match, and "unsuccessful" if it was skipped.
I'm not sure that PGE has these exact semantics for C<?> yet -- I'll have
to check.
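In Perl 5, and in Python's C<re>, a skipped optional group just reports C<undef>/C<None>. The Python sketch below wraps that result in an object that always exists but is false when the subpattern was skipped, which is roughly the "unsuccessful C<Match>" idea. C<OptionalCapture> and C<optional_group> are hypothetical names invented for this illustration; they are not PGE's C<Match> class.

```python
import re


class OptionalCapture:
    """Hypothetical stand-in for the Match produced by a ? subpattern.

    It always exists, but is "unsuccessful" (false) when the subpattern
    was skipped, rather than being absent (None) as in Python's re.
    """

    def __init__(self, text):
        self.text = text  # None when the optional subpattern did not match

    def __bool__(self):
        return self.text is not None

    def __str__(self):
        return self.text if self.text is not None else ""


def optional_group(pattern, subject, group):
    m = re.match(pattern, subject)
    if m is None:
        raise ValueError("overall match failed")
    return OptionalCapture(m.group(group))


if __name__ == "__main__":
    for subject in ("abc", "ac"):
        cap = optional_group(r"a(b)?c", subject, 1)
        # prints: abc True 'b'   then: ac False ''
        print(subject, bool(cap), repr(str(cap)))
```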
> =head2 Indirectly quantified subpattern captures
> [...]
> A subpattern may sometimes be nested inside a quantified non-capturing
> structure:
>
> #       non-capturing    quantified
> #   __________/\_________  __/\__
> #  |                     ||      |
> #  |  $1         $2      ||      |
> #  |  _^_      ___^___   ||      |
> #  | |   |    |       |  ||      |
> m/ [ (\w+) \: (\w+ \s+)* ]**{2...} /
>
> [...] In Perl 5, any repeated captures of this kind:
>
> # Perl 5 equivalent...
> m/ (?: (\w+) \: (\w+ \s+)* ){2,} /x
>
> would overwrite the previous captures to C<$1> and C<$2> each time the
> surrounding non-capturing parens iterated. So C<$1> and C<$2> would
> contain only the captures from the final repetition.
>
> This does not happen in Perl 6. Any indirectly quantified subpattern is
> treated like a directly quantified subpattern. Specifically, an
> indirectly quantified subpattern also returns an array of C<Match>
> objects, so the corresponding array element for the indirectly
> quantified capture will store an array reference, rather than a single
> C<Match> object.
It might be worthwhile to add a note here that one can still get
at the results of the final repetition by using $1[-1] and $2[-1].
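Python's C<re> keeps exactly the Perl 5 behaviour described above: a group inside a repeated non-capturing construct reports only its final repetition. The Python sketch below (an illustration, not Perl 6 or PGE) shows that overwriting. It then collects every repetition with C<re.finditer>, so the final one is just the last list element, the analogue of C<$1[-1]>. The sample string is an assumption, and C<values> is only a rough stand-in for the nested arrays Perl 6 would keep for C<$2>.

```python
import re

# Python translation of Damian's Perl 5 example:
#   m/ (?: (\w+) \: (\w+ \s+)* ){2,} /x
PERL5_STYLE = re.compile(r"(?: (\w+) : (\w+ \s+)* ){2,}", re.VERBOSE)

# One repetition of the bracketed construct, matched on its own so that
# every repetition can be collected instead of only the last one.
ONE_REPETITION = re.compile(r"(\w+) : ((?:\w+ \s+)*)", re.VERBOSE)

subject = "name:alice bob age:thirty "

m = PERL5_STYLE.match(subject)
# Only the final repetition survives, as in Perl 5.
last_key, last_value = m.group(1), m.group(2)

# Collect every repetition: roughly what Perl 6 keeps in $1 and $2.
keys = []
values = []
for rep in ONE_REPETITION.finditer(subject):
    keys.append(rep.group(1))
    values.append(rep.group(2).split())

print(last_key, repr(last_value))  # age 'thirty '
print(keys, values)                # ['name', 'age'] [['alice', 'bob'], ['thirty']]
print(keys[-1])                    # age  (same as last_key, like $1[-1])
```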
> =head2 Subpattern numbering
> [...]
> Of course, the leading C<undef>s that Perl 5 would produce do convey
> (albeit awkwardly) which alternative actually matched. If that
> information is important, Perl 6 has several far cleaner ways to
> preserve it. For example:
>
> rule alt (Str $n) { {$/ = $n} }
>
> m/ <alt tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
> | <alt BEM> (every) (green) (BEM) (devours) (faces)
> /;
If the C< alt > rule takes a string argument, the match
statement probably needs to read
    m/ <alt: tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
     | <alt: BEM> (every) (green) (BEM) (devours) (faces)
     /;
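The leading C<undef>s are easy to see in Python's C<re>, which numbers groups across alternatives the same way Perl 5 does. The Python sketch below is an illustration only. It uses named wrapper groups and C<Match.lastgroup> to record which alternative matched. That is a different mechanism from the C<alt> rule above (which sets the match result to a string), chosen here only because it exists in Python's standard library.

```python
import re

TEA = r"(don't) (ray) (me) (for) (solar tea), (d'oh!)"
BEM = r"(every) (green) (BEM) (devours) (faces)"

# Perl 5-style numbering: the BEM captures come after all six tea groups.
flat = re.compile(TEA + "|" + BEM)

m = flat.fullmatch("every green BEM devours faces")
print(m.groups())
# (None, None, None, None, None, None, 'every', 'green', 'BEM', 'devours', 'faces')

# Wrapping each alternative in a named group records which one matched.
# The outer group closes last, so Match.lastgroup reports its name.
tagged = re.compile(f"(?P<tea>{TEA})|(?P<BEM>{BEM})")

for text in ("don't ray me for solar tea, d'oh!", "every green BEM devours faces"):
    t = tagged.fullmatch(text)
    print(t.lastgroup)  # prints tea, then BEM
```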
> =head2 Repeated captures of the same subrule
>
> =head3 Scalar aliases applied to quantified constructs
> [...]
> A set of quantified I<non-capturing> brackets always returns a
> single C<Match> object which contains only the complete substring
> that was matched by the full set of repetitions of the brackets (as
> described in L<Named scalar aliases applied to non-capturing brackets>).
At present, PGE doesn't work this way: aliased quantified non-capturing
brackets return an array of match objects, the same as other quantified
structures. This can be changed, but I kind of like the consistency
that results:

    "coffee fifo fumble" ~~ m/ .*? $<effs>:=[f <-[f]>**{1..2} \s*]+ /;
PGE currently gives $<effs> an array of matches, the same as for the
other capturing constructs. If someone wants the full substring as a
single match, it's easy enough to do

    "coffee fifo fumble" ~~ m/ .*? $<effs>:=[ [f <-[f]>**{1..2} \s*]+ ] /;
and it's pretty clear what was intended.
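Python's C<re> has no aliasing of quantified groups, but both behaviours can be imitated. In the Python sketch below (an illustration, not PGE), a capturing group around the whole quantified construct gives the full substring, like the second example. Re-scanning that substring with the inner pattern gives one entry per repetition, like the array PGE currently builds for $<effs>. The helper C<effs> is a hypothetical name.

```python
import re

# One repetition of [f <-[f]>**{1..2} \s*] from the PGE example.
PIECE = r"f[^f]{1,2}\s*"

# Capturing the whole quantified construct yields the full substring.
FULL = re.compile(rf".*?((?:{PIECE})+)")


def effs(subject):
    """Return (full substring, list of individual repetitions)."""
    m = FULL.match(subject)
    if m is None:
        return None, []
    full = m.group(1)
    return full, re.findall(PIECE, full)


full, pieces = effs("coffee fifo fumble")
print(repr(full))  # 'fee fifo fum'
print(pieces)      # ['fee ', 'fi', 'fo ', 'fum']
```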
> =head3 Array aliasing
> =head3 Hash aliasing
> =head3 External aliasing
> =head2 The C<:parsetree> flag
> etc.
At the moment PGE doesn't support these, and probably won't until
they're actually needed in the course of developing the compiler
(or until someone adds them).
> [...]
> Moreover, the C<:parsetree> flag overrides the exemption of C<< «name» >>
> subrule calls, so they act as if they were C<< <name> >> calls instead. They
> generate C<Match> objects, and those objects are also appended onto the
> surrounding scope's C<Match> array.
Do we still have the C<< «name» >> syntax for rules? S05 doesn't
mention it, and A05 mentions it as a non-capturing subrule, but I think
we've since switched to C<< <?name> >> instead. If we no longer have
C<< «name» >>, I'll adjust S05/A05 accordingly.
Pm