Re: Nested captures

Patrick R. Michaud Mon, 09 May 2005 08:29:35 -0700

Here's some more commentary on draft zero of the capturing semantics
(thanks, Damian!), based partially on PGE's current implementation.


On Mon, May 09, 2005 at 10:51:53PM +1000, Damian Conway wrote:
> [...]
> =head2 Nested subpattern captures
> [...]
> There may also be shortcuts for accessing nested components of a subpattern,
> specifically:
> 
>      # Perl 6...
>      #
>      # $1-----------------------------  $2---------  $3--------------------
>      # |     $1.1  $1.2-------------  | |          | |     $3.1  $3.2----  |
>      # |     |   | |        $1.2.1  | | |          | |     |   | |       | |
>      # |     |   | |         |   |  | | |          | |     |   | |       | |
>     m/ ( The (\S+) (guy|gal|g(\S+)  ) ) (sees|calls) ( the (\S+) (gal|guy) ) /;
> 
> but this has not yet been decided.

After thinking about this a bit, I'm hoping we don't do this -- at least not
initially.  I'm not sure C< $1.1 > offers much advantage over
C< $1[0] >, and one starts to wonder about things like $1.$j.2 and
$1[$j].2.

> =head2 Quantified subpattern captures
> [...]
> If a subpattern is directly quantified using the C<?> or C<??> quantifier,
> it produces a single C<Match> object. That object is "successful" if the
> subpattern did match, and "unsuccessful" if it was skipped. 

I'm not sure that PGE has these exact semantics for C<?> yet -- I'll have 
to check.

> =head2 Indirectly quantified subpattern captures
> [...]
> A subpattern may sometimes be nested inside a quantified non-capturing
> structure:
> 
>      #       non-capturing    quantified
>      #  __________/\_________  __/\__
>      # |                     ||      |
>      # |   $1         $2     ||      |
>      # |  _^_      ___^___   ||      |
>      # | |   |    |       |  ||      |
>     m/ [ (\w+) \: (\w+ \s+)* ]**{2...} /
> 
> [...] In Perl 5, any repeated captures of this kind:
> 
>      # Perl 5 equivalent...
>     m/ (?: (\w+) \: (\w+ \s+)* ){2,} /x
> 
> would overwrite the previous captures to C<$1> and C<$2> each time the
> surrounding non-capturing parens iterated. So C<$1> and C<$2> would
> contain only the captures from the final repetition.
> 
> This does not happen in Perl 6. Any indirectly quantified subpattern is
> treated like a directly quantified subpattern. Specifically, an
> indirectly quantified subpattern also returns an array of C<Match>
> objects, so the corresponding array element for the indirectly
> quantified capture will store an array reference, rather than a single
> C<Match> object.

It might be worthwhile to add a note here that one can still get
at the results of the final repetition by using $1[-1] and $2[-1].
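
To make the contrast concrete, here is a rough sketch in Python (not
Perl 6, and not how PGE does it).  Python's re module happens to share
the Perl 5 behaviour of keeping only the last capture, so it can show
both sides.  The sample string, the names C<keys> and C<iteration>, and
the loop that emulates the array of captures are all assumptions made
for illustration; $2 is left out because it is also directly quantified.

```python
import re

text = "k1:a b k2:c d "  # made-up sample input

# Perl 5-style behaviour (Python's re does the same): a capture group
# inside a repeated non-capturing group keeps only its final match.
last_only = re.match(r"(?:(\w+):(\w+\s+)*){2,}", text)
perl5_dollar1 = last_only.group(1)

# Emulating the array-of-captures idea: step through the iterations
# one at a time and append each iteration's first capture to a list.
iteration = re.compile(r"(\w+):(?:\w+\s+)*")
keys = []
pos = 0
while (m := iteration.match(text, pos)) and m.end() > pos:
    keys.append(m.group(1))
    pos = m.end()

# keys plays the role of the Perl 6 $1 array, so keys[-1] corresponds
# to $1[-1], which is all that Perl 5 would have kept.
```

Here C<keys[-1]> and C<perl5_dollar1> hold the same string, which is
the point of the $1[-1] note above.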

> =head2 Subpattern numbering
> [...]
> Of course, the leading C<undef>s that Perl 5 would produce do convey
> (albeit awkwardly) which alternative actually matched. If that
> information is important, Perl 6 has several far cleaner ways to
> preserve it. For example:
> 
>     rule alt (Str $n) { {$/ = $n} }
> 
>     m/ <alt tea>  (don't) (ray) (me) (for) (solar tea), (d'oh!)
>      | <alt BEM>  (every) (green) (BEM) (devours) (faces)
>      /;

If the C< alt > rule is accepting a string argument, the match
statement probably needs to read

     m/ <alt: tea>  (don't) (ray) (me) (for) (solar tea), (d'oh!)
      | <alt: BEM>  (every) (green) (BEM) (devours) (faces)
      /;


> =head2 Repeated captures of the same subrule
> 
> =head3 Scalar aliases applied to quantified constructs
> [...]
> A set of quantified I<non-capturing> brackets always returns a
> single C<Match> object which contains only the complete substring
> that was matched by the full set of repetitions of the brackets (as
> described in L<Named scalar aliases applied to non-capturing brackets>).

At present, PGE isn't working this way -- aliased quantified non-capturing
brackets return an array of match objects, the same as other quantified
structures.  This can be changed, but I kind of like the consistency
that results --

    "coffee fifo fumble" ~~ m/ .*? $<effs>:=[f <-[f]>**{1..2} \s*]+ /;

PGE currently gives $<effs> an array of matches, same as for the
other capturing constructs.  If someone wants to capture the full
set, it's easy enough to do

    "coffee fifo fumble" ~~ m/ .*? $<effs>:=[ [f <-[f]>**{1..2} \s*]+ ] /;

and it's pretty clear what was intended.
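
The two results can be sketched with Python's re module.  This is an
analogy only: the Perl 6 pattern is hand-translated to Python syntax,
the variable names are invented, and the list is built by re-scanning
the matched text rather than by anything resembling PGE's internals.

```python
import re

subject = "coffee fifo fumble"

# One substring for the whole run of repetitions, as the draft
# describes for aliased quantified non-capturing brackets.
whole = re.search(r".*?((?:f[^f]{1,2}\s*)+)", subject).group(1)

# One entry per repetition, as PGE currently returns.  Re-scanning
# the matched text with the unquantified body recovers each piece.
pieces = re.findall(r"f[^f]{1,2}\s*", whole)

print(repr(whole))   # 'fee fifo fum'
print(pieces)        # ['fee ', 'fi', 'fo ', 'fum']
```

Joining the pieces gives back the whole string, so either form can be
derived from the other in this case; the question is only which one
the alias should hand back by default.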

> =head3 Array aliasing
> =head3 Hash aliasing
> =head3 External aliasing
> =head2 The C<:parsetree> flag
> etc.

At the moment PGE doesn't support these, and probably won't until
they're actually needed in the course of developing the compiler
(or until someone adds them).

> [...]
> Moreover, the C<:parsetree> flag overrides the exemption of C<< «name» >>
> subrule calls, so they act as if they were C<< <name> >> calls instead. They
> generate C<Match> objects, and those objects are also appended onto the
> surrounding scope's C<Match> array.

Do we still have the C<< «name» >> syntax for rules?  S05 doesn't
mention it; A05 mentions it as a non-capturing subrule, but I think
we've since changed to C<< <?name> >> instead.  If we don't have
C<< «name» >> I'll adjust S05/A05 accordingly.

Pm
