Patrick clarified:

At any rate, I find that having a subpattern capture base its
index on the highest index of all of the previous alternation
branches is easy to understand and works well in practice.  It can
also be easily changed with another alias if needed.

I strongly agree, and would be unhappy to see it work any other way.
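
For readers who want the numbering behaviour spelled out, here is a minimal Python sketch of it. It is only a toy model, not PGE's code: the tuple-based AST and the function name number are hypothetical, and it assumes (as I read the Synopsis) that each alternation branch restarts at the same index, while whatever follows the alternation continues from the highest index any branch reached.

```python
# Hypothetical mini-AST (not PGE's data structures):
#   ("cap",)            a capturing subpattern ( ... ); its contents are a
#                       nested scope and are not modeled here
#   ("lit",)            any text that captures nothing
#   ("seq", *nodes)     concatenation
#   ("alt", *branches)  alternation with |
#   ("group", node)     non-capturing brackets [ ... ]

def number(node, start=0):
    """Return (indices given to captures in textual order, next free index)."""
    kind = node[0]
    if kind == "cap":
        return [start], start + 1
    if kind == "lit":
        return [], start
    if kind == "group":
        return number(node[1], start)
    if kind == "seq":
        assigned, nxt = [], start
        for child in node[1:]:
            got, nxt = number(child, nxt)
            assigned += got
        return assigned, nxt
    if kind == "alt":
        assigned, highest = [], start
        for branch in node[1:]:
            # Assumption: every branch restarts at the same index.
            got, nxt = number(branch, start)
            assigned += got
            # Later captures continue from the highest index of any branch.
            highest = max(highest, nxt)
        return assigned, highest
    raise ValueError(f"unknown node kind: {kind!r}")

# Models / [ (a) | (b) (c) ] (d) /
cap, lit = ("cap",), ("lit",)
pattern = ("seq", ("group", ("alt", cap, ("seq", cap, cap))), cap)
indices, next_free = number(pattern)
```

In this model the captures in the example come out as $0 | $0 $1 and then $2 for the trailing subpattern, which is the "highest index of the previous branches" behaviour described above.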


* If a subrule appears two (or more) times in the same lexical scope
 (i.e. twice within the same subpattern and alternation), or if the
 subrule is quantified anywhere within the entire rule, then its
 corresponding hash entry is always assigned a reference to an array
 of Match objects, rather than a single Match object.

Maybe you're not the right person to ask, but is there a particular
reason for the "entire rule" bit?

/ (<foo>|None) <foo> (<foo>) /

Here we get three Matches $0<foo> (possibly undefined), $<foo>, and
$1<foo>. At least, I think so.

/ (<foo>?) <foo> (<foo>) /

Now, we suddenly get three more or less unrelated arrays with lengths
0..1, 1, and 1. Of course, I admit this example is a bit artificial.


Oh, I hadn't caught that particular clause (or hadn't read it as
you just did).  PGE certainly doesn't implement things that way.
I think the "entire rule" clause was intended to cover cases like

    / [ <foo> ]* /

where <foo> is indirectly quantified and therefore is an array of
match objects.  We should probably reword it, or get a clarification
of what is intended.  (Damian, @Larry:  can you confirm or clarify
this for us?)

Sorry, you're correct that it's not what was intended. I was specifically trying to address the case where the same subrule appears with different quantifications in different alternations in the same scope.

That is, the difference between:

        m/ bar <foo> | baz <foo> /     # $<foo> always contains a scalar

and:

        m/ bar <foo> | baz <foo>* /    # $<foo> always contains an array ref


Is this clearer:

    * If a subrule appears two (or more) times in any branch of a
      lexical scope (i.e. twice within the same subpattern and
      alternation), or if the subrule is quantified anywhere within a
      given scope, then its corresponding hash entry is always assigned
      a reference to an array of Match objects, rather than a single
      Match object.


???

If so, I'd be happy if someone wanted to update the Synopsis that way.

Note, however, that this question suggests that we need a more overt statement about what constitutes a scope within a regex. I'll work on providing that when I take my next pass through the Synopses (probably next week).



Furthermore, I think "within the same subpattern and alternation" is
not quite correct; at least, it wouldn't apply to something like

/ (<foo> [ <foo> | ... ]) /

unless we consider the (...) sequence as a kind of single branch
alternation. And why are alternation branches considered to be
lexical scopes, anyway?

In the example you give, $0<foo> is indeed an array of match objects.
The "same alternation" in this case is the subpattern... compare to

   / (<foo> [ <foo> | ... ]) | <foo> /

$0<foo> is an array, $<foo> is a single match object.

Alternation branches don't create new lexical scopes, they just
affect quantification and subpattern numbering. In both of the following examples

    / abc <foo> def <foo> /

    / ghi <foo> | jkl <foo> /

each <foo> has the same lexical scope ($<foo>), but in the "abc"
example $<foo> is an array of match objects, while in the "ghi"
example $<foo> is a single match object.

Patrick is spot-on here.

In simplest terms, the only things that create a scope are the regex delimiters (which delimit the outermost lexical scope), and any pair of capturing parentheses (which delimit some nested scope).
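
To make the scalar-versus-array rule concrete, here is a rough Python sketch that models it. It is a toy under stated assumptions, not PGE's implementation: the tuple-based AST and the names counts and kinds are hypothetical. Within one scope it sums subrule occurrences along a sequence, takes the maximum across alternation branches, treats anything under a quantifier as "many", and stops at capturing parentheses because they open a nested scope.

```python
from collections import Counter

# Hypothetical mini-AST (not PGE's data structures):
#   ("sub", name)       a subrule call such as <foo>
#   ("lit",)            any text that captures nothing
#   ("seq", *nodes)     concatenation
#   ("alt", *branches)  alternation with |
#   ("quant", node)     node followed by ?, *, + or a lazy variant
#   ("group", node)     non-capturing brackets [ ... ]
#   ("cap", node)       capturing parentheses ( ... ), a nested scope

MANY = 2  # a count of two or more means "array of Match objects"

def counts(node):
    """Count how often each subrule can match within the current scope."""
    kind = node[0]
    if kind == "sub":
        return Counter({node[1]: 1})
    if kind in ("lit", "cap"):       # a capture's contents belong to its own scope
        return Counter()
    if kind == "group":
        return counts(node[1])
    if kind == "seq":
        total = Counter()
        for child in node[1:]:
            total += counts(child)   # occurrences in sequence add up
        return total
    if kind == "alt":
        best = Counter()
        for branch in node[1:]:
            best |= counts(branch)   # only one branch matches: take the maximum
        return best
    if kind == "quant":
        # Anything quantified, directly or indirectly, counts as "many".
        return Counter({name: MANY for name in counts(node[1])})
    raise ValueError(f"unknown node kind: {kind!r}")

def kinds(scope_body):
    """Map each subrule name in this scope to 'scalar' or 'array'."""
    return {name: "array" if n >= MANY else "scalar"
            for name, n in counts(scope_body).items()}

foo, lit = ("sub", "foo"), ("lit",)
ghi = ("alt", ("seq", lit, foo), ("seq", lit, foo))   # / ghi <foo> | jkl <foo> /
abc = ("seq", lit, foo, lit, foo)                     # / abc <foo> def <foo> /
inner = ("seq", foo, ("group", ("alt", foo, lit)))    # body of (<foo> [ <foo> | ... ])
outer = ("alt", ("cap", inner), foo)                  # / (<foo> [ <foo> | ... ]) | <foo> /
```

Calling kinds on ghi reports a scalar $<foo>, on abc an array, and for the last pattern kinds(outer) gives a scalar $<foo> while kinds(inner) gives an array for $0<foo>, matching the examples above.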


My second question is why adding a "?" or "??" to an unquantified
subrule which would otherwise result in a single Match object should
result in an array, rather than a single (possibly undefined) Match.

The specification was originally this way but was later changed
to the current definition.  I think people found the idea of
"?" producing a single match object confusing, so for consistency
we ended up with "all quantifiers produce arrays of match objects".

That's my recollection too. And I certainly agree with the decision, even though I proposed it the other way originally.

Damian
