On Thu, Oct 20, 2005 at 09:14:15PM -0400, John Adams wrote:
> From: Luke Palmer <[EMAIL PROTECTED]>
>
> > But $1 in Perl 5 wasn't the same as $1 in a shell script.
>
> I'm all for breaking things that need breaking, which is why I
> keep my mouth shut most of the time--either I see the reason or
> I suspect (that is, take on faith, which is okay by me) there's
> a reason I don't see or fully understand. I'm just not seeing a
> compelling reason for this one, and a pretty good reason not to do it:

I can state the compelling reason for this one -- it's way too
confusing when $1, $2, $3, etc. correspond to $/[0], $/[1], $/[2], etc.
In many discussions of capturing semantics earlier in the year,
nearly everyone who used $1, $2, $3 in examples, documentation, and
discussion had trouble with off-by-one errors. This included
the language designers, and even those who were advocating staying
with $1, $2, $3. Once we switched to using $0, $1, $2, etc.,
nearly all of the confusion and mistakes disappeared.
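The same off-by-one trap is easy to reproduce outside Perl. As an illustration only (Python is not part of this thread), Python's standard re module numbers groups from 1 and reserves group(0) for the whole match, while m.groups() returns an ordinary zero-indexed tuple. So the first capture is both m.group(1) and m.groups()[0], which is exactly the $1 versus $/[0] mismatch described above:

```python
import re

m = re.search(r"(\w+)=(\d+)", "a=4")

# Numbered groups are 1-based; group(0) is the entire match.
print(m.group(0))  # a=4
print(m.group(1))  # a
print(m.group(2))  # 4

# groups() is a plain zero-indexed tuple of the captures,
# so capture number 1 lives at index 0: an off-by-one.
print(m.groups())  # ('a', '4')
assert m.groups()[0] == m.group(1)
assert m.groups()[1] == m.group(2)
```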
> I'm not aware offhand of any other place where $0 is used in
> regex matching, and several of the languages which you point out
> are zero-based in other places are not zero-based in regex matching.

Yes, but none of those other regex languages return nested
captures either: their nested parentheses are simply numbered in
one flat sequence. In particular, a rule like:

    /:w ( (\w+) = (\d+) ; )+ /

no longer captures to $1, $2, $3, or even to $0, $1, $2. It now
creates an array in $/[0] (aka $0), and each element of that array
has [0] and [1] indexes holding the captures from the second and
third sets of parentheses in the rule. That is,

    "a=4; b=2; c=8;" ~~ /:w ( (\w+) = (\d+) ; )+ /

results in

    $/[0][0][0] == 'a'    $/[0][0][1] == '4'
    $/[0][1][0] == 'b'    $/[0][1][1] == '2'
    $/[0][2][0] == 'c'    $/[0][2][1] == '8'
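To make the contrast with flat-numbering engines concrete, here is a rough Python sketch (illustration only). The pattern is a hand translation of the rule without the :w whitespace handling, and nested_captures is a hypothetical helper, not any library's API; for simplicity it matches from the start of the string rather than searching. Python's re numbers the three groups 1, 2, 3 by their opening parens and keeps only the last repetition of a quantified group. The helper rebuilds the zero-based nested structure with plain lists, so slash[0][i][j] lines up with $/[0][i][j]:

```python
import re

TEXT = "a=4; b=2; c=8;"

# Flat numbering: groups are numbered 1, 2, 3 by their opening parens,
# and a repeated group keeps only its last repetition.
flat = re.search(r"(\s*(\w+)=(\d+);)+", TEXT)
print(repr(flat.group(1)))           # ' c=8;'
print(flat.group(2), flat.group(3))  # c 8  (a, 4, b and 2 are gone)


def nested_captures(text):
    """Hypothetical helper: mimic the nested, zero-based $/ structure.

    Returns [reps], where reps[i] == [name, value] for the i-th repetition
    of the outer parentheses, or None if there was no repetition at all
    (the outer group is quantified with +). Matching is anchored at the
    start of text for simplicity.
    """
    inner = re.compile(r"\s*(\w+)=(\d+);")
    reps = []
    pos = 0
    while True:
        m = inner.match(text, pos)
        if m is None:
            break
        reps.append([m.group(1), m.group(2)])
        pos = m.end()
    return [reps] if reps else None


slash = nested_captures(TEXT)
print(slash[0][0])                   # ['a', '4']
print(slash[0][1][0], slash[0][1][1])  # b 2
print(slash[0][2])                   # ['c', '8']
```

Every index in the nested version starts at 0, so the same arithmetic works at each level.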

Trying to make *all* of these indexes 1-based leads to
chaos (especially wrt array assignment), and saying that top-level
parens in a rule are named $1, $2, $3, ... while nested parens
are named [0], [1], [2], ... just throws everything and
everyone off. It's *much* easier when everything is zero-based,
even for those who are used to using $1, $2, $3 in regular
expressions.

Pm