This and other RFCs are available on the web at
http://dev.perl.org/rfc/
=head1 TITLE
Replace =~, !~, m//, and s/// with match() and subst()
=head1 VERSION
Maintainer: Nathan Wiger [EMAIL PROTECTED]
Date: 27 Aug 2000
Version: 1
Mailing List: [EMAIL PROTECTED]
Number: 164
=head1 ABSTRACT
Several people (including Larry) have expressed a desire to get rid of
C<=~> and C<!~>. This RFC proposes a way to replace C<m//> and C<s///>
with two new builtins, C<match()> and C<subst()>.
=head1 DESCRIPTION
=head2 Overview
Everyone knows how C<=~> and C<!~> work. Several proposals, such as RFCs
135 and 138, attempt to fix parts of the current pattern-matching
syntax. Most proposals center around minor modifications to C<m//> and
C<s///>.
This RFC proposes that C<m//> and C<s///> be dropped from the language
altogether, and instead be replaced with new C<match> and C<subst>
builtins, with the following syntaxes:
    $res = match /pattern/flags, $string
    $new = subst /pattern/newpattern/flags, $string
These subs are designed to mirror the format of C<split>, making them
more consistent. Unlike the current forms, these return the modified
string, leaving C<$string> alone (unless they are called in a void
context, in which case they act on and modify C<$_>, consistent with
current behavior).
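The return-a-copy behavior described above can be roughly emulated in Perl 5 today. The following is only a sketch under assumptions: C<subst_copy> and C<match_str> are hypothetical helper names (not the proposed builtins), the pattern is passed as a C<qr//> object, the replacement is a plain string, and the substitution is always global.

```perl
use strict;
use warnings;

# Hypothetical helper, not the proposed builtin: substitute on a copy
# and return it, leaving the caller's string untouched.
sub subst_copy {
    my ($pattern, $replacement, $string) = @_;
    my $copy = $string;
    $copy =~ s/$pattern/$replacement/g;   # assumption: always global
    return $copy;
}

# Hypothetical helper: true/false match against an explicit string.
sub match_str {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

my $old = "hello world";
my $new = subst_copy(qr/o/, "0", $old);
print "$old -> $new\n";                   # prints: hello world -> hell0 w0rld
print "matched\n" if match_str(qr/wor/, $old);
```

Note that C<$old> is unchanged after the call, which is the main difference from C<$old =~ s/o/0/g>.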
Extra arguments can be dropped, consistent with C<split> and many other
builtins:
    match;                      # all defaults (pattern is /\w+/?)
    match /pat/;                # match $_
    match /pat/, $str;          # match $str
    match /pat/, @strs;         # match any of @strs
    subst;                      # like s///, pretty useless :-)
    subst /pat/new/;            # sub on $_
    subst /pat/new/, $str;      # sub on $str
    subst /pat/new/, @strs;     # return array of modified strings
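The list forms above ("match any of @strs" and "return array of modified strings") could be approximated in Perl 5 along these lines. Again, this is a sketch: C<match_any> and C<subst_all> are hypothetical names, patterns are C<qr//> objects, and the replacement is a plain string applied globally.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the pattern matches any of the strings.
sub match_any {
    my ($pattern, @strings) = @_;
    for my $s (@strings) {
        return 1 if $s =~ $pattern;
    }
    return 0;
}

# Hypothetical helper: return modified copies, leaving the inputs alone.
sub subst_all {
    my ($pattern, $replacement, @strings) = @_;
    my @result;
    for my $s (@strings) {
        my $copy = $s;
        $copy =~ s/$pattern/$replacement/g;   # assumption: always global
        push @result, $copy;
    }
    return @result;
}

my @old = ("Hello there", "hello again", "goodbye");
my @new = subst_all(qr/hello/i, "X", @old);
print join(", ", @new), "\n";                 # prints: X there, X again, goodbye
print "Got it\n" if match_any(qr/\w+/, @old);
```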
These new builtins eliminate the need for C<=~> and C<!~> altogether,
since they are functions just like C<split>, C<join>, C<splice>, and so
on.
Sometimes examples are easiest, so here are some examples of the new
syntax:
    Perl 5                               Perl 6
    ------------------------------------ ------------------------------------
    if ( /\w+/ ) { }                     if ( match ) { }
    die "Bad!" if ( $_ !~ /\w+/ );       die "Bad!" if ( ! match );
    ($res) = m#^(.*)$#g;                 ($res) = match #^(.*)$#g;
    next if /\s+/ || /\w+/;              next if match /\s+/ or match /\w+/;
    next if ($str =~ /\s+/) ||           next if match /\s+/, $str or
            ($str =~ /\w+/)                      match /\w+/, $str;
    next unless $str =~ /^N/;            next unless match /^N/, $str;
    $str =~ s/\w+/$bob/gi;               $str = subst /\w+/$bob/gi, $str;
    ($str = $_) =~ s/\d+/func/ge;        $str = subst /\d+/func/ge;
    s/\w+/this/;                         subst /\w+/this/;
    # These are pretty cool...
    foreach (@old) {                     @new = subst /hello/X/gi, @old;
       s/hello/X/gi;
       push @new, $_;
    }
    foreach (@str) {                     print "Got it" if match /\w+/, @str;
       print "Got it" if (/\w+/);
    }
This gives us a cleaner, more consistent syntax. In addition, it makes
several things easier and is more easily extensible:
    callsomesub(subst(/old/new/gi, $mystr));
    $str = subst /old/new/i, $r->getsomeval;
It is also easier to read English-wise. However, it requires a little too
much typing. See below.
=head2 Concerns
This should be carefully considered. It's good because it replaces
"yet another oddity" with a more standard syntax that I would argue is
more powerful and consistent. However, it also forces everyone to
relearn how to match and substitute patterns. This must be a careful,
conscious decision, lest we really screw stuff up.
That being said, since my initial post I have received several personal
emails endorsing this, which is why I decided to RFC it. So it's an
option; it just has to be powerful enough for people to see the "big
win".
Finally, it requires a little too much typing still for my tastes.
Perhaps we should make "m" and "s" at least shortcuts to the names,
possibly allowing users to bind them to the front of the pattern
(similar to some of RFC 138's suggestions). Maybe these two could be
equivalent:
    $new = subst /old/new/i, $old;   ==   $new = s/old/new/i, $old;
And then it doesn't look that radical anymore. This is similar to RFC
138, only C<$old> is not modified.
=head1 IMPLEMENTATION
Hold your horses
=head1 MIGRATION
This would be huge. Every pattern match would have to be translated,
every Perl hacker would have to relearn patterns, and every Perl 5
book's regexp section would be instantly out of date. Like I said, this
is not a simple decision. But if there are obvious increases in power, I
think people will appreciate the change, not dread it. At the very least
it makes Perl much more consistent.
=head1 REFERENCES
This is a synthesis of several ideas from myself, Ed Mills, and Tom C
RFC 138: Eliminate =~ operator.
RFC 135: Require explicit m on matches, even with ?? and // as
delimiters.