This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Replace =~, !~, m//, and s/// with match() and subst()

=head1 VERSION

   Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
   Date: 27 Aug 2000
   Version: 1
   Mailing List: [EMAIL PROTECTED]
   Number: 164

=head1 ABSTRACT

Several people (including Larry) have expressed a desire to get rid of
C<=~> and C<!~>. This RFC proposes a way to replace C<m//> and C<s///>
with two new builtins, C<match()> and C<subst()>. 

=head1 DESCRIPTION

=head2 Overview

Everyone knows how C<=~> and C<!~> work. Several proposals, such as RFCs
135 and 138, attempt to fix some stuff with the current pattern-matching
syntax. Most proposals center around minor modifications to C<m//> and
C<s///>.

This RFC proposes that C<m//> and C<s///> be dropped from the language
altogether, and instead be replaced with new C<match> and C<subst>
builtins, with the following syntaxes:

   $res = match /pattern/flags, $string
   $new = subst /pattern/newpattern/flags, $string

These subs are designed to mirror the format of C<split>, which makes
the pattern-matching syntax more consistent with other builtins. Unlike
the current forms, they return the modified string and leave C<$string>
alone. The exception is void context, where they act on and modify
C<$_>, consistent with current behavior.
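
The return-a-copy behavior can be approximated in Perl 5 today. The
following is only a rough sketch: C<subst_copy> is a hypothetical name,
the pattern is passed as a C<qr//> object and the replacement as a plain
string (the proposed C</pat/new/flags> argument syntax does not exist),
and the C</g> flag is hard-coded for simplicity.

```perl
use strict;
use warnings;

# Hypothetical Perl 5 stand-in for the proposed subst(). Not a real builtin.
# Arguments: compiled pattern, replacement string, optional target string.
sub subst_copy {
    my ($pattern, $replacement, $string) = @_;
    my $use_default = @_ < 3;    # no target given: work on $_

    # Work on a private copy so the caller's variable is left alone.
    (my $copy = $use_default ? $_ : $string) =~ s/$pattern/$replacement/g;

    # Void context with no explicit target: modify $_ in place,
    # which is what a bare s/// does today.
    if (!defined wantarray && $use_default) {
        $_ = $copy;
        return;
    }
    return $copy;
}

my $old = "foo bar foo";
my $new = subst_copy(qr/foo/, "baz", $old);
print "$old\n";    # prints "foo bar foo"
print "$new\n";    # prints "baz bar baz"
```

The C<wantarray> check is one possible way to detect void context; how
the real builtin would do it is left open by this RFC.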

Extra arguments can be dropped, consistent with C<split> and many other
builtins:

   match;                  # all defaults (pattern is /\w+/?)
   match /pat/;            # match $_
   match /pat/, $str;      # match $str
   match /pat/, @strs;     # match any of @strs

   subst;                  # like s///, pretty useless :-)
   subst /pat/new/;        # sub on $_
   subst /pat/new/, $str;  # sub on $str
   subst /pat/new/, @strs; # return array of modified strings
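
The list forms above correspond closely to C<grep> and C<map> in Perl 5.
The sketch below assumes that reading; C<match_any> and C<subst_all> are
hypothetical helper names, and patterns are again passed as C<qr//>
objects because the proposed argument syntax is not available.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of any Perl release.

# match_any: true if the pattern matches at least one of the strings.
sub match_any {
    my ($pattern, @strings) = @_;
    return scalar grep { $_ =~ $pattern } @strings;
}

# subst_all: return modified copies and leave the originals untouched.
sub subst_all {
    my ($pattern, $replacement, @strings) = @_;
    return map {
        my $copy = $_;
        $copy =~ s/$pattern/$replacement/g;
        $copy;
    } @strings;
}

my @old = ("hello world", "Hello again", "goodbye");
my @new = subst_all(qr/hello/i, "X", @old);

print "Got it\n" if match_any(qr/\w+/, @old);
print join(", ", @new), "\n";    # prints "X world, X again, goodbye"
```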
 
These new builtins eliminate the need for C<=~> and C<!~> altogether,
since they are functions just like C<split>, C<join>, C<splice>, and so
on.

Sometimes examples are easiest, so here are some examples of the new
syntax:

   Perl 5                           Perl 6
   -------------------------------- ----------------------------------
   if ( /\w+/ ) { }                 if ( match ) { }
   die "Bad!" if ( $_ !~ /\w+/ );   die "Bad!" if ( ! match ); 
   ($res) = m#^(.*)$#g;             ($res) = match #^(.*)$#g;

   next if /\s+/ || /\w+/;          next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||       next if match /\s+/, $str or 
           ($str =~ /\w+/)                  match /\w+/, $str;
   next unless $str =~ /^N/;        next unless match /^N/, $str;
   
   $str =~ s/\w+/$bob/gi;           $str = subst /\w+/$bob/gi, $str;
   ($str = $_) =~ s/\d+/&func/ge;   $str = subst /\d+/&func/ge;
   s/\w+/this/;                     subst /\w+/this/;             

   # These are pretty cool...   
   foreach (@old) {                 @new = subst /hello/X/gi, @old;
      s/hello/X/gi;
      push @new, $_;
   }

   foreach (@str) {                 print "Got it" if match /\w+/, @str;
      print "Got it" if (/\w+/);
   }

This gives us a cleaner, more consistent syntax. In addition, it makes
several things easier and is more easily extensible:

   &callsomesub(subst(/old/new/gi, $mystr));
   $str = subst /old/new/i, $r->getsomeval;

It is also easier to read English-wise. However, it requires a little
too much typing. See below.
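
To illustrate the nesting point in runnable Perl 5, the sketch below
compares the usual copy-then-modify idiom with a function-style call.
Both C<subst_copy> and C<shout> are hypothetical names used only for
this comparison, and the C</g> flag is hard-coded.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed subst(); returns a modified copy.
sub subst_copy {
    my ($pattern, $replacement, $string) = @_;
    $string =~ s/$pattern/$replacement/g;    # $string is already a private copy
    return $string;
}

# Any function that consumes a string will do here.
sub shout { return uc $_[0] }

my $mystr = "old habits, Old tricks";

# Perl 5 style: copy, modify in place, then pass the temporary along.
(my $tmp = $mystr) =~ s/old/new/gi;
my $perl5_way = shout($tmp);

# Function style: the result feeds straight into the next call.
my $nested = shout(subst_copy(qr/old/i, "new", $mystr));

print "$nested\n";    # prints "NEW HABITS, NEW TRICKS"
```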

=head2 Concerns

This should be carefully considered. It's good because it replaces
"yet another oddity" with a more standard syntax that I would argue is
more powerful and consistent. However, it also causes everyone to
relearn how to match and substitute patterns. This must be a careful,
conscious decision, lest we really screw stuff up.

That being said, since my initial post I have received several personal
emails endorsing this, hence the reason I decided to RFC it. So it's an
option, it just has to be powerful enough for people to see the "big
win".

Finally, it requires a little too much typing still for my tastes.
Perhaps we should make "m" and "s" at least shortcuts to the names,
possibly allowing users to bind them to the front of the pattern
(similar to some of RFC 138's suggestions). Maybe these two could be
equivalent:

    $new = subst /old/new/i, $old;   ==    $new = s/old/new/i, $old;

And then it doesn't look that radical anymore. This is similar to RFC
138, only C<$old> is not modified.

=head1 IMPLEMENTATION

Hold your horses

=head1 MIGRATION

This would be huge. Every pattern match would have to be translated,
every Perl hacker would have to relearn patterns, and every Perl 5
book's regexp section would be instantly out of date. Like I said, this
is not a simple decision. But if there are obvious increases in power, I
think people will appreciate the change, not dread it. At the very least
it makes Perl much more consistent.

=head1 REFERENCES

This is a synthesis of several ideas from myself, Ed Mills, and Tom C

RFC 138: Eliminate =~ operator. 

RFC 135: Require explicit m on matches, even with ?? and // as
delimiters.
