This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Behavior of empty regex should be simple

=head1 VERSION

  Maintainer: Mark Dominus <[EMAIL PROTECTED]>
  Date: 24 August 2000
  Last Modified: 27 August 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 144

=head1 ABSTRACT

=head2 Standard Documentation

According to L<perlop>:

=over 4

=item  m/PATTERN/cgimosx

=item /PATTERN/cgimosx

If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead.

=back

This behavior should be changed.  If the PATTERN is empty, Perl should
look for the empty string.  (That is, if the PATTERN is empty, it
should always match.)

=head1 DESCRIPTION

Literal empty patterns, such as:

        $s =~ // ;

are not the problem here.  The real problem is that the special case
is invoked for interpolated patterns also.  For example,

        chomp($pat = <STDIN>);
        $s =~ /\Q$pat\E/;

looks to see if $pat is a substring of $s, unless $pat is empty, in
which case it matches $s against the last regex that was matched
successfully.  That regex might be far away, in some other module.
If the far-away regex happened to contain backreference groups, the
backreference variables will be set accordingly.

To make this safe in Perl 5, the programmer has to write something
peculiar like

        $s =~ /(?=)\Q$pat\E/;

to ensure that the regex, after interpolation, is never empty.

I propose that this 'last successful match' behavior be discarded
entirely, and that an empty pattern always match the empty string.

=head1 RATIONALE

=head2 The Feature Was Not Useful, I

The special behavior for empty patterns has never been particularly
useful.  For example, you could imagine code like this:

        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ //) {
            # do something 
          }
        }

This would be more efficient than the equivalent
            
        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ /$pat/) {
            # do something 
          }
        }

because $pat would be compiled only once per loop instead of twice.
It is now more straightforward and efficient to do this sort of thing
explicitly with the qr// operator:

        @patterns = map qr/$_/, @patterns;
        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ /$pat/) {
            # do something 
          }
        }


=head2 The Feature Was Not Useful, II

People sometimes propose the following use for the empty pattern
special case:  They have a pattern, and many strings, and they want to
see if every string matches the pattern.  This code works, but is
inefficient:

        sub match_all {
          my $pat = shift;
          for (@_) {
            return 0 unless /$pat/;
          }
          return 1; 
        }

This is because C</$pat/> must be recompiled for each string, or checked
to see whether recompilation is necessary.

This code does not work:

        sub match_all {
          my $pat = shift;
          for (@_) {
            return 0 unless /$pat/o;
          }
          return 1; 
        }

because C<$pat> changes with each call.

One solution is to use 'eval' here to generate the pattern matching
code (with C</o>) at run time.

People have sometimes tried to use C<//> here, but usually without
success.  The idea is:

        sub match_all {
          my $pat = shift;
          # load $pat into 'last successfully matched' space
          for (@_) {
            return 0 unless //;
          }
          return 1;
        }

The problem here is that there is no way to designate $pat as the last
successfully matched regex without actually finding a string that
matches it.  In the past people attempting this strategy have appeared
in C<comp.lang.perl.misc> asking how to find a string that matches a
given regex.  As far as I know, no useful solutions have been offered.
(In fact, there may not be any such string.  Consider the pattern
C</a\bx/> for example.)

A better, simpler solution to this problem is to use the C<qr> operator:

        sub match_all {
          my $pat = shift;
          $pat = qr($pat);
          for (@_) {
            return 0 unless /$pat/;
          }
          return 1; 
        }

=head2 This feature has resulted in bugs

Any code that contains the innocent-looking

        if (/\Q$string\E/) {
          ...
        }

is potentially booby-trapped.  Such code is common.  An example of
this type appears in L<perlfaq6>.


=head1 Alternatives

Rather than eliminating the special case entirely, alternative changes
are sometimes proposed.

=head2 Empty pattern to mean 'last match' instead of 'last successful match'

This behavior would be more useful than the current behavior and is
sometimes proposed as an alternative.  

For example, the application discussed in the section 'The feature was
not useful, II' above would be feasible if the empty pattern matched
the last-matched pattern, because it would no longer be necessary to
manufacture a matching string.  But it is simpler and more
straightforward to solve the problem with C<qr//>.

Moreover, this alternative behavior retains all the drawbacks of the
current behavior: It yields subtle and intermittent bugs and
introduces strange action-at-a-distance effects.


=head2 Retain special behavior for literal empty pattern?

In the past it has been proposed that the special behavior be
discarded for patterns with interpolated variables, but retained for
empty patterns that appear literally in the source.  C</\Q$string\E/>
would look for $string, regardless of whether $string was empty,
becayse C</\Q$string\E/> is not a I<literal> empty pattern.  But

        $s =~ // ;

would still retain the special behavior and match $s against whatever
was the last pattern to be successfully matched.

Since the feature is of marginal usefulness (especially now that
C<qr//> is available) it should be eliminated anyway to reduce code
and documentation bloat.  However, see C<Translation Issues> below.

=head1 IMPLEMENTATION

No special implementation is necessary.  

=head1 TRANSLATION ISSUES

Old Perl 5 scripts that may depend on this feature may be hard to
translate unless the feature is implemented in Perl 6 also.

If the literal empty pattern were to retain the special behavior, then
Perl 5 code like this:

        if (/$FOO/) {
          ...
        }

could be translated to Perl 6 code like this:

        if (do { my $_tmp = $FOO; ($_tmp eq '') ? // : /$_tmp/ }) {
          ...
        }

However, it's not clear that such translation is actually desirable.

=head1 REFERENCES

L<perlop> discussion of empty patterns, quoted above.



Reply via email to