This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Behavior of empty regex should be simple

=head1 VERSION

  Maintainer: Mark Dominus <[EMAIL PROTECTED]>
  Date: 24 August 2000
  Last Modified: 27 August 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 144

=head1 ABSTRACT

=head2 Standard Documentation

According to L<perlop>:

=over 4

=item m/PATTERN/cgimosx

=item /PATTERN/cgimosx

If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead.

=back

This behavior should be changed.  If the PATTERN is empty, Perl should
look for the empty string.  (That is, if the PATTERN is empty, it
should always match.)

=head1 DESCRIPTION

Literal empty patterns, such as:

        $s =~ // ;

are not the problem here.  The real problem is that the special case
is invoked for interpolated patterns also.  For example,

        chomp($pat = <STDIN>);
        $s =~ /\Q$pat\E/;

looks to see if $pat is a substring of $s, unless $pat is empty, in
which case it matches $s against the last regex that was matched
successfully.  That regex might be far away, in some other module.  If
the far-away regex happened to contain backreference groups, the
backreference variables will be set accordingly.

To make this safe in Perl 5, the programmer has to write something
peculiar like

        $s =~ /(?=)\Q$pat\E/;

to ensure that the regex, after interpolation, is never empty.

I propose that this 'last successful match' behavior be discarded
entirely, and that an empty pattern always match the empty string.

=head1 RATIONALE

=head2 The Feature Was Not Useful, I

The special behavior for empty patterns has never been particularly
useful.  For example, you could imagine code like this:

        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ //) {
            # do something
          }
        }

This would be more efficient than the equivalent

        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ /$pat/) {
            # do something
          }
        }

because $pat would be compiled only once per loop instead of twice.
It is now more straightforward and efficient to do this sort of thing
explicitly with the C<qr//> operator:

        @patterns = map qr/$_/, @patterns;
        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ /$pat/) {
            # do something
          }
        }

=head2 The Feature Was Not Useful, II

People sometimes propose the following use for the empty pattern
special case:  They have a pattern, and many strings, and they want to
see if every string matches the pattern.  This code works, but is
inefficient:

        sub match_all {
          my $pat = shift;
          for (@_) {
            return 0 unless /$pat/;
          }
          return 1;
        }

This is because C</$pat/> must be recompiled for each string, or
checked to see whether recompilation is necessary.  This code does not
work:

        sub match_all {
          my $pat = shift;
          for (@_) {
            return 0 unless /$pat/o;
          }
          return 1;
        }

because C</o> compiles the pattern only once, while C<$pat> changes
with each call; every call after the first would keep using the first
pattern.  One solution is to use C<eval> here to generate the pattern
matching code (with C</o>) at run time.

People have sometimes tried to use C<//> here, but usually without
success.  The idea is:

        sub match_all {
          my $pat = shift;
          # load $pat into 'last successfully matched' space
          for (@_) {
            return 0 unless //;
          }
          return 1;
        }

The problem here is that there is no way to designate $pat as the last
successfully matched regex without actually finding a string that
matches it.  In the past people attempting this strategy have appeared
in C<comp.lang.perl.misc> asking how to find a string that matches a
given regex.  As far as I know, no useful solutions have been offered.
(In fact, there may not be any such string.  Consider the pattern
C</a\bx/> for example.)

A better, simpler solution to this problem is to use the C<qr>
operator:

        sub match_all {
          my $pat = shift;
          $pat = qr($pat);
          for (@_) {
            return 0 unless /$pat/;
          }
          return 1;
        }

=head2 This feature has resulted in bugs

Any code that contains the innocent-looking

        if (/\Q$string\E/) { ... }

is potentially booby-trapped.  Such code is common.  An example of
this type appears in L<perlfaq6>.
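
The trap can be reproduced with a short Perl 5 sketch.  The helper
names C<contains_unsafe> and C<contains_safe> and the sample strings
are hypothetical, introduced only for illustration; the C<(?=)> guard
is the workaround mentioned in the DESCRIPTION.  The unrelated
C</\d+/> match is placed in the same subroutine as the test so that it
is certainly the last successful match when the empty pattern is seen.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether $pat occurs in $s, using the
# innocent-looking idiom.  An unrelated match succeeds first, standing
# in for whatever regex happened to match earlier.
sub contains_unsafe {
    my ($s, $pat) = @_;
    "abc123" =~ /\d+/ or die "setup match failed";
    # If $pat is empty, the pattern is empty after interpolation, so
    # Perl 5 reuses /\d+/ here instead of matching the empty string.
    return $s =~ /\Q$pat\E/ ? 1 : 0;
}

# Same check, guarded so the pattern is never empty.
sub contains_safe {
    my ($s, $pat) = @_;
    "abc123" =~ /\d+/ or die "setup match failed";
    # The empty lookahead (?=) always succeeds and keeps the pattern
    # non-empty after interpolation.
    return $s =~ /(?=)\Q$pat\E/ ? 1 : 0;
}

for my $pat ("ell", "") {
    printf "pat=%-5s unsafe=%d safe=%d\n",
        "'$pat'", contains_unsafe("hello", $pat), contains_safe("hello", $pat);
}
```

Under the rule quoted from L<perlop>, the call to C<contains_unsafe>
with an empty $pat should report no match, because "hello" is tested
against C</\d+/>, while C<contains_safe> reports a match for the same
input.  Under the behavior proposed in this RFC, both would agree.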

=head1 ALTERNATIVES

Rather than eliminating the special case entirely, alternative changes
are sometimes proposed.

=head2 Empty pattern to mean 'last match' instead of 'last successful match'

This behavior would be more useful than the current behavior and is
sometimes proposed as an alternative.  For example, the application
discussed in the section 'The Feature Was Not Useful, II' above would
be feasible if the empty pattern matched the last-matched pattern,
because it would no longer be necessary to manufacture a matching
string.  But it is simpler and more straightforward to solve the
problem with C<qr//>.

Moreover, this alternative behavior retains all the drawbacks of the
current behavior: It yields subtle and intermittent bugs and
introduces strange action-at-a-distance effects.

=head2 Retain special behavior for literal empty pattern?

In the past it has been proposed that the special behavior be
discarded for patterns with interpolated variables, but retained for
empty patterns that appear literally in the source.
C</\Q$string\E/> would look for $string, regardless of whether $string
was empty, because C</\Q$string\E/> is not a I<literal> empty pattern.
But

        $s =~ // ;

would still retain the special behavior and match $s against whatever
was the last pattern to be successfully matched.

Since the feature is of marginal usefulness (especially now that
C<qr//> is available) it should be eliminated anyway to reduce code
and documentation bloat.  However, see C<TRANSLATION ISSUES> below.

=head1 IMPLEMENTATION

No special implementation is necessary.

=head1 TRANSLATION ISSUES

Old Perl 5 scripts that depend on this feature may be hard to
translate unless Perl 6 implements the feature as well.

If the literal empty pattern were to retain the special behavior, then
Perl 5 code like this:

        if (/$FOO/) { ... }

could be translated to Perl 6 code like this:

        if (do { my $_tmp = $FOO;
                 ($_tmp eq '') ? // : /$_tmp/
               }) {
          ...
        }

However, it's not clear that such translation is actually desirable.

=head1 REFERENCES

L<perlop> discussion of empty patterns, quoted above.