Perl6 RFC Librarian Thu, 28 Sep 2000 13:51:12 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Regex: Make /$/ equivalent to /\z/ under the '/s' modifier

=head1 VERSION

  Maintainer: Bart Lateur <[EMAIL PROTECTED]>
  Date: 28 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 332
  Version: 1
  Status: Developing

=head1 ABSTRACT

To most Perlers, /$/ in a regex simply means "end of string". That is
only correct if you are absolutely sure your string doesn't end in a
newline. For a large part of all textual data this is indeed the case:
ordinary strings don't contain newlines at all, while lines read from
text files can contain a newline only as the very last character. The
'/s' modifier is usually used only in combination with the former class
of textual data.

However, this situation is basically a bug waiting to happen.

This RFC proposes to change the '/s' modifier so that under '/s', /$/
will only match at the very end of a string, and not also before a
newline at the end of the string.

=head1 DESCRIPTION

To most Perl programmers, /^foo$/ is a regex that can match only the
string "foo". Actually, it can match "foo\n", too. The assumption is
usually safe, because people know the kind of data they're dealing
with, and they "know" that it won't ever end in a newline.

However, this is a chance for bugs to creep in if, for some reason, the
assumption about the data no longer holds.
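
The behaviour described above can be checked in Perl 5 as it stands. The following is only a small sketch; the test strings are arbitrary, and /\z/ is included for contrast because it matches only at the very end of the string.

```perl
use strict;
use warnings;

# Perl 5 behaviour: /$/ matches at the end of the string, and also
# just before a newline that is the last character of the string.
my @results;
for my $str ("foo", "foo\n", "foo\n\n") {
    my $dollar = $str =~ /^foo$/  ? 1 : 0;
    my $z      = $str =~ /^foo\z/ ? 1 : 0;
    push @results, [$dollar, $z];

    # Show newlines as a visible \n in the report.
    (my $shown = $str) =~ s/\n/\\n/g;
    printf "%-10s  /^foo\$/: %d   /^foo\\z/: %d\n", qq("$shown"), $dollar, $z;
}
```

Only the second string, "foo\n", is treated differently by the two patterns: /^foo$/ accepts it, /^foo\z/ does not.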

To make matters worse, Perl doesn't even have a modifier that prevents
the regex engine from matching /$/ just before the last character if
that character is a newline. The only way out is to write /\z/ instead
of /$/.

Originally, we had thought of adding Yet Another Regex Modifier; but to
be honest, having two modifiers just for the newline is already
confusing enough for too many people. A third is definitely out.

Therefore, the proposal is instead to modify the behaviour of the '/s'
modifier.

Under '/s':

=over 2

=item *

/./ can match any character, including newline;

=item *

/$/ can match only at the very end of the string, and no longer just
before the last character if that character happens to be a newline.

=back

This seems simple enough.
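
To make the difference concrete, here is a sketch that compares what Perl 5 does today under '/s' with what this RFC proposes. The proposed behaviour does not exist yet, so it is emulated by writing /\z/ where the pattern would have /$/; the sample string is an assumption chosen for illustration.

```perl
use strict;
use warnings;

my $str = "foo\nbar\n";

# Perl 5 today: under /s, /./ matches a newline ...
my $dot_crosses_newline = $str =~ /foo.bar/s ? 1 : 0;

# ... but /$/ still matches just before the trailing newline.
my $current = $str =~ /bar$/s ? 1 : 0;

# Proposed meaning of /$/ under /s, emulated with \z:
my $proposed          = $str =~ /bar\z/s   ? 1 : 0;  # trailing newline is in the way
my $proposed_explicit = $str =~ /bar\n\z/s ? 1 : 0;  # newline matched explicitly

print "dot crosses newline:        $dot_crosses_newline\n";
print "current  /bar\$/s:           $current\n";
print "proposed /bar\$/s:           $proposed\n";
print "proposed /bar\\n\$/s:         $proposed_explicit\n";
```

Under the proposal the trailing newline is just another character, so a pattern has to account for it explicitly.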

=head1 CONSIDERATIONS

=head2 Mnemonic value of '/s'

'/s' originally stood for "single line". This can no longer be true, so
the mnemonic value of the "s" is thereby reduced to zero.

However, the mnemonic value wasn't that great to begin with, especially
if you consider that combining '/s' and '/m' is not only possible, but a
useful option, too. How can a string both be a single line and
multiline, at the same time?

So, to most Perl programmers, '/s' simply stands for

=over 2

=item

let /./ match a newline too

=back

which now gets turned into:

=over 2

=item

treat "\n" as an ordinary character

=back

The change isn't that big, so it is just as easy to remember. Or not.

=head2 The $* variable

'/s' and '/m' also have a lesser known side effect: they both override
the setting of the $* special variable, which controls multiline related
behaviour in regexes.

Use of this special variable has been deprecated at least since Perl5
first came out, more than 5 years ago. It is a very good candidate to
be removed from Perl6 altogether, which would result in fewer gotchas
in the language. That is a Good Thing.

Perlvar says:

    Use of `$*' is deprecated in modern Perl, supplanted by the `/s'
    and `/m' modifiers on pattern matching.


Therefore, any change in the behaviour of '/s' with regard to $* can
hardly be considered relevant any more.

=head2 Getting the old behaviour back

You can't. Question is: do you really want to?

=over 2

=item *

If you know your data can contain newlines, and you want to treat them
as ordinary characters, you probably don't want to make an exception for
a trailing newline, anyway.

=item *

If you still want to ignore a trailing newline in the regex, you can
either adjust your regex so that it contains /\n?$/ or something like
it, instead of plain /$/; or you can chomp() your data, before doing the
match.

=item *

And finally, there's still the option for simply not using '/s', and all
things will remain as they were before.  ;-)

=back
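
The three options above can be sketched as follows. Since the proposed '/s' is not implemented anywhere, /\z/ again stands in for what /$/ would mean under it; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $line = "foo\n";

# Option 1: allow an optional trailing newline in the pattern itself.
# (\z emulates /$/ under the proposed /s.)
my $optional_newline = $line =~ /^foo\n?\z/s ? 1 : 0;

# Option 2: chomp() a copy of the data before doing the match.
my $chomped = $line;
chomp $chomped;
my $after_chomp = $chomped =~ /^foo\z/s ? 1 : 0;

# Option 3: don't use /s at all; /$/ keeps its Perl 5 meaning.
my $without_s = $line =~ /^foo$/ ? 1 : 0;

print "optional newline: $optional_newline\n";
print "after chomp:      $after_chomp\n";
print "without /s:       $without_s\n";
```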

=head2 '/ms': combined '/m' and '/s'

'/ms' still works as before. Internally, '/m' takes over the job of
matching before a newline at the end of the string, simply because /$/m
can match before I<every> newline.
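
A short sketch of why '/ms' is unaffected, using Perl 5 as it is now. It collects every position at which /$/m matches in a sample string (the string is an assumption), and then contrasts '/ms' with the emulated proposal for a bare '/s'.

```perl
use strict;
use warnings;

my $str = "one\ntwo\n";

# Under /m, /$/ matches before every newline and at the very end of
# the string. Collect the offsets of all those zero-length matches.
my @positions;
while ($str =~ /$/mg) {
    push @positions, pos($str);
}
print "positions of /\$/m: @positions\n";

# With /ms, the match before the trailing newline comes from /m.
my $with_ms = $str =~ /two$/ms ? 1 : 0;

# A bare /s under the proposal (emulated with \z) would not match.
my $bare_s_proposed = $str =~ /two\z/s ? 1 : 0;

print "/two\$/ms:            $with_ms\n";
print "/two\$/s (proposed):  $bare_s_proposed\n";
```

The offsets 3 and 7 are the positions just before the two newlines, and 8 is the very end of the string.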

=head1 MIGRATION

It's not unlikely that having /$/ in your regexes is actually a bug in
your script already, but you don't care because the data won't ever
make it visible.

Therefore, I think it is not desirable to have the Perl5 To Perl6
converter actually change your source code. A warning if /$/ is found in
combination with a bare '/s' modifier, not combined with '/m', is
probably all that is wanted.
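
As a rough illustration of such a check, the sketch below decides whether a given pattern deserves a warning. Everything in it is hypothetical: the function name warn_about_dollar, its two arguments (the pattern's source text and its modifier letters), and the heuristic for spotting /$/ used as an anchor are assumptions made for this example. A real converter would work on Perl's own parse of the regex.

```perl
use strict;
use warnings;

# Hypothetical helper: given a pattern's source text and its modifier
# letters, say whether the converter should warn about it.
sub warn_about_dollar {
    my ($pattern, $modifiers) = @_;

    # Only a bare /s is affected; with /m, /$/ keeps matching before newlines.
    return 0 unless $modifiers =~ /s/ && $modifiers !~ /m/;

    # Rough heuristic: an unescaped '$' at the end of the pattern, or
    # directly before ')' or '|', is taken to be the end-of-string anchor.
    # It does not handle every case, e.g. a '$' preceded by an escaped backslash.
    return $pattern =~ /(?<!\\)\$(?=\z|[)|])/ ? 1 : 0;
}

for my $case (['^foo$', 's'], ['^foo$', 'ms'], ['^foo$', ''], ['^foo\z', 's']) {
    my ($pattern, $modifiers) = @$case;
    printf "/%s/%s: %s\n", $pattern, $modifiers,
        warn_about_dollar($pattern, $modifiers) ? "warn" : "ok";
}
```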

=head1 IMPLEMENTATION

Under '/s', make '$' behave as /\z/ does now.

=head1 REFERENCES

perlre, about '/s' and '/m'

perlvar, section about $*