Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()
Nathan Torkington wrote:
>
> Hmm.  This is exactly the same situation as with chomp() and somehow
> chomp() can tell the difference between:
>
>   $_ = "hi\n";
>   chomp;
>
> and
>
>   @strings = ();
>   chomp @strings;

Good point. I was looking at it from the general "What's wrong with how
@arrays are parsed as arguments?" standpoint, not from a "How can we fix
this specific function?" standpoint.

> But chomp seems to use @ as its indicator.  You can't say:
>
>   $_ = $a = "hi\n";
>   chomp $_, $a;
>
> If it sees that $, it figures it's chomp SCALAR.
>
> I'm unsure if this is adequate for match, but it might be.

Maybe. Behavior like chomp() is what we're looking for, so on the surface
this seems to work. But people might also want to do:

   match /string/, $one, $two, $three;

However, being able to take @ or $;... seems like a possibility. In fact,
chomp not doing this might be a "bug".

> >2. I don't think it's even closely tied to this RFC itself.
>
> This is the mindset that worries me: every edge case needs another
> RFC.  Look to what's already in Perl: does anything else behave like
> this?  How does it get around it?  Can we co-opt the way it works?

Fair enough. Again, I was looking at it from a generalist standpoint.

-Nate
Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()
Nathan Wiger writes:
> Honestly, not sure. Although, there are two things I'd say about it:
>
>    1. I don't think it's a showstopper for this RFC, since the
>       feature you are addressing is actually a new piece of
>       functionality.

Hmm.  This is exactly the same situation as with chomp() and somehow
chomp() can tell the difference between:

  $_ = "hi\n";
  chomp;

and

  @strings = ();
  chomp @strings;

But chomp seems to use @ as its indicator.  You can't say:

  $_ = $a = "hi\n";
  chomp $_, $a;

If it sees that $, it figures it's chomp SCALAR.

I'm unsure if this is adequate for match, but it might be.

>    2. I don't think it's even closely tied to this RFC itself.

This is the mindset that worries me: every edge case needs another
RFC.  Look to what's already in Perl: does anything else behave like
this?  How does it get around it?  Can we co-opt the way it works?

Nat
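For user-defined subs, Perl 5 prototypes are one existing mechanism that can tell an empty array from no arguments at all. The sketch below is an illustration only: `which_target` is a hypothetical name, not a proposed builtin, and it assumes a Perl new enough (5.8+) to support the `\[$@]` prototype form.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any RFC. The prototype ;\[$@] makes
# Perl pass a *reference* to a single scalar or array argument, so an
# empty array still arrives as one argument, while a bare call passes
# none at all.
sub which_target (;\[$@]) {
    return 'none: fall back to $_' unless @_;
    my ($ref) = @_;
    return ref($ref) eq 'ARRAY'
        ? 'array of ' . scalar(@$ref)
        : "scalar '$$ref'";
}

my @strs = ();
my $str  = 'hi';
print which_target(@strs), "\n";   # array of 0
print which_target($str), "\n";    # scalar 'hi'
print which_target(), "\n";        # none: fall back to $_
```

The catch is that such a prototype accepts exactly one scalar or array, so it cannot also take a list like `$one, $two, $three`.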
Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()
Nathan Torkington wrote:
>
> When I was thinking about this very topic yesterday and today, I
> came up with this problem:
>
>   @strs = ();
>   match /pat/, @strs;    # surprise!  I'm matching on $_
>
> That is, how do you tell an empty array from no arguments?

Easy: We'll just use lazy evaluation or some other magic. *snicker* :-)

Honestly, not sure. Although, there are two things I'd say about it:

   1. I don't think it's a showstopper for this RFC, since the
      feature you are addressing is actually a new piece of
      functionality.

   2. I don't think it's even closely tied to this RFC itself.

Not being able to tell an empty @array apart from no arguments is a
significant problem right now in Perl. I've always viewed it as such.
It would be really nice if we were able to tell we got a null @array
argument somehow, but I'm not sure how. Sounds like an RFC... ;-)

-Nate
Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()
Perl6 RFC Librarian writes:
>    match;                  # all defaults (pattern is /\w+/?)
>    match /pat/;            # match $_
>    match /pat/, $str;      # match $str
>    match /pat/, @strs;     # match any of @strs

When I was thinking about this very topic yesterday and today, I
came up with this problem:

  @strs = ();
  match /pat/, @strs;    # surprise!  I'm matching on $_

That is, how do you tell an empty array from no arguments?

I have a horrible suspicion everyone is going to reach for lazy
evaluation and other magic.

Nat
Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()
> =head1 TITLE
>
> Replace =~, !~, m//, and s/// with match() and subst()

In a marked oversight, I'd also like to note that tr// would be
replaced with trade:

   Perl 5                      Perl 6
   --------------------------  ---------------------------
   $str =~ tr/a/b/;            $new = trade /a/b/, $str;
   tr/a/b/;                    trade /a/b/;

This will be reflected in v2. However, it should be fairly obvious how
this fits in with the others.

I know 'tr' is really 'translate', but that's too long and it looks
like 'trans' is going to be taken up by Transactional Variables (RFC
130). 'trade' connotes what is happening pretty accurately, I think.

-Nate
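For comparison, here is a sketch of how Perl 5 gets the "return a new string, leave $str alone" behaviour that trade is meant to have. The copy-then-translate form works in any Perl 5; the `/r` modifier is a later addition (Perl 5.14 and up) and is shown only as a point of reference, not as part of this proposal.

```perl
use strict;
use warnings;

my $str = 'banana';

# Perl 5 idiom: copy first, then translate the copy; $str is untouched.
(my $new = $str) =~ tr/a/b/;

# Perl 5.14 and later: /r returns the translated copy directly,
# which is close to the proposed "$new = trade /a/b/, $str".
my $new_r = $str =~ tr/a/b/r;

print "$str $new $new_r\n";   # banana bbnbnb bbnbnb
```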
Re: RFC 110 (v2) counting matches
On 27 Aug 2000 19:01:45 -, Perl6 RFC Librarian wrote:

>m//g just returns 1 for matching.

Er... but in a scalar context, m//g DOES only match once! If you want
more, repeat the match. Or use it in a list context, then it will try
to match them all.

    $_ = "abaabbbababbbabbaaa";
    while(/(b+)/g) {
        print "Got a '$1'\n";
    }
-->
    Got a 'b'
    Got a 'bbb'
    Got a 'b'
    Got a 'bbb'
    Got a 'bb'

Let's try again:

    $_ = "abaabbbababbbabbaaa";
    print scalar(() = /b+/g);
-->
    5

Is that what you're after?

-- 
	Bart.
RFC 166 (v1) Additions to regexs
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Additions to regexs

=head1 VERSION

  Maintainer: Richard Proctor <[EMAIL PROTECTED]>
  Date: 27 Aug 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 1
  Number: 166

=head1 ABSTRACT

This is a set of minor enhancements to regexes that I thought up on a
long plane ride. All of these can be done now; I would just like to
make them easier.

=head1 DESCRIPTION

These are a set of minor enhancements to regexes; they are largely
independent.

=head2 Alternative Lists from arrays

(?@foo) is roughly equivalent to (??{join('|',@foo)}), i.e. it expands
into a list of alternatives. One could possibly use just @foo for this.

If @foo contained special characters you might want to \Quote each
item.

(?Q@foo) is roughly equivalent to (??{join('|', map quotemeta, @foo)})

=head2 Matching Not a pattern

(?^pattern) matches anything that does not match the pattern. On its
own, one can use !~ etc. to negatively match patterns, but matching a
pattern such as foo(anything but not baz)bar is currently difficult.
With this syntax it would simply be /foo(?^baz)bar/.

=head2 A disambiguator

(?) is a null element in a pattern that can be used to split elements
that might otherwise be confused. It has no other effect; it matches
nothing.

If you have a variable $foo, then matching $foobar would look for the
variable $foobar, when you actually meant to look for $foo then "bar".
This allows the user to simply write $foo(?)bar. (Yes, I know this can
be written other ways, but this is a simple example.)

=head1 IMPLEMENTATION

No idea

=head1 REFERENCES

None yet
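Since the RFC notes that all three features can already be done in Perl 5, here is a sketch of one common Perl 5 spelling of each. The `(?!baz)` loop interprets "anything but baz" as "a run of characters that does not contain baz", which is an assumption: the RFC's intended semantics for (?^pattern) may differ.

```perl
use strict;
use warnings;

# 1. Alternatives from an array, quoting each item (what (?Q@foo) would do).
my @foo = ('a.b', 'c+d');
my $alt = join '|', map quotemeta, @foo;            # 'a\.b|c\+d'
my $is_alt = ('xx c+d yy' =~ /(?:$alt)/) ? 1 : 0;   # 'c+d' matched literally

# 2. "foo, then anything not containing baz, then bar", using a negative
#    lookahead before each character (an approximation of foo(?^baz)bar).
my $no_baz = qr/foo(?:(?!baz).)*bar/;
my $ok  = ('foo123bar' =~ $no_baz) ? 1 : 0;
my $bad = ('foobazbar' =~ $no_baz) ? 1 : 0;

# 3. Disambiguating $word followed by a literal "bar" (what $foo(?)bar
#    would do): braces end the variable name inside the pattern.
my $word = 'foo';
my $dis = ('foobar' =~ /${word}bar/) ? 1 : 0;

print "$is_alt $ok $bad $dis\n";   # 1 1 0 1
```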
RFC 165 (v1) Allow Variables in tr///
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Allow Variables in tr///

=head1 VERSION

  Maintainer: Richard Proctor <[EMAIL PROTECTED]>
  Date: 27 Aug 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 1
  Number: 165

=head1 ABSTRACT

Allow variables in a tr///. At present the only way to do a
tr/$foo/$bar/ is to wrap it up in an eval. I don't like using evals
for this sort of thing.

=head1 DESCRIPTION

Suggested syntax: tr/$foo/$bar/e

With a /e, tr will expand both the LHS and RHS of the translate
function. Either or both could be variables. I am suggesting /e as it
is sort of like the /e in s///e.

=head1 IMPLEMENTATION

No idea, but it should be straightforward.

=head1 REFERENCES

None yet.
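For reference, a minimal sketch of the eval workaround the RFC wants to avoid. `tr_vars` is a hypothetical helper name; quoting the inputs with quotemeta is one conservative choice, with the side effect that ranges such as a-z are taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: build the tr/// at run time with a string eval,
# which is the Perl 5 workaround the RFC would like to avoid.
# quotemeta stops '/' or '\' in the inputs from breaking the generated
# code; it also escapes '-', so 'a-z' is three literal characters, not a range.
sub tr_vars {
    my ($str, $from, $to) = @_;
    my ($f, $t) = map quotemeta, $from, $to;
    eval "\$str =~ tr/$f/$t/; 1" or die $@;
    return $str;
}

print tr_vars('hello world', 'lo', 'LO'), "\n";   # heLLO wOrLd
```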
RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Replace =~, !~, m//, and s/// with match() and subst()

=head1 VERSION

  Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
  Date: 27 Aug 2000
  Version: 1
  Mailing List: [EMAIL PROTECTED]
  Number: 164

=head1 ABSTRACT

Several people (including Larry) have expressed a desire to get rid of
C<=~> and C<!~>. This RFC proposes a way to replace C<m//> and C<s///>
with two new builtins, C<match()> and C<subst()>.

=head1 DESCRIPTION

=head2 Overview

Everyone knows how C<=~> and C<!~> work. Several proposals, such as
RFCs 135 and 138, attempt to fix some stuff with the current
pattern-matching syntax. Most proposals center around minor
modifications to C<m//> and C<s///>.

This RFC proposes that C<m//> and C<s///> be dropped from the language
altogether, and instead be replaced with new C<match()> and C<subst()>
builtins, with the following syntaxes:

   $res = match /pattern/flags, $string
   $new = subst /pattern/newpattern/flags, $string

These subs are designed to mirror the format of C, making them more
consistent. Unlike the current forms, these return the modified string,
leaving C<$string> alone. (Unless they are called in a void context, in
which case they act on and modify C<$_>, consistent with current
behavior.)

Extra arguments can be dropped, consistent with C and many other
builtins:

   match;                   # all defaults (pattern is /\w+/?)
   match /pat/;             # match $_
   match /pat/, $str;       # match $str
   match /pat/, @strs;      # match any of @strs

   subst;                   # like s///, pretty useless :-)
   subst /pat/new/;         # sub on $_
   subst /pat/new/, $str;   # sub on $str
   subst /pat/new/, @strs;  # return array of modified strings

These new builtins eliminate the need for C<=~> and C<!~> altogether,
since they are functions just like C, C, C, and so on.

Sometimes examples are easiest, so here are some examples of the new
syntax:

   Perl 5                           Perl 6
   -------------------------------  -----------------------------------
   if ( /\w+/ ) { }                 if ( match ) { }
   die "Bad!" if ( $_ !~ /\w+/ );   die "Bad!" if ( ! match );
   ($res) = m#^(.*)$#g;             ($res) = match #^(.*)$#g;
   next if /\s+/ || /\w+/;          next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||       next if match /\s+/, $str or
           ($str =~ /\w+/)                  match /\w+/, $str;
   next unless $str =~ /^N/;        next unless match /^N/, $str;
   $str =~ s/\w+/$bob/gi;           $str = subst /\w+/$bob/gi, $str;
   ($str = $_) =~ s/\d+/&func/ge;   $str = subst /\d+/&func/ge;
   s/\w+/this/;                     subst /\w+/this/;

   # These are pretty cool...
   foreach (@old) {                 @new = subst /hello/X/gi, @old;
      s/hello/X/gi;
      push @new, $_;
   }

   foreach (@str) {                 print "Got it" if match /\w+/, @str;
      print "Got it" if (/\w+/);
   }

This gives us a cleaner, more consistent syntax. In addition, it makes
several things easier and is more easily extensible:

   &callsomesub(subst(/old/new/gi, $mystr));
   $str = subst /old/new/i, $r->getsomeval;

and is easier to read English-wise. However, it requires a little too
much typing. See below.

=head2 Concerns

This should be carefully considered. It's good because it replaces
"yet another oddity" with a more standard syntax that I would argue is
more powerful and consistent.

However, it also causes everyone to relearn how to match and substitute
patterns. This must be a careful, conscious decision, lest we really
screw stuff up.

That being said, since my initial post I have received several personal
emails endorsing this, hence the reason I decided to RFC it. So it's an
option; it just has to be powerful enough for people to see the "big
win".

Finally, it still requires a little too much typing for my tastes.
Perhaps we should make "m" and "s" at least shortcuts to the names,
possibly allowing users to bind them to the front of the pattern
(similar to some of RFC 138's suggestions). Maybe these two could be
equivalent:

   $new = subst /old/new/i, $old;    ==    $new = s/old/new/i, $old;

And then it doesn't look that radical anymore. This is similar to RFC
138, only C<$old> is not modified.

=head1 IMPLEMENTATION

Hold your horses

=head1 MIGRATION

This would be huge.
Every pattern match would have to be translated, every Perl hacker would
have to relearn patterns, and every Perl 5 book's regexp section would be
instantly out of date. Like I said, this is not a simple decision. But if
there are obvious increases in power, I think people will appreciate the
change, not dread it. At the very least it makes Perl much more
consistent.

=head1 REFERENCES

This is a synthesis of several ideas from myself, Ed Mills, and Tom C

RFC 138: Eliminate =~ operator.

RFC 135: Require explicit m on matches, even with ?? and // as delimiters.
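To make the proposed semantics concrete, here is a rough Perl 5 emulation. Everything in it is an assumption: `match_any` and `subst_copy` are hypothetical names, the pattern is passed as a qr// object (flags such as /i go there), the replacement is a code ref, and /g is always applied. Note that the `$_` fallback in `match_any` reproduces exactly the empty-array ambiguity raised on the list for this RFC.

```perl
use strict;
use warnings;

# Hypothetical stand-in for match(): true if any string matches.
# With no strings it falls back to $_ (so an empty array does too).
sub match_any {
    my ($re, @strs) = @_;
    @strs = ($_) unless @strs;
    for my $s (@strs) { return 1 if $s =~ $re }
    return 0;
}

# Hypothetical stand-in for subst(): returns modified copies and leaves
# the arguments alone. $repl is a code ref evaluated for each match (/ge).
sub subst_copy {
    my ($re, $repl, @strs) = @_;
    my @out = map { my $c = $_; $c =~ s/$re/$repl->()/ge; $c } @strs;
    return wantarray ? @out : $out[0];
}

my @old = ('hello world', 'Hello there');
my @new = subst_copy(qr/hello/i, sub { 'X' }, @old);
print "@new\n";                         # X world X there
print match_any(qr/\w+/, @old), "\n";   # 1
```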
RFC 144 (v2) Behavior of empty regex should be simple
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Behavior of empty regex should be simple

=head1 VERSION

  Maintainer: Mark Dominus <[EMAIL PROTECTED]>
  Date: 24 August 2000
  Last Modified: 27 August 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 144

=head1 ABSTRACT

=head2 Standard Documentation

According to L:

=over 4

=item m/PATTERN/cgimosx

=item /PATTERN/cgimosx

If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead.

=back

This behavior should be changed. If the PATTERN is empty, Perl should
look for the empty string. (That is, if the PATTERN is empty, it should
always match.)

=head1 DESCRIPTION

Literal empty patterns, such as:

        $s =~ // ;

are not the problem here. The real problem is that the special case is
invoked for interpolated patterns also. For example,

        chomp($pat = );
        $s =~ /\Q$pat\E/;

looks to see if $pat is a substring of $s, unless $pat is empty, in
which case it matches $s against the last regex that was matched
successfully. That regex might be far away, in some other module. If
the far-away regex happened to contain backreference groups, the
backreference variables will be set accordingly.

To make this safe in Perl 5, the programmer has to write something
peculiar like

        $s =~ /(?=)\Q$pat\E/;

to ensure that the regex, after interpolation, is never empty.

I propose that this 'last successful match' behavior be discarded
entirely, and that an empty pattern always match the empty string.

=head1 RATIONALE

=head2 The Feature Was Not Useful, I

The special behavior for empty patterns has never been particularly
useful. For example, you could imagine code like this:

        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ //) {
            # do something
          }
        }

This would be more efficient than the equivalent

        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ /$pat/) {
            # do something
          }
        }

because $pat would be compiled only once per loop instead of twice.
It is now more straightforward and efficient to do this sort of thing
explicitly with the qr// operator:

        @patterns = map qr/$_/, @patterns;
        for $pat (@patterns) {
          if ($a =~ /$pat/ && $b =~ /$pat/) {
            # do something
          }
        }

=head2 The Feature Was Not Useful, II

People sometimes propose the following use for the empty pattern
special case: They have a pattern, and many strings, and they want to
see if every string matches the pattern. This code works, but is
inefficient:

        sub match_all {
          my $pat = shift;
          for (@_) {
            return 0 unless /$pat/;
          }
          return 1;
        }

This is because C must be recompiled for each string, or checked to
see whether recompilation is necessary. This code does not work:

        sub match_all {
          my $pat = shift;
          for (@_) {
            return 0 unless /$pat/o;
          }
          return 1;
        }

because C<$pat> changes with each call. One solution is to use 'eval'
here to generate the pattern matching code (with C) at run time.

People have sometimes tried to use C here, but usually without
success. The idea is:

        sub match_all {
          my $pat = shift;
          # load $pat into 'last successfully matched' space
          for (@_) {
            return 0 unless //;
          }
          return 1;
        }

The problem here is that there is no way to designate $pat as the last
successfully matched regex without actually finding a string that
matches it. In the past people attempting this strategy have appeared
in C asking how to find a string that matches a given regex. As far as
I know, no useful solutions have been offered. (In fact, there may not
be any such string. Consider the pattern C for example.)

A better, simpler solution to this problem is to use the C<qr//>
operator:

        sub match_all {
          my $pat = shift;
          $pat = qr($pat);
          for (@_) {
            return 0 unless /$pat/;
          }
          return 1;
        }

=head2 This feature has resulted in bugs

Any code that contains the innocent-looking

        if (/\Q$string\E/) { ... }

is potentially booby-trapped. Such code is common. An example of this
type appears in L.
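A small, self-contained demonstration of the booby trap under current Perl 5 semantics, and of the `(?=)` guard the RFC mentions. The strings are arbitrary; the only assumption is that the earlier /xyz/ match is the last successful one in the same scope.

```perl
use strict;
use warnings;

# Make /xyz/ the last successfully matched pattern in this scope.
my $warm = 'xyz';
$warm =~ /xyz/ or die 'setup failed';

my $pat = '';        # e.g. an empty line of user input
my $s   = 'hello';

# Booby-trapped: the empty interpolated pattern silently re-uses /xyz/,
# so this fails even though '' is a substring of 'hello'.
my $naive   = ($s =~ /\Q$pat\E/)     ? 1 : 0;

# The RFC's Perl 5 workaround: (?=) keeps the pattern from being empty.
my $guarded = ($s =~ /(?=)\Q$pat\E/) ? 1 : 0;

print "$naive $guarded\n";   # 0 1
```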
=head1 Alternatives

Rather than eliminating the special case entirely, alternative changes
are sometimes proposed.

=head2 Empty pattern to mean 'last match' instead of 'last successful match'

This behavior would be more useful than the current behavior and is
sometimes proposed as an alternative. For example, the application
discussed in the section 'The feature was not useful, II' above would
be feasible if the empty pattern matched the last-matched pattern,
because it would no longer be necessary to manufacture a matching stri
Re: RFC 112 (v2) Assignment within a regex
>if (/Time: (..):(..):(..)/) {
>          $hours = $1;
>          $minutes = $2;
>          $seconds = $3;
>      }
>
> This then becomes:
>
>     /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/
>
> This is more maintainable than counting the brackets and easier to
> understand for a complex regex. And one does not have to worry about
> the scope of $1 etc.

This is probably one of the coolest RFCs I've seen so far. :-)

One question: How are these scoped? Are they lexicals? Global dynamics?
What if you want to change the scoping? This is the only catch I see.
Maybe requiring, under 'use strict':

   my($hours, $minutes, $seconds);
   /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

Input?

-Nate
RFC 110 (v2) counting matches
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

counting matches

=head1 VERSION

  Maintainer: Richard Proctor <[EMAIL PROTECTED]>
  Date: 16 Aug 2000
  Last Modified: 27 Aug 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 110

=head1 ABSTRACT

Provide a simple way of giving a count of matches of a pattern.

=head1 CHANGES

Version 2 of this RFC redirects discussion of this topic to
[EMAIL PROTECTED]

=head1 DESCRIPTION

Have you ever wanted to count the number of matches of a pattern?
s///g returns the number of matches it finds. m//g just returns 1 for
matching. Counts can be made using s//$&/g, but this is wasteful, or by
putting some counting loop around a m//g. But this all seems rather
messy.

m//gt would be defined to do the match and return the count of
matches. This leaves all existing uses consistent and unaffected. /t is
suggested for "counT", as /c is already taken.

Using /t without /g would result in only 0 or 1 being returned, which
is nearly the existing syntax.

(Note I am only on the announce list at present as I am suffering from
negative free time.)

=head1 IMPLEMENTATION

No idea

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the
noise...
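For comparison, a sketch of the three Perl 5 counting approaches the RFC calls messy, run on the same sample string used in the follow-up on this thread. The substitution variant uses a capture and $1 rather than $&; that is a choice made here, not something the RFC specifies.

```perl
use strict;
use warnings;

my $str = 'abaabbbababbbabbaaa';

# 1. s///g returns the number of substitutions; replacing each match
#    with itself counts without changing the (copied) string.
my $copy = $str;
my $by_subst = ($copy =~ s/(b+)/$1/g);

# 2. A counting loop around m//g in scalar context.
my $by_loop = 0;
$by_loop++ while $str =~ /b+/g;

# 3. A list assignment in scalar context yields the number of matches.
my $by_list = () = $str =~ /b+/g;

print "$by_subst $by_loop $by_list\n";   # 5 5 5
```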
RFC 112 (v2) Assignment within a regex
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Assignment within a regex

=head1 VERSION

  Maintainer: Richard Proctor <[EMAIL PROTECTED]>
  Date: 16 Aug 2000
  Date: 27 Aug 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 112

=head1 ABSTRACT

Provide a simple way of naming and picking out information from a regex
without having to count the brackets.

=head1 CHANGES

Version 2 of this RFC redirects discussion of this topic to
[EMAIL PROTECTED]

=head1 DESCRIPTION

If a regex is complex, counting the bracketed sub-expressions to find
the ones you wish to pick out can be messy. It is also prone to
maintainability problems if and when you wish to add to the expression.
(?:) can be used to suppress capturing brackets; that helps, but it
still gets "complex". I would sometimes rather just pick out the bits I
want within the regex itself.

Suggested syntax: (?$foo= ... ) would assign the string that is matched
by the pattern ... to $foo when the pattern matches. These assignments
would be made left to right after the match has succeeded but before
processing a replacement or other results. There may be whitespace
between the $foo and the "=".

This would not give the backrefs \1 etc. that come with conventional
bracketed sub-expressions; I don't think this would be a problem.

Potentially the $foo could be any scalar LHS, as in (?$foo{$bar}= ... )!,
and likewise the '=' could be any assignment operator.

The camel and the docs include this example:

      if (/Time: (..):(..):(..)/) {
          $hours = $1;
          $minutes = $2;
          $seconds = $3;
      }

This then becomes:

      /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

This is more maintainable than counting the brackets and easier to
understand for a complex regex. And one does not have to worry about
the scope of $1 etc.

(Note I am only on the announce list at present as I am suffering from
negative free time.)
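For comparison, two Perl 5 spellings of the Time example. A list assignment avoids touching $1 and, under use strict, puts the fields straight into lexicals, which is one possible answer to the scoping question raised in the follow-up. Named captures with %+ came later (Perl 5.10) and are shown only as a reference point, not as part of this RFC.

```perl
use strict;
use warnings;

$_ = 'Time: 12:34:56';

# Perl 5: a list assignment from the match fills lexicals by position.
my ($hours, $minutes, $seconds) = /Time: (..):(..):(..)/;

# Perl 5.10 and later: named captures, read back through %+.
my %t;
if (/Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/) {
    %t = %+;    # copy, since %+ reflects only the last successful match
}

print "$hours $minutes $seconds / $t{hours} $t{minutes} $t{seconds}\n";
# 12 34 56 / 12 34 56
```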
=head1 IMPLEMENTATION

No idea

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the
noise...