Re: RFC 332 (v1) Regex: Make /$/ equivalent to /\z/ under the '/s' modifier
Is $$ the only alternative, or did I miss more? I don't think I've even seen this $$ mentioned before?

$$ is not a suitable alternative. It already means the current process ID. It really cannot be messed with. And ${$} is identical to $$ by definition.

I still like the idea of $$, as I described it in the original thread. I've seen no comments for or against at this time.

See above. I can't see how yet another alternative, /$$/, is any better than what we have now: /\z/.

I agree. If it's more alternatives we're after, just have the person write a custom regex. The idea is to make Perl do the right thing, whatever that may be.

The big problem with changing $, as you note, is for people that need to catch multiple instances in a string:

    $string = "Hello\nGoodbye\nHello\nHello\n";
    $string =~ s/Hello$/Goodbye/gm;

Without $, you can work around this like so:

    $string =~ s/Hello\n/Goodbye\n/gm;

My suggestion would be:

    1. Make $ always match exactly just before the last \n, as the RFC suggests.

    2. Introduce some new \X switch that does what $ does currently, if it's deemed necessary.

We're back to new alternatives again, but the one thing this buys you is a $ that works consistently. I don't think many people need $'s current functionality, and those that do can have a new \X.

-Nate
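For reference, here is a small sketch of how the anchors under discussion behave in Perl 5 today. The variable names are only for illustration; the sample string is the one from the message.

```perl
use strict;
use warnings;

# Perl 5 semantics: without /m, $ matches at the very end of the string
# or just before a final newline; \z matches only at the very end.
my $line = "Hello\n";
my $dollar_matches = ($line =~ /Hello$/)  ? 1 : 0;
my $z_matches      = ($line =~ /Hello\z/) ? 1 : 0;

# With /m, $ also matches before every embedded newline, which is what
# the multi-line substitution from the message relies on.
my $string = "Hello\nGoodbye\nHello\nHello\n";
(my $with_dollar = $string) =~ s/Hello$/Goodbye/gm;

# The workaround from the message: spell out the newline instead of using $.
(my $with_newline = $string) =~ s/Hello\n/Goodbye\n/g;

print "dollar: $dollar_matches, z: $z_matches\n";
print $with_dollar eq $with_newline ? "same result\n" : "different result\n";
```

On this input both substitutions give the same string, which is why the workaround is viable when every line ends in a newline.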
Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations
=item * C<\1> goes away as a special form

=item * $1 means what C<\1> currently means (first match in this regex)

=item * ${1} is the same as $1 (first match in this regex)

=item * ${P1} means what $1 currently means (first match in last regex)

Here's the big problem with this, and I think others have said it similarly: If we need the functionality of both \1 and $1, then there is no reason to redo the syntax. Period.

If \1 is unneeded, then let's ditch it and just use $1 everywhere. However, this is not the case, as Randal, Bart, and others have shown.

If we need \1, then we should leave it as-is. There's no reason to force literally millions of people to relearn this. Renaming something just to rename it does not add value.

-Nate
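To make the distinction concrete, here is a short sketch of the two notations as they work in Perl 5. The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "the the quick brown fox";

# \1 is a backreference: inside the pattern it matches whatever group 1
# captured earlier in the same match attempt.
my $has_double = ($text =~ /\b(\w+) \1\b/) ? 1 : 0;

# $1 is a variable: after a successful match it holds group 1's text.
my $doubled_word = $has_double ? $1 : undef;

# Inside a pattern, $1 is interpolated before matching starts, so it
# refers to the previous successful match, not the current one.
my $again = ("fox the" =~ /\b$1\b/) ? 1 : 0;

print "doubled word: $doubled_word\n";
print "found it again: $again\n";
```

The third match only works because $1 still holds the word captured by the first one, which is the "first match in last regex" meaning the RFC wants to move to ${P1}.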
Re: RFC 170 (v2) Generalize =~ to a special apply-to assignment operator
Simon Cozens wrote:

Looks great on scalars, but...

    @foo =~ shift;      # @foo = $foo[0] ?
    @foo =~ unshift;    # @foo = $foo[-1] ?

Yes, if you wanted to do something that twisted. :-) It probably makes more sense to do something like these:

    @array =~ reverse;
    @vals =~ sort { $a <=> $b };
    @file =~ grep /!^#/;

Although I have to admit I like:

    @foo =~ grep !/\S/;

Exactly! But I'm not very keen on the idea of

    %foo =~ keys;

Again, that depends on whether or not you're Really Evil. ;-)

-Nate
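For comparison, this is roughly what the suggested forms have to be written as in Perl 5, where the array is named on both sides. The sample data is invented, and the sketch assumes the sort block is meant as a numeric comparison and the grep is meant to drop comment lines.

```perl
use strict;
use warnings;

my @array = (3, 1, 2);
my @vals  = (10, 9, 100);
my @file  = ("# comment", "code", "", "# another", "more code");

# Perl 5 spelling of the proposed "@array =~ reverse;"
@array = reverse @array;

# Perl 5 spelling of the proposed numeric "@vals =~ sort { ... };"
@vals = sort { $a <=> $b } @vals;

# Perl 5 spelling of a proposed "@file =~ grep ...;" that drops comment lines
@file = grep { !/^#/ } @file;

print "@array\n@vals\n", scalar(@file), " lines kept\n";
```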
Re: RFC 164 (v2) Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()
I'm opposed to an obligation to replace m// and s///. I won't mind the ability to give a prototype of "regex" to functions, or even *additional* functions, match and subst. As the RFC basically proposes. The idea is that s///, tr///, and m// would stay, seemingly unchanged. But they'd actually just be shortcuts to the new builtins. These new builtins can act on lists, be prototyped/overridden, be more easily chained together without in-betweener variables. Basically, they get all the benefits normal functions get, while still being 100% backwards compatible. -Nate
Re: What's in a Regex (was RFC 145)
Mark-Jason Dominus wrote:

Larry said:

    # Well, the fact is, I've been thinking about possible ways to get rid
    # of =~ for some time now, so I certainly don't mind brainstorming in
    # this direction.

That is in [EMAIL PROTECTED] which is archived at http://www.mail-archive.com/perl6-language-regex@perl.org/msg3.html

I think Nathan was exaggerating here, but maybe he knows something I don't.

Yeah, that's the quote I was thinking of. As for the "die a horrible death" thingy, people that know me will know that I use this phrase to mean "go away" in a general yet humorous sense. So it's not meant to put those precise words in Larry's mouth, but just to say that Larry has voiced his opinion that it might be nice for =~ to go away.

I doubt anyone on the list has actually suggested anything "die a horrible death" literally. ;-)

-Nate
XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))
It would be useful (and increasingly more common) to be able to match qr|<\s*(\w+)([^>]*)>| to qr|<\s*/\1\s*>|, and handle the case where those can nest as well. Something like

    <list>match this with <list> </list> not this but </list> this.

I suspect this is going to need a ?[ and ?] of its own.

I've been thinking about this since your email on the subject yesterday, and I don't see how either RFC 145 or this alternative method could support it, since there are two tags - <> and </> - which are paired asymmetrically, and neither approach gives any credence to what's contained inside the tag. So <tag> would be matched itself as "< matches >".

What if we added special XML/HTML-parsing ?< and ?> operators? Unfortunately, as Richard notes, one of these is already taken, but I will use it for the examples to make things symmetrical.

    ?<  =  opening tag (with name specified)
    ?>  =  closing tag (matches based on nesting)

Your example would simply be:

    /(?<list)[\s\w]*(?<list)[\s\w]*(?>)[\s\w]*(?>)/;

What makes me nervous about this is that ?< and ?> seem special-case. They are, but then again XML and HTML are also pervasive. So a special-case for something like this might not be any stranger than having a special-case for sin() and cos() - they're extremely important operations.

The other thing that this doesn't handle is tags with no closing counterpart, like:

    <br>

Perhaps for these the easiest thing is to tell people not to use ?< and ?>:

    /(?<p)[\s*\w](?:<br>)(?>)/;

Would match

    <p> Some stuff<br> </p>

Finally, tags which take arguments:

    <div align="center">Stuff</div>

Would require some type of "this is optional" syntax:

    /(?<div\s*\w*)Stuff(?>)/

Perhaps only the first word specified is taken as the tag name? This is the XML/HTML spec anyways.

-Nate
Re: RFC 145 (alternate approach)
I think it's cool too, I don't like the @^g and @^G either. But I worry about the double-meaning of the []'s in your solution, and the fact that these:

    /\m[...]...\M/;
    /\d[...]...\D/;

will work so differently. Maybe another character like ()'s that takes a list:

    /\m(,[).*?\M(,])/;

That solves the multiple characters problem at least. However, we still have a \M and \m, which isn't consistent if they're going to take arguments. But, how about a new ?m operator?

    /(?m|[).*?(?M|])/;

Then the ?M matches pairs with the previous ?m, if there was one that was matched. The | character separates or'ed sets consistent with other regex patterns.

-Nate

David Corbin wrote:

I never saw one comment on this, and the more I think about it, the more I like it. So, I thought I'd throw it back out one more time... (If I get no comments this time, I'll be quiet :)

David Corbin wrote:

I haven't given this a WHOLE lot of thought, so please, shoot it full of holes. I certainly like the goal of this RFC, but I dislike the idea that the specification for what characters are going to match is specified outside of the RE.
Re: RFC 145 (alternate approach)
Richard Proctor wrote:

No ?] should match the closest ?[ it should nest the ?[s bound by any brackets in the regex and act accordingly.

Good point.

Also this does not work as a definition of simple bracket matching as you need ( to match ) not ( to match (. A ?[ list should specify for each element what the matching element is perhaps

Actually, it should with some simple precedence rules. If ?] reverses the ordering of ?[, *and* we define "reversing" for bracketed pairs consistent with the current Perl definition in other contexts, then this is all automatic:

    "normal"      "reversed"
    --------      ----------
    103           301
    99a           a99
    ((            ))
    +             +
    {{[!_         _!]}}
    {__A1(        )A1__}

That is, when a bracket is encountered, the "reverse" of that is automatically interpreted as its closing counterpart. This is the same reason why qq// and qq() and qq{} all work without special notation. So we can replace @^g and @^G with simple precedence rules, the same that are actually invoked automatically throughout Perl already.

    (?[( => ),{ => }, 01 => 10)

sort of hashish in style.

I actually think this is redundant, for the reasons I mentioned above. I'm not striking it down outright, but it seems simple rules could make all this unnecessary.

-Nate
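As a point of comparison for the nesting problem, later Perl 5 releases (5.10 and up) can match one kind of balanced bracket with a recursive pattern. The sketch below is not the ?[ / ?] syntax proposed in this thread; it only shows that closest-match nesting can be expressed, here for parentheses, and the sample input is invented.

```perl
use strict;
use warnings;

# Recursive pattern (Perl 5.10+): group 1 matches an opening parenthesis,
# then any mix of non-parenthesis runs or nested copies of group 1,
# then the matching closing parenthesis.
my $balanced = qr/(\((?:[^()]++|(?1))*\))/;

my $input = "call(a, (b + c), d) trailing )";
my ($group) = $input =~ $balanced;

print "matched: $group\n";
```

The stray closing parenthesis at the end of the input is left alone, because the match stops at the parenthesis that pairs with the first opening one.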
Re: Overlapping RFCs 135 138 164
Mark-Jason Dominus wrote: RFC135: Require explicit m on matches, even with ?? and // as delimiters. This one is along a different line from these two: RFC138: Eliminate =~ operator. RFC164: Replace =~, !~, m//, and s/// with match() and subst() Which I could see unifying. I'd ask people to wait until v2 of RFC 164 comes up. It may well include everything from RFC 138 already. -Nate
Re: RFC 165 (v1) Allow Varibles in tr///
Mark-Jason Dominus wrote:

I think the reason this hasn't been done before is because it's *not* quite straightforward.

Before everyone gets tunnel vision, let me point out one thing: Accepting variables in tr/// makes no sense. It defeats the purpose of tr/// - extremely fast, known transliterations. tr///e is the same as s///g:

    tr/$foo/$bar/e == s/$foo/$bar/g

I don't think this RFC accomplishes anything, personally.

-Nate
Re: RFC 165 (v1) Allow Varibles in tr///
Tom Christiansen wrote: tr///e is the same as s///g: tr/$foo/$bar/e == s/$foo/$bar/g I suggest you read up on tr///, sir. You are completely wrong. Yep, sorry. I tried to hit cancel and hit send instead. I'll shut up now. -Nate
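A short sketch of why the two are not the same in Perl 5, and of the usual way to get variable character sets into tr/// today. The sample word and variable names are only for illustration.

```perl
use strict;
use warnings;

# tr/// maps characters one-for-one; s///g replaces whole pattern matches.
(my $by_tr = "cabbage") =~ tr/abc/xyz/;   # a->x, b->y, c->z
(my $by_s  = "cabbage") =~ s/abc/xyz/g;   # no literal "abc", so unchanged

# tr/// builds its table at compile time and does not interpolate,
# so variable character sets need a string eval.
my ($from, $to) = ("a-c", "x-z");
my $by_eval = "cabbage";
eval "\$by_eval =~ tr/$from/$to/; 1" or die $@;

print "$by_tr $by_s $by_eval\n";
```

The string eval recompiles the transliteration each time it runs, which is the cost the RFC is trying to avoid having to pay by hand.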
Re: RFC 112 (v2) Assignment within a regex
    if (/Time: (..):(..):(..)/) {
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }

This then becomes:

    /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

This is more maintainable than counting the brackets and easier to understand for a complex regex. And one does not have to worry about the scope of $1 etc.

This is probably one of the coolest RFC's I've seen so far. :-)

One question: How are these scoped? Are they lexicals? Global dynamics? What if you want to change the scoping? This is the only catch I see. Maybe requiring, under 'use strict':

    my($hours, $minutes, $seconds);
    /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

Input?

-Nate
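For comparison, here are two ways to get close to this in Perl 5 without the proposed (?$var=...) syntax: a list assignment from the match, and named captures, which arrived later in Perl 5.10. The sample line is made up; in both cases the scoping question answers itself because the variables are ordinary lexicals.

```perl
use strict;
use warnings;

my $line = "Time: 12:34:56";

# List assignment: captures land in named lexicals without touching $1..$3.
my ($hours, $minutes, $seconds) = $line =~ /Time: (..):(..):(..)/;

# Named captures (Perl 5.10+): the names live in the pattern itself and
# the values are read back from the %+ hash after a successful match.
my %time;
if ($line =~ /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/) {
    %time = %+;
}

print "$hours:$minutes:$seconds\n";
print join(":", @time{qw(hours minutes seconds)}), "\n";
```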
Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()
Nathan Torkington wrote:

Hmm. This is exactly the same situation as with chomp() and somehow chomp() can tell the difference between:

    $_ = "hi\n";
    chomp;

and

    @strings = ();
    chomp @strings;

Good point. I was looking at it from the general "What's wrong with how @arrays are parsed as arguments?" standpoint, not from a "How can we fix this specific function?" standpoint.

But chomp seems to use @ as its indicator. You can't say:

    $_ = $a = "hi\n";
    chomp $_, $a;

If it sees that $, it figures it's chomp SCALAR. I'm unsure if this is adequate for match, but it might be.

Maybe. Behavior like chomp() is what we're looking for, so on the surface this seems to work. But people might also want to do:

    match /string/, $one, $two, $three;

However, being able to take @ or $;... seems like a possibility. In fact, chomp not doing this might be a "bug".

2. I don't think it's even closely tied to this RFC itself.

This is the mindset that worries me: every edge case needs another RFC. Look to what's already in Perl: does anything else behave like this? How does it get around it? Can we co-opt the way it works?

Fair enough. Again, I was looking at it from a generalist standpoint.

-Nate
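For reference, a small sketch of the three chomp forms in Perl 5: the default target, an array, and a parenthesized list. The parenthesized form is the documented chomp( LIST ) usage; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Default target: with no argument, chomp works on $_.
$_ = "hi\n";
chomp;
my $default = $_;

# Array target: every element is chomped.
my @strings = ("one\n", "two\n");
chomp @strings;

# Parenthesized list: each listed lvalue is chomped.
my ($x, $y) = ("left\n", "right\n");
chomp($x, $y);

print "$default @strings $x $y\n";
```

A match() modelled on this would presumably need the same kind of explicit parentheses when several scalars are passed.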
New match and subst replacements for =~ and !~ (was Re: RFC 135 (v2) Require explicit m on matches, even with ?? and // as delimiters.)
[cc'ed to -regex b/c this is related to RFC 138]

Proposed replacements for m// and s///:

    match /pattern/flags, $string
    subst /pattern/newpattern/flags, $string

The more I look at that, the more I like it. Very consistent with split and join. You can now potentially match on @multiple_strings too.

Just to extend this idea, at least for the exercise of it, consider:

    match;                     # all defaults (pattern is /\w+/?)
    match /pat/;               # match $_
    match /pat/, $str;         # match $str
    match /pat/, @strs;        # match any of @strs

    subst;                     # like s///, pretty useless :-)
    subst /pat/new/;           # sub on $_
    subst /pat/new/, $str;     # sub on $str
    subst /pat/new/, @strs;    # return array of modified strings

Notice you can drop trailing args and they work just like split. Much more consistent. This also eliminates "one more oddity", =~ and !~. So the new syntax would be:

    Perl 5                           Perl 6
    -------------------------------  -----------------------------------
    if ( /\w+/ ) { }                 if ( match ) { }
    if ( $_ !~ /\w+/ ) { }           if ( ! match ) { }   # better
    ($res) = m#^(.*)$#g;             $res = match #^(.*)$#g;
    next if /\s+/ || /\w+/;          next if match /\s+/ or match /\w+/;
    next if ($str =~ /\s+/) ||       next if match /\s+/, $str or
            ($str =~ /\w+/)                  match /\w+/, $str;
    next unless $str =~ /^N/;        next unless match /^N/, $str;

    $str =~ s/\w+/$bob/gi;           $str = subst /\w+/$bob/gi, $str;
    ($str = $_) =~ s/\d+/func/ge;    $str = subst /\d+/func/ge;   # better
    s/\w+/this/;                     subst /\w+/this/;

    # These are pretty cool...
    foreach (@old) {                 @new = subst /hello/X/gi, @old;
       s/hello/X/gi;
       push @new, $_;
    }

    foreach (@str) {                 print "Got it" if match /\w+/, @str;
       print "Got it" if (/\w+/);
    }

Now, this gives us a cleaner syntax, yes. More consistent, more sensical, and makes some things easier. But more typing overall, and relearning for lots of people. If it's more powerful and extensible, then it's worth it, but this should be a conscious decision.

However, it is worth consideration, in light of RFC 138 and many other issues. If we did eliminate =~, I think something like this would work pretty well in its place.

If anyone thinks this is an idea worthy of an RFC (the more I look at it the better it looks, but I'm biased :), let me know. Although we'd probably need something better than "subst". Maybe just "m" and "s" still. -Nate
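To get a feel for the proposal, here is a rough Perl 5 approximation written as ordinary subs. Everything in it is an assumption for illustration: match() and subst() are hypothetical, they take a compiled qr// pattern rather than the bare /pattern/ syntax proposed above, and subst() always substitutes globally with a plain replacement string.

```perl
use strict;
use warnings;

# Hypothetical match(): true if any of the given strings matches the pattern.
# Falls back to $_ when no strings are passed, as in the proposal.
sub match {
    my ($pattern, @strings) = @_;
    @strings = ($_) unless @strings;
    for my $string (@strings) {
        return 1 if $string =~ $pattern;
    }
    return 0;
}

# Hypothetical subst(): returns modified copies and leaves the inputs alone.
# Always substitutes globally; the replacement is a plain string here.
sub subst {
    my ($pattern, $replacement, @strings) = @_;
    @strings = ($_) unless @strings;
    my @changed;
    for my $string (@strings) {
        (my $copy = $string) =~ s/$pattern/$replacement/g;
        push @changed, $copy;
    }
    return wantarray ? @changed : $changed[0];
}

my @old   = ("hello world", "Hello again", "bye");
my @new   = subst(qr/hello/i, "X", @old);
my $found = match(qr/\d+/, "abc", "a1c");

print join("|", @new), "\n";
print "found: $found\n";
```

Because these are plain functions, they can be passed around, overridden, or applied to whole lists, which is the main benefit the proposal is after.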