Re: RFC 158 (v1) Regular Expression Special Variables
"TC" == Tom Christiansen [EMAIL PROTECTED] writes: $`, $ and $' are useful variables which are never used by any experienced Perl hacker since they have well known problems with efficiency. TC That's hardly true. I could show you plenty of code from TC inexperienced Perl hackers like lwall that use them. But TC the cost in understood. :-) those early perl3 scripts by lwall floating around in /etc were poorly written. i am glad they are finally out of the distribution. TC The rest of what you said probably is reasonable, however. TC The (.*?)(blah)(.*) solution kind works sometimes, but is TC hardly pleasant. Likewise the @+ and @- stuff. i would like to see the @+ and @- stuff made to work faster or beterr or something. they have merit but not practicality. another related grabbing issue is grabbing repeated groups like @all_words = /(\w+\s+)+/ ; we only get the last match from that. but that should be a separate rfc. TC There's also long been talk/thought about making $ and $1 TC and friends magic aliases into the original string, which would TC save that cost. but if you modify that string with s/// you lose unless you make a copy. in fact $`, $ and $' should just be aliases if the op was m///. it is the s/// case that is the problem. that brings up the question about how often is $ needed after a s///? it almost makes little sense since you are matching and modifying. maybe we can also remove support for them with s/// and thereby remove the copy penalty. but my idea would work in both cases and puts it under program control so we could just use that. uri -- Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting The Perl Books Page --- http://www.sysarch.com/cgi-bin/perl_books The Best Search Engine on the Net -- http://www.northernlight.com
Re: RFC 158 (v1) Regular Expression Special Variables
> those early perl3 scripts by lwall floating around in /etc were poorly
> written. i am glad they are finally out of the distribution.

Those weren't the scripts I was thinking about, and it is *NOT* ipso facto true that something which uses $& or $` is poorly written.

--tom
Re: RFC 145 (v2) Brace-matching for Perl Regular Expressions
Nat wrote:
> 5.6's regular expressions have (??{ ... }) to permit recursion and
> $^R to maintain state through the parsing.

In another thread, Tomc wrote:
> [...] Likewise the @+ and @- stuff.

Okay, I'm throwing my ignorance out for the whole world to see. WTF??

Sure, I'm not in the loop, as certainly gnat and tomc are, but ... I haven't heard of these features, and can't begin to guess what they mean. I just spent an hour or two cruising the perl web site, and nothing about them did I find. Most especially, no mention of any of them is made in the What's New in 5.6.0, What's new in 5.005, or What's New in 5.004 articles.

What are these things, and where can I learn about them?

ObPerl6: Perhaps some (many?) of the RFCs propose to solve problems that have already been solved, but nobody knows about the solution.

-- 
Eric J. Roode, [EMAIL PROTECTED]           print  scalar  reverse  sort
Senior Software Engineer                'tona ', 'reh', 'ekca', 'lre',
Myxa Corporation                        '.r', 'h ', 'uj', 'p ', 'ts';
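As a rough illustration of the (??{ ... }) construct Nat mentions, the sketch below matches a balanced parenthesised group by having the pattern refer to itself. It follows the general shape of the example in the perlre documentation; the variable names and the input string are assumptions made for illustration.

```perl
use strict;
use warnings;

# The code inside (??{ ... }) runs at match time and its result is used
# as a sub-pattern, so a pattern can refer to itself recursively.
our $balanced;
$balanced = qr/
    \(                        # an opening parenthesis
    (?:
        (?> [^()]+ )          # a run of non-parenthesis characters
      |
        (??{ $balanced })     # or a nested balanced group
    )*
    \)                        # the matching closing parenthesis
/x;

my $input = "call(a, (b + c), d) trailing)";    # illustrative input
my ($group) = $input =~ /($balanced)/;

print "balanced group: $group\n";
```

The atomic group (?> ... ) keeps the match from backtracking badly on unbalanced input. perlre has long described (??{ ... }) as experimental, so treat this as a demonstration rather than a recommended idiom.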
Re: RFC 158 (v1) Regular Expression Special Variables
Tom Christiansen wrote:
> There's also long been talk/thought about making $& and $1
> and friends magic aliases into the original string, which would
> save that cost.

I was distressed to discover that s///g does not rebuild the old string between matches, but only at the end. It broke my random anagram generator which was depending on instant updates.

If STRING was a linked list of partially full blocks rather than a big piece of contiguous space, we could do length-altering substitutions without copying.

-- 
David Nicol 816.235.1187 [EMAIL PROTECTED]
safety first: Republicans for Nader in 2000
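The behaviour described here can be observed from Perl 5 itself. In this small sketch the replacement code records what the target string looks like while s///ge is running, and an explicit loop without /g is shown for contrast. The variable names are illustrative, and this is not the anagram generator mentioned above.

```perl
use strict;
use warnings;

my $str = "abc";
my @seen;    # what $str looked like each time the replacement code ran

# With /e the replacement is code, so it can look at the target string
# while the substitution is still in progress.
$str =~ s/(\w)/push @seen, $str; uc $1/ge;

# Without /g, each s/// completes before the next one starts, so the
# string is rebuilt after every single substitution.
my $loop = "abc";
my @seen_loop;
while ($loop =~ s/([a-z])/uc $1/e) {
    push @seen_loop, $loop;
}

print "during s///g:  @seen\n";
print "explicit loop: @seen_loop\n";
```

The loop form rescans from the start of the string each time, which costs more on long strings but gives the "instant updates" the message was relying on.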
Re: RFC 145 (v2) Brace-matching for Perl Regular Expressions
All in all, though, you're right that neither set of features is particularly well-known/used outside of p5p followers. At least from what I've seen. Virtually every person I've worked with since 5.6 came out has been surprised and amazed at the REx eval stuff.

The completely reworked regex chapter in Camel III explains and demos all the new 5.6 features. I do not believe they will long remain the Cabal's secret.

--tom
Re: RFC 158 (v1) Regular Expression Special Variables
Please correct me if I'm mistaken, but I believe that that's the way they are implemented now. A regex match populates the ->startp and ->endp parts of the regex structure, and the elements of these items are byte offsets into the original string.

I haven't looked at it at all, and perhaps that's something Ilya did when creating @+ etc. So you might be right.

As far as I know it's the same in 5.000. I thought the problem with $& was that the regex engine has to adjust the offsets in the startp/endp arrays every time it scans forward a character or backtracks a character. But maybe the effect of $& is greatly exaggerated or is a relic from perl4? Has anyone actually benchmarked this recently?
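At the Perl level, match offsets are visible through the @- and @+ arrays that came in with 5.6. The sketch below uses them with substr to rebuild the equivalents of $`, $& and $' without mentioning those variables. The input string and variable names are illustrative assumptions, and the sketch says nothing about how the engine stores the offsets internally.

```perl
use strict;
use warnings;

my $text = "foo blah bar";    # illustrative input
my ($pre, $hit, $post, $group);

if ($text =~ /b(la)h/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    $pre  = substr($text, 0, $-[0]);
    $hit  = substr($text, $-[0], $+[0] - $-[0]);
    $post = substr($text, $+[0]);

    # $-[1] and $+[1] do the same for the first capture group.
    $group = substr($text, $-[1], $+[1] - $-[1]);
}

print "pre=[$pre] hit=[$hit] post=[$post] group=[$group]\n";
```

Everything is copied out inside the if block because @- and @+ describe the last successful match in the current dynamic scope.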
New match and subst replacements for =~ and !~ (was Re: RFC 135 (v2) Require explicit m on matches, even with ?? and // as delimiters.)
[cc'ed to -regex b/c this is related to RFC 138]

Proposed replacements for m// and s///:

    match /pattern/flags, $string
    subst /pattern/newpattern/flags, $string

The more I look at that, the more I like it. Very consistent with split and join. You can now potentially match on @multiple_strings too.

Just to extend this idea, at least for the exercise of it, consider:

    match;                      # all defaults (pattern is /\w+/?)
    match /pat/;                # match $_
    match /pat/, $str;          # match $str
    match /pat/, @strs;         # match any of @strs

    subst;                      # like s///, pretty useless :-)
    subst /pat/new/;            # sub on $_
    subst /pat/new/, $str;      # sub on $str
    subst /pat/new/, @strs;     # return array of modified strings

Notice you can drop trailing args and they work just like split. Much more consistent.

This also eliminates "one more oddity", =~ and !~. So the new syntax would be:

    Perl 5                          Perl 6
    ------------------------------  -----------------------------------
    if ( /\w+/ ) { }                if ( match ) { }
    if ( $_ !~ /\w+/ ) { }          if ( ! match ) { }    # better
    ($res) = m#^(.*)$#g;            $res = match #^(.*)$#g;
    next if /\s+/ || /\w+/;         next if match /\s+/ or match /\w+/;
    next if ($str =~ /\s+/) ||      next if match /\s+/, $str or
            ($str =~ /\w+/)                 match /\w+/, $str;
    next unless $str =~ /^N/;       next unless match /^N/, $str;
    $str =~ s/\w+/$bob/gi;          $str = subst /\w+/$bob/gi, $str;
    ($str = $_) =~ s/\d+/func/ge;   $str = subst /\d+/func/ge;    # better
    s/\w+/this/;                    subst /\w+/this/;

    # These are pretty cool...
    foreach (@old) {                @new = subst /hello/X/gi, @old;
       s/hello/X/gi;
       push @new, $_;
    }

    foreach (@str) {                print "Got it" if match /\w+/, @str;
       print "Got it" if (/\w+/);
    }

Now, this gives us a cleaner syntax, yes. More consistent, more sensical, and makes some things easier. But more typing overall, and relearning for lots of people. If it's more powerful and extensible, then it's worth it, but this should be a conscious decision.

However, it is worth consideration, in light of RFC 138 and many other issues. If we did eliminate =~, I think something like this would work pretty well in its place.
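To get a feel for the semantics without any new syntax, the idea can be approximated in Perl 5 with ordinary subs. This is only a sketch under several assumptions: `match` and `subst` here are hypothetical helper subs and not the proposed built-ins, the pattern is passed as a qr// object, the replacement is a plain string, and substitution is always global.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any of the given strings matches $re.
# With no strings given it falls back to $_, as the proposal suggests.
sub match {
    my ($re, @strs) = @_;
    @strs = ($_) unless @strs;
    for my $s (@strs) {
        return 1 if $s =~ $re;
    }
    return 0;
}

# Hypothetical helper: returns modified copies and leaves the inputs
# untouched. $new is used as a literal replacement string in this sketch.
sub subst {
    my ($re, $new, @strs) = @_;
    @strs = ($_) unless @strs;
    my @out;
    for my $s (@strs) {
        (my $copy = $s) =~ s/$re/$new/g;
        push @out, $copy;
    }
    return wantarray ? @out : $out[0];
}

my @old = ("hello world", "Hello again", "bye");    # illustrative data
my @new = subst(qr/hello/i, "X", @old);
my $any = match(qr/\bagain\b/, @old);

print join(" | ", @new), "\n";
print "matched one of the strings\n" if $any;
```

One design question the sketch makes visible: returning copies means `subst` no longer modifies its argument in place, which is why the proposal's examples assign the result back to `$str`.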
If anyone thinks this is an idea worthy of an RFC (the more I look at it the better it looks, but I'm biased :), let me know. Although we'd probably need something better than "subst". Maybe just "m" and "s" still. -Nate