Re: RFC 332 (v1) Regex: Make /$/ equivalent to /\z/ under the '/s' modifier

2000-09-28 Thread Nathan Wiger

 Is $$ the only alternative, or did I miss more? I don't think I've even
 seen this $$ mentioned before?

$$ is not a suitable alternative. It already means the current process
ID. It really cannot be messed with. And ${$} is identical to $$ by
definition.

 I still like the idea of $$, as I described it in the original thread.
 I've seen no comments for or against at this time.

See above.

 I can't see how yet another alternative, /$$/, is any better than what
 we have now: /\z/.

I agree. If it's more alternatives we're after, just have the person
write a custom regex. The idea is to make Perl do the right thing,
whatever that may be.

The big problem with changing $, as you note, is for people that need to
catch multiple instances in a string:

   $string = "Hello\nGoodbye\nHello\nHello\n";
   $string =~ s/Hello$/Goodbye/gm;

Without $, you can work around this like so:

   $string =~ s/Hello\n/Goodbye\n/gm;

My suggestion would be:

   1. Make $ always match exactly just before the last \n, as the
  RFC suggests.

   2. Introduce some new \X switch that does what $ does
  currently if it's deemed necessary.

We're back to new alternatives again, but the one thing this buys you is
a $ that works consistently. I don't think many people need $'s current
functionality, and those that do can have a new \X.
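
For anyone following along, here is a minimal Perl 5 sketch of the behaviour being argued about. It reuses the sample string from above; the counter variable names are only illustrative and are not part of the RFC.

```perl
use strict;
use warnings;

my $string = "Hello\nGoodbye\nHello\nHello\n";

# Without /m, $ matches at the very end of the string or just before
# a final newline, so only the last "Hello" qualifies.
my $plain = () = $string =~ /Hello$/g;

# With /m, $ also matches just before every embedded newline.
my $multi = () = $string =~ /Hello$/gm;

# \z matches only at the very end; the string ends in "\n", so no match.
my $strict = () = $string =~ /Hello\z/g;

print "plain=$plain multi=$multi strict=$strict\n";
```

The /m count is the case that people catching multiple instances rely on today.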

-Nate



Re: RFC 331 (v1) Consolidate the $1 and C<\1> notations

2000-09-28 Thread Nathan Wiger

 =item *
 C<\1> goes away as a special form
 
 =item *
 $1 means what C<\1> currently means (first match in this regex)
 
 =item *
 ${1} is the same as $1 (first match in this regex)
 
 =item *
 ${P1} means what $1 currently means (first match in last regex)

Here's the big problem with this, and I think others have said it
similarly: If we need the functionality of both \1 and $1, then there is
no reason to redo the syntax. Period.

If \1 is unneeded, then let's ditch it and just use $1 everywhere.
However, this is not the case, as Randal, Bart, and others have shown.

If we need \1, then we should leave it as-is. There's no reason to force
literally millions of people to relearn this. Renaming something just to
rename it does not add value.
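
To illustrate why both forms are still needed, here is a small Perl 5 sketch. The sample strings and variable names are made up for the illustration.

```perl
use strict;
use warnings;

"foo" =~ /(o)/;      # an earlier, unrelated match leaves $1 = "o"
my $s = "abba";

# $1 is interpolated before matching, so this pattern is really /(b)o/.
my $with_dollar = ($s =~ /(b)$1/) ? 1 : 0;

# \1 refers to group 1 of the match in progress, so this looks for "bb".
my $with_backref = ($s =~ /(b)\1/) ? 1 : 0;

print "dollar=$with_dollar backref=$with_backref\n";
```

The first pattern fails and the second succeeds, which is the distinction any renaming would have to preserve.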

-Nate



Re: RFC 170 (v2) Generalize =~ to a special apply-to assignment operator

2000-09-26 Thread Nathan Wiger

Simon Cozens wrote:
 
 Looks great on scalars, but...
 
 @foo =~ shift;   # @foo = $foo[0]  ?
 @foo =~ unshift; # @foo = $foo[-1] ?

Yes, if you wanted to do something that twisted. :-) It probably makes
more sense to do something like these:

   @array =~ reverse;
   @vals =~ sort { $a <=> $b };
   @file =~ grep !/^#/;
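
For comparison, this is roughly what those lines would have to be written as in Perl 5 today, assuming the proposed "@x =~ func" simply means "@x = func @x". The sample data is invented for the sketch.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my @vals  = (10, 9, 100);
my @file  = ("# comment", "code", "#!shebang", "more code");

@array = reverse @array;              # proposed: @array =~ reverse;
@vals  = sort { $a <=> $b } @vals;    # proposed: @vals =~ sort { $a <=> $b };
@file  = grep { !/^#/ } @file;        # proposed: @file =~ grep !/^#/;

print "@array | @vals | @file\n";
```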
 
 Although I have to admit I like:
 
 @foo =~ grep !/\S/;

Exactly!
 
 But I'm not very keen on the idea of
 
 %foo =~ keys;

Again, that depends on whether or not you're Really Evil. ;-)

-Nate



Re: RFC 164 (v2) Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()

2000-09-14 Thread Nathan Wiger

 I'm opposed to an obligation to replace m// and s///. I won't mind the
 ability to give a prototype of "regex" to functions, or even
 *additional* functions, match and subst.

As the RFC basically proposes. The idea is that s///, tr///, and m//
would stay, seemingly unchanged. But they'd actually just be shortcuts
to the new builtins. These new builtins can act on lists, be
prototyped or overridden, and be chained together more easily without
intermediate variables. Basically, they get all the benefits normal
functions get, while still being 100% backwards compatible.

-Nate



Re: What's in a Regex (was RFC 145)

2000-09-07 Thread Nathan Wiger

Mark-Jason Dominus wrote:
 
 Larry said:
 
 # Well, the fact is, I've been thinking about possible ways to get rid
 # of =~ for some time now, so I certainly don't mind brainstorming in
 # this direction.
 
 That is in
 [EMAIL PROTECTED]
 
 which is archived at
 
 http://www.mail-archive.com/perl6-language-regex@perl.org/msg3.html
 
 I think Nathan was exaggerating here, but maybe he knows something I don't.

Yeah, that's the quote I was thinking of.

As for the "die a horrible death" thingy, people that know me will know
that I use this phrase to mean "go away" in a general yet humorous
sense. So it's not meant to put those precise words in Larry's mouth,
but just to say that Larry has voiced the opinion that it might be nice
for =~ to go away.

I doubt anyone on the list has actually suggested anything "die a
horrible death" literally. ;-)

-Nate



XML/HTML-specific ?< and ?> operators? (was Re: RFC 145 (alternate approach))

2000-09-06 Thread Nathan Wiger

 It would be useful (and increasingly more common) to be able to match
 qr|<\s*(\w+)([^>]*)>| to qr|<\s*/\1\s*>|, and handle the case where those
 can nest as well.  Something like
 
 <list>match this with
    <list>
    </list>   not this but
 </list>   this.

I suspect this is going to need a ?[ and ?] of its own. I've been
thinking about this since your email on the subject yesterday, and I
don't see how either RFC 145 or this alternative method could support
it, since there are two tags - < and </ - which are paired
asymmetrically, and neither approach gives any credence to what's
contained inside the tag. So <tag> would be matched itself as
"< matches >".

What if we added special XML/HTML-parsing ?< and ?> operators?
Unfortunately, as Richard notes, ?> is already taken, but I will use it
for the examples to make things symmetrical.

   ?<  =  opening tag (with name specified)
   ?>  =  closing tag (matches based on nesting)

Your example would simply be:

   /(?<list)[\s\w]*(?<list)[\s\w]*(?>)[\s\w]*(?>)/;

What makes me nervous about this is that ?< and ?> seem special-case.
They are, but then again XML and HTML are also pervasive. So a
special-case for something like this might not be any stranger than
having a special-case for sin() and cos() - they're extremely important
operations.

The other thing that this doesn't handle is tags with no closing
counterpart, like:

   <br>

Perhaps for these the easiest thing is to tell people not to use ?<
and ?>:

   /(?<p)[\s*\w](?:<br>)(?>)/;

Would match

   <p>
      Some stuff<br>
   </p>

Finally, tags which take arguments:

   <div align="center">Stuff</div>

Would require some type of "this is optional" syntax:

   /(?<div\s*\w*)Stuff(?>)/

Perhaps only the first word specified is taken as the tag name? This is
the XML/HTML spec anyways.
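
As a rough picture of what "matches based on nesting" would have to do, here is a Perl 5 sketch that tracks open tags on a stack. The function name tags_balanced is hypothetical, and this is plain procedural code, not the proposed regex syntax.

```perl
use strict;
use warnings;

# Returns 1 if every opening tag is closed by a same-named closing tag
# in properly nested order, 0 otherwise.
sub tags_balanced {
    my ($text) = @_;
    my @open;                                  # stack of open tag names
    while ($text =~ m{<(/?)(\w+)[^>]*>}g) {
        my ($closing, $name) = ($1, lc $2);    # only the first word is the name
        if ($closing) {
            return 0 unless @open && pop(@open) eq $name;
        } else {
            push @open, $name;
        }
    }
    return @open ? 0 : 1;
}

print tags_balanced("<list>a<list>b</list>c</list>") ? "ok\n" : "not ok\n";
```

Note that this sketch treats an unclosed tag such as a bare br as an error, which is exactly the case that needs special handling.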

-Nate



Re: RFC 145 (alternate approach)

2000-09-05 Thread Nathan Wiger

I think it's cool too; I don't like the @^g and @^G either. But I worry
about the double-meaning of the []'s in your solution, and the fact that
these:

   /\m[...]...\M/;
   /\d[...]...\D/;

Will work so differently. Maybe another character like ()'s that takes a
list:

   /\m(<,[).*?\M(>,])/;

That solves the multiple characters problem at least. However, we still
have a \M and \m, which isn't consistent if they're going to take
arguments.

But, how about a new ?m operator?

   /(?m<|[).*?(?M>|])/;

Then the ?M matches pairs with the previous ?m, if there was one that
was matched. The | character separates or'ed sets consistent with other
regex patterns.

-Nate


David Corbin wrote:
 
 I never saw one comment on this, and the more I think about it, the more
 I like it. So, I thought I'd throw it back out one more time...(If I get
 no comments this time, I'll be quiet :)
 
 David Corbin wrote:
 
  I haven't given this a WHOLE lot of thought, so please, shoot it full
  of holes.
 
  I certainly like the goal of this RFC, but I dislike the idea that the
  specification for what characters are going to match are specified
  outside of the RE.



Re: RFC 145 (alternate approach)

2000-09-05 Thread Nathan Wiger

Richard Proctor wrote:
 
 No ?] should match the closest ?[ it should nest the ?[s bound by any
 brackets in the regex and act accordingly.

Good point.
 
 Also this does not work as a definition of simple bracket matching as you
 need ( to match ) not ( to match (.  A ?[ list should specify for each
 element what the matching element is perhaps

Actually, it should with some simple precedence rules. If ?] reverses
the ordering of ?[, *and* we define "reversing" for bracketed pairs
consistent with the current Perl definition in other contexts, then this
is all automatic:

   "normal"       "reversed"
   --------       ----------
   103            301
   99aa           aa99
   ((             ))
   +              +
   {{[!_          _!]}}
   {__A1(         )A1__}

That is, when a bracket is encountered, the "reverse" of that is
automatically interpreted as its closing counterpart. This is the same
reason why qq// and qq() and qq{} all work without special notation. 

So we can replace @^g and @^G with simple precedence rules, the same
that are actually invoked automatically throughout Perl already.
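
A minimal sketch of that rule in Perl 5, assuming "reversed" means a character-by-character reversal with bracket characters swapped for their partners. The name closing_for is made up here. A strict reversal like this turns "A1" into "1A", so it does not reproduce the last row of the table above.

```perl
use strict;
use warnings;

# Derive the closing delimiter from an opening one: reverse the string,
# then swap each bracket character for its partner.
sub closing_for {
    my ($open) = @_;
    my $close = reverse $open;           # scalar context reverses the string
    $close =~ tr/([{<)]}>/)]}>([{</;     # flip bracket direction
    return $close;
}

print closing_for("{{[!_"), "\n";
```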

   (?[( => ),{ => }, 01 => 10)
 
 sort of hashish in style.

I actually think this is redundant, for the reasons I mentioned above.
I'm not striking it down outright, but it seems simple rules could make
all this unnecessary. 

-Nate



Re: Overlapping RFCs 135 138 164

2000-08-29 Thread Nathan Wiger

Mark-Jason Dominus wrote:
 
 RFC135: Require explicit m on matches, even with ?? and // as delimiters.

This one is along a different line from these two:

 RFC138: Eliminate =~ operator.
 
 RFC164: Replace =~, !~, m//, and s/// with match() and subst()

Which I could see unifying. I'd ask people to wait until v2 of RFC 164
comes up. It may well include everything from RFC 138 already.

-Nate



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Nathan Wiger

Mark-Jason Dominus wrote:

 I think the reason this hasn't been done before is because it's *not*
 quite straightforward.

Before everyone gets tunnel vision, let me point out one thing:
Accepting variables in tr/// makes no sense. It defeats the purpose of
tr/// - extremely fast, known transliterations.

tr///e is the same as s///g:

tr/$foo/$bar/e  ==  s/$foo/$bar/g

I don't think this RFC accomplishes anything, personally.

-Nate



Re: RFC 165 (v1) Allow Varibles in tr///

2000-08-29 Thread Nathan Wiger

Tom Christiansen wrote:
 
 tr///e is the same as s///g:
 
 tr/$foo/$bar/e  ==  s/$foo/$bar/g
 
 I suggest you read up on tr///, sir.  You are completely wrong.

Yep, sorry. I tried to hit cancel and hit send instead. I'll shut up
now.
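
For the archive, a short Perl 5 sketch of the difference in question. The sample text and variable names are invented; the string eval is the usual workaround for using variables with tr/// today, not something the RFC specifies.

```perl
use strict;
use warnings;

my ($foo, $bar) = ("abc", "xyz");
my $text = "cab abc";

# tr/// maps characters one-for-one: a->x, b->y, c->z everywhere.
(my $translit = $text) =~ tr/abc/xyz/;

# s///g replaces whole occurrences of the pattern "abc".
(my $substituted = $text) =~ s/$foo/$bar/g;

# tr/// does not interpolate, so a string eval builds it at run time.
my $dynamic = $text;
eval "\$dynamic =~ tr/$foo/$bar/; 1" or die $@;

print "$translit | $substituted | $dynamic\n";
```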

-Nate



Re: RFC 112 (v2) Assignment within a regex

2000-08-27 Thread Nathan Wiger

 if (/Time: (..):(..):(..)/) {
     $hours = $1;
     $minutes = $2;
     $seconds = $3;
 }
 
 This then becomes:
 
   /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/
 
 This is more maintainable than counting the brackets and easier to understand
 for a complex regex.  And one does not have to worry about the scope of $1 etc.

This is probably one of the coolest RFCs I've seen so far. :-) 

One question: How are these scoped? Are they lexicals? Global dynamics?
What if you want to change the scoping?

This is the only catch I see. Maybe requiring, under 'use strict':

   my($hours, $minutes, $seconds);
   /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/
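
For comparison, two ways to get named results in Perl 5 without counting brackets by hand. This is a sketch with a made-up input line; the named-capture form needs Perl 5.10 or later and is not the syntax the RFC proposes.

```perl
use strict;
use warnings;

my $line = "Time: 12:34:56";

# List assignment: captures land in lexicals declared up front.
my ($hours, $minutes, $seconds) = $line =~ /Time: (..):(..):(..)/;

# Named captures (Perl 5.10+): results are read from the %+ hash.
my %time;
if ($line =~ /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/) {
    %time = %+;
}

print "$hours:$minutes:$seconds $time{hours}h\n";
```

In both forms the scoping question goes away because the variables are ordinary lexicals.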

Input?

-Nate



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Nathan Wiger

Nathan Torkington wrote:
 
 Hmm.  This is exactly the same situation as with chomp() and somehow
 chomp() can tell the difference between:
 
   $_ = "hi\n";
   chomp;
 
 and
 
   @strings = ();
   chomp @strings;

Good point. I was looking at it from the general "What's wrong with how
@arrays are parsed as arguments?" standpoint, not from a "How can we fix
this specific function?" standpoint.

 But chomp seems to use @ as its indicator.  You can't say:
 
   $_ = $a = "hi\n";
   chomp $_, $a;
 
 If it sees that $, it figures it's chomp SCALAR.
 
 I'm unsure if this is adequate for match, but it might be.

Maybe. Behavior like chomp() is what we're looking for, so on the
surface this seems to work. But people might also want to do:

match /string/, $one, $two, $three;

However, being able to take @ or $;... seems like a possibility. In
fact, chomp not doing this might be a "bug".

 2. I don't think it's even closely tied to this RFC itself.
 
 This is the mindset that worries me: every edge case needs another
 RFC.  Look to what's already in Perl: does anything else behave like
 this?  How does it get around it?  Can we co-opt the way it works?

Fair enough. Again, I was looking at it from a generalist standpoint.

-Nate



New match and subst replacements for =~ and !~ (was Re: RFC 135 (v2) Require explicit m on matches, even with ?? and // as delimiters.)

2000-08-25 Thread Nathan Wiger

[cc'ed to -regex b/c this is related to RFC 138]

Proposed replacements for m// and s///:

match /pattern/flags, $string
subst /pattern/newpattern/flags, $string
 
 The more I look at that, the more I like it. Very consistent with split
 and join. You can now potentially match on @multiple_strings too.

Just to extend this idea, at least for the exercise of it, consider:

   match;                    # all defaults (pattern is /\w+/?)
   match /pat/;              # match $_
   match /pat/, $str;        # match $str
   match /pat/, @strs;       # match any of @strs

   subst;                    # like s///, pretty useless :-)
   subst /pat/new/;          # sub on $_
   subst /pat/new/, $str;    # sub on $str
   subst /pat/new/, @strs;   # return array of modified strings
 
Notice you can drop trailing args and they work just like split. Much
more consistent. This also eliminates "one more oddity", =~ and !~. So
the new syntax would be:

   Perl 5                           Perl 6
   -------------------------------  -----------------------------------
   if ( /\w+/ ) { }                 if ( match ) { }
   if ( $_ !~ /\w+/ ) { }           if ( ! match ) { }             # better
   ($res) = m#^(.*)$#g;             $res = match #^(.*)$#g;

   next if /\s+/ || /\w+/;          next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||       next if match /\s+/, $str or
           ($str =~ /\w+/)                  match /\w+/, $str;
   next unless $str =~ /^N/;        next unless match /^N/, $str;

   $str =~ s/\w+/$bob/gi;           $str = subst /\w+/$bob/gi, $str;
   ($str = $_) =~ s/\d+/func/ge;    $str = subst /\d+/func/ge;     # better
   s/\w+/this/;                     subst /\w+/this/;

   # These are pretty cool...
   foreach (@old) {                 @new = subst /hello/X/gi, @old;
      s/hello/X/gi;
      push @new, $_;
   }

   foreach (@str) {                 print "Got it" if match /\w+/, @str;
      print "Got it" if (/\w+/);
   }

Now, this gives us a cleaner syntax, yes. More consistent, more
sensible, and makes some things easier. But more typing overall, and
relearning for lots of people. If it's more powerful and extensible,
then it's worth it, but this should be a conscious decision.

However, it is worth consideration, in light of RFC 138 and many other
issues. If we did eliminate =~, I think something like this would work
pretty well in its place. If anyone thinks this is an idea worthy of an
RFC (the more I look at it the better it looks, but I'm biased :), let
me know. Although we'd probably need something better than "subst".
Maybe just "m" and "s" still.
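
To make the idea concrete, here is a rough Perl 5 approximation using ordinary subs. Everything here is an assumption for illustration: the subs take a qr// object and a replacement string rather than the proposed /pat/new/flags form, subst always replaces globally, and the replacement is a literal string.

```perl
use strict;
use warnings;

# match(qr/pat/, @strs): true if any string matches; defaults to $_.
sub match {
    my ($re, @strs) = @_;
    @strs = ($_) unless @strs;
    for my $s (@strs) {
        return 1 if $s =~ $re;
    }
    return 0;
}

# subst(qr/pat/, $new, @strs): returns modified copies, leaving the
# originals untouched; defaults to $_.
sub subst {
    my ($re, $new, @strs) = @_;
    @strs = ($_) unless @strs;
    my @out = map { (my $copy = $_) =~ s/$re/$new/g; $copy } @strs;
    return wantarray ? @out : $out[0];
}

my @old = ("Hello world", "hello hello");
my @new = subst(qr/hello/i, "X", @old);
print "Got it\n" if match(qr/\w+/, @new);
```

Because these are plain functions, they can be overridden or chained like any other sub, which is most of the appeal.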

-Nate