Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-28 Thread Jonathan Scott Duff

On Mon, Aug 28, 2000 at 08:12:22AM -0700, Nathan Wiger wrote:
> Jonathan Scott Duff wrote:
> > 
> > I think that C<subst> is so syntactically close to, yet semantically far
> > from, C<substr> that the evil demons of confusion will rear their ugly
> > heads.
> 
> I agree too, any suggestions are welcome. The fact that 'sub' and
> 'substr' are already taken makes this tough...

Well, if s/// stays around, I imagine that's what people will use, so
we could call the function form C<substitute>.  Only those weirdos
using the function form would have to pay the syntactic penalty.  ;-)

> > Given the above, why not make a bare C<subst> do something equally
> > useful?  Here are some ideas:
> > 
> > subst;  # removes leading and trailing whitespace
> 
> I like this one a lot.

Me too.  I put down the others to give people brain-food mostly  :-)

But again, this doesn't seem to make much sense in what I would think
would be its common use (using the spelled out version):

    while (<>) {
        substitute; # What the hell am I substituting?
        ...
    }

Similarly with match:

    while (<>) {
        next unless match;  # Er, match *what*?
        ...
    }

Both leave me hanging.  I can't read Perl in English like I'm used to.


> > I wonder what happens when people start typing
> > 
> > $new = subst s/old/new/i, $old;
> 
> They get a syntax error! :-)
> 
> Honestly, I don't think that's a big problem. People don't do this with
> split() now. I think people will either use the "backwards compat" s///
> form or the function form.

But they might *accidentally* use both.  I'd prefer that Perl ... you
guessed it ... DWIM here.  I.e., 

$new = substitute s/old/new/i, $old;

would be equivalent to

$new = substitute /old/new/i, $old;

With a warning if they're turned on.  Same for match.  

Hmm.  Does using the function form still give the ability to pick
delimiters?  And what does *this* mean:

@stuff = split match /foo/, $string;

?

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-28 Thread Tom Christiansen

> Simple solution.
> 
> If you want to require formats such as m/.../ (which I actually think is a
> good idea), then make it part of -w, -W, -ww, or -WW, which would be a perl6
> enhancement of strictness.

That's like having "use strict" enable mandatory perlstyle compliance
checks, and rejecting the program otherwise.  Doesn't seem sensible.

--tom



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-28 Thread Nathan Torkington

Michael Maraist writes:
> Compatibility is going to have to be maintained somehow.  And we can either
> have some sort of perl6 designator (such as the pragma) to designate
> incompatible (and otherwise ambiguous) code, or we're going to have to
> continue tacking on syntactic sugar to legacy code.

The compatibility path for perl5 to perl6 is via a translator.  It
is not expected that perl6 will run perl5 programs unchanged.  The
complexity of the translator and the depth of the changes will be
decided by the decisions Larry makes.

Nat



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Nathan Wiger

Nathan Torkington wrote:
 
> Hmm.  This is exactly the same situation as with chomp() and somehow
> chomp() can tell the difference between:
> 
>   $_ = "hi\n";
>   chomp;
> 
> and
> 
>   @strings = ();
>   chomp @strings;

Good point. I was looking at it from the general "What's wrong with how
@arrays are parsed as arguments?" standpoint, not from a "How can we fix
this specific function?" standpoint.

> But chomp seems to use @ as its indicator.  You can't say:
> 
>   $_ = $a = "hi\n";
>   chomp $_, $a;
> 
> If it sees that $, it figures it's chomp SCALAR.
> 
> I'm unsure if this is adequate for match, but it might be.

Maybe. Behavior like chomp() is what we're looking for, so on the
surface this seems to work. But people might also want to do:

match /string/, $one, $two, $three;

However, being able to take @ or $;... seems like a possibility. In
fact, chomp not doing this might be a "bug".

> 2. I don't think it's even closely tied to this RFC itself.
> 
> This is the mindset that worries me: every edge case needs another
> RFC.  Look to what's already in Perl: does anything else behave like
> this?  How does it get around it?  Can we co-opt the way it works?

Fair enough. Again, I was looking at it from a generalist standpoint.

-Nate



RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Replace =~, !~, m//, and s/// with match() and subst()

=head1 VERSION

   Maintainer: Nathan Wiger [EMAIL PROTECTED]
   Date: 27 Aug 2000
   Version: 1
   Mailing List: [EMAIL PROTECTED]
   Number: 164

=head1 ABSTRACT

Several people (including Larry) have expressed a desire to get rid of
C<=~> and C<!~>. This RFC proposes a way to replace C<m//> and C<s///>
with two new builtins, C<match()> and C<subst()>.

=head1 DESCRIPTION

=head2 Overview

Everyone knows how C<=~> and C<!~> work. Several proposals, such as RFCs
135 and 138, attempt to fix some stuff with the current pattern-matching
syntax. Most proposals center around minor modifications to C<m//> and
C<s///>.

This RFC proposes that C<m//> and C<s///> be dropped from the language
altogether, and instead be replaced with new C<match> and C<subst>
builtins, with the following syntaxes:

   $res = match /pattern/flags, $string
   $new = subst /pattern/newpattern/flags, $string

These subs are designed to mirror the format of C<split>, making them
more consistent. Unlike the current forms, these return the modified
string, leaving C<$string> alone. (Unless they are called in a void
context, in which case they act on and modify C<$_>, consistent with
current behavior.)
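
For illustration only, the copy-returning behavior can be approximated in
today's Perl 5 with an ordinary sub. This is a rough sketch, not the
proposed builtin: the name subst_copy is hypothetical, the pattern is
passed as a qr// object, the replacement is a plain string, the /g flag is
assumed, and the void-context fallback to $_ is not emulated.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the RFC): substitute on copies and
# return them, leaving the caller's strings untouched.
sub subst_copy {
    my ($pattern, $replacement, @strings) = @_;
    my @copies = @strings;                   # work on copies only
    s/$pattern/$replacement/g for @copies;   # /g is an assumption of this sketch
    return wantarray ? @copies : $copies[0];
}

my $old = "old hat, old shoes";
my $new = subst_copy(qr/old/, "new", $old);

print "$old\n";   # old hat, old shoes
print "$new\n";   # new hat, new shoes
```

Called in list context with several strings, the same helper returns a list
of modified copies.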

Extra arguments can be dropped, consistent with C<split> and many other
builtins:

   match;                     # all defaults (pattern is /\w+/?)
   match /pat/;               # match $_
   match /pat/, $str;         # match $str
   match /pat/, @strs;        # match any of @strs

   subst;                     # like s///, pretty useless :-)
   subst /pat/new/;           # sub on $_
   subst /pat/new/, $str;     # sub on $str
   subst /pat/new/, @strs;    # return array of modified strings
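
As a rough Perl 5 approximation of the argument defaults listed above, the
"match any of @strs" form might look like the sketch below. The name
match_any is hypothetical, and the pattern has to be passed as a qr//
object because a plain sub cannot take a bare /pat/ the way a builtin
could.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed match(): true if any string matches.
# With no strings given it falls back to $_, like the one-argument form.
sub match_any {
    my ($pattern, @strings) = @_;
    @strings = ($_) unless @strings;
    for my $s (@strings) {
        return 1 if $s =~ $pattern;
    }
    return 0;
}

my @strs = ("   ", "word", "...");
print "Got it\n" if match_any(qr/\w+/, @strs);

for ("abc") {
    print "Matched \$_\n" if match_any(qr/b/);   # falls back to $_
}
```

Note that in this sketch an empty array is indistinguishable from no
argument at all, so it silently falls back to $_.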
 
These new builtins eliminate the need for C<=~> and C<!~> altogether,
since they are functions just like C<split>, C<join>, C<splice>, and so
on.

Sometimes examples are easiest, so here are some examples of the new
syntax:

   Perl 5                               Perl 6
   -------------------------------      -----------------------------------
   if ( /\w+/ ) { }                     if ( match ) { }
   die "Bad!" if ( $_ !~ /\w+/ );       die "Bad!" if ( ! match );
   ($res) = m#^(.*)$#g;                 ($res) = match #^(.*)$#g;

   next if /\s+/ || /\w+/;              next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||           next if match /\s+/, $str or
           ($str =~ /\w+/)                      match /\w+/, $str;
   next unless $str =~ /^N/;            next unless match /^N/, $str;

   $str =~ s/\w+/$bob/gi;               $str = subst /\w+/$bob/gi, $str;
   ($str = $_) =~ s/\d+/func/ge;        $str = subst /\d+/func/ge;
   s/\w+/this/;                         subst /\w+/this/;

   # These are pretty cool...
   foreach (@old) {                     @new = subst /hello/X/gi, @old;
      s/hello/X/gi;
      push @new, $_;
   }

   foreach (@str) {                     print "Got it" if match /\w+/, @str;
      print "Got it" if (/\w+/);
   }

This gives us a cleaner, more consistent syntax. In addition, it makes
several things easier and is more easily extensible:

   callsomesub(subst(/old/new/gi, $mystr));
   $str = subst /old/new/i, $r->getsomeval;

and is easier to read English-wise. However, it requires a little too
much typing. See below.

=head2 Concerns

This should be carefully considered. It's good because it gets rid of
"yet another oddity" with a more standard syntax that I would argue is
more powerful and consistent. However, it also causes everyone to
relearn how to match and substitute patterns. This must be a careful,
conscious decision, lest we really screw stuff up.

That being said, since my initial post I have received several personal
emails endorsing this, which is why I decided to RFC it. So it's an
option, it just has to be powerful enough for people to see the "big
win".

Finally, it requires a little too much typing still for my tastes.
Perhaps we should make "m" and "s" at least shortcuts to the names,
possibly allowing users to bind them to the front of the pattern
(similar to some of RFC 138's suggestions). Maybe these two could be
equivalent:

   $new = subst /old/new/i, $old;   ==   $new = s/old/new/i, $old;

And then it doesn't look that radical anymore. This is similar to RFC
138, only C<$old> is not modified.

=head1 IMPLEMENTATION

Hold your horses

=head1 MIGRATION

This would be huge. Every pattern match would have to be translated,
every Perl hacker would have to relearn patterns, and every Perl 5
book's regexp section would be instantly out of date. Like I said, this
is not a simple decision. But if there are obvious increases in power, I
think people will appreciate the change, not dread it. At the very least
it makes Perl much more consistent.

=head1 REFERENCES

This is a synthesis of several ideas from myself, Ed Mills, and Tom C.

RFC 138: Eliminate =~ operator. 

RFC 135: Require explicit m on matches, even with ?? and // as
delimiters.