Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Nathan Wiger

Nathan Torkington wrote:
> 
> Hmm.  This is exactly the same situation as with chomp() and somehow
> chomp() can tell the difference between:
> 
>   $_ = "hi\n";
>   chomp;
> 
> and
> 
>   @strings = ();
>   chomp @strings;

Good point. I was looking at it from the general "What's wrong with how
@arrays are parsed as arguments?" standpoint, not from a "How can we fix
this specific function?" standpoint.

> But chomp seems to use @ as its indicator.  You can't say:
> 
>   $_ = $a = "hi\n";
>   chomp $_, $a;
> 
> If it sees that $, it figures it's chomp SCALAR.
> 
> I'm unsure if this is adequate for match, but it might be.

Maybe. Behavior like chomp() is what we're looking for, so on the
surface this seems to work. But people might also want to do:

match /string/, $one, $two, $three;

However, being able to take either an @array or a list of $scalars
seems like a possibility. In fact, chomp not doing this might be a "bug".

> >2. I don't think it's even closely tied to this RFC itself.
> 
> This is the mindset that worries me: every edge case needs another
> RFC.  Look to what's already in Perl: does anything else behave like
> this?  How does it get around it?  Can we co-opt the way it works?

Fair enough. Again, I was looking at it from a generalist standpoint.

-Nate



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Nathan Torkington

Nathan Wiger writes:
> Honestly, not sure. Although, there are two things I'd say about it:
> 
>1. I don't think it's a showstopper for this RFC, since the
>   feature you are addressing is actually a new piece of
>   functionality.

Hmm.  This is exactly the same situation as with chomp() and somehow
chomp() can tell the difference between:

  $_ = "hi\n";
  chomp;

and

  @strings = ();
  chomp @strings;

But chomp seems to use @ as its indicator.  You can't say:

  $_ = $a = "hi\n";
  chomp $_, $a;

If it sees that $, it figures it's chomp SCALAR.

I'm unsure if this is adequate for match, but it might be.

>2. I don't think it's even closely tied to this RFC itself.

This is the mindset that worries me: every edge case needs another
RFC.  Look to what's already in Perl: does anything else behave like
this?  How does it get around it?  Can we co-opt the way it works?

Nat



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Nathan Wiger

Nathan Torkington wrote:
> 
> When I was thinking about this very topic yesterday and today, I
> came up with this problem:
> 
>   @strs = ();
>   match /pat/, @strs;   # surprise!  I'm matching on $_
> 
> That is, how do you tell an empty array from no arguments?

Easy: We'll just use lazy evaluation or some other magic. 

*snicker* :-)

Honestly, not sure. Although, there are two things I'd say about it:

   1. I don't think it's a showstopper for this RFC, since the
      feature you are addressing is actually a new piece of
      functionality.

   2. I don't think it's even closely tied to this RFC itself.

Not being able to tell an empty @array apart from no arguments is a
significant problem right now in Perl. I've always viewed it as such. It
would be really nice if we were able to tell we got a null @array
argument somehow, but I'm not sure how. Sounds like an RFC... ;-)

-Nate



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Nathan Torkington

Perl6 RFC Librarian writes:
>   match;               # all defaults (pattern is /\w+/?)
>   match /pat/;         # match $_
>   match /pat/, $str;   # match $str
>   match /pat/, @strs;  # match any of @strs

When I was thinking about this very topic yesterday and today, I
came up with this problem:

  @strs = ();
  match /pat/, @strs;   # surprise!  I'm matching on $_

That is, how do you tell an empty array from no arguments?  I have
a horrible suspicion everyone is going to reach for lazy evaluation
and other magic.
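A minimal Perl 5 sketch of the ambiguity (the helper names count_args and takes_array are hypothetical, chosen only for illustration): a plain sub only ever sees the flattened @_, so an empty array and no arguments at all are indistinguishable. A (\@) prototype, which Perl 5 already supports, makes the caller name a real array and passes a reference to it, so an empty array stays visible.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments arrived in @_.
sub count_args { return scalar @_ }

# Hypothetical helper with a (\@) prototype: the caller must name an
# array, and Perl passes a reference to it instead of its elements.
# Calling takes_array() with no argument is a compile-time error.
sub takes_array (\@) {
    my ($aref) = @_;
    return sprintf '%s with %d element(s)', ref($aref), scalar @$aref;
}

my @strs = ();
my $from_empty = count_args(@strs);    # 0: the empty array flattens away
my $from_none  = count_args();         # 0: genuinely no arguments
my $proto      = takes_array(@strs);   # receives \@strs, an ARRAY ref
print "$from_empty $from_none [$proto]\n";
```

The cost is that such a function can no longer take an arbitrary list: takes_array($a, $b) would not compile.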

Nat



Re: RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Nathan Wiger

> =head1 TITLE
> 
> Replace =~, !~, m//, and s/// with match() and subst()

In a marked oversight, I'd also like to note that tr// would be replaced
with trade:

Perl 5              Perl 6
---------------     ---------------------------
$str =~ tr/a/b/;    $new = trade /a/b/, $str;
tr/a/b/;            trade /a/b/;

This will be reflected in v2. However, it should be fairly obvious how
this fits in with the others.

I know 'tr' is really 'translate', but that's too long and it looks like
'trans' is going to be taken up by Transactional Variables (RFC 130).
'trade' connotes what is happening pretty accurately, I think.
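For comparison only (this is not the proposed trade builtin): assuming Perl 5.14 or later, tr/// accepts an /r flag that returns the transliterated copy and leaves the original untouched, which is close to the behavior described above. Older perls need the copy-then-transliterate idiom, shown second.

```perl
use strict;
use warnings;

my $str = "banana";

# Perl 5.14+: tr///r returns a transliterated copy; $str is untouched.
my $new = $str =~ tr/a/b/r;

# Older idiom with the same effect: copy first, then transliterate the copy.
(my $old_style = $str) =~ tr/a/b/;

print "$str -> $new ($old_style)\n";
```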

-Nate



Re: RFC 110 (v2) counting matches

2000-08-27 Thread Bart Lateur

On 27 Aug 2000 19:01:45 -, Perl6 RFC Librarian wrote:

>m//g just returns 1 for matching.

Er... but in a scalar context, m//g DOES only match once! If you want
more, repeat the match. Or use it in list context, and it will try to
match them all.

$_ = "abaabbbababbbabbaaa";
while (/(b+)/g) {
    print "Got a '$1'\n";
}
-->
Got a 'b'
Got a 'bbb'
Got a 'b'
Got a 'bbb'
Got a 'bb'

Let's try again:

$_ = "abaabbbababbbabbaaa";
print scalar(() = /b+/g);
-->
5

Is that what you're after?

-- 
Bart.



RFC 166 (v1) Additions to regexs

2000-08-27 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1  TITLE

Additions to regexs

=head1 VERSION

Maintainer: Richard Proctor <[EMAIL PROTECTED]>
Date: 27 Aug 2000
Mailing List: [EMAIL PROTECTED]
Version: 1
Number: 166

=head1 ABSTRACT

This is a set of minor enhancements to regexes that I thought up on a long
plane ride.  All of these can be done now; I would just like to make them
easier.

=head1 DESCRIPTION

These are a set of minor enhancements to regexes; they are largely independent.

=head2 Alternative Lists from arrays

(?@foo) is sort of equivalent to (??{join('|',@foo)}), i.e. it expands into a
list of alternatives.  One could possibly use just @foo for this.

If @foo contained special characters, you might want to \Q-quote each item.

(?Q@foo) is sort of equivalent to (??{join('|', map quotemeta, @foo)})
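A rough Perl 5 approximation of (?Q@foo), as a sketch: quote each element, join with '|', and wrap the result in a non-capturing group. The sample data in @foo is made up for illustration.

```perl
use strict;
use warnings;

my @foo = ('a.b', 'c+d', 'e');

# Roughly what (?Q@foo) would expand to: each element quoted, joined by '|'.
my $alt = join '|', map { quotemeta } @foo;
my $re  = qr/^(?:$alt)$/;

for my $s ('a.b', 'axb', 'c+d', 'ccd') {
    printf "%-4s %s\n", $s, ($s =~ $re ? 'matches' : 'no match');
}
```

Without quotemeta, '.' and '+' would keep their regex meanings, so 'axb' and 'ccd' would also match.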

=head2 Matching Not a pattern

(?^pattern) matches anything that does not match the pattern.  On its own,
a pattern can be negated with !~ etc., but matching foo(anything but baz)bar
is currently difficult.  With this syntax
it would simply be /foo(?^baz)bar/.
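Under one assumed reading of "foo(anything but baz)bar", namely that the middle part must not contain "baz", Perl 5 can already express it by guarding each character of the middle with a negative lookahead. A sketch with made-up test strings (note that Perl 5.14 later gave (?^...) a different meaning: it resets to the default modifiers, as in (?^i:...)):

```perl
use strict;
use warnings;

# Assumed reading: "foo", then any text that does not contain "baz",
# then "bar". A character is only consumed if "baz" does not start there.
my $re = qr/foo(?:(?!baz).)*bar/;

for my $s ('fooXbar', 'foobar', 'foobazbar', 'fooxbazybar') {
    printf "%-12s %s\n", $s, ($s =~ $re ? 'matches' : 'no match');
}
```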

=head2 A disambiguator

(?) is a null element in a pattern that can be used to separate elements that
might otherwise be confused; it matches nothing and has no other effect.
If you have a variable $foo, then the pattern $foobar would look for the
variable $foobar, when you actually meant to look for $foo then "bar".  This
allows the user to simply write $foo(?)bar.  (Yes, I know this can be written
other ways, but this is a simple example.)
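In current Perl 5 the same disambiguation can be written with braces around the variable name or with an empty non-capturing group, (?:). A small sketch with made-up data:

```perl
use strict;
use warnings;

my $foo = 'ab';
my $s   = 'xxabbar';

# /$foobar/ would interpolate a variable named $foobar (a compile error
# under strict). Braces or an empty group end the variable name explicitly.
my $braced = $s =~ /${foo}bar/   ? 1 : 0;
my $empty  = $s =~ /$foo(?:)bar/ ? 1 : 0;
print "$braced $empty\n";
```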

=head1 IMPLEMENTATION

No Idea

=head1 REFERENCES

None yet




RFC 165 (v1) Allow Varibles in tr///

2000-08-27 Thread Perl6 RFC Librarian


=head1  TITLE

Allow Varibles in tr///

=head1 VERSION

Maintainer: Richard Proctor <[EMAIL PROTECTED]>
Date: 27 Aug 2000
Mailing List: [EMAIL PROTECTED]
Version: 1
Number: 165

=head1 ABSTRACT

Allow variables in a tr///.  At present the only way to do a tr/$foo/$bar/
is to wrap it up in an eval.  I don't like using evals for this sort of thing.

=head1 DESCRIPTION

Suggested syntax: tr/$foo/$bar/e

With /e, tr would expand (interpolate) both the LHS and RHS of the
transliteration.  Either or both could be variables.  I am suggesting /e
as it is sort of like the /e in s///e.
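For reference, a sketch of the eval workaround this RFC wants to make unnecessary. The helper tr_vars is hypothetical. Because tr/// builds its translation table at compile time, the lists have to be pasted into source text and compiled by a string eval; quotemeta is used here as one way to keep '/', '-' and '\' literal, at the cost of disabling ranges such as a-z.

```perl
use strict;
use warnings;

# Hypothetical helper: transliterate $str using lists held in variables.
# The lists are spliced into code and compiled with a string eval;
# quotemeta makes every character literal (ranges are NOT expanded).
sub tr_vars {
    my ($str, $from, $to) = @_;
    my ($f, $t) = map { quotemeta } $from, $to;
    eval "\$str =~ tr/$f/$t/; 1" or die "tr failed: $@";
    return $str;
}

print tr_vars('hello', 'lo', '01'), "\n";
```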

=head1 IMPLEMENTATION

No idea, but should be straightforward.

=head1 REFERENCES

None yet.





RFC 164 (v1) Replace =~, !~, m//, and s/// with match() and subst()

2000-08-27 Thread Perl6 RFC Librarian


=head1 TITLE

Replace =~, !~, m//, and s/// with match() and subst()

=head1 VERSION

   Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
   Date: 27 Aug 2000
   Version: 1
   Mailing List: [EMAIL PROTECTED]
   Number: 164

=head1 ABSTRACT

Several people (including Larry) have expressed a desire to get rid of
C<=~> and C<!~>. This RFC proposes a way to replace C<m//> and C<s///>
with two new builtins, C<match> and C<subst>.

=head1 DESCRIPTION

=head2 Overview

Everyone knows how C<=~> and C<!~> work. Several proposals, such as RFCs
135 and 138, attempt to fix some stuff with the current pattern-matching
syntax. Most proposals center around minor modifications to C<m//> and
C<s///>.

This RFC proposes that C<m//> and C<s///> be dropped from the language
altogether, and instead be replaced with new C<match> and C<subst>
builtins, with the following syntaxes:

   $res = match /pattern/flags, $string
   $new = subst /pattern/newpattern/flags, $string

These subs are designed to mirror the format of C<split>, making them
more consistent. Unlike the current forms, these return the modified
string, leaving C<$string> alone. (Unless they are called in a void
context, in which case they act on and modify C<$_> consistent with
current behavior).

Extra arguments can be dropped, consistent with C<split> and many other
builtins:

   match;                   # all defaults (pattern is /\w+/?)
   match /pat/;             # match $_
   match /pat/, $str;       # match $str
   match /pat/, @strs;      # match any of @strs

   subst;                   # like s///, pretty useless :-)
   subst /pat/new/;         # sub on $_
   subst /pat/new/, $str;   # sub on $str
   subst /pat/new/, @strs;  # return array of modified strings

These new builtins eliminate the need for C<=~> and C<!~> altogether,
since they are functions just like C, C, C, and so
on.

Sometimes examples are easiest, so here are some examples of the new
syntax:

   Perl 5                              Perl 6
   ---------------------------------   -----------------------------------
   if ( /\w+/ ) { }                    if ( match ) { }
   die "Bad!" if ( $_ !~ /\w+/ );      die "Bad!" if ( ! match );
   ($res) = m#^(.*)$#g;                ($res) = match #^(.*)$#g;

   next if /\s+/ || /\w+/;             next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||          next if match /\s+/, $str or
           ($str =~ /\w+/);                    match /\w+/, $str;
   next unless $str =~ /^N/;           next unless match /^N/, $str;

   $str =~ s/\w+/$bob/gi;              $str = subst /\w+/$bob/gi, $str;
   ($str = $_) =~ s/\d+/&func/ge;      $str = subst /\d+/&func/ge;
   s/\w+/this/;                        subst /\w+/this/;

   # These are pretty cool...
   foreach (@old) {                    @new = subst /hello/X/gi, @old;
       s/hello/X/gi;
       push @new, $_;
   }

   foreach (@str) {                    print "Got it" if match /\w+/, @str;
       print "Got it" if (/\w+/);
   }
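Some of these rows can already be approximated in Perl 5, which may help when weighing the proposal. A sketch, assuming Perl 5.14 or later for s///r (it returns the substituted copy and leaves the original alone), with grep in scalar context standing in for "match any of @strs". These are existing Perl 5 features, not the proposed match and subst builtins; the sample data is invented.

```perl
use strict;
use warnings;

my @old = ('Hello world', 'say hello', 'goodbye');

# Like: @new = subst /hello/X/gi, @old;  (@old itself is not modified)
my @new = map { s/hello/X/gir } @old;

# Like: print "Got it" if match /\w+/, @old;  (count of matching elements)
my $any = grep { /\w+/ } @old;
print "Got it\n" if $any;

print join(', ', @new), "\n";
```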

This gives us a cleaner, more consistent syntax. In addition, it makes
several things easier and is more easily extensible:

   &callsomesub(subst(/old/new/gi, $mystr));
   $str = subst /old/new/i, $r->getsomeval;

and is easier to read English-wise. However, it requires a little too
much typing. See below.

=head2 Concerns

This should be carefully considered. It's good because it gets rid of
"yet another oddity" with a more standard syntax that I would argue is
more powerful and consistent. However, it also causes everyone to
relearn how to match and substitute patterns. This must be a careful,
conscious decision, lest we really screw stuff up.

That being said, since my initial post I have received several personal
emails endorsing this, hence the reason I decided to RFC it. So it's an
option; it just has to be powerful enough for people to see the "big
win".

Finally, it requires a little too much typing still for my tastes.
Perhaps we should make "m" and "s" at least shortcuts to the names,
possibly allowing users to bind them to the front of the pattern
(similar to some of RFC 138's suggestions). Maybe these two could be
equivalent:

$new = subst /old/new/i, $old;   ==   $new = s/old/new/i, $old;

And then it doesn't look that radical anymore. This is similar to RFC
138, only C<$old> is not modified.

=head1 IMPLEMENTATION

Hold your horses

=head1 MIGRATION

This would be huge. Every pattern match would have to be translated,
every Perl hacker would have to relearn patterns, and every Perl 5
book's regexp section would be instantly out of date. Like I said, this
is not a simple decision. But if there are obvious increases in power, I
think people will appreciate the change, not dread it. At the very least
it makes Perl much more consistent.

=head1 REFERENCES

This is a synthesis of several ideas from myself, Ed Mills, and Tom C

RFC 138: Eliminate =~ operator. 

RFC 135: Require explicit m on matches, even with ?? and // as
delimiters.




RFC 144 (v2) Behavior of empty regex should be simple

2000-08-27 Thread Perl6 RFC Librarian


=head1 TITLE

Behavior of empty regex should be simple

=head1 VERSION

  Maintainer: Mark Dominus <[EMAIL PROTECTED]>
  Date: 24 August 2000
  Last Modified: 27 August 2000
  Version: 2
  Mailing List: [EMAIL PROTECTED]
  Number: 144

=head1 ABSTRACT

=head2 Standard Documentation

According to L<perlop>:

=over 4

=item  m/PATTERN/cgimosx

=item /PATTERN/cgimosx

If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead.

=back

This behavior should be changed.  If the PATTERN is empty, Perl should
look for the empty string.  (That is, if the PATTERN is empty, it
should always match.)

=head1 DESCRIPTION

Literal empty patterns, such as:

$s =~ // ;

are not the problem here.  The real problem is that the special case
is invoked for interpolated patterns also.  For example,

chomp($pat = <STDIN>);
$s =~ /\Q$pat\E/;

looks to see if $pat is a substring of $s, unless $pat is empty, in
which case it matches $s against the last regex that was matched
successfully.  That regex might be far away, in some other module.
If the far-away regex happened to contain backreference groups, the
backreference variables will be set accordingly.

To make this safe in Perl 5, the programmer has to write something
peculiar like

$s =~ /(?=)\Q$pat\E/;

to ensure that the regex, after interpolation, is never empty.
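A small Perl 5 script illustrating the trap and the (?=) guard, based on the empty-pattern rule quoted above from the documentation. The empty $pat stands in for an empty line of input; it is worth re-running on your own perl before relying on the exact result.

```perl
use strict;
use warnings;

"abc" =~ /b/ or die;   # /b/ becomes the last successful match
my $pat = '';          # e.g. the user just pressed Enter

# Empty after interpolation, so Perl 5 reuses /b/ instead of matching "".
my $trap = "xyz" =~ /\Q$pat\E/ ? 1 : 0;

# The (?=) prefix keeps the pattern non-empty, so it matches the empty string.
my $safe = "xyz" =~ /(?=)\Q$pat\E/ ? 1 : 0;

print "trap=$trap safe=$safe\n";
```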

I propose that this 'last successful match' behavior be discarded
entirely, and that an empty pattern always match the empty string.

=head1 RATIONALE

=head2 The Feature Was Not Useful, I

The special behavior for empty patterns has never been particularly
useful.  For example, you could imagine code like this:

for $pat (@patterns) {
  if ($a =~ /$pat/ && $b =~ //) {
    # do something
  }
}

This would be more efficient than the equivalent

for $pat (@patterns) {
  if ($a =~ /$pat/ && $b =~ /$pat/) {
    # do something
  }
}

because $pat would be compiled only once per loop instead of twice.
It is now more straightforward and efficient to do this sort of thing
explicitly with the qr// operator:

@patterns = map qr/$_/, @patterns;
for $pat (@patterns) {
  if ($a =~ /$pat/ && $b =~ /$pat/) {
    # do something
  }
}


=head2 The Feature Was Not Useful, II

People sometimes propose the following use for the empty pattern
special case:  They have a pattern, and many strings, and they want to
see if every string matches the pattern.  This code works, but is
inefficient:

sub match_all {
  my $pat = shift;
  for (@_) {
    return 0 unless /$pat/;
  }
  return 1; 
}

This is because C</$pat/> must be recompiled for each string, or checked
to see whether recompilation is necessary.

This code does not work:

sub match_all {
  my $pat = shift;
  for (@_) {
    return 0 unless /$pat/o;
  }
  return 1; 
}

because C<$pat> changes with each call.

One solution is to use 'eval' here to generate the pattern matching
code (with C) at run time.

People have sometimes tried to use C<//> here, but usually without
success.  The idea is:

sub match_all {
  my $pat = shift;
  # load $pat into 'last successfully matched' space
  for (@_) {
    return 0 unless //;
  }
  return 1;
}

The problem here is that there is no way to designate $pat as the last
successfully matched regex without actually finding a string that
matches it.  In the past people attempting this strategy have appeared
in C asking how to find a string that matches a
given regex.  As far as I know, no useful solutions have been offered.
(In fact, there may not be any such string.  Consider the pattern
C for example.)

A better, simpler solution to this problem is to use the C<qr//> operator:

sub match_all {
  my $pat = shift;
  $pat = qr($pat);
  for (@_) {
    return 0 unless /$pat/;
  }
  return 1; 
}

=head2 This feature has resulted in bugs

Any code that contains the innocent-looking

if (/\Q$string\E/) {
  ...
}

is potentially booby-trapped.  Such code is common.  An example of
this type appears in L.


=head1 Alternatives

Rather than eliminating the special case entirely, alternative changes
are sometimes proposed.

=head2 Empty pattern to mean 'last match' instead of 'last successful match'

This behavior would be more useful than the current behavior and is
sometimes proposed as an alternative.  

For example, the application discussed in the section 'The feature was
not useful, II' above would be feasible if the empty pattern matched
the last-matched pattern, because it would no longer be necessary to
manufacture a matching string.



Re: RFC 112 (v2) Assignment within a regex

2000-08-27 Thread Nathan Wiger

>   if (/Time: (..):(..):(..)/) {
>       $hours = $1;
>       $minutes = $2;
>       $seconds = $3;
>   }
> 
> This then becomes:
> 
>   /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/
> 
> This is more maintainable than counting the brackets and easier to understand
> for a complex regex.  And one does not have to worry about the scope of $1 etc.

This is probably one of the coolest RFCs I've seen so far. :-)

One question: How are these scoped? Are they lexicals? Global dynamics?
What if you want to change the scoping?

This is the only catch I see. Maybe requiring, under 'use strict':

   my($hours, $minutes, $seconds);
   /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

Input?

-Nate



RFC 110 (v2) counting matches

2000-08-27 Thread Perl6 RFC Librarian


=head1  TITLE

counting matches

=head1 VERSION

Maintainer: Richard Proctor <[EMAIL PROTECTED]>
Date: 16 Aug 2000
Last Modified: 27 Aug 2000
Version: 2
Mailing List: [EMAIL PROTECTED]
Number: 110

=head1 ABSTRACT

Provide a simple way of giving a count of matches of a pattern.

=head1 CHANGES

Version 2 of this RFC redirects discussion of this topic to
[EMAIL PROTECTED]

=head1 DESCRIPTION

Have you ever wanted to count the number of matches of a pattern?  s///g
returns the number of matches it finds.  m//g just returns 1 for matching.
Counts can be made using s//$&/g, but this is wasteful, or by putting some
counting loop around a m//g.  But this all seems rather messy.

m//gt would be defined to do the match and return the count of matches; this
leaves all existing uses consistent and unaffected.  /t is suggested for
"counT", as /c is already taken.  Using /t without /g would result in
only 0 or 1 being returned, which is nearly the existing behavior.
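For comparison, two ways to count matches in current Perl 5 without s//$&/g, sketched on the same sample string used elsewhere in this thread: assigning the list-context match to an empty list and reading the count in scalar context, and an explicit counting loop around a scalar-context m//g.

```perl
use strict;
use warnings;

my $str = "abaabbbababbbabbaaa";

# A list assignment in scalar context yields the number of elements on
# its right-hand side, i.e. the number of /g matches.
my $count = () = $str =~ /b+/g;

# The counting loop the RFC mentions: each scalar m//g resumes at pos().
my $loop = 0;
$loop++ while $str =~ /b+/g;

print "$count $loop\n";
```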

(Note I am only on the announce list at present as I am suffering
from negative free time).

=head1 IMPLEMENTATION

No idea

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the noise...






RFC 112 (v2) Assignment within a regex

2000-08-27 Thread Perl6 RFC Librarian


=head1  TITLE

Assignment within a regex

=head1 VERSION

Maintainer: Richard Proctor <[EMAIL PROTECTED]>
Date: 16 Aug 2000
Last Modified: 27 Aug 2000
Version: 2
Mailing List: [EMAIL PROTECTED]
Number: 112

=head1 ABSTRACT

Provide a simple way of naming and picking out information from a regex
without having to count the brackets.

=head1 CHANGES

Version 2 of this RFC redirects discussion of this topic to
[EMAIL PROTECTED]

=head1 DESCRIPTION

If a regex is complex, counting the bracketed sub-expressions to find the
ones you wish to pick out can be messy.  It is also prone to maintainability
problems if and when you wish to add to the expression.  (?:) can be
used to suppress capturing brackets; it helps, but it still gets "complex".
I would sometimes rather just pick out the bits I want within the regex itself.

Suggested syntax: (?$foo= ... ) would assign the string that is matched by
the pattern ... to $foo when the pattern matches.  These assignments would be
made left to right after the match has succeeded but before processing a
replacement or other results.  There may be whitespace between the $foo and
the "=".  This would not give the backrefs \1 etc. that come with conventional
bracketed sub-expressions; I don't think this would be a problem.
Potentially the $foo could be any scalar LHS, as in (?$foo{$bar}= ... )!,
likewise the '=' could be any assignment operator.

The camel and the docs include this example:

   if (/Time: (..):(..):(..)/) {
       $hours = $1;
       $minutes = $2;
       $seconds = $3;
   }

This then becomes:
 
  /Time: (?$hours=..):(?$minutes=..):(?$seconds=..)/

This is more maintainable than counting the brackets and easier to understand
for a complex regex.  And one does not have to worry about the scope of $1 etc.
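For comparison, Perl 5.10 (released after this RFC) added named captures, which cover part of this proposal: the names live in the pattern, and the matched text is read from the %+ hash instead of by counting brackets. It does not assign directly into arbitrary variables as (?$foo=...) would; copying from %+ into variables, and choosing their scope, is left to the programmer. A sketch:

```perl
use strict;
use warnings;

my $line = "Time: 12:34:56";
my ($hours, $minutes, $seconds);   # scope chosen explicitly by the caller

# (?<name>...) records each capture under a name (Perl 5.10+).
if ($line =~ /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/) {
    # Hash slice of %+, the named-capture hash.
    ($hours, $minutes, $seconds) = @+{qw(hours minutes seconds)};
}

print "$hours h, $minutes m, $seconds s\n";
```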

(Note I am only on the announce list at present as I am suffering
from negative free time).

=head1 IMPLEMENTATION

No idea

=head1 REFERENCES

I brought this up on p5p a couple of years ago, but it was lost in the noise...