all regexp RFCs

2000-09-08 Thread Hugo

Hi guys, I'm sorry that time has not permitted me to join and take an
active part in the perl6-language-regex list; however, I have grabbed
an opportunity to look through the RFCs generated to date, and thought
I should throw some comments at you.

Apologies in advance for so rudely dumping this lot and _still_ not
joining the list; sorry also if I duplicate stuff that's already
been said. Feel free to ignore all or any of this. You'll need to cc
me if you want me to see replies, and in that case you might want to
do what I didn't, and tailor the subject to be more specific.

I've tried in particular to add a note about implementation issues
in each case.

Enjoy,

Hugo
---
RFC 72: Variable-length lookbehind: the regexp engine should also go backward.
==

This is an interesting idea. However, it is not obvious to me that
there is any practical difference between the existing:
  /(?<= a+ ) b/x
.. and the proposed:
  /b (?`= a+ )/x
.. which implies that implementing one would be as difficult as the
other. And if that is the case, fixing (?<=...) to support variable
length would be preferable, since it is more general. (Consider
/\d+ (?

RFC 145: Brace-matching for Perl Regular Expressions
===

This is an interesting idea. I'm not sure how useful it would
actually be: as far as I can see it would not match the block
on code such as:

  use matchpairs '{' => '}';
  <
stuff...
stuff...
  
.. since it also isn't clear to me whether you'd be able to
extract the table contents, or the rows, using the mechanisms
of this proposal.

RFC 150: Extend regex syntax to provide for return of a hash of matched subpatterns
===

This is cool - I don't think I've seen this suggested before.

Implementation might be a bit more work: the backreferences are
currently stored as offsets (relative to the start of the string)
to the beginning and end of the contents of the backref, and it
might be a bit expensive for normal use to extend that either by
replacing the start offset with a pointer or by adding an extra
per-backref flag. Faster alternatives are possible, but would be
more complex.
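
As a rough illustration of the offset representation described above, the following sketch reads the per-group start and end offsets that perl exposes through the @- and @+ arrays. It shows only the language-level view, not the engine's internal storage, and the sample string and variable names are made up for the example.

```perl
use strict;
use warnings;

my $str = 'key=value';
my @spans;

if ($str =~ /(\w+)=(\w+)/) {
    # $-[$n] and $+[$n] are the start and end offsets of group $n,
    # counted from the start of the string; $#+ is the number of groups.
    for my $n (1 .. $#+) {
        my ($start, $end) = ($-[$n], $+[$n]);
        push @spans, [ $start, $end, substr($str, $start, $end - $start) ];
    }
}

printf "group at %d..%d: %s\n", @$_ for @spans;
```

Returning a hash of named subpatterns would need some way to get from a name to one of these offset pairs, which is where the extra per-backref storage comes in.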

RFC 158: Regular Expression Special Variables
===

I'd love to see the performance penalty removed. I'm not sure that
an extra /k flag is the right solution, though I don't have any
concrete alternative to offer.

There has been much discussion of this problem on p5p in the past;
it would be handy to have some references in the RFC to any of the
more informative parts of those threads.

RFC 164: Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()
===

I don't particularly dislike =~, but I can see that others might.
I think this RFC actually has two distinct parts, which should
probably be separated: the syntax change, and the changes to
behaviour under various contexts. I'm not sure I clearly
understand what the latter are, or why they are necessary. I'm
particularly confused about:

   1. If called in a void context, [the new operators] act on and modify C<$_>,
  consistent with current behavior.

Was this supposed to say 'the C<$str> arguments (or C<$_>)'?

The syntax change does not impact on the regexp engine at all as far
as I can see; I'm not sure whether implementation would make the
perl parser more or less complex. I don't think I understand the
other changes well enough to guess at implementation issues.

RFC 165: Allow Varibles in tr///
===

Definitely. Should be easy to implement. There is a potential for
confusion, since it makes the tr/ lists look even more like
m/ and s/ patterns, but I think it can only be less confusion than
the current state of affairs. It is tempting to make it the default,
and have a flag to turn it off (or just backwhack the dagnabbed
dollar), and auto-translation of existing scripts would be pretty
easy, except that it would presumably fail exactly where people
are using the current workaround, by way of eval.
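
For reference, the eval workaround mentioned above usually looks something like the sketch below. tr/// builds its translation table at compile time, so $from and $to are not interpolated and the operator has to be compiled at run time instead. The variable names and sample data are illustrative, and the strings interpolated into the eval must be trusted.

```perl
use strict;
use warnings;

my ($from, $to) = ('a-c', 'A-C');
my $text = 'abcabc xyz';

# Compile the tr/// at run time so that the current values of
# $from and $to become its search and replacement lists.
# \$text is escaped so the eval'd code refers to the variable itself.
my $count = eval "\$text =~ tr/$from/$to/";
die $@ if $@;

print "$text ($count characters translated)\n";
```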

It would be helpful to tie down what should occur for @var and
%var (but note that this one-liner changed between 5.6.0 and 5.7.0:
  crypt% setperl 5.6.0
  crypt% perl -we '/.@x./'
  In string, @x now must be written as \@x at -e line 1, near ".@x"
  Execution of -e aborted due to compilation errors.
  crypt% setperl 5.7.0
  crypt% perl -we '/.@x./'
  Possible unintended interpolation of @x in string at -e line 1.
  Name "main::x" used only once: possible typo at -e line 1.
  Use of uninitialized value in pattern match (m//) at -e line 1.
  crypt% 
).

RFC 166: Additions to regexs
===

(?@foo) and (?Q@foo) are both things I've wanted before now. I'm
not sure if this is the right syntax, particularly if RFC 112 is
adopted: it would be confusing for (?@foo) to have so
different a meaning from (?$foo=...), and even more so if the
latter is ever extended to allow (?@foo=...).
I see no reason that implementation should cause any problems
since this is purely a regexp-compile time issue.

(?^pattern) is interesting; I'm not sure I've ever fe

Re: RFC 150 (v1) Extend regex syntax to provide for return of a hash of matched subpatterns

2000-09-08 Thread Richard Proctor

On Fri 08 Sep, Kevin Walker wrote:
> (This thread has been inactive for a while.  See 
> http://www.mail-archive.com/perl6-language-regex@perl.org/index.html#00015 
> for its short history.)
> 
> Long ago Tom Christiansen wrote:
> 
> >This is useful in that it would stop being number dependent.
> >For example, you can't now safely say
> >
> >/$var (foo) \1/
> >
> >and guarantee for arbitrary contents of $var that you have
> >the right number backref anymore.
> >
> >If I recall correctly, the Python folks addressed this.  One
> >might check that.
> 
> Python does, indeed, have something similar.  See (?P<name>...) and 
> (?P=...) at http://www.python.org/doc/current/lib/re-syntax.html .
> 
> Tom's comment points out a shortcoming in the original RFC:  There's 
> no way to make, by name, a backref to a named group.  I propose to 
> fix that in a revised version of RFC 150.  I don't have strong 
> feelings about what the syntax should be.  Here is one idea:
> 
>The substring matched by (?%some_name: ... ) can be referred to as 
> $%{some_name}.
> 
> That's kind of ugly, so other suggestions are welcome.  (The idea was 
> to do something analogous to $1, $2, etc.  Unfortunately ${some_name} 
> is already taken.  Maybe $_{some_name} would also work -- though 
> %_ seems too valuable to use for this limited purpose.)
> 
> 

Kevin,

I have been having similar thoughts about my RFC 112 (assignment within
a regex).  At present it is worded that it does not generate the back
reference, but I now have some reservations.

Thinking about the comparison between the two RFCs, there is some common
ground, but there are cases where people will want your hash and cases where
people will want explicit variables.  Using RFC 112, you can do
hash assignment, but it would not clear the hash beforehand whereas
your hash assignment would (I assume) set the hash to ONLY those elements
from the regex.

Your %hash = $string =~ /..(?%foo=..)/
is essentially the same as my %hash = (); $string =~ /..(?$hash{foo}=..)/
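
The "only those elements from the regex" behaviour can be sketched with named captures, which perl 5.10 and later expose through the %+ hash. This is an analogy only: neither the (?%foo=..) syntax of RFC 150 nor the (?$hash{foo}=..) syntax of RFC 112 is used here, and the sample data is invented.

```perl
use strict;
use warnings;

my $string = 'id=42;name=fred';
my %hash   = (stale => 'left over from earlier');

# Clear the hash first, then copy in the named groups of a
# successful match, so only names from the regex remain.
%hash = ();
%hash = %+ if $string =~ /id=(?<id>\d+);name=(?<name>\w+)/;

print join(', ', map { "$_=$hash{$_}" } sort keys %hash), "\n";
```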

Do we need both?  I think the answer is possibly, but whatever is
decided about back references should apply to both.

My thoughts on the back references would be that if a variable is used
again later in the regex, assignment takes place and it is simply referred
to.

Thus $string =~ m#<(?$foo=\w+).*?#;

The parser notices the reuse of $foo and performs the actual assignment
as and when the foo is matched (or at least acts as if it does).

Richard






RFC 138 (v2) Eliminate =~ operator.

2000-09-08 Thread Perl6 RFC Librarian

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Eliminate =~ operator.

=head1 VERSION

  Maintainer: Steve Fink <[EMAIL PROTECTED]>
  Date: 21 Aug 2000
  Last Modified: 8 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 138
  Version: 2
  Status: Withdrawn

=head1 ABSTRACT

Replace EXPR =~ m/.../ with m/.../ EXPR, and similarly for s/// and
tr///. Force an explicit dereference when using qr/.../. Disallow the
implicit treatment of a string as a regular expression to match
against.

=head1 CHANGES

Withdrawn on 8 Sep 2000. Seems like discussion is pretty much over,
Larry's seen it and commented, and RFC164 mostly encompasses the idea,
so I'm withdrawing this one just to clean things up. Besides, I don't
want to maintain too many RFCs and I've got an idea for another one.
:-)

=head1 DESCRIPTION

The EXPR =~ m/.../ syntax is ugly and unintuitive, something only its
mother (awk? sed?) could love. It performs a function that is
semantically no different from other forms of argument passing. This
RFC proposes to eliminate the =~ binding operator and treat m,
tr, and s almost like regular subroutine names but with slightly
different syntax and semantics.

To illustrate the proposal by example, the current

 /pattern/;
 m/pattern/;
 $x =~ /pattern/;
 ($a, $b, $c) = $x =~ /p(a)t(t)e(r)n/;
 $x =~ s/pattern/subst/gsx;
 $r = qr/pattern/; $x =~ $r;
 $r = "pattern"; $x =~ $r;

would become

 /pattern/;
 m/pattern/;
 /pattern/ $x; OR /pattern/ ($x);
 ($a, $b, $c) = /p(a)t(t)e(r)n/ $x;
 s/pattern/subst/gsx ($x);
 $r = qr/pattern/; $r->($x);
 same as the previous, or $r = "pattern"; /$r/ ($x);

Specifically, all patterns behave as if they are subroutines with a
($) prototype, except they have the current syntax for their first
argument, and $1 etc. interpolation remains unchanged.

qr/.../ would produce a CODE ref that may be invoked with the string
to match against. It would be a regular CODE ref rather than the
current magical Regexp reference type.
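
A rough emulation of this idea in current perl, assuming a hypothetical helper called make_matcher (not part of this RFC), is to wrap the compiled pattern in a closure so that it can be invoked as $r->($x):

```perl
use strict;
use warnings;

# Hypothetical helper: returns a plain CODE ref that matches its
# argument against the pattern, standing in for the proposed qr//.
sub make_matcher {
    my ($pattern) = @_;
    my $re = qr/$pattern/;
    return sub {
        my ($str) = @_;
        return $str =~ $re;    # captures in list context, true/false in scalar
    };
}

my $r = make_matcher('p(a)t(t)e(r)n');
my ($a1, $b1, $c1) = $r->('pattern');
my $matched = $r->('no match here') ? 1 : 0;

print "$a1 $b1 $c1 matched=$matched\n";
```

This only approximates the proposal: the closure still holds a Regexp object internally, whereas the RFC has qr/.../ itself yield the CODE ref.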

=head2 RELATED WACKY IDEA #1: Everything's a reference

Alternatively, we could think of m/.../ as always returning a
reference, so that the syntax is /pattern/->($x). This is much more
visually distinctive, but runs afoul of Larry's "no implicit
dereferencing" rule in order to make /pattern/ default to
/pattern/->($_). On the other hand, $a =~ $b already breaks that rule
by dereferencing qr// refs, so maybe it's not such a big deal.

=head2 RELATED WACKY IDEA #2: Creating references to matching operations

Now forget about the previous alternative and assume as in the main
section that we have /pattern/ ($x) and qr/pattern/->($x). This
naturally leads to \m/pattern/ or \&m/pattern/ as an equivalent for
qr/pattern/, and also introduces \s/pattern/subst/ and
\tr/pattern/subst/ as new reference types.

=head1 IMPLEMENTATION

Minor parser changes. Currently, the relevant rule in perly.y is
B>. The terms would be reversed, and the first would need to be
renamed to cover only s///, m//, and tr/// (and equivalents). So it
would be something like B>.

=head1 REFERENCES

=head2 Contributors

  Dirk Meyers <[EMAIL PROTECTED]> came up with this idea.




Re: RFC 150 (v1) Extend regex syntax to provide for return of a hash of matched subpatterns

2000-09-08 Thread Kevin Walker

(This thread has been inactive for a while.  See 
http://www.mail-archive.com/perl6-language-regex@perl.org/index.html#00015 
for its short history.)

Long ago Tom Christiansen wrote:

>This is useful in that it would stop being number dependent.
>For example, you can't now safely say
>
>/$var (foo) \1/
>
>and guarantee for arbitrary contents of $var that you have
>the right number backref anymore.
>
>If I recall correctly, the Python folks addressed this.  One
>might check that.

Python does, indeed, have something similar.  See (?P<name>...) and 
(?P=...) at http://www.python.org/doc/current/lib/re-syntax.html .
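
To make Tom's point concrete, here is a small sketch of how an interpolated variable can shift group numbers. The last match uses the (?<name>...) and \k<name> syntax available in perl 5.10 and later; it is shown as an analogy to the Python feature, not as the syntax proposed in RFC 150, and the sample strings are invented.

```perl
use strict;
use warnings;

my $str = 'x foo foo';

# With a plain $var, (foo) is group 1 and \1 refers to it.
my $var = 'x';
my $plain = $str =~ /$var (foo) \1/ ? 1 : 0;

# If $var happens to contain a group, (foo) becomes group 2 and
# \1 now refers to the group inside $var instead.
$var = '(x)';
my $shifted = $str =~ /$var (foo) \1/ ? 1 : 0;

# A named group and named backreference do not depend on numbering.
my $named = $str =~ /$var (?<word>foo) \k<word>/ ? 1 : 0;

print "plain=$plain shifted=$shifted named=$named\n";
```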

Tom's comment points out a shortcoming in the original RFC:  There's 
no way to make, by name, a backref to a named group.  I propose to 
fix that in a revised version of RFC 150.  I don't have strong 
feelings about what the syntax should be.  Here is one idea:

   The substring matched by (?%some_name: ... ) can be referred to as 
$%{some_name}.

That's kind of ugly, so other suggestions are welcome.  (The idea was 
to do something analogous to $1, $2, etc.  Unfortunately ${some_name} 
is already taken.  Maybe $_{some_name} would also work -- though 
%_ seems too valuable to use for this limited purpose.)