all regexp RFCs
Hi guys,

I'm sorry that time has not permitted me to join and take an active part in the perl6-language-regex list; however, I have grabbed an opportunity to look through the RFCs generated to date, and thought I should throw some comments at you.

Apologies in advance for so rudely dumping this lot and _still_ not joining the list; sorry also if I duplicate stuff that's already been said. Feel free to ignore all or any of this. You'll need to cc me if you want me to see replies, and in that case you might want to do what I didn't, and tailor the subject to be more specific.

I've tried in particular to add a note about implementation issues in each case.

Enjoy,

Hugo
---

RFC 72: Variable-length lookbehind: the regexp engine should also go backward.
==
This is an interesting idea. However, it is not obvious to me that there is any practical difference between the existing:
  /(?<= a+ ) b/x
.. and the proposed:
  /b (?`= a+ )/x
.. which implies that implementing one would be as difficult as the other. And if that is the case, fixing (?<=...) to support variable length would be preferable, since it is more general. (Consider /\d+ (?

RFC 145: Brace-matching for Perl Regular Expressions
===
This is an interesting idea. I'm not sure how useful it would actually be: as far as I can see it would not match the block on code such as:
  use matchpairs '{' => '}';
  < stuff... stuff...
.. since it also isn't clear to me whether you'd be able to extract the table contents, or the rows, using the mechanisms of this proposal.

RFC 150: Extend regex syntax to provide for return of a hash of matched subpatterns
===
This is cool - I don't think I've seen this suggested before.
Implementation might be a bit more work: the backreferences are currently stored as offsets (relative to the start of the string) to the beginning and end of the contents of the backref, and it might be a bit expensive for normal use to extend that either by replacing the start offset with a pointer or by adding an extra per-backref flag. Faster alternatives are possible, but would be more complex.

RFC 158: Regular Expression Special Variables
===
I'd love to see the performance penalty removed. I'm not sure that an extra /k flag is the right solution, though I don't have any concrete alternative to offer. There has been much discussion of this problem on p5p in the past; it would be handy to have some references in the RFC to any of the more informative parts of those threads.

RFC 164: Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()
===
I don't particularly dislike =~, but I can see that others might. I think this RFC actually has two distinct parts, which should probably be separated: the syntax change, and the changes to behaviour under various contexts. I'm not sure I clearly understand what the latter are, or why they are necessary. I'm particularly confused about:

  1. If called in a void context, [the new operators] act on and
     modify C<$_>, consistent with current behavior.

Was this supposed to say 'the C<$str> arguments (or C<$_>)'?

The syntax change does not impact on the regexp engine at all as far as I can see; I'm not sure whether implementation would make the perl parser more or less complex. I don't think I understand the other changes well enough to guess at implementation issues.

RFC 165: Allow Varibles in tr///
===
Definitely. Should be easy to implement. There is a potential for confusion, since it makes the tr/ lists look even more like m/ and s/ patterns, but I think it can only be less confusion than the current state of affairs.
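For readers who have not met it, the workaround in use today for variable tr/// lists is string eval. The following is only a minimal sketch: the helper name tr_with_vars is hypothetical, and it assumes the lists contain no '/' characters and come from trusted input, since they are compiled as Perl source.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any RFC): apply tr/// with lists that are
# only known at run time. tr/// does not interpolate variables, so the
# operator is built as source text and compiled with string eval.
# Assumption: $from and $to contain no '/' and are trusted input.
sub tr_with_vars {
    my ($str, $from, $to) = @_;
    # \$str stays a variable reference inside the eval'd code;
    # $from and $to are interpolated into the source text.
    my $count = eval "\$str =~ tr/$from/$to/";
    die $@ if $@;
    return ($str, $count);
}

my ($result, $count) = tr_with_vars('abcabd', 'a-c', 'A-C');
print "$result ($count characters translated)\n";
```

An automatic translator would have to recognise code of this shape to avoid interpolating the lists twice.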
It is tempting to make it the default, and have a flag to turn it off (or just backwhack the dagnabbed dollar), and auto-translation of existing scripts would be pretty easy, except that it would presumably fail exactly where people are using the current workaround, by way of eval.

It would be helpful to tie down what should occur for @var and %var (but note that this one-liner changed between 5.6.0 and 5.7.0:

  crypt% setperl 5.6.0
  crypt% perl -we '/.@x./'
  In string, @x now must be written as \@x at -e line 1, near ".@x"
  Execution of -e aborted due to compilation errors.
  crypt% setperl 5.7.0
  crypt% perl -we '/.@x./'
  Possible unintended interpolation of @x in string at -e line 1.
  Name "main::x" used only once: possible typo at -e line 1.
  Use of uninitialized value in pattern match (m//) at -e line 1.
  crypt%
).

RFC 166: Additions to regexs
===
(?@foo) and (?Q@foo) are both things I've wanted before now. I'm not sure if this is the right syntax, particularly if RFC 112 is adopted: it would be confusing for (?@foo) to have so different a meaning from (?$foo=...), and even more so if the latter is ever extended to allow (?@foo=...). I see no reason that implementation should cause any problems since this is purely a regexp-compile time issue.

(?^pattern) is interesting; I'm not sure I've ever fe
Re: RFC 150 (v1) Extend regex syntax to provide for return of a hash of matched subpatterns
On Fri 08 Sep, Kevin Walker wrote:
> (This thread has been inactive for a while. See
> http://www.mail-archive.com/perl6-language-regex@perl.org/index.html#00015
> for its short history.)
>
> Long ago Tom Christiansen wrote:
>
> >This is useful in that it would stop being number dependent.
> >For example, you can't now safely say
> >
> >/$var (foo) \1/
> >
> >and guarantee for arbitrary contents of $var that you have
> >the right number backref anymore.
> >
> >If I recall correctly, the Python folks addressed this. One
> >might check that.
>
> Python does, indeed, have something similar. See (?P<name>...) and
> (?P=...) at http://www.python.org/doc/current/lib/re-syntax.html .
>
> Tom's comment points out a shortcoming in the original RFC: There's
> no way to make, by name, a backref to a named group. I propose to
> fix that in a revised version of RFC 150. I don't have strong
> feelings about what the syntax should be. Here is one idea:
>
> The substring matched by (?%some_name: ... ) can be referred to as
> $%{some_name}.
>
> That's kind of ugly, so other suggestions are welcome. (The idea was
> to do something analogous to $1, $2, etc. Unfortunately ${some_name}
> is already taken. Maybe $_{some_name} would also work -- though
> %_ seems too valuable to use for this limited purpose.)
>
>

Kevin,

I have been having similar thoughts about my RFC 112 (assignment within a regex). At present it is worded so that it does not generate a back reference, but I now have some reservations.

Comparing the two RFCs, there is some common ground, but there are cases where people will want your hash and cases where people will want explicit variables. Using RFC 112, you can do hash assignment, but it would not clear the hash beforehand, whereas your hash assignment would (I assume) set the hash to ONLY those elements from the regex.

Your
  %hash = $string =~ /..(?%foo=..)/
is essentially the same as my
  %hash = (); $string =~ /..(?$hash{foo}=..)/

Do we need both?

I think the answer is possibly, but whatever is decided about back references should apply to both.

My thoughts on the back references would be that if a variable is used again later in the regex, assignment takes place and it is simply referred to. Thus

  $string =~ m#<(?$foo=\w+).*?#;

The parser notices the reuse of $foo and performs the actual assignment as and when the foo is matched (or at least acts as if it does).

Richard

--
[EMAIL PROTECTED]
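The difference between "set the hash to only the matched elements" and "assign into an existing hash without clearing it" can be sketched with features that exist in Perl 5.10 and later. This is an illustration only: it uses the (?<name>...) capture syntax and the %+ hash from those later releases, not the (?%foo=..) or (?$hash{foo}=..) syntax proposed in either RFC, and the helper name capture_both_ways is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper contrasting the two behaviours discussed above.
# Assumption: Perl 5.10+ named captures stand in for the proposed syntax.
sub capture_both_ways {
    my ($string, %existing) = @_;
    return unless $string =~ /(?<foo>\w+)=(?<bar>\w+)/;

    # RFC 150 style: the result holds ONLY the elements from the regex.
    my %replaced = %+;

    # RFC 112 style: elements are assigned into an existing hash,
    # so keys that were already present survive.
    my %merged = (%existing, %+);

    return (\%replaced, \%merged);
}

my ($replaced, $merged) = capture_both_ways('a=b', old => 1);
print join(',', sort keys %$replaced), "\n";
print join(',', sort keys %$merged), "\n";
```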
RFC 138 (v2) Eliminate =~ operator.
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Eliminate =~ operator.

=head1 VERSION

  Maintainer: Steve Fink <[EMAIL PROTECTED]>
  Date: 21 Aug 2000
  Last Modified: 8 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 138
  Version: 2
  Status: Withdrawn

=head1 ABSTRACT

Replace EXPR =~ m/.../ with m/.../ EXPR, and similarly for s/// and tr///. Force an explicit dereference when using qr/.../. Disallow the implicit treatment of a string as a regular expression to match against.

=head1 CHANGES

Withdrawn on 8 Sep 2000. Seems like discussion is pretty much over, Larry's seen it and commented, and RFC164 mostly encompasses the idea, so I'm withdrawing this one just to clean things up. Besides, I don't want to maintain too many RFCs and I've got an idea for another one. :-)

=head1 DESCRIPTION

The EXPR =~ m/.../ syntax is ugly and unintuitive, something only its mother (awk? sed?) could love. It performs a function that is semantically no different from other forms of argument passing. This RFC proposes to eliminate the =~ binding operator and treat m, tr, and s almost like regular subroutine names but with slightly different syntax and semantics.

To illustrate the proposal by example, the current

  /pattern/;
  m/pattern/;
  $x =~ /pattern/;
  ($a, $b, $c) = $x =~ /p(a)t(t)e(r)n/;
  $x =~ s/pattern/subst/gsx;
  $r = qr/pattern/; $x =~ $r;
  $r = "pattern"; $x =~ $r;

would become

  /pattern/;
  m/pattern/;
  /pattern/ $x; OR /pattern/ ($x);
  ($a, $b, $c) = /p(a)t(t)e(r)n/ $x;
  s/pattern/subst/gsx ($x);
  $r = qr/pattern/; $r->($x);
  same as the previous, or $r = "pattern"; /$r/ ($x);

Specifically, all patterns behave as if they are subroutines with a ($) prototype, except that they keep the current syntax for their first argument, and $1 etc. interpolation remains unchanged.

qr/.../ would produce a CODE ref that may be invoked with the string to match against. It would be a regular CODE ref rather than the current magical Regexp reference type.
=head2 RELATED WACKY IDEA #1: Everything's a reference

Alternatively, we could think of m/.../ as always returning a reference, so that the syntax is /pattern/->($x). This is much more visually distinctive, but runs afoul of Larry's "no implicit dereferencing" rule in order to make /pattern/ default to /pattern/->($_). On the other hand, $a =~ $b already breaks that rule by dereferencing qr// refs, so maybe it's not such a big deal.

=head2 RELATED WACKY IDEA #2: Creating references to matching operations

Now forget about the previous alternative and assume as in the main section that we have /pattern/ ($x) and qr/pattern/->($x). This naturally leads to \m/pattern/ or \&m/pattern/ as an equivalent for qr/pattern/, and also introduces \s/pattern/subst/ and \tr/pattern/subst/ as new reference types.

=head1 IMPLEMENTATION

Minor parser changes. Currently, the relevant rule in perly.y is B>. The terms would be reversed, and the first would need to be renamed to cover only s///, m//, and tr/// (and equivalents). So it would be something like B>.

=head1 REFERENCES

=head2 Contributors

Dirk Meyers <[EMAIL PROTECTED]> came up with this idea.
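The idea of qr/.../ yielding an ordinary CODE ref can be approximated in current Perl with a closure. This is a rough sketch of the calling convention only, not the RFC's implementation; the helper name make_matcher is hypothetical, and the fallback to $_ when called with no argument is an assumption modelled on the RFC's treatment of bare /pattern/.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a pattern in a plain CODE ref so that it can
# be invoked as $r->($x), as the RFC suggests for qr/.../.
sub make_matcher {
    my ($pattern) = @_;
    my $re = qr/$pattern/;
    return sub {
        # Assumption: with no argument, match against $_.
        my $target = @_ ? $_[0] : $_;
        # Caller's context propagates: captures in list context,
        # true/false in scalar context.
        return $target =~ $re;
    };
}

my $r = make_matcher('p(a)t(t)e(r)n');
my @captures = $r->('pattern');
print join(',', @captures), "\n";
```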
Re: RFC 150 (v1) Extend regex syntax to provide for return of a hash of matched subpatterns
(This thread has been inactive for a while. See
http://www.mail-archive.com/perl6-language-regex@perl.org/index.html#00015
for its short history.)

Long ago Tom Christiansen wrote:

>This is useful in that it would stop being number dependent.
>For example, you can't now safely say
>
>/$var (foo) \1/
>
>and guarantee for arbitrary contents of $var that you have
>the right number backref anymore.
>
>If I recall correctly, the Python folks addressed this. One
>might check that.

Python does, indeed, have something similar. See (?P<name>...) and (?P=...) at http://www.python.org/doc/current/lib/re-syntax.html .

Tom's comment points out a shortcoming in the original RFC: There's no way to make, by name, a backref to a named group. I propose to fix that in a revised version of RFC 150. I don't have strong feelings about what the syntax should be. Here is one idea:

The substring matched by (?%some_name: ... ) can be referred to as $%{some_name}.

That's kind of ugly, so other suggestions are welcome. (The idea was to do something analogous to $1, $2, etc. Unfortunately ${some_name} is already taken. Maybe $_{some_name} would also work -- though %_ seems too valuable to use for this limited purpose.)
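Tom's numbering problem, and the way a by-name backref avoids it, can be sketched as follows. This is an illustration only: it uses the (?<name>...) and \k<name> syntax that exists in Perl 5.10 and later, which is not the (?%some_name: ...) syntax proposed here, and the helper names numbered_match and named_match are hypothetical.

```perl
use strict;
use warnings;

# Numbered backref: \1 silently changes meaning if $var adds a group.
sub numbered_match {
    my ($string, $var) = @_;
    return $string =~ /${var}(foo) \1/ ? 1 : 0;
}

# Named backref (Perl 5.10+ syntax, used here as a stand-in): \k<tag>
# always refers to the group named "tag", whatever $var contains.
sub named_match {
    my ($string, $var) = @_;
    return $string =~ /${var}(?<tag>foo) \k<tag>/ ? { %+ } : undef;
}

my $var = '(\w+)-';    # interpolated text that brings its own group
print numbered_match('abc-foo foo', $var) ? "matched\n" : "no match\n";
my $named = named_match('abc-foo foo', $var);
print $named ? "matched, tag=$named->{tag}\n" : "no match\n";
```

With the numbered form, \1 ends up referring to the group inside $var, so the intended match fails; the named form is unaffected.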