RFC 158 (v3) Regular Expression Special Variables
This and other RFCs are available on the web at http://dev.perl.org/rfc/

=head1 TITLE

Regular Expression Special Variables

=head1 VERSION

  Maintainer: Uri Guttman [EMAIL PROTECTED]
  Date: 25 Aug 2000
  Last Modified: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 158
  Version: 3
  Status: Frozen
  Frozen since: v2

=head1 ABSTRACT

This RFC addresses ways to make the regex special variables $`, $& and $' less of the pariahs they are now.

=head1 CHANGES

I dropped the local scoping of $`, $& and $' as they are already localized now.

=head1 DESCRIPTION

$`, $& and $' are useful variables, yet experienced Perl hackers never use them because of their well-known efficiency problems. Since they are globals, any use of them anywhere in your code forces every regex to copy its data for potential later reference by one of them. I will describe some ideas to make this issue go away and return these variables to the toolbox where they belong.

=head1 IMPLEMENTATION

The copy-all-regex-data problem is solved by a new modifier, k (for keep). It tells the regex to do the copy so the three variables will work properly. You would use code like this:

    $str = 'prefoopost' ;
    if ( $str =~ /foo/k ) {
        print "pre is [$`]\n" ;
        print "match is [$&]\n" ;
        print "post is [$']\n" ;
    }

=head1 IMPACT

None

=head1 UNKNOWNS

None

=head1 REFERENCES

None.
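Until something like /k exists, the same three strings can be recovered in Perl 5 from the offset arrays @- and @+, which do not trigger the global copying that $`, $& and $' cause. This is a sketch under that approach, not part of the proposal; match_parts is a hypothetical helper name. (Perl 5.10 later added a /p modifier with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, which is close in spirit to /k.)

```perl
use strict;
use warnings;

# Hypothetical helper: returns (pre-match, match, post-match) for the
# first match of $re in $str, or an empty list if there is no match.
# It reads the offsets in @- and @+ instead of $`, $& and $'.
sub match_parts {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    my $pre   = substr($str, 0, $-[0]);              # text before the match
    my $match = substr($str, $-[0], $+[0] - $-[0]);  # the match itself
    my $post  = substr($str, $+[0]);                 # text after the match
    return ($pre, $match, $post);
}

my $str = 'prefoopost';
if ( my ($pre, $match, $post) = match_parts($str, qr/foo/) ) {
    print "pre is [$pre]\n";
    print "match is [$match]\n";
    print "post is [$post]\n";
}
```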
Re: RFC 197 (v1) Numeric Value Ranges In Regular Expressions
Hugo wrote:

> In [EMAIL PROTECTED], "David L. Nicol" writes:
> :I think I did -- I guess v2 didn't make it in; I sent it again; what
> :were your and mjd's comments again?
>
> Here are the messages:
> http://www.mail-archive.com/perl6-language-regex%40perl.org/msg00306.html
> http://www.mail-archive.com/perl6-language-regex%40perl.org/msg00294.html
>
> However if you didn't see them it is too late now, since I see that
> your v2 freezes the RFC. I think it is a shame there was not more
> discussion of this - I'm sure the functionality would be useful, but
> I'm not at all convinced about the syntax.
>
> Hugo

Thanks. Yes, I had seen them, and they are both valid criticisms. There are more examples in v2.

The syntax matches exactly the syntax used for specifying segments of the number line in algebra classes. If this goes into the language, people who are writing nonlinear number systems would have to decide whether to support it and, if so, how; that goes without saying. The inspiration for it, along with its companion piece on an implied grep in certain hash accesses, is a way to ease slicing of "traditional" multidimensional arrays:

    %Center_of_4x4x4_cube = %FourCube{/[2,3] [2,3] [2,3]/} ;

That's an old-fashioned fake multidimensional array, of course, not one of these new creatures.

-- 
David Nicol 816.235.1187 [EMAIL PROTECTED]
"The most powerful force in the universe is gossip"
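For comparison, here is roughly what that slice would select, written with today's Perl 5. It assumes the cube is a fake multidimensional hash keyed as $FourCube{$x,$y,$z} (so each key is join($;, $x, $y, $z)) with indices 1 to 4, and that [2,3] means the closed interval 2 <= n <= 3, as on the algebra number line. in_range is a hypothetical helper.

```perl
use strict;
use warnings;

# A fake multidimensional hash: $FourCube{$x, $y, $z} stores its key
# as join($;, $x, $y, $z). Assumed indices run from 1 to 4.
my %FourCube;
for my $x (1 .. 4) {
    for my $y (1 .. 4) {
        for my $z (1 .. 4) {
            $FourCube{$x, $y, $z} = "$x$y$z";
        }
    }
}

# Closed interval test, as in the algebraic notation [lo,hi].
sub in_range {
    my ($n, $lo, $hi) = @_;
    return $n >= $lo && $n <= $hi;
}

# Keep only the entries whose three coordinates all lie in [2,3].
my $sep = quotemeta($;);
my %Center_of_4x4x4_cube =
    map  { ($_ => $FourCube{$_}) }
    grep {
        my @coords = split /$sep/, $_;
        3 == grep { in_range($_, 2, 3) } @coords;
    }
    keys %FourCube;

print scalar(keys %Center_of_4x4x4_cube), " cells\n";   # 8 cells
```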
RFC 165 (v3) Allow Variables in tr///
This and other RFCs are available on the web at http://dev.perl.org/rfc/

=head1 TITLE

Allow Variables in tr///

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 27 Aug 2000
  Last Modified: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 165
  Version: 3
  Status: Frozen

=head1 ABSTRACT

Allow variables in a tr///. At present the only way to do a tr/$foo/$bar/ is to wrap it up in an eval. I don't like using evals for this sort of thing.

=head1 DESCRIPTION

Suggested syntax: tr/$foo/$bar/e

With /e, tr will expand both the LHS and RHS of the translation. Either or both could be variables. I am suggesting /e because it is sort of like the /e in s///e.

These words from MJD:

The way tr/// works is that a 256-byte table is constructed at compile time that says, for each input character, what output character is produced. Then when it's time to apply the tr/// to a string, Perl iterates over the string one character at a time, looks up each character in the table, and replaces it with the corresponding character from the table.

With tr///e, you would have to generate the table at run time. This would suggest that you want the same sorts of optimizations that Perl applies when it encounters a regex that contains variables:

  1. Perl should examine the strings to see if they have changed since the last time it executed the code.
  2. It should rebuild the tables only if the strings changed.
  3. There should be a /o modifier that promises Perl that the variables will never change.

The implementation could be analogous to the way m/.../o is implemented, with two separate op nodes: one that tells Perl 'construct the tables' and one that tells Perl 'transform the string'. The 'construct the tables' node would remove itself from the op tree if it saw that the tr//o modifier was used.

Hugo wrote:

Definitely. Should be easy to implement.

There is a potential for confusion, since it makes the tr/ lists look even more like m/ and s/ patterns, but I think it can only be less confusion than the current state of affairs.

It is tempting to make it the default, and have a flag to turn it off (or just backwhack the dagnabbed dollar), and auto-translation of existing scripts would be pretty easy, except that it would presumably fail exactly where people are using the current workaround, by way of eval.

Comments by me:

Therefore tr///o might be a good idea as well. If Hugo's idea of making this the normal behaviour is adopted, the problem of existing evals is avoided by p52p6 changing the eval to a perl5_eval which acts accordingly (one of MJD's ideas).

=head1 IMPLEMENTATION

Hugo: Should be easy to implement.

Me: Should not be too complicated; this is just a case of doing existing things in a different context.

=head1 CHANGES

V2 - Added words from MJD and Hugo. This is hopefully in a pre-freeze state.

V3 - Re-issued due to an error in posting V2, and now frozen.

=head1 REFERENCES

None yet.
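The eval workaround mentioned in the ABSTRACT, combined with MJD's point that the tables should only be rebuilt when the strings change, can be sketched in Perl 5 as below. tr_vars and %tr_cache are hypothetical names; each distinct (search, replace) pair is compiled once into a closure, which stands in for the cached table. Only / and \ are escaped, so ranges such as a-z in the variables still act as ranges.

```perl
use strict;
use warnings;

# Compiled translators, keyed on the (search list, replacement list) pair,
# so each distinct pair is compiled only once.
my %tr_cache;

# Hypothetical helper: returns a copy of $str translated as if by
# tr/$from/$to/, with the two lists taken from variables.
sub tr_vars {
    my ($str, $from, $to) = @_;
    my $key = join "\0", $from, $to;
    my $translate = $tr_cache{$key} ||= do {
        # Escape the delimiter and backslashes so the lists cannot end
        # the tr/// early; ranges such as a-z keep their meaning.
        my ($f, $t) = map { (my $s = $_) =~ s{([/\\])}{\\$1}g; $s } $from, $to;
        eval "sub { \$_[0] =~ tr/$f/$t/ }" or die $@;
    };
    $translate->($str);    # $_[0] aliases $str, so this edits our copy
    return $str;
}

print tr_vars('hello world', 'a-z', 'A-Z'), "\n";   # HELLO WORLD
```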
RFC 166 (v3) Alternative lists and quoting of things
This and other RFCs are available on the web at http://dev.perl.org/rfc/

=head1 TITLE

Alternative lists and quoting of things

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 27 Aug 2000
  Last Modified: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 166
  Version: 3
  Status: Developing

=head1 ABSTRACT

Expand alternative lists from arrays, and quote the contents of things inside regexes.

=head1 DESCRIPTION

These are a couple of constructs to make it easy to build up regexes from other things.

=head2 Alternative Lists from arrays

The basic idea is to expand an array as a list of alternatives. There are two possible syntaxes: (?@foo) and just plain @foo. @foo might just have existing uses, therefore I prefer the (?@foo) syntax.

(?@foo) is just syntactic sugar for (?:(??{ join('|',@foo) })), a bracketed list of alternatives.

=head2 Quoting the contents of things

If a regex uses $foo or @bar, there are problems if the contents of the variables contain special characters. What is needed is a way of \Quoting the contents of scalars ($foo) or arrays ((?@foo)).

Suggested syntax:

(?Q$foo) quotes the contents of the scalar $foo; it is equivalent to (??{ quotemeta $foo }).

(?Q@foo) quotes each item in a list (as above); it is equivalent to (?:(??{ join ('|', map quotemeta, @foo)})).

In this syntax the Q is used as it represents a more intelligent \Q...\E. It is recognised that (?Q$foo) is equivalent to \Q$foo\E, but that does not mean it is a bad idea to add it at the same time as (?Q@foo), for reasons of symmetry and Perl DWIM.

=head2 Comments

Hugo:

(?@foo) and (?Q@foo) are both things I've wanted before now. I'm not sure if this is the right syntax, particularly if RFC 112 is adopted: it would be confusing for (?@foo) to have so different a meaning from (?$foo=...), and even more so if the latter is ever extended to allow (?@foo=...).

I see no reason that implementation should cause any problems, since this is purely a regexp-compile-time issue.

Me:

I can't see any reasonable meaning for (?@foo=...). This seems an appropriate syntax, but I am open to others being suggested.

=head1 CHANGES

V1 of this RFC had three ideas; one has been dropped, and another is now part of RFC 198.

V2 expands the list expansion and quoting, adding quoting of scalars and implementation issues.

V3: by error, what should have been 165 V2 was issued as 166 V2, so this is V3, with a change in (?Q$foo). This is in a pre-frozen state.

=head1 MIGRATION

(?@foo) and (?Q...) are additions without any compatibility issues. The option of just @foo for list expansion might present a small problem if people already use the construct.

=head1 IMPLEMENTATION

Both of these changes are regex compile-time issues.

Generating lists from arrays almost works by localising $" as '|' for the regex and just using @foo.

MJD has demonstrated implementing (?@foo) as (?\@foo) by means of an overload of regexes; this slight change was necessary because of the expansion of @foo (see below).

Both of these changes are currently affected by the expansion of variables in the regex before the regex compiler gets to work on it. This problem also affects several other RFCs. For these (and other RFCs), the expansion of variables in regexes needs to be driven from within the regex compiler, so that the regex can expand variables as and where appropriate. Changing this should not affect any existing behaviour.

=head1 REFERENCES

RFC 198
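Both constructs can already be approximated in Perl 5 using the equivalences this RFC gives. The sketch below (the @words list is only illustrative) builds the quoted alternation once, shows the $" trick mentioned in the RFC, which does not quote anything, and uses (??{ }) so that the list is re-read on every match.

```perl
use strict;
use warnings;

my @words = ('a.b', 'c+d', 'foo');

# Like the proposed (?Q@foo): quote each element, join the results as
# alternatives and wrap them in a non-capturing group.
my $alt    = join '|', map { quotemeta($_) } @words;
my $quoted = qr/^(?:$alt)$/;

# The $" trick: an array interpolated into a regex is joined with $",
# so localising $" to '|' yields alternatives, but nothing is quoted.
my $unquoted = do { local $" = '|'; qr/^(?:@words)$/ };

# Like (?:(??{ join '|', map quotemeta, @foo })): the list is read again
# on every match, so later changes to @words are seen.
my $dynamic = qr/^(??{ join '|', map { quotemeta($_) } @words })$/;

for my $s ('a.b', 'axb', 'c+d') {
    printf "%-4s quoted=%d unquoted=%d\n",
        $s, ($s =~ $quoted ? 1 : 0), ($s =~ $unquoted ? 1 : 0);
}
```

The unquoted form accepts 'axb' (the . is a wildcard) and rejects 'c+d' (the + is a quantifier), which is exactly the problem (?Q@foo) is meant to solve.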
RFC 198 (v2) Boolean Regexes
This and other RFCs are available on the web at http://dev.perl.org/rfc/

=head1 TITLE

Boolean Regexes

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 6 Sep 2000
  Last Modified: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 198
  Version: 2
  Status: Developing

=head1 ABSTRACT

This is a development of the proposal for the "not a pattern" concept in RFC 166 V1. Looking deeper into the handling of advanced regexes, there are potential needs for many other concepts, to allow a regex to extract information directly from a complex file in one go, rather than the mixture of splits and nested regexes that is typically needed today. With these, parsing data should become easier (in some cases).

=head1 CHANGES

V2 - Changed the "Fail Pattern", enhanced the wording for many things.

=head1 DESCRIPTION

It would be nice (in my opinion) to be able to build more elaborate regexes, allowing data to be mined out of a string in one go. These ideas allow you to apply several patterns to one substring (each must match), to fail a match from within, to look for patterns that do not contain other patterns, and to handle cases such as (foo.*bar)|(bar.*foo) in a more general way, saying "a substring that contains both foo and bar".

These are ideas, at present with some proposed syntax. The ideas are more important than the exact syntax at this stage. This is very much work in progress.

I have called these boolean regexes as they bring the concepts of and (), or (||) and not (!) into the realm of regexes.

Within a boolean regex (or the boolean part of a regex), several new symbols have meanings, and some have enhanced meanings.

=head2 The Ideas

Are these part of a boolean (?...) construct within an existing regex, or is the advanced syntax (and meaning of |!^$) invoked by a new flag such as /B?

These can look like line noise, so white space with /x is used throughout, and it might be appropriate to enforce (or assume) /x within (...).
=head3 Boolean construct

(?...) grabs a substring, and applies one or more tests to the substring.

=head3 Substring matching multiple patterns ()

    (? pattern1 pattern2 pattern3 )

A substring is defined that matches each pattern. For example, the first pattern may specify a substring of at least 30 chars, and the next two require a foo and a bar.

=head3 Substring matching alternative patterns (||)

    (? pattern1 || pattern2 || pattern3)

This is similar to the existing alternative syntax "|", but the alternatives to "|" behave as /^pattern$/ rather than /pattern/ (^ and $ are taken as referring to the substring in this case; see below).

(pattern1 || pattern2 || pattern3) can be mixed in with the case above to build up more advanced cases. Both the "and" and the || operators can be nested with brackets in the normal ways.

=head3 Brackets within boolean regexes

Within a complex boolean regex there are likely to be lots and lots of brackets to nest and control the behaviour of the regex. Rather than having to sprinkle the regex with (?:) line noise, it would be nicer to just use ordinary brackets () and only support capturing of elements by using one of the (?$=) or (?%=) constructs that have been proposed elsewhere (RFC 112 and RFC 150). There might be some case for this as a general capability, using some flag such as /b (for brackets)?

=head3 Substring not matching a pattern

In RFC 166 I originally proposed (?^ pattern ). This proposal replaces that, though it could be used outside of the (?) construct as well.

!pattern matches anything that does not match the pattern. On its own it is not terribly useful, but in conjunction with the "and" and || operators one can do things such as /(? img ! alt=)/, i.e. "does it have an img but not an alt=?". ! is chosen as it has the same basic meaning outside of regexes.

!pattern is a non-greedy construct that matches any string/substring that does not match the pattern.
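Several of these ideas can be approximated in Perl 5 with lookahead assertions. This sketch assumes that approximation is good enough for illustration; it is not the proposed semantics, because each lookahead tests the rest of the string rather than a substring delimited by the construct. Here [^>]* is used to keep the img/alt test inside a single tag.

```perl
use strict;
use warnings;

# "A string that contains both foo and bar", in either order: a more
# general form of (foo.*bar)|(bar.*foo), one lookahead per pattern.
my $foo_and_bar = qr/^(?=.*foo)(?=.*bar)/s;

# Several tests on one string: at least 30 characters, a foo and a bar.
my $long_foo_bar = qr/^(?=.{30})(?=.*foo)(?=.*bar)/s;

# "Has an img but not an alt=" within a single <...> tag: a positive
# lookahead for img, a negative one for alt=, both limited by [^>]*.
my $img_without_alt = qr/<(?=[^>]*img)(?![^>]*alt=)[^>]*>/;

print "both\n"    if 'bar then foo' =~ $foo_and_bar;
print "missing\n" if '<img src="x.png">' =~ $img_without_alt;
```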
=head3 Meaning of $ and ^ inside a boolean regex

Within a boolean regex, ^ and $ are taken to mean the beginning and end of the substring, not the beginning and end of the line/string.

=head3 Greediness

Should the (?...) construct be greedy or non-greedy? To some extent this depends on the elements it contains. If all the matching set of patterns are greedy then it will be greedy; if they are not greedy then it will not be. This might or might not be sufficient. If the situation is (or might be) ambiguous, the boolean can be expressed as (?? ...) to force non-greediness.

=head3 Delivering a substring to some code that generates a pass/fail

(?*{code}) delivers a substring to the code, which returns success or failure. The code sees the substring as $_. This is not dependent on the boolean regex concept and could be used for other things, though it is most useful in this context.

This is roughly equivalent to (?: (.*)(??{$_ = $1; code})), i.e. it matches an arbitrarily long substring and delivers it to the code. But not dependent on how many brackets have been
RFC 274 (v1) Generalised Additions to Regexes
This and other RFCs are available on the web at http://dev.perl.org/rfc/

=head1 TITLE

Generalised Additions to Regexes

=head1 VERSION

  Maintainer: Richard Proctor [EMAIL PROTECTED]
  Date: 22 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 274
  Version: 1
  Status: Developing

=head1 ABSTRACT

This proposes a way to make generalised additions to regex capabilities.

=head1 DESCRIPTION

Given that expansion of regexes could include (+...) and (*...), I have been thinking about providing a general-purpose way of adding functionality. Hence I propose that the entire (+...) syntax is kept free from formal specification, for this purpose. (+ = addition)

A module, or anything else that wants to support some enhanced syntax, registers something that handles "regex enhancements".

At regex compile time, if and when (+foo) is found, perl calls each of the registered regex enhancements in turn. These:

  1) Are passed the foo string as a parameter exactly as is. (There is an issue of actually finding the end of the generic foo.)
  2) The regex enhancement can either recognise the content or not.
  3) If not, the enhancement returns undef and perl goes on to the next regex enhancement. (Does it handle the enhancements as a stack (last checked first) or a list (first checked first)? How are they scoped? A job here for the OO/scoping fanatics.)
  4) If perl runs out of registered regex enhancements, it reports an error.
  5) If an enhancement recognises the content, it could do either of:
     a) return a replacement regex, expanded using existing capabilities; perl will then pass this back through the regex compiler.
     b) return a coderef that is called at run time when the regex gets to this point. The referenced code needs enough access to the regex internals to be able to see the current sub-expression, request more characters, and access the relevant flags and visibility of greediness. It may also need a coderef that is similarly called when the regex is being unwound as it backtracks.
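Case 5a, where an enhancement returns replacement regex text, can be mocked up today as a source-to-source rewrite performed before qr// compiles the pattern. In this sketch register_enhancement and expand_additions are hypothetical names; handlers are tried last-registered-first (one arbitrary answer to the stack-or-list question in step 3); and the body of (+...) may not contain ")", which sidesteps the problem of finding the end of foo.

```perl
use strict;
use warnings;

# Registered handlers. Each receives the text between "(+" and ")" and
# returns undef ("not mine") or replacement text in existing regex syntax.
my @enhancements;

sub register_enhancement { push @enhancements, @_ }

# Rewrite every (+foo) in $pattern, then compile the result.
sub expand_additions {
    my ($pattern) = @_;
    # Simplification: the body of (+...) may not itself contain ")".
    $pattern =~ s{\(\+([^)]*)\)}{
        my $body = $1;
        my $expansion;
        for my $handler (reverse @enhancements) {   # last registered first
            $expansion = $handler->($body);
            last if defined $expansion;
        }
        die "No regex enhancement recognises (+$body)\n"
            unless defined $expansion;
        "(?:$expansion)";
    }ge;
    return qr/$pattern/;
}

# Example handler: (+int) stands for an optionally signed integer.
register_enhancement(sub { $_[0] eq 'int' ? '[-+]?\d+' : undef });

my $re = expand_additions('^(+int),(+int)$');
print "matched\n" if '-3,42' =~ $re;
```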
These features would also be of interest to the existing code inside regexes. Thinking from that, the last case should be generalised (it is sort of like my (?*{...}) from RFC 198, or an enhancement to (??{...})). If so, cases (a) and (b) are the same, as case (b) is just a case of returning (?*{...}) with the appropriate code.

Following on, if (?{...}) etc. code is evaluated in a forward match, it would be a good idea to likewise support some code block that is ignored on a forward match but is executed when the code is unwound due to backtracking. Thus (?{ foo })(?\{ bar }) executes foo in the forward case and bar if it unwinds. I don't care at the moment what the syntax is; what about the concepts? Think about foo putting something on a stack (e.g. the bracket to match [RFC 145]) and bar taking it off, for example.

Note: I don't consider this RFC complete, but after posting this on the regex list to no effect, I am making it an RFC to see if it gets a little more feedback...

=head1 MIGRATION

This is a new feature - no compatibility problems.

=head1 IMPLEMENTATION

This has not been looked at in detail, but the description above provides some views as to how it may operate.

=head1 REFERENCES

RFC 145 - Bracket matching

RFC 198 - Boolean Regexes
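Perl 5 already offers part of the forward/backtrack pair: a local assignment made inside (?{ }) is undone automatically when the engine backtracks past it. The sketch below is adapted from the counter example in the perlre documentation (the variable names are illustrative); the local increment plays the role of foo, and Perl's automatic restore plays the role of bar.

```perl
use strict;
use warnings;

our ($depth, $depth_at_end) = (0, undef);

# Going forward, each "a" bumps $depth. Because the bump uses "local",
# Perl restores the previous value whenever the engine backtracks out of
# that iteration, which is the "undo" half of the proposed pair.
my $matched = ('a' x 8) =~ m{
    (?{ $depth = 0 })
    (?: a (?{ local $depth = $depth + 1 }) )*
    aaaa
    (?{ $depth_at_end = $depth })
}x;

# The * first takes all 8 a's, then gives 4 back so that "aaaa" can
# match; the local increments for those 4 are unwound along the way.
print "depth when the match succeeded: $depth_at_end\n";   # 4
```

The limitation that motivates the RFC remains: the undo is restricted to restoring localised variables, rather than running arbitrary code as (?\{ bar }) would.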