This may be of interest to people, maybe not.

>RegexParser.pm, a Perl regular expression parser, is available for public
>scrutiny. It's NOT on CPAN yet -- I plan to wait until I can nail any
>bugs down.
>
>It will also be undergoing some root canal surgery, since I plan to
>rethink the way regex_to_string() works.
>
>The most recent version is 0.02. It can be downloaded at the following
>URL:
>
>    http://www.pobox.com/~japhy/regexes/RegexParser-stable.tar.gz
>
>The documentation follows.
>
>
>NAME
>    RegexParser - module for breaking apart simple Perl regular
>    expressions
>
>SYNOPSIS
>        use RegexParser 'regex_to_string';
>        my $filename = regex_to_string qr/\w{10}-\d{3}\.txt/;
>        # something like "jk3429jds2-014.txt"
>
>        use RegexParser 'reverse_regex';
>        my $last_num = reverse_regex qr/(\d+)\D*$/;
>        # (?-ismx:^\D*(\d+))
>
>        use RegexParser 'reverse_match';
>        my $numbers = "123 456 678 012";
>        my ($match) = reverse_match $numbers => qr/(\d+)\D*$/;
>        # 012
>        my @matches = reverse_match $numbers => qr/(\d+)\D*(\d+)\D*$/;
>        # (678, 012)
>
>DESCRIPTION
>    This module can break a regular expression down into "nodes" for two
>    major uses. The first is creating a string that matches a regular
>    expression. The second is reversing the regular expression, so that
>    you can match from the end of a string more efficiently.
>
>    Because there are several areas where this module can be improved and
>    changed (if the need arises), the heuristics by which strings are
>    formed are not fixed.
>
>FUNCTIONALITY
>    This is in place of a "bugs" list. If there appears to be an error in
>    one of the areas stated as supported, email the author (information
>    below).
>
>    * backreferences
>        It can handle backreferences, and nested backreferences. Since the
>        engine uses `\\[0-3][0-7][0-7]' to match octals and `\\[1-9]\d*'
>        to match backreferences, an escape such as `\100' is read as an
>        octal, so it can technically handle up to 99 backreferences
>        without getting shaky.
>        However, the `?' and `*' quantifiers on backreferences cause
>        trouble when a regex is reversed; see the "conditionals" item
>        below.
>
>    * grouping parentheses
>        The `(?:...)' grouping parentheses are supported. Modifiers
>        related to this structure are listed next.
>
>    * regex modifiers
>        Of the four modifiers, 'i', 's', 'm', and 'x', the engine
>        currently only "supports" the 'x' modifier. 'i' and 's' have no
>        "need" to be supported, since any string that matches without the
>        'i' modifier also matches with it, and the engine will not match
>        `\n' by `.' anyway. The 'm' modifier might be supported in the
>        future.
>
>        Modifiers in the form of `(?i)' are currently not supported, but
>        probably will be very soon.
>
>    * anchors
>        Currently, the engine ignores the beginning-of-line and
>        end-of-line anchors when forming a string, since they are implied
>        by the fact that there's nothing else in the string. They are
>        supported (and reversed as well as possible) when reversing a
>        regular expression. In addition, the `\b' and `\B' anchors (word
>        boundary and non-word-boundary) are not supported when forming a
>        string. Support for these anchors at the beginning and end of a
>        string might come soon (achieved by prepending or appending the
>        proper characters if needed); support for internal placement might
>        come later.
>
>    * escapes
>        The engine supports octal, hexadecimal, and control-sequence
>        escapes.
>
>    * alternation
>        The engine supports alternation.
>
>    * look-ahead and look-behind
>        These are not supported. One reason is that they make it possible
>        to create an infinite loop (such as `/(?!foo)foo/', which can
>        never match). Look-behind is not supported for similar reasons. If
>        these were ever supported, this engine could technically allow
>        variable-width look-behinds by employing regular expression
>        reversal (this could get into an ugly loop).
>
>    * cut
>        The cut expression (`(?>...)') is not supported.
>        It too can make patterns that can never match (see 'perlre' for
>        an example).
>
>    * interpolation
>        The engine only handles regular expression elements, not things
>        that should have been interpolated beforehand. Sending it
>        `$foo|$bar' will thus yield either `foo' or `bar', since it won't
>        recognize the variables, and it does not enforce any context
>        around the `$' anchor. If it did, this would be another pattern
>        that can never match, since `foo' and `bar' cannot follow the
>        end-of-string anchor.
>
>    * evaluation
>        The `(?{CODE})' and `(??{LATER})' expressions are not supported,
>        for rather obvious reasons: they're difficult to parse and might
>        be dangerous to allow.
>
>    * conditionals
>        The `(?(COND)...|...)' expression is currently not supported, but
>        support might have to be added in the near future, due to the
>        nature of some regular expressions when they are reversed:
>
>            /(\w)\d\1*/  ==>  /(?:(\w)\1*)?\d(?(1)\1|\w)/
>
>        The reason for that cruft is that the original regular expression
>        matches strings like "a9", "a9a", "a9aa", and so on, while the
>        reversed one must match strings like "9a", "a9a", "aa9a", and so
>        on. The leading sequence is now optional, which causes problems:
>        the reversed regular expression optionally matches the part that
>        used to be at the end. If it matched that part, it must then match
>        the backreference; if it did not, it matches an arbitrary `\w'.
>
>    * inline comments
>        These (`(?#...)') are supported. Remember that they end at the
>        first `)'.
>
>    * quantifiers
>        Quantifiers of the form `*', `*?', `+', `+?', `?', and `??' match
>        once when forming a string. Those of the form `{m,n}' match a
>        random number of times, between *m* and *n*. If there is no *n*,
>        they match *m* times. This may change in future versions, because
>        different cases may need different handling (for example,
>        `/\w*(^\d+)/' only matches when the `\w*' node matches 0 times).
>
>    * character classes
>        These are supported. There is currently no error checking for
>        ranges.
>        Negated classes are also supported.
>
>    In the event of a bug in the code, email the author at [EMAIL PROTECTED]
>    Please use an intelligible subject, such as "RegexParser vX.XX bug:
>    'blah'". Include as much output as possible. For debugging output,
>    set the $RegexParser::DEBUG variable to a true value.
>
>TO DO LIST
>    * add anchor support (at least `BOL' and `EOL')
>    * modify `regex_to_string()' matching heuristics
>
>HISTORY
>
>    0.02 -- Rel. Oct 30, 2000
>
>        Fixed a bug in the `(?:...)' support.
>        Added ability to return backreferences in `regex_to_string()'.
>        Added `reverse_match()' function.
>        Added regex comment support via the `/x' modifier and `(?#...)'.
>
>    0.01 -- Rel. Oct 27, 2000
>
>        Original release.
>
>SEE ALSO
>    re.pm, which is standard and shows debugging output about regexes. It
>    also wouldn't hurt to look at the regex man page (perlre).
>
>AUTHOR
>    Copyright (C) 2000, Jeff `japhy' Pinyan. All rights reserved.
>
>
>--
>Jeff "japhy" Pinyan   [EMAIL PROTECTED]   http://www.pobox.com/~japhy/
>PerlMonth - An Online Perl Magazine         http://www.perlmonth.com/
>The Perl Archive - Articles, Forums, etc.   http://www.perlarchive.com/
>CPAN - #1 Perl Resource (my id: PINYAN)     http://search.cpan.org/
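The reversal idea behind reverse_regex() and reverse_match() in the SYNOPSIS can be tried without the module. The following is a minimal sketch, not RegexParser's implementation. It reverses the string and matches a pattern that was reversed by hand, using the same reversal the SYNOPSIS shows. It then undoes the reversal on each capture and on the order of the captures. The helper name reverse_captures() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RegexParser): match a hand-reversed,
# start-anchored pattern against the reversed string, then reverse each
# captured substring and the order of the captures.
sub reverse_captures {
    my ($string, $reversed_re) = @_;
    my $backwards = reverse $string;    # scalar context: reverses characters
    my @caps = $backwards =~ $reversed_re;
    return unless @caps;                # no match
    my @forward = reverse map { scalar reverse $_ } @caps;
    return @forward;
}

my $numbers = "123 456 678 012";

# qr/(\d+)\D*$/ reversed by hand is qr/^\D*(\d+)/
my ($last) = reverse_captures($numbers, qr/^\D*(\d+)/);
print "$last\n";        # 012

# qr/(\d+)\D*(\d+)\D*$/ reversed by hand is qr/^\D*(\d+)\D*(\d+)/
my @last_two = reverse_captures($numbers, qr/^\D*(\d+)\D*(\d+)/);
print "@last_two\n";    # 678 012
```

The forward patterns are unanchored on the left, so the engine may try many start positions. The reversed patterns are anchored with `^`, so only one start position (the end of the original string) is tried. This is presumably the efficiency gain the DESCRIPTION has in mind.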
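Perl's own engine already supports `(?(1)...|...)` conditionals, so the reversal shown in the "conditionals" item can be checked directly. This sketch anchors both patterns with `^...$`, an addition so that whole strings are compared. It then checks that each sample string, and its reverse, get the same verdict. The sample strings are arbitrary.

```perl
use strict;
use warnings;

# Patterns from the "conditionals" item, anchored at both ends
# (the anchors are an addition so that whole strings are compared).
my $forward  = qr/^(\w)\d\1*$/;
my $reversed = qr/^(?:(\w)\1*)?\d(?(1)\1|\w)$/;

my %verdict;
for my $s (qw(a9 a9a a9aa z0zzz a9b ab9 99)) {
    my $backwards = reverse $s;
    my $fwd = $s =~ $forward          ? 1 : 0;
    my $rev = $backwards =~ $reversed ? 1 : 0;
    $verdict{$s} = [$fwd, $rev];
    printf "%-6s forward=%d reversed=%d\n", $s, $fwd, $rev;
}
```

For "a9", the reversed string "9a" can only match once the optional group has been abandoned, because `\w` first consumes the `9`. At that point group 1 is unset and the `|\w` branch supplies the final character. A plain trailing `\1` would fail there, because in Perl a backreference to a group that did not participate never matches.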
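The "up to 99 backreferences" limit in the "backreferences" item follows from the order in which the two quoted escape patterns are tried. The small sketch below applies those two patterns, octal first. classify_escape() is a hypothetical helper, not a RegexParser function.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one backslash escape using the two
# patterns quoted in the documentation, trying the octal form first.
sub classify_escape {
    my ($esc) = @_;
    return 'octal'         if $esc =~ /^\\[0-3][0-7][0-7]$/;
    return 'backreference' if $esc =~ /^\\[1-9]\d*$/;
    return 'other';
}

# Single quotes: '\\1' is the two-character string \1
for my $esc ('\\1', '\\12', '\\99', '\\101', '\\377', '\\400', '\\w') {
    printf "%-5s %s\n", $esc, classify_escape($esc);
}
```

Every escape from `\1` to `\99` is classified as a backreference. Three-digit escapes from `\100` to `\377` that use only octal digits are taken as octals, so backreference numbers above 99 become ambiguous. An escape like `\400` falls through to the backreference branch.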
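The documented `{m,n}` rule can be sketched as a random count in the inclusive range m..n, falling back to m when there is no n. With that rule, the regex_to_string() example from the SYNOPSIS can be imitated by building a string node by node. The node list below was written by hand for qr/\w{10}-\d{3}\.txt/ and covers only ASCII word characters; RegexParser derives its nodes from the regex itself. The helpers repeat_count(), pick(), and sample_filename() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented {m,n} rule: a random
# count between m and n inclusive; with no n, exactly m.
sub repeat_count {
    my ($m, $n) = @_;
    return $m unless defined $n;
    return $m + int(rand($n - $m + 1));
}

# Hand-written character sets standing in for the \w and \d nodes.
my @word  = ('a' .. 'z', 'A' .. 'Z', '0' .. '9', '_');
my @digit = ('0' .. '9');

sub pick { my @set = @_; return $set[int rand @set] }

# Hypothetical node-by-node walk of qr/\w{10}-\d{3}\.txt/
sub sample_filename {
    my $s = '';
    $s .= pick(@word)  for 1 .. repeat_count(10);
    $s .= '-';
    $s .= pick(@digit) for 1 .. repeat_count(3);
    $s .= '.txt';
    return $s;
}

# Prints a random string that matches qr/\w{10}-\d{3}\.txt/
print sample_filename(), "\n";
```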