RegexParser-0.02 available (fwd)

Jeff Pinyan Mon, 30 Oct 2000 13:53:47 -0800
This may be of interest to people, maybe not.

>RegexParser.pm, a Perl regular expression parser, is available for public
>scrutiny.  It's NOT on CPAN yet -- I plan to wait until I can nail any
>bugs down.
>
>It will also be undergoing some root canal surgery, since I plan to
>rethink the way regex_to_string() works.
> 
>The most recent version is 0.02.  It can be downloaded at the following
>URL:
>
>   http://www.pobox.com/~japhy/regexes/RegexParser-stable.tar.gz
>
>Following is the documentation.
>
>
>NAME
>     RegexParser - module for breaking apart simple Perl regular
>     expressions
>
>SYNOPSIS
>       use RegexParser 'regex_to_string';
>       my $filename = regex_to_string qr/\w{10}-\d{3}\.txt/;
>       # something like "jk3429jds2-014.txt"
>
>       use RegexParser 'reverse_regex';
>       my $last_num = reverse_regex qr/(\d+)\D*$/;
>       # (?-ismx:^\D*(\d+))
>
>       use RegexParser 'reverse_match';
>       $numbers = "123 456 678 012";
>       ($match) = reverse_match $numbers => qr/(\d+)\D*$/;
>       # 012
>       @matches = reverse_match $numbers => qr/(\d+)\D*(\d+)\D*$/;
>       # (678,012)
>
>DESCRIPTION
>     This module can break a regular expression down into "nodes" for two
>     major uses. The first is for creating a string that matches a regular
>     expression. The second is for reversing the regular expression, so
>     that you can match from the end of a string more efficiently.
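
To illustrate the reversal idea without the module: the sketch below reverses the string, applies a pattern that was reversed by hand, and then un-reverses the captures. The name match_from_end() and the hand-written patterns are assumptions made for this example; this is not how RegexParser itself is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: match a hand-reversed pattern against the
# reversed string, then restore each capture and the capture order.
sub match_from_end {
    my ($string, $reversed_re) = @_;
    my $backwards = scalar reverse $string;
    my @caps = $backwards =~ $reversed_re or return;
    return reverse map { scalar reverse $_ } @caps;
}

my $numbers = "123 456 678 012";

# qr/(\d+)\D*$/ reversed by hand
my ($last) = match_from_end($numbers, qr/^\D*(\d+)/);
print "$last\n";    # 012

# qr/(\d+)\D*(\d+)\D*$/ reversed by hand
my @pair = match_from_end($numbers, qr/^\D*(\d+)\D*(\d+)/);
print "@pair\n";    # 678 012
```

Call it in list context, as above; in scalar context the final reverse would concatenate the captures instead.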
>
>     Because there are several areas where this module can be improved and
>     changed (if the need for it arises), the heuristics by which strings
>     are matched are not constant.
>
>FUNCTIONALITY
>     This is in place of a "bugs" list. If there appears to be an error in
>     one of the areas that is stated as supported, email the author
>     (information below).
>
>     * backreferences
>         It can handle backreferences, and nested backreferences. Since the
>         engine uses `\\[0-3][0-7][0-7]' to match octals and `\\[1-9]\d*'
>         to match backreferences, that means it can technically handle up to
>         99 backreferences without getting shaky. However, because the `?'
>         and `*' quantifiers can be applied to backreferences, see the
>         "conditionals" item below.
>
>     * grouping parentheses
>         The `(?:...)' grouping parentheses are supported. Modifiers
>         related to this structure are listed next.
>
>     * regex modifiers
>         Of the four modifiers, 'i', 's', 'm', and 'x', the engine
>         currently only "supports" the 'x' modifier. 'i' and 's' have no
>         "need" to be supported, since any string that matches without the
>         'i' modifier can match with it, and the engine will not match `\n'
>         by `.' anyway. The 'm' modifier might be supported in the future.
>
>         Modifiers in the form of `(?i)' are currently not supported, but
>         probably will be very soon.
>
>     * anchors
>         Currently, the engine doesn't pay attention to the beginning-of-line
>         or end-of-line anchors when forming a string, since they can be
>         implied by the fact that there's nothing else in the string. They
>         are supported (and are properly reversed, as best as can be) when
>         reversing a regular expression. In addition, the `\b' and `\B'
>         anchors (word boundaries) are not supported when forming a string.
>         Support for these anchors at the beginning and end of a string
>         might come along soon (and be achieved by prepending or appending
>         the proper characters if needed); support for internal placement
>         might come later.
>
>     * escapes
>         The engine supports octal, hexadecimal, and control-sequence
>         escapes.
>
>     * alternation
>         The engine supports alternation.
>
>     * look-ahead and look-behind
>         These are not supported. One reason is that look-ahead makes it
>         possible to create an infinite loop (such as `/(?!foo)foo/',
>         which can never match). Look-behind is not supported for similar
>         reasons. If these ever were to be supported, this engine could
>         technically allow variable-width look-behinds, by employing regular
>         expression reversal (this could get into an ugly loop).
>
>     * cut
>         The cut expression is not supported. It too can make patterns that
>         can never match (see 'perlre' for an example).
>
>     * interpolation
>         The engine only handles regular expression elements, not things
>         that should have been interpolated beforehand. Sending it
>         `$foo|$bar' will thus match either `foo' or `bar', since it won't
>         recognize the variables, and it does not enforce any context
>         around the `$' anchor. If it did, this would be another pattern
>         that could never match.
>
>     * evaluation
>         The `(?{CODE})' and `(??{LATER})' expressions are not supported,
>         for rather obvious reasons -- they're difficult to parse and might
>         be dangerous to allow.
>
>     * conditionals
>         The `(?(COND)...|...)' expression is currently not supported, but
>         it might have to be in the near future, due to the nature of some
>         regular expressions when they become reversed:
>
>           /(\w)\d\1*/  ==>  /(?:(\w)\1*)?\d(?(1)\1|\w)/
>
>         The explanation for that cruft is that the regular expression
>         matches strings like "a9", "a9a", "a9aa", and so on. Upon reversal,
>         it must match strings like "9a", "a9a", "aa9a", and so on, so the
>         leading run of repeated characters is optional. That is a problem,
>         because the capturing group now sits inside the optional part. The
>         reversed regular expression therefore tries the optional leading
>         part first. If that part matched, the final character must equal
>         the backreference. If it did not, the final character is some
>         arbitrary `\w'.
>
>     * inline comments
>         These (`(?#...)') are supported. Remember that they end at the
>         first `)'.
>
>     * quantifiers
>         Quantifiers of the form `*', `*?', `+', `+?', `?', and `??' match
>         once when forming a string. Those of the form `{m,n}' match a
>         random number of times between *m* and *n*. If there is no *n*,
>         they match *m* times. This may change in future versions, because
>         there may be the need to match differently in different cases (an
>         example is `/\w*(^\d+)/', which matches only when the `\w*' node
>         matches 0 times).
>
>     * character classes
>         These are supported. There is currently no error checking as far
>         as ranges are concerned. Negated classes are also supported.
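
On the "conditionals" item above: the reversed pattern can be checked in plain Perl by comparing the original pattern on a string with the reversed pattern on the reversed string. The \A and \z anchors and the agrees() helper are additions for this sketch, so that whole strings are compared; they are not part of the module.

```perl
use strict;
use warnings;

# The two patterns from the text, anchored here (an assumption of
# this sketch) so that the whole string has to match.
my $forward  = qr/\A(\w)\d\1*\z/;
my $reversed = qr/\A(?:(\w)\1*)?\d(?(1)\1|\w)\z/;

# Hypothetical helper: true when the forward pattern on the string and
# the reversed pattern on the reversed string give the same verdict.
sub agrees {
    my ($string) = @_;
    my $backwards = scalar reverse $string;
    my $fwd_ok = $string    =~ $forward  ? 1 : 0;
    my $rev_ok = $backwards =~ $reversed ? 1 : 0;
    return $fwd_ok == $rev_ok;
}

for my $string (qw(a9 a9a a9aa a9b 99 9a)) {
    printf "%-5s %s\n", $string, agrees($string) ? "agree" : "DISAGREE";
}
```

This only spot-checks a handful of strings; it does not show that the two patterns are equivalent in general.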
>
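On the "quantifiers" item above: the repetition policy it describes could be sketched roughly as follows. repeat_count() is a hypothetical name, not a function from the module, and treating a bare `{m}' as "exactly m" is my assumption, since the documentation does not mention that form.

```perl
use strict;
use warnings;

# Hypothetical helper: how many times a node is repeated when forming
# a string, following the policy described in the documentation.
sub repeat_count {
    my ($quant) = @_;

    # {m,n}: a random count between m and n, inclusive
    if ($quant =~ /\A\{(\d+),(\d+)\}\??\z/) {
        my ($min, $max) = ($1, $2);
        return $min + int rand($max - $min + 1);
    }

    # {m,} (no upper bound) and {m} (assumed): exactly m times
    return $1 if $quant =~ /\A\{(\d+),?\}\??\z/;

    # *, +, ? and their non-greedy forms: once
    return 1 if $quant =~ /\A[*+?]\??\z/;

    die "unrecognized quantifier: $quant";
}

# The pattern from the text matches only when \w* consumes nothing, so
# always expanding \w* once would build a string that does not match.
print "matched: $1\n" if "123" =~ /\w*(^\d+)/;         # matched: 123
print "no match\n" unless "a123" =~ /\w*(^\d+)/;       # no match
```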
>     In the event of a bug in the code, email the author at [EMAIL PROTECTED]
>     Please use an intelligible subject, such as "RegexParser vX.XX bug:
>     'blah'". Give as much output as possible. For debugging output, set
>     the $RegexParser::DEBUG variable to a true value.
>
>TO DO LIST
>     * add anchor support (at least `BOL' and `EOL')
>     * modify `regex_to_string()' matching heuristics
>
>HISTORY
>
>   0.02 -- Rel. Oct 30, 2000
>
>     Fixed a bug in the `(?:...)' support.
>     Added ability to return backreferences in `regex_to_string()'.
>     Added `reverse_match()' function.
>     Added regex comment support via the `/x' modifier and `(?#...)'.
>
>   0.01 -- Rel. Oct 27, 2000
>
>     Original release.
>
>SEE ALSO
>     re.pm, which is standard and shows debugging output about regexes. And
>     it wouldn't hurt to look at the regex man page (perlre).
>
>AUTHOR
>     Copyright (C) 2000, Jeff `japhy' Pinyan. All rights reserved.
>
>
>--
>Jeff "japhy" Pinyan     [EMAIL PROTECTED]     http://www.pobox.com/~japhy/
>PerlMonth - An Online Perl Magazine            http://www.perlmonth.com/
>The Perl Archive - Articles, Forums, etc.    http://www.perlarchive.com/
>CPAN - #1 Perl Resource  (my id:  PINYAN)        http://search.cpan.org/