regexp RE.java

jon Fri, 09 Mar 2001 14:01:22 -0800
jon         01/03/09 14:17:16

  Modified:    src/java/org/apache/regexp RE.java
  Log:
  fixed javadoc examples that would not compile
  
  for the find: thanks to
  Vladimir Tsichevski <[EMAIL PROTECTED]>
  
  for the fix: thanks to:
  Iain Lowe <[EMAIL PROTECTED]>
  
  fixed some of the 80 col wrapping.
  
  Revision  Changes    Path
  1.9       +108 -93   jakarta-regexp/src/java/org/apache/regexp/RE.java
  
  Index: RE.java
  ===================================================================
  RCS file: /home/cvs/jakarta-regexp/src/java/org/apache/regexp/RE.java,v
  retrieving revision 1.8
  retrieving revision 1.9
  diff -u -r1.8 -r1.9
  --- RE.java   2001/02/20 01:18:45     1.8
  +++ RE.java   2001/03/09 22:17:13     1.9
  @@ -60,20 +60,21 @@
   import java.util.Vector;
   
   /**
  - * RE is an efficient, lightweight regular expression evaluator/matcher class.
  - * Regular expressions are pattern descriptions which enable sophisticated matching of
  - * strings.  In addition to being able to match a string against a pattern, you
  - * can also extract parts of the match.  This is especially useful in text parsing!
  - * Details on the syntax of regular expression patterns are given below.
  + * RE is an efficient, lightweight regular expression evaluator/matcher
  + * class. Regular expressions are pattern descriptions which enable
  + * sophisticated matching of strings.  In addition to being able to
  + * match a string against a pattern, you can also extract parts of the
  + * match.  This is especially useful in text parsing! Details on the
  + * syntax of regular expression patterns are given below.
    *
    * <p>
    *
  - * To compile a regular expression (RE), you can simply construct an RE matcher
  - * object from the string specification of the pattern, like this:
  + * To compile a regular expression (RE), you can simply construct an RE
  + * matcher object from the string specification of the pattern, like this:
    *
    * <pre>
    *
  - *     RE r = new RE("a*b");
  + *  RE r = new RE("a*b");
    *
    * </pre>
    *
  @@ -84,7 +85,7 @@
    *
    * <pre>
    *
  - *     boolean matched = r.match("aaaab");
  + *  boolean matched = r.match("aaaab");
    *
    * </pre>
    *
  @@ -92,43 +93,43 @@
    * pattern "a*b" matches the string "aaaab".
    *
    * <p>
  - * If you were interested in the <i>number</i> of a's which matched the first
  - * part of our example expression, you could change the expression to
  + * If you were interested in the <i>number</i> of a's which matched the
  + * first part of our example expression, you could change the expression to
    * "(a*)b".  Then when you compiled the expression and matched it against
    * something like "xaaaab", you would get results like this:
    *
    * <pre>
    *
  - *     RE r = new RE("(a*)b");                  // Compile expression
  - *     boolean matched = r.match("xaaaab");     // Match against "xaaaab"
  + *  RE r = new RE("(a*)b");                  // Compile expression
  + *  boolean matched = r.match("xaaaab");     // Match against "xaaaab"
    *
    * <br>
    *
  - *     String wholeExpr = r.getParen(0);        // wholeExpr will be 'aaaab'
  - *     String insideParens = r.getParen(1);     // insideParens will be 'aaaa'
  + *  String wholeExpr = r.getParen(0);        // wholeExpr will be 'aaaab'
  + *  String insideParens = r.getParen(1);     // insideParens will be 'aaaa'
    *
    * <br>
    *
  - *     int startWholeExpr = getParenStart(0);   // startWholeExpr will be index 1
  - *     int endWholeExpr = getParenEnd(0);       // endWholeExpr will be index 6
  - *     int lenWholeExpr = getParenLength(0);    // lenWholeExpr will be 5
  + *  int startWholeExpr = r.getParenStart(0);   // startWholeExpr will be index 1
  + *  int endWholeExpr = r.getParenEnd(0);       // endWholeExpr will be index 6
  + *  int lenWholeExpr = r.getParenLength(0);    // lenWholeExpr will be 5
    *
    * <br>
    *
  - *     int startInside = getParenStart(1);      // startInside will be index 1
  - *     int endInside = getParenEnd(1);          // endInside will be index 5
  - *     int lenInside = getParenLength(1);       // lenInside will be 4
  + *  int startInside = r.getParenStart(1);    // startInside will be index 1
  + *  int endInside = r.getParenEnd(1);        // endInside will be index 5
  + *  int lenInside = r.getParenLength(1);     // lenInside will be 4
    *
    * </pre>
    *
  - * You can also refer to the contents of a parenthesized expression within
  - * a regular expression itself.  This is called a 'backreference'.  The first
  - * backreference in a regular expression is denoted by \1, the second by \2
  - * and so on.  So the expression:
  + * You can also refer to the contents of a parenthesized expression
  + * within a regular expression itself.  This is called a
  + * 'backreference'.  The first backreference in a regular expression is
  + * denoted by \1, the second by \2 and so on.  So the expression:
    *
    * <pre>
    *
  - *     ([0-9]+)=\1
  + *  ([0-9]+)=\1
    *
    * </pre>
    *
  @@ -146,12 +147,12 @@
    *
    * <br>
    *
  - *    <i>unicodeChar</i>          Matches any identical unicode character
  + *    <i>unicodeChar</i>   Matches any identical unicode character
    *    \                    Used to quote a meta-character (like '*')
    *    \\                   Matches a single '\' character
    *    \0nnn                Matches a given octal character
    *    \xhh                 Matches a given 8-bit hexadecimal character
  - *    \\uhhhh               Matches a given 16-bit hexadecimal character
  + *    \\uhhhh              Matches a given 16-bit hexadecimal character
    *    \t                   Matches an ASCII tab character
    *    \n                   Matches an ASCII newline character
    *    \r                   Matches an ASCII return character
  @@ -178,17 +179,23 @@
    *    [:blank:]            Space and tab characters.
    *    [:cntrl:]            Control characters.
    *    [:digit:]            Numeric characters.
  - *    [:graph:]            Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
  + *    [:graph:]            Characters that are printable and are also visible.
  + *                         (A space is printable, but not visible, while an 
  + *                         `a' is both.)
    *    [:lower:]            Lower-case alphabetic characters.
  - *    [:print:]            Printable characters (characters that are not control characters.)
  - *    [:punct:]            Punctuation characters (characters that are not letter, digits, control characters, or space characters).
  - *    [:space:]            Space characters (such as space, tab, and formfeed, to name a few).
  + *    [:print:]            Printable characters (characters that are not 
  + *                         control characters.)
  + *    [:punct:]            Punctuation characters (characters that are not letter,
  + *                         digits, control characters, or space characters).
  + *    [:space:]            Space characters (such as space, tab, and formfeed, 
  + *                         to name a few).
    *    [:upper:]            Upper-case alphabetic characters.
    *    [:xdigit:]           Characters that are hexadecimal digits.
    *
    * <br>
    *
  - *  <b><font face=times roman>Non-standard POSIX-style Character Classes</font></b>
  + *  <b><font face=times roman>Non-standard POSIX-style Character 
  + *                            Classes</font></b>
    *
    * <br>
    *
  @@ -201,13 +208,13 @@
    *
    * <br>
    *
  - *    .                    Matches any character other than newline
  - *    \w                   Matches a "word" character (alphanumeric plus "_")
  - *    \W                   Matches a non-word character
  - *    \s                   Matches a whitespace character
  - *    \S                   Matches a non-whitespace character
  - *    \d                   Matches a digit character
  - *    \D                   Matches a non-digit character
  + *    .         Matches any character other than newline
  + *    \w        Matches a "word" character (alphanumeric plus "_")
  + *    \W        Matches a non-word character
  + *    \s        Matches a whitespace character
  + *    \S        Matches a non-whitespace character
  + *    \d        Matches a digit character
  + *    \D        Matches a non-digit character
    *
    * <br>
    *
  @@ -215,10 +222,10 @@
    *
    * <br>
    *
  - *    ^                    Matches only at the beginning of a line
  - *    $                    Matches only at the end of a line
  - *    \b                   Matches only at a word boundary
  - *    \B                   Matches only at a non-word boundary
  + *    ^         Matches only at the beginning of a line
  + *    $         Matches only at the end of a line
  + *    \b        Matches only at a word boundary
  + *    \B        Matches only at a non-word boundary
    *
    * <br>
    *
  @@ -226,12 +233,12 @@
    *
    * <br>
    *
  - *    A*                   Matches A 0 or more times (greedy)
  - *    A+                   Matches A 1 or more times (greedy)
  - *    A?                   Matches A 1 or 0 times (greedy)
  - *    A{n}                 Matches A exactly n times (greedy)
  - *    A{n,}                Matches A at least n times (greedy)
  - *    A{n,m}               Matches A at least n but not more than m times (greedy)
  + *    A*        Matches A 0 or more times (greedy)
  + *    A+        Matches A 1 or more times (greedy)
  + *    A?        Matches A 1 or 0 times (greedy)
  + *    A{n}      Matches A exactly n times (greedy)
  + *    A{n,}     Matches A at least n times (greedy)
  + *    A{n,m}    Matches A at least n but not more than m times (greedy)
    *
    * <br>
    *
  @@ -239,9 +246,9 @@
    *
    * <br>
    *
  - *    A*?                  Matches A 0 or more times (reluctant)
  - *    A+?                  Matches A 1 or more times (reluctant)
  - *    A??                  Matches A 0 or 1 times (reluctant)
  + *    A*?       Matches A 0 or more times (reluctant)
  + *    A+?       Matches A 1 or more times (reluctant)
  + *    A??       Matches A 0 or 1 times (reluctant)
    *
    * <br>
    *
  @@ -249,10 +256,11 @@
    *
    * <br>
    *
  - *    AB                   Matches A followed by B
  - *    A|B                  Matches either A or B
  - *    (A)                  Used for subexpression grouping
  - *   (?:A)                 Used for subexpression clustering (just like grouping but no backrefs)
  + *    AB        Matches A followed by B
  + *    A|B       Matches either A or B
  + *    (A)       Used for subexpression grouping
  + *   (?:A)      Used for subexpression clustering (just like grouping but 
  + *              no backrefs)
    *
    * <br>
    *
  @@ -260,15 +268,15 @@
    *
    * <br>
    *
  - *    \1                   Backreference to 1st parenthesized subexpression
  - *    \2                   Backreference to 2nd parenthesized subexpression
  - *    \3                   Backreference to 3rd parenthesized subexpression
  - *    \4                   Backreference to 4th parenthesized subexpression
  - *    \5                   Backreference to 5th parenthesized subexpression
  - *    \6                   Backreference to 6th parenthesized subexpression
  - *    \7                   Backreference to 7th parenthesized subexpression
  - *    \8                   Backreference to 8th parenthesized subexpression
  - *    \9                   Backreference to 9th parenthesized subexpression
  + *    \1    Backreference to 1st parenthesized subexpression
  + *    \2    Backreference to 2nd parenthesized subexpression
  + *    \3    Backreference to 3rd parenthesized subexpression
  + *    \4    Backreference to 4th parenthesized subexpression
  + *    \5    Backreference to 5th parenthesized subexpression
  + *    \6    Backreference to 6th parenthesized subexpression
  + *    \7    Backreference to 7th parenthesized subexpression
  + *    \8    Backreference to 8th parenthesized subexpression
  + *    \9    Backreference to 9th parenthesized subexpression
    *
    * <br>
    *
  @@ -276,20 +284,21 @@
    *
    * <p>
    *
  - * All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they
  - * match as many elements of the string as possible without causing the overall
  - * match to fail.  If you want a closure to be reluctant (non-greedy), you can
  - * simply follow it with a '?'.  A reluctant closure will match as few elements
  - * of the string as possible when finding matches.  {m,n} closures don't currently
  + * All closure operators (+, *, ?, {m,n}) are greedy by default, meaning
  + * that they match as many elements of the string as possible without
  + * causing the overall match to fail.  If you want a closure to be
  + * reluctant (non-greedy), you can simply follow it with a '?'.  A
  + * reluctant closure will match as few elements of the string as
  + * possible when finding matches.  {m,n} closures don't currently
    * support reluctancy.
    *
    * <p>
    *
  - * RE runs programs compiled by the RECompiler class.  But the RE matcher class
  - * does not include the actual regular expression compiler for reasons of
  - * efficiency.  In fact, if you want to pre-compile one or more regular expressions,
  - * the 'recompile' class can be invoked from the command line to produce compiled
  - * output like this:
  + * RE runs programs compiled by the RECompiler class.  But the RE
  + * matcher class does not include the actual regular expression compiler
  + * for reasons of efficiency.  In fact, if you want to pre-compile one
  + * or more regular expressions, the 'recompile' class can be invoked
  + * from the command line to produce compiled output like this:
    *
    * <pre>
    *
  @@ -309,14 +318,16 @@
    *
    * </pre>
    *
  - * You can then construct a regular expression matcher (RE) object from the pre-compiled
  - * expression re1 and thus avoid the overhead of compiling the expression at runtime.
  - * If you require more dynamic regular expressions, you can construct a single RECompiler
  - * object and re-use it to compile each expression.  Similarly, you can change the
  - * program run by a given matcher object at any time.  However, RE and RECompiler are
  - * not threadsafe (for efficiency reasons, and because requiring thread safety in this
  - * class is deemed to be a rare requirement), so you will need to construct a separate
  - * compiler or matcher object for each thread (unless you do thread synchronization
  + * You can then construct a regular expression matcher (RE) object from
  + * the pre-compiled expression re1 and thus avoid the overhead of
  + * compiling the expression at runtime. If you require more dynamic
  + * regular expressions, you can construct a single RECompiler object and
  + * re-use it to compile each expression.  Similarly, you can change the
  + * program run by a given matcher object at any time.  However, RE and
  + * RECompiler are not threadsafe (for efficiency reasons, and because
  + * requiring thread safety in this class is deemed to be a rare
  + * requirement), so you will need to construct a separate compiler or
  + * matcher object for each thread (unless you do thread synchronization
    * yourself).
    *
    * </pre>
  @@ -326,20 +337,24 @@
    * <i>ISSUES:</i>
    *
    * <ul>
  - *  <li>com.weusours.util.re is not currently compatible with all standard POSIX regcomp flags
  - *  <li>com.weusours.util.re does not support POSIX equivalence classes ([=foo=] syntax) (I18N/locale issue)
  - *  <li>com.weusours.util.re does not support nested POSIX character classes (definitely should, but not completely trivial)
  - *  <li>com.weusours.util.re Does not support POSIX character collation concepts ([.foo.] syntax) (I18N/locale issue)
  - *  <li>Should there be different matching styles (simple, POSIX, Perl etc?)
  - *  <li>Should RE support character iterators (for backwards RE matching!)?
  - *  <li>Should RE support reluctant {m,n} closures (does anyone care)?
  + *  <li>com.weusours.util.re is not currently compatible with all
  + *      standard POSIX regcomp flags</li>
  + *  <li>com.weusours.util.re does not support POSIX equivalence classes
  + *      ([=foo=] syntax) (I18N/locale issue)</li>
  + *  <li>com.weusours.util.re does not support nested POSIX character
  + *      classes (definitely should, but not completely trivial)</li>
  + *  <li>com.weusours.util.re Does not support POSIX character collation
  + *      concepts ([.foo.] syntax) (I18N/locale issue)</li>
  + *  <li>Should there be different matching styles (simple, POSIX, Perl etc?)</li>
  + *  <li>Should RE support character iterators (for backwards RE matching!)?</li>
  + *  <li>Should RE support reluctant {m,n} closures (does anyone care)?</li>
    *  <li>Not *all* possibilities are considered for greediness when backreferences
    *      are involved (as POSIX suggests should be the case).  The POSIX RE
    *      "(ac*)c*d[ac]*\1", when matched against "acdacaa" should yield a match
    *      of acdacaa where \1 is "a".  This is not the case in this RE package,
    *      and actually Perl doesn't go to this extent either!  Until someone
    *      actually complains about this, I'm not sure it's worth "fixing".
  - *      If it ever is fixed, test #137 in RETest.txt should be updated.
  + *      If it ever is fixed, test #137 in RETest.txt should be updated.</li>
    * </ul>
    *
    * </font>
  @@ -348,7 +363,7 @@
    * @see RECompiler
    *
    * @author <a href="mailto:[EMAIL PROTECTED]">Jonathan Locke</a>
  - * @version $Id: RE.java,v 1.8 2001/02/20 01:18:45 jon Exp $
  + * @version $Id: RE.java,v 1.9 2001/03/09 22:17:13 jon Exp $
    */
   public class RE
   {
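
The javadoc example corrected in this revision calls the paren accessors
on the matcher object (r.getParenStart(0) and so on) instead of leaving
them unqualified. For readers without jakarta-regexp on the classpath,
the same offsets can be checked with the JDK's java.util.regex. This is
only a sketch under that substitution, not the RE API: ParenDemo,
groupSpan, groupText and found are hypothetical names introduced here,
and Matcher.group/start/end stand in for getParen/getParenStart/
getParenEnd.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: uses java.util.regex from the JDK as a stand-in
// for org.apache.regexp.RE. All names in this class are hypothetical.
class ParenDemo {

    // Returns {start, end, length} of the given group for the first match
    // of `pattern` found anywhere in `input`, or null if nothing matches.
    static int[] groupSpan(String pattern, String input, int group) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.find()) {
            return null;
        }
        int start = m.start(group);
        int end = m.end(group);
        return new int[] { start, end, end - start };
    }

    // Returns the text captured by the given group for the first match,
    // or null if nothing matches.
    static String groupText(String pattern, String input, int group) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    // True if `pattern` matches somewhere inside `input`.
    static boolean found(String pattern, String input) {
        return Pattern.compile(pattern).matcher(input).find();
    }

    public static void main(String[] args) {
        String pattern = "(a*)b";
        String input = "xaaaab";

        // Group 0 is the whole match, group 1 the parenthesized part.
        System.out.println(groupText(pattern, input, 0)); // aaaab
        System.out.println(groupText(pattern, input, 1)); // aaaa

        int[] whole = groupSpan(pattern, input, 0);  // start 1, end 6, length 5
        int[] inside = groupSpan(pattern, input, 1); // start 1, end 5, length 4
        System.out.println(whole[0] + " " + whole[1] + " " + whole[2]);
        System.out.println(inside[0] + " " + inside[1] + " " + inside[2]);

        // Backreference: \1 must repeat whatever group 1 captured.
        System.out.println(found("([0-9]+)=\\1", "5=5")); // true
        System.out.println(found("([0-9]+)=\\1", "5=6")); // false
    }
}
```

The start, end and length values printed for "xaaaab" agree with the
comments in the javadoc example above (1, 6, 5 for the whole expression
and 1, 5, 4 for the parenthesized part).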