jon         01/03/09 14:17:16

Modified:    src/java/org/apache/regexp RE.java
Log:
  fixed javadoc examples that would not compile
  for the find: thanks to Vladimir Tsichevski <[EMAIL PROTECTED]>
  for the fix: thanks to: Iain Lowe <[EMAIL PROTECTED]>

  fixed some of the 80 col wrapping.

Revision  Changes    Path
1.9       +108 -93   jakarta-regexp/src/java/org/apache/regexp/RE.java

Index: RE.java
===================================================================
RCS file: /home/cvs/jakarta-regexp/src/java/org/apache/regexp/RE.java,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -r1.8 -r1.9
--- RE.java 2001/02/20 01:18:45 1.8
+++ RE.java 2001/03/09 22:17:13 1.9
@@ -60,20 +60,21 @@
 import java.util.Vector;
 
 /**
- * RE is an efficient, lightweight regular expression evaluator/matcher class.
- * Regular expressions are pattern descriptions which enable sophisticated matching of
- * strings. In addition to being able to match a string against a pattern, you
- * can also extract parts of the match. This is especially useful in text parsing!
- * Details on the syntax of regular expression patterns are given below.
+ * RE is an efficient, lightweight regular expression evaluator/matcher
+ * class. Regular expressions are pattern descriptions which enable
+ * sophisticated matching of strings. In addition to being able to
+ * match a string against a pattern, you can also extract parts of the
+ * match. This is especially useful in text parsing! Details on the
+ * syntax of regular expression patterns are given below.
  *
  * <p>
  *
- * To compile a regular expression (RE), you can simply construct an RE matcher
- * object from the string specification of the pattern, like this:
+ * To compile a regular expression (RE), you can simply construct an RE
+ * matcher object from the string specification of the pattern, like this:
  *
  * <pre>
  *
- * RE r = new RE("a*b");
+ * RE r = new RE("a*b");
  *
  * </pre>
  *
@@ -84,7 +85,7 @@
  *
  * <pre>
  *
- * boolean matched = r.match("aaaab");
+ * boolean matched = r.match("aaaab");
  *
  * </pre>
  *
@@ -92,43 +93,43 @@
  * pattern "a*b" matches the string "aaaab".
  *
  * <p>
- * If you were interested in the <i>number</i> of a's which matched the first
- * part of our example expression, you could change the expression to
+ * If you were interested in the <i>number</i> of a's which matched the
+ * first part of our example expression, you could change the expression to
  * "(a*)b". Then when you compiled the expression and matched it against
  * something like "xaaaab", you would get results like this:
  *
  * <pre>
  *
- * RE r = new RE("(a*)b"); // Compile expression
- * boolean matched = r.match("xaaaab"); // Match against "xaaaab"
+ * RE r = new RE("(a*)b"); // Compile expression
+ * boolean matched = r.match("xaaaab"); // Match against "xaaaab"
  *
  * <br>
  *
- * String wholeExpr = r.getParen(0); // wholeExpr will be 'aaaab'
- * String insideParens = r.getParen(1); // insideParens will be 'aaaa'
+ * String wholeExpr = r.getParen(0); // wholeExpr will be 'aaaab'
+ * String insideParens = r.getParen(1); // insideParens will be 'aaaa'
  *
  * <br>
  *
- * int startWholeExpr = getParenStart(0); // startWholeExpr will be index 1
- * int endWholeExpr = getParenEnd(0); // endWholeExpr will be index 6
- * int lenWholeExpr = getParenLength(0); // lenWholeExpr will be 5
+ * int startWholeExpr = r.getParenStart(0); // startWholeExpr will be index 1
+ * int endWholeExpr = r.getParenEnd(0); // endWholeExpr will be index 6
+ * int lenWholeExpr = r.getParenLength(0); // lenWholeExpr will be 5
  *
  * <br>
  *
- * int startInside = getParenStart(1); // startInside will be index 1
- * int endInside = getParenEnd(1); // endInside will be index 5
- * int lenInside = getParenLength(1); // lenInside will be 4
+ * int startInside = r.getParenStart(1); // startInside will be index 1
+ * int endInside = r.getParenEnd(1); // endInside will be index 5
+ * int lenInside = r.getParenLength(1); // lenInside will be 4
  *
  * </pre>
  *
- * You can also refer to the contents of a parenthesized expression within
- * a regular expression itself. This is called a 'backreference'. The first
- * backreference in a regular expression is denoted by \1, the second by \2
- * and so on. So the expression:
+ * You can also refer to the contents of a parenthesized expression
+ * within a regular expression itself. This is called a
+ * 'backreference'. The first backreference in a regular expression is
+ * denoted by \1, the second by \2 and so on. So the expression:
  *
  * <pre>
  *
- * ([0-9]+)=\1
+ * ([0-9]+)=\1
  *
  * </pre>
  *
@@ -146,12 +147,12 @@
  *
  * <br>
  *
- * <i>unicodeChar</i> Matches any identical unicode character
+ * <i>unicodeChar</i> Matches any identical unicode character
  * \ Used to quote a meta-character (like '*')
  * \\ Matches a single '\' character
  * \0nnn Matches a given octal character
  * \xhh Matches a given 8-bit hexadecimal character
- * \\uhhhh Matches a given 16-bit hexadecimal character
+ * \\uhhhh Matches a given 16-bit hexadecimal character
  * \t Matches an ASCII tab character
  * \n Matches an ASCII newline character
  * \r Matches an ASCII return character
@@ -178,17 +179,23 @@
  * [:blank:] Space and tab characters.
  * [:cntrl:] Control characters.
  * [:digit:] Numeric characters.
- * [:graph:] Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
+ * [:graph:] Characters that are printable and are also visible.
+ * (A space is printable, but not visible, while an
+ * `a' is both.)
  * [:lower:] Lower-case alphabetic characters.
- * [:print:] Printable characters (characters that are not control characters.)
- * [:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
- * [:space:] Space characters (such as space, tab, and formfeed, to name a few).
+ * [:print:] Printable characters (characters that are not
+ * control characters.)
+ * [:punct:] Punctuation characters (characters that are not letter,
+ * digits, control characters, or space characters).
+ * [:space:] Space characters (such as space, tab, and formfeed,
+ * to name a few).
  * [:upper:] Upper-case alphabetic characters.
  * [:xdigit:] Characters that are hexadecimal digits.
  *
  * <br>
  *
- * <b><font face=times roman>Non-standard POSIX-style Character Classes</font></b>
+ * <b><font face=times roman>Non-standard POSIX-style Character
+ * Classes</font></b>
  *
  * <br>
  *
@@ -201,13 +208,13 @@
  *
  * <br>
  *
- * . Matches any character other than newline
- * \w Matches a "word" character (alphanumeric plus "_")
- * \W Matches a non-word character
- * \s Matches a whitespace character
- * \S Matches a non-whitespace character
- * \d Matches a digit character
- * \D Matches a non-digit character
+ * . Matches any character other than newline
+ * \w Matches a "word" character (alphanumeric plus "_")
+ * \W Matches a non-word character
+ * \s Matches a whitespace character
+ * \S Matches a non-whitespace character
+ * \d Matches a digit character
+ * \D Matches a non-digit character
  *
  * <br>
  *
@@ -215,10 +222,10 @@
  *
  * <br>
  *
- * ^ Matches only at the beginning of a line
- * $ Matches only at the end of a line
- * \b Matches only at a word boundary
- * \B Matches only at a non-word boundary
+ * ^ Matches only at the beginning of a line
+ * $ Matches only at the end of a line
+ * \b Matches only at a word boundary
+ * \B Matches only at a non-word boundary
  *
  * <br>
  *
@@ -226,12 +233,12 @@
  *
  * <br>
  *
- * A* Matches A 0 or more times (greedy)
- * A+ Matches A 1 or more times (greedy)
- * A? Matches A 1 or 0 times (greedy)
- * A{n} Matches A exactly n times (greedy)
- * A{n,} Matches A at least n times (greedy)
- * A{n,m} Matches A at least n but not more than m times (greedy)
+ * A* Matches A 0 or more times (greedy)
+ * A+ Matches A 1 or more times (greedy)
+ * A? Matches A 1 or 0 times (greedy)
+ * A{n} Matches A exactly n times (greedy)
+ * A{n,} Matches A at least n times (greedy)
+ * A{n,m} Matches A at least n but not more than m times (greedy)
  *
  * <br>
  *
@@ -239,9 +246,9 @@
  *
  * <br>
  *
- * A*? Matches A 0 or more times (reluctant)
- * A+? Matches A 1 or more times (reluctant)
- * A?? Matches A 0 or 1 times (reluctant)
+ * A*? Matches A 0 or more times (reluctant)
+ * A+? Matches A 1 or more times (reluctant)
+ * A?? Matches A 0 or 1 times (reluctant)
  *
  * <br>
  *
@@ -249,10 +256,11 @@
  *
  * <br>
  *
- * AB Matches A followed by B
- * A|B Matches either A or B
- * (A) Used for subexpression grouping
- * (?:A) Used for subexpression clustering (just like grouping but no backrefs)
+ * AB Matches A followed by B
+ * A|B Matches either A or B
+ * (A) Used for subexpression grouping
+ * (?:A) Used for subexpression clustering (just like grouping but
+ * no backrefs)
  *
  * <br>
  *
@@ -260,15 +268,15 @@
  *
  * <br>
  *
- * \1 Backreference to 1st parenthesized subexpression
- * \2 Backreference to 2nd parenthesized subexpression
- * \3 Backreference to 3rd parenthesized subexpression
- * \4 Backreference to 4th parenthesized subexpression
- * \5 Backreference to 5th parenthesized subexpression
- * \6 Backreference to 6th parenthesized subexpression
- * \7 Backreference to 7th parenthesized subexpression
- * \8 Backreference to 8th parenthesized subexpression
- * \9 Backreference to 9th parenthesized subexpression
+ * \1 Backreference to 1st parenthesized subexpression
+ * \2 Backreference to 2nd parenthesized subexpression
+ * \3 Backreference to 3rd parenthesized subexpression
+ * \4 Backreference to 4th parenthesized subexpression
+ * \5 Backreference to 5th parenthesized subexpression
+ * \6 Backreference to 6th parenthesized subexpression
+ * \7 Backreference to 7th parenthesized subexpression
+ * \8 Backreference to 8th parenthesized subexpression
+ * \9 Backreference to 9th parenthesized subexpression
  *
  * <br>
  *
@@ -276,20 +284,21 @@
  *
  * <p>
  *
- * All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they
- * match as many elements of the string as possible without causing the overall
- * match to fail. If you want a closure to be reluctant (non-greedy), you can
- * simply follow it with a '?'. A reluctant closure will match as few elements
- * of the string as possible when finding matches. {m,n} closures don't currently
+ * All closure operators (+, *, ?, {m,n}) are greedy by default, meaning
+ * that they match as many elements of the string as possible without
+ * causing the overall match to fail. If you want a closure to be
+ * reluctant (non-greedy), you can simply follow it with a '?'. A
+ * reluctant closure will match as few elements of the string as
+ * possible when finding matches. {m,n} closures don't currently
  * support reluctancy.
  *
  * <p>
  *
- * RE runs programs compiled by the RECompiler class. But the RE matcher class
- * does not include the actual regular expression compiler for reasons of
- * efficiency. In fact, if you want to pre-compile one or more regular expressions,
- * the 'recompile' class can be invoked from the command line to produce compiled
- * output like this:
+ * RE runs programs compiled by the RECompiler class. But the RE
+ * matcher class does not include the actual regular expression compiler
+ * for reasons of efficiency. In fact, if you want to pre-compile one
+ * or more regular expressions, the 'recompile' class can be invoked
+ * from the command line to produce compiled output like this:
  *
  * <pre>
  *
@@ -309,14 +318,16 @@
  *
  * </pre>
  *
- * You can then construct a regular expression matcher (RE) object from the pre-compiled
- * expression re1 and thus avoid the overhead of compiling the expression at runtime.
- * If you require more dynamic regular expressions, you can construct a single RECompiler
- * object and re-use it to compile each expression. Similarly, you can change the
- * program run by a given matcher object at any time. However, RE and RECompiler are
- * not threadsafe (for efficiency reasons, and because requiring thread safety in this
- * class is deemed to be a rare requirement), so you will need to construct a separate
- * compiler or matcher object for each thread (unless you do thread synchronization
+ * You can then construct a regular expression matcher (RE) object from
+ * the pre-compiled expression re1 and thus avoid the overhead of
+ * compiling the expression at runtime. If you require more dynamic
+ * regular expressions, you can construct a single RECompiler object and
+ * re-use it to compile each expression. * Similarly, you can change the
+ * program run by a given matcher object at any time. * However, RE and
+ * RECompiler are not threadsafe (for efficiency reasons, and because
+ * requiring thread safety in this class is deemed to be a rare
+ * requirement), so you will need to construct a separate compiler or
+ * matcher object for each thread (unless you do thread synchronization
  * yourself).
  *
  * </pre>
@@ -326,20 +337,24 @@
  * <i>ISSUES:</i>
  *
  * <ul>
- * <li>com.weusours.util.re is not currently compatible with all standard POSIX regcomp flags
- * <li>com.weusours.util.re does not support POSIX equivalence classes ([=foo=] syntax) (I18N/locale issue)
- * <li>com.weusours.util.re does not support nested POSIX character classes (definitely should, but not completely trivial)
- * <li>com.weusours.util.re Does not support POSIX character collation concepts ([.foo.] syntax) (I18N/locale issue)
- * <li>Should there be different matching styles (simple, POSIX, Perl etc?)
- * <li>Should RE support character iterators (for backwards RE matching!)?
- * <li>Should RE support reluctant {m,n} closures (does anyone care)?
+ * <li>com.weusours.util.re is not currently compatible with all
+ * standard POSIX regcomp flags</li>
+ * <li>com.weusours.util.re does not support POSIX equivalence classes
+ * ([=foo=] syntax) (I18N/locale issue)</li>
+ * <li>com.weusours.util.re does not support nested POSIX character
+ * classes (definitely should, but not completely trivial)</li>
+ * <li>com.weusours.util.re Does not support POSIX character collation
+ * concepts ([.foo.] syntax) (I18N/locale issue)</li>
+ * <li>Should there be different matching styles (simple, POSIX, Perl etc?)</li>
+ * <li>Should RE support character iterators (for backwards RE matching!)?</li>
+ * <li>Should RE support reluctant {m,n} closures (does anyone care)?</li>
  * <li>Not *all* possibilities are considered for greediness when backreferences
  * are involved (as POSIX suggests should be the case). The POSIX RE
  * "(ac*)c*d[ac]*\1", when matched against "acdacaa" should yield a match
  * of acdacaa where \1 is "a". This is not the case in this RE package,
  * and actually Perl doesn't go to this extent either! Until someone
  * actually complains about this, I'm not sure it's worth "fixing".
- * If it ever is fixed, test #137 in RETest.txt should be updated.
+ * If it ever is fixed, test #137 in RETest.txt should be updated.</li>
  * </ul>
  *
  * </font>
@@ -348,7 +363,7 @@
  * @see RECompiler
  *
  * @author <a href="mailto:[EMAIL PROTECTED]">Jonathan Locke</a>
- * @version $Id: RE.java,v 1.8 2001/02/20 01:18:45 jon Exp $
+ * @version $Id: RE.java,v 1.9 2001/03/09 22:17:13 jon Exp $
  */
 public class RE
 {
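
Reviewer note: the index values quoted in the corrected javadoc comments (start 1, end 6, length 5 for the whole match; start 1, end 5, length 4 for the parenthesized part) can be cross-checked without the jakarta-regexp jar. The sketch below is only an analogy and makes some assumptions: it uses the JDK's java.util.regex instead of org.apache.regexp.RE, with Matcher.find(), group(), start() and end() standing in for r.match(), r.getParen(), r.getParenStart() and r.getParenEnd(). The class name ParenIndexCheck and its methods are hypothetical and are not part of this commit.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jakarta-regexp: it mirrors the javadoc
// example with java.util.regex so the quoted indices can be checked.
class ParenIndexCheck {

    // Returns {start, end, length} of the given group for the first match of
    // pattern found anywhere in input, or null when nothing matches.
    static int[] span(String pattern, String input, int group) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.find()) {
            return null;
        }
        int start = m.start(group);
        int end = m.end(group);
        return new int[] { start, end, end - start };
    }

    // Returns the text captured by the given group for the first match,
    // or null when nothing matches.
    static String text(String pattern, String input, int group) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Group 0 is the whole match, group 1 the parenthesized a* part.
        System.out.println(text("(a*)b", "xaaaab", 0));
        System.out.println(text("(a*)b", "xaaaab", 1));
        System.out.println(Arrays.toString(span("(a*)b", "xaaaab", 0)));
        System.out.println(Arrays.toString(span("(a*)b", "xaaaab", 1)));

        // The backreference pattern from the javadoc: \1 must repeat
        // whatever the first group captured.
        System.out.println(text("([0-9]+)=\\1", "3=3", 0));
        System.out.println(text("([0-9]+)=\\1", "3=4", 0));
    }
}
```

Under java.util.regex the end index is exclusive, which agrees with the javadoc's "endWholeExpr will be index 6" for a five-character match starting at index 1.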