Re: [PR] [XERCESJ-1781] Javadoc fixes in org.apache.xerces.impl [xerces-j]

via GitHub Mon, 03 Nov 2025 04:29:17 -0800


elharo commented on code in PR #41:
URL: https://github.com/apache/xerces-j/pull/41#discussion_r2486318396



##########
src/org/apache/xerces/impl/xpath/regex/RegularExpression.java:
##########
@@ -27,223 +27,212 @@
  * A regular expression matching engine using Non-deterministic Finite 
Automaton (NFA).
  * This engine does not conform to the POSIX regular expression.
  *
- * <hr width="50%">
  * <h3>How to use</h3>
  *
  * <dl>
  *   <dt>A. Standard way
  *   <dd>
  * <pre>
- * RegularExpression re = new RegularExpression(<var>regex</var>);
+ * {@code
+ * RegularExpression re = new RegularExpression(regex);
  * if (re.matches(text)) { ... }
+ * }
  * </pre>
  *
  *   <dt>B. Capturing groups
  *   <dd>
  * <pre>
- * RegularExpression re = new RegularExpression(<var>regex</var>);
+ * {@code
+ * RegularExpression re = new RegularExpression(regex);
  * Match match = new Match();
  * if (re.matches(text, match)) {
  *     ... // You can refer captured texts with methods of the 
<code>Match</code> class.
  * }
+ * }
  * </pre>
  *
  * </dl>
  *
  * <h4>Case-insensitive matching</h4>
  * <pre>
+ * {@code
  * RegularExpression re = new RegularExpression(<var>regex</var>, "i");
  * if (re.matches(text) >= 0) { ...}
+ * }
  * </pre>
  *
  * <h4>Options</h4>
- * <p>You can specify options to <a href="#RegularExpression(java.lang.String, 
java.lang.String)"><code>RegularExpression(</code><var>regex</var><code>, 
</code><var>options</var><code>)</code></a>
- *    or <a href="#setPattern(java.lang.String, 
java.lang.String)"><code>setPattern(</code><var>regex</var><code>, 
</code><var>options</var><code>)</code></a>.
- *    This <var>options</var> parameter consists of the following characters.
- * </p>
- * <dl>
- *   <dt><a name="I_OPTION"><code>"i"</code></a>
- *   <dd>This option indicates case-insensitive matching.
- *   <dt><a name="M_OPTION"><code>"m"</code></a>
- *   <dd class="REGEX"><kbd>^</kbd> and <kbd>$</kbd> consider the EOL 
characters within the text.
- *   <dt><a name="S_OPTION"><code>"s"</code></a>
- *   <dd class="REGEX"><kbd>.</kbd> matches any one character.
- *   <dt><a name="U_OPTION"><code>"u"</code></a>
- *   <dd class="REGEX">Redefines <Kbd>\d \D \w \W \s \S \b \B \&lt; \></kbd> 
as becoming to Unicode.
- *   <dt><a name="W_OPTION"><code>"w"</code></a>
- *   <dd class="REGEX">By this option, <kbd>\b \B \&lt; \></kbd> are processed 
with the method of
- *      'Unicode Regular Expression Guidelines' Revision 4.
- *      When "w" and "u" are specified at the same time,
- *      <kbd>\b \B \&lt; \></kbd> are processed for the "w" option.
- *   <dt><a name="COMMA_OPTION"><code>","</code></a>
- *   <dd>The parser treats a comma in a character class as a range separator.
- *      <kbd class="REGEX">[a,b]</kbd> matches <kbd>a</kbd> or <kbd>,</kbd> or 
<kbd>b</kbd> without this option.
- *      <kbd class="REGEX">[a,b]</kbd> matches <kbd>a</kbd> or <kbd>b</kbd> 
with this option.
- *
- *   <dt><a name="X_OPTION"><code>"X"</code></a>
- *   <dd class="REGEX">
- *       By this option, the engine confoms to <a 
href="http://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#regexs">XML Schema: 
Regular Expression</a>.
- *       The <code>match()</code> method does not do subsring matching
- *       but entire string matching.
+ * <p>You can specify options to {@link #RegularExpression(String, String)} or 
{@link #setPattern(String, String)}.</p>
+ * <p>This <code>options</code> parameter consists of the following 
characters:</p>
+ * <ul>
+ *   <li><code>i</code> : This option indicates case-insensitive matching.</li>
+ *   <li><code>m</code> : <code>^</code> and <code>$</code> consider the EOL 
characters within the text.</li>
+ *   <li><code>s</code> : <code>.</code> matches any one character.</li>
+ *   <li><code>u</code> : Redefines <code>\d \D \w \W \s \S \b \B \&lt; 
\></code> as being Unicode.</li>
+ *   <li><code>w</code> : With this option, <code>\b \B \&lt; \></code> are 
processed with the method of 'Unicode Regular Expression Guidelines' Revision 
4. When "w" and "u" are specified at the same time, <code>\b \B \&lt; \></code> 
are processed for the "w" option.</li>
+ *   <li><code>,</code> : The parser treats a comma in a character class as a 
range separator.
+ *   <ul>
+ *       <li><code>[a,b]</code> matches <code>a</code> or <code>,</code> or 
<code>b</code> without this option.</li>
+ *       <li><code>[a,b]</code> matches <code>a</code> or <code>b</code> with 
this option.</li>
+ *   </ul>
+ *   </li>
+ *   <li><code>X</code> : With this option, the engine conforms to <a 
href="https://www.w3.org/TR/2000/WD-xmlschema-2-20000407/#regexs">XML Schema: 
Regular Expression</a>. The <code>match()</code> method does not do substring 
matching but entire string matching.</li>
+ * </ul>
  *
- * </dl>
- * 
- * <hr width="50%">
  * <h3>Syntax</h3>
- * <table border="1" bgcolor="#ddeeff">
- *   <tr>
- *    <td>
- *     <h4>Differences from the Perl 5 regular expression</h4>
- *     <ul>
- *      <li>There is 6-digit hexadecimal character representation  
(<kbd>\u005cv</kbd><var>HHHHHH</var>.)
- *      <li>Supports subtraction, union, and intersection operations for 
character classes.
- *      <li>Not supported: <kbd>\</kbd><var>ooo</var> (Octal character 
representations),
- *          <Kbd>\G</kbd>, <kbd>\C</kbd>, <kbd>\l</kbd><var>c</var>,
- *          <kbd>\u005c u</kbd><var>c</var>, <kbd>\L</kbd>, <kbd>\U</kbd>,
- *          <kbd>\E</kbd>, <kbd>\Q</kbd>, 
<kbd>\N{</kbd><var>name</var><kbd>}</kbd>,
- *          <Kbd>(?{<kbd><var>code</var><kbd>})</kbd>, 
<Kbd>(??{<kbd><var>code</var><kbd>})</kbd>
- *     </ul>
- *    </td>
- *   </tr>
- * </table>
  *
- * <p>Meta characters are `<KBD>. * + ? { [ ( ) | \ ^ $</KBD>'.</p>
+ * <h4>Differences from Perl 5 regular expression</h4>
+ * <ul>
+ *  <li>There is 6-digit hexadecimal character representation 
(<code>\vHHHHHH</code>).
+ *  <li>Supports subtraction, union, and intersection operations for character 
classes.
+ *  <li>Not supported:
+ *  <ul>
+ *    <li><code>\ooo</code> (Octal character representations)</li>
+ *    <li><code>\G</code>, <code>\C</code>, <code>\lc</code></li>
+ *    <li><code>\ uc</code>, <code>\L</code>, <code>\U</code></li>

Review Comment:
   I think that's an extra space between \ and uc that shouldn't be there.
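
One possible reason the space exists, offered as an assumption rather than anything stated in this PR: Java translates Unicode escapes before it parses comments, so a backslash immediately followed by `u` and a non-hex character is a compile-time error even inside Javadoc. The removed line (`\u005c u`) already carried the space and wrote the backslash itself as `\u005c`. A minimal sketch of how the space could be dropped while still compiling is below; the class and method names are hypothetical and not part of Xerces.

```java
// Hypothetical demo class, not part of Xerces.
final class UnicodeEscapeDemo {

    /**
     * Not supported: <code>\u005cuc</code>
     *
     * The backslash above is spelled as a Unicode escape. A character
     * produced by a Unicode escape does not start another escape, so the
     * following "uc" is left alone and this comment compiles.
     */
    static String unsupportedEscape() {
        // The three characters the Javadoc line is meant to show:
        // a backslash, then 'u', then 'c'.
        return "\\" + "uc";
    }

    public static void main(String[] args) {
        System.out.println(unsupportedEscape());
    }
}
```

Whether the rendered Javadoc looks right with this spelling would still need checking against the generated HTML.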



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

