http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14954

A bug caused by '-' in char class def ('[...]')

Summary: A bug caused by '-' in char class def ('[...]')
Product: Regexp
Version: unspecified
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: Other
Component: Other
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]

When I put a '-' in a character class definition ('[...]'), there are some cases in which a simple character in the definition is ignored. In such cases, the instructions in the compiled REProgram objects are not as expected. This may be related to bugs #2121 and #5212.

For example, '[a-zA]' works fine, while in '[Aa-z]' the 'A' is ignored, and in '[abcd\-]' the 'd' is ignored. The point is that the ignored character is always the one two characters before the '-'.

Near line 710 in RECompiler.java, we can see:

> // If simple character and not start of range, include it
> if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
> {
>     range.include(simpleChar, include);
> }

In my understanding, idx is already pointing to the character after the simpleChar in question. The simpleChar should not be included when the character right after it (if any) is '-', because in that case the simpleChar is the start of a new range. Therefore, the following code seems correct:

> if (idx >= len || pattern.charAt(idx) != '-')

I tried this fix on the CVS source tree last night, with some new test cases, and it worked fine. I'm not sure it has no side effects, but at least all tests in RETest.txt still pass. The diff output follows. Does this help?
Ikuya

Index: docs/RETest.txt
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/docs/RETest.txt,v
retrieving revision 1.3
diff -c -r1.3 RETest.txt
*** docs/RETest.txt	27 Feb 2001 08:37:05 -0000	1.3
--- docs/RETest.txt	28 Nov 2002 14:22:25 -0000
***************
*** 1011,1014 ****
--- 1011,1030 ----
  YES
  aaabc
+ #168
+ [a-zA]+
+ JakartaAnt
+ YES
+ akartaAnt
+ #169
+ [Aa-z]+
+ JakartaAnt
+ YES
+ akartaAnt
+
+ #170
+ [akrt\-]+
+ Jakarta-Ant
+ YES
+ akarta-
Index: src/java/org/apache/regexp/RECompiler.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RECompiler.java,v
retrieving revision 1.4
diff -c -r1.4 RECompiler.java
*** src/java/org/apache/regexp/RECompiler.java	27 Feb 2001 08:37:05 -0000	1.4
--- src/java/org/apache/regexp/RECompiler.java	28 Nov 2002 14:22:26 -0000
***************
*** 710,716 ****
                  else
                  {
                      // If simple character and not start of range, include it
!                     if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
                      {
                          range.include(simpleChar, include);
                      }
--- 710,716 ----
                  else
                  {
                      // If simple character and not start of range, include it
!                     if (idx >= len || pattern.charAt(idx) != '-')
                      {
                          range.include(simpleChar, include);
                      }
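To see why the original check in RECompiler.java drops the character two positions before a '-', here is a minimal Java sketch. It is a hypothetical, heavily simplified model of the loop quoted in the report, not the real RECompiler code. The class name CharClassSketch, the methods parseClass and members, and the proposedFix flag are made up for illustration. Only plain characters, single-character '\' escapes and "x-y" ranges are modeled. The two modes differ only in the quoted if condition.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model of the character-class loop in
 * RECompiler.java (around line 710). Only plain characters, one-character
 * '\' escapes and "x-y" ranges are handled; '^', escape classes such as
 * '\d', and a leading or trailing '-' are deliberately left out.
 */
public class CharClassSketch {

    /** Returns the characters that a class such as "[Aa-z]" would include. */
    static Set<Character> parseClass(String pattern, boolean proposedFix) {
        Set<Character> members = new TreeSet<>();
        int len = pattern.length();
        int idx = 1;                    // skip the opening '['
        char simpleChar = 0;
        char rangeStart = 0;
        boolean definingRange = false;

        while (idx < len && pattern.charAt(idx) != ']') {
            char c = pattern.charAt(idx);
            if (c == '-') {
                // Unescaped '-': the previous simple character starts a range.
                definingRange = true;
                rangeStart = simpleChar;
                idx++;
                continue;
            } else if (c == '\\' && idx + 1 < len) {
                simpleChar = pattern.charAt(idx + 1);   // escaped char is literal
                idx += 2;
            } else {
                simpleChar = c;
                idx++;
            }
            // At this point idx already points to the character AFTER simpleChar.

            if (definingRange) {
                for (int r = rangeStart; r <= simpleChar; r++) {
                    members.add((char) r);
                }
                definingRange = false;
            } else {
                boolean addSimpleChar = proposedFix
                        // Proposed: look at the character right after simpleChar.
                        ? (idx >= len || pattern.charAt(idx) != '-')
                        // Original: looks one character too far ahead.
                        : ((idx + 1) >= len || pattern.charAt(idx + 1) != '-');
                if (addSimpleChar) {
                    members.add(simpleChar);
                }
            }
        }
        return members;
    }

    /** Joins the included characters, in sorted order, into one string. */
    static String members(String pattern, boolean proposedFix) {
        StringBuilder sb = new StringBuilder();
        for (char ch : parseClass(pattern, proposedFix)) {
            sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] patterns = {"[a-zA]", "[Aa-z]", "[abcd\\-]"};
        for (String p : patterns) {
            System.out.println(p + "  original: " + members(p, false)
                    + "  proposed: " + members(p, true));
        }
    }
}
```

In this model, the original condition leaves 'A' out of [Aa-z] and 'd' out of [abcd\-], while the proposed condition keeps both; [a-zA] comes out the same either way. The [abcd\-] case also shows that the original check misfires even when the '-' is escaped, since it inspects the raw pattern text one position too far ahead.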