http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2121

*** shadow/2121	Mon Jun 11 12:26:15 2001
--- shadow/2121.tmp.13736	Thu Jun 21 14:29:55 2001
***************
*** 4,10 ****
  |        Bug #: 2121                        Product: Regexp                  |
  |       Status: NEW                         Version: unspecified             |
  |   Resolution:                            Platform: Other                   |
! |     Severity: Normal                   OS/Version: Windows NT/2K           |
  |     Priority: Other                     Component: Other                   |
  +----------------------------------------------------------------------------+
  |  Assigned To: [EMAIL PROTECTED]                                            |
--- 4,10 ----
  |        Bug #: 2121                        Product: Regexp                  |
  |       Status: NEW                         Version: unspecified             |
  |   Resolution:                            Platform: Other                   |
! |     Severity: Normal                   OS/Version: All                     |
  |     Priority: Other                     Component: Other                   |
  +----------------------------------------------------------------------------+
  |  Assigned To: [EMAIL PROTECTED]                                            |
***************
*** 69,71 ****
--- 69,150 ----
      reTest( s, "([a-z0-9.\\-]+)" );
      reTest( s, "([a-z0-9\\.-]+)" );
  %>
+
+ ------- Additional Comments From [EMAIL PROTECTED]  2001-06-21 14:29 -------
+ Here's a contribution to [EMAIL PROTECTED],
+ subject "What are we doing in regards to JDK 1.4?".
+
+ It contains untested fixes.
+
+ At 09:42 21-6-2001 -0700, Jon wrote:
+ Edwin,
+
+ on 6/21/01 7:16 AM, "Edwin Martin" <[EMAIL PROTECTED]> wrote:
+
+ - > org.apache.regexp 1.2 is pretty much broken. It has some
+ - > major flaws since 1.0 and they are still not addressed.
+ - >
+ - > See http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp
+ - > for a list of bugs (BTW none of them is assigned).
+ -
+ - Sending in bug reports doesn't get the problems fixed. This is a community
+ - of VOLUNTEERS. You can't just magically put in a bug report and then someone
+ - is going to jump up and fix it...you have to submit patches or try to nicely
+ - motivate people to fix it for you.
+ -
+ - <http://jakarta.apache.org/site/understandingopensource.html>
+ -
+ - "With the opensource system, if you find any deficiency in the project, the
+ - onus is on you to redress that deficiency."
+
+ I thought submitting bug reports is also an important
+ way to support Open Source.
+
+ Well, I looked at the regexp code and saw one of the bugs:
+
+ RECompiler.java, line 664:
+
+     // Premature end of range. define up to Character.MAX_VALUE
+     if ((idx + 1) < len && pattern.charAt(++idx) == ']')
+     {
+         simpleChar = Character.MAX_VALUE;
+         break;
+     }
+
+ The code treats every minus as the start of a range.
+
+ The RE "[a-]" becomes "the character a and anything after it".
+
+ A minus at the beginning or the end should be just a minus.
+
+ The code should be something like this:
+
+     // Premature end of range: treat the minus as a literal
+     if ((idx + 1) < len && pattern.charAt(++idx) == ']')
+     {
+         definingRange = false;
+         break;
+     }
+
+ Furthermore, RECompiler.java, line 697:
+
+     if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
+
+ Should become something like:
+
+     if ((idx + 1) >= len || !(pattern.charAt(idx + 1) == '-' &&
+         !((idx + 2) < len && pattern.charAt(idx + 2) == ']')))
+
+ Which means: Do not include a char when followed by a minus, but DO include the
+ char when the minus is followed by a ']'.
+
+ The code still does not address the possibility of a character class which
+ starts with a minus, like "[-a]" or "[^-a]", but that shouldn't be too
+ difficult to implement.
+
+ It isn't really that hard to fix these bugs, I just wonder if there's anybody
+ responsible for the regexp package.
+
+ And by the way, you don't have to shout.
+
+ Bye,
+ Edwin Martin.
\ No newline at end of file
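
The rule proposed in the comment above (a minus at the beginning or the end of a character class is a literal minus, a minus between two characters defines a range) can be illustrated with a small standalone sketch. This is not the RECompiler code and not a patch for it: the class CharClassSketch and its methods parse and matches are hypothetical names introduced only for illustration, and the sketch assumes the text between '[' and ']' is passed in without negation or backslash escapes.

```java
import java.util.BitSet;

// Hypothetical illustration only; not part of org.apache.regexp.
final class CharClassSketch {

    // Builds the set of characters accepted by the body of a character
    // class, i.e. the text between '[' and ']'. Assumption: no '^'
    // negation and no backslash escapes are present in the body.
    static BitSet parse(String body) {
        BitSet accepted = new BitSet();
        int len = body.length();
        int i = 0;
        while (i < len) {
            char start = body.charAt(i);
            // A '-' is a range operator only when a character precedes it
            // and another character follows it inside the class.
            if (i + 2 < len && body.charAt(i + 1) == '-') {
                char end = body.charAt(i + 2);
                if (end < start) {
                    throw new IllegalArgumentException(
                        "Bad range: " + start + "-" + end);
                }
                accepted.set(start, end + 1); // upper bound is exclusive
                i += 3;
            } else {
                // Covers ordinary characters and a leading or trailing '-'.
                accepted.set(start);
                i++;
            }
        }
        return accepted;
    }

    // Returns true if ch is accepted by the character class body.
    static boolean matches(String body, char ch) {
        return parse(body).get(ch);
    }
}
```

Under these assumptions, CharClassSketch.matches("a-", '-') and CharClassSketch.matches("-a", '-') are both true, while CharClassSketch.matches("a-", 'b') is false, which is the behaviour the comment asks for in "[a-]" and "[-a]".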