Hi Jernej,

Jernej Tuljak <jernej.tul...@gmail.com> wrote on 08/14/2013 03:41:17 AM:

> Hi,
>
> we're abusing org.apache.xerces.impl.xpath.regex.RegularExpression

Yep. :-)

> to validate XSD flavor regular expression strings and later matching
> test strings against them. It seemingly worked, until someone tried
> to use a very specific regex.
>
> Here's the code:
>
> import org.apache.xerces.impl.xpath.regex.RegularExpression;
>
> public class XercesRegexTest {
>
>     public static void main(String[] args) {
>         String regexString = "([a-zA-Z][^ ]*)";
>         RegularExpression regex = new RegularExpression(regexString, "x");
>         System.out.println(regex.toString());
>     }
>
> }
>
> The `x` option is supposed to make the regex engine conform to XSD
> regular expressions.

Only 'X' does that. That is the only option which Xerces uses internally.

> But if you run this code, you'll end up with
>
> Exception in thread "main"
> org.apache.xerces.impl.xpath.regex.ParseException: Unexpected end of
> the pattern in a character class.
>     at org.apache.xerces.impl.xpath.regex.RegexParser.ex(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseCharacterClass(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.processParen(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseAtom(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseFactor(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseTerm(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parseRegex(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegexParser.parse(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegularExpression.setPattern(Unknown Source)
>     at org.apache.xerces.impl.xpath.regex.RegularExpression.<init>(Unknown Source)
>     at com.mgsoft.testing.regex.XercesRegexTest.main(XercesRegexTest.java:9)
> Java Result: 1
>
> It first looked like a bug in Xerces' regular expression parser, but
> after re-reading the documentation
> (http://xerces.apache.org/xerces-j/apiDocs/org/apache/xerces/utils/regex/RegularExpression.html)
> of this class, I found out that the `x` option should actually be `X`
> (upper case).

The docs for that class probably haven't changed much over the years, but
it's worth pointing out that that's the Xerces-J 1.x documentation, not
Xerces-J 2.x.

> Thing is...it worked for countless other regular
> expressions. In fact it is that space that is causing problems, any
> other char works fine. Also removing the option and using the single
> string constructor of `RegularExpression` works fine.

If you're not specifying 'X' then you're using a mode that isn't XSD and
that we never use.

> Does anyone know why this is happening? I realize that this class is
> probably not intended for such usage, but since the spec we're
> implementing uses XSD regular expressions, we tried to avoid
> reinventing the wheel through reuse.

Works for me with the current code in SVN.

> We are using xercesImpl.jar that is distributed with xalan-j 2.7.1.

Whatever you got out of Xalan-J 2.7.1 would be very old now. Have you
tried Xerces-J 2.11.0?

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org
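
For anyone who would rather not depend on the internal
org.apache.xerces.impl classes at all: besides passing "X" instead of "x",
one possible alternative (not something proposed in this thread) is to let
the standard javax.xml.validation API do the check. The sketch below wraps
the pattern in a one-element schema and validates a one-element document
against it, so the XSD regex flavor is applied by whatever schema processor
the JDK provides. The class name XsdPatternCheck and its methods compile
and matches are hypothetical helpers, not Xerces or JDK API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper: checks strings against an XSD pattern via JAXP. */
final class XsdPatternCheck {

    /** Builds a schema whose only element is restricted by the pattern. */
    public static Schema compile(String pattern) {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='v'><xs:simpleType>"
          + "<xs:restriction base='xs:string'>"
          + "<xs:pattern value='" + escape(pattern) + "'/>"
          + "</xs:restriction></xs:simpleType></xs:element></xs:schema>";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // With no ErrorHandler set, schema errors surface as exceptions.
            throw new IllegalArgumentException("Not a valid XSD pattern: " + pattern, e);
        }
    }

    /** Returns true if the whole text is valid against the pattern. */
    public static boolean matches(Schema schema, String text) {
        String xml = "<v>" + escape(text) + "</v>";
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    /** Escapes markup characters and whitespace that XML would normalize. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': case '<': case '>': case '"': case '\'':
                case '\t': case '\n': case '\r':
                    sb.append("&#").append((int) c).append(';');
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Schema schema = compile("([a-zA-Z][^ ]*)");
        System.out.println(matches(schema, "abc"));
        System.out.println(matches(schema, "a b")); // contains the excluded space
    }
}
```

A Schema object is thread-safe and can be compiled once and reused, while
each Validator should stay on a single thread. One limitation of this
approach: test strings containing characters that XML 1.0 cannot represent
(most control characters) are reported as non-matching.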