[ https://issues.apache.org/jira/browse/XERCESC-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612661#action_12612661 ]
David Bertoni commented on XERCESC-1816:
----------------------------------------

It's supposed to match the NameChar production in the XML recommendation, right?

Looking at the code, it's clear things aren't even implemented:

    Token* RegxParser::processBacksolidus_c() {

        XMLCh ch;
        //Must be in 0x0040-0x005F
        if (fOffset >= fStringLen || ((ch = fString[fOffset++]) & 0xFFE0) != 0x0040)
            ThrowXMLwithMemMgr(ParseException,XMLExcepts::Parser_Atom1, fMemoryManager);

        processNext();
        return fTokenFactory->createChar(ch - 0x40);
    }

    Token* RegxParser::processBacksolidus_C() {

        // REVISIT - Do we throw an exception - we do not want to throw too
        // many exceptions
        return 0;
    }

    Token* RegxParser::processBacksolidus_i() {

        processNext();
        return fTokenFactory->createChar(chLatin_i);
    }

    Token* RegxParser::processBacksolidus_I() {

        //Ditto
        return 0;
    }

I'm not sure why we "do not want to throw too many exceptions," which seems better to me than pretending something's actually implemented when it's not.

I would guess calling fTokenFactory->getRange(fgXMLNameChar, false) would do the trick for "\c" and fTokenFactory->getRange(fgXMLNameChar, true) would work for "\C".

"\C" and "\I" are causing an infinite loop because the associated functions return 0 without calling processNext().

What a mess -- thanks for actually working on this.
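To make the two points above concrete (every escape handler has to consume its input, and the upper-case escapes are the complements of the lower-case ones), here is a minimal standalone sketch. It is not Xerces-C++ code: Token, ToyRegxParser and matches are hypothetical names invented for illustration, and the character classes are an ASCII-only approximation of the XML name productions, not the full Unicode ranges.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a regex token; NOT the Xerces-C++ Token class.
struct Token {
    enum Kind { Char, NameCharRange, InitialNameCharRange } kind;
    bool complement;  // true for the upper-case (negated) escapes \C and \I
    char ch;          // only meaningful when kind == Char
};

// Toy parser (assumption: 8-bit chars, no quantifiers or groups).
// The escape letter is always consumed before a token is produced,
// so the parse loop makes progress on every iteration.
class ToyRegxParser {
public:
    explicit ToyRegxParser(std::string pattern) : fString(std::move(pattern)) {}

    std::vector<Token> parse() {
        std::vector<Token> out;
        while (fOffset < fString.size()) {
            char c = fString[fOffset++];
            if (c != '\\') {
                out.push_back({Token::Char, false, c});
                continue;
            }
            if (fOffset >= fString.size())
                throw std::runtime_error("dangling backslash");
            out.push_back(processEscape(fString[fOffset++]));
        }
        return out;
    }

private:
    // \c and \C share one range, as do \i and \I; only the flag differs.
    static Token processEscape(char e) {
        switch (e) {
        case 'c': return {Token::NameCharRange, false, '\0'};
        case 'C': return {Token::NameCharRange, true, '\0'};
        case 'i': return {Token::InitialNameCharRange, false, '\0'};
        case 'I': return {Token::InitialNameCharRange, true, '\0'};
        default:  return {Token::Char, false, e};  // other escapes: literal
        }
    }

    std::string fString;
    std::size_t fOffset = 0;
};

// ASCII-only approximation of the XML name character classes.
bool matches(const Token& t, char c) {
    bool isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    bool isDigit = (c >= '0' && c <= '9');
    bool initial = isLetter || c == '_' || c == ':';
    bool nameChar = initial || isDigit || c == '.' || c == '-';
    switch (t.kind) {
    case Token::Char:                 return t.ch == c;
    case Token::NameCharRange:        return nameChar != t.complement;
    case Token::InitialNameCharRange: return initial != t.complement;
    }
    return false;
}
```

In this sketch, parsing the pattern \C yields a single complemented NameCharRange token instead of looping, and \i accepts '_' but rejects '-'. In the real parser the equivalent change would presumably go through the token factory's range lookup, as guessed above.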
> Multi-character escape classes don't work correctly in regular expressions
> --------------------------------------------------------------------------
>
>                 Key: XERCESC-1816
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1816
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Validating Parser (XML Schema)
>    Affects Versions: 2.8.0, 3.0.0
>            Reporter: John Snelson
>
> The regular expressions "\i", "\I", "\c" and "\C" do not work as specified in
> the XML Schema specification:
> http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
> In fact, "\I" and "\C" cause an infinite loop during the parsing of the
> regular expression, "\i" seems to only match the letter "i", and "\c" gives
> the error:
> A character in U+0040-U+005f must follow '\c'.
> I'd be happy to attempt to fix this bug, but I need some guidance as to what
> the code for "\c" is actually meant to be doing.