[ https://issues.apache.org/jira/browse/STDCXX-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jojo Jose updated STDCXX-1053:
------------------------------

Description:

Hi All,

Please let me know if anybody can provide some clue on this.

I have been using Xerces as the XML parser in my C++ application, and I recently migrated from Xerces 1.3 (very old) to 3.1. After that, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...}) and pass a Unicode file as input, an exception pops up. However, the same call works fine for an ANSI file.

The call stack is shown below.

    xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog() Line 1227 + 0x25 bytes
    xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const xercesc_3_1::InputSource & src={...}) Line 210
    xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...}) Line 549
    EPConfigTool.dll!XCfgXMLParser::parse() Line 66 - // My application code

In the code, execution reaches this branch:

    else
    {
        emitError(XMLErrs::InvalidDocumentStructure);
        ...
    }

The function in which the parse fails is shown below:

    void XMLScanner::scanProlog()
    {
        bool sawDocTypeDecl = false;
        // Get a buffer for whitespace processing
        XMLBufBid bbCData(&fBufMgr);

        // Loop through the prolog. If there is no content, this could go all
        // the way to the end of the file.
        try
        {
            while (true)
            {
                const XMLCh nextCh = fReaderMgr.peekNextChar();

                if (nextCh == chOpenAngle)
                {
                    // Ok, it could be the xml decl, a comment, the doc type line,
                    // or the start of the root element.
                    if (checkXMLDecl(true))
                    {
                        // There shall be at lease --ONE-- space in between
                        // the tag '<?xml' and the VersionInfo.
                        //
                        // If we are not at line 1, col 6, then the decl was not
                        // the first text, so its invalid.
                        const XMLReader* curReader = fReaderMgr.getCurrentReader();
                        if ((curReader->getLineNumber() != 1)
                        ||  (curReader->getColumnNumber() != 7))
                        {
                            emitError(XMLErrs::XMLDeclMustBeFirst);
                        }
                        scanXMLDecl(Decl_XML);
                    }
                    else if (fReaderMgr.skippedString(XMLUni::fgPIString))
                    {
                        scanPI();
                    }
                    else if (fReaderMgr.skippedString(XMLUni::fgCommentString))
                    {
                        scanComment();
                    }
                    else if (fReaderMgr.skippedString(XMLUni::fgDocTypeString))
                    {
                        if (sawDocTypeDecl) {
                            emitError(XMLErrs::DuplicateDocTypeDecl);
                        }
                        scanDocTypeDecl();
                        sawDocTypeDecl = true;

                        // if reusing grammar, this has been validated already in first scan
                        // skip for performance
                        if (fValidate && fGrammar && !fGrammar->getValidated()) {
                            // validate the DTD scan so far
                            fValidator->preContentValidation(fUseCachedGrammar, true);
                        }
                    }
                    else
                    {
                        // Assume its the start of the root element
                        return;
                    }
                }
                else if (fReaderMgr.getCurrentReader()->isWhitespace(nextCh))
                {
                    // If we have a document handler then gather up the
                    // whitespace and call back. Otherwise just skip over spaces.
                    if (fDocHandler)
                    {
                        fReaderMgr.getSpaces(bbCData.getBuffer());
                        fDocHandler->ignorableWhitespace
                        (
                            bbCData.getRawBuffer()
                            , bbCData.getLen()
                            , false
                        );
                    }
                    else
                    {
                        fReaderMgr.skipPastSpaces();
                    }
                }
                else
                {
                    emitError(XMLErrs::InvalidDocumentStructure);

                    // Watch for end of file and break out
                    if (!nextCh)
                        break;
                    else
                        fReaderMgr.skipPastChar(chCloseAngle);
                }
            }
        }
        catch(const EndOfEntityException&)
        {
            // We should never get an end of entity here. They should only
            // occur within the doc type scanning method, and not leak out to
            // here.
            emitError
            (
                XMLErrs::UnexpectedEOE
                , "in prolog"
            );
        }
    }

It works fine when I move back to version 1.3, but due to various other requirements I have to use the new version 3.1 in my application.

Thanks in advance,
Jojo

was:

Hi All,

Please let me know if anybody can provide some clue on this.

I have been using Xerces as the XML parser in my C++ application, and I recently migrated from Xerces 1.3 (very old) to 3.1.
After that, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...}) and pass a Unicode file as input, an exception pops up. However, the same call works fine for an ANSI file.

The call stack is shown below.

    xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog() Line 1227 + 0x25 bytes
    xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const xercesc_3_1::InputSource & src={...}) Line 210
    xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...}) Line 549
    EPConfigTool.dll!XCfgXMLParser::parse() Line 66 - My application code

It works fine when I move back to version 1.3, but due to various other requirements I have to use the new version 3.1 in my application.

Thanks in advance,
Jojo

Added the exact code at which this fails.

> Xerces is poping up exception while parsing a Unicode file, but same is working fine for an ANSI file
> -----------------------------------------------------------------------------------------------------
>
>                 Key: STDCXX-1053
>                 URL: https://issues.apache.org/jira/browse/STDCXX-1053
>             Project: C++ Standard Library
>          Issue Type: Bug
>          Components: 20. General Utilities
>         Environment: Windows XP
>            Reporter: Jojo Jose

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
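
As a reading aid for the scanProlog() listing in the description, here is a minimal sketch of how its top-level if / else-if / else chain sorts the next character. PrologBranch, isXmlWhitespace and classifyPrologChar are hypothetical names used only for illustration; they are not part of Xerces-C++. The whitespace test assumes the four XML 1.0 whitespace characters, and the U+FEFF input is only an assumed example of a character that is neither '<' nor whitespace, not a confirmed cause of the reported failure.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not part of the Xerces-C++ API.
enum class PrologBranch { Markup, Whitespace, EndOfInput, Invalid };

// Assumption: XML 1.0 whitespace only (space, tab, line feed, carriage return).
bool isXmlWhitespace(char16_t ch) {
    return ch == u' ' || ch == u'\t' || ch == u'\n' || ch == u'\r';
}

// Mirrors the branch order in the quoted scanProlog():
//   '<'         -> XML decl, PI, comment, DOCTYPE, or root element
//   whitespace  -> skipped or reported as ignorable whitespace
//   otherwise   -> emitError(XMLErrs::InvalidDocumentStructure)
// In the original, a zero character also lands in the final else branch:
// the error is emitted and the loop then breaks. It is split out here
// only to make that case visible.
PrologBranch classifyPrologChar(char16_t ch) {
    if (ch == u'<') return PrologBranch::Markup;
    if (isXmlWhitespace(ch)) return PrologBranch::Whitespace;
    if (ch == 0) return PrologBranch::EndOfInput;
    return PrologBranch::Invalid;
}

// Small self-check of the classification above.
void checkExamples() {
    assert(classifyPrologChar(u'<') == PrologBranch::Markup);
    assert(classifyPrologChar(u'\n') == PrologBranch::Whitespace);
    assert(classifyPrologChar(0) == PrologBranch::EndOfInput);
    // Assumed example: a stray U+FEFF reaching the scanner would be Invalid.
    assert(classifyPrologChar(static_cast<char16_t>(0xFEFF)) == PrologBranch::Invalid);
}
```

Read this way, the reported InvalidDocumentStructure error means that the first character the scanner saw in the prolog was neither '<' nor whitespace, so inspecting what the reader produces for the first few characters of the Unicode file may be a useful next step.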