[jira] Updated: (STDCXX-1053) Xerces is poping up exception while parsing a Unicode file, but same is working fine for an ANSI file

Jojo Jose (JIRA) Fri, 21 Jan 2011 01:11:16 -0800

     [ https://issues.apache.org/jira/browse/STDCXX-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jojo Jose updated STDCXX-1053:
------------------------------

    Description: 
Hi All,

Please let me know if anybody can provide a clue on this.

I use Xerces as the XML parser in my C++ application, and I recently
migrated from Xerces 1.3 (very old) to 3.1.

Since then, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource
& source={...}) with a Unicode file as input, it raises an exception.
However, the same call works fine for an ANSI file.
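
For context, a "Unicode file" saved by Windows tools is often UTF-16
little-endian with a byte-order mark (BOM), though the report does not say
which encoding was used. As a first diagnostic, the leading bytes of the file
can be inspected before it is handed to the parser. The following is a minimal
sketch under that assumption; BomKind and detectBom are hypothetical helper
names, not part of Xerces-C:

```cpp
#include <cstddef>

// Hypothetical helper types (not part of Xerces-C).
enum class BomKind { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a buffer and report which byte-order mark,
// if any, it starts with. Note: a UTF-32 LE BOM (FF FE 00 00) also begins
// with FF FE, so this simple check would report it as Utf16LE.
BomKind detectBom(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return BomKind::Utf8;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return BomKind::Utf16LE;
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return BomKind::Utf16BE;
    return BomKind::None;
}
```

If this reports no BOM for a file that is expected to be UTF-16, or a BOM that
disagrees with the encoding named in the XML declaration, that mismatch would
be worth ruling out first.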

The call stack is shown below.

xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog()  Line 1227 + 0x25 bytes
xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const xercesc_3_1::InputSource & src={...})  Line 210
xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...})  Line 549
EPConfigTool.dll!XCfgXMLParser::parse()  Line 66 - // My application code

In the code, execution reaches the final else branch:
else
{
 emitError(XMLErrs::InvalidDocumentStructure);
...
}

The function in which the parse fails is shown below:

void XMLScanner::scanProlog()
{
    bool sawDocTypeDecl = false;
    // Get a buffer for whitespace processing
    XMLBufBid bbCData(&fBufMgr);

    //  Loop through the prolog. If there is no content, this could go all
    //  the way to the end of the file.
    try
    {
        while (true)
        {
            const XMLCh nextCh = fReaderMgr.peekNextChar();

            if (nextCh == chOpenAngle)
            {
                //  Ok, it could be the xml decl, a comment, the doc type line,
                //  or the start of the root element.
                if (checkXMLDecl(true))
                {
                    // There shall be at lease --ONE-- space in between
                    // the tag '<?xml' and the VersionInfo.
                    //
                    //  If we are not at line 1, col 6, then the decl was not
                    //  the first text, so its invalid.
                    const XMLReader* curReader = fReaderMgr.getCurrentReader();
                    if ((curReader->getLineNumber() != 1)
                    ||  (curReader->getColumnNumber() != 7))
                    {
                        emitError(XMLErrs::XMLDeclMustBeFirst);
                    }

                    scanXMLDecl(Decl_XML);
                }
                else if (fReaderMgr.skippedString(XMLUni::fgPIString))
                {
                    scanPI();
                }
                 else if (fReaderMgr.skippedString(XMLUni::fgCommentString))
                {
                    scanComment();
                }
                 else if (fReaderMgr.skippedString(XMLUni::fgDocTypeString))
                {
                    if (sawDocTypeDecl) {
                        emitError(XMLErrs::DuplicateDocTypeDecl);
                    }
                    scanDocTypeDecl();
                    sawDocTypeDecl = true;

                    // if reusing grammar, this has been validated already in first scan
                    // skip for performance
                    if (fValidate && fGrammar && !fGrammar->getValidated()) {
                        //  validate the DTD scan so far
                        fValidator->preContentValidation(fUseCachedGrammar, true);
                    }
                }
                else
                {
                    // Assume its the start of the root element
                    return;
                }
            }
            else if (fReaderMgr.getCurrentReader()->isWhitespace(nextCh))
            {
                //  If we have a document handler then gather up the
                //  whitespace and call back. Otherwise just skip over spaces.
                if (fDocHandler)
                {
                    fReaderMgr.getSpaces(bbCData.getBuffer());
                    fDocHandler->ignorableWhitespace
                    (
                        bbCData.getRawBuffer()
                        , bbCData.getLen()
                        , false
                    );
                }
                 else
                {
                    fReaderMgr.skipPastSpaces();
                }
            }
             else
            {
                emitError(XMLErrs::InvalidDocumentStructure);

                // Watch for end of file and break out
                if (!nextCh)
                    break;
                else
                    fReaderMgr.skipPastChar(chCloseAngle);
            }

        }
    }
    catch(const EndOfEntityException&)
    {
        //  We should never get an end of entity here. They should only
        //  occur within the doc type scanning method, and not leak out to
        //  here.
        emitError
        (
            XMLErrs::UnexpectedEOE
            , "in prolog"
        );
    }
}
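
The three-way branch at the top of the loop (markup start, whitespace,
anything else) can be modeled in isolation to see why a wrongly decoded first
character ends in InvalidDocumentStructure. The sketch below is a simplified
stand-in, not the Xerces implementation; PrologStep, classifyPrologChar, and
firstChar are hypothetical names, and whether a decoding mismatch is the
actual cause here is an assumption that the report does not confirm:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the outcomes of the first branch in scanProlog().
enum class PrologStep { Markup, Whitespace, InvalidDocumentStructure };

// Mirrors the shape of the branch: '<' starts markup, XML whitespace is
// skipped, and anything else (including 0 at end of input) is an error.
PrologStep classifyPrologChar(char16_t ch)
{
    if (ch == 0x3C)
        return PrologStep::Markup;
    if (ch == 0x20 || ch == 0x09 || ch == 0x0A || ch == 0x0D)
        return PrologStep::Whitespace;
    return PrologStep::InvalidDocumentStructure;
}

// Return the first character the scanner would see, either treating each
// byte as one character or decoding UTF-16 LE and skipping its BOM.
char16_t firstChar(const std::vector<unsigned char>& bytes, bool decodeAsUtf16LE)
{
    if (bytes.empty())
        return 0;
    if (!decodeAsUtf16LE)
        return bytes[0];

    std::size_t pos = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        pos = 2; // skip the byte-order mark
    if (pos + 1 >= bytes.size())
        return 0;
    return static_cast<char16_t>(bytes[pos] | (bytes[pos + 1] << 8));
}
```

For a UTF-16 LE file that starts with a BOM followed by '<', decoding as
UTF-16 LE yields '<' and the markup branch is taken, while treating the bytes
as single-byte characters yields 0xFF first and falls through to the error
branch.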

It works fine when I move back to version 1.3, but because of various other
requirements, I have to use version 3.1 in my application.

Thanks in advance,
Jojo


  was:
Hi All,

Please let me know, if anybody can provide some clue on this.

I have been using Xerces as XML parser in my C++ application and I have 
recently migrated my Xerces version from 1.3 (very old) to 3.1.

After that, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource 
& source={...}) and passing a Unicode file as input, it pops up exception. 
However the same works ok for ANSI.

The call stack is as shown below.

xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog()  Line 1227 + 0x25 bytes
xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const 
xercesc_3_1::InputSource & src={...})  Line 210
xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const 
xercesc_3_1::InputSource & source={...})  Line 549
EPConfigTool.dll!XCfgXMLParser::parse()  Line 66 - <b>My application code</b>

It is working fine when I move back to version 1.3, but due to various other 
requirements, I have to use the new version 3.1 in my application.

Thanks in advance,
Jojo



Added the exact code at which this fails.

> Xerces is poping up exception while parsing a Unicode file, but same is 
> working fine for an ANSI file
> -----------------------------------------------------------------------------------------------------
>
>                 Key: STDCXX-1053
>                 URL: https://issues.apache.org/jira/browse/STDCXX-1053
>             Project: C++ Standard Library
>          Issue Type: Bug
>          Components: 20. General Utilities
>         Environment: Windows XP
>            Reporter: Jojo Jose
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
