Jaya Nageswar wrote:
Hi,

I am using Xerces-C 1.7.0 (ICU build) to parse XML files. The XML file
contains some Chinese characters, so I am using the ICU build to
support Unicode. I declared the encoding as UTF-8:

*<?xml version="1.0" encoding="UTF-8"?>*

Part of the XML file contains the following Chinese characters:
  *      <Convert>
            <FromValue>TRUE</FromValue>
            <ToValue>您是如</ToValue>
        </Convert>
        <Convert>
            <FromValue>FALSE</FromValue>
            <ToValue>您好</ToValue>
        </Convert>*

I am using DOM to parse the XML file. I have the following code for DOM
parsing:

*    static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(gLS);
    DOMBuilder        *CtlParser = ((DOMImplementationLS*)impl)->createDOMBuilder(DOMImplementationLS::MODE_SYNCHRONOUS, 0);*

*    CtlParser->setFeature(XMLUni::fgDOMNamespaces, true);
    CtlParser->setFeature(XMLUni::fgXercesSchema, true);
    CtlParser->setFeature(XMLUni::fgXercesSchemaFullChecking, true);
    CtlParser->setFeature(XMLUni::fgDOMValidateIfSchema, true);*

*    //create our error handler and install it
    XMLErrorHandler errorHandler;
    CtlParser->setErrorHandler(&errorHandler);

    CtlDoc = CtlParser->parseURI(XMLFilePath);
     if(errorHandler.getSawErrors())
     {
           cout<<errorHandler.ReturnErrorMessage();
     } *


I am getting the following error.
*Message: An exception occurred! Type:UTFDataFormatException,
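Message:invalid byte 2 (�) of a 2-byte sequence.*

This indicates your file is not really encoded in UTF-8. The parser read a lead byte that announces a 2-byte sequence, but the byte that follows is not a valid UTF-8 continuation byte.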


I do not understand why I am getting this error even though I am using the
Xerces-C ICU build. The ICU build is supposed to work with Unicode characters.
If I remove the Chinese characters, I do not get any error message while
parsing.

Xerces-C supports UTF-8 even without using the ICU transcoders.


If anybody has worked with Unicode in Xerces-C, please help me. Did I miss any
of the parser settings for Unicode?

Your file is not encoded in UTF-8, so the parser reports an error. You can either fix the file so it's properly encoded, or update the encoding in the XML declaration to reflect the actual encoding.
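
One way to confirm this without involving the parser is to scan the raw bytes of the file and find the first one that is not well-formed UTF-8. The following is only a minimal sketch: firstInvalidUtf8 is a hypothetical helper, not part of Xerces-C or ICU, and it assumes the file contents have already been read into a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C or ICU).
// Returns the offset of the first byte that breaks UTF-8 well-formedness,
// or std::string::npos if the whole buffer is well-formed UTF-8.
std::size_t firstInvalidUtf8(const std::string& data)
{
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        std::size_t len = 0;       // total bytes in this sequence
        unsigned long cp = 0;      // decoded code point
        unsigned long minCp = 0;   // smallest code point allowed for this length

        if (lead < 0x80) { ++i; continue; }   // plain ASCII
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; minCp = 0x10000; }
        else return i;             // stray continuation byte or invalid lead byte

        if (len > n - i) return i; // sequence is cut off at the end of the buffer

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(data[i + k]);
            if ((c & 0xC0) != 0x80) return i + k;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, values beyond U+10FFFF, and surrogates.
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return i;
        i += len;
    }
    return std::string::npos;
}
```

To use it, read the XML file with std::ifstream opened in std::ios::binary mode and pass the contents to the function. If it returns an offset rather than std::string::npos, the file was most likely saved in a legacy encoding by the editor, which would match the "invalid byte 2 of a 2-byte sequence" error above.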

Dave
