Re: xerces c 1.7.0 ICU for unicode

David Bertoni Tue, 02 Sep 2008 14:49:05 -0700

Jaya Nageswar wrote:

> Hi,
>
> I am using Xerces-C 1.7.0 (ICU build) for parsing XML files. I have some
> special Chinese characters in the XML file, so I am using the ICU build to
> support Unicode. I defined the encoding as UTF-8:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> Part of the XML file contains the following Chinese characters:
>
>         <Convert>
>             <FromValue>TRUE</FromValue>
>             <ToValue>您是如</ToValue>
>         </Convert>
>         <Convert>
>             <FromValue>FALSE</FromValue>
>             <ToValue>您好</ToValue>
>         </Convert>

> I am using DOM to parse the XML file. I have the following code for DOM
> parsing:

>     static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
>     DOMImplementation *impl =
>         DOMImplementationRegistry::getDOMImplementation(gLS);
>     DOMBuilder *CtlParser =
>         ((DOMImplementationLS*)impl)->createDOMBuilder(
>             DOMImplementationLS::MODE_SYNCHRONOUS, 0);
>
>     CtlParser->setFeature(XMLUni::fgDOMNamespaces, true);
>     CtlParser->setFeature(XMLUni::fgXercesSchema, true);
>     CtlParser->setFeature(XMLUni::fgXercesSchemaFullChecking, true);
>     CtlParser->setFeature(XMLUni::fgDOMValidateIfSchema, true);
>
>     // create our error handler and install it
>     XMLErrorHandler errorHandler;
>     CtlParser->setErrorHandler(&errorHandler);
>
>     CtlDoc = CtlParser->parseURI(XMLFilePath);
>     if (errorHandler.getSawErrors())
>     {
>         cout << errorHandler.ReturnErrorMessage();
>     }


> I am getting the following error:
>
> Message: An exception occurred! Type:UTFDataFormatException,
> Message:invalid byte 2 (�) of a 2-byte sequence.

This indicates your file is not really encoded in UTF-8.


> I do not understand why I am getting this error even though I am using the
> Xerces-C ICU build. The ICU build is supposed to work with Unicode characters.
> If I remove the Chinese characters, I do not get any error message while
> parsing.

Xerces-C supports UTF-8 even without using the ICU transcoders.


> If anybody has worked with Unicode in Xerces-C, please help me. Did I miss
> any parser settings for Unicode?

Your file is not encoded in UTF-8, so the parser reports an error. You can
either fix the file so it's properly encoded, or update the encoding in the
XML declaration to reflect the actual encoding.


Dave
