Hi,
I am using Xerces-C 1.7.0 (ICU build) to parse XML files. The XML file
contains some Chinese characters, so I am using the ICU build for Unicode
support. I declared the encoding as UTF-8:
*<?xml version="1.0" encoding="UTF-8"?>*
Part of the XML file contains the following Chinese characters:
* <Convert>
    <FromValue>TRUE</FromValue>
    <ToValue>您是如</ToValue>
</Convert>
<Convert>
    <FromValue>FALSE</FromValue>
    <ToValue>您好</ToValue>
</Convert>*
I am using DOM to parse the XML file, with the following code:
* static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
DOMImplementation *impl =
    DOMImplementationRegistry::getDOMImplementation(gLS);
DOMBuilder *CtlParser =
    ((DOMImplementationLS*)impl)->createDOMBuilder(
        DOMImplementationLS::MODE_SYNCHRONOUS, 0);*
* CtlParser->setFeature(XMLUni::fgDOMNamespaces, true);
CtlParser->setFeature(XMLUni::fgXercesSchema, true);
CtlParser->setFeature(XMLUni::fgXercesSchemaFullChecking, true);
CtlParser->setFeature(XMLUni::fgDOMValidateIfSchema, true);*
* // Create our error handler and install it
XMLErrorHandler errorHandler;
CtlParser->setErrorHandler(&errorHandler);
CtlDoc = CtlParser->parseURI(XMLFilePath);
if (errorHandler.getSawErrors())
{
    cout << errorHandler.ReturnErrorMessage();
} *
I am getting the following error.
*Message: An exception occurred! Type:UTFDataFormatException,
Message:invalid byte 2 (�) of a 2-byte sequence.*
I do not understand why I am getting this error even though I am using the
Xerces-C ICU build, which is supposed to work with Unicode characters.
If I remove the Chinese characters, the file parses without any error
message.
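One way to narrow this down might be to check whether the file on disk is
really stored as UTF-8, regardless of what the encoding declaration says. A
file saved in a legacy double-byte Chinese encoding but labelled UTF-8 would
be expected to produce this kind of "invalid byte" error. Below is a minimal
sketch of such a check in plain C++. The names isValidUtf8 and readFileBytes
are hypothetical helpers written for illustration; they are not part of
Xerces-C or ICU.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns true if 'bytes' is well-formed UTF-8
// (rejects bad continuation bytes, overlong forms, surrogates, and
// code points above U+10FFFF).
bool isValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point

        if (lead < 0x80)                        { extra = 0; cp = lead; }
        else if (lead >= 0xC2 && lead <= 0xDF)  { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0)         { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4)  { extra = 3; cp = lead & 0x07; }
        else return false;                      // not a valid lead byte

        if (n - i - 1 < extra) return false;    // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (extra == 2 && cp < 0x800) return false;                     // overlong
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return false; // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;                 // surrogate

        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: reads a whole file as raw bytes (binary mode, so no
// newline or encoding translation is applied). Returns an empty string if
// the file cannot be opened.
std::string readFileBytes(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling isValidUtf8(readFileBytes(path)) on the XML file before parsing
would show whether the bytes themselves are the problem; this assumes the
file path is available as a plain char string. If the check fails, the file
was probably saved in some other encoding, and either re-saving it as UTF-8
or changing the encoding declaration to match may be worth trying.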
If anybody has worked with Unicode in Xerces-C, please help me. Did I miss
any parser settings for Unicode?
Thanks in advance,
Jaya Nageswar.