2008/9/19 Anna Simbirtsev <[EMAIL PROTECTED]>
> Thank you.
> I think I can just take it out completely, since I want to keep it in
> UTF-8 and just display to the user, not to convert to local code page.
> And all I need a parser to do is parse a document that is in UTF-8 so it
> should be ok.
>
If I understand correctly, you need to read in a UTF-8 encoded XML file and
keep using that encoding after the Xerces parser is done with it.
I found only one way to accomplish this.
First, read in the XML with the correct encoding:
    // tempStr is assumed to be an XMLCh buffer of xercesMaxStringLength_ chars
    XMLString::transcode("UTF-8", tempStr, xercesMaxStringLength_ - 1);
    domInputSource->setEncoding(tempStr);
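To make the round trip concrete: once the encoding is set to UTF-8, the parser decodes the document's bytes into UTF-16 (XMLCh) strings. The sketch below imitates that decoding step in plain C++ without Xerces, with std::u16string standing in for XMLCh strings. Utf8ToUtf16 is a hypothetical helper, not a Xerces function, and it assumes well-formed UTF-8 input (it does not validate).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xerces: decodes well-formed UTF-8 into
// UTF-16 code units, roughly what the parser does when reading the document.
// Malformed input is not validated; a truncated final sequence is dropped.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the sequence occupies
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        if (i + len > in.size()) break;   // truncated sequence: stop here
        // Each continuation byte contributes 6 more bits
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {
            // Outside the BMP: UTF-16 needs a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```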
Then create a UTF-8 transcoder to encode back (!) to UTF-8 the strings that
the Xerces parser keeps internally in UTF-16:
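    XMLTransService::Codes returnCode;
    XMLTranscoder * utf8Transcoder_ =
        XMLPlatformUtils::fgTransService->makeNewTranscoderFor("UTF-8",
            returnCode, xercesMaxStringLength_);
    // Check that returnCode == XMLTransService::Ok before using the
    // transcoder; the caller owns it and must delete it when done.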
Then use this transcoder on the strings the Xerces parser returns:
    std::string StringTranscoder::TranscodeToUTF8(const XMLCh * str,
                                                  unsigned int inputSize)
    {
        // A UTF-16 code unit needs at most 3 bytes in UTF-8, so size the
        // buffer from the input instead of a fixed maximum (no truncation)
        const unsigned int maxBytes = inputSize * 3;
        XMLByte * resultingString = new XMLByte[maxBytes];
        unsigned int charsEaten;
        unsigned int resultingSize = utf8Transcoder_->transcodeTo(str, inputSize,
            resultingString, maxBytes, charsEaten, XMLTranscoder::UnRep_RepChar);
        std::string resultValue(resultingString, resultingString + resultingSize);
        delete [] resultingString;   // memory from new[] must be freed with delete[]
        return resultValue;
    }
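For reference, here is roughly what the UTF-8 transcoder does with those UTF-16 strings, as a standalone sketch in plain C++ with std::u16string standing in for XMLCh strings. Utf16ToUtf8 is a hypothetical helper, not part of Xerces, and replacing an unpaired surrogate with U+FFFD is an assumption chosen to echo the replacement-character idea behind UnRep_RepChar, not necessarily what Xerces does byte for byte. It also shows why 3 bytes per UTF-16 code unit is a safe output-buffer size: a BMP character takes at most 3 bytes, and a surrogate pair (2 code units) takes 4.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xerces: encodes UTF-16 code units as UTF-8.
// An unpaired surrogate is replaced with U+FFFD (an assumption of this sketch).
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3);   // worst case: 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High + low surrogate combine into one supplementary code point
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;   // unpaired surrogate
        }
        // Emit 1 to 4 bytes depending on the code point's range
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, u"\u20AC" (the euro sign) comes out as the three bytes E2 82 AC.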
If there is a better way, I would be interested as well.
Best regards,
Lucian