2008/9/19 Anna Simbirtsev <[EMAIL PROTECTED]>

> Thank you.
> I think I can just take it out completely, since I want to keep it in
> UTF-8 and just display to the user, not to convert to local code page.
> And all I need a parser to do is parse a document that is in UTF-8 so it
> should be ok.
>


If I understand correctly, you need to read in a UTF-8 encoded XML file and
keep using that encoding after the Xerces parser is done with it.

I found only one way to accomplish this.
First, read in the XML with the correct encoding:

   XMLString::transcode("UTF-8", tempStr, xercesMaxStringLength_ - 1);
   domInputSource->setEncoding(tempStr);

Then create a UTF-8 transcoder object to convert the strings, which the
Xerces parser keeps internally in UTF-16, back (!) to UTF-8:

   XMLTransService::Codes returnCode;
   XMLTranscoder * utf8Transcoder_ =
      XMLPlatformUtils::fgTransService->makeNewTranscoderFor(
         "UTF-8", returnCode, xercesMaxStringLength_);

Use this Transcoder on the strings the Xerces Parser returns:

std::string StringTranscoder::TranscodeToUTF8(const XMLCh * str,
                                              unsigned int inputSize)
{
   // Fixed-size output buffer; longer results are cut off at this size.
   XMLByte * resultingString = new XMLByte[xercesMaxStringLength_ - 1];

   // charsEaten receives the number of input characters actually consumed.
   unsigned int charsEaten;
   unsigned int resultingSize = utf8Transcoder_->transcodeTo(
      str, inputSize, resultingString, xercesMaxStringLength_ - 1,
      charsEaten, XMLTranscoder::UnRep_RepChar);

   std::string resultValue(resultingString, resultingString + resultingSize);

   delete[] resultingString;   // array form, to match new[]
   return resultValue;
}
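
To see what this transcoding step does, and how large the output buffer may
need to be, here is a minimal sketch in standard C++ only. It is not the
Xerces implementation: Utf16ToUtf8 is a hypothetical helper, and it assumes
the input is a sequence of UTF-16 code units (which is what XMLCh normally
holds). Unpaired surrogates are replaced with U+FFFD here; that policy is an
assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Xerces API): encode UTF-16 code units as UTF-8.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3);  // worst case: 3 bytes per UTF-16 code unit

    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // the low surrogate has been consumed as well
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }

        if (cp < 0x80) {                 // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes (from a surrogate pair)
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The point for the buffer size: one UTF-16 code unit can become up to three
bytes of UTF-8, so an output buffer of about the same length as the input can
be too small for non-ASCII text. In that case transcodeTo stops early, and
charsEaten comes back smaller than inputSize.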

If there is a better way, I am interested as well.

Best regards,
Lucian
