Re: characters with accents

Alberto Massari Fri, 24 Feb 2006 08:40:42 -0800

Hi Dara,

This is a recurring issue when dealing with non-ASCII characters (in your case, accented characters). Xerces works with Unicode characters, so whenever you use XMLCh strings you are guaranteed that whatever data was stored will be retrieved intact. The weak point is how those accented characters got into the DOM; I guess you are using XMLString::transcode to convert them from an ISO-8859-1 literal stored in your C++ source code. But XMLString::transcode interprets the data as if it were encoded in the current locale (as it would be for data read using scanf or cin), so there is a chance it will misinterpret it.

You have several options:

1) If the characters are stored in the C++ source, use Unicode strings (if your compiler supports the L"string" format, try using it; otherwise, create arrays of XMLCh elements and initialize them using the Unicode code points for the characters).
2) Use normal C literals, but replace XMLString::transcode with a transcoder for ISO-8859-1.
3) Keep using XMLString::transcode, but ensure it is working with the encoding you need (e.g. try calling setlocale before XMLPlatformUtils::Initialize).
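
To make options 1 and 2 a bit more concrete, here is a rough sketch that does not depend on Xerces at all. It assumes XMLCh is a 16-bit UTF-16 code unit in your build (check your platform's definition), and uses char16_t as a stand-in for it. The names Utf16Unit, kCafe and latin1ToUtf16 are made up for illustration; they are not part of the Xerces API.

```cpp
#include <cassert>
#include <string>

// Stand-in for Xerces' XMLCh. Assumption: XMLCh is a 16-bit UTF-16 code unit
// in your build; verify this before relying on it.
using Utf16Unit = char16_t;

// Option 1: spell the string out as code units directly in the source.
// "café" is U+0063 U+0061 U+0066 U+00E9, followed by the terminating zero.
static const Utf16Unit kCafe[] = { 0x0063, 0x0061, 0x0066, 0x00E9, 0x0000 };

// Option 2 (hand-rolled illustration only): ISO-8859-1 maps byte value N
// directly to code point U+00NN, so the conversion is a widening copy that
// does not depend on the current locale.
std::u16string latin1ToUtf16(const char* latin1)
{
    assert(latin1 != nullptr);
    std::u16string out;
    for (const char* p = latin1; *p != '\0'; ++p) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<Utf16Unit>(static_cast<unsigned char>(*p)));
    }
    return out;
}
```

With this, latin1ToUtf16("caf\xE9") yields the same four code units as kCafe, whatever the locale is. In a real application you would more likely ask the library for an ISO-8859-1 transcoder than write the loop yourself; the sketch only shows why that conversion is locale-independent.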


Hope this helps,
Alberto

At 04:14 PM 2/24/2006 +0000, dara wrote:

Hi all,
I am having a little fun with 'accented characters' in an application I'm working with at the minute.
Through various tracing and debug methods, I can see that the characters are correctly being propagated around the system until the point where they are added to a DOM in Xerces.
After subsequent writing of the DOM to a MemBufFormatTarget* and retrieval through the getRawBuffer() interface, the data is -corrupt-. (I get characters out, but they are not accented and not the chars from my buffer.)
To remove any potential issues in the application, I've altered the DOMPrint sample to write to the same type of target. It correctly parses and writes the small XML document I've given it, with accented characters and in the codepage iso-8859-1.
I then added some code to add another node with some accented characters, and these do not appear in the output. (Which is interesting: my app is giving me corrupt data and the sample isn't giving me anything.)
I'm running v240 at the moment under a Linux variant, and I've set the shell locale to the same as the XML document being parsed. However, the application still reports the locale as "C", so I'm going to try a few more things.
But I've a feeling I'm missing something basic. Do I need to do something special with the Transcoder class, or the DOM document, when adding an element and text node?
I'm sure somebody has met this already. Any pointers?

Thanks and Regards

Dara
