Matthew Boulter wrote:
Hi all, I just wanted some guidance on where to focus my investigation of this topic.

I have a MySQL database that contains names of some Polish tram stops that I am extracting and encoding as WBXML for transmission.

When I read them from the database, everything is fine until they reach our DomToWbxml task.

I find that if a string contains a Polish character, it loses one character from the end; if it contains two, it loses two, and so on.

I read that Xerces is UTF-16. If so, am I losing something (other than my mind) going back to UTF-8?

Any help is greatly appreciated.

This is probably the number one problem people experience when using Xerces-C.

Please read the documentation carefully, as the transcoding API you're using is _not_ transcoding to UTF-8. Rather, it is transcoding to the local code page, so the disappearing characters are probably not representable in the local code page. Instead of using DOMString::transcode(), you need to create a UTF-8 transcoder and use that.
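
To illustrate what a UTF-8 transcoder does with UTF-16 input, here is a minimal, library-independent sketch in standard C++. It is not the Xerces-C API: utf16ToUtf8 is a hypothetical helper written only for this example. In Xerces-C itself you would instead ask the transcoding service for a "UTF-8" transcoder (XMLTransService::makeNewTranscoderFor) and call its transcodeTo method; check the API documentation for your release for the exact signatures.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C): encode UTF-16 code units
// as UTF-8 bytes. Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate without a preceding high surrogate
        }

        if (cp < 0x80) {
            // ASCII: one byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two bytes; all Polish diacritics fall in this range
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Three bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Four bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, utf16ToUtf8(u"\u0141\u00F3d\u017A") ("Łódź") takes 4 UTF-16 code units and returns 7 bytes, because each of the three Polish letters needs two bytes in UTF-8. Whichever transcoder you use, size the output by bytes, not by the number of characters in the source string.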

Also, you're using the deprecated DOM, which will disappear in Xerces-C 3.0. I would suggest you update your code to use the new DOM.

For more information, please search the mailing list archives for "transcoding." Here's a good place to start:

http://marc.info/?l=xerces-c-users&m=119514889329902&w=2

Dave
