Matthew Boulter wrote:
Hi all, I just wanted some guidance on where to focus my investigation
of this problem.
I have a MySQL database that contains names of some Polish tram stops that I am
extracting and encoding as WBXML for transmission.
Everything is fine when I read them from the database, until I reach our
DomToWbxml task.
There, if a string contains a Polish character, it loses one character from
its end; if it contains two, it loses two, and so on.
I read that Xerces uses UTF-16 internally? If so, am I losing something (other
than my mind) going back to UTF-8?
Any help is greatly appreciated.
This is probably the number one problem people experience when using
Xerces-C.
Please read the documentation carefully, as the transcoding API you're
using is _not_ transcoding to UTF-8. Rather, it is transcoding to the
local code page, so the disappearing characters are probably not
representable in the local code page. Instead of using
DOMString::transcode(), you need to create a UTF-8 transcoder and use that.
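To illustrate what a UTF-8 transcoder has to do with UTF-16 input, here is a minimal, standard-library-only C++ sketch. It does not use the Xerces-C API: the names utf16ToUtf8 and demo are hypothetical and exist only for this example. It also shows that each Polish letter occupies one UTF-16 code unit but two UTF-8 bytes. Whether a buffer sized by the UTF-16 length contributes to the "one character lost per Polish character" symptom is an assumption, not something established in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // Low surrogate with no preceding high surrogate.
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);                          // 1 byte
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical check: the name below (L-stroke, o-acute, d, z-acute)
// is 4 UTF-16 code units but 7 UTF-8 bytes.
void demo() {
    const std::u16string name = u"\u0141\u00F3d\u017A";
    assert(name.size() == 4);
    assert(utf16ToUtf8(name).size() == 7);
}
```

In real code the conversion should be left to the library's UTF-8 transcoder, as suggested above. The point of the sketch is only that the output buffer must be sized in UTF-8 bytes rather than in UTF-16 code units.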
Also, you're using the deprecated DOM, which will disappear in Xerces-C
3.0. I would suggest you update your code to use the new DOM.
For more information, please search the mailing list archives for
"transcoding." Here's a good place to start:
http://marc.info/?l=xerces-c-users&m=119514889329902&w=2
Dave