Re: Losing UTF-8 characters at the end of a string

David Bertoni Tue, 09 Sep 2008 12:29:50 -0700

Matthew Boulter wrote:

Hi all, I just wanted some guidance on where to focus my investigation effort
on this topic.
I have a MySQL database that contains names of some Polish tram stops that I am
extracting and encoding as WBXML for transmission.
When I get them from the database all is good until I reach our DomToWbxml task.
There I find that if the string has a Polish character it loses a character from
the end of the string; if there are two it loses two, and so on.
I read Xerces is UTF-16? If so, am I losing something (other than my mind) going
back to UTF-8?

Any help is greatly appreciated.

This is probably the number one problem people experience when using Xerces-C.

Please read the documentation carefully, as the transcoding API you're using is _not_ transcoding to UTF-8. Rather, it is transcoding to the local code page, so the disappearing characters are probably not representable in the local code page. Instead of using DOM_String::transcode(), you need to create a UTF-8 transcoder and use that.
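
To show what a UTF-8 transcoder has to produce, here is a minimal standalone sketch that converts UTF-16 code units to UTF-8 bytes. It deliberately does not use the Xerces-C API: utf16ToUtf8 is a hypothetical helper written for illustration only, and std::u16string stands in for Xerces' UTF-16 strings as an assumption. In real code you would use the library's own UTF-8 transcoder instead.

```cpp
#include <cstddef>
#include <string>

// Illustrative helper (not part of Xerces-C): encode UTF-16 code units as UTF-8.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: emit the replacement character
        }
        if (cp < 0x80) {
            // ASCII: one byte.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two bytes; the Polish letters with diacritics fall in this range.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Three bytes: the rest of the Basic Multilingual Plane.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Four bytes: supplementary planes.
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the four UTF-16 code units of "Łódź" (u"\u0141\u00F3" u"d\u017A") become seven UTF-8 bytes, because each of the three Polish letters takes two bytes. Any buffer or length field for the transcoded result therefore has to be sized in bytes, not in characters.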

Also, you're using the deprecated DOM, which will disappear in Xerces-C 3.0. I would suggest you update your code to use the new DOM.

For more information, please search the mailing list archives for "transcoding." Here's a good place to start:

http://marc.info/?l=xerces-c-users&m=119514889329902&w=2

Dave