Hi,
Could you give me an example of how to transcode a UTF-8 string to
Unicode and back? I think that if I get the string in UTF-8 encoding,
I need to convert it to Unicode before I pass it to the Xerces parser.
Is that right?

On Wed, 2008-09-17 at 09:58 -0700, David Bertoni wrote:
> Anna Simbirtsev wrote:
> > When I print it in hex format, I get
> > �: 0xffffffd0
> > �: 0xffffffb1
> > �: 0xffffffd0
> > �: 0xffffffb1
> > �: 0xffffffd0
> > �: 0xffffffb1
> >
> > I am not even sure what format that is, but maybe my shell just
> > does not know how to display it.
> You need to understand the limitations of any library you use. Here is
> a snippet of the source code from the domtools library you're using:
>
> string domtools::toString(const DOMString s)
> {
>     char * t = s.transcode();
>     if (!t) return "";
>     string tmp = t;
>     delete [] t;
>     return tmp;
> }
>
> You can see the call to DOMString::transcode(). This will fail when
> characters in the DOMString are not representable in the local code
> page. This is likely what's happening, and I suggest you find another
> library to use, because this one is broken.
>
> Alternately, if you always want to transcode data to UTF-8, you can
> modify the library to use a UTF-8 transcoder. There was another thread
> late last week and this week on this topic.
>
> Dave
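
For reference, here is a minimal sketch of the UTF-8 round trip asked about at the top of this message. It is plain standard C++11 and does not call Xerces at all: `char16_t` is only a stand-in for `XMLCh`, on the assumption that `XMLCh` holds UTF-16 code units, and the function names `utf8ToUtf16` and `utf16ToUtf8` are made up for illustration. It also shows where the 0xffffffd0 in the hex dump likely comes from: a plain `char` holding the byte 0xD0 is sign-extended when it is printed as an int, so the bytes really are D0 B1, which is the UTF-8 encoding of U+0431.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units (char16_t stands in for XMLCh).
// Throws std::runtime_error on malformed input.
std::u16string utf8ToUtf16(const std::string& in)
{
    // Smallest code point allowed for each sequence length (rejects overlong forms).
    static const std::uint32_t minValue[4] = {0x0, 0x80, 0x800, 0x10000};
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        // Cast to unsigned char first: a signed char holding 0xD0 would
        // otherwise promote to 0xffffffd0.
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minValue[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}

// Encode UTF-16 code units back into UTF-8 bytes.
// Throws std::runtime_error on unpaired surrogates.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Calling `utf8ToUtf16("\xD0\xB1")` yields the single code unit 0x0431, and `utf16ToUtf8` turns it back into the same two bytes. In real code it is probably better to let the library do this. As far as I know, Xerces-C++ 3.x provides the `TranscodeFromStr` and `TranscodeToStr` helper classes in `xercesc/util/TransService.hpp` for exactly this purpose, but please check the documentation for the version you have installed.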