Also, do I need to use std::wstring to store UTF-8 strings, or will I be
OK with std::string?
Thank you
On Fri, 2008-09-19 at 09:40 -0400, Anna Simbirtsev wrote:
> Hi,
>
> Could you give me an example of how to transcode a UTF-8 string to
> Unicode and back? I think that if I get the string in UTF-8 encoding, I
> need to convert it to Unicode before I pass it to the Xerces parser. Is
> that right?
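
On the two questions above: UTF-8 is a sequence of bytes, so std::string
can hold it as-is. std::wstring is not needed for that, and the width of
wchar_t differs between platforms anyway. As far as I know, Xerces-C
stores text internally as XMLCh, which is a UTF-16 code unit, and the
parser decodes the document's bytes itself according to the declared
encoding, so a manual conversion mostly matters for strings you pass to
or get back from the DOM API. Below is a minimal sketch of the round trip
using only the standard library. It assumes a C++11 compiler and uses
std::u16string as a stand-in for an XMLCh buffer; the names utf8ToUtf16,
utf16ToUtf8 and roundTripExample are made up for illustration and are not
Xerces functions. In real code a UTF-8 transcoder from Xerces itself is
the more likely choice.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes held in a std::string into UTF-16 code units.
// Sketch only: it rejects malformed sequences but does not reject
// overlong encodings.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}

// Encodes UTF-16 code units back into UTF-8 bytes in a std::string.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate needs a partner
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Usage: the bytes D0 B1 are U+0431 in UTF-8 and survive a round trip.
void roundTripExample()
{
    const std::string utf8 = "\xD0\xB1";
    const std::u16string utf16 = utf8ToUtf16(utf8);
    assert(utf16.size() == 1 && utf16[0] == 0x0431);
    assert(utf16ToUtf8(utf16) == utf8);
}
```

Note that the UTF-8 side stays in a plain std::string throughout; only
the decoded form needs a 16-bit character type.
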
>
> On Wed, 2008-09-17 at 09:58 -0700, David Bertoni wrote:
> > Anna Simbirtsev wrote:
> > > When I print it in hex format, I get
> > > �: 0xffffffd0
> > > �: 0xffffffb1
> > > �: 0xffffffd0
> > > �: 0xffffffb1
> > > �: 0xffffffd0
> > > �: 0xffffffb1
> > >
> > > I am not even sure what format this is, but maybe my shell does not
> > > know what it is.
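
A note on the hex dump quoted above: values like 0xffffffd0 are typically
what you get when a byte is held in a plain char that is signed on the
platform and is sign-extended to int before being formatted. The
underlying bytes would then be D0 B1 repeated three times, and D0 B1 is
the UTF-8 encoding of U+0431 (Cyrillic small letter be), which would
suggest the data is already UTF-8. Here is a minimal sketch of the
effect, assuming a 32-bit int; the names hexNaive, hexByte and
checkHexExamples are made up for illustration and are not part of
domtools or Xerces.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a char after the usual promotion to int. Where plain char is
// signed, a byte such as 0xD0 is negative and prints as 0xffffffd0
// (assuming a 32-bit int).
std::string hexNaive(char c)
{
    char buf[16];
    const int promoted = c;
    std::snprintf(buf, sizeof buf, "0x%x", static_cast<unsigned int>(promoted));
    return std::string(buf);
}

// Formats the byte value itself by converting to unsigned char first.
std::string hexByte(char c)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%02x",
                  static_cast<unsigned int>(static_cast<unsigned char>(c)));
    return std::string(buf);
}

// Usage: the two bytes from the dump, printed without sign extension.
void checkHexExamples()
{
    assert(hexByte('\xD0') == "0xd0");
    assert(hexByte('\xB1') == "0xb1");
}
```

Converting through unsigned char before printing makes the dump show the
actual byte values, which are much easier to match against an encoding
table.
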
> > You need to understand the limitations of any library you use. Here is
> > a snippet of the source code from the domtools library you're using:
> >
> > string domtools::toString(const DOMString s)
> > {
> >     char * t = s.transcode();
> >     if (!t) return "";
> >     string tmp = t;
> >     delete [] t;
> >     return tmp;
> > }
> >
> > You can see the call to DOMString::transcode(). This will fail when
> > characters in the DOMString are not representable in the local code
> > page. This is likely what's happening, and I suggest you find another
> > library to use, because this one is broken.
> >
> > Alternatively, if you always want to transcode data to UTF-8, you can
> > modify the library to use a UTF-8 transcoder. There was another thread
> > late last week and this week on this topic.
> >
> > Dave
>