Anna Simbirtsev wrote:
When I print it in hex format, I get
�: 0xffffffd0
�: 0xffffffb1
�: 0xffffffd0
�: 0xffffffb1
�: 0xffffffd0
�: 0xffffffb1
I am not even sure what format this is, but maybe my shell does not
know what it is.
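The 0xffffff prefix is most likely a printing artifact, not part of the data. Your printing code isn't shown, so this assumes each byte was held in a plain char (signed on most x86 compilers) and printed as an int. In that case the byte 0xd0 is sign-extended to the int -48, which shows up as 0xffffffd0. Printing one byte of a multibyte sequence at a time would also likely explain the replacement characters, since the terminal only receives incomplete sequences. For what it's worth, the pair 0xd0 0xb1 happens to be the UTF-8 encoding of U+0431 (Cyrillic б). Here is a minimal sketch of dumping bytes without sign extension. hexBytes is a hypothetical helper, not part of domtools or Xerces:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format each byte of s as "0x" plus two hex digits,
// separated by spaces.
std::string hexBytes(const std::string& s) {
    std::string out;
    char buf[8];
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out += ' ';
        // Go through unsigned char first. If plain char is signed, the byte
        // 0xd0 would otherwise promote to the int -48, which typically
        // prints as 0xffffffd0 with a 32-bit int.
        unsigned byte = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "0x%02x", byte);
        out += buf;
    }
    return out;
}
```

With this, hexBytes("\xd0\xb1") returns "0xd0 0xb1", so each byte appears exactly once and without the ffffff padding.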
You need to understand the limitations of any library you use. Here is
a snippet of the source code from the domtools library you're using:
string domtools::toString(const DOMString s)
{
char * t = s.transcode();
if (!t) return "";
string tmp = t;
delete [] t;
return tmp;
}
You can see the call to DOMString::transcode(). This will fail when
characters in the DOMString are not representable in the local code
page. This is likely what's happening, and I suggest you find another
library to use, because this one is broken.
Alternatively, if you always want to transcode data to UTF-8, you can
modify the library to use a UTF-8 transcoder. There was a thread on
this topic late last week and earlier this week.
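As an illustration of targeting UTF-8 directly instead of the local code page, here is a minimal sketch of a hand-rolled UTF-16 to UTF-8 encoder. Xerces keeps DOM text as UTF-16 XMLCh data, so a function like this could stand in for the transcode() call inside toString(). It is a swapped-in technique, not the Xerces transcoder API. The name utf16ToUtf8 is hypothetical, and it takes a std::u16string to stay self-contained, so you would first need to copy the DOMString's characters into one. Unpaired surrogates are replaced with U+FFFD:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 code units as UTF-8 without
// consulting the local code page.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: emit the replacement character
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, utf16ToUtf8(u"\u0431") returns "\xd0\xb1". Every UTF-16 string has a UTF-8 form, so this never fails the way a code-page transcode can. The trade-off is that you maintain the encoder yourself instead of relying on a transcoder the parser library provides.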
Dave