Anna Simbirtsev wrote:
Do you know if I receive utf-8 string, can I just take out s.transcode
completely and keep the string in utf-8? DOMString is capable of
containing utf-8 strings?
No, Xerces-C always uses UTF-16 internally to encode character data.
When you supply a document that is not encoded in UTF-16, it uses a
transcoder to convert the byte stream to UTF-16 before parsing it.
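To make "transcoding" concrete, here is a minimal sketch of that kind of conversion in standard C++. It is not Xerces-C's own transcoder, and utf8ToUtf16 is a hypothetical helper name invented for this illustration. It assumes the input is meant to be UTF-8 and, to stay short, does not reject overlong forms or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Xerces-C): decode a UTF-8 byte
// string into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t extra = 0;  // continuation bytes that follow the lead

        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            // Fits in a single 16-bit code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Needs a surrogate pair: two 16-bit code units.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, utf8ToUtf16("\xE2\x82\xAC") turns the three UTF-8 bytes of the euro sign into the single UTF-16 code unit 0x20AC.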
You seem to be confused about the difference between UTF-8 and
UTF-16. Both are encodings that can represent every character in
Unicode. UTF-8 uses 8-bit code units (one to four per character), so
its data fits in arrays of the C char data type. UTF-16 uses 16-bit
code units (one or two per character), so it's not compatible with
the char data type.
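A small illustration of that difference in standard C++, independent of Xerces-C. The variable names are made up for this example, and char16_t stands in here for whatever 16-bit character type a library uses.

```cpp
#include <string>

// U+20AC EURO SIGN
const std::string    euroUtf8  = "\xE2\x82\xAC";  // three 8-bit code units
const std::u16string euroUtf16 = u"\u20AC";       // one 16-bit code unit

// U+1F600, a character outside the Basic Multilingual Plane
const std::string    faceUtf8  = "\xF0\x9F\x98\x80";  // four 8-bit code units
const std::u16string faceUtf16 = u"\U0001F600";       // surrogate pair: two 16-bit code units

// Plain ASCII text is byte-for-byte the same in UTF-8.
const std::string    asciiUtf8  = "abc";   // three 8-bit code units
const std::u16string asciiUtf16 = u"abc";  // three 16-bit code units
```

The UTF-8 versions can be passed around as ordinary char strings, while the UTF-16 versions need a 16-bit element type.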
Is there some reason you need strings encoded in UTF-8?
Dave