Anna Simbirtsev wrote:
Do you know if I receive utf-8 string, can I just take out s.transcode
completely and keep the string in utf-8? DOMString is capable of
containing utf-8 strings?
No, Xerces-C always uses UTF-16 internally to encode character data. When you supply a document that is not encoded in UTF-16, it uses a transcoder to convert the byte stream to UTF-16 before parsing it.
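
To give a rough idea of what that transcoding step involves, here is a minimal sketch of a UTF-8 to UTF-16 conversion. It is not the Xerces-C transcoder; `utf8_to_utf16` is a hypothetical helper written only for illustration. It assumes the input is a complete UTF-8 byte sequence, and it does not reject overlong forms or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Xerces-C API): decode UTF-8 bytes into
// UTF-16 code units. Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of Unicode range");

        if (cp < 0x10000) {
            // Fits in a single 16-bit code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Needs a surrogate pair: two 16-bit code units.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the two UTF-8 bytes 0xC3 0xA9 come out as the single UTF-16 code unit 0x00E9.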

You seem to be confused about the differences between UTF-8 and UTF-16. Both are encodings that can represent all of the characters in Unicode. UTF-8 uses 8-bit code units, one to four per character, so it is compatible with the char data type in C. UTF-16 uses 16-bit code units, one or two per character, so it's not compatible with the char data type.
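
A small sketch of that difference in standard C++, with no Xerces-C involved. The names `sample_utf8`, `sample_utf16`, and `byte_length` are made up for this example, and the UTF-8 bytes are written as explicit escapes so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <string>

// "café" as UTF-8: 'c', 'a', 'f', then U+00E9 encoded as bytes 0xC3 0xA9.
std::string sample_utf8() { return "caf\xC3\xA9"; }

// The same text as UTF-16: every character here fits in one 16-bit unit.
std::u16string sample_utf16() { return u"caf\u00E9"; }

// Storage used by the character data: code units times the unit size.
template <typename CharT>
std::size_t byte_length(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}
```

The UTF-8 version holds five char code units and the UTF-16 version holds four char16_t code units, so a char buffer cannot simply be reinterpreted as UTF-16 or the other way around.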

Is there some reason you need strings encoded in UTF-8?

Dave
