Re: Problems with xerces-c version 1.7.0 and UTF-8

David Bertoni Fri, 19 Sep 2008 16:07:40 -0700

Anna Simbirtsev wrote:

Do you know if I receive utf-8 string, can I just take out s.transcode
completely and keep the string in utf-8? DOMString is capable of
containing utf-8 strings?

No, Xerces-C always uses UTF-16 internally to encode character data. When you supply a document that is not encoded in UTF-16, it uses a transcoder to convert the byte stream to UTF-16 before parsing it.

You seem to be confused about the differences between UTF-8 and UTF-16. Both are encodings that can represent all of the characters in Unicode. UTF-8 uses 8-bit code units, so it is compatible with the char data type in C. UTF-16 uses 16-bit code units, so it's not compatible with the char data type.
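
To make the difference concrete, here is a minimal sketch in standard C++ of what converting UTF-8 bytes to UTF-16 code units involves. It is not Xerces-C code and does not reflect how its transcoders are implemented; the function name utf8ToUtf16 is hypothetical, and the sketch assumes a C++11 compiler for std::u16string. It does not reject overlong or surrogate encodings, which a real transcoder would.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only; not part of Xerces-C.
// Decodes UTF-8 bytes (8-bit code units held in char) into Unicode
// code points, then re-encodes them as UTF-16 (16-bit code units).
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of Unicode range");

        if (cp < 0x10000) {
            // Fits in a single 16-bit code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Needs a surrogate pair: two 16-bit code units.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, U+00E9 occupies two char elements in UTF-8 (0xC3 0xA9) but a single 16-bit unit in UTF-16 (0x00E9), which is why the same text cannot be stored in both forms without conversion.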


Is there some reason you need strings encoded in UTF-8?

Dave
