Re: UTF-16 processing

fi Thu, 15 Mar 2007 13:15:08 -0800

On 15 Mar., 00:37, Christian Biesinger <[EMAIL PROTECTED]> wrote:
[...]
> UTF-16 can be either little- or big-endian. I assume you're trying to
> interpret it the wrong way. Is there a byte order mark at the beginning?
>


There is. In fact, the server can send UTF-16, UTF-16BE, or UTF-16LE
encoded data. In the first case there is a BOM. This was one missing
link, so thank you very much for pointing this out.
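
For the plain "UTF-16" case, the BOM check could look roughly like the
sketch below. The names Utf16Order and DetectBom are hypothetical and
only for illustration; they are not taken from the Mozilla tree. The
only fact relied on is that a BOM is serialized as FF FE in
little-endian data and as FE FF in big-endian data.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not part of any Mozilla API.
enum class Utf16Order { Little, Big, Unknown };

// Inspect the first two bytes of a buffer labelled plain "UTF-16".
// FF FE marks little-endian data, FE FF marks big-endian data.
// Anything else (or a buffer shorter than two bytes) is reported as
// Unknown, and the caller has to decide on a default.
inline Utf16Order DetectBom(const unsigned char* data, std::size_t len) {
    if (len >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) return Utf16Order::Little;
        if (data[0] == 0xFE && data[1] == 0xFF) return Utf16Order::Big;
    }
    return Utf16Order::Unknown;
}
```

For data labelled UTF-16BE or UTF-16LE the order is already known from
the label, so this check would only be needed in the first case.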

There is one remaining piece missing: What byte ordering do I have to
use for PRUnichar? Suppose I have a char* buffer filled with raw bytes
in UTF-16LE encoding. If I cast this buffer to (PRUnichar *), it
works.

What I mean by this is that the line

printf("String: %s\n",
NS_LossyConvertUTF16toASCII((PRUnichar*)mByteData).get());

will print the string as expected.

This obviously doesn't work with UTF-16BE; there I have to swap bytes.
But is this always the case with PRUnichar, or does it only work on my
machine because of the particular processor it has? I.e., would the
cast work only with UTF-16BE on other machines with different
processors and/or different operating systems?
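
To make the question concrete: a pointer cast reads each pair of bytes
in whatever order the CPU uses, so it can only be correct when the
buffer's byte order happens to match the host's. A minimal sketch of a
decode that does not depend on the host is shown below. It assumes that
a 16-bit unsigned integer (uint16_t here, standing in for PRUnichar) is
meant to hold the numeric value of each code unit; DecodeUtf16Bytes is
a hypothetical helper name, not an existing XPCOM function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assemble 16-bit code units from raw bytes using shifts, so the result
// does not depend on the byte order of the host CPU.
// uint16_t stands in for PRUnichar (an assumption for illustration).
// A trailing odd byte, if any, is ignored.
inline std::vector<std::uint16_t> DecodeUtf16Bytes(const unsigned char* data,
                                                   std::size_t len,
                                                   bool bigEndian) {
    std::vector<std::uint16_t> units;
    units.reserve(len / 2);
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        // Pick the high and low byte according to the declared order.
        const unsigned hi = bigEndian ? data[i] : data[i + 1];
        const unsigned lo = bigEndian ? data[i + 1] : data[i];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;
}
```

With this approach the bytes 41 00 decoded as little-endian and the
bytes 00 41 decoded as big-endian both yield the code unit 0x0041,
whatever machine the code runs on.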

Unfortunately I didn't find a comment in prtypes.h. Maybe the answer
is obvious and trivial, but I don't know it.

-Andreas.

