Richard Young wrote:
> The HTML page is encoded in plain ANSI.
Meaning what?
> I've also moved my Javascript function in an external file
Served over HTTP, or as a file:// URI?
> I'm not sure how the engine works in the background, maybe some JS
> string functions are screwing up the UTF8 when the code is being
> evaluated.
The "engine" (by which I assume you mean the process of getting data to
SpiderMonkey) works like this:
1) Take bytes from the network library.
2) Decide what encoding those bytes are in based on things like HTTP
   response headers, etc, etc, defaulting to ISO-8859-1. The exact
   algorithm can get pretty complicated, esp. if charset autodetect
   is involved.
3) Transcode the bytes from the encoding you decided on to UTF-16.
4) Pass the UTF-16 to the JS engine, which treats it as UCS-2.
Note that the only place UTF-8 might appear here is as the byte encoding
decided on in step 2.
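
A rough Python sketch of those four steps may make this more concrete.
It is not Gecko's actual code: pick_charset and to_utf16_units are
hypothetical names, and the charset decision here only looks at a
Content-Type header value, with no autodetection at all.

```python
import re

def pick_charset(content_type, default="iso-8859-1"):
    """Step 2, heavily simplified: charset from a Content-Type value, else the default."""
    match = re.search(r"charset=([\w.:-]+)", content_type or "", re.IGNORECASE)
    return match.group(1).lower() if match else default

def to_utf16_units(raw, content_type=None):
    """Steps 3 and 4: decode with the chosen charset, then list the UTF-16 code units."""
    text = raw.decode(pick_charset(content_type))
    utf16 = text.encode("utf-16-le")
    # Each 16-bit unit is what a UCS-2 consumer would see as one "character".
    return [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]

raw = "\u00e9".encode("utf-8")  # the bytes C3 A9
print(to_utf16_units(raw, "text/javascript; charset=utf-8"))  # one unit: 0xE9
print(to_utf16_units(raw))  # no charset given, so two units: 0xC3, 0xA9
```

A character outside the BMP comes out of to_utf16_units as two code
units (a surrogate pair), which is where treating UTF-16 as UCS-2
becomes visible to script.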
Once you have a JSString, it will always be the result of transcoding the
bytes from the encoding decided on in step 2 to UTF-16 (which the engine
then treats as UCS-2). Any time those bytes were not actually encoded in
the encoding decided on in step 2, you will get "corruption".
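
To illustrate that kind of mismatch, here is a small Python sketch. The
script text and the function names are made up for the example; the
point is only what happens when the bytes and the decided-on encoding
disagree, in either direction.

```python
def decode_utf8_as_latin1(text):
    """Bytes really are UTF-8, but step 2 decided on ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

def decode_latin1_as_utf8(text):
    """Bytes really are ISO-8859-1, but step 2 decided on UTF-8."""
    # Invalid UTF-8 sequences are replaced with U+FFFD instead of raising.
    return text.encode("iso-8859-1").decode("utf-8", errors="replace")

source = 'var s = "\u00e9";'  # a script containing one non-ASCII character
print(decode_utf8_as_latin1(source))  # the e-acute turns into two characters
print(decode_latin1_as_utf8(source))  # the e-acute turns into U+FFFD
```

In neither case does anything fail loudly. The string the script sees
is simply not the one the author typed.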
-Boris
_______________________________________________
dev-embedding mailing list
https://lists.mozilla.org/listinfo/dev-embedding