Richard Young wrote:
> The HTML page is encoded in plain ANSI.

Meaning what?

> I've also moved my Javascript function in an external file

Served over HTTP, or as a file:// URI?

> I'm not sure how the engine works in the background, maybe some JS
> string functions are screwing up the UTF8 when the code is being
> evaluated.

The "engine" (by which I assume you mean the process of getting data to 
SpiderMonkey) works like this:

1)  Take bytes from the network library.
2)  Decide what encoding those bytes are in based on things like HTTP
     response headers, etc, etc, defaulting to ISO-8859-1.  The exact
     algorithm can get pretty complicated, esp. if charset autodetect
     is involved.
3)  Transcode the bytes from the encoding you decided on to UTF-16.
4)  Pass the UTF-16 to the JS engine, which treats it as UCS-2.

Note that the only place UTF-8 might appear here is as the byte encoding
decided on in step 2.
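
As a rough illustration of steps 2 through 4, here is a minimal Python
sketch. It is not Gecko code: the function name to_utf16_code_units is
made up for this example, and the charset is simply passed in rather
than detected from headers. It decodes the bytes with the chosen
charset, transcodes to UTF-16, and returns the 16-bit code units that a
UCS-2 consumer would see.

```python
def to_utf16_code_units(raw: bytes, charset: str = "iso-8859-1") -> list[int]:
    """Hypothetical helper: decode `raw` as `charset`, return UTF-16 code units."""
    # Steps 2 and 3: interpret the bytes in the chosen charset, then
    # transcode the resulting text to UTF-16 (little-endian, no BOM).
    utf16 = raw.decode(charset).encode("utf-16-le")
    # Step 4: a UCS-2 consumer sees a flat sequence of 16-bit units.
    return [int.from_bytes(utf16[i:i + 2], "little")
            for i in range(0, len(utf16), 2)]


# U+00E9 fits in one 16-bit unit.
bmp_units = to_utf16_code_units("\u00e9".encode("utf-8"), "utf-8")
print([hex(u) for u in bmp_units])     # ['0xe9']

# U+1F600 is outside the BMP, so UTF-16 needs a surrogate pair. Code
# that treats the string as UCS-2 counts this as two "characters".
astral_units = to_utf16_code_units("\U0001F600".encode("utf-8"), "utf-8")
print([hex(u) for u in astral_units])  # ['0xd83d', '0xde00']
```

The second call is the practical consequence of "treats it as UCS-2":
one character outside the BMP shows up as two units.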

Once you have a JSString, it will always be the result of transcoding the
bytes from the encoding decided on in step 2 to UCS-2.  Any time those
bytes were not actually in the encoding that step 2 decided on, you will
get "corruption".
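
To make the "corruption" concrete, here is a small Python sketch. The
script text and the name decode_script are invented for illustration.
It assumes a script file saved as UTF-8 but decoded with the
ISO-8859-1 default from step 2.

```python
def decode_script(raw: bytes, charset: str) -> str:
    """Hypothetical stand-in for steps 2-3: decode bytes with `charset`."""
    return raw.decode(charset)


# A script saved as UTF-8: the e-acute is stored as two bytes, C3 A9.
wire_bytes = 'var s = "\u00e9";'.encode("utf-8")

# If step 2 falls back to ISO-8859-1, each byte becomes its own character.
print(decode_script(wire_bytes, "iso-8859-1"))  # var s = "Ã©";

# If step 2 picks the encoding the file was really saved in, all is well.
print(decode_script(wire_bytes, "utf-8"))       # var s = "é";
```

No string function is involved in the first case; the damage is done
before the script text reaches the JS engine.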

-Boris
_______________________________________________
dev-embedding mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-embedding
