Well, I could reproduce this, and it seems to me to be a bug (at least as far as the shell is concerned). If I write this code snippet:

  print("á".charCodeAt(0))

into a file "x.js", save it with UTF-8 encoding and run it with Rhino using

  java -jar js.jar x.js

it prints 8730. It turns out "á" is encoded as the bytes C3 A1, which is indeed UTF-8 for "á". However, java.lang.System.getProperty("file.encoding") returns "MacRoman", and the byte C3 in MacRoman maps to U+221A SQUARE ROOT (decimal 8730). The same happens when the character is typed directly into the console.
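
The mismatch is easy to demonstrate from Java itself. A quick sketch ("MacRoman" is the name my JRE reports for the charset; on some JDKs the canonical NIO name is "x-MacRoman"):

  import java.io.UnsupportedEncodingException;

  public class EncodingDemo {
      public static void main(String[] args) throws UnsupportedEncodingException {
          // UTF-8 encodes "á" (U+00E1) as the two bytes C3 A1.
          byte[] bytes = { (byte) 0xC3, (byte) 0xA1 };

          // Decoded as UTF-8, the bytes round-trip to U+00E1 (decimal 225).
          System.out.println((int) new String(bytes, "UTF-8").charAt(0));

          // Decoded as MacRoman, 0xC3 becomes U+221A (decimal 8730) -- the
          // very value charCodeAt(0) reported.
          System.out.println((int) new String(bytes, "MacRoman").charAt(0));
      }
  }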

So, there's a discrepancy between character encodings: the console on Mac OS X apparently feeds characters to System.in as a UTF-8 encoded byte stream, but the Rhino shell reads them as MacRoman, since that's the default Java encoding in the JRE (the value of the "file.encoding" system property). Taken at face value, this is actually a bug in Java: if the console is UTF-8 based, the JRE should detect that and set "file.encoding" to UTF-8.
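
For illustration, this is roughly what the shell would have to do to read console input as UTF-8 regardless of file.encoding (just a sketch, assuming the terminal really does send UTF-8):

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.InputStreamReader;

  public class Utf8Stdin {
      public static void main(String[] args) throws IOException {
          // A plain "new InputStreamReader(System.in)" would use
          // file.encoding (MacRoman here); naming the charset explicitly
          // bypasses the platform default.
          BufferedReader in = new BufferedReader(
                  new InputStreamReader(System.in, "UTF-8"));
          String line = in.readLine();
          if (line != null) {
              // Typing "á" on a UTF-8 terminal now yields 225, not 8730.
              System.out.println((int) line.charAt(0));
          }
      }
  }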

We could work around it if the Rhino shell had an explicit command-line encoding declaration, i.e. if you could specify "-c utf-8" -- that'd solve it.
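
Internally, such an option would just boil down to opening the script with the named charset instead of the platform default. Something along these lines (readScript is a hypothetical helper, not actual Rhino code):

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStreamReader;
  import java.io.Reader;

  public class ScriptReader {
      // Read an entire script file using the encoding named on the
      // command line rather than the file.encoding default.
      static String readScript(File file, String encoding) throws IOException {
          Reader r = new InputStreamReader(new FileInputStream(file), encoding);
          try {
              StringBuilder sb = new StringBuilder();
              char[] buf = new char[4096];
              for (int n; (n = r.read(buf)) != -1; ) {
                  sb.append(buf, 0, n);
              }
              return sb.toString();
          } finally {
              r.close();
          }
      }
  }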

Actually, I believe I'll just write code to solve this in a way that's conformant to RFC 4329.
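
Section 4.2 of the RFC essentially means checking for a byte-order mark (EF BB BF for UTF-8, FE FF / FF FE for UTF-16) before consulting anything else. A rough sketch of the idea (detectEncoding is a hypothetical name, and a robust version would loop, since a single read() can return fewer than three bytes):

  import java.io.IOException;
  import java.io.PushbackInputStream;

  public class BomSniffer {
      // Wrap the input as new PushbackInputStream(stream, 3) so that
      // non-BOM bytes can be pushed back for the script parser to see.
      static String detectEncoding(PushbackInputStream in, String fallback)
              throws IOException {
          byte[] b = new byte[3];
          int n = in.read(b, 0, 3);
          if (n == 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB
                     && (b[2] & 0xFF) == 0xBF) {
              return "UTF-8";                      // EF BB BF
          }
          if (n >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
              if (n == 3) in.unread(b[2]);         // keep the byte after the BOM
              return "UTF-16BE";                   // FE FF
          }
          if (n >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
              if (n == 3) in.unread(b[2]);
              return "UTF-16LE";                   // FF FE
          }
          if (n > 0) in.unread(b, 0, n);           // no BOM: push everything back
          return fallback;
      }
  }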

Attila.

--
home: http://www.szegedi.org
weblog: http://constc.blogspot.com

On Oct 18, 2008, at 12:31 AM, tlrobinson wrote:

In Rhino, if I do "á".charCodeAt(0) I get 8730, whereas in Firefox
and Safari I get 225. (That's option-"e" then "a" on OS X.)

Is this undefined behavior in JavaScript, a bug, or am I doing
something weird?

Thanks.