Adrian Sutton wrote:

The flaw in the toUsingCharset method is two-fold:
Firstly, Strings in Java are *always* stored internally as UTF-8


I agree with the rest of your analysis of this, but I thought I should point out that Java Strings and "char"s are stored as UTF-16 rather than UTF-8. A "char" is an unsigned, two-byte value that can hold any character from UCS-2.
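
To make the UTF-16 point concrete, here is a small sketch, assuming Java 5 or later (for Character.SIZE and codePointCount). The class name Utf16Demo and its helper methods are made up for illustration and are not part of the code under discussion. It shows that a "char" is 16 bits wide, and that a character outside the two-byte range occupies two "char"s:

```java
class Utf16Demo {
    // Number of UTF-16 code units (Java "char"s) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (whole characters) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Width of a Java char in bits.
        System.out.println(Character.SIZE);            // 16

        // U+00E9 (e with acute accent) fits in a single char.
        String accented = "\u00e9";
        System.out.println(codeUnits(accented));       // 1
        System.out.println(codePoints(accented));      // 1

        // U+1D11E (musical G clef) lies outside the two-byte range,
        // so UTF-16 stores it as a surrogate pair of two chars.
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));           // 2
        System.out.println(codePoints(clef));          // 1
    }
}
```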

As far as toUsingCharset goes, I agree that it looks broken. The code basically does:

return new String(target.getBytes(fromCharset), toCharset);

It's taking "target", which is a UTF-16 string, encoding it into a byte array in "fromCharset", and then decoding those bytes back into UTF-16 using "toCharset". So it's pretending the bytes in the array have two different meanings, one when it writes them and one when it reads them immediately afterward. I can't see how this could be correct.
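
Here is a minimal sketch of what goes wrong, assuming Java 7 or later for StandardCharsets. The wrapper class ToUsingCharsetDemo is hypothetical, and I'm passing Charset objects where the real method may well take charset names as Strings; only the one-line body is taken from the code above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ToUsingCharsetDemo {
    // Same encode-then-decode step as the line quoted above.
    // The Charset parameters are an assumption made for this sketch.
    static String toUsingCharset(String target, Charset fromCharset, Charset toCharset) {
        return new String(target.getBytes(fromCharset), toCharset);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final letter (U+00E9).
        String original = "caf\u00e9";

        // Encoding as UTF-8 turns U+00E9 into the two bytes C3 A9.
        // Decoding those bytes as ISO-8859-1 yields two separate
        // characters, U+00C3 and U+00A9, instead of the original one.
        String result = toUsingCharset(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.length());                  // 4
        System.out.println(result.length());                    // 5
        System.out.println(result.equals("caf\u00c3\u00a9"));   // true

        // Pure ASCII survives only because both charsets happen to
        // map those characters to the same bytes.
        String ascii = toUsingCharset("cafe",
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(ascii.equals("cafe"));                // true
    }
}
```

So the string comes back intact only when the two charsets agree on every byte involved, in which case the call did nothing useful anyway.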

-- Laura

