You got me! ;)

Sung-Gu

----- Original Message -----
From: "André-John Mas" <[EMAIL PROTECTED]>
To: "Commons HttpClient Project" <[EMAIL PROTECTED]>
Sent: Friday, June 27, 2003 12:23 AM
Subject: Re: [VOTE] Re: 2.0 release - deprecate some methods?

> This doesn't look correct. If you really want to convert from one
> charset to another, then you would have to do something such as:
>
>     String myString = new String(bytes, bytesCharset);
>     byte[] bytes2 = myString.getBytes(destCharset);
>
> Until you have the bytes, you don't have the final output, since
> strings will be affected by the platform's native encoding if
> you aren't careful. Otherwise, if your destination is an output stream,
> then let the OutputStreamWriter do the work for you:
>
>     String myString = new String(bytes, bytesCharset);
>     OutputStreamWriter out = new OutputStreamWriter(outStream, destCharset);
>     out.write(myString);
>
> I have just had to write a project that is fully UTF-8 compliant
> and it taught me a lot about what Java does. Without any encoding
> specified, the string conversion defaults to the platform's native
> encoding, which is not always what you want. I had to go everywhere
> and make sure the right conversions were being performed.
>
> regards
>
> Andre
>
> Laura Werner wrote:
>
> > Adrian Sutton wrote:
> >
> >> The flaw in the toUsingCharset method is two-fold:
> >> Firstly, Strings in Java are *always* stored internally as UTF-8
> >
> > I agree with the rest of your analysis of this, but I thought I should
> > point out that Java Strings and "char"s are stored in UTF-16 rather than
> > UTF-8.  A "char" is an unsigned, two-byte value that can hold all the
> > characters from UCS2.
> >
> > As far as toUsingCharset goes, I agree that it looks broken.
> > The code basically does:
> >
> >     return new String(target.getBytes(fromCharset), toCharset);
> >
> > It's taking "target", which is a UTF-16 string, encoding it into a byte
> > array in "fromCharset", and then decoding those bytes back into UTF-16
> > using "toCharset".  So it's pretending the bytes in the array have two
> > different meanings, one when it writes them and one when it reads them
> > immediately afterward.  I can't see how this could be correct.
> >
> > -- Laura
>
> --
> André-John Mas
> Software Developer / Développeur Informatique
> Newtrade Technologies
> 63 de Brésoles, Suite 100, Montreal, Quebec, Canada H2Y 1V7
> mailto:[EMAIL PROTECTED]
> tel +1 514 286-8187 x3017
> fax +1 514 221-3287
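
The two conversions Andre describes, and the encode-then-decode pattern Laura objects to, can be sketched side by side. This is only an illustration under stated assumptions, not the HttpClient code: the class name CharsetConversionSketch and the helpers transcode, transcodeViaWriter and encodeThenDecode are hypothetical, and the sketch uses java.nio.charset.StandardCharsets (Java 7 and later), which postdates this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetConversionSketch {

    // Hypothetical helper: bytes in one charset -> bytes in another.
    // The String in the middle is only a UTF-16 intermediate.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String decoded = new String(input, from); // decode with the source charset
        return decoded.getBytes(to);              // encode with the destination charset
    }

    // Hypothetical helper: same result, but the writer does the encoding.
    static byte[] transcodeViaWriter(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, to)) {
            out.write(new String(input, from));
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // The pattern questioned in the thread: encode with one charset, then
    // decode the same bytes with a different one.
    static String encodeThenDecode(String target, Charset from, Charset to) {
        return new String(target.getBytes(from), to);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9}; // U+00E9 encoded in ISO-8859-1

        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8)); // prints [-61, -87], i.e. 0xC3 0xA9

        byte[] viaWriter = transcodeViaWriter(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, viaWriter)); // prints true

        // One character goes in, two come out: the UTF-8 bytes are misread as Latin-1.
        String mangled = encodeThenDecode("\u00E9", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mangled.length()); // prints 2
    }
}
```

For the single ISO-8859-1 byte 0xE9, both transcode helpers yield the UTF-8 bytes 0xC3 0xA9. encodeThenDecode instead turns the one-character string into the two characters U+00C3 U+00A9, because the same bytes are given one meaning when written and another when read.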