I just got through internationalizing a website, both input and output. I ran into exactly the same issues, and as André states, you pretty much need to check everywhere for byte[] -> String and String -> byte[] conversions.
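A minimal sketch of what "checking every conversion" means in practice, assuming Java 7+ for java.nio.charset.StandardCharsets. The class and helper names (ExplicitCharsets, encode, decode) are hypothetical, for illustration only. It shows that the charset must be named in both directions, and that decoding with a charset other than the one used for encoding silently yields different text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: not part of HttpClient or any library.
public class ExplicitCharsets {

    // byte[] -> String: name the charset the bytes were actually written in.
    // Note: new String(bytes) with no charset uses Charset.defaultCharset(),
    // which depends on the platform (JDK 18 and later default to UTF-8).
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // String -> byte[]: name the charset the receiver expects.
    // Note: text.getBytes() with no argument also uses the platform default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "Montr\u00e9al"; // "Montréal", contains one non-ASCII character

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);

        // The 'é' takes two bytes in UTF-8 but only one in ISO-8859-1.
        System.out.println(utf8.length);   // 9
        System.out.println(latin1.length); // 8

        // Decoding with the same charset used for encoding round-trips.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));      // true

        // Decoding with a different charset produces different text.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```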
Then do the conversions he's given. I personally liked the more terse:

byte[] outbytes = new String(inbytes, inputEncoding).getBytes(outputEncoding);

- Matt Secoske

André-John Mas <[EMAIL PROTECTED]>
06/26/2003 03:49 PM
Please respond to "Commons HttpClient Project"

To: Commons HttpClient Project <[EMAIL PROTECTED]>
cc:
Subject: Re: [VOTE] Re: 2.0 release - deprecate some methods?

This doesn't look correct. If you really want to convert from one charset to another, you would have to do something such as:

String myString = new String(bytes, bytesCharset);
byte[] bytes2 = myString.getBytes(destCharset);

Until you have the bytes, you don't have the final output, since conversions between strings and bytes will fall back to the platform's native encoding if you aren't careful. Otherwise, if your destination is an OutputStream, let an OutputStreamWriter do the work for you:

String myString = new String(bytes, bytesCharset);
OutputStreamWriter out = new OutputStreamWriter(outStream, destCharset);
out.write(myString);
out.flush(); // OutputStreamWriter buffers; flush (or close) so the bytes reach outStream

I have just had to write a project that is fully UTF-8 compliant, and it taught me a lot about what Java does. Without an explicit encoding, String conversions default to the platform's native encoding, which is not always what you want. I had to go everywhere and make sure the right conversions were being performed.

regards
Andre

Laura Werner wrote:
> Adrian Sutton wrote:
>
>> The flaw in the toUsingCharset method is two-fold:
>> Firstly, Strings in Java are *always* stored internally as UTF-8
>
> I agree with the rest of your analysis of this, but I thought I should
> point out that Java Strings and "char"s are stored in UTF-16 rather than
> UTF-8. A "char" is an unsigned, two-byte value that can hold all the
> characters from UCS-2.
>
> As far as toUsingCharset goes, I agree that it looks broken.
> The code basically does:
>
>     return new String(target.getBytes(fromCharset), toCharset);
>
> It's taking "target", which is a UTF-16 string, encoding it into a byte
> array in "fromCharset", and then decoding those bytes back into UTF-16
> using "toCharset". So it's pretending the bytes in the array have two
> different meanings, one when it writes them and one when it reads them
> immediately afterward. I can't see how this could be correct.
>
> -- Laura

--
André-John Mas
Software Developer
Newtrade Technologies
63 de Brésoles, Suite 100, Montreal, Quebec, Canada H2Y 1V7
mailto:[EMAIL PROTECTED]
tel +1 514 286-8187 x3017
fax +1 514 221-3287
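A minimal sketch contrasting the pattern Laura describes with the two approaches André suggests, assuming Java 7+ (StandardCharsets, try-with-resources). The class and method names are hypothetical; encodeThenDecodeWithOther mirrors the one-liner Laura quotes but takes Charset objects instead of charset names, and is not HttpClient's actual toUsingCharset signature. Transcoding correctly means decoding with the charset the bytes are really in and encoding with the destination charset; encoding and decoding the same bytes with two different charsets just garbles the text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class for illustration; not HttpClient code.
public class CharsetConversion {

    // The pattern Laura quotes: encode with one charset, then decode the
    // same bytes with another. The result is garbled text, not the same
    // text "converted" to another encoding.
    static String encodeThenDecodeWithOther(String target, Charset fromCharset, Charset toCharset) {
        return new String(target.getBytes(fromCharset), toCharset);
    }

    // André's approach: decode with the charset the bytes are actually in,
    // then encode the resulting String into the destination charset.
    static byte[] transcode(byte[] bytes, Charset bytesCharset, Charset destCharset) {
        String myString = new String(bytes, bytesCharset);
        return myString.getBytes(destCharset);
    }

    // André's alternative when the destination is a stream: let an
    // OutputStreamWriter do the encoding. A ByteArrayOutputStream stands in
    // for the real destination stream here.
    static byte[] writeThroughWriter(String text, Charset destCharset) throws IOException {
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(outStream, destCharset)) {
            out.write(text);
        } // closing the writer flushes its buffered output into outStream
        return outStream.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "Montr\u00e9al"; // "Montréal"
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        // ISO-8859-1 bytes -> UTF-8 bytes, via a correctly decoded String.
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, text.getBytes(StandardCharsets.UTF_8))); // true

        // The writer produces the same UTF-8 bytes.
        byte[] viaWriter = writeThroughWriter(text, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(viaWriter, utf8)); // true

        // UTF-8 bytes read back as ISO-8859-1: each byte becomes one char.
        String broken = encodeThenDecodeWithOther(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(broken.equals(text)); // false
        System.out.println(broken.length());     // 9
    }
}
```

The writer variant avoids holding the encoded bytes separately from the stream, but either way the charset is named explicitly at the single point where characters become bytes.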