RE: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )
Thanks for your info, Mark.

It does appear that part of my issue stems from my .properties files being in UTF-8. So I have to ask: why has this changed? If I run the same code in 5.0.24 I have no issue, but 5.0.28 has a problem. It sounds like a substantial problem if UTF-8 resource bundles aren't supported any more.

Besides this simple example, I'm still seeing problems with a servlet returning XML in UTF-8. Again, no issue in 5.0.24, only after 5.0.25. I will put together a sample and post it shortly.

Thanks again for the help,
Rick

-----Original Message-----
From: Mark Thomas [mailto:[EMAIL PROTECTED]
Posted At: Wednesday, September 01, 2004 4:14 PM
Posted To: Tomcat Dev
Conversation: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )
Subject: RE: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )

OK. I have a simple test case and all seems to be well. See the end of this message for the contents of my test files.

My environment:
Win XP SP2 - brave I know but all has been OK so far ;)
JDK 1.4.2_05
Tomcat 5.0 branch, HEAD (latest) from CVS (very close to 5.0.28)

Points to note:
1. All my test files are ASCII files.
2. I had all sorts of problems with non-ASCII properties files. I didn't get to the bottom of it, but I think Windows was adding junk to the start of the file if it was UTF-8 encoded. Maybe having the first line as a comment would fix this, but I haven't tested this.
3. There were times when Eclipse and Windows were reporting the exact same file as having different encodings. There is something odd here, but I didn't look at this any further.
4. I had property file issues with 4.1.HEAD as well as 5.0.HEAD.
5. The downside of using ASCII files is that entering the UTF-8 characters by hand is a real pain. A simple conversion app should fix this though.
6. Apart from the property file issue, everything seems fine.

Test files follow.

Hope this helps,

Mark

PS I noticed that you cross-posted to the dev list. Please don't do this.
Any message cross-posted is less likely, rather than more likely, to get a response.

=== utf8.jsp ===
<%@ page language="java" import="java.lang.*,java.util.*"
    contentType="text/html; charset=UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>UTF-8 Encoding issue</title>
</head>
<body>
<p>Text from JSP page (which is ASCII encoded).</p>
<form action="utf8.jsp" method="post">
<p>English<input type="radio" value="en" name="language"></p>
<p>Japanese<input type="radio" value="ja" name="language"></p>
<input type="submit" value="Post form data" />
</form>
<p>Text from resources bundle:</p>
<%
String language = request.getParameter("language");
if (language == null) {
    language="en";
}
Locale locale = null;
if (language.equalsIgnoreCase("en")) {
    locale = Locale.ENGLISH;
} else {
    locale = Locale.JAPAN;
}
ResourceBundle bundle = ResourceBundle.getBundle("foo.bar.LocalStrings", locale);
out.println("<p>" + bundle.getString("test") + "</p>");
%>
<p><%=request.getParameter("language") %></p>
</body>
</html>

=== LocalStrings_en.properties ===
test=Test string from resources bundle

=== LocalStrings_ja.properties ===
test=\u30d5\u30a1\u30a4\u30eb\u30ed

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
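On the question of UTF-8 .properties files raised in this reply: java.util.Properties.load(InputStream) is documented to read its input as ISO-8859-1, and as far as I know ResourceBundle on the JDK 1.4.2 mentioned in the thread loads .properties files the same way. The sketch below is only an illustration of that behaviour using in-memory bytes, not the thread's actual files. It assumes the "junk at the start of the file" is a UTF-8 byte order mark, which the thread does not confirm; the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Parse raw file bytes with Properties.load(InputStream), which always
    // decodes the bytes as ISO-8859-1 regardless of how the file was saved.
    public static Properties load(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Prefix ASCII content with the three-byte UTF-8 byte order mark
    // (assumed here to be the junk that the Windows editor added).
    public static byte[] withBom(String asciiContent) {
        byte[] body = asciiContent.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    public static void main(String[] args) {
        String katakana = "\u30d5\u30a1\u30a4\u30eb";

        // A value saved as raw UTF-8 bytes is misread: each 3-byte sequence
        // turns into three Latin-1 characters.
        byte[] utf8File = ("test=" + katakana + "\n").getBytes(StandardCharsets.UTF_8);
        System.out.println(katakana.equals(load(utf8File).getProperty("test")));  // false

        // The same value written as ASCII escape sequences is read correctly.
        byte[] asciiFile = "test=\\u30d5\\u30a1\\u30a4\\u30eb\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(katakana.equals(load(asciiFile).getProperty("test"))); // true

        // A byte order mark becomes part of the first key, so "test" is lost.
        System.out.println(load(withBom("test=hello\n")).getProperty("test"));            // null

        // With a leading comment line, the mark only damages that first line.
        System.out.println(load(withBom("# comment\ntest=hello\n")).getProperty("test")); // hello
    }
}
```

If the byte order mark assumption holds, the last case suggests that the untested "first line as a comment" workaround would protect the real keys, although the damaged first line is then stored as a stray junk key rather than skipped.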
RE: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )
The change (which is required by the spec) is that if the character set has not been set before a call to getWriter(), then it will default to ISO-8859-1. There was some discussion on the tomcat-dev list about this (see http://marc.theaimsgroup.com/?l=tomcat-dev&m=109104739719572&w=2).

I'll try to put together a very simple JSP test case and get back to you.

Mark

-----Original Message-----
From: Rick [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 01, 2004 3:44 AM
To: 'Tomcat Users List'; [EMAIL PROTECTED]
Subject: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )

Since 5.0.27, pretty much all of my UTF-8 i18n code seems to be messed up. The problem seems to have been caused by whatever fix was created for this issue:

-- ServletResponse.setContentType sets response encoding after getWriter was called (Bugtraq 5062838) (luehe) --

Now it seems almost impossible to properly set the encoding type of some of my JSPs and all of my servlets that return UTF-8 XML data.

As an example, my login page allows the user to switch to Japanese text. Text data is read with a ResourceBundle, which reads from a UTF-8 encoded .properties file. If the .jsp page itself is encoded in ASCII, then I can't get the characters to show up at all any more. I have to save the .jsp page as UTF-8.

I also added the following to my catalina.bat file:

set JAVA_OPTS=-Dfile.encoding=UTF-8

Then, if I try to set a character set in my page header, it messes up. This works in some cases...

<%@ page language="java" import="java.util.*" contentType="text/html" %>

response.getCharacterEncoding() = ISO-8859-1

The really scary part is that, with no meta tag or charset actually set, the browser (IE) correctly changes to UTF-8 and displays the content fine. But if I change the actual file encoding of the .jsp page from UTF-8 back to ASCII, then IE does not change to UTF-8 and the page is messed up again. Why does the actual encoding of the .jsp file itself dictate the response sent to the client?
It appears that the actual encoding of the source file somehow gets passed along, and then I'm unable to alter the character encoding; if I try, it just causes everything to go to hell. This used to work before 5.0.27, but now doesn't, even though all data and pages are encoded in UTF-8.

<%@ page language="java" import="java.util.*" contentType="text/html; charset=UTF-8" %>

response.getCharacterEncoding() = UTF-8

Before 5.0.27, all I had to do to get my output in UTF-8 was ...

contentType="text/html; charset=UTF-8"

Now I have to mess with the actual .jsp file page encodings and still can't get most of them to work properly, and none of my servlets will return correct UTF-8 data. I have tried setting pageEncoding in the page tag as well, with no luck.

Thanks for anyone's insight or help on this. It's never fun to find out that something that had been working quite solidly ups and blows up for no good reason.

Current dev machine is on Windows XP by the way, vanilla install of Tomcat 5.0.28. I will be setting this up on a Linux box for more testing shortly.
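To make the getWriter() point from the top of this message concrete: once a writer is bound to ISO-8859-1, any character outside that charset cannot be encoded and is replaced. The sketch below is not Tomcat code. It uses a plain java.io OutputStreamWriter as a stand-in for the response writer (an assumption made so the example runs without a servlet container), and the class and method names are made up for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterCharsetDemo {

    // Encode text through a Writer bound to the given charset. The charset
    // is fixed when the writer is created, which mirrors why the response
    // charset has to be chosen before the writer is obtained.
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String japanese = "\u30d5\u30a1\u30a4\u30eb";

        // ISO-8859-1 has no katakana, so every character becomes '?'.
        byte[] latin1 = encode(japanese, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ????

        // UTF-8 can represent the text, so it survives a round trip.
        byte[] utf8 = encode(japanese, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(japanese)); // true
    }
}
```

In a servlet, the corresponding precaution would be to call setContentType with a charset (or setCharacterEncoding) on the response before calling getWriter(), since per the explanation above a later call no longer changes the writer's encoding.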
RE: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )
OK. I have a simple test case and all seems to be well. See the end of this message for the contents of my test files.

My environment:
Win XP SP2 - brave I know but all has been OK so far ;)
JDK 1.4.2_05
Tomcat 5.0 branch, HEAD (latest) from CVS (very close to 5.0.28)

Points to note:
1. All my test files are ASCII files.
2. I had all sorts of problems with non-ASCII properties files. I didn't get to the bottom of it, but I think Windows was adding junk to the start of the file if it was UTF-8 encoded. Maybe having the first line as a comment would fix this, but I haven't tested this.
3. There were times when Eclipse and Windows were reporting the exact same file as having different encodings. There is something odd here, but I didn't look at this any further.
4. I had property file issues with 4.1.HEAD as well as 5.0.HEAD.
5. The downside of using ASCII files is that entering the UTF-8 characters by hand is a real pain. A simple conversion app should fix this though.
6. Apart from the property file issue, everything seems fine.

Test files follow.

Hope this helps,

Mark

PS I noticed that you cross-posted to the dev list. Please don't do this. Any message cross-posted is less likely, rather than more likely, to get a response.
=== utf8.jsp ===
<%@ page language="java" import="java.lang.*,java.util.*"
    contentType="text/html; charset=UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>UTF-8 Encoding issue</title>
</head>
<body>
<p>Text from JSP page (which is ASCII encoded).</p>
<form action="utf8.jsp" method="post">
<p>English<input type="radio" value="en" name="language"></p>
<p>Japanese<input type="radio" value="ja" name="language"></p>
<input type="submit" value="Post form data" />
</form>
<p>Text from resources bundle:</p>
<%
String language = request.getParameter("language");
if (language == null) {
    language="en";
}
Locale locale = null;
if (language.equalsIgnoreCase("en")) {
    locale = Locale.ENGLISH;
} else {
    locale = Locale.JAPAN;
}
ResourceBundle bundle = ResourceBundle.getBundle("foo.bar.LocalStrings", locale);
out.println("<p>" + bundle.getString("test") + "</p>");
%>
<p><%=request.getParameter("language") %></p>
</body>
</html>

=== LocalStrings_en.properties ===
test=Test string from resources bundle

=== LocalStrings_ja.properties ===
test=\u30d5\u30a1\u30a4\u30eb\u30ed
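Point 5 above mentions that a simple conversion app would take the pain out of typing escapes by hand. A minimal sketch of such a converter follows; it is an illustration only, not something posted in the thread, and the class and method names are made up. The JDK of that era also shipped a native2ascii tool for the same job.

```java
public class AsciiEscaper {

    // Replace every character outside 7-bit ASCII with its six-character
    // backslash-u escape, leaving ASCII characters untouched. Characters
    // beyond the BMP come out as two escapes (one per surrogate), which is
    // the form the .properties format expects.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the same line as the LocalStrings_ja.properties test file.
        System.out.println(escape("test=\u30d5\u30a1\u30a4\u30eb\u30ed"));
    }
}
```

Running each line of a UTF-8 source file through escape() and saving the result as plain ASCII gives a file that loads the same way regardless of the platform's default encoding.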
UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )
Since 5.0.27, pretty much all of my UTF-8 i18n code seems to be messed up. The problem seems to have been caused by whatever fix was created for this issue:

-- ServletResponse.setContentType sets response encoding after getWriter was called (Bugtraq 5062838) (luehe) --

Now it seems almost impossible to properly set the encoding type of some of my JSPs and all of my servlets that return UTF-8 XML data.

As an example, my login page allows the user to switch to Japanese text. Text data is read with a ResourceBundle, which reads from a UTF-8 encoded .properties file. If the .jsp page itself is encoded in ASCII, then I can't get the characters to show up at all any more. I have to save the .jsp page as UTF-8.

I also added the following to my catalina.bat file:

set JAVA_OPTS=-Dfile.encoding=UTF-8

Then, if I try to set a character set in my page header, it messes up. This works in some cases...

<%@ page language="java" import="java.util.*" contentType="text/html" %>

response.getCharacterEncoding() = ISO-8859-1

The really scary part is that, with no meta tag or charset actually set, the browser (IE) correctly changes to UTF-8 and displays the content fine. But if I change the actual file encoding of the .jsp page from UTF-8 back to ASCII, then IE does not change to UTF-8 and the page is messed up again. Why does the actual encoding of the .jsp file itself dictate the response sent to the client?

It appears that the actual encoding of the source file somehow gets passed along, and then I'm unable to alter the character encoding; if I try, it just causes everything to go to hell. This used to work before 5.0.27, but now doesn't, even though all data and pages are encoded in UTF-8.

<%@ page language="java" import="java.util.*" contentType="text/html; charset=UTF-8" %>

response.getCharacterEncoding() = UTF-8

Before 5.0.27, all I had to do to get my output in UTF-8 was ...
contentType="text/html; charset=UTF-8"

Now I have to mess with the actual .jsp file page encodings and still can't get most of them to work properly, and none of my servlets will return correct UTF-8 data. I have tried setting pageEncoding in the page tag as well, with no luck.

Thanks for anyone's insight or help on this. It's never fun to find out that something that had been working quite solidly ups and blows up for no good reason.

Current dev machine is on Windows XP by the way, vanilla install of Tomcat 5.0.28. I will be setting this up on a Linux box for more testing shortly.