RE: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )

2004-09-02 Thread Rick
Thanks for your info, Mark.
  It does appear that part of my issue stems from my .properties files
being in UTF-8.
So I have to ask: why has this changed? If I run the same code in 5.0.24
I have no issue, but 5.0.28 has a problem. It sounds like a substantial
problem if UTF-8 resource bundles aren't supported any more.


Besides this simple example, I'm still seeing problems with a servlet
returning XML in UTF-8. Again, no issue in 5.0.24, only after 5.0.25.

I will put together a sample and post it shortly.

Thanks again for the help,

Rick

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )

2004-09-01 Thread Mark Thomas
The change (which is required by the spec) is that if the character set has not
been set before a call to getWriter() then it will default to ISO-8859-1. There
was some discussion on the tomcat-dev list about this (see
http://marc.theaimsgroup.com/?l=tomcat-dev&m=109104739719572&w=2).
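
To make the effect of that default concrete, here is a minimal sketch in
plain Java. It does not use the servlet API; the class and helper names are
made up for illustration. It writes the same Japanese string through a
writer bound to ISO-8859-1 (standing in for a response whose character set
was never set before getWriter()) and through a writer bound to UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterCharsetDemo {

    // Hypothetical helper: encodes text the way a writer bound to the
    // given charset would, and returns the raw bytes that were produced.
    static byte[] writeWith(Charset charset, String text) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(body, charset)) {
            writer.write(text);
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Same katakana sample as LocalStrings_ja.properties in this thread.
        String text = "\u30d5\u30a1\u30a4\u30eb\u30ed";

        byte[] latin1 = writeWith(StandardCharsets.ISO_8859_1, text);
        byte[] utf8 = writeWith(StandardCharsets.UTF_8, text);

        // ISO-8859-1 cannot represent these characters, so each one is
        // replaced by a single substitution byte and the text is lost.
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        // UTF-8 round-trips the original string.
        System.out.println("UTF-8 round trip ok: "
                + new String(utf8, StandardCharsets.UTF_8).equals(text));
    }
}
```

In a servlet the equivalent precaution would be to set the content type
(including the charset) before asking the response for its writer.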

I'll try and put together a very simple JSP test case and get back to you.

Mark




RE: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )

2004-09-01 Thread Mark Thomas
OK. I have a simple test case and all seems to be well. See the end of this
message for the contents of my test files.

My environment:
Win XP SP2 - brave I know but all has been OK so far ;)
JDK 1.4.2_05
Tomcat 5.0 branch, HEAD (latest) from CVS (very close to 5.0.28)

Points to note:
1. All my test files are ASCII files.
2. I had all sorts of problems with non-ASCII properties files. I didn't get to
the bottom of it but I think Windows was adding junk to the start of the file if
it was UTF-8 encoded. Maybe having the first line as a comment would fix this
but I haven't tested this.
3. There were times where Eclipse and Windows were reporting the exact same file
as having different encodings. There is something odd here but I didn't look at
this any further.
4. I had property file issues with 4.1.HEAD as well as 5.0.HEAD.
5. The downside of using ASCII files is that entering the UTF-8 characters by
hand is a real pain. A simple conversion app should fix this though.
6. Apart from the property file issue, everything seems fine.
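
The conversion app mentioned in point 5 could look something like the
sketch below. This is only an illustration, not a tested tool: the class
name and the command-line arguments are hypothetical. It reads a UTF-8
file, drops a leading byte order mark if one is present (one plausible
explanation for the "junk" in point 2, though that was not confirmed), and
writes an ASCII file with backslash-u escapes. The JDK of this era also
ships a native2ascii tool intended for the same job.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PropertiesEscaper {

    // Converts text to an ASCII-only form suitable for a .properties file:
    // every character above 0x7E becomes a backslash-u escape with four
    // hex digits, and a leading byte order mark is dropped.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (i == 0 && c == '\uFEFF') {
                continue; // skip the BOM that some Windows editors prepend
            }
            if (c > 0x7E) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage:
        //   java PropertiesEscaper utf8-input.properties ascii-output.properties
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, toAsciiEscapes(text).getBytes(StandardCharsets.US_ASCII));
    }
}
```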

Test files follow.

Hope this helps,

Mark

PS I noticed that you cross-posted to the dev list. Please don't do this. Any
message cross-posted is less likely rather than more likely to get a response.

=== utf8.jsp 
<%@ page language="java" import="java.lang.*,java.util.*"
contentType="text/html; charset=UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>UTF-8 Encoding issue</title>
  </head>
  <body>
    <p>Text from JSP page (which is ASCII encoded).</p>
    <form action="utf8.jsp" method="post">
      <p>English<input type="radio" value="en" name="language" /></p>
      <p>Japanese<input type="radio" value="ja" name="language" /></p>
      <input type="submit" value="Post form data" />
    </form>
    <p>Text from resources bundle:</p>
<%
  String language = request.getParameter("language");

  if (language == null) {
    language = "en";
  }

  Locale locale = null;
  if (language.equalsIgnoreCase("en")) {
    locale = Locale.ENGLISH;
  } else {
    locale = Locale.JAPAN;
  }

  ResourceBundle bundle = ResourceBundle.getBundle("foo.bar.LocalStrings",
      locale);
  out.println("<p>" + bundle.getString("test") + "</p>");
%>
    <p><%=request.getParameter("language") %></p>
  </body>
</html>

= LocalStrings_en.properties =
test=Test string from resources bundle

= LocalStrings_ja.properties =
test=\u30d5\u30a1\u30a4\u30eb\u30ed






UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )

2004-08-31 Thread Rick
Since 5.0.27, pretty much all of my UTF-8 i18n code seems to be messed up.

The problem seems to have been caused by whatever fix was created for issue
--
ServletResponse.setContentType sets response encoding after getWriter was
called (Bugtraq 5062838) (luehe) 
--

Now it seems almost impossible to properly set the encoding type of some of
my JSPs and all of my Servlets that return UTF-8 XML data.

As an example, my login page allows the user to switch to Japanese text.
Text data is read with a ResourceBundle, which reads from a UTF-8 encoded
.properties file.
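
For reference, a minimal sketch of what can happen to a UTF-8 encoded
.properties file, independent of Tomcat. It assumes the loader decodes the
file as ISO-8859-1, which is what java.util.Properties.load(InputStream)
is documented to do; whether that is the cause of the behaviour described
here is not confirmed. The class and method names are made up for
illustration, and the Reader overload used for comparison only exists in
Java 6 and later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Loads the bytes with Properties.load(InputStream), which decodes
    // the input as ISO-8859-1 regardless of how the file was saved.
    static String loadAsStream(byte[] fileBytes) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(fileBytes));
        return props.getProperty("test");
    }

    // Loads the same bytes through a Reader that decodes them as UTF-8.
    static String loadAsUtf8(byte[] fileBytes) throws IOException {
        Properties props = new Properties();
        props.load(new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
        return props.getProperty("test");
    }

    public static void main(String[] args) throws IOException {
        // Simulates a .properties file that was saved as UTF-8.
        byte[] fileBytes =
                "test=\u30d5\u30a1\u30a4\u30eb".getBytes(StandardCharsets.UTF_8);

        // Each UTF-8 byte is turned into its own character, so the value
        // comes back longer than the original and unreadable.
        System.out.println("stream length: " + loadAsStream(fileBytes).length());
        // Decoding as UTF-8 restores the four original characters.
        System.out.println("reader length: " + loadAsUtf8(fileBytes).length());
    }
}
```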

If the .jsp page itself is encoded in ASCII, then I can't get the
characters to show up at all any more.
I have to save the .jsp page as UTF-8.
I also added set JAVA_OPTS=-Dfile.encoding=UTF-8 to my catalina.bat file.

Then, if I try to set a character set in my page header, it messes up.

This works in some cases...
<%@ page language="java" import="java.util.*" contentType="text/html" %>
response.getCharacterEncoding() = ISO-8859-1

The really scary part is that with no meta or charset actually set, the
browser (IE) correctly changes to UTF-8 and displays the content fine. But
if I change the actual file encoding of the .jsp page from UTF-8 back to
ASCII, then IE does not change to UTF-8 and the page is messed up again.
Why does the actual encoding of the .jsp file itself dictate the response
sent to the client?

It appears that the actual encoding of the source file somehow gets passed
along and then I'm unable to alter the character encoding, and if I try, it
just causes everything to go to hell.


This used to work before 5.0.27, but now doesn't, even though all data and
pages are encoded in UTF-8.
<%@ page language="java" import="java.util.*"
contentType="text/html; charset=UTF-8" %>
response.getCharacterEncoding() = UTF-8


Before 5.0.27, all I had to do to get my output in UTF-8 was ...
 contentType="text/html; charset=UTF-8"

Now I have to mess with the actual .jsp file encodings and still can't get
most pages to work properly, and none of my servlets return correct UTF-8
data.

I have tried setting pageEncoding in the page tag as well with no luck.


Thanks for anyone's insight or help on this. It's never fun to find out that
something that had been working quite solidly up and blows up for no good
reason.

Current dev machine is on Windows XP by the way, vanilla install of Tomcat
5.0.28.
I will be setting this up on a Linux box for more testing shortly.

