RE: utf-8 with tomcat 5: second round
This is exactly what should happen. You are working with characters, not bytes, hence you see 1 UTF-8 character.

Mark

> -----Original Message-----
> From: Asher Tarnopolski [mailto:[EMAIL PROTECTED]
> Sent: Sunday, July 04, 2004 11:18 PM
> To: Tomcat Users List
> Subject: Re: utf-8 with tomcat 5: second round
>
> hey mark, thanks for response.
> i run the code i pasted below.
> for example, i enter one hebrew letter. its utf
> code is 1488.
> on tc 4.0.xx i get the following results:
>
> 7 (the length of its utf-8 code)
> א (the letter itself in utf-8 encoding)
> &#1488; (same as above parsed to be visible in browser)
>
> in tc 5 i get this:
> 1 (which already lets me know that this is not really utf-8)
> the entered hebrew letter
> the entered hebrew letter (nothing is parsed, so '&' sign
> wasn't even met)
> this is it.
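A minimal Java sketch of the point Mark makes above, not taken from the thread: String.length() counts chars (UTF-16 code units), not encoded bytes, so a single Hebrew letter has length 1 even though its UTF-8 encoding takes two bytes. The class and method names are made up for illustration, and StandardCharsets needs Java 7 or later, which is newer than the Tomcat versions discussed here.

```java
import java.nio.charset.StandardCharsets;

public class CharsVersusBytes {
    /** Number of Java chars (UTF-16 code units) in the string. */
    public static int charCount(String s) {
        return s.length();
    }

    /** Number of bytes the string occupies once encoded as UTF-8. */
    public static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+05D0 HEBREW LETTER ALEF, decimal code point 1488.
        String alef = "\u05D0";

        System.out.println(charCount(alef));        // prints 1
        System.out.println(utf8ByteCount(alef));    // prints 2 (bytes 0xD7 0x90)
        System.out.println((int) alef.charAt(0));   // prints 1488
    }
}
```

Seen this way, a length of 1 for one entered letter suggests the parameter was decoded correctly rather than the opposite.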
Re: utf-8 with tomcat 5: second round
Hmm, OK, still try the filter tho as I still expect that setting the char encoding where you have it in the .jsp will be too late. Before using the filter (with struts) I was using a controller servlet (non-struts) that set the encoding first thing. I run UTF-8 through TC4, TC5 with no changes to the TC config at all.

Mike

Asher Tarnopolski wrote:

> sorry, no struts are involved.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
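Why the timing of the encoding call matters can be sketched in plain Java, without a servlet container. The servlet API requires setCharacterEncoding to be called before request parameters are read, and the servlet specification makes ISO-8859-1 the default for POST data when nothing else is specified. The sketch below decodes the same percent-encoded bytes both ways. The raw value "%D7%90" and the class name are assumptions chosen for illustration; this is not code from anyone in the thread, and it only approximates what a container does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class RequestEncodingDemo {
    /** Decodes one percent-encoded form value using the named charset. */
    public static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Both charsets used below are mandatory on every Java platform.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical POST value: alef (U+05D0) sent as percent-encoded UTF-8 bytes.
        String raw = "%D7%90";

        String asUtf8 = decode(raw, "UTF-8");        // encoding set before parsing
        String asLatin1 = decode(raw, "ISO-8859-1"); // encoding set too late, default used

        System.out.println(asUtf8.length());             // prints 1
        System.out.println(asLatin1.length());           // prints 2
        System.out.println(asUtf8.equals("\u05D0"));     // prints true
    }
}
```

A filter or a controller servlet is one way to guarantee the UTF-8 branch, because it runs before any other code can trigger parameter parsing.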
Re: utf-8 with tomcat 5: second round
sorry, no struts are involved.

----- Original Message -----
From: "M.Hockings" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, July 05, 2004 7:04 PM
Subject: Re: utf-8 with tomcat 5: second round

> Hi Asher,
>
> It looks like you are using Struts? If so then setting the encoding in
> the response is too late as the Struts runtime has already set it.
>
> Look into using a filter (that is what I do) for your webapp, I expect
> that should solve your problem.
Re: utf-8 with tomcat 5: second round
Hi Asher,

It looks like you are using Struts? If so, then setting the encoding in the response is too late, as the Struts runtime has already set it.

Look into using a filter (that is what I do) for your webapp; I expect that should solve your problem.

You can Google about for more on UTF-8 and Struts.

http://www.anassina.com/struts/i18n/i18n.html

Good luck

Mike

Asher Tarnopolski wrote:

> hey mark, thanks for response.
> i run the code i pasted below.
Re: utf-8 with tomcat 5: second round
hey mark, thanks for response.
i run the code i pasted below.
for example, i enter one hebrew letter. its utf code is 1488.
on tc 4.0.xx i get the following results:

7 (the length of its utf-8 code)
א (the letter itself in utf-8 encoding)
&#1488; (same as above parsed to be visible in browser)

in tc 5 i get this:

1 (which already lets me know that this is not really utf-8)
the entered hebrew letter
the entered hebrew letter (nothing is parsed, so '&' sign wasn't even met)

this is it.

----- Original Message -----
From: "Mark Thomas" <[EMAIL PROTECTED]>
To: "'Tomcat Users List'" <[EMAIL PROTECTED]>; "'Asher Tarnopolski'" <[EMAIL PROTECTED]>
Sent: Sunday, July 04, 2004 8:46 PM
Subject: RE: utf-8 with tomcat 5: second round

> Asher,
>
> A few questions...
>
> What do you put in the text box on the form and what output do you see?
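The two results reported above can be compared with a small Java sketch. It assumes, based only on the numbers in the message (code 1488, length 7), that the 7-character value seen on Tomcat 4 was the HTML decimal numeric character reference for the letter rather than the letter itself. That reading is an interpretation, not something stated in the thread, and the class and method names are hypothetical.

```java
public class NumericReferenceDemo {
    /** Builds the HTML decimal numeric character reference for one char. */
    public static String toNumericReference(char c) {
        return "&#" + (int) c + ";";
    }

    /** Replaces every '&' with "&amp;" so a browser shows the text literally. */
    public static String escapeAmpersands(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') {
                sb.append("&amp;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char alef = '\u05D0'; // decimal code point 1488

        String reference = toNumericReference(alef);
        System.out.println(reference);                    // prints &#1488;
        System.out.println(reference.length());           // prints 7
        System.out.println(escapeAmpersands(reference));  // prints &amp;#1488;
        System.out.println(String.valueOf(alef).length()); // prints 1
    }
}
```

Under that assumption, the length 7 would come from seven ASCII characters standing in for the letter, while the length 1 would be the letter itself arriving as a single character.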
RE: utf-8 with tomcat 5: second round
Asher,

A few questions...

What do you put in the text box on the form and what output do you see?

Are you really using " method=post>" or do you mean ?

When I did my test I copied your UTF-8 character from the bugzilla report and pasted it into the text box. I was seeing question marks in the output until I added the <%@ page pageEncoding="UTF-8"%>. The test was on XP (as per the bug report) and I assume you used IE as the browser.

The URI encoding is a red herring in this case. Because you are using post it is only the request encoding that matters.

The full text of my test JSP is below.

Mark

<%@ page language="java" import="java.lang.*,java.util.*" %>
<%@ page pageEncoding="UTF-8" %>

<%
request.setCharacterEncoding("UTF-8");

if(request.getParameter("source")!=null)
{
  out.println(request.getParameter("source").length()+"");

  out.println(request.getParameter("source"));

  StringBuffer sb = new StringBuffer();
  for(int i=0; i<request.getParameter("source").length(); i++)
  {
    if(request.getParameter("source").charAt(i) == '&')
      sb.append("&amp;");
    else
      sb.append(request.getParameter("source").charAt(i));
  }
  out.println(""+ sb.toString());
}
%>

> -----Original Message-----
> From: Asher Tarnopolski [mailto:[EMAIL PROTECTED]
> Sent: Sunday, July 04, 2004 6:25 PM
> To: [EMAIL PROTECTED]
> Subject: utf-8 with tomcat 5: second round
>
> hi folks,
> i've published a question about it a couple of days ago, but
> didn't get any responses.
> i've tried some things i found in bugzilla, but they didn't
> help. so, i wanna try to get your help once more.
> once more about my problem:
> i try to send utf-8 encoded parameters in POST body, but they
> arrived encoded in ISO...
> this worked perfectly with tomcat 4.0.x.
> from the info i've got from a developer at bugzilla i learned
> that the difference between tc4.0 and tc5
> that causes the change is actually in coyote http1.1
> connector. there is an attribute
> called useBodyEncodingForURI which was set to "true" in tc4,
> but became "false" in tc5.
> setting it to "true" together with <%@ page
> pageEncoding="UTF-8" %> and
> <%request.setCharacterEncoding("UTF-8");%> will make the difference.
> i made the change, the jsp tags are in the code and coyote
> settings look like this now:
>
>    maxThreads="150" minSpareThreads="25"
>    maxSpareThreads="75"
>    enableLookups="false" redirectPort="8443"
>    acceptCount="100"
>    debug="0" connectionTimeout="2"
>    useBodyEncodingForURI="true"
>    disableUploadTimeout="true" />
>
> but this doesn't help! another request to bugzilla didn't
> help either, i was told that this is not a bug in tomcat,
> so they are not going to deal with the question. well, may be
> it's not a tomcat bug, but it should be some kind of bug.
> any ideas?
>
> my testing code comes here:
>
> <%@ page contentType="text/html; charset=utf-8"%>
> <%@ page pageEncoding="utf-8"%>
>
> <%
> request.setCharacterEncoding("UTF-8");
>
> if(request.getParameter("source")!=null)
> {
>   out.println(request.getParameter("source").length()+"");
>
>   out.println(request.getParameter("source"));
>
>   StringBuffer sb = new StringBuffer();
>   for(int i=0; i<request.getParameter("source").length(); i++)
>   {
>     if(request.getParameter("source").charAt(i) == '&')
>       sb.append("&amp;");
>     else
>       sb.append(request.getParameter("source").charAt(i));
>   }
>   out.println(""+ sb.toString());
> }
> %>