Re: character encoding of a HttpServletRequest
2010/1/10 Jos Snellings <jos.snelli...@pandora.be>

> This is not a specific Cocoon issue, I believe. It probably has to do with Tomcat 5.5.27. request.setCharacterEncoding simply does not work; it does not change a thing. request.getCharacterEncoding returns nothing.

You have to call request.setCharacterEncoding() really early for it to have any impact. Your best bet is to look at Spring's CharacterEncodingFilter. You can add that to your web.xml to get the character set defined very early on.

-Dom
Re: character encoding of a HttpServletRequest
Thanks, I will try CharacterEncodingFilter! I will look up where in the code the filtering takes place, because the real problem is that the form data seem to be filtered twice.

In addition, do I remember right that there used to be a Cocoon servlet setting, <init-param><param-name>form-encoding</param-name><param-value>UTF-8</param-value></init-param>?

Cheers, thanks for the hint. I will post the result... I will certainly not be the only person who is confronted with this problem.

Jos

- To unsubscribe, e-mail: users-unsubscr...@cocoon.apache.org For additional commands, e-mail: users-h...@cocoon.apache.org
Re: character encoding of a HttpServletRequest
On Mon, Jan 11, 2010 at 9:12 AM, Jos Snellings <jos.snelli...@pandora.be> wrote:

> Thanks, I will try CharacterEncodingFilter! [...] In addition, do I remember right that there used to be a Cocoon servlet setting, <init-param><param-name>form-encoding</param-name><param-value>UTF-8</param-value></init-param>?

There are so many places to set the encoding. And just for fun, you can have different encodings for query string parameters and for form data in the body of the same request. Sigh.

I've had good luck with CharacterEncodingFilter (http://static.springsource.org/spring/docs/2.5.6/api/org/springframework/web/filter/CharacterEncodingFilter.html), though.

-Dom
Re: character encoding of a HttpServletRequest
Jos Snellings wrote:
> Hi, HttpServletRequest looks 'imperfect': Cocoon 3, alpha 2. [...] The pages, including forms, are encoded in UTF-8. [...] if the string on the form was one character, say 'é', the string has a length of 4 bytes. [...]
> Now: new String(request.getParameter(text).getBytes("ISO-8859-1")) works fine. Where should this be corrected?

Jos, in Cocoon 3 there isn't any code that changes the encoding of request parameters. The plain HttpServletRequest provided by the servlet container is used. IIRC Tomcat uses ISO-8859-1 by default, which follows the Servlet API spec:

~~~
SRV.4.9 Request data encoding

Currently, many browsers do not send a char encoding qualifier with the Content-Type header, leaving open the determination of the character encoding for reading HTTP requests. The default encoding of a request the container uses to create the request reader and parse POST data must be “ISO-8859-1” if none has been specified by the client request. However, in order to indicate to the developer in this case the failure of the client to send a character encoding, the container returns null from the getCharacterEncoding method. If the client hasn’t set character encoding and the request data is encoded with a different encoding than the default as described above, breakage can occur. To remedy this situation, a new method setCharacterEncoding(String enc) has been added to the ServletRequest interface. Developers can override the character encoding supplied by the container by calling this method. It must be called prior to parsing any post data or reading any input from the request. Calling this method once data has been read will not affect the encoding.
~~~

So, as others have suggested, the best option is to use one of the CharacterEncoding servlet filters rather than to remedy this situation somewhere in C3.

-- 
Reinhard Pötz
Managing Director, {Indoqa} GmbH
http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member
reinh...@apache.org
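The SRV.4.9 rule quoted above can be made concrete with a small model. It assumes a hypothetical class, ToyFormRequest, which is not a real container class. The model parses form parameters once, on the first getParameter() call, using whatever encoding is set at that moment and falling back to ISO-8859-1. A later setCharacterEncoding() call is recorded but cannot re-decode parameters that were already parsed. This is why calling it from inside a generator is too late. The sketch uses java.net.URLDecoder.decode(String, Charset), which requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model of SRV.4.9 (not a servlet container class).
 * Parameters are parsed once, on first access, with the encoding set at that
 * moment; a later setCharacterEncoding() does not re-decode them.
 */
final class ToyFormRequest {
    private final String rawBody;        // application/x-www-form-urlencoded body
    private Charset encoding;            // null = no charset declared or set
    private Map<String, String> params;  // null until the body has been parsed

    ToyFormRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    String getCharacterEncoding() {
        return encoding == null ? null : encoding.name();
    }

    void setCharacterEncoding(String enc) {
        // Stored either way, but only influences parsing if it has not happened yet.
        encoding = Charset.forName(enc);
    }

    String getParameter(String name) {
        if (params == null) {
            // Spec default when no encoding was declared or set before parsing.
            Charset cs = encoding != null ? encoding : StandardCharsets.ISO_8859_1;
            params = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(key, cs), URLDecoder.decode(value, cs));
            }
        }
        return params.get(name);
    }

    public static void main(String[] args) {
        ToyFormRequest early = new ToyFormRequest("tekst=%C3%A9");
        early.setCharacterEncoding("UTF-8");   // before parsing: takes effect
        System.out.println(early.getParameter("tekst").length()); // 1

        ToyFormRequest late = new ToyFormRequest("tekst=%C3%A9");
        late.getParameter("tekst");            // parsing happens here, as ISO-8859-1
        late.setCharacterEncoding("UTF-8");    // too late for the parsed values
        System.out.println(late.getParameter("tekst").length());  // 2
    }
}
```

A servlet filter runs before any servlet or Cocoon code touches the request. That is why the filter approach sidesteps the timing problem that the "late" case shows.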
Re: character encoding of a HttpServletRequest
This is to let you know that the solution you suggested works fine.

So, for all Cocoon users: if you are experiencing problems with the character encoding of POST form data (which is very likely to occur), the problem is generally cured by inserting the following into web.xml:

  <filter>
    <filter-name>encodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
      <param-name>forceEncoding</param-name>
      <param-value>true</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>encodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

(Insert these as the first children of the <web-app> root element.)

Jos
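The forceEncoding parameter in the web.xml configuration controls whether the configured encoding overrides one the client already declared. The following is a minimal model of the request-side rule as the CharacterEncodingFilter Javadoc describes it. EncodingFilterModel and effectiveRequestEncoding are hypothetical names, not Spring source code. With forceEncoding false, UTF-8 is applied only when the browser declared no charset, which is the common case. With forceEncoding true, as configured above, UTF-8 always wins.

```java
/**
 * Hypothetical helper, NOT Spring source code: models the request-side rule
 * described in the CharacterEncodingFilter Javadoc. The configured encoding is
 * applied when forceEncoding is true, or when the client declared none.
 */
final class EncodingFilterModel {
    private EncodingFilterModel() {}

    /** Encoding the container will use to decode parameters after the filter ran. */
    static String effectiveRequestEncoding(String clientDeclared, String configured,
                                           boolean forceEncoding) {
        if (configured != null && (forceEncoding || clientDeclared == null)) {
            return configured; // the filter would call request.setCharacterEncoding(configured)
        }
        // Filter leaves the request alone; with no declared charset the
        // container falls back to the Servlet spec default.
        return clientDeclared != null ? clientDeclared : "ISO-8859-1";
    }

    public static void main(String[] args) {
        System.out.println(effectiveRequestEncoding(null, "UTF-8", false));         // UTF-8
        System.out.println(effectiveRequestEncoding("ISO-8859-1", "UTF-8", false)); // ISO-8859-1
        System.out.println(effectiveRequestEncoding("ISO-8859-1", "UTF-8", true));  // UTF-8
    }
}
```

According to the same Javadoc, forceEncoding=true also applies the encoding as the default response encoding. That side is not modelled here.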
Re: character encoding of a HttpServletRequest
That is right! It is just a confusing situation :-( The filter works fine. The init() method of a generator does not give you a chance to call setCharacterEncoding, as the parsing has already happened. The good thing is that the code is already in Spring, so there are no new external dependencies. Maybe later on I'll add a tryToGuessEncodingFilter.

Jos
Re: character encoding of a HttpServletRequest
On Mon, Jan 11, 2010 at 10:34 AM, Jos Snellings <jos.snelli...@pandora.be> wrote:

> [...] Maybe later on I'll add a tryToGuessEncodingFilter.

Trying to guess encodings isn't a good idea, in general. About the only one that can be reliably detected is UTF-8. In past projects, I've done something like this:

  String result;
  try {
      result = new String(someBytes, "UTF-8");
  } catch (EncodingError e) {
      result = new String(someBytes, "Windows-1252");
  }

In my experience, Windows-1252 was a better guess than ISO-8859-1, as users tend to paste in text from Word documents with curly quotes.

-Dom
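Dom's snippet is pseudocode. EncodingError is not a Java class, and new String(bytes, "UTF-8") never fails on malformed input, because it silently substitutes U+FFFD. The fallback would therefore never run. The following sketch makes the "UTF-8, else Windows-1252" idea work by decoding with a java.nio CharsetDecoder set to CodingErrorAction.REPORT, which throws CharacterCodingException on invalid UTF-8. The class name Utf8OrCp1252 is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: strict UTF-8 first, Windows-1252 as the fallback guess. */
final class Utf8OrCp1252 {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    private Utf8OrCp1252() {}

    static String decode(byte[] bytes) {
        // Fresh decoder per call: CharsetDecoder is stateful and not thread-safe.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume Windows-1252 (e.g. curly quotes pasted from Word).
            return new String(bytes, WINDOWS_1252);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};             // e-acute encoded as UTF-8
        byte[] cp1252 = {(byte) 0x93, 'h', 'i', (byte) 0x94}; // curly-quoted "hi" in Windows-1252
        System.out.println(decode(utf8).equals("\u00E9"));         // true
        System.out.println(decode(cp1252).equals("\u201Chi\u201D")); // true
    }
}
```

Plain ASCII input is valid UTF-8, so it never reaches the fallback. Dom's caveat still applies: this is a heuristic, and a declared or forced encoding is more reliable.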
Re: character encoding of a HttpServletRequest
This is not a specific Cocoon issue, I believe. It probably has to do with Tomcat 5.5.27. request.setCharacterEncoding simply does not work; it does not change a thing. request.getCharacterEncoding returns nothing.

Best,
Jos
character encoding of a HttpServletRequest
Hi,

HttpServletRequest looks 'imperfect' (Cocoon 3, alpha 2). A generator accesses the HttpServletRequest in the setup method:

  request = HttpContextHelper.getRequest(parameters);
  text = request.getParameter(tekst);

The pages, including forms, are encoded in UTF-8. The String 'text' is strange: the original content (UTF-8) is encoded once again. If the string on the form was one character, say 'é', the string has a length of 4 bytes: it is the result of UTF-8 encoding, once more, the two bytes coming from the client. So a second conversion is happening. For now,

  new String(request.getParameter(text).getBytes("ISO-8859-1"))

works fine. Where should this be corrected?

Cheers,
Jos
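The "4 bytes" symptom and the workaround can be reproduced in plain Java. This sketch assumes the browser sends UTF-8 and the container decodes the body as ISO-8859-1, the spec default when no charset is declared. The class and method names are hypothetical. Jos's workaround calls new String(bytes) without a charset and so depends on the platform default. The repair() method below names UTF-8 explicitly instead.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of UTF-8 bytes being decoded as ISO-8859-1, and the repair. */
final class DoubleDecodingDemo {
    private DoubleDecodingDemo() {}

    /** Simulates the container: UTF-8 bytes from the browser decoded as ISO-8859-1. */
    static String decodeAsContainerDefault(String typedByUser) {
        byte[] sentByBrowser = typedByUser.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    /** The workaround with explicit charsets: recover the raw bytes, decode as UTF-8. */
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = decodeAsContainerDefault("\u00E9");
        System.out.println(garbled.length());                                 // 2 chars: U+00C3 U+00A9
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length);  // 4 bytes when re-encoded
        System.out.println(repair(garbled).equals("\u00E9"));                 // true
    }
}
```

The repair works only because ISO-8859-1 maps every byte to exactly one char and back. It treats the symptom; setting the request encoding before parsing avoids the damage in the first place.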