Re: character encoding of a HttpServletRequest

2010-01-11 Thread Dominic Mitchell
2010/1/10 Jos Snellings jos.snelli...@pandora.be

 This is not a specific cocoon issue, I believe. It probably has to do
 with Tomcat 5.5.27.
 request.setCharacterEncoding simply does not work; it does not change a
 thing.
 request.getCharacterEncoding returns nothing.


You have to call request.setCharacterEncoding() really early, before anything
reads a request parameter or the body, for it to have any impact.  Your best
bet is to look at Spring's CharacterEncodingFilter. You can add that to your
web.xml to get the character set defined very early on.
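To make "really early" concrete: a container decodes the body once, the first time anything asks for a parameter, and later calls to setCharacterEncoding() are ignored (SRV.4.9 of the Servlet spec requires this). The class below is a hypothetical, JDK-only stand-in for a request. It is not Tomcat's implementation or the real HttpServletRequest API; it only reproduces that ordering rule.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical stand-in for a servlet request (NOT the real HttpServletRequest):
 * the body is decoded only once, on the first parameter lookup, using whatever
 * encoding is set at that moment.
 */
class LazyRequest {
    private final byte[] body;                               // raw value bytes as sent by the browser
    private Charset encoding = StandardCharsets.ISO_8859_1;  // container default when none is set
    private String parsedValue;                              // null until the body has been read

    LazyRequest(byte[] body) {
        this.body = body;
    }

    /** Mirrors the spec rule: changing the encoding after parsing has no effect. */
    void setCharacterEncoding(String enc) {
        if (parsedValue == null) {
            encoding = Charset.forName(enc);
        }
    }

    String getParameter() {
        if (parsedValue == null) {
            parsedValue = new String(body, encoding);
        }
        return parsedValue;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // "é" sent from a UTF-8 page

        LazyRequest early = new LazyRequest(utf8);
        early.setCharacterEncoding("UTF-8");   // set before reading: takes effect
        System.out.println(early.getParameter().equals("\u00e9"));

        LazyRequest late = new LazyRequest(utf8);
        late.getParameter();                   // something touched the parameters first
        late.setCharacterEncoding("UTF-8");    // too late: silently ignored
        System.out.println(late.getParameter().equals("\u00c3\u00a9"));
    }
}
```

This is why a filter mapped ahead of everything else works, while a call made inside application code often comes too late.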

-Dom


Re: character encoding of a HttpServletRequest

2010-01-11 Thread Jos Snellings
Thanks, I will try CharacterEncodingFilter!
I will look up where in the code the filtering takes place, because the
problem is rather that it looks like the form data are filtered twice. 

In addition, do I remember correctly that there used to be a Cocoon servlet
setting like this?
<init-param>
  <param-name>form-encoding</param-name>
  <param-value>UTF-8</param-value>
</init-param>

Cheers, thanks for the hint. I will post the result... I will certainly
not be the only person who is confronted with this problem.

Jos





-
To unsubscribe, e-mail: users-unsubscr...@cocoon.apache.org
For additional commands, e-mail: users-h...@cocoon.apache.org



Re: character encoding of a HttpServletRequest

2010-01-11 Thread Dominic Mitchell
On Mon, Jan 11, 2010 at 9:12 AM, Jos Snellings jos.snelli...@pandora.be wrote:

 Thanks, I will try CharacterEncodingFilter!
 I will look up where in the code the filtering takes place, because the
 problem is rather that it looks like the form data are filtered twice.

 In addition, do I remember correctly that there used to be a Cocoon servlet
 setting like this?
 <init-param>
   <param-name>form-encoding</param-name>
   <param-value>UTF-8</param-value>
 </init-param>

 Cheers, thanks for the hint. I will post the result... I will certainly
 not be the only person who is confronted with this problem.


There are so many places to set the encoding.  And just for fun, you can
have different encodings for query string parameters and form-data in the
body in the same request.  Sigh.

I've had good luck with CharacterEncodingFilter
(http://static.springsource.org/spring/docs/2.5.6/api/org/springframework/web/filter/CharacterEncodingFilter.html) though.

-Dom


Re: character encoding of a HttpServletRequest

2010-01-11 Thread Reinhard Pötz

Jos Snellings wrote:
 Hi,
 
 HttpServletRequest looks 'imperfect':
 Cocoon 3, alpha 2.
 A generator accesses the HttpServletRequest in the setup method:
 
 request = HttpContextHelper.getRequest(parameters);
 text = request.getParameter("tekst");
 
 The pages, including the forms, are encoded in UTF-8.
 The String 'text' is strange: the original content (utf-8) is encoded
 once again:
 if the string on the form was one character, say 'é', the string has a
 length of 4 bytes. It is the result of utf-8 encoding the two byte
 character coming from the client. So, a second conversion is happening.
 
 Now:
 new String(request.getParameter("text").getBytes("ISO-8859-1"), "UTF-8") works
 fine.
 
 Where should this be corrected?

Jos,

in Cocoon 3 there isn't any code that changes the encoding of request
parameters. The plain HttpServletRequest as provided by the servlet
container is used.

IIRC Tomcat uses ISO-8859-1 by default which follows the recommendation
of the Servlet API spec:

~~~
SRV.4.9 Request data encoding
Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests. The default encoding of a request
the container uses to create the request reader and parse POST data must
be “ISO-8859-1” if none has been specified by the client request.
However, in order to indicate to the developer in this case the failure
of the client to send a character encoding, the container returns null
from the getCharacterEncoding method.
If the client hasn’t set character encoding and the request data is
encoded with a different encoding than the default as described above,
breakage can occur. To remedy this situation, a new method
setCharacterEncoding(String enc) has been added to the ServletRequest
interface. Developers can override the character encoding supplied by
the container by calling this method. It must be called prior to parsing
any post data or reading any input from the request. Calling
this method once data has been read will not affect the encoding.
~~~

So, as others have suggested, the best option is to use one of the
character-encoding servlet filters rather than to remedy this situation
somewhere in C3.
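To see why the ISO-8859-1 default matters: a browser on a UTF-8 page sends a form field containing 'é' in an application/x-www-form-urlencoded POST body as %C3%A9. The container turns the escapes into bytes and then decodes those bytes with the request encoding. java.net.URLDecoder performs the same two steps, so it serves as a convenient stand-in here. This is an illustration, not Tomcat's actual parameter parser, and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodingDemo {
    // Value of a form field containing "é", as a browser on a UTF-8 page
    // sends it in an application/x-www-form-urlencoded POST body
    static final String RAW_VALUE = "%C3%A9";

    // URLDecoder turns the %XX escapes into bytes, then decodes those bytes
    // with the given charset, much like a container does with a form body
    static String decode(String raw, String requestEncoding) {
        try {
            return URLDecoder.decode(raw, requestEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + requestEncoding, e);
        }
    }

    public static void main(String[] args) {
        String asDefault = decode(RAW_VALUE, "ISO-8859-1"); // nothing set: the spec default
        String asUtf8 = decode(RAW_VALUE, "UTF-8");          // after setCharacterEncoding("UTF-8")
        System.out.println(asDefault.length()); // 2 (U+00C3 followed by U+00A9)
        System.out.println(asUtf8.length());    // 1 (U+00E9)
    }
}
```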

-- 
Reinhard Pötz   Managing Director, {Indoqa} GmbH
 http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member  reinh...@apache.org





Re: character encoding of a HttpServletRequest

2010-01-11 Thread Jos Snellings
This is to let you know that the solution you suggested works fine.
So, for all Cocoon users: if you are experiencing problems with the
character encoding of POST form data (which is very likely to occur),
the problem is generally cured by inserting the following into web.xml:

<filter>
  <filter-name>encodingFilter</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>

<filter-mapping>
  <filter-name>encodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

(Insert these as the first child elements of the web-app root element.)
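As far as I know, Spring 2.5's CharacterEncodingFilter only applies its configured encoding when the request carries none, unless forceEncoding is true; with forceEncoding it overrides any charset the browser declared and also sets the response encoding. The helper below models just that decision in plain Java so the effect of the forceEncoding parameter is easy to see. The class and method names are made up for illustration; this is not Spring's source.

```java
class EncodingDecision {
    /**
     * Hypothetical helper: the encoding the request ends up with, given what the
     * browser declared (null if nothing) and the filter's configuration.
     */
    static String requestEncoding(String declared, String configured, boolean forceEncoding) {
        if (configured != null && (forceEncoding || declared == null)) {
            return configured;   // the filter would call request.setCharacterEncoding(configured)
        }
        return declared;         // left alone; null means the container falls back to ISO-8859-1
    }

    /** With forceEncoding the filter also sets the response encoding; otherwise it leaves it. */
    static boolean setsResponseEncoding(String configured, boolean forceEncoding) {
        return configured != null && forceEncoding;
    }

    public static void main(String[] args) {
        // Typical browser POST: no charset in the Content-Type header
        System.out.println(requestEncoding(null, "UTF-8", false));          // UTF-8
        // The browser did declare a charset: only forceEncoding overrides it
        System.out.println(requestEncoding("ISO-8859-1", "UTF-8", false));  // ISO-8859-1
        System.out.println(requestEncoding("ISO-8859-1", "UTF-8", true));   // UTF-8
    }
}
```

Since most browsers send no charset with form posts, forceEncoding mainly matters when a client does declare one, and for the response side.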

Jos





Re: character encoding of a HttpServletRequest

2010-01-11 Thread Jos Snellings
That is right!
It is just a confusing situation :-(
The filter works fine. The init() method of a generator does not give a
chance to call setCharacterEncoding, as the parsing already happened.
The good thing is that the code is already in Spring, so there are no new
external dependencies. Maybe later on I will add a
tryToGuessEncodingFilter.

Jos




Re: character encoding of a HttpServletRequest

2010-01-11 Thread Dominic Mitchell
On Mon, Jan 11, 2010 at 10:34 AM, Jos Snellings jos.snelli...@pandora.be wrote:

 That is right!
 It is just a confusing situation :-(
 The filter works fine. The init() method of a generator does not give a
 chance to call setCharacterEncoding, as the parsing already happened.
 The good thing is that the code is already in spring, so, no new
 external dependencies. Maybe later on I add a
 tryToGuessEncodingFilter.


Trying to guess encodings isn't a good idea, in general.  About the only one
that can be reliably detected is UTF-8.  In past projects, I've done
something like this:

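  import java.nio.ByteBuffer;
  import java.nio.charset.CharacterCodingException;
  import java.nio.charset.Charset;
  import java.nio.charset.StandardCharsets;

  String result;
  try {
    // a strict decoder throws on invalid UTF-8; new String(bytes, "UTF-8") never would
    result = StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(someBytes)).toString();
  } catch (CharacterCodingException e) {
    result = new String(someBytes, Charset.forName("windows-1252"));
  }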

In my experience, Windows-1252 was a better guess than ISO-8859-1, as users
tend to paste in stuff from Word documents with curly quotes.
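A small illustration of the curly-quote point, assuming the text was typed in Word and arrived as Windows-1252 bytes (0x93 and 0x94 are the left and right double quotation marks in that code page). Those bytes are not valid UTF-8, so a strict UTF-8 decoder rejects them and the fallback in the snippet above kicks in; decoded as ISO-8859-1 they would become invisible C1 control characters instead of quotes. The class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CurlyQuotes {
    // "hi" wrapped in curly double quotes, as Word would save it in Windows-1252
    static final byte[] PASTED = {(byte) 0x93, 'h', 'i', (byte) 0x94};

    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes malformed input throw instead of being replaced with U+FFFD
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String cp1252 = new String(PASTED, Charset.forName("windows-1252"));
        String latin1 = new String(PASTED, StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(PASTED));               // false: 0x93 is a stray continuation byte
        System.out.println(cp1252.equals("\u201Chi\u201D"));  // true: real curly quotes
        System.out.println(latin1.equals("\u0093hi\u0094"));  // true: C1 control characters
    }
}
```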

-Dom


Re: character encoding of a HttpServletRequest

2010-01-10 Thread Jos Snellings
This is not a specific cocoon issue, I believe. It probably has to do
with Tomcat 5.5.27.
request.setCharacterEncoding simply does not work; it does not change a
thing.
request.getCharacterEncoding returns nothing.

Best,
Jos





character encoding of a HttpServletRequest

2010-01-08 Thread Jos Snellings
Hi,

HttpServletRequest looks 'imperfect':
Cocoon 3, alpha 2.
A generator accesses the HttpServletRequest in the setup method:

request = HttpContextHelper.getRequest(parameters);
text = request.getParameter("tekst");

The pages, including the forms, are encoded in UTF-8.
The String 'text' is strange: the original content (utf-8) is encoded
once again:
if the string on the form was one character, say 'é', the string has a
length of 4 bytes. It is the result of utf-8 encoding the two byte
character coming from the client. So, a second conversion is happening.

Now:
new String(request.getParameter("text").getBytes("ISO-8859-1"), "UTF-8") works
fine.
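The symptom can be reproduced with plain JDK classes, assuming the container falls back to ISO-8859-1: the browser sends 'é' as the two UTF-8 bytes C3 A9, each byte is decoded as one ISO-8859-1 character (U+00C3 U+00A9), and that two-character string takes 4 bytes when encoded as UTF-8 again. The helper names below are hypothetical. The getBytes("ISO-8859-1") workaround succeeds because ISO-8859-1 maps every byte value to exactly one character, so the original bytes come back unchanged.

```java
import java.nio.charset.StandardCharsets;

class DoubleDecodingDemo {
    /** What happens when no encoding is set: UTF-8 bytes decoded as ISO-8859-1. */
    static String decodedByContainer(String typedByUser) {
        byte[] sent = typedByUser.getBytes(StandardCharsets.UTF_8);   // browser on a UTF-8 page
        return new String(sent, StandardCharsets.ISO_8859_1);         // container default
    }

    /** The workaround from the message: recover the original bytes, then decode them as UTF-8. */
    static String repaired(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u00e9";                                   // e with acute accent
        String garbled = decodedByContainer(typed);
        System.out.println(garbled.length());                      // 2 characters
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length); // 4 bytes
        System.out.println(repaired(garbled).equals(typed));       // true
    }
}
```

Repairing after the fact works, but every parameter read would need it; setting the encoding before the body is parsed avoids the damage in the first place.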

Where should this be corrected?

Cheers,
Jos

