Re: jasper: weird behaviour

2001-08-08 Thread Jeff Kilbride

You might be interested in this post from Tomcat-dev.

--jeff

---
Hi All!

Support for different encodings in Servlet/JSP is an ancient, well-known problem.
The setCharacterEncoding() method of HttpServletRequest allows the request
encoding to be changed before parameters are read, so a servlet can set the
encoding according to its needs. (Small lyrical digression: what does this
encoding mean? I'll post my thoughts about it separately.)
However, the problem still exists in JSP (there have been several postings about
it on this mailing list). The purpose of this mail is to propose a solution for
encoding support in JSP.

Problem description
===
A JSP programmer is not able to change the request encoding for an incoming JSP
request, since "This method [setCharacterEncoding] must be called prior to
parsing any post data or reading any input from the request. Calling this method
once data has been read will not affect the encoding." (Servlet 2.3 Spec). This
happens because request parameters are read inside
org.apache.jasper.servlet.JspServlet, before the generated JSP servlet is called.
As a result, compiled JSPs behave as follows in non-English environments:
1) the incoming request is read using 'ISO-8859-1'
2) getParameter() returns a value decoded as 'ISO-8859-1', but the JSP servlet
   assumes the return value is in the JVM default encoding (say "KOI8-R"), so
   '???' appears instead of the real parameter value. That is the problem.
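
To make the two steps above concrete, here is a minimal, self-contained sketch
(not Tomcat code). It assumes the client sent the parameter as KOI8-R bytes and
that the container decoded them as ISO-8859-1. The class and method names are
made up for illustration. It also shows the usual workaround of re-decoding
the value, which works because ISO-8859-1 maps every byte to exactly one
character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RequestDecodingDemo {
    // Assumption: the client encoded the parameter as KOI8-R.
    static final Charset CLIENT = Charset.forName("KOI8-R");

    /** What effectively happens when the raw bytes are decoded as ISO-8859-1. */
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Workaround: recover the raw bytes from the Latin-1 string, then decode them properly. */
    static String redecode(String misdecoded, Charset actual) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        // Russian for "hello", written with escapes to keep this source ASCII-only.
        String original = "\u043f\u0440\u0438\u0432\u0435\u0442";
        byte[] wire = original.getBytes(CLIENT);

        String seenByJsp = decodeAsLatin1(wire);
        System.out.println(original.equals(seenByJsp));                   // false
        System.out.println(original.equals(redecode(seenByJsp, CLIENT))); // true
    }
}
```

Re-decoding in every JSP is only a workaround; the proposal below avoids it by
setting the encoding before any parameter is read.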

Problem solution
===

There should be a configurable, optional parameter for JspServlet (say
'requestEncoding') to change the request encoding. When this parameter is set,
JspServlet should call setCharacterEncoding() with its value before processing
the request. This does not conflict with the JSP 1.2 Spec, since the spec says
nothing about the default encoding of an incoming request.
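
The intended control flow can be sketched in isolation as follows. This is
only an illustration under stated assumptions: EncodableRequest is a
hypothetical stand-in for the single HttpServletRequest method involved, so
the snippet compiles without the Servlet API, and applyRequestEncoding is a
made-up helper name, not part of Jasper.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical stand-in for HttpServletRequest; only the one method used here.
interface EncodableRequest {
    void setCharacterEncoding(String encoding) throws UnsupportedEncodingException;
}

class RequestEncodingSketch {
    /**
     * Applies the configured 'requestEncoding' value, if any, before the
     * request is processed. Returns true if an encoding was applied.
     */
    static boolean applyRequestEncoding(EncodableRequest request, String requestEncoding)
            throws UnsupportedEncodingException {
        if (requestEncoding == null) {
            return false; // parameter not configured: keep the container default
        }
        // Must run before any parameter is read, per the Servlet 2.3 text quoted above.
        request.setCharacterEncoding(requestEncoding);
        return true;
    }
}
```

Leaving the parameter optional means existing deployments that do not
configure it keep their current behaviour.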

I have made the necessary changes to implement this feature in
tomcat-4.0-20010807. It works fine with different Cyrillic encodings. (I expect
the same result for the other non-Latin1 encodings.)
I clearly understand that the proposed solution is not a panacea and that it is
open to discussion.


Regards,
Andrey Aristarkhov


Diffs follow (also sent as attachments). I have also attached a sample JSP
for encoding testing.


file: org/apache/jasper/EmbededServletOptions.java

147a148,152
>  * Java platform encoding for incoming request.
>  */
> private String requestEncoding;
>
> /**
219a225,228
> public String getRequestEncoding() {
> return requestEncoding;
> }
>
320a330
> this.requestEncoding = config.getInitParameter("requestEncoding");

file: org/apache/jasper/Options.java

144a145,149
>
> /**
>  * Java platform encoding for incoming request.
>  */
> public String getRequestEncoding();

file: org/apache/jasper/servlet/JspServlet.java

422c422,426
< String includeUri
---
> // According to section 4.9 of Servlet 2.3 spec we have to
> // setCharacterEncoding() before reading any parameter
> if (options.getRequestEncoding()!=null)
>   request.setCharacterEncoding(options.getRequestEncoding());
> String includeUri






jasper: weird behaviour

2001-08-08 Thread Jacek Prucia


Hello there tomcat users ;)

I'm not sure if this is a bug, so I'm posting a description of an unusual
problem, and hope that if this is not a bug, somebody will prove that I'm
missing something here...

I have a JSP page that has static content (outside <% %> tags) in ISO-8859-2,
and a few static html pages. When it's completely up to Tomcat to generate
some page (for example a redirect, or an internal server error), it
outputs the Content-Type: header correctly (the whole Linux environment is set
to pl_PL), like this:

Content-Type: text/html;charset=ISO-8859-2

When it sends static html, it outputs:

Content-Type: text/html

...but this is corrected by the <meta> tag. However, *EVERY TIME*
tomcat sends back the output of a JSP page, it sends this:

Content-Type: text/html;charset=ISO-8859-1

which is ok (as defined in JSP spec), but there's *NO* way to change it!
I've tried nearly everything, including:

<% response.setHeader("Content-Type", "text/html;charset=ISO-8859-2"); %>

or

<%@ page contentType("text/html;charset=ISO-8859-2"); %>

All those tags make response.setHeader(...) appear at the top of __jspService
(inside the corresponding .java file in $TOMCAT_HOME/work), but then... the
header gets overwritten by tomcat with ISO-8859-1, which scrambles all content
and forces the user to pick ISO-8859-2 from the browser's encoding menu every
time a document is generated, which is really annoying. Browsers seem to ignore
the <meta> tag in favour of the server-generated Content-Type: header.
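
To illustrate why the header wins, here is a rough sketch of how a client
might pick the charset: take the charset parameter from the Content-Type
header if present, and only otherwise fall back to something else (such as a
value declared inside the document). This is an illustration only, not actual
browser or Tomcat code, and the class and method names are made up.

```java
import java.util.Locale;

class ContentTypeCharset {
    /** Returns the charset parameter of a Content-Type value, or fallback if there is none. */
    static String charsetOf(String contentType, String fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return fallback; // header carries no charset: use the caller's fallback
    }

    public static void main(String[] args) {
        // Header with a charset: the fallback is never consulted.
        System.out.println(charsetOf("text/html;charset=ISO-8859-1", "ISO-8859-2"));
        // Header without a charset (the static html case): the fallback applies.
        System.out.println(charsetOf("text/html", "ISO-8859-2"));
    }
}
```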

Thanks to tomcat being open source, I can just play with
share/org/apache/jasper/compiler/Compiler.java and break the spec by setting
the default encoding to ISO-8859-2, but I feel that's not the way...

Is it a bug in jasper, or am I missing something here?

-- 
Jacek Prucia
7bulls.com S.A.