What!  No opinions??

I take it either my original explanation was entirely obscure,
or so obviously true as to be beyond discussion :).

To clarify, here is the problematic language in the JSP spec:
---
2.7.4 Delivering Localized Content
[snip]
The JSP 1.1 specification assumes that JSP pages that will deliver
content in a given character encoding will be written in that
character encoding. In particular, the contentType attribute of the
page directive describes both the character encoding of the JSP page
and the character encoding of the resulting stream.
[snip]
The contentType attribute must only be used when the character
encoding is organized such that ASCII characters stand for themselves,
at least until the contentType attribute is found. The directive
containing the contentType attribute should appear as early as
possible in the JSP page.
---

The above is fairly reasonable when the original page encoding is an
ASCII variant, and the output encoding is an ASCII variant (as it will
nearly always be).

It doesn't work well when the native encoding on a platform is EBCDIC.
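
To illustrate, here is a minimal Java sketch of why the "ASCII characters
stand for themselves" assumption breaks. The class and helper names
(DirectiveProbe, looksLikeAsciiDirective) are made up for this example, and
IBM037 is just one EBCDIC code page chosen for illustration; it ships with
the JDK's extended charsets, so a trimmed-down runtime may not have it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DirectiveProbe {
    // Hypothetical helper: scan the raw bytes for the ASCII byte sequence
    // of "<%@", the way a parser that assumes ASCII would.
    static boolean looksLikeAsciiDirective(byte[] raw) {
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        return asLatin1.contains("<%@");
    }

    public static void main(String[] args) {
        String directive =
            "<%@ page contentType=\"text/html; charset=ISO-8859-1\" %>";

        // Assumption: IBM037 stands in for "the native EBCDIC encoding".
        Charset ebcdic = Charset.forName("IBM037");

        byte[] asciiBytes = directive.getBytes(StandardCharsets.ISO_8859_1);
        byte[] ebcdicBytes = directive.getBytes(ebcdic);

        System.out.println(looksLikeAsciiDirective(asciiBytes));  // prints true
        System.out.println(looksLikeAsciiDirective(ebcdicBytes)); // prints false
    }
}
```

The same directive text is invisible to an ASCII scan once the file is
stored in EBCDIC, so the contentType attribute cannot be used to discover
the encoding of the page itself.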

On an EBCDIC box it is reasonable to accept EBCDIC-encoded *.jsp pages
as input, and generate ASCII (ISO-8859-1) as output.

I don't see any way for the JSP compiler to know the encoding of pages
on read without asking the container (1).

(The same is also true when reading *.html pages, BTW).

Once you have an input encoding that may be different from the output
encoding, you need an intermediate encoding that will not lose any
information (2).
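
A rough Java sketch of the three encodings in play, assuming UTF-8 as the
lossless intermediate. This is not the Jasper code; PageTranscoder and its
methods are hypothetical names, and IBM037 is again only an example EBCDIC
code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageTranscoder {
    // (1) Decode the page using the encoding the container reports.
    static String decodePage(byte[] raw, Charset pageEncoding) {
        return new String(raw, pageEncoding);
    }

    // (2) Write the intermediate form in an encoding that loses nothing.
    static byte[] encodeIntermediate(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // (3) Encode the response in whatever the page author asked for.
    static byte[] encodeResponse(String text, Charset responseEncoding) {
        return text.getBytes(responseEncoding);
    }

    public static void main(String[] args) {
        Charset pageEncoding = Charset.forName("IBM037");   // assumed input
        Charset responseEncoding = StandardCharsets.ISO_8859_1;

        // Simulate a *.jsp file stored in EBCDIC on disk.
        String original = "<html><body>Hello</body></html>";
        byte[] jspOnDisk = original.getBytes(pageEncoding);

        String text = decodePage(jspOnDisk, pageEncoding);
        byte[] intermediate = encodeIntermediate(text);
        String reread = new String(intermediate, StandardCharsets.UTF_8);
        byte[] response = encodeResponse(reread, responseEncoding);

        System.out.println(new String(response, responseEncoding));
    }
}
```

Note that String.getBytes quietly substitutes a replacement for characters
the target encoding cannot represent; a stricter version could use a
CharsetEncoder configured with CodingErrorAction.REPORT instead.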


I have updated the "Jasper" reference implementation to reflect this
(which was necessary to make JSP pages work *at all* on an EBCDIC box).


There is still a missing bit in Tomcat/Jakarta in that the container
needs to supply the encoding for *.jsp/*.html on read.  The current
implementation assumes ASCII in all cases - which in fact will work
just fine in *almost* every case... so there is no urgent need to
fill in this bit of the implementation :).
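
For what it is worth, the missing bit could look something like the
following Java sketch. It is only a guess at the shape: PageEncodingResolver,
resolvePageEncoding, and the idea of the container passing a charset name as
a string are all assumptions, not anything in Tomcat/Jakarta today. The
fallback mirrors the current "assume ASCII" behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageEncodingResolver {
    // Hypothetical: the container supplies a charset name, or null if it
    // has nothing to say about how *.jsp/*.html files are stored.
    static Charset resolvePageEncoding(String containerHint) {
        if (containerHint == null || containerHint.isEmpty()) {
            return StandardCharsets.ISO_8859_1; // current default behavior
        }
        try {
            return Charset.forName(containerHint);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolvePageEncoding(null));
        System.out.println(resolvePageEncoding("UTF-8"));
    }
}
```

Silently falling back on a bad hint is a design choice made here for
brevity; a real container might prefer to report the error.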


-----Original Message-----
From: A mailing list about Java Server Pages specification and reference
[mailto:[EMAIL PROTECTED]]On Behalf Of Preston L. Bannister
Sent: Thursday, December 30, 1999 6:04 PM
To: [EMAIL PROTECTED]
Subject: The contentType attribute to the 'page' directive.


In looking at the character set handling for the Jakarta JSP compiler
it seems to me there are really three character sets that need be known
to the JSP compiler:

1.  The character set of the *.jsp file.  Usually this is ISO-8859-1
    (ASCII) but might reasonably be EBCDIC on EBCDIC systems.
    This really should be chosen by a hint from the container.

2.  The character set of the generated *.java file.
    I would suggest that UTF-8 is the most reasonable choice here.

3.  The character set of the response written by the compiled servlet.
    This should be up to the author of the JSP page, as reflected in
    the current contentType attribute.

Opinions?

--
Preston L. Bannister
http://members.home.com/preston
[EMAIL PROTECTED]

