Mike,
I'd be happy to add some info to the manual, perhaps in the section on
setting up the database connection. Just let me know what it should say.
Maybe a page on the Wiki as well?
I tried to summarize the issues/questions I encountered. Put it in the
best place(s)! Don't hesitate to correct my English... or any other points.
Pierre
Full unicode webapp notice
Java uses Unicode internally, i.e. 16-bit characters (ucs-2, utf-16 since
Java 5), not utf-8. Things get messy as soon as the JVM needs to transfer
data. By default it uses the value of the system property "file.encoding",
most often the iso-8859-1 character set. But... not always. See for example
stderr, stdout and stdin, which use the OS character set. Or the native
API, which does no translation at all.
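To make the point about default character sets concrete, here is a small
sketch (not from Keel; the class and method names are only illustrative)
showing how the same string gives different bytes depending on the
character set, and what happens when utf-8 bytes are read back as
iso-8859-1. Naming the character set explicitly avoids depending on
"file.encoding".

```java
import java.nio.charset.Charset;

final class CharsetDemo {
    // Number of bytes produced when the text is encoded with the named charset.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    // Simulates a transfer gone wrong: bytes written as utf-8, read as iso-8859-1.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(Charset.forName("UTF-8"));
        return new String(utf8, Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("utf-8 bytes:      " + encodedLength(text, "UTF-8"));
        System.out.println("iso-8859-1 bytes: " + encodedLength(text, "ISO-8859-1"));
        System.out.println("misread:          " + misread(text));
    }
}
```

The accented character takes two bytes in utf-8 and one in iso-8859-1, so
a reader using the wrong character set sees two junk characters instead.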
0/ About webapp:
The key point here is: the JVM seems to pass the data straight through
between the browser and the DBMS. At least it does if you tell it which
character set it receives and which one to send...
1/ About DBMS:
Lucky: use a database whose character set supports Unicode. But even
then, make sure the database connection driver handles the data gently:
tell it that the data sent to and received from the JVM uses the utf-8
character set.
Unlucky: the database can't store Unicode, either because it contains
legacy data or because the DBMS doesn't support Unicode.
About MySQL: prior to version 4.1, it lacks support for Unicode. The
driver "mysql-connector-java" is needed, and it must know about utf-8,
i.e. through the parameters:
- useUnicode, set to "true",
- characterEncoding, set to "UTF-8".
The driver will transparently convert utf-8 to/from the character set
used in the DB, whatever it is.
Then in keel.properties:
jdbc.keel-dbpool.dburl=jdbc:mysql://localhost:3306/popsuite?useUnicode=true&characterEncoding=UTF-8
Since version 4.1, MySQL supports Unicode. With utf-8 as the default
character set, the driver doesn't need these parameters any more.
In keel.properties put:
jdbc.keel-dbpool.dburl=jdbc:mysql://localhost:3306/popsuite
If MySQL has iso-8859-1 as its default character set, then the parameters
above are still needed.
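If the URL is built in code rather than typed into keel.properties, the
two parameters can be appended like this. This is only a sketch: the
class and method names are made up for the example, and it does not open
a connection (that would need the MySQL driver on the classpath).

```java
final class JdbcUrlSketch {
    // Appends the two driver parameters named above to a base JDBC URL.
    // Uses "?" for the first parameter and "&" if the URL already has some.
    static String withUnicodeParams(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(withUnicodeParams("jdbc:mysql://localhost:3306/popsuite"));
    }
}
```

The resulting string is what would be handed to the connection pool or to
java.sql.DriverManager.getConnection.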
2/ About data received from the browser
Current browsers don't specify in their requests which character set
must be used to read them. By default, it's iso-8859-1 with HTTP 1.1.
Servlet containers can try to guess which character set is used. But if
the character set is known, e.g. because the container told the browser
which one to use, then it's possible to pass this piece of information
to the servlet which receives the request.
Since Servlet API 2.3, there is a request.setCharacterEncoding method
to do it. And for a clean job, this can be done from a container filter
(if Struts is used, this is not optional!). Tomcat provides such a
filter (1). It can be used as is with any other container compliant
with Servlet API 2.3.
Keel-client is configured to use it by default. See its
web-filter.xml and web-filter-mapping.xml files.
About Tomcat: since version 5 (and late 4.x), the
"useBodyEncodingForURI" attribute has to be set to "true" to use the
same character set for the query (get) and body (post) parts of the
request (see the Coyote HTTP Connector in server.xml). Otherwise the
"URIEncoding" attribute value, if present (iso-8859-1 by default), will
be used as the character set for the query.
Notes:
(1) See {tomcat}/webapps/servlets-examples/WEB-INF/classes/filters
(2) The "useBodyEncodingForURI" and "URIEncoding" attributes, if needed,
must be specified for each port connector.
(3) Don't use "org.apache.catalina.valves.RequestDumperValve" in
server.xml unless it is patched with a call to
request.setCharacterEncoding. This valve is called before the filters,
so any later call to setCharacterEncoding from outside the valve will
have no effect.
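To see why the container must be told the character set, here is a
sketch using only the JDK (no servlet API, and the class name is made up
for the example). It decodes the same percent-encoded query value twice,
once as utf-8 and once as iso-8859-1, which is roughly the choice the
container makes when it parses the request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class RequestDecodingDemo {
    // Decodes a percent-encoded value using the named character set.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for "cafe" with an accented e from a utf-8 page.
        String raw = "caf%C3%A9";
        System.out.println("as utf-8:      " + decode(raw, "UTF-8"));
        System.out.println("as iso-8859-1: " + decode(raw, "ISO-8859-1"));
    }
}
```

Decoded as utf-8 the value comes back intact; decoded as iso-8859-1 the
accented character turns into two junk characters. Calling
request.setCharacterEncoding("UTF-8") before the first parameter is read
selects the first behaviour for the request body.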
3/ About data sent to the browser
The HTTP response must have an appropriate Content-Type header value.
If JSP is used, each file must have:
- a jsp directive with a contentType attribute:
<jsp:directive.page language="java" contentType="text/html; charset=UTF-8" />
This tells the compiler which character set to use for the data sent to
the browser;
Note: the "pageEncoding" attribute tells the compiler which character
set is used in the JSP file itself. Specify it only if its value
differs from the charset given in contentType.
- a meta tag with http-equiv="Content-Type":
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
This tells the browser which character set to use.
Browsers use the character set of the HTML page to generate the "post"
part of the request, not the values given by the "accept-charset"
attribute of the form tags.
4/ About resources properties files
Resources properties files use the iso-8859-1 character set, whatever
the "file.encoding" value is. The JVM transparently converts them to
Unicode (ucs-2, not utf-8).
When non iso-8859-1 characters are needed, Unicode escape sequences
must be used, i.e. in the form \uhhhh, where each "h" is a hexadecimal
digit. For example \u2297 is the Unicode encoding of the circled times
character (a circle with an "x" inside, see
http://us.metamath.org/symbols/otimes.gif)
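A quick way to check an escape sequence is to load it with
java.util.Properties. The sketch below (class name and the "symbol" key
are made up for the example) feeds a one-line properties file, as
iso-8859-1 bytes, to Properties.load and returns the decoded value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Properties;

final class PropertiesEscapeDemo {
    // Loads properties text the way a file would be read (iso-8859-1 bytes)
    // and returns the value stored under the given key.
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        byte[] bytes = fileContent.getBytes(Charset.forName("ISO-8859-1"));
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse properties", e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The file contains the six characters \ u 2 2 9 7 after "symbol=".
        String value = loadValue("symbol=\\u2297\n", "symbol");
        System.out.println("length: " + value.length());
        System.out.println("code point: U+" + Integer.toHexString(value.charAt(0)).toUpperCase());
    }
}
```

The six-character escape in the file becomes a single character in the
loaded string.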
There are two ways to do the job:
- Most of the text needs only iso-8859-1 characters: enter the non
iso-8859-1 characters directly as Unicode escape sequences.
For languages where most characters are in the Latin-1 character
set, this is awkward but possible.
- Otherwise: enter the text in any editor with its default character
set. Then convert the resources properties file with the
"native2ascii" program. In spite of its name, it converts text to
iso-8859-1 plus Unicode escape sequences.
For languages like Chinese, Arabic, etc., this is the only realistic
way.
Note: native2ascii can be found in {java_sdk}/bin. See also
http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/native2ascii.html
Note: for resources properties classes, the character set used by
the compiler is the same as for other source files, i.e. the default
code page of the operating system, unless a command line parameter
tells otherwise.
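For the curious, the kind of conversion native2ascii performs can be
sketched in a few lines. This is not the tool's actual implementation,
and the class name is made up; as an assumption of the sketch, every
character outside plain ASCII is escaped, which gives output that is
also valid iso-8859-1.

```java
final class Native2AsciiSketch {
    // Replaces every character above 0x7F with its \\uhhhh escape sequence.
    // Works on 16-bit chars, so a supplementary character yields two escapes.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An accented e and the circled times character, as in section 4.
        System.out.println(escapeNonAscii("caf\u00e9 \u2297"));
    }
}
```

The output can be pasted into a resources properties file and will be
read back correctly whatever the "file.encoding" value is.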