Mike,

I'd be happy to add some info to the manual, perhaps in the section on
setting up the database connection. Just let me know what it should say.

Maybe a page on the Wiki as well?

I tried to summarize the issues and questions I encountered. Put it in the best place(s)! Don't hesitate to correct my English, or any other points.

Pierre

Full Unicode webapp notice

Java uses the Unicode character set internally, namely ucs-2 (utf-16 from Java 5 on), not utf-8. Things get messy as soon as the JVM needs to transfer data. By default it uses the value of the system property "file.encoding", most often the iso-8859-1 character set. But... not always. See for example stderr, stdout and stdin, which use the OS character set. Or the native API, which works without any translation.
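
To make the point about transfers concrete, here is a minimal sketch of naming the character set explicitly instead of relying on "file.encoding". The class and method names (ExplicitCharsetDemo, encode, decode) are made up for this illustration and are not part of Keel.

```java
import java.nio.charset.Charset;

class ExplicitCharsetDemo {
    // Encodes text with a named character set instead of the JVM default.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Decodes bytes with a named character set instead of the JVM default.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "e" with an acute accent, written as an escape so that the
        // encoding of this source file does not matter.
        String text = "\u00e9";

        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println(encode(text, "UTF-8").length);      // 2
        System.out.println(encode(text, "ISO-8859-1").length); // 1

        // Reading utf-8 bytes as iso-8859-1 yields two wrong characters.
        String garbled = decode(encode(text, "UTF-8"), "ISO-8859-1");
        System.out.println(garbled.length());                  // 2
    }
}
```

The same character is one byte in iso-8859-1 and two bytes in utf-8, which is why both ends of every transfer have to agree on the character set.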

0/ About webapp:

The key point here is: the JVM seems to transfer the data directly between the browser and the DBMS. At least
if you tell it what it receives and what to send...

1/ About DBMS:

Lucky: use a database whose character set supports Unicode. But even then, make sure the database connection drivers handle the data gently: tell them that data sent to and received from the JVM uses the utf-8 character set.

Unlucky: Unicode is not available in the database, either because it contains legacy data or because the DBMS doesn't support Unicode.

About MySql: prior to version 4.1, it lacks Unicode support. The driver "mysql-connector-java" is needed, and it
must be told about utf-8, e.g. with the parameters:
- useUnicode, set to "true",
- characterEncoding, set to "UTF-8".

The driver will transparently convert utf-8 to/from the character set used in the DB, whatever it is.

Then in keel.properties:
jdbc.keel-dbpool.dburl=jdbc:mysql://localhost:3306/popsuite?useUnicode=true&characterEncoding=UTF-8

Since version 4.1, MySql supports Unicode. With utf-8 as the default character set,
the driver doesn't need such parameters any more.
In keel.properties put:
jdbc.keel-dbpool.dburl=jdbc:mysql://localhost:3306/popsuite

If MySql has iso-8859-1 as its default character set, then the former parameters are still needed.
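
For code that builds the connection itself rather than reading keel.properties, the two driver parameters can be sketched as follows. The class and method names (MySqlUnicodeUrl, withUnicodeParams, unicodeProperties) are hypothetical. Only the URL and the parameter names come from the text above.

```java
import java.util.Properties;

class MySqlUnicodeUrl {
    // Base URL taken from the keel.properties example.
    static final String BASE_URL = "jdbc:mysql://localhost:3306/popsuite";

    // Appends the two driver parameters to a JDBC URL.
    static String withUnicodeParams(String baseUrl) {
        String separator = baseUrl.indexOf('?') >= 0 ? "&" : "?";
        return baseUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }

    // The same two settings as connection properties.
    static Properties unicodeProperties() {
        Properties props = new Properties();
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(withUnicodeParams(BASE_URL));
        // With the MySQL driver on the classpath, the properties could be
        // passed instead of URL parameters (user and password would have to
        // be added to them as well):
        //   DriverManager.getConnection(BASE_URL, unicodeProperties());
    }
}
```

Both forms carry the same information. The URL form is the one that fits on a single keel.properties line.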

2/ About data received from browser

Current browsers don't specify in their requests which character set must be used to read them. By default, it's iso-8859-1 with HTTP 1.1. Servlet containers can try to guess which character set is used. But if the character set is known, e.g. because the container told the browser which one to use, then it's possible to pass this information to the servlet that receives the request. Since Servlet API 2.3, there is a request.setCharacterEncoding method to do it. For a clean job, this can be done from a container filter (if Struts is used, there is no other choice!). Tomcat provides such a filter (1). It can be used as is with any other container compliant with Servlet API 2.3. Keel-client is configured to use it by default. See its web-filter.xml and web-filter-mapping.xml files.
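
The effect of reading a request with the wrong character set can be reproduced without a container. The sketch below uses only java.net.URLDecoder from the JDK. It is not the Tomcat filter, and the class and method names (FormDecodingDemo, decode) are invented for the illustration. The sample value is what a browser would send for the word "café" from a utf-8 page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodingDemo {
    // Decodes a URL-encoded form value with the given character set.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The accented "e" travels as the two utf-8 bytes C3 A9.
        String sent = "caf%C3%A9";

        // Decoded with the character set the browser really used.
        String right = decode(sent, "UTF-8");
        // Decoded with the HTTP 1.1 default: each byte becomes one character.
        String wrong = decode(sent, "ISO-8859-1");

        System.out.println(right.length()); // 4
        System.out.println(wrong.length()); // 5
    }
}
```

Calling request.setCharacterEncoding("UTF-8") before any parameter is read is what makes a container take the first path rather than the second.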

About Tomcat: since version 5 (and late 4.x), the "useBodyEncodingForURI" attribute has to be set to "true" to use the same character set for the query (get) and body (post) parts of the request (see the Coyote HTTP Connector in server.xml). Otherwise the "URIEncoding" attribute value, if present (iso-8859-1 by default), will be used as the character set for the query.
Notes:
(1) See {tomcat}/webapps/servlets-examples/WEB-INF/classes/filters
(2) The "useBodyEncodingForURI" and "URIEncoding" attributes, if needed, must be specified for each port connector.
(3) Don't use "org.apache.catalina.valves.RequestDumperValve" in server.xml unless it is patched with a call to request.setCharacterEncoding. This valve is called before the filters, so any call to the setCharacterEncoding method from outside the valve will have no effect.

3/ About data sent to browser

The HTTP response must have an appropriate Content-Type header value.

If JSP is used, each file must have:
- a jsp directive with a contentType attribute:
  <jsp:directive.page language="java" contentType="text/html; charset=UTF-8" />

  This tells the compiler which character set to use for the data sent to the browser.
  Note: the "pageEncoding" attribute tells the compiler which character set is used in the JSP file itself. Specify it only if its value differs from the contentType character set.

- a meta tag with a Content-Type value:
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

  This tells the browser which character set to use.

Browsers use the character set of the HTML page to generate the "post" part of the request, not the values
given by the "accept-charset" attribute of the form tags.

4/ About resources properties files

Resources properties files use the iso-8859-1 character set, whatever the "file.encoding" value is. The JVM
will transparently convert them to Unicode (ucs-2, not utf-8).

When non iso-8859-1 characters are needed, Unicode escape sequences must be used, in the form \uhhhh, where each "h" is a hexadecimal digit. For example \u2297 is the Unicode escape for the circled times character (a circle with an "x" inside, see http://us.metamath.org/symbols/otimes.gif).

To do the job, two ways are possible:

- Most of the text needs only iso-8859-1 characters: enter the non iso-8859-1 characters directly as Unicode escape sequences. For languages where most characters are in the Latin-1 character set, this is awkward but possible.

- Otherwise: enter the text in any editor with its default character set. Then convert the resources properties file with the "native2ascii" program. In spite of its name, it converts text to iso-8859-1 plus Unicode escape sequences. For languages like Chinese, Arabic, etc., this is the only realistic way.

Note: native2ascii can be found in {java_sdk}/bin. See also
http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/native2ascii.html

Note: for resources properties classes, the character set used by the compiler is the same as for other source files, i.e. the default code page of the operating system, unless a command line parameter tells otherwise.
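
The escaping and loading of a properties file can be tried out with the JDK alone. In this sketch, escapeNonAscii is a rough stand-in for what native2ascii produces. It is an assumption made for illustration, not the tool's actual code, and it escapes everything outside ASCII, which is always safe for an iso-8859-1 file. The class name and the "times" key are invented too.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEscapeDemo {
    // Keeps ASCII characters as they are and writes every other character
    // as a backslash, the letter u and four hexadecimal digits.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Loads properties from text, going through iso-8859-1 bytes the way
    // Properties.load(InputStream) reads a file from disk.
    static Properties load(String fileContent) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(
                    fileContent.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // One line of a properties file whose value is the circled times character.
        String escaped = escapeNonAscii("times=\u2297");
        System.out.println(escaped);

        // Loading the escaped line gives the original character back.
        Properties props = load(escaped + "\n");
        System.out.println(props.getProperty("times").equals("\u2297")); // true
    }
}
```

The escaped text contains only ASCII, so it survives any editor or transfer before the JVM turns it back into Unicode.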


http://keelframework.org/documentation.shtml
Keelgroup mailing list
http://lists.keelframework.com/listinfo.cgi/keelgroup-keelframework.com
