Re: [Keelgroup] full Unicode webapp

Raoul Pierre Tue, 05 Jul 2005 02:30:45 -0700

Mike,

I'd be happy to add some info to the manual, perhaps in the section on
setting up the database connection. Just let me know what it should say.


Maybe a page on the Wiki as well?

I tried to summarize the issues/questions I encountered. Put it in the best place(s)! Don't hesitate to correct my English... or any other points.


Pierre

Full unicode webapp notice

Java uses the Unicode character set internally (ucs-2, not utf-8). Things get messy as soon as the JVM needs to transfer data. By default it uses the value of the "file.encoding" system property, most often the iso-8859-1 character set. But... not always. For example, stderr, stdout and stdin use the OS character set, and the native API passes data through without any translation.
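
To make the point concrete, here is a minimal sketch showing that the same Java string becomes different bytes depending on the character set chosen when it leaves the JVM. The class and method names (EncodingDemo, hex) are only illustrative, not part of Keel.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encode the text with the given charset and return the bytes as hex digits.
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // "e" with acute accent, a single Java char

        // What the JVM uses when no charset is given (driven by file.encoding).
        System.out.println("default charset: " + Charset.defaultCharset());

        // One byte in iso-8859-1, two bytes in utf-8.
        System.out.println("iso-8859-1: " + hex(text, StandardCharsets.ISO_8859_1)); // e9
        System.out.println("utf-8:      " + hex(text, StandardCharsets.UTF_8));      // c3a9
    }
}
```

Passing the charset explicitly, as done here, avoids depending on whatever "file.encoding" happens to be on a given machine.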

0/ About webapp:

The key point here is: the JVM seems to pass the data directly between the browser and the DBMS, at least if you tell it what it receives and what to send...

1/ About DBMS:

Lucky: use a database whose character set supports Unicode. But even then, make sure the database connection drivers are gentle in their data handling: tell them that the data sent to and received from the JVM uses the utf-8 character set.

Unlucky: the database doesn't support Unicode, either because it contains legacy data or because the DBMS itself doesn't support Unicode.

About MySQL: prior to version 4.1, it lacks Unicode support. The "mysql-connector-java" driver is needed, and it must be told about utf-8 with the parameters:
- useUnicode, to "true",
- characterEncoding, to "UTF-8".

The driver will transparently convert utf-8 to/from the character set used in the DB, whatever it is.


Then in keel.properties:
jdbc.keel-dbpool.dburl=jdbc:mysql://localhost:3306/popsuite?useUnicode=true&amp;characterEncoding=UTF-8

Since version 4.1, MySQL supports Unicode. With utf-8 as the default character set, the driver doesn't need such parameters any more.
In keel.properties put:
jdbc.keel-dbpool.dburl=jdbc:mysql://localhost:3306/popsuite

If MySQL has iso-8859-1 as its default character set, then the former parameters are still needed.
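
For code that opens a connection itself rather than going through keel.properties, the same two driver parameters can be appended to the URL. This is only a sketch: the helper name withParams is made up for illustration, and it just builds the string, it does not open a connection. Note that in plain Java code the separator is a bare "&".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JdbcUrlDemo {
    // Append driver parameters to a base JDBC URL as a query string.
    static String withParams(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two parameters named above; LinkedHashMap keeps them in order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useUnicode", "true");
        params.put("characterEncoding", "UTF-8");

        // The resulting string is what would be handed to DriverManager.getConnection.
        System.out.println(withParams("jdbc:mysql://localhost:3306/popsuite", params));
    }
}
```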


2/ About data received from the browser

Current browsers don't specify in their requests which character set must be used to read them. By default, it's iso-8859-1 with HTTP 1.1. Servlet containers can try to guess which character set is used. But if the character set is known, e.g. because the container told the browser which one to use, then it's possible to give this piece of information to the servlet which receives the request.

Since Servlet API 2.3, there is a request.setCharacterEncoding method to do it. For a clean job, this can be done from a container filter (if Struts is used, this is not optional!). Tomcat provides such a filter (1). It can be used as is with any other container compliant with Servlet API 2.3. Keel-client is configured to use it by default. See its web-filter.xml and web-filter-mapping.xml files.

About Tomcat: since version 5 (and late 4.x), the "useBodyEncodingForURI" attribute has to be set to "true" to use the same character set for the query (get) and body (post) parts of the request (see the Coyote HTTP Connector in server.xml). Otherwise the "URIEncoding" attribute value, if it exists (iso-8859-1 by default), will be used as the character set for the query.
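
Why the query character set matters can be seen without a container. The sketch below decodes the same percent-encoded query value once as utf-8 and once as iso-8859-1, using the standard java.net.URLDecoder. The class name and the decode wrapper are only illustrative, and this simulates the effect rather than showing what Tomcat does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingDemo {
    // Decode a percent-encoded value with the named character set.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "e" with acute accent, as a browser sends it from a utf-8 page.
        String raw = "%C3%A9";

        String right = decode(raw, "UTF-8");      // one char, U+00E9
        String wrong = decode(raw, "ISO-8859-1"); // two chars, U+00C3 U+00A9

        System.out.println("utf-8 decoding:      " + right.length() + " char(s)");
        System.out.println("iso-8859-1 decoding: " + wrong.length() + " char(s)");
    }
}
```

The second result is the familiar two-character garbage seen when a utf-8 query string is read with the iso-8859-1 default.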

Notes:

  (1) See {tomcat}/webapps/servlets-examples/WEB-INF/classes/filters

  (2) The "useBodyEncodingForURI" and "URIEncoding" attributes, if needed, must be specified for each port connector.

  (3) Don't use "org.apache.catalina.valves.RequestDumperValve" in server.xml unless it is patched with a request.setCharacterEncoding call. This valve is called before the filters, so any call to the setCharacterEncoding method from outside the valve will have no effect.


3/ About data sent to the browser

The HTTP response must have an appropriate Content-Type header value.

If JSP is used, each file must have:
- a jsp directive with contentType attribute:

<jsp:directive.page language="java" contentType="text/html;charset=UTF-8" />

  This tells the compiler which character set to use for the data sent to the browser.
  Note: the "pageEncoding" parameter tells the compiler which character set is used in the JSP file itself. Specify it only if its value differs from the contentType character set.

- a meta tag with http-equiv "Content-Type":

  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

  This tells the browser which character set to use.

Browsers use the character set of the HTML page to generate the "post" part of the request, not the values given by the "accept-charset" attribute of the form tags.

4/ About resource properties files

Resource properties files use the iso-8859-1 character set, whatever the "file.encoding" value is. The JVM will transparently convert them to Unicode (ucs-2, not utf-8).

When non iso-8859-1 characters are needed, Unicode escape sequences must be used, in the form \uhhhh, where each "h" is a hexadecimal digit. For example, \u2297 is the Unicode encoding of the circled times character (a circle with an "x" inside, see http://us.metamath.org/symbols/otimes.gif).

To do the job, two ways are possible:

- Most of the text needs only iso-8859-1 characters: enter the non iso-8859-1 characters directly as Unicode escape sequences. For languages where most characters are in the Latin-1 character set, this is awkward but possible.

- Otherwise: enter the text in any editor with its default character set. Then translate the resource properties file with the "native2ascii" program. In spite of its name, it converts text to iso-8859-1 plus Unicode escape sequences. For languages like Chinese, Arabic, etc., this is the only realistic way.
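
As a rough illustration of both halves of this, the sketch below escapes non-ASCII characters by hand and then lets java.util.Properties turn the escape back into the real character. The escapeNonAscii helper is a made-up stand-in, not the actual native2ascii tool, and it assumes every character above 7-bit ASCII should be escaped, which is always safe in a properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesEscapeDemo {
    // Replace every char above 0x7f with a backslash-u escape of four hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7f) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parse properties text; Properties decodes the escapes while loading.
    static Properties load(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The circled times character from the example above, as a properties line.
        String line = "symbol=" + escapeNonAscii("\u2297");
        System.out.println(line); // the line is now pure ASCII

        String value = load(line).getProperty("symbol");
        System.out.println("round trip ok: " + value.equals("\u2297"));
    }
}
```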


Note: native2ascii can be found in {java_sdk}/bin. See also http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/native2ascii.html

Note: for resource properties classes, the character set used by the compiler is the same as for other source files, i.e. the default code page of the operating system, unless a command line parameter tells otherwise.


