Apache JSSI 1.1.2 has a problem processing jhtml pages that contain DBCS characters.

After getting JSSI to work under Apache 1.3.6 for OS/2, I tested a jhtml page with a <SERVLET> </SERVLET> tag and found that if the page has DBCS characters (Chinese Big5) before the <SERVLET> </SERVLET> tag, the text in the page generated by JSSI ends up in the wrong position.

For example, if the jhtml page has only one line like this:

-------- hello.jhtml ----------
<SERVLET CODE="HelloWorldServlet.class"> </SERVLET>
-------------------------------

then everything works fine. But if I insert a DBCS (Chinese Big5) character in front of that line like this ('我' is a Chinese Big5 character that your system may not be able to display):

-------- hello.jhtml ----------
我 <SERVLET CODE="HelloWorldServlet.class"> </SERVLET>
-------------------------------

the page generated by JSSI contains a stray '>' in its last line! And if I add one more DBCS character like this:

-------- hello.jhtml ----------
我我 <SERVLET CODE="HelloWorldServlet.class"> </SERVLET>
-------------------------------

the page generated by JSSI contains a stray 'T>' in its last line!

Java supports Unicode and displays the correct characters when the correct charset is set. The reason a DBCS character shifts the output is that JSSI does not count the correct number of bytes for these characters. A Big5 character needs two bytes, while an ASCII character needs only one. When JSSI parses an incoming jhtml page, it counts the characters in the page to determine where to break the page so that the servlet output can be inserted. However, the Java VM counts one Unicode character as ONE character, and JSSI treats that as ONE BYTE. That is why 'T>' appears in the last line of the generated page: JSSI counts the two characters '我我' as TWO BYTES when they are actually FOUR BYTES. JSSI then computes the wrong position, the output is shifted by two bytes, and we get the 'T>'.

The Java code below shows the 'one character but several bytes' result.
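As a rough illustration of the drift described above (separate from the encoding demonstration that follows), here is a minimal sketch of what happens when a length measured in characters is used as a byte offset. It is not JSSI's actual code: the class OffsetDrift and the helper leftover() are hypothetical names, and the sketch assumes the JVM provides the Big5 charset.

```java
import java.nio.charset.Charset;

class OffsetDrift {
    // Hypothetical helper, not JSSI code: measures the page length in
    // characters, then (wrongly) uses that number as a byte offset and
    // returns the bytes that fall beyond it.
    static String leftover(String page, Charset charset) {
        byte[] bytes = page.getBytes(charset);
        int charOffset = page.length();   // counted in characters, not bytes
        return new String(bytes, charOffset, bytes.length - charOffset, charset);
    }

    public static void main(String[] args) {
        String tag = "<SERVLET CODE=\"HelloWorldServlet.class\"> </SERVLET>";
        Charset big5 = Charset.forName("Big5");   // assumption: Big5 is available
        String wo = "\u6211";                     // the character '我'
        System.out.println("[" + leftover(tag, big5) + "]");           // ASCII only: nothing left over
        System.out.println("[" + leftover(wo + tag, big5) + "]");      // one DBCS character
        System.out.println("[" + leftover(wo + wo + tag, big5) + "]"); // two DBCS characters
    }
}
```

With Big5 encoding each '我' as two bytes, the three calls leave "", ">" and "T>" respectively, which matches the stray characters seen at the end of the generated page.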
--------------------
public class ByteCount {
    public static void main(String[] args) throws Exception {
        String a = "我a的b朋c友";
        System.out.println(a.length() + " " + a);
        // Re-read the UTF8 bytes as ISO8859_1: one character per byte
        String b = new String(a.getBytes("UTF8"), "ISO8859_1");
        System.out.println(b.length() + " " + b);
        // Convert back to the original characters
        String c = new String(b.getBytes("ISO8859_1"), "UTF8");
        System.out.println(c.length() + " " + c);
    }
}
--------------------

The result is:

--------------------
7 我a的b朋c友         // Unicode string: only 7 characters counted
15 ???a???b???c???   // ISO8859_1 view: 15 characters counted (the correct byte count)
7 我a的b朋c友         // Back to Unicode: only 7 characters counted
--------------------

(In this example each Chinese character takes three bytes because the bytes are UTF8; in Big5 each would take two.)

I have some thoughts about solving the problem.

One: when parsing the jhtml page, use new String(a.getBytes("UTF8"), "ISO8859_1") to convert the content so that the character count equals the byte count, and then use new String(b.getBytes("ISO8859_1"), "UTF8") to convert it back. Since ISO8859_1 is a standard in many environments all over the world, I suggest adding one more init parameter named 'transEncoding' (or some other name) that can only be 'yes' or 'no'. It would determine whether JSSI needs to translate the encoding in order to count the correct bytes, using ISO8859_1 together with the user-defined charset (one of JSSI's init parameters), so that users can set the correct charset for their own country code.

Two: Java supports Unicode through InputStreamReader/OutputStreamWriter for character encoding. If JSSI used these I/O classes, the DBCS problem could be solved. Apache 1.3.6 for OS/2 Warp with JServ 1.0 final works fine with the charset Big5 (UTF8) and displays correct Chinese characters in the Netscape browser. JSSI, however, uses the InputStream/OutputStream I/O classes, does not deal well with DBCS characters, and counts the wrong number of bytes.

I hope this will make JSSI even greater!

-- Waily Yang
---------------------------------------------------
| Email mailto:[EMAIL PROTECTED] |
| Homepage http://www.HappyElec/Waily |
| Location Taipei, Taiwan, R.O.C.
(DBCS BIG5) |
| Club Team OS/2 in Taiwan|Power User Group |
| Newsgroups news:tw.bbs.comp.os2 |
| Java & REXX & C++ Programmer using OS/2 TWarp |
---------------------------------------------------