Erm... that code snippet was from the XMLSerializer, not the HTMLSerializer as I wrote, but the approach should be the same. Sorry!
 
regards, karl øie
-----Original Message-----
From: Karl Øie [mailto:[EMAIL PROTECTED]]
Sent: 12 December 2001 13:51
To: [EMAIL PROTECTED]; Arun.N
Subject: RE: urgent encoding problem...

The increasing page size does not concern me :-) because the serializer should write directly to response.getWriter(). Then again, the serializer does not flush before the end of the page, so users must wait until the whole page is finished.
 
When it comes to the missing characters, you could try to create your own serializer. Let's take a look at the code for the HTMLSerializer (org.apache.cocoon.serialization.XMLSerializer):
 
The setOutputStream() method uses JAXP (javax.xml.transform) to set the transformer's output properties:
 
    public void setOutputStream(OutputStream out) {
        try {
            super.setOutputStream(out);
            this.handler = getTransformerFactory().newTransformerHandler();
            format.put(OutputKeys.METHOD,"xml");
            handler.setResult(new StreamResult(this.output));
            handler.getTransformer().setOutputProperties(format);
            this.setContentHandler(handler);
            this.setLexicalHandler(handler);
        } catch (Exception e) {
            getLogger().error("XMLSerializer.setOutputStream()", e);
            throw new RuntimeException(e.toString());
        }
    }
 
 
If you force the transformer to use your encoding here, like this:
 
 
    public void setOutputStream(OutputStream out) {
        try {
            super.setOutputStream(out);
            this.handler = getTransformerFactory().newTransformerHandler();
            format.put(OutputKeys.METHOD,"xml");
            format.put(OutputKeys.ENCODING,"Shift_JIS");    // <-- add this!
            handler.setResult(new StreamResult(this.output));
            handler.getTransformer().setOutputProperties(format);
            this.setContentHandler(handler);
            this.setLexicalHandler(handler);
        } catch (Exception e) {
            getLogger().error("XMLSerializer.setOutputStream()", e);
            throw new RuntimeException(e.toString());
        }
    }
 
Then recompile Cocoon, try your page again and tell me what happens. Please also read this note from the Xalan FAQ.
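The effect of the ENCODING output property can be tried outside Cocoon. The following is a minimal sketch, assuming the JDK's built-in JAXP identity transformer rather than the Xalan version bundled with Cocoon; the class and method names are made up for illustration. It sends the same text through a TransformerHandler twice: with ISO-8859-1 as OutputKeys.ENCODING, characters the encoding cannot represent come out as decimal references such as &#12371;, and with Shift_JIS they come out as ordinary Shift_JIS bytes. The properties are set before setResult(), since some implementations (as far as I know, including the JDK's) build the serializer as soon as the result is set.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class ShiftJisOutputDemo {

    /** Serializes <label>text</label> through an identity TransformerHandler using the given output encoding. */
    static String serialize(String text, String encoding) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler(); // identity transform

            Properties format = new Properties();
            format.put(OutputKeys.METHOD, "xml");
            format.put(OutputKeys.ENCODING, encoding); // the property the Cocoon patch adds
            handler.getTransformer().setOutputProperties(format);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            handler.setResult(new StreamResult(out));

            // Feed SAX events by hand, standing in for the XSP/XSLT pipeline.
            handler.startDocument();
            handler.startElement("", "label", "label", new AttributesImpl());
            char[] chars = text.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement("", "label", "label");
            handler.endDocument();

            // Decode the bytes with the same encoding the serializer wrote them in.
            return new String(out.toByteArray(), Charset.forName(encoding));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String japanese = "\u3053\u3061\u3089\u306B"; // こちらに
        System.out.println(serialize(japanese, "ISO-8859-1")); // unencodable chars become &#...; references
        System.out.println(serialize(japanese, "Shift_JIS"));  // chars written as Shift_JIS bytes
    }
}
```

If the first line of output shows references and the second shows the kana, the serializer's output encoding is the knob that controls whether numeric references appear.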
 
 
 
regards, karl øie
 
 
-----Original Message-----
From: Arun.N [mailto:[EMAIL PROTECTED]]
Sent: 12 December 2001 13:33
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: urgent encoding problem...

Thank you, Karl.
        I have fixed that problem by following the method you described.
Everything is displayed properly, but there is still a problem in the page source.
If the string contains こちらに翻訳後の文章が表示されます。 (as Shift_JIS bytes),
the browser displays こちらに翻訳後の文章が表示されます。
but when I look at the source of the output HTML, it shows  &#12371;&#12385;&#12425;&#12395;&#32763;&#35379;&#24460;&#12398;&#25991;&#31456;&#12364;&#34920;&#31034;&#12373;&#12428;&#12414;&#12377;&#12290;
 
If the source has characters like こちらに翻訳後の文章が表示されます。 as well, it works fine and the Japanese characters in the output are the same. But why is the Cocoon processor replacing everything with numbers? My concern here is that it increases the page size.
Any comments in this regard?
Thanks in advance,
Arun.N
 
 
----- Original Message -----
From: Karl Øie
Sent: Wednesday, December 12, 2001 5:44 PM
Subject: RE: urgent encoding problem...

It's not that people don't bother to answer you, but a lot of people here don't have any experience with Shift_JIS encoding. As a Norwegian, I have the same problem: non-Scandinavians can hardly reproduce problems involving Scandinavian characters.
 
When it comes to your string problem, there can be several sources. First of all, you can test the DOM by feeding it a string that has been created with a declared encoding, like:
 
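new String( bytes ); will not work on all JDKs/platforms, because it decodes bytes holding "æ e trønder æ å" with the platform's default encoding

new String( bytes, "UTF-16" ); will work on most sane JDKs/platforms, provided the bytes really are UTF-16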
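The same idea applied to Japanese text can be sketched as follows. This is a minimal example, assuming the text arrives as raw Shift_JIS bytes (produced here with getBytes so the program is self-contained); the class name is hypothetical. Decoding with the charset that actually produced the bytes gives the original string back, while decoding with a different one, such as ISO-8859-1, turns each byte into its own character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeWithDeclaredCharset {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    /** Decodes raw bytes with an explicitly named charset instead of the platform default. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String original = "\u3053\u3061\u3089\u306B";    // こちらに
        byte[] sjisBytes = original.getBytes(SHIFT_JIS); // stands in for a Shift_JIS file or request body

        // Declared charset matches the bytes: the text survives.
        System.out.println(decode(sjisBytes, SHIFT_JIS).equals(original)); // true

        // Wrong charset: every byte becomes a separate Latin-1 character.
        System.out.println(decode(sjisBytes, StandardCharsets.ISO_8859_1).equals(original)); // false
    }
}
```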

Try to create all your strings with Shift_JIS forced, just in case. Second, find out whether StringWriter supports Shift_JIS; as far as I know, StringWriter works on chars and strings, so it should handle Shift_JIS text as long as all strings fed to it were correctly decoded from Shift_JIS. Lastly, there are some problems regarding the PrintWriter that the servlet API uses to return serialized content to the browser. Try to serialize to a file instead of to the browser; if the file accepts Shift_JIS, you should look up fixes/gotchas regarding Shift_JIS and JSP, as Cocoon relies on the same servlet response mechanism as JSP to send the response back to the user.
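The file test can be sketched like this, assuming Java 7+ (java.nio.file); the class name and temp-file prefix are arbitrary. It writes through an OutputStreamWriter with an explicit Shift_JIS charset, rather than relying on a writer with a default encoding, and returns the raw bytes so they can be inspected. It also shows that a StringWriter only stores chars, so the encoding question only arises at the point where chars are turned into bytes.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class SerializeToFileCheck {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    /** Writes text to a temp file through an explicitly Shift_JIS writer and returns the file's bytes. */
    static byte[] writeShiftJisFile(String text) {
        try {
            Path file = Files.createTempFile("sjis-check", ".xml");
            try (Writer out = new OutputStreamWriter(Files.newOutputStream(file), SHIFT_JIS)) {
                out.write(text);
            }
            byte[] raw = Files.readAllBytes(file);
            Files.delete(file);
            return raw;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A StringWriter holds chars only, so no encoding is applied at this step. */
    static String viaStringWriter(String text) {
        StringWriter sw = new StringWriter();
        sw.write(text);
        return sw.toString();
    }

    public static void main(String[] args) {
        String text = "<label>\u3053\u3061\u3089\u306B</label>"; // <label>こちらに</label>
        byte[] raw = writeShiftJisFile(text);
        System.out.println(raw.length);                              // 23: 15 ASCII bytes + 4 kana x 2 bytes
        System.out.println(new String(raw, SHIFT_JIS).equals(text)); // true
        System.out.println(viaStringWriter(text).equals(text));      // true
    }
}
```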

The best place to start looking is the Xalan FAQs and docs, because if you use the XML or HTML serializer, it is using the Xalan implementation.

regards, karl øie

 

-----Original Message-----
From: Arun.N [mailto:[EMAIL PROTECTED]]
Sent: 12 December 2001 12:48
To: [EMAIL PROTECTED]
Subject: Re: urgent encoding problem...

Hi all,
            First of all, I thank everybody for not bothering to reply. I have corrected the second and third problems. If the list is still alive and anyone cares to give me a solution for the first problem, please do reply...
thanks,
Arun.N
 
----- Original Message -----
From: Arun.N
Sent: Tuesday, December 11, 2001 1:31 PM
Subject: urgent encoding problem...

Hi all,
            I have some problems with XSP pages and encoding. When I try to display Shift_JIS-encoded characters, they are not displayed properly.
When I hard-code the Japanese characters, it works properly, for example in this XSP page:
 
<?xml version="1.0" encoding="Shift_JIS"?>
<?cocoon-process type="xsp"?>
<?cocoon-process type="xslt"?>
<?xml-stylesheet href="xsl/viewMail-to-html.xsl" type="text/xsl" ?>
<xsp:page
  language="java"
  encoding="Shift_JIS"
  xmlns:xsp="http://www.apache.org/1999/XSP/Core"
  xmlns:request="http://www.apache.org/1999/XSP/Request"
  xmlns:util="http://www.apache.org/1999/XSP/Util"
 >
<page>
   <title>melpo View Mail</title>
  <body>
        <label>あなたのPCの中のメールクライアントが再開されました。 </label>
    </body>
</page>
</xsp:page>
 
The displayed HTML is fine and the characters are shown properly, but the source of the HTML shows:
<html>
    <body>
    &#12354;&#12394;&#12383;&#12398;PC&#12398;&#20013;&#12398;&#12513;&#12540;&#12523;&#12463;&#12521;&#12452;&#12450;&#12531;&#12488;&#12364;&#20877;&#38283;&#12373;&#12428;&#12414;&#12375;&#12383;&#12290;
    </body>
</html>
<!-- This page was served in 2278 milliseconds by Cocoon 1.8.2 -->
 
But why are the characters converted into numbers? The problem I have here is that this consumes more bytes, so if the device has a size limit on the page source, it is a problem. If the characters were left as they are, the page source would consume fewer bytes.
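The size difference can be worked out for the label above: it has 25 Japanese characters plus "PC". In Shift_JIS each Japanese character takes 2 bytes (52 bytes in total), while each decimal reference such as &#12354; takes 8 ASCII bytes (202 bytes in total), almost four times as much. The helper below is a hypothetical re-creation of that escaping for illustration, not Cocoon's or Xalan's code.

```java
import java.nio.charset.Charset;

class ReferenceSizeDemo {
    // あなたのPCの中のメールクライアントが再開されました。 (the text behind the references in the page source)
    static final String SAMPLE = "\u3042\u306A\u305F\u306EPC\u306E\u4E2D\u306E\u30E1\u30FC\u30EB\u30AF"
            + "\u30E9\u30A4\u30A2\u30F3\u30C8\u304C\u518D\u958B\u3055\u308C\u307E\u3057\u305F\u3002";

    /** Hypothetical helper: writes every non-ASCII code point as a decimal reference (&#NNNNN;). */
    static String toNumericReferences(String text) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                sb.appendCodePoint(cp); // ASCII is kept as-is
            } else {
                sb.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int shiftJisBytes = SAMPLE.getBytes(Charset.forName("Shift_JIS")).length;
        int referenceBytes = toNumericReferences(SAMPLE).length(); // all ASCII, so 1 char == 1 byte
        // prints: 52 bytes as Shift_JIS, 202 bytes as references
        System.out.println(shiftJisBytes + " bytes as Shift_JIS, " + referenceBytes + " bytes as references");
    }
}
```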
 
 
The second problem is that when I dynamically include XML in my XSP, it does not work. But the same string, when hard-coded in the XSP page, works fine.
<?xml version="1.0" encoding="Shift_JIS"?>
<?cocoon-process type="xsp"?>
<?cocoon-process type="xslt"?>
<?xml-stylesheet href="xsl/viewMail-to-html.xsl" type="text/xsl" ?>
<xsp:page
  language="java"
  encoding="Shift_JIS"
  xmlns:xsp="http://www.apache.org/1999/XSP/Core"
  xmlns:request="http://www.apache.org/1999/XSP/Request"
  xmlns:util="http://www.apache.org/1999/XSP/Util"
 >
<page>
   <title>melpo View Mail</title>
  <body>
        <xsp:logic>
             String xml = (String) request.getAttribute("xml");
            <xsp:content>
               <util:include-expr><util:expr>xml</util:expr></util:include-expr>  <!-- this will append an XML string like <label>あなたのPCの中のメー…されました。 </label> -->
            </xsp:content>
        </xsp:logic>
    </body>
</page>
</xsp:page>
 
I am getting an error:
 
org.xml.sax.SAXException: An invalid XML character (Unicode: 0x13) was found in the element content of the document. [FATAL ERROR] [File: "null" Line: 1 Column: 109] (nested exception: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x13) was found in the element content of the document.)
But the same string works fine if I hard-code it, because when the XSP page is compiled, all the hard-coded characters are converted to those numbers. Whenever the string is included dynamically, it does not work.
 
And the third problem is:
    when I load a string into a DOM and then get the string back, the encoding information is gone. The characters displayed are ???????????
      import java.io.*;
      import org.apache.xerces.parsers.DOMParser;
      import org.apache.xml.serialize.OutputFormat;
      import org.apache.xml.serialize.XMLSerializer;
      import org.w3c.dom.Document;
      import org.xml.sax.InputSource;

      String fullXml = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?><Response><Message>Mail Client in your PC has been ログアウト Restarted エキサイト : 翻訳＞利用規約 xxx </Message></Response>";

      DOMParser parser = new DOMParser();
      InputStream is = new ByteArrayInputStream(fullXml.getBytes());
      InputSource isource = new InputSource(is);
      parser.parse(isource);
      Document xmlDoc = parser.getDocument();                 // created a DOM
      // ------------ doing some manipulation ------------------
      OutputFormat    format  = new OutputFormat( xmlDoc );   // serialize the DOM
      StringWriter  stringOut = new StringWriter();           // Writer will be a String
      XMLSerializer    serial = new XMLSerializer( stringOut, format );
      serial.asDOMSerializer();                               // as a DOM serializer
      serial.serialize( xmlDoc.getDocumentElement() );
      String returnXML = stringOut.toString();                // got back the XML as a String
 
Now if I display the string returnXML, all the Japanese characters are gone; the output is only "???????????".
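The third problem can be reproduced and fixed in a small standalone program. String.getBytes() with no argument encodes with the platform default encoding; if that default is ISO-8859-1 (assumed here, and made explicit so the result does not depend on the machine), every Japanese character is replaced by the byte '?' before the parser ever sees it. Handing the parser a StringReader, or encoding with the charset the XML declaration names, keeps the characters. This sketch swaps the Xerces DOMParser and XMLSerializer for the standard JAXP DocumentBuilder and identity Transformer; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomRoundTrip {
    static final String XML = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?>"
            + "<Response><Message>\u30ED\u30B0\u30A2\u30A6\u30C8</Message></Response>"; // ログアウト

    /** Parses the source into a DOM and serializes it back to a String. */
    static String roundTrip(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter(); // chars only, so nothing is lost here
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static InputSource fromBytes(byte[] bytes) {
        return new InputSource(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) {
        // Broken: ISO-8859-1 cannot encode katakana, so getBytes() substitutes '?'.
        System.out.println(roundTrip(fromBytes(XML.getBytes(StandardCharsets.ISO_8859_1))));
        // Fix 1: give the parser characters, not bytes.
        System.out.println(roundTrip(new InputSource(new StringReader(XML))));
        // Fix 2: encode with the charset the XML declaration names.
        System.out.println(roundTrip(fromBytes(XML.getBytes(Charset.forName("Shift_JIS")))));
    }
}
```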
 
Can any of you please give me a solution for these problems? It is very urgent for me. I have been trying to solve these issues for the past 2 days and have searched the mail archives, but I was not able to find a solution.
 
Thanks in advance
 
regards,
Arun.N,
