Hi Joerg!

Thanks for your reply.

Tidy on its own works properly (the output stream encoding is the same as the input stream
encoding).
The problem, from my point of view, is in the input stream encoding of the transformer
(or of the streamer, if xpath is null) in HTMLGenerator:
the Tidy DOM parser returns a KOI8-R encoded document (the same encoding as the Tidy input
document),
but HTMLGenerator needs, I guess, a UTF-8 encoded document in the input stream for its
transformer or streamer.
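
To illustrate why I suspect the input side, here is a minimal standalone sketch (plain JDK only, not Cocoon or Tidy code; the class and method names are made up for the example). It takes the KOI8-R bytes of the title text, decodes them as if they were ISO-8859-1, and prints every non-ASCII character as a numeric character reference. As far as I can tell, this gives exactly the references that show up in the generated document.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Koi8Mismatch {

    // Hypothetical helper: encode the text as KOI8-R, then decode those bytes
    // with the wrong charset (ISO-8859-1) and write each non-ASCII character
    // as a numeric character reference.
    static String misdecodeAsReferences(String text) {
        byte[] koi8 = text.getBytes(Charset.forName("KOI8-R"));
        String wrong = new String(koi8, StandardCharsets.ISO_8859_1);
        StringBuilder sb = new StringBuilder();
        for (char c : wrong.toCharArray()) {
            if (c > 127) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The title from the test document, written with Unicode escapes so
        // the source file encoding does not matter.
        String title = "\u041F\u0440\u0438\u0432\u0435\u0442!";
        System.out.println(misdecodeAsReferences(title));
        // prints: &#240;&#210;&#201;&#215;&#197;&#212;!
    }
}
```

So the bytes themselves appear to survive; they just seem to be interpreted in the wrong encoding somewhere before serialization.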

What do you think of my guess?
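
If the guess is right, one possible workaround might be to decode the bytes with the known source encoding and re-encode them as UTF-8 before they reach the transformer or streamer. The sketch below only shows that transcoding step with plain JDK classes. The class name, the method name and the idea of passing the encoding in as a string are my assumptions; nothing here is taken from the actual HTMLGenerator code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeToUtf8 {

    // Hypothetical helper: read bytes in the given source encoding and
    // return the same text encoded as UTF-8.
    static byte[] transcode(InputStream in, String sourceEncoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(in, Charset.forName(sourceEncoding));
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        } // closing the writer flushes the remaining bytes into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String title = "\u041F\u0440\u0438\u0432\u0435\u0442!";
        byte[] koi8 = title.getBytes(Charset.forName("KOI8-R"));
        byte[] utf8 = transcode(new ByteArrayInputStream(koi8), "KOI8-R");
        // The text should survive the round trip unchanged.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(title));
    }
}
```

The source encoding would still have to come from somewhere, for example from the meta tag or from a generator parameter; I have not checked which of these is practical in Cocoon.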

> Hello Yuri,
> 
> I can only confirm the bug in the HTML generator. It seems it cannot read 
> the KOI8-R encoded file correctly. I tested it with your HTML snippet 
> saved to a static file.
> 
> serializer.setOutputProperty(OutputKeys.ENCODING, "KOI8-R"); of course 
> does not help, because that only sets the output encoding. Configuring the 
> serializer in the sitemap for KOI8-R works correctly as long as the input file 
> is not itself encoded in KOI8-R (or, I guess, in some other more or less exotic 
> encoding).
> If it were a bug in the serializer, a character reference like &#240; 
> would be OK, because a character that is not directly available in this 
> encoding must be expressed by such a reference.
> 
> I hope I didn't say anything wrong ;-) Yuri, I think it's best to 
> file a bug in Bugzilla at http://nagoya.apache.org/bugzilla/.
> 
> Regards,
> 
> Joerg
> 
> Yury Mikhienko wrote:
> > Hi all!
> > 
> > Can anyone help me with the following problem:
> > 
> > I have a KOI8-R encoded HTML document. After processing this document with 
> > HTMLGenerator, the output is an ISO-8859-1 encoded document :((
> > 
> > for example
> > The source document:
> > (from URL: /test)
> > <html>
> > <head>
> > <meta http-equiv="Content-Type" content="text/html; charset=KOI8-R">
> > <title>Привет!</title>
> > </head>
> > <body vlink="blue" link="blue" alink="red" bgcolor="white">
> >  
> > <title>Привет!</title>
> > 
> > </body>
> > </html>
> > 
> >  (in sitemap.xmap):
> >     <map:serializer logger="sitemap.serializer.xml" mime-type="text/xml" 
> >       name="xml" src="org.apache.cocoon.serialization.XMLSerializer">
> >       <buffer-size>1024</buffer-size>
> >       <encoding>KOI8-R</encoding>
> >     </map:serializer>
> > 
> > ...
> > 
> >    <map:match pattern="test">
> >     <map:generate src="work/test/test.xml"/>
> >     <map:transform src="work/test/test-page2html.xsl"/>
> >     <map:serialize type="html"/>
> >    </map:match>
> > 
> >     <map:match pattern="test-include">
> >      <map:generate src="http://localhost/cocoon/test" type="html">
> >      </map:generate>
> >      <map:serialize type="xml"/>
> >     </map:match>    
> > 
> > After HTMLGenerator processing I get:
> > 
> > <?xml version="1.0" encoding="KOI8-R"?>
> > <html xmlns="http://www.w3.org/1999/xhtml">
> > <head>
> > 
> > <meta content="HTML Tidy, see www.w3.org" name="generator"/>
> > 
> > <meta content="text/html; charset=KOI8-R" http-equiv="Content-Type"/>
> > <!-- This is wrong!!! -->
> > <title>&#240;&#210;&#201;&#215;&#197;&#212;!</title><title>&#240;&#210;&#201;&#215;&#197;&#212;!</title>
> > <!-- !!!!!!!!!!!!!!!!!!! -->
> > </head>
> > <body bgcolor="white" alink="red" link="blue" vlink="blue"/></html>
> > 
> > I added the following line to HTMLGenerator:
> >  ...
> >                 Transformer serializer = TransformerFactory.newInstance().newTransformer();
> >                 serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
> > //########################### NEW ############################   
> >              serializer.setOutputProperty(OutputKeys.ENCODING, "KOI8-R");
> > //############################################################
> >                 NodeList nl = processor.selectNodeList(doc, xpath);
> > 
> > ...
> > 
> > But that doesn't solve the problem :(((
> > 
> > Where am I wrong? Please help me!
> > 
> > Thanks for any advice.
> 

-- 
 
Best regards,
Yury Mikhienko.
IT engineer, ZAO "Mobicom-Kavkaz"
