Hi all,

I have the content of a URL in a string, i.e. the string contains HTML code. I want to pass it to W3C Tidy, clean it up, and store the result in a byte array. I tried the code snippet below, but it prints error messages and outputs junk data. When I used a FileOutputStream and wrote to a temporary file instead of a byte array, it worked fine. I also want to pass the byte array content directly to XSLTInputHandler. How can I do that?

The following is my HTML code:

<html>
<title>
<head></head>
</title>
<body>
<h3>This is for testing</h3>
</body>
</html>

Servlet code snippet:

 

String pageContent;           // pageContent contains the HTML content
StringBufferInputStream sbis = new StringBufferInputStream(pageContent);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Tidy tidy = new Tidy();
try
{
    tidy.setXmlOut(true);
    tidy.setXHTML(false);
    tidy.setMakeClean(true);
    tidy.setTidyMark(false);
    tidy.setUpperCaseTags(false);
    tidy.setUpperCaseAttrs(false);
    tidy.setQuoteAmpersand(false);
    tidy.setNumEntities(true);
    tidy.setCharEncoding(Configuration.UTF8);

    tidy.parse(sbis, baos);
    byte[] buff = new byte[2048];
    buff = baos.toByteArray();
    str = (String) buff.toString();
    System.out.println("This is XHTML>>>>>>>>>>>>\n");
    System.out.println(str);
}
catch (Exception ex) {}
sbis.close();
baos.close();

It results in the following output:

 

14:47:11,406 ERROR [STDERR]
Tidy (vers 4th August 2000) Parsing "InputStream"
14:47:11,421 ERROR [STDERR] line 3 column 1 - Warning: missing </title> before <head>
14:47:11,421 ERROR [STDERR] line 3 column 1 - Warning: <head> isn't allowed in <body> elements
14:47:11,421 ERROR [STDERR] line 3 column 7 - Warning: </head> isn't allowed in <body> elements
14:47:11,421 ERROR [STDERR] line 4 column 1 - Warning: discarding unexpected </title>
14:47:11,421 ERROR [STDERR] line 5 column 1 - Warning: <body> isn't allowed in <body> elements
14:47:11,437 ERROR [STDERR]
InputStream: Document content looks like HTML 3.2
14:47:11,437 ERROR [STDERR] 5 warnings/errors were found!
14:47:11,437 INFO  [STDOUT] This is XHTML>>>>>>>>>>>>
14:47:11,437 INFO  [STDOUT] [EMAIL PROTECTED]
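
A note on the last line of that output: it is probably not Tidy's result at all. Calling toString() on a byte[] returns the array's type and identity hash (a string starting with "[B@", which the list archive appears to have rewritten as "[EMAIL PROTECTED]"), not the bytes it holds. The sketch below uses only the JDK to show the difference, and how the same bytes could be wrapped for a consumer that reads from a stream. The class name TidyOutputDemo and its helper methods are made up for illustration, and captureBytes merely stands in for tidy.parse(sbis, baos). Whether XSLTInputHandler accepts an org.xml.sax.InputSource depends on the FOP version in use, so that part is an assumption to check against its Javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Tidy or FOP.
class TidyOutputDemo {

    // Stand-in for tidy.parse(in, baos): writes the markup to the stream as UTF-8.
    static byte[] captureBytes(String markup) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] raw = markup.getBytes(StandardCharsets.UTF_8);
        baos.write(raw, 0, raw.length);
        return baos.toByteArray();
    }

    // Decodes the captured bytes with the same encoding they were written in.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Wraps the bytes in a SAX InputSource without going through a temporary file.
    static InputSource asInputSource(byte[] bytes) {
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setEncoding("UTF-8");
        return source;
    }

    public static void main(String[] args) {
        byte[] buff = captureBytes("<h3>This is for testing</h3>");
        System.out.println(buff.toString()); // identity string, starts with "[B@"
        System.out.println(decode(buff));    // the actual markup
    }
}
```

In the servlet snippet this would mean replacing buff.toString() with new String(buff, "UTF-8"), or calling baos.toString("UTF-8") directly.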

 

 

 

 


