Hi all,

I have the content of a URL in a string, i.e. the string contains HTML code. I want to pass it to W3C Tidy, clean it up, and store the result in a byte array. I tried the following code snippet, but it shows error messages and outputs some junk data. When I used a FileOutputStream and created a temporary file instead of a byte array, it worked fine. I also want to pass the byte array content to XSLTInputHandler directly. How is that possible?

The following is my HTML code:

    <html>
    <title>
    <head></head>
    </title>
    <body>
    <h3>This is for testing</h3>
    </body>
    </html>

Servlet code
snippet:

    String pageContent; // string pageContent contains the html content
    StringBufferInputStream sbis = new StringBufferInputStream(pageContent);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    Tidy tidy = new Tidy();
    try {
        tidy.setXmlOut(true);
        tidy.setXHTML(false);
        tidy.setMakeClean(true);
        tidy.setTidyMark(false);
        tidy.setUpperCaseTags(false);
        tidy.setUpperCaseAttrs(false);
        tidy.setQuoteAmpersand(false);
        tidy.setNumEntities(true);
        tidy.setCharEncoding(Configuration.UTF8);
        tidy.parse(sbis, baos);
        byte[] buff = new byte[2048];
        buff = baos.toByteArray();
        str = (String) buff.toString();
        System.out.println("This is XHTML>>>>>>>>>>>>\n");
        System.out.println(str);
    } catch (Exception ex) {}
    sbis.close();
    baos.close();

It results in the following output:

    4:47:11,406 ERROR [STDERR] Tidy (vers
    14:47:11,421 ERROR [STDERR] line 3 column 1 - Warning: missing </title> before <head>
    14:47:11,421 ERROR [STDERR] line 3 column 1 - Warning: <head> isn't allowed in <body> elements
    14:47:11,421 ERROR [STDERR] line 3 column 7 - Warning: </head> isn't allowed in <body> elements
    14:47:11,421 ERROR [STDERR] line 4 column 1 - Warning: discarding unexpected </title>
    14:47:11,421 ERROR [STDERR] line 5 column 1 - Warning: <body> isn't allowed in <body> elements
    14:47:11,437 ERROR [STDERR] InputStream: Document content looks like HTML 3.2
    14:47:11,437 ERROR [STDERR] 5 warnings/errors were found!
    14:47:11,437 INFO [STDOUT] This is XHTML>>>>>>>>>>>>
    14:47:11,437 INFO [STDOUT] [EMAIL PROTECTED]
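
A possible explanation for the junk line, offered as a guess since the archive has redacted that part of the output: in Java, calling toString() on a byte[] does not decode its contents. It returns the array's type and identity hash (something like "[B@" followed by hex digits). The Tidy messages themselves are only warnings about the sample HTML, where <head> is nested inside <title>. Below is a minimal sketch, using only the standard library, of decoding the bytes explicitly and of handing the same bytes to an XML consumer through a ByteArrayInputStream. The class and method names (TidyBytesSketch, decode, transform) are hypothetical, the baos.write call merely stands in for tidy.parse(in, baos), and the JAXP identity transform is a stand-in for an XSLT step, not the XSLTInputHandler API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TidyBytesSketch {

    // Decode the collected bytes as UTF-8 text.
    // byte[].toString() would only return something like "[B@1a2b3c".
    public static String decode(ByteArrayOutputStream baos) {
        byte[] buff = baos.toByteArray();
        return new String(buff, StandardCharsets.UTF_8);
    }

    // Feed in-memory XML to a JAXP identity transform without a temp file.
    // Assumption: this stands in for whatever XSLT step consumes the bytes.
    public static String transform(byte[] xml) {
        try {
            InputStream in = new ByteArrayInputStream(xml);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(in), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, already well-formed input for the demonstration.
        String pageContent =
            "<html><head><title>t</title></head>"
            + "<body><h3>This is for testing</h3></body></html>";
        byte[] input = pageContent.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Stand-in for: tidy.parse(new ByteArrayInputStream(input), baos);
        baos.write(input, 0, input.length);

        System.out.println(decode(baos));
        System.out.println(transform(baos.toByteArray()));
    }
}
```

In the original snippet, the equivalent change would be to build the input with new ByteArrayInputStream(pageContent.getBytes("UTF-8")) instead of the deprecated StringBufferInputStream, and to replace buff.toString() with new String(buff, "UTF-8"). Logging the exception in the catch block, rather than leaving it empty, would also make real failures visible.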
- How to store the parsing output of tidy into a ByteArray Eldho George