AutoDetectParser is not parsing UTF-16 content types

chraj007 Wed, 29 Aug 2012 08:56:27 -0700

Hello,
   I'm trying to parse a file whose content type is UTF-16. I'm unable to
parse the document using the following code. Please help me.


        ContentHandler textHandler = new BodyContentHandler();
        TeeContentHandler teeHandler = new TeeContentHandler(textHandler);
        parser.parse(input, teeHandler, metadata, context);
        String tt = textHandler.toString();

        // to print the text
        byte[] converttoBytes = tt.getBytes("UTF-16");
        String string = new String(converttoBytes, "utf-8");
        System.out.println(string);

But it is printing the text along with all the HTML tags.

Thank You,
Rajesh Chejerla



--
View this message in context: 
http://lucene.472066.n3.nabble.com/AutoDetectParser-is-not-parsing-UTF-16-content-types-tp4004075.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
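
A note on the printing step in the code above: textHandler.toString()
already returns a decoded Java String, so encoding it as UTF-16 and then
decoding those bytes as UTF-8 can only garble it. The sketch below uses only
the JDK to show the effect. The class name Utf16RoundTrip, its two helper
methods, and the sample text are hypothetical stand-ins, and no Tika call is
involved.

```java
import java.nio.charset.StandardCharsets;

public class Utf16RoundTrip {

    // Mirrors the printing step in the post: encode as UTF-16,
    // then decode those same bytes as UTF-8 (mismatched charsets).
    static String mismatched(String text) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16Bytes, StandardCharsets.UTF_8);
    }

    // Encode and decode with the same charset, so the text survives unchanged.
    static String matched(String text) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16Bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the value of textHandler.toString().
        String extracted = "Hello Tika";

        System.out.println(extracted.equals(mismatched(extracted))); // false
        System.out.println(extracted.equals(matched(extracted)));    // true

        // Printing the String directly needs no byte conversion at all.
        System.out.println(extracted);
    }
}
```

If this applies to the code in the post, printing tt directly with
System.out.println(tt) avoids the corruption. It does not by itself explain
the HTML tags in the output; that may be a separate detection issue, which
this message does not confirm.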
