Hello,
I'm trying to parse a file whose content is UTF-16 encoded, but I'm unable to
parse the document using the following code. Please help me.
ContentHandler textHandler = new BodyContentHandler();
TeeContentHandler teeHandler = new TeeContentHandler(textHandler);
parser.parse(input, teeHandler, metadata, context);
String tt = textHandler.toString();
// Print the extracted text
byte[] converttoBytes = tt.getBytes("UTF-16");
String string = new String(converttoBytes, "utf-8");
System.out.println(string);
However, the output contains the text along with all of the HTML tags.
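
A side note on the conversion step in the code above, separate from the HTML tags question: a Java String is already decoded text, so encoding it with one charset and decoding the resulting bytes with a different one does not convert anything and generally corrupts the text. The following is only a minimal sketch using the JDK alone; the class name, the method names and the sample string "Hello" are made up for illustration and are not part of Tika.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Mirrors the conversion in the post: encode as UTF-16,
    // then decode those same bytes as if they were UTF-8.
    public static String mismatched(String text) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16Bytes, StandardCharsets.UTF_8);
    }

    // Encode and decode with the same charset, so the text survives.
    public static String matched(String text) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16Bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String text = "Hello"; // hypothetical sample, not from the original file

        // The mismatched round trip no longer equals the input.
        System.out.println(text.equals(mismatched(text))); // false

        // The matched round trip returns the input unchanged.
        System.out.println(text.equals(matched(text)));    // true
    }
}
```

If this holds for the real document as well, printing textHandler.toString() directly, without the getBytes/new String step, should be enough to inspect the extracted text.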
Thank you,
Rajesh Chejerla
--
View this message in context:
http://lucene.472066.n3.nabble.com/AutoDetectParser-is-not-parsing-UTF-16-content-types-tp4004075.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.