Have you tried using the PDFParser [1] and the OfficeParser [2]
classes from Tika?
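
In case it helps, here is a minimal sketch of calling the parse method linked in [1] and [2]. It assumes the tika-parsers jar and its dependencies are on the classpath. The class name ExtractText and the helper parserFor are made up for illustration, and choosing the parser by file extension is only a shortcut. As far as I know, OfficeParser covers the older binary formats (.doc, .xls, .ppt), so treat the non-PDF branch as an assumption about your input.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.microsoft.OfficeParser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

public class ExtractText {

    // Hypothetical helper: pick a parser from the file extension.
    // Anything that is not a .pdf is assumed to be a binary Office file.
    static Parser parserFor(String path) {
        return path.toLowerCase().endsWith(".pdf") ? new PDFParser() : new OfficeParser();
    }

    // Parse the file and return the plain text of its body.
    static String extract(String path) throws Exception {
        // -1 disables the default limit on the number of characters collected.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        try (InputStream in = new FileInputStream(path)) {
            parserFor(path).parse(in, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ExtractText <path-to-file>
        System.out.println(extract(args[0]));
    }
}
```

The returned string could then go into a Lucene text field, and the Metadata object holds whatever document properties the parser found.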

This question is probably better suited to the Tika user mailing list [3].

[1] http://tika.apache.org/1.3/api/org/apache/tika/parser/pdf/PDFParser.html#parse(java.io.InputStream, org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata, org.apache.tika.parser.ParseContext)
[2] http://tika.apache.org/1.3/api/org/apache/tika/parser/microsoft/OfficeParser.html#parse(java.io.InputStream, org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata, org.apache.tika.parser.ParseContext)
[3] http://tika.apache.org/mail-lists.html

-- 
Adrien

