Hi all,

I've checked on the same corpus. Here's the comparison:

||Tika||POI||PDFBox||Failed docs||
|1.4|3.9|1.8.1|92|
|1.5|3.10-beta2|1.8.4|182|

========================== TIKA 1.4 ========================================
- pdf (7)
  * (1) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@4d39a96c
  * (3) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$1@4d39a96c
  * (3) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unable to extract PDF content
- pptx (8)
  * (7) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Error creating OOXML extractor
  * (1) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@4db190a5
- doc (2)
  * (2) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
- ppt (40)
  * (39) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
  * (1) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
- xls (9)
  * (7) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
  * (2) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
- dwg (4)
  * (4) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: AC1014
- odp (2)
  * (2) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$1@7286f080
- rtf (13)
  * (13) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@455a7af4
- pps (5)
  * (5) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@6ddd7ea2

========================== TIKA 1.5 ========================================
- pdf (16)
  * (10) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@1e59efa5
  * (3) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$1@1e59efa5
  * (3) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unable to extract PDF content
- pptx (19)
  * (7) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Error creating OOXML extractor
  * (12) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@2b195ebd
- doc (11)
  * (9) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@7b796022
  * (2) com.polyspot.document.converter.ConversionException: org.xml.sax.SAXException: Namespace http://www.w3.org/1999/xhtml not declared
- ppt (47)
  * (46) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@7b796022
  * (1) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$1@7b796022
- xls (9)
  * (7) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@7b796022
  * (2) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$1@7b796022
- xlsx (28)
  * (28) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@2b195ebd
- dwg (4)
  * (4) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: AC1014
- odp (2)
  * (2) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.ParserDecorator$1@3dc15f75
- rtf (39)
  * (35) com.polyspot.document.converter.ConversionException: org.xml.sax.SAXException: Namespace http://www.w3.org/1999/xhtml not declared
  * (4) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@101e1163
- pps (7)
  * (7) com.polyspot.document.converter.ConversionException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@7b796022

Regards,
Hong-Thai
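
P.S. In case anyone wants to produce the same kind of breakdown on their own corpus, here is a minimal sketch of how failures could be tallied per file extension and exception message. It is only an illustration, not the tool used for the numbers above: the class name `FailureTally`, its methods, and the sample file names are all hypothetical, and it assumes each failure is available as a (file name, exception message) pair. The `@4d39a96c`-style suffix is stripped because it is an object identity hash that changes between runs and would otherwise split identical errors into separate buckets.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups conversion failures by extension and message.
final class FailureTally {

    // Drops identity-hash suffixes such as "@4d39a96c" so that
    // "ParserDecorator$1@4d39a96c" and "ParserDecorator$1@6ddd7ea2" compare equal.
    static String normalize(String message) {
        return message.replaceAll("@[0-9a-f]{1,8}\\b", "");
    }

    // Returns the lower-cased extension, or "" when the name has no dot.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Each failure is assumed to be {fileName, exceptionMessage}.
    static Map<String, Map<String, Integer>> tally(List<String[]> failures) {
        Map<String, Map<String, Integer>> byExtension = new TreeMap<>();
        for (String[] failure : failures) {
            String ext = extension(failure[0]);
            String msg = normalize(failure[1]);
            byExtension
                .computeIfAbsent(ext, k -> new LinkedHashMap<>())
                .merge(msg, 1, Integer::sum);
        }
        return byExtension;
    }

    public static void main(String[] args) {
        // Made-up sample input; real input would come from the conversion run.
        List<String[]> failures = Arrays.asList(
            new String[] {"slides1.ppt",
                "Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@6ddd7ea2"},
            new String[] {"slides2.ppt",
                "Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@7b796022"},
            new String[] {"letter.rtf",
                "Namespace http://www.w3.org/1999/xhtml not declared"});

        // Print in the same "- ext (total)" / "* (count) message" layout as the report.
        tally(failures).forEach((ext, messages) -> {
            int total = messages.values().stream().mapToInt(Integer::intValue).sum();
            System.out.println("- " + ext + " (" + total + ")");
            messages.forEach((msg, n) -> System.out.println("  * (" + n + ") " + msg));
        });
    }
}
```

With the hash suffix normalized away, the 1.4 and 1.5 listings can be compared bucket by bucket rather than by eye.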