Hi all,

I've run both Tika versions on the same corpus. Here's the comparison:
||Tika||POI||PDFBox||Failed docs||
|1.4|3.9|1.8.1|92|
|1.5|3.10-beta2|1.8.4|182|

========================== TIKA 1.4 ========================================
  - pdf (7)
      * (1) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@4d39a96c
      * (3) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
            org.apache.tika.parser.ParserDecorator$1@4d39a96c
      * (3) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unable to extract PDF content
  - pptx (8)
      * (7) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Error creating OOXML extractor
      * (1) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@4db190a5
  - doc (2)
      * (2) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
  - ppt (40)
      * (39) com.polyspot.document.converter.ConversionException:
             org.apache.tika.exception.TikaException: Unexpected RuntimeException from
             org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
      * (1) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
            org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
  - xls (9)
      * (7) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
      * (2) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
            org.apache.tika.parser.ParserDecorator$1@6ddd7ea2
  - dwg (4)
      * (4) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: AC1014
  - odp (2)
      * (2) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
            org.apache.tika.parser.ParserDecorator$1@7286f080
  - rtf (13)
      * (13) com.polyspot.document.converter.ConversionException:
             org.apache.tika.exception.TikaException: Unexpected RuntimeException from
             org.apache.tika.parser.ParserDecorator$1@455a7af4
  - pps (5)
      * (5) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@6ddd7ea2

========================== TIKA 1.5 ========================================
  - pdf (16)
      * (10) com.polyspot.document.converter.ConversionException:
             org.apache.tika.exception.TikaException: Unexpected RuntimeException from
             org.apache.tika.parser.ParserDecorator$1@1e59efa5
      * (3) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
            org.apache.tika.parser.ParserDecorator$1@1e59efa5
      * (3) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unable to extract PDF content
  - pptx (19)
      * (7) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Error creating OOXML extractor
      * (12) com.polyspot.document.converter.ConversionException:
             org.apache.tika.exception.TikaException: Unexpected RuntimeException from
             org.apache.tika.parser.ParserDecorator$1@2b195ebd
  - doc (11)
      * (9) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@7b796022
      * (2) com.polyspot.document.converter.ConversionException:
            org.xml.sax.SAXException: Namespace http://www.w3.org/1999/xhtml not declared
  - ppt (47)
      * (46) com.polyspot.document.converter.ConversionException:
             org.apache.tika.exception.TikaException: Unexpected RuntimeException from
             org.apache.tika.parser.ParserDecorator$1@7b796022
      * (1) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
            org.apache.tika.parser.ParserDecorator$1@7b796022
  - xls (9)
      * (7) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@7b796022
      * (2) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
            org.apache.tika.parser.ParserDecorator$1@7b796022
  - xlsx (28)
      * (28) com.polyspot.document.converter.ConversionException:
             org.apache.tika.exception.TikaException: Unexpected RuntimeException from
             org.apache.tika.parser.ParserDecorator$1@2b195ebd
  - dwg (4)
      * (4) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: AC1014
  - odp (2)
      * (2) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
            org.apache.tika.parser.ParserDecorator$1@3dc15f75
  - rtf (39)
      * (35) com.polyspot.document.converter.ConversionException:
             org.xml.sax.SAXException: Namespace http://www.w3.org/1999/xhtml not declared
      * (4) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@101e1163
  - pps (7)
      * (7) com.polyspot.document.converter.ConversionException:
            org.apache.tika.exception.TikaException: Unexpected RuntimeException from
            org.apache.tika.parser.ParserDecorator$1@7b796022
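To see at a glance which formats regressed, here is a small Java sketch that diffs the per-format failure counts copied from the two listings above. The class and method names (FailureDelta, delta, total) are just for illustration; they are not part of Tika or of our converter. A format that doesn't appear in one run is counted as 0 failures there.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: per-format failure counts from the two runs and their difference. */
public class FailureDelta {

    // Failed-document counts per extension, copied from the Tika 1.4 listing.
    static final Map<String, Integer> TIKA_1_4 = Map.of(
            "pdf", 7, "pptx", 8, "doc", 2, "ppt", 40, "xls", 9,
            "dwg", 4, "odp", 2, "rtf", 13, "pps", 5);

    // Failed-document counts per extension, copied from the Tika 1.5 listing.
    static final Map<String, Integer> TIKA_1_5 = Map.ofEntries(
            Map.entry("pdf", 16), Map.entry("pptx", 19), Map.entry("doc", 11),
            Map.entry("ppt", 47), Map.entry("xls", 9), Map.entry("xlsx", 28),
            Map.entry("dwg", 4), Map.entry("odp", 2), Map.entry("rtf", 39),
            Map.entry("pps", 7));

    /** Returns after-minus-before per extension, sorted by extension; missing formats count as 0. */
    static Map<String, Integer> delta(Map<String, Integer> before, Map<String, Integer> after) {
        Set<String> formats = new TreeSet<>(before.keySet()); // union of both runs, sorted
        formats.addAll(after.keySet());
        Map<String, Integer> result = new LinkedHashMap<>();   // keeps the sorted order
        for (String ext : formats) {
            result.put(ext, after.getOrDefault(ext, 0) - before.getOrDefault(ext, 0));
        }
        return result;
    }

    /** Sums the per-format counts of one run. */
    static int total(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        delta(TIKA_1_4, TIKA_1_5).forEach((ext, d) -> System.out.printf("%-5s %+d%n", ext, d));
        System.out.println("sum of 1.4 breakdown: " + total(TIKA_1_4));
        System.out.println("sum of 1.5 breakdown: " + total(TIKA_1_5));
    }
}
```

The biggest regressions are xlsx (+28, none failed with 1.4), rtf (+26) and pptx (+11); xls, dwg and odp are unchanged. Summing the breakdowns gives 182 for 1.5, which matches the table, but 90 for 1.4 against the 92 in the table, so two of the 1.4 failures seem to be missing from the breakdown.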


Regards,

Hong-Thai
