[ 
https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754611#comment-15754611
 ] 

Tim Allison commented on TIKA-2208:
-----------------------------------

bq. I'm saying it's better because previously we were rejecting the document 
entirely.

Really?  Wow. 

I'll go ahead with 1) above and add the test file to POI some time today so 
that the extra classes at least from this test file will be included in the 
next release of POI's ooxml-schemas.

If you have any opinion on 2), let us know.  [~gagravarr], your thoughts?

Finally, I'm sorry I was slow to respond to this issue.  Thank you, Nick, for 
the ping.  Nice doing work with you and ES! 

Cheers!

> Catch missing libraires
> -----------------------
>
>                 Key: TIKA-2208
>                 URL: https://issues.apache.org/jira/browse/TIKA-2208
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: David Pilato
>
> Hi there
> We have decided to remove support for some formats when using Tika to extract 
> text and metadata.
> We defined our list of Parsers:
> {code:java}
>     private static final Parser PARSERS[] = new Parser[] {
>         // documents
>         new org.apache.tika.parser.html.HtmlParser(),
>         new org.apache.tika.parser.rtf.RTFParser(),
>         new org.apache.tika.parser.pdf.PDFParser(),
>         new org.apache.tika.parser.txt.TXTParser(),
>         new org.apache.tika.parser.microsoft.OfficeParser(),
>         new org.apache.tika.parser.microsoft.OldExcelParser(),
>         new org.apache.tika.parser.microsoft.ooxml.OOXMLParser(),
>         new org.apache.tika.parser.odf.OpenDocumentParser(),
>         new org.apache.tika.parser.iwork.IWorkPackageParser(),
>         new org.apache.tika.parser.xml.DcXMLParser(),
>         new org.apache.tika.parser.epub.EpubParser(),
>     };
>     private static final AutoDetectParser PARSER_INSTANCE = new 
> AutoDetectParser(PARSERS);
>     private static final Tika TIKA_INSTANCE = new 
> Tika(PARSER_INSTANCE.getDetector(), PARSER_INSTANCE);
> {code}
> But when a MS Office Word document embeds another non supported document 
> (Like a Visio Schema) an {{NoClassDefFoundError}} is raised.
> Would it be possible to catch such a case and throw in that case a 
> {{TikaException}} so it behaves as an Exception and not as a Throwable?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to