[ https://issues.apache.org/jira/browse/TIKA-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584249#comment-13584249 ]

Michael McCandless commented on TIKA-1074:
------------------------------------------

{quote}
bq. InterruptedException is never thrown in these places today, so I can't add 
the separate catch clause (compiler is angry).

It's a checked exception, so if it isn't declared to be thrown by POI, it 
shouldn't get thrown here (even though the VM doesn't strictly prohibit that).
{quote}

Exactly: I'm trying to future-proof.

bq. So in that case the extra check shouldn't even be needed.

Wait, do you mean I should remove the handling entirely (and not bother 
future-proofing)?

{quote}
bq. I think it's cleaner to set the interrupt bit and let the next place that 
waits see the interrupt bit and throw IE?

I don't really like this approach. We're essentially saying: "Yes, you asked me 
to stop what I'm doing, but instead I'll just finish up what I was doing and 
ask the next guy to stop." Instead, when receiving an IE I'd prefer Tika to 
stop immediately, either by letting the IE bubble up or (where necessary) by 
throwing a TikaException that wraps the IE.
{quote}
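For reference, the "set the interrupt bit" approach relies on standard java.lang.Thread behaviour: a blocking call such as Thread.sleep throws InterruptedException immediately if the interrupt status is already set, and clears the status as it throws. A minimal sketch (the class and method names are hypothetical, not from Tika or POI):

```java
// Sketch of the "deferred interrupt" approach quoted above: code that cannot
// handle the interrupt re-asserts the bit, and the next blocking call throws.
final class DeferredInterruptDemo {

    /** Returns true if the next blocking call observed the re-asserted interrupt. */
    static boolean nextWaitSeesInterrupt() {
        // Simulate earlier code that caught an interrupt but restored the bit.
        Thread.currentThread().interrupt();
        try {
            // Would block for 10 seconds if the interrupt status were not set.
            Thread.sleep(10_000);
            return false;
        } catch (InterruptedException e) {
            // sleep() threw right away; it also cleared the interrupt status.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nextWaitSeesInterrupt()); // prints "true"
    }
}
```

This also illustrates the objection above: nothing stops the current work between re-setting the bit and the next blocking call.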

OK, maybe we can throw TikaException today (*and* set the interrupt
bit), and then in the future (if/when these places really do throw
IE), we can change this to throw an IE instead of TikaException. I
can add that as a TODO.
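A minimal sketch of that compromise, not the actual patch. It assumes a hypothetical TikaExceptionStandIn in place of Tika's real TikaException (so the block compiles without Tika on the classpath) and a hypothetical runStage helper around each unit of work. Catching Exception and testing with instanceof sidesteps the compiler error mentioned above, since javac rejects a dedicated catch (InterruptedException) clause when the try body cannot throw it.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for org.apache.tika.exception.TikaException.
class TikaExceptionStandIn extends Exception {
    TikaExceptionStandIn(String message, Throwable cause) {
        super(message, cause);
    }
}

final class InterruptHandling {

    // Runs one (hypothetical) unit of parsing work and converts failures.
    static <T> T runStage(Callable<T> stage) throws TikaExceptionStandIn {
        try {
            return stage.call();
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                // Restore the interrupt status so callers higher up still see it...
                Thread.currentThread().interrupt();
                // ...and stop now rather than finishing the current work.
                // TODO: throw InterruptedException directly once the underlying calls declare it.
                throw new TikaExceptionStandIn("interrupted", e);
            }
            throw new TikaExceptionStandIn("stage failed", e);
        }
    }
}
```

For example, `runStage(() -> { throw new InterruptedException(); })` throws a TikaExceptionStandIn whose cause is the InterruptedException, and leaves the calling thread's interrupt bit set.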

                
> Extraction should continue if an exception is hit visiting an embedded document
> -------------------------------------------------------------------------------
>
>                 Key: TIKA-1074
>                 URL: https://issues.apache.org/jira/browse/TIKA-1074
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 1.4
>
>         Attachments: TIKA-1074.patch, TIKA-1074.patch
>
>
> Spinoff from TIKA-1072.
> In that issue, a problematic document (still not sure whether the document 
> is corrupt or this is a POI bug) caused an exception when visiting the 
> embedded documents.
> If I change Tika to suppress that exception, the rest of the document 
> extracts fine.
> So I think we should be more robust here: maybe log the exception, or 
> save/record the exception(s) somewhere so that after parsing the app can 
> decide what to do about them.
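A minimal sketch of the idea in the description, not Tika's actual EmbeddedDocumentExtractor API: EmbeddedHandler and TolerantEmbeddedExtractor are hypothetical names. A failure in one embedded document is recorded and the loop moves on; an InterruptedException, following the discussion above, stops the loop and restores the interrupt bit instead of being recorded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical callback that processes one embedded document by name.
interface EmbeddedHandler {
    void handle(String name) throws Exception;
}

final class TolerantEmbeddedExtractor {
    private final List<Exception> recorded = new ArrayList<>();

    /** Visits each embedded document; returns how many were handled without error. */
    int visitAll(List<String> embeddedNames, EmbeddedHandler handler) {
        int succeeded = 0;
        for (String name : embeddedNames) {
            try {
                handler.handle(name);
                succeeded++;
            } catch (InterruptedException e) {
                // An interrupt means "stop": restore the bit and quit immediately.
                Thread.currentThread().interrupt();
                return succeeded;
            } catch (Exception e) {
                // Record the failure and keep going so the rest still extracts.
                recorded.add(e);
            }
        }
        return succeeded;
    }

    /** Failures seen while visiting, for the application to inspect after parsing. */
    List<Exception> getRecordedExceptions() {
        return Collections.unmodifiableList(recorded);
    }
}
```

Recording rather than logging leaves the decision (ignore, warn, or fail the whole document) to the application, as the description suggests.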

