[ 
https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648562#comment-15648562
 ] 

Tim Allison commented on TIKA-2159:
-----------------------------------

For the general solution, I see two options:

1) Store the stacktrace in the container's metadata, under a key signifying an 
exception when trying to read an embedded stream.
2) Send the exception (along with a zero-length byte stream) through to the 
embedded parser, which then has to check whether there's already been an 
exception.
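
To make the two options concrete, here is a rough sketch in plain Java. It does not use the Tika API: metadata is modeled as a Map, and StreamSupplier, both key names, and both method names are hypothetical stand-ins for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

public class EmbeddedStreamExceptionSketch {

    // Hypothetical key names; these are not actual Tika metadata keys.
    static final String CONTAINER_KEY = "X-sketch:embedded_stream_exception";
    static final String PRE_PARSE_KEY = "X-sketch:pre_parse_exception";

    /** Stand-in for whatever a container parser does to open an embedded stream. */
    interface StreamSupplier {
        InputStream open() throws IOException;
    }

    static String stackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    /** Option 1: record the failure in the container's metadata; nothing reaches the embedded parser. */
    static InputStream openOption1(StreamSupplier supplier, Map<String, String> containerMetadata) {
        try {
            return supplier.open();
        } catch (IOException e) {
            containerMetadata.put(CONTAINER_KEY, stackTrace(e));
            return null; // caller skips the embedded document
        }
    }

    /** Option 2: pass a zero-length stream plus the recorded exception on to the embedded parser. */
    static InputStream openOption2(StreamSupplier supplier, Map<String, String> embeddedMetadata) {
        try {
            return supplier.open();
        } catch (IOException e) {
            // The embedded parser would have to check for this key before parsing.
            embeddedMetadata.put(PRE_PARSE_KEY, stackTrace(e));
            return new ByteArrayInputStream(new byte[0]);
        }
    }

    public static void main(String[] args) {
        StreamSupplier broken = () -> {
            throw new IOException("cannot read embedded stream");
        };

        Map<String, String> containerMetadata = new HashMap<>();
        InputStream first = openOption1(broken, containerMetadata);
        System.out.println("option 1 stream is null: " + (first == null));
        System.out.println("option 1 container has key: " + containerMetadata.containsKey(CONTAINER_KEY));

        Map<String, String> embeddedMetadata = new HashMap<>();
        InputStream second = openOption2(broken, embeddedMetadata);
        System.out.println("option 2 stream is null: " + (second == null));
        System.out.println("option 2 embedded doc has key: " + embeddedMetadata.containsKey(PRE_PARSE_KEY));
    }
}
```

In this sketch the difference is only where the stacktrace lands: on the container's metadata for option 1, on the embedded document's metadata for option 2.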

The second might be a bit more work, but it would more closely align what the 
user sees when there's a ParseException on an embedded object with what they 
see when there's an exception just trying to get the stream, before trying to 
parse the embedded object.

With the first option, the user would have to check for stacktraces in the 
embedded docs (as stored by the RecursiveParserWrapper) _and_ stacktraces 
stored in the container files.
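
As a rough illustration of that burden on the user, here is a sketch in plain Java. It assumes the per-document metadata comes back as a list of maps (loosely modeled on the list that the RecursiveParserWrapper produces); both key names and the collectStackTraces helper are hypothetical, not Tika API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StackTraceCheckSketch {

    // Hypothetical key names; these are not actual Tika metadata keys.
    static final String EMBEDDED_PARSE_KEY = "X-sketch:embedded_parse_exception";
    static final String CONTAINER_STREAM_KEY = "X-sketch:embedded_stream_exception";

    /** Collects every stacktrace stored under any of the given keys, in document order. */
    static List<String> collectStackTraces(List<Map<String, String>> docs, String... keys) {
        List<String> traces = new ArrayList<>();
        for (Map<String, String> metadata : docs) {
            for (String key : keys) {
                String trace = metadata.get(key);
                if (trace != null) {
                    traces.add(trace);
                }
            }
        }
        return traces;
    }

    public static void main(String[] args) {
        // Container whose second attachment could not even be opened.
        Map<String, String> container = new HashMap<>();
        container.put(CONTAINER_STREAM_KEY, "java.io.IOException: cannot read embedded stream");

        // First attachment was opened, but its parser threw.
        Map<String, String> attachment = new HashMap<>();
        attachment.put(EMBEDDED_PARSE_KEY, "org.example.ParseFailure: bad header");

        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(container);
        docs.add(attachment);

        // With the first option, a complete check has to look under both keys.
        System.out.println(collectStackTraces(docs, EMBEDDED_PARSE_KEY, CONTAINER_STREAM_KEY).size());
        // Looking only at the embedded-parse key misses the pre-parse failure.
        System.out.println(collectStackTraces(docs, EMBEDDED_PARSE_KEY).size());
    }
}
```

With the second option, the pre-parse failure would sit on the embedded document's own metadata, so one lookup per document would be enough.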

Preferences or other options?

> Handle pre-parse embedded object exceptions uniformly and more robustly
> -----------------------------------------------------------------------
>
>                 Key: TIKA-2159
>                 URL: https://issues.apache.org/jira/browse/TIKA-2159
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
>
> When an embedded document is parsed and causes an exception, we're currently 
> catching that and swallowing it in ParsingEmbeddedDocumentExtractor (the 
> default) or reporting it in the RecursiveParserWrapper by storing the 
> stacktrace in the Metadata of the embedded document.
> However, if there's an exception during detection on the embedded stream or 
> on getting the stream _before_ the stream hits the parser, we aren't handling 
> that uniformly or robustly across parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
