[ 
https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652870#comment-15652870
 ] 

Tim Allison commented on TIKA-2159:
-----------------------------------

bq. ParsingEmbeddedDocumentExtractor already has some non-ideal error handling 
bits, so writing some special keys onto the container might allow us to tidy 
some bits of that up too if we do #1

I thought this would work easily, but, of course, all we see in 
extractEmbedded is the child document's metadata, so there is no handle on 
the container's metadata to write those keys onto. Plan B: add an (optional 
for now) parent metadata object to the constructor of 
ParsingEmbeddedDocumentExtractor.
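
A rough sketch of what Plan B could look like. It uses hypothetical stand-in 
classes (SketchMetadata, SketchParser, SketchEmbeddedExtractor) and a made-up 
key name instead of Tika's real API, so treat it as an illustration of the 
idea and not as the actual patch:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metadata object: multi-valued string keys.
class SketchMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    List<String> getValues(String key) {
        return values.getOrDefault(key, new ArrayList<>());
    }
}

// Hypothetical stand-in for whatever parses the embedded stream.
interface SketchParser {
    void parse(InputStream stream, SketchMetadata childMetadata) throws IOException;
}

// Hypothetical extractor showing the optional parent metadata in the constructor.
class SketchEmbeddedExtractor {
    // Assumed key name, for illustration only.
    static final String EMBEDDED_EXCEPTION_KEY = "X-SKETCH:embedded_exception";

    private final SketchParser parser;
    private final SketchMetadata parentMetadata; // null when no parent was supplied

    // Existing-style constructor: behaves as before, exceptions are swallowed.
    SketchEmbeddedExtractor(SketchParser parser) {
        this(parser, null);
    }

    // New optional argument: the container's metadata.
    SketchEmbeddedExtractor(SketchParser parser, SketchMetadata parentMetadata) {
        this.parser = parser;
        this.parentMetadata = parentMetadata;
    }

    void extractEmbedded(InputStream stream, SketchMetadata childMetadata) {
        try {
            parser.parse(stream, childMetadata);
        } catch (IOException | RuntimeException e) {
            // Only the child metadata is passed in here, so the parent has to
            // come from the constructor if we want to record the failure on it.
            if (parentMetadata != null) {
                parentMetadata.add(EMBEDDED_EXCEPTION_KEY, e.toString());
            }
        }
    }
}
```

Keeping the one-argument constructor means existing callers are unaffected; 
only callers that pass a parent object would see the extra key.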

> Handle pre-parse embedded object exceptions uniformly and more robustly
> -----------------------------------------------------------------------
>
>                 Key: TIKA-2159
>                 URL: https://issues.apache.org/jira/browse/TIKA-2159
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
>
> When an embedded document is parsed and causes an exception, we're currently 
> catching that and swallowing it in ParsingEmbeddedDocumentExtractor (the 
> default) or reporting it in the RecursiveParserWrapper by storing the 
> stacktrace in the Metadata of the embedded document.
> However, if there's an exception during detection on the embedded stream or 
> while getting the stream _before_ the stream hits the parser, we aren't 
> handling that uniformly or robustly across parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
