[https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648562#comment-15648562]
Tim Allison edited comment on TIKA-2159 at 11/9/16 5:30 PM:
------------------------------------------------------------
For the general solution, I see two options:
1) store the stacktrace in the container's metadata with a key signifying an
exception when trying to read an embedded stream (perhaps
{{TikaCoreProperties.TIKA_META_EXCEPTION_WARNING}}?)
2) send the exception (along with a zero-length bytestream) through to the
embedded parser which then has to check to see if there's already been an
exception.
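A minimal Java sketch of option 1, written against plain {{java.util}}/{{java.io}} types instead of Tika's {{Metadata}} and embedded-document classes so that it stands alone. The {{EmbeddedSource}} interface, the {{extractAll}} helper and the {{hypothetical:embedded-stream-exception}} key are all assumptions made for illustration. A real patch would presumably use a {{TikaCoreProperties}} key like the one suggested above. The point is only that a failure to *open* an embedded stream is recorded on the container, not on a child document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Option1Sketch {
    // Hypothetical key; a stand-in for whichever metadata key Tika would use.
    static final String EMBEDDED_STREAM_EXCEPTION = "hypothetical:embedded-stream-exception";

    // Hypothetical handle on one embedded entry (zip entry, OLE2 stream, ...).
    interface EmbeddedSource {
        InputStream open() throws IOException;
    }

    static String stackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    // Stand-in for handing the stream to the embedded parser.
    static void parseEmbedded(InputStream in) throws IOException {
        in.readAllBytes();
    }

    // Returns how many embedded streams could be opened. Failures to open a
    // stream are recorded on the container's metadata (option 1).
    static int extractAll(List<EmbeddedSource> sources, Map<String, String> containerMetadata) {
        int opened = 0;
        for (EmbeddedSource src : sources) {
            InputStream in;
            try {
                in = src.open();
            } catch (IOException e) {
                // Append rather than overwrite, in case several entries fail.
                containerMetadata.merge(EMBEDDED_STREAM_EXCEPTION, stackTrace(e),
                        (prev, next) -> prev + "\n" + next);
                continue;
            }
            opened++;
            try (InputStream toParse = in) {
                parseEmbedded(toParse);
            } catch (IOException e) {
                // Parse-time failures are out of scope for this sketch.
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        Map<String, String> container = new HashMap<>();
        List<EmbeddedSource> sources = List.of(
                () -> new ByteArrayInputStream(new byte[] {1, 2, 3}),
                () -> { throw new IOException("corrupt entry"); });
        System.out.println(extractAll(sources, container)); // prints 1
        System.out.println(container.get(EMBEDDED_STREAM_EXCEPTION).contains("corrupt entry")); // prints true
    }
}
```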
The second might be a bit more work, but it would more closely align what the
user sees when there's a ParseException on an embedded object with what the
user sees when there's an exception just trying to get the stream, before the
embedded object is ever parsed.
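For comparison, a minimal sketch of option 2 under the same assumptions: plain Java types, with a hypothetical {{EmbeddedSource}}, a hypothetical {{parseEmbedded}} entry point and a hypothetical {{hypothetical:embedded-exception}} key, none of which are Tika's actual API. When opening the stream fails, the extractor still calls the embedded handler, passing a zero-length stream plus the exception. The handler records that exception in the child's metadata under the same key it uses for an ordinary parse failure.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Option2Sketch {
    // Hypothetical key used for BOTH kinds of failure, so the user finds
    // them in one place: the embedded document's own metadata.
    static final String EMBEDDED_EXCEPTION = "hypothetical:embedded-exception";

    // Hypothetical handle on one embedded entry.
    interface EmbeddedSource {
        InputStream open() throws IOException;
    }

    static String stackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    // Hypothetical embedded-parser entry point. It must first check whether
    // an exception was already raised before it received the stream.
    static Map<String, String> parseEmbedded(InputStream in, IOException preParse) {
        Map<String, String> child = new HashMap<>();
        if (preParse != null) {
            child.put(EMBEDDED_EXCEPTION, stackTrace(preParse));
            return child;
        }
        try {
            // Stand-in for real parsing: just count the bytes.
            child.put("hypothetical:bytes", Integer.toString(in.readAllBytes().length));
        } catch (IOException e) {
            child.put(EMBEDDED_EXCEPTION, stackTrace(e)); // ordinary parse failure
        }
        return child;
    }

    static List<Map<String, String>> extractAll(List<EmbeddedSource> sources) {
        List<Map<String, String>> children = new ArrayList<>();
        for (EmbeddedSource src : sources) {
            InputStream in;
            IOException preParse = null;
            try {
                in = src.open();
            } catch (IOException e) {
                in = new ByteArrayInputStream(new byte[0]); // zero-length stand-in
                preParse = e;
            }
            try (InputStream toParse = in) {
                children.add(parseEmbedded(toParse, preParse));
            } catch (IOException closeFailure) {
                // Only close() can throw here; ignored in this sketch.
            }
        }
        return children;
    }

    public static void main(String[] args) {
        List<EmbeddedSource> sources = List.of(
                () -> new ByteArrayInputStream(new byte[] {1, 2, 3}),
                () -> { throw new IOException("corrupt entry"); });
        List<Map<String, String>> children = extractAll(sources);
        System.out.println(children.get(0).get("hypothetical:bytes")); // prints 3
        System.out.println(children.get(1).containsKey(EMBEDDED_EXCEPTION)); // prints true
    }
}
```

The extra work shows up in {{parseEmbedded}}: every embedded entry point has to check for an earlier exception before touching the stream. In exchange, each embedded entry gets its own metadata record, even when its stream could not be read.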
The first option would be far simpler to code, but the user would have to
check for stacktraces in the embedded docs (as stored by the
RecursiveParserWrapper) _and_ stacktraces stored in the container files. Not
too much work...
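To illustrate the user-side cost of option 1, here is a small sketch that collects every recorded failure from a list of per-document metadata maps. The list is assumed to mimic RecursiveParserWrapper's output (one entry per document, container first), with plain {{Map}}s standing in for {{Metadata}}. Both keys are hypothetical placeholders, not the names Tika actually uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollectEmbeddedFailures {
    // Hypothetical keys: one on each embedded document for parse failures,
    // one on the container for failures to even read an embedded stream.
    static final String EMBEDDED_PARSE_EXCEPTION = "hypothetical:embedded-parse-exception";
    static final String EMBEDDED_STREAM_EXCEPTION = "hypothetical:embedded-stream-exception";

    // Under option 1, a caller has to look in both places on every document.
    static List<String> allFailures(List<Map<String, String>> metadataList) {
        List<String> failures = new ArrayList<>();
        for (Map<String, String> m : metadataList) {
            for (String key : List.of(EMBEDDED_PARSE_EXCEPTION, EMBEDDED_STREAM_EXCEPTION)) {
                String trace = m.get(key);
                if (trace != null) {
                    failures.add(trace);
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, String> container = new HashMap<>();
        container.put(EMBEDDED_STREAM_EXCEPTION, "java.io.IOException: corrupt entry");
        Map<String, String> child = new HashMap<>();
        child.put(EMBEDDED_PARSE_EXCEPTION, "java.lang.RuntimeException: bad embedded file");
        System.out.println(allFailures(List.of(container, child)).size()); // prints 2
    }
}
```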
Preferences or other options?
> Handle pre-parse embedded object exceptions uniformly and more robustly
> -----------------------------------------------------------------------
>
> Key: TIKA-2159
> URL: https://issues.apache.org/jira/browse/TIKA-2159
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Tim Allison
> Priority: Minor
>
> When an embedded document is parsed and causes an exception, we're currently
> catching that and swallowing it in ParsingEmbeddedDocumentExtractor (the
> default) or reporting it in the RecursiveParserWrapper by storing the
> stacktrace in the Metadata of the embedded document.
> However, if there's an exception during detection on the embedded stream or
> on getting the stream _before_ the stream hits the parser, we aren't handling
> that uniformly or robustly across parsers.