[ https://issues.apache.org/jira/browse/TIKA-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18033823#comment-18033823 ]
Tim Allison commented on TIKA-4533:
-----------------------------------
There's also a subtle bug when the DigestingParser is created by the
AutoDetectParser via tika-config, as opposed to what we do in TikaCLI, where
we wrap the AutoDetectParser in a DigestingParser. The former causes a
potential zip-bomb exception; the latter works.
> DigestingParser needs to write out embedded containers for digesting
> --------------------------------------------------------------------
>
> Key: TIKA-4533
> URL: https://issues.apache.org/jira/browse/TIKA-4533
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
>
> If there's an embedded file in an office document, we sometimes pass that around
> as an openContainer in the TikaInputStream. The digester is not currently
> translating that back to bytes for digesting.
> We need to apply a StreamTranslator and digest the output of that.