[ https://issues.apache.org/jira/browse/TIKA-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18033837#comment-18033837 ]
Tim Allison commented on TIKA-4533:
-----------------------------------
There's an even larger problem in how the SecureContentHandler checks for zip
bombs when the TikaInputStream's length is 0. That happens when we initialize
a TikaInputStream.get(new byte[0]) and then set the openContainer on it, as we
do with embedded MSOffice files and PSTEmails. The parser uses that
openContainer and ignores the underlying stream. Unfortunately, the
SecureContentHandler doesn't account for this. We've gotten lucky, I think,
because the secure content handler only triggers above a threshold of
1_000_000.
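
To make the failure mode concrete, here is a minimal, self-contained sketch of
a ratio-based zip bomb check. It is a simplified model, not the actual
SecureContentHandler code: the class name, the method name, and the ratio of
100 are assumptions for illustration. Only the 1_000_000 threshold comes from
the comment above.

```java
public class ZipBombCheckSketch {
    // Threshold mentioned in the comment; the check is skipped below it.
    static final long THRESHOLD = 1_000_000L;
    // Assumed maximum output-to-input ratio (illustrative value).
    static final long MAX_RATIO = 100L;

    // Hypothetical check: flag the document when the output is both above
    // the threshold and larger than bytesRead * MAX_RATIO.
    static boolean looksLikeZipBomb(long bytesRead, long charsWritten) {
        return charsWritten > THRESHOLD && charsWritten > bytesRead * MAX_RATIO;
    }

    public static void main(String[] args) {
        // Zero-length stream, output below the threshold: not flagged.
        System.out.println(looksLikeZipBomb(0L, 999_999L));
        // Zero-length stream, output just above the threshold: flagged,
        // because anything is larger than 0 * MAX_RATIO.
        System.out.println(looksLikeZipBomb(0L, 1_000_001L));
        // Same output with a realistic byte count: not flagged.
        System.out.println(looksLikeZipBomb(50_000L, 1_000_001L));
    }
}
```

In this model, a stream that reports 0 bytes is flagged as soon as the output
crosses the threshold, no matter how large the embedded container really is.
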
> DigestingParser needs to write out embedded containers for digesting
> --------------------------------------------------------------------
>
> Key: TIKA-4533
> URL: https://issues.apache.org/jira/browse/TIKA-4533
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
>
> If there's an embedded file in an office document, we sometimes pass that around
> as an openContainer in the TikaInputStream. The digester is not currently
> translating that back to bytes for digesting.
> We need to apply a StreamTranslator and digest the output of that.
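
A rough sketch of the idea in the description, using only the JDK. The
translateToBytes method is a hypothetical stand-in for whatever the
StreamTranslator would produce, and the String "containers" are placeholders
for real open containers. None of this is Tika's API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContainerDigestSketch {
    // Hex-encoded SHA-256 of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical translator: turns an open container back into bytes.
    // Here the "container" is just an object whose text form is its content.
    static byte[] translateToBytes(Object openContainer) {
        return String.valueOf(openContainer).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] placeholderStream = new byte[0]; // what the digester sees today
        Object containerA = "embedded workbook A";
        Object containerB = "embedded workbook B";

        // Digesting the placeholder gives the empty-input hash for every file.
        System.out.println(sha256Hex(placeholderStream));
        // Digesting the translated bytes distinguishes the two containers.
        System.out.println(sha256Hex(translateToBytes(containerA)));
        System.out.println(sha256Hex(translateToBytes(containerB)));
    }
}
```

The first line printed is the well-known SHA-256 of empty input, which is what
every embedded container would hash to if only the zero-length stream were
digested.
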
--
This message was sent by Atlassian Jira
(v8.20.10#820010)