[ https://issues.apache.org/jira/browse/TIKA-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475557#comment-17475557 ]
Tim Allison edited comment on TIKA-3644 at 1/13/22, 5:26 PM:
-------------------------------------------------------------

It looks like the package-depth detector (the 5 in your config) is not triggered if the embeddedDocument extractor calls the parse with outputHTML=false. In the MSOffice parser, outputHTML=true; however, in the OOXMLParser, outputHTML=false. I propose that we set outputHTML=true throughout the parsers.

That said, I did get a zip bomb exception when I set the maxDepth to 5 on both MSOffice and ooxml files.

-Looking at the SecureContentHandler, I'm frankly not certain what the difference between packageDepth and depth is.- :P It looks like the difference is that maxPackageDepth should cover embedded items, whereas maxDepth covers all html entities.

> OfficeParser can not detect embedded zip bomb in the office documents
> ---------------------------------------------------------------------
>
>                 Key: TIKA-3644
>                 URL: https://issues.apache.org/jira/browse/TIKA-3644
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>   Affects Versions: 2.2.1
>            Reporter: Sergen Bağ
>            Priority: Minor
>         Attachments: 10_2_2_2_2.zip, tika_exception.PNG, zipbomb.doc,
> zipbomb.docx, zipbomb.ppt, zipbomb.pptx, zipbomb.xls, zipbomb.xlsx
>
>
> Hi, I am trying to get a "zip bomb detection" exception but I can't. I used the
> attachments below and saw the following:
> When I send "zipbomb.xls" and "zipbomb.doc" to Tika, Tika throws an exception.
> When I send "zipbomb.xlsx", "zipbomb.docx", "zipbomb.ppt" and "zipbomb.pptx" to
> Tika, Tika doesn't throw an exception.
> Thanks.
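
To make the distinction in the comment above concrete, here is a minimal, JDK-only sketch of a SAX handler that tracks two counters: one for every nested element (the maxDepth idea) and one only for wrapper elements that mark an embedded item (the maxPackageDepth idea). This is not Tika's SecureContentHandler. The class and method names, the `div class="package-entry"` marker, and the counting scheme are all assumptions made for illustration. The point it tries to show is that if the embedded-document extractor emits no wrapper element (outputHTML=false), a package-depth counter of this kind never moves, while the plain element-depth counter can still fire.

```java
import java.util.ArrayDeque;
import java.util.Deque;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class DepthLimitSketch {

    /** Hypothetical stand-in for a depth-limiting handler; not Tika's class. */
    static class DepthLimitingHandler extends DefaultHandler {
        private final int maxDepth;
        private final int maxPackageDepth;
        private int depth = 0;
        // Element depths at which an (assumed) embedded-item wrapper was opened.
        private final Deque<Integer> packageEntryDepths = new ArrayDeque<>();

        DepthLimitingHandler(int maxDepth, int maxPackageDepth) {
            this.maxDepth = maxDepth;
            this.maxPackageDepth = maxPackageDepth;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            // Every element counts toward the general nesting limit.
            depth++;
            if (depth > maxDepth) {
                throw new SAXException("element nesting too deep: " + depth);
            }
            // Only wrapper elements count toward the package limit (assumed marker).
            if ("div".equals(localName) && "package-entry".equals(atts.getValue("class"))) {
                packageEntryDepths.push(depth);
                if (packageEntryDepths.size() > maxPackageDepth) {
                    throw new SAXException("embedded items nested too deep: "
                            + packageEntryDepths.size());
                }
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Close a wrapper only if it was opened at the current depth.
            Integer top = packageEntryDepths.peek();
            if (top != null && top == depth) {
                packageEntryDepths.pop();
            }
            depth--;
        }
    }

    /**
     * Simulates a document with some ordinary nested markup followed by nested
     * embedded items. Wrappers are emitted only when outputHtml is true.
     * Returns true if either limit was exceeded.
     */
    static boolean tripsLimit(int plainNesting, int embedLevels, boolean outputHtml,
                              int maxDepth, int maxPackageDepth) {
        DepthLimitingHandler handler = new DepthLimitingHandler(maxDepth, maxPackageDepth);
        Attributes none = new AttributesImpl();
        AttributesImpl wrapper = new AttributesImpl();
        wrapper.addAttribute("", "class", "class", "CDATA", "package-entry");
        try {
            for (int i = 0; i < plainNesting; i++) {
                handler.startElement("", "p", "p", none);
            }
            if (outputHtml) {
                for (int i = 0; i < embedLevels; i++) {
                    handler.startElement("", "div", "div", wrapper);
                }
            }
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Package limit of 5, wrappers emitted.
        System.out.println(tripsLimit(2, 10, true, 100, 5));
        // Same limit, no wrappers emitted.
        System.out.println(tripsLimit(2, 10, false, 100, 5));
        // Element limit of 5, no wrappers, deeply nested ordinary markup.
        System.out.println(tripsLimit(8, 10, false, 5, 100));
    }
}
```

In this model the second call never reaches the package limit because no wrapper element is ever seen, which matches the behavior described for parsers that pass outputHTML=false. The third call still fails on the general element limit, in line with the observation that setting maxDepth to 5 produced an exception for both file families.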