[ https://issues.apache.org/jira/browse/TIKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated TIKA-4206:
----------------------------------
    Description:
I see TIKA-216 which aims to prevent Zip bombs, but I'm seeing what looks like a bomb on 3.0.0 Beta. The zip bomb is a mime encoded attachment to an email, which may be why it isn't throwing an error.
On my machine attempting to extract text (-J) the process continues infinitely (or at least 10 hours, which is when I stopped it).
The actual file is embedded in a .gz file inside of an ARC file. However, extracting the attached .txt file produces the same error.

The original ARC file is at:
[https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-004/warc/NARA-PEOT-2004-20041111065521-04317-crawling-fast-c_NARA-PEOT-2004-20041111101148-00173-crawling008.archive.org.arc.gz]

  was:
I see Tika-216 which aims to prevent Zip bombs, but I'm seeing what looks like a bomb on 3.0.0 Beta. The zip bomb is a mime encoded attachment to an email, which may be why it isn't throwing an error.
On my machine attempting to extract text (-J) the process continues infinitely (or at least 10 hours, which is when I stopped it).
The actual file is embedded in a .gz file inside of an ARC file. However, extracting the attached .txt file produces the same error.

The original ARC file is at:
https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-004/warc/NARA-PEOT-2004-20041111065521-04317-crawling-fast-c_NARA-PEOT-2004-20041111101148-00173-crawling008.archive.org.arc.gz


> Variation on Zip Bomb
> ---------------------
>
>                 Key: TIKA-4206
>                 URL: https://issues.apache.org/jira/browse/TIKA-4206
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 3.0.0-BETA
>            Reporter: Gregory Lepore
>            Priority: Major
>         Attachments: sample-42-mail-bomb.txt
>
>
> I see TIKA-216 which aims to prevent Zip bombs, but I'm seeing what looks
> like a bomb on 3.0.0 Beta. The zip bomb is a mime encoded attachment to an
> email, which may be why it isn't throwing an error.
> On my machine attempting to extract text (-J) the process continues
> infinitely (or at least 10 hours, which is when I stopped it).
> The actual file is embedded in a .gz file inside of an ARC file. However,
> extracting the attached .txt file produces the same error.
>
> The original ARC file is at:
> [https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-004/warc/NARA-PEOT-2004-20041111065521-04317-crawling-fast-c_NARA-PEOT-2004-20041111101148-00173-crawling008.archive.org.arc.gz]