[ https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346406#comment-17346406 ]
Robin Schimpf commented on COMPRESS-542:
----------------------------------------

I haven't had time for a close look, but just running the test and seeing it go down to ~100ms is impressive. Great work!

> Corrupt 7z allocates huge amount of SevenZEntries
> -------------------------------------------------
>
>                 Key: COMPRESS-542
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-542
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Robin Schimpf
>            Priority: Major
>         Attachments: Reduced_memory_allocation_for_corrupted_7z_archives.patch, endheadercorrupted.7z, endheadercorrupted2.7z
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We ran into a problem where reading a 1.43GB corrupt 7z file tried to allocate
> about 138 million SevenZArchiveEntries, which would use about 12GB of memory.
> Sadly, I'm unable to share the file. If enough memory is available, the
> following exception is thrown.
> {code:java}
> java.io.IOException: Start header corrupt and unable to guess end Header
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.tryToLocateEndHeader(SevenZFile.java:511)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:470)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:336)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:128)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:369)
> {code}
> 7z itself aborts very quickly when I try to list the contents of the file.
> {code:java}
> 7z l "corrupt.7z"
> 7-Zip 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28
> Scanning the drive for archives:
> 1 file, 1537752212 bytes (1467 MiB)
> Listing archive: corrupt.7z
> ERROR: corrupt.7z : corrupt.7z
> Open ERROR: Can not open the file as [7z] archive
> ERRORS:
> Is not archive
> Errors: 1
> {code}
> I hacked together the attached patch, which reduces the memory allocation
> to about 1GB. So lazy instantiation of the entries could be a good solution
> to the problem. Ideally, the entries would only be created once the headers
> have been parsed successfully.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
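
For illustration only, here is a rough sketch of the lazy-instantiation idea mentioned in the description. It is not the attached patch and not Commons Compress code: LazyEntrySketch, PendingEntries, Entry, declare and materialize are hypothetical names, and the plausibility check rests on the assumption that every entry needs at least one byte of header data.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in Commons Compress.
class LazyEntrySketch {

    /** Stand-in for SevenZArchiveEntry. */
    static final class Entry {
        final int index;

        Entry(int index) {
            this.index = index;
        }
    }

    /** Remembers how many entries the header declares without allocating them. */
    static final class PendingEntries {
        private final int declaredCount;
        private List<Entry> entries; // stays null until materialize() is called

        private PendingEntries(int declaredCount) {
            this.declaredCount = declaredCount;
        }

        /**
         * Assumption: each entry occupies at least one byte of header data, so a
         * count larger than the remaining header bytes cannot be valid.
         */
        static PendingEntries declare(long declaredCount, long headerBytesRemaining)
                throws IOException {
            if (declaredCount < 0 || declaredCount > Integer.MAX_VALUE
                    || declaredCount > headerBytesRemaining) {
                throw new IOException("Implausible entry count: " + declaredCount);
            }
            return new PendingEntries((int) declaredCount);
        }

        boolean isMaterialized() {
            return entries != null;
        }

        /** Intended to be called only after the rest of the header parsed cleanly. */
        List<Entry> materialize() {
            if (entries == null) {
                List<Entry> created = new ArrayList<>(declaredCount);
                for (int i = 0; i < declaredCount; i++) {
                    created.add(new Entry(i));
                }
                entries = created;
            }
            return entries;
        }
    }
}
```

With this shape, a corrupt header that declares 138 million entries while only a few kilobytes of header data remain is rejected in declare() before any entry object exists, and a well-formed header pays for its entries only when materialize() is called.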