[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345655#comment-17345655
 ] 

Stefan Bodewig commented on COMPRESS-542:
-----------------------------------------

I've added a few more counts during the sanity check phase and the result is 
convincing IMHO. The unit test {{testNoOOMOnCorruptedHeader}} added earlier 
initially took something like 15s on my machine and now finishes in less than 
half a second (with almost no memory being consumed at all).
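
As a rough illustration of the kind of check involved, here is a minimal sketch. 
It is not the actual code in SevenZFile: the class name, the method name and the 
per-entry lower bound are assumptions made up for this example. The idea is to 
compare a count read from the header against the bytes that are actually left 
before allocating anything for it.

```java
/** Hypothetical helper for illustration only; not part of Commons Compress. */
public final class HeaderSanityCheck {

    private HeaderSanityCheck() {
    }

    /**
     * Returns true if a header with remainingHeaderBytes bytes left could
     * possibly describe declaredCount entries.
     *
     * minBytesPerEntry is an assumed lower bound on the header bytes each
     * entry needs; the real bound depends on the 7z header layout.
     */
    public static boolean isPlausibleEntryCount(long declaredCount,
                                                long remainingHeaderBytes,
                                                int minBytesPerEntry) {
        if (minBytesPerEntry <= 0) {
            throw new IllegalArgumentException("minBytesPerEntry must be positive");
        }
        // Entries end up in a Java array or list, so int range is a hard limit.
        if (declaredCount < 0 || declaredCount > Integer.MAX_VALUE) {
            return false;
        }
        // Divide instead of multiplying so the comparison cannot overflow.
        return declaredCount <= remainingHeaderBytes / minBytesPerEntry;
    }
}
```

With a check like this, a corrupt header claiming 138 million entries in a 
header of a few kilobytes would be rejected before any entry objects are created.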

I'll run a few benchmarks to see whether, and if so how much, the additional 
parser run hurts for valid archives. That should give us some data on whether 
to enable sanity checking for every archive (as it is right now) or only for 
archives suspected to be broken.

> Corrupt 7z allocates huge amount of SevenZEntries
> -------------------------------------------------
>
>                 Key: COMPRESS-542
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-542
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Robin Schimpf
>            Priority: Major
>         Attachments: 
> Reduced_memory_allocation_for_corrupted_7z_archives.patch, 
> endheadercorrupted.7z, endheadercorrupted2.7z
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We ran into a problem where a 1.43GB corrupt 7z file tried to allocate about 
> 138 million SevenZArchiveEntries, which would use about 12GB of memory. Sadly, 
> I'm unable to share the file. If you have enough memory available, the 
> following exception is thrown.
> {code:java}
> java.io.IOException: Start header corrupt and unable to guess end Header
>       at org.apache.commons.compress.archivers.sevenz.SevenZFile.tryToLocateEndHeader(SevenZFile.java:511)
>       at org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:470)
>       at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:336)
>       at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:128)
>       at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:369)
> {code}
> 7z itself aborts really quickly when I try to list the content of the file.
> {code:java}
> 7z l "corrupt.7z"
> 7-Zip 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28
> Scanning the drive for archives:
> 1 file, 1537752212 bytes (1467 MiB)
> Listing archive: corrupt.7z
> ERROR: corrupt.7z : corrupt.7z
> Open ERROR: Can not open the file as [7z] archive
> ERRORS:
> Is not archive
> Errors: 1
> {code}
> I hacked together the attached patch, which reduces the memory allocation 
> to about 1GB. So lazy instantiation of the entries could be a good solution 
> to the problem. The optimal solution would be to create the entries only if 
> the headers could be parsed correctly.
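
The last suggestion in the description, creating entries only once the headers 
have been parsed correctly, could look roughly like the sketch below. Everything 
in it is hypothetical: {{DeferredEntries}}, its nested {{Entry}} class and the 
empty-name check are stand-ins for illustration, not the SevenZFile 
implementation or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Commons Compress. */
public final class DeferredEntries {

    /** Minimal stand-in for an archive entry. */
    public static final class Entry {
        private final String name;

        Entry(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    private DeferredEntries() {
    }

    /**
     * First pass validates everything that was parsed from the header.
     * Entry objects are allocated only in the second pass, which is reached
     * only if the first pass found no problem.
     */
    public static List<Entry> buildEntries(List<String> parsedNames) {
        for (String name : parsedNames) {
            // Assumed validity rule for the example: names must be non-empty.
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException("Corrupt header: empty entry name");
            }
        }
        List<Entry> entries = new ArrayList<>(parsedNames.size());
        for (String name : parsedNames) {
            entries.add(new Entry(name));
        }
        return entries;
    }
}
```

The cost of this approach is a second pass over the parsed data, in exchange for 
never allocating entry objects for a header that turns out to be broken.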



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
