[ 
https://issues.apache.org/jira/browse/COMPRESS-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028427#comment-17028427
 ] 

Stefan Bodewig commented on COMPRESS-500:
-----------------------------------------

I've written a small test program and will attach it (a sketch of it follows the exception below). Running it you see:

 * things work as expected when using {{ZipFile}}
 * things don't work at all and you get an exception when using {{ZipArchiveInputStream}} without explicitly allowing the combination of data descriptors and stored entries; this was true in 1.8 and remains true
 * when allowing the combination, Compress 1.19 and later throw an exception:

{code:java}
java.util.zip.ZipException: compressed and uncompressed size don't match while 
reading a stored entry using data descriptor. Either the archive is broken or 
it can not be read using ZipArchiveInputStream and you must use ZipFile. A 
common cause for this is a ZIP archive containing a ZIP archive. See 
http://commons.apache.org/proper/commons-compress/zip.html#ZipArchiveInputStream_vs_ZipFile
{code}
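For reference, a minimal sketch of what such a test program looks like (the attached program is the authoritative version; {{test.zip}} is a placeholder name):

{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.compress.utils.IOUtils;

public class Compress500Test {
    public static void main(String[] args) throws IOException {
        // ZipFile reads the central directory, so it knows the real sizes up front
        try (ZipFile zf = new ZipFile("test.zip")) {
            ZipArchiveEntry entry = zf.getEntries().nextElement();
            System.out.println(IOUtils.toByteArray(zf.getInputStream(entry)).length);
        }
        // ZipArchiveInputStream reads sequentially; the last constructor argument
        // opts in to stored entries that use a data descriptor
        try (InputStream in = new FileInputStream("test.zip");
             ZipArchiveInputStream zin =
                 new ZipArchiveInputStream(in, "UTF-8", true, true)) {
            zin.getNextZipEntry();
            System.out.println(IOUtils.toByteArray(zin).length); // throws ZipException
        }
    }
}
{code}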
So unfortunately what I suspected is true. You are looking at the kind of 
archive that cannot be extracted using {{ZipArchiveInputStream}}, and 
there is no workaround for it.

If you create the archive yourself, either make sure you don't use a data 
descriptor and store the size information inside the local file header, or 
use the DEFLATED method, as wasteful as it may seem.
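With {{ZipArchiveOutputStream}} that means either writing to a seekable target such as a file, so the size information can be written back into the local file header, or setting the entry method to DEFLATED. A rough sketch, with file and entry names made up for illustration:

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;

public class CreateArchive {
    public static void main(String[] args) throws IOException {
        byte[] data = "payload".getBytes(StandardCharsets.UTF_8);
        // writing to a File lets the stream seek back and record the sizes
        // in the local file header instead of emitting a data descriptor
        try (ZipArchiveOutputStream out =
                 new ZipArchiveOutputStream(new File("out.zip"))) {
            ZipArchiveEntry entry = new ZipArchiveEntry("data.gz");
            entry.setMethod(ZipArchiveEntry.DEFLATED); // STORED is also safe here
            out.putArchiveEntry(entry);
            out.write(data);
            out.closeArchiveEntry();
        }
    }
}
{code}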

If you do not control the original archive, then you must store it to disk or 
keep it in memory (see {{SeekableInMemoryByteChannel}}) and use {{ZipFile}}.
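A rough sketch of the in-memory route, assuming the whole archive fits into memory and with the download stream left abstract:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.compress.utils.IOUtils;
import org.apache.commons.compress.utils.SeekableInMemoryByteChannel;

public class BufferedExtraction {
    static void extract(InputStream httpStream) throws IOException {
        // drain the non-seekable network stream into memory first
        byte[] archive = IOUtils.toByteArray(httpStream);
        try (ZipFile zf = new ZipFile(new SeekableInMemoryByteChannel(archive))) {
            Enumeration<ZipArchiveEntry> entries = zf.getEntries();
            while (entries.hasMoreElements()) {
                ZipArchiveEntry entry = entries.nextElement();
                // sizes come from the central directory and are reliable here
                System.out.println(entry.getName() + ": " + entry.getSize() + " bytes");
            }
        }
    }
}
{code}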

> Discrepancy in file size extracted using ZipArchiveInputStream and Gzip 
> decompress component 
> ----------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-500
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-500
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.8, 1.18
>            Reporter: Anvesh Mora
>            Priority: Major
>         Attachments: invalidzip.zip.partaa, invalidzip.zip.partab, 
> invalidzip.zip.partac, invalidzip.zip.partad, invalidzip.zip.partae, 
> invalidzip.zip.partaf, invalidzip.zip.partag, invalidzip.zip.partah, 
> invalidzip.zip.partai
>
>
> Recently I raised a bug about an "invalid Entry Size" issue, 
> COMPRESS-494 (not resolved yet).
>  
> Now we are seeing a new issue. Before explaining it: the file structure is 
> as described below, and the file is received as a stream of data over HTTPS.
>  
> *File Structure*:
> Inside the zip file there are:
>  * zero or more gz files, which need to be decompressed
>  * metadata at the end of the zip entries (end of stream), stored as plain 
> text and used for downloading the next zip file
>  
> Now in production we are seeing a new issue where the entire gz file is not 
> decompressed. We found that the utilities on CentOS 7 are able to extract 
> and decompress the entire file whereas our library is failing. The sizes 
> differ as follows:
> Using the API: *765460480* bytes
> Using the CentOS 7 Linux utilities: *2032925215* bytes
>  
> We are getting an EOF exception at GzipCompressorInputStream.java:278; I'm 
> not sure why.
>  
> We need your help on this as we are blocked in production. A fix for this 
> would also make the library more robust.
>  
> Let me know how we can increase the priority if needed!
>  


