[jira] [Commented] (COMPRESS-500) Discrepancy in file size extracted using ZipArchieveInputStream and Gzip decompress component

Anvesh Mora (Jira) Mon, 30 Dec 2019 03:22:07 -0800


    [ 
https://issues.apache.org/jira/browse/COMPRESS-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005262#comment-17005262
 ]


Anvesh Mora commented on COMPRESS-500:
--------------------------------------

[~bodewig], yes, your understanding is right. We are decompressing the gzip 
file inside the Zip file.

I'm determining the size of the file after the gzip file has been 
decompressed (after it has been written to disk), and comparing the result 
from the commons-compress library with the result of unzip & gunzip on CentOS.

 

Also, the entry size reported by ZipEntry does not match. It is simply -1:
{code:java}
Entry Name: cloud_3672_20191209220000.log.gz Entry size: -1
{code}
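
A size of -1 is not necessarily an error in itself. When a zip is read as a stream and an entry was written with a data descriptor, the sizes are stored after the entry data, so they are not yet known when the entry header is read. The sketch below shows this effect with the JDK's own ZipInputStream only (not Commons Compress; that ZipArchiveInputStream reports -1 for the same reason is my assumption). The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; uses only java.util.zip, not Commons Compress.
class StreamedEntrySizeDemo {

    // Builds an in-memory zip with one DEFLATED entry. The size is not set up
    // front, so ZipOutputStream records it in a data descriptor after the data.
    static byte[] buildZip(String name, byte[] content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Size reported right after getNextEntry(), before any entry data is read.
    static long sizeBeforeReading(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry = zis.getNextEntry();
            return entry.getSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Size obtained by draining the entry and counting the bytes actually read.
    static long sizeByCounting(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            zis.getNextEntry();
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = zis.read(buf)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] zip = buildZip("sample.log", "hello world".getBytes(StandardCharsets.UTF_8));
        System.out.println("Entry size before reading: " + sizeBeforeReading(zip));
        System.out.println("Bytes counted while reading: " + sizeByCounting(zip));
    }
}
```

So when streaming, counting the bytes that are actually read is a more dependable way to get the size than asking the entry up front.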
 

I wrote a small code snippet to test with the JDK's ZipFile and 
GZIPInputStream, and got an output file size similar to the one on CentOS, 
without any EOF exception:
{code:java}
Enumeration<? extends ZipEntry> zipEntries = zipFile.entries();

while (zipEntries.hasMoreElements()) {
    ZipEntry entry = zipEntries.nextElement();
    // Each zip entry is itself a gzip file: unwrap it and write the result to disk.
    // try-with-resources closes both streams per entry, which also flushes
    // whatever is still sitting in the BufferedOutputStream.
    try (InputStream gzipIn = new GZIPInputStream(zipFile.getInputStream(entry));
         OutputStream os = new BufferedOutputStream(
                 new FileOutputStream("/home/amora/Work/" + entry.getName()))) {
        IOUtils.copy(gzipIn, os);
    }
}
{code}
 

The decompressed file size is 2032922454 bytes (Dec 30 11:16).
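
For comparing sizes without going through the disk at all, the decompressed bytes can be counted directly while streaming. This is only a rough sketch using the JDK classes (ZipInputStream and GZIPInputStream, not Commons Compress). The class name, the helper methods, and the rule "entries ending in .gz are gzip files, everything else is skipped" are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, JDK only.
class GzInZipSizeCheck {

    // Gzip-compresses a byte array in memory (fixture for the demo).
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Builds an in-memory zip from name -> content pairs (fixture for the demo).
    static byte[] zip(Map<String, byte[]> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue());
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Streams the zip and returns the decompressed byte count of every ".gz" entry.
    static Map<String, Long> decompressedSizes(InputStream zipStream) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        byte[] buf = new byte[8192];
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.getName().endsWith(".gz")) {
                    continue; // e.g. the plain-text metadata entry
                }
                // Shield the zip stream so that closing the gzip stream
                // does not close the whole zip stream with it.
                InputStream shielded = new FilterInputStream(zis) {
                    @Override
                    public void close() {
                        // intentionally empty
                    }
                };
                long total = 0;
                try (GZIPInputStream gz = new GZIPInputStream(shielded)) {
                    int n;
                    while ((n = gz.read(buf)) != -1) {
                        total += n;
                    }
                }
                sizes.put(entry.getName(), total);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        entries.put("first.log.gz", gzip(new byte[1000]));
        entries.put("meta.txt", "next-file".getBytes());
        System.out.println(decompressedSizes(new ByteArrayInputStream(zip(entries))));
    }
}
```

The counts returned here can then be compared against what gunzip reports on CentOS for the same entries.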

 

> Discrepancy in file size extracted using ZipArchieveInputStream and Gzip 
> decompress component 
> ----------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-500
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-500
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.8, 1.18
>            Reporter: Anvesh Mora
>            Priority: Major
>
> Recently I raised a bug about an "invalid Entry Size" issue, 
> COMPRESS-494 (not resolved yet).
>  
> We are now seeing a new issue. Before explaining it: we have the file 
> structure below, and it is received as a stream of data over HTTPS.
>  
> *File Structure*:
> In the Zip file
>      We have zero or more gz files which need to be decompressed
>      And metadata at the end of the zip entries (end of stream), as plain 
> text, used for downloading the next zip file.
>  
> Now in production we are seeing a new issue where the entire gz file is 
> not decompressed. We found that the utilities on CentOS 7 are able to 
> extract and decompress the entire file, whereas our library fails. Below are 
> the differences in sizes:
> Using API: *765460480* bytes
> And using CentOS 7 Linux utilities: *2032925215* bytes.
>  
> We are getting an EOF exception at GzipCompressorInputStream.java:278, and 
> I'm not sure why.
>  
> Need your help on this as we are blocked in production. This could be a 
> potential fix for our library to make it more robust.
>  
> Let me know HOW WE CAN INCREASE THE PRIORITY IF NEEDED!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
