[jira] [Commented] (COMPRESS-508) Bug: cannot get file size of ArchiveEntry using ZipArchiveInputStream

Peter Lee (Jira) Thu, 02 Apr 2020 05:49:52 -0700


    [ 
https://issues.apache.org/jira/browse/COMPRESS-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073700#comment-17073700
 ]


Peter Lee commented on COMPRESS-508:
------------------------------------

> I'm not familiar with how ZIP is formed, but shouldn't metadata exist in the 
>beginning (in a header section), showing where each entry exists, to be able 
>to reach it easily?

Unfortunately, it's not at the beginning. :(

 

It may be clearer if I introduce some of the Zip specification 
([https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT]). 
I hope it's not too boring. :D

 

The overall zip file format is:

[local file header 1]
[encryption header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[encryption header n]
[file data n]
[data descriptor n]
[archive decryption header]
[archive extra data record]
[central directory header 1]
.
.
.
[central directory header n]
[zip64 end of central directory record]
[zip64 end of central directory locator]
[end of central directory record]

 

Please note that some of these parts may not exist in a zip file.

With ZipFile, reading works like this:

The central directory headers are the metadata you are talking about. How can 
we locate them? The offset of the central directory headers is stored in the 
End Of Central Directory record (EOCD), or in the Zip64 EOCD. The EOCD sits at 
the very end of the zip file: a fixed 22-byte part followed by an optional 
archive comment, so we can locate it by searching backwards from the end of 
the file. Then how can we extract the entry we need? We need to position to 
the corresponding Local File Header X (its offset is stored in Central 
Directory Header X). So to extract a file we need to read the structures in 
this sequence:

EOCD(end of central directory) -> (Zip64 EOCD locator -> Zip64 EOCD) -> 
CDH(central directory header) -> LFH(local file header).
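
A minimal sketch of that sequence, not Commons Compress code: it uses the JDK's java.util.zip only to build a tiny archive in memory, then follows EOCD -> CDH -> LFH by hand. The class name, method names and the entry name hello.txt are made up for illustration, and the sketch assumes a small archive with no Zip64 records. A real reader would also need to guard against the EOCD signature bytes appearing inside the archive comment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class EocdWalk {
    // Record signatures from the Zip specification (stored little-endian).
    static final int EOCD_SIG = 0x06054b50;
    static final int CDH_SIG = 0x02014b50;
    static final int LFH_SIG = 0x04034b50;

    /** Builds a one-entry archive in memory so the example needs no file. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("hello.txt")); // hypothetical entry name
            zos.write("hello zip".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    /** Searches backwards from the end of the archive for the EOCD signature. */
    static int findEocd(ByteBuffer buf) {
        // The fixed part of the EOCD is 22 bytes, so it cannot start later than this.
        for (int pos = buf.limit() - 22; pos >= 0; pos--) {
            if (buf.getInt(pos) == EOCD_SIG) {
                return pos;
            }
        }
        throw new IllegalArgumentException("no EOCD found");
    }

    /** Returns {uncompressed size, LFH offset} of the first entry, taken from its CDH. */
    static long[] firstEntryInfo(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = findEocd(buf);
        int cdOffset = buf.getInt(eocd + 16);        // offset of the central directory
        if (buf.getInt(cdOffset) != CDH_SIG) {
            throw new IllegalStateException("central directory header expected");
        }
        long size = buf.getInt(cdOffset + 24) & 0xFFFFFFFFL; // uncompressed size
        int lfhOffset = buf.getInt(cdOffset + 42);   // offset of the local file header
        if (buf.getInt(lfhOffset) != LFH_SIG) {
            throw new IllegalStateException("local file header expected");
        }
        return new long[] { size, lfhOffset };
    }

    public static void main(String[] args) throws IOException {
        long[] info = firstEntryInfo(sampleZip());
        System.out.println("size=" + info[0] + " lfhOffset=" + info[1]);
    }
}
```

Every step reads at an absolute position taken from the previous record, which is why this approach needs random access to the whole archive.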

We are always jumping backwards, so we need something like 
{{SeekableByteChannel}} to do this. Of course you could reopen the input 
stream every time you want to jump backwards, but I believe that would be 
time consuming. :(

 

If you are using ZipArchiveInputStream, things are quite different:

We can only read entry by entry from the beginning of the zip file. The 
metadata we can see is stored in the LFH or the data descriptor. We do not 
know how many entries there are in the zip archive, and we do not know the 
metadata of an entry until we have read its LFH or its data descriptor. 
Unfortunately, the data descriptor lies after the file data, which means we do 
not know the Compressed Size, Uncompressed Size and CRC checksum before we 
have extracted the entry (by using an inflater). This is what you're facing 
now: test.zip uses a data descriptor, so we do not know an entry's size before 
we have extracted it. That's why the size is -1.
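
The same effect can be reproduced with the JDK alone. The sketch below is an illustration using java.util.zip.ZipInputStream as a stand-in, not ZipArchiveInputStream itself. It assumes the JDK's ZipOutputStream writes a data descriptor for a deflated entry whose size was not set in advance. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class StreamingSize {
    /** Builds a one-entry archive whose sizes are stored in a data descriptor. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            // Size and CRC are not set up front, so they are written after the data.
            zos.putNextEntry(new ZipEntry("hello.txt")); // hypothetical entry name
            zos.write("hello zip".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    /** Returns {size seen right after the LFH, size seen after reading all the data}. */
    static long[] sizesBeforeAndAfter(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry = zis.getNextEntry();
            long before = entry.getSize();   // only the LFH has been read so far
            byte[] buf = new byte[4096];
            while (zis.read(buf) != -1) {
                // Drain the entry; the data descriptor is read once the data ends.
            }
            long after = entry.getSize();    // now filled in from the data descriptor
            return new long[] { before, after };
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = sizesBeforeAndAfter(sampleZip());
        System.out.println("before=" + sizes[0] + " after=" + sizes[1]);
    }
}
```

Under these assumptions the size should be -1 right after the entry is returned and 9 once the entry data has been read to the end, so a stream-based reader that needs sizes has to consume each entry first.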

 

This is why ZipFile needs a SeekableByteChannel: we need something that can 
be repositioned at a low cost. Input streams can't do this.
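
To illustrate the repositioning, here is a small sketch using the JDK's java.nio.channels.SeekableByteChannel on a temporary file. It is not how ZipFile is implemented. The helper names and the temp file are hypothetical, and the sketch assumes an archive without an archive comment or Zip64 records, so the EOCD is exactly the last 22 bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class SeekDemo {
    /** Repositions the channel and reads one little-endian 32-bit value. */
    static int readIntAt(SeekableByteChannel ch, long pos) throws IOException {
        ByteBuffer b = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        ch.position(pos); // jumping backwards is as easy as jumping forwards
        while (b.hasRemaining()) {
            if (ch.read(b) < 0) {
                throw new IOException("unexpected end of file");
            }
        }
        b.flip();
        return b.getInt();
    }

    /** Returns the signatures found when jumping EOCD -> CDH -> LFH. */
    static int[] jumpAround() throws IOException {
        Path tmp = Files.createTempFile("seek-demo", ".zip");
        try {
            try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(tmp))) {
                zos.putNextEntry(new ZipEntry("a.txt")); // hypothetical entry name
                zos.write("abc".getBytes(StandardCharsets.US_ASCII));
                zos.closeEntry();
            }
            try (SeekableByteChannel ch = Files.newByteChannel(tmp)) {
                long eocd = ch.size() - 22; // assumption: no archive comment
                int eocdSig = readIntAt(ch, eocd);
                long cdOffset = readIntAt(ch, eocd + 16) & 0xFFFFFFFFL;
                int cdhSig = readIntAt(ch, cdOffset);
                long lfhOffset = readIntAt(ch, cdOffset + 42) & 0xFFFFFFFFL;
                int lfhSig = readIntAt(ch, lfhOffset);
                return new int[] { eocdSig, cdhSig, lfhSig };
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        for (int sig : jumpAround()) {
            System.out.println(Integer.toHexString(sig));
        }
    }
}
```

A plain InputStream has no equivalent of position(long) for moving backwards, so the same walk would mean reopening the stream and skipping forward for each jump.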

This may be a little complicated. I hope it does not confuse you. :)

> Bug: cannot get file size of ArchiveEntry using ZipArchiveInputStream
> ---------------------------------------------------------------------
>
>                 Key: COMPRESS-508
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-508
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.20
>         Environment: Android 9 and Android 10, on both emulator and real 
> device .
>            Reporter: AD_LB
>            Priority: Major
>         Attachments: 2020-03-31_20-53-36.png, 2020-04-01_18-28-19.mp4, 
> ZipTest.zip, ZipTest2.zip, test.zip
>
>
> I'm trying to use ZipArchiveInputStream to iterate over the items of a zip 
> file (which may or may not be a real file on the file-system, which is why I 
> use a stream), optionally creating a stream from specific entries.
> One of the operations I need is to get the size of the files within.
> For some reason, it fails to do so. Not only that, but it throws an exception 
> when I'm done with it:
> {code:java}
> Error:org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException:
>  Unsupported feature data descriptor used in entry ...
> {code}
> I've attached here 3 files: a sample project, the problematic zip file (remember 
> that you need to put it in the correct path and grant storage permission), 
> and a screenshot of the issue.
> Note that if I open the file using a third party PC app (such as 
> [7-zip|https://www.7-zip.org/]  ), it works fine, including showing the file 
> size inside.
> Files:
> !2020-03-31_20-53-36.png![^test.zip]
> [^ZipTest.zip]
> Here's the relevant code (kotlin) :
>  
> {code:java}
>         thread {
>             try {
>                 val file = File("/storage/emulated/0/test.zip")
>                 ZipArchiveInputStream(FileInputStream(file)).use {
>                     while (true) {
>                         val entry = it.nextEntry ?: break
>                         Log.d("AppLog", "entry:${entry.name} ${entry.size} ")
>                     }
>                 }
>                 Log.d("AppLog", "got archive ")
>             } catch (e: Exception) {
>                 Log.d("AppLog", "Error:$e")
>                 e.printStackTrace()
>             }
>         }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
