[ https://issues.apache.org/jira/browse/COMPRESS-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073700#comment-17073700 ]
Peter Lee commented on COMPRESS-508:
------------------------------------

> I'm not familiar with how ZIP is formed, but shouldn't metadata exist in the
> beginning (in a header section), showing where each entry exists, to be able
> to reach it easily?

Unfortunately, it's not at the beginning. :(

It may make things a little clearer if I introduce some of the Zip specification ([https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT]). Hope it's not too boring. :D

The zip file format is:

    [local file header 1]
    [encryption header 1]
    [file data 1]
    [data descriptor 1]
    .
    .
    .
    [local file header n]
    [encryption header n]
    [file data n]
    [data descriptor n]
    [archive decryption header]
    [archive extra data record]
    [central directory header 1]
    .
    .
    .
    [central directory header n]
    [zip64 end of central directory record]
    [zip64 end of central directory locator]
    [end of central directory record]

Please note that some of these parts may not exist in a zip file.

If we are using ZipFile, it works like this:

The central directory headers are the metadata you are talking about.

How can we locate the central directory headers? The offset of the central directory headers is stored in the End Of Central Directory record (EOCD) (or the Zip64 EOCD). The EOCD has a fixed-size part at the very end of the file (followed only by an optional archive comment), so we can locate it by searching backwards from the end of the zip file.

Then how can we extract the entry we need? We need to position to the corresponding Local File Header X (its offset is stored in the Central Directory Header X).

So we need to read the file in this sequence:
EOCD (end of central directory) -> (Zip64 EOCD locator -> Zip64 EOCD) -> CDH (central directory header) -> LFH (local file header).
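
To make the lookup order above a little more concrete, here is a minimal sketch in plain Java. It does not use Commons Compress and is not how ZipFile is actually implemented; it only uses java.util.zip to build a tiny sample archive in memory. The class name EocdSketch and its methods are made up for illustration. It ignores Zip64 and encryption, and it assumes the EOCD signature does not also occur inside the archive comment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EocdSketch {
    static final int EOCD_SIG = 0x06054b50; // end of central directory record
    static final int CDH_SIG = 0x02014b50;  // central directory header
    static final int LFH_SIG = 0x04034b50;  // local file header
    static final int EOCD_MIN_SIZE = 22;    // fixed part, without the comment

    /** Builds a one-entry archive in memory so the sketch has some input. */
    static byte[] buildZip(String name, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Searches backwards from the end of the archive for the EOCD signature. */
    static int findEocd(ByteBuffer zip) {
        for (int pos = zip.limit() - EOCD_MIN_SIZE; pos >= 0; pos--) {
            if (zip.getInt(pos) == EOCD_SIG) {
                return pos;
            }
        }
        throw new IllegalArgumentException("no end of central directory record");
    }

    /**
     * Follows EOCD -> CDH -> LFH for the first entry.
     * Returns { offset of the local file header, uncompressed size from the CDH }.
     */
    static long[] locateFirstEntry(byte[] archive) {
        ByteBuffer zip = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = findEocd(zip);
        int cdh = zip.getInt(eocd + 16);                // start of the central directory
        if (zip.getInt(cdh) != CDH_SIG) {
            throw new IllegalArgumentException("no central directory header");
        }
        long size = zip.getInt(cdh + 24) & 0xFFFFFFFFL; // uncompressed size
        int lfh = zip.getInt(cdh + 42);                 // offset of the local file header
        if (zip.getInt(lfh) != LFH_SIG) {
            throw new IllegalArgumentException("no local file header");
        }
        return new long[] { lfh, size };
    }

    public static void main(String[] args) {
        byte[] archive = buildZip("hello.txt", "hello zip");
        long[] located = locateFirstEntry(archive);
        System.out.println("LFH offset: " + located[0] + ", size from CDH: " + located[1]);
    }
}
```

Every step reads at a position that was only learned in the previous step, which is why this kind of access needs random access to the whole archive.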

We are always jumping backwards, so we need something like {{SeekableByteChannel}} to do this. Of course you could reopen the input stream every time you want to jump backwards, but I believe that would be time consuming. :(

If you are using ZipArchiveInputStream, it is quite different:

We can only read entry by entry from the beginning of the zip file. The metadata we know is what is stored in the LFH or the data descriptor. We do not know how many entries there are in the zip archive, and we do not know the metadata of an entry until we have read its LFH or its data descriptor.

Unfortunately, the data descriptor lies after the file data, which means we do not know the compressed size, uncompressed size and CRC checksum before we have decompressed the entry (using an inflater).

This is what you're facing now: test.zip uses a data descriptor, so we do not know the entry's size before we have extracted it. That's why the size is -1.

This is why we need to use SeekableByteChannel in ZipFile: we need something that can be repositioned at low cost. Input streams can't do this.

This may be a little complicated. I hope it does not confuse you. :)

> Bug: cannot get file size of ArchiveEntry using ZipArchiveInputStream
> ---------------------------------------------------------------------
>
>                 Key: COMPRESS-508
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-508
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.20
>         Environment: Android 9 and Android 10, on both emulator and real
> device.
>            Reporter: AD_LB
>            Priority: Major
>         Attachments: 2020-03-31_20-53-36.png, 2020-04-01_18-28-19.mp4,
> ZipTest.zip, ZipTest2.zip, test.zip
>
>
> I'm trying to use ZipArchiveInputStream to iterate over the items of a zip
> file (which may or may not be a real file on the file-system, which is why I
> use a stream), optionally creating a stream from specific entries.
> One of the operations I need is to get the size of the files within.
> For some reason, it fails to do so. Not only that, but it throws an exception
> when I'm done with it:
> {code:java}
> Error:org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException:
> Unsupported feature data descriptor used in entry ...
> {code}
> I've attached here 3 files: sample project, the problematic zip file (remember
> that you need to put it in the correct path and grant storage permission),
> and a screenshot of the issue.
> Note that if I open the file using a third party PC app (such as
> [7-zip|https://www.7-zip.org/] ), it works fine, including showing the file
> size inside.
> Files:
> !2020-03-31_20-53-36.png![^test.zip]
> [^ZipTest.zip]
> Here's the relevant code (kotlin) :
>
> {code:java}
> thread {
>     try {
>         val file = File("/storage/emulated/0/test.zip")
>         ZipArchiveInputStream(FileInputStream(file)).use {
>             while (true) {
>                 val entry = it.nextEntry ?: break
>                 Log.d("AppLog", "entry:${entry.name} ${entry.size} ")
>             }
>         }
>         Log.d("AppLog", "got archive ")
>     } catch (e: Exception) {
>         Log.d("AppLog", "Error:$e")
>         e.printStackTrace()
>     }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)