[
https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194010#comment-17194010
]
Stefan Bodewig commented on COMPRESS-555:
-----------------------------------------
Unfortunately, reading STORED entries that use a data descriptor is unreliable,
to say the least. It is very easy to do if you can read the central directory
at the end of the archive, which is why ZipFile handles them just fine, but
reading the archive as a stream is a very different matter.
The default right now will tell you "I don't think I can handle this entry" if
you use the {{canReadEntryData}} method. The non-default option will read
forward until it finds something that looks like the signature of the next ZIP
entry. This breaks down completely if the STORED entry itself contains such a
sequence of bytes. ZIPs inside ZIPs are the primary example (think WARs
containing JARs). Recent versions try to verify that the size claimed by what
we believe to be the data descriptor matches the length we've actually read,
but then you are faced with an IOException while reading an entry that the
stream claimed to be able to handle.
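To make the failure mode concrete, here is a minimal, dependency-free sketch of
a "read forward until the next signature" scan. It is a deliberately simplified
model, not the code Commons Compress actually runs: the class and method names
are hypothetical, only the local file header signature is checked, and the data
descriptor that would sit between the payload and the next entry is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SignatureScanDemo {
    // Local file header signature "PK\003\004" that starts every ZIP entry.
    private static final byte[] LFH_SIG = {0x50, 0x4b, 0x03, 0x04};

    /**
     * Guesses the length of a STORED payload by scanning forward for the
     * first local file header signature. Returns -1 if none is found.
     */
    public static int guessEntryLength(byte[] stream) {
        for (int i = 0; i + LFH_SIG.length <= stream.length; i++) {
            if (stream[i] == LFH_SIG[0] && stream[i + 1] == LFH_SIG[1]
                    && stream[i + 2] == LFH_SIG[2] && stream[i + 3] == LFH_SIG[3]) {
                return i;
            }
        }
        return -1;
    }

    /** Concatenates byte arrays (helper for building the toy streams). */
    public static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    public static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Payload without any embedded signature: the scan finds the real boundary.
        byte[] plain = ascii("just some text");
        // Payload that embeds ZIP data (think of a JAR stored inside a WAR).
        byte[] nested = concat(ascii("lib.jar:"), LFH_SIG, ascii("inner entry"));
        // What follows the payload in the outer archive: the next entry's header.
        byte[] nextOuterEntry = concat(LFH_SIG, ascii("next outer entry"));

        int plainGuess = guessEntryLength(concat(plain, nextOuterEntry));
        int nestedGuess = guessEntryLength(concat(nested, nextOuterEntry));

        System.out.println("plain:  guessed " + plainGuess + ", actual " + plain.length);
        System.out.println("nested: guessed " + nestedGuess + ", actual " + nested.length);
    }
}
```

For the plain payload the guess matches the actual length (14 bytes). For the
nested payload the scan stops at the embedded signature and reports 8 bytes
instead of 23, which is exactly the kind of truncation a size check against the
data descriptor can detect but not repair.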
Personally, I believe the option would lead to too much confusion if it were
enabled by default. I prefer to have users make the deliberate choice and
realize what they are signing up for. Even better, they would find a way to
make the initial stream seekable and use ZipFile rather than
ZipArchiveInputStream.
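One way to make a non-seekable stream seekable is to spool it to a temporary
file first, so the archive can be read via its central directory. The sketch
below only illustrates that idea under some assumptions: the class and method
names are hypothetical, and it uses the JDK's java.util.zip.ZipFile as a
stand-in so that it runs without extra dependencies. IOExceptions are wrapped
in UncheckedIOException purely to keep the sketch easy to call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class SpoolToSeekable {

    /** Copies the stream to a temp file and lists entries via the central directory. */
    public static List<String> listEntries(InputStream nonSeekable) {
        try {
            Path tmp = Files.createTempFile("spooled-", ".zip");
            try {
                // createTempFile already created the file, so it must be replaced.
                Files.copy(nonSeekable, tmp, StandardCopyOption.REPLACE_EXISTING);
                List<String> names = new ArrayList<>();
                // JDK ZipFile used as a stand-in for a central-directory-based reader.
                try (ZipFile zip = new ZipFile(tmp.toFile())) {
                    Enumeration<? extends ZipEntry> entries = zip.entries();
                    while (entries.hasMoreElements()) {
                        names.add(entries.nextElement().getName());
                    }
                }
                return names;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small two-entry archive in memory to feed the demo. */
    public static byte[] sampleZip() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(buf)) {
                out.putNextEntry(new ZipEntry("a.txt"));
                out.write("hello".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
                out.putNextEntry(new ZipEntry("b.txt"));
                out.write("world".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(listEntries(new ByteArrayInputStream(sampleZip())));
    }
}
```

With Commons Compress on the classpath, the same temporary file could instead
be opened with org.apache.commons.compress.archivers.zip.ZipFile. The cost of
this approach is the extra disk I/O and space for the spooled copy.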
> ZipArchiveInputStream should allow stored entries with data descriptor by
> default
> ---------------------------------------------------------------------------------
>
> Key: COMPRESS-555
> URL: https://issues.apache.org/jira/browse/COMPRESS-555
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.20
> Reporter: Trevor Bentley
> Priority: Major
> Fix For: 1.21
>
>
> We currently use Tika for text extraction, which relies on commons-compress
> for handling ZIPs. Some sites return ZIPs containing stored entries with
> data descriptors, which fail to extract because ZipArchiveInputStream
> defaults 'allowStoredEntriesWithDataDescriptor' to false.
> Reading stored entries with data descriptors should be enabled by default.
> PR: https://github.com/apache/commons-compress/pull/137