Cannot Read WinZip Archives With Unicode Extra Fields
-----------------------------------------------------

                 Key: COMPRESS-164
                 URL: https://issues.apache.org/jira/browse/COMPRESS-164
             Project: Commons Compress
          Issue Type: Bug
          Components: Archivers
    Affects Versions: 1.3
         Environment: Windows 7, Oracle JDK 6
            Reporter: Volker Leidl


I have a zip file created with WinZip that contains Unicode extra fields. When I 
try to extract it with org.apache.commons.compress.archivers.zip.ZipFile, 
ZipFile.getInputStream() returns null for ZipArchiveEntry objects obtained from 
ZipFile.getEntry() or even ZipFile.getEntries(). See UTF8ZipFilesTest.patch in 
the attachments for a test case that exposes the bug. The original test case 
stopped short of reading the entries, which is why this wasn't caught before.

The problem is that ZipFile.java stores the entries in a HashMap. After the 
HashMap has been populated, the Unicode extra fields are read, which changes the 
name of each affected ZipArchiveEntry and therefore its hash code. As a result, 
subsequent lookups in the HashMap can no longer find the original entries.
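The failure mode can be reproduced outside Commons Compress with the Java sketch 
below. MutableKeyDemo and NamedEntry are hypothetical stand-ins, not library 
classes; NamedEntry assumes, as described above, that equality and the hash code 
follow the entry name. The "caf?.txt" / "café.txt" pair is an illustrative 
assumption for a name decoded from the local header versus the name taken from 
the Unicode extra field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
    // Hypothetical stand-in for ZipArchiveEntry: equals() and hashCode() are
    // derived from a name that can change after the entry is created.
    static final class NamedEntry {
        private String name;

        NamedEntry(String name) { this.name = name; }

        void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof NamedEntry && Objects.equals(name, ((NamedEntry) o).name);
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    // Stores an entry under its header name, renames it the way parsing the
    // Unicode extra field would, then looks the same object up again.
    static Long lookupAfterRename(Map<NamedEntry, Long> offsets, NamedEntry entry) {
        offsets.put(entry, 42L);          // bucket chosen from the hash of the header name
        entry.setName("caf\u00E9.txt");   // name replaced after insertion
        return offsets.get(entry);        // lookup uses the new hash, so it misses
    }

    public static void main(String[] args) {
        Map<NamedEntry, Long> offsets = new HashMap<>();
        NamedEntry entry = new NamedEntry("caf?.txt"); // name as decoded from the header
        System.out.println(lookupAfterRename(offsets, entry)); // null
        System.out.println(offsets.size());                    // 1: the mapping still exists
        System.out.println(offsets.keySet().contains(entry));  // false
    }
}
```

The mapping is still present and still shows up when iterating over the keys, 
which matches the report that entries returned by getEntries() nevertheless fail 
in getInputStream(): iteration ignores hash codes, but get() does not.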

ZipFile.patch contains an (admittedly simple-minded) fix that rebuilds the 
entries HashMap after the Unicode extra fields have been parsed. Its main purpose 
is to confirm that the problem is what I think it is, rather than to provide a 
well-designed solution.
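A minimal Java sketch of the rebuild idea follows. It is not the contents of 
ZipFile.patch; RebuildAfterRename, NamedEntry and rebuild() are hypothetical 
names, and NamedEntry is redeclared so the block stands alone. It relies on the 
HashMap(Map) copy constructor re-inserting every mapping, so each key is hashed 
again using its current name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class RebuildAfterRename {
    // Hypothetical entry type whose hash code follows its mutable name,
    // like the ZipArchiveEntry behaviour described in this report.
    static final class NamedEntry {
        private String name;

        NamedEntry(String name) { this.name = name; }

        void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof NamedEntry && Objects.equals(name, ((NamedEntry) o).name);
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    // Copies the stale map into a fresh HashMap; the copy constructor calls
    // hashCode() on every key again, so each key lands in the right bucket.
    static <K, V> Map<K, V> rebuild(Map<K, V> stale) {
        return new HashMap<>(stale);
    }

    public static void main(String[] args) {
        Map<NamedEntry, Long> entries = new HashMap<>();
        NamedEntry entry = new NamedEntry("caf?.txt");
        entries.put(entry, 42L);
        entry.setName("caf\u00E9.txt");           // extra field parsed after insertion
        System.out.println(entries.get(entry));   // null: key sits in a stale bucket
        entries = rebuild(entries);               // rebuild once all names are final
        System.out.println(entries.get(entry));   // 42
    }
}
```

Rebuilding is only safe once no further renames can happen. Applying the Unicode 
names before inserting entries into the map would avoid the stale buckets 
altogether; keying the map by object identity (java.util.IdentityHashMap) would 
also work for lookups with the same entry object, though not for lookups with an 
equal but distinct entry.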

The patches have been tested against revision 1210416.
