James C. Ahlstrom <ahlstro...@users.sourceforge.net> added the comment:

I received a bug report from a user.  He had a zip file created by Mac OS 
10.5.8 that the zipfile module rejected as not a valid zip file.  The traceback 
pointed to the function _EndRecData(fpin).  The file had a valid comment 
appended, but recorded a comment length of zero.  I am posting to this thread 
because it seems related.

The zero comment length is incorrect, but the file is read without complaint by 
other software.  Note that this is not end-of-file garbage; the comment itself 
is valid.

The _EndRecData(fpin) function reads the "End of Central Directory" record.  
endrec[7] is the comment length, but of more interest is endrec[6], which is 
the start of the Central Directory.  This is a file offset, and if it is valid 
it points to a Central Directory record.  These records start with the known 
signature PK\001\002.
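To make the field positions concrete, here is a minimal sketch (Python 3, not 
the module's own code) that unpacks the 22-byte record from a small archive 
built in memory.  The names END_RECORD_FORMAT, read_end_record and 
make_sample_archive are made up for this illustration; the format string is 
assumed to match the layout that structEndArchive describes.

```python
import io
import struct
import zipfile

# Assumed layout of the 22-byte "End of Central Directory" record:
# signature, four 16-bit counters, central directory size and offset
# (32-bit each), then the 16-bit comment length.
END_RECORD_FORMAT = "<4s4H2LH"
END_RECORD_SIZE = struct.calcsize(END_RECORD_FORMAT)
END_RECORD_SIGNATURE = b"PK\x05\x06"
CENTRAL_DIR_SIGNATURE = b"PK\x01\x02"


def read_end_record(data):
    """Unpack the last end record found in data, or return None."""
    start = data.rfind(END_RECORD_SIGNATURE)
    if start < 0 or start + END_RECORD_SIZE > len(data):
        return None
    return struct.unpack(END_RECORD_FORMAT, data[start:start + END_RECORD_SIZE])


def make_sample_archive():
    """Build a one-file archive in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


data = make_sample_archive()
endrec = read_end_record(data)
comment_length = endrec[7]        # length of the archive comment
central_dir_offset = endrec[6]    # file offset of the Central Directory
signature_at_offset = data[central_dir_offset:central_dir_offset + 4]
```

For a well-formed archive, signature_at_offset should equal 
CENTRAL_DIR_SIGNATURE, which is the property the proposed check relies on.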

I propose that the correct fix is to delete the test for correct comment 
length, and replace it with a test that reads four bytes at offset endrec[6] 
and makes sure they are PK\001\002, i.e. that a Central Directory record 
starts there.  I view this not as a hack for defective software, but rather as 
an improved sanity check, since finding the Central Directory is vital to 
reading a zip file.  This code replaces the end of _EndRecData(fpin), and was 
taken from Python 2.5 (so check against the relevant version):

    if start >= 0:     # Correct signature string was found
        endrec = struct.unpack(structEndArchive, data[start:start+22])
        endrec = list(endrec)
        comment = data[start+22:]

## Relevant changes here
        # Seek to the start of the central directory
        fpin.seek(endrec[6], 0)
        dat = fpin.read(4)      # Read four bytes
        # Note: Mac OS is known to add a comment, but record the length as zero.
        if dat == stringCentralDir:     # Success
##
            # Append the archive comment and start offset
            endrec.append(comment)
            endrec.append(filesize - END_BLOCK + start)
            if endrec[-4] == -1 or endrec[-4] == 0xffffffff:
                return _EndRecData64(fpin, - END_BLOCK + start, endrec)
            return endrec
    return      # Error, return None
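
The check can be tried outside the module with a standalone sketch like the 
following (Python 3).  It is only an illustration under stated assumptions, not 
the patch itself: find_end_record and make_archive_with_zero_comment_length are 
hypothetical names, the zip64 branch is left out, and the test file is 
simulated by zeroing the comment-length field of an archive written by zipfile 
rather than taken from Mac OS.

```python
import io
import struct
import zipfile

END_FORMAT = "<4s4H2LH"                 # assumed end-record layout (22 bytes)
END_SIZE = struct.calcsize(END_FORMAT)
END_SIGNATURE = b"PK\x05\x06"
CENTRAL_DIR_SIGNATURE = b"PK\x01\x02"
MAX_COMMENT = 0xFFFF                    # largest comment a 16-bit length allows


def find_end_record(fpin):
    """Return (endrec, trailing_bytes), or None if the check fails.

    Instead of trusting the recorded comment length, verify that the
    central directory offset points at a central directory signature.
    """
    fpin.seek(0, 2)
    filesize = fpin.tell()
    tail_size = min(filesize, END_SIZE + MAX_COMMENT)
    fpin.seek(filesize - tail_size)
    data = fpin.read(tail_size)

    start = data.rfind(END_SIGNATURE)
    if start < 0 or start + END_SIZE > len(data):
        return None                     # no complete end record found
    endrec = struct.unpack(END_FORMAT, data[start:start + END_SIZE])

    fpin.seek(endrec[6])                # start of the central directory
    if fpin.read(4) != CENTRAL_DIR_SIGNATURE:
        return None
    return endrec, data[start + END_SIZE:]


def make_archive_with_zero_comment_length():
    """Write an archive with a comment, then zero its recorded length."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
        zf.comment = b"written on a Mac"
    raw = bytearray(buf.getvalue())
    start = raw.rfind(END_SIGNATURE)
    # The comment length is the last 2-byte field of the end record.
    struct.pack_into("<H", raw, start + END_SIZE - 2, 0)
    return bytes(raw)


archive = make_archive_with_zero_comment_length()
result = find_end_record(io.BytesIO(archive))
endrec, trailing = result
```

Here endrec[7] is zero while trailing still holds the comment bytes, which is 
the situation in the reported file.  One caveat: an archive with no entries has 
no Central Directory record at all, so a real patch would need to allow for 
that case.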

----------
nosy: +ahlstromjc

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1757072>
_______________________________________