[issue1757072] Zipfile robustness
James C. Ahlstrom jahl...@gmail.com added the comment: For completeness, I checked other versions of Python. The example zip file fails in Python 3.1, but succeeds in Python 3.2.2. The patch for 3.2.2 removed the check for correct comment length, but substituted no further check for validity. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1757072 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1757072] Zipfile robustness
James C. Ahlstrom jahl...@gmail.com added the comment: Problem was reported on 2.7. I will check in detail this weekend. Please stand by.
[issue1757072] Zipfile robustness
James C. Ahlstrom jahl...@gmail.com added the comment:

I grabbed a 2.7.2 zipfile.py, and my original comments stand. If there is a garbage-at-end-of-file patch, I can't find it; please provide a line number or a hint.

The user at yale.edu reports that the patch works. Here is a diff of my changes. To test, append some junk to a good zipfile: echo junk >> good.zip, and try reading it. Let me know if you want me to do anything else; maybe look at 3.2; or email me offline.

*** zipfile.py 2011-12-09 11:25:07.0 -0500
--- ../zipfile.py 2011-12-09 05:48:00.0 -0500
***
*** 237,248
          recData = data[start:start+sizeEndCentDir]
          endrec = list(struct.unpack(structEndArchive, recData))
          comment = data[start+sizeEndCentDir:]
! ## Remove        # check that comment length is correct
! ## Remove        if endrec[_ECD_COMMENT_SIZE] == len(comment):
!         # check that the offset to the Central Directory points to a valid item
!         fpin.seek(endrec[_ECD_OFFSET], 0)
!         dat = fpin.read(4)
!         if dat == stringCentralDir:
              # Append the archive comment and start offset
              endrec.append(comment)
              endrec.append(maxCommentStart + start)
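The test described above (append junk to a good zip file, then try to read it) can also be run from Python without touching the disk. This is only a sketch: the helper names `make_good_zip` and `can_read` are made up for illustration, and `zipfile.BadZipFile` assumes Python 3.2 or later. Whether the junk-appended archive opens depends on the zipfile.py version in use, so no result is claimed here.

```python
import io
import zipfile


def make_good_zip():
    """Build a small, valid zip archive in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


def can_read(data):
    """Return True if zipfile can open the bytes and list the members."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


good = make_good_zip()
junked = good + b"junk\n"  # same effect as: echo junk >> good.zip

print("good archive readable:", can_read(good))
print("archive with trailing junk readable:", can_read(junked))
```

Running this against each Python version of interest shows whether that version tolerates trailing data.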
[issue1757072] Zipfile robustness
James C. Ahlstrom ahlstro...@users.sourceforge.net added the comment:

I received a bug report from a user. He had a zip file created by Mac OS 10.5.8 that the zipfile module claimed was not a valid zip file. The traceback went to function _EndRecData(fpin). The file had a valid comment appended, but recorded a comment length of zero. I am posting to this thread because it seems related. The zero comment length is incorrect, but the file is read without complaint by other software. Note that this is not end-of-file garbage; the comment is valid.

The _EndRecData(fpin) function reads the End of Central Directory record. endrec[7] is the comment length, but of more interest is endrec[6], which is the start of the Central Directory. This is a file offset, and if valid it points to a Central Directory record. These records start with a known string, PK\001\002.

I propose that the correct fix is to delete the test for correct comment length, and replace it with a test that reads four bytes at offset endrec[6] and makes sure they are PK\001\002 and the start of a valid record. This is viewed not as a hack for defective software, but rather as an improved sanity check, since finding the Central Directory is vital to reading a zip file.

This code replaces the end of _EndRecData(fpin), and was taken from Python 2.5 (so check against the relevant version):

    if start >= 0:     # Correct signature string was found
        endrec = struct.unpack(structEndArchive, data[start:start+22])
        endrec = list(endrec)
        comment = data[start+22:]
        ## Relevant changes here
        fpin.seek(endrec[6], 0)    # Seek to the start of the central directory
        dat = fpin.read(4)         # Read four bytes
        # Note: Mac OS is known to add a comment, but record the length as zero.
        if dat == stringCentralDir:    # Success
            ##
            # Append the archive comment and start offset
            endrec.append(comment)
            endrec.append(filesize - END_BLOCK + start)
            if endrec[-4] == -1 or endrec[-4] == 0xffffffff:
                return _EndRecData64(fpin, - END_BLOCK + start, endrec)
            return endrec
    return      # Error, return None

nosy: +ahlstromjc
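The proposed sanity check can be tried in isolation, outside zipfile.py. The following is a minimal sketch, not the module's actual code. The helper names are hypothetical. The record layout `<4s4H2LH` is the standard 22-byte End of Central Directory layout. The sketch assumes no ZIP64 records, no data prepended to the archive, and no end-of-record signature inside the comment. It builds an archive with a real comment, overwrites the recorded comment length with zero to imitate the Mac OS file, and checks that the Central Directory offset still points at PK\001\002.

```python
import io
import struct
import zipfile

EOCD_FORMAT = "<4s4H2LH"                   # End of Central Directory layout
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)   # 22 bytes
EOCD_SIGNATURE = b"PK\x05\x06"
CENTRAL_DIR_SIGNATURE = b"PK\x01\x02"      # PK\001\002


def central_directory_looks_valid(fp):
    """Hypothetical helper: find the End of Central Directory record and
    check that its offset field (endrec[6]) points at PK\\001\\002."""
    fp.seek(0, 2)
    filesize = fp.tell()
    # The record is followed by a comment of at most 0xFFFF bytes.
    block = min(filesize, EOCD_SIZE + 0xFFFF)
    fp.seek(filesize - block)
    data = fp.read()
    start = data.rfind(EOCD_SIGNATURE)
    if start < 0 or len(data) - start < EOCD_SIZE:
        return False
    endrec = struct.unpack(EOCD_FORMAT, data[start:start + EOCD_SIZE])
    fp.seek(endrec[6])                     # start of the Central Directory
    return fp.read(4) == CENTRAL_DIR_SIGNATURE


def make_archive_with_zero_comment_length():
    """Hypothetical helper: valid comment present, recorded length zero."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
        zf.comment = b"a valid comment"
    raw = bytearray(buf.getvalue())
    start = raw.rfind(EOCD_SIGNATURE)
    raw[start + 20:start + 22] = b"\x00\x00"   # comment length field
    return bytes(raw)


broken = make_archive_with_zero_comment_length()
print(central_directory_looks_valid(io.BytesIO(broken)))
```

The check ignores the comment length field entirely, which is the point of the proposal: the archive is accepted if the Central Directory can be found, whatever the comment bookkeeping says.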