Thomas Wouters added the comment:

The spec isn't very explicit about it, yes, but it does say this:

4.4.16 relative offset of local header: (4 bytes)

       This is the offset from the start of the first disk on
       which this file appears, to where the local header should
       be found.

"the start of the first disk" could be construed to mean "the start of the ZIP 
archive embedded in this file". However, if you consider the information that's 
available and what it takes to make ZIP archives work in the face of ZIP64 and 
other extensions that add information between the end-of-central-directory 
record and the end of the central directory, it's obvious that you can't 
correctly handle ZIP archives that start at an arbitrary point in a file.

ZIP archives have both a 4-byte magic number at the start, and a central 
directory at the end. The end-of-central-directory record is the very last 
thing in the file, and it records both the offset of the start of the central 
directory and the size of the central directory. In the absence of any ZIP 
extensions that add records between the end-of-central-directory record and the 
end of the central directory, you can use those to correct all offsets in the 
ZIP archive. But as soon as you add (for example) ZIP64 records, this no longer 
works: ZIP64 has an end-of-zip64-central-directory locator and a variable-sized 
end-of-zip64-central-directory record. The locator is fixed-size, sits right before 
the end-of-central-directory record and records the offset (from the start of 
the file) to the end-of-zip64-central-directory record, but *not* the size of 
that record or any other information you can use to determine the offset of the 
start of the archive in the file.
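
To make the non-ZIP64 case concrete, here is a minimal sketch of the offset 
correction described above. It assumes the central directory ends exactly where 
the end-of-central-directory record begins, which is precisely the assumption 
that ZIP64 and similar extensions break. The helper name inferred_shift is made 
up for illustration, and finding the record with rfind() is a shortcut that 
would be fooled by an archive comment containing the signature.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory signature
EOCD_FMT = "<4s4H2LH"      # fixed 22-byte part of the record


def inferred_shift(data):
    """Guess how many bytes precede the archive (hypothetical helper).

    Only valid if the central directory sits directly in front of the
    end-of-central-directory record, with nothing in between.
    """
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FMT, data, pos)
    cd_size, cd_offset = fields[5], fields[6]
    # Where the central directory really starts, minus where it claims to start.
    return pos - cd_size - cd_offset


# Build a tiny archive in memory, then glue a preamble in front of it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
plain = buf.getvalue()
preamble = b"#!/usr/bin/python"

print(inferred_shift(plain))             # 0
print(inferred_shift(preamble + plain))  # 17
```

With a ZIP64 record between the central directory and the 
end-of-central-directory record, the same arithmetic would be off by the 
(unknown) size of that record.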

Only by assuming the central directory record comes right before the 
end-of-central-directory record, or assuming fixed sizes for the ZIP64 record, 
can you deal with ZIP archives with offsets not from the start of the file. 
This assumption is not only *not* guaranteed by the ZIP spec, it's explicitly 
invalidated by ZIP64's variable sized records, and possibly other extensions 
(like encryption, compression and digital signatures, although I don't remember 
if those actually affect this).

It's true that many ZIP tools try to deal with these kinds of archives, 
although they *do* realise it's wrong and they usually *do* warn about it. They 
still can't deal with it if it uses variable-sized ZIP64 features (other than 
trawling through the file looking for the 4-byte magic numbers).
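
For illustration, the "trawling" fallback amounts to something like the 
following sketch. The function name first_local_header is made up, and this is 
not how any particular tool implements it; a real scan would also have to cope 
with the signature bytes showing up by accident inside the preamble or inside 
stored file data.

```python
import io
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"  # local file header signature


def first_local_header(data):
    """Return the index of the first local-header signature, or -1."""
    return data.find(LOCAL_HEADER_SIG)


# Build a tiny archive in memory, then glue a preamble in front of it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
archive = buf.getvalue()
preamble = b"#!/usr/bin/python"

print(first_local_header(archive))             # 0
print(first_local_header(preamble + archive))  # 17
```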

Here's an example of code that breaks because of this: 
https://github.com/Yhg1s/zipfile-hacks. I tried to convince zipfile to create 
Zip64 files with extra fields (the variable-sized parts) but unfortunately the 
*cough* "design" of the zipfile module doesn't allow that -- feel free to 
ignore the force_zip64 parts of the script.

(I'm using two Python installations I had lying around here; I could've used 
2.7.12 vs 2.7.13 instead, and the results would be the same.)

# Python 2.7.12 -- so old behaviour
% python create_small_zip64.py -v --mode w --preamble '#!/usr/bin/python' py2-preamble-w.zip create_small_zip64.py
% python create_small_zip64.py -v --mode a --preamble '#!/usr/bin/python' py2-preamble-a.zip create_small_zip64.py

# Python 3.6.0+ -- after this change, so new behaviour
% ~/python/installs/py36-opt/bin/python3 create_small_zip64.py -v --mode w --preamble '#!/usr/bin/python' py3-preamble-w.zip create_small_zip64.py
% ~/python/installs/py36-opt/bin/python3 create_small_zip64.py -v --mode a --preamble '#!/usr/bin/python' py3-preamble-a.zip create_small_zip64.py

The old zipfiles are fine:
% zip -T py2-preamble-w.zip
test of py2-preamble-w.zip OK
% zip -T py2-preamble-a.zip
test of py2-preamble-a.zip OK

The new one using 'w' is also fine (as expected):
% zip -T py3-preamble-w.zip
test of py3-preamble-w.zip OK

The new one using 'a' is broken:
% zip -T py3-preamble-a.zip
warning [py3-preamble-a.zip]:  17 extra bytes at beginning or within zipfile
  (attempting to process anyway)
test of py3-preamble-a.zip FAILED

zip error: Zip file invalid, could not spawn unzip, or wrong unzip (original files unmodified)

The 'unzip' tool does work, but it also prints a warning:
% unzip -l py3-preamble-a.zip
Archive:  py3-preamble-a.zip
warning [py3-preamble-a.zip]:  17 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  Length      Date    Time    Name
---------  ---------- -----   ----
     4016  2017-05-03 14:23   create_small_zip64.py
---------                     -------
     4016                     1 file

Whether other tools try to compensate for the error depends greatly on the 
tool; there are quite a few that don't.

For the record, we had two different bits of code that created zipfiles with 
preambles using mode='a', created by (at least) two different people. I don't 
think it's unreasonable to assume that if you have a file with existing data 
you don't want the ZipFile to overwrite, it should be using mode 'a' :P
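
A minimal sketch of that pattern, with made-up file names and contents (this is 
not the code from either of those two places): write the preamble first, then 
let ZipFile append the archive with mode 'a'. The sketch only checks that the 
preamble survives and that zipfile can read the result back; how the offsets 
get recorded is exactly the part that differs between the Python versions 
compared above.

```python
import os
import tempfile
import zipfile

preamble = b"#!/usr/bin/python\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "preamble-a.zip")  # hypothetical file name

    # Existing data that the ZipFile must not overwrite.
    with open(path, "wb") as f:
        f.write(preamble)

    # Mode 'a' on a file that is not (yet) a ZIP archive appends a new one.
    with zipfile.ZipFile(path, "a") as zf:
        zf.writestr("hello.txt", "hello")

    with open(path, "rb") as f:
        data = f.read()
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        content = zf.read("hello.txt")

print(data.startswith(preamble))  # preamble is still in front
print(names, content)
```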

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29094>
_______________________________________