Thomas Wouters added the comment:
The spec isn't very explicit about it, yes, but it does say this:
4.4.16 relative offset of local header: (4 bytes)
This is the offset from the start of the first disk on
which this file appears, to where the local header should
be found.
"the start of the first disk" could be construed to mean "the start of the ZIP
archive embedded in this file". However, if you consider the information that's
available, it's obvious that you can't correctly handle ZIP archives that start
at an arbitrary point in a file, at least not in the face of ZIP64 and other
extensions that add information between the end of the central directory and
the end-of-central-directory record.
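To make the spec's "relative offset of local header" concrete, here is a minimal Python sketch. It assumes a single-disk archive with no ZIP64 records and no EOCD signature inside the archive comment. The sketch reads the first central directory entry's stored offset and checks whether a local file header really sits at that offset when it is measured from the start of the *file*. The names `first_entry_offset_is_absolute` and `build_sample_zip` are hypothetical. Only `struct` and `zipfile` are real standard-library modules.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"    # end-of-central-directory record
CDIR_SIG = b"PK\x01\x02"    # central directory file header
LOCAL_SIG = b"PK\x03\x04"   # local file header
EOCD_FMT = "<4s4H2LH"       # fixed 22-byte part of the EOCD record
CDIR_FMT = "<4s6H3L5H2L"    # fixed 46-byte part of a central directory entry


def build_sample_zip() -> bytes:
    """Hypothetical helper: a one-file archive written starting at offset 0."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


def first_entry_offset_is_absolute(data: bytes) -> bool:
    """Return True if the first entry's stored local-header offset, read as an
    offset from the start of the file, lands on a local file header.

    Assumes the central directory ends right where the EOCD record begins
    (no ZIP64 records in between) and the comment has no EOCD signature.
    """
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("no end-of-central-directory record")
    eocd = struct.unpack(EOCD_FMT, data[eocd_pos:eocd_pos + struct.calcsize(EOCD_FMT)])
    cd_size = eocd[5]
    cd_pos = eocd_pos - cd_size  # physical start of the central directory
    entry = struct.unpack(CDIR_FMT, data[cd_pos:cd_pos + struct.calcsize(CDIR_FMT)])
    if entry[0] != CDIR_SIG:
        raise ValueError("central directory not where expected")
    local_offset = entry[16]  # "relative offset of local header"
    return data[local_offset:local_offset + 4] == LOCAL_SIG
```

If you prepend a preamble to an archive whose offsets were written relative to its own start, the check fails. The stored offset still says 0, but byte 0 of the file is now the preamble.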
ZIP archives have both a 4-byte magic number at the start, and a central
directory at the end. The end-of-central-directory record is the very last
thing in the file, and it records both the offset of the start of the central
directory and the size of the central directory. In the absence of any ZIP
extensions that add records between the end of the central directory and the
end-of-central-directory record, you can use those to correct all offsets in the
ZIP archive. But as soon as you add (for example) ZIP64 records, this no longer
works: ZIP64 has an end-of-zip64-central-directory locator and a variable-sized
end-of-zip64-central-directory record. The locator is fixed-size, sits right
before the end-of-central-directory record, and records the offset (from the
start of the file) to the end-of-zip64-central-directory record, but *not* the
size of that record or any other information you can use to determine the offset
of the start of the archive in the file.
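Without ZIP64 records, the correction described above can be sketched as follows. This sketch assumes the central directory ends immediately before the EOCD record and that the archive comment does not contain the EOCD signature. It subtracts the recorded central directory offset from the central directory's physical position, which is the EOCD position minus the central directory size. The result is how far the archive is shifted in the file. `archive_shift` and `build_sample_zip` are hypothetical names.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_FMT = "<4s4H2LH"                  # fixed 22-byte part of the EOCD record
EOCD_SIZE = struct.calcsize(EOCD_FMT)


def build_sample_zip() -> bytes:
    """Hypothetical helper: a one-file archive whose offsets start at 0."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


def archive_shift(data: bytes) -> int:
    """Return how many bytes the archive sits past where its stored offsets
    say it should be.

    Only valid if nothing (e.g. ZIP64 records) lies between the end of the
    central directory and the EOCD record.
    """
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0 or eocd_pos + EOCD_SIZE > len(data):
        raise ValueError("no end-of-central-directory record")
    fields = struct.unpack(EOCD_FMT, data[eocd_pos:eocd_pos + EOCD_SIZE])
    cd_size, cd_offset = fields[5], fields[6]
    # Physical start of the central directory minus its recorded offset.
    return (eocd_pos - cd_size) - cd_offset
```

A reader would add this shift to every stored offset. CPython's `zipfile` reader performs a similar computation internally, in a local variable named `concat`.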
Only by assuming the central directory comes right before the
end-of-central-directory record, or by assuming a fixed size for the ZIP64
record, can you deal with ZIP archives whose offsets aren't measured from the
start of the file.
This assumption is not only *not* guaranteed by the ZIP spec, it's explicitly
invalidated by ZIP64's variable sized records, and possibly other extensions
(like encryption, compression and digital signatures, although I don't remember
if those actually affect this).
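The ZIP64 case shows where the guess comes in. Below is a sketch that assumes a single-disk archive, with the 20-byte locator immediately before the EOCD record. The locator stores only the *offset* of the zip64 end-of-central-directory record. To turn that into an archive shift, you have to assume where the record physically starts. The only assumption available is that the record is exactly its 56-byte fixed part, with no extensible data sector. `read_zip64_locator` and `guess_zip64_shift` are hypothetical names.

```python
import struct

LOCATOR_SIG = b"PK\x06\x07"
LOCATOR_FMT = "<4sLQL"  # signature, disk of zip64 EOCD, its offset, total disks
LOCATOR_SIZE = struct.calcsize(LOCATOR_FMT)  # 20 bytes
ZIP64_EOCD_FIXED_SIZE = 56  # fixed part only; an extensible data sector may follow


def read_zip64_locator(data: bytes, eocd_pos: int) -> int:
    """Return the zip64 EOCD record offset stored in the locator that sits
    immediately before the EOCD record at eocd_pos."""
    loc_pos = eocd_pos - LOCATOR_SIZE
    if loc_pos < 0:
        raise ValueError("too short for a zip64 locator")
    sig, _disk, zip64_eocd_offset, _disks = struct.unpack(
        LOCATOR_FMT, data[loc_pos:eocd_pos])
    if sig != LOCATOR_SIG:
        raise ValueError("no zip64 locator before the EOCD record")
    return zip64_eocd_offset


def guess_zip64_shift(data: bytes, eocd_pos: int) -> int:
    """Guess the archive shift. ASSUMPTION: the zip64 EOCD record is exactly
    ZIP64_EOCD_FIXED_SIZE bytes and ends where the locator begins. The
    locator does not record the size, so this can be wrong."""
    stored_offset = read_zip64_locator(data, eocd_pos)
    assumed_record_pos = eocd_pos - LOCATOR_SIZE - ZIP64_EOCD_FIXED_SIZE
    return assumed_record_pos - stored_offset
```

With a 17-byte preamble and a fixed-size record, the guess is 17. Adding a 10-byte extensible data sector to the same layout makes the guess 27, so every corrected offset would be off by 10.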
It's true that many ZIP tools try to deal with these kinds of archives,
although they *do* realise it's wrong and they usually *do* warn about it. They
still can't cope if the archive uses variable-sized ZIP64 features, short of
trawling through the file looking for the 4-byte magic numbers.
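Trawling for magic numbers can be sketched as below. The sketch assumes the goal is only to find where the first local file header begins. This is a heuristic, not a fix: any `PK\x03\x04` byte sequence inside the preamble (for example, another stored ZIP) fools it. `find_archive_start` and `build_sample_zip` are hypothetical names.

```python
import io
import zipfile

LOCAL_SIG = b"PK\x03\x04"  # local file header magic number


def build_sample_zip() -> bytes:
    """Hypothetical helper: a one-file archive written starting at offset 0."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


def find_archive_start(data: bytes) -> int:
    """Heuristically return the position of the first local file header.

    Wrong if the preamble itself contains LOCAL_SIG; raises if there is
    no local header at all (e.g. an empty archive).
    """
    pos = data.find(LOCAL_SIG)
    if pos < 0:
        raise ValueError("no local file header signature found")
    return pos
```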
Here's an example of code that breaks because of this:
https://github.com/Yhg1s/zipfile-hacks. I tried to convince zipfile to create
Zip64 files with extra fields (the variable-sized parts) but unfortunately the
*cough* "design" of the zipfile module doesn't allow that -- feel free to
ignore the force_zip64 parts of the script.
(I'm using two Python installations I had lying around here; I could've used
2.7.12 vs 2.7.13 instead, and the results would be the same.)
# Python 2.7.12 -- so old behaviour
% python create_small_zip64.py -v --mode w --preamble '#!/usr/bin/python' py2-preamble-w.zip create_small_zip64.py
% python create_small_zip64.py -v --mode a --preamble '#!/usr/bin/python' py2-preamble-a.zip create_small_zip64.py
# Python 3.6.0+ -- after this change, so new behaviour
% ~/python/installs/py36-opt/bin/python3 create_small_zip64.py -v --mode w --preamble '#!/usr/bin/python' py3-preamble-w.zip create_small_zip64.py
% ~/python/installs/py36-opt/bin/python3 create_small_zip64.py -v --mode a --preamble '#!/usr/bin/python' py3-preamble-a.zip create_small_zip64.py
The old zipfiles are fine:
% zip -T py2-preamble-w.zip
test of py2-preamble-w.zip OK
% zip -T py2-preamble-a.zip
test of py2-preamble-a.zip OK
The new one using 'w' is also fine (as expected):
% zip -T py3-preamble-w.zip
test of py3-preamble-w.zip OK
The new one using 'a' is broken:
% zip -T py3-preamble-a.zip
warning [py3-preamble-a.zip]: 17 extra bytes at beginning or within zipfile
(attempting to process anyway)
test of py3-preamble-a.zip FAILED
zip error: Zip file invalid, could not spawn unzip, or wrong unzip (original files unmodified)
The 'unzip' tool does work, but it also prints a warning:
% unzip -l py3-preamble-a.zip
Archive: py3-preamble-a.zip
warning [py3-preamble-a.zip]: 17 extra bytes at beginning or within zipfile
(attempting to process anyway)
  Length      Date    Time    Name
---------  ---------- -----   ----
     4016  2017-05-03 14:23   create_small_zip64.py
---------                     -------
     4016                     1 file
Whether other tools try to compensate for the error depends greatly on the
tool; there are quite a few that don't.
For the record, we had two separate pieces of code, written by (at least) two
different people, that created zipfiles with preambles using mode='a'. I don't
think it's unreasonable to assume that if you have a file with existing data
you don't want the ZipFile to overwrite, you should be using mode 'a' :P
----------
_______________________________________
Python tracker
<http://bugs.python.org/issue29094>
_______________________________________