It doesn't matter. The structure of Zip archive files is what it is and it is being used on the document formats that interest us. We have no choice in the matter [;<).
There are profiles of Zip for when it is employed as a carrier for standard document-file formats. It is important to know (1) the Zip specification that is the basis for a standard format and (2) the profile that is used. That applies to OOXML (in the OPC portion of the spec) and ODF (in Part 3 of ODF 1.2, section 17 of ODF 1.1). It applies to ePub also. There is also now a common ISO profile of Zip that is intended to provide a progression of layers for use in support of document-format specifications.

 - Dennis

SOME BACKGROUND

The local file headers are often produced serially as the archive is built, and they are there for serial processing of the Zip on streams that do not allow random access. (OPC has a level of abstraction that allows more-efficient streaming over networks and in cloud applications, but I don't know how much that is exploited outside of Microsoft products. You may find it interesting to know that Visual Studio employs OPC in a variety of ways in carrying development artifacts.)

The global directory at the end (the "central directory" in the Zip specification) is a cross check and, for positionable streams, an additional support for ensuring that the Zip has not been damaged. In some cases, the global directory has more information than the local file headers, since some details might only be known after the local file stream has been produced (checksums, for example, or even the length of a stream), and the global forms can employ larger pointers and sizes than can be used in the local file headers. The global directory might also be usable in recovering data from a damaged Zip for which an intact global directory is still present.

For programs on modern file systems, I suspect that the global directory is used almost exclusively, although the local file headers are still there, and correct. In fact, some programs "sniff" the first local file header of ODF packages to detect the "mimetype" file entry, although it is not required that it be the first local file header.
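As a rough illustration of the points above, here is a minimal Python sketch that reads local file headers directly and compares them with what the directory at the end reports. It uses only the standard `zipfile` and `struct` modules. The helper names (`read_local_header`, `sniff_mimetype`, `build_sample_package`, `compare_directories`) and the sample package contents are made up for this example, and it assumes a small, unencrypted, non-Zip64 archive written to a seekable stream, so that sizes appear in the local headers rather than in trailing data descriptors.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a Zip local file header (little-endian):
# signature, version, flags, method, time, date, CRC-32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")
LOCAL_SIGNATURE = b"PK\x03\x04"


def read_local_header(data, offset):
    """Parse the local file header that starts at `offset` in `data`."""
    (sig, _version, flags, method, _time, _date, crc,
     comp_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != LOCAL_SIGNATURE:
        raise ValueError("no local file header at offset %d" % offset)
    name_start = offset + LOCAL_HEADER.size
    # Flag bit 11 marks UTF-8 names; otherwise the legacy code page applies.
    encoding = "utf-8" if flags & 0x800 else "cp437"
    name = data[name_start:name_start + name_len].decode(encoding)
    return {
        "name": name,
        "method": method,
        "crc": crc,
        "comp_size": comp_size,
        "header_bytes": LOCAL_HEADER.size + name_len + extra_len,
    }


def sniff_mimetype(data):
    """Return the media type if the first entry is a stored 'mimetype' file."""
    if data[:4] != LOCAL_SIGNATURE:
        return None
    local = read_local_header(data, 0)
    if local["name"] != "mimetype" or local["method"] != zipfile.ZIP_STORED:
        return None
    start = local["header_bytes"]  # entry data follows the header directly
    return data[start:start + local["comp_size"]].decode("ascii")


def build_sample_package():
    """Build a tiny ODF-like package in memory (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # A fresh ZipInfo defaults to ZIP_STORED, so the bytes stay readable.
        zf.writestr(zipfile.ZipInfo("mimetype"),
                    "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", "<office:document-content/>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def compare_directories(data):
    """Pair each directory entry with the local header it points at."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            local = read_local_header(data, info.header_offset)
            rows.append((info.filename, local["name"], local["header_bytes"]))
    return rows


if __name__ == "__main__":
    package = build_sample_package()
    print("sniffed media type:", sniff_mimetype(package))
    for central_name, local_name, header_bytes in compare_directories(package):
        print(central_name, local_name, header_bytes)
```

Summing the third column over all entries gives the space taken by local file headers, which is the redundancy the original question asks about. The sniffing helper only works when the "mimetype" entry happens to come first and is stored uncompressed; otherwise it returns `None` and a reader has to fall back on the directory at the end.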
I find all of this intriguing, myself. It is a challenge to provide a durable model that delivers a useful API above the physical Zip structure, one that adapts to available capabilities and removes concern for such details, allowing isolation under a better abstraction for use on behalf of a document format.

-----Original Message-----
From: jan i [mailto:[email protected]]
Sent: Saturday, August 1, 2015 02:33
To: [email protected]
Subject: Zip madness !

Hi

Does anybody know why zip has a mad inefficient directory structure ? I try to understand the why, but fail.

A zip file, contains 1 global directory with information about every single file (flat structure, no sub directories, but filenames may contain a "/"). That is logical and expected.

BUT in front of every file, there are a local file header, with filename about 3/4 of the information from the global directory. This information seems pure redundant and unneeded.

What am I missing here ? on one of my test docx, the local headers are about 10% of the filesize (looong filenames) which could be thrown away.

Hope somebody can see what I failed to see.

rgds
jan i.
