It doesn't matter.  The structure of Zip archive files is what it is and it is 
being used on the document formats that interest us.  We have no choice in the 
matter [;<).

There are profiles of Zip for its use as a carrier for standard 
document-file formats.  It is important to know (1) the Zip specification that 
is the basis for a standard format and (2) the profile that is used.  That 
applies to OOXML (in the OPC portion of the spec) and ODF (in Part 3 of ODF 
1.2, section 17 of ODF 1.1).  It also applies to ePub.  There is also now a 
common ISO profile of Zip that is intended to provide a progression of layers 
for use in support of document-format specifications.
 
 - Dennis

SOME BACKGROUND

The local file headers are often produced serially as the archive is built and 
are there for serial processing of the Zip on structures that do not allow 
random access into the stream.  (OPC has a level of abstraction that allows 
more-efficient streaming over networks and in cloud applications but I don't 
know how much that is exploited outside of Microsoft products.  You may find it 
interesting to know that Visual Studio employs OPC in a variety of ways in 
carrying development artifacts.)
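Here is a minimal sketch of that serial use. It assumes Python 3 and only its standard library, and the names `walk_local_headers`, `LocalEntry` and `read_exact` are hypothetical. It reads the 30-byte fixed part of each local file header, laid out as in PKWARE's APPNOTE.TXT, reads past the entry data, and stops at the first central-directory signature. It never calls seek(). Entries whose sizes are deferred to a data descriptor, or that use ZIP64, are deliberately not handled.

```python
import struct
from typing import BinaryIO, Iterator, NamedTuple

LOCAL_SIG = 0x04034B50      # "PK\x03\x04": start of a local file header
CENTRAL_SIG = 0x02014B50    # "PK\x01\x02": start of the central directory
LOCAL_FMT = "<IHHHHHIIIHH"  # fixed part of a local file header
LOCAL_LEN = struct.calcsize(LOCAL_FMT)  # 30 bytes


class LocalEntry(NamedTuple):
    name: str
    method: int           # 0 = stored, 8 = deflated
    compressed_size: int


def read_exact(stream: BinaryIO, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated Zip stream")
    return data


def walk_local_headers(stream: BinaryIO) -> Iterator[LocalEntry]:
    """Walk entries front to back using only read(); seek() is never called."""
    while True:
        sig_bytes = stream.read(4)
        if len(sig_bytes) < 4:
            return
        (sig,) = struct.unpack("<I", sig_bytes)
        if sig == CENTRAL_SIG:
            return  # the central directory follows the last entry
        if sig != LOCAL_SIG:
            raise ValueError(f"unexpected signature {sig:#010x}")
        header = sig_bytes + read_exact(stream, LOCAL_LEN - 4)
        (_, _, flags, method, _, _, _, csize, _,
         name_len, extra_len) = struct.unpack(LOCAL_FMT, header)
        encoding = "utf-8" if flags & 0x0800 else "cp437"
        name = read_exact(stream, name_len).decode(encoding)
        read_exact(stream, extra_len)  # read past the extra field
        if flags & 0x0008 or csize == 0xFFFFFFFF:
            # Sizes deferred to a data descriptor, or held in a ZIP64 extra
            # field: not handled in this sketch.
            raise NotImplementedError(f"{name}: sizes not in local header")
        read_exact(stream, csize)  # read past the entry data, no decompression
        yield LocalEntry(name, method, csize)
```

A reader fed from a pipe or socket can only make this kind of front-to-back pass, which is the situation the local headers serve. A reader on a seekable file can instead jump straight to the directory at the end.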

The global directory, at the end, is a cross-check and, for positionable 
streams, an additional support for ensuring that the Zip has not been damaged.  
In some cases, the global directory has more information than local file 
headers, since such details might only be known after the local file stream has 
been produced (checksums for example, even the length of a stream), and the 
global forms can employ larger pointers and sizes than can be used in the local 
file headers.  The global directory might also be usable in recovery of data 
from a damaged Zip for which an intact global directory is still present.  For 
programs on modern file systems, I suspect that the global directory is used 
almost exclusively, although the local file headers are still there, and 
correct.  In fact, some programs "sniff" the first local file header of ODF 
packages to detect the "mimetype" file entry, although it is not required that 
it be the first local file header.
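Here is a sketch of that kind of sniffing, again assuming Python 3 and the standard library. `sniff_mimetype` is a hypothetical name, not an API of any ODF toolkit. It looks only at the first local file header and succeeds only when "mimetype" happens to be the first entry and is stored uncompressed. Because that placement is not required, a None result means "look further", not "not ODF".

```python
import struct
from typing import Optional

LOCAL_FMT = "<IHHHHHIIIHH"  # fixed 30-byte part of a local file header
LOCAL_LEN = struct.calcsize(LOCAL_FMT)


def sniff_mimetype(head: bytes) -> Optional[str]:
    """Return the media type if the first local file header is a stored
    (uncompressed) "mimetype" entry; otherwise return None.

    `head` is just the first few hundred bytes of the file; the global
    directory at the end is never consulted.
    """
    if len(head) < LOCAL_LEN:
        return None
    (sig, _, flags, method, _, _, _, csize, usize,
     name_len, extra_len) = struct.unpack_from(LOCAL_FMT, head)
    if sig != 0x04034B50 or method != 0 or flags & 0x0008:
        return None  # not a Zip, compressed, or sizes deferred to a data descriptor
    name_end = LOCAL_LEN + name_len
    if head[LOCAL_LEN:name_end] != b"mimetype":
        return None  # first entry is something else; a full reader must look further
    data_start = name_end + extra_len
    data = head[data_start:data_start + csize]
    if csize != usize or len(data) != csize:
        return None  # inconsistent sizes, or `head` was too short
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None
```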

I find all of this intriguing, myself.  It is a challenge to provide a durable 
model that delivers a useful API above the physical Zip structure that adapts 
to available capabilities and removes concern for such details, allowing 
isolation under a better abstraction for use on behalf of a document format.

-----Original Message-----
From: jan i [mailto:[email protected]] 
Sent: Saturday, August 1, 2015 02:33
To: [email protected]
Subject: Zip madness !

Hi

Does anybody know why zip has such a madly inefficient directory structure?

I have tried to understand why, but failed.

A zip file contains one global directory with information about every single
file (flat structure, no subdirectories, but filenames may contain a "/").
That is logical and expected.

BUT in front of every file, there is a local file header with the filename
and about 3/4 of the information from the global directory. This information
seems purely redundant and unneeded.

What am I missing here? In one of my test docx files, the local headers are
about 10% of the file size (looong filenames), and they could be thrown away.

Hope somebody can see what I failed to see.
rgds
jan i.
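Here is a rough sketch for reproducing a measurement like the 10% above, in Python 3. `local_header_overhead` is a hypothetical helper. It finds each entry through the global (central) directory, which is what the standard zipfile module reads. It then re-reads the local header at `header_offset` and checks that its filename agrees; that is the cross-check the redundancy makes possible. The count includes only the fixed 30 bytes plus the filename and extra field of each local header, and any data descriptors are left out.

```python
import os
import struct
import zipfile
from typing import Tuple

LOCAL_SIG = 0x04034B50
LOCAL_FMT = "<IHHHHHIIIHH"  # fixed 30-byte part of a local file header
LOCAL_LEN = struct.calcsize(LOCAL_FMT)


def local_header_overhead(path: str) -> Tuple[int, int]:
    """Return (bytes used by local file headers, total file size).

    Entries are located through the central directory (what zipfile reads),
    and each local header is re-read and cross-checked against it.
    Data descriptors, if present, are not counted.
    """
    overhead = 0
    with open(path, "rb") as f, zipfile.ZipFile(f) as zf:
        for info in zf.infolist():
            f.seek(info.header_offset)
            fields = struct.unpack(LOCAL_FMT, f.read(LOCAL_LEN))
            sig, flags, name_len, extra_len = fields[0], fields[2], fields[9], fields[10]
            if sig != LOCAL_SIG:
                raise ValueError(f"no local header at offset {info.header_offset}")
            encoding = "utf-8" if flags & 0x0800 else "cp437"
            local_name = f.read(name_len).decode(encoding)
            if local_name != info.orig_filename:
                raise ValueError(f"name mismatch: {local_name!r} vs {info.orig_filename!r}")
            overhead += LOCAL_LEN + name_len + extra_len
    return overhead, os.path.getsize(path)
```

For a hypothetical test.docx, `overhead, total = local_header_overhead("test.docx")` followed by `print(f"{overhead / total:.1%}")` would print the share of the file taken by local headers.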
