Hi Jan,

I’ll get to your question in a moment, but I just checked out the 
newZipExperiment branch and noticed that almost all of the source files have 
changed (I was expecting a relatively small diff, with only a few files 
changed). It looks like most of these differences are due to reordering the 
#includes at the top of each source file. If we’re going to do this, could we 
make it a separate commit in master, so it’s easier to see exactly what has 
changed in the zip branch?

Actually, I deliberately put system headers after the project's own headers, 
as it helps to detect cases where a custom header depends on types declared in 
a system header but doesn't include that system header itself. Including such 
a custom header on its own in a source file would then fail to compile because 
of the missing declarations. For example, DFBuffer.h has an #include <stdarg.h> 
at the top, since some of its functions take a va_list argument. If one of us 
uses this type in another header that doesn't have #include <stdarg.h>, then 
any C file that includes that header (directly or indirectly) has to remember 
to include stdarg.h explicitly (and that could be a *lot* of files, if the 
header is used in lots of places). So by placing any system includes needed by 
the source file after all the custom headers, we can pick up on these errors 
more easily.
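To make the convention concrete, here's a single-file C sketch that mimics a header followed by the source file that uses it. The names (SketchBuffer, SketchBufferFormat, SketchBufferVFormat, DFBufferSketch.h) are made up for illustration and are not DFBuffer's actual API. The header section includes <stdarg.h> and <stddef.h> itself; because the source section only pulls in <stdio.h> afterwards, deleting those two includes from the header section would make the va_list and size_t declarations fail to compile, which is exactly the error the ordering is meant to surface.

```c
/* ---- Hypothetical header "DFBufferSketch.h" ----
 * It includes the system headers its own declarations need, so a .c file
 * can include it first without any extra setup. */
#ifndef DFBUFFER_SKETCH_H
#define DFBUFFER_SKETCH_H

#include <stdarg.h>  /* for va_list in the prototype below */
#include <stddef.h>  /* for size_t in the struct below */

typedef struct {
    char   data[256];
    size_t len;      /* number of characters stored (excluding the NUL) */
} SketchBuffer;

void SketchBufferVFormat(SketchBuffer *buf, const char *fmt, va_list ap);
void SketchBufferFormat(SketchBuffer *buf, const char *fmt, ...);

#endif /* DFBUFFER_SKETCH_H */

/* ---- Source file: project headers first, system headers after ----
 * In a real .c file, #include "DFBufferSketch.h" would come here, before
 * any system header. If the header section above forgot <stdarg.h>, the
 * compiler would reject va_list at this point, since nothing else has
 * declared it yet. */
#include <stdio.h>   /* system header, placed after the project header */

void SketchBufferVFormat(SketchBuffer *buf, const char *fmt, va_list ap)
{
    int n = vsnprintf(buf->data, sizeof(buf->data), fmt, ap);
    if (n < 0)
        buf->len = 0;                          /* encoding error */
    else if ((size_t)n < sizeof(buf->data))
        buf->len = (size_t)n;                  /* fitted completely */
    else
        buf->len = sizeof(buf->data) - 1;      /* output was truncated */
}

void SketchBufferFormat(SketchBuffer *buf, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    SketchBufferVFormat(buf, fmt, ap);
    va_end(ap);
}
```

For example, SketchBufferFormat(&b, "%s-%d", "zip", 42) leaves "zip-42" in b.data with b.len set to 6.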

Regarding the zip file format, I need to look up a few things and will get 
back to you shortly. But I suspect some of the duplication may be related to 
the fact that a zip file is meant to be read backwards. Rather than starting at 
the beginning of the file, a reader starts at the end, scanning backwards for 
the end-of-central-directory record, which points to the directory listing. 
The file may contain several copies of the listing, and the last one is the 
one that counts. This serves two purposes:

1) You can “modify” the contents of a zip file simply by appending to it: add 
the compressed content of new/changed files, followed by a new directory 
listing that includes these files and does *not* include any files which have 
been “deleted” (i.e. masked out).

2) A zip file can be appended to the end of another file format, the most 
common example being self-extracting .exe files. Since .exe files are read from 
the beginning, the program loader on Windows doesn’t care about the trailing 
data at the end. And it’s still a valid zip file, since the .exe content at the 
start is ignored when reading the directory listing.
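The “read backwards” step can be sketched in C. This is a minimal sketch, assuming the whole archive is already in memory and ignoring ZIP64 and multi-disk archives; EocdInfo and find_eocd are hypothetical names. The record layout it relies on (signature "PK\x05\x06", 22 fixed bytes, little-endian fields, a 16-bit comment length at the end) follows the published zip format specification. Because the scan starts at the end, nothing before the archive (such as an .exe stub) is ever examined, and if an updated directory and end record have been appended, the last one is the one found.

```c
#include <stddef.h>
#include <stdint.h>

#define EOCD_SIGNATURE   0x06054b50u  /* "PK\x05\x06" read as little-endian */
#define EOCD_MIN_SIZE    22u          /* fixed-size part of the record */
#define EOCD_MAX_COMMENT 0xFFFFu      /* comment length is a 16-bit field */

static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    size_t   eocd_offset;    /* where the end-of-central-directory record starts */
    uint16_t total_entries;  /* number of entries in the directory listing */
    uint32_t cd_size;        /* size of the directory listing in bytes */
    uint32_t cd_offset;      /* where the directory listing starts */
} EocdInfo;

/* Scan backwards from the end of the data for the last EOCD record.
 * Returns 1 and fills *out on success, 0 if no record is found. */
static int find_eocd(const unsigned char *data, size_t len, EocdInfo *out)
{
    if (len < EOCD_MIN_SIZE)
        return 0;

    size_t start = len - EOCD_MIN_SIZE;  /* latest possible record position */
    size_t limit = (start > EOCD_MAX_COMMENT) ? start - EOCD_MAX_COMMENT : 0;

    /* Visit i = start, start-1, ..., limit (inclusive). */
    for (size_t i = start + 1; i-- > limit; ) {
        if (read_le32(data + i) != EOCD_SIGNATURE)
            continue;
        /* A genuine record's trailing comment ends exactly at end of data. */
        if (i + EOCD_MIN_SIZE + read_le16(data + i + 20) != len)
            continue;
        out->eocd_offset   = i;
        out->total_entries = read_le16(data + i + 10);
        out->cd_size       = read_le32(data + i + 12);
        out->cd_offset     = read_le32(data + i + 16);
        return 1;
    }
    return 0;
}
```

A real reader would then jump to cd_offset and parse cd_size bytes of directory entries. For self-extracting archives those offsets may need adjusting for the prepended stub, which this sketch doesn't attempt.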

I think you may be aware of some of these details already, and there are some 
nuances I’ve probably missed. I’m about to have a look through the code you 
currently have in the branch.

—
Dr Peter M. Kelly
[email protected]

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

> On 1 Aug 2015, at 4:33 pm, jan i <[email protected]> wrote:
> 
> Hi
> 
> Does anybody know why zip has such a madly inefficient directory structure?
>
> I am trying to understand why, but fail.
>
> A zip file contains 1 global directory with information about every single
> file (flat structure, no subdirectories, but filenames may contain a "/").
> That is logical and expected.
>
> BUT in front of every file there is a local file header, with the filename
> and about 3/4 of the information from the global directory. This
> information seems purely redundant and unneeded.
>
> What am I missing here? In one of my test docx files, the local headers are
> about 10% of the file size (looong filenames), which could be thrown away.
>
> Hope somebody can see what I failed to see.
> rgds
> jan i.
