Hi Jan,

I’ll get to your question in a moment, but I just checked out the newZipExperiment branch and noticed that almost all of the source files have changed (I was expecting a relatively small diff, with only a few files changed). It looks like most of these differences are due to reordering the #includes at the top of each source file. If we’re going to do this, could we make it a separate commit in master, so it’s easier to see exactly what has changed in the zip branch?
Actually, I normally put system headers after the project’s own headers on purpose, as it helps to detect cases where a custom header depends on types declared in a system header but doesn’t include it, so that including that header by itself in a source file would result in compilation errors due to the missing declarations. For example, DFBuffer.h has an #include <stdarg.h> at the top, since some of its functions take the va_list data type. If one of us uses this type in another header which doesn’t have #include <stdarg.h>, then any C file that includes it (directly or indirectly) has to remember to explicitly include stdarg.h first (and that could be a *lot* of files, if the header is referenced from lots of places). So by placing any system includes needed by the source file after all custom headers, we can pick up on these errors more easily.

Regarding the zip file format, I need to look a few things up and will get back to you shortly. But I suspect some of the duplication may be related to the fact that a zip file is meant to be read backwards. Rather than starting at the beginning of the file, reading begins at the end, working backwards through the file to find potentially multiple copies of the directory listing. This serves two purposes:

1) You can “modify” the contents of a zip file simply by appending (with the compressed content of new/changed files added, and a new directory listing including these files, and *not* including any files which have been “deleted”, i.e. masked out).

2) A zip file can be appended to the end of another file format; the most common example being self-extracting .exe files. Since .exe files are read from the beginning, the program loader on Windows doesn’t care that there’s trailing data at the end. And it’s still a valid zip file, since the .exe content at the start is ignored when reading the directory listing.
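To illustrate the reading-backwards point, here is a minimal sketch in C of how a reader might locate the most recent directory listing. It scans from the end of an in-memory buffer for the end-of-central-directory signature (0x06054b50, stored little-endian as the bytes 'P' 'K' 0x05 0x06). The function name find_eocd and the in-memory buffer are assumptions for illustration, not code from the branch, and a real reader would also validate the record it finds (for example by checking the comment length field) to rule out a signature that merely happens to appear in the trailing comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest possible end-of-central-directory (EOCD) record:
   4-byte signature + 18 bytes of fixed fields, with an empty comment. */
#define EOCD_MIN_SIZE 22
#define EOCD_SIGNATURE 0x06054b50u

/* Read a 32-bit little-endian value from p. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return the offset of the last EOCD signature in
   buf, or -1 if none is found. The scan starts at the last position
   where a complete record could fit and works towards the start. */
long find_eocd(const unsigned char *buf, size_t len)
{
    if (len < EOCD_MIN_SIZE)
        return -1;
    for (size_t pos = len - EOCD_MIN_SIZE + 1; pos-- > 0; ) {
        if (read_le32(buf + pos) == EOCD_SIGNATURE)
            return (long)pos;
    }
    return -1;
}
```

Because the scan starts at the end, any data placed in front of the archive, such as a self-extracting .exe stub, just shifts the returned offset and otherwise doesn’t affect the result.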
I think you may be aware of some of these details already, and there are some nuances I’ve probably missed. I’m about to have a look through the code you currently have in the branch.

--
Dr Peter M. Kelly
[email protected]

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

> On 1 Aug 2015, at 4:33 pm, jan i <[email protected]> wrote:
>
> Hi
>
> Does anybody know why zip has a mad inefficient directory structure ?
>
> I try to understand the why, but fail.
>
> A zip file, contains 1 global directory with information about every single
> file (flat structure, no
> sub directories, but filenames may contain a "/"). That is logical and
> expected.
>
> BUT in front of every file, there are a local file header, with filename
> about 3/4 of the information
> from the global directory. This information seems pure redundant and
> unneeded.
>
> What am I missing here ? on one of my test docx, the local headers are
> about 10% of the filesize (looong filenames) which could be thrown away.
>
> Hope somebody can see what I failed to see.
> rgds
> jan i.
