Re: Zip madness !

Peter Kelly Sat, 01 Aug 2015 10:35:43 -0700

Hi Jan,

I’ve just fixed one bug I found (it was causing a crash, but valgrind helped 
narrow it down): a DFextZipDirEntry pointer was being set via an incorrect 
pointer entry (see my commit to the newZipExperiment branch for details).


After fixing this I got a correct directory listing of a test document I 
created in Word - I only tested it with one file however, so it may not address 
the problem you ran into with the particular test file you mentioned.

—
Dr Peter M. Kelly
[email protected]

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

> On 1 Aug 2015, at 10:41 pm, Peter Kelly <[email protected]> wrote:
> 
> Hi Jan,
> 
> I’ll get to your question in a moment, but I just checked out the 
> newZipExperiment branch and noticed that almost all of the source files have 
> changed (I was expecting a relatively small diff, with only a few files 
> changed). It looks like most of these differences are due to reordering the 
> #includes at the top of each source file. If we’re going to do this, could we 
> make it a separate commit in master, so it’s easier to see exactly what has 
> changed in the zip branch?
> 
> Actually I normally intentionally put system headers after other headers in 
> the project, as it helps to detect cases where a custom header depends on 
> types declared in a system header but does not include it, so that including 
> the custom header by itself in a source file would result in compilation 
> errors due to the missing declarations. For example, DFBuffer.h has an 
> #include <stdarg.h> at the top, since some of its functions take the va_list 
> data type. If one of us uses this type in another header which doesn’t have 
> #include <stdarg.h>, then any C file that includes it (directly or 
> indirectly) has to remember to explicitly include stdarg.h (and that could 
> be a *lot* of files, if the header is referenced from lots of places). So by 
> placing any system includes needed by the source file after all custom 
> headers, we can pick up on these errors more easily.
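To make the include-order point concrete, here is a minimal single-file sketch. The names hypo_format.h, hypoVFormat and hypoFormat are made up for illustration and are not part of the project; the two halves would normally live in separate files. The "header" half includes <stdarg.h> itself because its prototype uses va_list, and the "source" half pulls in its system headers only after the project header, so a header that forgot its own include would fail to compile here.

```c
/* ---- hypo_format.h (hypothetical project header) ---- */
#ifndef HYPO_FORMAT_H
#define HYPO_FORMAT_H

#include <stdarg.h> /* required: the prototype below uses va_list */
#include <stddef.h> /* required: the prototype below uses size_t */

int hypoVFormat(char *out, size_t cap, const char *fmt, va_list ap);
int hypoFormat(char *out, size_t cap, const char *fmt, ...);

#endif /* HYPO_FORMAT_H */

/* ---- hypo_format.c (hypothetical source file) ---- */
/* #include "hypo_format.h" would come first in a real source file ... */
#include <stdio.h> /* ... and system headers needed by the .c file come after */

/* Format into out using an already-started va_list. */
int hypoVFormat(char *out, size_t cap, const char *fmt, va_list ap)
{
    return vsnprintf(out, cap, fmt, ap);
}

/* Variadic convenience wrapper around hypoVFormat. */
int hypoFormat(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int written = hypoVFormat(out, cap, fmt, ap);
    va_end(ap);
    return written;
}
```

If the #include <stdarg.h> line were dropped from the header half, the first prototype would no longer compile, which is exactly the kind of error this ordering is meant to surface early.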
> 
> Regarding the zip file format, I need to look some things up and will get 
> back to you shortly. But I suspect some of the duplication may be related to 
> the fact that a zip file is meant to be read backwards. Rather than starting 
> at the beginning of the file, reading begins at the end, working backwards 
> through the file to find potentially multiple copies of the directory 
> listing. This serves two purposes:
> 
> 1) You can “modify” the contents of a zip file simply by appending to it: 
> the compressed content of new/changed files is added, followed by a new 
> directory listing that includes these files and does *not* include any files 
> which have been “deleted” (i.e. masked out).
> 
> 2) A zip file can be appended to the end of another file format; the most 
> common example being self-extracting .exe files. Since .exe files are read 
> from the beginning, the program loader on Windows doesn’t care about the 
> trailing data at the end. And it’s still a valid zip file, since the .exe 
> content at the start is ignored when reading the directory listing.
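As a rough illustration of reading from the end, the sketch below scans a buffer backwards for the end-of-central-directory record, which the zip format marks with the signature 0x06054b50 and which is at least 22 bytes long (its last field is a 2-byte comment length). The function name findEndOfCentralDir and the helper names are made up for this example and are not taken from the branch. Requiring the comment length to reach exactly to the end of the data is an assumption I've chosen to reject false matches; real readers differ in how strict they are.

```c
#include <stddef.h>
#include <stdint.h>

#define ZIP_EOCD_SIGNATURE 0x06054b50u
#define ZIP_EOCD_MIN_SIZE 22 /* fixed part of the record, without comment */

/* Read little-endian integers byte by byte (zip fields are little-endian). */
static uint16_t readLE16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t readLE32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the offset of the end-of-central-directory record, or -1 if none
 * is found. The scan starts at the last position where a record could fit
 * and walks towards the start of the data, so any leading content (such as
 * an .exe stub) is never interpreted. */
long findEndOfCentralDir(const uint8_t *data, size_t len)
{
    if (len < ZIP_EOCD_MIN_SIZE)
        return -1;
    for (size_t pos = len - ZIP_EOCD_MIN_SIZE + 1; pos-- > 0; ) {
        if (readLE32(data + pos) != ZIP_EOCD_SIGNATURE)
            continue;
        /* Assumption: accept only a record whose comment ends the data. */
        size_t commentLen = readLE16(data + pos + 20);
        if (pos + ZIP_EOCD_MIN_SIZE + commentLen == len)
            return (long)pos;
    }
    return -1;
}
```

Once the record is located, the 4-byte field at offset 16 within it gives the position of the directory listing, so the reader never needs to look at the start of the file.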
> 
> I think you may be aware of some of these details already, and there are 
> some nuances I’ve probably missed. I’m about to have a look through the code you 
> currently have in the branch.
> 
> —
> Dr Peter M. Kelly
> [email protected]
> 
> PGP key: http://www.kellypmk.net/pgp-key
> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
> 
>> On 1 Aug 2015, at 4:33 pm, jan i <[email protected]> wrote:
>> 
>> Hi
>> 
>> Does anybody know why zip has such a madly inefficient directory structure?
>> 
>> I am trying to understand why, but failing.
>> 
>> A zip file contains one global directory with information about every
>> single file (a flat structure with no subdirectories, though filenames may
>> contain a "/"). That is logical and expected.
>> 
>> BUT in front of every file there is a local file header, with the filename
>> and about 3/4 of the information from the global directory. This
>> information seems purely redundant and unneeded.
>> 
>> What am I missing here? On one of my test docx files, the local headers are
>> about 10% of the file size (looong filenames), which could be thrown away.
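For a sense of scale, the per-file overhead can be estimated from the fixed record sizes in the zip format: a local file header is 30 bytes plus the filename and extra field, and a central directory entry is 46 bytes plus the filename, extra field and file comment. The helper names below are made up for this sketch, and it assumes the caller already knows the field lengths (the extra field may differ between the two records).

```c
#include <stddef.h>

/* Fixed sizes of the two per-file records in the zip format. */
enum {
    ZIP_LOCAL_HEADER_FIXED = 30,
    ZIP_CENTRAL_HEADER_FIXED = 46
};

/* Bytes used by the local file header placed in front of a file's data. */
size_t zipLocalHeaderSize(size_t nameLen, size_t extraLen)
{
    return ZIP_LOCAL_HEADER_FIXED + nameLen + extraLen;
}

/* Bytes used by the same file's entry in the central (global) directory. */
size_t zipCentralHeaderSize(size_t nameLen, size_t extraLen, size_t commentLen)
{
    return ZIP_CENTRAL_HEADER_FIXED + nameLen + extraLen + commentLen;
}
```

With a 20-character filename and no extra field or comment, this gives 50 bytes for the local header and 66 bytes for the central directory entry, so the filename is stored twice and long names inflate both records.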
>> 
>> Hope somebody can see what I failed to see.
>> rgds
>> jan i.
>
