Re: [darcs-devel] a measurement of using 7z instead of gz to compress patches

zooko Fri, 18 Jan 2008 12:03:17 -0800

> Well, the tar version compresses much better than zip because it is
> streaming and doesn't have an index. since you tar then gzip, it doesn't
> compress each patch independently but rather compresses a single stream
> containing all the patches. since patches are extremely similar, this
> results in substantial improvements over individually compressing each
> file like zip does in order to individually access them via its index.


Really?  Perhaps you were talking about "zip" [1], but I am talking  
about "7z" a.k.a. "7-zip" [2].

I decided to measure this on my allmydata.org repository.  Hopefully
the names are self-explanatory.  The first two are the ones that darcs
actually implements at this time.

$ du -sk * # So the numbers are in KiB
81592   trunk-nocompress-individual-patches
48980   trunk-gz-individual-patches
44528   trunk-7z-individual-patches
18784   trunk-tar-gz-all-at-once
10644   trunk-tar-7z-all-at-once
10600   trunk-7z-all-at-once
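
For scale, the figures above can be expressed as a percentage of the
uncompressed size.  This is only a minimal Python sketch: the dictionary
restates the du output, and the names sizes_kib and percent_of_baseline
are illustrative, not part of darcs.

```python
# Sizes in KiB, copied from the du output above.
sizes_kib = {
    "trunk-nocompress-individual-patches": 81592,
    "trunk-gz-individual-patches": 48980,
    "trunk-7z-individual-patches": 44528,
    "trunk-tar-gz-all-at-once": 18784,
    "trunk-tar-7z-all-at-once": 10644,
    "trunk-7z-all-at-once": 10600,
}

# The uncompressed individual patches serve as the baseline.
baseline = sizes_kib["trunk-nocompress-individual-patches"]


def percent_of_baseline(name):
    """Return the size of `name` as a percentage of the uncompressed size."""
    return 100.0 * sizes_kib[name] / baseline


for name, size in sizes_kib.items():
    print(f"{size:6d} KiB  {percent_of_baseline(name):5.1f}%  {name}")
```

By this arithmetic, gzip on individual patches comes to about 60% of the
uncompressed size, tar+gz to about 23%, and both all-at-once 7z variants
to about 13%.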

So basically, it would be interesting if some Haskell hacker wanted
to make 7z archives (or else 7z-compressed tarballs) available
through a Haskell interface.  Depending on your filesystem and your
usage, it might well be that the result is faster as well as more
compact.  The key to being faster would be to avoid a seek() that
wasn't already cached by your filesystem, and tighter compression
can obviously help with this.
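
The difference between compressing patches individually and compressing
them all at once can be reproduced in miniature.  The sketch below is
an illustration under stated assumptions, not a measurement of darcs:
the "patches" are synthetic and only loosely patch-shaped, zlib stands
in for gzip (both use DEFLATE), and Python's lzma module (which writes
the .xz container, not a .7z archive) stands in for 7z's LZMA.  All
function names are hypothetical.

```python
import lzma
import zlib


def make_fake_patches(count=200):
    """Build synthetic, highly similar 'patches' (NOT real darcs patches)."""
    patches = []
    for i in range(count):
        text = (
            f"[hypothetical patch {i}\n"
            "someone@example.org**20080118000000] {\n"
            f"hunk ./src/module.py {i + 1}\n"
            f"-old line {i}\n"
            f"+new line {i}\n"
            "}\n"
        )
        patches.append(text.encode("utf-8"))
    return patches


def individual_size(patches, compress):
    """Total size when every patch is compressed on its own."""
    return sum(len(compress(p)) for p in patches)


def all_at_once_size(patches, compress):
    """Size when all patches are concatenated and compressed as one stream."""
    return len(compress(b"".join(patches)))


patches = make_fake_patches()
raw = sum(len(p) for p in patches)
print("uncompressed bytes:", raw)
for label, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    print(
        label,
        "individual:", individual_size(patches, compress),
        "all-at-once:", all_at_once_size(patches, compress),
    )
```

On input like this the all-at-once sizes come out well below the
individual ones, because a single stream lets the compressor exploit
the redundancy between patches and pays the per-stream header only
once.  The exact numbers depend on the data and say nothing about any
real repository.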

Regards,

Zooko

[1] http://en.wikipedia.org/wiki/ZIP_(file_format)
[2] http://en.wikipedia.org/wiki/7z
_______________________________________________
darcs-devel mailing list
darcs-devel@darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-devel
