Mikhael Goikhman <[EMAIL PROTECTED]> writes:
> % revision=archzoom--devel--0--patch-300
> % cd `tla library-find $revision`/..
> % tar cf - --exclude $revision/,,patch-set --exclude $revision/,,index \
> --exclude $revision/,,index-by-name $revision | gzip -9 >$revision.tar.gz
> % du -s --block-size=1 $revision
> % ls -s --block-size=1 $revision.tar.gz
> 3403776 archzoom--devel--0--patch-300
> 163840 archzoom--devel--0--patch-300.tar.gz
>
> The ratio is 21. There is a small but growing gain compared with earlier
> revisions (where it was 18), mainly because {arch} contains a lot of small
> files that compress nicely. Probably better than hardlinking.
You're comparing the size of a *single* revision directory against
tar+gz. This doesn't make much sense since, by definition, the hard
link trick compresses data *across* several revisions.
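For illustration, here's a minimal sketch of what I mean (the revision
names and the way the changeset gets applied are made up):

  base=archzoom--devel--0--patch-299
  next=archzoom--devel--0--patch-300
  cp -al $base $next          # GNU cp: hard-link every file instead of copying
  # ...apply the patch-300 changeset on top of $next; only the files it
  # touches get fresh inodes, everything else keeps sharing storage...
  du -s --block-size=1 $base            # size of one revision
  du -sc --block-size=1 $base $next     # total barely grows: shared inodes
                                        # are counted only once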
> Please don't forget that a hardlink costs more than 0,
Can you elaborate on that?
> and also that for
> every merged external revision there are at least 2 more files, in {arch}
> and ,,patch-log/, and possibly new subdirs too (not hardlink-able).
Right.
> For me (and for du/rm) it is not the size but the number of inodes that
> matters most, so this very CPU-expensive solution would not help much.
There are several good papers on the topic [0,1,2]. I'm pretty
confident that hard links plus gzip of individual files would yield a
better compression ratio than keeping one whole gzipped tarball per
revision, *when* several subsequent revisions are kept.
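A back-of-the-envelope comparison, again with made-up revision names,
could look like this:

  # (a) one gzipped tarball per revision: shared content is stored
  #     once in every tarball
  for rev in archzoom--devel--0--patch-299 archzoom--devel--0--patch-300; do
      tar cf - $rev | gzip -9 >$rev.tar.gz
  done
  du -c --block-size=1 *.tar.gz

  # (b) hard-linked revision trees: du counts each shared inode only
  #     once, so the total grows only by what actually changed; gzipping
  #     each unique file once would shrink it further (note that plain
  #     gzip skips multiply-linked files unless forced with -f)
  du -sc --block-size=1 archzoom--devel--0--patch-299 archzoom--devel--0--patch-300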
Thanks,
Ludovic.
[0] http://ssrc.cse.ucsc.edu/Papers/you-mss04.pdf
[1] http://ssrc.cse.ucsc.edu/Papers/you-icde05.pdf
[2] http://www.usenix.org/events/usenix04/tech/general/full_papers/kulkarni/kulkarni_html/paper.html