On Tue, 2009-02-10 at 20:43 -0500, Nathaniel W Filardo wrote:
> Incidentally, a git repository of the crawls, from 2002/1212 to 2009/0205,
> is available at http://mirrors.acm.jhu.edu/trees/plan9native/ . Git gets
> the data down to 165M after a gc run, so perhaps it's a better idea than a
> venti-based mirror.

Where did 165M come from? The history itself seems to be only about 58M or so:

$ wget http://mirrors.acm.jhu.edu/trees/plan9native/.git/objects/pack/pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack
$ ls -l pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack
-rw-rw-r-- 1 rs76089 staff 61039150 Feb 11 06:40 pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack

And, after the following simple-minded manipulations:

$ git init
$ git unpack-objects < pack*
$ git checkout -b master 68e58814202bccfbd7186962daedd754ae76d7df
warning: You appear to be on a branch yet to be born.
warning: Forcing checkout of 68e58814202bccfbd7186962daedd754ae76d7df.
Checking out files: 100% (14229/14229), done.
Already on "master"
$ git repack -ad --window 100 --depth 100
Counting objects: 39971, done.
Compressing objects: 100% (39354/39354), done.
Writing objects: 100% (39971/39971), done.
Total 39971 (delta 25278), reused 0 (delta 0)

it got even smaller (you can fine-tune it even more, based on usage
requirements):

$ ls -l .git/objects/pack/*.pack
-r--r--r-- 1 rs76089 staff 57694396 Feb 11 11:03 .git/objects/pack/pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack

> I haven't managed to make my version of Uriel's port
> (thanks for the start! :) ) of git do the right thing in enough cases yet,
> so the git repo may not be updated for a while, but I figured somebody might
> want to play with it in the interim.

The coolest thing, of course, would be to have a way of running git on the
Bell Labs end. But doing a replica and repacking everything locally is not bad
at all.

Thanks,
Roman.
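
For anyone checking the arithmetic on the two byte counts quoted above, here
is a minimal sketch. It assumes a POSIX sh with awk available; to_mib and
saved_bytes are hypothetical helper names, not git commands. Inside a
repository, git count-objects -v should report the packed size directly as
size-pack, in KiB.

```shell
#!/bin/sh
# Hypothetical helpers for comparing pack sizes; not part of git.

# Convert a byte count to MiB (1 MiB = 1048576 bytes), one decimal place.
to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

# Bytes saved going from the first size to the second.
saved_bytes() {
    echo $(( $1 - $2 ))
}

# The two sizes are the ones shown by ls -l in the message above.
echo "downloaded pack: $(to_mib 61039150) MiB"
echo "after repack:    $(to_mib 57694396) MiB"
echo "saved:           $(saved_bytes 61039150 57694396) bytes"
```

The saving from the more aggressive window and depth settings is modest,
roughly 3.3 million bytes, so both packs sit well under the 165M figure.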