On Tue, 2009-02-10 at 20:43 -0500, Nathaniel W Filardo wrote:
> Incidentally, a git repository of the crawls, from 2002/1212 to 2009/0205,
> is available at http://mirrors.acm.jhu.edu/trees/plan9native/ .  Git gets
> the data down to 165M after a gc run, so perhaps it's a better idea than a
> venti-based mirror. 

Where did 165M come from? The history itself seems to be only about 58M
or so:
  $ wget 
  $ ls -l pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack 
  -rw-rw-r--   1 rs76089  staff    61039150 Feb 11 06:40 

And, after the following simple-minded manipulations:
  $ git init
  $ git unpack-objects < pack*
  $ git checkout -b master 68e58814202bccfbd7186962daedd754ae76d7df
  warning: You appear to be on a branch yet to be born.
  warning: Forcing checkout of 68e58814202bccfbd7186962daedd754ae76d7df.
  Checking out files: 100% (14229/14229), done.
  Already on "master"
  $ git repack -ad --window 100 --depth 100
  Counting objects: 39971, done.
  Compressing objects: 100% (39354/39354), done.
  Writing objects: 100% (39971/39971), done.
  Total 39971 (delta 25278), reused 0 (delta 0)

the pack got even smaller (you can fine-tune it even more, based on
usage requirements):
  $ ls -l .git/objects/pack/*.pack                                            
  -r--r--r--   1 rs76089  staff    57694396 Feb 11 11:03 
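
For what it's worth, the saving can be checked with plain shell
arithmetic. This is only a minimal sketch: the byte counts are copied
from the two ls listings above, and the variable names are made up for
illustration.

```shell
# Byte counts taken from the two `ls -l` listings above.
orig_bytes=61039150      # pack as downloaded
repacked_bytes=57694396  # pack after git repack -ad --window 100 --depth 100

# Integer division, so these are whole MiB, rounded down.
orig_mib=$((orig_bytes / 1048576))
repacked_mib=$((repacked_bytes / 1048576))

# Bytes saved by the repack.
saved_bytes=$((orig_bytes - repacked_bytes))

echo "original: ${orig_mib} MiB"
echo "repacked: ${repacked_mib} MiB"
echo "saved:    ${saved_bytes} bytes"
```

So the repack trims roughly 3M off a pack that was already far below
165M.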

>  I haven't managed to make my version of Uriel's port
> (thanks for the start! :) ) of git do the right thing in enough cases yet,
> so the git repo may not be updated for a while, but I figured somebody might
> want to play with it in the interim.

The coolest thing, of course, would be to have a way of running git on
the Bell Labs end. But doing a replica and repacking everything locally
is not bad at all.

