Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)

2009-02-12 Thread Venkatesh Srinivas

On Wed, Feb 11, 2009 at 07:07:48PM +0100, Uriel wrote:

> Oh, glad that somebody found my partial git port useful, I might give
> it another push some time.
>
> Having a git/hg repo of the plan9 history is something I have been
> thinking about for a while, really cool that you got something going
> already.
>
> Will you provide a standard git web interface (and a 'native' git
> interface for more efficient cloning)?


http://acm.jhu.edu/git/plan9 is a git web interface, git://acm.jhu.edu/git/plan9
is a native git interface.

-- vs



Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)

2009-02-11 Thread Uriel
Oh, glad that somebody found my partial git port useful, I might give
it another push some time.

Having a git/hg repo of the plan9 history is something I have been
thinking about for a while, really cool that you got something going
already.

Will you provide a standard git web interface (and a 'native' git
interface for more efficient cloning)?

Peace

uriel





Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)

2009-02-11 Thread Venkatesh Srinivas

On Wed, Feb 11, 2009 at 07:07:48PM +0100, Uriel wrote:

> Oh, glad that somebody found my partial git port useful, I might give
> it another push some time.
>
> Having a git/hg repo of the plan9 history is something I have been
> thinking about for a while, really cool that you got something going
> already.
>
> Will you provide a standard git web interface (and a 'native' git
> interface for more efficient cloning)?



We'll have a git web interface up pretty soon, within this week.

-- vs



Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)

2009-02-11 Thread Roman V. Shaposhnik
On Wed, 2009-02-11 at 13:19 -0500, Venkatesh Srinivas wrote:
> On Wed, Feb 11, 2009 at 07:07:48PM +0100, Uriel wrote:
>> Oh, glad that somebody found my partial git port useful, I might give
>> it another push some time.
>>
>> Having a git/hg repo of the plan9 history is something I have been
>> thinking about for a while, really cool that you got something going
>> already.
>>
>> Will you provide a standard git web interface (and a 'native' git
>> interface for more efficient cloning)?
>
> We'll have a git web interface up pretty soon, within this week.

Since it's a more or less r/o Git repo, why not also provide a mirror on
one of these guys:
http://github.com/
http://repo.or.cz/
http://gitorious.org/

Not only would it reduce the stress on your servers, but it'll also 
enable some of the source-browsing features that these sites
implement.

I can set things up myself, as long as you give me *some* kind of
access to your Git repo.
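
A minimal sketch of what such a mirror setup could look like, using only stock git commands. Local directories stand in for both ends so that it runs without network access: `upstream` plays the role of the published repository and `hosted.git` the empty repository created on a hosting site. Both names, and the `mirror.git` working copy, are hypothetical and not taken from the thread.

```shell
#!/bin/sh
# Sketch: mirror a read-only repository to a second location.
# "upstream", "mirror.git" and "hosted.git" are illustrative stand-ins.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the upstream (published) repository.
git init -q upstream
(
    cd upstream
    echo "hello" > README
    git add README
    git -c user.name=example -c user.email=example@example.invalid \
        commit -q -m "initial import"
)

# Stand-in for the empty repository created on the hosting site.
git init -q --bare hosted.git

# Take a bare mirror of upstream, then push every ref to the hosted copy.
git clone -q --mirror upstream mirror.git
git -C mirror.git push -q --mirror "$work/hosted.git"

# Later refreshes: fetch whatever changed upstream, then push again.
git -C mirror.git fetch -q --prune origin
git -C mirror.git push -q --mirror "$work/hosted.git"
```

With a real hosting site, the last argument of `git push --mirror` would be the push URL that site hands out, and the refresh step could run from cron.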

Thanks,
Roman.




Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)

2009-02-11 Thread Nathaniel W Filardo
On Wed, Feb 11, 2009 at 10:35:33AM -0800, Roman V. Shaposhnik wrote:
> Since it's a more or less r/o Git repo, why not also provide a mirror on
> one of these guys:
> http://github.com/
> http://repo.or.cz/
> http://gitorious.org/

I don't want to make it even that official yet, in case somebody suggests
changes, e.g. that I've missed files in my crawls (it occurred to me that
I've missed the contents of dist/, if that bothers anybody).  Revising
history like this will break git's history, naturally, and so probably
merits throwing away repos and restarting.  But eventually, it does seem
like a good idea.

--nwf;




Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)

2009-02-11 Thread Roman V. Shaposhnik
On Tue, 2009-02-10 at 20:43 -0500, Nathaniel W Filardo wrote:
> Incidentally, a git repository of the crawls, from 2002/1212 to 2009/0205,
> is available at http://mirrors.acm.jhu.edu/trees/plan9native/ .  Git gets
> the data down to 165M after a gc run, so perhaps it's a better idea than a
> venti-based mirror.

Where did 165M come from? The history itself seems to be only about 58M
or so:
  $ wget http://mirrors.acm.jhu.edu/trees/plan9native/.git/objects/pack/pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack
  $ ls -l pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack
  -rw-rw-r--   1 rs76089  staff  61039150 Feb 11 06:40 pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack

The following simple-minded manipulations:
  $ git init
  $ git unpack-objects < pack*
  $ git checkout -b master 68e58814202bccfbd7186962daedd754ae76d7df
  warning: You appear to be on a branch yet to be born.
  warning: Forcing checkout of 68e58814202bccfbd7186962daedd754ae76d7df.
  Checking out files: 100% (14229/14229), done.
  Already on master
  $ git repack -ad --window 100 --depth 100
  Counting objects: 39971, done.
  Compressing objects: 100% (39354/39354), done.
  Writing objects: 100% (39971/39971), done.
  Total 39971 (delta 25278), reused 0 (delta 0)

made it even smaller (you can fine-tune it even more, based on
usage requirements):
  $ ls -l .git/objects/pack/*.pack
  -r--r--r--   1 rs76089  staff  57694396 Feb 11 11:03 .git/objects/pack/pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack

> I haven't managed to make my version of Uriel's port
> (thanks for the start! :) ) of git do the right thing in enough cases yet,
> so the git repo may not be updated for a while, but I figured somebody might
> want to play with it in the interim.

The coolest thing, of course, would be to have a way of running git on
the bell labs end. But doing a replica and repacking everything locally
is not bad at all.
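
For anyone who wants to measure the effect of a local repack on their own copy, here is a small sketch. It builds a throwaway repository so that it runs without network access; the file name and commit contents are made up for illustration, and the `--window`/`--depth` values are simply the ones used in the session above. On a real clone of the history only the last five commands matter.

```shell
#!/bin/sh
# Sketch: compare pack size after a plain gc and after an aggressive repack.
# The repository contents below are synthetic (an assumption for illustration).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Create a series of near-identical revisions so delta compression has work to do.
i=1
while [ "$i" -le 20 ]; do
    awk 'BEGIN { for (n = 1; n <= 2000; n++) print n }' > data.txt
    echo "revision $i" >> data.txt
    git add data.txt
    git -c user.name=example -c user.email=example@example.invalid \
        commit -q -m "revision $i"
    i=$((i + 1))
done

# git count-objects -v reports size-pack in KiB.
pack_kib() { git count-objects -v | awk '/^size-pack:/ { print $2 }'; }

git gc --quiet
before=$(pack_kib)
git repack -a -d -q --window=100 --depth=100
after=$(pack_kib)
echo "pack size: ${before} KiB after gc, ${after} KiB after repack"
```

Larger window and depth values trade repack time (and somewhat slower access to deeply chained objects) for a smaller pack, so the right setting depends on how the repository is used.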

Thanks,
Roman.




[9fans] Plan 9 source history (was: Re: source browsing via http is back)

2009-02-10 Thread Nathaniel W Filardo
On Tue, Feb 10, 2009 at 02:45:43PM -0800, Roman V. Shaposhnik wrote:
> On Tue, 2009-02-10 at 17:28 -0500, erik quanstrom wrote:
>> what leads you to believe that that amount of sharing will be
>> significant?
>
> Just a hunch so far. I don't have hard data to prove anything.
> On the other hand, I'd be surprised if massive updates (not pulling
> in a couple of months) didn't benefit from the sharing.
>
> Thanks,
> Roman.

I have mirrored, with vac -f, every sources dump from 2002 to
yesterday with 
  -e acme/acid/386 -e acme/acid/alpha -e acme/acid/arm \
  -e acme/acid/mips -e acme/acid/power -e acme/bin/386 \
  -e acme/bin/alpha -e acme/bin/arm -e acme/bin/mips \
  -e acme/bin/power -e acme/mail/386 -e acme/mail/alpha \
  -e acme/mail/arm -e acme/mail/mips -e acme/mail/power \
  -e sys/man/vol1.ps -e sys/man/vol1.ps.gz -e sys/man/vol1.pdf \
  LICENSE* NOTICE acme lib rc sys ;
intending to get all the source and not the binaries.  I patched my vac to
ignore atimes (replacing the vac metadata field with the mtime) to increase
metadata block sharing.  As of 2009/0205 (a convenient snapshot to du), this
represents about 140.7 MB of data per dump.  The entire copy takes 550 MB
(240 MB actual storage in Venti).  (With no sharing whatsoever, this would
be approx. 310 GB.)  I would like to re-archive this with the Rabin
fingerprinting vac for comparison.
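
As a rough sanity check on these figures, the arithmetic can be sketched as below. It assumes that every dump is as large as the 2009/0205 one and that 1 GB means 1024 MB; neither is stated in the message, so the results are ballpark numbers only.

```shell
#!/bin/sh
# Back-of-the-envelope check of the quoted sizes.
per_dump_mb=140.7   # size of one dump as of 2009/0205 (from the message)
unshared_gb=310     # quoted total with no sharing whatsoever
venti_mb=240        # quoted actual storage in Venti

# Implied number of dumps if each were stored whole
# (assumes all dumps are the same size and 1 GB = 1024 MB).
dumps=$(awk -v g="$unshared_gb" -v d="$per_dump_mb" \
    'BEGIN { printf "%d", g * 1024 / d }')

# How many times smaller the Venti copy is than the unshared total.
ratio=$(awk -v g="$unshared_gb" -v v="$venti_mb" \
    'BEGIN { printf "%d", g * 1024 / v }')

echo "implied dumps: $dumps"
echo "unshared size / venti size: ${ratio}x"
```

The implied count comes out at a little over 2,200 dumps, which is in the neighborhood of one dump per day between 2002/1212 and 2009/0205, if the dumps are in fact daily. That reading would make the 310 GB figure the cost of storing each dump as a separate full tree.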

(In case anybody wants to rush out and recreate the results, it took
roughly 10 to 15 minutes per dump to dispatch all the Tstat requests to
sources.)

Incidentally, a git repository of the crawls, from 2002/1212 to 2009/0205,
is available at http://mirrors.acm.jhu.edu/trees/plan9native/ .  Git gets
the data down to 165M after a gc run, so perhaps it's a better idea than a
venti-based mirror.  I haven't managed to make my version of Uriel's port
(thanks for the start! :) ) of git do the right thing in enough cases yet,
so the git repo may not be updated for a while, but I figured somebody might
want to play with it in the interim.

--nwf;




Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)

2009-02-10 Thread erik quanstrom
> (240 MB actual storage in Venti).  (With no sharing whatsoever, this would
> be approx. 310 GB.)  I would like to re-archive this with the Rabin
> fingerprinting vac for comparison.

by no sharing do you mean if each file tree were stored in a
separate fs, or do you mean that the original fs + one copy of
each file each time it has changed is 310GB?  i'm guessing the
former?

- erik