Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
On Wed, Feb 11, 2009 at 07:07:48PM +0100, Uriel wrote:
> Oh, glad that somebody found my partial git port useful, I might give it another push some time. Having a git/hg repo of the plan9 history is something I have been thinking about for a while, really cool that you got something going already. Will you provide a standard git web interface (and a 'native' git interface for more efficient cloning)?

http://acm.jhu.edu/git/plan9 is a git web interface, git://acm.jhu.edu/git/plan9 is a native git interface.

-- vs
Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
Oh, glad that somebody found my partial git port useful, I might give it another push some time.

Having a git/hg repo of the plan9 history is something I have been thinking about for a while, really cool that you got something going already. Will you provide a standard git web interface (and a 'native' git interface for more efficient cloning)?

Peace

uriel

On Wed, Feb 11, 2009 at 2:43 AM, Nathaniel W Filardo n...@cs.jhu.edu wrote:
> On Tue, Feb 10, 2009 at 02:45:43PM -0800, Roman V. Shaposhnik wrote:
> > On Tue, 2009-02-10 at 17:28 -0500, erik quanstrom wrote:
> > > what leads you to believe that that amount of sharing will be significant?
> >
> > Just a hunch so far. I don't have hard data to prove anything. On the other hand, I'd be surprised if massive updates (not pulling in a couple of months) didn't benefit from the sharing.
> >
> > Thanks,
> > Roman.
>
> I have mirrored, with vac -f, every sources dump from 2002 to yesterday with
>
>   -e acme/acid/386 -e acme/acid/alpha -e acme/acid/arm \
>   -e acme/acid/mips -e acme/acid/power -e acme/bin/386 \
>   -e acme/bin/alpha -e acme/bin/arm -e acme/bin/mips \
>   -e acme/bin/power -e acme/mail/386 -e acme/mail/alpha \
>   -e acme/mail/arm -e acme/mail/mips -e acme/mail/power \
>   -e sys/man/vol1.ps -e sys/man/vol1.ps.gz -e sys/man/vol1.pdf \
>   LICENSE* NOTICE acme lib rc sys ;
>
> intending to get all the source and not the binaries. I patched my vac to ignore atimes (replacing the vac metadata field with the mtime) to increase metadata block sharing.
>
> As of 2009/0205 (a convenient snapshot to du), this represents about 140.7 MB of data per dump. The entire copy takes 550 MB (240 MB actual storage in Venti). (With no sharing whatsoever, this would be approx. 310 GB.) I would like to re-archive this with the Rabin fingerprinting vac for comparison.
>
> (In case anybody wants to rush out and recreate the results, it took roughly 10 to 15 minutes per dump to dispatch all the Tstat requests to sources.)
>
> Incidentally, a git repository of the crawls, from 2002/1212 to 2009/0205, is available at http://mirrors.acm.jhu.edu/trees/plan9native/ . Git gets the data down to 165M after a gc run, so perhaps it's a better idea than a venti-based mirror. I haven't managed to make my version of Uriel's port (thanks for the start! :) ) of git do the right thing in enough cases yet, so the git repo may not be updated for a while, but I figured somebody might want to play with it in the interim.
>
> --nwf;
Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
On Wed, Feb 11, 2009 at 07:07:48PM +0100, Uriel wrote:
> Oh, glad that somebody found my partial git port useful, I might give it another push some time. Having a git/hg repo of the plan9 history is something I have been thinking about for a while, really cool that you got something going already. Will you provide a standard git web interface (and a 'native' git interface for more efficient cloning)?

We'll have a git web interface up pretty soon, within this week.

-- vs
Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
On Wed, 2009-02-11 at 13:19 -0500, Venkatesh Srinivas wrote:
> On Wed, Feb 11, 2009 at 07:07:48PM +0100, Uriel wrote:
> > Oh, glad that somebody found my partial git port useful, I might give it another push some time. Having a git/hg repo of the plan9 history is something I have been thinking about for a while, really cool that you got something going already. Will you provide a standard git web interface (and a 'native' git interface for more efficient cloning)?
>
> We'll have a git web interface up pretty soon, within this week.

Since it's a more or less r/o Git repo, why not also provide a mirror on one of these guys:

  http://github.com/
  http://repo.or.cz/
  http://gitorious.org/

Not only would it reduce the stress on your servers, but it'll also enable some of the source-browsing features that these sites implement.

I can set things up myself, as long as you give me *some* kind of access to your Git repo.

Thanks,
Roman.
Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
On Wed, Feb 11, 2009 at 10:35:33AM -0800, Roman V. Shaposhnik wrote:
> Since its a more or less r/o Git repo, why not also provide a mirror on one of these guys:
>   http://github.com/
>   http://repo.or.cz/
>   http://gitorious.org/

I don't want to make it even that official yet, in case somebody suggests changes, e.g. that I've missed files in my crawls (it occurred to me that I've missed the contents of dist/, if that bothers anybody). Revising history like this will break git's history, naturally, and so probably merits throwing away repos and restarting.

But eventually, it does seem like a good idea.

--nwf;
Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
On Tue, 2009-02-10 at 20:43 -0500, Nathaniel W Filardo wrote:
> Incidentally, a git repository of the crawls, from 2002/1212 to 2009/0205, is available at http://mirrors.acm.jhu.edu/trees/plan9native/ . Git gets the data down to 165M after a gc run, so perhaps it's a better idea than a venti-based mirror.

Where did 165M come from? The history itself seems to be only about 58M or so:

  $ wget http://mirrors.acm.jhu.edu/trees/plan9native/.git/objects/pack/pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack
  $ ls -l pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack
  -rw-rw-r-- 1 rs76089 staff 61039150 Feb 11 06:40 pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack

And, after the following simple minded manipulations:

  $ git init
  $ git unpack-objects < pack*
  $ git checkout -b master 68e58814202bccfbd7186962daedd754ae76d7df
  warning: You appear to be on a branch yet to be born.
  warning: Forcing checkout of 68e58814202bccfbd7186962daedd754ae76d7df.
  Checking out files: 100% (14229/14229), done.
  Already on master
  $ git repack -ad --window 100 --depth 100
  Counting objects: 39971, done.
  Compressing objects: 100% (39354/39354), done.
  Writing objects: 100% (39971/39971), done.
  Total 39971 (delta 25278), reused 0 (delta 0)

Made it even smaller (you can fine tune it even more, based on usage requirements):

  $ ls -l .git/objects/pack/*.pack
  -r--r--r-- 1 rs76089 staff 57694396 Feb 11 11:03 .git/objects/pack/pack-afe021812ab52f698895941f8eb5ad4e3d75020e.pack

> I haven't managed to make my version of Uriel's port (thanks for the start! :) ) of git do the right thing in enough cases yet, so the git repo may not be updated for a while, but I figured somebody might want to play with it in the interim.

The coolest thing, of course, would be to have a way of running git on the bell labs end. But doing a replica and repacking everything locally is not bad at all.

Thanks,
Roman.
[9fans] Plan 9 source history (was: Re: source browsing via http is back)
On Tue, Feb 10, 2009 at 02:45:43PM -0800, Roman V. Shaposhnik wrote:
> On Tue, 2009-02-10 at 17:28 -0500, erik quanstrom wrote:
> > what leads you to believe that that amount of sharing will be significant?
>
> Just a hunch so far. I don't have hard data to prove anything. On the other hand, I'd be surprised if massive updates (not pulling in a couple of months) didn't benefit from the sharing.
>
> Thanks,
> Roman.

I have mirrored, with vac -f, every sources dump from 2002 to yesterday with

  -e acme/acid/386 -e acme/acid/alpha -e acme/acid/arm \
  -e acme/acid/mips -e acme/acid/power -e acme/bin/386 \
  -e acme/bin/alpha -e acme/bin/arm -e acme/bin/mips \
  -e acme/bin/power -e acme/mail/386 -e acme/mail/alpha \
  -e acme/mail/arm -e acme/mail/mips -e acme/mail/power \
  -e sys/man/vol1.ps -e sys/man/vol1.ps.gz -e sys/man/vol1.pdf \
  LICENSE* NOTICE acme lib rc sys ;

intending to get all the source and not the binaries. I patched my vac to ignore atimes (replacing the vac metadata field with the mtime) to increase metadata block sharing.

As of 2009/0205 (a convenient snapshot to du), this represents about 140.7 MB of data per dump. The entire copy takes 550 MB (240 MB actual storage in Venti). (With no sharing whatsoever, this would be approx. 310 GB.) I would like to re-archive this with the Rabin fingerprinting vac for comparison.

(In case anybody wants to rush out and recreate the results, it took roughly 10 to 15 minutes per dump to dispatch all the Tstat requests to sources.)

Incidentally, a git repository of the crawls, from 2002/1212 to 2009/0205, is available at http://mirrors.acm.jhu.edu/trees/plan9native/ . Git gets the data down to 165M after a gc run, so perhaps it's a better idea than a venti-based mirror. I haven't managed to make my version of Uriel's port (thanks for the start! :) ) of git do the right thing in enough cases yet, so the git repo may not be updated for a while, but I figured somebody might want to play with it in the interim.

--nwf;
Re: [9fans] Plan 9 source history (was: Re: source browsing via http is back)
> (240 MB actual storage in Venti). (With no sharing whatsoever, this would be approx. 310 GB.) I would like to re-archive this with the Rabin fingerprinting vac for comparison.

by no sharing do you mean if each file tree were stored in a separate fs, or do you mean that the original fs + one copy of each file each time it has changed is 310GB? i'm guessing the former?

- erik
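
A rough sanity check of the figures quoted in this thread, as a sketch rather than anyone's stated method. It assumes one dump per calendar day over the git crawl range (2002/1212 to 2009/0205) and that every dump is as large as the 2009/0205 one (140.7 MB); both are assumptions, not something the thread confirms. Under them, the unshared total lands close to the quoted "approx. 310 GB", which would fit the "each file tree stored separately" reading. The variable names are made up for illustration.

```python
from datetime import date

# Figures quoted in the thread.
MB_PER_DUMP = 140.7        # size of one dump as of 2009/0205
VENTI_MB = 240.0           # actual storage used in Venti
GIT_GC_MB = 165.0          # git repository after a gc run
REPACKED_BYTES = 57694396  # pack size after repack -ad --window 100 --depth 100

# Assumption: one dump per calendar day, inclusive of both end dates,
# and every dump the same size as the 2009/0205 snapshot.
first_dump = date(2002, 12, 12)
last_dump = date(2009, 2, 5)
dumps = (last_dump - first_dump).days + 1

# Total size if every dump were stored as a full, separate copy.
unshared_mb = dumps * MB_PER_DUMP
repacked_mb = REPACKED_BYTES / 1e6

print(f"dumps (assumed daily): {dumps}")
print(f"unshared total: {unshared_mb / 1000:.0f} GB")
for label, stored_mb in [("venti", VENTI_MB),
                         ("git gc", GIT_GC_MB),
                         ("git repacked", repacked_mb)]:
    # Ratio of the hypothetical unshared size to what is actually stored.
    print(f"{label}: {stored_mb:.1f} MB, about {unshared_mb / stored_mb:.0f}x smaller")
```

Since older dumps were presumably smaller than the 2009 one, the real unshared total would be somewhat lower than this estimate; the point is only that the order of magnitude matches.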