On Fri, Apr 02, 2010 at 01:16:58AM +0200, Ask Bjørn Hansen wrote:
> 
> On Apr 1, 2010, at 16:50, Tim Bunce wrote:
> 
> > * The need for widespread mirroring is less significant than it was in
> > years past. (Also using git as the inter-mirror transport of source files
> > means there'll be much less traffic between mirrors. Effectively only
> > the diffs between releases.)
> 
> The bandwidth isn't an issue -- the disk IO is.

Anyone know how much IO (stats, reads etc) it takes for a git server to
know that nothing has changed when a client does a fetch?

> Maybe there'd be less disk IO with git if all of CPAN was in one big
> repository; but there are many good reasons for it not to be.
> 
> If we had a repository per distribution we're back to square one; more
> or less.

I agree that one big repro isn't the way to go. So we need one repro per
distribution. Given that, we need an efficient way to communicate which
of the distro repros have changed.

I'm no expert with git but I wonder if submodules may help here:
http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html
https://git.wiki.kernel.org/index.php/GitSubmoduleTutorial

Imagine a cpan-all 'superproject' repro that has all the distros as
submodules.  This repro would be tiny when cloned because it only
contains empty directories for the distros plus the metadata for where
the upstream distro repro lives and what the current commit is.
When a distro is updated the cpan-all repro would be updated
to reference the latest version of the distro.
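
To make the idea concrete, here is a rough sketch of how such a
superproject might be put together and what a plain clone of it contains.
It is only an illustration under assumptions: 'Foo-Bar', the temporary
directory layout and the demo commit identity are made-up stand-ins, and
the protocol.file.allow setting is there only because the demo uses local
paths instead of real upstream URLs.

```shell
#!/bin/sh
# Sketch only: 'Foo-Bar', the temp-dir layout and the demo identity are
# hypothetical stand-ins, not part of any real CPAN setup.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Stand-in for one upstream distro repro.
git init -q Foo-Bar
(cd Foo-Bar && echo 1.00 > VERSION && git add VERSION &&
 git commit -q -m 'Foo-Bar 1.00')

# The superproject: records the distro's URL (.gitmodules) and its
# current commit. protocol.file.allow is only needed for local paths.
git init -q cpan-all
(cd cpan-all &&
 git -c protocol.file.allow=always submodule --quiet add "$work/Foo-Bar" Foo-Bar &&
 git commit -q -m 'Add Foo-Bar 1.00')

# A plain clone gets the metadata but only an empty directory per distro.
git clone -q "$work/cpan-all" "$work/clone"
cd "$work/clone"
cat .gitmodules              # where the upstream distro repro lives
git ls-tree HEAD Foo-Bar     # mode 160000 entry holding the current commit
ls -A Foo-Bar                # prints nothing: the directory is empty
```

The superproject stores one small tree entry per distro, so its size
depends on the number of distros and the number of updates, not on the
size of the distros themselves.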

Given its small size it could be regularly and widely sync'd.
(And may prove to be a very useful thing in itself for branching and
tagging etc. I see _lots_ of possibilities there.)

For a cpan-git-mirror to update the individual distro submodule repros
it would simply do "git submodule update". (I thought this might go and
do a "git fetch" on all the submodule repros, but it doesn't. I checked.)

So, for a cpan-git-mirror to update itself it only needs to do:

    cd cpan-all && git pull && git submodule update

The git pull of cpan-all repro would be very fast as it's tiny.
The git submodule update will only do anything for repros that
cpan-all indicates have changed (or are new).
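
A self-contained sketch of that update cycle, for anyone who wants to try
it locally. The names ('Foo-Bar', 'mirror'), the temp directory and the
demo identity are assumptions for illustration only. Two details differ
from the one-liner above: --init is added because, as far as I can tell,
a plain "git submodule update" skips submodules that have not yet been
initialised (which would include newly added distros), and the g wrapper
only exists to allow local-path submodule fetches in the demo.

```shell
#!/bin/sh
# Sketch only: all names and paths here are hypothetical demo values.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Wrapper: permit file-path submodule transport (needed for the demo only).
g() { git -c protocol.file.allow=always "$@"; }

# Upstream side: one distro repro plus the cpan-all superproject.
git init -q Foo-Bar
(cd Foo-Bar && echo 1.00 > VERSION && git add VERSION &&
 git commit -q -m 'Foo-Bar 1.00')
git init -q cpan-all
(cd cpan-all && g submodule --quiet add "$work/Foo-Bar" Foo-Bar &&
 git commit -q -m 'Add Foo-Bar 1.00')

# Mirror side: initial clone, including the distro submodules.
g clone -q --recursive "$work/cpan-all" mirror

# Upstream release: new commit in the distro, then cpan-all is bumped
# to reference it.
(cd Foo-Bar && echo 1.01 > VERSION && git commit -q -am 'Foo-Bar 1.01')
(cd cpan-all/Foo-Bar && g pull -q --ff-only && cd .. &&
 git commit -q -am 'Bump Foo-Bar to 1.01')

# Mirror update: pull the superproject, then sync the submodules.
cd "$work/mirror"
old=$(git rev-parse HEAD)
g pull -q --ff-only && g submodule --quiet update --init

# The superproject history also says which distros changed.
changed=$(git diff --name-only "$old" HEAD)
echo "changed distros: $changed"
echo "mirror Foo-Bar now at: $(git -C Foo-Bar rev-parse HEAD)"
```

The diff at the end is the "which distros changed" signal: it compares two
superproject commits and never has to look inside the distro repros.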

Hopefully someone with more git-fu than me can sanity check this.
Assuming I'm not talking nonsense, I think this has great potential.

Tim.
