On Fri, Apr 02, 2010 at 01:16:58AM +0200, Ask Bjørn Hansen wrote:
>
> On Apr 1, 2010, at 16:50, Tim Bunce wrote:
>
> > * The need for widespread mirroring is less significant than it was in
> > years past. (Also using git as the inter-mirror transport of source files
> > means there'll be much less traffic between mirrors. Effectively only
> > the diffs between releases.)
>
> The bandwidth isn't an issue -- the disk IO is.

Anyone know how much IO (stats, reads etc) it takes for a git server to
know that nothing has changed when a client does a fetch?

> Maybe there'd be less disk IO with git if all of CPAN was in one big
> repository; but there are many good reasons for it not to be.
>
> If we had a repository per distribution we're back to square one; more
> or less.

I agree that one big repo isn't the way to go. So we need one repo per
distribution. Given that, we need an efficient way to communicate which
of the distro repos have changed.

I'm no expert with git but I wonder if submodules may help here:

    http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html
    https://git.wiki.kernel.org/index.php/GitSubmoduleTutorial

Imagine a cpan-all 'superproject' repo that has all the distros as
submodules. This repo would be tiny when cloned because it only contains
empty directories for the distros plus the metadata for where the
upstream distro repo lives and what the current commit is.

When a distro is updated the cpan-all repo would be updated to reference
the latest version of the distro. Given its small size it could be
regularly and widely sync'd. (And may prove to be a very useful thing in
itself for branching and tagging etc. I see _lots_ of possibilities
there.)

For a cpan-git-mirror to update the individual distro submodule repos
it would simply do "git submodule update". (I thought this might go and
do a "git fetch" on all the submodule repos, but it doesn't. I checked.)

So, for a cpan-git-mirror to update itself it only needs to do:

    cd cpan-all && git pull && git submodule update

The git pull of the cpan-all repo would be very fast as it's tiny.
The git submodule update will only do anything for repos that cpan-all
indicates have changed (or are new).

Hopefully someone with more git-fu than me can sanity check it.
Assuming I'm not talking nonsense, I think this has great potential.

Tim.
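
A minimal sketch of the change-detection idea above, written in Python
rather than with git itself. It assumes the cpan-all superproject can be
modelled as a plain mapping from distro name to the commit it currently
pins; the function name, the distro list and the commit ids are all
hypothetical and only illustrate which repos a mirror would need to touch
after pulling cpan-all. (With real git, newly added submodules would
probably also need "git submodule update --init".)

```python
from typing import Dict, List

# Hypothetical model: the superproject is a mapping from distro name to
# the commit id it pins for that submodule. Real git stores this as
# gitlink entries in the tree plus a .gitmodules file.
Pins = Dict[str, str]


def changed_distros(before: Pins, after: Pins) -> List[str]:
    """Return distros that are new or whose pinned commit has moved."""
    return sorted(
        name
        for name, commit in after.items()
        if before.get(name) != commit
    )


if __name__ == "__main__":
    # Made-up commit ids, for illustration only.
    before = {"DBI": "a1b2c3", "Moose": "d4e5f6"}
    after = {"DBI": "0f9e8d", "Moose": "d4e5f6", "Plack": "778899"}

    # Only these distro repos would need any work on the mirror;
    # unchanged distros cost nothing beyond the comparison itself.
    print(changed_distros(before, after))
```

The point of the sketch is that the mirror's work is proportional to the
number of pins that moved, not to the total number of distros.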