* Tim Bunce <tim.bu...@pobox.com> [2010-04-02 15:55]:
> So, for a cpan-git-mirror to update itself it only needs to do:
>
> cd cpan-all && git pull && git submodule update
>
> The git pull of cpan-all repo would be very fast as it's tiny.
With 15,000(?) distributions = submodules = directories, it’s not *that* tiny. You don’t want to stuff them all into the top-level directory: that tree would get a lot of churn *and* be quite big, so it would have to be downloaded all over again pretty much every time you update your mirror.

Git stores each tree entry as its mode, its name and a 20-byte SHA-1, so there are roughly 28 bytes of overhead per entry on top of the name. With typical distribution names that comes to something like 50 bytes per entry, so a 15,000-entry top-level tree means the better part of a megabyte transferred for potentially every single distribution upload. (But Git compresses successive versions of the same tree using deltas, so in practice much of that O(n) overhead per commit costs only O(1) in storage and transfer – most of the time.)

If you use a very simple F/FO/FOOBAR scheme like the one the CPAN already uses, you still get comparatively heavy churn in some still rather big directories, because any change to a subdirectory also changes the entire chain of tree objects representing the directory levels above it.

I don’t know whether that churn is bad enough to require a different solution. If it is, the obvious solution would be a way for distributions to migrate within the structure according to their level of activity, so that churn would be localised to a few directory objects. But I can’t think of a suitable proposal offhand, so I can’t estimate how feasible such a scheme might be.

> Hopefully someone with more git foo than me can sanity check
> it. Assuming I'm not talking nonsense, I think this has great
> potential.

It would take some trickery and thought to do well, but it’s not obviously broken as designed.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
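To put rough numbers on the tree-size and churn arguments in this thread, here is a small Python sketch. It is not from the original discussion. It counts the raw, uncompressed bytes of Git tree objects that must be rewritten when one submodule changes, and compares a flat top-level directory with an F/FO/FOOBAR layout. It assumes Git's tree-entry format (`<mode> <name>\0<20-byte SHA-1>`, with mode `160000` for a submodule and `40000` for a subdirectory). It also assumes that the F/FO prefixes are taken from the distribution name. It ignores the object header and zlib/delta compression, and it uses made-up random names. `hypothetical_names`, `flat_update_cost` and `nested_update_cost` are illustrative helpers, not part of any tool.

```python
import random
import string

SHA1_LEN = 20            # raw binary object id stored in every tree entry
TREE_MODE = "40000"      # mode Git writes for a subdirectory (subtree) entry
GITLINK_MODE = "160000"  # mode Git writes for a submodule (gitlink) entry


def tree_entry_size(mode: str, name: str) -> int:
    """Bytes one entry occupies in a raw tree object: "<mode> <name>\\0<id>".

    Uncompressed, and excluding the "tree <size>\\0" object header.
    """
    return len(mode) + 1 + len(name.encode("utf-8")) + 1 + SHA1_LEN


def flat_update_cost(names: list[str]) -> int:
    """Tree bytes rewritten when one submodule changes in a flat layout.

    Every gitlink lives in the root tree, so the whole root tree is rewritten.
    """
    return sum(tree_entry_size(GITLINK_MODE, n) for n in names)


def nested_update_cost(names: list[str], target: str) -> int:
    """Tree bytes rewritten when `target` changes in an F/FO/FOOBAR layout.

    The gitlink for e.g. "FOOBAR" lives at F/FO/FOOBAR, so the "FO" tree,
    the "F" tree and the root tree all get new ids and must be rewritten.
    """
    first, prefix = target[:1], target[:2]
    root = {n[:1] for n in names}                      # root entries: "F", "G", ...
    level1 = {n[:2] for n in names if n[:1] == first}  # entries of F/: "FA", "FO", ...
    level2 = [n for n in names if n[:2] == prefix]     # gitlinks inside F/FO/
    return (sum(tree_entry_size(TREE_MODE, d) for d in root)
            + sum(tree_entry_size(TREE_MODE, d) for d in level1)
            + sum(tree_entry_size(GITLINK_MODE, n) for n in level2))


def hypothetical_names(count: int, seed: int = 0) -> list[str]:
    """Made-up, uniformly random upper-case names standing in for distributions."""
    rng = random.Random(seed)
    names: set[str] = set()
    while len(names) < count:
        length = rng.randint(8, 24)
        names.add("".join(rng.choice(string.ascii_uppercase) for _ in range(length)))
    return sorted(names)


if __name__ == "__main__":
    names = hypothetical_names(15_000)
    print("flat layout:       ", flat_update_cost(names), "bytes of trees per update")
    print("F/FO/FOOBAR layout:", nested_update_cost(names, names[0]), "bytes of trees per update")
```

Real CPAN names are far less evenly spread than these random ones, so some F/FO directories would be much bigger than this suggests. Delta compression also shrinks the actual transfer, as noted above. The sketch is only meant to show how the two layouts compare.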