* Tim Bunce <tim.bu...@pobox.com> [2010-04-02 15:55]:
> So, for a cpan-git-mirror to update itself it only needs to do:
>
>     cd cpan-all && git pull && git submodule update
>
> The git pull of cpan-all repro would be very fast as it's tiny.

With 15,000(?) distributions = submodules = directories, it’s not
*that* tiny.

You don’t want to stuff those all in the top-level directory. It
would get a lot of churn *and* would be quite big, which
would require it to be downloaded all over pretty much every time
you update your mirror. Git has something like 50 bytes of
overhead per directory entry aside from the name, so that would
make nearly a megabyte transferred for potentially every single
distribution upload. (But Git compresses a series of revisions to
the same object using deltas, so in practice much of the O(n)
overhead of commits only costs O(1) in storage and transfer
– most of the time.)
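
As a back-of-the-envelope check on that figure, here is a minimal
sketch in Python. The 15,000 entries and the 50 bytes of overhead are
the rough numbers from above; the average name length of 15 bytes is
purely my assumption, and the function name is made up for
illustration. It looks only at the uncompressed size of one flat
top-level tree and ignores zlib compression and deltas.

```python
def flat_tree_size(n_entries, overhead_per_entry=50, avg_name_len=15):
    """Rough uncompressed size in bytes of one flat top-level tree.

    overhead_per_entry: the ballpark per-entry figure quoted above.
    avg_name_len: an assumed average distribution name length.
    """
    return n_entries * (overhead_per_entry + avg_name_len)


size = flat_tree_size(15_000)
print(f"{size:,} bytes")  # 975,000 bytes
```

With those inputs the estimate lands just under a megabyte; a
different guess for the name length or the overhead shifts it
proportionally.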

If you do a very simple F/FO/FOOBAR scheme like the CPAN already
does, you still get a fair amount of churn in some still
rather big directories, because any change to a subdirectory
causes the entire chain of objects representing the directory
levels above it to also change. I don’t know if that churn is
bad enough to require a different solution.
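
To make the chain concrete, here is a small Python sketch of which
directory (tree) objects get rewritten when a single distribution
changes under such a layout. Both function names are hypothetical,
and uppercasing the name to build the prefix directories is my
assumption about how distribution names would be sharded.

```python
def shard_path(dist_name):
    """Place a distribution under first-letter / first-two-letters dirs."""
    key = dist_name.upper()  # assumption: prefixes are case-folded
    return f"{key[0]}/{key[:2]}/{dist_name}"


def rewritten_trees(path):
    """Return the directories whose tree objects change when `path` does.

    The root is written as "", followed by every ancestor directory.
    """
    parents = path.split("/")[:-1]
    return [""] + ["/".join(parents[:i]) for i in range(1, len(parents) + 1)]


path = shard_path("FOOBAR")
print(path)                   # F/FO/FOOBAR
print(rewritten_trees(path))  # ['', 'F', 'F/FO']
```

So every upload rewrites three tree objects: the root, the
one-letter directory, and the two-letter directory. How much that
costs depends on how many entries each of those holds.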

If it is, then the best solution obviously would be a way for
distributions to migrate within the structure according to their
level of activity, so that churn would be localised to few
directory objects, but I can’t think of a suitable proposal off
hand so I can’t estimate how feasible such a scheme might be.

> Hopefully someone with more git foo than me can sanity check
> it. Assuming I'm not talking nonsense, I think this has great
> potential.

It would take some trickery and thought to do well, but it’s not
obviously broken as designed.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
