On Wed, May 16, 2018 at 03:29:42PM -0400, Konstantin Ryabitsev wrote:

> On 05/16/18 15:23, Jeff King wrote:
> > I implemented "repack -k", which keeps all objects and just rolls them
> > into the new pack (along with any currently-loose unreachable objects).
> > Aside from corner cases (e.g., where somebody accidentally added a 20GB
> > file to an otherwise 100MB repo and then rolled it back), it usually
> > doesn't significantly affect the repository size.
> 
> Hmm... I should read manpages more often! :)
> 
> So, do you suggest that this is a better approach:
> 
> - mother repos: "git repack -adk"
> - child repos: "git repack -Adl" (followed by prune)

Yes, that's pretty close to what we do at GitHub. Before doing any
repacking in the mother repo, we actually do the equivalent of:

  git fetch --prune ../$id.git +refs/*:refs/remotes/$id/*
  git repack -Adl

from each child to pick up any new objects to de-duplicate (our "mother"
repos are not real repos at all, but just big shared-object stores).
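For illustration only, here is a rough shell sketch of that cycle, combined with the mother/child split proposed above. It assumes a bare "mother" object store and child repositories whose objects/info/alternates file points at "$mother/objects". It also assumes that the "repack -Adl" step runs inside each child, which is one reading of the commands above. The function names sync_child and refresh_mother are hypothetical, and this is not the actual production tooling, which takes shortcuts instead of running these commands.

```shell
#!/bin/sh
# Sketch only. Assumed layout: a bare "mother" object store, plus child
# repositories whose objects/info/alternates points at "$mother/objects".
# Function names are hypothetical.

# Copy one child's refs (and therefore its objects) into the mother under
# refs/remotes/<id>/, then repack the child. With -l, objects that are now
# available through the alternate are left out of the child's new pack.
# Finally, prune unreachable loose objects in the child.
sync_child() {
	mother=$1
	child=$2
	id=$(basename "$child" .git)
	git -C "$mother" fetch -q --prune "$child" "+refs/*:refs/remotes/$id/*" || return 1
	git -C "$child" repack -Adlq || return 1
	git -C "$child" prune
}

# Fold every child into the mother, then repack the mother with -k so that
# unreachable objects are kept rather than dropped (the mother is never
# pruned in this sketch).
refresh_mother() {
	mother=$1
	shift
	for child in "$@"; do
		sync_child "$mother" "$child" || return 1
	done
	git -C "$mother" repack -adkq
}
```

A hypothetical invocation would be something like `refresh_mother /srv/mother.git /srv/repos/*.git`. Skipping prune in the mother entirely is the conservative choice, given the races described further down.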

I say "equivalent" because those commands can actually be a bit slow. So
we do some hacky tricks like directly moving objects in the filesystem.

In theory the fetch means that it's safe to actually prune in the mother
repo, but in practice there are still races. They don't come up often,
but if you have enough repositories, they do eventually. :)

-Peff
