On Wed, May 16, 2018 at 01:40:56PM -0600, Martin Fick wrote:

> > In theory the fetch means that it's safe to actually prune
> > in the mother repo, but in practice there are still
> > races. They don't come up often, but if you have enough
> > repositories, they do eventually. :)
> 
> Peff,
> 
> I would be very curious to hear what you think of this 
> approach to mitigating the effect of those races?
> 
> https://git.eclipse.org/r/c/122288/2

The crux of the problem is that we have no way to atomically mark an
object as "I am using this -- do not delete" with respect to the actual
deletion. 

So if I'm reading your approach correctly, you put objects into a
purgatory rather than delete them, and let some operations rescue them
from purgatory if we had a race.  That's certainly a direction we've
considered, but I think there are some open questions, like:

  1. When do you rescue from purgatory? Any time the object is
     referenced? Do you then pull in all of its reachable objects too?

  2. How do you decide when to drop an object from purgatory? And
     specifically, how do you avoid racing with somebody using the
     object as you're pruning purgatory?

  3. How do you know that an operation has been run that will actually
     rescue the object, as opposed to silently having a corrupted state
     on disk?

     E.g., imagine this sequence:

       a. git-prune computes reachability and finds that commit X is
          ready to be pruned

       b. another process sees that commit X exists and builds a commit
          that references it as a parent

       c. git-prune drops the object into purgatory

     Now we have a corrupt state created by the process in (b), since we
     have a reachable object in purgatory. But what if nobody goes back
     and tries to read those commits in the meantime?
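
To make that sequence concrete, here is a toy model in Python. This is
not git's code: the dict-based object store, the `purgatory` dict, and
the helper names are all made up for illustration. It just replays steps
(a) through (c) in order and shows that a reachable object ends up
missing.

```python
# Toy model (hypothetical): objects maps oid -> list of parent oids,
# refs maps ref name -> oid.

def reachable(objects, refs):
    """Return the set of oids reachable from any ref."""
    seen = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        # A missing object has no known parents, so the walk stops there.
        stack.extend(objects.get(oid, []))
    return seen

def missing_reachable(objects, refs):
    """Oids that are reachable from refs but absent from the store."""
    return {oid for oid in reachable(objects, refs) if oid not in objects}

objects = {"A": [], "X": ["A"]}   # X exists but nothing points at it
refs = {"main": "A"}
purgatory = {}

# (a) prune computes reachability and picks X as a candidate
candidates = set(objects) - reachable(objects, refs)

# (b) another process sees that X exists and builds Y on top of it
assert "X" in objects
objects["Y"] = ["X"]
refs["topic"] = "Y"

# (c) prune acts on its now-stale candidate list
for oid in candidates:
    purgatory[oid] = objects.pop(oid)

print(missing_reachable(objects, refs))  # X is reachable but in purgatory
```

Nothing in the model notices the problem until somebody walks from
"topic" down to X, which is the point of question 3.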

I think this might be solvable by using the purgatory as a kind of
"lock", where prune does something like:

  1. compute reachability

  2. move candidate objects into purgatory; nobody can look into
     purgatory except us

  3. compute reachability _again_, making sure that no purgatory objects
     are used (if any are, roll back the deletion and try again)
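
A rough sketch of those three steps, in the same kind of toy Python
model (again, not git's code; `prune_with_purgatory`, the
`racing_writer` hook, and the retry limit are assumptions made up for
the example). The hook stands in for another process that sneaks in
between steps 1 and 2 on the first attempt.

```python
def reachable(objects, refs):
    """Return the set of oids reachable from any ref."""
    seen = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects.get(oid, []))  # missing objects end the walk
    return seen

def prune_with_purgatory(objects, refs, racing_writer=None, max_attempts=3):
    """Return the objects that were judged safe to delete."""
    for attempt in range(max_attempts):
        # 1. compute reachability
        candidates = set(objects) - reachable(objects, refs)

        # Simulated concurrent process, first attempt only (hypothetical).
        if racing_writer is not None and attempt == 0:
            racing_writer(objects, refs)

        # 2. move candidates into purgatory; only this function sees it
        purgatory = {oid: objects.pop(oid) for oid in candidates}

        # 3. compute reachability again and look for purgatory objects
        in_use = reachable(objects, refs) & set(purgatory)
        if not in_use:
            return purgatory
        objects.update(purgatory)  # roll back and try again
    return {}  # gave up without deleting anything

def writer(objects, refs):
    objects["Y"] = ["X"]
    refs["topic"] = "Y"

objects = {"A": [], "X": ["A"]}
refs = {"main": "A"}
deleted = prune_with_purgatory(objects, refs, racing_writer=writer)
print(deleted, sorted(objects))
```

Here the second pass notices that X is referenced, puts it back, and the
retry finds nothing to prune.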

But even that's not quite there, because you need to have some
consistent atomic view of what's "used". Just checking refs isn't
enough, because some other process may be planning to reference a
purgatory object but not yet have updated the ref. So you need some
atomic way of saying "I am interested in using this object".
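
The remaining hole can be replayed in the same toy Python model
(hypothetical names throughout, not git's code). The writer only
_observes_ X before prune starts and updates the ref after prune's
second reachability pass, so the re-check has nothing to see.

```python
def reachable(objects, refs):
    """Return the set of oids reachable from any ref."""
    seen = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects.get(oid, []))  # missing objects end the walk
    return seen

objects = {"A": [], "X": ["A"]}
refs = {"main": "A"}

# Writer, first half: sees that X exists and plans to build on it,
# but has not written an object or touched a ref yet.
writer_saw_x = "X" in objects

# Prune runs all three steps while the writer is in between.
candidates = set(objects) - reachable(objects, refs)
purgatory = {oid: objects.pop(oid) for oid in candidates}
in_use = reachable(objects, refs) & set(purgatory)
assert not in_use   # the re-check passes, so prune deletes for real
purgatory.clear()

# Writer, second half: acts on its earlier observation.
if writer_saw_x:
    objects["Y"] = ["X"]
    refs["topic"] = "Y"

broken = {oid for oid in reachable(objects, refs) if oid not in objects}
print(broken)  # X is reachable from "topic" but has been deleted
```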

-Peff
