On Sun, May 19, 2013 at 10:26 PM, Michael Haggerty <mhag...@alum.mit.edu> wrote:
> Recently a number of race conditions related to references have been
> discovered.  There is likely to be a two-pronged solution to the
> races:
>
> * For traditional, filesystem-based references, there will have to be
>   more checks that the ref caches are still up-to-date at the time of
>   their use (see, for example, [1]).  If not, the ref cache will have
>   to be invalidated and reloaded.  Assuming that the invalidation of
>   the old cache includes freeing its memory, such an invalidation will
>   cause lots of refname strings to be freed even though callers might
>   still hold references to them.
>
> * For server-class installations, filesystem-based references might
>   not be robust enough for 100% reliable operation, because the
>   reading of the complete set of references is not an atomic
>   operation.  If another reference storage mechanism is developed,
>   there is no reason to expect the refnames strings to have long
>   lifetimes.

(Sorry for going slightly off-topic and returning to the general
discussion on how to resolve the race conditions...)

For server-class installations we need ref storage that can be read
(and updated?) atomically, and the current system of loose + packed
files won't work since reading (and updating) more than a single file
is not an atomic operation. Trivially, one could resolve this by
dropping loose refs, and always using a single packed-refs file, but
that would make it prohibitively expensive to update refs (the entire
packed-refs file must be rewritten for every update).

Now, observe that we don't have these race conditions in the object
database, because it is an add-only immutable data store.

What if we stored the refs as a tree object in the object database,
referenced by a single (loose) ref? There would be a _single_ (albeit
highly contentious) file outside the object database that represent
the current state of the refs, but hopefully we can guarantee
atomicity when reading (and updating?) that one file. Transactions can
be done by:
 1. Recording the tree id holding the refs before starting manipulation.
 2. Creating a new tree object holding the manipulated state.
 3. Re-checking the tree id before replacing the loose ref. If
unchanged: commit, else: rollback/error out.

All readers would trivially have access to a consistent refs view,
since the state of the entire refs hierarchy is held in the tree id
read from that single loose ref.

It seems to me this should be somewhat less prohibitively expensive
than maintaining all refs in a single packed-refs file. That said, we
do end up producing a few new objects for every single ref update,
most of which would be thrown away by a future "gc". This might bog
things down, but I'm not sure how much.

I'm sure someone must have had this idea before (although I don't
remember this alternative being raised at the Git Merge conference),
so please enlighten me as to why this won't work... ;)

...Johan


PS: Keeping reflogs is just a matter of wrapping the ref tree in a
commit object using the previous state of the ref tree as its parent.

-- 
Johan Herland, <jo...@herland.net>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to