As some of you may know, I'm currently working on support for partial
clones/fetches in Git (where blobs above a user-specified size threshold
are not downloaded - only their names and sizes are downloaded). To do
this, the client repository needs to be able to download blobs at will
whenever it needs a missing one (for example, upon checkout).

I have implemented this by adding support for a hook in Git [1] and
updating the object-reading code in Git to invoke this hook
automatically, by default, whenever an object is missing. (In theory at
least, this means that all existing subsystems work without
modification.) My current design gives the hook maximum flexibility:
when invoked with a list of SHA-1s, it must merely ensure that those
objects end up in the local repo, whether packed or loose.

I am also working on a command (fetch-blob), to be bundled with Git and
used as the default hook, and this is where the problem lies.

Suppose you have missing blobs AB12 and CD34 that you now need, so
fetch-blob is invoked. It sends the literals AB12 and CD34 to a new
server endpoint and obtains a packfile, which it then pipes through "git
index-pack". The issue is that "git index-pack" wants to try to access
AB12 and CD34 in the local repo in order to do a SHA-1 collision check,
and therefore fetch-blob is invoked once again, creating infinite
recursion.
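
As a toy illustration of the loop described above (this is not Git's
actual code; the class and method names are hypothetical stand-ins for
the object-reading code, the fetch-blob hook and "git index-pack"), the
following sketch models the cycle. It raises an exception on re-entry
instead of recursing forever:

```python
class RecursionDetected(Exception):
    """Raised when the toy hook is re-entered for an object it is fetching."""


class ToyRepo:
    """Hypothetical in-memory model of a partial clone (not real Git)."""

    def __init__(self, missing):
        self.objects = set()         # objects present in the local repo
        self.missing = set(missing)  # blobs known only by name and size
        self.in_flight = set()       # blobs the hook is currently fetching
        self.hook_calls = 0

    def read_object(self, sha):
        """Stand-in for the object-reading code: run the hook on a miss."""
        if sha in self.objects:
            return True
        if sha in self.missing:
            self.fetch_blob([sha])
            return sha in self.objects
        return False

    def fetch_blob(self, shas):
        """Stand-in for the default hook: obtain a pack, then index it."""
        self.hook_calls += 1
        for sha in shas:
            if sha in self.in_flight:
                raise RecursionDetected(sha)
        self.in_flight.update(shas)
        try:
            self.index_pack(shas)
        finally:
            self.in_flight.difference_update(shas)

    def index_pack(self, shas):
        """Stand-in for "git index-pack", including its collision check."""
        for sha in shas:
            # The collision check looks the object up locally first,
            # which re-enters read_object() and therefore the hook.
            self.read_object(sha)
            self.objects.add(sha)
            self.missing.discard(sha)
```

Calling read_object("AB12") on ToyRepo(["AB12", "CD34"]) invokes the
hook, whose index_pack step calls read_object("AB12") again, and the
second hook invocation raises RecursionDetected.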

This particular case is straightforward to fix by teaching "git
index-pack" about missing blobs, but it might be a symptom of the
approach itself being error-prone: a custom hook that invokes any Git
command must take extra care not to trigger itself again.

So I have thought of a few solutions, each with its pros and cons:

1. Require the hook to instead output a packfile to stdout. This means
that the hook no longer needs to access the local repo, and thus depends
less on Git commands. But this reduces flexibility, in that the hook's
output must be packed, not loose. (This is fine for the use cases I'm
thinking of, but probably not for others.)

2. Add support for an environment variable to Git that suppresses access
to the missing blob manifest, in effect suppressing invocation of the
hook. This allows anyone (the person configuring Git or the hook writer)
to suppress this access, although they might need in-depth knowledge to
know whether a given hook is meant to be run with such access suppressed
or with it available.

3. Like option 2, but using a command-line argument to Git instead of an
environment variable.
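
To make option 2 concrete, here is a minimal sketch of how such a guard
could behave. It is a toy model, not Git code: the variable name
TOY_SUPPRESS_MISSING_BLOB_HOOK and all function names are hypothetical,
and no such variable exists in Git. The idea is that the object-reading
code consults the variable before running the hook, and sets it in the
environment handed to the hook so that nested lookups do not re-trigger
it.

```python
import os

# Hypothetical variable name, for illustration only; Git defines no such
# variable.
SUPPRESS_VAR = "TOY_SUPPRESS_MISSING_BLOB_HOOK"


def hook_allowed(env=None):
    """Return True unless the suppression variable is set to "1"."""
    env = os.environ if env is None else env
    return env.get(SUPPRESS_VAR, "") != "1"


def child_env_for_hook(env=None):
    """Copy the environment and mark it so nested lookups skip the hook."""
    child = dict(os.environ if env is None else env)
    child[SUPPRESS_VAR] = "1"
    return child


def read_object(sha, local_objects, missing, run_hook, env=None):
    """Toy object lookup: run the hook on a miss unless suppressed."""
    if sha in local_objects:
        return "present"
    if sha in missing and hook_allowed(env):
        # The hook receives an environment with suppression switched on.
        run_hook([sha], child_env_for_hook(env))
        return "present" if sha in local_objects else "missing"
    return "missing"
```

With this guard, a lookup made from inside the hook (for example, by a
collision check) simply reports the blob as missing instead of invoking
the hook a second time. Option 3 would work the same way, with the flag
passed on the command line rather than through the environment.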

What do you think? Any solutions that I am missing?

[1] Work in progress, but you can see an earlier version here: 
https://public-inbox.org/git/b917a463f0ad4ce0ab115203b3f24894961a2e75.1497558851.git.jonathanta...@google.com/
