Re: Proposal for missing blob support in Git repos

Jonathan Tan Mon, 01 May 2017 12:12:53 -0700

On 04/30/2017 08:57 PM, Junio C Hamano wrote:

> One thing I wonder is what the performance impact of a change like
> this to the codepath that wants to see if an object does _not_ exist
> in the repository.  When creating a new object by hashing raw data,
> we see if an object with the same name already exists before writing
> the compressed loose object out (or comparing the payload to detect
> hash collision).  With a "missing blob" support, we'd essentially
> spawn an extra process every time we want to create a new blob
> locally, and most of the time that is done only to hear the external
> command to say "no, we've never heard of such an object", with a
> possibly large latency.
>
> If we do not have to worry about that (or if it is no use to worry
> about it, because we cannot avoid it if we wanted to do the lazy
> loading of objects from elsewhere), then the patch presented here
> looked like a sensible first step towards the stated goal.
>
> Thanks.

Thanks for your comments. If you're referring to the codepath involving
write_sha1_file() (for example, builtin/hash-object -> index_fd or
builtin/unpack-objects), that is fine because write_sha1_file() invokes
freshen_packed_object() and freshen_loose_object() directly to check if
the object already exists (and thus does not invoke the new mechanism in
this patch).
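
To illustrate the distinction, here is a toy model in C. It is only a sketch and not Git's code: toy_store, has_local(), ask_external(), has_object_lazy() and write_object() are all hypothetical names. has_local() stands in for the direct freshen_*_object() checks, and ask_external() stands in for the new fallback, which here only counts how often it is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy object store holding names only. Hypothetical; not Git's structures. */
#define MAX_OBJECTS 16

struct toy_store {
	const char *names[MAX_OBJECTS];
	size_t nr;
	int external_lookups; /* how many times the slow fallback ran */
};

/* Local-only check, standing in for the freshen_*_object() calls. */
static bool has_local(const struct toy_store *s, const char *name)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->names[i], name))
			return true;
	return false;
}

/* Stand-in for the external command: counts the call, always says "no". */
static bool ask_external(struct toy_store *s, const char *name)
{
	(void)name;
	s->external_lookups++;
	return false;
}

/* Read-side existence check: local first, then the lazy fallback. */
static bool has_object_lazy(struct toy_store *s, const char *name)
{
	return has_local(s, name) || ask_external(s, name);
}

/*
 * Write path: consults only the local store, so the fallback never runs.
 * Returns true if a new object was stored, false if it already existed.
 */
static bool write_object(struct toy_store *s, const char *name)
{
	if (has_local(s, name))
		return false;
	assert(s->nr < MAX_OBJECTS);
	s->names[s->nr++] = name;
	return true;
}
```

In this model, calling write_object() any number of times leaves external_lookups at zero, whereas has_object_lazy() on a name that is not stored locally increments it once per call. The latter is the cost that callers of has_sha1_file() would pay.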

Having said that, looking at other parts of the fetching mechanism,
there are a few calls to has_sha1_file() and others that might need to
be checked. (We have already discussed one - the one in rev-list when
invoked to check connectivity.) I could take a look at that, but was
hoping for discussion on what I've sent so far (so that I know that I'm
on the right track, and because it somewhat works, albeit slowly).
