On 04/30/2017 08:57 PM, Junio C Hamano wrote:
One thing I wonder is what the performance impact of a change like
this to the codepath that wants to see if an object does _not_ exist
in the repository.  When creating a new object by hashing raw data,
we see if an object with the same name already exists before writing
the compressed loose object out (or comparing the payload to detect
hash collision).  With a "missing blob" support, we'd essentially
spawn an extra process every time we want to create a new blob
locally, and most of the time that is done only to hear the external
command to say "no, we've never heard of such an object", with a
possibly large latency.

If we do not have to worry about that (or if it is no use to worry
about it, because we cannot avoid it if we wanted to do the lazy
loading of objects from elsewhere), then the patch presented here
looked like a sensible first step towards the stated goal.

Thanks.

Thanks for your comments. If you're referring to the codepath involving write_sha1_file() (for example, builtin/hash-object -> index_fd or builtin/unpack-objects), that is fine because write_sha1_file() invokes freshen_packed_object() and freshen_loose_object() directly to check if the object already exists (and thus does not invoke the new mechanism in this patch).

Having said that, looking at other parts of the fetching mechanism, there are a few calls to has_sha1_file() and others that might need to be checked. (We have already discussed one - the one in rev-list when invoked to check connectivity.) I could take a look at that, but was hoping for discussion on what I've sent so far (so that I know that I'm on the right track, and because it somewhat works, albeit slowly).

Reply via email to