2018-03-09 2:19 GMT+09:00 René Scharfe <[email protected]>:
> Am 08.03.2018 um 13:06 schrieb Takuto Ikuta:
>> +static int add_loose_objects_to_set(const struct object_id *oid,
>> + const char *path,
>> + void *data)
>> +{
>> + struct oidset* set = (struct oidset*)(data);
>
> This cast is not needed (unlike in C++). And the asterisk should be stuck
> to the variable, not the type (see Documentation/CodingGuidelines).
>
>> + oidset_insert(set, oid);
>
> In fact, you could just put "data" in here instead of "set" (without a
> cast), with no loss in readability or safety.
>
Thank you for review, changed to use data directly in v2.
>> + return 0;
>> +}
>> +
>> static int everything_local(struct fetch_pack_args *args,
>> struct ref **refs,
>> struct ref **sought, int nr_sought)
>> @@ -719,16 +728,21 @@ static int everything_local(struct fetch_pack_args
>> *args,
>> int retval;
>> int old_save_commit_buffer = save_commit_buffer;
>> timestamp_t cutoff = 0;
>> + struct oidset loose_oid_set = OIDSET_INIT;
>> +
>> + for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0);
>>
>> save_commit_buffer = 0;
>>
>> for (ref = *refs; ref; ref = ref->next) {
>> struct object *o;
>> + unsigned int flag = OBJECT_INFO_QUICK;
>>
>> - if (!has_object_file_with_flags(&ref->old_oid,
>> - OBJECT_INFO_QUICK))
>> - continue;
>> + if (!oidset_contains(&loose_oid_set, &ref->old_oid))
>> + flag |= OBJECT_INFO_SKIP_LOOSE;
>>
>> + if (!has_object_file_with_flags(&ref->old_oid, flag))
>> + continue;
>> o = parse_object(&ref->old_oid);
>> if (!o)
>> continue;
>> @@ -744,6 +758,8 @@ static int everything_local(struct fetch_pack_args *args,
>> }
>> }
>>
>> + oidset_clear(&loose_oid_set);
>> +
>
> This part looks fine to me. (Except perhaps call the variable "flags"
> because you sometimes have two?)
>
Changed flag names.
> Why not include packed objects as well? Probably because packs have
> indexes which can queried quickly to determine object existence, and
> because there are only few loose objects in typical repositories,
> right?
>
Correct. In my target repository, chromium, fetch-pack's slowness
comes from many lstat
to non-existing loose objects for remote refs. I focus on to remove such lstat.
> A similar cache was introduced by cc817ca3ef (sha1_name: cache
> readdir(3) results in find_short_object_filename()) to speed up
> finding unambiguous shorts object hashes. I wonder if it could be
> used here as well, but I don't see an easy way.
>
>> if (!args->no_dependents) {
>> if (!args->deepen) {
>> for_each_ref(mark_complete_oid, NULL);
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 1b94f39c4..c903cbcec 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char
>> *sha1, struct object_info *oi,
>> if (find_pack_entry(real, &e))
>> break;
>>
>> + if (flags & OBJECT_INFO_SKIP_LOOSE)
>> + return -1;
>> +
>> /* Most likely it's a loose object. */
>> if (!sha1_loose_object_info(real, oi, flags))
>> return 0;
>>
>
> This early return doesn't just skip checking loose objects. It
> also skips reloading packs and fetching missing objects for
> partial clones. That may not be a problem for fetch-pack, but
> it means the flag has a misleading name. Do you get the same
> performance improvement if you make it only skip that
> sha1_loose_object_info() call?
>
I changed the flag name following Junio's suggestion. If we skip
sha1_loose_object_info call,
reprepare_packed_git(...) runs for every non-existing refs and git
fetch becomes slower.
Takuto