On Wed, Oct 10 2018, SZEDER Gábor wrote:

> On Mon, Sep 17, 2018 at 03:33:35PM +0000, Ævar Arnfjörð Bjarmason wrote:
>>     $ git -c gc.writeCommitGraph=true gc
>>     [...]
>>     Annotating commits in commit graph: 1565573, done.
>>     Computing commit graph generation numbers: 100% (782484/782484), done.
>
> While poking around 'commit-graph.c' in my Bloom filter experiment, I
> saw similar numbers like above, and was confused by the much higher
> than expected number of annotated commits.  It's about twice as much
> as the number of commits in the repository, or the number shown on the
> very next line.
>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 8a1bec7b8a..2c5d996194 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> -static void close_reachable(struct packed_oid_list *oids)
>> +static void close_reachable(struct packed_oid_list *oids, int 
>> report_progress)
>>  {
>>      int i;
>>      struct commit *commit;
>> +    struct progress *progress = NULL;
>> +    int j = 0;
>>
>> +    if (report_progress)
>> +            progress = start_delayed_progress(
>> +                    _("Annotating commits in commit graph"), 0);
>>      for (i = 0; i < oids->nr; i++) {
>> +            display_progress(progress, ++j);
>>              commit = lookup_commit(the_repository, &oids->list[i]);
>>              if (commit)
>>                      commit->object.flags |= UNINTERESTING;
>> @@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
>>       * closure.
>>       */
>>      for (i = 0; i < oids->nr; i++) {
>> +            display_progress(progress, ++j);
>>              commit = lookup_commit(the_repository, &oids->list[i]);
>>
>>              if (commit && !parse_commit(commit))
>> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list 
>> *oids)
>>      }
>
> The above loops have already counted all the commits, and, more
> importantly, did all the hard work that takes time and makes the
> progress indicator useful.
>
>>      for (i = 0; i < oids->nr; i++) {
>> +            display_progress(progress, ++j);

[...]

> This display_progress() call, however, doesn't seem to be necessary.
> First, it counts all commits for a second time, resulting in the ~2x
> difference compared to the actual number of commits, and then causing
> my confusion.  Second, all what this loop is doing is setting a flag
> in commits that were already looked up and parsed in the above loops.
> IOW this loop is very fast, and the progress indicator jumps from
> ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> progress indicator at all.

You're right, I tried this patch on top:

    diff --git a/commit-graph.c b/commit-graph.c
    index a686758603..cccd83de72 100644
    --- a/commit-graph.c
    +++ b/commit-graph.c
    @@ -655,12 +655,16 @@ static void close_reachable(struct packed_oid_list 
*oids, int report_progress)
                if (commit)
                        commit->object.flags |= UNINTERESTING;
        }
    +   stop_progress(&progress); j = 0;

        /*
         * As this loop runs, oids->nr may grow, but not more
         * than the number of missing commits in the reachable
         * closure.
         */
    +   if (report_progress)
    +           progress = start_delayed_progress(
    +                   _("Annotating commits in commit graph 2"), 0);
        for (i = 0; i < oids->nr; i++) {
                display_progress(progress, ++j);
                commit = lookup_commit(the_repository, &oids->list[i]);
    @@ -668,7 +672,11 @@ static void close_reachable(struct packed_oid_list 
*oids, int report_progress)
                if (commit && !parse_commit(commit))
                        add_missing_parents(oids, commit);
        }
    +   stop_progress(&progress); j = 0;

    +   if (report_progress)
    +           progress = start_delayed_progress(
    +                   _("Annotating commits in commit graph 3"), 0);
        for (i = 0; i < oids->nr; i++) {
                display_progress(progress, ++j);
                commit = lookup_commit(the_repository, &oids->list[i]);

And on a large repo with around 3 million commits the 3rd progress bar
doesn't kick in.

But if I apply this on top:

    diff --git a/progress.c b/progress.c
    index 5a99c9fbf0..89cc705bf7 100644
    --- a/progress.c
    +++ b/progress.c
    @@ -58,8 +58,8 @@ static void set_progress_signal(void)
        sa.sa_flags = SA_RESTART;
        sigaction(SIGALRM, &sa, NULL);

    -   v.it_interval.tv_sec = 1;
    -   v.it_interval.tv_usec = 0;
    +   v.it_interval.tv_sec = 0;
    +   v.it_interval.tv_usec = 250000;
        v.it_value = v.it_interval;
        setitimer(ITIMER_REAL, &v, NULL);
     }

I.e. start the timer after 1/4 of a second instead of 1 second, I get
that progress bar.

So I'm inclined to keep it. It just needs to be 4x the size before it's
noticeably hanging for 1 second.

That repo isn't all that big compared to what we've heard about out
there, and inner loops like this have a tendency to accumulate some more
code over time without a re-visit of why we weren't monitoring progress
there.

But maybe we can fix the message. We say "Annotating commits in commit
grap", not "Counting" or whatever. I was trying to find something that
didn't imply that we were doing this once. One can annotate a thing more
than once, but maybe ther's a better way to explain this...

We had some more accurate progress reporting in close_reachable(),
discussed in
https://public-inbox.org/git/87efe5qqks....@evledraar.gmail.com/ I still
think the *main* use-case for these things is to just report that we're
not hanging, so maybe the proper solution is to pick up Duy's patch to
display a spinner insted of a numeric progress.

>>              commit = lookup_commit(the_repository, &oids->list[i]);
>>
>>              if (commit)
>>                      commit->object.flags &= ~UNINTERESTING;
>>      }
>> +    stop_progress(&progress);
>>  }

Reply via email to