Derrick Stolee <dsto...@microsoft.com> writes:

> When running 'git branch --contains', the in_merge_bases_many()
> method calls paint_down_to_common() to discover if a specific
> commit is reachable from a set of branches. Commits with lower
> generation number are not needed to correctly answer the
> containment query of in_merge_bases_many().
>
> Add a new parameter, min_generation, to paint_down_to_common() that
> prevents walking commits with generation number strictly less than
> min_generation. If 0 is given, then there is no functional change.

This is thanks to the fact that generation numbers start at zero (as
special case, though, with _ZERO), and we use strict inequality to avoid
handling _ZERO etc. in a special way.

As you wrote in response in previous version of this series, because
paint_down_to_common() is file-local, there is no need to come up with
symbolic name for GENERATION_NO_CUTOFF case.

All right.

>
> For in_merge_bases_many(), we can pass commit->generation as the
> cutoff, and this saves time during 'git branch --contains' queries
> that would otherwise walk "around" the commit we are inspecting.

All right, and when using paint_down_to_common() to actually find merge
bases, and not only check for containment, we cannot use cutoff.
Therefore at least one call site needs to run it without functional
change... which we can do.  Good.

>
> For a copy of the Linux repository, where HEAD is checked out at
> v4.13~100, we get the following performance improvement for
> 'git branch --contains' over the previous commit:
>
> Before: 0.21s
> After:  0.13s
> Rel %: -38%

Nice.

>
> Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
> ---
>  commit.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)

Let me reorder chunks a bit to make it easier to review.

>
> diff --git a/commit.c b/commit.c
> index 7bb007f56a..e2e16ea1a7 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1070,7 +1080,7 @@ int in_merge_bases_many(struct commit *commit, int 
> nr_reference, struct commit *
>       if (commit->generation > min_generation)
>               return ret;
>  
> -     bases = paint_down_to_common(commit, nr_reference, reference);
> +     bases = paint_down_to_common(commit, nr_reference, reference, 
> commit->generation);
>       if (commit->object.flags & PARENT2)
>               ret = 1;
>       clear_commit_marks(commit, all_flags);
> @@ -808,11 +808,14 @@ static int queue_has_nonstale(struct prio_queue *queue)
>  }
>  
>  /* all input commits in one and twos[] must have been parsed! */
> -static struct commit_list *paint_down_to_common(struct commit *one, int n, 
> struct commit **twos)
> +static struct commit_list *paint_down_to_common(struct commit *one, int n,
> +                                             struct commit **twos,
> +                                             int min_generation)
>  {
>       struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
>       struct commit_list *result = NULL;
>       int i;
> +     uint32_t last_gen = GENERATION_NUMBER_INFINITY;
>  
>       one->object.flags |= PARENT1;
>       if (!n) {
> @@ -831,6 +834,13 @@ static struct commit_list *paint_down_to_common(struct 
> commit *one, int n, struc
>               struct commit_list *parents;
>               int flags;
>  
> +             if (commit->generation > last_gen)
> +                     BUG("bad generation skip");
> +             last_gen = commit->generation;

Shouldn't we provide more information about where the problem is to the
user, to make it easier to debug the repository / commit-graph data?

Good to have this sanity check here.

> +
> +             if (commit->generation < min_generation)
> +                     break;

So the reasoning for this, as far as I understand, is the following.
Please correct me if I am wrong.

The callsite with non-zero min_generation, in_merge_bases_many(), tries
to find out if "commit" is an ancestor of one of the "references".  At
least one of "references" is above "commit", so in_merge_bases_many()
uses paint_down_to_common() - but is interested only if "commit" was
painted as reachable from one of "references".

Thus we can interrupt the walk if we know that none of [considered]
commits in the queue can reach "commit"/"one" - as if they were all
STALE.

The search is done using priority queue (a bit like in Dijkstra
algorithm), with newer commits - with larger generation numbers -
considered first.  Thus if current commit has generation number less
than min_generation cutoff, i.e. if it is below "commit", then all
remaining commits in the queue are below cutoff.

Good.

> +
>               flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
>               if (flags == (PARENT1 | PARENT2)) {
>                       if (!(commit->object.flags & RESULT)) {
> @@ -879,7 +889,7 @@ static struct commit_list *merge_bases_many(struct commit 
> *one, int n, struct co
>                       return NULL;
>       }
>  
> -     list = paint_down_to_common(one, n, twos);
> +     list = paint_down_to_common(one, n, twos, 0);

When calculating merge bases there is no such possibility of an early
return due to generation number cutoff.  All right then.

>  
>       while (list) {
>               struct commit *commit = pop_commit(&list);
> @@ -946,7 +956,7 @@ static int remove_redundant(struct commit **array, int 
> cnt)
>                       filled_index[filled] = j;
>                       work[filled++] = array[j];
>               }
> -             common = paint_down_to_common(array[i], filled, work);
> +             common = paint_down_to_common(array[i], filled, work, 0);

Here we are interested not only if "one"/array[i] is reachable from
"twos"/work, but also if "twos" is reachable from "one".  Simple cutoff
only works in one way, though I wonder if we couldn't use cutoff being
minimum generation number of "one" and "twos" together.

But that may be left for a separate commit (after checking that the
above is correct).

Not as simple and obvious as paint_down_to_common() used in
in_merge_bases_any(), so it is all right.

>               if (array[i]->object.flags & PARENT2)
>                       redundant[i] = 1;
>               for (j = 0; j < filled; j++)

Reply via email to