Re: [PATCH v4 04/10] commit: use generations in paint_down_to_common()

Jakub Narebski Sun, 29 Apr 2018 08:41:24 -0700

Derrick Stolee <dsto...@microsoft.com> writes:

> Define compare_commits_by_gen_then_commit_date(), which uses generation
> numbers as a primary comparison and commit date to break ties (or as a
> comparison when both commits do not have computed generation numbers).


All right, this looks reasonable thing to do when we have access to
commit generation numbers..

> Since the commit-graph file is closed under reachability, we know that
> all commits in the file have generation at most GENERATION_NUMBER_MAX
> which is less than GENERATION_NUMBER_INFINITY.

Thus the condition that if B is reachable from A, then gen(A) >= gen(B),
even if they have generation numbers _INFINITY, _MAX or _ZERO.

We use generation numbers, if possible, to choose closest commit; if
not, we use dates.

>
> This change does not affect the number of commits that are walked during
> the execution of paint_down_to_common(), only the order that those
> commits are inspected. In the case that commit dates violate topological
> order (i.e. a parent is "newer" than a child), the previous code could
> walk a commit twice: if a commit is reached with the PARENT1 bit, but
> later is re-visited with the PARENT2 bit, then that PARENT2 bit must be
> propagated to its parents. Using generation numbers avoids this extra
> effort, even if it is somewhat rare.

Actually the ordering of commits walked does not affect the correctness
of the result.  Better ordering means that commits do not need to be
walked twice; I think it would be possible to craft repository in which
unlucky clock skew would lead to depth-first walk of commits later part
of walk would mark STALE.

Pedantry aside, I think it is a good description of analysis of change
results.

>
> Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
> ---
>  commit.c | 20 +++++++++++++++++++-
>  commit.h |  1 +
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/commit.c b/commit.c
> index 711f674c18..4d00b0a1d6 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -640,6 +640,24 @@ static int compare_commits_by_author_date(const void 
> *a_, const void *b_,
>       return 0;
>  }
>  
> +int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, 
> void *unused)
> +{
> +     const struct commit *a = a_, *b = b_;
> +
> +     /* newer commits first */

To be pedantic, larger generation number does not necessary mean that
commit was created later (is newer), only that it is on longer chain
since common ancestor or root commit.

> +     if (a->generation < b->generation)
> +             return 1;
> +     else if (a->generation > b->generation)
> +             return -1;

If the commit-graph feature is not available, or is disabled, all
commits would have the same generation number (_INFINITY), then this
block would be always practically no-op.

This is not very costly: 2 access to data which should be in cache, and
1 to 2 comparison operations.  But I wonder if we wouldn't want to avoid
this nano-cost if possible...

> +
> +     /* use date as a heuristic when generations are equal */
> +     if (a->date < b->date)
> +             return 1;
> +     else if (a->date > b->date)
> +             return -1;
> +     return 0;

The above is the same code as in compare_commits_by_commit_date(), but
there it is with "newer commits with larger date first" as comment
instead.

All right: we need inlining for speed.

> +}
> +
>  int compare_commits_by_commit_date(const void *a_, const void *b_, void 
> *unused)
>  {
>       const struct commit *a = a_, *b = b_;
> @@ -789,7 +807,7 @@ static int queue_has_nonstale(struct prio_queue *queue)
>  /* all input commits in one and twos[] must have been parsed! */
>  static struct commit_list *paint_down_to_common(struct commit *one, int n, 
> struct commit **twos)
>  {
> -     struct prio_queue queue = { compare_commits_by_commit_date };
> +     struct prio_queue queue = { compare_commits_by_gen_then_commit_date };

I wonder if it would be worth it to avoid comparing by generation
numbers without commit graph data:

  +     struct prio_queue queue;
  [...]
  +     if (commit_graph)
  +             queue.compare = compare_commits_by_gen_then_commit_date;
  +     else
  +             queue.compare = compare_commits_by_commit_date;

Or something like that.  But perhaps this nano-optimization is not worth
it (it is not that complicated, though).


Sidenote: when I searched for compare_commits_by_commit_date use, I have
noticed that it is used, I think as heuristics, for packfile creation in
upload-pack.c and fetch-pack.c.  Would they similarly improve with
compare_commits_by_gen_then_commit_date?

This is of course not something that this commit, or this patch series,
needs to address now.

>       struct commit_list *result = NULL;
>       int i;
>  
> diff --git a/commit.h b/commit.h
> index aac3b8c56f..64436ff44e 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -341,6 +341,7 @@ extern int remove_signature(struct strbuf *buf);
>  extern int check_commit_signature(const struct commit *commit, struct 
> signature_check *sigc);
>  
>  int compare_commits_by_commit_date(const void *a_, const void *b_, void 
> *unused);
> +int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, 
> void *unused);

All right.

>  
>  LAST_ARG_MUST_BE_NULL
>  extern int run_commit_hook(int editor_is_used, const char *index_file, const 
> char *name, ...);

Re: [PATCH v4 04/10] commit: use generations in paint_down_to_common()

Reply via email to