Re: [PATCH 16/16] commit-reach: use can_all_from_reach

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The is_descendant_of method previously used in_merge_bases() to check if
> the commit can reach any of the commits in the provided list. This had
> two performance problems:
>
> 1. The performance is quadratic in worst-case.
>
> 2. A single in_merge_bases() call requires walking beyond the target
>commit in order to find the full set of boundary commits that may be
>merge-bases.
>
> The can_all_from_reach method avoids this quadratic behavior and can
> limit the search beyond the target commits using generation numbers. It
> requires a small prototype adjustment to stop using commit-date as a
> cutoff, as that optimization is no longer appropriate here.
>
> Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
> naturally found cutoffs to avoid walking the entire commit graph. Since
> we want to always return the correct result, we cannot use the
> min_commit_date cutoff in can_all_from_reach. We then rely on generation
> numbers to provide the cutoff.
>
> Since not all repos will have a commit-graph file, nor will we always
> have generation numbers computed for a commit-graph file, create a new
> method, generation_numbers_enabled(), that checks for a commit-graph
> file and sees if the first commit in the file has a non-zero generation
> number. In the case that we do not have generation numbers, use the old
> logic for is_descendant_of().
>
> Performance was meausured on a copy of the Linux repository using the
> 'test-tool reach is_descendant_of' command using this input:
>
> A:v4.9
> X:v4.10
> X:v4.11
> X:v4.12
> X:v4.13
> X:v4.14
> X:v4.15
> X:v4.16
> X:v4.17
> X.v3.0
>
> Note that this input is tailored to demonstrate the quadratic nature of
> the previous method, as it will compute merge-bases for v4.9 versus all
> of the later versions before checking against v4.1.
>
> Before: 0.26 s
>  After: 0.21 s
>
> Since we previously used the is_descendant_of method in the ref_newer
> method, we also measured performance there using
> 'test-tool reach ref_newer' with this input:
>
> A:v4.9
> B:v3.19
>
> Before: 0.10 s
>  After: 0.08 s
>
> By adding a new commit with parent v3.19, we test the non-reachable case
> of ref_newer:
>
> Before: 0.09 s
>  After: 0.08 s
>
> Signed-off-by: Derrick Stolee 
> ---

Thanks for the commit message. The code itself looks good!

I think this series is nearly done, I have only commented on
style issues so far, which are easier to address than fundamental
design issues or naming things.

Thanks,
Stefan


[PATCH 16/16] commit-reach: use can_all_from_reach

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The is_descendant_of method previously used in_merge_bases() to check if
the commit can reach any of the commits in the provided list. This had
two performance problems:

1. The performance is quadratic in worst-case.

2. A single in_merge_bases() call requires walking beyond the target
   commit in order to find the full set of boundary commits that may be
   merge-bases.

The can_all_from_reach method avoids this quadratic behavior and can
limit the search beyond the target commits using generation numbers. It
requires a small prototype adjustment to stop using commit-date as a
cutoff, as that optimization is no longer appropriate here.

Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
naturally found cutoffs to avoid walking the entire commit graph. Since
we want to always return the correct result, we cannot use the
min_commit_date cutoff in can_all_from_reach. We then rely on generation
numbers to provide the cutoff.

Since not all repos will have a commit-graph file, nor will we always
have generation numbers computed for a commit-graph file, create a new
method, generation_numbers_enabled(), that checks for a commit-graph
file and sees if the first commit in the file has a non-zero generation
number. In the case that we do not have generation numbers, use the old
logic for is_descendant_of().

Performance was meausured on a copy of the Linux repository using the
'test-tool reach is_descendant_of' command using this input:

A:v4.9
X:v4.10
X:v4.11
X:v4.12
X:v4.13
X:v4.14
X:v4.15
X:v4.16
X:v4.17
X.v3.0

Note that this input is tailored to demonstrate the quadratic nature of
the previous method, as it will compute merge-bases for v4.9 versus all
of the later versions before checking against v4.1.

Before: 0.26 s
 After: 0.21 s

Since we previously used the is_descendant_of method in the ref_newer
method, we also measured performance there using
'test-tool reach ref_newer' with this input:

A:v4.9
B:v3.19

Before: 0.10 s
 After: 0.08 s

By adding a new commit with parent v3.19, we test the non-reachable case
of ref_newer:

Before: 0.09 s
 After: 0.08 s

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 18 ++
 commit-graph.h |  6 ++
 commit-reach.c | 24 +---
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b0a55ad12..e9786fa86 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -233,6 +233,24 @@ static int prepare_commit_graph(struct repository *r)
return !!r->objects->commit_graph;
 }
 
+int generation_numbers_enabled(struct repository *r)
+{
+   uint32_t first_generation;
+   struct commit_graph *g;
+   if (!prepare_commit_graph(r))
+  return 0;
+
+   g = r->objects->commit_graph;
+
+   if (!g->num_commits)
+   return 0;
+
+   first_generation = get_be32(g->chunk_commit_data +
+   g->hash_len + 8) >> 2;
+
+   return !!first_generation;
+}
+
 static void close_commit_graph(void)
 {
free_commit_graph(the_repository->objects->commit_graph);
diff --git a/commit-graph.h b/commit-graph.h
index 76e098934..0de8f8831 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -51,6 +51,12 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+/*
+ * Return 1 if and only if the repository has a commit-graph
+ * file and generation numbers are computed in that file.
+ */
+int generation_numbers_enabled(struct repository *r);
+
 void write_commit_graph_reachable(const char *obj_dir, int append);
 void write_commit_graph(const char *obj_dir,
struct string_list *pack_indexes,
diff --git a/commit-reach.c b/commit-reach.c
index ac132c8e4..9eb622540 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -276,15 +276,25 @@ int is_descendant_of(struct commit *commit, struct 
commit_list *with_commit)
 {
if (!with_commit)
return 1;
-   while (with_commit) {
-   struct commit *other;
 
-   other = with_commit->item;
-   with_commit = with_commit->next;
-   if (in_merge_bases(other, commit))
-   return 1;
+   if (generation_numbers_enabled(the_repository)) {
+   struct commit_list *from_list = NULL;
+   int result;
+   commit_list_insert(commit, _list);
+   result = can_all_from_reach(from_list, with_commit, 0);
+   free_commit_list(from_list);
+   return result;
+   } else {
+   while (with_commit) {
+   struct commit *other;
+
+   other = with_commit->item;
+   with_commit = with_commit->next;
+   if (in_merge_bases(other, commit))
+   return 1;
+   }
+   return 0;
}
-   return 0;
 }
 
 /*
--