Re: [PATCH v3 11/14] commit: integrate commit graph with commit parsing

2018-02-15 Thread Junio C Hamano
Derrick Stolee  writes:

> +struct object_id *get_nth_commit_oid(struct commit_graph *g,
> +  uint32_t n,
> +  struct object_id *oid)
> +{
> + hashcpy(oid->hash, g->chunk_oid_lookup + g->hash_len * n);
> + return oid;
> +}

This looks like a rather klunky API to me.  

It seems that many current callers in this series (not limited to
this step but in later patches in the series) discard the returned
value.

I would understand the API a lot better if the function returned
"const struct object_id *" that points into the copy of the oid the
graph structure keeps (and the caller can do hashcpy() if it wants
to).

That would allow the API to later check for errors when the caller
gives 'n' that is too large by returning a NULL, for example.

> +static struct commit_list **insert_parent_or_die(struct commit_graph *g,
> +int pos,
> +struct commit_list **pptr)
> +{
> + struct commit *c;
> + struct object_id oid;
> + get_nth_commit_oid(g, pos, &oid);
> + c = lookup_commit(&oid);


Re: [PATCH v3 11/14] commit: integrate commit graph with commit parsing

2018-02-14 Thread Derrick Stolee

On 2/13/2018 7:12 PM, Jonathan Tan wrote:

On Thu,  8 Feb 2018 15:37:35 -0500
Derrick Stolee  wrote:


| Command  | Before | After  | Rel % |
|--|||---|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv   |  0.42s |  0.27s | -35%  |
| rev-list --all   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects | 32.6s  | 27.6s  | -15%  |

Could we have a performance test (in t/perf) demonstrating this?


The rev-list perf tests are found in t/perf/p0001-rev-list.sh

The "log --oneline --topo-order -1000" test would be good to add to 
t/perf/p4211-line-log.sh


The "branch -vv" test is pretty uninteresting unless you set up your 
repo to have local branches significantly behind the remote branches. It 
depends a lot more on the data shape than the others which only need a 
large number of reachable objects.


One reason I did not use the builtin perf test scripts is that they seem 
to ignore all local config options, and hence do not inherit the 
core.commitGraph=true setting from the repos pointed at by GIT_PERF_REPO.





+static int check_commit_parents(struct commit *item, struct commit_graph *g,
+   uint32_t pos, const unsigned char *commit_data)

Document what this function does? Also, this function probably needs a
better name.


+/*
+ * Given a commit struct, try to fill the commit struct info, including:
+ *  1. tree object
+ *  2. date
+ *  3. parents.
+ *
+ * Returns 1 if and only if the commit was found in the commit graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+int parse_commit_in_graph(struct commit *item)
+{

The documentation above duplicates what's in the header file, so we can
probably omit it.


+extern struct object_id *get_nth_commit_oid(struct commit_graph *g,
+   uint32_t n,
+   struct object_id *oid);

This doesn't seem to be used elsewhere - do you plan for a future patch
to use it?




Re: [PATCH v3 11/14] commit: integrate commit graph with commit parsing

2018-02-13 Thread Jonathan Tan
On Thu,  8 Feb 2018 15:37:35 -0500
Derrick Stolee  wrote:

> | Command  | Before | After  | Rel % |
> |--|||---|
> | log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
> | branch -vv   |  0.42s |  0.27s | -35%  |
> | rev-list --all   |  6.4s  |  1.0s  | -84%  |
> | rev-list --all --objects | 32.6s  | 27.6s  | -15%  |

Could we have a performance test (in t/perf) demonstrating this?

> +static int check_commit_parents(struct commit *item, struct commit_graph *g,
> + uint32_t pos, const unsigned char *commit_data)

Document what this function does? Also, this function probably needs a
better name.

> +/*
> + * Given a commit struct, try to fill the commit struct info, including:
> + *  1. tree object
> + *  2. date
> + *  3. parents.
> + *
> + * Returns 1 if and only if the commit was found in the commit graph.
> + *
> + * See parse_commit_buffer() for the fallback after this call.
> + */
> +int parse_commit_in_graph(struct commit *item)
> +{

The documentation above duplicates what's in the header file, so we can
probably omit it.

> +extern struct object_id *get_nth_commit_oid(struct commit_graph *g,
> + uint32_t n,
> + struct object_id *oid);

This doesn't seem to be used elsewhere - do you plan for a future patch
to use it?


[PATCH v3 11/14] commit: integrate commit graph with commit parsing

2018-02-08 Thread Derrick Stolee
Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date. The only loosely-expected
condition is that the commit buffer is loaded into the cache. This
was checked in log-tree.c:show_log(), but the "return;" on failure
produced unexpected results (i.e. the message line was never terminated).
The new behavior of loading the buffer when needed prevents the
unexpected behavior.

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 704,766
reachable commits and is behind 'origin/master' by 19,610 commits.

| Command  | Before | After  | Rel % |
|--|||---|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv   |  0.42s |  0.27s | -35%  |
| rev-list --all   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects | 32.6s  | 27.6s  | -15%  |

Signed-off-by: Derrick Stolee 
---
 alloc.c |   1 +
 commit-graph.c  | 202 
 commit-graph.h  |  21 -
 commit.c|   3 +
 commit.h|   3 +
 log-tree.c  |   3 +-
 t/t5318-commit-graph.sh |  44 ++-
 7 files changed, 272 insertions(+), 5 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacd..cf4f8b61e1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
c->object.type = OBJ_COMMIT;
c->index = alloc_commit_index();
+   c->graph_pos = COMMIT_NOT_FROM_GRAPH;
return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 95b813c2c7..aff67c458e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -38,6 +38,9 @@
 #define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
GRAPH_OID_LEN + 8)
 
+/* global storage */
+struct commit_graph *commit_graph = NULL;
+
 char *get_graph_head_filename(const char *pack_dir)
 {
struct strbuf fname = STRBUF_INIT;
@@ -229,6 +232,205 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file, const char *p
return graph;
 }
 
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+   char *graph_file;
+   struct object_id oid;
+   struct strbuf pack_dir = STRBUF_INIT;
+   strbuf_addstr(&pack_dir, obj_dir);
+   strbuf_add(&pack_dir, "/pack", 5);
+
+   if (!get_graph_head_hash(pack_dir.buf, &oid))
+   return;
+
+   graph_file = get_commit_graph_filename_hash(pack_dir.buf, &oid);
+
+   commit_graph = load_commit_graph_one(graph_file, pack_dir.buf);
+   strbuf_release(&pack_dir);
+}
+
+static int prepare_commit_graph_run_once = 0;
+void prepare_commit_graph(void)
+{
+   struct alternate_object_database *alt;
+   char *obj_dir;
+
+   if (prepare_commit_graph_run_once)
+   return;
+   prepare_commit_graph_run_once = 1;
+
+   obj_dir = get_object_directory();
+   prepare_commit_graph_one(obj_dir);
+   prepare_alt_odb();
+   for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next)
+   prepare_commit_graph_one(alt->path);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, 
uint32_t *pos)
+{
+   int result = bsearch_hash(oid->hash, g->chunk_oid_fanout,
+ g->chunk_oid_lookup, g->hash_len);
+
+   if (result >= 0) {
+   *pos = result;
+   return 1;
+   } else {
+   *pos = -result - 1;
+   return 0;
+   }
+}
+
+struct object_id *get_nth_commit_oid(struct commit_graph *g,
+uint32_t n,
+struct object_id *oid)
+{
+   hashcpy(oid->hash, g->chunk_oid_lookup + g->hash_len * n);
+   return oid;
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+  int pos,
+  struct commit_list **pptr)
+{
+   struct commit *c;
+   struct object_id oid;
+   get_nth_commit_oid(g, pos, &oid);
+   c = lookup_commit(&oid);
+   if (!c)
+   die("could not find commit %s", oid_to_hex(&oid));
+   c->graph_pos = pos;
+   return &commit_list_insert(c, pptr)->next;
+}
+
+static int check_commit_parents(struct commit *item, struct commit_graph *g,
+