Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit
On Mon, May 14, 2018 at 8:14 PM, Derrick Stoleewrote: > I disagree with the removal of "generation". My intention is to make the > commit-graph feature a standard feature that most repos (of reasonable size) > want to have. In that case, 'graph_pos' and 'generation' will be set during > every parse and used by every commit walk. This is just my gut reaction. My 'often used' is a poor choice of words. The problem here is memory consumption in full-repo walk like 'git clone' or 'git gc', where more memory use may equal more swapping and that will slow things down. commit-slab also has a few advantage over packing everything in struct commit besides the performance argument. It makes it easier to see what field is used for what. And if an operation only uses a field in a short time, then it's possible to free data after it's done (impossible to free these struct commits without a lot more thinking and checking; whatever you add here will stay until the end of the process, which could be a long time for things like pack-objects) > As I investigate these changes, I'll try to see what performance hit is > caused by converting the graph_pos and/or generation to commit slabs. > (Again, I'm assuming the slabs will make this slower. I may be wrong here.) The slab allocation code is very similar to memory-pool, which was made common code also by Microsoft to make reading cache faster so I don't think it will slow things down for you (again, guessing, no real measurement from my side). This does make me think it may be a good idea to try unify commit-slab and memory-pool, let commit-slab be a thin layer on top of memory-pool or something like that (the fewer custom mem allocators we have, the better) -- Duy
Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit
On 5/14/2018 1:30 PM, Duy Nguyen wrote: On Mon, May 14, 2018 at 6:07 PM, Duy Nguyenwrote: On Mon, May 14, 2018 at 04:52:29PM +0900, Junio C Hamano wrote: Nguyễn Thái Ngọc Duy writes: diff --git a/commit.h b/commit.h index 838f6a6b26..70371e111e 100644 --- a/commit.h +++ b/commit.h @@ -18,12 +18,16 @@ struct commit_list { struct commit { struct object object; - void *util; unsigned int index; timestamp_t date; struct commit_list *parents; struct tree *tree; uint32_t graph_pos; + /* +* Do not add more fields here unless it's _very_ often +* used. Use commit-slab to associate more data with a commit +* instead. +*/ }; That's a logical consequence and a natural endgame of this pleasent-to-read series. THanks. Unfortunately we are gaining more and more stuff in "struct commit" with recent topics, and we may want to see if we can evict some of them out to shrink it again. Sigh.. ds/lazy-load-trees already enters 'next' so a fixup patch is something like this. Sorry I take this patch back. I didn't realize it was a rename and the old field named 'tree' was already there (I vaguely recalled some "tree" in this struct but didn't stop to think about it). Moving graph_pos out is an option, but only 32-bit arch gains from it (64-bit arch will have 4 bytes padding anyway) so probably not worth the effort. "generation" field should probably be moved out though. I'm happy to take a look at this patch and figure out the best way to integrate it with the changes I've been doing. I disagree with the removal of "generation". My intention is to make the commit-graph feature a standard feature that most repos (of reasonable size) want to have. In that case, 'graph_pos' and 'generation' will be set during every parse and used by every commit walk. This is just my gut reaction. As I investigate these changes, I'll try to see what performance hit is caused by converting the graph_pos and/or generation to commit slabs. (Again, I'm assuming the slabs will make this slower. I may be wrong here.) Thanks, -Stolee
Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit
On Mon, May 14, 2018 at 6:07 PM, Duy Nguyenwrote: > On Mon, May 14, 2018 at 04:52:29PM +0900, Junio C Hamano wrote: >> Nguyễn Thái Ngọc Duy writes: >> >> > diff --git a/commit.h b/commit.h >> > index 838f6a6b26..70371e111e 100644 >> > --- a/commit.h >> > +++ b/commit.h >> > @@ -18,12 +18,16 @@ struct commit_list { >> > >> > struct commit { >> > struct object object; >> > - void *util; >> > unsigned int index; >> > timestamp_t date; >> > struct commit_list *parents; >> > struct tree *tree; >> > uint32_t graph_pos; >> > + /* >> > +* Do not add more fields here unless it's _very_ often >> > +* used. Use commit-slab to associate more data with a commit >> > +* instead. >> > +*/ >> > }; >> >> That's a logical consequence and a natural endgame of this >> pleasent-to-read series. THanks. >> >> Unfortunately we are gaining more and more stuff in "struct commit" >> with recent topics, and we may want to see if we can evict some of >> them out to shrink it again. > > Sigh.. ds/lazy-load-trees already enters 'next' so a fixup patch is > something like this. Sorry I take this patch back. I didn't realize it was a rename and the old field named 'tree' was already there (I vaguely recalled some "tree" in this struct but didn't stop to think about it). Moving graph_pos out is an option, but only 32-bit arch gains from it (64-bit arch will have 4 bytes padding anyway) so probably not worth the effort. "generation" field should probably be moved out though. -- Duy
Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit
On Mon, May 14, 2018 at 04:52:29PM +0900, Junio C Hamano wrote: > Nguyễn Thái Ngọc Duywrites: > > > diff --git a/commit.h b/commit.h > > index 838f6a6b26..70371e111e 100644 > > --- a/commit.h > > +++ b/commit.h > > @@ -18,12 +18,16 @@ struct commit_list { > > > > struct commit { > > struct object object; > > - void *util; > > unsigned int index; > > timestamp_t date; > > struct commit_list *parents; > > struct tree *tree; > > uint32_t graph_pos; > > + /* > > +* Do not add more fields here unless it's _very_ often > > +* used. Use commit-slab to associate more data with a commit > > +* instead. > > +*/ > > }; > > That's a logical consequence and a natural endgame of this > pleasent-to-read series. THanks. > > Unfortunately we are gaining more and more stuff in "struct commit" > with recent topics, and we may want to see if we can evict some of > them out to shrink it again. Sigh.. ds/lazy-load-trees already enters 'next' so a fixup patch is something like this. Derrick, do you want to finish up this patch? I could do that by myself, but I guess this could be good for you to get familiar with commit-slab because ds/generation-numbers on 'pu' also adds a new field in struct commit. We should avoid adding more stuff in struct commit to reduce overhead when these fields are not used (and I'm not sure how often generation numbers are used to justify its place here). One clean way to do it is with commit-slab, as demonstrated here for 'maybe_tree' field. Anyway the patch is like this -- 8< -- diff --git a/commit-graph.c b/commit-graph.c index 70fa1b25fd..9d084f6200 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -9,6 +9,7 @@ #include "revision.h" #include "sha1-lookup.h" #include "commit-graph.h" +#include "commit-slab.h" #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */ #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ @@ -38,6 +39,26 @@ #define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \ GRAPH_OID_LEN + 8) + +/* + * If the commit is loaded from the commit-graph file, then this + * member may be NULL. Only access it through get_commit_tree() + * or get_commit_tree_oid(). + */ +define_commit_slab(maybe_tree, struct tree *); +struct maybe_tree maybe_trees = COMMIT_SLAB_INIT(1, maybe_trees); + +struct tree *set_maybe_tree(const struct commit *commit, struct tree *tree) +{ + *maybe_tree_at(_trees, commit) = tree; + return tree; +} + +struct tree *get_maybe_tree(const struct commit *commit) +{ + return *maybe_tree_at(_trees, commit); +} + char *get_commit_graph_filename(const char *obj_dir) { return xstrfmt("%s/info/commit-graph", obj_dir); @@ -256,7 +277,7 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin item->object.parsed = 1; item->graph_pos = pos; - item->maybe_tree = NULL; + set_maybe_tree(item, NULL); date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; date_low = get_be32(commit_data + g->hash_len + 12); @@ -322,15 +343,15 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit * GRAPH_DATA_WIDTH * (c->graph_pos); hashcpy(oid.hash, commit_data); - c->maybe_tree = lookup_tree(); - - return c->maybe_tree; + return set_maybe_tree(c, lookup_tree()); } struct tree *get_commit_tree_in_graph(const struct commit *c) { - if (c->maybe_tree) - return c->maybe_tree; + struct tree *maybe_tree = get_maybe_tree(c); + + if (maybe_tree) + return maybe_tree; if (c->graph_pos == COMMIT_NOT_FROM_GRAPH) BUG("get_commit_tree_in_graph called from non-commit-graph commit"); diff --git a/commit-graph.h b/commit-graph.h index 260a468e73..4a02e4b05e 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -45,4 +45,7 @@ void write_commit_graph(const char *obj_dir, int nr_commits, int append); +struct tree *set_maybe_tree(const struct commit *commit, struct tree *tree); +struct tree *get_maybe_tree(const struct commit *commit); + #endif diff --git a/commit.c b/commit.c index 711f674c18..45521ab2de 100644 --- a/commit.c +++ b/commit.c @@ -298,8 +298,10 @@ void free_commit_buffer(struct commit *commit) struct tree *get_commit_tree(const struct commit *commit) { - if (commit->maybe_tree || !commit->object.parsed) - return commit->maybe_tree; + struct tree *maybe_tree = get_maybe_tree(commit); + + if (maybe_tree || !commit->object.parsed) + return maybe_tree; if (commit->graph_pos == COMMIT_NOT_FROM_GRAPH) BUG("commit has NULL tree, but was not loaded from commit-graph"); @@ -351,7 +353,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s if
Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit
Nguyễn Thái Ngọc Duywrites: > diff --git a/commit.h b/commit.h > index 838f6a6b26..70371e111e 100644 > --- a/commit.h > +++ b/commit.h > @@ -18,12 +18,16 @@ struct commit_list { > > struct commit { > struct object object; > - void *util; > unsigned int index; > timestamp_t date; > struct commit_list *parents; > struct tree *tree; > uint32_t graph_pos; > + /* > + * Do not add more fields here unless it's _very_ often > + * used. Use commit-slab to associate more data with a commit > + * instead. > + */ > }; That's a logical consequence and a natural endgame of this pleasent-to-read series. THanks. Unfortunately we are gaining more and more stuff in "struct commit" with recent topics, and we may want to see if we can evict some of them out to shrink it again. To make it clear that everything that is less often used is keyed off of the "index" field, I think it makes sense to move that field to either end of the struct (I think in today's integration I resolved a few merges to leave the "index" field at the end).
[PATCH v2 14/14] commit.h: delete 'util' field in struct commit
If you have come this far, you probably have seen that this 'util' pointer is used for many different purposes. Some are not even contained in a command code, but buried deep in common code with no clue who will use it and how. The move to using commit-slab gives us a much better picture of how some piece of data is associated with a commit and what for. Since nobody uses 'util' pointer anymore, we can retire so that nobody will abuse it again. commit-slab will be the way forward for associating data to a commit. As a side benefit, this shrinks struct commit by 8 bytes (on 64-bit architecture) which should help reduce memory usage for reachability test a bit. This is also what commit-slab is invented for [1]. [1] 96c4f4a370 (commit: allow associating auxiliary info on-demand - 2013-04-09) Signed-off-by: Nguyễn Thái Ngọc Duy--- commit.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/commit.h b/commit.h index 838f6a6b26..70371e111e 100644 --- a/commit.h +++ b/commit.h @@ -18,12 +18,16 @@ struct commit_list { struct commit { struct object object; - void *util; unsigned int index; timestamp_t date; struct commit_list *parents; struct tree *tree; uint32_t graph_pos; + /* +* Do not add more fields here unless it's _very_ often +* used. Use commit-slab to associate more data with a commit +* instead. +*/ }; extern int save_commit_buffer; -- 2.17.0.705.g3525833791