Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit

2018-05-18 Thread Duy Nguyen
On Mon, May 14, 2018 at 8:14 PM, Derrick Stolee  wrote:
> I disagree with the removal of "generation". My intention is to make the
> commit-graph feature a standard feature that most repos (of reasonable size)
> want to have. In that case, 'graph_pos' and 'generation' will be set during
> every parse and used by every commit walk. This is just my gut reaction.

My 'often used' is a poor choice of words. The problem here is memory
consumption in full-repo walk like 'git clone' or 'git gc', where more
memory use may equal more swapping and that will slow things down.

commit-slab also has a few advantage over packing everything in struct
commit besides the performance argument. It makes it easier to see
what field is used for what. And if an operation only uses a field in
a short time, then it's possible to free data after it's done
(impossible to free these struct commits without a lot more thinking
and checking; whatever you add here will stay until the end of the
process, which could be a long time for things like pack-objects)

> As I investigate these changes, I'll try to see what performance hit is
> caused by converting the graph_pos and/or generation to commit slabs.
> (Again, I'm assuming the slabs will make this slower. I may be wrong here.)

The slab allocation code is very similar to memory-pool, which was
made common code also by Microsoft to make reading cache faster so I
don't think it will slow things down for you (again, guessing, no real
measurement from my side). This does make me think it may be a good
idea to try unify commit-slab and memory-pool, let commit-slab be a
thin layer on top of memory-pool or something like that (the fewer
custom mem allocators we have, the better)
-- 
Duy


Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit

2018-05-14 Thread Derrick Stolee

On 5/14/2018 1:30 PM, Duy Nguyen wrote:

On Mon, May 14, 2018 at 6:07 PM, Duy Nguyen  wrote:

On Mon, May 14, 2018 at 04:52:29PM +0900, Junio C Hamano wrote:

Nguyễn Thái Ngọc Duy   writes:


diff --git a/commit.h b/commit.h
index 838f6a6b26..70371e111e 100644
--- a/commit.h
+++ b/commit.h
@@ -18,12 +18,16 @@ struct commit_list {

  struct commit {
 struct object object;
-   void *util;
 unsigned int index;
 timestamp_t date;
 struct commit_list *parents;
 struct tree *tree;
 uint32_t graph_pos;
+   /*
+* Do not add more fields here unless it's _very_ often
+* used. Use commit-slab to associate more data with a commit
+* instead.
+*/
  };

That's a logical consequence and a natural endgame of this
pleasent-to-read series.  THanks.

Unfortunately we are gaining more and more stuff in "struct commit"
with recent topics, and we may want to see if we can evict some of
them out to shrink it again.

Sigh.. ds/lazy-load-trees already enters 'next' so a fixup patch is
something like this.

Sorry I take this patch back. I didn't realize it was a rename and the
old field named 'tree' was already there (I vaguely recalled some
"tree" in this struct but didn't stop to think about it). Moving
graph_pos out is an option, but only 32-bit arch gains from it (64-bit
arch will have 4 bytes padding anyway) so probably not worth the
effort. "generation" field should probably be moved out though.


I'm happy to take a look at this patch and figure out the best way to 
integrate it with the changes I've been doing.


I disagree with the removal of "generation". My intention is to make the 
commit-graph feature a standard feature that most repos (of reasonable 
size) want to have. In that case, 'graph_pos' and 'generation' will be 
set during every parse and used by every commit walk. This is just my 
gut reaction.


As I investigate these changes, I'll try to see what performance hit is 
caused by converting the graph_pos and/or generation to commit slabs. 
(Again, I'm assuming the slabs will make this slower. I may be wrong here.)


Thanks,
-Stolee


Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit

2018-05-14 Thread Duy Nguyen
On Mon, May 14, 2018 at 6:07 PM, Duy Nguyen  wrote:
> On Mon, May 14, 2018 at 04:52:29PM +0900, Junio C Hamano wrote:
>> Nguyễn Thái Ngọc Duy   writes:
>>
>> > diff --git a/commit.h b/commit.h
>> > index 838f6a6b26..70371e111e 100644
>> > --- a/commit.h
>> > +++ b/commit.h
>> > @@ -18,12 +18,16 @@ struct commit_list {
>> >
>> >  struct commit {
>> > struct object object;
>> > -   void *util;
>> > unsigned int index;
>> > timestamp_t date;
>> > struct commit_list *parents;
>> > struct tree *tree;
>> > uint32_t graph_pos;
>> > +   /*
>> > +* Do not add more fields here unless it's _very_ often
>> > +* used. Use commit-slab to associate more data with a commit
>> > +* instead.
>> > +*/
>> >  };
>>
>> That's a logical consequence and a natural endgame of this
>> pleasent-to-read series.  THanks.
>>
>> Unfortunately we are gaining more and more stuff in "struct commit"
>> with recent topics, and we may want to see if we can evict some of
>> them out to shrink it again.
>
> Sigh.. ds/lazy-load-trees already enters 'next' so a fixup patch is
> something like this.

Sorry I take this patch back. I didn't realize it was a rename and the
old field named 'tree' was already there (I vaguely recalled some
"tree" in this struct but didn't stop to think about it). Moving
graph_pos out is an option, but only 32-bit arch gains from it (64-bit
arch will have 4 bytes padding anyway) so probably not worth the
effort. "generation" field should probably be moved out though.
-- 
Duy


Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit

2018-05-14 Thread Duy Nguyen
On Mon, May 14, 2018 at 04:52:29PM +0900, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy   writes:
> 
> > diff --git a/commit.h b/commit.h
> > index 838f6a6b26..70371e111e 100644
> > --- a/commit.h
> > +++ b/commit.h
> > @@ -18,12 +18,16 @@ struct commit_list {
> >  
> >  struct commit {
> > struct object object;
> > -   void *util;
> > unsigned int index;
> > timestamp_t date;
> > struct commit_list *parents;
> > struct tree *tree;
> > uint32_t graph_pos;
> > +   /*
> > +* Do not add more fields here unless it's _very_ often
> > +* used. Use commit-slab to associate more data with a commit
> > +* instead.
> > +*/
> >  };
> 
> That's a logical consequence and a natural endgame of this
> pleasent-to-read series.  THanks.
> 
> Unfortunately we are gaining more and more stuff in "struct commit"
> with recent topics, and we may want to see if we can evict some of
> them out to shrink it again.

Sigh.. ds/lazy-load-trees already enters 'next' so a fixup patch is
something like this.

Derrick, do you want to finish up this patch? I could do that by
myself, but I guess this could be good for you to get familiar with
commit-slab because ds/generation-numbers on 'pu' also adds a new
field in struct commit.

We should avoid adding more stuff in struct commit to reduce overhead
when these fields are not used (and I'm not sure how often generation
numbers are used to justify its place here). One clean way to do it is
with commit-slab, as demonstrated here for 'maybe_tree' field.

Anyway the patch is like this

-- 8< --
diff --git a/commit-graph.c b/commit-graph.c
index 70fa1b25fd..9d084f6200 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -9,6 +9,7 @@
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
+#include "commit-slab.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -38,6 +39,26 @@
 #define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
GRAPH_OID_LEN + 8)
 
+
+/*
+ * If the commit is loaded from the commit-graph file, then this
+ * member may be NULL. Only access it through get_commit_tree()
+ * or get_commit_tree_oid().
+ */
+define_commit_slab(maybe_tree, struct tree *);
+struct maybe_tree maybe_trees = COMMIT_SLAB_INIT(1, maybe_trees);
+
+struct tree *set_maybe_tree(const struct commit *commit, struct tree *tree)
+{
+   *maybe_tree_at(_trees, commit) = tree;
+   return tree;
+}
+
+struct tree *get_maybe_tree(const struct commit *commit)
+{
+   return *maybe_tree_at(_trees, commit);
+}
+
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
@@ -256,7 +277,7 @@ static int fill_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
item->object.parsed = 1;
item->graph_pos = pos;
 
-   item->maybe_tree = NULL;
+   set_maybe_tree(item, NULL);
 
date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
date_low = get_be32(commit_data + g->hash_len + 12);
@@ -322,15 +343,15 @@ static struct tree *load_tree_for_commit(struct 
commit_graph *g, struct commit *
   GRAPH_DATA_WIDTH * (c->graph_pos);
 
hashcpy(oid.hash, commit_data);
-   c->maybe_tree = lookup_tree();
-
-   return c->maybe_tree;
+   return set_maybe_tree(c, lookup_tree());
 }
 
 struct tree *get_commit_tree_in_graph(const struct commit *c)
 {
-   if (c->maybe_tree)
-   return c->maybe_tree;
+   struct tree *maybe_tree = get_maybe_tree(c);
+
+   if (maybe_tree)
+   return maybe_tree;
if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
BUG("get_commit_tree_in_graph called from non-commit-graph 
commit");
 
diff --git a/commit-graph.h b/commit-graph.h
index 260a468e73..4a02e4b05e 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -45,4 +45,7 @@ void write_commit_graph(const char *obj_dir,
int nr_commits,
int append);
 
+struct tree *set_maybe_tree(const struct commit *commit, struct tree *tree);
+struct tree *get_maybe_tree(const struct commit *commit);
+
 #endif
diff --git a/commit.c b/commit.c
index 711f674c18..45521ab2de 100644
--- a/commit.c
+++ b/commit.c
@@ -298,8 +298,10 @@ void free_commit_buffer(struct commit *commit)
 
 struct tree *get_commit_tree(const struct commit *commit)
 {
-   if (commit->maybe_tree || !commit->object.parsed)
-   return commit->maybe_tree;
+   struct tree *maybe_tree = get_maybe_tree(commit);
+
+   if (maybe_tree || !commit->object.parsed)
+   return maybe_tree;
 
if (commit->graph_pos == COMMIT_NOT_FROM_GRAPH)
BUG("commit has NULL tree, but was not loaded from 
commit-graph");
@@ -351,7 +353,7 @@ int parse_commit_buffer(struct commit *item, const void 
*buffer, unsigned long s
if 

Re: [PATCH v2 14/14] commit.h: delete 'util' field in struct commit

2018-05-14 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> diff --git a/commit.h b/commit.h
> index 838f6a6b26..70371e111e 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -18,12 +18,16 @@ struct commit_list {
>  
>  struct commit {
>   struct object object;
> - void *util;
>   unsigned int index;
>   timestamp_t date;
>   struct commit_list *parents;
>   struct tree *tree;
>   uint32_t graph_pos;
> + /*
> +  * Do not add more fields here unless it's _very_ often
> +  * used. Use commit-slab to associate more data with a commit
> +  * instead.
> +  */
>  };

That's a logical consequence and a natural endgame of this
pleasent-to-read series.  THanks.

Unfortunately we are gaining more and more stuff in "struct commit"
with recent topics, and we may want to see if we can evict some of
them out to shrink it again.

To make it clear that everything that is less often used is keyed
off of the "index" field, I think it makes sense to move that field
to either end of the struct (I think in today's integration I
resolved a few merges to leave the "index" field at the end).


[PATCH v2 14/14] commit.h: delete 'util' field in struct commit

2018-05-12 Thread Nguyễn Thái Ngọc Duy
If you have come this far, you probably have seen that this 'util'
pointer is used for many different purposes. Some are not even
contained in a command code, but buried deep in common code with no
clue who will use it and how.

The move to using commit-slab gives us a much better picture of how
some piece of data is associated with a commit and what for. Since
nobody uses 'util' pointer anymore, we can retire so that nobody will
abuse it again. commit-slab will be the way forward for associating
data to a commit.

As a side benefit, this shrinks struct commit by 8 bytes (on 64-bit
architecture) which should help reduce memory usage for reachability
test a bit. This is also what commit-slab is invented for [1].

[1] 96c4f4a370 (commit: allow associating auxiliary info on-demand -
2013-04-09)

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 commit.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/commit.h b/commit.h
index 838f6a6b26..70371e111e 100644
--- a/commit.h
+++ b/commit.h
@@ -18,12 +18,16 @@ struct commit_list {
 
 struct commit {
struct object object;
-   void *util;
unsigned int index;
timestamp_t date;
struct commit_list *parents;
struct tree *tree;
uint32_t graph_pos;
+   /*
+* Do not add more fields here unless it's _very_ often
+* used. Use commit-slab to associate more data with a commit
+* instead.
+*/
 };
 
 extern int save_commit_buffer;
-- 
2.17.0.705.g3525833791