Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util

2018-05-12 Thread Jeff King
On Sat, May 12, 2018 at 02:09:18PM +0200, Duy Nguyen wrote:

> > That wastes an extra 4 bytes per slot over storing an int directly, but
> > it's the same as storing an 8-byte pointer, plus you avoid the storage
> > and runtime overhead of malloc.
> 
> Or we could have a way to let the user decide initial values. If the
> initial value here is -1 (which can't possibly be used in the current
> code), it could be the sentinel value.

That's actually a little tricky. Right now the sentinels are assigned by
xcalloc(), so they're all-bits zero. We _could_ walk any newly allocated
slab and assign a value to each element, but I'd worry that is wasteful
for cases where might only use a small part of the slab (and my
assumption is that the OS can give us zero'd pages pretty cheaply, and
maybe even avoid allocating a page until we touch them, but I don't know
how true that is).

> Did you notice the for loop in the end to free "int *"? I don't like
> peeking inside a slab that way and would prefer passing a "free"
> function pointer to clear_commit_depth(), or just assign a "free"
> function to some new field in struct commit_depth and
> clear_commit_depth() will call it. If we have a new field for "free"
> callback in the struct, it makes sense to have an "init" callback to
> do extra initialization on top of xcalloc.

You might find that a free() is tricky with the type system. This is all
implemented with macros, so you'd end up calling free() on an int (even
if it's a function that nobody ever calls). I suppose the value could be
cast to void to shut up the compiler, but then that function is like a
ticking time bomb. ;)

So I guess you'd need a variant of define_commit_slab() that defines the
clear function and one that doesn't.

> > I actually wonder if we could wrap commit_slab with a variant that
> > stores the sentinel data itself, to make this pattern easier. I.e.,
> > something like:
> >
> >   #define define_commit_slab_sentinel(name, type) \
> > struct name##_value { \
> > unsigned valid:1; \
> > type value; \
> > }; \
> > define_commit_slab(name, struct name##_value)
> >
> > and some matching "peek" and "at" functions to manipulate value
> > directly.
> 
> If you keep this valid bit in a separate slab, you can pack these bits
> very tight and not worry about wasting memory. Lookup in commit-slab
> is cheap enough that doing double lookups (one for valid field, one
> for value) is not a problem, I think.

True, though your cache behavior gets worse if your have two separate
arrays. I'm not sure pinching bytes matters all that much. linux.git has
~600k commits, so even there a few bytes each is not so bad (although
again, we get into cache issues).

> Yeah I think I will stick with a faithful conversion for now. The
> conversion shows room for improvement which could be the next
> microproject (I thought of adding this removing 'util' task as a 2019
> microproject but it was too tricky for newcomers to do).

Makes sense to me.

-Peff


Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util

2018-05-12 Thread Duy Nguyen
On Sat, May 12, 2018 at 11:18 AM, Jeff King  wrote:
> On Sat, May 12, 2018 at 05:07:48AM -0400, Jeff King wrote:
>
>> So no, it wouldn't work to directly store depths with the code as
>> written.  I'm not sure if the depth can ever be 0. If not, then it would
>> be a suitable sentinel as:
>>
>>   int *slot = commit_depth_at(, p->item);
>>   if (!*slot || cur_depth < *slot)
>>   *slot = cur_depth;
>>
>> But somebody would have to dig into the possible values of cur_depth
>> there (which would make sense to do as a separate patch anyway, since
>> the point of this is to be a direct conversion).
>
> By the way, one other approach if xcalloc() doesn't produce a good
> sentinel is to use a data type that does. ;) E.g., something like this
> should work:
>
>   struct depth {
> unsigned valid:1;
> int value;
>   };
>   define_commit_slab(commit_depth, struct depth);
>
>   ...
>
>   struct depth *slot = commit_depth_at(, p->item);
>   if (!slot->valid || cur_depth < slot->value) {
> slot->value = cur_depth;
> slot->valid = 1;
>   }
>
> That wastes an extra 4 bytes per slot over storing an int directly, but
> it's the same as storing an 8-byte pointer, plus you avoid the storage
> and runtime overhead of malloc.

Or we could have a way to let the user decide initial values. If the
initial value here is -1 (which can't possibly be used in the current
code), it could be the sentinel value.

Did you notice the for loop in the end to free "int *"? I don't like
peeking inside a slab that way and would prefer passing a "free"
function pointer to clear_commit_depth(), or just assign a "free"
function to some new field in struct commit_depth and
clear_commit_depth() will call it. If we have a new field for "free"
callback in the struct, it makes sense to have an "init" callback to
do extra initialization on top of xcalloc.

> I actually wonder if we could wrap commit_slab with a variant that
> stores the sentinel data itself, to make this pattern easier. I.e.,
> something like:
>
>   #define define_commit_slab_sentinel(name, type) \
> struct name##_value { \
> unsigned valid:1; \
> type value; \
> }; \
> define_commit_slab(name, struct name##_value)
>
> and some matching "peek" and "at" functions to manipulate value
> directly.

If you keep this valid bit in a separate slab, you can pack these bits
very tight and not worry about wasting memory. Lookup in commit-slab
is cheap enough that doing double lookups (one for valid field, one
for value) is not a problem, I think.

> I think the end result would be nicer, but it's turning into a little
> bit of a rabbit hole. So I don't mind going with your direct conversion
> here for now.

Yeah I think I will stick with a faithful conversion for now. The
conversion shows room for improvement which could be the next
microproject (I thought of adding this removing 'util' task as a 2019
microproject but it was too tricky for newcomers to do).
-- 
Duy


Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util

2018-05-12 Thread Duy Nguyen
On Sat, May 12, 2018 at 11:07 AM, Jeff King  wrote:
> On Sat, May 12, 2018 at 10:00:19AM +0200, Nguyễn Thái Ngọc Duy wrote:
>
>> @@ -82,25 +84,29 @@ struct commit_list *get_shallow_commits(struct 
>> object_array *heads, int depth,
>>   struct object_array stack = OBJECT_ARRAY_INIT;
>>   struct commit *commit = NULL;
>>   struct commit_graft *graft;
>> + struct commit_depth depths;
>>
>> + init_commit_depth();
>>   while (commit || i < heads->nr || stack.nr) {
>>   struct commit_list *p;
>>   if (!commit) {
>>   if (i < heads->nr) {
>> + int **depth_slot;
>>   commit = (struct commit *)
>>   deref_tag(heads->objects[i++].item, 
>> NULL, 0);
>>   if (!commit || commit->object.type != 
>> OBJ_COMMIT) {
>>   commit = NULL;
>>   continue;
>>   }
>> - if (!commit->util)
>> - commit->util = xmalloc(sizeof(int));
>> - *(int *)commit->util = 0;
>> + depth_slot = commit_depth_at(, commit);
>> + if (!*depth_slot)
>> + *depth_slot = xmalloc(sizeof(int));
>> + **depth_slot = 0;
>
> It looks like we could save ourselves an extra layer of indirection (and
> tiny heap allocations) by just storing an "int" directly in the slab.
> Do we ever use the NULL as a sentinel value?
>
> Here we just allocate it if not set. Let's see if we can find some
> others...
>
>> @@ -116,25 +122,32 @@ struct commit_list *get_shallow_commits(struct 
>> object_array *heads, int depth,
>>   }
>>   commit->object.flags |= not_shallow_flag;
>>   for (p = commit->parents, commit = NULL; p; p = p->next) {
>> - if (!p->item->util) {
>> - int *pointer = xmalloc(sizeof(int));
>> - p->item->util = pointer;
>> - *pointer =  cur_depth;
>> + int **depth_slot = commit_depth_at(, p->item);
>> + if (!*depth_slot) {
>> + *depth_slot = xmalloc(sizeof(int));
>> + **depth_slot = cur_depth;
>>   } else {
>> - int *pointer = p->item->util;
>> - if (cur_depth >= *pointer)
>> + if (cur_depth >= **depth_slot)
>>   continue;
>> - *pointer = cur_depth;
>> + **depth_slot = cur_depth;
>>   }
>
> Here we malloc again if it's not set. But we do behave slightly
> differently when we see NULL, in that we do not bother to even compare
> against cur_depth. So if we were to directly store ints, we'd see "0" as
> the sentinel depth here, which would not match our "cur_depth >=
> depth_slot" check.
>
> So no, it wouldn't work to directly store depths with the code as
> written.  I'm not sure if the depth can ever be 0. If not, then it would
> be a suitable sentinel as:
>
>   int *slot = commit_depth_at(, p->item);
>   if (!*slot || cur_depth < *slot)
> *slot = cur_depth;
>
> But somebody would have to dig into the possible values of cur_depth
> there (which would make sense to do as a separate patch anyway, since
> the point of this is to be a direct conversion).

I actually tried that first, going with storing int directly in the
slab instead of int*. And some shallow tests failed so I didn't bother
(the goal was to get rid of 'util' pointer, not to optimize more)
-- 
Duy


Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util

2018-05-12 Thread Jeff King
On Sat, May 12, 2018 at 05:07:48AM -0400, Jeff King wrote:

> So no, it wouldn't work to directly store depths with the code as
> written.  I'm not sure if the depth can ever be 0. If not, then it would
> be a suitable sentinel as:
> 
>   int *slot = commit_depth_at(, p->item);
>   if (!*slot || cur_depth < *slot)
>   *slot = cur_depth;
> 
> But somebody would have to dig into the possible values of cur_depth
> there (which would make sense to do as a separate patch anyway, since
> the point of this is to be a direct conversion).

By the way, one other approach if xcalloc() doesn't produce a good
sentinel is to use a data type that does. ;) E.g., something like this
should work:

  struct depth {
unsigned valid:1;
int value;
  };
  define_commit_slab(commit_depth, struct depth);

  ...

  struct depth *slot = commit_depth_at(, p->item);
  if (!slot->valid || cur_depth < slot->value) {
slot->value = cur_depth;
slot->valid = 1;
  }

That wastes an extra 4 bytes per slot over storing an int directly, but
it's the same as storing an 8-byte pointer, plus you avoid the storage
and runtime overhead of malloc.

I actually wonder if we could wrap commit_slab with a variant that
stores the sentinel data itself, to make this pattern easier. I.e.,
something like:

  #define define_commit_slab_sentinel(name, type) \
struct name##_value { \
unsigned valid:1; \
type value; \
}; \
define_commit_slab(name, struct name##_value)

and some matching "peek" and "at" functions to manipulate value
directly.

I think the end result would be nicer, but it's turning into a little
bit of a rabbit hole. So I don't mind going with your direct conversion
here for now.

-Peff


Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util

2018-05-12 Thread Jeff King
On Sat, May 12, 2018 at 10:00:19AM +0200, Nguyễn Thái Ngọc Duy wrote:

> @@ -82,25 +84,29 @@ struct commit_list *get_shallow_commits(struct 
> object_array *heads, int depth,
>   struct object_array stack = OBJECT_ARRAY_INIT;
>   struct commit *commit = NULL;
>   struct commit_graft *graft;
> + struct commit_depth depths;
>  
> + init_commit_depth();
>   while (commit || i < heads->nr || stack.nr) {
>   struct commit_list *p;
>   if (!commit) {
>   if (i < heads->nr) {
> + int **depth_slot;
>   commit = (struct commit *)
>   deref_tag(heads->objects[i++].item, 
> NULL, 0);
>   if (!commit || commit->object.type != 
> OBJ_COMMIT) {
>   commit = NULL;
>   continue;
>   }
> - if (!commit->util)
> - commit->util = xmalloc(sizeof(int));
> - *(int *)commit->util = 0;
> + depth_slot = commit_depth_at(, commit);
> + if (!*depth_slot)
> + *depth_slot = xmalloc(sizeof(int));
> + **depth_slot = 0;

It looks like we could save ourselves an extra layer of indirection (and
tiny heap allocations) by just storing an "int" directly in the slab.
Do we ever use the NULL as a sentinel value?

Here we just allocate it if not set. Let's see if we can find some
others...

> @@ -116,25 +122,32 @@ struct commit_list *get_shallow_commits(struct 
> object_array *heads, int depth,
>   }
>   commit->object.flags |= not_shallow_flag;
>   for (p = commit->parents, commit = NULL; p; p = p->next) {
> - if (!p->item->util) {
> - int *pointer = xmalloc(sizeof(int));
> - p->item->util = pointer;
> - *pointer =  cur_depth;
> + int **depth_slot = commit_depth_at(, p->item);
> + if (!*depth_slot) {
> + *depth_slot = xmalloc(sizeof(int));
> + **depth_slot = cur_depth;
>   } else {
> - int *pointer = p->item->util;
> - if (cur_depth >= *pointer)
> + if (cur_depth >= **depth_slot)
>   continue;
> - *pointer = cur_depth;
> + **depth_slot = cur_depth;
>   }

Here we malloc again if it's not set. But we do behave slightly
differently when we see NULL, in that we do not bother to even compare
against cur_depth. So if we were to directly store ints, we'd see "0" as
the sentinel depth here, which would not match our "cur_depth >=
depth_slot" check.

So no, it wouldn't work to directly store depths with the code as
written.  I'm not sure if the depth can ever be 0. If not, then it would
be a suitable sentinel as:

  int *slot = commit_depth_at(, p->item);
  if (!*slot || cur_depth < *slot)
*slot = cur_depth;

But somebody would have to dig into the possible values of cur_depth
there (which would make sense to do as a separate patch anyway, since
the point of this is to be a direct conversion).

-Peff


[PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util

2018-05-12 Thread Nguyễn Thái Ngọc Duy
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.

While at there, plug a leak for keeping track of depth in this code.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 shallow.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/shallow.c b/shallow.c
index df4d44ea7a..6ea411b0d2 100644
--- a/shallow.c
+++ b/shallow.c
@@ -12,6 +12,7 @@
 #include "commit-slab.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "commit-slab.h"
 
 static int is_shallow = -1;
 static struct stat_validity shallow_stat;
@@ -74,6 +75,7 @@ int is_repository_shallow(void)
return is_shallow;
 }
 
+define_commit_slab(commit_depth, int *);
 struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
int shallow_flag, int not_shallow_flag)
 {
@@ -82,25 +84,29 @@ struct commit_list *get_shallow_commits(struct object_array 
*heads, int depth,
struct object_array stack = OBJECT_ARRAY_INIT;
struct commit *commit = NULL;
struct commit_graft *graft;
+   struct commit_depth depths;
 
+   init_commit_depth();
while (commit || i < heads->nr || stack.nr) {
struct commit_list *p;
if (!commit) {
if (i < heads->nr) {
+   int **depth_slot;
commit = (struct commit *)
deref_tag(heads->objects[i++].item, 
NULL, 0);
if (!commit || commit->object.type != 
OBJ_COMMIT) {
commit = NULL;
continue;
}
-   if (!commit->util)
-   commit->util = xmalloc(sizeof(int));
-   *(int *)commit->util = 0;
+   depth_slot = commit_depth_at(, commit);
+   if (!*depth_slot)
+   *depth_slot = xmalloc(sizeof(int));
+   **depth_slot = 0;
cur_depth = 0;
} else {
commit = (struct commit *)
object_array_pop();
-   cur_depth = *(int *)commit->util;
+   cur_depth = **commit_depth_peek(, 
commit);
}
}
parse_commit_or_die(commit);
@@ -116,25 +122,32 @@ struct commit_list *get_shallow_commits(struct 
object_array *heads, int depth,
}
commit->object.flags |= not_shallow_flag;
for (p = commit->parents, commit = NULL; p; p = p->next) {
-   if (!p->item->util) {
-   int *pointer = xmalloc(sizeof(int));
-   p->item->util = pointer;
-   *pointer =  cur_depth;
+   int **depth_slot = commit_depth_at(, p->item);
+   if (!*depth_slot) {
+   *depth_slot = xmalloc(sizeof(int));
+   **depth_slot = cur_depth;
} else {
-   int *pointer = p->item->util;
-   if (cur_depth >= *pointer)
+   if (cur_depth >= **depth_slot)
continue;
-   *pointer = cur_depth;
+   **depth_slot = cur_depth;
}
if (p->next)
add_object_array(>item->object,
NULL, );
else {
commit = p->item;
-   cur_depth = *(int *)commit->util;
+   depth_slot = commit_depth_peek(, commit);
+   cur_depth = **depth_slot;
}
}
}
+   for (i = 0; i < depths.slab_count; i++) {
+   int j;
+
+   for (j = 0; j < depths.slab_size; j++)
+   free(depths.slab[i][j]);
+   }
+   clear_commit_depth();
 
return result;
 }
-- 
2.17.0.705.g3525833791