Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util
On Sat, May 12, 2018 at 02:09:18PM +0200, Duy Nguyen wrote: > > That wastes an extra 4 bytes per slot over storing an int directly, but > > it's the same as storing an 8-byte pointer, plus you avoid the storage > > and runtime overhead of malloc. > > Or we could have a way to let the user decide initial values. If the > initial value here is -1 (which can't possibly be used in the current > code), it could be the sentinel value. That's actually a little tricky. Right now the sentinels are assigned by xcalloc(), so they're all-bits zero. We _could_ walk any newly allocated slab and assign a value to each element, but I'd worry that is wasteful for cases where might only use a small part of the slab (and my assumption is that the OS can give us zero'd pages pretty cheaply, and maybe even avoid allocating a page until we touch them, but I don't know how true that is). > Did you notice the for loop in the end to free "int *"? I don't like > peeking inside a slab that way and would prefer passing a "free" > function pointer to clear_commit_depth(), or just assign a "free" > function to some new field in struct commit_depth and > clear_commit_depth() will call it. If we have a new field for "free" > callback in the struct, it makes sense to have an "init" callback to > do extra initialization on top of xcalloc. You might find that a free() is tricky with the type system. This is all implemented with macros, so you'd end up calling free() on an int (even if it's a function that nobody ever calls). I suppose the value could be cast to void to shut up the compiler, but then that function is like a ticking time bomb. ;) So I guess you'd need a variant of define_commit_slab() that defines the clear function and one that doesn't. > > I actually wonder if we could wrap commit_slab with a variant that > > stores the sentinel data itself, to make this pattern easier. I.e., > > something like: > > > > #define define_commit_slab_sentinel(name, type) \ > > struct name##_value { \ > > unsigned valid:1; \ > > type value; \ > > }; \ > > define_commit_slab(name, struct name##_value) > > > > and some matching "peek" and "at" functions to manipulate value > > directly. > > If you keep this valid bit in a separate slab, you can pack these bits > very tight and not worry about wasting memory. Lookup in commit-slab > is cheap enough that doing double lookups (one for valid field, one > for value) is not a problem, I think. True, though your cache behavior gets worse if your have two separate arrays. I'm not sure pinching bytes matters all that much. linux.git has ~600k commits, so even there a few bytes each is not so bad (although again, we get into cache issues). > Yeah I think I will stick with a faithful conversion for now. The > conversion shows room for improvement which could be the next > microproject (I thought of adding this removing 'util' task as a 2019 > microproject but it was too tricky for newcomers to do). Makes sense to me. -Peff
Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util
On Sat, May 12, 2018 at 11:18 AM, Jeff Kingwrote: > On Sat, May 12, 2018 at 05:07:48AM -0400, Jeff King wrote: > >> So no, it wouldn't work to directly store depths with the code as >> written. I'm not sure if the depth can ever be 0. If not, then it would >> be a suitable sentinel as: >> >> int *slot = commit_depth_at(, p->item); >> if (!*slot || cur_depth < *slot) >> *slot = cur_depth; >> >> But somebody would have to dig into the possible values of cur_depth >> there (which would make sense to do as a separate patch anyway, since >> the point of this is to be a direct conversion). > > By the way, one other approach if xcalloc() doesn't produce a good > sentinel is to use a data type that does. ;) E.g., something like this > should work: > > struct depth { > unsigned valid:1; > int value; > }; > define_commit_slab(commit_depth, struct depth); > > ... > > struct depth *slot = commit_depth_at(, p->item); > if (!slot->valid || cur_depth < slot->value) { > slot->value = cur_depth; > slot->valid = 1; > } > > That wastes an extra 4 bytes per slot over storing an int directly, but > it's the same as storing an 8-byte pointer, plus you avoid the storage > and runtime overhead of malloc. Or we could have a way to let the user decide initial values. If the initial value here is -1 (which can't possibly be used in the current code), it could be the sentinel value. Did you notice the for loop in the end to free "int *"? I don't like peeking inside a slab that way and would prefer passing a "free" function pointer to clear_commit_depth(), or just assign a "free" function to some new field in struct commit_depth and clear_commit_depth() will call it. If we have a new field for "free" callback in the struct, it makes sense to have an "init" callback to do extra initialization on top of xcalloc. > I actually wonder if we could wrap commit_slab with a variant that > stores the sentinel data itself, to make this pattern easier. I.e., > something like: > > #define define_commit_slab_sentinel(name, type) \ > struct name##_value { \ > unsigned valid:1; \ > type value; \ > }; \ > define_commit_slab(name, struct name##_value) > > and some matching "peek" and "at" functions to manipulate value > directly. If you keep this valid bit in a separate slab, you can pack these bits very tight and not worry about wasting memory. Lookup in commit-slab is cheap enough that doing double lookups (one for valid field, one for value) is not a problem, I think. > I think the end result would be nicer, but it's turning into a little > bit of a rabbit hole. So I don't mind going with your direct conversion > here for now. Yeah I think I will stick with a faithful conversion for now. The conversion shows room for improvement which could be the next microproject (I thought of adding this removing 'util' task as a 2019 microproject but it was too tricky for newcomers to do). -- Duy
Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util
On Sat, May 12, 2018 at 11:07 AM, Jeff Kingwrote: > On Sat, May 12, 2018 at 10:00:19AM +0200, Nguyễn Thái Ngọc Duy wrote: > >> @@ -82,25 +84,29 @@ struct commit_list *get_shallow_commits(struct >> object_array *heads, int depth, >> struct object_array stack = OBJECT_ARRAY_INIT; >> struct commit *commit = NULL; >> struct commit_graft *graft; >> + struct commit_depth depths; >> >> + init_commit_depth(); >> while (commit || i < heads->nr || stack.nr) { >> struct commit_list *p; >> if (!commit) { >> if (i < heads->nr) { >> + int **depth_slot; >> commit = (struct commit *) >> deref_tag(heads->objects[i++].item, >> NULL, 0); >> if (!commit || commit->object.type != >> OBJ_COMMIT) { >> commit = NULL; >> continue; >> } >> - if (!commit->util) >> - commit->util = xmalloc(sizeof(int)); >> - *(int *)commit->util = 0; >> + depth_slot = commit_depth_at(, commit); >> + if (!*depth_slot) >> + *depth_slot = xmalloc(sizeof(int)); >> + **depth_slot = 0; > > It looks like we could save ourselves an extra layer of indirection (and > tiny heap allocations) by just storing an "int" directly in the slab. > Do we ever use the NULL as a sentinel value? > > Here we just allocate it if not set. Let's see if we can find some > others... > >> @@ -116,25 +122,32 @@ struct commit_list *get_shallow_commits(struct >> object_array *heads, int depth, >> } >> commit->object.flags |= not_shallow_flag; >> for (p = commit->parents, commit = NULL; p; p = p->next) { >> - if (!p->item->util) { >> - int *pointer = xmalloc(sizeof(int)); >> - p->item->util = pointer; >> - *pointer = cur_depth; >> + int **depth_slot = commit_depth_at(, p->item); >> + if (!*depth_slot) { >> + *depth_slot = xmalloc(sizeof(int)); >> + **depth_slot = cur_depth; >> } else { >> - int *pointer = p->item->util; >> - if (cur_depth >= *pointer) >> + if (cur_depth >= **depth_slot) >> continue; >> - *pointer = cur_depth; >> + **depth_slot = cur_depth; >> } > > Here we malloc again if it's not set. But we do behave slightly > differently when we see NULL, in that we do not bother to even compare > against cur_depth. So if we were to directly store ints, we'd see "0" as > the sentinel depth here, which would not match our "cur_depth >= > depth_slot" check. > > So no, it wouldn't work to directly store depths with the code as > written. I'm not sure if the depth can ever be 0. If not, then it would > be a suitable sentinel as: > > int *slot = commit_depth_at(, p->item); > if (!*slot || cur_depth < *slot) > *slot = cur_depth; > > But somebody would have to dig into the possible values of cur_depth > there (which would make sense to do as a separate patch anyway, since > the point of this is to be a direct conversion). I actually tried that first, going with storing int directly in the slab instead of int*. And some shallow tests failed so I didn't bother (the goal was to get rid of 'util' pointer, not to optimize more) -- Duy
Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util
On Sat, May 12, 2018 at 05:07:48AM -0400, Jeff King wrote: > So no, it wouldn't work to directly store depths with the code as > written. I'm not sure if the depth can ever be 0. If not, then it would > be a suitable sentinel as: > > int *slot = commit_depth_at(, p->item); > if (!*slot || cur_depth < *slot) > *slot = cur_depth; > > But somebody would have to dig into the possible values of cur_depth > there (which would make sense to do as a separate patch anyway, since > the point of this is to be a direct conversion). By the way, one other approach if xcalloc() doesn't produce a good sentinel is to use a data type that does. ;) E.g., something like this should work: struct depth { unsigned valid:1; int value; }; define_commit_slab(commit_depth, struct depth); ... struct depth *slot = commit_depth_at(, p->item); if (!slot->valid || cur_depth < slot->value) { slot->value = cur_depth; slot->valid = 1; } That wastes an extra 4 bytes per slot over storing an int directly, but it's the same as storing an 8-byte pointer, plus you avoid the storage and runtime overhead of malloc. I actually wonder if we could wrap commit_slab with a variant that stores the sentinel data itself, to make this pattern easier. I.e., something like: #define define_commit_slab_sentinel(name, type) \ struct name##_value { \ unsigned valid:1; \ type value; \ }; \ define_commit_slab(name, struct name##_value) and some matching "peek" and "at" functions to manipulate value directly. I think the end result would be nicer, but it's turning into a little bit of a rabbit hole. So I don't mind going with your direct conversion here for now. -Peff
Re: [PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util
On Sat, May 12, 2018 at 10:00:19AM +0200, Nguyễn Thái Ngọc Duy wrote: > @@ -82,25 +84,29 @@ struct commit_list *get_shallow_commits(struct > object_array *heads, int depth, > struct object_array stack = OBJECT_ARRAY_INIT; > struct commit *commit = NULL; > struct commit_graft *graft; > + struct commit_depth depths; > > + init_commit_depth(); > while (commit || i < heads->nr || stack.nr) { > struct commit_list *p; > if (!commit) { > if (i < heads->nr) { > + int **depth_slot; > commit = (struct commit *) > deref_tag(heads->objects[i++].item, > NULL, 0); > if (!commit || commit->object.type != > OBJ_COMMIT) { > commit = NULL; > continue; > } > - if (!commit->util) > - commit->util = xmalloc(sizeof(int)); > - *(int *)commit->util = 0; > + depth_slot = commit_depth_at(, commit); > + if (!*depth_slot) > + *depth_slot = xmalloc(sizeof(int)); > + **depth_slot = 0; It looks like we could save ourselves an extra layer of indirection (and tiny heap allocations) by just storing an "int" directly in the slab. Do we ever use the NULL as a sentinel value? Here we just allocate it if not set. Let's see if we can find some others... > @@ -116,25 +122,32 @@ struct commit_list *get_shallow_commits(struct > object_array *heads, int depth, > } > commit->object.flags |= not_shallow_flag; > for (p = commit->parents, commit = NULL; p; p = p->next) { > - if (!p->item->util) { > - int *pointer = xmalloc(sizeof(int)); > - p->item->util = pointer; > - *pointer = cur_depth; > + int **depth_slot = commit_depth_at(, p->item); > + if (!*depth_slot) { > + *depth_slot = xmalloc(sizeof(int)); > + **depth_slot = cur_depth; > } else { > - int *pointer = p->item->util; > - if (cur_depth >= *pointer) > + if (cur_depth >= **depth_slot) > continue; > - *pointer = cur_depth; > + **depth_slot = cur_depth; > } Here we malloc again if it's not set. But we do behave slightly differently when we see NULL, in that we do not bother to even compare against cur_depth. So if we were to directly store ints, we'd see "0" as the sentinel depth here, which would not match our "cur_depth >= depth_slot" check. So no, it wouldn't work to directly store depths with the code as written. I'm not sure if the depth can ever be 0. If not, then it would be a suitable sentinel as: int *slot = commit_depth_at(, p->item); if (!*slot || cur_depth < *slot) *slot = cur_depth; But somebody would have to dig into the possible values of cur_depth there (which would make sense to do as a separate patch anyway, since the point of this is to be a direct conversion). -Peff
[PATCH 03/12] shallow.c: use commit-slab for commit depth instead of commit->util
It's done so that commit->util can be removed. See more explanation in the commit that removes commit->util. While at there, plug a leak for keeping track of depth in this code. Signed-off-by: Nguyễn Thái Ngọc Duy--- shallow.c | 37 + 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/shallow.c b/shallow.c index df4d44ea7a..6ea411b0d2 100644 --- a/shallow.c +++ b/shallow.c @@ -12,6 +12,7 @@ #include "commit-slab.h" #include "revision.h" #include "list-objects.h" +#include "commit-slab.h" static int is_shallow = -1; static struct stat_validity shallow_stat; @@ -74,6 +75,7 @@ int is_repository_shallow(void) return is_shallow; } +define_commit_slab(commit_depth, int *); struct commit_list *get_shallow_commits(struct object_array *heads, int depth, int shallow_flag, int not_shallow_flag) { @@ -82,25 +84,29 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth, struct object_array stack = OBJECT_ARRAY_INIT; struct commit *commit = NULL; struct commit_graft *graft; + struct commit_depth depths; + init_commit_depth(); while (commit || i < heads->nr || stack.nr) { struct commit_list *p; if (!commit) { if (i < heads->nr) { + int **depth_slot; commit = (struct commit *) deref_tag(heads->objects[i++].item, NULL, 0); if (!commit || commit->object.type != OBJ_COMMIT) { commit = NULL; continue; } - if (!commit->util) - commit->util = xmalloc(sizeof(int)); - *(int *)commit->util = 0; + depth_slot = commit_depth_at(, commit); + if (!*depth_slot) + *depth_slot = xmalloc(sizeof(int)); + **depth_slot = 0; cur_depth = 0; } else { commit = (struct commit *) object_array_pop(); - cur_depth = *(int *)commit->util; + cur_depth = **commit_depth_peek(, commit); } } parse_commit_or_die(commit); @@ -116,25 +122,32 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth, } commit->object.flags |= not_shallow_flag; for (p = commit->parents, commit = NULL; p; p = p->next) { - if (!p->item->util) { - int *pointer = xmalloc(sizeof(int)); - p->item->util = pointer; - *pointer = cur_depth; + int **depth_slot = commit_depth_at(, p->item); + if (!*depth_slot) { + *depth_slot = xmalloc(sizeof(int)); + **depth_slot = cur_depth; } else { - int *pointer = p->item->util; - if (cur_depth >= *pointer) + if (cur_depth >= **depth_slot) continue; - *pointer = cur_depth; + **depth_slot = cur_depth; } if (p->next) add_object_array(>item->object, NULL, ); else { commit = p->item; - cur_depth = *(int *)commit->util; + depth_slot = commit_depth_peek(, commit); + cur_depth = **depth_slot; } } } + for (i = 0; i < depths.slab_count; i++) { + int j; + + for (j = 0; j < depths.slab_size; j++) + free(depths.slab[i][j]); + } + clear_commit_depth(); return result; } -- 2.17.0.705.g3525833791