Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-17 Thread Jeff King
On Sun, Oct 14, 2018 at 04:29:06PM +0200, René Scharfe wrote:

> Anyway, drove the generative approach a bit further, and came up with
> the new DEFINE_SORT below.  I'm unsure about the name; perhaps it should
> be called DEFINE_SORT_BY_COMPARE_FUNCTION_BODY, but that's a bit long.
> It handles casts and const attributes behind the scenes and avoids
> repetition, but looks a bit weird, as it is placed where a function
> signature would go.
> 
> Apart from that the macro is simple and doesn't use any tricks or
> added checks.  It just sets up boilerplate functions to offer type-safe
> sorting.
> 
> diffcore-rename.c and refs/packed-backend.c receive special treatment in
> the patch because their compare functions are used outside of sorting as
> well.  I made them take typed pointers nevertheless and used them from
> DEFINE_SORT; the wrapper generated by that macro is supposed to be
> private.  Given that such reuse is rare and I think we don't need a way
> to make it public.
> 
> What do y'all think about this direction?

I think it's the best we're likely to do, and is an improvement on the
status quo.

The patch looks overall sane to me. I think DEFINE_SORT() is a fine
name.

I think given a macro parameter "foo" you could generate sort_by_foo()
and compare_foo(), which would eliminate the extra layer in those
two cases you mentioned. But I'm also fine with the approach you've
shown here.

-Peff


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-16 Thread Junio C Hamano
René Scharfe  writes:

> Apart from that the macro is simple and doesn't use any tricks or
> added checks.  It just sets up boilerplate functions to offer type-safe
> sorting.
> ...
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 5f2e90932f..491230fc57 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1066,6 +1066,21 @@ static inline void sane_qsort(void *base, size_t 
> nmemb, size_t size,
>   qsort(base, nmemb, size, compar);
>  }
>  
> +#define DECLARE_SORT(scope, name, elemtype) \
> +scope void name(elemtype, size_t)
> +
> +#define DEFINE_SORT(scope, name, elemtype, one, two) \
> +static int name##_compare(const elemtype, const elemtype);   \
> +static int name##_compare_void(const void *a, const void *b) \
> +{\
> + return name##_compare(a, b);\
> +}\
> +scope void name(elemtype base, size_t nmemb) \
> +{\
> + QSORT(base, nmemb, name##_compare_void);\
> +}\
> +static int name##_compare(const elemtype one, const elemtype two)

... and here comes the body of the comparison function that takes
two "things" we are about, i.e. elements of the array being sorted.

Quite cleanly done and the result looks pleasant, at least to me.


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-15 Thread René Scharfe
Am 15.10.2018 um 17:31 schrieb Derrick Stolee:
> On 10/14/2018 10:29 AM, René Scharfe wrote:
>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 5f2e90932f..491230fc57 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -1066,6 +1066,21 @@ static inline void sane_qsort(void *base, size_t 
>> nmemb, size_t size,
>>  qsort(base, nmemb, size, compar);
>>   }
>>   
>> +#define DECLARE_SORT(scope, name, elemtype) \
>> +scope void name(elemtype, size_t)
>> +
>> +#define DEFINE_SORT(scope, name, elemtype, one, two)
>> \
>> +static int name##_compare(const elemtype, const elemtype);  \
>> +static int name##_compare_void(const void *a, const void *b)
>> \
>> +{   \
>> +return name##_compare(a, b);\
>> +}   \
>> +scope void name(elemtype base, size_t nmemb)
>> \
>> +{   \
>> +QSORT(base, nmemb, name##_compare_void);\
>> +}   \
>> +static int name##_compare(const elemtype one, const elemtype two)
>> +
> 
> Since you were worried about the "private" name of the compare function, 
> maybe split this macro into two: DEFINE_COMPARE and DEFINE_SORT. Then, 
> if someone wants direct access to the compare function, they could use 
> the DEFINE_COMPARE to ensure the typing is correct, and use QSORT as 
> normal with name##_compare_void.

The pointers are converted to const void * somewhere along the way from
qsort() to compare function.  Splitting the macro would require type
check tricks to make sure the types of the compare function matches the
array to be sorted.  Letting a single macro bake it all into a layer
cake of generated functions is a lot simpler.

> As I think about this, I think this is less of a problem than is worth 
> this split. The commit-slab definitions generate a lot of methods using 
> the "name##" convention, so perhaps we should just trust developers 
> using the macros to look up the macro definition or similar examples. In 
> that sense, including a conversion that consumes the compare function 
> directly can be a signpost for future callers.

Using the generated compare function name directly is a bit awkward; e.g.
in the two example cases it would be sort_by_score_compare() and
sort_packed_ref_records_compare().  Defining the real compare function
the usual way (with a proper name) and having the DEFINE_SORT block call
it is a bit more repetitive, but clean and understandable IMHO.

We also could just leave complicated cases alone..

> I would say that maybe the times where you need to do something special 
> should be pulled out into their own patches, so we can call attention to 
> them directly.

Right; this patch was just a sketch.

> I like this "sort_by_" convention..
> 
>>   
>>  for (i = 0; i < nr_packs; i++) {
>>  pack_names[i] = pairs[i].pack_name;
>> @@ -455,10 +453,8 @@ struct pack_midx_entry {
>>  uint64_t offset;
>>   };
>>   
>> -static int midx_oid_compare(const void *_a, const void *_b)
>> +DEFINE_SORT(static, sort_midx, struct pack_midx_entry *, a, b)
>>   {
>> -const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
>> -const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
>>  int cmp = oidcmp(>oid, >oid);
>>   
>>  if (cmp)
>> @@ -573,7 +569,7 @@ static struct pack_midx_entry *get_sorted_entries(struct 
>> multi_pack_index *m,
>>  }
>>  }
>>   
>> -QSORT(entries_by_fanout, nr_fanout, midx_oid_compare);
>> +sort_midx(entries_by_fanout, nr_fanout);
> 
> ...but it isn't followed here. Perhaps "sort_by_oid"?

That function sorts by oid, pack_mtime, and pack_int_id, but including
all these fields in the name is a bit unwieldy.  Being unspecific by
calling it sort_midx() was the lazy way out.  Mentioning only oid is a
bit misleading.  Perhaps sort_by_oid_etc()?

René



Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-15 Thread Derrick Stolee

On 10/14/2018 10:29 AM, René Scharfe wrote:

It still has some repetition, converted code is a bit longer than the
current one, and I don't know how to build a Coccinelle rule that would
do that conversion.

Looked for a possibility to at least leave QSORT call-sites alone by
enhancing that macro, but didn't find any.  Found a few websites
showing off mindblowing macros, thouhg, this one in particular:

https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms

Anyway, drove the generative approach a bit further, and came up with
the new DEFINE_SORT below.  I'm unsure about the name; perhaps it should
be called DEFINE_SORT_BY_COMPARE_FUNCTION_BODY, but that's a bit long.
It handles casts and const attributes behind the scenes and avoids
repetition, but looks a bit weird, as it is placed where a function
signature would go.

Apart from that the macro is simple and doesn't use any tricks or
added checks.  It just sets up boilerplate functions to offer type-safe
sorting.

diffcore-rename.c and refs/packed-backend.c receive special treatment in
the patch because their compare functions are used outside of sorting as
well.  I made them take typed pointers nevertheless and used them from
DEFINE_SORT; the wrapper generated by that macro is supposed to be
private.  Given that such reuse is rare and I think we don't need a way
to make it public.

What do y'all think about this direction?

---

[snip]

diff --git a/git-compat-util.h b/git-compat-util.h
index 5f2e90932f..491230fc57 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1066,6 +1066,21 @@ static inline void sane_qsort(void *base, size_t nmemb, 
size_t size,
qsort(base, nmemb, size, compar);
  }
  
+#define DECLARE_SORT(scope, name, elemtype) \

+scope void name(elemtype, size_t)
+
+#define DEFINE_SORT(scope, name, elemtype, one, two)   \
+static int name##_compare(const elemtype, const elemtype); \
+static int name##_compare_void(const void *a, const void *b)   \
+{  \
+   return name##_compare(a, b);\
+}  \
+scope void name(elemtype base, size_t nmemb)   \
+{  \
+   QSORT(base, nmemb, name##_compare_void);\
+}  \
+static int name##_compare(const elemtype one, const elemtype two)
+


Since you were worried about the "private" name of the compare function, 
maybe split this macro into two: DEFINE_COMPARE and DEFINE_SORT. Then, 
if someone wants direct access to the compare function, they could use 
the DEFINE_COMPARE to ensure the typing is correct, and use QSORT as 
normal with name##_compare_void.


As I think about this, I think this is less of a problem than is worth 
this split. The commit-slab definitions generate a lot of methods using 
the "name##" convention, so perhaps we should just trust developers 
using the macros to look up the macro definition or similar examples. In 
that sense, including a conversion that consumes the compare function 
directly can be a signpost for future callers.


I would say that maybe the times where you need to do something special 
should be pulled out into their own patches, so we can call attention to 
them directly.


[snip]

diff --git a/midx.c b/midx.c
index 713d6f9dde..4407db7949 100644
--- a/midx.c
+++ b/midx.c
@@ -419,10 +419,8 @@ struct pack_pair {
char *pack_name;
  };
  
-static int pack_pair_compare(const void *_a, const void *_b)

+DEFINE_SORT(static, sort_by_pack_name, struct pack_pair *, a, b)
  {
-   struct pack_pair *a = (struct pack_pair *)_a;
-   struct pack_pair *b = (struct pack_pair *)_b;
return strcmp(a->pack_name, b->pack_name);
  }
  
@@ -438,7 +436,7 @@ static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t *p

pairs[i].pack_name = pack_names[i];
}
  
-	QSORT(pairs, nr_packs, pack_pair_compare);

+   sort_by_pack_name(pairs, nr_packs);


I like this "sort_by_" convention..

  
  	for (i = 0; i < nr_packs; i++) {

pack_names[i] = pairs[i].pack_name;
@@ -455,10 +453,8 @@ struct pack_midx_entry {
uint64_t offset;
  };
  
-static int midx_oid_compare(const void *_a, const void *_b)

+DEFINE_SORT(static, sort_midx, struct pack_midx_entry *, a, b)
  {
-   const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
-   const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
int cmp = oidcmp(>oid, >oid);
  
  	if (cmp)

@@ -573,7 +569,7 @@ static struct pack_midx_entry *get_sorted_entries(struct 
multi_pack_index *m,
}
}
  
-		QSORT(entries_by_fanout, nr_fanout, midx_oid_compare);

+  

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-14 Thread René Scharfe
Am 05.10.2018 um 21:42 schrieb Jeff King:
> On Fri, Oct 05, 2018 at 09:36:28PM +0200, René Scharfe wrote:
> 
>> Am 05.10.2018 um 21:08 schrieb Jeff King:
>>> On Fri, Oct 05, 2018 at 08:48:27PM +0200, René Scharfe wrote:
 +#define DEFINE_SORT(name, type, compare)  \
 +static int compare##_void(const void *one, const void *two)   
 \
 +{ \
 +  return compare(one, two);   \
 +} \
 +static void name(type base, size_t nmemb) \
 +{ \
 +  const type dummy = NULL;\
 +  if (nmemb > 1)  \
 +  qsort(base, nmemb, sizeof(base[0]), compare##_void);\
 +  else if (0) \
 +  compare(dummy, dummy);  \
 +}
>>>
>>> I do like that this removes the need to have the code block aspart of
>>> the macro.
>>>
>>> Did you measure to see if there is any runtime impact?
>>
>> No, but I wouldn't expect any -- the generated code should be the same
>> in most cases.
>>
>> Here's an example: https://godbolt.org/z/gwXENy.
> 
> OK, that's good enough for me.
> 
>> The typed comparison function can be inlined into the one with the void
>> pointers, though.
> 
> Right, that makes sense. I suspect it depends on the comparison function
> being static, but in a DEFINE_SORT() world, they generally could be.
> 
> So I like this approach.

It still has some repetition, converted code is a bit longer than the
current one, and I don't know how to build a Coccinelle rule that would
do that conversion.

Looked for a possibility to at least leave QSORT call-sites alone by
enhancing that macro, but didn't find any.  Found a few websites
showing off mindblowing macros, thouhg, this one in particular:

https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms

Anyway, drove the generative approach a bit further, and came up with
the new DEFINE_SORT below.  I'm unsure about the name; perhaps it should
be called DEFINE_SORT_BY_COMPARE_FUNCTION_BODY, but that's a bit long.
It handles casts and const attributes behind the scenes and avoids
repetition, but looks a bit weird, as it is placed where a function
signature would go.

Apart from that the macro is simple and doesn't use any tricks or
added checks.  It just sets up boilerplate functions to offer type-safe
sorting.

diffcore-rename.c and refs/packed-backend.c receive special treatment in
the patch because their compare functions are used outside of sorting as
well.  I made them take typed pointers nevertheless and used them from
DEFINE_SORT; the wrapper generated by that macro is supposed to be
private.  Given that such reuse is rare and I think we don't need a way
to make it public.

What do y'all think about this direction?

---
 bisect.c|  8 ++--
 builtin/describe.c  |  6 ++
 builtin/fmt-merge-msg.c | 10 --
 builtin/index-pack.c| 14 --
 builtin/name-rev.c  |  5 ++---
 builtin/pack-objects.c  |  7 ++-
 builtin/remote.c|  6 ++
 builtin/shortlog.c  | 15 ---
 commit-graph.c  |  6 ++
 delta-islands.c |  8 +++-
 diff.c  |  8 +++-
 diffcore-delta.c|  7 ++-
 diffcore-order.c|  7 ++-
 diffcore-rename.c   | 11 +++
 git-compat-util.h   | 15 +++
 help.c  |  7 ++-
 line-log.c  |  7 ++-
 midx.c  | 12 
 pack-check.c|  6 ++
 pathspec.c  |  8 ++--
 refs/packed-backend.c   | 11 ---
 sha1-array.c|  4 ++--
 sha1-name.c |  4 ++--
 23 files changed, 84 insertions(+), 108 deletions(-)

diff --git a/bisect.c b/bisect.c
index e8b17cf7e1..25257c2e69 100644
--- a/bisect.c
+++ b/bisect.c
@@ -192,12 +192,8 @@ struct commit_dist {
int distance;
 };
 
-static int compare_commit_dist(const void *a_, const void *b_)
+DEFINE_SORT(static, sort_by_commit_dist, struct commit_dist *, a, b)
 {
-   struct commit_dist *a, *b;
-
-   a = (struct commit_dist *)a_;
-   b = (struct commit_dist *)b_;
if (a->distance != b->distance)
return b->distance - a->distance; /* desc sort */
return oidcmp(>commit->object.oid, >commit->object.oid);
@@ -223,7 +219,7 @@ static struct commit_list *best_bisection_sorted(struct 
commit_list *list, int n
array[cnt].distance = distance;
cnt++;
}
-   QSORT(array, cnt, compare_commit_dist);
+   sort_by_commit_dist(array, cnt);
for (p = list, i = 0; i < cnt; i++) 

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 09:42:49PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >  that I imagine will create a lot of new problems. (We're just now
> > allowing C99 -- I don't even want to think about what kind of compiler
> > issues we'll run into on antique systems trying to use C++).
> 
> ...But just on this point: I was under the impression that this problem
> was way easier with C++. I.e. reason we're just now using C99 for
> portable C projects is because Microsoft for years refused to put any
> effort into updating their compiler to support newer C versions, while
> keeping up-to-date with C++, and that this has only recently started
> changing: https://en.wikipedia.org/wiki/C99#Implementations
> 
> Maybe there was some other popular vendor of C/C++ compilers that had
> the inverse of that story, but I'm not aware of any.

I'd worry about what the C++ story is on AIX, etc.

-Peff


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Ævar Arnfjörð Bjarmason


On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 09:12:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I'm not wild about declaring functions inside macros, just because it
>> > makes tools like ctags like useful (but I have certainly been guilty of
>> > it myself). I'd also worry that taking "code" as a macro parameter might
>> > not scale (what happens if the code has a comma in it?)
>>
>> There's always the option of generating the C code as part of some build
>> step and carrying around a big C file with various type-safe functions
>> that only differ in the types they operate on. It can even be committed
>> to source control.
>>
>> That sucks in some ways for sure, but is a lot friendlier for grepping,
>> ctags etc.
>
> Yeah, in a lot of ways the C preprocessor is not great for larger-scale
> code generation. I was hoping we could get away without having the
> bodies of these functions as part of the generated bit, though.
>
> I think what René showed later in the thread is not too bad in that
> respect.
>
>> I've just barely resisted the urge to include that thread where we were
>> discussing making the code C++-compiler compatible in the References
>> header :)
>
> Yes. The main thing I would want out of using C++ is type-safe,
> efficient data structures. IIRC, early versions of C++ were implemented
> via code generation, and we're basically walking down that same road.
>
> I'm not sure where the right cutoff is, though. It's nice to pick up
> the solution somebody else produced, but requiring a C++ compiler to
> build Git is a pretty big step[...]

No comment on whether git should use C++...

>  that I imagine will create a lot of new problems. (We're just now
> allowing C99 -- I don't even want to think about what kind of compiler
> issues we'll run into on antique systems trying to use C++).

...But just on this point: I was under the impression that this problem
was way easier with C++. I.e. reason we're just now using C99 for
portable C projects is because Microsoft for years refused to put any
effort into updating their compiler to support newer C versions, while
keeping up-to-date with C++, and that this has only recently started
changing: https://en.wikipedia.org/wiki/C99#Implementations

Maybe there was some other popular vendor of C/C++ compilers that had
the inverse of that story, but I'm not aware of any.


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 09:36:28PM +0200, René Scharfe wrote:

> Am 05.10.2018 um 21:08 schrieb Jeff King:
> > On Fri, Oct 05, 2018 at 08:48:27PM +0200, René Scharfe wrote:
> >> +#define DEFINE_SORT(name, type, compare)  \
> >> +static int compare##_void(const void *one, const void *two)   
> >> \
> >> +{ \
> >> +  return compare(one, two);   \
> >> +} \
> >> +static void name(type base, size_t nmemb) \
> >> +{ \
> >> +  const type dummy = NULL;\
> >> +  if (nmemb > 1)  \
> >> +  qsort(base, nmemb, sizeof(base[0]), compare##_void);\
> >> +  else if (0) \
> >> +  compare(dummy, dummy);  \
> >> +}
> > 
> > I do like that this removes the need to have the code block aspart of
> > the macro.
> > 
> > Did you measure to see if there is any runtime impact?
> 
> No, but I wouldn't expect any -- the generated code should be the same
> in most cases.
> 
> Here's an example: https://godbolt.org/z/gwXENy.

OK, that's good enough for me.

> The typed comparison function can be inlined into the one with the void
> pointers, though.

Right, that makes sense. I suspect it depends on the comparison function
being static, but in a DEFINE_SORT() world, they generally could be.

So I like this approach.

-Peff


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread René Scharfe
Am 05.10.2018 um 21:08 schrieb Jeff King:
> On Fri, Oct 05, 2018 at 08:48:27PM +0200, René Scharfe wrote:
>> +#define DEFINE_SORT(name, type, compare)\
>> +static int compare##_void(const void *one, const void *two) \
>> +{   \
>> +return compare(one, two);   \
>> +}   \
>> +static void name(type base, size_t nmemb)   \
>> +{   \
>> +const type dummy = NULL;\
>> +if (nmemb > 1)  \
>> +qsort(base, nmemb, sizeof(base[0]), compare##_void);\
>> +else if (0) \
>> +compare(dummy, dummy);  \
>> +}
> 
> I do like that this removes the need to have the code block aspart of
> the macro.
> 
> Did you measure to see if there is any runtime impact?

No, but I wouldn't expect any -- the generated code should be the same
in most cases.

Here's an example: https://godbolt.org/z/gwXENy.

> As an aside, we may need to take a "scope" argument in case somebody
> wants to do this in a non-static way.

Sure.  (They could easily wrap the static function, but a macro
parameter is simpler still.)

> It would be nice if we could make
> this "static inline", but I don't think even a clever compiler would be
> able to omit the wrapper call.

It could, if it was to inline qsort(3).  Current compilers don't do
that AFAIK, but I wouldn't be too surprised if they started to.

The typed comparison function can be inlined into the one with the void
pointers, though.

René


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 09:12:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I'm not wild about declaring functions inside macros, just because it
> > makes tools like ctags like useful (but I have certainly been guilty of
> > it myself). I'd also worry that taking "code" as a macro parameter might
> > not scale (what happens if the code has a comma in it?)
> 
> There's always the option of generating the C code as part of some build
> step and carrying around a big C file with various type-safe functions
> that only differ in the types they operate on. It can even be committed
> to source control.
> 
> That sucks in some ways for sure, but is a lot friendlier for grepping,
> ctags etc.

Yeah, in a lot of ways the C preprocessor is not great for larger-scale
code generation. I was hoping we could get away without having the
bodies of these functions as part of the generated bit, though.

I think what René showed later in the thread is not too bad in that
respect.

> I've just barely resisted the urge to include that thread where we were
> discussing making the code C++-compiler compatible in the References
> header :)

Yes. The main thing I would want out of using C++ is type-safe,
efficient data structures. IIRC, early versions of C++ were implemented
via code generation, and we're basically walking down that same road.

I'm not sure where the right cutoff is, though. It's nice to pick up
the solution somebody else produced, but requiring a C++ compiler to
build Git is a pretty big step that I imagine will create a lot of new
problems. (We're just now allowing C99 -- I don't even want to think
about what kind of compiler issues we'll run into on antique systems
trying to use C++).

-Peff


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Ævar Arnfjörð Bjarmason


On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 12:59:02AM +0200, René Scharfe wrote:
>
>> We could also do something like this to reduce the amount of manual
>> casting, but do we want to?  (Macro at the bottom, three semi-random
>> examples at the top.)
>> [...]
>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 5f2e90932f..f9e78d69a2 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -1066,6 +1066,18 @@ static inline void sane_qsort(void *base, size_t 
>> nmemb, size_t size,
>>  qsort(base, nmemb, size, compar);
>>  }
>>
>> +#define DEFINE_SORT(name, elemtype, one, two, code) \
>> +static int name##_compare(const void *one##_v_, const void *two##_v_)   
>> \
>> +{   \
>> +elemtype const *one = one##_v_; \
>> +elemtype const *two = two##_v_; \
>> +code;   \
>> +}   \
>> +static void name(elemtype *array, size_t n) \
>> +{   \
>> +QSORT(array, n, name##_compare);\
>> +}
>
> Interesting. When I saw the callers of this macro, I first thought you
> were just removing the casts from the comparison function, but the real
> value here is the matching QSORT() wrapper which provides the type
> safety.
>
> I'm not wild about declaring functions inside macros, just because it
> makes tools like ctags like useful (but I have certainly been guilty of
> it myself). I'd also worry that taking "code" as a macro parameter might
> not scale (what happens if the code has a comma in it?)

There's always the option of generating the C code as part of some build
step and carrying around a big C file with various type-safe functions
that only differ in the types they operate on. It can even be committed
to source control.

That sucks in some ways for sure, but is a lot friendlier for grepping,
ctags etc.

I've just barely resisted the urge to include that thread where we were
discussing making the code C++-compiler compatible in the References
header :)


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 08:48:27PM +0200, René Scharfe wrote:

> If the comparison function has proper types then we need to declare a
> version with void pointer parameters as well to give to qsort(3).  I
> think using cast function pointers is undefined.  Perhaps like this?

I think it's undefined, too, though we have many instances already.

> +#define DEFINE_SORT(name, type, compare) \
> +static int compare##_void(const void *one, const void *two)  \
> +{\
> + return compare(one, two);   \
> +}\
> +static void name(type base, size_t nmemb)\
> +{\
> + const type dummy = NULL;\
> + if (nmemb > 1)  \
> + qsort(base, nmemb, sizeof(base[0]), compare##_void);\
> + else if (0) \
> + compare(dummy, dummy);  \
> +}

I do like that this removes the need to have the code block aspart of
the macro.

Did you measure to see if there is any runtime impact?

As an aside, we may need to take a "scope" argument in case somebody
wants to do this in a non-static way. It would be nice if we could make
this "static inline", but I don't think even a clever compiler would be
able to omit the wrapper call.

-Peff


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread René Scharfe
Am 05.10.2018 um 18:51 schrieb Jeff King:
> On Fri, Oct 05, 2018 at 12:59:02AM +0200, René Scharfe wrote:
> 
>> We could also do something like this to reduce the amount of manual
>> casting, but do we want to?  (Macro at the bottom, three semi-random
>> examples at the top.)
>> [...]
>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 5f2e90932f..f9e78d69a2 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -1066,6 +1066,18 @@ static inline void sane_qsort(void *base, size_t 
>> nmemb, size_t size,
>>  qsort(base, nmemb, size, compar);
>>  }
>>  
>> +#define DEFINE_SORT(name, elemtype, one, two, code) \
>> +static int name##_compare(const void *one##_v_, const void *two##_v_)   
>> \
>> +{   \
>> +elemtype const *one = one##_v_; \
>> +elemtype const *two = two##_v_; \
>> +code;   \
>> +}   \
>> +static void name(elemtype *array, size_t n) \
>> +{   \
>> +QSORT(array, n, name##_compare);\
>> +}
> 
> Interesting. When I saw the callers of this macro, I first thought you
> were just removing the casts from the comparison function, but the real
> value here is the matching QSORT() wrapper which provides the type
> safety.

Indeed.

> I'm not wild about declaring functions inside macros, just because it
> makes tools like ctags like useful (but I have certainly been guilty of
> it myself). I'd also worry that taking "code" as a macro parameter might
> not scale (what happens if the code has a comma in it?)

It works fine, as long as the comma is surrounded by parentheses, so
function calls with more than one parameter are fine without any change.

> I think we can address that last part by switching the definition order.
> Like:
> 
>   #define DEFINE_SORT(name, elemtype, one, two) \
>   static int name##_compare(const void *, const void *);\
>   static void name(elemtype *array, size_t n)   \
>   { \
>   QSORT(array, n, name##_compare);\
>   } \
>   static int name##_compare(const void *one##_v_, const void *two##_v_) \
>   { \
>   elemtype const *one = one##_v_; \
>   elemtype const *two = two##_v_; \
> 
> And then expecting the caller to do:
> 
>   DEFINE_SORT(foo, struct foo, a, b)
>  /* code goes here */
>   }
> 
> The unbalanced braces are nasty, though (and likely to screw up editor
> formatting, highlighting, etc).

Adding an extra pair of parentheses if needed is also not ideal, but has
less downsides, I think.

> I wonder if it would be possible to just declare the comparison function
> with its real types, and then teach QSORT() to do a type check. That
> would require typeof() at least, but it would be OK for the type-check
> to be available only to gcc/clang users, I think.
> 
> I'm not quite sure what that type-check would look like, but I was
> thinking something along the lines of (inside the QSORT macro):
> 
>   do {
> /* this will yield a type mismatch if fed the wrong function */
> int (*check)(const typeof(array), const typeof(array)) = compar;
> sane_qsort(array, n, sizeof(*array), n);
>   } while (0)
> 
> I have no idea if that even comes close to compiling, though.

If the comparison function has proper types then we need to declare a
version with void pointer parameters as well to give to qsort(3).  I
think using cast function pointers is undefined.  Perhaps like this?

---
 bisect.c  | 11 +--
 commit-graph.c|  8 
 commit-reach.c| 12 +++-
 git-compat-util.h | 14 ++
 4 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/bisect.c b/bisect.c
index e8b17cf7e1..1fc6278c6b 100644
--- a/bisect.c
+++ b/bisect.c
@@ -192,17 +192,16 @@ struct commit_dist {
int distance;
 };
 
-static int compare_commit_dist(const void *a_, const void *b_)
+static int compare_commit_dist(const struct commit_dist *a,
+  const struct commit_dist *b)
 {
-   struct commit_dist *a, *b;
-
-   a = (struct commit_dist *)a_;
-   b = (struct commit_dist *)b_;
if (a->distance != b->distance)
return b->distance - a->distance; /* desc sort */
return oidcmp(>commit->object.oid, >commit->object.oid);
 }
 
+DEFINE_SORT(sort_by_commit_dist, struct commit_dist *, compare_commit_dist)
+
 static 

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King
On Fri, Oct 05, 2018 at 12:59:02AM +0200, René Scharfe wrote:

> We could also do something like this to reduce the amount of manual
> casting, but do we want to?  (Macro at the bottom, three semi-random
> examples at the top.)
> [...]
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 5f2e90932f..f9e78d69a2 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1066,6 +1066,18 @@ static inline void sane_qsort(void *base, size_t 
> nmemb, size_t size,
>   qsort(base, nmemb, size, compar);
>  }
>  
> +#define DEFINE_SORT(name, elemtype, one, two, code)  \
> +static int name##_compare(const void *one##_v_, const void *two##_v_)
> \
> +{\
> + elemtype const *one = one##_v_; \
> + elemtype const *two = two##_v_; \
> + code;   \
> +}\
> +static void name(elemtype *array, size_t n)  \
> +{\
> + QSORT(array, n, name##_compare);\
> +}

Interesting. When I saw the callers of this macro, I first thought you
were just removing the casts from the comparison function, but the real
value here is the matching QSORT() wrapper which provides the type
safety.

I'm not wild about declaring functions inside macros, just because it
makes tools like ctags like useful (but I have certainly been guilty of
it myself). I'd also worry that taking "code" as a macro parameter might
not scale (what happens if the code has a comma in it?)

I think we can address that last part by switching the definition order.
Like:

  #define DEFINE_SORT(name, elemtype, one, two) \
  static int name##_compare(const void *, const void *);\
  static void name(elemtype *array, size_t n)   \
  { \
QSORT(array, n, name##_compare);\
  } \
  static int name##_compare(const void *one##_v_, const void *two##_v_) \
  { \
elemtype const *one = one##_v_; \
elemtype const *two = two##_v_; \

And then expecting the caller to do:

  DEFINE_SORT(foo, struct foo, a, b)
 /* code goes here */
  }

The unbalanced braces are nasty, though (and likely to screw up editor
formatting, highlighting, etc).

I wonder if it would be possible to just declare the comparison function
with its real types, and then teach QSORT() to do a type check. That
would require typeof() at least, but it would be OK for the type-check
to be available only to gcc/clang users, I think.

I'm not quite sure what that type-check would look like, but I was
thinking something along the lines of (inside the QSORT macro):

  do {
/* this will yield a type mismatch if fed the wrong function */
int (*check)(const typeof(array), const typeof(array)) = compar;
sane_qsort(array, n, sizeof(*array), n);
  } while (0)

I have no idea if that even comes close to compiling, though.

-Peff


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Derrick Stolee

On 10/4/2018 6:59 PM, René Scharfe wrote:

Am 01.10.2018 um 22:37 schrieb René Scharfe:

Am 01.10.2018 um 21:26 schrieb Derrick Stolee:

Good catch! I'm disappointed that we couldn't use type-checking here, as
it is quite difficult to discover that the types are wrong here.

Generics in C are hard, and type checking traditionally falls by the
wayside.  You could use macros for that, like klib [*] does, but that
has its own downsides (more object text, debugging the sort macros
themselves is harder).

[*] https://github.com/attractivechaos/klib/blob/master/ksort.h

We could also do something like this to reduce the amount of manual
casting, but do we want to?  (Macro at the bottom, three semi-random
examples at the top.)


I like the idea! It certainly can assist in some of the repeat work when 
preparing to QSORT, and make it less error-prone.



---
  bisect.c  | 11 +++
  commit-graph.c|  9 ++---
  commit-reach.c| 12 +---
  git-compat-util.h | 12 
  4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/bisect.c b/bisect.c
index e8b17cf7e1..06be3a3c15 100644
--- a/bisect.c
+++ b/bisect.c
@@ -192,16 +192,11 @@ struct commit_dist {
int distance;
  };
  
-static int compare_commit_dist(const void *a_, const void *b_)

-{
-   struct commit_dist *a, *b;
-
-   a = (struct commit_dist *)a_;
-   b = (struct commit_dist *)b_;
+DEFINE_SORT(sort_by_commit_dist, struct commit_dist, a, b, {
if (a->distance != b->distance)
return b->distance - a->distance; /* desc sort */
return oidcmp(>commit->object.oid, >commit->object.oid);
-}
+})
  
  static struct commit_list *best_bisection_sorted(struct commit_list *list, int nr)

  {
@@ -223,7 +218,7 @@ static struct commit_list *best_bisection_sorted(struct 
commit_list *list, int n
array[cnt].distance = distance;
cnt++;
}
-   QSORT(array, cnt, compare_commit_dist);
+   sort_by_commit_dist(array, cnt);
for (p = list, i = 0; i < cnt; i++) {
struct object *obj = &(array[i].commit->object);
  
diff --git a/commit-graph.c b/commit-graph.c

index 7f4519ec3b..a2202414e0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -550,12 +550,7 @@ static void write_graph_chunk_large_edges(struct hashfile 
*f,
}
  }
  
-static int commit_compare(const void *_a, const void *_b)

-{
-   const struct object_id *a = (const struct object_id *)_a;
-   const struct object_id *b = (const struct object_id *)_b;
-   return oidcmp(a, b);
-}
+DEFINE_SORT(sort_oids, struct object_id, a, b, return oidcmp(a, b))
  
  struct packed_commit_list {

struct commit **list;
@@ -780,7 +775,7 @@ void write_commit_graph(const char *obj_dir,
  
  	close_reachable();
  
-	QSORT(oids.list, oids.nr, commit_compare);

+   sort_oids(oids.list, oids.nr);
  
  	count_distinct = 1;

for (i = 1; i < oids.nr; i++) {
diff --git a/commit-reach.c b/commit-reach.c
index 2f5e592d16..3aef47c3dd 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -527,17 +527,15 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
return is_descendant_of(commit, list);
  }
  
-static int compare_commits_by_gen(const void *_a, const void *_b)

-{
-   const struct commit *a = *(const struct commit * const *)_a;
-   const struct commit *b = *(const struct commit * const *)_b;
-
+DEFINE_SORT(sort_commits_by_gen, struct commit *, ap, bp, {
+   const struct commit *a = *ap;
+   const struct commit *b = *bp;
if (a->generation < b->generation)
return -1;
if (a->generation > b->generation)
return 1;
return 0;
-}
+})


Here, to make the macro version compile you need to cast ap and bp, 
which gives us a level of type-safety that wasn't there before. That can 
help us find errors at compile-time!


  
  int can_all_from_reach_with_flag(struct object_array *from,

 unsigned int with_flag,
@@ -580,7 +578,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
nr_commits++;
}
  
-	QSORT(list, nr_commits, compare_commits_by_gen);

+   sort_commits_by_gen(list, nr_commits);
  
  	for (i = 0; i < nr_commits; i++) {

/* DFS from list[i] */
diff --git a/git-compat-util.h b/git-compat-util.h
index 5f2e90932f..f9e78d69a2 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1066,6 +1066,18 @@ static inline void sane_qsort(void *base, size_t nmemb, 
size_t size,
qsort(base, nmemb, size, compar);
  }
  
+#define DEFINE_SORT(name, elemtype, one, two, code)			\

+static int name##_compare(const void *one##_v_, const void *two##_v_)  \
+{  \
+   elemtype const *one = one##_v_; \
+   elemtype const *two = two##_v_; \

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-04 Thread René Scharfe
Am 01.10.2018 um 22:37 schrieb René Scharfe:
> Am 01.10.2018 um 21:26 schrieb Derrick Stolee:
>> On 10/1/2018 3:16 PM, René Scharfe wrote:
>>> Am 28.06.2018 um 14:31 schrieb Derrick Stolee via GitGitGadget:
 diff --git a/commit-reach.c b/commit-reach.c
 index c58e50fbb..ac132c8e4 100644
 --- a/commit-reach.c
 +++ b/commit-reach.c
 @@ -513,65 +513,88 @@ int commit_contains(struct ref_filter *filter, 
 struct commit *commit,
return is_descendant_of(commit, list);
   }
   
 -int reachable(struct commit *from, int with_flag, int assign_flag,
 -time_t min_commit_date)
 +static int compare_commits_by_gen(const void *_a, const void *_b)
   {
 -  struct prio_queue work = { compare_commits_by_commit_date };
 +  const struct commit *a = (const struct commit *)_a;
 +  const struct commit *b = (const struct commit *)_b;
>>> This cast is bogus.  QSORT gets handed a struct commit **, i.e. an array
>>> of pointers, and qsort(1) passes references to those pointers to the
>>> compare function, and not the pointer values.
>>
>> Good catch! I'm disappointed that we couldn't use type-checking here, as 
>> it is quite difficult to discover that the types are wrong here.
> 
> Generics in C are hard, and type checking traditionally falls by the
> wayside.  You could use macros for that, like klib [*] does, but that
> has its own downsides (more object text, debugging the sort macros
> themselves is harder).
> 
> [*] https://github.com/attractivechaos/klib/blob/master/ksort.h

We could also do something like this to reduce the amount of manual
casting, but do we want to?  (Macro at the bottom, three semi-random
examples at the top.)
---
 bisect.c  | 11 +++
 commit-graph.c|  9 ++---
 commit-reach.c| 12 +---
 git-compat-util.h | 12 
 4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/bisect.c b/bisect.c
index e8b17cf7e1..06be3a3c15 100644
--- a/bisect.c
+++ b/bisect.c
@@ -192,16 +192,11 @@ struct commit_dist {
int distance;
 };
 
-static int compare_commit_dist(const void *a_, const void *b_)
-{
-   struct commit_dist *a, *b;
-
-   a = (struct commit_dist *)a_;
-   b = (struct commit_dist *)b_;
+DEFINE_SORT(sort_by_commit_dist, struct commit_dist, a, b, {
if (a->distance != b->distance)
return b->distance - a->distance; /* desc sort */
return oidcmp(>commit->object.oid, >commit->object.oid);
-}
+})
 
 static struct commit_list *best_bisection_sorted(struct commit_list *list, int 
nr)
 {
@@ -223,7 +218,7 @@ static struct commit_list *best_bisection_sorted(struct 
commit_list *list, int n
array[cnt].distance = distance;
cnt++;
}
-   QSORT(array, cnt, compare_commit_dist);
+   sort_by_commit_dist(array, cnt);
for (p = list, i = 0; i < cnt; i++) {
struct object *obj = &(array[i].commit->object);
 
diff --git a/commit-graph.c b/commit-graph.c
index 7f4519ec3b..a2202414e0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -550,12 +550,7 @@ static void write_graph_chunk_large_edges(struct hashfile 
*f,
}
 }
 
-static int commit_compare(const void *_a, const void *_b)
-{
-   const struct object_id *a = (const struct object_id *)_a;
-   const struct object_id *b = (const struct object_id *)_b;
-   return oidcmp(a, b);
-}
+DEFINE_SORT(sort_oids, struct object_id, a, b, return oidcmp(a, b))
 
 struct packed_commit_list {
struct commit **list;
@@ -780,7 +775,7 @@ void write_commit_graph(const char *obj_dir,
 
close_reachable();
 
-   QSORT(oids.list, oids.nr, commit_compare);
+   sort_oids(oids.list, oids.nr);
 
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
diff --git a/commit-reach.c b/commit-reach.c
index 2f5e592d16..3aef47c3dd 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -527,17 +527,15 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
return is_descendant_of(commit, list);
 }
 
-static int compare_commits_by_gen(const void *_a, const void *_b)
-{
-   const struct commit *a = *(const struct commit * const *)_a;
-   const struct commit *b = *(const struct commit * const *)_b;
-
+DEFINE_SORT(sort_commits_by_gen, struct commit *, ap, bp, {
+   const struct commit *a = *ap;
+   const struct commit *b = *bp;
if (a->generation < b->generation)
return -1;
if (a->generation > b->generation)
return 1;
return 0;
-}
+})
 
 int can_all_from_reach_with_flag(struct object_array *from,
 unsigned int with_flag,
@@ -580,7 +578,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
nr_commits++;
}
 
-   QSORT(list, nr_commits, compare_commits_by_gen);
+   sort_commits_by_gen(list, nr_commits);
 
for (i = 0; i < nr_commits; i++) {

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-01 Thread René Scharfe
Am 01.10.2018 um 21:26 schrieb Derrick Stolee:
> On 10/1/2018 3:16 PM, René Scharfe wrote:
>> Am 28.06.2018 um 14:31 schrieb Derrick Stolee via GitGitGadget:
>>> diff --git a/commit-reach.c b/commit-reach.c
>>> index c58e50fbb..ac132c8e4 100644
>>> --- a/commit-reach.c
>>> +++ b/commit-reach.c
>>> @@ -513,65 +513,88 @@ int commit_contains(struct ref_filter *filter, struct 
>>> commit *commit,
>>> return is_descendant_of(commit, list);
>>>   }
>>>   
>>> -int reachable(struct commit *from, int with_flag, int assign_flag,
>>> - time_t min_commit_date)
>>> +static int compare_commits_by_gen(const void *_a, const void *_b)
>>>   {
>>> -   struct prio_queue work = { compare_commits_by_commit_date };
>>> +   const struct commit *a = (const struct commit *)_a;
>>> +   const struct commit *b = (const struct commit *)_b;
>> This cast is bogus.  QSORT gets handed a struct commit **, i.e. an array
>> of pointers, and qsort(1) passes references to those pointers to the
>> compare function, and not the pointer values.
> 
> Good catch! I'm disappointed that we couldn't use type-checking here, as 
> it is quite difficult to discover that the types are wrong here.

Generics in C are hard, and type checking traditionally falls by the
wayside.  You could use macros for that, like klib [*] does, but that
has its own downsides (more object text, debugging the sort macros
themselves is harder).

[*] https://github.com/attractivechaos/klib/blob/master/ksort.h

>> diff --git a/commit-reach.c b/commit-reach.c
>> index 00e5ceee6f..2f5e592d16 100644
>> --- a/commit-reach.c
>> +++ b/commit-reach.c
>> @@ -529,8 +529,8 @@ int commit_contains(struct ref_filter *filter, struct 
>> commit *commit,
>>   
>>   static int compare_commits_by_gen(const void *_a, const void *_b)
>>   {
>> -const struct commit *a = (const struct commit *)_a;
>> -const struct commit *b = (const struct commit *)_b;
>> +const struct commit *a = *(const struct commit * const *)_a;
>> +const struct commit *b = *(const struct commit * const *)_b;
> 
> I would expect s/* const */**/ here, but I'm guessing your formulation 
> is a bit extra careful about types.

Yeah, that second const is not necessary, as the dereference in the same
line makes it inconsequential, but I added it to make clear that this
function is really not supposed to write at all..

René


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-01 Thread Derrick Stolee

On 10/1/2018 3:16 PM, René Scharfe wrote:

Am 28.06.2018 um 14:31 schrieb Derrick Stolee via GitGitGadget:

diff --git a/commit-reach.c b/commit-reach.c
index c58e50fbb..ac132c8e4 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -513,65 +513,88 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
return is_descendant_of(commit, list);
  }
  
-int reachable(struct commit *from, int with_flag, int assign_flag,

- time_t min_commit_date)
+static int compare_commits_by_gen(const void *_a, const void *_b)
  {
-   struct prio_queue work = { compare_commits_by_commit_date };
+   const struct commit *a = (const struct commit *)_a;
+   const struct commit *b = (const struct commit *)_b;

This cast is bogus.  QSORT gets handed a struct commit **, i.e. an array
of pointers, and qsort(1) passes references to those pointers to the
compare function, and not the pointer values.


Good catch! I'm disappointed that we couldn't use type-checking here, as 
it is quite difficult to discover that the types are wrong here.




As a result it's unlikely that the array is sorted in the intended
order.  Given that, a silly question: Is sorting even necessary here?


The reason to sort is to hopefully minimize the amount we walk by 
exploring the "lower" commits first. This is a performance-only thing, 
not a correctness issue (which is why the bug exists). Even then, it is 
just a heuristic.

Anyway, the patch below should fix it.

-- >8 --
Subject: [PATCH] commit-reach: fix cast in compare_commits_by_gen()

The elements of the array to be sorted are commit pointers, so the
comparison function gets handed references to these pointers, not
pointers to commit objects.  Cast to the right type and dereference
once to correctly get the commit reference.

Found using Clang's ASan and t5500.

Signed-off-by: Rene Scharfe 
---
Has this patch a performance impact?

  commit-reach.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 00e5ceee6f..2f5e592d16 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -529,8 +529,8 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
  
  static int compare_commits_by_gen(const void *_a, const void *_b)

  {
-   const struct commit *a = (const struct commit *)_a;
-   const struct commit *b = (const struct commit *)_b;
+   const struct commit *a = *(const struct commit * const *)_a;
+   const struct commit *b = *(const struct commit * const *)_b;


I would expect s/* const */**/ here, but I'm guessing your formulation 
is a bit extra careful about types.


Thanks!

Reviewed-by: Derrick Stolee 


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-01 Thread René Scharfe
Am 28.06.2018 um 14:31 schrieb Derrick Stolee via GitGitGadget:
> diff --git a/commit-reach.c b/commit-reach.c
> index c58e50fbb..ac132c8e4 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -513,65 +513,88 @@ int commit_contains(struct ref_filter *filter, struct 
> commit *commit,
>   return is_descendant_of(commit, list);
>  }
>  
> -int reachable(struct commit *from, int with_flag, int assign_flag,
> -   time_t min_commit_date)
> +static int compare_commits_by_gen(const void *_a, const void *_b)
>  {
> - struct prio_queue work = { compare_commits_by_commit_date };
> + const struct commit *a = (const struct commit *)_a;
> + const struct commit *b = (const struct commit *)_b;

This cast is bogus.  QSORT gets handed a struct commit **, i.e. an array
of pointers, and qsort(1) passes references to those pointers to the
compare function, and not the pointer values.

As a result it's unlikely that the array is sorted in the intended
order.  Given that, a silly question: Is sorting even necessary here?

Anyway, the patch below should fix it.

>  
> - prio_queue_put(, from);
> - while (work.nr) {
> - struct commit_list *list;
> - struct commit *commit = prio_queue_get();
> -
> - if (commit->object.flags & with_flag) {
> - from->object.flags |= assign_flag;
> - break;
> - }
> - if (!commit->object.parsed)
> - parse_object(the_repository, >object.oid);
> - if (commit->object.flags & REACHABLE)
> - continue;
> - commit->object.flags |= REACHABLE;
> - if (commit->date < min_commit_date)
> - continue;
> - for (list = commit->parents; list; list = list->next) {
> - struct commit *parent = list->item;
> - if (!(parent->object.flags & REACHABLE))
> - prio_queue_put(, parent);
> - }
> - }
> - from->object.flags |= REACHABLE;
> - clear_commit_marks(from, REACHABLE);
> - clear_prio_queue();
> - return (from->object.flags & assign_flag);
> + if (a->generation < b->generation)
> + return -1;
> + if (a->generation > b->generation)
> + return 1;
> + return 0;
>  }
>  
>  int can_all_from_reach_with_flag(struct object_array *from,
>int with_flag, int assign_flag,
> -  time_t min_commit_date)
> +  time_t min_commit_date,
> +  uint32_t min_generation)
>  {
> + struct commit **list = NULL;
>   int i;
> + int result = 1;
>  
> + ALLOC_ARRAY(list, from->nr);
>   for (i = 0; i < from->nr; i++) {
> - struct object *from_one = from->objects[i].item;
> + list[i] = (struct commit *)from->objects[i].item;
>  
> - if (from_one->flags & assign_flag)
> - continue;
> - from_one = deref_tag(the_repository, from_one, "a from object", 
> 0);
> - if (!from_one || from_one->type != OBJ_COMMIT) {
> - /* no way to tell if this is reachable by
> -  * looking at the ancestry chain alone, so
> -  * leave a note to ourselves not to worry about
> -  * this object anymore.
> -  */
> - from->objects[i].item->flags |= assign_flag;
> - continue;
> - }
> - if (!reachable((struct commit *)from_one, with_flag, 
> assign_flag,
> -min_commit_date))
> + parse_commit(list[i]);
> +
> + if (list[i]->generation < min_generation)
>   return 0;
>   }
> - return 1;
> +
> + QSORT(list, from->nr, compare_commits_by_gen);

-- >8 --
Subject: [PATCH] commit-reach: fix cast in compare_commits_by_gen()

The elements of the array to be sorted are commit pointers, so the
comparison function gets handed references to these pointers, not
pointers to commit objects.  Cast to the right type and dereference
once to correctly get the commit reference.

Found using Clang's ASan and t5500.

Signed-off-by: Rene Scharfe 
---
Has this patch a performance impact?

 commit-reach.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 00e5ceee6f..2f5e592d16 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -529,8 +529,8 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
 
 static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-   const struct commit *a = (const struct commit *)_a;
-   const struct commit *b = (const struct commit *)_b;
+   const struct commit *a = *(const struct commit * const *)_a;
+   const struct commit *b = *(const struct commit * 

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-07-16 Thread Jonathan Tan
> The first step includes using a depth-first-search (DFS) from each from
> commit, sorted by ascending generation number. We do not walk beyond the
> minimum generation number or the minimum commit date. This DFS is likely
> to be faster than the existing reachable() method because we expect
> previous ref values to be along the first-parent history.
> 
> If we find a target commit, then we mark everything in the DFS stack as
> a RESULT. This expands the set of targets for the other from commits. We
> also mark the visited commits using 'assign_flag' to prevent re-walking
> the same code.

Thanks for this - it was very helpful in understanding the code.

The function itself uses a DFS stack that contains only the trail
leading up to the currently processed node, and not the one that I'm
more familiar with, which also contains the siblings of processed nodes.
I'll annotate the function with my thought process in the hope that it
will aid future reviewers. (The diff as seen in the e-mail is confusing
so I'm reproducing the function itself, not any + or -.)

> int can_all_from_reach_with_flag(struct object_array *from,
>int with_flag, int assign_flag,
>time_t min_commit_date,
>uint32_t min_generation)
> {
>   struct commit **list = NULL;
>   int i;
>   int result = 1;
> 
>   ALLOC_ARRAY(list, from->nr);
>   for (i = 0; i < from->nr; i++) {
>   list[i] = (struct commit *)from->objects[i].item;
> 
>   parse_commit(list[i]);
> 
>   if (list[i]->generation < min_generation)
>   return 0;
>   }
> 
>   QSORT(list, from->nr, compare_commits_by_gen);
> 
>   for (i = 0; i < from->nr; i++) {
>   /* DFS from list[i] */
>   struct commit_list *stack = NULL;
> 
>   list[i]->object.flags |= assign_flag;
>   commit_list_insert(list[i], );
> 
>   while (stack) {
>   struct commit_list *parent;
> 
>   if (stack->item->object.flags & with_flag) {
>   pop_commit();
>   continue;
>   }

I wish that the code would refrain from pushing such an object instead
of popping it at the first opportunity, but I guess that doing so would
require the equivalent of a labeled break/continue. I have no qualms
with using "goto" in this case, but I know that some people don't like
it :-P

>   for (parent = stack->item->parents; parent; parent = 
> parent->next) {
>   if (parent->item->object.flags & (with_flag | 
> RESULT))
>   stack->item->object.flags |= RESULT;

Straightforward, and also produces the bubbling up that we want. An
object is never popped unless it has the "with_flag" flag (see above) or
all its parents have been processed. The object can encounter the "if"
statement multiple times; the last one is when all its parents have been
processed (and thus have the RESULT flag set if necessary).

>   if (!(parent->item->object.flags & 
> assign_flag)) {
>   parent->item->object.flags |= 
> assign_flag;
> 
>   parse_commit(parent->item);
> 
>   if (parent->item->date < 
> min_commit_date ||
>   parent->item->generation < 
> min_generation)
>   continue;
> 
>   commit_list_insert(parent->item, 
> );
>   break;
>   }

If not yet processed, push it onto the stack and break. The child commit
is still left on the stack. The next time the child commit is processed
(in an iteration of the "while" loop), the "for" loop will iterate until
the next unprocessed parent.

In the DFS that I'm used to, all parents would be pushed here, but
perhaps the fact that the iteration is postorder confuses things.
Anyway, if someone comes up with a better algorithm, replacing it
shouldn't be too difficult - the algorithm is contained within this
function, and there are tests to check the correctness of the algorithm
update.

>   }
> 
>   if (!parent)
>   pop_commit();

Only when we have no parents left are we completely done with the
current object.

>   }
> 
>   if (!(list[i]->object.flags & (with_flag | RESULT))) {
>   result = 0;
>   goto cleanup;
>   }

And after the DFS, if the original object did not have an appropriate
flag set, we do not bother with the other "want" objects.

>   }
> 
> cleanup:
>   for (i = 0; i < from->nr; i++) {
>   

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> Note how the time increases between the two cases in the two versions.
> The new code increases relative to the number of commits that need to be
> walked, but not directly relative to the number of 'from' commits.

Cool!

>  int can_all_from_reach_with_flag(struct object_array *from,
>  int with_flag, int assign_flag,
> -time_t min_commit_date)
> +time_t min_commit_date,
> +uint32_t min_generation)
>  {

> for (i = 0; i < from->nr; i++) {
[...]
> +   parse_commit(list[i]);

parse_commit_or_die or handle the return code?
(or a comment why we specifically are allowed to ignore
the return code here)

[...]
> +   for (parent = stack->item->parents; parent; parent = 
> parent->next) {
[...]
> +   parse_commit(parent->item);

same here.


[PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-07-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The can_all_from_reach_with_flags() algorithm is currently quadratic in
the worst case, because it calls the reachable() method for every 'from'
without tracking which commits have already been walked or which can
already reach a commit in 'to'.

Rewrite the algorithm to walk each commit a constant number of times.

We also add some optimizations that should work for the main consumer of
this method: fetch negotitation (haves/wants).

The first step includes using a depth-first-search (DFS) from each from
commit, sorted by ascending generation number. We do not walk beyond the
minimum generation number or the minimum commit date. This DFS is likely
to be faster than the existing reachable() method because we expect
previous ref values to be along the first-parent history.

If we find a target commit, then we mark everything in the DFS stack as
a RESULT. This expands the set of targets for the other from commits. We
also mark the visited commits using 'assign_flag' to prevent re-walking
the same code.

We still need to clear our flags at the end, which is why we will have a
total of three visits to each commit.

Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.

Small Case
--

Before: 1.52 s
 After: 0.26 s

Large Case
--

Before: 3.50 s
 After: 0.27 s

Note how the time increases between the two cases in the two versions.
The new code increases relative to the number of commits that need to be
walked, but not directly relative to the number of 'from' commits.

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 124 ++---
 commit-reach.h |   6 ++-
 upload-pack.c  |   5 +-
 3 files changed, 85 insertions(+), 50 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index c58e50fbb..ac132c8e4 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -513,65 +513,88 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
return is_descendant_of(commit, list);
 }
 
-int reachable(struct commit *from, int with_flag, int assign_flag,
- time_t min_commit_date)
+static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-   struct prio_queue work = { compare_commits_by_commit_date };
+   const struct commit *a = (const struct commit *)_a;
+   const struct commit *b = (const struct commit *)_b;
 
-   prio_queue_put(, from);
-   while (work.nr) {
-   struct commit_list *list;
-   struct commit *commit = prio_queue_get();
-
-   if (commit->object.flags & with_flag) {
-   from->object.flags |= assign_flag;
-   break;
-   }
-   if (!commit->object.parsed)
-   parse_object(the_repository, >object.oid);
-   if (commit->object.flags & REACHABLE)
-   continue;
-   commit->object.flags |= REACHABLE;
-   if (commit->date < min_commit_date)
-   continue;
-   for (list = commit->parents; list; list = list->next) {
-   struct commit *parent = list->item;
-   if (!(parent->object.flags & REACHABLE))
-   prio_queue_put(, parent);
-   }
-   }
-   from->object.flags |= REACHABLE;
-   clear_commit_marks(from, REACHABLE);
-   clear_prio_queue();
-   return (from->object.flags & assign_flag);
+   if (a->generation < b->generation)
+   return -1;
+   if (a->generation > b->generation)
+   return 1;
+   return 0;
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
 int with_flag, int assign_flag,
-time_t min_commit_date)
+time_t min_commit_date,
+uint32_t min_generation)
 {
+   struct commit **list = NULL;
int i;
+   int result = 1;
 
+   ALLOC_ARRAY(list, from->nr);
for (i = 0; i < from->nr; i++) {
-   struct object *from_one = from->objects[i].item;
+   list[i] = (struct commit *)from->objects[i].item;
 
-   if (from_one->flags & assign_flag)
-   continue;
-   from_one = deref_tag(the_repository, from_one, "a from object", 
0);
-   if (!from_one ||