Re: [ANNOUNCE] Git v2.19.0-rc0

2018-09-02 Thread Kaartic Sivaraam
On Thu, 2018-08-23 at 06:26 -0400, Derrick Stolee wrote:
> 
> Around the time that my proposed approaches were getting vetoed for 
> alignment issues, I figured I was out of my depth here. I reached out to 
> Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of 
> posts of word-based approaches to different problems, so I thought he 
> might know something off the top of his head that would be applicable. 
> His conclusion (after looking only a short time) was to take a 'hasheq' 
> approach [2] like Peff suggested [3]. Since that requires auditing all 
> callers of hashcmp to see if hasheq is appropriate, it is not a good 
> solution for 2.19 but (in my opinion) should be evaluated as part of the 
> 2.20 cycle.
> 

That was an interesting blog post, indeed, and it had an interesting
comments section too. One comment that especially caught my eye was [a]:

"So the way gcc (and maybe clang) handles this is specifically by
recognizing memcmp and checking whether only a 2-way result is needed
and then essentially replacing it with a memcmp_eq call.

..."

I find this to be an interesting note. It seems GCC does optimize the
comparison when we clearly indicate that we use the result of the
memcmp only as a boolean. So would that help in any way? Maybe it would
make it easier to write a `hasheq` method? I'm not sure.
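
For concreteness, here is a minimal sketch of such an equality-only
helper, assuming a fixed 20-byte hash (just an illustration, not code
from any patch in this thread):

    static inline int hasheq(const unsigned char *sha1,
                             const unsigned char *sha2)
    {
        /*
         * The result is only ever used as a boolean, so the
         * compiler is free to pick the cheapest equality check.
         */
        return !memcmp(sha1, sha2, 20);
    }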

> [1] https://twitter.com/stolee/status/1032312965754748930
> 
> [2] 
> https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/
> 
> [3] 
> https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/
> 
> [4] 
> https://public-inbox.org/git/7ea416cf-b043-1274-e161-85a8780b8...@gmail.com/


[a]: 
https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/#comment-344073

--
Sivaraam



Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-27 Thread Junio C Hamano
Jeff King  writes:

> Actually, what I showed earlier does seem to have some weirdness with
> else-if. But I finally stumbled on something even better:
>
>   - oidcmp(a, b) != 0
>   + !oideq(a, b)
>
> Because of the isomorphisms that coccinelle knows about, that catches
> everything we want.

Nice ;-)



Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-25 Thread Jeff King
On Fri, Aug 24, 2018 at 12:45:10PM -0400, Derrick Stolee wrote:

> Here are the numbers for Linux:
> 
> |  | v2.18.0  | v2.19.0-rc0 | HEAD   |
> |--|--|-||
> |  | 86.5 | 70.739  | 57.266 |
> |  | 60.582   | 101.928 | 56.641 |
> |  | 58.964   | 60.139  | 60.258 |
> |  | 59.47    | 61.141  | 58.213 |
> |  | 62.554   | 60.73   | 84.54  |
> |  | 59.139   | 85.424  | 57.745 |
> |  | 58.487   | 59.31   | 59.979 |
> |  | 58.653   | 69.845  | 60.181 |
> |  | 58.085   | 102.777 | 61.455 |
> |  | 58.304   | 60.459  | 62.551 |
> | Max  | 86.5 | 102.777 | 84.54  |
> | Min  | 58.085   | 59.31   | 56.641 |
> | Avg* | 59.51913 | 71.30063    | 59.706 |
> | Med  | 59.0515  | 65.493  | 60.08  |

Thanks. The median ones are the most interesting, I think (and show a
very satisfying recovery from my patch).

I'm surprised at the variance, especially in your v2.19 runs. It makes
me distrust the mean (and implies to me that we could do better by
throwing away outliers based on value, not just the single high/low;
simply using the median also solves that).

The variance in the v2.18 column is what I'd expect based on past
experience (slow cold cache to start, then a few percent change
run-to-run).

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-24 Thread Derrick Stolee

On 8/23/2018 4:59 PM, Derrick Stolee wrote:


When I choose my own metrics for performance tests, I like to run at 
least 10 runs, remove the largest AND smallest runs from the samples, 
and then take the average. I did this manually for 'git rev-list --all 
--objects' on git/git and got the following results:


v2.18.0    v2.19.0-rc0   HEAD

3.126 s    3.308 s   3.170 s

For full disclosure, here is a full table including all samples:

|  | v2.18.0 | v2.19.0-rc0 | HEAD    |
|--|-|-|-|
|  | 4.58    | 3.302   | 3.239   |
|  | 3.13    | 3.337   | 3.133   |
|  | 3.213   | 3.291   | 3.159   |
|  | 3.219   | 3.318   | 3.131   |
|  | 3.077   | 3.302   | 3.163   |
|  | 3.074   | 3.328   | 3.119   |
|  | 3.022   | 3.277   | 3.125   |
|  | 3.083   | 3.259   | 3.203   |
|  | 3.057   | 3.311   | 3.223   |
|  | 3.155   | 3.413   | 3.225   |
| Max  | 4.58    | 3.413   | 3.239   |
| Min  | 3.022   | 3.259   | 3.119   |
| Avg* | 3.126   | 3.30825 | 3.17025 |

(Note that the largest one was the first run, on v2.18.0, which is due 
to a cold disk.)


I just kicked off a script that will run this test on the Linux repo 
while I drive home. I'll be able to report a similar table of data 
easily.


Here are the numbers for Linux:

|  | v2.18.0  | v2.19.0-rc0 | HEAD   |
|--|--|-||
|  | 86.5 | 70.739  | 57.266 |
|  | 60.582   | 101.928 | 56.641 |
|  | 58.964   | 60.139  | 60.258 |
|  | 59.47    | 61.141  | 58.213 |
|  | 62.554   | 60.73   | 84.54  |
|  | 59.139   | 85.424  | 57.745 |
|  | 58.487   | 59.31   | 59.979 |
|  | 58.653   | 69.845  | 60.181 |
|  | 58.085   | 102.777 | 61.455 |
|  | 58.304   | 60.459  | 62.551 |
| Max  | 86.5 | 102.777 | 84.54  |
| Min  | 58.085   | 59.31   | 56.641 |
| Avg* | 59.51913 | 71.30063    | 59.706 |
| Med  | 59.0515  | 65.493  | 60.08  |

Thanks,

-Stolee



Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-24 Thread Derrick Stolee

On 8/24/2018 2:45 AM, Jeff King wrote:

On Thu, Aug 23, 2018 at 10:59:55PM -0400, Jeff King wrote:


So I think we have a winner. I'll polish that up into patches and send
it out later tonight.

Oof. This rabbit hole keeps going deeper and deeper. I wrote up my
coccinelle findings separately in:

   https://public-inbox.org/git/20180824064228.ga3...@sigill.intra.peff.net/

which is possibly a coccinelle bug (there I talked about oidcmp, since
it can be demonstrated with the existing transformations, but the same
thing happens with my hasheq patches). I'll wait to see how that
discussion plays out, but I do otherwise have hasheq() patches ready to
go, so it's probably not worth anybody else digging in further.

I was trying this myself yesterday, and found a doc _somewhere_ that 
said what we should do is this:


@@
expression e1, e2;
@@
if (
- oidcmp(e1, e2)
+ !oideq(e1, e2)
 )

(Don't forget the space before the last end-paren!)

-Stolee


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-24 Thread Ævar Arnfjörð Bjarmason


On Fri, Aug 24 2018, Jeff King wrote:

> On Thu, Aug 23, 2018 at 04:59:27PM -0400, Derrick Stolee wrote:
>
>> Using git/git:
>>
>> Test v2.18.0 v2.19.0-rc0 HEAD
>> -----------------------------------------------------------------------
>> 0001.2: 3.10(3.02+0.08) 3.27(3.17+0.09) +5.5% 3.14(3.02+0.11) +1.3%
>>
>>
>> Using torvalds/linux:
>>
>> Test v2.18.0 v2.19.0-rc0 HEAD
>> ------------------------------------------------------------------------------
>> 0001.2: 56.08(45.91+1.50) 56.60(46.62+1.50) +0.9% 54.61(45.47+1.46) -2.6%
>
> Interesting that these timings aren't as dramatic as the ones you got
> the other day (mine seemed to shift, too; for whatever reason it seems
> like under load the difference is larger).
>
>> Now here is where I get on my soapbox (and create a TODO for myself later).
>> I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively suggests
>> that the results should be _more_ accurate than the default of 3. However, I
>> then remembered that we only report the *minimum* time from all the runs,
>> which is likely to select an outlier from the distribution. To test this, I
>> ran a few tests manually and found the variation between runs to be larger
>> than 3%.
>
> Yes, I agree it's not a great system. The whole "best of 3" thing is
> OK for throwing out cold-cache warmups, but it's really bad for teasing
> out the significance of small changes, or even understanding how much
> run-to-run noise there is.
>
>> When I choose my own metrics for performance tests, I like to run at least
>> 10 runs, remove the largest AND smallest runs from the samples, and then
>> take the average. I did this manually for 'git rev-list --all --objects' on
>> git/git and got the following results:
>
> I agree that technique is better. I wonder if there's something even
> more statistically rigorous we could do. E.g., to compute the variance
> and throw away outliers based on standard deviations. And also to report
> the variance to give a sense of the significance of any changes.
>
> Obviously more runs gives greater confidence in the results, but 10
> sounds like a lot. Many of these tests take minutes to run. Letting it
> go overnight is OK if you're doing a once-per-release mega-run, but it's
> pretty painful if you just want to generate some numbers to show off
> your commit.

An ex-coworker who's a *lot* smarter about these things than I am wrote
this module: https://metacpan.org/pod/Dumbbench

A while ago I dabbled briefly with integrating it into t/perf/ but got
distracted by something else:

The module currently works similar to [more traditional iterative
benchmark modules], except (in layman terms) it will run the command
many times, estimate the uncertainty of the result and keep
iterating until a certain user-defined precision has been
reached. Then, it calculates the resulting uncertainty and goes
through some pain to discard bad runs and subtract overhead from the
timings. The reported timing includes an uncertainty, so that
multiple benchmarks can more easily be compared.

Details of how it works here:
https://metacpan.org/pod/Dumbbench#HOW-IT-WORKS-AND-WHY-IT-DOESN'T

Something like that seems to me to be an inherently better
approach. I.e. we have lots of test cases that take 500ms, and some that
take maybe 5 minutes (depending on the size of the repository).

Indiscriminately repeating all of those for GIT_PERF_REPEAT_COUNT must
be dumber than something like the above method.

We could also speed up the runtime of the perf tests a lot with such a
method, by e.g. saying that we're OK with less certainty on tests that
take a longer time than those that take a shorter time.
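
As a rough illustration, driving a git command through Dumbbench's Perl
API would look something like this (an untested sketch based on the
module's documented synopsis; the precision and run counts are made-up
values):

    use Dumbbench;

    # Keep iterating until the relative uncertainty of the timing
    # drops below 0.5%, starting from 10 initial runs.
    my $bench = Dumbbench->new(
        target_rel_precision => 0.005,
        initial_runs         => 10,
    );
    $bench->add_instances(
        Dumbbench::Instance::Cmd->new(
            name    => 'rev-list',
            command => [qw(git rev-list --all --objects)],
        ),
    );
    $bench->run;
    $bench->report;  # each timing is reported with its uncertainty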


>> v2.18.0 v2.19.0-rc0 HEAD
>> 
>> 3.126 s 3.308 s 3.170 s
>
> So that's 5% worsening in 2.19, and we reclaim all but 1.4% of it. Those
> numbers match what I expect (and what I was seeing in some of my earlier
> timings).
>
>> I just kicked off a script that will run this test on the Linux repo while I
>> drive home. I'll be able to report a similar table of data easily.
>
> Thanks, I'd expect it to come up with similar percentages. So we'll see
> if that holds true. :)
>
>> My TODO is to consider aggregating the data this way (or with a median)
>> instead of reporting the minimum.
>
> Yes, I think that would be a great improvement for t/perf.
>
> -Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 04:59:27PM -0400, Derrick Stolee wrote:

> Using git/git:
> 
> Test  v2.18.0   v2.19.0-rc0 HEAD
> -----------------------------------------------------------------------
> 0001.2:   3.10(3.02+0.08)   3.27(3.17+0.09) +5.5% 3.14(3.02+0.11) +1.3%
> 
> 
> Using torvalds/linux:
> 
> Test v2.18.0 v2.19.0-rc0   HEAD
> ------------------------------------------------------------------------------
> 0001.2:  56.08(45.91+1.50)   56.60(46.62+1.50) +0.9% 54.61(45.47+1.46) -2.6%

Interesting that these timings aren't as dramatic as the ones you got
the other day (mine seemed to shift, too; for whatever reason it seems
like under load the difference is larger).

> Now here is where I get on my soapbox (and create a TODO for myself later).
> I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively suggests
> that the results should be _more_ accurate than the default of 3. However, I
> then remembered that we only report the *minimum* time from all the runs,
> which is likely to select an outlier from the distribution. To test this, I
> ran a few tests manually and found the variation between runs to be larger
> than 3%.

Yes, I agree it's not a great system. The whole "best of 3" thing is
OK for throwing out cold-cache warmups, but it's really bad for teasing
out the significance of small changes, or even understanding how much
run-to-run noise there is.

> When I choose my own metrics for performance tests, I like to run at least
> 10 runs, remove the largest AND smallest runs from the samples, and then
> take the average. I did this manually for 'git rev-list --all --objects' on
> git/git and got the following results:

I agree that technique is better. I wonder if there's something even
more statistically rigorous we could do. E.g., to compute the variance
and throw away outliers based on standard deviations. And also to report
the variance to give a sense of the significance of any changes.
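
As a purely hypothetical sketch (nothing like this exists in t/perf
yet), the aggregation step could compute something like:

    sub robust_stats {
        my @t = sort { $a <=> $b } @_;
        my $n = @t;
        my $mean = 0;
        $mean += $_ / $n for @t;
        my $var = 0;
        $var += ($_ - $mean) ** 2 / ($n - 1) for @t;
        my $sd = sqrt($var);
        # drop runs more than two standard deviations from the mean
        my @kept = grep { abs($_ - $mean) <= 2 * $sd } @t;
        my $avg = 0;
        $avg += $_ / @kept for @kept;
        my $med = $n % 2 ? $t[$n / 2]
                         : ($t[$n / 2 - 1] + $t[$n / 2]) / 2;
        return ($avg, $med, $sd);
    }

reporting the average of the surviving runs alongside the median and
the standard deviation.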

Obviously more runs gives greater confidence in the results, but 10
sounds like a lot. Many of these tests take minutes to run. Letting it
go overnight is OK if you're doing a once-per-release mega-run, but it's
pretty painful if you just want to generate some numbers to show off
your commit.

> v2.18.0    v2.19.0-rc0   HEAD
> 
> 3.126 s    3.308 s   3.170 s

So that's 5% worsening in 2.19, and we reclaim all but 1.4% of it. Those
numbers match what I expect (and what I was seeing in some of my earlier
timings).

> I just kicked off a script that will run this test on the Linux repo while I
> drive home. I'll be able to report a similar table of data easily.

Thanks, I'd expect it to come up with similar percentages. So we'll see
if that holds true. :)

> My TODO is to consider aggregating the data this way (or with a median)
> instead of reporting the minimum.

Yes, I think that would be a great improvement for t/perf.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 10:59:55PM -0400, Jeff King wrote:

> So I think we have a winner. I'll polish that up into patches and send
> it out later tonight.

Oof. This rabbit hole keeps going deeper and deeper. I wrote up my
coccinelle findings separately in:

  https://public-inbox.org/git/20180824064228.ga3...@sigill.intra.peff.net/

which is possibly a coccinelle bug (there I talked about oidcmp, since
it can be demonstrated with the existing transformations, but the same
thing happens with my hasheq patches). I'll wait to see how that
discussion plays out, but I do otherwise have hasheq() patches ready to
go, so it's probably not worth anybody else digging in further.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 07:48:42PM -0700, Jacob Keller wrote:

> Odd...
> 
> What about..
> 
> - if (oidcmp(a,b))
> + if(!oideq(a,b))
>   { ... }

Nope, it doesn't like that syntactically.

> Or maybe you need to use something like
> 
>   <...
> - if (oidcmp(a,b))
> + if (!oideq(a,b))
>   ...>

Nor that (I also tried finding documentation on what exactly the angle
brackets mean, but couldn't).

> Hmm. Yea, semantic patches are a bit confusing overall sometimes.
> 
> But it looks like you got something which works?

Actually, what I showed earlier does seem to have some weirdness with
else-if. But I finally stumbled on something even better:

  - oidcmp(a, b) != 0
  + !oideq(a, b)
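
Spelled out as a complete semantic patch (with the same metavariable
header as the earlier attempts), that would be something like:

  @@
  expression a, b;
  @@
  - oidcmp(a, b) != 0
  + !oideq(a, b)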

Because of the isomorphisms that coccinelle knows about, that catches
everything we want.  Obvious ones like:

diff --git a/bisect.c b/bisect.c
index 41c56a665e..7c1d8f1a6d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(struct commit_list *list, int count)
 
for (i = 0; cur; cur = cur->next, i++) {
if (i == index) {
-   if (oidcmp(&cur->item->object.oid, current_bad_oid))
+   if (!oideq(&cur->item->object.oid, current_bad_oid))
return cur;
if (previous)
return previous;

and compound conditionals like:

diff --git a/blame.c b/blame.c
index 10d72e36dd..538d0ab1aa 100644
--- a/blame.c
+++ b/blame.c
@@ -1834,7 +1834,7 @@ void setup_scoreboard(struct blame_scoreboard *sb,

sb->revs->children.name = "children";
while (c->parents &&
-  oidcmp(&c->object.oid, &sb->final->object.oid)) {
+  !oideq(&c->object.oid, &sb->final->object.oid)) {
struct commit_list *l = xcalloc(1, sizeof(*l));

l->item = c;

and even non-if contexts, like:

diff --git a/sha1-file.c b/sha1-file.c
index 631f6b9dc2..d85f4e93e1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -825,7 +825,7 @@ int check_object_signature(const struct object_id *oid, void *map,
 
if (map) {
hash_object_file(map, size, type, &real_oid);
-   return oidcmp(oid, &real_oid) ? -1 : 0;
+   return !oideq(oid, &real_oid) ? -1 : 0;
}

So I think we have a winner. I'll polish that up into patches and send
it out later tonight.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jacob Keller
On Thu, Aug 23, 2018 at 5:16 PM Jeff King  wrote:
>
> On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:
>
> > This almost works:
> >
> >   @@
> >   expression a, b;
> >   statement s;
> >   @@
> >   - if (oidcmp(a, b)) s
> >   + if (!oideq(a, b)) s
> >
> > [...]
>
> > So I really do want some way of saying "all of the block, no matter what
> > it is". Or of leaving it out as context.
>
> Aha. The magic invocation is:
>
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
>   s
>
> I would have expected that you could replace "s" with "...", but that
> does not seem to work.
>
> -Peff

Odd...

What about..

- if (oidcmp(a,b))
+ if(!oideq(a,b))
  { ... }

Or maybe you need to use something like

  <...
- if (oidcmp(a,b))
+ if (!oideq(a,b))
  ...>

Hmm. Yea, semantic patches are a bit confusing overall sometimes.

But it looks like you got something which works?

Thanks,
Jake


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:

> This almost works:
> 
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b)) s
>   + if (!oideq(a, b)) s
>
> [...]

> So I really do want some way of saying "all of the block, no matter what
> it is". Or of leaving it out as context.

Aha. The magic invocation is:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))
  s

I would have expected that you could replace "s" with "...", but that
does not seem to work.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 07:40:49PM -0400, Jeff King wrote:

> > You can look for explicitly "if (oidcmp(...))" though. I don't know if
> > you can catch *any* use which degrades to boolean outside of an if
> > statement, but I wouldn't expect there to be too many of those?
> 
> Yeah, that was my thought, too. And I've been trying this all afternoon
> without success. Why doesn't this work:
> 
>   @@
>   expression a, b;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
> 
> I get:
> 
>   Fatal error: exception Failure("minus: parse error: \n = File
>   \"contrib/coccinelle/oideq.cocci\", line 21, column 0,  charpos =
>   221\naround = '', whole content = \n")
> 
> If I do:
> 
>   - if (oidcmp(a, b)) { ... }
> 
> that seems to please the parser for the minus line. But I cannot include
> the "..." on the plus line. Clearly the "..." part should be context,
> but I can't seem to find the right syntax.

This almost works:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b)) s
  + if (!oideq(a, b)) s

It generates this, for example:

diff -u -p a/bisect.c b/bisect.c
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(str
 
for (i = 0; cur; cur = cur->next, i++) {
if (i == index) {
-   if (oidcmp(&cur->item->object.oid, current_bad_oid))
+   if (!oideq(&cur->item->object.oid, current_bad_oid))
return cur;
if (previous)
return previous;

which is what we want. But it also generates this:

diff -u -p a/bundle.c b/bundle.c
--- a/bundle.c
+++ b/bundle.c
@@ -369,25 +369,11 @@ static int write_bundle_refs(int bundle_
 * commit that is referenced by the tag, and not the tag
 * itself.
 */
-   if (oidcmp(&oid, &e->item->oid)) {
-   /*
-* Is this the positive end of a range expressed
-* in terms of a tag (e.g. v2.0 from the range
-* "v1.0..v2.0")?
-*/
-   struct commit *one = lookup_commit_reference(the_repository,
-&oid);
+   if (!oideq(&oid, &e->item->oid)) {
+   struct commit *one=lookup_commit_reference(the_repository,
+  &oid);
struct object *obj;
-
if (e->item == &(one->object)) {
-   /*
-* Need to include e->name as an
-* independent ref to the pack-objects
-* input, so that the tag is included
-* in the output; otherwise we would
-* end up triggering "empty bundle"
-* error.
-*/
obj = parse_object_or_die(&oid, e->name);
obj->flags |= SHOWN;
add_pending_object(revs, obj, e->name);

So I really do want some way of saying "all of the block, no matter what
it is". Or of leaving it out as context.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 04:30:10PM -0700, Jacob Keller wrote:

> On Thu, Aug 23, 2018 at 9:30 AM Jeff King  wrote:
> > I think that audit isn't actually too bad (but definitely not something
> > we should do for v2.19). The cocci patch I showed earlier hits most of
> > them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
> > sure if there's a way to ask coccinelle only for oidcmp()
> > used in a boolean context.
> >
> 
> You can look for explicitly "if (oidcmp(...))" though. I don't know if
> you can catch *any* use which degrades to boolean outside of an if
> statement, but I wouldn't expect there to be too many of those?

Yeah, that was my thought, too. And I've been trying this all afternoon
without success. Why doesn't this work:

  @@
  expression a, b;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))

I get:

  Fatal error: exception Failure("minus: parse error: \n = File
  \"contrib/coccinelle/oideq.cocci\", line 21, column 0,  charpos =
  221\naround = '', whole content = \n")

If I do:

  - if (oidcmp(a, b)) { ... }

that seems to please the parser for the minus line. But I cannot include
the "..." on the plus line. Clearly the "..." part should be context,
but I can't seem to find the right syntax.

FWIW, I do have patches adding hasheq() and converting all of the
!oidcmp() cases. I may resort to hand-investigating each of the negated
ones, but I really feel like I should be able to do better with
coccinelle.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jacob Keller
On Thu, Aug 23, 2018 at 9:30 AM Jeff King  wrote:
> I think that audit isn't actually too bad (but definitely not something
> we should do for v2.19). The cocci patch I showed earlier hits most of
> them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
> sure if there's a way to ask coccinelle only for oidcmp()
> used in a boolean context.
>

You can look for explicitly "if (oidcmp(...))" though. I don't know if
you can catch *any* use which degrades to boolean outside of an if
statement, but I wouldn't expect there to be too many of those?
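
(One example of such a non-if use appears elsewhere in the thread, from
sha1-file.c:

    return oidcmp(oid, &real_oid) ? -1 : 0;

where the result degrades to a boolean inside a ternary.)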

Thanks,
Jake


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Derrick Stolee

On 8/23/2018 2:53 PM, Jeff King wrote:

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:


I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).

My numbers may be more dramatic because my Linux environment is a virtual
machine.

If you have a chance, can you run p0001 on my patch (compared to
2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
check that it really is fixing the problem you saw.


Sure. Note: I had to create a new Linux VM on a different machine 
between Tuesday and today, so the absolute numbers are different.


Using git/git:

Test  v2.18.0   v2.19.0-rc0 HEAD
-----------------------------------------------------------------------
0001.2:   3.10(3.02+0.08)   3.27(3.17+0.09) +5.5% 3.14(3.02+0.11) +1.3%


Using torvalds/linux:

Test v2.18.0 v2.19.0-rc0   HEAD
------------------------------------------------------------------------------
0001.2:  56.08(45.91+1.50)   56.60(46.62+1.50) +0.9% 54.61(45.47+1.46) -2.6%


Now here is where I get on my soapbox (and create a TODO for myself 
later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively 
suggests that the results should be _more_ accurate than the default of 
3. However, I then remembered that we only report the *minimum* time from
all the runs, which is likely to select an outlier from the 
distribution. To test this, I ran a few tests manually and found the 
variation between runs to be larger than 3%.


When I choose my own metrics for performance tests, I like to run at 
least 10 runs, remove the largest AND smallest runs from the samples, 
and then take the average. I did this manually for 'git rev-list --all 
--objects' on git/git and got the following results:


v2.18.0    v2.19.0-rc0   HEAD

3.126 s    3.308 s   3.170 s

For full disclosure, here is a full table including all samples:

|  | v2.18.0 | v2.19.0-rc0 | HEAD    |
|--|-|-|-|
|  | 4.58    | 3.302   | 3.239   |
|  | 3.13    | 3.337   | 3.133   |
|  | 3.213   | 3.291   | 3.159   |
|  | 3.219   | 3.318   | 3.131   |
|  | 3.077   | 3.302   | 3.163   |
|  | 3.074   | 3.328   | 3.119   |
|  | 3.022   | 3.277   | 3.125   |
|  | 3.083   | 3.259   | 3.203   |
|  | 3.057   | 3.311   | 3.223   |
|  | 3.155   | 3.413   | 3.225   |
| Max  | 4.58    | 3.413   | 3.239   |
| Min  | 3.022   | 3.259   | 3.119   |
| Avg* | 3.126   | 3.30825 | 3.17025 |

(Note that the largest one was the first run, on v2.18.0, which is due 
to a cold disk.)


I just kicked off a script that will run this test on the Linux repo 
while I drive home. I'll be able to report a similar table of data easily.


My TODO is to consider aggregating the data this way (or with a median) 
instead of reporting the minimum.


Thanks,

-Stolee



Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> > I think you can safely
> > ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
> > without some mitigation, I don't know that it's all that big a deal,
> > given the numbers I generated (which for some reason are less dramatic
> > than Stolee's).
> My numbers may be more dramatic because my Linux environment is a virtual
> machine.

If you have a chance, can you run p0001 on my patch (compared to
2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
check that it really is fixing the problem you saw.

-Peff


wide t/perf output, was Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 06:20:26AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Here are numbers for p0001.2 run against linux.git on a few
> > versions. This is using -O2 with gcc 8.2.0.
> >
> >   Test v2.18.0 v2.19.0-rc0   HEAD
> >   ----------------------------------------------------------------------
> >   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%
> 
> I see what you did to the formatting here, which is a topic of
> another thread ;-).

Do you happen to have a link? I missed that one, and digging turned up
nothing.

A while ago I wrote the patch below. I don't recall why I never sent it
in (and it doesn't apply cleanly these days, though I'm sure it could be
forward-ported).

-- >8 --
Date: Wed, 20 Jan 2016 23:54:14 -0500
Subject: [PATCH] t/perf: add "tall" output format

When aggregating results, we usually show a list of tests,
one per line, with the tested revisions in columns across.
Like:

$ ./aggregate.perl 348d4f2^ 348d4f2 p7000-filter-branch.sh
Test  348d4f2^   348d4f2
---
7000.2: noop filter   295.32(269.61+14.36)   7.92(0.85+0.72) -97.3%

This is useful if you have a lot of tests to show, but few
revisions; you're effectively comparing the two items on
each line. But sometimes you have the opposite: few tests,
but a large number of revisions. In this case, the lines
get very long, and it's hard to compare values.

This patch introduces a "tall" format that shows the same
data in a more vertical manner:

$ ./aggregate.perl --tall \
348d4f2^ 348d4f2 \
jk/filter-branch-empty^ \
jk/filter-branch-empty \
p7000-filter-branch.sh
Test: p7000-filter-branch.2
348d4f2^                295.32(269.61+14.36)
348d4f2                   7.92(0.85+0.72) -97.3%
jk/filter-branch-empty^   9.37(0.87+0.80) -96.8%
jk/filter-branch-empty    7.71(0.92+0.62) -97.4%

Signed-off-by: Jeff King 
---
 t/perf/aggregate.perl | 124 ++
 1 file changed, 88 insertions(+), 36 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index e401208488..d108a02ccd 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -17,29 +17,41 @@ sub get_times {
return ($rt, $4, $5);
 }
 
-sub format_times {
+sub format_times_list {
my ($r, $u, $s, $firstr) = @_;
if (!defined $r) {
return "";
}
my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s;
+   my $pct;
if (defined $firstr) {
if ($firstr > 0) {
-   $out .= sprintf " %+.1f%%", 100.0*($r-$firstr)/$firstr;
+   $pct = sprintf "%+.1f%%", 100.0*($r-$firstr)/$firstr;
} elsif ($r == 0) {
-   $out .= " =";
+   $pct = "=";
} else {
-   $out .= " +inf";
+   $pct = "+inf";
}
}
-   return $out;
+   return ($out, $pct);
+}
+
+sub format_times {
+   my ($times, $pct) = format_times_list(@_);
+   return defined $pct ? "$times $pct" : $times;
 }
 
 my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests);
+my ($tall_format);
 while (scalar @ARGV) {
my $arg = $ARGV[0];
my $dir;
last if -f $arg or $arg eq "--";
+   if ($arg eq "--tall") {
+   $tall_format = 1;
+   shift @ARGV;
+   next;
+   }
if (! -d $arg) {
my $rev = Git::command_oneline(qw(rev-parse --verify), $arg);
$dir = "build/".$rev;
@@ -122,6 +134,11 @@ sub have_slash {
return 0;
 }
 
+sub printable_dir {
+   my ($d) = @_;
+   return exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d};
+}
+
 my %newdirabbrevs = %dirabbrevs;
 while (!have_duplicate(values %newdirabbrevs)) {
%dirabbrevs = %newdirabbrevs;
@@ -132,44 +149,79 @@ sub have_slash {
}
 }
 
-my %times;
-my @colwidth = ((0)x@dirs);
-for my $i (0..$#dirs) {
-   my $d = $dirs[$i];
-   my $w = length (exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d});
-   $colwidth[$i] = $w if $w > $colwidth[$i];
-}
-for my $t (@subtests) {
-   my $firstr;
+binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+
+if (!$tall_format) {
+   my %times;
+   my @colwidth = ((0)x@dirs);
for my $i (0..$#dirs) {
my $d = $dirs[$i];
-   $times{$prefixes{$d}.$t} = [get_times("$resultsdir/$prefixes{$d}$t.times")];
-   my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-   my $w = length format_times($r,$u,$s,$firstr);
+   my $w = length(printable_dir($d));
$colwidth[$i] = $w if $w > $colwidth[$i];
-   $firstr = $r unless defined $firstr;
}
-}
-my $t

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jeff King
On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> Around the time that my proposed approaches were getting vetoed for
> alignment issues, I figured I was out of my depth here. I reached out to
> Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of
> posts of word-based approaches to different problems, so I thought he might
> know something off the top of his head that would be applicable. His
> conclusion (after looking only a short time) was to take a 'hasheq' approach
> [2] like Peff suggested [3]. Since that requires auditing all callers of
> hashcmp to see if hasheq is appropriate, it is not a good solution for 2.19
> but (in my opinion) should be evaluated as part of the 2.20 cycle.

I think that audit isn't actually too bad (but definitely not something
we should do for v2.19). The cocci patch I showed earlier hits most of
them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
sure if there's a way to ask coccinelle only for oidcmp()
used in a boolean context.

Just skimming the grep output, it's basically every call except the ones
we use for binary search.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Junio C Hamano
Jeff King  writes:

> Here are numbers for p0001.2 run against linux.git on a few
> versions. This is using -O2 with gcc 8.2.0.
>
>   Test v2.18.0 v2.19.0-rc0   HEAD
>   ----------------------------------------------------------------------
>   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

I see what you did to the formatting here, which is a topic of
another thread ;-).

Thanks, as Derrick also noted, I agree this is an appropriate
workaround for the upcoming release and we may want to explore
hasheq() and other solutions as part of the effort during the next
cycle (which can start now if people are bored---after all working
on the codebase is the best way to hunt for recent bugs).

Thanks, will queue.

> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..4d014541ab 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> + /*
> +  * This is a temporary optimization hack. By asserting the size here,
> +  * we let the compiler know that it's always going to be 20, which lets
> +  * it turn this fixed-size memcmp into a few inline instructions.
> +  *
> +  * This will need to be extended or ripped out when we learn about
> +  * hashes of different sizes.
> +  */
> + if (the_hash_algo->rawsz != 20)
> + BUG("hash size not yet supported by hashcmp");
>   return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Junio C Hamano
Derrick Stolee  writes:

> I was thinking that having a mitigation for 2.19 is best, and then we
> can focus, as part of the 2.20 cycle, on how we can properly avoid this
> cost, especially when 32 is a valid option.
> ...
> ... to
> take a 'hasheq' approach [2] like Peff suggested [3]. Since that
> requires auditing all callers of hashcmp to see if hasheq is
> appropriate, it is not a good solution for 2.19 but (in my opinion)
> should be evaluated as part of the 2.20 cycle.

Thanks for thoughtful comments.  I think it makes sense to go with
the "tell compiler hashcmp() currently is only about 20-byte array"
for now, as it is trivial to see why it cannot break things.

During the 2.20 cycle, we may find out that the hasheq() abstraction
performs better than hashcmp() with multiple lengths, and end up
doing that "audit and replace to use hasheq()".  But as you said,
it probably is not a good idea to rush it in this cycle.


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Derrick Stolee

On 8/23/2018 1:04 AM, Jeff King wrote:

On Thu, Aug 23, 2018 at 03:47:07AM +, brian m. carlson wrote:


I expect that's going to be the case as well.  I have patches that
wire up actual SHA-256 support in my hash-impl branch.

However, having said that, I'm happy to defer to whatever everyone else
thinks is best for 2.19.  The assert solution would be fine with me in
this situation, and if we need to pull it out in the future, that's okay
with me.

I don't really have a strong opinion on this either way, so if someone
else does, please say so.  I have somewhat more limited availability
over the next couple days, as I'm travelling on business, but I'm happy
to review a patch (and it seems like Peff has one minus the actual
commit message).

I just posted the patch elsewhere in the thread.


Thank you for that!


I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).
My numbers may be more dramatic because my Linux environment is a 
virtual machine.


I was thinking that having a mitigation for 2.19 is best, and then we
can focus, as part of the 2.20 cycle, on how we can properly avoid this
cost, especially when 32 is a valid option.


Around the time that my proposed approaches were getting vetoed for 
alignment issues, I figured I was out of my depth here. I reached out to 
Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of 
posts of word-based approaches to different problems, so I thought he 
might know something off the top of his head that would be applicable. 
His conclusion (after looking only a short time) was to take a 'hasheq' 
approach [2] like Peff suggested [3]. Since that requires auditing all 
callers of hashcmp to see if hasheq is appropriate, it is not a good 
solution for 2.19 but (in my opinion) should be evaluated as part of the 
2.20 cycle.


Of course, if someone with knowledge of word-alignment issues across the 
platforms we support knows how to enforce an alignment for object_id, 
then something word-based like [4] could be reconsidered.
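
For illustration only (this is not from any patch in the thread), a
word-based equality check can sidestep the alignment question entirely
by loading the bytes through memcpy, which compilers typically lower to
plain register loads on architectures that allow it:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical equality-only check for a fixed 20-byte hash. */
    static inline int hasheq20(const unsigned char *a, const unsigned char *b)
    {
        uint64_t a0, a1, b0, b1;
        uint32_t a2, b2;

        memcpy(&a0, a, 8);      memcpy(&b0, b, 8);
        memcpy(&a1, a + 8, 8);  memcpy(&b1, b + 8, 8);
        memcpy(&a2, a + 16, 4); memcpy(&b2, b + 16, 4);

        return a0 == b0 && a1 == b1 && a2 == b2;
    }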


Thanks, everyone!
-Stolee

[1] https://twitter.com/stolee/status/1032312965754748930

[2] 
https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/


[3] 
https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/


[4] 
https://public-inbox.org/git/7ea416cf-b043-1274-e161-85a8780b8...@gmail.com/


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread brian m. carlson
On Thu, Aug 23, 2018 at 01:02:25AM -0400, Jeff King wrote:
> Here's the patch. For some reason my numbers aren't quite as large as
> they were yesterday (I was very careful to keep the system unloaded
> today, whereas yesterday I was doing a few other things, so perhaps that
> is the difference).

This looks sane to me.  Thanks for writing this up.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jonathan Nieder
Jeff King wrote:

> Here's the patch. For some reason my numbers aren't quite as large as
> they were yesterday (I was very careful to keep the system unloaded
> today, whereas yesterday I was doing a few other things, so perhaps that
> is the difference).
>
> -- >8 --
> Subject: [PATCH] hashcmp: assert constant hash size
>
> Prior to 509f6f62a4 (cache: update object ID functions for
> the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
> constant size of 20 bytes. Some compilers were able to turn
> that into a few quad-word comparisons, which is faster than
> actually calling memcmp().
>
> In 509f6f62a4, we started using the_hash_algo->rawsz
> instead. Even though this will always be 20, the compiler
> doesn't know that while inlining hashcmp() and ends up just
> generating a call to memcmp().
>
> Eventually we'll have to deal with multiple hash sizes, but
> for the upcoming v2.19, we can restore some of the original
> performance by asserting on the size. That gives the
> compiler enough information to know that the memcmp will
> always be called with a length of 20, and it performs the
> same optimization.
>
> Here are numbers for p0001.2 run against linux.git on a few
> versions. This is using -O2 with gcc 8.2.0.
>
>   Test v2.18.0 v2.19.0-rc0   HEAD
>   ----------------------------------------------------------------------
>   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%
>
> You can see that v2.19 is a little slower than v2.18. This
> commit ended up slightly faster than v2.18, but there's a
> fair bit of run-to-run noise (the generated code in the two
> cases is basically the same). This patch does seem to be
> consistently 1-2% faster than v2.19.
>
> I tried changing hashcpy(), which was also touched by
> 509f6f62a4, in the same way, but couldn't measure any
> speedup. Which makes sense, at least for this workload. A
> traversal of the whole commit graph requires looking up
> every entry of every tree via lookup_object(). That's many
> multiples of the numbers of objects in the repository (most
> of the lookups just return "yes, we already saw that
> object").
>
> Reported-by: Derrick Stolee 
> Signed-off-by: Jeff King 
> ---
>  cache.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)

Reviewed-by: Jonathan Nieder 

Verified using "make object.s" that the memcmp call goes away.  Thank
you.
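
(For anyone wanting to double-check: the Makefile can emit the assembly
for a single translation unit, e.g.

    $ make object.s
    $ grep memcmp object.s

with the assertion in place, the inlined hashcmp should no longer emit
an out-of-line call; the grep is just an illustrative way to look for
one.)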


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Thu, Aug 23, 2018 at 03:47:07AM +, brian m. carlson wrote:

> I expect that's going to be the case as well.  I have patches that
> wire up actual SHA-256 support in my hash-impl branch.
> 
> However, having said that, I'm happy to defer to whatever everyone else
> thinks is best for 2.19.  The assert solution would be fine with me in
> this situation, and if we need to pull it out in the future, that's okay
> with me.
> 
> I don't really have a strong opinion on this either way, so if someone
> else does, please say so.  I have somewhat more limited availability
> over the next couple days, as I'm travelling on business, but I'm happy
> to review a patch (and it seems like Peff has one minus the actual
> commit message).

I just posted the patch elsewhere in the thread. I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Wed, Aug 22, 2018 at 07:27:56PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > FWIW, it's not 10%. The best I measured was ~4% on a very
> > hashcmp-limited operation, and I suspect even that may be highly
> > dependent on the compiler. We might be able to improve more by
> > sprinkling more asserts around, but there are 75 mentions of
> > the_hash_algo->rawsz. I wouldn't want to add an assert at each one.
> >
> > I don't mind doing one or a handful of these asserts as part of v2.19 if
> > we want to try to reclaim those few percent. But I suspect the very
> > first commit in any further hash-transition work is just going to be to
> > rip them all out.
> 
> I was thinking just hashcmp and hashcpy.
> 
> Ideally such a change would come with a performance test to help the
> person writing that very first commit.  Except we already have
> performance tests that capture this. ;-)
> 
> For further hash-transition work, I agree someone may want to revert
> this, and I don't mind such a revert appearing right away in "next".
> And it's possible that we might have to do the equivalent of manual
> template expansion to recover the performance in some
> performance-sensitive areas.  Maybe we can get the compiler to
> cooperate with us in that and maybe we can't.  That's okay with me.
> 
> Anyway, I'll resend your patch with a commit message added some time
> this evening.

Here's the patch. For some reason my numbers aren't quite as large as
they were yesterday (I was very careful to keep the system unloaded
today, whereas yesterday I was doing a few other things, so perhaps that
is the difference).

-- >8 --
Subject: [PATCH] hashcmp: assert constant hash size

Prior to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
constant size of 20 bytes. Some compilers were able to turn
that into a few quad-word comparisons, which is faster than
actually calling memcmp().

In 509f6f62a4, we started using the_hash_algo->rawsz
instead. Even though this will always be 20, the compiler
doesn't know that while inlining hashcmp() and ends up just
generating a call to memcmp().

Eventually we'll have to deal with multiple hash sizes, but
for the upcoming v2.19, we can restore some of the original
performance by asserting on the size. That gives the
compiler enough information to know that the memcmp will
always be called with a length of 20, and it performs the
same optimization.

Here are numbers for p0001.2 run against linux.git on a few
versions. This is using -O2 with gcc 8.2.0.

  Test v2.18.0 v2.19.0-rc0   HEAD
  ------------------------------------------------------------------------------
  0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

You can see that v2.19 is a little slower than v2.18. This
commit ended up slightly faster than v2.18, but there's a
fair bit of run-to-run noise (the generated code in the two
cases is basically the same). This patch does seem to be
consistently 1-2% faster than v2.19.

I tried changing hashcpy(), which was also touched by
509f6f62a4, in the same way, but couldn't measure any
speedup. Which makes sense, at least for this workload. A
traversal of the whole commit graph requires looking up
every entry of every tree via lookup_object(). That's many
multiples of the numbers of objects in the repository (most
of the lookups just return "yes, we already saw that
object").

Reported-by: Derrick Stolee 
Signed-off-by: Jeff King 
---
 cache.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/cache.h b/cache.h
index b1fd3d58ab..4d014541ab 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+   /*
+* This is a temporary optimization hack. By asserting the size here,
+* we let the compiler know that it's always going to be 20, which lets
+* it turn this fixed-size memcmp into a few inline instructions.
+*
+* This will need to be extended or ripped out when we learn about
+* hashes of different sizes.
+*/
+   if (the_hash_algo->rawsz != 20)
+   BUG("hash size not yet supported by hashcmp");
return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 
-- 
2.19.0.rc0.412.g7005db4e88



Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread brian m. carlson
On Wed, Aug 22, 2018 at 10:16:18PM -0400, Jeff King wrote:
> FWIW, it's not 10%. The best I measured was ~4% on a very
> hashcmp-limited operation, and I suspect even that may be highly
> dependent on the compiler. We might be able to improve more by
> sprinkling more asserts around, but there are 75 mentions of
> the_hash_algo->rawsz. I wouldn't want to add an assert at each one.
> 
> I don't mind doing one or a handful of these asserts as part of v2.19 if
> we want to try to reclaim those few percent. But I suspect the very
> first commit in any further hash-transition work is just going to be to
> rip them all out.

I expect that's going to be the case as well.  I have patches that
wire up actual SHA-256 support in my hash-impl branch.

However, having said that, I'm happy to defer to whatever everyone else
thinks is best for 2.19.  The assert solution would be fine with me in
this situation, and if we need to pull it out in the future, that's okay
with me.

I don't really have a strong opinion on this either way, so if someone
else does, please say so.  I have somewhat more limited availability
over the next couple days, as I'm travelling on business, but I'm happy
to review a patch (and it seems like Peff has one minus the actual
commit message).
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jonathan Nieder
Jeff King wrote:

> FWIW, it's not 10%. The best I measured was ~4% on a very
> hashcmp-limited operation, and I suspect even that may be highly
> dependent on the compiler. We might be able to improve more by
> sprinkling more asserts around, but there are 75 mentions of
> the_hash_algo->rawsz. I wouldn't want to add an assert at each one.
>
> I don't mind doing one or a handful of these asserts as part of v2.19 if
> we want to try to reclaim those few percent. But I suspect the very
> first commit in any further hash-transition work is just going to be to
> rip them all out.

I was thinking just hashcmp and hashcpy.

Ideally such a change would come with a performance test to help the
person writing that very first commit.  Except we already have
performance tests that capture this. ;-)

For further hash-transition work, I agree someone may want to revert
this, and I don't mind such a revert appearing right away in "next".
And it's possible that we might have to do the equivalent of manual
template expansion to recover the performance in some
performance-sensitive areas.  Maybe we can get the compiler to
cooperate with us in that and maybe we can't.  That's okay with me.

Anyway, I'll resend your patch with a commit message added some time
this evening.

Thanks,
Jonathan


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Wed, Aug 22, 2018 at 06:23:43PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> >> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
> 
> >>>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >>>  {
> >>> +   assert(the_hash_algo->rawsz == 20);
> >>> return memcmp(sha1, sha2, the_hash_algo->rawsz);
> >>>  }
> [...]
> >  The bigger questions are:
> >
> >   - are we OK with such an assertion; and
> >
> >   - does the assertion still give us the desired behavior when we add in
> > a branch for rawsz==32?
> >
> > And I think the answers for those are both "probably not".
> 
> At this point in the release process, I think the answer to the first
> question is a pretty clear "yes".
> 
> A ~10% increase in latency of some operations is quite significant, in
> exchange for no user benefit yet.  We can continue to try to figure
> out how to convince compilers to generate good code for this (and
> that's useful), but in the meantime we should also do the simple thing
> to avoid the regression for users.

FWIW, it's not 10%. The best I measured was ~4% on a very
hashcmp-limited operation, and I suspect even that may be highly
dependent on the compiler. We might be able to improve more by
sprinkling more asserts around, but there are 75 mentions of
the_hash_algo->rawsz. I wouldn't want to add an assert at each one.

I don't mind doing one or a handful of these asserts as part of v2.19 if
we want to try to reclaim those few percent. But I suspect the very
first commit in any further hash-transition work is just going to be to
rip them all out.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jonathan Nieder
Hi,

Jeff King wrote:
>> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:

>>>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>>>  {
>>> +   assert(the_hash_algo->rawsz == 20);
>>> return memcmp(sha1, sha2, the_hash_algo->rawsz);
>>>  }
[...]
>  The bigger questions are:
>
>   - are we OK with such an assertion; and
>
>   - does the assertion still give us the desired behavior when we add in
> a branch for rawsz==32?
>
> And I think the answers for those are both "probably not".

At this point in the release process, I think the answer to the first
question is a pretty clear "yes".

A ~10% increase in latency of some operations is quite significant, in
exchange for no user benefit yet.  We can continue to try to figure
out how to convince compilers to generate good code for this (and
that's useful), but in the meantime we should also do the simple thing
to avoid the regression for users.

Thanks,
Jonathan


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Derrick Stolee

On 8/22/2018 12:58 PM, Duy Nguyen wrote:

On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee  wrote:

On 8/22/2018 12:26 PM, Jeff King wrote:

On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:


On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:

On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:

On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:


The other thing I was going to recommend (and I'll try to test this out
myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
volatile variable, since it is being referenced through a pointer. Perhaps
storing the value locally and then casing on it would help?

I tried various sprinkling of "const" around the declarations to make it
clear that the values wouldn't change once we saw them. But I couldn't
detect any difference. At most I think that would let us hoist the "if"
out of the loop, but gcc still seems unwilling to expand the memcmp when
there are other branches.

I think if that's the thing we want to have happen, we really do need to
just write it out on that branch rather than saying "memcmp".

This reminds me of an old discussion about memcpy() vs doing explicit
compare loop with lots of performance measurements..

Ah found it. Not sure if it is still relevant in light of multiple hash support

https://public-inbox.org/git/20110427225114.ga16...@elte.hu/

Yes, that was what I meant. We actually did switch to that hand-rolled
loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
memcmp instead of open-coded loop, 2017-08-09).

Looking at that commit, I'm surprised the old logic was just a for loop, 
instead of a word-based approach, such as the following:

Might work on x86 but it breaks on CPU architectures with stricter
alignment. I don't think we have a guarantee that object_id is always
8-byte aligned.
You (and Peff) are probably correct here, which is unfortunate. I'm not
familiar with alignment constraints, but had assumed that such a
word-based approach would be best.


Thanks,
-Stolee



Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Aug 22, 2018 at 12:49:34PM -0400, Derrick Stolee wrote:
>
>> > Yes, that was what I meant. We actually did switch to that hand-rolled
>> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
>> > memcmp instead of open-coded loop, 2017-08-09).
>> 
>> Looking at that commit, I'm surprised the old logic was just a for
>> loop, instead of a word-based approach, such as the following:
>> [...]
>> +struct object_id_20 {
>> +   uint64_t data0;
>> +   uint64_t data1;
>> +   uint32_t data2;
>> +};
>> +
>>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
>> *sha2)
>>  {
>> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
>> +   if (the_hash_algo->rawsz == 20) {
>> +   struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
>> +   struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
>
> I wonder if you're potentially running afoul of alignment requirements
> here.

Yup, and I think that all was discussed in that old thread ;-)


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Wed, Aug 22, 2018 at 12:49:34PM -0400, Derrick Stolee wrote:

> > Yes, that was what I meant. We actually did switch to that hand-rolled
> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
> > memcmp instead of open-coded loop, 2017-08-09).
> 
> Looking at that commit, I'm surprised the old logic was just a for
> loop, instead of a word-based approach, such as the following:
> [...]
> +struct object_id_20 {
> +   uint64_t data0;
> +   uint64_t data1;
> +   uint32_t data2;
> +};
> +
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
> *sha2)
>  {
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   if (the_hash_algo->rawsz == 20) {
> +   struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
> +   struct object_id_20 *obj2 = (struct object_id_20 *)sha2;

I wonder if you're potentially running afoul of alignment requirements
here.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Duy Nguyen
On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee  wrote:
>
> On 8/22/2018 12:26 PM, Jeff King wrote:
> > On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:
> >
> >> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:
> >>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:
>  On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> 
> > The other thing I was going to recommend (and I'll try to test this out
> > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > volatile variable, since it is being referenced through a pointer. 
> > Perhaps
> > storing the value locally and then casing on it would help?
>  I tried various sprinkling of "const" around the declarations to make it
>  clear that the values wouldn't change once we saw them. But I couldn't
>  detect any difference. At most I think that would let us hoist the "if"
>  out of the loop, but gcc still seems unwilling to expand the memcmp when
>  there are other branches.
> 
>  I think if that's the thing we want to have happen, we really do need to
>  just write it out on that branch rather than saying "memcmp".
> >>> This reminds me of an old discussion about memcpy() vs doing explicit
> >>> compare loop with lots of performance measurements..
> >> Ah found it. Not sure if it is still relevant in light of multiple hash 
> >> support
> >>
> >> https://public-inbox.org/git/20110427225114.ga16...@elte.hu/
> > Yes, that was what I meant. We actually did switch to that hand-rolled
> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
> > memcmp instead of open-coded loop, 2017-08-09).
>
> Looking at that commit, I'm surprised the old logic was just a for loop, 
> instead of a word-based approach, such as the following:

Might work on x86 but it breaks on CPU architectures with stricter
alignment. I don't think we have a guarantee that object_id is always
8-byte aligned.

>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..5e5819ad49 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
>   extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
>   extern const struct object_id null_oid;
>
> +static inline int word_cmp_32(uint32_t a, uint32_t b)
> +{
> +   return memcmp(&a, &b, sizeof(uint32_t));
> +}
> +
> +static inline int word_cmp_64(uint64_t a, uint64_t b)
> +{
> +   return memcmp(&a, &b, sizeof(uint64_t));
> +}
> +
> +struct object_id_20 {
> +   uint64_t data0;
> +   uint64_t data1;
> +   uint32_t data2;
> +};
> +
>   static inline int hashcmp(const unsigned char *sha1, const unsigned
> char *sha2)
>   {
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   if (the_hash_algo->rawsz == 20) {
> +   struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
> +   struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
> +
> +   if (obj1->data0 == obj2->data0) {
> +   if (obj1->data1 == obj2->data1) {
> +   if (obj1->data2 == obj2->data2) {
> +   return 0;
> +   }
> +   return word_cmp_32(obj1->data2, obj2->data2);
> +   }
> +   return word_cmp_64(obj1->data1, obj2->data1);
> +   }
> +   return word_cmp_64(obj1->data0, obj2->data0);
> +   }
> +
> +   assert(0);
>   }
>
>   static inline int oidcmp(const struct object_id *oid1, const struct
> object_id *oid2)
>
>


-- 
Duy
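
For what it's worth, the usual way around that alignment trap is to load
the words through memcpy instead of casting the pointer. A minimal sketch,
not from this thread (hasheq20 is a hypothetical name), assuming the
caller only needs equality:

#include <stdint.h>
#include <string.h>

/*
 * Alignment-safe 20-byte equality check. Loading through memcpy avoids
 * the undefined behavior of dereferencing a misaligned uint64_t *;
 * compilers typically lower each memcpy to a single load on targets
 * where unaligned access is cheap.
 */
static inline int hasheq20(const unsigned char *a, const unsigned char *b)
{
	uint64_t a0, a1, b0, b1;
	uint32_t a2, b2;

	memcpy(&a0, a, 8);
	memcpy(&b0, b, 8);
	memcpy(&a1, a + 8, 8);
	memcpy(&b1, b + 8, 8);
	memcpy(&a2, a + 16, 4);
	memcpy(&b2, b + 16, 4);

	return a0 == b0 && a1 == b1 && a2 == b2;
}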


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Derrick Stolee

On 8/22/2018 12:26 PM, Jeff King wrote:

On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:


On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:

On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:

On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:


The other thing I was going to recommend (and I'll try to test this out
myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
volatile variable, since it is being referenced through a pointer. Perhaps
storing the value locally and then casing on it would help?

I tried various sprinkling of "const" around the declarations to make it
clear that the values wouldn't change once we saw them. But I couldn't
detect any difference. At most I think that would let us hoist the "if"
out of the loop, but gcc still seems unwilling to expand the memcmp when
there are other branches.

I think if that's the thing we want to have happen, we really do need to
just write it out on that branch rather than saying "memcmp".

This reminds me of an old discussion about memcmp() vs doing an explicit
compare loop with lots of performance measurements..

Ah found it. Not sure if it is still relevant in light of multiple hash support

https://public-inbox.org/git/20110427225114.ga16...@elte.hu/

Yes, that was what I meant. We actually did switch to that hand-rolled
loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
memcmp instead of open-coded loop, 2017-08-09).


Looking at that commit, I'm surprised the old logic was just a for loop, 
instead of a word-based approach, such as the following:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..5e5819ad49 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const 
struct object_id *oid, int len)

 extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
 extern const struct object_id null_oid;

+static inline int word_cmp_32(uint32_t a, uint32_t b)
+{
+   return memcmp(&a, &b, sizeof(uint32_t));
+}
+
+static inline int word_cmp_64(uint64_t a, uint64_t b)
+{
+   return memcmp(&a, &b, sizeof(uint64_t));
+}
+
+struct object_id_20 {
+   uint64_t data0;
+   uint64_t data1;
+   uint32_t data2;
+};
+
 static inline int hashcmp(const unsigned char *sha1, const unsigned 
char *sha2)

 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == 20) {
+   struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
+   struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
+
+   if (obj1->data0 == obj2->data0) {
+   if (obj1->data1 == obj2->data1) {
+   if (obj1->data2 == obj2->data2) {
+   return 0;
+   }
+   return word_cmp_32(obj1->data2, obj2->data2);

+   }
+   return word_cmp_64(obj1->data1, obj2->data1);
+   }
+   return word_cmp_64(obj1->data0, obj2->data0);
+   }
+
+   assert(0);
 }

 static inline int oidcmp(const struct object_id *oid1, const struct 
object_id *oid2)





Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:

> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:
> >
> > On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:
> > >
> > > On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> > >
> > > > The other thing I was going to recommend (and I'll try to test this out
> > > > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > > > volatile variable, since it is being referenced through a pointer. 
> > > > Perhaps
> > > > storing the value locally and then casing on it would help?
> > >
> > > I tried various sprinkling of "const" around the declarations to make it
> > > clear that the values wouldn't change once we saw them. But I couldn't
> > > detect any difference. At most I think that would let us hoist the "if"
> > > out of the loop, but gcc still seems unwilling to expand the memcmp when
> > > there are other branches.
> > >
> > > I think if that's the thing we want to have happen, we really do need to
> > > just write it out on that branch rather than saying "memcmp".
> >
> > This reminds me of an old discussion about memcmp() vs doing an explicit
> > compare loop with lots of performance measurements..
> 
> Ah found it. Not sure if it is still relevant in light of multiple hash 
> support
> 
> https://public-inbox.org/git/20110427225114.ga16...@elte.hu/

Yes, that was what I meant. We actually did switch to that hand-rolled
loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
memcmp instead of open-coded loop, 2017-08-09).

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Duy Nguyen
On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:
>
> On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:
> >
> > On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> >
> > > The other thing I was going to recommend (and I'll try to test this out
> > > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > > volatile variable, since it is being referenced through a pointer. Perhaps
> > > storing the value locally and then casing on it would help?
> >
> > I tried various sprinkling of "const" around the declarations to make it
> > clear that the values wouldn't change once we saw them. But I couldn't
> > detect any difference. At most I think that would let us hoist the "if"
> > out of the loop, but gcc still seems unwilling to expand the memcmp when
> > there are other branches.
> >
> > I think if that's the thing we want to have happen, we really do need to
> > just write it out on that branch rather than saying "memcmp".
>
> This reminds me of an old discussion about memcmp() vs doing an explicit
> compare loop with lots of performance measurements..

Ah found it. Not sure if it is still relevant in light of multiple hash support

https://public-inbox.org/git/20110427225114.ga16...@elte.hu/
-- 
Duy


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Duy Nguyen
On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:
>
> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>
> > The other thing I was going to recommend (and I'll try to test this out
> > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > volatile variable, since it is being referenced through a pointer. Perhaps
> > storing the value locally and then casing on it would help?
>
> I tried various sprinkling of "const" around the declarations to make it
> clear that the values wouldn't change once we saw them. But I couldn't
> detect any difference. At most I think that would let us hoist the "if"
> out of the loop, but gcc still seems unwilling to expand the memcmp when
> there are other branches.
>
> I think if that's the thing we want to have happen, we really do need to
> just write it out on that branch rather than saying "memcmp".

This reminds me of an old discussion about memcmp() vs doing an explicit
compare loop with lots of performance measurements.. Is that what you
meant by "write it out"?
-- 
Duy
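
For context, the open-coded loop that 0b006014c8 removed looked
essentially like this (a from-memory sketch, written with a literal 20
rather than the macro the real code used):

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	int i;

	/* byte-at-a-time compare, preserving the neg/0/pos contract */
	for (i = 0; i < 20; i++, sha1++, sha2++) {
		if (*sha1 != *sha2)
			return *sha1 - *sha2;
	}
	return 0;
}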


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Wed, Aug 22, 2018 at 10:28:56AM -0400, Derrick Stolee wrote:

> In my testing, I've had the best luck with this change:
> 
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..6c8b51c390 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,14 @@ extern const struct object_id null_oid;
> 
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
> *sha2)
>  {
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   switch (the_hash_algo->rawsz) {
> +   case 20:
> +   return memcmp(sha1, sha2, 20);
> +   case 32:
> +   return memcmp(sha1, sha2, 32);
> +   default:
> +   assert(0);
> +   }
>  }
> 
> The fact that '20' and '32' are constants here may be helpful to the
> compiler. Can someone else test the perf?

I tested that one last night (and just re-tested it now to be sure). It
seems to just generate two separate calls to memcmp, with no speed
improvement.

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Wed, Aug 22, 2018 at 08:42:20AM -0400, Paul Smith wrote:

> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned
> > char *sha2)
> >  {
> > +   assert(the_hash_algo->rawsz == 20);
> > return memcmp(sha1, sha2, the_hash_algo->rawsz);
> >  }
> 
> I'm not familiar with Git code, but for most environments assert() is a
> macro which is compiled out when built for "release mode" (whatever
> that might mean).  If that's the case for Git too, then relying on
> assert() to provide a side-effect (even an optimizer hint side-effect)
> won't work and this will actually get slower when built for "release
> mode".
> 
> Just a thought...

We don't have such a "release mode" in Git, though of course people may
pass -DNDEBUG to the compiler if they want.

However, to me how we spell the assert is mostly orthogonal to the
discussion. We can do "if (...) BUG(...)" to get a guaranteed-present
conditional. The bigger questions are:

  - are we OK with such an assertion; and

  - does the assertion still give us the desired behavior when we add in
a branch for rawsz==32?

And I think the answers for those are both "probably not".

-Peff
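
Spelled with the guaranteed-present conditional, that first option would
look something like the sketch below (drop-in for cache.h; whether the
compiler still expands the constant-size memcmp after adding more
branches is exactly the open question above):

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	if (the_hash_algo->rawsz != 20)
		BUG("hashcmp: unsupported hash size %d",
		    (int)the_hash_algo->rawsz);
	return memcmp(sha1, sha2, 20);
}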


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:

> The other thing I was going to recommend (and I'll try to test this out
> myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> volatile variable, since it is being referenced through a pointer. Perhaps
> storing the value locally and then casing on it would help?

I tried various sprinkling of "const" around the declarations to make it
clear that the values wouldn't change once we saw them. But I couldn't
detect any difference. At most I think that would let us hoist the "if"
out of the loop, but gcc still seems unwilling to expand the memcmp when
there are other branches.

I think if that's the thing we want to have happen, we really do need to
just write it out on that branch rather than saying "memcmp".

-Peff


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jeff King
On Wed, Aug 22, 2018 at 09:39:57AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I don't have a good option. The assert() thing works until I add in the
> > "32" branch, but that's just punting the issue off until you add support
> > for the new hash.
> >
> > Hand-rolling our own asm or C is a portability headache, and we need to
> > change all of the callsites to use a new hasheq().
> >
> > Hiding it behind a per-hash function is conceptually cleanest, but not
> > quite as fast. And it also requires hasheq().
> >
> > So all of the solutions seem non-trivial.  Again, I'm starting to wonder
> > if it's worth chasing this few percent.
> 
> Did you try __builtin_expect? It's a GCC builtin for these sorts of
> situations, and sometimes helps:
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> 
> I.e. you'd tell GCC we expect to have the 20 there with:
> 
> if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }
> 
> The perl codebase has LIKELY() and UNLIKELY() macros for this which if
> the feature isn't available fall back on just plain C code:
> https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344

Sadly, no, this doesn't seem to change anything. We still end up with a
single call to memcmp.

I also tried "hiding" the fallback call like this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..7808bf3d6b 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,9 +1021,13 @@ extern int find_unique_abbrev_r(char *hex, const struct 
object_id *oid, int len)
 extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
 extern const struct object_id null_oid;
 
+int super_secret_memcmp(const void *a, const void *b, size_t len);
+
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == 20)
+   return memcmp(sha1, sha2, 20);
+   return super_secret_memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..5cd0a4b73f 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2280,3 +2280,8 @@ int read_loose_object(const char *path,
munmap(map, mapsize);
return ret;
 }
+
+int super_secret_memcmp(const void *a, const void *b, size_t len)
+{
+   return memcmp(a, b, len);
+}

but that just results in calling memcmp and super_secret_memcmp on the
two codepaths (with or without the __builtin_expect).

-Peff
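
For reference, the perl-style macros translate to C along these lines (a
sketch only; as measured above, the hint changed nothing here):

#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	/* hint that the 20-byte (SHA-1) case dominates */
	if (LIKELY(the_hash_algo->rawsz == 20))
		return memcmp(sha1, sha2, 20);
	return memcmp(sha1, sha2, the_hash_algo->rawsz);
}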


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Derrick Stolee

On 8/22/2018 1:36 AM, brian m. carlson wrote:

On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:

So I wonder if there's some other way to tell the compiler that we'll
only have a few values. An enum comes to mind, though I don't think the
enum rules are strict enough to make this guarantee (after all, it's OK
to bitwise-OR enums, so they clearly don't specify all possible values).

I was thinking about this:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..1f5c6e9319 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
  
  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)

  {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   return memcmp(sha1, sha2, 20);
+   case 32:
+   return memcmp(sha1, sha2, 32);
+   default:
+   assert(0);
+   }
  }
  
  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)


That would make it obvious that there are at most two options.
Unfortunately, gcc for me determines that the buffer in walker.c is 20
bytes in size and steadfastly refuses to compile because it doesn't know
that the value will never be 32 in our codebase currently.  I'd need to
send in more patches before it would compile.

I don't know if something like this is an improvement or not, but this
seems to at least compile:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..3207f74771 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
  
  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)

  {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   case 32:
+   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   default:
+   assert(0);
+   }
  }

In my testing, I've had the best luck with this change:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..6c8b51c390 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,14 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned 
char *sha2)

 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   return memcmp(sha1, sha2, 20);
+   case 32:
+   return memcmp(sha1, sha2, 32);
+   default:
+   assert(0);
+   }
 }

The fact that '20' and '32' are constants here may be helpful to the 
compiler. Can someone else test the perf?


Thanks,
-Stolee
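
One quick way to check, without running the perf suite, is to look at the
generated code directly. Assuming the variant under test sits in a small
hashcmp.c (a hypothetical file name), something like:

  gcc -O2 -S -o hashcmp.s hashcmp.c
  grep -n memcmp hashcmp.s

will show whether a "call memcmp" survives, which means the compiler did
not expand it inline.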


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Paul Smith
On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
>  static inline int hashcmp(const unsigned char *sha1, const unsigned
> char *sha2)
>  {
> +   assert(the_hash_algo->rawsz == 20);
> return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }

I'm not familiar with Git code, but for most environments assert() is a
macro which is compiled out when built for "release mode" (whatever
that might mean).  If that's the case for Git too, then relying on
assert() to provide a side-effect (even an optimizer hint side-effect)
won't work and this will actually get slower when built for "release
mode".

Just a thought...
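
Concretely, <assert.h> is required to behave roughly as below, so a
-DNDEBUG build would silently drop the hint (my_assert is just an
illustrative stand-in):

#include <stdio.h>
#include <stdlib.h>

/* a sketch of what <assert.h> effectively does */
#ifdef NDEBUG
#define my_assert(expr) ((void)0)
#else
#define my_assert(expr) \
	((expr) ? (void)0 \
		: (fprintf(stderr, "Assertion failed: %s\n", #expr), abort()))
#endif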


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Derrick Stolee



On 8/22/2018 3:39 AM, Ævar Arnfjörð Bjarmason wrote:

On Wed, Aug 22, 2018 at 8:20 AM Jeff King  wrote:

On Wed, Aug 22, 2018 at 05:36:26AM +, brian m. carlson wrote:


On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
I don't know if something like this is an improvement or not, but this
seems to at least compile:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..3207f74771 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;

  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
*sha2)
  {
- return memcmp(sha1, sha2, the_hash_algo->rawsz);
+ switch (the_hash_algo->rawsz) {
+ case 20:
+ case 32:
+ return memcmp(sha1, sha2, the_hash_algo->rawsz);
+ default:
+ assert(0);
+ }

I think that would end up with the same slow code, as gcc would rather
call memcmp than expand out the two sets of asm.


I won't have time to sit down and test this out until tomorrow afternoon
at the earliest.  If you want to send in something in the mean time,
even if that limits things to just 20 for now, that's fine.

I don't have a good option. The assert() thing works until I add in the
"32" branch, but that's just punting the issue off until you add support
for the new hash.

Hand-rolling our own asm or C is a portability headache, and we need to
change all of the callsites to use a new hasheq().

Hiding it behind a per-hash function is conceptually cleanest, but not
quite as fast. And it also requires hasheq().

So all of the solutions seem non-trivial.  Again, I'm starting to wonder
if it's worth chasing this few percent.

Did you try __builtin_expect? It's a GCC builtin for these sorts of
situations, and sometimes helps:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

I.e. you'd tell GCC we expect to have the 20 there with:

 if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }

The perl codebase has LIKELY() and UNLIKELY() macros for this which if
the feature isn't available fall back on just plain C code:
https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344
The other thing I was going to recommend (and I'll try to test this out 
myself later) is to see if 'the_hash_algo->rawsz' is being treated as a 
volatile variable, since it is being referenced through a pointer. 
Perhaps storing the value locally and then casing on it would help?


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Derrick Stolee

On 8/21/2018 11:36 PM, Jeff King wrote:

On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:


with the obvious "oideq()" implementation added, that seems to get me to
2-3%. Not _quite_ as good as the original branching version I showed.
And we had to touch all the callsites (although arguably that kind of
"eq" function is a better interface anyway, since it obviously allows
for more optimization).

So maybe the branching thing is actually not so insane. It makes new
hash_algo's Just Work; they just won't be optimized. And the change is
very localized.

Hmph. So I went back to double-check my measurements on that branching
version, and I couldn't replicate it!

I'm actually relieved to see this, as I couldn't either.


It turns out what I showed (and measured) before has a bug. Can you see
it?
I had rewritten the section from scratch instead of applying your diff, 
so I didn't get the sha1-sha1 error. I decided to sleep on it instead of 
sending my email.



So the assert() version really is the fastest. I didn't test, but I
suspect we could "trick" the compiler by having the fallback call an
opaque wrapper around memcmp(). That would prevent it from combining the
two paths, and presumably it would still optimize the constant-20 side.
Or maybe it would eventually decide our inline function is getting too
big and scrap it. Which probably crosses a line of craziness (if I
didn't already cross it two emails ago).

I appreciate your effort here.

Thanks
-Stolee


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Ævar Arnfjörð Bjarmason
On Wed, Aug 22, 2018 at 8:20 AM Jeff King  wrote:
>
> On Wed, Aug 22, 2018 at 05:36:26AM +, brian m. carlson wrote:
>
> > On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > > So I wonder if there's some other way to tell the compiler that we'll
> > > only have a few values. An enum comes to mind, though I don't think the
> > > enum rules are strict enough to make this guarantee (after all, it's OK
> > > to bitwise-OR enums, so they clearly don't specify all possible values).
> >
> > I was thinking about this:
> >
> > diff --git a/cache.h b/cache.h
> > index 1398b2a4e4..1f5c6e9319 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> > *sha2)
> >  {
> > - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > + switch (the_hash_algo->rawsz) {
> > + case 20:
> > + return memcmp(sha1, sha2, 20);
> > + case 32:
> > + return memcmp(sha1, sha2, 32);
> > + default:
> > + assert(0);
> > + }
> >  }
>
> Unfortunately this version doesn't seem to be any faster than the status
> quo. And looking at the generated asm, it still looks to be calling
> memcmp(). Removing the "case 32" branch switches it back to fast
> assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
> guessing what the optimizer is going to do, and there's a good chance
> that other versions are going to optimize it differently.
>
> We might be better off just writing it out manually. Unfortunately, it's
> a bit hard because the neg/0/pos return is more expensive to compute
> than pure equality. And only the compiler knows at each inlined site
> whether we actually want equality. So now we're back to switching every
> caller to use hasheq() if that's what they want.
>
> But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
> portability, then this seems as fast as the original (memcmp+constant)
> code on my machine:
>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..c406105f3c 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   switch (the_hash_algo->rawsz) {
> +   case 20:
> +   if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
> +   *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 *)(sha2+4))
> +   return 0;
> +   case 32:
> +   return memcmp(sha1, sha2, 32);
> +   default:
> +   assert(0);
> +   }
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct 
> object_id *oid2)
>
> Which is really no surprise, because the generated asm looks about the
> same. There are obviously alignment questions there. It's possible it
> could even be written portably as a simple loop. Or maybe not. We used
> to do that, but modern compilers were able to optimize the memcmp
> better. Maybe that's changed. Or maybe they were simply unwilling to
> unroll a 20-length loop to find out that it could be turned into a few
> quad-word compares.
>
> > That would make it obvious that there are at most two options.
> > Unfortunately, gcc for me determines that the buffer in walker.c is 20
> > bytes in size and steadfastly refuses to compile because it doesn't know
> > that the value will never be 32 in our codebase currently.  I'd need to
> > send in more patches before it would compile.
>
> Yeah, I see that warning all over the place (everywhere that calls
> is_null_oid(), which is passing in a 20-byte buffer).
>
> > I don't know if something like this is an improvement or not, but this
> > seems to at least compile:
> >
> > diff --git a/cache.h b/cache.h
> > index 1398b2a4e4..3207f74771 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> > *sha2)
> >  {
> > - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > + switch (the_hash_algo->rawsz) {
> > + case 20:
> > + case 32:
> > + return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > + default:
> > + assert(0);
> > + }
>
> I think that would end up with the same slow code, as gcc would rather
> call memcmp than expand out the two sets of asm.
>
> > I won't have time to sit down and test this out until tomorrow afternoon
> > at the earliest.  If you want to send in something in the mean time,
> > even if that limits things to just 20 for now, that's fine.
>
> I don't have a good option. The assert() thing works until I add in the
> "32" branch, but that's just punting the issue off until 

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Jeff King
On Wed, Aug 22, 2018 at 05:36:26AM +, brian m. carlson wrote:

> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > So I wonder if there's some other way to tell the compiler that we'll
> > only have a few values. An enum comes to mind, though I don't think the
> > enum rules are strict enough to make this guarantee (after all, it's OK
> > to bitwise-OR enums, so they clearly don't specify all possible values).
> 
> I was thinking about this:
> 
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..1f5c6e9319 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> + switch (the_hash_algo->rawsz) {
> + case 20:
> + return memcmp(sha1, sha2, 20);
> + case 32:
> + return memcmp(sha1, sha2, 32);
> + default:
> + assert(0);
> + }
>  }

Unfortunately this version doesn't seem to be any faster than the status
quo. And looking at the generated asm, it still looks to be calling
memcmp(). Removing the "case 32" branch switches it back to fast
assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
guessing what the optimizer is going to do, and there's a good chance
that other versions are going to optimize it differently.

We might be better off just writing it out manually. Unfortunately, it's
a bit hard because the neg/0/pos return is more expensive to compute
than pure equality. And only the compiler knows at each inlined site
whether we actually want equality. So now we're back to switching every
caller to use hasheq() if that's what they want.

But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
portability, then this seems as fast as the original (memcmp+constant)
code on my machine:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..c406105f3c 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
+   *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 *)(sha2+4))
+   return 0;
+   case 32:
+   return memcmp(sha1, sha2, 32);
+   default:
+   assert(0);
+   }
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)

Which is really no surprise, because the generated asm looks about the
same. There are obviously alignment questions there. It's possible it
could even be written portably as a simple loop. Or maybe not. We used
to do that, but modern compilers were able to optimize the memcmp
better. Maybe that's changed. Or maybe they were simply unwilling to
unroll a 20-length loop to find out that it could be turned into a few
quad-word compares.

> That would make it obvious that there are at most two options.
> Unfortunately, gcc for me determines that the buffer in walker.c is 20
> bytes in size and steadfastly refuses to compile because it doesn't know
> that the value will never be 32 in our codebase currently.  I'd need to
> send in more patches before it would compile.

Yeah, I see that warning all over the place (everywhere that calls
is_null_oid(), which is passing in a 20-byte buffer).

> I don't know if something like this is an improvement or not, but this
> seems to at least compile:
> 
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..3207f74771 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> + switch (the_hash_algo->rawsz) {
> + case 20:
> + case 32:
> + return memcmp(sha1, sha2, the_hash_algo->rawsz);
> + default:
> + assert(0);
> + }

I think that would end up with the same slow code, as gcc would rather
call memcmp than expand out the two sets of asm.

> I won't have time to sit down and test this out until tomorrow afternoon
> at the earliest.  If you want to send in something in the mean time,
> even if that limits things to just 20 for now, that's fine.

I don't have a good option. The assert() thing works until I add in the
"32" branch, but that's just punting the issue off until you add support
for the new hash.

Hand-rolling our own asm or C is a portability headache, and we need to
change all of the callsites to use a new hasheq().

Hiding it behind a per-hash function is conceptually cleanest, but not
quite as fast. And it also requires hasheq().

So all of the solutions seem non-trivial.  Again, I'm starting to wonder
if it's worth chasing this few percent.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread brian m. carlson
On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> So I wonder if there's some other way to tell the compiler that we'll
> only have a few values. An enum comes to mind, though I don't think the
> enum rules are strict enough to make this guarantee (after all, it's OK
> to bitwise-OR enums, so they clearly don't specify all possible values).

I was thinking about this:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..1f5c6e9319 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   return memcmp(sha1, sha2, 20);
+   case 32:
+   return memcmp(sha1, sha2, 32);
+   default:
+   assert(0);
+   }
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)

That would make it obvious that there are at most two options.
Unfortunately, gcc for me determines that the buffer in walker.c is 20
bytes in size and steadfastly refuses to compile because it doesn't know
that the value will never be 32 in our codebase currently.  I'd need to
send in more patches before it would compile.

I don't know if something like this is an improvement or not, but this
seems to at least compile:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..3207f74771 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   case 32:
+   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   default:
+   assert(0);
+   }
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)

I won't have time to sit down and test this out until tomorrow afternoon
at the earliest.  If you want to send in something in the mean time,
even if that limits things to just 20 for now, that's fine.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:

> with the obvious "oideq()" implementation added, that seems to get me to
> 2-3%. Not _quite_ as good as the original branching version I showed.
> And we had to touch all the callsites (although arguably that kind of
> "eq" function is a better interface anyway, since it obviously allows
> for more optimization).
> 
> So maybe the branching thing is actually not so insane. It makes new
> hash_algo's Just Work; they just won't be optimized. And the change is
> very localized.

Hmph. So I went back to double-check my measurements on that branching
version, and I couldn't replicate it!

It turns out what I showed (and measured) before has a bug. Can you see
it?

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == 20)
+   return memcmp(sha1, sha2, 20);
+   else
+   return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)


The problem is the fallback code compares "sha1" to "sha1". The compiler
realizes that's a noop and is able to treat it like a constant. Thus
essentially leaving only the first branch, which it then expands into a
few instructions.

If we fix that bug, then we really do memcmp on either side of the
conditional. And the compiler is smart enough to realize that hey,
that's the same as just calling memcmp with the_hash_algo->rawsz on
either side. And we end up with roughly the same code that we started
with.

So the assert() version really is the fastest. I didn't test, but I
suspect we could "trick" the compiler by having the fallback call an
opaque wrapper around memcmp(). That would prevent it from combining the
two paths, and presumably it would still optimize the constant-20 side.
Or maybe it would eventually decide our inline function is getting too
big and scrap it. Which probably crosses a line of craziness (if I
didn't already cross it two emails ago).

-Peff
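
For reference, the fixed version, the one that loses the speedup again,
reads:

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	if (the_hash_algo->rawsz == 20)
		return memcmp(sha1, sha2, 20);
	else
		return memcmp(sha1, sha2, the_hash_algo->rawsz);
}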


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Jeff King
On Wed, Aug 22, 2018 at 12:48:16AM +, brian m. carlson wrote:

> > diff --git a/cache.h b/cache.h
> > index b1fd3d58ab..9c004a26c9 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
> >  
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> > *sha2)
> >  {
> > -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +   if (the_hash_algo->rawsz == 20)
> > +   return memcmp(sha1, sha2, 20);
> > +   else
> > +   return memcmp(sha1, sha1, the_hash_algo->rawsz);
> >  }
> >  
> >  static inline int oidcmp(const struct object_id *oid1, const struct 
> > object_id *oid2)
> > on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> > be imagining it, as there's a bit of noise). A function pointer in
> > the_hash_algo might make even more sense.
> 
> It's possible that might be a better solution.  I looked into a GCC
> assertion that the value was either 20 or 32, and that in itself didn't
> seem to help, at least in the generated code.  Your solution is likely
> better in that regard.
> 
> We could wire it up to be either 20 or 32 and let people experimenting
> with other sizes of algorithms just add another branch.  I haven't
> tested how that performs, though.

Here's a _really_ dirty one:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..a6750524ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,7 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+   assert(the_hash_algo->rawsz == 20);
return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 

We probably don't want to do that, because it makes experimenting with
new hash algos a bit painful, but it gives the same 3-4% speedup pretty
consistently. But I think it demonstrates pretty clearly that giving the
compiler the extra limit information is sufficient. Presumably the
fixed-size memcmp turns into a few multi-word compares.

And indeed, if I look at the generated asm for the call in lookup_object
(which is likely the one we're hitting a lot in this case), I see:

  # cache.h:1027: return memcmp(sha1, sha2, the_hash_algo->rawsz);
  .loc 4 1027 9 is_stmt 0 view .LVU86
  movq    (%rsi), %rcx    # MEM[(void *)sha1_25(D)], MEM[(void *)sha1_25(D)]
  movq    8(%rsi), %rdi   # MEM[(void *)sha1_25(D)], tmp125
  xorq    4(%rax), %rcx   # MEM[(void *)_6], tmp116
  xorq    8(%r8), %rdi    # MEM[(void *)_6], tmp115
  orq     %rcx, %rdi      # tmp116, tmp115
  jne     .L27            #,
  movl    16(%r8), %ecx   # MEM[(void *)_6], tmp129
  cmpl    %ecx, 16(%rsi)  # tmp129, MEM[(void *)sha1_25(D)]
  jne     .L27            #,

So I wonder if there's some other way to tell the compiler that we'll
only have a few values. An enum comes to mind, though I don't think the
enum rules are strict enough to make this guarantee (after all, it's OK
to bitwise-OR enums, so they clearly don't specify all possible values).

Having a dedicate hashcmp function for each hash_algo seems like the
sanest approach. We pay for one indirect function call, but the function
itself will have the constants available. But it does introduce one
extra complication. We're benefiting here from knowing that the size is
always 20, but also the inline hashcmp knows that we only care about
equality, not comparison.

So if I do this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..da56da7be2 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,7 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   return the_hash_algo->cmp_fn(sha1, sha2);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
diff --git a/hash.h b/hash.h
index 7c8238bc2e..ac22ba63b6 100644
--- a/hash.h
+++ b/hash.h
@@ -64,6 +64,7 @@ typedef union git_hash_ctx git_hash_ctx;
 typedef void (*git_hash_init_fn)(git_hash_ctx *ctx);
 typedef void (*git_hash_update_fn)(git_hash_ctx *ctx, const void *in, size_t 
len);
 typedef void (*git_hash_final_fn)(unsigned char *hash, git_hash_ctx *ctx);
+typedef int (*git_hash_cmp_fn)(const void *a, const void *b);
 
 struct git_hash_algo {
/*
@@ -90,6 +91,8 @@ struct git_hash_algo {
/* The hash finalization function. */
git_hash_final_fn final_fn;
 
+   git_hash_cmp_fn cmp_fn;
+
/* The OID of the empty tree. */
const struct object_id *empty_tree;
 
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..7072e360d7 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -69,6 +69,11 @@ static void git_hash_sha1_final(unsigned char *hash, 
git_hash_ctx *ctx)
git_SHA1_Final(hash, &ctx->sha1);
 }
 
+static int git_hash_sha1_cmp(const void *a, const void *b)
+{
+   return memcmp(a, b, 20);
+}
+
 static void git_hash
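
Taking that cmp_fn pattern one step further, an equality-only variant
could be wired the same way. A sketch, not code from the thread (eq_fn,
git_hash_sha1_eq and hasheq are hypothetical names):

typedef int (*git_hash_eq_fn)(const void *a, const void *b);

/* per-algorithm equality with a compile-time-constant size */
static int git_hash_sha1_eq(const void *a, const void *b)
{
	return !memcmp(a, b, 20);
}

/* callers that only care about equality pay one indirect call */
static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
{
	return the_hash_algo->eq_fn(sha1, sha2);
}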

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread brian m. carlson
On Tue, Aug 21, 2018 at 05:29:24PM -0400, Jeff King wrote:
> 0001.2: rev-list --all --objects  37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%
> 
> Less change, but my overall times were smaller, too, so clearly our
> hardware or exact repos are a little bit different. Those numbers seem
> pretty consistent in further runs.
> 
> It bisects to 509f6f62a4 (cache: update object ID functions for
> the_hash_algo, 2018-07-16). Which make sense. An "--objects" traversal
> spends a huge amount of time checking each tree entry to see if we've
> processed that object yet, which ends up as hashcmp() in the hash table.
> I expect that a fixed 20-byte memcmp() can be optimized a lot more than
> one with an arbitrary value.
> 
> Even if _we_ know the value can only take on one of a few values, I
> don't know that we have an easy way to tell the compiler that. Possibly
> we could improve things by jumping directly to an optimized code path.
> Sort of a poor-man's JIT. ;)
> 
> Doing this:
> 
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..9c004a26c9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> + if (the_hash_algo->rawsz == 20)
> + return memcmp(sha1, sha2, 20);
> + else
> + return memcmp(sha1, sha1, the_hash_algo->rawsz);
>  }
>  
>  static inline int oidcmp(const struct object_id *oid1, const struct 
> object_id *oid2)
> on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> be imagining it, as there's a bit of noise). A function pointer in
> the_hash_algo might make even more sense.

It's possible that might be a better solution.  I looked into a GCC
assertion that the value was either 20 or 32, and that in itself didn't
seem to help, at least in the generated code.  Your solution is likely
better in that regard.

We could wire it up to be either 20 or 32 and let people experimenting
with other sizes of algorithms just add another branch.  I haven't
tested how that performs, though.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 04:41:02PM -0400, Derrick Stolee wrote:

> On 8/20/2018 6:13 PM, Junio C Hamano wrote:
> > An early preview release Git v2.19.0-rc0 is now available for
> > testing at the usual places.
> 
> As part of testing the release candidate, I ran the performance suite
> against a fresh clone of the Linux repository using v2.18.0 and v2.19.0-rc0
> (also: GIT_PERF_REPEAT_COUNT=10).

Wow, you're a glutton for punishment. :)

> I found a few nice improvements, but I
> also found a possible regression in tree walking. I say "tree walking"
> because it was revealed using p0001-rev-list.sh, but only with the
> "--objects" flag. I also saw some similar numbers on 'git log --raw'.
> 
> Test v2.18.0 v2.19.0-rc0
> 
> 0001.1: rev-list --all 6.69(6.33+0.35) 6.52(6.20+0.31) -2.5%
> 0001.2: rev-list --all --objects 52.14(47.43+1.02)   57.15(51.09+1.18) +9.6%
> 
> To me, 9.6% seems out of the range of just noise for this length of a
> command, but I could be wrong. Could anyone else try to repro these results?

I got:

0001.2: rev-list --all --objects  37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%

Less change, but my overall times were smaller, too, so clearly our
hardware or exact repos are a little bit different. Those numbers seem
pretty consistent in further runs.

It bisects to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16). Which makes sense. An "--objects" traversal
spends a huge amount of time checking each tree entry to see if we've
processed that object yet, which ends up as hashcmp() in the hash table.
I expect that a fixed 20-byte memcmp() can be optimized a lot more than
one with an arbitrary value.

Even if _we_ know the value can only take on one of a few values, I
don't know that we have an easy way to tell the compiler that. Possibly
we could improve things by jumping directly to an optimized code path.
Sort of a poor-man's JIT. ;)

Doing this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == 20)
+   return memcmp(sha1, sha2, 20);
+   else
+   return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
be imagining it, as there's a bit of noise). A function pointer in
the_hash_algo might make even more sense.

-Peff
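
For anyone wanting to reproduce the bisection, a timing-based "git bisect
run" along these lines works (a sketch; the repo path and the 55-second
cutoff are placeholders for whatever separates good from bad on your
machine):

  git bisect start v2.19.0-rc0 v2.18.0
  git bisect run sh -c '
	make -j8 >/dev/null 2>&1 || exit 125   # skip commits that fail to build
	start=$(date +%s)
	./git -C /path/to/linux rev-list --all --objects >/dev/null || exit 125
	test $(( $(date +%s) - start )) -lt 55 # exit 1 (bad) if it ran slow
  '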


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-21 Thread Derrick Stolee

On 8/20/2018 6:13 PM, Junio C Hamano wrote:

An early preview release Git v2.19.0-rc0 is now available for
testing at the usual places.


As part of testing the release candidate, I ran the performance suite 
against a fresh clone of the Linux repository using v2.18.0 and 
v2.19.0-rc0 (also: GIT_PERF_REPEAT_COUNT=10). I found a few nice 
improvements, but I also found a possible regression in tree walking. I 
say "tree walking" because it was revealed using p0001-rev-list.sh, but 
only with the "--objects" flag. I also saw some similar numbers on 'git 
log --raw'.


Test v2.18.0 v2.19.0-rc0

0001.1: rev-list --all 6.69(6.33+0.35) 6.52(6.20+0.31) -2.5%
0001.2: rev-list --all --objects 52.14(47.43+1.02)   57.15(51.09+1.18) +9.6%

To me, 9.6% seems out of the range of just noise for this length of a 
command, but I could be wrong. Could anyone else try to repro these results?


(This may also not just be tree-walking, but general pack-file loading 
and decompression, since I computed and stored a commit-graph file. 
Hence, commits are not being parsed from the pack-file by either command.)


Aside: the perf results were not all bad. Here was an interesting 
improvement:


Test v2.18.0 v2.19.0-rc0

0002.1: read_cache/discard_cache 1000 times 5.63(5.30+0.32) 3.34(3.03+0.30) -40.7%


Thanks,

-Stolee
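
For anyone trying to reproduce this, the numbers come from the t/perf
harness. Assuming a local clone of linux.git, the invocation is roughly:

  cd t/perf
  GIT_PERF_REPO=/path/to/linux GIT_PERF_REPEAT_COUNT=10 \
	./run v2.18.0 v2.19.0-rc0 -- p0001-rev-list.sh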



Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-20 Thread Stefan Beller
On Mon, Aug 20, 2018 at 5:27 PM Jonathan Nieder  wrote:
>
> Jonathan Nieder wrote:
> > Stefan Beller wrote:
> >> Junio C Hamano wrote:
>
> >>>  * "git submodule" did not correctly adjust core.worktree setting that
> >>>indicates whether/where a submodule repository has its associated
> >>>working tree across various state transitions, which has been
> >>>corrected.
> >>>(merge 984cd77ddb sb/submodule-core-worktree later to maint).
> >>
> >> Personally I do not view this as a bug fix but a feature
> >> (but then again my thinking might be tainted of too much
> >> submodule work) hence I would not merge it down.
> >
> > Can you elaborate?
>
> ... ah, I figured it out.  You are saying "would not merge it down to
> maint".  In that case, I agree, since this this is not a recent bug
> (it's existed since before v1.7.10-rc1~14^2~2, 2012-03-02).

Yeah; the behavior was the gold standard for submodules ever since,
so I am wary of changing it under the guise of fixing a bug.
The core.worktree setting doesn't harm the user by default; you
need to craft a very specific situation to benefit from this feature.

Stefan


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-20 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Stefan Beller wrote:
>> Junio C Hamano wrote:

>>>  * "git submodule" did not correctly adjust core.worktree setting that
>>>indicates whether/where a submodule repository has its associated
>>>working tree across various state transitions, which has been
>>>corrected.
>>>(merge 984cd77ddb sb/submodule-core-worktree later to maint).
>>
>> Personally I do not view this as a bug fix but a feature
>> (but then again my thinking might be tainted of too much
>> submodule work) hence I would not merge it down.
>
> Can you elaborate?

... ah, I figured it out.  You are saying "would not merge it down to
maint".  In that case, I agree, since this this is not a recent bug
(it's existed since before v1.7.10-rc1~14^2~2, 2012-03-02).

Thanks,
Jonathan


Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-20 Thread Jonathan Nieder
(-cc: other lists)
Stefan Beller wrote:
> Junio C Hamano wrote:

>>  * "git submodule" did not correctly adjust core.worktree setting that
>>indicates whether/where a submodule repository has its associated
>>working tree across various state transitions, which has been
>>corrected.
>>(merge 984cd77ddb sb/submodule-core-worktree later to maint).
>
> Personally I do not view this as a bug fix but a feature
> (but then again my thinking might be tainted of too much
> submodule work) hence I would not merge it down.

Can you elaborate?

The symptom that this series fixes was pretty bad, so I'm pretty glad
you wrote it.

Thanks,
Jonathan