subject:"\[ANNOUNCE\] Git v2.19.0\-rc0"

On Thu, Aug 23, 2018 at 07:48:42PM -0700, Jacob Keller wrote:

> Odd...
> 
> What about..
> 
> - if (oidcmp(a,b))
> + if(!oideq(a,b))
>   { ... }

Nope, it doesn't like that syntactically.

> Or maybe you need to use something like
> 
>   <...
> - if (oidcmp(a,b))
> + if (!oideq(a,b))
>   ...>

Nor that (I also tried finding documentation on what exactly the angle
brackets mean, but couldn't).

> Hmm. Yea, semantic patches are a bit confusing overall sometimes.
> 
> But it looks like you got something which works?

Actually, what I showed earlier does seem to have some weirdness with
else-if. But I finally stumbled on something even better:

  - oidcmp(a, b) != 0
  + !oideq(a, b)

Because of the isomorphisms that coccinelle knows about, that catches
everything we want.  Obvious ones like:

diff --git a/bisect.c b/bisect.c
index 41c56a665e..7c1d8f1a6d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(struct commit_list 
*list, int count)
 
for (i = 0; cur; cur = cur->next, i++) {
if (i == index) {
-   if (oidcmp(>item->object.oid, current_bad_oid))
+   if (!oideq(>item->object.oid, current_bad_oid))
return cur;
if (previous)
return previous;

and compound conditionals like:

diff --git a/blame.c b/blame.c
index 10d72e36dd..538d0ab1aa 100644
--- a/blame.c
+++ b/blame.c
@@ -1834,7 +1834,7 @@ void setup_scoreboard(struct blame_scoreboard *sb,

sb->revs->children.name = "children";
while (c->parents &&
-  oidcmp(>object.oid, >final->object.oid)) {
+  !oideq(>object.oid, >final->object.oid)) {
struct commit_list *l = xcalloc(1, sizeof(*l));

l->item = c;

and even non-if contexts, like:

diff --git a/sha1-file.c b/sha1-file.c
index 631f6b9dc2..d85f4e93e1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -825,7 +825,7 @@ int check_object_signature(const struct object_id *oid, 
void *map,
 
if (map) {
hash_object_file(map, size, type, _oid);
-   return oidcmp(oid, _oid) ? -1 : 0;
+   return !oideq(oid, _oid) ? -1 : 0;
}

So I think we have a winner. I'll polish that up into patches and send
it out later tonight.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jacob Keller

On Thu, Aug 23, 2018 at 5:16 PM Jeff King  wrote:
>
> On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:
>
> > This almost works:
> >
> >   @@
> >   expression a, b;
> >   statement s;
> >   @@
> >   - if (oidcmp(a, b)) s
> >   + if (!oideq(a, b)) s
> >
> > [...]
>
> > So I really do want some way of saying "all of the block, no matter what
> > it is". Or of leaving it out as context.
>
> Aha. The magic invocation is:
>
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
>   s
>
> I would have expected that you could replace "s" with "...", but that
> does not seem to work.
>
> -Peff

Odd...

What about..

- if (oidcmp(a,b))
+ if(!oideq(a,b))
  { ... }

Or maybe you need to use something like

  <...
- if (oidcmp(a,b))
+ if (!oideq(a,b))
  ...>

Hmm. Yea, semantic patches are a bit confusing overall sometimes.

But it looks like you got something which works?

Thanks,
Jake

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:

> This almost works:
> 
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b)) s
>   + if (!oideq(a, b)) s
>
> [...]

> So I really do want some way of saying "all of the block, no matter what
> it is". Or of leaving it out as context.

Aha. The magic invocation is:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))
  s

I would have expected that you could replace "s" with "...", but that
does not seem to work.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 07:40:49PM -0400, Jeff King wrote:

> > You can look for explicitly "if (oidcmp(...))" though. I don't know if
> > you can catch *any* use which degrades to boolean outside of an if
> > statement, but I wouldn't expect there to be too many of those?
> 
> Yeah, that was my thought, too. And I've been trying this all afternoon
> without success. Why doesn't this work:
> 
>   @@
>   expression a, b;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
> 
> I get:
> 
>   Fatal error: exception Failure("minus: parse error: \n = File
>   \"contrib/coccinelle/oideq.cocci\", line 21, column 0,  charpos =
>   221\naround = '', whole content = \n")
> 
> If I do:
> 
>   - if (oidcmp(a, b)) { ... }
> 
> that seems to please the parser for the minus line. But I cannot include
> the "..." on the plus line. Clearly the "..." part should be context,
> but I can't seem to find the right syntax.

This almost works:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b)) s
  + if (!oideq(a, b)) s

It generates this, for example:

diff -u -p a/bisect.c b/bisect.c
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(str
 
for (i = 0; cur; cur = cur->next, i++) {
if (i == index) {
-   if (oidcmp(>item->object.oid, current_bad_oid))
+   if (!oideq(>item->object.oid, current_bad_oid))
return cur;
if (previous)
return previous;

which is what we want. But it also generates this:

diff -u -p a/bundle.c b/bundle.c
--- a/bundle.c
+++ b/bundle.c
@@ -369,25 +369,11 @@ static int write_bundle_refs(int bundle_
 * commit that is referenced by the tag, and not the tag
 * itself.
 */
-   if (oidcmp(, >item->oid)) {
-   /*
-* Is this the positive end of a range expressed
-* in terms of a tag (e.g. v2.0 from the range
-* "v1.0..v2.0")?
-*/
-   struct commit *one = 
lookup_commit_reference(the_repository,
-);
+   if (!oideq(, >item->oid)) {
+   struct commit 
*one=lookup_commit_reference(the_repository,
+  );
struct object *obj;
-
if (e->item == &(one->object)) {
-   /*
-* Need to include e->name as an
-* independent ref to the pack-objects
-* input, so that the tag is included
-* in the output; otherwise we would
-* end up triggering "empty bundle"
-* error.
-*/
obj = parse_object_or_die(, e->name);
obj->flags |= SHOWN;
add_pending_object(revs, obj, e->name);

So I really do want some way of saying "all of the block, no matter what
it is". Or of leaving it out as context.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 04:30:10PM -0700, Jacob Keller wrote:

> On Thu, Aug 23, 2018 at 9:30 AM Jeff King  wrote:
> > I think that audit isn't actually too bad (but definitely not something
> > we should do for v2.19). The cocci patch I showed earlier hits most of
> > them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
> > sure if there's a way to ask coccinelle only for oidcmp()
> > used in a boolean context.
> >
> 
> You can look for explicitly "if (oidcmp(...))" though. I don't know if
> you can catch *any* use which degrades to boolean outside of an if
> statement, but I wouldn't expect there to be too many of those?

Yeah, that was my thought, too. And I've been trying this all afternoon
without success. Why doesn't this work:

  @@
  expression a, b;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))

I get:

  Fatal error: exception Failure("minus: parse error: \n = File
  \"contrib/coccinelle/oideq.cocci\", line 21, column 0,  charpos =
  221\naround = '', whole content = \n")

If I do:

  - if (oidcmp(a, b)) { ... }

that seems to please the parser for the minus line. But I cannot include
the "..." on the plus line. Clearly the "..." part should be context,
but I can't seem to find the right syntax.

FWIW, I do have patches adding hasheq() and converting all of the
!oidcmp() cases. I may resort to hand-investigating each of the negated
ones, but I really feel like I should be able to do better with
coccinelle.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Jacob Keller

On Thu, Aug 23, 2018 at 9:30 AM Jeff King  wrote:
> I think that audit isn't actually too bad (but definitely not something
> we should do for v2.19). The cocci patch I showed earlier hits most of
> them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
> sure if there's a way to ask coccinelle only for oidcmp()
> used in a boolean context.
>

You can look for explicitly "if (oidcmp(...))" though. I don't know if
you can catch *any* use which degrades to boolean outside of an if
statement, but I wouldn't expect there to be too many of those?

Thanks,
Jake

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Derrick Stolee


On 8/23/2018 2:53 PM, Jeff King wrote:

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:


I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).

My numbers may be more dramatic because my Linux environment is a virtual
machine.

If you have a chance, can you run p0001 on my patch (compared to
2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
check that it really is fixing the problem you saw.


Sure. Note: I had to create a new Linux VM on a different machine 
between Tuesday and today, so the absolute numbers are different.


Using git/git:

Test  v2.18.0   v2.19.0-rc0 HEAD
-
0001.2:   3.10(3.02+0.08)   3.27(3.17+0.09) +5.5% 3.14(3.02+0.11) +1.3%


Using torvalds/linux:

Test v2.18.0 v2.19.0-rc0   HEAD
--
0001.2:  56.08(45.91+1.50)   56.60(46.62+1.50) +0.9% 54.61(45.47+1.46) -2.6%


Now here is where I get on my soapbox (and create a TODO for myself 
later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively 
suggests that the results should be _more_ accurate than the default of 
3. However, I then remember that we only report the *minimum* time from 
all the runs, which is likely to select an outlier from the 
distribution. To test this, I ran a few tests manually and found the 
variation between runs to be larger than 3%.


When I choose my own metrics for performance tests, I like to run at 
least 10 runs, remove the largest AND smallest runs from the samples, 
and then take the average. I did this manually for 'git rev-list --all 
--objects' on git/git and got the following results:


v2.18.0    v2.19.0-rc0   HEAD

3.126 s    3.308 s   3.170 s

For full disclosure, here is a full table including all samples:

|  | v2.18.0 | v2.19.0-rc0 | HEAD    |
|--|-|-|-|
|  | 4.58    | 3.302   | 3.239   |
|  | 3.13    | 3.337   | 3.133   |
|  | 3.213   | 3.291   | 3.159   |
|  | 3.219   | 3.318   | 3.131   |
|  | 3.077   | 3.302   | 3.163   |
|  | 3.074   | 3.328   | 3.119   |
|  | 3.022   | 3.277   | 3.125   |
|  | 3.083   | 3.259   | 3.203   |
|  | 3.057   | 3.311   | 3.223   |
|  | 3.155   | 3.413   | 3.225   |
| Max  | 4.58    | 3.413   | 3.239   |
| Min  | 3.022   | 3.259   | 3.119   |
| Avg* | 3.126   | 3.30825 | 3.17025 |

(Note that the largest one was the first run, on v2.18.0, which is due 
to a cold disk.)


I just kicked off a script that will run this test on the Linux repo 
while I drive home. I'll be able to report a similar table of data easily.


My TODO is to consider aggregating the data this way (or with a median) 
instead of reporting the minimum.


Thanks,

-Stolee

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> > I think you can safely
> > ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
> > without some mitigation, I don't know that it's all that big a deal,
> > given the numbers I generated (which for some reason are less dramatic
> > than Stolee's).
> My numbers may be more dramatic because my Linux environment is a virtual
> machine.

If you have a chance, can you run p0001 on my patch (compared to
2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
check that it really is fixing the problem you saw.

-Peff

wide t/perf output, was Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 06:20:26AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Here are numbers for p0001.2 run against linux.git on a few
> > versions. This is using -O2 with gcc 8.2.0.
> >
> >   Test v2.18.0 v2.19.0-rc0   HEAD
> >   
> > --
> >   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) 
> > -1.0%
> 
> I see what you did to the formatting here, which is a topic of
> another thread ;-).

Do you happen to have a link? I missed that one, and digging turned up
nothing.

A while ago I wrote the patch below. I don't recall why I never sent it
in (and it doesn't apply cleanly these days, though I'm sure it could be
forward-ported).

-- >8 --
Date: Wed, 20 Jan 2016 23:54:14 -0500
Subject: [PATCH] t/perf: add "tall" output format

When aggregating results, we usually show a list of tests,
one per line, with the tested revisions in columns across.
Like:

$ ./aggregate.perl 348d4f2^ 348d4f2 p7000-filter-branch.sh
Test  348d4f2^   348d4f2
---
7000.2: noop filter   295.32(269.61+14.36)   7.92(0.85+0.72) -97.3%

This is useful if you have a lot of tests to show, but few
revisions; you're effectively comparing the two items on
each line. But sometimes you have the opposite: few tests,
but a large number of revisions. In this case, the lines
get very long, and it's hard to compare values.

This patch introduces a "tall" format that shows the same
data in a more vertical manner:

$ ./aggregate.perl --tall \
348d4f2^ 348d4f2 \
jk/filter-branch-empty^ \
jk/filter-branch-empty \
p7000-filter-branch.sh
Test: p7000-filter-branch.2
348d4f2^  295.32(269.61+14.36)
348d4f27.92(0.85+0.72) -97.3%
jk/filter-branch-empty^9.37(0.87+0.80) -96.8%
jk/filter-branch-empty 7.71(0.92+0.62) -97.4%

Signed-off-by: Jeff King 
---
 t/perf/aggregate.perl | 124 ++
 1 file changed, 88 insertions(+), 36 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index e401208488..d108a02ccd 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -17,29 +17,41 @@ sub get_times {
return ($rt, $4, $5);
 }
 
-sub format_times {
+sub format_times_list {
my ($r, $u, $s, $firstr) = @_;
if (!defined $r) {
return "";
}
my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s;
+   my $pct;
if (defined $firstr) {
if ($firstr > 0) {
-   $out .= sprintf " %+.1f%%", 100.0*($r-$firstr)/$firstr;
+   $pct = sprintf "%+.1f%%", 100.0*($r-$firstr)/$firstr;
} elsif ($r == 0) {
-   $out .= " =";
+   $pct = "=";
} else {
-   $out .= " +inf";
+   $pct = "+inf";
}
}
-   return $out;
+   return ($out, $pct);
+}
+
+sub format_times {
+   my ($times, $pct) = format_times_list(@_);
+   return defined $pct ? "$times $pct" : $times;
 }
 
 my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests);
+my ($tall_format);
 while (scalar @ARGV) {
my $arg = $ARGV[0];
my $dir;
last if -f $arg or $arg eq "--";
+   if ($arg eq "--tall") {
+   $tall_format = 1;
+   shift @ARGV;
+   next;
+   }
if (! -d $arg) {
my $rev = Git::command_oneline(qw(rev-parse --verify), $arg);
$dir = "build/".$rev;
@@ -122,6 +134,11 @@ sub have_slash {
return 0;
 }
 
+sub printable_dir {
+   my ($d) = @_;
+   return exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d};
+}
+
 my %newdirabbrevs = %dirabbrevs;
 while (!have_duplicate(values %newdirabbrevs)) {
%dirabbrevs = %newdirabbrevs;
@@ -132,44 +149,79 @@ sub have_slash {
}
 }
 
-my %times;
-my @colwidth = ((0)x@dirs);
-for my $i (0..$#dirs) {
-   my $d = $dirs[$i];
-   my $w = length (exists $dirabbrevs{$d} ? $dirabbrevs{$d} : 
$dirnames{$d});
-   $colwidth[$i] = $w if $w > $colwidth[$i];
-}
-for my $t (@subtests) {
-   my $firstr;
+binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+
+if (!$tall_format) {
+   my %times;
+   my @colwidth = ((0)x@dirs);
for my $i (0..$#dirs) {
my $d = $dirs[$i];
-   $times{$prefixes{$d}.$t} = 
[get_times("$resultsdir/$prefixes{$d}$t.times")];
-   my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-   my $w = length format_times($r,$u,$s,$firstr);
+   my $w = length(printable_dir($d));
$colwidth[$i] = $w if $w > $colwidth[$i];
-   $firstr = $r unless defined $firstr;
}
-}
-my

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> Around the time that my proposed approaches were getting vetoed for
> alignment issues, I figured I was out of my depth here. I reached out to
> Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of
> posts of word-based approaches to different problems, so I thought he might
> know something off the top of his head that would be applicable. His
> conclusion (after looking only a short time) was to take a 'hasheq' approach
> [2] like Peff suggested [3]. Since that requires auditing all callers of
> hashcmp to see if hasheq is appropriate, it is not a good solution for 2.19
> but (in my opinion) should be evaluated as part of the 2.20 cycle.

I think that audit isn't actually too bad (but definitely not something
we should do for v2.19). The cocci patch I showed earlier hits most of
them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
sure if there's a way to ask coccinelle only for oidcmp()
used in a boolean context.

Just skimming the grep output, it's basically every call except the ones
we use for binary search.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Junio C Hamano

Jeff King  writes:

> Here are numbers for p0001.2 run against linux.git on a few
> versions. This is using -O2 with gcc 8.2.0.
>
>   Test v2.18.0 v2.19.0-rc0   HEAD
>   
> --
>   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) 
> -1.0%

I see what you did to the formatting here, which is a topic of
another thread ;-).

Thanks, as Derrick also noted, I agree this is an appropriate
workaround for the upcoming release and we may want to explore
hasheq() and other solutions as part of the effort during the next
cycle (which can start now if people are bored---after all working
on the codebase is the best way to hunt for recent bugs).

Thanks, will queue.

> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..4d014541ab 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> + /*
> +  * This is a temporary optimization hack. By asserting the size here,
> +  * we let the compiler know that it's always going to be 20, which lets
> +  * it turn this fixed-size memcmp into a few inline instructions.
> +  *
> +  * This will need to be extended or ripped out when we learn about
> +  * hashes of different sizes.
> +  */
> + if (the_hash_algo->rawsz != 20)
> + BUG("hash size not yet supported by hashcmp");
>   return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Junio C Hamano

Derrick Stolee  writes:

> I was thinking that having a mitigation for 2.19 is best, and then we
> can focus as part of the 2.20 cycle how we can properly avoid this
> cost, especially when 32 is a valid option.
> ...
> ... to
> take a 'hasheq' approach [2] like Peff suggested [3]. Since that
> requires auditing all callers of hashcmp to see if hasheq is
> appropriate, it is not a good solution for 2.19 but (in my opinion)
> should be evaluated as part of the 2.20 cycle.

Thanks for thoughtful comments.  I think it makes sense to go with
the "tell compiler hashcmp() currently is only about 20-byte array"
for now, as it is trivial to see why it cannot break things.

During 2.20 cycle, we may find out that hasheq() abstraction
performs better than hashcmp() with multiple lengths, and end up
doing that "audit and replace to use hasheq()".  But as you said,
it probably isnot a good idea to rush it in this cycle.

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Derrick Stolee

On 8/23/2018 1:04 AM, Jeff King wrote:

On Thu, Aug 23, 2018 at 03:47:07AM +, brian m. carlson wrote:

I expect that's going to be the case as well. I have patches that
wire up actual SHA-256 support in my hash-impl branch.

However, having said that, I'm happy to defer to whatever everyone else
thinks is best for 2.19. The assert solution would be fine with me in
this situation, and if we need to pull it out in the future, that's okay
with me.

I don't really have a strong opinion on this either way, so if someone
else does, please say so. I have somewhat more limited availability
over the next couple days, as I'm travelling on business, but I'm happy
to review a patch (and it seems like Peff has one minus the actual
commit message).

I just posted the patch elsewhere in the thread.

Thank you for that!

I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).
My numbers may be more dramatic because my Linux environment is a
virtual machine.

I was thinking that having a mitigation for 2.19 is best, and then we
can focus as part of the 2.20 cycle how we can properly avoid this cost,
especially when 32 is a valid option.

Around the time that my proposed approaches were getting vetoed for
alignment issues, I figured I was out of my depth here. I reached out to
Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of
posts of word-based approaches to different problems, so I thought he
might know something off the top of his head that would be applicable.
His conclusion (after looking only a short time) was to take a 'hasheq'
approach [2] like Peff suggested [3]. Since that requires auditing all
callers of hashcmp to see if hasheq is appropriate, it is not a good
solution for 2.19 but (in my opinion) should be evaluated as part of the
2.20 cycle.

Of course, if someone with knowledge of word-alignment issues across the
platforms we support knows how to enforce an alignment for object_id,
then something word-based like [4] could be reconsidered.

Thanks, everyone!
-Stolee

[1] https://twitter.com/stolee/status/1032312965754748930

[2]
https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/

[3]
https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/

[4]
https://public-inbox.org/git/7ea416cf-b043-1274-e161-85a8780b8...@gmail.com/

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread brian m. carlson

On Thu, Aug 23, 2018 at 01:02:25AM -0400, Jeff King wrote:
> Here's the patch. For some reason my numbers aren't quite as large as
> they were yesterday (I was very careful to keep the system unloaded
> today, whereas yesterday I was doing a few other things, so perhaps that
> is the difference).

This looks sane to me.  Thanks for writing this up.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jonathan Nieder

Jeff King wrote:

> Here's the patch. For some reason my numbers aren't quite as large as
> they were yesterday (I was very careful to keep the system unloaded
> today, whereas yesterday I was doing a few other things, so perhaps that
> is the difference).
>
> -- >8 --
> Subject: [PATCH] hashcmp: assert constant hash size
>
> Prior to 509f6f62a4 (cache: update object ID functions for
> the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
> constant size of 20 bytes. Some compilers were able to turn
> that into a few quad-word comparisons, which is faster than
> actually calling memcmp().
>
> In 509f6f62a4, we started using the_hash_algo->rawsz
> instead. Even though this will always be 20, the compiler
> doesn't know that while inlining hashcmp() and ends up just
> generating a call to memcmp().
>
> Eventually we'll have to deal with multiple hash sizes, but
> for the upcoming v2.19, we can restore some of the original
> performance by asserting on the size. That gives the
> compiler enough information to know that the memcmp will
> always be called with a length of 20, and it performs the
> same optimization.
>
> Here are numbers for p0001.2 run against linux.git on a few
> versions. This is using -O2 with gcc 8.2.0.
>
>   Test v2.18.0 v2.19.0-rc0   HEAD
>   
> --
>   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) 
> -1.0%
>
> You can see that v2.19 is a little slower than v2.18. This
> commit ended up slightly faster than v2.18, but there's a
> fair bit of run-to-run noise (the generated code in the two
> cases is basically the same). This patch does seem to be
> consistently 1-2% faster than v2.19.
>
> I tried changing hashcpy(), which was also touched by
> 509f6f62a4, in the same way, but couldn't measure any
> speedup. Which makes sense, at least for this workload. A
> traversal of the whole commit graph requires looking up
> every entry of every tree via lookup_object(). That's many
> multiples of the numbers of objects in the repository (most
> of the lookups just return "yes, we already saw that
> object").
>
> Reported-by: Derrick Stolee 
> Signed-off-by: Jeff King 
> ---
>  cache.h | 10 ++
>  1 file changed, 10 insertions(+)

Reviewed-by: Jonathan Nieder 

Verified using "make object.s" that the memcmp call goes away.  Thank
you.

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 03:47:07AM +, brian m. carlson wrote:

> I expect that's going to be the case as well.  I have patches that
> wire up actual SHA-256 support in my hash-impl branch.
> 
> However, having said that, I'm happy to defer to whatever everyone else
> thinks is best for 2.19.  The assert solution would be fine with me in
> this situation, and if we need to pull it out in the future, that's okay
> with me.
> 
> I don't really have a strong opinion on this either way, so if someone
> else does, please say so.  I have somewhat more limited availability
> over the next couple days, as I'm travelling on business, but I'm happy
> to review a patch (and it seems like Peff has one minus the actual
> commit message).

I just posted the patch elsewhere in the thread. I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Wed, Aug 22, 2018 at 07:27:56PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > FWIW, it's not 10%. The best I measured was ~4% on a very
> > hashcmp-limited operation, and I suspect even that may be highly
> > dependent on the compiler. We might be able to improve more by
> > sprinkling more asserts around, but there are 75 mentions of
> > the_hash_algo->rawsz. I wouldn't want to an assert at each one.
> >
> > I don't mind doing one or a handful of these asserts as part of v2.19 if
> > we want to try to reclaim those few percent. But I suspect the very
> > first commit in any further hash-transition work is just going to be to
> > rip them all out.
> 
> I was thinking just hashcmp and hashcpy.
> 
> Ideally such a change would come with a performance test to help the
> person writing that very first commit.  Except we already have
> performance tests that capture this. ;-)
> 
> For further hash-transition work, I agree someone may want to revert
> this, and I don't mind such a revert appearing right away in "next".
> And it's possible that we might have to do the equivalent of manual
> template expansion to recover the performance in some
> performance-sensitive areas.  Maybe we can get the compiler to
> cooperate with us in that and maybe we can't.  That's okay with me.
> 
> Anyway, I'll resend your patch with a commit message added some time
> this evening.

Here's the patch. For some reason my numbers aren't quite as large as
they were yesterday (I was very careful to keep the system unloaded
today, whereas yesterday I was doing a few other things, so perhaps that
is the difference).

-- >8 --
Subject: [PATCH] hashcmp: assert constant hash size

Prior to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
constant size of 20 bytes. Some compilers were able to turn
that into a few quad-word comparisons, which is faster than
actually calling memcmp().

In 509f6f62a4, we started using the_hash_algo->rawsz
instead. Even though this will always be 20, the compiler
doesn't know that while inlining hashcmp() and ends up just
generating a call to memcmp().

Eventually we'll have to deal with multiple hash sizes, but
for the upcoming v2.19, we can restore some of the original
performance by asserting on the size. That gives the
compiler enough information to know that the memcmp will
always be called with a length of 20, and it performs the
same optimization.

Here are numbers for p0001.2 run against linux.git on a few
versions. This is using -O2 with gcc 8.2.0.

  Test v2.18.0 v2.19.0-rc0   HEAD
  --
  0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

You can see that v2.19 is a little slower than v2.18. This
commit ended up slightly faster than v2.18, but there's a
fair bit of run-to-run noise (the generated code in the two
cases is basically the same). This patch does seem to be
consistently 1-2% faster than v2.19.

I tried changing hashcpy(), which was also touched by
509f6f62a4, in the same way, but couldn't measure any
speedup. Which makes sense, at least for this workload. A
traversal of the whole commit graph requires looking up
every entry of every tree via lookup_object(). That's many
multiples of the numbers of objects in the repository (most
of the lookups just return "yes, we already saw that
object").

Reported-by: Derrick Stolee 
Signed-off-by: Jeff King 
---
 cache.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/cache.h b/cache.h
index b1fd3d58ab..4d014541ab 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+   /*
+* This is a temporary optimization hack. By asserting the size here,
+* we let the compiler know that it's always going to be 20, which lets
+* it turn this fixed-size memcmp into a few inline instructions.
+*
+* This will need to be extended or ripped out when we learn about
+* hashes of different sizes.
+*/
+   if (the_hash_algo->rawsz != 20)
+   BUG("hash size not yet supported by hashcmp");
return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }

-- 
2.19.0.rc0.412.g7005db4e88

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread brian m. carlson

On Wed, Aug 22, 2018 at 10:16:18PM -0400, Jeff King wrote:
> FWIW, it's not 10%. The best I measured was ~4% on a very
> hashcmp-limited operation, and I suspect even that may be highly
> dependent on the compiler. We might be able to improve more by
> sprinkling more asserts around, but there are 75 mentions of
> the_hash_algo->rawsz. I wouldn't want to an assert at each one.
> 
> I don't mind doing one or a handful of these asserts as part of v2.19 if
> we want to try to reclaim those few percent. But I suspect the very
> first commit in any further hash-transition work is just going to be to
> rip them all out.

I expect that's going to be the case as well.  I have patches that
wire up actual SHA-256 support in my hash-impl branch.

However, having said that, I'm happy to defer to whatever everyone else
thinks is best for 2.19.  The assert solution would be fine with me in
this situation, and if we need to pull it out in the future, that's okay
with me.

I don't really have a strong opinion on this either way, so if someone
else does, please say so.  I have somewhat more limited availability
over the next couple days, as I'm travelling on business, but I'm happy
to review a patch (and it seems like Peff has one minus the actual
commit message).
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jonathan Nieder

Jeff King wrote:

> FWIW, it's not 10%. The best I measured was ~4% on a very
> hashcmp-limited operation, and I suspect even that may be highly
> dependent on the compiler. We might be able to improve more by
> sprinkling more asserts around, but there are 75 mentions of
> the_hash_algo->rawsz. I wouldn't want to an assert at each one.
>
> I don't mind doing one or a handful of these asserts as part of v2.19 if
> we want to try to reclaim those few percent. But I suspect the very
> first commit in any further hash-transition work is just going to be to
> rip them all out.

I was thinking just hashcmp and hashcpy.

Ideally such a change would come with a performance test to help the
person writing that very first commit.  Except we already have
performance tests that capture this. ;-)

For further hash-transition work, I agree someone may want to revert
this, and I don't mind such a revert appearing right away in "next".
And it's possible that we might have to do the equivalent of manual
template expansion to recover the performance in some
performance-sensitive areas.  Maybe we can get the compiler to
cooperate with us in that and maybe we can't.  That's okay with me.

Anyway, I'll resend your patch with a commit message added some time
this evening.

Thanks,
Jonathan

Re: [ANNOUNCE] Git v2.19.0-rc0

On Wed, Aug 22, 2018 at 06:23:43PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> >> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
> 
> >>>  static inline int hashcmp(const unsigned char *sha1, const unsigned
> >>> char *sha2)
> >>>  {
> >>> +   assert(the_hash_algo->rawsz == 20);
> >>> return memcmp(sha1, sha2, the_hash_algo->rawsz);
> >>>  }
> [...]
> >  The bigger questions are:
> >
> >   - are we OK with such an assertion; and
> >
> >   - does the assertion still give us the desired behavior when we add in
> > a branch for rawsz==32?
> >
> > And I think the answers for those are both "probably not".
> 
> At this point in the release process, I think the answer to the first
> question is a pretty clear "yes".
> 
> A ~10% increase in latency of some operations is quite significant, in
> exchange for no user benefit yet.  We can continue to try to figure
> out how to convince compilers to generate good code for this (and
> that's useful), but in the meantime we should also do the simple thing
> to avoid the regression for users.

FWIW, it's not 10%. The best I measured was ~4% on a very
hashcmp-limited operation, and I suspect even that may be highly
dependent on the compiler. We might be able to improve more by
sprinkling more asserts around, but there are 75 mentions of
the_hash_algo->rawsz. I wouldn't want to an assert at each one.

I don't mind doing one or a handful of these asserts as part of v2.19 if
we want to try to reclaim those few percent. But I suspect the very
first commit in any further hash-transition work is just going to be to
rip them all out.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Jonathan Nieder

Hi,

Jeff King wrote:
>> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:

>>>  static inline int hashcmp(const unsigned char *sha1, const unsigned
>>> char *sha2)
>>>  {
>>> +   assert(the_hash_algo->rawsz == 20);
>>> return memcmp(sha1, sha2, the_hash_algo->rawsz);
>>>  }
[...]
>  The bigger questions are:
>
>   - are we OK with such an assertion; and
>
>   - does the assertion still give us the desired behavior when we add in
> a branch for rawsz==32?
>
> And I think the answers for those are both "probably not".

At this point in the release process, I think the answer to the first
question is a pretty clear "yes".

A ~10% increase in latency of some operations is quite significant, in
exchange for no user benefit yet.  We can continue to try to figure
out how to convince compilers to generate good code for this (and
that's useful), but in the meantime we should also do the simple thing
to avoid the regression for users.

Thanks,
Jonathan

Re: [ANNOUNCE] Git v2.19.0-rc0


On 8/22/2018 12:58 PM, Duy Nguyen wrote:

On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee  wrote:

On 8/22/2018 12:26 PM, Jeff King wrote:

On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:


On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:

On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:

On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:


The other thing I was going to recommend (and I'll try to test this out
myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
volatile variable, since it is being referenced through a pointer. Perhaps
storing the value locally and then casing on it would help?

I tried various sprinkling of "const" around the declarations to make it
clear that the values wouldn't change once we saw them. But I couldn't
detect any difference. At most I think that would let us hoist the "if"
out of the loop, but gcc still seems unwilling to expand the memcmp when
there are other branches.

I think if that's the thing we want to have happen, we really do need to
just write it out on that branch rather than saying "memcmp".

This reminds me of an old discussion about memcpy() vs doing explicit
compare loop with lots of performance measurements..

Ah found it. Not sure if it is still relevant in light of multiple hash support

https://public-inbox.org/git/20110427225114.ga16...@elte.hu/

Yes, that was what I meant. We actually did switch to that hand-rolled
loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
memcmp instead of open-coded loop, 2017-08-09).

Looking at that commit, I'm surprised the old logic was just a for loop, 
instead of a word-based approach, such as the following:

Might work on x86 but it breaks on cpu architectures with stricter
alignment. I don't think we have a guarantee that object_id is always
8 byte aligned.
You (and Peff) are probably correct here, which is unfortunate. I'm not 
familiar with alignment constraints, but assume that such a word-based 
approach is best.


Thanks,
-Stolee

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Junio C Hamano

Jeff King  writes:

> On Wed, Aug 22, 2018 at 12:49:34PM -0400, Derrick Stolee wrote:
>
>> > Yes, that was what I meant. We actually did switch to that hand-rolled
>> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
>> > memcmp instead of open-coded loop, 2017-08-09).
>> 
>> Looking at that commit, I'm surprised the old logic was just a for
>> loop, instead of a word-based approach, such as the following:
>> [...]
>> +struct object_id_20 {
>> +   uint64_t data0;
>> +   uint64_t data1;
>> +   uint32_t data2;
>> +};
>> +
>>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
>> *sha2)
>>  {
>> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
>> +   if (the_hash_algo->rawsz == 20) {
>> +   struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
>> +   struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
>
> I wonder if you're potentially running afoul of alignment requirements
> here.

Yup, and I think that all was discussed in that old thread ;-)

Re: [ANNOUNCE] Git v2.19.0-rc0

On Wed, Aug 22, 2018 at 12:49:34PM -0400, Derrick Stolee wrote:

> > Yes, that was what I meant. We actually did switch to that hand-rolled
> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
> > memcmp instead of open-coded loop, 2017-08-09).
> 
> Looking at that commit, I'm surprised the old logic was just a for
> loop, instead of a word-based approach, such as the following:
> [...]
> +struct object_id_20 {
> +   uint64_t data0;
> +   uint64_t data1;
> +   uint32_t data2;
> +};
> +
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
> *sha2)
>  {
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   if (the_hash_algo->rawsz == 20) {
> +   struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
> +   struct object_id_20 *obj2 = (struct object_id_20 *)sha2;

I wonder if you're potentially running afoul of alignment requirements
here.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Duy Nguyen

On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee  wrote:
>
> On 8/22/2018 12:26 PM, Jeff King wrote:
> > On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:
> >
> >> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:
> >>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:
>  On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> 
> > The other thing I was going to recommend (and I'll try to test this out
> > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > volatile variable, since it is being referenced through a pointer. 
> > Perhaps
> > storing the value locally and then casing on it would help?
>  I tried various sprinkling of "const" around the declarations to make it
>  clear that the values wouldn't change once we saw them. But I couldn't
>  detect any difference. At most I think that would let us hoist the "if"
>  out of the loop, but gcc still seems unwilling to expand the memcmp when
>  there are other branches.
> 
>  I think if that's the thing we want to have happen, we really do need to
>  just write it out on that branch rather than saying "memcmp".
> >>> This reminds me of an old discussion about memcpy() vs doing explicit
> >>> compare loop with lots of performance measurements..
> >> Ah found it. Not sure if it is still relevant in light of multiple hash 
> >> support
> >>
> >> https://public-inbox.org/git/20110427225114.ga16...@elte.hu/
> > Yes, that was what I meant. We actually did switch to that hand-rolled
> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
> > memcmp instead of open-coded loop, 2017-08-09).
>
> Looking at that commit, I'm surprised the old logic was just a for loop, 
> instead of a word-based approach, such as the following:

Might work on x86 but it breaks on cpu architectures with stricter
alignment. I don't think we have a guarantee that object_id is always
8 byte aligned.

>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..5e5819ad49 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const
> struct object_id *oid, int len)
>   extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
>   extern const struct object_id null_oid;
>
> +static inline int word_cmp_32(uint32_t a, uint32_t b)
> +{
> +   return memcmp(, , sizeof(uint32_t));
> +}
> +
> +static inline int word_cmp_64(uint64_t a, uint64_t b)
> +{
> +   return memcmp(, , sizeof(uint64_t));
> +}
> +
> +struct object_id_20 {
> +   uint64_t data0;
> +   uint64_t data1;
> +   uint32_t data2;
> +};
> +
>   static inline int hashcmp(const unsigned char *sha1, const unsigned
> char *sha2)
>   {
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   if (the_hash_algo->rawsz == 20) {
> +   struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
> +   struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
> +
> +   if (obj1->data0 == obj2->data0) {
> +   if (obj1->data1 == obj2->data1) {
> +   if (obj1->data2 == obj2->data2) {
> +   return 0;
> +   }
> +   return word_cmp_32(obj1->data2,
> obj2->data2);
> +   }
> +   return word_cmp_64(obj1->data1, obj2->data1);
> +   }
> +   return word_cmp_64(obj1->data0, obj2->data0);
> +   }
> +
> +   assert(0);
>   }
>
>   static inline int oidcmp(const struct object_id *oid1, const struct
> object_id *oid2)
>
>


-- 
Duy

Re: [ANNOUNCE] Git v2.19.0-rc0


On 8/22/2018 12:26 PM, Jeff King wrote:

On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:


On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:

On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:

On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:


The other thing I was going to recommend (and I'll try to test this out
myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
volatile variable, since it is being referenced through a pointer. Perhaps
storing the value locally and then casing on it would help?

I tried various sprinkling of "const" around the declarations to make it
clear that the values wouldn't change once we saw them. But I couldn't
detect any difference. At most I think that would let us hoist the "if"
out of the loop, but gcc still seems unwilling to expand the memcmp when
there are other branches.

I think if that's the thing we want to have happen, we really do need to
just write it out on that branch rather than saying "memcmp".

This reminds me of an old discussion about memcpy() vs doing explicit
compare loop with lots of performance measurements..

Ah found it. Not sure if it is still relevant in light of multiple hash support

https://public-inbox.org/git/20110427225114.ga16...@elte.hu/

Yes, that was what I meant. We actually did switch to that hand-rolled
loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
memcmp instead of open-coded loop, 2017-08-09).


Looking at that commit, I'm surprised the old logic was just a for loop, 
instead of a word-based approach, such as the following:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..5e5819ad49 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const 
struct object_id *oid, int len)

 extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
 extern const struct object_id null_oid;

+static inline int word_cmp_32(uint32_t a, uint32_t b)
+{
+   return memcmp(, , sizeof(uint32_t));
+}
+
+static inline int word_cmp_64(uint64_t a, uint64_t b)
+{
+   return memcmp(, , sizeof(uint64_t));
+}
+
+struct object_id_20 {
+   uint64_t data0;
+   uint64_t data1;
+   uint32_t data2;
+};
+
 static inline int hashcmp(const unsigned char *sha1, const unsigned 
char *sha2)

 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == 20) {
+   struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
+   struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
+
+   if (obj1->data0 == obj2->data0) {
+   if (obj1->data1 == obj2->data1) {
+   if (obj1->data2 == obj2->data2) {
+   return 0;
+   }
+   return word_cmp_32(obj1->data2, 
obj2->data2);

+   }
+   return word_cmp_64(obj1->data1, obj2->data1);
+   }
+   return word_cmp_64(obj1->data0, obj2->data0);
+   }
+
+   assert(0);
 }

 static inline int oidcmp(const struct object_id *oid1, const struct 
object_id *oid2)

Re: [ANNOUNCE] Git v2.19.0-rc0

On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:

> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:
> >
> > On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:
> > >
> > > On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> > >
> > > > The other thing I was going to recommend (and I'll try to test this out
> > > > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > > > volatile variable, since it is being referenced through a pointer. 
> > > > Perhaps
> > > > storing the value locally and then casing on it would help?
> > >
> > > I tried various sprinkling of "const" around the declarations to make it
> > > clear that the values wouldn't change once we saw them. But I couldn't
> > > detect any difference. At most I think that would let us hoist the "if"
> > > out of the loop, but gcc still seems unwilling to expand the memcmp when
> > > there are other branches.
> > >
> > > I think if that's the thing we want to have happen, we really do need to
> > > just write it out on that branch rather than saying "memcmp".
> >
> > This reminds me of an old discussion about memcpy() vs doing explicit
> > compare loop with lots of performance measurements..
> 
> Ah found it. Not sure if it is still relevant in light of multiple hash 
> support
> 
> https://public-inbox.org/git/20110427225114.ga16...@elte.hu/

Yes, that was what I meant. We actually did switch to that hand-rolled
loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
memcmp instead of open-coded loop, 2017-08-09).

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Duy Nguyen

On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen  wrote:
>
> On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:
> >
> > On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> >
> > > The other thing I was going to recommend (and I'll try to test this out
> > > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > > volatile variable, since it is being referenced through a pointer. Perhaps
> > > storing the value locally and then casing on it would help?
> >
> > I tried various sprinkling of "const" around the declarations to make it
> > clear that the values wouldn't change once we saw them. But I couldn't
> > detect any difference. At most I think that would let us hoist the "if"
> > out of the loop, but gcc still seems unwilling to expand the memcmp when
> > there are other branches.
> >
> > I think if that's the thing we want to have happen, we really do need to
> > just write it out on that branch rather than saying "memcmp".
>
> This reminds me of an old discussion about memcpy() vs doing explicit
> compare loop with lots of performance measurements..

Ah found it. Not sure if it is still relevant in light of multiple hash support

https://public-inbox.org/git/20110427225114.ga16...@elte.hu/
-- 
Duy

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Duy Nguyen

On Wed, Aug 22, 2018 at 6:03 PM Jeff King  wrote:
>
> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>
> > The other thing I was going to recommend (and I'll try to test this out
> > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > volatile variable, since it is being referenced through a pointer. Perhaps
> > storing the value locally and then casing on it would help?
>
> I tried various sprinkling of "const" around the declarations to make it
> clear that the values wouldn't change once we saw them. But I couldn't
> detect any difference. At most I think that would let us hoist the "if"
> out of the loop, but gcc still seems unwilling to expand the memcmp when
> there are other branches.
>
> I think if that's the thing we want to have happen, we really do need to
> just write it out on that branch rather than saying "memcmp".

This reminds me of an old discussion about memcpy() vs doing explicit
compare loop with lots of performance measurements.. Is that what you
meant by "write it out"?
-- 
Duy

Re: [ANNOUNCE] Git v2.19.0-rc0

On Wed, Aug 22, 2018 at 10:28:56AM -0400, Derrick Stolee wrote:

> In my testing, I've had the best luck with this change:
> 
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..6c8b51c390 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,14 @@ extern const struct object_id null_oid;
> 
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
> *sha2)
>  {
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   switch (the_hash_algo->rawsz) {
> +   case 20:
> +   return memcmp(sha1, sha2, 20);
> +   case 32:
> +   return memcmp(sha1, sha2, 32);
> +   default:
> +   assert(0);
> +   }
>  }
> 
> The fact that '20' and '32' are constants here may be helpful to the
> compiler. Can someone else test the perf?

I tested that one last night (and just re-tested it now to be sure). It
seems to just generate two separate calls to memcmp, with no speed
improvement.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Wed, Aug 22, 2018 at 08:42:20AM -0400, Paul Smith wrote:

> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned
> > char *sha2)
> >  {
> > +   assert(the_hash_algo->rawsz == 20);
> > return memcmp(sha1, sha2, the_hash_algo->rawsz);
> >  }
> 
> I'm not familiar with Git code, but for most environments assert() is a
> macro which is compiled out when built for "release mode" (whatever
> that might mean).  If that's the case for Git too, then relying on
> assert() to provide a side-effect (even an optimizer hint side-effect)
> won't work and this will actually get slower when built for "release
> mode".
> 
> Just a thought...

We don't have such a "release mode" in Git, though of course people may
pass -DNDEBUG to the compiler if they want.

However, to me how we spell the assert is mostly orthogonal to the
discussion. We can do "if (...) BUG(...)" to get a guaranteed-present
conditional. The bigger questions are:

  - are we OK with such an assertion; and

  - does the assertion still give us the desired behavior when we add in
a branch for rawsz==32?

And I think the answers for those are both "probably not".

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:

> The other thing I was going to recommend (and I'll try to test this out
> myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> volatile variable, since it is being referenced through a pointer. Perhaps
> storing the value locally and then casing on it would help?

I tried various sprinkling of "const" around the declarations to make it
clear that the values wouldn't change once we saw them. But I couldn't
detect any difference. At most I think that would let us hoist the "if"
out of the loop, but gcc still seems unwilling to expand the memcmp when
there are other branches.

I think if that's the thing we want to have happen, we really do need to
just write it out on that branch rather than saying "memcmp".

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Wed, Aug 22, 2018 at 09:39:57AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I don't have a good option. The assert() thing works until I add in the
> > "32" branch, but that's just punting the issue off until you add support
> > for the new hash.
> >
> > Hand-rolling our own asm or C is a portability headache, and we need to
> > change all of the callsites to use a new hasheq().
> >
> > Hiding it behind a per-hash function is conceptually cleanest, but not
> > quite as fast. And it also requires hasheq().
> >
> > So all of the solutions seem non-trivial.  Again, I'm starting to wonder
> > if it's worth chasing this few percent.
> 
> Did you try __builtin_expect? It's a GCC builtin for these sorts of
> situations, and sometimes helps:
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> 
> I.e. you'd tell GCC we expect to have the 20 there with:
> 
> if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }
> 
> The perl codebase has LIKELY() and UNLIKELY() macros for this which if
> the feature isn't available fall back on just plain C code:
> https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344

Sadly, no, this doesn't seem to change anything. We still end up with a
single call to memcmp.

I also tried "hiding" the fallback call like this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..7808bf3d6b 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,9 +1021,13 @@ extern int find_unique_abbrev_r(char *hex, const struct 
object_id *oid, int len)
 extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
 extern const struct object_id null_oid;
 
+int super_secret_memcmp(const void *a, const void *b, size_t len);
+
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == 20)
+   return memcmp(sha1, sha2, 20);
+   return super_secret_memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..5cd0a4b73f 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2280,3 +2280,8 @@ int read_loose_object(const char *path,
munmap(map, mapsize);
return ret;
 }
+
+int super_secret_memcmp(const void *a, const void *b, size_t len)
+{
+   return memcmp(a, b, len);
+}

but that just results in calling memcmp and super_secret_memcmp on the
two codepaths (with or without the __builtin_expect).

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0


On 8/22/2018 1:36 AM, brian m. carlson wrote:

On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:

So I wonder if there's some other way to tell the compiler that we'll
only have a few values. An enum comes to mind, though I don't think the
enum rules are strict enough to make this guarantee (after all, it's OK
to bitwise-OR enums, so they clearly don't specify all possible values).

I was thinking about this:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..1f5c6e9319 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
  
  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)

  {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   return memcmp(sha1, sha2, 20);
+   case 32:
+   return memcmp(sha1, sha2, 32);
+   default:
+   assert(0);
+   }
  }
  
  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)


That would make it obvious that there are at most two options.
Unfortunately, gcc for me determines that the buffer in walker.c is 20
bytes in size and steadfastly refuses to compile because it doesn't know
that the value will never be 32 in our codebase currently.  I'd need to
send in more patches before it would compile.

I don't know if something like this is an improvement or now, but this
seems to at least compile:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..3207f74771 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
  
  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)

  {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   case 32:
+   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   default:
+   assert(0);
+   }
  }

In my testing, I've had the best luck with this change:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..6c8b51c390 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,14 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned 
char *sha2)

 {
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   switch (the_hash_algo->rawsz) {
+   case 20:
+   return memcmp(sha1, sha2, 20);
+   case 32:
+   return memcmp(sha1, sha2, 32);
+   default:
+   assert(0);
+   }
 }

The fact that '20' and '32' are constants here may be helpful to the 
compiler. Can someone else test the perf?


Thanks,
-Stolee

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Paul Smith

On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
>  static inline int hashcmp(const unsigned char *sha1, const unsigned
> char *sha2)
>  {
> +   assert(the_hash_algo->rawsz == 20);
> return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }

I'm not familiar with Git code, but for most environments assert() is a
macro which is compiled out when built for "release mode" (whatever
that might mean).  If that's the case for Git too, then relying on
assert() to provide a side-effect (even an optimizer hint side-effect)
won't work and this will actually get slower when built for "release
mode".

Just a thought...

Re: [ANNOUNCE] Git v2.19.0-rc0




On 8/22/2018 3:39 AM, Ævar Arnfjörð Bjarmason wrote:

On Wed, Aug 22, 2018 at 8:20 AM Jeff King  wrote:

On Wed, Aug 22, 2018 at 05:36:26AM +, brian m. carlson wrote:


On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
I don't know if something like this is an improvement or now, but this
seems to at least compile:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..3207f74771 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;

  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
*sha2)
  {
- return memcmp(sha1, sha2, the_hash_algo->rawsz);
+ switch (the_hash_algo->rawsz) {
+ case 20:
+ case 32:
+ return memcmp(sha1, sha2, the_hash_algo->rawsz);
+ default:
+ assert(0);
+ }

I think that would end up with the same slow code, as gcc would rather
call memcmp than expand out the two sets of asm.


I won't have time to sit down and test this out until tomorrow afternoon
at the earliest.  If you want to send in something in the mean time,
even if that limits things to just 20 for now, that's fine.

I don't have a good option. The assert() thing works until I add in the
"32" branch, but that's just punting the issue off until you add support
for the new hash.

Hand-rolling our own asm or C is a portability headache, and we need to
change all of the callsites to use a new hasheq().

Hiding it behind a per-hash function is conceptually cleanest, but not
quite as fast. And it also requires hasheq().

So all of the solutions seem non-trivial.  Again, I'm starting to wonder
if it's worth chasing this few percent.

Did you try __builtin_expect? It's a GCC builtin for these sorts of
situations, and sometimes helps:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

I.e. you'd tell GCC we expect to have the 20 there with:

 if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }

The perl codebase has LIKELY() and UNLIKELY() macros for this which if
the feature isn't available fall back on just plain C code:
https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344
The other thing I was going to recommend (and I'll try to test this out 
myself later) is to see if 'the_hash_algo->rawsz' is being treated as a 
volatile variable, since it is being referenced through a pointer. 
Perhaps storing the value locally and then casing on it would help?

Re: [ANNOUNCE] Git v2.19.0-rc0


On 8/21/2018 11:36 PM, Jeff King wrote:

On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:


with the obvious "oideq()" implementation added, that seems to get me to
2-3%. Not _quite_ as good as the original branching version I showed.
And we had to touch all the callsites (although arguably that kind of
"eq" function is a better interface anyway, since it obviously allows
for more optimization.

So maybe the branching thing is actually not so insane. It makes new
hash_algo's Just Work; they just won't be optimized. And the change is
very localized.

Hmph. So I went back to double-check my measurements on that branching
version, and I couldn't replicate it!

I'm actually relieved to see this, as I couldn't either.


It turns out what I showed (and measured) before has a bug. Can you see
it?
I had rewritten the section from scratch instead of applying your diff, 
so I didn't get the sha1-sha1 error. I decided to sleep on it instead of 
sending my email.



So the assert() version really is the fastest. I didn't test, but I
suspect we could "trick" the compiler by having the fallback call an
opaque wrapper around memcmp(). That would prevent it from combining the
two paths, and presumably it would still optimize the constant-20 side.
Or maybe it would eventually decide our inline function is getting too
big and scrap it. Which probably crosses a line of craziness (if I
didn't already cross it two emails ago).

I appreciate your effort here.

Thanks
-Stolee

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-22 Thread Ævar Arnfjörð Bjarmason

On Wed, Aug 22, 2018 at 8:20 AM Jeff King  wrote:
>
> On Wed, Aug 22, 2018 at 05:36:26AM +, brian m. carlson wrote:
>
> > On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > > So I wonder if there's some other way to tell the compiler that we'll
> > > only have a few values. An enum comes to mind, though I don't think the
> > > enum rules are strict enough to make this guarantee (after all, it's OK
> > > to bitwise-OR enums, so they clearly don't specify all possible values).
> >
> > I was thinking about this:
> >
> > diff --git a/cache.h b/cache.h
> > index 1398b2a4e4..1f5c6e9319 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> > *sha2)
> >  {
> > - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > + switch (the_hash_algo->rawsz) {
> > + case 20:
> > + return memcmp(sha1, sha2, 20);
> > + case 32:
> > + return memcmp(sha1, sha2, 32);
> > + default:
> > + assert(0);
> > + }
> >  }
>
> Unfortunately this version doesn't seem to be any faster than the status
> quo. And looking at the generated asm, it still looks to be calling
> memcpy(). Removing the "case 32" branch switches it back to fast
> assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
> guessing what the optimizer is going to do, and there's a good chance
> that other versions are going to optimize it differently.
>
> We might be better off just writing it out manually. Unfortunately, it's
> a bit hard because the neg/0/pos return is more expensive to compute
> than pure equality. And only the compiler knows at each inlined site
> whether we actually want equality. So now we're back to switching every
> caller to use hasheq() if that's what they want.
>
> But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
> portability, then this seems as fast as the original (memcmp+constant)
> code on my machine:
>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..c406105f3c 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   switch (the_hash_algo->rawsz) {
> +   case 20:
> +   if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
> +   *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 
> *)(sha2+4))
> +   return 0;
> +   case 32:
> +   return memcmp(sha1, sha2, 32);
> +   default:
> +   assert(0);
> +   }
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct 
> object_id *oid2)
>
> Which is really no surprise, because the generated asm looks about the
> same. There are obviously alignment questions there. It's possible it
> could even be written portably as a simple loop. Or maybe not. We used
> to do that, but modern compilers were able to optimize the memcmp
> better. Maybe that's changed. Or maybe they were simply unwilling to
> unroll a 20-length loop to find out that it could be turned into a few
> quad-word compares.
>
> > That would make it obvious that there are at most two options.
> > Unfortunately, gcc for me determines that the buffer in walker.c is 20
> > bytes in size and steadfastly refuses to compile because it doesn't know
> > that the value will never be 32 in our codebase currently.  I'd need to
> > send in more patches before it would compile.
>
> Yeah, I see that warning all over the place (everywhere that calls
> is_null_oid(), which is passing in a 20-byte buffer).
>
> > I don't know if something like this is an improvement or now, but this
> > seems to at least compile:
> >
> > diff --git a/cache.h b/cache.h
> > index 1398b2a4e4..3207f74771 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> > *sha2)
> >  {
> > - return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > + switch (the_hash_algo->rawsz) {
> > + case 20:
> > + case 32:
> > + return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > + default:
> > + assert(0);
> > + }
>
> I think that would end up with the same slow code, as gcc would rather
> call memcmp than expand out the two sets of asm.
>
> > I won't have time to sit down and test this out until tomorrow afternoon
> > at the earliest.  If you want to send in something in the mean time,
> > even if that limits things to just 20 for now, that's fine.
>
> I don't have a good option. The assert() thing works until I add in the
> "32" branch, but that's just punting the issue off until

Re: [ANNOUNCE] Git v2.19.0-rc0