Re: [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf()

2017-12-01 Thread Jeff King
On Fri, Dec 01, 2017 at 02:50:05PM -0500, Derrick Stolee wrote:

> > > + baselen = path->len;
> > We set this here so that the '/' is included as part of the base. Makes
> > sense, but can we now drop the earlier setting of baselen before the
> > opendir() call?
> 
> Yeah, probably. I had briefly considered just adding the '/' before the
> first assignment of "baselen", but didn't want to change the error output. I
> also don't know if there are side effects for other platforms by calling
> opendir() with a '/'-terminated path.

I noticed that, too. Since it's so easy to keep doing the opendir
without the slash, I'd prefer to avoid finding out if there are such
platforms. :)

> Good catch! A big reason to pull it inside and use strbuf_add over
> strbuf_addstr is to avoid a duplicate strlen() calculation. However, I can
> store the length before the conditional.

I'd give 50/50 odds on whether a compiler could optimize out that
strlen. We inline addstr exactly so that callsites can see that strlen
(it's primarily for string literals, where it can become a compile-time
constant, but I think it could apply here). But sometimes C's pointer
aliasing rules can be surprising in blocking "obviously correct"
optimizations like that.
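
(For illustration only, not part of the patch: here is a stripped-down
model of that situation, with a made-up buf_add()/buf_addstr() pair
standing in for the real strbuf API. Whether the duplicate strlen()
gets folded away comes down to whether the compiler can prove the
writes through b->ptr cannot touch the bytes the name points at.)

  #include <stdlib.h>
  #include <string.h>

  struct buf { char *ptr; size_t len; };

  static void buf_grow(struct buf *b, size_t extra)
  {
          b->ptr = realloc(b->ptr, b->len + extra + 1); /* sketch: no error check */
  }

  static inline void buf_add(struct buf *b, const char *data, size_t len)
  {
          buf_grow(b, len);
          memcpy(b->ptr + b->len, data, len);
          b->len += len;
          b->ptr[b->len] = '\0';
  }

  /* inlined, so the caller can "see" this strlen()... */
  static inline void buf_addstr(struct buf *b, const char *s)
  {
          buf_add(b, s, strlen(s));
  }

  size_t add_and_measure(struct buf *b, const char *name)
  {
          buf_addstr(b, name);
          /*
           * ...but a second strlen() may still be emitted here unless the
           * compiler can prove the memcpy() above never wrote through a
           * pointer that aliases "name".
           */
          return strlen(name);
  }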

The generated asm is a little dense, but I _think_ "gcc -O2" does in
fact do this with a single strlen based on the following tweak on top of
your patch:

diff --git a/sha1_file.c b/sha1_file.c
index 2160323c4a..f234519744 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1921,11 +1921,12 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
+		strbuf_setlen(path, baselen);
+		strbuf_addstr(path, de->d_name);
+
 		if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 &&
 		    !hex_to_bytes(oid.hash + 1, de->d_name,
 				  GIT_SHA1_RAWSZ - 1)) {
-			strbuf_setlen(path, baselen);
-			strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2);
 			if (obj_cb) {
 				r = obj_cb(&oid, path->buf, data);
 				if (r)

Not that I overly mind the manual assignment of the strlen result in
this particular case. But I'm a curious fellow by nature, and knowing
these kinds of answers helps us build up an accurate gut instinct for
future cases.

> Small change by storing the length in advance of the conditional:
> 
> while (de = readdir(...)) {
>     int namelen = strlen(de->d_name);
>     strbuf_setlen(path, baselen);
>     strbuf_add(path, de->d_name, namelen);
> 
>     if (namelen == HEXSZ - 2)
>         obj_cb(path->buf)
>     else
>         cruft_cb(path->buf);
> }

Yup, I don't mind that approach either, but do please use size_t to
store the result of strlen (I know it's nearly impossible to overflow in
this case, but I've been trying to push the codebase in that direction
slowly over time).
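
For concreteness, the loop shape this seems to be converging on would
look roughly like the fragment below (my sketch, not a v2 of the patch;
it assumes the existing each_loose_object_fn/each_loose_cruft_fn calling
convention and keeps the usual early-exit on a non-zero return):

  while ((de = readdir(dir))) {
          size_t namelen;

          if (is_dot_or_dotdot(de->d_name))
                  continue;

          namelen = strlen(de->d_name);
          strbuf_setlen(path, baselen);
          strbuf_add(path, de->d_name, namelen);

          if (namelen == GIT_SHA1_HEXSZ - 2 &&
              !hex_to_bytes(oid.hash + 1, de->d_name, GIT_SHA1_RAWSZ - 1)) {
                  if (obj_cb) {
                          r = obj_cb(&oid, path->buf, data);
                          if (r)
                                  break;
                  }
          } else if (cruft_cb) {
                  r = cruft_cb(de->d_name, path->buf, data);
                  if (r)
                          break;
          }
  }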

> > - there's an extra micro-optimization there, which is that if there's
> >   no obj_cb, we have no need to assemble the full path at all. I doubt
> >   it makes much of a difference, as most callers would pass an object
> >   callback (I'd be surprised if we even have one that doesn't).
> 
> After doing a few 'git grep' commands, I found several that include a NULL
> cruft_cb but none that have a NULL obj_cb.

Yeah, that agrees with my cursory look.
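
(If such a caller ever did show up, that shortcut would amount to
deferring the strbuf work into the callback checks; a rough, purely
hypothetical variant of the same loop body:)

  if (namelen == GIT_SHA1_HEXSZ - 2 &&
      !hex_to_bytes(oid.hash + 1, de->d_name, GIT_SHA1_RAWSZ - 1)) {
          if (obj_cb) {
                  /* assemble the path only when somebody will read it */
                  strbuf_setlen(path, baselen);
                  strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2);
                  r = obj_cb(&oid, path->buf, data);
                  if (r)
                          break;
          }
  } else if (cruft_cb) {
          strbuf_setlen(path, baselen);
          strbuf_add(path, de->d_name, namelen);
          r = cruft_cb(de->d_name, path->buf, data);
          if (r)
                  break;
  }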

Thanks!

-Peff


Re: [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf()

2017-12-01 Thread Derrick Stolee

On 12/1/2017 1:22 PM, Jeff King wrote:

> On Fri, Dec 01, 2017 at 12:49:56PM -0500, Derrick Stolee wrote:

[snip]

> > diff --git a/sha1_file.c b/sha1_file.c
> > index 8ae6cb6285..2160323c4a 100644
> 
> This overall looks good, but I noticed one bug and a few cosmetic
> improvements.


Thanks for finding quality improvements to this patch. I'll let it sit 
over the weekend for more feedback before sending a v2.





> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > @@ -1914,17 +1914,18 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
> >  	}
> >  
> >  	oid.hash[0] = subdir_nr;
> > +	strbuf_add(path, "/", 1);
> 
> Maybe:
> 
>    strbuf_addch(path, '/');
> 
> would be a bit more readable (it's also a tiny bit faster, but this
> isn't part of the tight loop).
> 
> > +	baselen = path->len;
> 
> We set this here so that the '/' is included as part of the base. Makes
> sense, but can we now drop the earlier setting of baselen before the
> opendir() call?


Yeah, probably. I had briefly considered just adding the '/' before the 
first assignment of "baselen", but didn't want to change the error 
output. I also don't know if there are side effects for other platforms 
by calling opendir() with a '/'-terminated path.



> >  	while ((de = readdir(dir))) {
> >  		if (is_dot_or_dotdot(de->d_name))
> >  			continue;
> >  
> > -		strbuf_setlen(path, baselen);
> > -		strbuf_addf(path, "/%s", de->d_name);
> > -
> >  		if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 &&
> >  		    !hex_to_bytes(oid.hash + 1, de->d_name,
> >  				  GIT_SHA1_RAWSZ - 1)) {
> > +			strbuf_setlen(path, baselen);
> > +			strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2);
> 
> Moving this code into the conditional makes sense here, since that's
> where we know the actual size. But what about the case when we _don't_
> hit this conditional. We still do:
> 
>    if (cruft_cb)
>            cruft_cb(path->buf);
> 
> which is now broken (it will just get the directory name without the
> entry appended).


Good catch! A big reason to pull it inside and use strbuf_add over 
strbuf_addstr is to avoid a duplicate strlen() calculation. However, I 
can store the length before the conditional.



> I think the optimized version probably needs to be something more like:
> 
>   while (de = readdir(...)) {
>           strbuf_setlen(path, baselen);
>           if (size is HEXSZ - 2) {
>                   strbuf_add(path, de->d_name, HEXSZ - 2);
>                   obj_cb(path->buf);
>           } else if (cruft_cb) {
>                   strbuf_addstr(path, de->d_name);
>                   cruft_cb(path->buf);
>           }
>   }


Small change by storing the length in advance of the conditional:

while (de = readdir(...)) {
    int namelen = strlen(de->d_name);
    strbuf_setlen(path, baselen);
    strbuf_add(path, de->d_name, namelen);

    if (namelen == HEXSZ - 2)
        obj_cb(path->buf)
    else
        cruft_cb(path->buf);
}


> Two other thoughts:
> 
>   - from past discussions, I suspect most of your performance
>     improvement actually comes from avoiding addf(), and the difference
>     between addstr() and add() may be negligible here. It might be worth
>     timing strbuf_addstr(). If that's similarly fast, that means we
>     could keep the logic out of the conditional entirely.


addstr() duplicates the strlen(), which isn't much but we might as well 
save cycles where we can if it isn't too hard.



>   - there's an extra micro-optimization there, which is that if there's
>     no obj_cb, we have no need to assemble the full path at all. I doubt
>     it makes much of a difference, as most callers would pass an object
>     callback (I'd be surprised if we even have one that doesn't).


After doing a few 'git grep' commands, I found several that include a 
NULL cruft_cb but none that have a NULL obj_cb.


Thanks,
-Stolee


Re: [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf()

2017-12-01 Thread Jeff King
On Fri, Dec 01, 2017 at 12:49:56PM -0500, Derrick Stolee wrote:

> Replace use of strbuf_addf() with strbuf_add() when enumerating
> loose objects in for_each_file_in_obj_subdir(). Since we already
> check the length and hex-values of the string before consuming
> the path, we can prevent extra computation by using the lower-
> level method.

Makes sense. It's a shame that "addf" is turning out to be so slow (I'm
still mildly curious if that differs between Windows and glibc, but
there are so many other variables between the two platforms it's hard to
test).
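
(If anyone wants to poke at that, a quick-and-dirty harness along these
lines would do. This is my own sketch, assuming it is compiled inside
git's source tree and linked against libgit.a so that strbuf.h is
available; the 38-character name is just a stand-in for a loose-object
filename.)

  #include "git-compat-util.h"
  #include "strbuf.h"

  int main(void)
  {
          const char *name = "786850e387550fdab836ed7e6dc881de23001b";
          size_t len = strlen(name);
          struct strbuf buf = STRBUF_INIT;
          clock_t t;
          int i;

          /* format through addf(), as the pre-patch loop does */
          t = clock();
          for (i = 0; i < 10000000; i++) {
                  strbuf_reset(&buf);
                  strbuf_addf(&buf, "/%s", name);
          }
          printf("addf:      %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

          /* append directly, as the patched loop does */
          t = clock();
          for (i = 0; i < 10000000; i++) {
                  strbuf_reset(&buf);
                  strbuf_addch(&buf, '/');
                  strbuf_add(&buf, name, len);
          }
          printf("addch+add: %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

          strbuf_release(&buf);
          return 0;
  }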

> One consumer of for_each_file_in_obj_subdir() is the abbreviation
> code. OID abbreviations use a cached list of loose objects (per
> object subdirectory) to make repeated queries fast, but there is
> significant cache load time when there are many loose objects.
> 
> Most repositories do not have many loose objects before repacking,
> but in the GVFS case the repos can grow to have millions of loose
> objects. Profiling 'git log' performance in GitForWindows on a
> GVFS-enabled repo with ~2.5 million loose objects revealed 12% of
> the CPU time was spent in strbuf_addf().

Yeah, we haven't heavily micro-optimized the case for having lots of
loose objects. The general philosophy about having lots of loose objects
is: don't. You're generally going to pay a system call for each access,
which is much heavier-weight than packfiles.

I think abbreviation-finding is the exception there, though, because we
literally just readdir() the entries and never do anything else with
them.
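
As a rough standalone illustration (my sketch, not git code): scanning
one fan-out directory such as .git/objects/2f only ever touches the
directory itself, so the per-entry cost is one readdir() plus a strlen();
the 38 below is GIT_SHA1_HEXSZ minus the two hex characters consumed by
the subdirectory name.

  #include <dirent.h>
  #include <stdio.h>
  #include <string.h>

  /* Count loose-object candidates in one fan-out directory. */
  static int count_candidates(const char *subdir)
  {
          DIR *dir = opendir(subdir);
          struct dirent *de;
          int n = 0;

          if (!dir)
                  return -1;
          while ((de = readdir(dir)) != NULL) {
                  /* the object file itself is never opened or stat()ed */
                  if (strlen(de->d_name) == 38)
                          n++;
          }
          closedir(dir);
          return n;
  }

  int main(int argc, char **argv)
  {
          if (argc > 1)
                  printf("%d candidates in %s\n",
                         count_candidates(argv[1]), argv[1]);
          return 0;
  }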

> Add a new performance test to p4211-line-log.sh that is more
> sensitive to this cache-loading. By limiting to 1000 commits, we
> more closely resemble user wait time when reading history into a
> pager.
> 
> For a copy of the Linux repo with two ~512 MB packfiles and ~572K
> loose objects, running 'git log --oneline --raw --parents -1000'
> had the following performance:
> 
>  HEAD~1            HEAD
> 
>  7.70(7.15+0.54)   7.44(7.09+0.29) -3.4%

Thanks for including numbers. I think the setup here highlights a
weakness of the perf suite that we've discussed: if there's a
performance regression for your case, nobody is likely to notice because
they won't test on a repo with 500k loose objects.

Probably "repo with a bunch of loose objects" ought to be a stock
repository profile for doing regression runs.

> diff --git a/sha1_file.c b/sha1_file.c
> index 8ae6cb6285..2160323c4a 100644

This overall looks good, but I noticed one bug and a few cosmetic
improvements.

> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1914,17 +1914,18 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
>  	}
>  
>  	oid.hash[0] = subdir_nr;
> +	strbuf_add(path, "/", 1);

Maybe:

  strbuf_addch(path, '/');

would be a bit more readable (it's also a tiny bit faster, but this
isn't part of the tight loop).

> +	baselen = path->len;

We set this here so that the '/' is included as part of the base. Makes
sense, but can we now drop the earlier setting of baselen before the
opendir() call?

>  	while ((de = readdir(dir))) {
>  		if (is_dot_or_dotdot(de->d_name))
>  			continue;
>  
> -		strbuf_setlen(path, baselen);
> -		strbuf_addf(path, "/%s", de->d_name);
> -
>  		if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 &&
>  		    !hex_to_bytes(oid.hash + 1, de->d_name,
>  				  GIT_SHA1_RAWSZ - 1)) {
> +			strbuf_setlen(path, baselen);
> +			strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2);

Moving this code into the conditional makes sense here, since that's
where we know the actual size. But what about the case when we _don't_
hit this conditional. We still do:

   if (cruft_cb)
           cruft_cb(path->buf);

which is now broken (it will just get the directory name without the
entry appended).

I think the optimized version probably needs to be something more like:

  while (de = readdir(...)) {
          strbuf_setlen(path, baselen);
          if (size is HEXSZ - 2) {
                  strbuf_add(path, de->d_name, HEXSZ - 2);
                  obj_cb(path->buf);
          } else if (cruft_cb) {
                  strbuf_addstr(path, de->d_name);
                  cruft_cb(path->buf);
          }
  }

Two other thoughts:

  - from past discussions, I suspect most of your performance
    improvement actually comes from avoiding addf(), and the difference
    between addstr() and add() may be negligible here. It might be worth
    timing strbuf_addstr(). If that's similarly fast, that means we
    could keep the logic out of the conditional entirely.

  - there's an extra micro-optimization there, which is that if there's
    no obj_cb, we have no need to assemble the full path at all. I doubt
    it makes much of a difference, as most callers would pass an object
    callback (I'd be surprised if we even have one that doesn't).

-Peff


[PATCH] sha1_file: use strbuf_add() instead of strbuf_addf()

2017-12-01 Thread Derrick Stolee
Replace use of strbuf_addf() with strbuf_add() when enumerating
loose objects in for_each_file_in_obj_subdir(). Since we already
check the length and hex-values of the string before consuming
the path, we can prevent extra computation by using the lower-
level method.

One consumer of for_each_file_in_obj_subdir() is the abbreviation
code. OID abbreviations use a cached list of loose objects (per
object subdirectory) to make repeated queries fast, but there is
significant cache load time when there are many loose objects.

Most repositories do not have many loose objects before repacking,
but in the GVFS case the repos can grow to have millions of loose
objects. Profiling 'git log' performance in GitForWindows on a
GVFS-enabled repo with ~2.5 million loose objects revealed 12% of
the CPU time was spent in strbuf_addf().

Add a new performance test to p4211-line-log.sh that is more
sensitive to this cache-loading. By limiting to 1000 commits, we
more closely resemble user wait time when reading history into a
pager.

For a copy of the Linux repo with two ~512 MB packfiles and ~572K
loose objects, running 'git log --oneline --raw --parents -1000'
had the following performance:

 HEAD~1            HEAD

 7.70(7.15+0.54)   7.44(7.09+0.29) -3.4%

Signed-off-by: Derrick Stolee 
---
 sha1_file.c              | 7 ++++---
 t/perf/p4211-line-log.sh | 4 ++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 8ae6cb6285..2160323c4a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1914,17 +1914,18 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 	}
 
 	oid.hash[0] = subdir_nr;
+	strbuf_add(path, "/", 1);
+	baselen = path->len;
 
 	while ((de = readdir(dir))) {
 		if (is_dot_or_dotdot(de->d_name))
 			continue;
 
-		strbuf_setlen(path, baselen);
-		strbuf_addf(path, "/%s", de->d_name);
-
 		if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 &&
 		    !hex_to_bytes(oid.hash + 1, de->d_name,
 				  GIT_SHA1_RAWSZ - 1)) {
+			strbuf_setlen(path, baselen);
+			strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2);
 			if (obj_cb) {
 				r = obj_cb(&oid, path->buf, data);
 				if (r)
diff --git a/t/perf/p4211-line-log.sh b/t/perf/p4211-line-log.sh
index e0ed05907c..392bcc0e51 100755
--- a/t/perf/p4211-line-log.sh
+++ b/t/perf/p4211-line-log.sh
@@ -35,4 +35,8 @@ test_perf 'git log --oneline --raw --parents' '
 	git log --oneline --raw --parents >/dev/null
 '
 
+test_perf 'git log --oneline --raw --parents -1000' '
+	git log --oneline --raw --parents -1000 >/dev/null
+'
+
 test_done
-- 
2.15.0