Re: [PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee

On 11/6/2018 2:51 PM, Jeff King wrote:

On Tue, Nov 06, 2018 at 02:44:42PM -0500, Jeff King wrote:


The fix for this is simple: set core.warnAmbiguousRefs to false for this
specific call of git pack-objects coming from git send-pack. We don't want
to default it to false for all calls to git pack-objects, as it is valid to
pass ref names instead of object ids. This helps regain these seconds during
a push.

I don't think you actually care about the ambiguity check between refs
here; you just care about avoiding the ref check when we've seen (and
are mostly expecting) a 40-hex sha1. We have a more specific flag for
that: warn_on_object_refname_ambiguity.

And I think it would be OK to enable that all the time for pack-objects,
which is plumbing that does typically expect object names. See prior art
in 25fba78d36 (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
ambiguity check with --stdin, 2014-03-12).

I'd probably do it here:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e50c6cd1ff..d370638a5d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3104,6 +3104,7 @@ static void get_object_list(int ac, const char **av)


Scoping the change into get_object_list does make sense. I was doing it 
a level higher, which is not worth it. I'll reproduce your change here.



struct rev_info revs;
char line[1000];
int flags = 0;
+   int save_warning;
  
  	repo_init_revisions(the_repository, , NULL);

save_commit_buffer = 0;
@@ -3112,6 +3113,9 @@ static void get_object_list(int ac, const char **av)
/* make sure shallows are read */
is_repository_shallow(the_repository);
  
+	save_warning = warn_on_object_refname_ambiguity;

+   warn_on_object_refname_ambiguity = 0;
+
while (fgets(line, sizeof(line), stdin) != NULL) {
int len = strlen(line);
if (len && line[len - 1] == '\n')
@@ -3138,6 +3142,8 @@ static void get_object_list(int ac, const char **av)
die(_("bad revision '%s'"), line);
}
  
+	warn_on_object_refname_ambiguity = save_warning;

+
if (use_bitmap_index && !get_object_list_from_bitmap())
return;
  


But I'll leave it to you to wrap that up in a patch, since you probably
should re-check your timings (which it would be interesting to include
in the commit message, if you have reproducible timings).


The timings change a lot depending on the disk cache and the remote 
refs, which is unfortunate, but I have measured a three-second improvement.


Thanks,
-Stolee


Re: [PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee

On 11/6/2018 2:44 PM, Jeff King wrote:

On Tue, Nov 06, 2018 at 11:13:47AM -0800, Derrick Stolee via GitGitGadget wrote:


I've been looking into the performance of git push for very large repos. Our
users are reporting that 60-80% of git push time is spent during the
"Enumerating objects" phase of git pack-objects.

A git push process runs several processes during its run, but one includes
git send-pack which calls git pack-objects and passes the known have/wants
into stdin using object ids. However, the default setting for
core.warnAmbiguousRefs requires git pack-objects to check for ref names
matching the ref_rev_parse_rules array in refs.c. This means that every
object is triggering at least six "file exists?" queries.

When there are a lot of refs, this can add up significantly! My PerfView
trace for a simple push measured 3 seconds spent checking these paths.

Some of this might be useful in the commit message. :)


The fix for this is simple: set core.warnAmbiguousRefs to false for this
specific call of git pack-objects coming from git send-pack. We don't want
to default it to false for all calls to git pack-objects, as it is valid to
pass ref names instead of object ids. This helps regain these seconds during
a push.

I don't think you actually care about the ambiguity check between refs
here; you just care about avoiding the ref check when we've seen (and
are mostly expecting) a 40-hex sha1. We have a more specific flag for
that: warn_on_object_refname_ambiguity.

And I think it would be OK to enable that all the time for pack-objects,
which is plumbing that does typically expect object names. See prior art
in 25fba78d36 (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
ambiguity check with --stdin, 2014-03-12).
Thanks for these pointers. Helps to know there is precedent for shutting 
down the

behavior without relying on "-c" flags.


Whenever I see a change like this to the pack-objects invocation for
send-pack, it makes me wonder if upload-pack would want the same thing.

It's a moot point if we just set the flag directly in inside
pack-objects, though.

I'll send a v2 that does just that.

Thanks,
-Stolee


Re: [PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Jeff King
On Tue, Nov 06, 2018 at 02:44:42PM -0500, Jeff King wrote:

> > The fix for this is simple: set core.warnAmbiguousRefs to false for this
> > specific call of git pack-objects coming from git send-pack. We don't want
> > to default it to false for all calls to git pack-objects, as it is valid to
> > pass ref names instead of object ids. This helps regain these seconds during
> > a push.
> 
> I don't think you actually care about the ambiguity check between refs
> here; you just care about avoiding the ref check when we've seen (and
> are mostly expecting) a 40-hex sha1. We have a more specific flag for
> that: warn_on_object_refname_ambiguity.
> 
> And I think it would be OK to enable that all the time for pack-objects,
> which is plumbing that does typically expect object names. See prior art
> in 25fba78d36 (cat-file: disable object/refname ambiguity check for
> batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
> ambiguity check with --stdin, 2014-03-12).

I'd probably do it here:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e50c6cd1ff..d370638a5d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3104,6 +3104,7 @@ static void get_object_list(int ac, const char **av)
struct rev_info revs;
char line[1000];
int flags = 0;
+   int save_warning;
 
repo_init_revisions(the_repository, , NULL);
save_commit_buffer = 0;
@@ -3112,6 +3113,9 @@ static void get_object_list(int ac, const char **av)
/* make sure shallows are read */
is_repository_shallow(the_repository);
 
+   save_warning = warn_on_object_refname_ambiguity;
+   warn_on_object_refname_ambiguity = 0;
+
while (fgets(line, sizeof(line), stdin) != NULL) {
int len = strlen(line);
if (len && line[len - 1] == '\n')
@@ -3138,6 +3142,8 @@ static void get_object_list(int ac, const char **av)
die(_("bad revision '%s'"), line);
}
 
+   warn_on_object_refname_ambiguity = save_warning;
+
if (use_bitmap_index && !get_object_list_from_bitmap())
return;
 

But I'll leave it to you to wrap that up in a patch, since you probably
should re-check your timings (which it would be interesting to include
in the commit message, if you have reproducible timings).

-Peff


Re: [PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Jeff King
On Tue, Nov 06, 2018 at 11:13:47AM -0800, Derrick Stolee via GitGitGadget wrote:

> I've been looking into the performance of git push for very large repos. Our
> users are reporting that 60-80% of git push time is spent during the
> "Enumerating objects" phase of git pack-objects.
> 
> A git push process runs several processes during its run, but one includes 
> git send-pack which calls git pack-objects and passes the known have/wants
> into stdin using object ids. However, the default setting for 
> core.warnAmbiguousRefs requires git pack-objects to check for ref names
> matching the ref_rev_parse_rules array in refs.c. This means that every
> object is triggering at least six "file exists?" queries.
> 
> When there are a lot of refs, this can add up significantly! My PerfView
> trace for a simple push measured 3 seconds spent checking these paths.

Some of this might be useful in the commit message. :)

> The fix for this is simple: set core.warnAmbiguousRefs to false for this
> specific call of git pack-objects coming from git send-pack. We don't want
> to default it to false for all calls to git pack-objects, as it is valid to
> pass ref names instead of object ids. This helps regain these seconds during
> a push.

I don't think you actually care about the ambiguity check between refs
here; you just care about avoiding the ref check when we've seen (and
are mostly expecting) a 40-hex sha1. We have a more specific flag for
that: warn_on_object_refname_ambiguity.

And I think it would be OK to enable that all the time for pack-objects,
which is plumbing that does typically expect object names. See prior art
in 25fba78d36 (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
ambiguity check with --stdin, 2014-03-12).

> Derrick Stolee (1):
>   send-pack: set core.warnAmbiguousRefs=false
> 
>  send-pack.c | 2 ++
>  1 file changed, 2 insertions(+)

Whenever I see a change like this to the pack-objects invocation for
send-pack, it makes me wonder if upload-pack would want the same thing.

It's a moot point if we just set the flag directly in inside
pack-objects, though.

-Peff


[PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false

2018-11-06 Thread Derrick Stolee via GitGitGadget
I've been looking into the performance of git push for very large repos. Our
users are reporting that 60-80% of git push time is spent during the
"Enumerating objects" phase of git pack-objects.

A git push process runs several processes during its run, but one includes 
git send-pack which calls git pack-objects and passes the known have/wants
into stdin using object ids. However, the default setting for 
core.warnAmbiguousRefs requires git pack-objects to check for ref names
matching the ref_rev_parse_rules array in refs.c. This means that every
object is triggering at least six "file exists?" queries.

When there are a lot of refs, this can add up significantly! My PerfView
trace for a simple push measured 3 seconds spent checking these paths.

The fix for this is simple: set core.warnAmbiguousRefs to false for this
specific call of git pack-objects coming from git send-pack. We don't want
to default it to false for all calls to git pack-objects, as it is valid to
pass ref names instead of object ids. This helps regain these seconds during
a push.

In addition to this patch submission, we are looking into merging it into
our fork sooner [1].

[1] https://github.com/Microsoft/git/pull/67

Derrick Stolee (1):
  send-pack: set core.warnAmbiguousRefs=false

 send-pack.c | 2 ++
 1 file changed, 2 insertions(+)


base-commit: cae598d9980661a978e2df4fb338518f7bf09572
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-68%2Fderrickstolee%2Fsend-pack-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-68/derrickstolee/send-pack-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/68
-- 
gitgitgadget