Re: [PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false
On 11/6/2018 2:51 PM, Jeff King wrote:
> On Tue, Nov 06, 2018 at 02:44:42PM -0500, Jeff King wrote:
>
> > > The fix for this is simple: set core.warnAmbiguousRefs to false for this
> > > specific call of git pack-objects coming from git send-pack. We don't want
> > > to default it to false for all calls to git pack-objects, as it is valid to
> > > pass ref names instead of object ids. This helps regain these seconds during
> > > a push.
> >
> > I don't think you actually care about the ambiguity check between refs
> > here; you just care about avoiding the ref check when we've seen (and
> > are mostly expecting) a 40-hex sha1. We have a more specific flag for
> > that: warn_on_object_refname_ambiguity.
> >
> > And I think it would be OK to enable that all the time for pack-objects,
> > which is plumbing that does typically expect object names. See prior art
> > in 25fba78d36 (cat-file: disable object/refname ambiguity check for
> > batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
> > ambiguity check with --stdin, 2014-03-12).
>
> I'd probably do it here:
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index e50c6cd1ff..d370638a5d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3104,6 +3104,7 @@ static void get_object_list(int ac, const char **av)

Scoping the change into get_object_list does make sense. I was doing it
a level higher, which is not worth it. I'll reproduce your change here.

>  	struct rev_info revs;
>  	char line[1000];
>  	int flags = 0;
> +	int save_warning;
>  
>  	repo_init_revisions(the_repository, &revs, NULL);
>  	save_commit_buffer = 0;
> @@ -3112,6 +3113,9 @@ static void get_object_list(int ac, const char **av)
>  	/* make sure shallows are read */
>  	is_repository_shallow(the_repository);
>  
> +	save_warning = warn_on_object_refname_ambiguity;
> +	warn_on_object_refname_ambiguity = 0;
> +
>  	while (fgets(line, sizeof(line), stdin) != NULL) {
>  		int len = strlen(line);
>  		if (len && line[len - 1] == '\n')
> @@ -3138,6 +3142,8 @@ static void get_object_list(int ac, const char **av)
>  			die(_("bad revision '%s'"), line);
>  	}
>  
> +	warn_on_object_refname_ambiguity = save_warning;
> +
>  	if (use_bitmap_index && !get_object_list_from_bitmap())
>  		return;
>  
>
> But I'll leave it to you to wrap that up in a patch, since you probably
> should re-check your timings (which it would be interesting to include
> in the commit message, if you have reproducible timings).

The timings change a lot depending on the disk cache and the remote
refs, which is unfortunate, but I have measured a three-second
improvement.

Thanks,
-Stolee
Re: [PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false
On 11/6/2018 2:44 PM, Jeff King wrote:
> On Tue, Nov 06, 2018 at 11:13:47AM -0800, Derrick Stolee via GitGitGadget wrote:
>
> > I've been looking into the performance of git push for very large repos. Our
> > users are reporting that 60-80% of git push time is spent during the
> > "Enumerating objects" phase of git pack-objects.
> >
> > A git push process runs several processes during its run, but one includes
> > git send-pack which calls git pack-objects and passes the known have/wants
> > into stdin using object ids. However, the default setting for
> > core.warnAmbiguousRefs requires git pack-objects to check for ref names
> > matching the ref_rev_parse_rules array in refs.c. This means that every
> > object is triggering at least six "file exists?" queries.
> >
> > When there are a lot of refs, this can add up significantly! My PerfView
> > trace for a simple push measured 3 seconds spent checking these paths.
>
> Some of this might be useful in the commit message. :)
>
> > The fix for this is simple: set core.warnAmbiguousRefs to false for this
> > specific call of git pack-objects coming from git send-pack. We don't want
> > to default it to false for all calls to git pack-objects, as it is valid to
> > pass ref names instead of object ids. This helps regain these seconds during
> > a push.
>
> I don't think you actually care about the ambiguity check between refs
> here; you just care about avoiding the ref check when we've seen (and
> are mostly expecting) a 40-hex sha1. We have a more specific flag for
> that: warn_on_object_refname_ambiguity.
>
> And I think it would be OK to enable that all the time for pack-objects,
> which is plumbing that does typically expect object names. See prior art
> in 25fba78d36 (cat-file: disable object/refname ambiguity check for
> batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
> ambiguity check with --stdin, 2014-03-12).

Thanks for these pointers. Helps to know there is precedent for shutting
down the behavior without relying on "-c" flags.

> Whenever I see a change like this to the pack-objects invocation for
> send-pack, it makes me wonder if upload-pack would want the same thing.
>
> It's a moot point if we just set the flag directly inside
> pack-objects, though.

I'll send a v2 that does just that.

Thanks,
-Stolee
Re: [PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false
On Tue, Nov 06, 2018 at 02:44:42PM -0500, Jeff King wrote:

> > The fix for this is simple: set core.warnAmbiguousRefs to false for this
> > specific call of git pack-objects coming from git send-pack. We don't want
> > to default it to false for all calls to git pack-objects, as it is valid to
> > pass ref names instead of object ids. This helps regain these seconds during
> > a push.
>
> I don't think you actually care about the ambiguity check between refs
> here; you just care about avoiding the ref check when we've seen (and
> are mostly expecting) a 40-hex sha1. We have a more specific flag for
> that: warn_on_object_refname_ambiguity.
>
> And I think it would be OK to enable that all the time for pack-objects,
> which is plumbing that does typically expect object names. See prior art
> in 25fba78d36 (cat-file: disable object/refname ambiguity check for
> batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
> ambiguity check with --stdin, 2014-03-12).

I'd probably do it here:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e50c6cd1ff..d370638a5d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3104,6 +3104,7 @@ static void get_object_list(int ac, const char **av)
 	struct rev_info revs;
 	char line[1000];
 	int flags = 0;
+	int save_warning;
 
 	repo_init_revisions(the_repository, &revs, NULL);
 	save_commit_buffer = 0;
@@ -3112,6 +3113,9 @@ static void get_object_list(int ac, const char **av)
 	/* make sure shallows are read */
 	is_repository_shallow(the_repository);
 
+	save_warning = warn_on_object_refname_ambiguity;
+	warn_on_object_refname_ambiguity = 0;
+
 	while (fgets(line, sizeof(line), stdin) != NULL) {
 		int len = strlen(line);
 		if (len && line[len - 1] == '\n')
@@ -3138,6 +3142,8 @@ static void get_object_list(int ac, const char **av)
 			die(_("bad revision '%s'"), line);
 	}
 
+	warn_on_object_refname_ambiguity = save_warning;
+
 	if (use_bitmap_index && !get_object_list_from_bitmap())
 		return;
 

But I'll leave it to you to wrap that up in a patch, since you probably
should re-check your timings (which it would be interesting to include
in the commit message, if you have reproducible timings).

-Peff
Re: [PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false
On Tue, Nov 06, 2018 at 11:13:47AM -0800, Derrick Stolee via GitGitGadget wrote:

> I've been looking into the performance of git push for very large repos. Our
> users are reporting that 60-80% of git push time is spent during the
> "Enumerating objects" phase of git pack-objects.
>
> A git push process runs several processes during its run, but one includes
> git send-pack which calls git pack-objects and passes the known have/wants
> into stdin using object ids. However, the default setting for
> core.warnAmbiguousRefs requires git pack-objects to check for ref names
> matching the ref_rev_parse_rules array in refs.c. This means that every
> object is triggering at least six "file exists?" queries.
>
> When there are a lot of refs, this can add up significantly! My PerfView
> trace for a simple push measured 3 seconds spent checking these paths.

Some of this might be useful in the commit message. :)

> The fix for this is simple: set core.warnAmbiguousRefs to false for this
> specific call of git pack-objects coming from git send-pack. We don't want
> to default it to false for all calls to git pack-objects, as it is valid to
> pass ref names instead of object ids. This helps regain these seconds during
> a push.

I don't think you actually care about the ambiguity check between refs
here; you just care about avoiding the ref check when we've seen (and
are mostly expecting) a 40-hex sha1. We have a more specific flag for
that: warn_on_object_refname_ambiguity.

And I think it would be OK to enable that all the time for pack-objects,
which is plumbing that does typically expect object names. See prior art
in 25fba78d36 (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12) and 4c30d50402 (rev-list: disable object/refname
ambiguity check with --stdin, 2014-03-12).

> Derrick Stolee (1):
>   send-pack: set core.warnAmbiguousRefs=false
>
>  send-pack.c | 2 ++
>  1 file changed, 2 insertions(+)

Whenever I see a change like this to the pack-objects invocation for
send-pack, it makes me wonder if upload-pack would want the same thing.

It's a moot point if we just set the flag directly inside
pack-objects, though.

-Peff
[PATCH 0/1] send-pack: set core.warnAmbiguousRefs=false
I've been looking into the performance of git push for very large repos. Our
users are reporting that 60-80% of git push time is spent during the
"Enumerating objects" phase of git pack-objects.

A git push process runs several processes during its run, but one includes
git send-pack which calls git pack-objects and passes the known have/wants
into stdin using object ids. However, the default setting for
core.warnAmbiguousRefs requires git pack-objects to check for ref names
matching the ref_rev_parse_rules array in refs.c. This means that every
object is triggering at least six "file exists?" queries.

When there are a lot of refs, this can add up significantly! My PerfView
trace for a simple push measured 3 seconds spent checking these paths.

The fix for this is simple: set core.warnAmbiguousRefs to false for this
specific call of git pack-objects coming from git send-pack. We don't want
to default it to false for all calls to git pack-objects, as it is valid to
pass ref names instead of object ids. This helps regain these seconds during
a push.

In addition to this patch submission, we are looking into merging it into
our fork sooner [1].

[1] https://github.com/Microsoft/git/pull/67

Derrick Stolee (1):
  send-pack: set core.warnAmbiguousRefs=false

 send-pack.c | 2 ++
 1 file changed, 2 insertions(+)

base-commit: cae598d9980661a978e2df4fb338518f7bf09572
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-68%2Fderrickstolee%2Fsend-pack-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-68/derrickstolee/send-pack-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/68
-- 
gitgitgadget