Re: [PATCH 2/2] serialize collection of refs that contain submodule changes
Heiko Voigtwrites: > I am not sure if I understand you correctly here. With the "ref cache layer" > you are referring to add_submodule_odb() which is called indirectly from > submodule_needs_pushing()? Those revs are only used to check whether the hash > we need on the remote side exists in the local submodule. That should not > change due to a push. The actual check whether the commit(s) exist on the > remote side is done using a 'rev-list' in a subprocess later. I was wondering what would happen in this scenario: * You have ON_DEMAND set, which causes "git -C sub push origin" to push out what are new, updating the remote tracking branches in the submodule, sub/.git/refs/remotes/origin/*. * Then you check again. If you used for-each-ref-in-submodule, the updated refs/remotes/origin/* may not have been re-read. But you check by spawning "rev-list ... --not --remotes" as a separate process in submodule_needs_pushing(), and that will force the new process to read the updated state, so it turns out that I was overly worried without good reason ;-) It may matter once somebody tries to internalize the external rev-list call done via start_command() interface to an internal call, though. But we are not there yet.
Re: [PATCH 2/2] serialize collection of refs that contain submodule changes
Hi, On Fri, Sep 16, 2016 at 10:47:46AM -0700, Junio C Hamano wrote: > One thing that makes me worried is how the ref cache layer interacts > with this. I see you first call push_unpushed_submodules() when > ON_DEMAND is set, which would result in pushes in submodule > repositories, updating their remote tracking branches. At that > point, before you make another call to find_unpushed_submodules(), > is our cached ref layer knows that the remote tracking branches > are now up to date (otherwise, we would incorrectly judge that these > submodules need pushing based on stale information)? I am not sure if I understand you correctly here. With the "ref cache layer" you are referring to add_submodule_odb() which is called indirectly from submodule_needs_pushing()? Those revs are only used to check whether the hash we need on the remote side exists in the local submodule. That should not change due to a push. The actual check whether the commit(s) exist on the remote side is done using a 'rev-list' in a subprocess later. > > diff --git a/transport.c b/transport.c > > index 94d6dc3..76e1daf 100644 > > --- a/transport.c > > +++ b/transport.c > > @@ -903,23 +903,29 @@ int transport_push(struct transport *transport, > > > > if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && > > !is_bare_repository()) { > > struct ref *ref = remote_refs; > > + struct sha1_array hashes = SHA1_ARRAY_INIT; > > + > > for (; ref; ref = ref->next) > > - if (!is_null_oid(>new_oid) && > > - !push_unpushed_submodules(ref->new_oid.hash, > > - transport->remote->name)) > > - die ("Failed to push all needed > > submodules!"); > > + if (!is_null_oid(>new_oid)) > > + sha1_array_append(, > > ref->new_oid.hash); > > + > > + if (!push_unpushed_submodules(, > > transport->remote->name)) > > + die ("Failed to push all needed submodules!"); > > Do we leak the contents of hashes here? Good catch, sorry about that. Will clear the hashes array. > > } > > > > if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND | > > TRANSPORT_RECURSE_SUBMODULES_CHECK)) && > > !is_bare_repository()) { > > struct ref *ref = remote_refs; > > struct string_list needs_pushing = STRING_LIST_INIT_DUP; > > + struct sha1_array hashes = SHA1_ARRAY_INIT; > > > > for (; ref; ref = ref->next) > > - if (!is_null_oid(>new_oid) && > > - find_unpushed_submodules(ref->new_oid.hash, > > - transport->remote->name, > > _pushing)) > > - > > die_with_unpushed_submodules(_pushing); > > + if (!is_null_oid(>new_oid)) > > + sha1_array_append(, > > ref->new_oid.hash); > > + > > + if (find_unpushed_submodules(, > > transport->remote->name, > > + _pushing)) > > + die_with_unpushed_submodules(_pushing); > > Do we leak the contents of hashes here? I do not think we need to > worry about needs_pushing leaking, as we will always die if it is > not empty, but it might be a better code hygiene to clear it as > well. As above, will fix. Cheers Heiko
Re: [PATCH 2/2] serialize collection of refs that contain submodule changes
Heiko Voigtwrites: > diff --git a/submodule.c b/submodule.c > index b04c066..a15e346 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -627,24 +627,31 @@ static void free_submodules_sha1s(struct string_list > *submodules) > string_list_clear(submodules, 1); > } > > -int find_unpushed_submodules(unsigned char new_sha1[20], > +static void append_hash_to_argv(const unsigned char sha1[20], > + void *data) > +{ > + struct argv_array *argv = (struct argv_array *) data; > + argv_array_push(argv, sha1_to_hex(sha1)); > +} > + > +int find_unpushed_submodules(struct sha1_array *hashes, > const char *remotes_name, struct string_list *needs_pushing) > { > struct rev_info rev; > struct commit *commit; > - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; > - int argc = ARRAY_SIZE(argv) - 1, i; > - char *sha1_copy; > + int i; > struct string_list submodules = STRING_LIST_INIT_DUP; > + struct argv_array argv = ARGV_ARRAY_INIT; > > - struct strbuf remotes_arg = STRBUF_INIT; > - > - strbuf_addf(_arg, "--remotes=%s", remotes_name); > init_revisions(, NULL); > - sha1_copy = xstrdup(sha1_to_hex(new_sha1)); > - argv[1] = sha1_copy; > - argv[3] = remotes_arg.buf; > - setup_revisions(argc, argv, , NULL); > + > + /* argv.argv[0] will be ignored by setup_revisions */ > + argv_array_push(, "find_unpushed_submodules"); > + sha1_array_for_each_unique(hashes, append_hash_to_argv, ); > + argv_array_push(, "--not"); > + argv_array_pushf(, "--remotes=%s", remotes_name); > + > + setup_revisions(argv.argc, argv.argv, , NULL); Yes, its about time to for us to lose that fixed-size argv[] and replace it with an argv-array ;-). > if (prepare_revision_walk()) > die("revision walk setup failed"); So this one used to get a single commit at the tip of what we pushed in the superproject and was asked "Look at the history we just pushed leading to the tip commit, and tell me if any of the ones new to the remote requires submodule commits the remote does not yet have". Now the caller collects all the tip commits and asks us once: "Here are the new tips we just pushed; in the history leading to them, is there a commit that the remote did not have that requires submodule history the remote does not yet have?". Makes sort-of sense. I speculated that you would be doing the same kind of optimization to feed all positive commits to rev-list at once in each submodule repository in the review of the previous one, but you didn't do it here. You did the same for superproject in this step. Perhaps 3 or 4 would do so in the submodule repository. One thing that makes me worried is how the ref cache layer interacts with this. I see you first call push_unpushed_submodules() when ON_DEMAND is set, which would result in pushes in submodule repositories, updating their remote tracking branches. At that point, before you make another call to find_unpushed_submodules(), is our cached ref layer knows that the remote tracking branches are now up to date (otherwise, we would incorrectly judge that these submodules need pushing based on stale information)? > diff --git a/transport.c b/transport.c > index 94d6dc3..76e1daf 100644 > --- a/transport.c > +++ b/transport.c > @@ -903,23 +903,29 @@ int transport_push(struct transport *transport, > > if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && > !is_bare_repository()) { > struct ref *ref = remote_refs; > + struct sha1_array hashes = SHA1_ARRAY_INIT; > + > for (; ref; ref = ref->next) > - if (!is_null_oid(>new_oid) && > - !push_unpushed_submodules(ref->new_oid.hash, > - transport->remote->name)) > - die ("Failed to push all needed > submodules!"); > + if (!is_null_oid(>new_oid)) > + sha1_array_append(, > ref->new_oid.hash); > + > + if (!push_unpushed_submodules(, > transport->remote->name)) > + die ("Failed to push all needed submodules!"); Do we leak the contents of hashes here? > } > > if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND | > TRANSPORT_RECURSE_SUBMODULES_CHECK)) && > !is_bare_repository()) { > struct ref *ref = remote_refs; > struct string_list needs_pushing = STRING_LIST_INIT_DUP; > + struct sha1_array hashes = SHA1_ARRAY_INIT; > > for (; ref; ref = ref->next) > - if (!is_null_oid(>new_oid) && > - find_unpushed_submodules(ref->new_oid.hash, > -
Re: [PATCH 2/2] serialize collection of refs that contain submodule changes
On Wed, Sep 14, 2016 at 12:46 PM, Heiko Voigtwrote: > On Wed, Sep 14, 2016 at 07:51:30PM +0200, Heiko Voigt wrote: >> Here are some numbers (using the my development clone of git >> itself) from my local machine: >> >> rm -rf && mkdir && >> (cd && git init) && >> time git push --mirror >> >>real 0m16.056s >>user 0m24.424s >>sys0m1.380s >> >>real 0m15.885s >>user 0m24.204s >>sys0m1.296s >> >>real 0m16.731s >>user 0m24.176s >>sys0m1.244s >> >> rm -rf && mkdir && >> (cd && git init) && >> time git push --mirror --recurse-submodules=check >> >>real 0m21.441s >>user 0m29.560s >>sys0m1.480s >> >>real 0m21.319s >>user 0m29.484s >>sys0m1.464s >> >>real 0m21.261s >>user 0m29.252s >>sys0m1.592s >> >> Without my patches and --recurse-submodules=check the numbers are >> basically the same. I stopped the test with --recurse-submodules=check >> after ~ 9 minutes. > > Fun fact, I let the push without my patch and with > --recurse-submodules=check finish: Thanks for the numbers, one of the major push backs for origin/sb/push-make-submodule-check-the-default was that it introduced slowness; this patch might help a bit there.
Re: [PATCH 2/2] serialize collection of refs that contain submodule changes
On Wed, Sep 14, 2016 at 07:51:30PM +0200, Heiko Voigt wrote: > Here are some numbers (using the my development clone of git > itself) from my local machine: > > rm -rf && mkdir && > (cd && git init) && > time git push --mirror > >real 0m16.056s >user 0m24.424s >sys0m1.380s > >real 0m15.885s >user 0m24.204s >sys0m1.296s > >real 0m16.731s >user 0m24.176s >sys0m1.244s > > rm -rf && mkdir && > (cd && git init) && > time git push --mirror --recurse-submodules=check > >real 0m21.441s >user 0m29.560s >sys0m1.480s > >real 0m21.319s >user 0m29.484s >sys0m1.464s > >real 0m21.261s >user 0m29.252s >sys0m1.592s > > Without my patches and --recurse-submodules=check the numbers are > basically the same. I stopped the test with --recurse-submodules=check > after ~ 9 minutes. Fun fact, I let the push without my patch and with --recurse-submodules=check finish: real27m7.962s user27m15.568s sys 0m2.016s Thats quite some time... Cheers Heiko
[PATCH 2/2] serialize collection of refs that contain submodule changes
We are iterating over each pushed ref and want to check whether it contains changes to submodules. Instead of immediately checking each ref lets first collect them and then do the check for all of them in one revision walk. Signed-off-by: Heiko Voigt--- Sorry this was not catched earlier. This was implemented as part of summer of code and it seems we never tested with --mirror. This is the one which does only one revision walk instead of one for each ref. Here are some numbers (using the my development clone of git itself) from my local machine: rm -rf && mkdir && (cd && git init) && time git push --mirror real 0m16.056s user 0m24.424s sys 0m1.380s real 0m15.885s user 0m24.204s sys 0m1.296s real 0m16.731s user 0m24.176s sys 0m1.244s rm -rf && mkdir && (cd && git init) && time git push --mirror --recurse-submodules=check real 0m21.441s user 0m29.560s sys 0m1.480s real 0m21.319s user 0m29.484s sys 0m1.464s real 0m21.261s user 0m29.252s sys 0m1.592s Without my patches and --recurse-submodules=check the numbers are basically the same. I stopped the test with --recurse-submodules=check after ~ 9 minutes. Cheers Heiko submodule.c | 36 +--- submodule.h | 5 +++-- transport.c | 22 ++ 3 files changed, 38 insertions(+), 25 deletions(-) diff --git a/submodule.c b/submodule.c index b04c066..a15e346 100644 --- a/submodule.c +++ b/submodule.c @@ -627,24 +627,31 @@ static void free_submodules_sha1s(struct string_list *submodules) string_list_clear(submodules, 1); } -int find_unpushed_submodules(unsigned char new_sha1[20], +static void append_hash_to_argv(const unsigned char sha1[20], + void *data) +{ + struct argv_array *argv = (struct argv_array *) data; + argv_array_push(argv, sha1_to_hex(sha1)); +} + +int find_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name, struct string_list *needs_pushing) { struct rev_info rev; struct commit *commit; - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; - int argc = ARRAY_SIZE(argv) - 1, i; - char *sha1_copy; + int i; struct string_list submodules = STRING_LIST_INIT_DUP; + struct argv_array argv = ARGV_ARRAY_INIT; - struct strbuf remotes_arg = STRBUF_INIT; - - strbuf_addf(_arg, "--remotes=%s", remotes_name); init_revisions(, NULL); - sha1_copy = xstrdup(sha1_to_hex(new_sha1)); - argv[1] = sha1_copy; - argv[3] = remotes_arg.buf; - setup_revisions(argc, argv, , NULL); + + /* argv.argv[0] will be ignored by setup_revisions */ + argv_array_push(, "find_unpushed_submodules"); + sha1_array_for_each_unique(hashes, append_hash_to_argv, ); + argv_array_push(, "--not"); + argv_array_pushf(, "--remotes=%s", remotes_name); + + setup_revisions(argv.argc, argv.argv, , NULL); if (prepare_revision_walk()) die("revision walk setup failed"); @@ -652,8 +659,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20], find_unpushed_submodule_commits(commit, ); reset_revision_walk(); - free(sha1_copy); - strbuf_release(_arg); + argv_array_clear(); for (i = 0; i < submodules.nr; i++) { struct string_list_item *item = [i]; @@ -691,12 +697,12 @@ static int push_submodule(const char *path) return 1; } -int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name) +int push_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name) { int i, ret = 1; struct string_list needs_pushing = STRING_LIST_INIT_DUP; - if (!find_unpushed_submodules(new_sha1, remotes_name, _pushing)) + if (!find_unpushed_submodules(hashes, remotes_name, _pushing)) return 1; for (i = 0; i < needs_pushing.nr; i++) { diff --git a/submodule.h b/submodule.h index d9e197a..065b2f0 100644 --- a/submodule.h +++ b/submodule.h @@ -3,6 +3,7 @@ struct diff_options; struct argv_array; +struct sha1_array; enum { RECURSE_SUBMODULES_CHECK = -4, @@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path); int ok_to_remove_submodule(const char *path); int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20], const unsigned char a[20], const unsigned char b[20], int search); -int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name, +int find_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name, struct string_list *needs_pushing); -int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name); +int push_unpushed_submodules(struct sha1_array *hashes, const char *remotes_name); void connect_work_tree_and_git_dir(const char