Re: [PATCH 2/2] serialize collection of refs that contain submodule changes

2016-09-19 Thread Junio C Hamano
Heiko Voigt  writes:

> I am not sure if I understand you correctly here. With the "ref cache layer"
> you are referring to add_submodule_odb() which is called indirectly from
> submodule_needs_pushing()? Those revs are only used to check whether the hash
> we need on the remote side exists in the local submodule. That should not
> change due to a push. The actual check whether the commit(s) exist on the
> remote side is done using a 'rev-list' in a subprocess later.

I was wondering what would happen in this scenario:

 * You have ON_DEMAND set, which causes "git -C sub push origin" to
   push out what are new, updating the remote tracking branches in
   the submodule, sub/.git/refs/remotes/origin/*.

 * Then you check again.  If you used for-each-ref-in-submodule, the
   updated refs/remotes/origin/* may not have been re-read.

But you check by spawning "rev-list ... --not --remotes" as a
separate process in submodule_needs_pushing(), and that will force
the new process to read the updated state, so it turns out that I
was overly worried without good reason ;-)

It may matter once somebody tries to internalize the external
rev-list call done via start_command() interface to an internal
call, though.  But we are not there yet.


Re: [PATCH 2/2] serialize collection of refs that contain submodule changes

2016-09-19 Thread Heiko Voigt
Hi,

On Fri, Sep 16, 2016 at 10:47:46AM -0700, Junio C Hamano wrote:
> One thing that makes me worried is how the ref cache layer interacts
> with this.  I see you first call push_unpushed_submodules() when
> ON_DEMAND is set, which would result in pushes in submodule
> repositories, updating their remote tracking branches.  At that
> point, before you make another call to find_unpushed_submodules(),
> is our cached ref layer knows that the remote tracking branches
> are now up to date (otherwise, we would incorrectly judge that these
> submodules need pushing based on stale information)?

I am not sure if I understand you correctly here. With the "ref cache layer"
you are referring to add_submodule_odb() which is called indirectly from
submodule_needs_pushing()? Those revs are only used to check whether the hash
we need on the remote side exists in the local submodule. That should not
change due to a push. The actual check whether the commit(s) exist on the
remote side is done using a 'rev-list' in a subprocess later.


> > diff --git a/transport.c b/transport.c
> > index 94d6dc3..76e1daf 100644
> > --- a/transport.c
> > +++ b/transport.c
> > @@ -903,23 +903,29 @@ int transport_push(struct transport *transport,
> >  
> > if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
> > !is_bare_repository()) {
> > struct ref *ref = remote_refs;
> > +   struct sha1_array hashes = SHA1_ARRAY_INIT;
> > +
> > for (; ref; ref = ref->next)
> > -   if (!is_null_oid(>new_oid) &&
> > -   !push_unpushed_submodules(ref->new_oid.hash,
> > -   transport->remote->name))
> > -   die ("Failed to push all needed 
> > submodules!");
> > +   if (!is_null_oid(>new_oid))
> > +   sha1_array_append(, 
> > ref->new_oid.hash);
> > +
> > +   if (!push_unpushed_submodules(, 
> > transport->remote->name))
> > +   die ("Failed to push all needed submodules!");
> 
> Do we leak the contents of hashes here?

Good catch, sorry about that. Will clear the hashes array.


> > }
> >  
> > if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND |
> >   TRANSPORT_RECURSE_SUBMODULES_CHECK)) && 
> > !is_bare_repository()) {
> > struct ref *ref = remote_refs;
> > struct string_list needs_pushing = STRING_LIST_INIT_DUP;
> > +   struct sha1_array hashes = SHA1_ARRAY_INIT;
> >  
> > for (; ref; ref = ref->next)
> > -   if (!is_null_oid(>new_oid) &&
> > -   find_unpushed_submodules(ref->new_oid.hash,
> > -   transport->remote->name, 
> > _pushing))
> > -   
> > die_with_unpushed_submodules(_pushing);
> > +   if (!is_null_oid(>new_oid))
> > +   sha1_array_append(, 
> > ref->new_oid.hash);
> > +
> > +   if (find_unpushed_submodules(, 
> > transport->remote->name,
> > +   _pushing))
> > +   die_with_unpushed_submodules(_pushing);
> 
> Do we leak the contents of hashes here?  I do not think we need to
> worry about needs_pushing leaking, as we will always die if it is
> not empty, but it might be a better code hygiene to clear it as
> well.

As above, will fix.

Cheers Heiko


Re: [PATCH 2/2] serialize collection of refs that contain submodule changes

2016-09-16 Thread Junio C Hamano
Heiko Voigt  writes:

> diff --git a/submodule.c b/submodule.c
> index b04c066..a15e346 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -627,24 +627,31 @@ static void free_submodules_sha1s(struct string_list 
> *submodules)
>   string_list_clear(submodules, 1);
>  }
>  
> -int find_unpushed_submodules(unsigned char new_sha1[20],
> +static void append_hash_to_argv(const unsigned char sha1[20],
> + void *data)
> +{
> + struct argv_array *argv = (struct argv_array *) data;
> + argv_array_push(argv, sha1_to_hex(sha1));
> +}
> +
> +int find_unpushed_submodules(struct sha1_array *hashes,
>   const char *remotes_name, struct string_list *needs_pushing)
>  {
>   struct rev_info rev;
>   struct commit *commit;
> - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
> - int argc = ARRAY_SIZE(argv) - 1, i;
> - char *sha1_copy;
> + int i;
>   struct string_list submodules = STRING_LIST_INIT_DUP;
> + struct argv_array argv = ARGV_ARRAY_INIT;
>  
> - struct strbuf remotes_arg = STRBUF_INIT;
> -
> - strbuf_addf(_arg, "--remotes=%s", remotes_name);
>   init_revisions(, NULL);
> - sha1_copy = xstrdup(sha1_to_hex(new_sha1));
> - argv[1] = sha1_copy;
> - argv[3] = remotes_arg.buf;
> - setup_revisions(argc, argv, , NULL);
> +
> + /* argv.argv[0] will be ignored by setup_revisions */
> + argv_array_push(, "find_unpushed_submodules");
> + sha1_array_for_each_unique(hashes, append_hash_to_argv, );
> + argv_array_push(, "--not");
> + argv_array_pushf(, "--remotes=%s", remotes_name);
> +
> + setup_revisions(argv.argc, argv.argv, , NULL);

Yes, its about time to for us to lose that fixed-size argv[] and
replace it with an argv-array ;-).

>   if (prepare_revision_walk())
>   die("revision walk setup failed");

So this one used to get a single commit at the tip of what we pushed
in the superproject and was asked "Look at the history we just
pushed leading to the tip commit, and tell me if any of the ones new
to the remote requires submodule commits the remote does not yet
have".  Now the caller collects all the tip commits and asks us
once: "Here are the new tips we just pushed; in the history leading
to them, is there a commit that the remote did not have that requires
submodule history the remote does not yet have?".

Makes sort-of sense.

I speculated that you would be doing the same kind of optimization
to feed all positive commits to rev-list at once in each submodule
repository in the review of the previous one, but you didn't do it
here.  You did the same for superproject in this step.  Perhaps 3 or
4 would do so in the submodule repository.

One thing that makes me worried is how the ref cache layer interacts
with this.  I see you first call push_unpushed_submodules() when
ON_DEMAND is set, which would result in pushes in submodule
repositories, updating their remote tracking branches.  At that
point, before you make another call to find_unpushed_submodules(),
is our cached ref layer knows that the remote tracking branches
are now up to date (otherwise, we would incorrectly judge that these
submodules need pushing based on stale information)?

> diff --git a/transport.c b/transport.c
> index 94d6dc3..76e1daf 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -903,23 +903,29 @@ int transport_push(struct transport *transport,
>  
>   if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
> !is_bare_repository()) {
>   struct ref *ref = remote_refs;
> + struct sha1_array hashes = SHA1_ARRAY_INIT;
> +
>   for (; ref; ref = ref->next)
> - if (!is_null_oid(>new_oid) &&
> - !push_unpushed_submodules(ref->new_oid.hash,
> - transport->remote->name))
> - die ("Failed to push all needed 
> submodules!");
> + if (!is_null_oid(>new_oid))
> + sha1_array_append(, 
> ref->new_oid.hash);
> +
> + if (!push_unpushed_submodules(, 
> transport->remote->name))
> + die ("Failed to push all needed submodules!");

Do we leak the contents of hashes here?

>   }
>  
>   if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND |
> TRANSPORT_RECURSE_SUBMODULES_CHECK)) && 
> !is_bare_repository()) {
>   struct ref *ref = remote_refs;
>   struct string_list needs_pushing = STRING_LIST_INIT_DUP;
> + struct sha1_array hashes = SHA1_ARRAY_INIT;
>  
>   for (; ref; ref = ref->next)
> - if (!is_null_oid(>new_oid) &&
> - find_unpushed_submodules(ref->new_oid.hash,
> -   

Re: [PATCH 2/2] serialize collection of refs that contain submodule changes

2016-09-14 Thread Stefan Beller
On Wed, Sep 14, 2016 at 12:46 PM, Heiko Voigt  wrote:
> On Wed, Sep 14, 2016 at 07:51:30PM +0200, Heiko Voigt wrote:
>> Here are some numbers (using the my development clone of git
>> itself) from my local machine:
>>
>> rm -rf  && mkdir  &&
>> (cd  && git init) &&
>> time git push --mirror 
>>
>>real   0m16.056s
>>user   0m24.424s
>>sys0m1.380s
>>
>>real   0m15.885s
>>user   0m24.204s
>>sys0m1.296s
>>
>>real   0m16.731s
>>user   0m24.176s
>>sys0m1.244s
>>
>> rm -rf  && mkdir  &&
>> (cd  && git init) &&
>> time git push --mirror --recurse-submodules=check 
>>
>>real   0m21.441s
>>user   0m29.560s
>>sys0m1.480s
>>
>>real   0m21.319s
>>user   0m29.484s
>>sys0m1.464s
>>
>>real   0m21.261s
>>user   0m29.252s
>>sys0m1.592s
>>
>> Without my patches and --recurse-submodules=check the numbers are
>> basically the same. I stopped the test with --recurse-submodules=check
>> after ~ 9 minutes.
>
> Fun fact, I let the push without my patch and with
> --recurse-submodules=check finish:

Thanks for the numbers, one of the major push backs for
origin/sb/push-make-submodule-check-the-default
was that it introduced slowness; this patch might help a bit there.


Re: [PATCH 2/2] serialize collection of refs that contain submodule changes

2016-09-14 Thread Heiko Voigt
On Wed, Sep 14, 2016 at 07:51:30PM +0200, Heiko Voigt wrote:
> Here are some numbers (using the my development clone of git
> itself) from my local machine:
> 
> rm -rf  && mkdir  &&
> (cd  && git init) &&
> time git push --mirror 
> 
>real   0m16.056s
>user   0m24.424s
>sys0m1.380s
> 
>real   0m15.885s
>user   0m24.204s
>sys0m1.296s
> 
>real   0m16.731s
>user   0m24.176s
>sys0m1.244s
> 
> rm -rf  && mkdir  &&
> (cd  && git init) &&
> time git push --mirror --recurse-submodules=check 
> 
>real   0m21.441s
>user   0m29.560s
>sys0m1.480s
> 
>real   0m21.319s
>user   0m29.484s
>sys0m1.464s
> 
>real   0m21.261s
>user   0m29.252s
>sys0m1.592s
> 
> Without my patches and --recurse-submodules=check the numbers are
> basically the same. I stopped the test with --recurse-submodules=check
> after ~ 9 minutes.

Fun fact, I let the push without my patch and with
--recurse-submodules=check finish:

real27m7.962s
user27m15.568s
sys 0m2.016s

Thats quite some time...

Cheers Heiko


[PATCH 2/2] serialize collection of refs that contain submodule changes

2016-09-14 Thread Heiko Voigt
We are iterating over each pushed ref and want to check whether it
contains changes to submodules. Instead of immediately checking each ref
lets first collect them and then do the check for all of them in one
revision walk.

Signed-off-by: Heiko Voigt 
---

Sorry this was not catched earlier. This was implemented as part of
summer of code and it seems we never tested with --mirror.

This is the one which does only one revision walk instead of one for
each ref. Here are some numbers (using the my development clone of git
itself) from my local machine:

rm -rf  && mkdir  &&
(cd  && git init) &&
time git push --mirror 

   real 0m16.056s
   user 0m24.424s
   sys  0m1.380s

   real 0m15.885s
   user 0m24.204s
   sys  0m1.296s

   real 0m16.731s
   user 0m24.176s
   sys  0m1.244s

rm -rf  && mkdir  &&
(cd  && git init) &&
time git push --mirror --recurse-submodules=check 

   real 0m21.441s
   user 0m29.560s
   sys  0m1.480s

   real 0m21.319s
   user 0m29.484s
   sys  0m1.464s

   real 0m21.261s
   user 0m29.252s
   sys  0m1.592s

Without my patches and --recurse-submodules=check the numbers are
basically the same. I stopped the test with --recurse-submodules=check
after ~ 9 minutes.

Cheers Heiko

 submodule.c | 36 +---
 submodule.h |  5 +++--
 transport.c | 22 ++
 3 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/submodule.c b/submodule.c
index b04c066..a15e346 100644
--- a/submodule.c
+++ b/submodule.c
@@ -627,24 +627,31 @@ static void free_submodules_sha1s(struct string_list 
*submodules)
string_list_clear(submodules, 1);
 }
 
-int find_unpushed_submodules(unsigned char new_sha1[20],
+static void append_hash_to_argv(const unsigned char sha1[20],
+   void *data)
+{
+   struct argv_array *argv = (struct argv_array *) data;
+   argv_array_push(argv, sha1_to_hex(sha1));
+}
+
+int find_unpushed_submodules(struct sha1_array *hashes,
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
-   const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1, i;
-   char *sha1_copy;
+   int i;
struct string_list submodules = STRING_LIST_INIT_DUP;
+   struct argv_array argv = ARGV_ARRAY_INIT;
 
-   struct strbuf remotes_arg = STRBUF_INIT;
-
-   strbuf_addf(_arg, "--remotes=%s", remotes_name);
init_revisions(, NULL);
-   sha1_copy = xstrdup(sha1_to_hex(new_sha1));
-   argv[1] = sha1_copy;
-   argv[3] = remotes_arg.buf;
-   setup_revisions(argc, argv, , NULL);
+
+   /* argv.argv[0] will be ignored by setup_revisions */
+   argv_array_push(, "find_unpushed_submodules");
+   sha1_array_for_each_unique(hashes, append_hash_to_argv, );
+   argv_array_push(, "--not");
+   argv_array_pushf(, "--remotes=%s", remotes_name);
+
+   setup_revisions(argv.argc, argv.argv, , NULL);
if (prepare_revision_walk())
die("revision walk setup failed");
 
@@ -652,8 +659,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
find_unpushed_submodule_commits(commit, );
 
reset_revision_walk();
-   free(sha1_copy);
-   strbuf_release(_arg);
+   argv_array_clear();
 
for (i = 0; i < submodules.nr; i++) {
struct string_list_item *item = [i];
@@ -691,12 +697,12 @@ static int push_submodule(const char *path)
return 1;
 }
 
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name)
+int push_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name)
 {
int i, ret = 1;
struct string_list needs_pushing = STRING_LIST_INIT_DUP;
 
-   if (!find_unpushed_submodules(new_sha1, remotes_name, _pushing))
+   if (!find_unpushed_submodules(hashes, remotes_name, _pushing))
return 1;
 
for (i = 0; i < needs_pushing.nr; i++) {
diff --git a/submodule.h b/submodule.h
index d9e197a..065b2f0 100644
--- a/submodule.h
+++ b/submodule.h
@@ -3,6 +3,7 @@
 
 struct diff_options;
 struct argv_array;
+struct sha1_array;
 
 enum {
RECURSE_SUBMODULES_CHECK = -4,
@@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
 int merge_submodule(unsigned char result[20], const char *path, const unsigned 
char base[20],
const unsigned char a[20], const unsigned char b[20], int 
search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name,
+int find_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name,
struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char 
*remotes_name);
+int push_unpushed_submodules(struct sha1_array *hashes, const char 
*remotes_name);
 void connect_work_tree_and_git_dir(const char