Re: [RFC PATCHv2 09/17] update submodules: add scheduling to update submodules

2016-12-05 Thread Stefan Beller
On Mon, Dec 5, 2016 at 3:37 PM, David Turner <david.tur...@twosigma.com> wrote:
> This patch confuses me -- see below.
>
>> -Original Message-
>> From: Stefan Beller [mailto:sbel...@google.com]
>> Sent: Friday, December 02, 2016 7:30 PM
>> To: bmw...@google.com; David Turner
>> Cc: git@vger.kernel.org; sand...@crustytoothpaste.net; hvo...@hvoigt.net;
>> gits...@pobox.com; Stefan Beller
>> Subject: [RFC PATCHv2 09/17] update submodules: add scheduling to update
>> submodules
> [snip]
>> +static int update_submodule(const char *path, const struct object_id
>> *oid,
>> + int force, int is_new)
>> +{
>> + const char *git_dir;
>> + struct child_process cp = CHILD_PROCESS_INIT;
>> + const struct submodule *sub = submodule_from_path(null_sha1, path);
>> +
>> + if (!sub || !sub->name)
>> + return -1;
>> +
>> + git_dir = resolve_gitdir(git_common_path("modules/%s", sub->name));
>> +
>> + if (!git_dir)
>> + return -1;
>> +
>> + if (is_new)
>> + connect_work_tree_and_git_dir(path, git_dir);
>> +
>> + /* update index via `read-tree --reset sha1` */
>> + argv_array_pushl(, "read-tree",
>> +force ? "--reset" : "-m",
>> +"-u", sha1_to_hex(oid->hash), NULL);
>> + prepare_submodule_repo_env(_array);
>> + cp.git_cmd = 1;
>> + cp.no_stdin = 1;
>> + cp.dir = path;
>> + if (run_command()) {
>> + warning(_("reading the index in submodule '%s' failed"),
>> path);
>
> The error is not (usually) in "reading the index" -- it's "updating the 
> index" (or the working tree)
>
>> + child_process_clear();
>> + return -1;
>> + }
>> +
>> + /* write index to working dir */
>> + child_process_clear();
>> + child_process_init();
>> + argv_array_pushl(, "checkout-index", "-a", NULL);
>
> I'm confused -- doesn't read-tree -u already do this?  And if not, shouldn't 
> we back out the result of the read-tree, to leave the submodule as it was?
>
>> + cp.git_cmd = 1;
>> + cp.no_stdin = 1;
>> + cp.dir = path;
>> + if (force)
>> + argv_array_push(, "-f");
>> +
>> + if (run_command()) {
>> + warning(_("populating the working directory in submodule '%s'
>> failed"), path);
>> + child_process_clear();
>> + return -1;
>> + }
>> +
>> + /* get the HEAD right */
>> + child_process_clear();
>> + child_process_init();
>> + argv_array_pushl(, "checkout", "--recurse-submodules",
>> NULL);
>
>
> Why are we running checkout on the submodule when we've already done most of 
> the checkout? The only thing left is to set HEAD and recurse, right?  I must 
> be missing something.
>

Yes this is only used to set the HEAD correctly and then recurse down.
I tried to remove the first 2 calls to ch8ild processes at one point in time,
which did not work out.  I should have written in the commit message why
that was a problem. So I'll redo that just to see the problem and improve
the commit message.


RE: [RFC PATCHv2 09/17] update submodules: add scheduling to update submodules

2016-12-05 Thread David Turner
This patch confuses me -- see below.

> -Original Message-
> From: Stefan Beller [mailto:sbel...@google.com]
> Sent: Friday, December 02, 2016 7:30 PM
> To: bmw...@google.com; David Turner
> Cc: git@vger.kernel.org; sand...@crustytoothpaste.net; hvo...@hvoigt.net;
> gits...@pobox.com; Stefan Beller
> Subject: [RFC PATCHv2 09/17] update submodules: add scheduling to update
> submodules
[snip] 
> +static int update_submodule(const char *path, const struct object_id
> *oid,
> + int force, int is_new)
> +{
> + const char *git_dir;
> + struct child_process cp = CHILD_PROCESS_INIT;
> + const struct submodule *sub = submodule_from_path(null_sha1, path);
> +
> + if (!sub || !sub->name)
> + return -1;
> +
> + git_dir = resolve_gitdir(git_common_path("modules/%s", sub->name));
> +
> + if (!git_dir)
> + return -1;
> +
> + if (is_new)
> + connect_work_tree_and_git_dir(path, git_dir);
> +
> + /* update index via `read-tree --reset sha1` */
> + argv_array_pushl(, "read-tree",
> +force ? "--reset" : "-m",
> +"-u", sha1_to_hex(oid->hash), NULL);
> + prepare_submodule_repo_env(_array);
> + cp.git_cmd = 1;
> + cp.no_stdin = 1;
> + cp.dir = path;
> + if (run_command()) {
> + warning(_("reading the index in submodule '%s' failed"),
> path);

The error is not (usually) in "reading the index" -- it's "updating the index" 
(or the working tree)

> + child_process_clear();
> + return -1;
> + }
> +
> + /* write index to working dir */
> + child_process_clear();
> + child_process_init();
> + argv_array_pushl(, "checkout-index", "-a", NULL);

I'm confused -- doesn't read-tree -u already do this?  And if not, shouldn't we 
back out the result of the read-tree, to leave the submodule as it was?

> + cp.git_cmd = 1;
> + cp.no_stdin = 1;
> + cp.dir = path;
> + if (force)
> + argv_array_push(, "-f");
> +
> + if (run_command()) {
> + warning(_("populating the working directory in submodule '%s'
> failed"), path);
> + child_process_clear();
> + return -1;
> + }
> +
> + /* get the HEAD right */
> + child_process_clear();
> + child_process_init();
> + argv_array_pushl(, "checkout", "--recurse-submodules",
> NULL);


Why are we running checkout on the submodule when we've already done most of 
the checkout? The only thing left is to set HEAD and recurse, right?  I must be 
missing something.



[RFC PATCHv2 09/17] update submodules: add scheduling to update submodules

2016-12-02 Thread Stefan Beller
The walker of a tree is only expected to call `schedule_submodule_for_update`
and once done, to run `update_submodules`. This avoids directory/file
conflicts and later we can parallelize all submodule actions if needed.

Signed-off-by: Stefan Beller 
---
 submodule.c | 132 
 submodule.h |   5 +++
 2 files changed, 137 insertions(+)

diff --git a/submodule.c b/submodule.c
index 7bb64d6c69..02c28ef56b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1348,3 +1348,135 @@ int parallel_submodules(void)
 {
return parallel_jobs;
 }
+
+static struct scheduled_submodules_update_type {
+   const char *path;
+   const struct object_id *oid;
+   /*
+* Do we need to perform a complete checkout or just incremental
+* update?
+*/
+   unsigned is_new:1;
+} *scheduled_submodules;
+#define SCHEDULED_SUBMODULES_INIT {NULL, NULL, 0}
+
+static int scheduled_submodules_nr, scheduled_submodules_alloc;
+
+void schedule_submodule_for_update(const struct cache_entry *ce, int is_new)
+{
+   struct scheduled_submodules_update_type *ssu;
+   ALLOC_GROW(scheduled_submodules,
+  scheduled_submodules_nr + 1,
+  scheduled_submodules_alloc);
+   ssu = _submodules[scheduled_submodules_nr++];
+   ssu->path = ce->name;
+   ssu->oid = >oid;
+   ssu->is_new = !!is_new;
+}
+
+static int update_submodule(const char *path, const struct object_id *oid,
+   int force, int is_new)
+{
+   const char *git_dir;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   const struct submodule *sub = submodule_from_path(null_sha1, path);
+
+   if (!sub || !sub->name)
+   return -1;
+
+   git_dir = resolve_gitdir(git_common_path("modules/%s", sub->name));
+
+   if (!git_dir)
+   return -1;
+
+   if (is_new)
+   connect_work_tree_and_git_dir(path, git_dir);
+
+   /* update index via `read-tree --reset sha1` */
+   argv_array_pushl(, "read-tree",
+  force ? "--reset" : "-m",
+  "-u", sha1_to_hex(oid->hash), NULL);
+   prepare_submodule_repo_env(_array);
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
+   if (run_command()) {
+   warning(_("reading the index in submodule '%s' failed"), path);
+   child_process_clear();
+   return -1;
+   }
+
+   /* write index to working dir */
+   child_process_clear();
+   child_process_init();
+   argv_array_pushl(, "checkout-index", "-a", NULL);
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
+   if (force)
+   argv_array_push(, "-f");
+
+   if (run_command()) {
+   warning(_("populating the working directory in submodule '%s' 
failed"), path);
+   child_process_clear();
+   return -1;
+   }
+
+   /* get the HEAD right */
+   child_process_clear();
+   child_process_init();
+   argv_array_pushl(, "checkout", "--recurse-submodules", NULL);
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
+   if (force)
+   argv_array_push(, "-f");
+   argv_array_push(, sha1_to_hex(oid->hash));
+
+   if (run_command()) {
+   warning(_("setting the HEAD in submodule '%s' failed"), path);
+   child_process_clear();
+   return -1;
+   }
+
+   child_process_clear();
+   return 0;
+}
+
+int update_submodules(int force)
+{
+   int i;
+   gitmodules_config();
+
+   /*
+* NEEDSWORK: As submodule updates can potentially take some
+* time each and they do not overlap (i.e. no d/f conflicts;
+* this can be parallelized using the run_commands.h API.
+*/
+   for (i = 0; i < scheduled_submodules_nr; i++) {
+   struct submodule *sub;
+   struct scheduled_submodules_update_type *ssu =
+   _submodules[i];
+
+   if (!submodule_is_interesting(ssu->path))
+   continue;
+
+   sub = submodule_from_path(null_sha1, ssu->path);
+
+   switch (sub->update_strategy) {
+   case SM_UPDATE_UNSPECIFIED: /* fall thru */
+   case SM_UPDATE_CHECKOUT:
+   update_submodule(ssu->path, ssu->oid,
+force, ssu->is_new);
+   break;
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   case SM_UPDATE_NONE:
+   case SM_UPDATE_COMMAND:
+   warning("update strategy for submodule '%s' not 
supported", ssu->path);
+   }
+   }
+
+   scheduled_submodules_nr = 0;
+   return 0;
+}
diff --git a/submodule.h b/submodule.h
index