Re: [PATCH v7 13/17] ref-filter: add `:dir` and `:base` options for ref printing atoms
On Tue, Nov 15, 2016 at 11:12 PM, Junio C Hamano wrote:
> Jacob Keller writes:
>
>> dirname makes sense. What about implementing a reverse variant of
>> strip, which you could perform stripping of right-most components and
>> instead of stripping by a number, strip "to" a number, ie: keep the
>> left N most components, and then you could use something like
>> ...
>> I think that would be more general purpose than basename, and less
>> confusing?
>
> I think you are going in the right direction. I had a similar
> thought but built around a different axis. I.e. if strip=1 strips
> one from the left, perhaps we want to have rstrip=1 that strips one
> from the right, and also strip=-1 to mean strip everything except
> one from the left and so on? I think this and your keep (and
> perhaps you'll have rkeep for completeness) have the same expressive
> power. I do not offhand have a preference for one over the other.

If we do implement strip with negative numbers, it definitely would be
neat, but to get the desired feature which I've mentioned below, we'd
need to call strip twice, i.e. to get remotes from /refs/foo/abc/xyz
we'd need to do strip=1,strip=-1. That could be done, but it would need
a lot of tweaking, since we'd have the same subatom/option carrying
multiple values. On the other hand, it would perhaps be easier to just
introduce rstrip, where we strip from the right, and to ensure that
strip and rstrip can be used together.

On Wed, Nov 16, 2016 at 2:49 AM, Jacob Keller wrote:
> On November 15, 2016 9:42:03 AM PST, Junio C Hamano wrote:
>> Somehow it sounds a bit strange to me to treat 'remotes' as the same
>> class of token as 'heads' and 'tags' (I'd expect 'heads' and
>> 'remotes/origin' would be at the same level in end-user's mind), but
>> that is probably an unrelated tangent. The reason this series wants
>> to introduce :base must be to emulate an existing feature, so that
>> existing feature is a concrete counter-example that argues against
>> my "it sounds a bit strange" reaction.
>
> It may be a bit strange indeed. What is the requirement for this?
>
> I think implementing a strip and rstrip (if necessary) with negative
> numbers would be most ideal.

The necessity is that we need to do different formatting depending on
the ref type in "branch -l". If you look at the last-but-one patch, we
do

    strbuf_addf(, "%%(if:notequals=remotes)%%(refname:base)%%(then)%s%%(else)%s%%(end)",
                local.buf, remote.buf);

where it's using ':base' to check for the ref type and do conditional
printing along with the %(if)...%(end) atoms.

--
Regards,
Karthik Nayak
Re: Bug with disabling compression and 'binary' files.
Douglas Cox writes:

>> This may or may not be related to the symptom
>> you are observing (if it is, then you would see a packfile created
>> in objects/pack/, not in loose objects in object/??/ directories).
>
> No, the file is loose (it's in .git/objects/eb in this case). This is
> seen immediately after the add, though I believe it's the same way
> when doing a commit on a changed file.

Then I do not have a guess as to where the symptom you are seeing is
coming from.
[PATCH v2] compression: unify pack.compression configuration parsing
There are three codepaths that use a variable whose name is
pack_compression_level to affect how objects and deltas sent to a
packfile are compressed. Unlike zlib_compression_level, which controls
the loose object compression, however, this variable was static to each
of these codepaths. Two of them read the pack.compression configuration
variable, using core.compression as the default, and one of them also
allowed overriding it from the command line. The other codepath, in
bulk-checkin, did not pay any attention to the configuration.

Unify the configuration parsing to git_default_config(), where we
implement the parsing of core.loosecompression and core.compression and
make the former override the latter, by moving the code to parse
pack.compression there and also allowing core.compression to give the
default to this variable.

Signed-off-by: Junio C Hamano
---

 * Now comes with new tests to cover pack-objects and fast-import,
   which should not fail with or without the code change, because they
   are only here to prevent regressions with this change.

   On the code side, the *_compression_seen variables are also made
   private to config.c, as these variables are used only by the config
   reader to implement the "core.compression gives the default to
   *.compression" logic, and the code is consolidated in this single
   file.
 builtin/pack-objects.c              | 14
 bulk-checkin.c                      |  2 --
 cache.h                             |  2 +-
 config.c                            | 16 +
 environment.c                       |  2 +-
 fast-import.c                       | 13 ---
 t/t1050-large.sh                    | 29
 t/t5315-pack-objects-compression.sh | 44
 t/t9303-fast-import-compression.sh  | 67 +
 9 files changed, 158 insertions(+), 31 deletions(-)
 create mode 100755 t/t5315-pack-objects-compression.sh
 create mode 100755 t/t9303-fast-import-compression.sh

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0fd52bd6b4..8841f8b366 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -61,8 +61,6 @@ static int delta_search_threads;
 static int pack_to_stdout;
 static int num_preferred_base;
 static struct progress *progress_state;
-static int pack_compression_level = Z_DEFAULT_COMPRESSION;
-static int pack_compression_seen;

 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
@@ -2368,16 +2366,6 @@ static int git_pack_config(const char *k, const char *v, void *cb)
                depth = git_config_int(k, v);
                return 0;
        }
-       if (!strcmp(k, "pack.compression")) {
-               int level = git_config_int(k, v);
-               if (level == -1)
-                       level = Z_DEFAULT_COMPRESSION;
-               else if (level < 0 || level > Z_BEST_COMPRESSION)
-                       die("bad pack compression level %d", level);
-               pack_compression_level = level;
-               pack_compression_seen = 1;
-               return 0;
-       }
        if (!strcmp(k, "pack.deltacachesize")) {
                max_delta_cache_size = git_config_int(k, v);
                return 0;
@@ -2869,8 +2857,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)

        reset_pack_idx_option(&pack_idx_opts);
        git_config(git_pack_config, NULL);
-       if (!pack_compression_seen && core_compression_seen)
-               pack_compression_level = core_compression_level;

        progress = isatty(2);
        argc = parse_options(argc, argv, prefix, pack_objects_options,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 4347f5c76a..991b4a13e2 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -7,8 +7,6 @@
 #include "pack.h"
 #include "strbuf.h"

-static int pack_compression_level = Z_DEFAULT_COMPRESSION;
-
 static struct bulk_checkin_state {
        unsigned plugged:1;
diff --git a/cache.h b/cache.h
index a50a61a197..b14c0e8e38 100644
--- a/cache.h
+++ b/cache.h
@@ -670,7 +670,7 @@ extern const char *git_attributes_file;
 extern const char *git_hooks_path;
 extern int zlib_compression_level;
 extern int core_compression_level;
-extern int core_compression_seen;
+extern int pack_compression_level;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
diff --git a/config.c b/config.c
index 83fdecb1bc..2f1aef742e 100644
--- a/config.c
+++ b/config.c
@@ -66,6 +66,8 @@ static struct key_value_info *current_config_kvi;
  */
 static enum config_scope current_parsing_scope;

+static int core_compression_seen;
+static int pack_compression_seen;
 static int zlib_compression_seen;

 /*
@@ -865,6 +867,8 @@ static int git_default_core_config(const char *var, const char *value)
                core_compression_seen = 1;
                if (!zlib_compression_seen)
                        zlib_compression_level = level;
+               if
Re: Bug with disabling compression and 'binary' files.
> This may or may not be related to the symptom
> you are observing (if it is, then you would see a packfile created
> in objects/pack/, not in loose objects in object/??/ directories).

No, the file is loose (it's in .git/objects/eb in this case). This is
seen immediately after the add, though I believe it's the same way
when doing a commit on a changed file.

On Tue, Nov 15, 2016 at 8:42 PM, Junio C Hamano wrote:
> Douglas Cox writes:
>
>> I narrowed this down to the '-text' attribute that is set when
>> specifying 'binary'. For some reason this flag is cancelling out the
>> core.compression = 0 setting and I think this is a bug?
>>
>> Unfortunately core.compression = 0 is also global. Ideally it would be
>> great if there was a separate 'compression' attribute that could be
>> specified in .gitattributes per wildcard, similar to how -delta can be
>> used. This way we would still be able to get compression for
>> text/source files, while still getting the speed of skipping
>> compression for binary files that do not compress well.
>>
>> Has there been any discussion on having an attribute similar to this?
>
> Nope.
>
> I do not offhand think of a way for the '-text' attribute (or any
> attribute, for that matter) to interfere with the compression level,
> but while reading the various parts of the system that futz with the
> compression level configuration, I noticed one thing. When we do an
> initial "bulk-checkin" optimization that sends all objects to a
> single packfile upon "git add", the packfile creation uses its own
> compression level that is not affected by any configuration or
> command line option. This may or may not be related to the symptom
> you are observing (if it is, then you would see a packfile created
> in objects/pack/, not in loose objects in object/??/ directories).
[PATCH] compression: unify pack.compression configuration parsing
There are three codepaths that use a variable whose name is
pack_compression_level to affect how objects and deltas sent to a
packfile are compressed. Unlike zlib_compression_level, which controls
the loose object compression, however, this variable was static to each
of these codepaths. Two of them read the pack.compression configuration
variable, using core.compression as the default, and one of them also
allowed overriding it from the command line. The other codepath, in
bulk-checkin, did not pay any attention to the configuration.

Unify the configuration parsing to git_default_config(), where we
implement the parsing of core.loosecompression and core.compression and
make the former override the latter, by moving the code to parse
pack.compression there and also allowing core.compression to give the
default to this variable.

Signed-off-by: Junio C Hamano
---

 * I was primarily interested in teaching the configuration to the
   bulk-checkin codepath, and added a test that succeeds only with the
   code change.

   The handling of configuration in the other two codepaths,
   pack-objects and fast-import, may be totally broken with this change
   if there is no existing test. A similar set of tests for them would
   be very much welcomed, as without them this change won't be 'next'
   worthy yet.
 builtin/pack-objects.c | 14 --
 bulk-checkin.c         |  2 --
 cache.h                |  2 ++
 config.c               | 14 ++
 environment.c          |  2 ++
 fast-import.c          | 13 -
 t/t1050-large.sh       | 30 ++
 7 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0fd52bd6b4..8841f8b366 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -61,8 +61,6 @@ static int delta_search_threads;
 static int pack_to_stdout;
 static int num_preferred_base;
 static struct progress *progress_state;
-static int pack_compression_level = Z_DEFAULT_COMPRESSION;
-static int pack_compression_seen;

 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
@@ -2368,16 +2366,6 @@ static int git_pack_config(const char *k, const char *v, void *cb)
                depth = git_config_int(k, v);
                return 0;
        }
-       if (!strcmp(k, "pack.compression")) {
-               int level = git_config_int(k, v);
-               if (level == -1)
-                       level = Z_DEFAULT_COMPRESSION;
-               else if (level < 0 || level > Z_BEST_COMPRESSION)
-                       die("bad pack compression level %d", level);
-               pack_compression_level = level;
-               pack_compression_seen = 1;
-               return 0;
-       }
        if (!strcmp(k, "pack.deltacachesize")) {
                max_delta_cache_size = git_config_int(k, v);
                return 0;
@@ -2869,8 +2857,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)

        reset_pack_idx_option(&pack_idx_opts);
        git_config(git_pack_config, NULL);
-       if (!pack_compression_seen && core_compression_seen)
-               pack_compression_level = core_compression_level;

        progress = isatty(2);
        argc = parse_options(argc, argv, prefix, pack_objects_options,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 4347f5c76a..991b4a13e2 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -7,8 +7,6 @@
 #include "pack.h"
 #include "strbuf.h"

-static int pack_compression_level = Z_DEFAULT_COMPRESSION;
-
 static struct bulk_checkin_state {
        unsigned plugged:1;
diff --git a/cache.h b/cache.h
index a50a61a197..a5d689f9d3 100644
--- a/cache.h
+++ b/cache.h
@@ -671,6 +671,8 @@ extern const char *git_hooks_path;
 extern int zlib_compression_level;
 extern int core_compression_level;
 extern int core_compression_seen;
+extern int pack_compression_level;
+extern int pack_compression_seen;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
diff --git a/config.c b/config.c
index 83fdecb1bc..0477386f27 100644
--- a/config.c
+++ b/config.c
@@ -865,6 +865,8 @@ static int git_default_core_config(const char *var, const char *value)
                core_compression_seen = 1;
                if (!zlib_compression_seen)
                        zlib_compression_level = level;
+               if (!pack_compression_seen)
+                       pack_compression_level = level;
                return 0;
        }

@@ -1125,6 +1127,18 @@ int git_default_config(const char *var, const char *value, void *dummy)
                pack_size_limit_cfg = git_config_ulong(var, value);
                return 0;
        }
+
+       if (!strcmp(var, "pack.compression")) {
+               int level = git_config_int(var, value);
+               if (level == -1)
+                       level = Z_DEFAULT_COMPRESSION;
+               else if (level < 0 || level > Z_BEST_COMPRESSION)
+
Re: Bug with disabling compression and 'binary' files.
Douglas Cox writes:

> I narrowed this down to the '-text' attribute that is set when
> specifying 'binary'. For some reason this flag is cancelling out the
> core.compression = 0 setting and I think this is a bug?
>
> Unfortunately core.compression = 0 is also global. Ideally it would be
> great if there was a separate 'compression' attribute that could be
> specified in .gitattributes per wildcard, similar to how -delta can be
> used. This way we would still be able to get compression for
> text/source files, while still getting the speed of skipping
> compression for binary files that do not compress well.
>
> Has there been any discussion on having an attribute similar to this?

Nope.

I do not offhand think of a way for the '-text' attribute (or any
attribute, for that matter) to interfere with the compression level,
but while reading the various parts of the system that futz with the
compression level configuration, I noticed one thing. When we do an
initial "bulk-checkin" optimization that sends all objects to a
single packfile upon "git add", the packfile creation uses its own
compression level that is not affected by any configuration or
command line option. This may or may not be related to the symptom
you are observing (if it is, then you would see a packfile created
in objects/pack/, not in loose objects in object/??/ directories).
Re: [PATCH v7 13/17] ref-filter: add `:dir` and `:base` options for ref printing atoms
On November 15, 2016 9:42:03 AM PST, Junio C Hamano wrote:
> I think you are going in the right direction. I had a similar
> thought but built around a different axis. I.e. if strip=1 strips
> one from the left, perhaps we want to have rstrip=1 that strips one
> from the right, and also strip=-1 to mean strip everything except
> one from the left and so on? I think this and your keep (and
> perhaps you'll have rkeep for completeness) have the same expressive
> power. I do not offhand have a preference for one over the other.

I prefer strip implemented with negative numbers. That is simple and
expressive, and if we need to, we can implement rstrip as well.

> Somehow it sounds a bit strange to me to treat 'remotes' as the same
> class of token as 'heads' and 'tags' (I'd expect 'heads' and
> 'remotes/origin' would be at the same level in end-user's mind), but
> that is probably an unrelated tangent. The reason this series wants
> to introduce :base must be to emulate an existing feature, so that
> existing feature is a concrete counter-example that argues against
> my "it sounds a bit strange" reaction.

It may be a bit strange indeed. What is the requirement for this?

I think implementing strip (and rstrip, if necessary) with negative
numbers would be most ideal.

Thanks,
Jake
Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on objects
On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams wrote:
> to:
> HEAD:file
> HEAD:sub/file

Maybe indent this ;)

>  static struct argv_array submodule_options = ARGV_ARRAY_INIT;
> +static const char *parent_basename;
>
>  static int grep_submodule_launch(struct grep_opt *opt,
>                                  const struct grep_source *gs);
> @@ -535,19 +537,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
>  {
>         struct child_process cp = CHILD_PROCESS_INIT;
>         int status, i;
> +       const char *end_of_base;
> +       const char *name;
>         struct work_item *w = opt->output_priv;
>
> +       end_of_base = strchr(gs->name, ':');
> +       if (end_of_base)
> +               name = end_of_base + 1;
> +       else
> +               name = gs->name;
> +
>         prepare_submodule_repo_env(&cp.env_array);
>
>         /* Add super prefix */
>         argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
>                          super_prefix ? super_prefix : "",
> -                        gs->name);
> +                        name);
>         argv_array_push(&cp.args, "grep");
>
> +       /*
> +        * Add basename of parent project
> +        * When performing grep on a object the filename is prefixed
> +        * with the object's name: ':filename'.

This comment is hard to read, as it's unclear what they mean. (Are
they supposed to indicate a variable? If so, why is the file name not
marked up?)

> In order to
> +        * provide uniformity of output we want to pass the name of the
> +        * parent project's object name to the submodule so the submodule can
> +        * prefix its output with the parent's name and not its own SHA1.
> +        */
> +       if (end_of_base)
> +               argv_array_pushf(&cp.args, "--parent-basename=%.*s",
> +                                (int) (end_of_base - gs->name),
> +                                gs->name);

Do we pass this only with the tree-ish? What if we are grepping the
working tree and the file name contains a colon?
> +test_expect_success 'grep tree HEAD^' '
> +	cat >expect <<-\EOF &&
> +	HEAD^:a:foobar
> +	HEAD^:b/b:bar
> +	HEAD^:submodule/a:foobar
> +	EOF
> +
> +	git grep -e "bar" --recurse-submodules HEAD^ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'grep tree HEAD^^' '
> +	cat >expect <<-\EOF &&
> +	HEAD^^:a:foobar
> +	HEAD^^:b/b:bar
> +	EOF
> +
> +	git grep -e "bar" --recurse-submodules HEAD^^ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'grep tree and pathspecs' '
> +	cat >expect <<-\EOF &&
> +	HEAD:submodule/a:foobar
> +	HEAD:submodule/sub/a:foobar
> +	EOF
> +
> +	git grep -e "bar" --recurse-submodules HEAD -- submodule >actual &&
> +	test_cmp expect actual
> +'

Mind adding tests for

 * recursive submodules (say 2 levels), preferably not having the
   gitlink at the root of each, i.e. root has a sub1 at path subs/sub1
   and sub1 has a sub2 at path subs/sub2, such that recursing would
   produce a path like HEAD:subs/sub1/subs/sub2/dir/file?

 * file names with a colon in them

 * instead of just HEAD referencing trees, maybe a sha1-referenced
   test as well (though it is not immediately clear what the benefit
   would be)

 * what if the submodule doesn't have the commit referenced in the
   given sha1?

Thanks,
Stefan
Re: [PATCH 14/16] checkout: recurse into submodules if asked to
On 11/15, Stefan Beller wrote:
> +int option_parse_recurse_submodules(const struct option *opt,
> +                                   const char *arg, int unset)
> +{
> +       if (unset) {
> +               recurse_submodules = RECURSE_SUBMODULES_OFF;
> +               return 0;
> +       }
> +       if (arg)
> +               recurse_submodules =
> +                       parse_update_recurse_submodules_arg(opt->long_name,
> +                                                           arg);
> +       else
> +               recurse_submodules = RECURSE_SUBMODULES_ON;
> +
> +       return 0;
> +}

I assume it is ok to always return 0 from this function?

Also, I know we don't like having braces on single-statement ifs, but
it is a bit hard to read that multi-line statement without braces.
But that's just my opinion :)

--
Brandon Williams
Re: [PATCH 13/16] submodule: teach unpack_trees() to update submodules
On 11/15, Stefan Beller wrote:
> +               if (!S_ISGITLINK(ce->ce_mode)) {
> +                       int flags = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE;

For readability you may want to have spaces between the two flags.

> +       if (o->index_only
> +           || (!((old->ce_flags & CE_VALID) || ce_skip_worktree(old))
> +               && (o->reset || ce_uptodate(old))))
> +               return 0;

The coding guidelines say that git prefers to have the logical
operators on the right side, like this:

	if (o->index_only ||
	    (!((old->ce_flags & CE_VALID) || ce_skip_worktree(old)) &&
	     (o->reset || ce_uptodate(old))))
		return 0;

It also says the other way is ok too; just a thought :)

> +                       if (submodule_is_interesting(old->name, null_sha1)
> +                           && is_submodule_checkout_safe(new->name, &new->oid))
> +                               return 0;

here too.

> +               } else {
> +                       /*
> +                        * new is not a submodule any more, so only
> +                        * care if we care:
> +                        */
> +                       if (submodule_is_interesting(old->name, null_sha1)
> +                           && ok_to_remove_submodule(old->name))
> +                               return 0;

and here

--
Brandon Williams
RE: [PATCH 13/16] submodule: teach unpack_trees() to update submodules
[I've reviewed up to and including 13; I'll look at 14-16 tomorrow-ish]

> -----Original Message-----
> From: Stefan Beller [mailto:sbel...@google.com]
> Sent: Tuesday, November 15, 2016 6:07 PM
> Cc: git@vger.kernel.org; bmw...@google.com; gits...@pobox.com;
> jrnie...@gmail.com; mogulgu...@gmail.com; David Turner; Stefan Beller
> Subject: [PATCH 13/16] submodule: teach unpack_trees() to update
> submodules

...

>  	msgs[ERROR_NOT_UPTODATE_DIR] =
>  		_("Updating the following directories would lose untracked files in it:\n%s");
> +	msgs[ERROR_NOT_UPTODATE_SUBMODULE] =
> +		_("Updating the following submodules would lose modifications in it:\n%s");

s/it/them/

>  	if (!strcmp(cmd, "checkout"))
>  		msg = advice_commit_before_merge
> @@ -1315,19 +1320,18 @@ static int verify_uptodate_1(const struct cache_entry *ce,
>  		return 0;
>
>  	if (!lstat(ce->name, &st)) {
> -		int flags = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE;
> -		unsigned changed = ie_match_stat(o->src_index, ce, &st, flags);
> -		if (!changed)
> -			return 0;
> -		/*
> -		 * NEEDSWORK: the current default policy is to allow
> -		 * submodule to be out of sync wrt the superproject
> -		 * index. This needs to be tightened later for
> -		 * submodules that are marked to be automatically
> -		 * checked out.
> -		 */
> -		if (S_ISGITLINK(ce->ce_mode))
> -			return 0;
> +		if (!S_ISGITLINK(ce->ce_mode)) {

I generally prefer to avoid if (!x) { A } else { B } -- I would rather
just see if (x) { B } else { A }.
> +		if (!changed) {
> +			/* old is always a submodule */
> +			if (S_ISGITLINK(new->ce_mode)) {
> +				/*
> +				 * new is also a submodule, so check if we care
> +				 * and then if we can checkout the new sha1 safely
> +				 */
> +				if (submodule_is_interesting(old->name, null_sha1)
> +				    && is_submodule_checkout_safe(new->name, &new->oid))
> +					return 0;
> +			} else {
> +				/*
> +				 * new is not a submodule any more, so only
> +				 * care if we care:
> +				 */
> +				if (submodule_is_interesting(old->name, null_sha1)
> +				    && ok_to_remove_submodule(old->name))
> +					return 0;
> +			}

Do we need a return 1 in here somewhere? Because otherwise, we fall
through and return 0 later.
Bug with disabling compression and 'binary' files.
I was doing some experiments today with large-ish (100-200MB) binary
files and was trying to determine the best configuration for Git.
Here are the steps and timings I saw:

    git init Test
    cp .../largemovie.mp4 .
    time git add largemovie.mp4

This took 6.5s for a 200MB file. This file compressed a little bit,
but not enough to matter, so I wanted to see how much faster the add
would be with compression disabled. So I ran:

    git config core.compression 0

I then completely reran the test above starting with a clean
repository. This time the add took only 2.08s. I repeated these two
tests about 10 times using the same file each time to verify it
wasn't related to disk caching, etc.

At this point I decided that this was likely a good setting for this
repository, so I also created a .gitattributes file and added a few
entries often seen in suggestions for dealing with binary files:

    *.mp4 binary -delta

The goal here was to disable any delta compression done during a
gc/pack and use the other settings 'binary' applies. Unfortunately,
when I ran the test again (still using compression = 0), the test was
back to taking 6.5s and the file inside the .git/objects/ folder was
compressed again.

I narrowed this down to the '-text' attribute that is set when
specifying 'binary'. For some reason this flag is cancelling out the
core.compression = 0 setting and I think this is a bug?

Unfortunately core.compression = 0 is also global. Ideally it would be
great if there was a separate 'compression' attribute that could be
specified in .gitattributes per wildcard, similar to how -delta can be
used. This way we would still be able to get compression for
text/source files, while still getting the speed of skipping
compression for binary files that do not compress well.

Has there been any discussion on having an attribute similar to this?

Thanks,
-Doug
Re: [PATCH v15 01/27] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL
Stephan Beyer writes:

> Besides the things I'm mentioning in respective patch e-mails, I wonder
> why several bisect--helper commands are prefixed by "bisect"; I'm
> talking about:
>
>     git bisect--helper --bisect-clean-state
>     ...
>     git bisect--helper --bisect-start
>     etc.
>
> instead of
>
>     git bisect--helper --clean-state
>     ...
>     git bisect--helper --start
>     etc.
>
> Well, I know *why* they have these names: because the shell function
> names are simply reused. But I don't know why these prefixes are kept in
> the bisect--helper command options. On the other hand, these command
> names are not exposed to the user and may hence not be that important.(?)

That's a good point ;-) These are not intended to be end-user entry
points, so names that are a bit longer than necessary do not bother me
too much.

Hopefully the longer-term endgame would be not to need a separate
"bisect--helper" binary at all but to have a "git bisect" binary making
these requests as subroutine calls, and at that point, the names of the
functions would want to have the "bisect" prefix.
RE: [PATCH 07/16] update submodules: introduce submodule_is_interesting
> -----Original Message-----
> From: Stefan Beller [mailto:sbel...@google.com]
> Sent: Tuesday, November 15, 2016 6:07 PM
> Cc: git@vger.kernel.org; bmw...@google.com; gits...@pobox.com;
> jrnie...@gmail.com; mogulgu...@gmail.com; David Turner; Stefan Beller
> Subject: [PATCH 07/16] update submodules: introduce
> submodule_is_interesting

> +int submodule_is_interesting(const char *path, const unsigned char *sha1)
> +{

This is apparently only ever (in this series) called with null_sha1.
So either this arg is unnecessary, or there are bugs elsewhere in the
code.
Re: [PATCH 11/16] teach unpack_trees() to remove submodule contents
On 11/15, Stefan Beller wrote:
> Extend rmdir_or_warn() to remove the directories of those submodules which
> are scheduled for removal. Also teach verify_clean_submodule() to check
> that a submodule configured to be removed is not modified before scheduling
> it for removal.
>
> Signed-off-by: Stefan Beller
> ---
>  unpack-trees.c | 6 ++
>  wrapper.c      | 4
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index ea6bdd2..576e1d5 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -9,6 +9,7 @@
>  #include "refs.h"
>  #include "attr.h"
>  #include "split-index.h"
> +#include "submodule.h"
>  #include "dir.h"
>
>  /*
> @@ -1361,15 +1362,12 @@ static void invalidate_ce_path(const struct cache_entry *ce,
>  /*
>   * Check that checking out ce->sha1 in subdir ce->name is not
>   * going to overwrite any working files.
> - *
> - * Currently, git does not checkout subprojects during a superproject
> - * checkout, so it is not going to overwrite anything.
>   */
>  static int verify_clean_submodule(const struct cache_entry *ce,
>  				  enum unpack_trees_error_types error_type,
>  				  struct unpack_trees_options *o)
>  {
> -	return 0;
> +	return submodule_is_interesting(ce->name, null_sha1) &&
> 	       is_submodule_modified(ce->name, 0);
>  }

So what is the return value from this function meant to mean? Does '1'
mean the submodule is clean while '0' indicates it is dirty, or is it
the reverse of that? Reading this, it seems to me a value of '1' means
"yes, the submodule is clean!", but the way the return value is
calculated tells a different story. Either I'm understanding it
incorrectly, or I think the return should be something like this:

	return submodule_is_interesting(ce->name, null_sha1) &&
	       !is_submodule_modified(ce->name, 0);

Where we return '1' if the submodule is interesting and it hasn't been
modified.
>
>  static int verify_clean_subdirectory(const struct cache_entry *ce,
> diff --git a/wrapper.c b/wrapper.c
> index e7f1979..17c08de 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -2,6 +2,7 @@
>   * Various trivial helper wrappers around standard functions
>   */
>  #include "cache.h"
> +#include "submodule.h"
>
>  static void do_nothing(size_t size)
>  {
> @@ -592,6 +593,9 @@ int unlink_or_warn(const char *file)
>
>  int rmdir_or_warn(const char *file)
>  {
> +	if (submodule_is_interesting(file, null_sha1)
> +	    && depopulate_submodule(file))
> +		return -1;
>  	return warn_if_unremovable("rmdir", file, rmdir(file));
>  }

It seems weird to me that rmdir is doing checks to see if the file
being removed is a submodule. Shouldn't those checks have occurred
before calling rmdir?

--
Brandon Williams
Re: [PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question
Stefan Beller writes:

>> "We do not know" ...
>
> ... because there is no way for us to check, as we don't have the
> submodule commits.
>
>     "We do consider it safe, as no one in their sane mind would
>     have changed the submodule pointers without having the
>     submodule around. If a user did however change the submodules
>     without having the submodule commits around, this indicates an
>     expert who knows what they were doing."

I didn't think it through myself to arrive at such a conclusion, but
to me the above sounds like sensible reasoning [*1*].

>> +		 * We currently
>> +		 * proceed pushing here as if the submodule commits are
>> +		 * available on a remote. Since we can not check the
>> +		 * remote availability for this submodule we should
>> +		 * consider changing this behavior to: Stop here and
>> +		 * tell the user how to skip this check if wanted.
>> +		 */
>>  		return 0;
>
> Thanks for adding the NEEDSWORK; I just wrote the above lines
> to clarify my thought process, not as a suggestion for change.

One thing I would suggest would be "Stop here, explain the situation,
and then tell the user how to skip". I am not convinced that it would
_help_ users and make it _safer_ if we stopped here, though.

> Overall the series looks good to me; the nits are minor IMHO.

Ditto.

[Footnote]

*1* My version was more like "we do not know if they would get into a
    situation where they do not have enough submodule commits if we
    pushed our superproject, but more importantly, we DO KNOW that it
    would not help an iota if we pushed our submodule to them, so
    there is no point stopping the push of the superproject saying
    'no, no, no, you must push the submodule first'".
RE: [PATCH 09/16] update submodules: add scheduling to update submodules
> -----Original Message-----
> From: Brandon Williams [mailto:bmw...@google.com]
>
> > +struct scheduled_submodules_update_type {
> > +	const char *path;
> > +	const struct object_id *oid;
> > +	/*
> > +	 * Do we need to perform a complete checkout or just an
> > +	 * incremental update?
> > +	 */
> > +	unsigned is_new:1;
> > +} *scheduled_submodules;
> > +#define SCHEDULED_SUBMODULES_INIT {NULL, NULL}
>
> I may not know enough about these types of initializers, but that INIT
> macro only has 2 entries while there are three entries in the struct
> itself.

In fact, only the first NULL is necessary; unspecified initializer
entries in C default to zero.
Re: [PATCH v3 4/6] grep: optionally recurse into submodules
On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams wrote: > Allow grep to recognize submodules and recursively search for patterns in > each submodule. This is done by forking off a process to recursively > call grep on each submodule. The top level --super-prefix option is > used to pass a path to the submodule which can in turn be used to > prepend to output or in pathspec matching logic. > > Recursion only occurs for submodules which have been initialized and > checked out by the parent project. If a submodule hasn't been > initialized and checked out it is simply skipped. > > In order to support the existing multi-threading infrastructure in grep, > output from each child process is captured in a strbuf so that it can be > later printed to the console in an ordered fashion. > > To limit the number of threads that are created, each child process has > half the number of threads as its parent (minimum of 1), otherwise we > potentially have a fork-bomb. > > Signed-off-by: Brandon Williams > --- > Documentation/git-grep.txt | 5 + > builtin/grep.c | 300 > ++--- > git.c | 2 +- > t/t7814-grep-recurse-submodules.sh | 99 > 4 files changed, 385 insertions(+), 21 deletions(-) > create mode 100755 t/t7814-grep-recurse-submodules.sh > > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 0ecea6e..17aa1ba 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -26,6 +26,7 @@ SYNOPSIS >[--threads <num>] >[-f <file>] [-e] <pattern> >[--and|--or|--not|(|)|-e <pattern>...] > + [--recurse-submodules] >[ [--[no-]exclude-standard] [--cached | --no-index | --untracked] > | <tree>...] >[--] [<pathspec>...] > > @@ -88,6 +89,10 @@ OPTIONS > mechanism. Only useful when searching files in the current directory > with `--no-index`. > > +--recurse-submodules:: > + Recursively search in each submodule that has been initialized and > + checked out in the repository. > + > -a:: > --text:: > Process binary files as if they were text. 
> diff --git a/builtin/grep.c b/builtin/grep.c > index 8887b6a..1fd292f 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -18,12 +18,20 @@ > #include "quote.h" > #include "dir.h" > #include "pathspec.h" > +#include "submodule.h" > > static char const * const grep_usage[] = { > N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"), > NULL > }; > > +static const char *super_prefix; > +static int recurse_submodules; > +static struct argv_array submodule_options = ARGV_ARRAY_INIT; > + > +static int grep_submodule_launch(struct grep_opt *opt, > +const struct grep_source *gs); > + > #define GREP_NUM_THREADS_DEFAULT 8 > static int num_threads; > > @@ -174,7 +182,10 @@ static void *run(void *arg) > break; > > opt->output_priv = w; > - hit |= grep_source(opt, &w->source); > + if (w->source.type == GREP_SOURCE_SUBMODULE) > + hit |= grep_submodule_launch(opt, &w->source); > + else > + hit |= grep_source(opt, &w->source); > grep_source_clear_data(&w->source); > work_done(w); > } > @@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const > unsigned char *sha1, > if (opt->relative && opt->prefix_length) { > quote_path_relative(filename + tree_name_len, opt->prefix, > &pathbuf); > strbuf_insert(&pathbuf, 0, filename, tree_name_len); > + } else if (super_prefix) { > + strbuf_add(&pathbuf, filename, tree_name_len); > + strbuf_addstr(&pathbuf, super_prefix); > + strbuf_addstr(&pathbuf, filename + tree_name_len); > } else { > strbuf_addstr(&pathbuf, filename); > } > @@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char > *filename) > { > struct strbuf buf = STRBUF_INIT; > > - if (opt->relative && opt->prefix_length) > + if (opt->relative && opt->prefix_length) { > quote_path_relative(filename, opt->prefix, &buf); > - else > + } else { > + if (super_prefix) > + strbuf_addstr(&buf, super_prefix); > strbuf_addstr(&buf, filename); > + } > > #ifndef NO_PTHREADS > if (num_threads) { > @@ -378,31 +396,258 @@ static void run_pager(struct grep_opt *opt, const char > *prefix) > exit(status); > } > > -static int grep_cache(struct 
grep_opt *opt, const struct pathspec *pathspec, > int cached) > +static void compile_submodule_options(const struct grep_opt *opt, > + const struct pathspec *pathspec, > + int cached, int untracked, > + int
Re: [PATCH 10/16] update submodules: is_submodule_checkout_safe
On 11/15, Stefan Beller wrote: > In later patches we introduce the options and flag for commands > that modify the working directory, e.g. git-checkout. > > This piece of code will answer the question: > "Is it safe to change the submodule to this new state?" > e.g. is it overwriting untracked files or are there local > changes that would be overwritten? > > Signed-off-by: Stefan Beller> --- > submodule.c | 22 ++ > submodule.h | 2 ++ > 2 files changed, 24 insertions(+) > > diff --git a/submodule.c b/submodule.c > index 2773151..2149ef7 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1201,6 +1201,28 @@ int ok_to_remove_submodule(const char *path) > return ok_to_remove; > } > > +int is_submodule_checkout_safe(const char *path, const struct object_id *oid) > +{ > + struct child_process cp = CHILD_PROCESS_INIT; > + > + if (!is_submodule_populated(path)) > + /* The submodule is not populated, it's safe to check it out */ > + /* > + * TODO: When git learns to re-populate submodules, a check > must be > + * added here to assert that no local files will be overwritten. > + */ By "local files" do you mean the situation where we want to checkout a submodule at path 'foo' but there already exists a file at path 'foo'? 
> + return 1; > + > + argv_array_pushl(&cp.args, "read-tree", "-n", "-m", "HEAD", > + sha1_to_hex(oid->hash), NULL); > + > + prepare_submodule_repo_env(&cp.env_array); > + cp.git_cmd = 1; > + cp.no_stdin = 1; > + cp.dir = path; > + return run_command(&cp) == 0; > +} > + > static int find_first_merges(struct object_array *result, const char *path, > struct commit *a, struct commit *b) > { > diff --git a/submodule.h b/submodule.h > index f01f87c..a7fa634 100644 > --- a/submodule.h > +++ b/submodule.h > @@ -74,6 +74,8 @@ extern unsigned is_submodule_modified(const char *path, int > ignore_untracked); > extern int is_submodule_populated(const char *path); > extern int submodule_uses_gitfile(const char *path); > extern int ok_to_remove_submodule(const char *path); > +extern int is_submodule_checkout_safe(const char *path, > + const struct object_id *oid); > extern int merge_submodule(unsigned char result[20], const char *path, > const unsigned char base[20], > const unsigned char a[20], > -- > 2.10.1.469.g00a8914 > -- Brandon Williams
Re: gitweb html validation
Ralf Thielow writes: > Only block level elements are > allowed to be inside form tags, according to > https://www.w3.org/2010/04/xhtml10-strict.html#elem_form > ... > I think it's better to just move the <form>-Tag outside of the > surrounding div? > Something like this perhaps, I didn't test it myself yet. That sounds like a sensible update to me (no, I do not run gitweb myself). Is this the only <form> we have in the UI, or is it the only one that is problematic? > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 7cf68f07b..33d7c154f 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -5531,8 +5531,8 @@ sub git_project_search_form { > $limit = " in '$project_filter/'"; > } > > - print "<div class=\"projsearch\">\n"; > print $cgi->start_form(-method => 'get', -action => $my_uri) . > + "<div class=\"projsearch\">\n" . > $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; > print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" > if (defined $project_filter); > @@ -5544,11 +5544,11 @@ sub git_project_search_form { >-checked => $search_use_regexp) . > "\n" . > $cgi->submit(-name => 'btnS', -value => 'Search') . > - $cgi->end_form() . "\n" . > $cgi->a({-href => href(project => undef, searchtext => undef, >project_filter => $project_filter)}, > esc_html("List all projects$limit")) . "\n"; > - print "</div>\n"; > + print "</div>\n" . > + $cgi->end_form() . "\n"; > } > > # entry for given @keys needs filling if at least one of keys in list > diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css > index 321260103..507740b6a 100644 > --- a/gitweb/static/gitweb.css > +++ b/gitweb/static/gitweb.css > @@ -539,7 +539,7 @@ div.projsearch { > margin: 20px 0px; > } > > -div.projsearch form { > +form div.projsearch { > margin-bottom: 2px; > } >
Re: [PATCH 09/16] update submodules: add scheduling to update submodules
On 11/15, Stefan Beller wrote: > +static int update_submodule(const char *path, const struct object_id *oid, > + int force, int is_new) > +{ > + const char *git_dir; > + struct child_process cp = CHILD_PROCESS_INIT; > + const struct submodule *sub = submodule_from_path(null_sha1, path); > + > + if (!sub || !sub->name) > + return -1; > + > + git_dir = resolve_gitdir(git_common_path("modules/%s", sub->name)); > + > + if (!git_dir) > + return -1; > + > + if (is_new) > + connect_work_tree_and_git_dir(path, git_dir); > + > + /* update index via `read-tree --reset sha1` */ > + argv_array_pushl(&cp.args, "read-tree", > +force ? "--reset" : "-m", > +"-u", sha1_to_hex(oid->hash), NULL); > + prepare_submodule_repo_env(&cp.env_array); > + cp.git_cmd = 1; > + cp.no_stdin = 1; > + cp.dir = path; > + if (run_command(&cp)) { > + warning(_("reading the index in submodule '%s' failed"), path); > + child_process_clear(&cp); > + return -1; > + } > + > + /* write index to working dir */ > + child_process_clear(&cp); > + child_process_init(&cp); > + argv_array_pushl(&cp.args, "checkout-index", "-a", NULL); > + cp.git_cmd = 1; > + cp.no_stdin = 1; > + cp.dir = path; > + if (force) > + argv_array_push(&cp.args, "-f"); > + > + if (run_command(&cp)) { > + warning(_("populating the working directory in submodule '%s' > failed"), path); > + child_process_clear(&cp); > + return -1; > + } > + > + /* get the HEAD right */ > + child_process_clear(&cp); > + child_process_init(&cp); > + argv_array_pushl(&cp.args, "checkout", "--recurse-submodules", NULL); > + cp.git_cmd = 1; > + cp.no_stdin = 1; > + cp.dir = path; > + if (force) > + argv_array_push(&cp.args, "-f"); > + argv_array_push(&cp.args, sha1_to_hex(oid->hash)); > + > + if (run_command(&cp)) { > + warning(_("setting the HEAD in submodule '%s' failed"), path); > + child_process_clear(&cp); > + return -1; > + } > + > + child_process_clear(&cp); > + return 0; > +} If run_command() is successful then it handles the clearing of the child process struct, correct? 
Is there a negative to having all the explicit clears when the child was successful? > + > int depopulate_submodule(const char *path) > { > int ret = 0; > @@ -1336,3 +1405,51 @@ void prepare_submodule_repo_env(struct argv_array *out) > } > argv_array_push(out, "GIT_DIR=.git"); > } > + > +struct scheduled_submodules_update_type { > + const char *path; > + const struct object_id *oid; > + /* > + * Do we need to perform a complete checkout or just incremental > + * update? > + */ > + unsigned is_new:1; > +} *scheduled_submodules; > +#define SCHEDULED_SUBMODULES_INIT {NULL, NULL} I may not know enough about these types of initializers but that INIT macro only has 2 entries while there are three entries in the struct itself. > + > +int scheduled_submodules_nr, scheduled_submodules_alloc; Should these globals be static since they should be scoped to only this file? > + > +void schedule_submodule_for_update(const struct cache_entry *ce, int is_new) > +{ > + struct scheduled_submodules_update_type *ssu; > + ALLOC_GROW(scheduled_submodules, > +scheduled_submodules_nr + 1, > +scheduled_submodules_alloc); > + ssu = &scheduled_submodules[scheduled_submodules_nr++]; > + ssu->path = ce->name; > + ssu->oid = &ce->oid; > + ssu->is_new = !!is_new; > +} > + > +int update_submodules(int force) > +{ > + int i; > + gitmodules_config(); > + > + /* > + * NEEDSWORK: As submodule updates can potentially take some > + * time each and they do not overlap (i.e. no d/f conflicts), > + * this can be parallelized using the run_commands.h API. > + */ > + for (i = 0; i < scheduled_submodules_nr; i++) { > + struct scheduled_submodules_update_type *ssu = > + &scheduled_submodules[i]; > + > + if (submodule_is_interesting(ssu->path, null_sha1)) > + update_submodule(ssu->path, ssu->oid, > + force, ssu->is_new); > + } > + > + scheduled_submodules_nr = 0; > + return 0; > +} nit: organization-wise it makes more sense to me to have the 'update_submodule' helper function be located more closely to the 'update_submodules' function. 
-- Brandon Williams
Re: [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules
On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams wrote: > Add two helper functions to submodules.c. > `is_submodule_initialized()` checks if a submodule has been initialized > at a given path and `is_submodule_populated()` checks if a submodule > has been checked out at a given path. This reminds me to write the documentation patch explaining the concepts of submodules (specifically, that overview page would state all the possible states of submodules). This patch looks good, Stefan > + > + if (module) { > + char *key = xstrfmt("submodule.%s.url", module->name); > + char *value = NULL; minor nit: In case a reroll is needed, you could replace `value` by `not_needed` or `unused` to make it easier to follow. Hence it also doesn't need initialization (Doh, it does for free() to work, nevermind).
Re: [PATCH 08/16] update submodules: add depopulate_submodule
On 11/15, Stefan Beller wrote: > Implement the functionality needed to enable work tree manipulating > commands to that a deleted submodule should not only affect the index "to that a deleted" did you mean "so that a deleted" > (leaving all the files of the submodule in the work tree) but also to > remove the work tree of the superproject (including any untracked > files). > > To do so, we need an equivalent of "rm -rf", which is already found in > entry.c, so expose that and for clarity add a suffix "_or_dir" to it. > > That will only work properly when the submodule uses a gitfile instead of > a .git directory and no untracked files are present. Otherwise the removal > will fail with a warning (which is just what happened until now). So if a submodule uses a .git directory then it will be ignored during the checkout? All other submodules will actually be removed? Couldn't you end up in an undesirable state with a checkout affecting one submodule but not another? > diff --git a/cache.h b/cache.h > index a50a61a..65c47e4 100644 > --- a/cache.h > +++ b/cache.h > @@ -2018,4 +2018,6 @@ void sleep_millisec(int millisec); > */ > void safe_create_dir(const char *dir, int share); > > +void remove_subtree_or_die(const char *path); > + > #endif /* CACHE_H */ Should probably place an explicit 'extern' in the function prototype. > +int depopulate_submodule(const char *path) > +{ > + int ret = 0; > + char *dot_git = xstrfmt("%s/.git", path); > + > + /* Is it populated? */ > + if (!resolve_gitdir(dot_git)) > + goto out; > + > + /* Does it have a .git directory? */ > + if (!submodule_uses_gitfile(path)) { > + warning(_("cannot remove submodule '%s' because it (or one of " > + "its nested submodules) uses a .git directory"), > + path); > + ret = -1; > + goto out; > + } > + > + remove_subtree_or_die(path); > + > +out: > + free(dot_git); > + return ret; > +} -- Brandon Williams
Re: [PATCH 07/16] update submodules: introduce submodule_is_interesting
On 11/15, Stefan Beller wrote: > +/** > + * When updating the working tree, do we need to check if the submodule needs > + * updating. We do not require a check if we are already sure that the > + * submodule doesn't need updating, e.g. when we are not interested in > submodules > + * or the submodule is marked uninteresting by being not initialized. > + */ The first sentence seems a bit awkward. It seems like it's worded as a question; maybe drop the 'do'? -- Brandon Williams
Re: [PATCH 04/16] update submodules: add is_submodule_populated
On 11/15, Stefan Beller wrote: > This is nearly same as Brandon sent out. > (First patch of origin/bw/grep-recurse-submodules, > will drop this patch once Brandons series is stable > enough to build on). I would be thrilled to see more reviews on that series :D -- Brandon Williams
Re: [PATCH v15 13/27] bisect--helper: `bisect_start` shell function partially in C
Hi, On 10/14/2016 04:14 PM, Pranit Bauva wrote: > diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c > index 6a5878c..1d3e17f 100644 > --- a/builtin/bisect--helper.c > +++ b/builtin/bisect--helper.c > @@ -24,6 +27,8 @@ static const char * const git_bisect_helper_usage[] = { > N_("git bisect--helper --bisect-check-and-set-terms > <command> <TERM_GOOD> <TERM_BAD>"), > N_("git bisect--helper --bisect-next-check [<term>]"), > N_("git bisect--helper --bisect-terms [--term-good | --term-old | > --term-bad | --term-new]"), > + N_("git bisect--helper --bisect start [--term-{old,good}=<term> > --term-{new,bad}=<term>]" > + "[--no-checkout] [<bad> > [<good>...]] [--] [<paths>...]"), Typo: "--bisect start" with space instead of "-" > @@ -403,6 +408,205 @@ static int bisect_terms(struct bisect_terms *terms, > const char **argv, int argc) > return 0; > } > > +static int bisect_start(struct bisect_terms *terms, int no_checkout, > + const char **argv, int argc) > +{ > + int i, has_double_dash = 0, must_write_terms = 0, bad_seen = 0; > + int flags, pathspec_pos, retval = 0; > + struct string_list revs = STRING_LIST_INIT_DUP; > + struct string_list states = STRING_LIST_INIT_DUP; > + struct strbuf start_head = STRBUF_INIT; > + struct strbuf bisect_names = STRBUF_INIT; > + struct strbuf orig_args = STRBUF_INIT; > + const char *head; > + unsigned char sha1[20]; > + FILE *fp = NULL; > + struct object_id oid; > + > + if (is_bare_repository()) > + no_checkout = 1; > + > + for (i = 0; i < argc; i++) { > + if (!strcmp(argv[i], "--")) { > + has_double_dash = 1; > + break; > + } > + } > + > + for (i = 0; i < argc; i++) { > + const char *commit_id = xstrfmt("%s^{commit}", argv[i]); > + const char *arg = argv[i]; > + if (!strcmp(argv[i], "--")) { > + has_double_dash = 1; This is without effect since has_double_dash is already set to 1 by the loop above. I think you can remove this line. 
> + break; > + } else if (!strcmp(arg, "--no-checkout")) { > + no_checkout = 1; > + } else if (!strcmp(arg, "--term-good") || > + !strcmp(arg, "--term-old")) { > + must_write_terms = 1; > + terms->term_good = xstrdup(argv[++i]); > + } else if (skip_prefix(arg, "--term-good=", &arg)) { > + must_write_terms = 1; > + terms->term_good = xstrdup(arg); > + } else if (skip_prefix(arg, "--term-old=", &arg)) { > + must_write_terms = 1; > + terms->term_good = xstrdup(arg); I think you can join the last two branches: + } else if (skip_prefix(arg, "--term-good=", &arg) || + skip_prefix(arg, "--term-old=", &arg)) { + must_write_terms = 1; + terms->term_good = xstrdup(arg); > + } else if (!strcmp(arg, "--term-bad") || > + !strcmp(arg, "--term-new")) { > + must_write_terms = 1; > + terms->term_bad = xstrdup(argv[++i]); > + } else if (skip_prefix(arg, "--term-bad=", &arg)) { > + must_write_terms = 1; > + terms->term_bad = xstrdup(arg); > + } else if (skip_prefix(arg, "--term-new=", &arg)) { > + must_write_terms = 1; > + terms->term_good = xstrdup(arg); This has to be terms->term_bad = ... Also, you can join the last two branches, again, ie, + } else if (skip_prefix(arg, "--term-bad=", &arg) || + skip_prefix(arg, "--term-new=", &arg)) { + must_write_terms = 1; + terms->term_bad = xstrdup(arg); > + } else if (starts_with(arg, "--") && > + !one_of(arg, "--term-good", "--term-bad", NULL)) { > + die(_("unrecognised option: '%s'"), arg); [...] > + /* > + * Verify HEAD > + */ > + head = resolve_ref_unsafe("HEAD", 0, sha1, &flags); > + if (!head) > + if (get_sha1("HEAD", sha1)) > + die(_("Bad HEAD - I need a HEAD")); > + > + if (!is_empty_or_missing_file(git_path_bisect_start())) { You were so eager to re-use the comments from the shell script, but you forgot the "Check if we are bisecting." comment above this line ;-) > + /* Reset to the rev from where we started */ > + strbuf_read_file(&start_head, git_path_bisect_start(), 0); > + strbuf_trim(&start_head); > + if (!no_checkout) { > + struct argv_array argv = ARGV_ARRAY_INIT; [...] 
> + if (must_write_terms) > + if (write_terms(terms->term_bad,
[PATCH 15/16] completion: add '--recurse-submodules' to checkout
Signed-off-by: Stefan Beller--- contrib/completion/git-completion.bash | 2 +- t/t9902-completion.sh | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 21016bf..28acfdb 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -1068,7 +1068,7 @@ _git_checkout () --*) __gitcomp " --quiet --ours --theirs --track --no-track --merge - --conflict= --orphan --patch + --conflict= --orphan --patch --recurse-submodules " ;; *) diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 2ba62fb..d2d1102 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -447,6 +447,7 @@ test_expect_success 'double dash "git checkout"' ' --conflict= --orphan Z --patch Z + --recurse-submodules Z EOF ' -- 2.10.1.469.g00a8914
[PATCH 05/16] update submodules: add submodule config parsing
Similar to b33a15b08 (push: add recurseSubmodules config option, 2015-11-17) and 027771fcb1 (submodule: allow erroneous values for the fetchRecurseSubmodules option, 2015-08-17), we add submodule-config code that is later used to parse whether we are interested in updating submodules. We need the `die_on_error` parameter to be able to call this parsing function for the config file as well, which, if incorrect, lets Git die. As we're just touching the header file, also mark all functions extern. Signed-off-by: Stefan Beller--- submodule-config.c | 22 ++ submodule-config.h | 17 + 2 files changed, 31 insertions(+), 8 deletions(-) diff --git a/submodule-config.c b/submodule-config.c index 098085b..4b5297e 100644 --- a/submodule-config.c +++ b/submodule-config.c @@ -234,6 +234,28 @@ int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg) return parse_fetch_recurse(opt, arg, 1); } +static int parse_update_recurse(const char *opt, const char *arg, + int die_on_error) +{ + switch (git_config_maybe_bool(opt, arg)) { + case 1: + return RECURSE_SUBMODULES_ON; + case 0: + return RECURSE_SUBMODULES_OFF; + default: + if (!strcmp(arg, "checkout")) + return RECURSE_SUBMODULES_ON; + if (die_on_error) + die("bad %s argument: %s", opt, arg); + return RECURSE_SUBMODULES_ERROR; + } +} + +int parse_update_recurse_submodules_arg(const char *opt, const char *arg) +{ + return parse_update_recurse(opt, arg, 1); +} + static int parse_push_recurse(const char *opt, const char *arg, int die_on_error) { diff --git a/submodule-config.h b/submodule-config.h index d05c542..992bfbe 100644 --- a/submodule-config.h +++ b/submodule-config.h @@ -22,13 +22,14 @@ struct submodule { int recommend_shallow; }; -int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg); -int parse_push_recurse_submodules_arg(const char *opt, const char *arg); -int parse_submodule_config_option(const char *var, const char *value); -const struct submodule *submodule_from_name(const unsigned 
char *commit_sha1, - const char *name); -const struct submodule *submodule_from_path(const unsigned char *commit_sha1, - const char *path); -void submodule_free(void); +extern int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg); +extern int parse_update_recurse_submodules_arg(const char *opt, const char *arg); +extern int parse_push_recurse_submodules_arg(const char *opt, const char *arg); +extern int parse_submodule_config_option(const char *var, const char *value); +extern const struct submodule *submodule_from_name( + const unsigned char *commit_sha1, const char *name); +extern const struct submodule *submodule_from_path( + const unsigned char *commit_sha1, const char *path); +extern void submodule_free(void); #endif /* SUBMODULE_CONFIG_H */ -- 2.10.1.469.g00a8914
[PATCH 10/16] update submodules: is_submodule_checkout_safe
In later patches we introduce the options and flag for commands that modify the working directory, e.g. git-checkout. This piece of code will answer the question: "Is it safe to change the submodule to this new state?" e.g. is it overwriting untracked files or are there local changes that would be overwritten? Signed-off-by: Stefan Beller--- submodule.c | 22 ++ submodule.h | 2 ++ 2 files changed, 24 insertions(+) diff --git a/submodule.c b/submodule.c index 2773151..2149ef7 100644 --- a/submodule.c +++ b/submodule.c @@ -1201,6 +1201,28 @@ int ok_to_remove_submodule(const char *path) return ok_to_remove; } +int is_submodule_checkout_safe(const char *path, const struct object_id *oid) +{ + struct child_process cp = CHILD_PROCESS_INIT; + + if (!is_submodule_populated(path)) + /* The submodule is not populated, it's safe to check it out */ + /* +* TODO: When git learns to re-populate submodules, a check must be +* added here to assert that no local files will be overwritten. +*/ + return 1; + + argv_array_pushl(&cp.args, "read-tree", "-n", "-m", "HEAD", +sha1_to_hex(oid->hash), NULL); + + prepare_submodule_repo_env(&cp.env_array); + cp.git_cmd = 1; + cp.no_stdin = 1; + cp.dir = path; + return run_command(&cp) == 0; +} + static int find_first_merges(struct object_array *result, const char *path, struct commit *a, struct commit *b) { diff --git a/submodule.h b/submodule.h index f01f87c..a7fa634 100644 --- a/submodule.h +++ b/submodule.h @@ -74,6 +74,8 @@ extern unsigned is_submodule_modified(const char *path, int ignore_untracked); extern int is_submodule_populated(const char *path); extern int submodule_uses_gitfile(const char *path); extern int ok_to_remove_submodule(const char *path); +extern int is_submodule_checkout_safe(const char *path, + const struct object_id *oid); extern int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20], const unsigned char a[20], -- 2.10.1.469.g00a8914
[PATCH 02/16] submodule: modernize ok_to_remove_submodule to use argv_array
Instead of constructing the NULL-terminated array ourselves, we should make use of the argv_array infrastructure. Signed-off-by: Stefan Beller--- submodule.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/submodule.c b/submodule.c index 6f7d883..53a6dbb 100644 --- a/submodule.c +++ b/submodule.c @@ -1022,13 +1022,6 @@ int ok_to_remove_submodule(const char *path) { ssize_t len; struct child_process cp = CHILD_PROCESS_INIT; - const char *argv[] = { - "status", - "--porcelain", - "-u", - "--ignore-submodules=none", - NULL, - }; struct strbuf buf = STRBUF_INIT; int ok_to_remove = 1; @@ -1038,7 +1031,8 @@ int ok_to_remove_submodule(const char *path) if (!submodule_uses_gitfile(path)) return 0; - cp.argv = argv; + argv_array_pushl(&cp.args, "status", "--porcelain", "-uall", + "--ignore-submodules=none", NULL); prepare_submodule_repo_env(&cp.env_array); cp.git_cmd = 1; cp.no_stdin = 1; -- 2.10.1.469.g00a8914
[PATCH 04/16] update submodules: add is_submodule_populated
This is nearly the same as the patch Brandon sent out. (First patch of origin/bw/grep-recurse-submodules; will drop this patch once Brandon's series is stable enough to build on). Signed-off-by: Stefan Beller--- submodule.c | 11 +++ submodule.h | 1 + 2 files changed, 12 insertions(+) diff --git a/submodule.c b/submodule.c index c9d22e5..97eaf7c 100644 --- a/submodule.c +++ b/submodule.c @@ -914,6 +914,17 @@ int fetch_populated_submodules(const struct argv_array *options, return spf.result; } +int is_submodule_populated(const char *path) +{ + int retval = 0; + struct strbuf gitdir = STRBUF_INIT; + strbuf_addf(&gitdir, "%s/.git", path); + if (resolve_gitdir(gitdir.buf)) + retval = 1; + strbuf_release(&gitdir); + return retval; +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { ssize_t len; diff --git a/submodule.h b/submodule.h index afc58d0..d44b4f1 100644 --- a/submodule.h +++ b/submodule.h @@ -61,6 +61,7 @@ extern int fetch_populated_submodules(const struct argv_array *options, const char *prefix, int command_line_option, int quiet, int max_parallel_jobs); extern unsigned is_submodule_modified(const char *path, int ignore_untracked); +extern int is_submodule_populated(const char *path); extern int submodule_uses_gitfile(const char *path); extern int ok_to_remove_submodule(const char *path); extern int merge_submodule(unsigned char result[20], const char *path, -- 2.10.1.469.g00a8914
[PATCH 03/16] submodule: use absolute path for computing relative path connecting
This addresses a similar concern as in f8eaa0ba98b (submodule--helper, module_clone: always operate on absolute paths, 2016-03-31). When computing the relative path from one to another location, we need to provide both locations as absolute paths to make sure the computation of the relative path is correct. While at it, change `real_work_tree` to be non-const as we own the memory. Signed-off-by: Stefan Beller--- submodule.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/submodule.c b/submodule.c index 53a6dbb..c9d22e5 100644 --- a/submodule.c +++ b/submodule.c @@ -1221,23 +1221,25 @@ void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir) { struct strbuf file_name = STRBUF_INIT; struct strbuf rel_path = STRBUF_INIT; - const char *real_work_tree = xstrdup(real_path(work_tree)); + char *real_git_dir = xstrdup(real_path(git_dir)); + char *real_work_tree = xstrdup(real_path(work_tree)); /* Update gitfile */ strbuf_addf(&file_name, "%s/.git", work_tree); write_file(file_name.buf, "gitdir: %s", - relative_path(git_dir, real_work_tree, &rel_path)); + relative_path(real_git_dir, real_work_tree, &rel_path)); /* Update core.worktree setting */ strbuf_reset(&file_name); - strbuf_addf(&file_name, "%s/config", git_dir); + strbuf_addf(&file_name, "%s/config", real_git_dir); git_config_set_in_file(file_name.buf, "core.worktree", - relative_path(real_work_tree, git_dir, + relative_path(real_work_tree, real_git_dir, &rel_path)); strbuf_release(&file_name); strbuf_release(&rel_path); - free((void *)real_work_tree); + free(real_work_tree); + free(real_git_dir); } int parallel_submodules(void) -- 2.10.1.469.g00a8914
[PATCH 07/16] update submodules: introduce submodule_is_interesting
In later patches we introduce the --recurse-submodules flag for commands that modify the working directory, e.g. git-checkout. It is potentially expensive to check if a submodule needs an update, because a common theme to interact with submodules is to spawn a child process for each interaction. So let's introduce a function that pre-checks if a submodule needs to be checked for an update. I am not particularly happy with the name `submodule_is_interesting`; in internal iterations I had `submodule_requires_check_for_update` and `submodule_needs_update`, but I was even less happy with those names. Maybe `submodule_interesting_for_update`? Generally this is to answer "Am I allowed to touch the submodule at all?" or: "Does the user expect me to touch it?" which includes all of creation/deletion/update. This patch is based off a prior attempt by Jens Lehmann to add submodules to checkout. Signed-off-by: Stefan Beller--- submodule.c | 37 + submodule.h | 8 2 files changed, 45 insertions(+) diff --git a/submodule.c b/submodule.c index 38b0573..d34b721 100644 --- a/submodule.c +++ b/submodule.c @@ -500,6 +500,43 @@ void set_config_update_recurse_submodules(int value) config_update_recurse_submodules = value; } +int submodules_interesting_for_update(void) +{ + /* +* Update can't be "none", "merge" or "rebase", +* treat any value as OFF, except an explicit ON. 
+*/ + return config_update_recurse_submodules == RECURSE_SUBMODULES_ON; +} + +int submodule_is_interesting(const char *path, const unsigned char *sha1) +{ + /* +* If we cannot load a submodule config, we cannot get the name +* of the submodule, so we'd need to follow the gitlink file +*/ + const struct submodule *sub; + + if (!submodules_interesting_for_update()) + return 0; + + sub = submodule_from_path(sha1, path); + if (!sub) + return 0; + + switch (sub->update_strategy.type) { + case SM_UPDATE_UNSPECIFIED: + case SM_UPDATE_CHECKOUT: + return 1; + case SM_UPDATE_REBASE: + case SM_UPDATE_MERGE: + case SM_UPDATE_NONE: + case SM_UPDATE_COMMAND: + return 0; + } + return 0; +} + static int has_remote(const char *refname, const struct object_id *oid, int flags, void *cb_data) { diff --git a/submodule.h b/submodule.h index 185ad18..3df6881 100644 --- a/submodule.h +++ b/submodule.h @@ -57,6 +57,14 @@ extern void show_submodule_inline_diff(FILE *f, const char *path, const struct diff_options *opt); extern void set_config_fetch_recurse_submodules(int value); extern void set_config_update_recurse_submodules(int value); +/** + * When updating the working tree, do we need to check if the submodule needs + * updating. We do not require a check if we are already sure that the + * submodule doesn't need updating, e.g. when we are not interested in submodules + * or the submodule is marked uninteresting by being not initialized. + */ +extern int submodule_is_interesting(const char *path, const unsigned char *sha1); +extern int submodules_interesting_for_update(void); extern void check_for_new_submodule_commits(unsigned char new_sha1[20]); extern int fetch_populated_submodules(const struct argv_array *options, const char *prefix, int command_line_option, -- 2.10.1.469.g00a8914
[PATCH 16/16] checkout: add config option to recurse into submodules by default
To spare users from passing the `--recurse-submodules` option every time they
run checkout, add a config option that sets the default behavior for checkout
to recurse into submodules.

Signed-off-by: Stefan Beller
---
 Documentation/config.txt       |  6 ++++++
 Documentation/git-checkout.txt |  5 +++--
 submodule.c                    |  6 +++---
 t/t2013-checkout-submodule.sh  | 19 +++++++++++++++++++
 4 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a0ab66a..67e0714 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -949,6 +949,12 @@ clean.requireForce::
 	A boolean to make git-clean do nothing unless given -f,
 	-i or -n.   Defaults to true.
 
+checkout.recurseSubmodules::
+	This option can be set to a boolean value.
+	When set to true checkout will recurse into submodules and
+	update them. When set to false, which is the default, checkout
+	will leave submodules untouched.
+
 color.branch::
 	A boolean to enable/disable color in the output of
 	linkgit:git-branch[1]. May be set to `always`,
diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index a0ea2c5..819c430 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -260,8 +260,9 @@ section of linkgit:git-add[1] to learn how to operate the `--patch` mode.
 	Using --recurse-submodules will update the content of all initialized
 	submodules according to the commit recorded in the superproject. If
 	local modifications in a submodule would be overwritten the checkout
-	will fail until `-f` is used. If nothing (or --no-recurse-submodules)
-	is used, the work trees of submodules will not be updated.
+	will fail until `-f` is used. If `--no-recurse-submodules` is used,
+	the work trees of submodules will not be updated. If no command line
+	argument is given, `checkout.recurseSubmodules` is used as a default.
 
 <branch>::
 	Branch to checkout; if it refers to a branch (i.e., a name that,
diff --git a/submodule.c b/submodule.c
index 2149ef7..0c807d9 100644
--- a/submodule.c
+++ b/submodule.c
@@ -160,10 +160,10 @@ int submodule_config(const char *var, const char *value, void *cb)
 		return 0;
 	} else if (starts_with(var, "submodule."))
 		return parse_submodule_config_option(var, value);
-	else if (!strcmp(var, "fetch.recursesubmodules")) {
+	else if (!strcmp(var, "fetch.recursesubmodules"))
 		config_fetch_recurse_submodules = parse_fetch_recurse_submodules_arg(var, value);
-		return 0;
-	}
+	else if (!strcmp(var, "checkout.recursesubmodules"))
+		config_update_recurse_submodules = parse_update_recurse_submodules_arg(var, value);
 	return 0;
 }
 
diff --git a/t/t2013-checkout-submodule.sh b/t/t2013-checkout-submodule.sh
index 60f6987..788c59d 100755
--- a/t/t2013-checkout-submodule.sh
+++ b/t/t2013-checkout-submodule.sh
@@ -149,6 +149,25 @@ test_expect_success '"checkout --recurse-submodules" repopulates submodule' '
 	)
 '
 
+test_expect_success 'option checkout.recurseSubmodules updates submodule' '
+	test_config -C super checkout.recurseSubmodules 1 &&
+	(
+		cd super &&
+		git checkout base &&
+		git checkout -b advanced-base &&
+		git -C submodule commit --allow-empty -m "empty commit" &&
+		git add submodule &&
+		git commit -m "advance submodule" &&
+		git checkout base &&
+		git diff-files --quiet &&
+		git diff-index --quiet --cached base &&
+		git checkout advanced-base &&
+		git diff-files --quiet &&
+		git diff-index --quiet --cached advanced-base &&
+		git checkout --recurse-submodules base
+	)
+'
+
 test_expect_success '"checkout --recurse-submodules" repopulates submodule in existing directory' '
 	(
 		cd super &&
-- 
2.10.1.469.g00a8914
RE: [PATCH 02/16] submodule: modernize ok_to_remove_submodule to use argv_array
> -		"-u",
> ...
> +	argv_array_pushl(&cp.args, "status", "--porcelain", "-uall",

This also changes -u to -uall, which is not mentioned in the commit message. That should probably be called out.
[PATCH 09/16] update submodules: add scheduling to update submodules
A tree walker is only expected to call `schedule_submodule_for_update` and,
once done, to run `update_submodules`. This avoids directory/file conflicts,
and later we can parallelize all submodule actions if needed.

Signed-off-by: Stefan Beller
---
 submodule.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 submodule.h |   4 +++
 2 files changed, 121 insertions(+)

diff --git a/submodule.c b/submodule.c
index 0fa4613..2773151 100644
--- a/submodule.c
+++ b/submodule.c
@@ -308,6 +308,75 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
 	strbuf_release(&sb);
 }
 
+static int update_submodule(const char *path, const struct object_id *oid,
+			    int force, int is_new)
+{
+	const char *git_dir;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	const struct submodule *sub = submodule_from_path(null_sha1, path);
+
+	if (!sub || !sub->name)
+		return -1;
+
+	git_dir = resolve_gitdir(git_common_path("modules/%s", sub->name));
+
+	if (!git_dir)
+		return -1;
+
+	if (is_new)
+		connect_work_tree_and_git_dir(path, git_dir);
+
+	/* update index via `read-tree --reset sha1` */
+	argv_array_pushl(&cp.args, "read-tree",
+			 force ? "--reset" : "-m",
+			 "-u", sha1_to_hex(oid->hash), NULL);
+	prepare_submodule_repo_env(&cp.env_array);
+	cp.git_cmd = 1;
+	cp.no_stdin = 1;
+	cp.dir = path;
+	if (run_command(&cp)) {
+		warning(_("reading the index in submodule '%s' failed"), path);
+		child_process_clear(&cp);
+		return -1;
+	}
+
+	/* write index to working dir */
+	child_process_clear(&cp);
+	child_process_init(&cp);
+	argv_array_pushl(&cp.args, "checkout-index", "-a", NULL);
+	cp.git_cmd = 1;
+	cp.no_stdin = 1;
+	cp.dir = path;
+	if (force)
+		argv_array_push(&cp.args, "-f");
+
+	if (run_command(&cp)) {
+		warning(_("populating the working directory in submodule '%s' failed"), path);
+		child_process_clear(&cp);
+		return -1;
+	}
+
+	/* get the HEAD right */
+	child_process_clear(&cp);
+	child_process_init(&cp);
+	argv_array_pushl(&cp.args, "checkout", "--recurse-submodules", NULL);
+	cp.git_cmd = 1;
+	cp.no_stdin = 1;
+	cp.dir = path;
+	if (force)
+		argv_array_push(&cp.args, "-f");
+	argv_array_push(&cp.args, sha1_to_hex(oid->hash));
+
+	if (run_command(&cp)) {
+		warning(_("setting the HEAD in submodule '%s' failed"), path);
+		child_process_clear(&cp);
+		return -1;
+	}
+
+	child_process_clear(&cp);
+	return 0;
+}
+
 int depopulate_submodule(const char *path)
 {
 	int ret = 0;
@@ -1336,3 +1405,51 @@ void prepare_submodule_repo_env(struct argv_array *out)
 	}
 	argv_array_push(out, "GIT_DIR=.git");
 }
+
+struct scheduled_submodules_update_type {
+	const char *path;
+	const struct object_id *oid;
+	/*
+	 * Do we need to perform a complete checkout or just incremental
+	 * update?
+	 */
+	unsigned is_new:1;
+} *scheduled_submodules;
+#define SCHEDULED_SUBMODULES_INIT {NULL, NULL}
+
+int scheduled_submodules_nr, scheduled_submodules_alloc;
+
+void schedule_submodule_for_update(const struct cache_entry *ce, int is_new)
+{
+	struct scheduled_submodules_update_type *ssu;
+	ALLOC_GROW(scheduled_submodules,
+		   scheduled_submodules_nr + 1,
+		   scheduled_submodules_alloc);
+	ssu = &scheduled_submodules[scheduled_submodules_nr++];
+	ssu->path = ce->name;
+	ssu->oid = &ce->oid;
+	ssu->is_new = !!is_new;
+}
+
+int update_submodules(int force)
+{
+	int i;
+	gitmodules_config();
+
+	/*
+	 * NEEDSWORK: As submodule updates can potentially take some
+	 * time each and they do not overlap (i.e. no d/f conflicts),
+	 * this can be parallelized using the run_commands.h API.
+	 */
+	for (i = 0; i < scheduled_submodules_nr; i++) {
+		struct scheduled_submodules_update_type *ssu =
+			&scheduled_submodules[i];
+
+		if (submodule_is_interesting(ssu->path, null_sha1))
+			update_submodule(ssu->path, ssu->oid,
+					 force, ssu->is_new);
+	}
+
+	scheduled_submodules_nr = 0;
+	return 0;
+}
diff --git a/submodule.h b/submodule.h
index 8518cf3..f01f87c 100644
--- a/submodule.h
+++ b/submodule.h
@@ -94,4 +94,8 @@ extern int parallel_submodules(void);
  */
 extern void prepare_submodule_repo_env(struct argv_array *out);
 
+extern void schedule_submodule_for_update(const struct cache_entry *ce,
+					  int new);
+extern int update_submodules(int force);
+
 #endif
--
[PATCH 14/16] checkout: recurse into submodules if asked to
Allow checkout to recurse into submodules via the command line option
--[no-]recurse-submodules.

The flag for recurse-submodules in its current form could be an OPT_BOOL, but
eventually we may want to have it as:

    git checkout --recurse-submodules=rebase|merge|cherry-pick|checkout|none

which resembles the submodule.<name>.update options, so naturally a value such
as "as-configured-or-X-as-fallback" would also come in handy.

Signed-off-by: Stefan Beller
---
 Documentation/git-checkout.txt |   7 +
 builtin/checkout.c             |  31 +++-
 t/lib-submodule-update.sh      |  10 +-
 t/t2013-checkout-submodule.sh  | 325 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 362 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index 8e2c066..a0ea2c5 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -256,6 +256,13 @@ section of linkgit:git-add[1] to learn how to operate the `--patch` mode.
 	out anyway. In other words, the ref can be held by more than one
 	worktree.
 
+--[no-]recurse-submodules::
+	Using --recurse-submodules will update the content of all initialized
+	submodules according to the commit recorded in the superproject. If
+	local modifications in a submodule would be overwritten the checkout
+	will fail until `-f` is used. If nothing (or --no-recurse-submodules)
+	is used, the work trees of submodules will not be updated.
+
 <branch>::
 	Branch to checkout; if it refers to a branch (i.e., a name that,
 	when prepended with "refs/heads/", is a valid ref), then that
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 9b2a5b3..2a626a3 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -21,12 +21,31 @@
 #include "submodule-config.h"
 #include "submodule.h"
 
+static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
+
 static const char * const checkout_usage[] = {
 	N_("git checkout [<options>] <branch>"),
 	N_("git checkout [<options>] [<branch>] -- <file>..."),
 	NULL,
 };
 
+int option_parse_recurse_submodules(const struct option *opt,
+				    const char *arg, int unset)
+{
+	if (unset) {
+		recurse_submodules = RECURSE_SUBMODULES_OFF;
+		return 0;
+	}
+	if (arg)
+		recurse_submodules =
+			parse_update_recurse_submodules_arg(opt->long_name,
+							    arg);
+	else
+		recurse_submodules = RECURSE_SUBMODULES_ON;
+
+	return 0;
+}
+
 struct checkout_opts {
 	int patch_mode;
 	int quiet;
@@ -826,7 +845,8 @@ static int switch_branches(const struct checkout_opts *opts,
 		parse_commit_or_die(new->commit);
 	}
 
-	ret = merge_working_tree(opts, &old, new, &writeout_error);
+	ret = merge_working_tree(opts, &old, new, &writeout_error) ||
+	      update_submodules(opts->force);
 	if (ret) {
 		free(path_to_free);
 		return ret;
@@ -1160,6 +1180,9 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
 			 N_("second guess 'git checkout <no-such-branch>'")),
 		OPT_BOOL(0, "ignore-other-worktrees", &opts.ignore_other_worktrees,
 			 N_("do not check if another worktree is holding the given ref")),
+		{ OPTION_CALLBACK, 0, "recurse-submodules", &recurse_submodules,
+			    "checkout", "control recursive updating of submodules",
+			    PARSE_OPT_OPTARG, option_parse_recurse_submodules },
 		OPT_BOOL(0, "progress", &opts.show_progress, N_("force progress reporting")),
 		OPT_END(),
 	};
@@ -1190,6 +1213,12 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
 		git_xmerge_config("merge.conflictstyle", conflict_style, NULL);
 	}
 
+	if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
+		git_config(submodule_config, NULL);
+		if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT)
+			set_config_update_recurse_submodules(recurse_submodules);
+	}
+
 	if ((!!opts.new_branch + !!opts.new_branch_force + !!opts.new_orphan_branch) > 1)
 		die(_("-b, -B and --orphan are mutually exclusive"));
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 79cdd34..e0773c6 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -634,7 +634,13 @@ test_submodule_forced_switch () {
 	## Modified submodule ##
 	# Updating a submodule sha1 doesn't update the submodule's work tree
-	test_expect_success "$command: modified submodule does not update submodule work tree" '
+	if test "$KNOWN_FAILURE_RECURSE_SUBMODULE_SERIES_BREAKS_REPLACE_SUBMODULE_TEST" = 1
+	then
+
[PATCH 13/16] submodule: teach unpack_trees() to update submodules
Signed-off-by: Stefan Beller
---
 entry.c        |  7 +++--
 unpack-trees.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++----------
 unpack-trees.h |  1 +
 3 files changed, 86 insertions(+), 19 deletions(-)

diff --git a/entry.c b/entry.c
index 2330b6e..dd829ec 100644
--- a/entry.c
+++ b/entry.c
@@ -270,7 +270,7 @@ int checkout_entry(struct cache_entry *ce,
 	if (!check_path(path.buf, path.len, &st, state->base_dir_len)) {
 		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
-		if (!changed)
+		if (!changed && (!S_ISDIR(st.st_mode) || !S_ISGITLINK(ce->ce_mode)))
 			return 0;
 		if (!state->force) {
 			if (!state->quiet)
@@ -287,9 +287,10 @@ int checkout_entry(struct cache_entry *ce,
 		 * just do the right thing)
 		 */
 		if (S_ISDIR(st.st_mode)) {
-			/* If it is a gitlink, leave it alone! */
-			if (S_ISGITLINK(ce->ce_mode))
+			if (S_ISGITLINK(ce->ce_mode)) {
+				schedule_submodule_for_update(ce, 1);
 				return 0;
+			}
 			if (!state->force)
 				return error("%s is a directory", path.buf);
 			remove_subtree(&path);
diff --git a/unpack-trees.c b/unpack-trees.c
index 576e1d5..c5c22ed 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -29,6 +29,9 @@ static const char *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
 	/* ERROR_NOT_UPTODATE_DIR */
 	"Updating '%s' would lose untracked files in it",
 
+	/* ERROR_NOT_UPTODATE_SUBMODULE */
+	"Updating submodule '%s' would lose modifications in it",
+
 	/* ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN */
 	"Untracked working tree file '%s' would be overwritten by merge.",
 
@@ -80,6 +83,8 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
 	msgs[ERROR_NOT_UPTODATE_DIR] =
 		_("Updating the following directories would lose untracked files in it:\n%s");
+	msgs[ERROR_NOT_UPTODATE_SUBMODULE] =
+		_("Updating the following submodules would lose modifications in it:\n%s");
 
 	if (!strcmp(cmd, "checkout"))
 		msg = advice_commit_before_merge
@@ -1315,19 +1320,18 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
-		int flags = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE;
-		unsigned changed = ie_match_stat(o->src_index, ce, &st, flags);
-		if (!changed)
-			return 0;
-		/*
-		 * NEEDSWORK: the current default policy is to allow
-		 * submodule to be out of sync wrt the superproject
-		 * index. This needs to be tightened later for
-		 * submodules that are marked to be automatically
-		 * checked out.
-		 */
-		if (S_ISGITLINK(ce->ce_mode))
-			return 0;
+		if (!S_ISGITLINK(ce->ce_mode)) {
+			int flags = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE;
+			unsigned changed = ie_match_stat(o->src_index, ce, &st, flags);
+			if (!changed)
+				return 0;
+		} else {
+			if (!submodule_is_interesting(ce->name, null_sha1))
+				return 0;
+			if (ce_stage(ce) ? is_submodule_checkout_safe(ce->name, &ce->oid)
+			    : !is_submodule_modified(ce->name, 1))
+				return 0;
+		}
 		errno = 0;
 	}
 	if (errno == ENOENT)
@@ -1350,6 +1354,59 @@ static int verify_uptodate_sparse(const struct cache_entry *ce,
 	return verify_uptodate_1(ce, o, ERROR_SPARSE_NOT_UPTODATE_FILE);
 }
 
+/*
+ * When a submodule gets turned into an unmerged entry, we want it to be
+ * up-to-date regarding the merge changes.
+ */
+static int verify_uptodate_submodule(const struct cache_entry *old,
+				     const struct cache_entry *new,
+				     struct unpack_trees_options *o)
+{
+	struct stat st;
+
+	if (o->index_only
+	    || (!((old->ce_flags & CE_VALID) || ce_skip_worktree(old))
+		&& (o->reset || ce_uptodate(old))))
+		return 0;
+
+	if (!submodule_is_interesting(new->name, null_sha1))
+		return 0;
+
+	if (!lstat(old->name, &st)) {
+		unsigned changed = ie_match_stat(o->src_index, old, &st,
+						 CE_MATCH_IGNORE_VALID|
+
[PATCH 08/16] update submodules: add depopulate_submodule
Implement the functionality needed so that work tree manipulating commands can
make a deleted submodule not only affect the index (which would leave all the
files of the submodule in the work tree) but also remove the submodule's work
tree (including any untracked files). To do so, we need an equivalent of
"rm -rf", which is already found in entry.c, so expose that and for clarity add
the suffix "_or_die" to it.

This will only work properly when the submodule uses a gitfile instead of a
.git directory and no untracked files are present. Otherwise the removal will
fail with a warning (which is just what happened until now).

Signed-off-by: Stefan Beller
---
 cache.h     |  2 ++
 entry.c     |  8 ++++++++
 submodule.c | 25 +++++++++++++++++++++++++
 submodule.h |  1 +
 4 files changed, 36 insertions(+)

diff --git a/cache.h b/cache.h
index a50a61a..65c47e4 100644
--- a/cache.h
+++ b/cache.h
@@ -2018,4 +2018,6 @@ void sleep_millisec(int millisec);
  */
 void safe_create_dir(const char *dir, int share);
 
+void remove_subtree_or_die(const char *path);
+
 #endif /* CACHE_H */
diff --git a/entry.c b/entry.c
index c6eea24..019826b 100644
--- a/entry.c
+++ b/entry.c
@@ -73,6 +73,14 @@ static void remove_subtree(struct strbuf *path)
 		die_errno("cannot rmdir '%s'", path->buf);
 }
 
+void remove_subtree_or_die(const char *path)
+{
+	struct strbuf sb = STRBUF_INIT;
+	strbuf_addstr(&sb, path);
+	remove_subtree(&sb);
+	strbuf_release(&sb);
+}
+
 static int create_file(const char *path, unsigned int mode)
 {
 	mode = (mode & 0100) ? 0777 : 0666;
diff --git a/submodule.c b/submodule.c
index d34b721..0fa4613 100644
--- a/submodule.c
+++ b/submodule.c
@@ -308,6 +308,31 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
 	strbuf_release(&sb);
 }
 
+int depopulate_submodule(const char *path)
+{
+	int ret = 0;
+	char *dot_git = xstrfmt("%s/.git", path);
+
+	/* Is it populated? */
+	if (!resolve_gitdir(dot_git))
+		goto out;
+
+	/* Does it have a .git directory? */
+	if (!submodule_uses_gitfile(path)) {
+		warning(_("cannot remove submodule '%s' because it (or one of "
+			  "its nested submodules) uses a .git directory"),
+			  path);
+		ret = -1;
+		goto out;
+	}
+
+	remove_subtree_or_die(path);
+
+out:
+	free(dot_git);
+	return ret;
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
diff --git a/submodule.h b/submodule.h
index 3df6881..8518cf3 100644
--- a/submodule.h
+++ b/submodule.h
@@ -65,6 +65,7 @@ extern void set_config_update_recurse_submodules(int value);
  */
 extern int submodule_is_interesting(const char *path, const unsigned char *sha1);
 extern int submodules_interesting_for_update(void);
+extern int depopulate_submodule(const char *path);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-- 
2.10.1.469.g00a8914
[PATCH 11/16] teach unpack_trees() to remove submodule contents
Extend rmdir_or_warn() to remove the directories of those submodules which are
scheduled for removal. Also teach verify_clean_submodule() to check that a
submodule configured to be removed is not modified before scheduling it for
removal.

Signed-off-by: Stefan Beller
---
 unpack-trees.c | 6 ++----
 wrapper.c      | 4 ++++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index ea6bdd2..576e1d5 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -9,6 +9,7 @@
 #include "refs.h"
 #include "attr.h"
 #include "split-index.h"
+#include "submodule.h"
 #include "dir.h"
 
 /*
@@ -1361,15 +1362,12 @@ static void invalidate_ce_path(const struct cache_entry *ce,
 /*
  * Check that checking out ce->sha1 in subdir ce->name is not
  * going to overwrite any working files.
- *
- * Currently, git does not checkout subprojects during a superproject
- * checkout, so it is not going to overwrite anything.
  */
 static int verify_clean_submodule(const struct cache_entry *ce,
 				  enum unpack_trees_error_types error_type,
 				  struct unpack_trees_options *o)
 {
-	return 0;
+	return submodule_is_interesting(ce->name, null_sha1) && is_submodule_modified(ce->name, 0);
 }
 
 static int verify_clean_subdirectory(const struct cache_entry *ce,
diff --git a/wrapper.c b/wrapper.c
index e7f1979..17c08de 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -2,6 +2,7 @@
  * Various trivial helper wrappers around standard functions
  */
 #include "cache.h"
+#include "submodule.h"
 
 static void do_nothing(size_t size)
 {
@@ -592,6 +593,9 @@ int unlink_or_warn(const char *file)
 
 int rmdir_or_warn(const char *file)
 {
+	if (submodule_is_interesting(file, null_sha1)
+	    && depopulate_submodule(file))
+		return -1;
 	return warn_if_unremovable("rmdir", file, rmdir(file));
 }
-- 
2.10.1.469.g00a8914
[PATCH 12/16] entry: write_entry to write populate submodules
Signed-off-by: Stefan Beller
---
 entry.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/entry.c b/entry.c
index 019826b..2330b6e 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,7 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "submodule.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -211,6 +212,7 @@ static int write_entry(struct cache_entry *ce,
 			return error("cannot create temporary submodule %s", path);
 		if (mkdir(path, 0777) < 0)
 			return error("cannot create submodule directory %s", path);
+		schedule_submodule_for_update(ce, 1);
 		break;
 	default:
 		return error("unknown file mode for %s in index", path);
-- 
2.10.1.469.g00a8914
[RFC PATCH 00/16] Checkout aware of Submodules!
With current Git, nearly any time you check out a different state of the
project that changes submodules, you have to run "git submodule update"
afterwards. There are two problems with this approach:

* The "submodule update" command is dangerous, as it doesn't check for work
  that may be lost in the submodule (e.g. a dangling commit).
* You may forget to run the command, as checkout is supposed to do all the
  work for you.

Integrate updating the submodules into git checkout, with the same safety
promises that git-checkout has, i.e. it does not throw away data unless asked
to. This is done by first checking whether the submodule is at the same sha1
as recorded in the superproject. If there are changes, we stop the checkout,
just as we do when checking out a file that has local changes.

The integration happens in code that is also used by other commands, so it
will be easier in the future to make other commands aware of submodules. This
also solves d/f conflicts in case you replace a file/directory with a
submodule or vice versa.

The patches are still a bit rough, but the overall series seems promising
enough to me that I want to put it out here. Any review, especially on the
design level, is welcome!
Thanks,
Stefan

Stefan Beller (16):
  submodule.h: add extern keyword to functions, break line before 80
  submodule: modernize ok_to_remove_submodule to use argv_array
  submodule: use absolute path for computing relative path connecting
  update submodules: add is_submodule_populated
  update submodules: add submodule config parsing
  update submodules: add a config option to determine if submodules are
    updated
  update submodules: introduce submodule_is_interesting
  update submodules: add depopulate_submodule
  update submodules: add scheduling to update submodules
  update submodules: is_submodule_checkout_safe
  teach unpack_trees() to remove submodule contents
  entry: write_entry to write populate submodules
  submodule: teach unpack_trees() to update submodules
  checkout: recurse into submodules if asked to
  completion: add '--recurse-submodules' to checkout
  checkout: add config option to recurse into submodules by default

 Documentation/config.txt               |   6 +
 Documentation/git-checkout.txt         |   8 +
 builtin/checkout.c                     |  31 ++-
 cache.h                                |   2 +
 contrib/completion/git-completion.bash |   2 +-
 entry.c                                |  17 +-
 submodule-config.c                     |  22 +++
 submodule-config.h                     |  17 +-
 submodule.c                            | 246 +++++++++++++++++++++---
 submodule.h                            |  77 +++++---
 t/lib-submodule-update.sh              |  10 +-
 t/t2013-checkout-submodule.sh          | 344 ++++++++++++++++++++++++++++++++-
 t/t9902-completion.sh                  |   1 +
 unpack-trees.c                         | 103 ++++++++--
 unpack-trees.h                         |   1 +
 wrapper.c                              |   4 +
 16 files changed, 806 insertions(+), 85 deletions(-)

-- 
2.10.1.469.g00a8914
[PATCH 01/16] submodule.h: add extern keyword to functions, break line before 80
As the upcoming series will add a lot of functions to the submodule header,
it's good to start off from a sane base. So format the header to not exceed
80 characters per line and mark all functions as extern.

Signed-off-by: Stefan Beller
---
 submodule.h | 60 ++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/submodule.h b/submodule.h
index d9e197a..afc58d0 100644
--- a/submodule.h
+++ b/submodule.h
@@ -29,50 +29,58 @@ struct submodule_update_strategy {
 };
 #define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL}
 
-int is_staging_gitmodules_ok(void);
-int update_path_in_gitmodules(const char *oldpath, const char *newpath);
-int remove_path_from_gitmodules(const char *path);
-void stage_updated_gitmodules(void);
-void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
+extern int is_staging_gitmodules_ok(void);
+extern int update_path_in_gitmodules(const char *oldpath, const char *newpath);
+extern int remove_path_from_gitmodules(const char *path);
+extern void stage_updated_gitmodules(void);
+extern void set_diffopt_flags_from_submodule_config(
+		struct diff_options *diffopt,
 		const char *path);
-int submodule_config(const char *var, const char *value, void *cb);
-void gitmodules_config(void);
-int parse_submodule_update_strategy(const char *value,
+extern int submodule_config(const char *var, const char *value, void *cb);
+extern void gitmodules_config(void);
+extern int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
-const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-void handle_ignore_submodules_arg(struct diff_options *diffopt, const char *);
-void show_submodule_summary(FILE *f, const char *path,
+extern const char *submodule_strategy_to_string(
+		const struct submodule_update_strategy *s);
+extern void handle_ignore_submodules_arg(struct diff_options *diffopt,
+		const char *);
+extern void show_submodule_summary(FILE *f, const char *path,
 		const char *line_prefix,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset);
-void show_submodule_inline_diff(FILE *f, const char *path,
+extern void show_submodule_inline_diff(FILE *f, const char *path,
 		const char *line_prefix,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset,
 		const struct diff_options *opt);
-void set_config_fetch_recurse_submodules(int value);
-void check_for_new_submodule_commits(unsigned char new_sha1[20]);
-int fetch_populated_submodules(const struct argv_array *options,
+extern void set_config_fetch_recurse_submodules(int value);
+extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
+extern int fetch_populated_submodules(const struct argv_array *options,
 		const char *prefix, int command_line_option,
 		int quiet, int max_parallel_jobs);
-unsigned is_submodule_modified(const char *path, int ignore_untracked);
-int submodule_uses_gitfile(const char *path);
-int ok_to_remove_submodule(const char *path);
-int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20],
-		const unsigned char a[20], const unsigned char b[20], int search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name,
-		struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
-void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
-int parallel_submodules(void);
+extern unsigned is_submodule_modified(const char *path, int ignore_untracked);
+extern int submodule_uses_gitfile(const char *path);
+extern int ok_to_remove_submodule(const char *path);
+extern int merge_submodule(unsigned char result[20], const char *path,
+		const unsigned char base[20],
+		const unsigned char a[20],
+		const unsigned char b[20], int search);
+extern int find_unpushed_submodules(unsigned char new_sha1[20],
+		const char *remotes_name,
+		struct string_list *needs_pushing);
+extern int push_unpushed_submodules(unsigned char new_sha1[20],
+		const char *remotes_name);
+extern void connect_work_tree_and_git_dir(const char *work_tree,
+		const char *git_dir);
+extern int parallel_submodules(void);
 
 /*
  * Prepare
[PATCH 06/16] update submodules: add a config option to determine if submodules are updated
In later patches we introduce options and flags for commands that modify the
working directory, e.g. git-checkout. Add a central place to store whether we
want to update submodules.

Signed-off-by: Stefan Beller
---
 submodule.c | 6 ++++++
 submodule.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/submodule.c b/submodule.c
index 97eaf7c..38b0573 100644
--- a/submodule.c
+++ b/submodule.c
@@ -16,6 +16,7 @@
 #include "quote.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
+static int config_update_recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int parallel_jobs = 1;
 static struct string_list changed_submodule_paths = STRING_LIST_INIT_NODUP;
 static int initialized_fetch_ref_tips;
@@ -494,6 +495,11 @@ void set_config_fetch_recurse_submodules(int value)
 	config_fetch_recurse_submodules = value;
 }
 
+void set_config_update_recurse_submodules(int value)
+{
+	config_update_recurse_submodules = value;
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
 		      int flags, void *cb_data)
 {
diff --git a/submodule.h b/submodule.h
index d44b4f1..185ad18 100644
--- a/submodule.h
+++ b/submodule.h
@@ -56,6 +56,7 @@ extern void show_submodule_inline_diff(FILE *f, const char *path,
 		const char *del, const char *add, const char *reset,
 		const struct diff_options *opt);
 extern void set_config_fetch_recurse_submodules(int value);
+extern void set_config_update_recurse_submodules(int value);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-- 
2.10.1.469.g00a8914
Re: [PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote:
> Signed-off-by: Heiko Voigt
> ---
>  submodule.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/submodule.c b/submodule.c
> index e1196fd..29efee9 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -531,6 +531,14 @@ static int submodule_has_commits(const char *path, struct sha1_array *commits)
>  static int submodule_needs_pushing(const char *path, struct sha1_array *commits)
>  {
>  	if (!submodule_has_commits(path, commits))
> +		/* NEEDSWORK: The correct answer here is "We do not

style nit:

/*
 * Usually we prefer comments with both the first and
 * last line of the comment "empty".
 */

/* or just a one liner */

AFAICT these are the only two modes that we prefer in Git. For a discussion of all the other styles, enjoy Linus' guidance. ;)
http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html

> "We do not know" ...

... because there is no way for us to check, as we don't have the submodule commits. "We do consider it safe, as no one in their sane mind would have changed the submodule pointers without having the submodule around. If a user did however change the submodules without having the submodule commits around, this indicates an expert who knows what they were doing."

>		 ... We currently
> +		 * proceed pushing here as if the submodules commits are
> +		 * available on a remote. Since we can not check the
> +		 * remote availability for this submodule we should
> +		 * consider changing this behavior to: Stop here and
> +		 * tell the user how to skip this check if wanted.
> +		 */
>  		return 0;

Thanks for adding the NEEDSWORK; I just wrote the lines above to clarify my thought process, not as a suggestion for change. Overall the series looks good to me; the nits are minor IMHO.

Thanks,
Stefan
Re: [PATCH v3 3/4] batch check whether submodule needs pushing into one call
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote:
> -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> +static int check_has_commit(const unsigned char sha1[20], void *data)
> {
> -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> +	int *has_commit = (int *) data;

nit: just as prior patches ;) void * can be cast implicitly.
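The nit refers to C's implicit conversion from void * to any object
pointer type (unlike C++). A standalone illustration, not git code —
the callback name merely mirrors the one in the patch:

```c
#include <assert.h>

/* Callback in the style used by the patch: context arrives as void *,
 * and C converts it implicitly to the target pointer type, so an
 * explicit (int *) cast would be redundant. */
static int check_has_commit_demo(const unsigned char sha1[20], void *data)
{
	int *has_commit = data;	/* implicit conversion, no cast */

	(void)sha1;		/* unused in this demo */
	*has_commit = 1;
	return 0;
}
```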
Re: [PATCH v3 2/4] serialize collection of refs that contain submodule changes
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote:
> +++ b/submodule.c
> @@ -500,6 +500,13 @@ static int has_remote(const char *refname, const struct object_id *oid,
>  	return 1;
>  }
>
> +static int append_sha1_to_argv(const unsigned char sha1[20], void *data)
> +{
> +	struct argv_array *argv = (struct argv_array *) data;

nit: no explicit cast needed when coming from a void pointer.
Re: [PATCH v3 1/4] serialize collection of changed submodules
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote:
> @@ -560,6 +575,31 @@ static void find_unpushed_submodule_commits(struct commit *commit,
>  	diff_tree_combined_merge(commit, 1, );
>  }
>
> +struct collect_submodule_from_sha1s_data {
> +	char *submodule_path;
> +	struct string_list *needs_pushing;
> +};
> +
> +static int collect_submodules_from_sha1s(const unsigned char sha1[20],
> +					 void *data)
> +{
> +	struct collect_submodule_from_sha1s_data *me =
> +		(struct collect_submodule_from_sha1s_data *) data;

nit: no explicit cast needed when coming from void* ?
Re: [PATCH v15 04/27] bisect--helper: `bisect_clean_state` shell function in C
On 11/15/2016 10:40 PM, Junio C Hamano wrote:
> Stephan Beyer writes:
>
>>> +int bisect_clean_state(void)
>>> +{
>>> +	int result = 0;
>>> +
>>> +	/* There may be some refs packed during bisection */
>>> +	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
>>> +	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
>>> +	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));
>>> +	result = delete_refs(&refs_for_removal, REF_NODEREF);
>>> +	refs_for_removal.strdup_strings = 1;
>>> +	string_list_clear(&refs_for_removal, 0);
>>
>> Does it have advantages to populate a list (with duplicated strings),
>> hand it to delete_refs(), and clear the list (and strings), instead of
>> just doing a single delete_ref() (or whatever name the singular function
>> has) in the callback?
>
> Depending on ref backends, removing multiple refs may be a lot more
> efficient than calling a single ref removal for the same set of
> refs, and the comment upfront I think hints that the code was
> written in the way exactly with that in mind.  Removing N refs from
> a packed refs file will involve a loop that runs N times, each
> iteration loading the file, locating an entry among possibly 100s of
> refs to remove, and then rewriting the file.

Great, that's the reply I wanted to hear (and that I've considered
but wasn't sure of) ;)

[I did not want to dig into the sources and check if delete_refs()
does something smarter than invoking delete_ref() on each item of the
list.]

~Stephan
Re: [PATCH v15 01/27] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL
Hi,

On 10/27/2016 06:59 PM, Junio C Hamano wrote:
> Does any of you (and others on the list) have time and inclination
> to review this series?

Me, currently. ;)

Besides the things I'm mentioning in respective patch e-mails, I
wonder why several bisect--helper commands are prefixed by "bisect";
I'm talking about:

	git bisect--helper --bisect-clean-state
	git bisect--helper --bisect-reset
	git bisect--helper --bisect-write
	git bisect--helper --bisect-check-and-set-terms
	git bisect--helper --bisect-next-check
	git bisect--helper --bisect-terms
	git bisect--helper --bisect-start

etc. instead of

	git bisect--helper --clean-state
	git bisect--helper --reset
	git bisect--helper --write
	git bisect--helper --check-and-set-terms
	git bisect--helper --next-check
	git bisect--helper --terms
	git bisect--helper --start

etc.

Well, I know *why* they have these names: because the shell function
names are simply reused. But I don't know why these prefixes are kept
in the bisect--helper command options. On the other hand, these
command names are not exposed to the user and may hence not be that
important.(?)

~Stephan
Re: [PATCH v15 04/27] bisect--helper: `bisect_clean_state` shell function in C
Stephan Beyer writes:

>> +int bisect_clean_state(void)
>> +{
>> +	int result = 0;
>> +
>> +	/* There may be some refs packed during bisection */
>> +	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
>> +	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
>> +	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));
>> +	result = delete_refs(&refs_for_removal, REF_NODEREF);
>> +	refs_for_removal.strdup_strings = 1;
>> +	string_list_clear(&refs_for_removal, 0);
>
> Does it have advantages to populate a list (with duplicated strings),
> hand it to delete_refs(), and clear the list (and strings), instead of
> just doing a single delete_ref() (or whatever name the singular function
> has) in the callback?

Depending on ref backends, removing multiple refs may be a lot more
efficient than calling a single ref removal for the same set of refs,
and the comment upfront I think hints that the code was written in
the way exactly with that in mind.  Removing N refs from a packed
refs file will involve a loop that runs N times, each iteration
loading the file, locating an entry among possibly 100s of refs to
remove, and then rewriting the file.

Besides, it is bad taste to delete each individual item being
iterated over in an iterator in general, isn't it?
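Junio's cost argument can be made concrete with a toy model (not git
code; the function names are illustrative only): deleting n refs one
at a time reloads and scans the remaining entries on every iteration,
while a batched delete_refs() makes a single pass over the file.

```c
/* Toy cost model: entry visits needed to delete n refs from a
 * packed-refs file holding m entries. */
static long cost_one_by_one(long n, long m)
{
	long visits = 0, i;

	/* each single-ref deletion reloads the file and scans what
	 * is left of it before rewriting */
	for (i = 0; i < n; i++)
		visits += m - i;
	return visits;
}

static long cost_batched(long n, long m)
{
	(void)n;
	return m;	/* one pass, one rewrite, for the whole batch */
}
```

For n = 100 refs among m = 1000 entries the one-by-one loop touches
95,050 entries versus 1,000 for the batch.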
Re: [PATCH v15 04/27] bisect--helper: `bisect_clean_state` shell function in C
Hi,

On 10/14/2016 04:14 PM, Pranit Bauva wrote:
> diff --git a/bisect.c b/bisect.c
> index 6f512c2..45d598d 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -1040,3 +1046,40 @@ int estimate_bisect_steps(int all)
>
>  	return (e < 3 * x) ? n : n - 1;
>  }
> +
> +static int mark_for_removal(const char *refname, const struct object_id *oid,
> +			    int flag, void *cb_data)
> +{
> +	struct string_list *refs = cb_data;
> +	char *ref = xstrfmt("refs/bisect%s", refname);
> +	string_list_append(refs, ref);
> +	return 0;
> +}
> +
> +int bisect_clean_state(void)
> +{
> +	int result = 0;
> +
> +	/* There may be some refs packed during bisection */
> +	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
> +	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
> +	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));
> +	result = delete_refs(&refs_for_removal, REF_NODEREF);
> +	refs_for_removal.strdup_strings = 1;
> +	string_list_clear(&refs_for_removal, 0);

Does it have advantages to populate a list (with duplicated strings),
hand it to delete_refs(), and clear the list (and strings), instead
of just doing a single delete_ref() (or whatever name the singular
function has) in the callback?

I am only seeing the disadvantage: a list with duped strings.

> +	unlink_or_warn(git_path_bisect_expected_rev());
[...]
> +	unlink_or_warn(git_path_bisect_start());

Comparing it with the original shell code (which uses "rm -f"), I was
wondering a little after reading the function name unlink_or_warn()
here... Looking it up helped: despite its name, unlink_or_warn() is
really the function you have to use here ;D

> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
> index 65cf519..4254d61 100644
> --- a/builtin/bisect--helper.c
> +++ b/builtin/bisect--helper.c
> @@ -5,10 +5,15 @@
>  #include "refs.h"
>
>  static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
> +static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
> +static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
> +static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
> +static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")

This is perhaps a non-issue, but you do not need any of these new
path functions in *this* patch. I think nobody really cares, though,
because you will need them later.

Cheers
Stephan
Re* [PATCH v7 00/17] port branch.c to use ref-filter's printing options
Junio C Hamano writes:

> Something like this needs to go before that step.

This time with a log message and an additional test.  The second
paragraph of the proposed log message is written expecting that this
patch will come before your "branch: use ref-filter printing APIs",
which I think is a good place to avoid breakage in the series.

-- >8 --
Subject: [PATCH] for-each-ref: do not segv with %(HEAD) on an unborn branch

The code to flip between "*" and " " prefixes depending on what
branch is checked out used in --format='%(HEAD)' did not consider
that HEAD may resolve to an unborn branch and dereferenced a NULL.

This will become a lot easier to trigger as the codepath will now be
shared with "git branch [--list]".

Signed-off-by: Junio C Hamano
---
 ref-filter.c             | 2 +-
 t/t3203-branch-output.sh | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index 944671af5a..c71d7360d2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1318,7 +1318,7 @@ static void populate_value(struct ref_array_item *ref)
 			head = resolve_ref_unsafe("HEAD", RESOLVE_REF_READING,
 						  sha1, NULL);
-			if (!strcmp(ref->refname, head))
+			if (head && !strcmp(ref->refname, head))
 				v->s = "*";
 			else
 				v->s = " ";
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index d8edaf27e9..1a8dbca8c8 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -37,6 +37,14 @@ test_expect_success 'git branch --list shows local branches' '
 	test_cmp expect actual
 '
 
+test_expect_success 'same, but on an unborn branch' '
+	test_when_finished "git checkout master" &&
+	git checkout --orphan naster &&
+	git branch --list >actual &&
+	sed -e "s/\* / /" expect >expect-unborn &&
+	test_cmp expect-unborn actual
+'
+
 cat >expect <<'EOF'
 branch-one
 branch-two
-- 
2.11.0-rc1-154-gcd2a643dcd
Re: [PATCH] worktree: fix a sparse 'Using plain integer as NULL pointer' warning
On 15/11/16 20:28, Ramsay Jones wrote:
>
> Signed-off-by: Ramsay Jones
> ---
>
> Hi Duy,
>
> If you need to re-roll your 'nd/worktree-move' branch, could you
> please squash this into the relevant patch [commit c49e92f5c
> ("worktree move: refuse to move worktrees with submodules", 12-11-2016)].
>
> Also, one of the new tests introduced by commit 31a8f3066 ("worktree move:
> new command", 12-11-2016), fails for me, thus:
>
> $ ./t2028-worktree-move.sh -i -v
> ...
> --- expected	2016-11-15 20:22:50.647241458 +
> +++ actual	2016-11-15 20:22:50.647241458 +
> @@ -1,3 +1,3 @@
>  worktree /home/ramsay/git/t/trash directory.t2028-worktree-move
> -worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/destination
>  worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/elsewhere
> +worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/destination
> not ok 12 - move worktree
> #
> #	git worktree move source destination &&
> #	test_path_is_missing source &&
> #	git worktree list --porcelain | grep "^worktree" >actual &&
> #	cat <<-EOF >expected &&
> #	worktree $TRASH_DIRECTORY
> #	worktree $TRASH_DIRECTORY/destination
> #	worktree $TRASH_DIRECTORY/elsewhere
> #	EOF
> #	test_cmp expected actual &&
> #	git -C destination log --format=%s >actual2 &&
> #	echo init >expected2 &&
> #	test_cmp expected2 actual2
> #
> $
>
> Is there an expectation that the submodules will be listed in

Er, that should read 'worktrees', of course! :(

ATB,
Ramsay Jones
Re: [PATCH v7 00/17] port branch.c to use ref-filter's printing options
Karthik Nayak writes:

> This is part of unification of the commands 'git tag -l, git branch -l
> and git for-each-ref'. This ports over branch.c to use ref-filter's
> printing options.
>
> Karthik Nayak (17):
>   ref-filter: implement %(if), %(then), and %(else) atoms
>   ref-filter: include reference to 'used_atom' within 'atom_value'
>   ref-filter: implement %(if:equals=<string>) and
>     %(if:notequals=<string>)
>   ref-filter: modify "%(objectname:short)" to take length
>   ref-filter: move get_head_description() from branch.c
>   ref-filter: introduce format_ref_array_item()
>   ref-filter: make %(upstream:track) prints "[gone]" for invalid
>     upstreams
>   ref-filter: add support for %(upstream:track,nobracket)
>   ref-filter: make "%(symref)" atom work with the ':short' modifier
>   ref-filter: introduce refname_atom_parser_internal()
>   ref-filter: introduce symref_atom_parser() and refname_atom_parser()
>   ref-filter: make remote_ref_atom_parser() use
>     refname_atom_parser_internal()
>   ref-filter: add `:dir` and `:base` options for ref printing atoms
>   ref-filter: allow porcelain to translate messages in the output
>   branch, tag: use porcelain output
>   branch: use ref-filter printing APIs
>   branch: implement '--format' option

This is not a new issue, but --format='%(HEAD)' you stole from
for-each-ref is broken when you are on an unborn branch, and the
second patch from the tip makes "git branch" (no other args) on an
unborn branch segfault, when there are real branches that have
commits.

Something like this needs to go before that step.

 ref-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index 944671af5a..c71d7360d2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1318,7 +1318,7 @@ static void populate_value(struct ref_array_item *ref)
 			head = resolve_ref_unsafe("HEAD", RESOLVE_REF_READING,
 						  sha1, NULL);
-			if (!strcmp(ref->refname, head))
+			if (head && !strcmp(ref->refname, head))
 				v->s = "*";
 			else
 				v->s = " ";
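The pattern of the fix, in isolation: resolve_ref_unsafe() may return
NULL on an unborn branch, so the comparison has to guard before
calling strcmp(). A minimal self-contained sketch (the helper name is
hypothetical, not the actual ref-filter code):

```c
#include <string.h>

/* Return the "git branch" prefix for refname given the resolved HEAD,
 * which may be NULL when HEAD points at an unborn branch. */
static const char *head_marker(const char *refname, const char *head)
{
	return (head && !strcmp(refname, head)) ? "*" : " ";
}
```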
[PATCH] worktree: fix a sparse 'Using plain integer as NULL pointer' warning
Signed-off-by: Ramsay Jones
---

Hi Duy,

If you need to re-roll your 'nd/worktree-move' branch, could you
please squash this into the relevant patch [commit c49e92f5c
("worktree move: refuse to move worktrees with submodules", 12-11-2016)].

Also, one of the new tests introduced by commit 31a8f3066 ("worktree
move: new command", 12-11-2016), fails for me, thus:

$ ./t2028-worktree-move.sh -i -v
...
--- expected	2016-11-15 20:22:50.647241458 +
+++ actual	2016-11-15 20:22:50.647241458 +
@@ -1,3 +1,3 @@
 worktree /home/ramsay/git/t/trash directory.t2028-worktree-move
-worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/destination
 worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/elsewhere
+worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/destination
not ok 12 - move worktree
#
#	git worktree move source destination &&
#	test_path_is_missing source &&
#	git worktree list --porcelain | grep "^worktree" >actual &&
#	cat <<-EOF >expected &&
#	worktree $TRASH_DIRECTORY
#	worktree $TRASH_DIRECTORY/destination
#	worktree $TRASH_DIRECTORY/elsewhere
#	EOF
#	test_cmp expected actual &&
#	git -C destination log --format=%s >actual2 &&
#	echo init >expected2 &&
#	test_cmp expected2 actual2
#
$

Is there an expectation that the submodules will be listed in any
particular order by 'git worktree list --porcelain' ?

Thanks!

ATB,
Ramsay Jones

 builtin/worktree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index e738142..abdf462 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -526,7 +526,7 @@ static int unlock_worktree(int ac, const char **av, const char *prefix)
 
 static void validate_no_submodules(const struct worktree *wt)
 {
-	struct index_state istate = {0};
+	struct index_state istate = { NULL };
 	int i, found_submodules = 0;
 
 	if (read_index_from(&istate, worktree_git_path(wt, "index")) > 0) {
-- 
2.10.0
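For context on the warning: sparse complains when a literal 0
initializes a pointer member, which is what `= {0}` does to a struct
whose first member is a pointer. A self-contained illustration with a
hypothetical struct (not git's actual index_state):

```c
#include <stddef.h>

/* Stand-in for a struct whose first member is a pointer; with `= {0}`
 * sparse reports "Using plain integer as NULL pointer" for .entries. */
struct toy_state {
	void *entries;
	int nr;
};

static struct toy_state make_toy_state(void)
{
	struct toy_state s = { NULL };	/* what the patch switches to */

	/* a designated initializer `{ .entries = NULL }` also works
	 * and stays correct if the members are ever reordered */
	return s;
}
```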
Re: Protecting old temporary objects being reused from concurrent "git gc"?
Jeff King writes:

> I suspect the issue is that read-tree populates the cache-tree index
> extension, and then write-tree omits the object write before it even
> gets to write_sha1_file(). The solution is that it should probably be
> calling one of the freshen() functions (possibly just replacing
> has_sha1_file() with check_and_freshen(), but I haven't looked).

I think the final writing always happens via write_sha1_file(), but
an earlier cache-tree update that says "if we have a tree object
already, then use it, otherwise even though we know the object name
for this subtree, do not record it in the cache-tree" codepath may
decide to record the subtree's sha1 without freshening the referent.

A fix may look like this.

 cache-tree.c | 2 +-
 cache.h      | 1 +
 sha1_file.c  | 9 +++++++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 345ea35963..3ae6d056b4 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -401,7 +401,7 @@ static int update_one(struct cache_tree *it,
 	if (repair) {
 		unsigned char sha1[20];
 		hash_sha1_file(buffer.buf, buffer.len, tree_type, sha1);
-		if (has_sha1_file(sha1))
+		if (freshen_object(sha1))
 			hashcpy(it->sha1, sha1);
 		else
 			to_invalidate = 1;
diff --git a/cache.h b/cache.h
index 5cdea6833e..1f5694f308 100644
--- a/cache.h
+++ b/cache.h
@@ -1126,6 +1126,7 @@ extern int sha1_object_info(const unsigned char *, unsigned long *);
 extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
 extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
 extern int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type, unsigned char *sha1, unsigned flags);
+extern int freshen_object(const unsigned char *);
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern int git_open_cloexec(const char *name, int flags);
diff --git a/sha1_file.c b/sha1_file.c
index e030805497..1daeb05dcd 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,11 @@ static int freshen_packed_object(const unsigned char *sha1)
 	return 1;
 }
 
+int freshen_object(const unsigned char *sha1)
+{
+	return freshen_packed_object(sha1) || freshen_loose_object(sha1);
+}
+
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1)
 {
 	char hdr[32];
@@ -3284,7 +3289,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
-	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
+	if (freshen_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
@@ -3302,7 +3307,7 @@ int hash_sha1_file_literally(const void *buf, unsigned long len, const char *typ
 	if (!(flags & HASH_WRITE_OBJECT))
 		goto cleanup;
-	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
+	if (freshen_object(sha1))
 		goto cleanup;
 	status = write_loose_object(sha1, header, hdrlen, buf, len, 0);
Re: [PATCH 1/2] push: --dry-run updates submodules when --recurse-submodules=on-demand
On 15.11.2016 at 18:29, Brandon Williams wrote:
> I'm assuming the reason we want to avoid sub-shells is for
> performance reasons right?

Yes, every fork() saved is a win on Windows. (No pun intended ;)

-- Hannes
Re: ignore blank line removals
Yes, that makes sense. I was not aware of custom merge drivers, but
indeed that may address my situation. I'll look into it. Thanks!

On Tue, Nov 15, 2016 at 12:51 PM, Junio C Hamano wrote:
> John Rood writes:
>
>> On Thu, Nov 3, 2016 at 10:57 AM, John Rood wrote:
>>> If the contents of a file initially are:
>>> one
>>>
>>> three
>>> and on branch A there is a commit, removing the blank line:
>>> one
>>> three
>>> and on branch B there is a commit, adding 'two':
>>> one
>>> two
>>> three
>>> Normally, if you try to merge A into B (or B into A), git recognizes a
>>> decision needs to be made between removing the second line or adding
>>> "two" to the second line. It would be convenient to have a merge
>>> strategy that defaults to the latter in cases where the removed line
>>> was a blank line (or a line containing only whitespace) ...something
>>> like -Xignore-blank-line-removals.
>>
>> Is there any push-back on this, or is there a backlog that we can add
>> this feature to?
>
> If you mean by "push-back" objections that say "this feature is evil
> and should not be added to Git, ever", I do not think we saw any on
> the list.  The lack of response is most likely that everybody
> thought "Meh." aka "It is not useful/interesting/valuable enough
> feature to bother discussing."
>
> One thing I wondered was what you would want if the contents were
> one/three without blank, A added blank between the two and B
> replaced blank with two.  As your example shows, in the filetype you
> are dealing with, a blank line has no significant meaning (otherwise
> you won't be ignoring the change A made to remove the blank in your
> original example).  The outcome desired by you may be one/two/three
> without any blank in that case because of that.  Which would lead to
> the suspicion that ignore-blank-line-removals is not a good general
> feature (i.e. in this other example, you want to ignore blank line
> addition).  Which further leads to the suspicion that the desire you
> expressed in the original post is not well thought through to be a
> useful specification to build anything out of (yet), but is merely
> a potentially interesting discussion starter.  And nobody so far
> found it interesting enough to spend time discussing it further to
> make the wish detailed enough to be called a "design" of a new
> feature.
>
> Having said all that.
>
> I suspect that you may not have to make any change to Git to do what
> you want; can't you just use the 'merge' attribute with a custom
> 3-way merge driver that removes an empty line?
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, 2016-11-15 at 12:40 -0500, Jeff King wrote:
> On Tue, Nov 15, 2016 at 12:33:04PM -0500, Matt McCutchen wrote:
>
> > On Tue, 2016-11-15 at 12:06 -0500, Jeff King wrote:
> > >
> > > - when an object write is optimized out because we already have the
> > >   object, git will update the mtime on the file (loose object or
> > >   packfile) to freshen it
> >
> > FWIW, I am not seeing this happen when I do "git read-tree --prefix"
> > followed by "git write-tree" using the current master (3ab2281).  See
> > the attached test script.
>
> The optimization I'm thinking about is the one from write_sha1_file(),
> which learned to freshen in 33d4221c7 (write_sha1_file: freshen existing
> objects, 2014-10-15).
>
> I suspect the issue is that read-tree populates the cache-tree index
> extension, and then write-tree omits the object write before it even
> gets to write_sha1_file(). The solution is that it should probably be
> calling one of the freshen() functions (possibly just replacing
> has_sha1_file() with check_and_freshen(), but I haven't looked).
>
> I'd definitely welcome patches in this area.

Cool, it's nice to have an idea of what's going on.  I don't think
I'm going to try to fix it myself though.

By the way, thanks for the fast response to my original question!

Matt
[PATCH] git-gc.txt: expand discussion of races with other processes
In general, "git gc" may delete objects that another concurrent
process is using but hasn't created a reference to. Git has some
mitigations, but they fall short of a complete solution. Document
this in the git-gc(1) man page and add a reference from the
documentation of the gc.pruneExpire config variable.

Based on a write-up by Jeff King:

http://marc.info/?l=git&m=147922960131779&w=2

Signed-off-by: Matt McCutchen
---
 Documentation/config.txt |  4 +++-
 Documentation/git-gc.txt | 34 ++++++++++++++++++++++++++++--------
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 21fdddf..3f1d931 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1409,7 +1409,9 @@ gc.pruneExpire::
 	Override the grace period with this config variable.  The value
 	"now" may be used to disable this grace period and always prune
 	unreachable objects immediately, or "never" may be used to
-	suppress pruning.
+	suppress pruning.  This feature helps prevent corruption when
+	'git gc' runs concurrently with another process writing to the
+	repository; see the "NOTES" section of linkgit:git-gc[1].
 
 gc.worktreePruneExpire::
 	When 'git gc' is run, it calls
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index bed60f4..852b72c 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -63,11 +63,10 @@ automatic consolidation of packs.
 --prune=<date>::
 	Prune loose objects older than date (default is 2 weeks ago,
 	overridable by the config variable `gc.pruneExpire`).
-	--prune=all prunes loose objects regardless of their age (do
-	not use --prune=all unless you know exactly what you are doing.
-	Unless the repository is quiescent, you will lose newly created
-	objects that haven't been anchored with the refs and end up
-	corrupting your repository).  --prune is on by default.
+	--prune=all prunes loose objects regardless of their age and
+	increases the risk of corruption if another process is writing to
+	the repository concurrently; see "NOTES" below.  --prune is on by
+	default.
 
 --no-prune::
 	Do not prune any loose objects.
@@ -138,17 +137,36 @@ default is "2 weeks ago".
 
 Notes
 -----
 
-'git gc' tries very hard to be safe about the garbage it collects.  In
+'git gc' tries very hard not to delete objects that are referenced
+anywhere in your repository.  In
 particular, it will keep not only objects referenced by your current
 set of branches and tags, but also objects referenced by the index,
 remote-tracking branches, refs saved by 'git filter-branch' in
 refs/original/, or reflogs (which may reference commits in branches
 that were later amended or rewound).
-
-If you are expecting some objects to be collected and they aren't, check
+If you are expecting some objects to be deleted and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.
 
+On the other hand, when 'git gc' runs concurrently with another process,
+there is a risk of it deleting an object that the other process is using
+but hasn't created a reference to. This may just cause the other process
+to fail or may corrupt the repository if the other process later adds a
+reference to the deleted object. Git has two features that significantly
+mitigate this problem:
+
+. Any object with modification time newer than the `--prune` date is kept,
+  along with everything reachable from it.
+
+. Most operations that add an object to the database update the
+  modification time of the object if it is already present so that #1
+  applies.
+
+However, these features fall short of a complete solution, so users who
+run commands concurrently have to live with some risk of corruption (which
+seems to be low in practice) unless they turn off automatic garbage
+collection with 'git config gc.auto 0'.
+
 HOOKS
 -----
-- 
2.7.4
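The two knobs the patch documents can be set in a repository's config;
the value shown for pruneExpire is the documented default grace period,
and auto = 0 is the opt-out the last paragraph mentions (a sketch, not
a recommendation):

```
[gc]
	# grace period before unreachable loose objects are pruned
	# (mitigation #1 above); "now" disables the grace period,
	# "never" suppresses pruning entirely
	pruneExpire = 2.weeks.ago
	# turn off automatic garbage collection when other processes
	# routinely write to the repository concurrently
	auto = 0
```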
Re: ignore blank line removals
John Rood writes:

> On Thu, Nov 3, 2016 at 10:57 AM, John Rood wrote:
>> If the contents of a file initially are:
>> one
>>
>> three
>> and on branch A there is a commit, removing the blank line:
>> one
>> three
>> and on branch B there is a commit, adding 'two':
>> one
>> two
>> three
>> Normally, if you try to merge A into B (or B into A), git recognizes a
>> decision needs to be made between removing the second line or adding
>> "two" to the second line. It would be convenient to have a merge
>> strategy that defaults to the latter in cases where the removed line
>> was a blank line (or a line containing only whitespace) ...something
>> like -Xignore-blank-line-removals.
>
> Is there any push-back on this, or is there a backlog that we can add
> this feature to?

If you mean by "push-back" objections that say "this feature is evil
and should not be added to Git, ever", I do not think we saw any on
the list.  The lack of response is most likely that everybody thought
"Meh." aka "It is not useful/interesting/valuable enough feature to
bother discussing."

One thing I wondered was what you would want if the contents were
one/three without blank, A added blank between the two and B replaced
blank with two.  As your example shows, in the filetype you are
dealing with, a blank line has no significant meaning (otherwise you
won't be ignoring the change A made to remove the blank in your
original example).  The outcome desired by you may be one/two/three
without any blank in that case because of that.  Which would lead to
the suspicion that ignore-blank-line-removals is not a good general
feature (i.e. in this other example, you want to ignore blank line
addition).  Which further leads to the suspicion that the desire you
expressed in the original post is not well thought through to be a
useful specification to build anything out of (yet), but is merely a
potentially interesting discussion starter.  And nobody so far found
it interesting enough to spend time discussing it further to make the
wish detailed enough to be called a "design" of a new feature.

Having said all that.

I suspect that you may not have to make any change to Git to do what
you want; can't you just use the 'merge' attribute with a custom
3-way merge driver that removes an empty line?
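Junio's closing suggestion, sketched as configuration: route the
affected paths through a custom 3-way merge driver. The driver name,
pattern, and script here are hypothetical; %O, %A and %B are the
ancestor, current and other versions as documented in gitattributes(5),
and the driver is expected to leave its result in %A and exit non-zero
on conflict.

```
# .gitattributes
*.txt merge=ignoreblank

# $GIT_DIR/config (or ~/.gitconfig)
[merge "ignoreblank"]
	name = 3-way merge ignoring blank-line removals
	driver = ignore-blank-merge %O %A %B
```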
Re: RFC: Enable delayed responses to Git clean/smudge filter requests
Lars Schneider wrote:
>
> On 15 Nov 2016, at 02:03, Eric Wong wrote:
>
> > Anyways, I'll plan on doing something similar (in Perl) with the
> > synchronous parts of public-inbox which relies on "cat-file --batch"
> > at some point... (my rotational disks are slow :<)
>
> That sounds interesting! What changes do you have in mind for
> "cat-file --batch"? We are thinking about performance improvements
> in that area, too. I would be happy to help reviewing your patches!

I'm not touching "cat-file --batch" itself, just the Perl code that
uses it. (still trying my best to avoid working on C or any compiled
languages); There'll probably be a Perl queue mechanism so there's
still only one cat-file process running per-inbox to avoid overloading
on seeks; but the Perl daemons should be able to handle network duties
and other in-memory operations while it waits for cat-file.
Re: gitweb html validation
Raphaël Gertz wrote:
> Hi,
>
> There is a small bug in gitweb html validation; you need the following
> patch to pass the w3c check with the searchbox enabled.
>
> The problem lies in the input directly embedded inside a form without
> a block-level wrapper, which is not valid.

I agree this is a small bug. Only block level elements are allowed to
be inside form tags, according to
https://www.w3.org/2010/04/xhtml10-strict.html#elem_form

> Best regards
>
> The following patch fixes the issue for git-2.10.2:
> --- /usr/share/gitweb/gitweb.cgi.orig	2016-11-15 15:37:21.149805026 +0100
> +++ /usr/share/gitweb/gitweb.cgi	2016-11-15 15:37:48.579240429 +0100
> @@ -5518,6 +5518,7 @@ sub git_project_search_form {
>
>  	print "<div class=\"projsearch\">\n";
>  	print $cgi->start_form(-method => 'get', -action => $my_uri) .
> +	      '<div>' .
>  	      $cgi->hidden(-name => 'a', -value => 'project_list') . "\n";
>  	print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n"
>  		if (defined $project_filter);
> @@ -5529,6 +5530,7 @@ sub git_project_search_form {
>  			     -checked => $search_use_regexp) .
>  	      "\n" .
>  	      $cgi->submit(-name => 'btnS', -value => 'Search') .
> +	      '</div>' .
>  	      $cgi->end_form() . "\n" .
>  	      $cgi->a({-href => href(project => undef, searchtext => undef,
>  	                             project_filter => $project_filter)},

I think it's better to just move the <form> tag outside of the
surrounding div? Something like this perhaps, I didn't test it myself
yet.

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7cf68f07b..33d7c154f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5531,8 +5531,8 @@ sub git_project_search_form {
 		$limit = " in '$project_filter/'";
 	}
 
-	print "<div class=\"projsearch\">\n";
 	print $cgi->start_form(-method => 'get', -action => $my_uri) .
+	      "<div class=\"projsearch\">\n" .
 	      $cgi->hidden(-name => 'a', -value => 'project_list') . "\n";
 	print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n"
 		if (defined $project_filter);
@@ -5544,11 +5544,11 @@ sub git_project_search_form {
 			     -checked => $search_use_regexp) .
 	      "\n" .
 	      $cgi->submit(-name => 'btnS', -value => 'Search') .
-	      $cgi->end_form() . "\n" .
 	      $cgi->a({-href => href(project => undef, searchtext => undef,
 	                             project_filter => $project_filter)},
 	              esc_html("List all projects$limit")) . "\n";
-	print "</div>\n";
+	print "</div>\n" .
+	      $cgi->end_form() . "\n";
 }
 
 # entry for given @keys needs filling if at least one of keys in list
diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index 321260103..507740b6a 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -539,7 +539,7 @@ div.projsearch {
 	margin: 20px 0px;
 }
 
-div.projsearch form {
+form div.projsearch {
 	margin-bottom: 2px;
 }
Re: RFC: Enable delayed responses to Git clean/smudge filter requests
Lars Schneider writes: >> The filter itself would need to be aware of parallelism >> if it lives for multiple objects, right? > > Correct. This way Git doesn't need to deal with threading... I think you need to be careful about three things (at least; there may be more): * Codepaths that check out multiple cache entries do rely on the order of checkout. We checkout removals first to make room so that creation of a path X can succeed if an existing path X/Y that used to want to see X as a directory can succeed (see the use of checkout_entry() by "git checkout", which does have two separate loops to explicitly guarantee this), for example. I think "remove all and then create" you do not specifically have to worry about with the proposed change, but you may need to inspect and verify there aren't other kinds of order dependency. * Done naively, it will lead to unmaintainable code, like this: + struct list_of_cache_entries *list = ...; for (i = 0; i < active_nr; i++) - checkout_entry(active_cache[i], state, NULL); + if (checkout_entry(active_cache[i], state, NULL) == DELAYED) + add_cache_to_queue(&list, active_cache[i]); + while (list) { + wait_for_checkout_to_finish(*list); + list = list->next; + } I do not think we want to see such a rewrite all over the codepaths. It might be OK to add such a "these entries are known to be delayed" list in struct checkout so that the above becomes more like this: for (i = 0; i < active_nr; i++) checkout_entry(active_cache[i], state, NULL); + checkout_entry_finish(state); That is, addition of a single "some of the checkout_entry() calls done so far might have been lazy, and I'll give them a chance to clean up" might be palatable. Anything more than that on the caller side is not. * You'd need to rein in the maximum parallelism somehow, as you do not want to see hundreds of competing filter processes starting only to tell the main loop over an index with hundreds of entries that they are delayed checkouts.
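For context, this is the existing single-shot filter mechanism the thread wants to extend: each clean/smudge invocation runs sequentially and blocks Git. A minimal sketch in a throwaway repo; the "demo" filter name and the tr-based clean command are made up for illustration.

```shell
# Throwaway repo; the "demo" filter driver is an invented example.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email you@example.com
git config user.name you
echo '*.txt filter=demo' >.gitattributes
git config filter.demo.clean 'tr a-z A-Z'   # runs once per file on "git add"
git config filter.demo.smudge cat           # runs once per file on checkout
echo hello >f.txt
git add .gitattributes f.txt
stored=$(git show :f.txt)   # the staged blob went through the clean filter
echo "$stored"
```

Each file triggers one synchronous filter invocation, which is exactly the per-entry blocking that a "DELAYED" return value from checkout_entry() would avoid.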
Re: ignore blank line removals
Is there any push-back on this, or is there a backlog that we can add this feature to? On Thu, Nov 3, 2016 at 10:57 AM, John Rood wrote: > If the contents of a file initially are: > one > > three > and on branch A there is a commit, removing the blank line: > one > three > and on branch B there is a commit, adding 'two': > one > two > three > Normally, if you try to merge A into B (or B into A), git recognizes a > decision needs to be made between removing the second line or adding > "two" to the second line. It would be convenient to have a merge > strategy that defaults to the latter in cases where the removed line > was a blank line (or a line containing only whitespace) ...something > like -Xignore-blank-line-removals.
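The scenario from the report can be reproduced in a throwaway repo; today git stops with a conflict (the proposed -Xignore-blank-line-removals strategy option does not exist).

```shell
# Throwaway repo reproducing the blank-line vs. "two" conflict.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email you@example.com && git config user.name you
printf 'one\n\nthree\n' >f && git add f && git commit -qm base
base=$(git symbolic-ref --short HEAD)
git checkout -qb A && printf 'one\nthree\n' >f && git commit -qam 'remove blank line'
git checkout -q "$base"
git checkout -qb B && printf 'one\ntwo\nthree\n' >f && git commit -qam 'add two'
# both sides touched the same region, so the merge stops
if git merge A >/dev/null 2>&1; then result=merged; else result=conflict; fi
echo "$result"
```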
Re: [PATCH v3 0/4] Speedup finding of unpushed submodules
On 11/15, Stefan Beller wrote: > On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote: > > You can find the second iteration of this series here: > > > > http://public-inbox.org/git/cover.1475851621.git.hvo...@hvoigt.net/ > > > > All mentioned issues should be fixed. I put the NEEDSWORK comment in a > > separate patch since it seemed to me as if we did not fully agree on > > that. So in case we decide against it we can just drop that patch. > > > > Cheers Heiko > > > > +cc Brandon who started building a series on top of yours. I don't think there should be too much I'll have to change with my series; I'll just rebase against these changes once Junio updates his branch. If you want to take a look at my series, it's here: https://public-inbox.org/git/1479172735-698-1-git-send-email-bmw...@google.com/ Thanks for the heads up Stefan. -- Brandon Williams
Re: [PATCH v3 0/4] Speedup finding of unpushed submodules
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote: > You can find the second iteration of this series here: > > http://public-inbox.org/git/cover.1475851621.git.hvo...@hvoigt.net/ > > All mentioned issues should be fixed. I put the NEEDSWORK comment in a > separate patch since it seemed to me as if we did not fully agree on > that. So in case we decide against it we can just drop that patch. > > Cheers Heiko > +cc Brandon who started building a series on top of yours.
Re: [PATCH] remote-curl: don't hang when a server dies before any output
Jeff King writes: > On Mon, Nov 14, 2016 at 05:02:27PM -0800, Junio C Hamano wrote: > >> Jeff King writes: >> >> > Actually, I take it back. I think it works for a single round of ref >> > negotiation, but not for multiple. Enabling GIT_TEST_LONG=1 causes it to >> > fail t5551. >> > >> > I think I've probably made a mis-assumption on exactly when in the HTTP >> > protocol we will see a flush packet (and perhaps that is a sign that >> > this protocol-snooping approach is not a good one). >> >> Hmph. I think I tried David's original under GIT_TEST_LONG and saw >> it got stuck; could be the same issue, I guess. > > It works OK here. I think it is just that the test is really slow (by > design). Yeah, I think what I recalled was my old attempt to run the follow-up "any SHA-1" patch without this one.
Re: [PATCH v3 0/6] recursively grep across submodules
Coverity seems to dislike this part: *** CID 1394367: Null pointer dereferences (NULL_RETURNS) /builtin/grep.c: 625 in grep_submodule() 619 is_submodule_populated(path))) { 620 /* 621 * If searching history, check for the presense of the 622 * submodule's gitdir before skipping the submodule. 623 */ 624 if (sha1) { >>> CID 1394367: Null pointer dereferences (NULL_RETURNS) >>> Dereferencing a null pointer "submodule_from_path(null_sha1, path)". 625 path = git_path("modules/%s", 626 submodule_from_path(null_sha1, path)->name); 627 628 if (!(is_directory(path) && is_git_directory(path))) 629 return 0; 630 } else {
Re: [PATCH v7 13/17] ref-filter: add `:dir` and `:base` options for ref printing atoms
Jacob Keller writes: > dirname makes sense. What about implementing a reverse variant of > strip, which you could perform stripping of right-most components and > instead of stripping by a number, strip "to" a number, ie: keep the > left N most components, and then you could use something like > ... > I think that would be more general purpose than basename, and less confusing? I think you are going in the right direction. I had a similar thought but built around a different axis. I.e. if strip=1 strips one from the left, perhaps we want to have rstrip=1 that strips one from the right, and also strip=-1 to mean strip everything except one from the left and so on? I think this and your keep (and perhaps you'll have rkeep for completeness) have the same expressive power. I do not offhand have a preference one over the other. Somehow it sounds a bit strange to me to treat 'remotes' as the same class of token as 'heads' and 'tags' (I'd expect 'heads' and 'remotes/origin' would be at the same level in end-user's mind), but that is probably an unrelated tangent. The reason this series wants to introduce :base must be to emulate an existing feature, so that existing feature is a concrete counter-example that argues against my "it sounds a bit strange" reaction.
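For reference, the baseline the thread is extending: the :strip option that already exists in for-each-ref strips N path components from the left. A quick illustration in a throwaway repo (the ref name is invented for the example):

```shell
# Throwaway repo demonstrating the existing %(refname:strip=N) option.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m init
git update-ref refs/remotes/origin/topic HEAD
# strip=2 drops "refs/remotes", leaving "origin/topic"
shown=$(git for-each-ref --format='%(refname:strip=2)' refs/remotes)
echo "$shown"
```

The rstrip/negative-count variants debated here would strip from the right (or keep only N components) in the same spirit.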
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, Nov 15, 2016 at 12:33:04PM -0500, Matt McCutchen wrote: > On Tue, 2016-11-15 at 12:06 -0500, Jeff King wrote: > > - when an object write is optimized out because we already have the > > object, git will update the mtime on the file (loose object or > > packfile) to freshen it > > FWIW, I am not seeing this happen when I do "git read-tree --prefix" > followed by "git write-tree" using the current master (3ab2281). See > the attached test script. The optimization I'm thinking about is the one from write_sha1_file(), which learned to freshen in 33d4221c7 (write_sha1_file: freshen existing objects, 2014-10-15). I suspect the issue is that read-tree populates the cache-tree index extension, and then write-tree omits the object write before it even gets to write_sha1_file(). The solution is that it should probably be calling one of the freshen() functions (possibly just replacing has_sha1_file() with check_and_freshen(), but I haven't looked). I'd definitely welcome patches in this area. > OK. I'll write a patch to add a summary of this information to the > git-gc man page. Sounds like a good idea. Thanks. -Peff
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, 2016-11-15 at 12:06 -0500, Jeff King wrote: > - when an object write is optimized out because we already have the > object, git will update the mtime on the file (loose object or > packfile) to freshen it FWIW, I am not seeing this happen when I do "git read-tree --prefix" followed by "git write-tree" using the current master (3ab2281). See the attached test script. > If you have long-running data (like, a temporary index file that might > literally sit around for days or weeks) I think that is a potential > problem. And the solution is probably to use refs in some way to point > to your objects. Agreed. This is not my current scenario. > If you're worried about a short-term operation where > somebody happens to run git-gc concurrently, I agree it's a possible > problem, but I suspect something you can ignore in practice. > > For the most part, a lot of the client-side git tools assume that one > operation is happening at a time in the repository. And I think that > largely holds for a developer working on a single clone, and things just > work in practice. > > Auto-gc makes that a little sketchier, but historically does not seem to > have really caused problems in practice. OK. I'll write a patch to add a summary of this information to the git-gc man page. Matt test-git-read-tree-write-tree-touch-object.sh Description: application/shellscript
Re: [PATCH 1/2] push: --dry-run updates submodules when --recurse-submodules=on-demand
On 11/15, Johannes Sixt wrote: > Am 15.11.2016 um 02:18 schrieb Brandon Williams: > >diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh > >index 198ce84..e6ccc30 100755 > >--- a/t/t5531-deep-submodule-push.sh > >+++ b/t/t5531-deep-submodule-push.sh > >@@ -427,7 +427,31 @@ test_expect_success 'push unpushable submodule > >recursively fails' ' > > cd submodule.git && > > git rev-parse master >../actual > > ) && > >-test_cmp expected actual > >+test_cmp expected actual && > >+git -C work reset --hard master^ > > This line looks like a clean-up to be done after the test case. You > should wrap it in test_when_finished, but outside of a sub-shell, > which looks like it's just one line earlier, before the test_cmp. K will do. > > >+' > >+ > >+test_expect_failure 'push --dry-run does not recursively update submodules' > >' > >+( > >+cd work && > >+( > >+cd gar/bage && > >+git checkout master && > >+git rev-parse master >../../../expected_submodule && > >+> junk9 && > >+git add junk9 && > >+git commit -m "Ninth junk" > >+) && > > Could you please avoid this nested sub-shell? It is fine to cd > around when you are in a sub-shell. Yes I can reorganize it to avoid the nested sub-shells. I was just trying to follow the organization of the other tests in the same file. > > >+git checkout master && > >+git rev-parse master >../expected_pub > > Broken && chain. > > >+git add gar/bage && > >+git commit -m "Ninth commit for gar/bage" && > >+git push --dry-run --recurse-submodules=on-demand ../pub.git > >master > >+) && > >+git -C submodule.git rev-parse master >actual_submodule && > >+git -C pub.git rev-parse master >actual_pub && > > All of the commands above are 'git something' that could become 'git > -C work something' and then the sub-shell would be unnecessary. I'm > not sure I would appreciate the verbosity of the result, though. > (Perhaps aligning the git subcommands after -C foo would help.) 
I'll play around with it and try to make it look pretty while trying to avoid sub-shells. I'm assuming the reason we want to avoid sub-shells is for performance reasons, right? -- Brandon Williams
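The review's suggested alternative, `git -C <dir>`, gives the same result as a sub-shell with cd without forking one. A minimal comparison:

```shell
# Throwaway repo: `git -C dir cmd` is equivalent to `(cd dir && git cmd)`.
repo=$(mktemp -d)
git init -q "$repo"
a=$(git -C "$repo" rev-parse --is-inside-work-tree)
b=$( (cd "$repo" && git rev-parse --is-inside-work-tree) )
echo "$a $b"
```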
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, Nov 15, 2016 at 09:13:14AM -0500, Matt McCutchen wrote: > I want to change this to something that won't leave an inconsistent > state if interrupted. I've written code for this kind of thing before > that sets GIT_INDEX_FILE and uses a temporary index file and "git > write-tree". But I realized that if "git gc" runs concurrently, the > generated tree could be deleted before it is used and the tool would > fail. If I had a need to run "git commit-tree", it seems like I might > even end up with a commit object with a broken reference to a tree. > "git gc" normally doesn't delete objects that were created in the last > 2 weeks, but if an identical tree was added to the object database more > than 2 weeks ago by another operation and is unreferenced, it could be > reused without updating its mtime and it could still get deleted. Modern versions of git do two things to help with this: - any object which is referenced by a "recent" object (within the 2 weeks) is also considered recent. So if you create a new commit object that points to a tree, even before you reference the commit that tree is protected - when an object write is optimized out because we already have the object, git will update the mtime on the file (loose object or packfile) to freshen it This isn't perfect, though. You can decide to reference an existing object just as it is being deleted. And the pruning process itself is not atomic (and it's tricky to make it so, just because of what we're promised by the filesystem). > Is there a recommended way to avoid this kind of problem in add-on > tools? (I searched the Git documentation and the web for information > about races with "git gc" and didn't find anything useful.) If not, it > seems to be a significant design flaw in "git gc", even if the problem > is extremely rare in practice. I wonder if some of the built-in > commands may have the same problem, though I haven't tried to test > them. 
> If this is confirmed to be a known problem affecting built-in > commands, then at least I won't feel bad about introducing the > same problem into add-on tools. :/ If you have long-running data (like, a temporary index file that might literally sit around for days or weeks) I think that is a potential problem. And the solution is probably to use refs in some way to point to your objects. If you're worried about a short-term operation where somebody happens to run git-gc concurrently, I agree it's a possible problem, but I suspect something you can ignore in practice. For the most part, a lot of the client-side git tools assume that one operation is happening at a time in the repository. And I think that largely holds for a developer working on a single clone, and things just work in practice. Auto-gc makes that a little sketchier, but historically does not seem to have really caused problems in practice. For a busy multi-user server, I recommend turning off auto-gc entirely, and repacking manually with "-k" to be on the safe side. -Peff
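The add-on workflow under discussion looks roughly like this: a temporary index via GIT_INDEX_FILE plus write-tree, which creates a tree object that no ref protects from a concurrent gc once it ages past the two-week grace period. A sketch in a throwaway repo:

```shell
# Throwaway repo: build a tree through a temporary index.  The resulting
# tree object is unreferenced, which is the gc race the thread discusses.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email you@example.com && git config user.name you
echo data >file && git add file && git commit -qm init
export GIT_INDEX_FILE="$PWD/tmp-index"
git read-tree HEAD                 # seed the temp index from HEAD
echo more >file2 && git update-index --add file2
tree=$(git write-tree)             # unreferenced tree object
unset GIT_INDEX_FILE
type=$(git cat-file -t "$tree")
echo "$type"
```

Per the advice above, pointing a (temporary) ref at $tree, or at a commit wrapping it, would protect it from pruning.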
Re: [PATCH] t0021, t5615: use $PWD instead of $(pwd) in PATH-like shell variables
Hi Junio, On Mon, 14 Nov 2016, Junio C Hamano wrote: > Dscho's mention of 'still times out' may be an indication that > something unspecified on 'pu' is not ready to be merged to 'next', > but blocking all of 'pu' with a blanket statement is not useful, > and that was where my response comes from. Until the time when the test suite takes less than the insane three hours to run, I am afraid that a blanket statement on `pu` is the best I can do. Ciao, Johannes
gitweb html validation
Hi, There is a small bug in gitweb html validation: you need the following patch to pass the w3c check with the searchbox enabled. The problem lies in the input directly embedded inside a form without a wrapper, which is not valid. Best regards The following patch fixes the issue for git-2.10.2 : --- /usr/share/gitweb/gitweb.cgi.orig 2016-11-15 15:37:21.149805026 +0100 +++ /usr/share/gitweb/gitweb.cgi 2016-11-15 15:37:48.579240429 +0100 @@ -5518,6 +5518,7 @@ sub git_project_search_form { print "\n"; print $cgi->start_form(-method => 'get', -action => $my_uri) . + ''. $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" if (defined $project_filter); @@ -5529,6 +5530,7 @@ sub git_project_search_form { -checked => $search_use_regexp) . "\n" . $cgi->submit(-name => 'btnS', -value => 'Search') . + ''. $cgi->end_form() . "\n" . $cgi->a({-href => href(project => undef, searchtext => undef, project_filter => $project_filter)},
RE: [PATCH] remote-curl: don't hang when a server dies before any output
> -Original Message- > From: Jeff King [mailto:p...@peff.net] ... > I'll make that change and then try to wrap this up with a commit message. > I plan to steal your tests, if that's OK. Please do!
Re: New to git, need help!
Hey Mayank, On Tue, Nov 15, 2016 at 6:00 PM, Mayank Gupta wrote: > Hi All, > > I'm new to open source and have recently joined this mailing list. > Since I'm new at this, I think I can initially contribute to the > community by fixing some small bugs or errors but as the documentation > is too large, I don't know where to start. > So if anybody could guide me on how to go about it, it would be really > appreciated. It is really nice that you want to start contributing. We have user documentation[1] and developer documentation[2]. To contribute to a feature, you first need to learn about that feature, so go through the man pages of the tools you want to work on. The things that you would be using are documented in the technical documentation. Apart from this, you can start hunting for small errors, but I think that would be difficult in the initial stages. So read this[3]; even though it is specific to GSoC, it is helpful for any new developer. You can also search the mailing list archives for "low hanging fruit" to find things which were thought of but not done, or something like that. [1]: https://github.com/git/git/tree/master/Documentation [2]: https://github.com/git/git/tree/master/Documentation/technical [3]: https://git.github.io/SoC-2016-Microprojects/ Hope to see a patch from your side soon! :) Regards, Pranit Bauva
Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS
On Tue, Nov 15, 2016 at 01:07:18PM +0100, Heiko Voigt wrote: > On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote: > > To all macOS users on the list: > > Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully? > > Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542, > 5550, 5551, 5561, 5812. Failing how? Does apache fail to start up? Do tests fails? What does "-v" say? Is there anything interesting in httpd/error.log in the trash directory? -Peff
Re: Git status takes too long- How to improve the performance of git
On Tue, Nov 15, 2016 at 02:33:12AM -0700, ravalika wrote: > Number of files - 63883 Since you also posted this to the "Git for Windows" mailing list I assume that you are using Windows. Reduce the number of files. For example, split the repository into two: one for documentation and one for source. That's what I did with a converted repository that had too many files. Windows is unfortunately very slow when it comes to handling many files, and if I recall correctly ~3 files was in a nicely handleable range for a Git repository on Windows, but that might have changed... Cheers Heiko
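Besides splitting the repository, a couple of config knobs are commonly suggested for slow `git status` on Windows. These are illustrative, not a fix endorsed in this thread; core.fscache is honored only by Git for Windows.

```shell
# Throwaway repo; setting commonly suggested status-speed knobs.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config core.preloadindex true   # lstat the index with parallel threads
git config core.fscache true        # Git for Windows filesystem cache
v=$(git config --get core.preloadindex)
echo "$v"
```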
[PATCH v3 3/4] batch check whether submodule needs pushing into one call
We run a command for each sha1 change in a submodule. This is unnecessary since we can simply batch all sha1s we want to check into one command. Let's do it so we can speed up the check when many submodule changes are in need of checking. Signed-off-by: Heiko Voigt --- submodule.c | 63 - 1 file changed, 33 insertions(+), 30 deletions(-) diff --git a/submodule.c b/submodule.c index 769d666..e1196fd 100644 --- a/submodule.c +++ b/submodule.c @@ -507,27 +507,49 @@ static int append_sha1_to_argv(const unsigned char sha1[20], void *data) return 0; } -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20]) +static int check_has_commit(const unsigned char sha1[20], void *data) { - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) + int *has_commit = (int *) data; + + if (!lookup_commit_reference(sha1)) + *has_commit = 0; + + return 0; +} + +static int submodule_has_commits(const char *path, struct sha1_array *commits) +{ + int has_commit = 1; + + if (add_submodule_odb(path)) + return 0; + + sha1_array_for_each_unique(commits, check_has_commit, &has_commit); + return has_commit; +} + +static int submodule_needs_pushing(const char *path, struct sha1_array *commits) +{ + if (!submodule_has_commits(path, commits)) return 0; if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { struct child_process cp = CHILD_PROCESS_INIT; - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL}; struct strbuf buf = STRBUF_INIT; int needs_pushing = 0; - argv[1] = sha1_to_hex(sha1); - cp.argv = argv; + argv_array_push(&cp.args, "rev-list"); + sha1_array_for_each_unique(commits, append_sha1_to_argv, &cp.args); + argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL); + prepare_submodule_repo_env(&cp.env_array); cp.git_cmd = 1; cp.no_stdin = 1; cp.out = -1; cp.dir = path; if (start_command(&cp)) - die("Could not run 'git rev-list %s --not --remotes -n 1' command in submodule %s", - sha1_to_hex(sha1), path); + die("Could not run 'git rev-list <commits> --not --remotes -n 1' command in submodule %s", + path); if (strbuf_read(&buf, cp.out, 41)) needs_pushing = 1; finish_command(&cp); @@ -582,23 +604,6 @@ static void find_unpushed_submodule_commits(struct commit *commit, diff_tree_combined_merge(commit, 1, &rev); } -struct collect_submodule_from_sha1s_data { - char *submodule_path; - struct string_list *needs_pushing; -}; - -static int collect_submodules_from_sha1s(const unsigned char sha1[20], - void *data) -{ - struct collect_submodule_from_sha1s_data *me = - (struct collect_submodule_from_sha1s_data *) data; - - if (submodule_needs_pushing(me->submodule_path, sha1)) - string_list_insert(me->needs_pushing, me->submodule_path); - - return 0; -} - static void free_submodules_sha1s(struct string_list *submodules) { struct string_list_item *item; @@ -635,12 +640,10 @@ int find_unpushed_submodules(struct sha1_array *commits, argv_array_clear(&argv); for_each_string_list_item(submodule, &submodules) { - struct collect_submodule_from_sha1s_data data; - data.submodule_path = submodule->string; - data.needs_pushing = needs_pushing; - sha1_array_for_each_unique((struct sha1_array *) submodule->util, - collect_submodules_from_sha1s, - &data); + struct sha1_array *commits = (struct sha1_array *) submodule->util; + + if (submodule_needs_pushing(submodule->string, commits)) + string_list_insert(needs_pushing, submodule->string); } free_submodules_sha1s(&submodules); -- 2.10.1.386.gc503e45
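The single command this patch assembles can be illustrated directly from the shell: feed every submodule commit to one rev-list invocation; any output means at least one commit is missing from all remote-tracking refs, i.e. the submodule needs pushing. A throwaway repo stands in for the submodule here:

```shell
# Throwaway repo standing in for a submodule with two unpushed commits.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m one && a=$(git rev-parse HEAD)
git commit -q --allow-empty -m two && b=$(git rev-parse HEAD)
# one invocation checks both commits against all remote-tracking refs;
# no remotes exist, so at least one commit is reported as unpushed
out=$(git rev-list "$a" "$b" --not --remotes -n 1)
test -n "$out" && echo needs-pushing
```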
[PATCH v3 2/4] serialize collection of refs that contain submodule changes
We are iterating over each pushed ref and want to check whether it contains changes to submodules. Instead of immediately checking each ref let's first collect them and then do the check for all of them in one revision walk. Signed-off-by: Heiko Voigt --- submodule.c | 35 --- submodule.h | 5 +++-- transport.c | 29 + 3 files changed, 44 insertions(+), 25 deletions(-) diff --git a/submodule.c b/submodule.c index b91585e..769d666 100644 --- a/submodule.c +++ b/submodule.c @@ -500,6 +500,13 @@ static int has_remote(const char *refname, const struct object_id *oid, return 1; } +static int append_sha1_to_argv(const unsigned char sha1[20], void *data) +{ + struct argv_array *argv = (struct argv_array *) data; + argv_array_push(argv, sha1_to_hex(sha1)); + return 0; +} + static int submodule_needs_pushing(const char *path, const unsigned char sha1[20]) { if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) @@ -600,25 +607,24 @@ static void free_submodules_sha1s(struct string_list *submodules) string_list_clear(submodules, 1); } -int find_unpushed_submodules(unsigned char new_sha1[20], +int find_unpushed_submodules(struct sha1_array *commits, const char *remotes_name, struct string_list *needs_pushing) { struct rev_info rev; struct commit *commit; - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; - int argc = ARRAY_SIZE(argv) - 1; - char *sha1_copy; struct string_list submodules = STRING_LIST_INIT_DUP; struct string_list_item *submodule; + struct argv_array argv = ARGV_ARRAY_INIT; - struct strbuf remotes_arg = STRBUF_INIT; - - strbuf_addf(&remotes_arg, "--remotes=%s", remotes_name); init_revisions(&rev, NULL); - sha1_copy = xstrdup(sha1_to_hex(new_sha1)); - argv[1] = sha1_copy; - argv[3] = remotes_arg.buf; - setup_revisions(argc, argv, &rev, NULL); + + /* argv.argv[0] will be ignored by setup_revisions */ + argv_array_push(&argv, "find_unpushed_submodules"); + sha1_array_for_each_unique(commits, append_sha1_to_argv, &argv); + argv_array_push(&argv, "--not"); + argv_array_pushf(&argv, "--remotes=%s", remotes_name); + + setup_revisions(argv.argc, argv.argv, &rev, NULL); if (prepare_revision_walk(&rev)) die("revision walk setup failed"); @@ -626,8 +632,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20], find_unpushed_submodule_commits(commit, &submodules); reset_revision_walk(); - free(sha1_copy); - strbuf_release(&remotes_arg); + argv_array_clear(&argv); for_each_string_list_item(submodule, &submodules) { struct collect_submodule_from_sha1s_data data; @@ -664,12 +669,12 @@ static int push_submodule(const char *path) return 1; } -int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name) +int push_unpushed_submodules(struct sha1_array *commits, const char *remotes_name) { int i, ret = 1; struct string_list needs_pushing = STRING_LIST_INIT_DUP; - if (!find_unpushed_submodules(new_sha1, remotes_name, &needs_pushing)) + if (!find_unpushed_submodules(commits, remotes_name, &needs_pushing)) return 1; for (i = 0; i < needs_pushing.nr; i++) { diff --git a/submodule.h b/submodule.h index d9e197a..9454806 100644 --- a/submodule.h +++ b/submodule.h @@ -3,6 +3,7 @@ struct diff_options; struct argv_array; +struct sha1_array; enum { RECURSE_SUBMODULES_CHECK = -4, @@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path); int ok_to_remove_submodule(const char *path); int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20], const unsigned char a[20], const unsigned char b[20], int search); -int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name, +int find_unpushed_submodules(struct sha1_array *commits, const char *remotes_name, struct string_list *needs_pushing); -int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name); +int push_unpushed_submodules(struct sha1_array *commits, const char *remotes_name); void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir); int parallel_submodules(void); diff --git a/transport.c b/transport.c index d57e8de..f482869 100644 --- a/transport.c +++ b/transport.c @@ -949,23 +949,36 @@ int transport_push(struct transport *transport, if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && !is_bare_repository()) { struct ref *ref = remote_refs; + struct sha1_array commits = SHA1_ARRAY_INIT; + for (; ref; ref = ref->next) - if (!is_null_oid(&ref->new_oid) && -
[PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question
Signed-off-by: Heiko Voigt --- submodule.c | 8 1 file changed, 8 insertions(+) diff --git a/submodule.c b/submodule.c index e1196fd..29efee9 100644 --- a/submodule.c +++ b/submodule.c @@ -531,6 +531,14 @@ static int submodule_has_commits(const char *path, struct sha1_array *commits) static int submodule_needs_pushing(const char *path, struct sha1_array *commits) { if (!submodule_has_commits(path, commits)) + /* NEEDSWORK: The correct answer here is "We do not + * know" instead of "No push needed". We currently + * proceed pushing here as if the submodules commits are + * available on a remote. Since we can not check the + * remote availability for this submodule we should + * consider changing this behavior to: Stop here and + * tell the user how to skip this check if wanted. + */ return 0; if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { -- 2.10.1.386.gc503e45
[PATCH v3 1/4] serialize collection of changed submodules
To check whether a submodule needs to be pushed we need to collect all changed submodules. Let's collect them first and then execute the possibly expensive test whether certain revisions are already pushed only once per submodule. There is further potential for optimization since we can assemble one command and only issue that instead of one call for each remote ref in the submodule. Signed-off-by: Heiko Voigt --- submodule.c | 60 1 file changed, 56 insertions(+), 4 deletions(-) diff --git a/submodule.c b/submodule.c index 6f7d883..b91585e 100644 --- a/submodule.c +++ b/submodule.c @@ -532,19 +532,34 @@ static int submodule_needs_pushing(const char *path, const unsigned char sha1[20 return 0; } +static struct sha1_array *submodule_commits(struct string_list *submodules, + const char *path) +{ + struct string_list_item *item; + + item = string_list_insert(submodules, path); + if (item->util) + return (struct sha1_array *) item->util; + + /* NEEDSWORK: should we have sha1_array_init()? */ + item->util = xcalloc(1, sizeof(struct sha1_array)); + return (struct sha1_array *) item->util; +} + static void collect_submodules_from_diff(struct diff_queue_struct *q, struct diff_options *options, void *data) { int i; - struct string_list *needs_pushing = data; + struct string_list *submodules = data; for (i = 0; i < q->nr; i++) { struct diff_filepair *p = q->queue[i]; + struct sha1_array *commits; if (!S_ISGITLINK(p->two->mode)) continue; - if (submodule_needs_pushing(p->two->path, p->two->oid.hash)) - string_list_insert(needs_pushing, p->two->path); + commits = submodule_commits(submodules, p->two->path); + sha1_array_append(commits, p->two->oid.hash); } } @@ -560,6 +575,31 @@ static void find_unpushed_submodule_commits(struct commit *commit, diff_tree_combined_merge(commit, 1, &rev); } +struct collect_submodule_from_sha1s_data { + char *submodule_path; + struct string_list *needs_pushing; +}; + +static int collect_submodules_from_sha1s(const unsigned char sha1[20], + void *data) +{ + struct collect_submodule_from_sha1s_data *me = + (struct collect_submodule_from_sha1s_data *) data; + + if (submodule_needs_pushing(me->submodule_path, sha1)) + string_list_insert(me->needs_pushing, me->submodule_path); + + return 0; +} + +static void free_submodules_sha1s(struct string_list *submodules) +{ + struct string_list_item *item; + for_each_string_list_item(item, submodules) + sha1_array_clear((struct sha1_array *) item->util); + string_list_clear(submodules, 1); +} + int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name, struct string_list *needs_pushing) { @@ -568,6 +608,8 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; int argc = ARRAY_SIZE(argv) - 1; char *sha1_copy; + struct string_list submodules = STRING_LIST_INIT_DUP; + struct string_list_item *submodule; struct strbuf remotes_arg = STRBUF_INIT; @@ -581,12 +623,22 @@ int find_unpushed_submodules(unsigned char new_sha1[20], die("revision walk setup failed"); while ((commit = get_revision(&rev)) != NULL) - find_unpushed_submodule_commits(commit, needs_pushing); + find_unpushed_submodule_commits(commit, &submodules); reset_revision_walk(); free(sha1_copy); strbuf_release(&remotes_arg); + for_each_string_list_item(submodule, &submodules) { + struct collect_submodule_from_sha1s_data data; + data.submodule_path = submodule->string; + data.needs_pushing = needs_pushing; + sha1_array_for_each_unique((struct sha1_array *) submodule->util, + collect_submodules_from_sha1s, + &data); + } + free_submodules_sha1s(&submodules); + return needs_pushing->nr; } -- 2.10.1.386.gc503e45
[PATCH v3 0/4] Speedup finding of unpushed submodules
You can find the second iteration of this series here:

  http://public-inbox.org/git/cover.1475851621.git.hvo...@hvoigt.net/

All mentioned issues should be fixed. I put the NEEDSWORK comment in a
separate patch since it seemed to me as if we did not fully agree on
that. So in case we decide against it we can just drop that patch.

Cheers
Heiko

Heiko Voigt (4):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call
  submodule_needs_pushing() NEEDSWORK when we can not answer this question

 submodule.c | 120 +++-
 submodule.h |   5 ++-
 transport.c |  29 +++
 3 files changed, 118 insertions(+), 36 deletions(-)

-- 
2.10.1.386.gc503e45
Re: RFC: Enable delayed responses to Git clean/smudge filter requests
> On 15 Nov 2016, at 02:03, Eric Wong wrote:
> 
> Lars Schneider wrote:
>> Hi,
>>
>> Git always performs a clean/smudge filter on files in sequential order.
>> Sometimes a filter operation can take a noticeable amount of time.
>> This blocks the entire Git process.
> 
> I have the same problem in many places which aren't git :>
> 
>> I would like to give a filter process the possibility to answer Git with
>> "I got your request, I am processing it, ask me for the result later!".
>>
>> I see the following way to realize this:
>>
>> In unpack-trees.c:check_updates() [1] we loop through the cache
>> entries and "ask me later" could be an acceptable return value of the
>> checkout_entry() call. The loop could run until all entries returned
>> success or error.
>>
>> The filter machinery is triggered in various other places in Git and
>> all places that want to support "ask me later" would need to be patched
>> accordingly.
> 
> That all sounds reasonable.
> 
> The filter itself would need to be aware of parallelism
> if it lives for multiple objects, right?

Correct. This way Git doesn't need to deal with threading...

>> Do you think this could be a viable approach?
> 
> It'd probably require a bit of work, but yes, I think it's viable.
> 
> We already do this with curl_multi requests for parallel
> fetching from dumb HTTP servers, but that's driven by curl
> internals operating with a select/poll loop.
> 
> Perhaps the curl API could be a good example for doing this.

Thanks for the pointer!

>> Do you see a better way?
> 
> Nope, I prefer non-blocking state machines to threads for
> debuggability and determinism.

Agreed!

> Anyways, I'll plan on doing something similar (in Perl) with the
> synchronous parts of public-inbox which relies on "cat-file --batch"
> at some point... (my rotational disks are slow :<)

That sounds interesting! What changes do you have in mind for
"cat-file --batch"? We are thinking about performance improvements in
that area, too. I would be happy to help review your patches!

Thanks a lot for your RFC feedback,
Lars
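The "ask me later" loop proposed for check_updates() can be sketched in shell. Everything here is illustrative: fake_filter stands in for a long-running clean/smudge filter process and pretends that entry "b" needs one extra round-trip; entries answering "later" are simply re-queued for the next pass. A real implementation would poll/select on the filter instead of spinning.

```shell
# Illustrative only: fake_filter mimics a slow filter; it answers
# "later" the first time it is asked about entry "b", "done" otherwise.
state=$(mktemp -d)
fake_filter() {
    if [ "$1" = b ] && [ ! -e "$state/b-seen" ]; then
        touch "$state/b-seen"
        echo later
    else
        echo done
    fi
}

queue="a b c"
while [ -n "$queue" ]; do
    next=""
    for entry in $queue; do
        case $(fake_filter "$entry") in
            later) next="$next $entry" ;;  # ask again on the next pass
            done)  echo "checked out $entry" ;;
        esac
    done
    queue=$next
done
```

The loop drains after two passes: a and c complete immediately, b completes on the second round.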
Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS
> On 15 Nov 2016, at 13:07, Heiko Voigt wrote:
> 
> On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote:
>> To all macOS users on the list:
>> Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully?
> 
> Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542,
> 5550, 5551, 5561, 5812.

That's what I see, too. Apache needs to be configured in some special way
to make them work and I was wondering if anyone has figured that out
already for macOS...

However, I much prefer Peff's idea to test against real world servers:
http://public-inbox.org/git/2016092824.qqgrmhtkuw3wp...@sigill.intra.peff.net/

- Lars
Protecting old temporary objects being reused from concurrent "git gc"?
The Braid subproject management tool stores the subproject content in the
main tree and is able to switch to a different upstream revision of a
subproject by doing the equivalent of "git read-tree -m" on the
superproject tree and the two upstream trees.

The tricky part is preparing temporary trees with the upstream content
moved to the path configured for the superproject. The usual method is
"git read-tree --prefix", but using what index file? Braid currently uses
the user's actual worktree, which can leave a mess if it gets interrupted:

https://github.com/cristibalan/braid/blob/7d81da6e86e24de62a74f3ab8d880666cb343b04/lib/braid/commands/update.rb#L98

I want to change this to something that won't leave an inconsistent state
if interrupted. I've written code for this kind of thing before that sets
GIT_INDEX_FILE and uses a temporary index file and "git write-tree". But I
realized that if "git gc" runs concurrently, the generated tree could be
deleted before it is used and the tool would fail. If I had a need to run
"git commit-tree", it seems like I might even end up with a commit object
with a broken reference to a tree.

"git gc" normally doesn't delete objects that were created in the last 2
weeks, but if an identical tree was added to the object database more than
2 weeks ago by another operation and is unreferenced, it could be reused
without updating its mtime and it could still get deleted.

Is there a recommended way to avoid this kind of problem in add-on tools?
(I searched the Git documentation and the web for information about races
with "git gc" and didn't find anything useful.) If not, it seems to be a
significant design flaw in "git gc", even if the problem is extremely rare
in practice. I wonder if some of the built-in commands may have the same
problem, though I haven't tried to test them. If this is confirmed to be a
known problem affecting built-in commands, then at least I won't feel bad
about introducing the same problem into add-on tools.
:/ Thanks, Matt
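The temporary-index approach described above can be sketched as follows. The vendor/lib prefix and the throwaway demo repository are purely illustrative; the tree written at the end is exactly the kind of loose object the gc race in question could reap.

```shell
# Sketch of the GIT_INDEX_FILE + write-tree approach (names invented;
# the demo builds its own throwaway repository).
set -e
cd "$(mktemp -d)"
git init -q
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m base

# Fabricate an "upstream" tree to graft under vendor/lib/.
echo upstream-content >lib.txt
git add lib.txt
upstream_tree=$(git write-tree)
git rm -q --cached lib.txt            # leave the real index clean again

# Build the combined tree in a private index, so an interruption
# leaves the user's index and worktree untouched.
export GIT_INDEX_FILE="$PWD/.git/tmp-braid-index"
git read-tree HEAD
git read-tree --prefix=vendor/lib/ "$upstream_tree"
new_tree=$(git write-tree)            # the object "git gc" might delete
unset GIT_INDEX_FILE
git ls-tree -r --name-only "$new_tree"   # prints vendor/lib/lib.txt
```

This avoids the worktree mess on interruption, but as noted it does nothing about the mtime-reuse race with a concurrent gc.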
Re: git shortlog vs. stdin
Andreas Krey writes:

> Bug or feature?

Documented feature, but you're holding it wrong ;)

    If no revisions are passed on the command line and either standard
    input is not a terminal or there is no current branch, git shortlog
    will output a summary of the log read from standard input, without
    reference to the current repository.

(Note that you can use shortlog a la "git log --pretty=short | git shortlog")

-- 
Christian Neukirchen  http://chneukirchen.org
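The documented pipe form is easy to verify in a throwaway repository (the repo, author name, and commit messages below are invented for the demo):

```shell
cd "$(mktemp -d)"
git init -q
git -c user.name=Alice -c user.email=alice@example.com \
    commit -q --allow-empty -m one
git -c user.name=Alice -c user.email=alice@example.com \
    commit -q --allow-empty -m two

# With stdin not a terminal, shortlog summarizes the piped log instead
# of walking the current branch: one line, "2  Alice".
git log --pretty=short | git shortlog -sn
```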
git shortlog vs. stdin
Hi all,

I observed a strange and unexpected behaviour in 'git shortlog'.
When in git.git:

$ git shortlog -sn | wc
14414493 31477

but with input redirected:

$ git shortlog -sn
Date: Fri, 22 Jan 2010 07:29:21 -0800
New to git, need help!
Hi All,

I'm new to open source and have recently joined this mailing list. Since
I'm new at this, I think I can initially contribute to the community by
fixing some small bugs or errors, but as the documentation is too large,
I don't know where to start. So if anybody could guide me on how to go
about it, it would be really appreciated.

Thanks,
Mayank
Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS
On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote:
> To all macOS users on the list:
> Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully?

Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542,
5550, 5551, 5561, 5812.

Cheers
Heiko
Re: Git status takes too long- How to improve the performance of git
On Tue, Nov 15, 2016 at 11:24 AM, Fredrik Gustafsson wrote:
> On Tue, Nov 15, 2016 at 02:33:12AM -0700, ravalika wrote:
[...]
>> And I have experimented the following ways
>> - Setting core.ignorestat to true
>> - Git gc clean
>> - Shallow clone – Reducing number of commits
>> - Clone only one branch
>> - Git repacking - git repack -ad && git prune
>> - Cold/warm cache
>>
>> Could you please let me know, what are the ways to improve the git
>> performance ?
>> I have gone through the mailing lists.
>
> You could always check the --assume-unchanged bit, see the manual page
> for git update-index. However this is quite extreme and demanding for
> the user.

If you install a recent version, you may be able to use the untracked
cache feature. (See "core.untrackedCache" in the git config documentation
and --untracked-cache in the git update-index documentation.)
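A minimal sketch of turning the untracked cache on (requires git 2.8 or later for the core.untrackedCache knob; the throwaway repository is just for illustration):

```shell
cd "$(mktemp -d)"
git init -q
git config core.untrackedCache true   # let git status maintain the cache
git status --porcelain >/dev/null     # first run populates the cache
# later runs of "git status" can skip re-scanning unchanged directories
```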
Re: Git status takes too long- How to improve the performance of git
Hi,

On Tue, Nov 15, 2016 at 02:33:12AM -0700, ravalika wrote:
> We are using git-1.8.2 version for version control.

That's a three (almost four) year old version of git. Your first test
should be to see if an upgrade to a recent version will improve things.

> It is a centralized server and git status takes too long

A centralized server? How? git is designed to be run locally. If you're
running git on a network file system, the performance will suffer. Could
you elaborate on how your environment is setup?

> How to improve the performance of git status
>
> Git repo details:
>
> Size of the .git folder is 8.9MB
> Number of commits approx 53838 (git rev-list HEAD --count)
> Number of branches - 330
> Number of files - 63883
> Working tree clone size is 4.3GB

.git folder of 8.9 MEGABYTE and working tree of 4.3 GIGABYTE? Is this a
typo?

> time git status shows
> real 0m23.673s
> user 0m9.432s
> sys  0m3.793s
>
> then after 5 mins
> real 0m4.864s
> user 0m1.417s
> sys  0m4.710s

A slow disc and empty caches are slow. Two ways of improving this is to
have faster discs or make sure your cache is up to date. When I had a
really slow disc, I had my shell run a git status in the background to
load the cache every time I started working on a project. This is however
an ugly hack that wasn't approved to be a part of git.

> And I have experimented the following ways
> - Setting core.ignorestat to true
> - Git gc clean
> - Shallow clone – Reducing number of commits
> - Clone only one branch
> - Git repacking - git repack -ad && git prune
> - Cold/warm cache
>
> Could you please let me know, what are the ways to improve the git
> performance ?
> I have gone through the mailing lists.

You could always check the --assume-unchanged bit, see the manual page
for git update-index. However this is quite extreme and demanding for
the user.

-- 
Fredrik Gustafsson
phone: +46 733-608274
e-mail: iv...@iveqy.com
website: http://www.iveqy.com
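The --assume-unchanged bit mentioned above can be demonstrated in a throwaway repository (the repo and filename are invented for the demo). Once the bit is set, git stops stat-ing the file, so later modifications go unnoticed, which is exactly why it is demanding for the user:

```shell
set -e
cd "$(mktemp -d)"
git init -q
echo v1 >generated.bin
git add generated.bin
git -c user.name=t -c user.email=t@example.com commit -q -m add

git update-index --assume-unchanged generated.bin
echo v2 >generated.bin                # change it behind git's back
git status --porcelain                # prints nothing: the stat is skipped
```

`git ls-files -v` marks such paths with a lowercase "h"; clear the bit again with `git update-index --no-assume-unchanged generated.bin`.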
Git status takes too long- How to improve the performance of git
Hi All,

We are using git-1.8.2 version for version control. It is a centralized
server and git status takes too long.

How to improve the performance of git status?

Git repo details:

Size of the .git folder is 8.9MB
Number of commits approx 53838 (git rev-list HEAD --count)
Number of branches - 330
Number of files - 63883
Working tree clone size is 4.3GB

time git status shows

real    0m23.673s
user    0m9.432s
sys     0m3.793s

then after 5 mins

real    0m4.864s
user    0m1.417s
sys     0m4.710s

And I have experimented the following ways:

- Setting core.ignorestat to true
- Git gc clean
- Shallow clone – Reducing number of commits
- Clone only one branch
- Git repacking - git repack -ad && git prune
- Cold/warm cache

Could you please let me know, what are the ways to improve the git
performance? I have gone through the mailing lists.

Thank you,
Renuka

--
View this message in context: http://git.661346.n2.nabble.com/Git-status-takes-too-long-How-to-improve-the-performance-of-git-tp7657456.html
Sent from the git mailing list archive at Nabble.com.