Re: [PATCH v7 13/17] ref-filter: add `:dir` and `:base` options for ref printing atoms
On Tue, Nov 15, 2016 at 11:12 PM, Junio C Hamano wrote:
> Jacob Keller writes:
>
>> dirname makes sense. What about implementing a reverse variant of
>> strip, which you could perform stripping of right-most components and
>> instead of stripping by a number, strip "to" a number, ie: keep the
>> left N most components, and then you could use something like
>> ...
>> I think that would be more general purpose than basename, and less
>> confusing?
>
> I think you are going in the right direction. I had a similar
> thought but built around a different axis. I.e. if strip=1 strips
> one from the left, perhaps we want to have rstrip=1 that strips one
> from the right, and also strip=-1 to mean strip everything except
> one from the left and so on? I think this and your keep (and
> perhaps you'll have rkeep for completeness) have the same expressive
> power. I do not offhand have a preference for one over the other.

If we do implement strip with negative numbers, it definitely would be
neat, but to get the desired feature which I've mentioned below, we'd
need to call strip twice, i.e. to get remotes from /refs/foo/abc/xyz
we'd need to do strip=1,strip=-1. That could be done, but it would need
a lot of tweaking, since we'd have the same subatom/option carrying
multiple values. On the other hand, it would perhaps be easier to just
introduce rstrip, where we strip from the right, and to ensure that
strip and rstrip can be used together.

On Wed, Nov 16, 2016 at 2:49 AM, Jacob Keller wrote:
> On November 15, 2016 9:42:03 AM PST, Junio C Hamano wrote:
>> Somehow it sounds a bit strange to me to treat 'remotes' as the same
>> class of token as 'heads' and 'tags' (I'd expect 'heads' and
>> 'remotes/origin' would be at the same level in end-user's mind), but
>> that is probably an unrelated tangent. The reason this series wants
>> to introduce :base must be to emulate an existing feature, so that
>> existing feature is a concrete counter-example that argues against
>> my "it sounds a bit strange" reaction.
>
> It may be a bit strange indeed. What is the requirement for this?
>
> I think implementing a strip and rstrip (if necessary) with negative
> numbers would be most ideal.

The necessity is that we need to do different formatting depending on
the ref type in "branch -l". If you look at the last-but-one patch, we
do

    strbuf_addf(, "%%(if:notequals=remotes)%%(refname:base)%%(then)%s%%(else)%s%%(end)",
                local.buf, remote.buf);

where it's using ':base' to check for the ref type and do conditional
printing along with the %(if)...%(end) atoms.

--
Regards,
Karthik Nayak
Re: Bug with disabling compression and 'binary' files.
Douglas Cox writes:

>> This may or may not be related to the symptom
>> you are observing (if it is, then you would see a packfile created
>> in objects/pack/, not in loose objects in object/??/ directories).
>
> No, the file is loose (it's in .git/objects/eb in this case). This is
> seen immediately after the add, though I believe it's the same way
> when doing a commit on a changed file.

Then I do not have a guess as to where the symptom you are seeing is
coming from.
[PATCH v2] compression: unify pack.compression configuration parsing
There are three codepaths that use a variable whose name is
pack_compression_level to affect how objects and deltas sent to a
packfile are compressed. Unlike zlib_compression_level, which controls
the loose object compression, however, this variable was static to each
of these codepaths. Two of them read the pack.compression configuration
variable, using core.compression as the default, and one of them also
allowed overriding it from the command line. The other codepath, in
bulk-checkin, did not pay any attention to the configuration.

Unify the configuration parsing to git_default_config(), where we
implement the parsing of core.loosecompression and core.compression and
make the former override the latter, by moving the code to parse
pack.compression there and also allowing core.compression to give the
default to this variable.

Signed-off-by: Junio C Hamano
---

 * Now comes with new tests to cover pack-objects and fast-import,
   which should not fail with or without the code change, because they
   are only here to prevent regressions with this change.

   On the code side, the *_compression_seen variables are also made
   private to config.c, as these variables are used only by the config
   reader to implement the "core.compression gives the default to
   *.compression" logic, and the code is consolidated in this single
   file.
 builtin/pack-objects.c              | 14
 bulk-checkin.c                      |  2 --
 cache.h                             |  2 +-
 config.c                            | 16 +
 environment.c                       |  2 +-
 fast-import.c                       | 13 ---
 t/t1050-large.sh                    | 29
 t/t5315-pack-objects-compression.sh | 44
 t/t9303-fast-import-compression.sh  | 67 +
 9 files changed, 158 insertions(+), 31 deletions(-)
 create mode 100755 t/t5315-pack-objects-compression.sh
 create mode 100755 t/t9303-fast-import-compression.sh

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0fd52bd6b4..8841f8b366 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -61,8 +61,6 @@ static int delta_search_threads;
 static int pack_to_stdout;
 static int num_preferred_base;
 static struct progress *progress_state;
-static int pack_compression_level = Z_DEFAULT_COMPRESSION;
-static int pack_compression_seen;

 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
@@ -2368,16 +2366,6 @@ static int git_pack_config(const char *k, const char *v, void *cb)
                depth = git_config_int(k, v);
                return 0;
        }
-       if (!strcmp(k, "pack.compression")) {
-               int level = git_config_int(k, v);
-               if (level == -1)
-                       level = Z_DEFAULT_COMPRESSION;
-               else if (level < 0 || level > Z_BEST_COMPRESSION)
-                       die("bad pack compression level %d", level);
-               pack_compression_level = level;
-               pack_compression_seen = 1;
-               return 0;
-       }
        if (!strcmp(k, "pack.deltacachesize")) {
                max_delta_cache_size = git_config_int(k, v);
                return 0;
@@ -2869,8 +2857,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)

        reset_pack_idx_option(&pack_idx_opts);
        git_config(git_pack_config, NULL);
-       if (!pack_compression_seen && core_compression_seen)
-               pack_compression_level = core_compression_level;

        progress = isatty(2);
        argc = parse_options(argc, argv, prefix, pack_objects_options,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 4347f5c76a..991b4a13e2 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -7,8 +7,6 @@
 #include "pack.h"
 #include "strbuf.h"

-static int pack_compression_level = Z_DEFAULT_COMPRESSION;
-
 static struct bulk_checkin_state {
        unsigned plugged:1;
diff --git a/cache.h b/cache.h
index a50a61a197..b14c0e8e38 100644
--- a/cache.h
+++ b/cache.h
@@ -670,7 +670,7 @@ extern const char *git_attributes_file;
 extern const char *git_hooks_path;
 extern int zlib_compression_level;
 extern int core_compression_level;
-extern int core_compression_seen;
+extern int pack_compression_level;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
diff --git a/config.c b/config.c
index 83fdecb1bc..2f1aef742e 100644
--- a/config.c
+++ b/config.c
@@ -66,6 +66,8 @@ static struct key_value_info *current_config_kvi;
  */
 static enum config_scope current_parsing_scope;

+static int core_compression_seen;
+static int pack_compression_seen;
 static int zlib_compression_seen;

 /*
@@ -865,6 +867,8 @@ static int git_default_core_config(const char *var, const char *value)
                core_compression_seen = 1;
                if (!zlib_compression_seen)
                        zlib_compression_level = level;
+               if
Re: Bug with disabling compression and 'binary' files.
> This may or may not be related to the symptom
> you are observing (if it is, then you would see a packfile created
> in objects/pack/, not in loose objects in object/??/ directories).

No, the file is loose (it's in .git/objects/eb in this case). This is
seen immediately after the add, though I believe it's the same way
when doing a commit on a changed file.

On Tue, Nov 15, 2016 at 8:42 PM, Junio C Hamano wrote:
> Douglas Cox writes:
>
>> I narrowed this down to the '-text' attribute that is set when
>> specifying 'binary'. For some reason this flag is cancelling out the
>> core.compression = 0 setting and I think this is a bug?
>>
>> Unfortunately core.compression = 0 is also global. Ideally it would be
>> great if there was a separate 'compression' attribute that could be
>> specified in .gitattributes per wildcard, similar to how -delta can be
>> used. This way we would still be able to get compression for
>> text/source files, while still getting the speed of skipping
>> compression for binary files that do not compress well.
>>
>> Has there been any discussion on having an attribute similar to this?
>
> Nope.
>
> I do not offhand think of a way for the '-text' attribute (or any
> attribute, for that matter) to interfere with the compression level,
> but while reading the various parts of the system that futz with the
> compression level configuration, I noticed one thing. When we do an
> initial "bulk-checkin" optimization that sends all objects to a
> single packfile upon "git add", the packfile creation uses its own
> compression level that is not affected by any configuration or
> command line option. This may or may not be related to the symptom
> you are observing (if it is, then you would see a packfile created
> in objects/pack/, not in loose objects in object/??/ directories).
[PATCH] compression: unify pack.compression configuration parsing
There are three codepaths that use a variable whose name is
pack_compression_level to affect how objects and deltas sent to a
packfile are compressed. Unlike zlib_compression_level, which controls
the loose object compression, however, this variable was static to each
of these codepaths. Two of them read the pack.compression configuration
variable, using core.compression as the default, and one of them also
allowed overriding it from the command line. The other codepath, in
bulk-checkin, did not pay any attention to the configuration.

Unify the configuration parsing to git_default_config(), where we
implement the parsing of core.loosecompression and core.compression and
make the former override the latter, by moving the code to parse
pack.compression there and also allowing core.compression to give the
default to this variable.

Signed-off-by: Junio C Hamano
---

 * I was primarily interested in teaching the configuration to the
   bulk-checkin codepath, and added a test that succeeds only with the
   code change.

   The handling of configuration in the other two codepaths,
   pack-objects and fast-import, may be totally broken with this change
   if there is no existing test. A similar set of tests for them would
   be very much welcomed, as without them this change won't be 'next'
   worthy yet.
 builtin/pack-objects.c | 14 --
 bulk-checkin.c         |  2 --
 cache.h                |  2 ++
 config.c               | 14 ++
 environment.c          |  2 ++
 fast-import.c          | 13 -
 t/t1050-large.sh       | 30 ++
 7 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0fd52bd6b4..8841f8b366 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -61,8 +61,6 @@ static int delta_search_threads;
 static int pack_to_stdout;
 static int num_preferred_base;
 static struct progress *progress_state;
-static int pack_compression_level = Z_DEFAULT_COMPRESSION;
-static int pack_compression_seen;

 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
@@ -2368,16 +2366,6 @@ static int git_pack_config(const char *k, const char *v, void *cb)
                depth = git_config_int(k, v);
                return 0;
        }
-       if (!strcmp(k, "pack.compression")) {
-               int level = git_config_int(k, v);
-               if (level == -1)
-                       level = Z_DEFAULT_COMPRESSION;
-               else if (level < 0 || level > Z_BEST_COMPRESSION)
-                       die("bad pack compression level %d", level);
-               pack_compression_level = level;
-               pack_compression_seen = 1;
-               return 0;
-       }
        if (!strcmp(k, "pack.deltacachesize")) {
                max_delta_cache_size = git_config_int(k, v);
                return 0;
@@ -2869,8 +2857,6 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)

        reset_pack_idx_option(&pack_idx_opts);
        git_config(git_pack_config, NULL);
-       if (!pack_compression_seen && core_compression_seen)
-               pack_compression_level = core_compression_level;

        progress = isatty(2);
        argc = parse_options(argc, argv, prefix, pack_objects_options,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 4347f5c76a..991b4a13e2 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -7,8 +7,6 @@
 #include "pack.h"
 #include "strbuf.h"

-static int pack_compression_level = Z_DEFAULT_COMPRESSION;
-
 static struct bulk_checkin_state {
        unsigned plugged:1;
diff --git a/cache.h b/cache.h
index a50a61a197..a5d689f9d3 100644
--- a/cache.h
+++ b/cache.h
@@ -671,6 +671,8 @@ extern const char *git_hooks_path;
 extern int zlib_compression_level;
 extern int core_compression_level;
 extern int core_compression_seen;
+extern int pack_compression_level;
+extern int pack_compression_seen;
 extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
diff --git a/config.c b/config.c
index 83fdecb1bc..0477386f27 100644
--- a/config.c
+++ b/config.c
@@ -865,6 +865,8 @@ static int git_default_core_config(const char *var, const char *value)
                core_compression_seen = 1;
                if (!zlib_compression_seen)
                        zlib_compression_level = level;
+               if (!pack_compression_seen)
+                       pack_compression_level = level;
                return 0;
        }

@@ -1125,6 +1127,18 @@ int git_default_config(const char *var, const char *value, void *dummy)
                pack_size_limit_cfg = git_config_ulong(var, value);
                return 0;
        }
+
+       if (!strcmp(var, "pack.compression")) {
+               int level = git_config_int(var, value);
+               if (level == -1)
+                       level = Z_DEFAULT_COMPRESSION;
+               else if (level < 0 || level > Z_BEST_COMPRESSION)
+
Re: Bug with disabling compression and 'binary' files.
Douglas Cox writes:

> I narrowed this down to the '-text' attribute that is set when
> specifying 'binary'. For some reason this flag is cancelling out the
> core.compression = 0 setting and I think this is a bug?
>
> Unfortunately core.compression = 0 is also global. Ideally it would be
> great if there was a separate 'compression' attribute that could be
> specified in .gitattributes per wildcard, similar to how -delta can be
> used. This way we would still be able to get compression for
> text/source files, while still getting the speed of skipping
> compression for binary files that do not compress well.
>
> Has there been any discussion on having an attribute similar to this?

Nope.

I do not offhand think of a way for the '-text' attribute (or any
attribute, for that matter) to interfere with the compression level,
but while reading the various parts of the system that futz with the
compression level configuration, I noticed one thing. When we do an
initial "bulk-checkin" optimization that sends all objects to a
single packfile upon "git add", the packfile creation uses its own
compression level that is not affected by any configuration or
command line option. This may or may not be related to the symptom
you are observing (if it is, then you would see a packfile created
in objects/pack/, not in loose objects in object/??/ directories).
Re: [PATCH v7 13/17] ref-filter: add `:dir` and `:base` options for ref printing atoms
On November 15, 2016 9:42:03 AM PST, Junio C Hamano wrote:
> I think you are going in the right direction. I had a similar
> thought but built around a different axis. I.e. if strip=1 strips
> one from the left, perhaps we want to have rstrip=1 that strips one
> from the right, and also strip=-1 to mean strip everything except
> one from the left and so on? I think this and your keep (and
> perhaps you'll have rkeep for completeness) have the same expressive
> power. I do not offhand have a preference for one over the other.

I prefer strip implemented with negative numbers. That is simple and
expressive, and if we need to, we can implement rstrip as well.

> Somehow it sounds a bit strange to me to treat 'remotes' as the same
> class of token as 'heads' and 'tags' (I'd expect 'heads' and
> 'remotes/origin' would be at the same level in end-user's mind), but
> that is probably an unrelated tangent. The reason this series wants
> to introduce :base must be to emulate an existing feature, so that
> existing feature is a concrete counter-example that argues against
> my "it sounds a bit strange" reaction.

It may be a bit strange indeed. What is the requirement for this?

I think implementing strip (and rstrip, if necessary) with negative
numbers would be most ideal.

Thanks,
Jake
Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on objects
On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams wrote:
> to:
> HEAD:file
> HEAD:sub/file

Maybe indent this ;)

>  static struct argv_array submodule_options = ARGV_ARRAY_INIT;
> +static const char *parent_basename;
>
>  static int grep_submodule_launch(struct grep_opt *opt,
>                                  const struct grep_source *gs);
> @@ -535,19 +537,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
>  {
>         struct child_process cp = CHILD_PROCESS_INIT;
>         int status, i;
> +       const char *end_of_base;
> +       const char *name;
>         struct work_item *w = opt->output_priv;
>
> +       end_of_base = strchr(gs->name, ':');
> +       if (end_of_base)
> +               name = end_of_base + 1;
> +       else
> +               name = gs->name;
> +
>         prepare_submodule_repo_env(&cp.env_array);
>
>         /* Add super prefix */
>         argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
>                          super_prefix ? super_prefix : "",
> -                        gs->name);
> +                        name);
>         argv_array_push(&cp.args, "grep");
>
> +       /*
> +        * Add basename of parent project
> +        * When performing grep on a object the filename is prefixed
> +        * with the object's name: ':filename'.

This comment is hard to read, as it's unclear what they mean. (Are
they supposed to indicate a variable? If so, why is the file name not
marked up?)

> In order to
> +        * provide uniformity of output we want to pass the name of the
> +        * parent project's object name to the submodule so the submodule can
> +        * prefix its output with the parent's name and not its own SHA1.
> +        */
> +       if (end_of_base)
> +               argv_array_pushf(&cp.args, "--parent-basename=%.*s",
> +                                (int) (end_of_base - gs->name),
> +                                gs->name);

Do we pass this only with the tree-ish? What if we are grepping the
working tree and the file name contains a colon?
> +test_expect_success 'grep tree HEAD^' '
> +	cat >expect <<-\EOF &&
> +	HEAD^:a:foobar
> +	HEAD^:b/b:bar
> +	HEAD^:submodule/a:foobar
> +	EOF
> +
> +	git grep -e "bar" --recurse-submodules HEAD^ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'grep tree HEAD^^' '
> +	cat >expect <<-\EOF &&
> +	HEAD^^:a:foobar
> +	HEAD^^:b/b:bar
> +	EOF
> +
> +	git grep -e "bar" --recurse-submodules HEAD^^ >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'grep tree and pathspecs' '
> +	cat >expect <<-\EOF &&
> +	HEAD:submodule/a:foobar
> +	HEAD:submodule/sub/a:foobar
> +	EOF
> +
> +	git grep -e "bar" --recurse-submodules HEAD -- submodule >actual &&
> +	test_cmp expect actual
> +'

Mind adding tests for

 * recursive submodules (say 2 levels), preferably not having the
   gitlink at the root of each, i.e. root has a sub1 at path subs/sub1
   and sub1 has a sub2 at path subs/sub2, such that recursing would
   produce a path like HEAD:subs/sub1/subs/sub2/dir/file?

 * file names with a colon in them

 * instead of just HEAD referencing trees, maybe a sha1-referenced
   test as well (though it is not immediately clear what the benefit
   would be)

 * what if the submodule doesn't have the commit referenced in the
   given sha1?

Thanks,
Stefan
Re: [PATCH 14/16] checkout: recurse into submodules if asked to
On 11/15, Stefan Beller wrote:
> +int option_parse_recurse_submodules(const struct option *opt,
> +                                   const char *arg, int unset)
> +{
> +       if (unset) {
> +               recurse_submodules = RECURSE_SUBMODULES_OFF;
> +               return 0;
> +       }
> +       if (arg)
> +               recurse_submodules =
> +                       parse_update_recurse_submodules_arg(opt->long_name,
> +                                                           arg);
> +       else
> +               recurse_submodules = RECURSE_SUBMODULES_ON;
> +
> +       return 0;
> +}

I assume it is ok to always return 0 from this function?

Also, I know we don't like having braces on single-statement ifs, but
it is a bit hard to read that multi-line statement without braces.
But that's just my opinion :)

--
Brandon Williams
Re: [PATCH 13/16] submodule: teach unpack_trees() to update submodules
On 11/15, Stefan Beller wrote:
> +               if (!S_ISGITLINK(ce->ce_mode)) {
> +                       int flags = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE;

For readability you may want to have spaces between the two flags.

> +       if (o->index_only
> +           || (!((old->ce_flags & CE_VALID) || ce_skip_worktree(old))
> +               && (o->reset || ce_uptodate(old))))
> +               return 0;

The coding guidelines say that git prefers to have the logical
operators on the right side, like this:

	if (o->index_only ||
	    (!((old->ce_flags & CE_VALID) || ce_skip_worktree(old)) &&
	     (o->reset || ce_uptodate(old))))
		return 0;

It also says the other way is ok too; just a thought :)

> +                       if (submodule_is_interesting(old->name, null_sha1)
> +                           && is_submodule_checkout_safe(new->name, &new->oid))
> +                               return 0;

here too.

> +               } else {
> +                       /*
> +                        * new is not a submodule any more, so only
> +                        * care if we care:
> +                        */
> +                       if (submodule_is_interesting(old->name, null_sha1)
> +                           && ok_to_remove_submodule(old->name))
> +                               return 0;

and here

--
Brandon Williams
RE: [PATCH 13/16] submodule: teach unpack_trees() to update submodules
[I've reviewed up to and including 13; I'll look at 14-16 tomorrow-ish]

> -----Original Message-----
> From: Stefan Beller [mailto:sbel...@google.com]
> Sent: Tuesday, November 15, 2016 6:07 PM
> Cc: git@vger.kernel.org; bmw...@google.com; gits...@pobox.com;
> jrnie...@gmail.com; mogulgu...@gmail.com; David Turner; Stefan Beller
> Subject: [PATCH 13/16] submodule: teach unpack_trees() to update
> submodules

...

>  	msgs[ERROR_NOT_UPTODATE_DIR] =
>  		_("Updating the following directories would lose untracked files in it:\n%s");
> +	msgs[ERROR_NOT_UPTODATE_SUBMODULE] =
> +		_("Updating the following submodules would lose modifications in it:\n%s");

s/it/them/

>  	if (!strcmp(cmd, "checkout"))
>  		msg = advice_commit_before_merge
> @@ -1315,19 +1320,18 @@ static int verify_uptodate_1(const struct cache_entry *ce,
>  		return 0;
>
>  	if (!lstat(ce->name, &st)) {
> -		int flags = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE;
> -		unsigned changed = ie_match_stat(o->src_index, ce, &st, flags);
> -		if (!changed)
> -			return 0;
> -		/*
> -		 * NEEDSWORK: the current default policy is to allow
> -		 * submodule to be out of sync wrt the superproject
> -		 * index. This needs to be tightened later for
> -		 * submodules that are marked to be automatically
> -		 * checked out.
> -		 */
> -		if (S_ISGITLINK(ce->ce_mode))
> -			return 0;
> +		if (!S_ISGITLINK(ce->ce_mode)) {

I generally prefer to avoid if (!x) { A } else { B } -- I would rather
just see if (x) { B } else { A }.
> +		if (!changed) {
> +			/* old is always a submodule */
> +			if (S_ISGITLINK(new->ce_mode)) {
> +				/*
> +				 * new is also a submodule, so check if we care
> +				 * and then if we can checkout the new sha1 safely
> +				 */
> +				if (submodule_is_interesting(old->name, null_sha1)
> +				    && is_submodule_checkout_safe(new->name, &new->oid))
> +					return 0;
> +			} else {
> +				/*
> +				 * new is not a submodule any more, so only
> +				 * care if we care:
> +				 */
> +				if (submodule_is_interesting(old->name, null_sha1)
> +				    && ok_to_remove_submodule(old->name))
> +					return 0;
> +			}

Do we need a return 1 in here somewhere? Because otherwise, we fall
through and return 0 later.
Bug with disabling compression and 'binary' files.
I was doing some experiments today with large-ish (100-200MB) binary
files and was trying to determine the best configuration for Git.
Here are the steps and timings I saw:

    git init Test
    cp .../largemovie.mp4 .
    time git add largemovie.mp4

This took 6.5s for a 200MB file. This file compressed a little bit,
but not enough to matter, so I wanted to see how much faster the add
would be with compression disabled. So I ran:

    git config core.compression 0

I then completely reran the test above starting with a clean
repository. This time the add took only 2.08s. I repeated these two
tests about 10 times using the same file each time to verify it
wasn't related to disk caching, etc.

At this point I decided that this was likely a good setting for this
repository, so I also created a .gitattributes file and added a few
entries often seen in suggestions for dealing with binary files:

    *.mp4 binary -delta

The goal here was to disable any delta compression done during a
gc/pack and use the other settings 'binary' applies. Unfortunately,
when I ran the test again (still using compression = 0), the test was
back to taking 6.5s and the file inside the .git/objects/ folder was
compressed again.

I narrowed this down to the '-text' attribute that is set when
specifying 'binary'. For some reason this flag is cancelling out the
core.compression = 0 setting and I think this is a bug?

Unfortunately core.compression = 0 is also global. Ideally it would be
great if there was a separate 'compression' attribute that could be
specified in .gitattributes per wildcard, similar to how -delta can be
used. This way we would still be able to get compression for
text/source files, while still getting the speed of skipping
compression for binary files that do not compress well.

Has there been any discussion on having an attribute similar to this?

Thanks,
-Doug
Re: [PATCH v15 01/27] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL
Stephan Beyer writes:

> Besides the things I'm mentioning in respective patch e-mails, I wonder
> why several bisect--helper commands are prefixed by "bisect"; I'm
> talking about:
>
>     git bisect--helper --bisect-clean-state
>     ...
>     git bisect--helper --bisect-start
>     etc.
>
> instead of
>
>     git bisect--helper --clean-state
>     ...
>     git bisect--helper --start
>     etc.
>
> Well, I know *why* they have these names: because the shell function
> names are simply reused. But I don't know why these prefixes are kept in
> the bisect--helper command options. On the other hand, these command
> names are not exposed to the user and may hence not be that important.(?)

That's a good point ;-) These are not intended to be end-user entry
points, so names that are a bit longer than necessary do not bother me
too much.

Hopefully the longer-term endgame would be not to need a separate
"bisect--helper" binary at all but to have a "git bisect" binary making
these requests as subroutine calls, and at that point, the names of the
functions would want to have the "bisect" prefix.
RE: [PATCH 07/16] update submodules: introduce submodule_is_interesting
> -----Original Message-----
> From: Stefan Beller [mailto:sbel...@google.com]
> Sent: Tuesday, November 15, 2016 6:07 PM
> Cc: git@vger.kernel.org; bmw...@google.com; gits...@pobox.com;
> jrnie...@gmail.com; mogulgu...@gmail.com; David Turner; Stefan Beller
> Subject: [PATCH 07/16] update submodules: introduce
> submodule_is_interesting

> +int submodule_is_interesting(const char *path, const unsigned char *sha1)
> +{

This is apparently only ever (in this series) called with null_sha1.
So either this arg is unnecessary, or there are bugs elsewhere in the
code.
Re: [PATCH 11/16] teach unpack_trees() to remove submodule contents
On 11/15, Stefan Beller wrote:
> Extend rmdir_or_warn() to remove the directories of those submodules which
> are scheduled for removal. Also teach verify_clean_submodule() to check
> that a submodule configured to be removed is not modified before scheduling
> it for removal.
>
> Signed-off-by: Stefan Beller
> ---
>  unpack-trees.c | 6 ++
>  wrapper.c      | 4
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index ea6bdd2..576e1d5 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -9,6 +9,7 @@
>  #include "refs.h"
>  #include "attr.h"
>  #include "split-index.h"
> +#include "submodule.h"
>  #include "dir.h"
>
>  /*
> @@ -1361,15 +1362,12 @@ static void invalidate_ce_path(const struct cache_entry *ce,
>  /*
>   * Check that checking out ce->sha1 in subdir ce->name is not
>   * going to overwrite any working files.
> - *
> - * Currently, git does not checkout subprojects during a superproject
> - * checkout, so it is not going to overwrite anything.
>   */
>  static int verify_clean_submodule(const struct cache_entry *ce,
>  				  enum unpack_trees_error_types error_type,
>  				  struct unpack_trees_options *o)
>  {
> -	return 0;
> +	return submodule_is_interesting(ce->name, null_sha1) &&
> 	       is_submodule_modified(ce->name, 0);
>  }

So what is the return value from this function meant to mean? Does '1'
mean the submodule is clean while '0' indicates it is dirty, or is it
the reverse of that? Reading this, it seems to me a value of '1' means
"yes, the submodule is clean!", but the way the return value is
calculated tells a different story. Either I'm understanding it
incorrectly, or I think the return should be something like this:

	return submodule_is_interesting(ce->name, null_sha1) &&
	       !is_submodule_modified(ce->name, 0);

Where we return '1' if the submodule is interesting and it hasn't been
modified.
>
>  static int verify_clean_subdirectory(const struct cache_entry *ce,
> diff --git a/wrapper.c b/wrapper.c
> index e7f1979..17c08de 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -2,6 +2,7 @@
>   * Various trivial helper wrappers around standard functions
>   */
>  #include "cache.h"
> +#include "submodule.h"
>
>  static void do_nothing(size_t size)
>  {
> @@ -592,6 +593,9 @@ int unlink_or_warn(const char *file)
>
>  int rmdir_or_warn(const char *file)
>  {
> +	if (submodule_is_interesting(file, null_sha1)
> +	    && depopulate_submodule(file))
> +		return -1;
>  	return warn_if_unremovable("rmdir", file, rmdir(file));
>  }

It seems weird to me that rmdir is doing checks to see if the file
being removed is a submodule. Shouldn't those checks have occurred
before calling rmdir?

--
Brandon Williams
Re: [PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question
Stefan Beller writes:

>> "We do not know" ...
>
> ... because there is no way for us to check, as we don't have the
> submodule commits.
>
>     "We do consider it safe, as no one in their sane mind would
>     have changed the submodule pointers without having the
>     submodule around. If a user did however change the submodules
>     without having the submodule commits around, this indicates an
>     expert who knows what they were doing."

I didn't think it through myself to arrive at such a conclusion, but
to me the above sounds like sensible reasoning [*1*].

>> +		 * We currently
>> +		 * proceed pushing here as if the submodule commits are
>> +		 * available on a remote. Since we can not check the
>> +		 * remote availability for this submodule we should
>> +		 * consider changing this behavior to: Stop here and
>> +		 * tell the user how to skip this check if wanted.
>> +		 */
>>  		return 0;
>
> Thanks for adding the NEEDSWORK; I just wrote the above lines
> to clarify my thought process, not as a suggestion for change.

One thing I would suggest would be "Stop here, explain the situation,
and then tell the user how to skip". I am not convinced that it would
_help_ users and make it _safer_ if we stopped here, though.

> Overall the series looks good to me; the nits are minor IMHO.

Ditto.

[Footnote]

*1* My version was more like "we do not know if they would get into a
    situation where they do not have enough submodule commits if we
    pushed our superproject, but more importantly, we DO KNOW that it
    would not help an iota if we pushed our submodule to them, so
    there is no point stopping the push of the superproject saying
    'no, no, no, you must push the submodule first'".
RE: [PATCH 09/16] update submodules: add scheduling to update submodules
> -----Original Message-----
> From: Brandon Williams [mailto:bmw...@google.com]
>
> > +struct scheduled_submodules_update_type {
> > +	const char *path;
> > +	const struct object_id *oid;
> > +	/*
> > +	 * Do we need to perform a complete checkout or just an
> > +	 * incremental update?
> > +	 */
> > +	unsigned is_new:1;
> > +} *scheduled_submodules;
> > +#define SCHEDULED_SUBMODULES_INIT {NULL, NULL}
>
> I may not know enough about these types of initializers, but that INIT
> macro only has 2 entries while there are three entries in the struct
> itself.

In fact, only the first NULL is necessary; unspecified initializer
entries in C default to zero.
Re: [PATCH v3 4/6] grep: optionally recurse into submodules
On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams wrote: > Allow grep to recognize submodules and recursively search for patterns in > each submodule. This is done by forking off a process to recursively > call grep on each submodule. The top level --super-prefix option is > used to pass a path to the submodule which can in turn be used to > prepend to output or in pathspec matching logic. > > Recursion only occurs for submodules which have been initialized and > checked out by the parent project. If a submodule hasn't been > initialized and checked out it is simply skipped. > > In order to support the existing multi-threading infrastructure in grep, > output from each child process is captured in a strbuf so that it can be > later printed to the console in an ordered fashion. > > To limit the number of threads that are created, each child process has > half the number of threads as its parent (minimum of 1), otherwise we > potentially have a fork-bomb. > > Signed-off-by: Brandon Williams > --- > Documentation/git-grep.txt | 5 + > builtin/grep.c | 300 > ++--- > git.c | 2 +- > t/t7814-grep-recurse-submodules.sh | 99 > 4 files changed, 385 insertions(+), 21 deletions(-) > create mode 100755 t/t7814-grep-recurse-submodules.sh > > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 0ecea6e..17aa1ba 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -26,6 +26,7 @@ SYNOPSIS >[--threads <num>] >[-f <file>] [-e] <pattern> >[--and|--or|--not|(|)|-e <pattern>...] > + [--recurse-submodules] >[ [--[no-]exclude-standard] [--cached | --no-index | --untracked] > | <tree>...] >[--] [<pathspec>...] > > @@ -88,6 +89,10 @@ OPTIONS > mechanism. Only useful when searching files in the current directory > with `--no-index`. > > +--recurse-submodules:: > + Recursively search in each submodule that has been initialized and > + checked out in the repository. > + > -a:: > --text:: > Process binary files as if they were text. 
> diff --git a/builtin/grep.c b/builtin/grep.c > index 8887b6a..1fd292f 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -18,12 +18,20 @@ > #include "quote.h" > #include "dir.h" > #include "pathspec.h" > +#include "submodule.h" > > static char const * const grep_usage[] = { > N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"), > NULL > }; > > +static const char *super_prefix; > +static int recurse_submodules; > +static struct argv_array submodule_options = ARGV_ARRAY_INIT; > + > +static int grep_submodule_launch(struct grep_opt *opt, > +const struct grep_source *gs); > + > #define GREP_NUM_THREADS_DEFAULT 8 > static int num_threads; > > @@ -174,7 +182,10 @@ static void *run(void *arg) > break; > > opt->output_priv = w; > - hit |= grep_source(opt, &w->source); > + if (w->source.type == GREP_SOURCE_SUBMODULE) > + hit |= grep_submodule_launch(opt, &w->source); > + else > + hit |= grep_source(opt, &w->source); > grep_source_clear_data(&w->source); > work_done(w); > } > @@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const > unsigned char *sha1, > if (opt->relative && opt->prefix_length) { > quote_path_relative(filename + tree_name_len, opt->prefix, > &pathbuf); > strbuf_insert(&pathbuf, 0, filename, tree_name_len); > + } else if (super_prefix) { > + strbuf_add(&pathbuf, filename, tree_name_len); > + strbuf_addstr(&pathbuf, super_prefix); > + strbuf_addstr(&pathbuf, filename + tree_name_len); > } else { > strbuf_addstr(&pathbuf, filename); > } > @@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char > *filename) > { > struct strbuf buf = STRBUF_INIT; > > - if (opt->relative && opt->prefix_length) > + if (opt->relative && opt->prefix_length) { > quote_path_relative(filename, opt->prefix, &buf); > - else > + } else { > + if (super_prefix) > + strbuf_addstr(&buf, super_prefix); > strbuf_addstr(&buf, filename); > + } > > #ifndef NO_PTHREADS > if (num_threads) { > @@ -378,31 +396,258 @@ static void run_pager(struct grep_opt *opt, const char > *prefix) > exit(status); > } > > -static int grep_cache(struct 
grep_opt *opt, const struct pathspec *pathspec, > int cached) > +static void compile_submodule_options(const struct grep_opt *opt, > + const struct pathspec *pathspec, > + int cached, int untracked, > + int
Re: [PATCH 10/16] update submodules: is_submodule_checkout_safe
On 11/15, Stefan Beller wrote: > In later patches we introduce the options and flag for commands > that modify the working directory, e.g. git-checkout. > > This piece of code will answer the question: > "Is it safe to change the submodule to this new state?" > e.g. is it overwriting untracked files or are there local > changes that would be overwritten? > > Signed-off-by: Stefan Beller> --- > submodule.c | 22 ++ > submodule.h | 2 ++ > 2 files changed, 24 insertions(+) > > diff --git a/submodule.c b/submodule.c > index 2773151..2149ef7 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1201,6 +1201,28 @@ int ok_to_remove_submodule(const char *path) > return ok_to_remove; > } > > +int is_submodule_checkout_safe(const char *path, const struct object_id *oid) > +{ > + struct child_process cp = CHILD_PROCESS_INIT; > + > + if (!is_submodule_populated(path)) > + /* The submodule is not populated, it's safe to check it out */ > + /* > + * TODO: When git learns to re-populate submodules, a check > must be > + * added here to assert that no local files will be overwritten. > + */ By "local files" do you mean the situation where we want to checkout a submodule at path 'foo' but there already exists a file at path 'foo'? 
> + return 1; > + > + argv_array_pushl(&cp.args, "read-tree", "-n", "-m", "HEAD", > + sha1_to_hex(oid->hash), NULL); > + > + prepare_submodule_repo_env(&cp.env_array); > + cp.git_cmd = 1; > + cp.no_stdin = 1; > + cp.dir = path; > + return run_command(&cp) == 0; > +} > + > static int find_first_merges(struct object_array *result, const char *path, > struct commit *a, struct commit *b) > { > diff --git a/submodule.h b/submodule.h > index f01f87c..a7fa634 100644 > --- a/submodule.h > +++ b/submodule.h > @@ -74,6 +74,8 @@ extern unsigned is_submodule_modified(const char *path, int > ignore_untracked); > extern int is_submodule_populated(const char *path); > extern int submodule_uses_gitfile(const char *path); > extern int ok_to_remove_submodule(const char *path); > +extern int is_submodule_checkout_safe(const char *path, > + const struct object_id *oid); > extern int merge_submodule(unsigned char result[20], const char *path, > const unsigned char base[20], > const unsigned char a[20], > -- > 2.10.1.469.g00a8914 > -- Brandon Williams
Re: gitweb html validation
Ralf Thielow writes: > Only block level elements are > allowed to be inside form tags, according to > https://www.w3.org/2010/04/xhtml10-strict.html#elem_form > ... > I think it's better to just move the <form>-Tag outside of the > surrounding div? > Something like this perhaps, I didn't test it myself yet. That sounds like a sensible update to me (no, I do not run gitweb myself). Is this the only <form> we have in the UI, or is it the only one that is problematic? > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 7cf68f07b..33d7c154f 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -5531,8 +5531,8 @@ sub git_project_search_form { > $limit = " in '$project_filter/'"; > } > > - print "<div class=\"projsearch\">\n"; > print $cgi->start_form(-method => 'get', -action => $my_uri) . > + "<div class=\"projsearch\">\n" . > $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; > print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" > if (defined $project_filter); > @@ -5544,11 +5544,11 @@ sub git_project_search_form { >-checked => $search_use_regexp) . > "\n" . > $cgi->submit(-name => 'btnS', -value => 'Search') . > - $cgi->end_form() . "\n" . > $cgi->a({-href => href(project => undef, searchtext => undef, >project_filter => $project_filter)}, > esc_html("List all projects$limit")) . "\n"; > - print "</div>\n"; > + print "</div>\n" . > + $cgi->end_form() . "\n"; > } > > # entry for given @keys needs filling if at least one of keys in list > diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css > index 321260103..507740b6a 100644 > --- a/gitweb/static/gitweb.css > +++ b/gitweb/static/gitweb.css > @@ -539,7 +539,7 @@ div.projsearch { > margin: 20px 0px; > } > > -div.projsearch form { > +form div.projsearch { > margin-bottom: 2px; > } >
Re: [PATCH 09/16] update submodules: add scheduling to update submodules
On 11/15, Stefan Beller wrote: > +static int update_submodule(const char *path, const struct object_id *oid, > + int force, int is_new) > +{ > + const char *git_dir; > + struct child_process cp = CHILD_PROCESS_INIT; > + const struct submodule *sub = submodule_from_path(null_sha1, path); > + > + if (!sub || !sub->name) > + return -1; > + > + git_dir = resolve_gitdir(git_common_path("modules/%s", sub->name)); > + > + if (!git_dir) > + return -1; > + > + if (is_new) > + connect_work_tree_and_git_dir(path, git_dir); > + > + /* update index via `read-tree --reset sha1` */ > + argv_array_pushl(&cp.args, "read-tree", > +force ? "--reset" : "-m", > +"-u", sha1_to_hex(oid->hash), NULL); > + prepare_submodule_repo_env(&cp.env_array); > + cp.git_cmd = 1; > + cp.no_stdin = 1; > + cp.dir = path; > + if (run_command(&cp)) { > + warning(_("reading the index in submodule '%s' failed"), path); > + child_process_clear(&cp); > + return -1; > + } > + > + /* write index to working dir */ > + child_process_clear(&cp); > + child_process_init(&cp); > + argv_array_pushl(&cp.args, "checkout-index", "-a", NULL); > + cp.git_cmd = 1; > + cp.no_stdin = 1; > + cp.dir = path; > + if (force) > + argv_array_push(&cp.args, "-f"); > + > + if (run_command(&cp)) { > + warning(_("populating the working directory in submodule '%s' > failed"), path); > + child_process_clear(&cp); > + return -1; > + } > + > + /* get the HEAD right */ > + child_process_clear(&cp); > + child_process_init(&cp); > + argv_array_pushl(&cp.args, "checkout", "--recurse-submodules", NULL); > + cp.git_cmd = 1; > + cp.no_stdin = 1; > + cp.dir = path; > + if (force) > + argv_array_push(&cp.args, "-f"); > + argv_array_push(&cp.args, sha1_to_hex(oid->hash)); > + > + if (run_command(&cp)) { > + warning(_("setting the HEAD in submodule '%s' failed"), path); > + child_process_clear(&cp); > + return -1; > + } > + > + child_process_clear(&cp); > + return 0; > +} If run_command() is successful then it handles the clearing of the child process struct, correct? 
Is there a negative to having all the explicit clears when the child was successful? > + > int depopulate_submodule(const char *path) > { > int ret = 0; > @@ -1336,3 +1405,51 @@ void prepare_submodule_repo_env(struct argv_array *out) > } > argv_array_push(out, "GIT_DIR=.git"); > } > + > +struct scheduled_submodules_update_type { > + const char *path; > + const struct object_id *oid; > + /* > + * Do we need to perform a complete checkout or just incremental > + * update? > + */ > + unsigned is_new:1; > +} *scheduled_submodules; > +#define SCHEDULED_SUBMODULES_INIT {NULL, NULL} I may not know enough about these types of initializers but that INIT macro only has 2 entries while there are three entries in the struct itself. > + > +int scheduled_submodules_nr, scheduled_submodules_alloc; Should these globals be static since they should be scoped to only this file? > + > +void schedule_submodule_for_update(const struct cache_entry *ce, int is_new) > +{ > + struct scheduled_submodules_update_type *ssu; > + ALLOC_GROW(scheduled_submodules, > +scheduled_submodules_nr + 1, > +scheduled_submodules_alloc); > + ssu = &scheduled_submodules[scheduled_submodules_nr++]; > + ssu->path = ce->name; > + ssu->oid = &ce->oid; > + ssu->is_new = !!is_new; > +} > + > +int update_submodules(int force) > +{ > + int i; > + gitmodules_config(); > + > + /* > + * NEEDSWORK: As submodule updates can potentially take some > + * time each and they do not overlap (i.e. no d/f conflicts), > + * this can be parallelized using the run_commands.h API. > + */ > + for (i = 0; i < scheduled_submodules_nr; i++) { > + struct scheduled_submodules_update_type *ssu = > + &scheduled_submodules[i]; > + > + if (submodule_is_interesting(ssu->path, null_sha1)) > + update_submodule(ssu->path, ssu->oid, > + force, ssu->is_new); > + } > + > + scheduled_submodules_nr = 0; > + return 0; > +} nit: organization-wise it makes more sense to me to have the 'update_submodule' helper function be located more closely to the 'update_submodules' function. 
-- Brandon Williams
Re: [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules
On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams wrote: > Add two helper functions to submodules.c. > `is_submodule_initialized()` checks if a submodule has been initialized > at a given path and `is_submodule_populated()` checks if a submodule > has been checked out at a given path. This reminds me to write the documentation patch explaining the concepts of submodules (specifically, that overview page would state all the possible states of submodules). This patch looks good, Stefan > + > + if (module) { > + char *key = xstrfmt("submodule.%s.url", module->name); > + char *value = NULL; minor nit: In case a reroll is needed, you could replace `value` by `not_needed` or `unused` to make it easier to follow. Hence it also doesn't need initialization (Doh, it does for free() to work, nevermind).
Re: [PATCH 08/16] update submodules: add depopulate_submodule
On 11/15, Stefan Beller wrote: > Implement the functionality needed to enable work tree manipulating > commands to that a deleted submodule should not only affect the index "to that a deleted" did you mean "so that a deleted" > (leaving all the files of the submodule in the work tree) but also to > remove the work tree of the superproject (including any untracked > files). > > To do so, we need an equivalent of "rm -rf", which is already found in > entry.c, so expose that and for clarity add a suffix "_or_dir" to it. > > That will only work properly when the submodule uses a gitfile instead of > a .git directory and no untracked files are present. Otherwise the removal > will fail with a warning (which is just what happened until now). So if a submodule uses a .git directory then it will be ignored during the checkout? All other submodules will actually be removed? Couldn't you end up in an undesirable state with a checkout affecting one submodule but not another? > diff --git a/cache.h b/cache.h > index a50a61a..65c47e4 100644 > --- a/cache.h > +++ b/cache.h > @@ -2018,4 +2018,6 @@ void sleep_millisec(int millisec); > */ > void safe_create_dir(const char *dir, int share); > > +void remove_subtree_or_die(const char *path); > + > #endif /* CACHE_H */ Should probably place an explicit 'extern' in the function prototype. > +int depopulate_submodule(const char *path) > +{ > + int ret = 0; > + char *dot_git = xstrfmt("%s/.git", path); > + > + /* Is it populated? */ > + if (!resolve_gitdir(dot_git)) > + goto out; > + > + /* Does it have a .git directory? */ > + if (!submodule_uses_gitfile(path)) { > + warning(_("cannot remove submodule '%s' because it (or one of " > + "its nested submodules) uses a .git directory"), > + path); > + ret = -1; > + goto out; > + } > + > + remove_subtree_or_die(path); > + > +out: > + free(dot_git); > + return ret; > +} -- Brandon Williams
Re: [PATCH 07/16] update submodules: introduce submodule_is_interesting
On 11/15, Stefan Beller wrote: > +/** > + * When updating the working tree, do we need to check if the submodule needs > + * updating. We do not require a check if we are already sure that the > + * submodule doesn't need updating, e.g. when we are not interested in > submodules > + * or the submodule is marked uninteresting by being not initialized. > + */ The first sentence seems a bit awkward. It seems like it's worded as a question; maybe drop the 'do'? -- Brandon Williams
Re: [PATCH 04/16] update submodules: add is_submodule_populated
On 11/15, Stefan Beller wrote: > This is nearly same as Brandon sent out. > (First patch of origin/bw/grep-recurse-submodules, > will drop this patch once Brandons series is stable > enough to build on). I would be thrilled to see more reviews on that series :D -- Brandon Williams
Re: [PATCH v15 13/27] bisect--helper: `bisect_start` shell function partially in C
Hi, On 10/14/2016 04:14 PM, Pranit Bauva wrote: > diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c > index 6a5878c..1d3e17f 100644 > --- a/builtin/bisect--helper.c > +++ b/builtin/bisect--helper.c > @@ -24,6 +27,8 @@ static const char * const git_bisect_helper_usage[] = { > N_("git bisect--helper --bisect-check-and-set-terms > <command> <TERM_GOOD> <TERM_BAD>"), > N_("git bisect--helper --bisect-next-check [<term>]"), > N_("git bisect--helper --bisect-terms [--term-good | --term-old | > --term-bad | --term-new]"), > + N_("git bisect--helper --bisect start [--term-{old,good}=<term> > --term-{new,bad}=<term>]" > + "[--no-checkout] [<bad> > [<good>...]] [--] [<paths>...]"), Typo: "--bisect start" with space instead of "-" > @@ -403,6 +408,205 @@ static int bisect_terms(struct bisect_terms *terms, > const char **argv, int argc) > return 0; > } > > +static int bisect_start(struct bisect_terms *terms, int no_checkout, > + const char **argv, int argc) > +{ > + int i, has_double_dash = 0, must_write_terms = 0, bad_seen = 0; > + int flags, pathspec_pos, retval = 0; > + struct string_list revs = STRING_LIST_INIT_DUP; > + struct string_list states = STRING_LIST_INIT_DUP; > + struct strbuf start_head = STRBUF_INIT; > + struct strbuf bisect_names = STRBUF_INIT; > + struct strbuf orig_args = STRBUF_INIT; > + const char *head; > + unsigned char sha1[20]; > + FILE *fp = NULL; > + struct object_id oid; > + > + if (is_bare_repository()) > + no_checkout = 1; > + > + for (i = 0; i < argc; i++) { > + if (!strcmp(argv[i], "--")) { > + has_double_dash = 1; > + break; > + } > + } > + > + for (i = 0; i < argc; i++) { > + const char *commit_id = xstrfmt("%s^{commit}", argv[i]); > + const char *arg = argv[i]; > + if (!strcmp(argv[i], "--")) { > + has_double_dash = 1; This is without effect since has_double_dash is already set to 1 by the loop above. I think you can remove this line. 
> + break; > + } else if (!strcmp(arg, "--no-checkout")) { > + no_checkout = 1; > + } else if (!strcmp(arg, "--term-good") || > + !strcmp(arg, "--term-old")) { > + must_write_terms = 1; > + terms->term_good = xstrdup(argv[++i]); > + } else if (skip_prefix(arg, "--term-good=", &arg)) { > + must_write_terms = 1; > + terms->term_good = xstrdup(arg); > + } else if (skip_prefix(arg, "--term-old=", &arg)) { > + must_write_terms = 1; > + terms->term_good = xstrdup(arg); I think you can join the last two branches: + } else if (skip_prefix(arg, "--term-good=", &arg) || + skip_prefix(arg, "--term-old=", &arg)) { + must_write_terms = 1; + terms->term_good = xstrdup(arg); > + } else if (!strcmp(arg, "--term-bad") || > + !strcmp(arg, "--term-new")) { > + must_write_terms = 1; > + terms->term_bad = xstrdup(argv[++i]); > + } else if (skip_prefix(arg, "--term-bad=", &arg)) { > + must_write_terms = 1; > + terms->term_bad = xstrdup(arg); > + } else if (skip_prefix(arg, "--term-new=", &arg)) { > + must_write_terms = 1; > + terms->term_good = xstrdup(arg); This has to be terms->term_bad = ... Also, you can join the last two branches, again, ie, + } else if (skip_prefix(arg, "--term-bad=", &arg) || + skip_prefix(arg, "--term-new=", &arg)) { + must_write_terms = 1; + terms->term_bad = xstrdup(arg); > + } else if (starts_with(arg, "--") && > + !one_of(arg, "--term-good", "--term-bad", NULL)) { > + die(_("unrecognised option: '%s'"), arg); [...] > + /* > + * Verify HEAD > + */ > + head = resolve_ref_unsafe("HEAD", 0, sha1, &flags); > + if (!head) > + if (get_sha1("HEAD", sha1)) > + die(_("Bad HEAD - I need a HEAD")); > + > + if (!is_empty_or_missing_file(git_path_bisect_start())) { You were so eager to re-use the comments from the shell script, but you forgot the "Check if we are bisecting." comment above this line ;-) > + /* Reset to the rev from where we started */ > + strbuf_read_file(&start_head, git_path_bisect_start(), 0); > + strbuf_trim(&start_head); > + if (!no_checkout) { > + struct argv_array argv = ARGV_ARRAY_INIT; [...] 
> + if (must_write_terms) > + if (write_terms(terms->term_bad,
[PATCH 15/16] completion: add '--recurse-submodules' to checkout
Signed-off-by: Stefan Beller--- contrib/completion/git-completion.bash | 2 +- t/t9902-completion.sh | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 21016bf..28acfdb 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -1068,7 +1068,7 @@ _git_checkout () --*) __gitcomp " --quiet --ours --theirs --track --no-track --merge - --conflict= --orphan --patch + --conflict= --orphan --patch --recurse-submodules " ;; *) diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 2ba62fb..d2d1102 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -447,6 +447,7 @@ test_expect_success 'double dash "git checkout"' ' --conflict= --orphan Z --patch Z + --recurse-submodules Z EOF ' -- 2.10.1.469.g00a8914
[PATCH 05/16] update submodules: add submodule config parsing
Similar to b33a15b08 (push: add recurseSubmodules config option, 2015-11-17) and 027771fcb1 (submodule: allow erroneous values for the fetchRecurseSubmodules option, 2015-08-17), we add submodule-config code that is later used to parse whether we are interested in updating submodules. We need the `die_on_error` parameter to be able to call this parsing function for the config file as well, which, if incorrect, lets Git die. As we're just touching the header file, also mark all functions extern. Signed-off-by: Stefan Beller--- submodule-config.c | 22 ++ submodule-config.h | 17 + 2 files changed, 31 insertions(+), 8 deletions(-) diff --git a/submodule-config.c b/submodule-config.c index 098085b..4b5297e 100644 --- a/submodule-config.c +++ b/submodule-config.c @@ -234,6 +234,28 @@ int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg) return parse_fetch_recurse(opt, arg, 1); } +static int parse_update_recurse(const char *opt, const char *arg, + int die_on_error) +{ + switch (git_config_maybe_bool(opt, arg)) { + case 1: + return RECURSE_SUBMODULES_ON; + case 0: + return RECURSE_SUBMODULES_OFF; + default: + if (!strcmp(arg, "checkout")) + return RECURSE_SUBMODULES_ON; + if (die_on_error) + die("bad %s argument: %s", opt, arg); + return RECURSE_SUBMODULES_ERROR; + } +} + +int parse_update_recurse_submodules_arg(const char *opt, const char *arg) +{ + return parse_update_recurse(opt, arg, 1); +} + static int parse_push_recurse(const char *opt, const char *arg, int die_on_error) { diff --git a/submodule-config.h b/submodule-config.h index d05c542..992bfbe 100644 --- a/submodule-config.h +++ b/submodule-config.h @@ -22,13 +22,14 @@ struct submodule { int recommend_shallow; }; -int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg); -int parse_push_recurse_submodules_arg(const char *opt, const char *arg); -int parse_submodule_config_option(const char *var, const char *value); -const struct submodule *submodule_from_name(const unsigned 
char *commit_sha1, - const char *name); -const struct submodule *submodule_from_path(const unsigned char *commit_sha1, - const char *path); -void submodule_free(void); +extern int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg); +extern int parse_update_recurse_submodules_arg(const char *opt, const char *arg); +extern int parse_push_recurse_submodules_arg(const char *opt, const char *arg); +extern int parse_submodule_config_option(const char *var, const char *value); +extern const struct submodule *submodule_from_name( + const unsigned char *commit_sha1, const char *name); +extern const struct submodule *submodule_from_path( + const unsigned char *commit_sha1, const char *path); +extern void submodule_free(void); #endif /* SUBMODULE_CONFIG_H */ -- 2.10.1.469.g00a8914
[PATCH 10/16] update submodules: is_submodule_checkout_safe
In later patches we introduce the options and flag for commands that modify the working directory, e.g. git-checkout. This piece of code will answer the question: "Is it safe to change the submodule to this new state?" e.g. is it overwriting untracked files or are there local changes that would be overwritten? Signed-off-by: Stefan Beller--- submodule.c | 22 ++ submodule.h | 2 ++ 2 files changed, 24 insertions(+) diff --git a/submodule.c b/submodule.c index 2773151..2149ef7 100644 --- a/submodule.c +++ b/submodule.c @@ -1201,6 +1201,28 @@ int ok_to_remove_submodule(const char *path) return ok_to_remove; } +int is_submodule_checkout_safe(const char *path, const struct object_id *oid) +{ + struct child_process cp = CHILD_PROCESS_INIT; + + if (!is_submodule_populated(path)) + /* The submodule is not populated, it's safe to check it out */ + /* +* TODO: When git learns to re-populate submodules, a check must be +* added here to assert that no local files will be overwritten. +*/ + return 1; + + argv_array_pushl(&cp.args, "read-tree", "-n", "-m", "HEAD", +sha1_to_hex(oid->hash), NULL); + + prepare_submodule_repo_env(&cp.env_array); + cp.git_cmd = 1; + cp.no_stdin = 1; + cp.dir = path; + return run_command(&cp) == 0; +} + static int find_first_merges(struct object_array *result, const char *path, struct commit *a, struct commit *b) { diff --git a/submodule.h b/submodule.h index f01f87c..a7fa634 100644 --- a/submodule.h +++ b/submodule.h @@ -74,6 +74,8 @@ extern unsigned is_submodule_modified(const char *path, int ignore_untracked); extern int is_submodule_populated(const char *path); extern int submodule_uses_gitfile(const char *path); extern int ok_to_remove_submodule(const char *path); +extern int is_submodule_checkout_safe(const char *path, + const struct object_id *oid); extern int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20], const unsigned char a[20], -- 2.10.1.469.g00a8914
[PATCH 02/16] submodule: modernize ok_to_remove_submodule to use argv_array
Instead of constructing the NULL-terminated array ourselves, we should make use of the argv_array infrastructure. Signed-off-by: Stefan Beller--- submodule.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/submodule.c b/submodule.c index 6f7d883..53a6dbb 100644 --- a/submodule.c +++ b/submodule.c @@ -1022,13 +1022,6 @@ int ok_to_remove_submodule(const char *path) { ssize_t len; struct child_process cp = CHILD_PROCESS_INIT; - const char *argv[] = { - "status", - "--porcelain", - "-u", - "--ignore-submodules=none", - NULL, - }; struct strbuf buf = STRBUF_INIT; int ok_to_remove = 1; @@ -1038,7 +1031,8 @@ int ok_to_remove_submodule(const char *path) if (!submodule_uses_gitfile(path)) return 0; - cp.argv = argv; + argv_array_pushl(&cp.args, "status", "--porcelain", "-uall", + "--ignore-submodules=none", NULL); prepare_submodule_repo_env(&cp.env_array); cp.git_cmd = 1; cp.no_stdin = 1; -- 2.10.1.469.g00a8914
[PATCH 04/16] update submodules: add is_submodule_populated
This is nearly the same as the patch Brandon sent out. (First patch of origin/bw/grep-recurse-submodules; will drop this patch once Brandon's series is stable enough to build on). Signed-off-by: Stefan Beller--- submodule.c | 11 +++ submodule.h | 1 + 2 files changed, 12 insertions(+) diff --git a/submodule.c b/submodule.c index c9d22e5..97eaf7c 100644 --- a/submodule.c +++ b/submodule.c @@ -914,6 +914,17 @@ int fetch_populated_submodules(const struct argv_array *options, return spf.result; } +int is_submodule_populated(const char *path) +{ + int retval = 0; + struct strbuf gitdir = STRBUF_INIT; + strbuf_addf(&gitdir, "%s/.git", path); + if (resolve_gitdir(gitdir.buf)) + retval = 1; + strbuf_release(&gitdir); + return retval; +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { ssize_t len; diff --git a/submodule.h b/submodule.h index afc58d0..d44b4f1 100644 --- a/submodule.h +++ b/submodule.h @@ -61,6 +61,7 @@ extern int fetch_populated_submodules(const struct argv_array *options, const char *prefix, int command_line_option, int quiet, int max_parallel_jobs); extern unsigned is_submodule_modified(const char *path, int ignore_untracked); +extern int is_submodule_populated(const char *path); extern int submodule_uses_gitfile(const char *path); extern int ok_to_remove_submodule(const char *path); extern int merge_submodule(unsigned char result[20], const char *path, -- 2.10.1.469.g00a8914
[PATCH 03/16] submodule: use absolute path for computing relative path connecting
This addresses a similar concern as in f8eaa0ba98b (submodule--helper, module_clone: always operate on absolute paths, 2016-03-31). When computing the relative path from one to another location, we need to provide both locations as absolute paths to make sure the computation of the relative path is correct. While at it, change `real_work_tree` to be non-const as we own the memory. Signed-off-by: Stefan Beller--- submodule.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/submodule.c b/submodule.c index 53a6dbb..c9d22e5 100644 --- a/submodule.c +++ b/submodule.c @@ -1221,23 +1221,25 @@ void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir) { struct strbuf file_name = STRBUF_INIT; struct strbuf rel_path = STRBUF_INIT; - const char *real_work_tree = xstrdup(real_path(work_tree)); + char *real_git_dir = xstrdup(real_path(git_dir)); + char *real_work_tree = xstrdup(real_path(work_tree)); /* Update gitfile */ strbuf_addf(&file_name, "%s/.git", work_tree); write_file(file_name.buf, "gitdir: %s", - relative_path(git_dir, real_work_tree, &rel_path)); + relative_path(real_git_dir, real_work_tree, &rel_path)); /* Update core.worktree setting */ strbuf_reset(&file_name); - strbuf_addf(&file_name, "%s/config", git_dir); + strbuf_addf(&file_name, "%s/config", real_git_dir); git_config_set_in_file(file_name.buf, "core.worktree", - relative_path(real_work_tree, git_dir, + relative_path(real_work_tree, real_git_dir, &rel_path)); strbuf_release(&file_name); strbuf_release(&rel_path); - free((void *)real_work_tree); + free(real_work_tree); + free(real_git_dir); } int parallel_submodules(void) -- 2.10.1.469.g00a8914
[PATCH 07/16] update submodules: introduce submodule_is_interesting
In later patches we introduce the --recurse-submodules flag for commands that modify the working directory, e.g. git-checkout. It is potentially expensive to check if a submodule needs an update, because a common theme to interact with submodules is to spawn a child process for each interaction. So let's introduce a function that pre-checks if a submodule needs to be checked for an update. I am not particularly happy with the name `submodule_is_interesting`; in internal iterations I had `submodule_requires_check_for_update` and `submodule_needs_update`, but I was even less happy with those names. Maybe `submodule_interesting_for_update`? Generally this is to answer "Am I allowed to touch the submodule at all?" or: "Does the user expect me to touch it?" which includes all of creation/deletion/update. This patch is based off a prior attempt by Jens Lehmann to add submodules to checkout. Signed-off-by: Stefan Beller--- submodule.c | 37 + submodule.h | 8 2 files changed, 45 insertions(+) diff --git a/submodule.c b/submodule.c index 38b0573..d34b721 100644 --- a/submodule.c +++ b/submodule.c @@ -500,6 +500,43 @@ void set_config_update_recurse_submodules(int value) config_update_recurse_submodules = value; } +int submodules_interesting_for_update(void) +{ + /* +* Update can't be "none", "merge" or "rebase", +* treat any value as OFF, except an explicit ON. 
+*/ + return config_update_recurse_submodules == RECURSE_SUBMODULES_ON; +} + +int submodule_is_interesting(const char *path, const unsigned char *sha1) +{ + /* +* If we cannot load a submodule config, we cannot get the name +* of the submodule, so we'd need to follow the gitlink file +*/ + const struct submodule *sub; + + if (!submodules_interesting_for_update()) + return 0; + + sub = submodule_from_path(sha1, path); + if (!sub) + return 0; + + switch (sub->update_strategy.type) { + case SM_UPDATE_UNSPECIFIED: + case SM_UPDATE_CHECKOUT: + return 1; + case SM_UPDATE_REBASE: + case SM_UPDATE_MERGE: + case SM_UPDATE_NONE: + case SM_UPDATE_COMMAND: + return 0; + } + return 0; +} + static int has_remote(const char *refname, const struct object_id *oid, int flags, void *cb_data) { diff --git a/submodule.h b/submodule.h index 185ad18..3df6881 100644 --- a/submodule.h +++ b/submodule.h @@ -57,6 +57,14 @@ extern void show_submodule_inline_diff(FILE *f, const char *path, const struct diff_options *opt); extern void set_config_fetch_recurse_submodules(int value); extern void set_config_update_recurse_submodules(int value); +/** + * When updating the working tree, do we need to check if the submodule needs + * updating. We do not require a check if we are already sure that the + * submodule doesn't need updating, e.g. when we are not interested in submodules + * or the submodule is marked uninteresting by being not initialized. + */ +extern int submodule_is_interesting(const char *path, const unsigned char *sha1); +extern int submodules_interesting_for_update(void); extern void check_for_new_submodule_commits(unsigned char new_sha1[20]); extern int fetch_populated_submodules(const struct argv_array *options, const char *prefix, int command_line_option, -- 2.10.1.469.g00a8914
[PATCH 16/16] checkout: add config option to recurse into submodules by default
To spare users from passing the `--recurse-submodules` option every time they
run checkout, add a config option that sets the default behavior for checkout
to recurse into submodules.

Signed-off-by: Stefan Beller
---
 Documentation/config.txt       |  6 ++++++
 Documentation/git-checkout.txt |  5 +++--
 submodule.c                    |  6 +++---
 t/t2013-checkout-submodule.sh  | 19 +++++++++++++++++++
 4 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a0ab66a..67e0714 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -949,6 +949,12 @@ clean.requireForce::
 	A boolean to make git-clean do nothing unless given -f,
 	-i or -n.   Defaults to true.
 
+checkout.recurseSubmodules::
+	This option can be set to a boolean value.
+	When set to true checkout will recurse into submodules and
+	update them. When set to false, which is the default, checkout
+	will leave submodules untouched.
+
 color.branch::
 	A boolean to enable/disable color in the output of
 	linkgit:git-branch[1]. May be set to `always`,
diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index a0ea2c5..819c430 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -260,8 +260,9 @@ section of linkgit:git-add[1] to learn how to operate the `--patch` mode.
 	Using --recurse-submodules will update the content of all initialized
 	submodules according to the commit recorded in the superproject. If
 	local modifications in a submodule would be overwritten the checkout
-	will fail until `-f` is used. If nothing (or --no-recurse-submodules)
-	is used, the work trees of submodules will not be updated.
+	will fail until `-f` is used. If `--no-recurse-submodules` is used,
+	the work trees of submodules will not be updated. If no command line
+	argument is given, `checkout.recurseSubmodules` is used as a default.
 
 <branch>::
 	Branch to checkout; if it refers to a branch (i.e., a name that,
diff --git a/submodule.c b/submodule.c
index 2149ef7..0c807d9 100644
--- a/submodule.c
+++ b/submodule.c
@@ -160,10 +160,10 @@ int submodule_config(const char *var, const char *value, void *cb)
 		return 0;
 	} else if (starts_with(var, "submodule."))
 		return parse_submodule_config_option(var, value);
-	else if (!strcmp(var, "fetch.recursesubmodules")) {
+	else if (!strcmp(var, "fetch.recursesubmodules"))
 		config_fetch_recurse_submodules = parse_fetch_recurse_submodules_arg(var, value);
-		return 0;
-	}
+	else if (!strcmp(var, "checkout.recursesubmodules"))
+		config_update_recurse_submodules = parse_update_recurse_submodules_arg(var, value);
 	return 0;
 }
 
diff --git a/t/t2013-checkout-submodule.sh b/t/t2013-checkout-submodule.sh
index 60f6987..788c59d 100755
--- a/t/t2013-checkout-submodule.sh
+++ b/t/t2013-checkout-submodule.sh
@@ -149,6 +149,25 @@ test_expect_success '"checkout --recurse-submodules" repopulates submodule' '
 	)
 '
 
+test_expect_success 'option checkout.recurseSubmodules updates submodule' '
+	test_config -C super checkout.recurseSubmodules 1 &&
+	(
+		cd super &&
+		git checkout base &&
+		git checkout -b advanced-base &&
+		git -C submodule commit --allow-empty -m "empty commit" &&
+		git add submodule &&
+		git commit -m "advance submodule" &&
+		git checkout base &&
+		git diff-files --quiet &&
+		git diff-index --quiet --cached base &&
+		git checkout advanced-base &&
+		git diff-files --quiet &&
+		git diff-index --quiet --cached advanced-base &&
+		git checkout --recurse-submodules base
+	)
+'
+
 test_expect_success '"checkout --recurse-submodules" repopulates submodule in existing directory' '
 	(
 		cd super &&
-- 
2.10.1.469.g00a8914
RE: [PATCH 02/16] submodule: modernize ok_to_remove_submodule to use argv_array
> -		"-u",
> ...
> +	argv_array_pushl(&cp.args, "status", "--porcelain", "-uall",

This also changes -u to -uall, which is not mentioned in the commit message. That should probably be called out.
[PATCH 09/16] update submodules: add scheduling to update submodules
A tree walker is only expected to call `schedule_submodule_for_update` and,
once done, to run `update_submodules`. This avoids directory/file conflicts,
and later we can parallelize all submodule actions if needed.

Signed-off-by: Stefan Beller
---
 submodule.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 submodule.h |   4 +++
 2 files changed, 121 insertions(+)

diff --git a/submodule.c b/submodule.c
index 0fa4613..2773151 100644
--- a/submodule.c
+++ b/submodule.c
@@ -308,6 +308,75 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
 	strbuf_release(&sb);
 }
 
+static int update_submodule(const char *path, const struct object_id *oid,
+			    int force, int is_new)
+{
+	const char *git_dir;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	const struct submodule *sub = submodule_from_path(null_sha1, path);
+
+	if (!sub || !sub->name)
+		return -1;
+
+	git_dir = resolve_gitdir(git_common_path("modules/%s", sub->name));
+
+	if (!git_dir)
+		return -1;
+
+	if (is_new)
+		connect_work_tree_and_git_dir(path, git_dir);
+
+	/* update index via `read-tree --reset sha1` */
+	argv_array_pushl(&cp.args, "read-tree",
+			 force ? "--reset" : "-m",
+			 "-u", sha1_to_hex(oid->hash), NULL);
+	prepare_submodule_repo_env(&cp.env_array);
+	cp.git_cmd = 1;
+	cp.no_stdin = 1;
+	cp.dir = path;
+	if (run_command(&cp)) {
+		warning(_("reading the index in submodule '%s' failed"), path);
+		child_process_clear(&cp);
+		return -1;
+	}
+
+	/* write index to working dir */
+	child_process_clear(&cp);
+	child_process_init(&cp);
+	argv_array_pushl(&cp.args, "checkout-index", "-a", NULL);
+	cp.git_cmd = 1;
+	cp.no_stdin = 1;
+	cp.dir = path;
+	if (force)
+		argv_array_push(&cp.args, "-f");
+
+	if (run_command(&cp)) {
+		warning(_("populating the working directory in submodule '%s' failed"), path);
+		child_process_clear(&cp);
+		return -1;
+	}
+
+	/* get the HEAD right */
+	child_process_clear(&cp);
+	child_process_init(&cp);
+	argv_array_pushl(&cp.args, "checkout", "--recurse-submodules", NULL);
+	cp.git_cmd = 1;
+	cp.no_stdin = 1;
+	cp.dir = path;
+	if (force)
+		argv_array_push(&cp.args, "-f");
+	argv_array_push(&cp.args, sha1_to_hex(oid->hash));
+
+	if (run_command(&cp)) {
+		warning(_("setting the HEAD in submodule '%s' failed"), path);
+		child_process_clear(&cp);
+		return -1;
+	}
+
+	child_process_clear(&cp);
+	return 0;
+}
+
 int depopulate_submodule(const char *path)
 {
 	int ret = 0;
@@ -1336,3 +1405,51 @@ void prepare_submodule_repo_env(struct argv_array *out)
 	}
 	argv_array_push(out, "GIT_DIR=.git");
 }
+
+struct scheduled_submodules_update_type {
+	const char *path;
+	const struct object_id *oid;
+	/*
+	 * Do we need to perform a complete checkout or just incremental
+	 * update?
+	 */
+	unsigned is_new:1;
+} *scheduled_submodules;
+#define SCHEDULED_SUBMODULES_INIT {NULL, NULL}
+
+int scheduled_submodules_nr, scheduled_submodules_alloc;
+
+void schedule_submodule_for_update(const struct cache_entry *ce, int is_new)
+{
+	struct scheduled_submodules_update_type *ssu;
+	ALLOC_GROW(scheduled_submodules,
+		   scheduled_submodules_nr + 1,
+		   scheduled_submodules_alloc);
+	ssu = &scheduled_submodules[scheduled_submodules_nr++];
+	ssu->path = ce->name;
+	ssu->oid = &ce->oid;
+	ssu->is_new = !!is_new;
+}
+
+int update_submodules(int force)
+{
+	int i;
+	gitmodules_config();
+
+	/*
+	 * NEEDSWORK: As submodule updates can potentially take some
+	 * time each and they do not overlap (i.e. no d/f conflicts),
+	 * this can be parallelized using the run_commands.h API.
+	 */
+	for (i = 0; i < scheduled_submodules_nr; i++) {
+		struct scheduled_submodules_update_type *ssu =
+			&scheduled_submodules[i];
+
+		if (submodule_is_interesting(ssu->path, null_sha1))
+			update_submodule(ssu->path, ssu->oid,
+					 force, ssu->is_new);
+	}
+
+	scheduled_submodules_nr = 0;
+	return 0;
+}
diff --git a/submodule.h b/submodule.h
index 8518cf3..f01f87c 100644
--- a/submodule.h
+++ b/submodule.h
@@ -94,4 +94,8 @@ extern int parallel_submodules(void);
  */
 extern void prepare_submodule_repo_env(struct argv_array *out);
 
+extern void schedule_submodule_for_update(const struct cache_entry *ce,
+					  int new);
+extern int update_submodules(int force);
+
 #endif
--
[PATCH 14/16] checkout: recurse into submodules if asked to
Allow checkout to recurse into submodules via the command line option
--[no-]recurse-submodules.

The flag for recurse-submodules in its current form could be an OPT_BOOL, but
eventually we may want to have it as:

    git checkout --recurse-submodules=rebase|merge|cherry-pick|checkout|none

which resembles the submodule.<name>.update options, so naturally a value such
as "as-configured-or-X-as-fallback" would also come in handy.

Signed-off-by: Stefan Beller
---
 Documentation/git-checkout.txt |   7 +
 builtin/checkout.c             |  31 +++-
 t/lib-submodule-update.sh      |  10 +-
 t/t2013-checkout-submodule.sh  | 325 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 362 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index 8e2c066..a0ea2c5 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -256,6 +256,13 @@ section of linkgit:git-add[1] to learn how to operate the `--patch` mode.
 	out anyway. In other words, the ref can be held by more than one
 	worktree.
 
+--[no-]recurse-submodules::
+	Using --recurse-submodules will update the content of all initialized
+	submodules according to the commit recorded in the superproject. If
+	local modifications in a submodule would be overwritten the checkout
+	will fail until `-f` is used. If nothing (or --no-recurse-submodules)
+	is used, the work trees of submodules will not be updated.
+
 <branch>::
 	Branch to checkout; if it refers to a branch (i.e., a name that,
 	when prepended with "refs/heads/", is a valid ref), then that
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 9b2a5b3..2a626a3 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -21,12 +21,31 @@
 #include "submodule-config.h"
 #include "submodule.h"
 
+static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
+
 static const char * const checkout_usage[] = {
 	N_("git checkout [<options>] <branch>"),
 	N_("git checkout [<options>] [<branch>] -- <file>..."),
 	NULL,
 };
 
+int option_parse_recurse_submodules(const struct option *opt,
+				    const char *arg, int unset)
+{
+	if (unset) {
+		recurse_submodules = RECURSE_SUBMODULES_OFF;
+		return 0;
+	}
+	if (arg)
+		recurse_submodules =
+			parse_update_recurse_submodules_arg(opt->long_name,
+							    arg);
+	else
+		recurse_submodules = RECURSE_SUBMODULES_ON;
+
+	return 0;
+}
+
 struct checkout_opts {
 	int patch_mode;
 	int quiet;
@@ -826,7 +845,8 @@ static int switch_branches(const struct checkout_opts *opts,
 		parse_commit_or_die(new->commit);
 	}
 
-	ret = merge_working_tree(opts, &old, new, &writeout_error);
+	ret = merge_working_tree(opts, &old, new, &writeout_error) ||
+	      update_submodules(opts->force);
 	if (ret) {
 		free(path_to_free);
 		return ret;
@@ -1160,6 +1180,9 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
 			 N_("second guess 'git checkout <no-such-branch>'")),
 		OPT_BOOL(0, "ignore-other-worktrees", &opts.ignore_other_worktrees,
 			 N_("do not check if another worktree is holding the given ref")),
+		{ OPTION_CALLBACK, 0, "recurse-submodules", &recurse_submodules,
+			    "checkout", "control recursive updating of submodules",
+			    PARSE_OPT_OPTARG, option_parse_recurse_submodules },
 		OPT_BOOL(0, "progress", &opts.show_progress, N_("force progress reporting")),
 		OPT_END(),
 	};
@@ -1190,6 +1213,12 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
 		git_xmerge_config("merge.conflictstyle", conflict_style, NULL);
 	}
 
+	if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
+		git_config(submodule_config, NULL);
+		if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT)
+			set_config_update_recurse_submodules(recurse_submodules);
+	}
+
 	if ((!!opts.new_branch + !!opts.new_branch_force + !!opts.new_orphan_branch) > 1)
 		die(_("-b, -B and --orphan are mutually exclusive"));
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 79cdd34..e0773c6 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -634,7 +634,13 @@ test_submodule_forced_switch () {
 	## Modified submodule ##
 	# Updating a submodule sha1 doesn't update the submodule's work tree
-	test_expect_success "$command: modified submodule does not update submodule work tree" '
+	if test "$KNOWN_FAILURE_RECURSE_SUBMODULE_SERIES_BREAKS_REPLACE_SUBMODULE_TEST" = 1
+	then
+
[PATCH 13/16] submodule: teach unpack_trees() to update submodules
Signed-off-by: Stefan Beller
---
 entry.c        |  7 +++--
 unpack-trees.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++----------
 unpack-trees.h |  1 +
 3 files changed, 86 insertions(+), 19 deletions(-)

diff --git a/entry.c b/entry.c
index 2330b6e..dd829ec 100644
--- a/entry.c
+++ b/entry.c
@@ -270,7 +270,7 @@ int checkout_entry(struct cache_entry *ce,
 	if (!check_path(path.buf, path.len, &st, state->base_dir_len)) {
 		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
-		if (!changed)
+		if (!changed && (!S_ISDIR(st.st_mode) || !S_ISGITLINK(ce->ce_mode)))
 			return 0;
 		if (!state->force) {
 			if (!state->quiet)
@@ -287,9 +287,10 @@ int checkout_entry(struct cache_entry *ce,
 		 * just do the right thing)
 		 */
 		if (S_ISDIR(st.st_mode)) {
-			/* If it is a gitlink, leave it alone! */
-			if (S_ISGITLINK(ce->ce_mode))
+			if (S_ISGITLINK(ce->ce_mode)) {
+				schedule_submodule_for_update(ce, 1);
 				return 0;
+			}
 			if (!state->force)
 				return error("%s is a directory", path.buf);
 			remove_subtree(&path);
diff --git a/unpack-trees.c b/unpack-trees.c
index 576e1d5..c5c22ed 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -29,6 +29,9 @@ static const char *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
 	/* ERROR_NOT_UPTODATE_DIR */
 	"Updating '%s' would lose untracked files in it",
 
+	/* ERROR_NOT_UPTODATE_SUBMODULE */
+	"Updating submodule '%s' would lose modifications in it",
+
 	/* ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN */
 	"Untracked working tree file '%s' would be overwritten by merge.",
 
@@ -80,6 +83,8 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
 	msgs[ERROR_NOT_UPTODATE_DIR] =
 		_("Updating the following directories would lose untracked files in it:\n%s");
+	msgs[ERROR_NOT_UPTODATE_SUBMODULE] =
+		_("Updating the following submodules would lose modifications in it:\n%s");
 
 	if (!strcmp(cmd, "checkout"))
 		msg = advice_commit_before_merge
@@ -1315,19 +1320,18 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
-		int flags = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE;
-		unsigned changed = ie_match_stat(o->src_index, ce, &st, flags);
-		if (!changed)
-			return 0;
-		/*
-		 * NEEDSWORK: the current default policy is to allow
-		 * submodule to be out of sync wrt the superproject
-		 * index. This needs to be tightened later for
-		 * submodules that are marked to be automatically
-		 * checked out.
-		 */
-		if (S_ISGITLINK(ce->ce_mode))
-			return 0;
+		if (!S_ISGITLINK(ce->ce_mode)) {
+			int flags = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE;
+			unsigned changed = ie_match_stat(o->src_index, ce, &st, flags);
+			if (!changed)
+				return 0;
+		} else {
+			if (!submodule_is_interesting(ce->name, null_sha1))
+				return 0;
+			if (ce_stage(ce) ? is_submodule_checkout_safe(ce->name, &ce->oid)
+			    : !is_submodule_modified(ce->name, 1))
+				return 0;
+		}
 		errno = 0;
 	}
 	if (errno == ENOENT)
@@ -1350,6 +1354,59 @@ static int verify_uptodate_sparse(const struct cache_entry *ce,
 	return verify_uptodate_1(ce, o, ERROR_SPARSE_NOT_UPTODATE_FILE);
 }
 
+/*
+ * When a submodule gets turned into an unmerged entry, we want it to be
+ * up-to-date regarding the merge changes.
+ */
+static int verify_uptodate_submodule(const struct cache_entry *old,
+				     const struct cache_entry *new,
+				     struct unpack_trees_options *o)
+{
+	struct stat st;
+
+	if (o->index_only
+	    || (!((old->ce_flags & CE_VALID) || ce_skip_worktree(old))
+		&& (o->reset || ce_uptodate(old))))
+		return 0;
+
+	if (!submodule_is_interesting(new->name, null_sha1))
+		return 0;
+
+	if (!lstat(old->name, &st)) {
+		unsigned changed = ie_match_stat(o->src_index, old, &st,
+						 CE_MATCH_IGNORE_VALID|
+
[PATCH 08/16] update submodules: add depopulate_submodule
Implement the functionality needed so that work tree manipulating commands can
make a deleted submodule not only affect the index (which would leave all the
files of the submodule in the work tree) but also remove the submodule's work
tree (including any untracked files). To do so, we need an equivalent of
"rm -rf", which is already found in entry.c, so expose that and for clarity add
the suffix "_or_die" to it.

This will only work properly when the submodule uses a gitfile instead of a
.git directory and no untracked files are present. Otherwise the removal will
fail with a warning (which is just what happened until now).

Signed-off-by: Stefan Beller
---
 cache.h     |  2 ++
 entry.c     |  8 ++++++++
 submodule.c | 25 +++++++++++++++++++++++++
 submodule.h |  1 +
 4 files changed, 36 insertions(+)

diff --git a/cache.h b/cache.h
index a50a61a..65c47e4 100644
--- a/cache.h
+++ b/cache.h
@@ -2018,4 +2018,6 @@ void sleep_millisec(int millisec);
  */
 void safe_create_dir(const char *dir, int share);
 
+void remove_subtree_or_die(const char *path);
+
 #endif /* CACHE_H */
diff --git a/entry.c b/entry.c
index c6eea24..019826b 100644
--- a/entry.c
+++ b/entry.c
@@ -73,6 +73,14 @@ static void remove_subtree(struct strbuf *path)
 		die_errno("cannot rmdir '%s'", path->buf);
 }
 
+void remove_subtree_or_die(const char *path)
+{
+	struct strbuf sb = STRBUF_INIT;
+	strbuf_addstr(&sb, path);
+	remove_subtree(&sb);
+	strbuf_release(&sb);
+}
+
 static int create_file(const char *path, unsigned int mode)
 {
 	mode = (mode & 0100) ? 0777 : 0666;
diff --git a/submodule.c b/submodule.c
index d34b721..0fa4613 100644
--- a/submodule.c
+++ b/submodule.c
@@ -308,6 +308,31 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
 	strbuf_release(&sb);
 }
 
+int depopulate_submodule(const char *path)
+{
+	int ret = 0;
+	char *dot_git = xstrfmt("%s/.git", path);
+
+	/* Is it populated? */
+	if (!resolve_gitdir(dot_git))
+		goto out;
+
+	/* Does it have a .git directory? */
+	if (!submodule_uses_gitfile(path)) {
+		warning(_("cannot remove submodule '%s' because it (or one of "
+			  "its nested submodules) uses a .git directory"),
+			  path);
+		ret = -1;
+		goto out;
+	}
+
+	remove_subtree_or_die(path);
+
+out:
+	free(dot_git);
+	return ret;
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
diff --git a/submodule.h b/submodule.h
index 3df6881..8518cf3 100644
--- a/submodule.h
+++ b/submodule.h
@@ -65,6 +65,7 @@ extern void set_config_update_recurse_submodules(int value);
  */
 extern int submodule_is_interesting(const char *path, const unsigned char *sha1);
 extern int submodules_interesting_for_update(void);
+extern int depopulate_submodule(const char *path);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-- 
2.10.1.469.g00a8914
[PATCH 11/16] teach unpack_trees() to remove submodule contents
Extend rmdir_or_warn() to remove the directories of those submodules which are
scheduled for removal. Also teach verify_clean_submodule() to check that a
submodule configured to be removed is not modified before scheduling it for
removal.

Signed-off-by: Stefan Beller
---
 unpack-trees.c | 6 ++----
 wrapper.c      | 4 ++++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index ea6bdd2..576e1d5 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -9,6 +9,7 @@
 #include "refs.h"
 #include "attr.h"
 #include "split-index.h"
+#include "submodule.h"
 #include "dir.h"
 
 /*
@@ -1361,15 +1362,12 @@ static void invalidate_ce_path(const struct cache_entry *ce,
 /*
  * Check that checking out ce->sha1 in subdir ce->name is not
  * going to overwrite any working files.
- *
- * Currently, git does not checkout subprojects during a superproject
- * checkout, so it is not going to overwrite anything.
  */
 static int verify_clean_submodule(const struct cache_entry *ce,
 				  enum unpack_trees_error_types error_type,
 				  struct unpack_trees_options *o)
 {
-	return 0;
+	return submodule_is_interesting(ce->name, null_sha1) && is_submodule_modified(ce->name, 0);
 }
 
 static int verify_clean_subdirectory(const struct cache_entry *ce,
diff --git a/wrapper.c b/wrapper.c
index e7f1979..17c08de 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -2,6 +2,7 @@
  * Various trivial helper wrappers around standard functions
  */
 #include "cache.h"
+#include "submodule.h"
 
 static void do_nothing(size_t size)
 {
@@ -592,6 +593,9 @@ int unlink_or_warn(const char *file)
 
 int rmdir_or_warn(const char *file)
 {
+	if (submodule_is_interesting(file, null_sha1)
+	    && depopulate_submodule(file))
+		return -1;
 	return warn_if_unremovable("rmdir", file, rmdir(file));
 }
-- 
2.10.1.469.g00a8914
[PATCH 12/16] entry: write_entry to write populate submodules
Signed-off-by: Stefan Beller
---
 entry.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/entry.c b/entry.c
index 019826b..2330b6e 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,7 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "submodule.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -211,6 +212,7 @@ static int write_entry(struct cache_entry *ce,
 			return error("cannot create temporary submodule %s", path);
 		if (mkdir(path, 0777) < 0)
 			return error("cannot create submodule directory %s", path);
+		schedule_submodule_for_update(ce, 1);
 		break;
 	default:
 		return error("unknown file mode for %s in index", path);
-- 
2.10.1.469.g00a8914
[RFC PATCH 00/16] Checkout aware of Submodules!
With current Git, nearly any time you check out a different state of the
project that changes submodules, you have to run "git submodule update"
afterwards. There are two problems with this approach:

* The "submodule update" command is dangerous, as it doesn't check for work
  that may be lost in the submodule (e.g. a dangling commit).
* You may forget to run the command, as checkout is supposed to do all the
  work for you.

Integrate updating the submodules into git checkout, with the same safety
promises that git-checkout has, i.e. it does not throw away data unless asked
to. This is done by first checking whether the submodule is at the same sha1
as recorded in the superproject. If there are changes, we stop the checkout,
just as we do when checking out a file that has local changes.

The integration happens in code that is also used by other commands, so it
will be easier in the future to make other commands aware of submodules. This
also solves d/f conflicts in case you replace a file/directory with a
submodule or vice versa.

The patches are still a bit rough, but the overall series seems promising
enough to me that I want to put it out here. Any review, especially on the
design level, is welcome!
Thanks,
Stefan

Stefan Beller (16):
  submodule.h: add extern keyword to functions, break line before 80
  submodule: modernize ok_to_remove_submodule to use argv_array
  submodule: use absolute path for computing relative path connecting
  update submodules: add is_submodule_populated
  update submodules: add submodule config parsing
  update submodules: add a config option to determine if submodules are
    updated
  update submodules: introduce submodule_is_interesting
  update submodules: add depopulate_submodule
  update submodules: add scheduling to update submodules
  update submodules: is_submodule_checkout_safe
  teach unpack_trees() to remove submodule contents
  entry: write_entry to write populate submodules
  submodule: teach unpack_trees() to update submodules
  checkout: recurse into submodules if asked to
  completion: add '--recurse-submodules' to checkout
  checkout: add config option to recurse into submodules by default

 Documentation/config.txt               |   6 +
 Documentation/git-checkout.txt         |   8 +
 builtin/checkout.c                     |  31 ++-
 cache.h                                |   2 +
 contrib/completion/git-completion.bash |   2 +-
 entry.c                                |  17 +-
 submodule-config.c                     |  22 +++
 submodule-config.h                     |  17 +-
 submodule.c                            | 246 +++++++++++++++++++++---
 submodule.h                            |  77 +++++---
 t/lib-submodule-update.sh              |  10 +-
 t/t2013-checkout-submodule.sh          | 344 ++++++++++++++++++++++++++++++++-
 t/t9902-completion.sh                  |   1 +
 unpack-trees.c                         | 103 ++++++++--
 unpack-trees.h                         |   1 +
 wrapper.c                              |   4 +
 16 files changed, 806 insertions(+), 85 deletions(-)

-- 
2.10.1.469.g00a8914
[PATCH 01/16] submodule.h: add extern keyword to functions, break line before 80
As the upcoming series will add a lot of functions to the submodule header,
it's good to start off from a sane base. So format the header to not exceed
80 characters per line and mark all functions as extern.

Signed-off-by: Stefan Beller
---
 submodule.h | 60 ++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/submodule.h b/submodule.h
index d9e197a..afc58d0 100644
--- a/submodule.h
+++ b/submodule.h
@@ -29,50 +29,58 @@ struct submodule_update_strategy {
 };
 #define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL}
 
-int is_staging_gitmodules_ok(void);
-int update_path_in_gitmodules(const char *oldpath, const char *newpath);
-int remove_path_from_gitmodules(const char *path);
-void stage_updated_gitmodules(void);
-void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
+extern int is_staging_gitmodules_ok(void);
+extern int update_path_in_gitmodules(const char *oldpath, const char *newpath);
+extern int remove_path_from_gitmodules(const char *path);
+extern void stage_updated_gitmodules(void);
+extern void set_diffopt_flags_from_submodule_config(
+		struct diff_options *diffopt,
 		const char *path);
-int submodule_config(const char *var, const char *value, void *cb);
-void gitmodules_config(void);
-int parse_submodule_update_strategy(const char *value,
+extern int submodule_config(const char *var, const char *value, void *cb);
+extern void gitmodules_config(void);
+extern int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
-const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-void handle_ignore_submodules_arg(struct diff_options *diffopt, const char *);
-void show_submodule_summary(FILE *f, const char *path,
+extern const char *submodule_strategy_to_string(
+		const struct submodule_update_strategy *s);
+extern void handle_ignore_submodules_arg(struct diff_options *diffopt,
+		const char *);
+extern void show_submodule_summary(FILE *f, const char *path,
 		const char *line_prefix,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset);
-void show_submodule_inline_diff(FILE *f, const char *path,
+extern void show_submodule_inline_diff(FILE *f, const char *path,
 		const char *line_prefix,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset,
 		const struct diff_options *opt);
-void set_config_fetch_recurse_submodules(int value);
-void check_for_new_submodule_commits(unsigned char new_sha1[20]);
-int fetch_populated_submodules(const struct argv_array *options,
+extern void set_config_fetch_recurse_submodules(int value);
+extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
+extern int fetch_populated_submodules(const struct argv_array *options,
 		const char *prefix, int command_line_option,
 		int quiet, int max_parallel_jobs);
-unsigned is_submodule_modified(const char *path, int ignore_untracked);
-int submodule_uses_gitfile(const char *path);
-int ok_to_remove_submodule(const char *path);
-int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20],
-		const unsigned char a[20], const unsigned char b[20], int search);
-int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name,
-		struct string_list *needs_pushing);
-int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
-void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
-int parallel_submodules(void);
+extern unsigned is_submodule_modified(const char *path, int ignore_untracked);
+extern int submodule_uses_gitfile(const char *path);
+extern int ok_to_remove_submodule(const char *path);
+extern int merge_submodule(unsigned char result[20], const char *path,
+		const unsigned char base[20],
+		const unsigned char a[20],
+		const unsigned char b[20], int search);
+extern int find_unpushed_submodules(unsigned char new_sha1[20],
+		const char *remotes_name,
+		struct string_list *needs_pushing);
+extern int push_unpushed_submodules(unsigned char new_sha1[20],
+		const char *remotes_name);
+extern void connect_work_tree_and_git_dir(const char *work_tree,
+		const char *git_dir);
+extern int parallel_submodules(void);
 
 /*
  * Prepare
[PATCH 06/16] update submodules: add a config option to determine if submodules are updated
In later patches we introduce options and flags for commands that modify the
working directory, e.g. git-checkout. Add a central place to store whether we
want to update submodules.

Signed-off-by: Stefan Beller
---
 submodule.c | 6 ++++++
 submodule.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/submodule.c b/submodule.c
index 97eaf7c..38b0573 100644
--- a/submodule.c
+++ b/submodule.c
@@ -16,6 +16,7 @@
 #include "quote.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
+static int config_update_recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int parallel_jobs = 1;
 static struct string_list changed_submodule_paths = STRING_LIST_INIT_NODUP;
 static int initialized_fetch_ref_tips;
@@ -494,6 +495,11 @@ void set_config_fetch_recurse_submodules(int value)
 	config_fetch_recurse_submodules = value;
 }
 
+void set_config_update_recurse_submodules(int value)
+{
+	config_update_recurse_submodules = value;
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
 		      int flags, void *cb_data)
 {
diff --git a/submodule.h b/submodule.h
index d44b4f1..185ad18 100644
--- a/submodule.h
+++ b/submodule.h
@@ -56,6 +56,7 @@ extern void show_submodule_inline_diff(FILE *f, const char *path,
 		const char *del, const char *add, const char *reset,
 		const struct diff_options *opt);
 extern void set_config_fetch_recurse_submodules(int value);
+extern void set_config_update_recurse_submodules(int value);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-- 
2.10.1.469.g00a8914
Re: [PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote:
> Signed-off-by: Heiko Voigt
> ---
>  submodule.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/submodule.c b/submodule.c
> index e1196fd..29efee9 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -531,6 +531,14 @@ static int submodule_has_commits(const char *path, struct sha1_array *commits)
>  static int submodule_needs_pushing(const char *path, struct sha1_array *commits)
>  {
>  	if (!submodule_has_commits(path, commits))
> +		/* NEEDSWORK: The correct answer here is "We do not

style nit:

/*
 * Usually we prefer comments with both the first and
 * last line of the comment "empty".
 */

/* or just a one liner */

AFAICT these are the only two modes that we prefer in Git. For a discussion of all the other styles, enjoy Linus' guidance. ;)
http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html

> "We do not know" ...

... because there is no way for us to check, as we don't have the submodule commits. "We do consider it safe, as no one in their sane mind would have changed the submodule pointers without having the submodule around. If a user did however change the submodules without having the submodule commits around, this indicates an expert who knows what they were doing."

>		 ... We currently
> +		 * proceed pushing here as if the submodules commits are
> +		 * available on a remote. Since we can not check the
> +		 * remote availability for this submodule we should
> +		 * consider changing this behavior to: Stop here and
> +		 * tell the user how to skip this check if wanted.
> +		 */
>  		return 0;

Thanks for adding the NEEDSWORK; I just wrote the lines above to clarify my thought process, not as a suggestion for change. Overall the series looks good to me; the nits are minor IMHO.

Thanks,
Stefan
Re: [PATCH v3 3/4] batch check whether submodule needs pushing into one call
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote:
> -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> +static int check_has_commit(const unsigned char sha1[20], void *data)
> {
> -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> +	int *has_commit = (int *) data;

nit: just as prior patches ;) void * can be cast implicitly.
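The nit refers to C's implicit conversion from void * to any object
pointer type (unlike C++). A standalone illustration, not git code —
the callback name merely mirrors the one in the patch:

```c
#include <assert.h>

/* Callback in the style used by the patch: context arrives as void *,
 * and C converts it implicitly to the target pointer type, so an
 * explicit (int *) cast would be redundant. */
static int check_has_commit_demo(const unsigned char sha1[20], void *data)
{
	int *has_commit = data;	/* implicit conversion, no cast */

	(void)sha1;		/* unused in this demo */
	*has_commit = 1;
	return 0;
}
```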
Re: [PATCH v3 2/4] serialize collection of refs that contain submodule changes
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote:
> +++ b/submodule.c
> @@ -500,6 +500,13 @@ static int has_remote(const char *refname, const struct object_id *oid,
>  	return 1;
>  }
>
> +static int append_sha1_to_argv(const unsigned char sha1[20], void *data)
> +{
> +	struct argv_array *argv = (struct argv_array *) data;

nit: no explicit cast needed when coming from a void pointer.
Re: [PATCH v3 1/4] serialize collection of changed submodules
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote:
> @@ -560,6 +575,31 @@ static void find_unpushed_submodule_commits(struct commit *commit,
>  	diff_tree_combined_merge(commit, 1, );
>  }
>
> +struct collect_submodule_from_sha1s_data {
> +	char *submodule_path;
> +	struct string_list *needs_pushing;
> +};
> +
> +static int collect_submodules_from_sha1s(const unsigned char sha1[20],
> +					 void *data)
> +{
> +	struct collect_submodule_from_sha1s_data *me =
> +		(struct collect_submodule_from_sha1s_data *) data;

nit: no explicit cast needed when coming from void* ?
Re: [PATCH v15 04/27] bisect--helper: `bisect_clean_state` shell function in C
On 11/15/2016 10:40 PM, Junio C Hamano wrote:
> Stephan Beyer writes:
>
>>> +int bisect_clean_state(void)
>>> +{
>>> +	int result = 0;
>>> +
>>> +	/* There may be some refs packed during bisection */
>>> +	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
>>> +	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
>>> +	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));
>>> +	result = delete_refs(&refs_for_removal, REF_NODEREF);
>>> +	refs_for_removal.strdup_strings = 1;
>>> +	string_list_clear(&refs_for_removal, 0);
>>
>> Does it have advantages to populate a list (with duplicated strings),
>> hand it to delete_refs(), and clear the list (and strings), instead of
>> just doing a single delete_ref() (or whatever name the singular function
>> has) in the callback?
>
> Depending on ref backends, removing multiple refs may be a lot more
> efficient than calling a single ref removal for the same set of
> refs, and the comment upfront I think hints that the code was
> written in the way exactly with that in mind.  Removing N refs from
> a packed refs file will involve a loop that runs N times, each
> iteration loading the file, locating an entry among possibly 100s of
> refs to remove, and then rewriting the file.

Great, that's the reply I wanted to hear (and that I've considered
but wasn't sure of) ;)

[I did not want to dig into the sources and check if delete_refs()
does something smarter than invoking delete_ref() on each item of the
list.]

~Stephan
Re: [PATCH v15 01/27] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL
Hi,

On 10/27/2016 06:59 PM, Junio C Hamano wrote:
> Does any of you (and others on the list) have time and inclination
> to review this series?

Me, currently. ;)

Besides the things I'm mentioning in respective patch e-mails, I
wonder why several bisect--helper commands are prefixed by "bisect";
I'm talking about:

	git bisect--helper --bisect-clean-state
	git bisect--helper --bisect-reset
	git bisect--helper --bisect-write
	git bisect--helper --bisect-check-and-set-terms
	git bisect--helper --bisect-next-check
	git bisect--helper --bisect-terms
	git bisect--helper --bisect-start

etc. instead of

	git bisect--helper --clean-state
	git bisect--helper --reset
	git bisect--helper --write
	git bisect--helper --check-and-set-terms
	git bisect--helper --next-check
	git bisect--helper --terms
	git bisect--helper --start

etc.

Well, I know *why* they have these names: because the shell function
names are simply reused. But I don't know why these prefixes are kept
in the bisect--helper command options. On the other hand, these
command names are not exposed to the user and may hence not be that
important.(?)

~Stephan
Re: [PATCH v15 04/27] bisect--helper: `bisect_clean_state` shell function in C
Stephan Beyer writes:

>> +int bisect_clean_state(void)
>> +{
>> +	int result = 0;
>> +
>> +	/* There may be some refs packed during bisection */
>> +	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
>> +	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
>> +	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));
>> +	result = delete_refs(&refs_for_removal, REF_NODEREF);
>> +	refs_for_removal.strdup_strings = 1;
>> +	string_list_clear(&refs_for_removal, 0);
>
> Does it have advantages to populate a list (with duplicated strings),
> hand it to delete_refs(), and clear the list (and strings), instead of
> just doing a single delete_ref() (or whatever name the singular function
> has) in the callback?

Depending on ref backends, removing multiple refs may be a lot more
efficient than calling a single ref removal for the same set of refs,
and the comment upfront I think hints that the code was written in
the way exactly with that in mind.  Removing N refs from a packed
refs file will involve a loop that runs N times, each iteration
loading the file, locating an entry among possibly 100s of refs to
remove, and then rewriting the file.

Besides, it is bad taste to delete each individual item being
iterated over in an iterator in general, isn't it?
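Junio's cost argument can be made concrete with a toy model (not git
code; the function names are illustrative only): deleting n refs one
at a time reloads and scans the remaining entries on every iteration,
while a batched delete_refs() makes a single pass over the file.

```c
/* Toy cost model: entry visits needed to delete n refs from a
 * packed-refs file holding m entries. */
static long cost_one_by_one(long n, long m)
{
	long visits = 0, i;

	/* each single-ref deletion reloads the file and scans what
	 * is left of it before rewriting */
	for (i = 0; i < n; i++)
		visits += m - i;
	return visits;
}

static long cost_batched(long n, long m)
{
	(void)n;
	return m;	/* one pass, one rewrite, for the whole batch */
}
```

For n = 100 refs among m = 1000 entries the one-by-one loop touches
95,050 entries versus 1,000 for the batch.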
Re: [PATCH v15 04/27] bisect--helper: `bisect_clean_state` shell function in C
Hi,

On 10/14/2016 04:14 PM, Pranit Bauva wrote:
> diff --git a/bisect.c b/bisect.c
> index 6f512c2..45d598d 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -1040,3 +1046,40 @@ int estimate_bisect_steps(int all)
>
>  	return (e < 3 * x) ? n : n - 1;
>  }
> +
> +static int mark_for_removal(const char *refname, const struct object_id *oid,
> +			    int flag, void *cb_data)
> +{
> +	struct string_list *refs = cb_data;
> +	char *ref = xstrfmt("refs/bisect%s", refname);
> +	string_list_append(refs, ref);
> +	return 0;
> +}
> +
> +int bisect_clean_state(void)
> +{
> +	int result = 0;
> +
> +	/* There may be some refs packed during bisection */
> +	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
> +	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
> +	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));
> +	result = delete_refs(&refs_for_removal, REF_NODEREF);
> +	refs_for_removal.strdup_strings = 1;
> +	string_list_clear(&refs_for_removal, 0);

Does it have advantages to populate a list (with duplicated strings),
hand it to delete_refs(), and clear the list (and strings), instead
of just doing a single delete_ref() (or whatever name the singular
function has) in the callback?

I am only seeing the disadvantage: a list with duped strings.

> +	unlink_or_warn(git_path_bisect_expected_rev());
[...]
> +	unlink_or_warn(git_path_bisect_start());

Comparing it with the original shell code (which uses "rm -f"), I was
wondering a little after reading the function name unlink_or_warn()
here... Looking it up helped: despite its name, unlink_or_warn() is
really the function you have to use here ;D

> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
> index 65cf519..4254d61 100644
> --- a/builtin/bisect--helper.c
> +++ b/builtin/bisect--helper.c
> @@ -5,10 +5,15 @@
>  #include "refs.h"
>
>  static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
> +static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
> +static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
> +static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
> +static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")

This is perhaps a non-issue, but you do not need any of these new
path functions in *this* patch. I think nobody really cares, though,
because you will need them later.

Cheers
Stephan
Re* [PATCH v7 00/17] port branch.c to use ref-filter's printing options
Junio C Hamano writes:

> Something like this needs to go before that step.

This time with a log message and an additional test.  The second
paragraph of the proposed log message is written expecting that this
patch will come before your "branch: use ref-filter printing APIs",
which I think is a good place to avoid breakage in the series.

-- >8 --
Subject: [PATCH] for-each-ref: do not segv with %(HEAD) on an unborn branch

The code to flip between "*" and " " prefixes depending on what
branch is checked out used in --format='%(HEAD)' did not consider
that HEAD may resolve to an unborn branch and dereferenced a NULL.

This will become a lot easier to trigger as the codepath will now be
shared with "git branch [--list]".

Signed-off-by: Junio C Hamano
---
 ref-filter.c             | 2 +-
 t/t3203-branch-output.sh | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index 944671af5a..c71d7360d2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1318,7 +1318,7 @@ static void populate_value(struct ref_array_item *ref)
 			head = resolve_ref_unsafe("HEAD", RESOLVE_REF_READING,
 						  sha1, NULL);
-			if (!strcmp(ref->refname, head))
+			if (head && !strcmp(ref->refname, head))
 				v->s = "*";
 			else
 				v->s = " ";
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index d8edaf27e9..1a8dbca8c8 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -37,6 +37,14 @@ test_expect_success 'git branch --list shows local branches' '
 	test_cmp expect actual
 '
 
+test_expect_success 'same, but on an unborn branch' '
+	test_when_finished "git checkout master" &&
+	git checkout --orphan naster &&
+	git branch --list >actual &&
+	sed -e "s/\* / /" expect >expect-unborn &&
+	test_cmp expect-unborn actual
+'
+
 cat >expect <<'EOF'
 branch-one
 branch-two
-- 
2.11.0-rc1-154-gcd2a643dcd
Re: [PATCH] worktree: fix a sparse 'Using plain integer as NULL pointer' warning
On 15/11/16 20:28, Ramsay Jones wrote:
>
> Signed-off-by: Ramsay Jones
> ---
>
> Hi Duy,
>
> If you need to re-roll your 'nd/worktree-move' branch, could you
> please squash this into the relevant patch [commit c49e92f5c
> ("worktree move: refuse to move worktrees with submodules", 12-11-2016)].
>
> Also, one of the new tests introduced by commit 31a8f3066 ("worktree move:
> new command", 12-11-2016), fails for me, thus:
>
> $ ./t2028-worktree-move.sh -i -v
> ...
> --- expected	2016-11-15 20:22:50.647241458 +
> +++ actual	2016-11-15 20:22:50.647241458 +
> @@ -1,3 +1,3 @@
>  worktree /home/ramsay/git/t/trash directory.t2028-worktree-move
> -worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/destination
>  worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/elsewhere
> +worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/destination
> not ok 12 - move worktree
> #
> #	git worktree move source destination &&
> #	test_path_is_missing source &&
> #	git worktree list --porcelain | grep "^worktree" >actual &&
> #	cat <<-EOF >expected &&
> #	worktree $TRASH_DIRECTORY
> #	worktree $TRASH_DIRECTORY/destination
> #	worktree $TRASH_DIRECTORY/elsewhere
> #	EOF
> #	test_cmp expected actual &&
> #	git -C destination log --format=%s >actual2 &&
> #	echo init >expected2 &&
> #	test_cmp expected2 actual2
> #
> $
>
> Is there an expectation that the submodules will be listed in

Er, that should read 'worktrees', of course! :(

ATB,
Ramsay Jones
Re: [PATCH v7 00/17] port branch.c to use ref-filter's printing options
Karthik Nayak writes:

> This is part of unification of the commands 'git tag -l, git branch -l
> and git for-each-ref'. This ports over branch.c to use ref-filter's
> printing options.
>
> Karthik Nayak (17):
>   ref-filter: implement %(if), %(then), and %(else) atoms
>   ref-filter: include reference to 'used_atom' within 'atom_value'
>   ref-filter: implement %(if:equals=<string>) and
>     %(if:notequals=<string>)
>   ref-filter: modify "%(objectname:short)" to take length
>   ref-filter: move get_head_description() from branch.c
>   ref-filter: introduce format_ref_array_item()
>   ref-filter: make %(upstream:track) prints "[gone]" for invalid
>     upstreams
>   ref-filter: add support for %(upstream:track,nobracket)
>   ref-filter: make "%(symref)" atom work with the ':short' modifier
>   ref-filter: introduce refname_atom_parser_internal()
>   ref-filter: introduce symref_atom_parser() and refname_atom_parser()
>   ref-filter: make remote_ref_atom_parser() use
>     refname_atom_parser_internal()
>   ref-filter: add `:dir` and `:base` options for ref printing atoms
>   ref-filter: allow porcelain to translate messages in the output
>   branch, tag: use porcelain output
>   branch: use ref-filter printing APIs
>   branch: implement '--format' option

This is not a new issue, but --format='%(HEAD)' you stole from
for-each-ref is broken when you are on an unborn branch, and the
second patch from the tip makes "git branch" (no other args) on an
unborn branch segfault, when there are real branches that have
commits.

Something like this needs to go before that step.

 ref-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index 944671af5a..c71d7360d2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1318,7 +1318,7 @@ static void populate_value(struct ref_array_item *ref)
 			head = resolve_ref_unsafe("HEAD", RESOLVE_REF_READING,
 						  sha1, NULL);
-			if (!strcmp(ref->refname, head))
+			if (head && !strcmp(ref->refname, head))
 				v->s = "*";
 			else
 				v->s = " ";
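The pattern of the fix, in isolation: resolve_ref_unsafe() may return
NULL on an unborn branch, so the comparison has to guard before
calling strcmp(). A minimal self-contained sketch (the helper name is
hypothetical, not the actual ref-filter code):

```c
#include <string.h>

/* Return the "git branch" prefix for refname given the resolved HEAD,
 * which may be NULL when HEAD points at an unborn branch. */
static const char *head_marker(const char *refname, const char *head)
{
	return (head && !strcmp(refname, head)) ? "*" : " ";
}
```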
[PATCH] worktree: fix a sparse 'Using plain integer as NULL pointer' warning
Signed-off-by: Ramsay Jones
---

Hi Duy,

If you need to re-roll your 'nd/worktree-move' branch, could you
please squash this into the relevant patch [commit c49e92f5c
("worktree move: refuse to move worktrees with submodules", 12-11-2016)].

Also, one of the new tests introduced by commit 31a8f3066 ("worktree
move: new command", 12-11-2016), fails for me, thus:

$ ./t2028-worktree-move.sh -i -v
...
--- expected	2016-11-15 20:22:50.647241458 +
+++ actual	2016-11-15 20:22:50.647241458 +
@@ -1,3 +1,3 @@
 worktree /home/ramsay/git/t/trash directory.t2028-worktree-move
-worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/destination
 worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/elsewhere
+worktree /home/ramsay/git/t/trash directory.t2028-worktree-move/destination
not ok 12 - move worktree
#
#	git worktree move source destination &&
#	test_path_is_missing source &&
#	git worktree list --porcelain | grep "^worktree" >actual &&
#	cat <<-EOF >expected &&
#	worktree $TRASH_DIRECTORY
#	worktree $TRASH_DIRECTORY/destination
#	worktree $TRASH_DIRECTORY/elsewhere
#	EOF
#	test_cmp expected actual &&
#	git -C destination log --format=%s >actual2 &&
#	echo init >expected2 &&
#	test_cmp expected2 actual2
#
$

Is there an expectation that the submodules will be listed in any
particular order by 'git worktree list --porcelain' ?

Thanks!

ATB,
Ramsay Jones

 builtin/worktree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index e738142..abdf462 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -526,7 +526,7 @@ static int unlock_worktree(int ac, const char **av, const char *prefix)
 
 static void validate_no_submodules(const struct worktree *wt)
 {
-	struct index_state istate = {0};
+	struct index_state istate = { NULL };
 	int i, found_submodules = 0;
 
 	if (read_index_from(&istate, worktree_git_path(wt, "index")) > 0) {
-- 
2.10.0
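For context on the warning: sparse complains when a literal 0
initializes a pointer member, which is what `= {0}` does to a struct
whose first member is a pointer. A self-contained illustration with a
hypothetical struct (not git's actual index_state):

```c
#include <stddef.h>

/* Stand-in for a struct whose first member is a pointer; with `= {0}`
 * sparse reports "Using plain integer as NULL pointer" for .entries. */
struct toy_state {
	void *entries;
	int nr;
};

static struct toy_state make_toy_state(void)
{
	struct toy_state s = { NULL };	/* what the patch switches to */

	/* a designated initializer `{ .entries = NULL }` also works
	 * and stays correct if the members are ever reordered */
	return s;
}
```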
Re: Protecting old temporary objects being reused from concurrent "git gc"?
Jeff King writes:

> I suspect the issue is that read-tree populates the cache-tree index
> extension, and then write-tree omits the object write before it even
> gets to write_sha1_file(). The solution is that it should probably be
> calling one of the freshen() functions (possibly just replacing
> has_sha1_file() with check_and_freshen(), but I haven't looked).

I think the final writing always happens via write_sha1_file(), but
an earlier cache-tree update that says "if we have a tree object
already, then use it, otherwise even though we know the object name
for this subtree, do not record it in the cache-tree" codepath may
decide to record the subtree's sha1 without freshening the referent.

A fix may look like this.

 cache-tree.c | 2 +-
 cache.h      | 1 +
 sha1_file.c  | 9 +++++++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 345ea35963..3ae6d056b4 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -401,7 +401,7 @@ static int update_one(struct cache_tree *it,
 	if (repair) {
 		unsigned char sha1[20];
 		hash_sha1_file(buffer.buf, buffer.len, tree_type, sha1);
-		if (has_sha1_file(sha1))
+		if (freshen_object(sha1))
 			hashcpy(it->sha1, sha1);
 		else
 			to_invalidate = 1;
diff --git a/cache.h b/cache.h
index 5cdea6833e..1f5694f308 100644
--- a/cache.h
+++ b/cache.h
@@ -1126,6 +1126,7 @@ extern int sha1_object_info(const unsigned char *, unsigned long *);
 extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
 extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
 extern int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type, unsigned char *sha1, unsigned flags);
+extern int freshen_object(const unsigned char *);
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern int git_open_cloexec(const char *name, int flags);
diff --git a/sha1_file.c b/sha1_file.c
index e030805497..1daeb05dcd 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,11 @@ static int freshen_packed_object(const unsigned char *sha1)
 	return 1;
 }
 
+int freshen_object(const unsigned char *sha1)
+{
+	return freshen_packed_object(sha1) || freshen_loose_object(sha1);
+}
+
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1)
 {
 	char hdr[32];
@@ -3284,7 +3289,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
-	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
+	if (freshen_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
@@ -3302,7 +3307,7 @@ int hash_sha1_file_literally(const void *buf, unsigned long len, const char *typ
 	if (!(flags & HASH_WRITE_OBJECT))
 		goto cleanup;
-	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
+	if (freshen_object(sha1))
 		goto cleanup;
 	status = write_loose_object(sha1, header, hdrlen, buf, len, 0);
Re: [PATCH 1/2] push: --dry-run updates submodules when --recurse-submodules=on-demand
On 15.11.2016 at 18:29, Brandon Williams wrote:
> I'm assuming the reason we want to avoid sub-shells is for
> performance reasons right?

Yes, every fork() saved is a win on Windows. (No pun intended ;)

-- Hannes
Re: ignore blank line removals
Yes, that makes sense. I was not aware of custom merge drivers, but
indeed that may address my situation. I'll look into it. Thanks!

On Tue, Nov 15, 2016 at 12:51 PM, Junio C Hamano wrote:
> John Rood writes:
>
>> On Thu, Nov 3, 2016 at 10:57 AM, John Rood wrote:
>>> If the contents of a file initially are:
>>> one
>>>
>>> three
>>> and on branch A there is a commit, removing the blank line:
>>> one
>>> three
>>> and on branch B there is a commit, adding 'two':
>>> one
>>> two
>>> three
>>> Normally, if you try to merge A into B (or B into A), git recognizes a
>>> decision needs to be made between removing the second line or adding
>>> "two" to the second line. It would be convenient to have a merge
>>> strategy that defaults to the latter in cases where the removed line
>>> was a blank line (or a line containing only whitespace) ...something
>>> like -Xignore-blank-line-removals.
>>
>> Is there any push-back on this, or is there a backlog that we can add
>> this feature to?
>
> If you mean by "push-back" objections that say "this feature is evil
> and should not be added to Git, ever", I do not think we saw any on
> the list.  The lack of response is most likely that everybody
> thought "Meh." aka "It is not useful/interesting/valuable enough
> feature to bother discussing."
>
> One thing I wondered was what you would want if the contents were
> one/three without blank, A added blank between the two and B
> replaced blank with two.  As your example shows, in the filetype you
> are dealing with, a blank line has no significant meaning (otherwise
> you won't be ignoring the change A made to remove the blank in your
> original example).  The outcome desired by you may be one/two/three
> without any blank in that case because of that.  Which would lead to
> the suspicion that ignore-blank-line-removals is not a good general
> feature (i.e. in this other example, you want to ignore blank line
> addition).  Which further leads to the suspicion that the desire you
> expressed in the original post is not well thought through to be a
> useful specification to build anything out of (yet), but is merely
> a potentially interesting discussion starter.  And nobody so far
> found it interesting enough to spend time discussing it further to
> make the wish detailed enough to be called a "design" of a new
> feature.
>
> Having said all that.
>
> I suspect that you may not have to make any change to Git to do what
> you want; can't you just use the 'merge' attribute with a custom
> 3-way merge driver that removes an empty line?
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, 2016-11-15 at 12:40 -0500, Jeff King wrote:
> On Tue, Nov 15, 2016 at 12:33:04PM -0500, Matt McCutchen wrote:
>
> > On Tue, 2016-11-15 at 12:06 -0500, Jeff King wrote:
> > >
> > > - when an object write is optimized out because we already have the
> > >   object, git will update the mtime on the file (loose object or
> > >   packfile) to freshen it
> >
> > FWIW, I am not seeing this happen when I do "git read-tree --prefix"
> > followed by "git write-tree" using the current master (3ab2281).  See
> > the attached test script.
>
> The optimization I'm thinking about is the one from write_sha1_file(),
> which learned to freshen in 33d4221c7 (write_sha1_file: freshen existing
> objects, 2014-10-15).
>
> I suspect the issue is that read-tree populates the cache-tree index
> extension, and then write-tree omits the object write before it even
> gets to write_sha1_file(). The solution is that it should probably be
> calling one of the freshen() functions (possibly just replacing
> has_sha1_file() with check_and_freshen(), but I haven't looked).
>
> I'd definitely welcome patches in this area.

Cool, it's nice to have an idea of what's going on.  I don't think
I'm going to try to fix it myself though.

By the way, thanks for the fast response to my original question!

Matt
[PATCH] git-gc.txt: expand discussion of races with other processes
In general, "git gc" may delete objects that another concurrent
process is using but hasn't created a reference to. Git has some
mitigations, but they fall short of a complete solution. Document
this in the git-gc(1) man page and add a reference from the
documentation of the gc.pruneExpire config variable.

Based on a write-up by Jeff King:

http://marc.info/?l=git&m=147922960131779&w=2

Signed-off-by: Matt McCutchen
---
 Documentation/config.txt |  4 +++-
 Documentation/git-gc.txt | 34 ++++++++++++++++++++++++++++--------
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 21fdddf..3f1d931 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1409,7 +1409,9 @@ gc.pruneExpire::
 	Override the grace period with this config variable.  The value
 	"now" may be used to disable this grace period and always prune
 	unreachable objects immediately, or "never" may be used to
-	suppress pruning.
+	suppress pruning.  This feature helps prevent corruption when
+	'git gc' runs concurrently with another process writing to the
+	repository; see the "NOTES" section of linkgit:git-gc[1].
 
 gc.worktreePruneExpire::
 	When 'git gc' is run, it calls
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index bed60f4..852b72c 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -63,11 +63,10 @@ automatic consolidation of packs.
 --prune=<date>::
 	Prune loose objects older than date (default is 2 weeks ago,
 	overridable by the config variable `gc.pruneExpire`).
-	--prune=all prunes loose objects regardless of their age (do
-	not use --prune=all unless you know exactly what you are doing.
-	Unless the repository is quiescent, you will lose newly created
-	objects that haven't been anchored with the refs and end up
-	corrupting your repository).  --prune is on by default.
+	--prune=all prunes loose objects regardless of their age and
+	increases the risk of corruption if another process is writing to
+	the repository concurrently; see "NOTES" below.  --prune is on by
+	default.
 
 --no-prune::
 	Do not prune any loose objects.
@@ -138,17 +137,36 @@ default is "2 weeks ago".
 
 Notes
 -----
 
-'git gc' tries very hard to be safe about the garbage it collects.  In
+'git gc' tries very hard not to delete objects that are referenced
+anywhere in your repository.  In
 particular, it will keep not only objects referenced by your current
 set of branches and tags, but also objects referenced by the index,
 remote-tracking branches, refs saved by 'git filter-branch' in
 refs/original/, or reflogs (which may reference commits in branches
 that were later amended or rewound).
-
-If you are expecting some objects to be collected and they aren't, check
+If you are expecting some objects to be deleted and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.
 
+On the other hand, when 'git gc' runs concurrently with another process,
+there is a risk of it deleting an object that the other process is using
+but hasn't created a reference to. This may just cause the other process
+to fail or may corrupt the repository if the other process later adds a
+reference to the deleted object. Git has two features that significantly
+mitigate this problem:
+
+. Any object with modification time newer than the `--prune` date is kept,
+  along with everything reachable from it.
+
+. Most operations that add an object to the database update the
+  modification time of the object if it is already present so that #1
+  applies.
+
+However, these features fall short of a complete solution, so users who
+run commands concurrently have to live with some risk of corruption (which
+seems to be low in practice) unless they turn off automatic garbage
+collection with 'git config gc.auto 0'.
+
 HOOKS
 -----
-- 
2.7.4
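The two knobs the patch documents can be set in a repository's config;
the value shown for pruneExpire is the documented default grace period,
and auto = 0 is the opt-out the last paragraph mentions (a sketch, not
a recommendation):

```
[gc]
	# grace period before unreachable loose objects are pruned
	# (mitigation #1 above); "now" disables the grace period,
	# "never" suppresses pruning entirely
	pruneExpire = 2.weeks.ago
	# turn off automatic garbage collection when other processes
	# routinely write to the repository concurrently
	auto = 0
```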
Re: ignore blank line removals
John Rood writes:

> On Thu, Nov 3, 2016 at 10:57 AM, John Rood wrote:
>> If the contents of a file initially are:
>> one
>>
>> three
>> and on branch A there is a commit, removing the blank line:
>> one
>> three
>> and on branch B there is a commit, adding 'two':
>> one
>> two
>> three
>> Normally, if you try to merge A into B (or B into A), git recognizes a
>> decision needs to be made between removing the second line or adding
>> "two" to the second line. It would be convenient to have a merge
>> strategy that defaults to the latter in cases where the removed line
>> was a blank line (or a line containing only whitespace) ...something
>> like -Xignore-blank-line-removals.
>
> Is there any push-back on this, or is there a backlog that we can add
> this feature to?

If you mean by "push-back" objections that say "this feature is evil
and should not be added to Git, ever", I do not think we saw any on
the list.  The lack of response is most likely that everybody thought
"Meh." aka "It is not useful/interesting/valuable enough feature to
bother discussing."

One thing I wondered was what you would want if the contents were
one/three without blank, A added blank between the two and B replaced
blank with two.  As your example shows, in the filetype you are
dealing with, a blank line has no significant meaning (otherwise you
won't be ignoring the change A made to remove the blank in your
original example).  The outcome desired by you may be one/two/three
without any blank in that case because of that.  Which would lead to
the suspicion that ignore-blank-line-removals is not a good general
feature (i.e. in this other example, you want to ignore blank line
addition).  Which further leads to the suspicion that the desire you
expressed in the original post is not well thought through to be a
useful specification to build anything out of (yet), but is merely a
potentially interesting discussion starter.  And nobody so far found
it interesting enough to spend time discussing it further to make the
wish detailed enough to be called a "design" of a new feature.

Having said all that.

I suspect that you may not have to make any change to Git to do what
you want; can't you just use the 'merge' attribute with a custom
3-way merge driver that removes an empty line?
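Junio's closing suggestion, sketched as configuration: route the
affected paths through a custom 3-way merge driver. The driver name,
pattern, and script here are hypothetical; %O, %A and %B are the
ancestor, current and other versions as documented in gitattributes(5),
and the driver is expected to leave its result in %A and exit non-zero
on conflict.

```
# .gitattributes
*.txt merge=ignoreblank

# $GIT_DIR/config (or ~/.gitconfig)
[merge "ignoreblank"]
	name = 3-way merge ignoring blank-line removals
	driver = ignore-blank-merge %O %A %B
```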
Re: RFC: Enable delayed responses to Git clean/smudge filter requests
Lars Schneider wrote:
>
> On 15 Nov 2016, at 02:03, Eric Wong wrote:
>
> > Anyways, I'll plan on doing something similar (in Perl) with the
> > synchronous parts of public-inbox which relies on "cat-file --batch"
> > at some point... (my rotational disks are slow :<)
>
> That sounds interesting! What changes do you have in mind for
> "cat-file --batch"? We are thinking about performance improvements
> in that area, too. I would be happy to help reviewing your patches!

I'm not touching "cat-file --batch" itself, just the Perl code that
uses it. (still trying my best to avoid working on C or any compiled
languages); There'll probably be a Perl queue mechanism so there's
still only one cat-file process running per-inbox to avoid overloading
on seeks; but the Perl daemons should be able to handle network duties
and other in-memory operations while it waits for cat-file.
Re: gitweb html validation
Raphaël Gertz wrote:
> Hi,
>
> There is a small bug in gitweb html validation; you need the following
> patch to pass the w3c check with the searchbox enabled.
>
> The problem lies in the input directly embedded inside a form without
> a block-level wrapper, which is not valid.

I agree this is a small bug. Only block level elements are allowed to
be inside form tags, according to
https://www.w3.org/2010/04/xhtml10-strict.html#elem_form

> Best regards
>
> The following patch fixes the issue for git-2.10.2:
> --- /usr/share/gitweb/gitweb.cgi.orig	2016-11-15 15:37:21.149805026 +0100
> +++ /usr/share/gitweb/gitweb.cgi	2016-11-15 15:37:48.579240429 +0100
> @@ -5518,6 +5518,7 @@ sub git_project_search_form {
>
>  	print "<div class=\"projsearch\">\n";
>  	print $cgi->start_form(-method => 'get', -action => $my_uri) .
> +	      '<div>' .
>  	      $cgi->hidden(-name => 'a', -value => 'project_list') . "\n";
>  	print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n"
>  		if (defined $project_filter);
> @@ -5529,6 +5530,7 @@ sub git_project_search_form {
>  			     -checked => $search_use_regexp) .
>  	      "\n" .
>  	      $cgi->submit(-name => 'btnS', -value => 'Search') .
> +	      '</div>' .
>  	      $cgi->end_form() . "\n" .
>  	      $cgi->a({-href => href(project => undef, searchtext => undef,
>  	                             project_filter => $project_filter)},

I think it's better to just move the <form> tag outside of the
surrounding div? Something like this perhaps, I didn't test it myself
yet.

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7cf68f07b..33d7c154f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5531,8 +5531,8 @@ sub git_project_search_form {
 		$limit = " in '$project_filter/'";
 	}
 
-	print "<div class=\"projsearch\">\n";
 	print $cgi->start_form(-method => 'get', -action => $my_uri) .
+	      "<div class=\"projsearch\">\n" .
 	      $cgi->hidden(-name => 'a', -value => 'project_list') . "\n";
 	print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n"
 		if (defined $project_filter);
@@ -5544,11 +5544,11 @@ sub git_project_search_form {
 			     -checked => $search_use_regexp) .
 	      "\n" .
 	      $cgi->submit(-name => 'btnS', -value => 'Search') .
-	      $cgi->end_form() . "\n" .
 	      $cgi->a({-href => href(project => undef, searchtext => undef,
 	                             project_filter => $project_filter)},
 	              esc_html("List all projects$limit")) . "\n";
-	print "</div>\n";
+	print "</div>\n" .
+	      $cgi->end_form() . "\n";
 }
 
 # entry for given @keys needs filling if at least one of keys in list
diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index 321260103..507740b6a 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -539,7 +539,7 @@ div.projsearch {
 	margin: 20px 0px;
 }
 
-div.projsearch form {
+form div.projsearch {
 	margin-bottom: 2px;
 }
Re: RFC: Enable delayed responses to Git clean/smudge filter requests
Lars Schneider writes: >> The filter itself would need to be aware of parallelism >> if it lives for multiple objects, right? > > Correct. This way Git doesn't need to deal with threading... I think you need to be careful about three things (at least; there may be more): * Codepaths that check out multiple cache entries do rely on the order of checkout. We checkout removals first to make room so that creation of a path X can succeed if an existing path X/Y that used to want to see X as a directory can succeed (see the use of checkout_entry() by "git checkout", which does have two separate loops to explicitly guarantee this), for example. I think "remove all and then create" you do not specifically have to worry about with the proposed change, but you may need to inspect and verify there aren't other kinds of order dependency. * Done naively, it will lead to unmaintainable code, like this: + struct list_of_cache_entries *list = ...; for (i = 0; i < active_nr; i++) - checkout_entry(active_cache[i], state, NULL); + if (checkout_entry(active_cache[i], state, NULL) == DELAYED) + add_cache_to_queue(&list, active_cache[i]); + while (list) { + wait_for_checkout_to_finish(*list); + list = list->next; + } I do not think we want to see such a rewrite all over the codepaths. It might be OK to add such a "these entries are known to be delayed" list in struct checkout so that the above becomes more like this: for (i = 0; i < active_nr; i++) checkout_entry(active_cache[i], state, NULL); + checkout_entry_finish(state); That is, addition of a single "some of the checkout_entry() calls done so far might have been lazy, and I'll give them a chance to clean up" might be palatable. Anything more than that on the caller side is not. * You'd need to rein in the maximum parallelism somehow, as you do not want to see hundreds of competing filter processes starting only to tell the main loop over an index with hundreds of entries that they are delayed checkouts.
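For context, this is the existing single-shot filter mechanism the thread wants to extend: each clean/smudge invocation runs sequentially and blocks Git. A minimal sketch in a throwaway repo; the "demo" filter name and the tr-based clean command are made up for illustration.

```shell
# Throwaway repo; the "demo" filter driver is an invented example.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email you@example.com
git config user.name you
echo '*.txt filter=demo' >.gitattributes
git config filter.demo.clean 'tr a-z A-Z'   # runs once per file on "git add"
git config filter.demo.smudge cat           # runs once per file on checkout
echo hello >f.txt
git add .gitattributes f.txt
stored=$(git show :f.txt)   # the staged blob went through the clean filter
echo "$stored"
```

Each file triggers one synchronous filter invocation, which is exactly the per-entry blocking that a "DELAYED" return value from checkout_entry() would avoid.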
Re: ignore blank line removals
Is there any push-back on this, or is there a backlog that we can add this feature to? On Thu, Nov 3, 2016 at 10:57 AM, John Rood wrote: > If the contents of a file initially are: > one > > three > and on branch A there is a commit, removing the blank line: > one > three > and on branch B there is a commit, adding 'two': > one > two > three > Normally, if you try to merge A into B (or B into A), git recognizes a > decision needs to be made between removing the second line or adding > "two" to the second line. It would be convenient to have a merge > strategy that defaults to the latter in cases where the removed line > was a blank line (or a line containing only whitespace) ...something > like -Xignore-blank-line-removals.
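The scenario from the report can be reproduced in a throwaway repo; today git stops with a conflict (the proposed -Xignore-blank-line-removals strategy option does not exist).

```shell
# Throwaway repo reproducing the blank-line vs. "two" conflict.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email you@example.com && git config user.name you
printf 'one\n\nthree\n' >f && git add f && git commit -qm base
base=$(git symbolic-ref --short HEAD)
git checkout -qb A && printf 'one\nthree\n' >f && git commit -qam 'remove blank line'
git checkout -q "$base"
git checkout -qb B && printf 'one\ntwo\nthree\n' >f && git commit -qam 'add two'
# both sides touched the same region, so the merge stops
if git merge A >/dev/null 2>&1; then result=merged; else result=conflict; fi
echo "$result"
```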
Re: [PATCH v3 0/4] Speedup finding of unpushed submodules
On 11/15, Stefan Beller wrote: > On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote: > > You can find the second iteration of this series here: > > > > http://public-inbox.org/git/cover.1475851621.git.hvo...@hvoigt.net/ > > > > All mentioned issues should be fixed. I put the NEEDSWORK comment in a > > separate patch since it seemed to me as if we did not fully agree on > > that. So in case we decide against it we can just drop that patch. > > > > Cheers Heiko > > > > +cc Brandon who started building a series on top of yours. I don't think there should be too much I'll have to change with my series; I'll just rebase against these changes once Junio updates his branch. If you want to take a look at my series, it's here: https://public-inbox.org/git/1479172735-698-1-git-send-email-bmw...@google.com/ Thanks for the heads up Stefan. -- Brandon Williams
Re: [PATCH v3 0/4] Speedup finding of unpushed submodules
On Tue, Nov 15, 2016 at 6:56 AM, Heiko Voigt wrote: > You can find the second iteration of this series here: > > http://public-inbox.org/git/cover.1475851621.git.hvo...@hvoigt.net/ > > All mentioned issues should be fixed. I put the NEEDSWORK comment in a > separate patch since it seemed to me as if we did not fully agree on > that. So in case we decide against it we can just drop that patch. > > Cheers Heiko > +cc Brandon who started building a series on top of yours.
Re: [PATCH] remote-curl: don't hang when a server dies before any output
Jeff King writes: > On Mon, Nov 14, 2016 at 05:02:27PM -0800, Junio C Hamano wrote: > >> Jeff King writes: >> >> > Actually, I take it back. I think it works for a single round of ref >> > negotiation, but not for multiple. Enabling GIT_TEST_LONG=1 causes it to >> > fail t5551. >> > >> > I think I've probably made a mis-assumption on exactly when in the HTTP >> > protocol we will see a flush packet (and perhaps that is a sign that >> > this protocol-snooping approach is not a good one). >> >> Hmph. I think I tried David's original under GIT_TEST_LONG and saw >> it got stuck; could be the same issue, I guess. > > It works OK here. I think it is just that the test is really slow (by > design). Yeah, I think what I recalled was my old attempt to run the follow-up "any SHA-1" patch without this one.
Re: [PATCH v3 0/6] recursively grep across submodules
Coverity seems to dislike this part: *** CID 1394367: Null pointer dereferences (NULL_RETURNS) /builtin/grep.c: 625 in grep_submodule() 619 is_submodule_populated(path))) { 620 /* 621 * If searching history, check for the presense of the 622 * submodule's gitdir before skipping the submodule. 623 */ 624 if (sha1) { >>> CID 1394367: Null pointer dereferences (NULL_RETURNS) >>> Dereferencing a null pointer "submodule_from_path(null_sha1, path)". 625 path = git_path("modules/%s", 626 submodule_from_path(null_sha1, path)->name); 627 628 if (!(is_directory(path) && is_git_directory(path))) 629 return 0; 630 } else {
Re: [PATCH v7 13/17] ref-filter: add `:dir` and `:base` options for ref printing atoms
Jacob Keller writes: > dirname makes sense. What about implementing a reverse variant of > strip, which you could perform stripping of right-most components and > instead of stripping by a number, strip "to" a number, ie: keep the > left N most components, and then you could use something like > ... > I think that would be more general purpose than basename, and less confusing? I think you are going in the right direction. I had a similar thought but built around a different axis. I.e. if strip=1 strips one from the left, perhaps we want to have rstrip=1 that strips one from the right, and also strip=-1 to mean strip everything except one from the left and so on? I think this and your keep (and perhaps you'll have rkeep for completeness) have the same expressive power. I do not offhand have a preference one over the other. Somehow it sounds a bit strange to me to treat 'remotes' as the same class of token as 'heads' and 'tags' (I'd expect 'heads' and 'remotes/origin' would be at the same level in end-user's mind), but that is probably an unrelated tangent. The reason this series wants to introduce :base must be to emulate an existing feature, so that existing feature is a concrete counter-example that argues against my "it sounds a bit strange" reaction.
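For reference, the baseline the thread is extending: the :strip option that already exists in for-each-ref strips N path components from the left. A quick illustration in a throwaway repo (the ref name is invented for the example):

```shell
# Throwaway repo demonstrating the existing %(refname:strip=N) option.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m init
git update-ref refs/remotes/origin/topic HEAD
# strip=2 drops "refs/remotes", leaving "origin/topic"
shown=$(git for-each-ref --format='%(refname:strip=2)' refs/remotes)
echo "$shown"
```

The rstrip/negative-count variants debated here would strip from the right (or keep only N components) in the same spirit.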
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, Nov 15, 2016 at 12:33:04PM -0500, Matt McCutchen wrote: > On Tue, 2016-11-15 at 12:06 -0500, Jeff King wrote: > > - when an object write is optimized out because we already have the > > object, git will update the mtime on the file (loose object or > > packfile) to freshen it > > FWIW, I am not seeing this happen when I do "git read-tree --prefix" > followed by "git write-tree" using the current master (3ab2281). See > the attached test script. The optimization I'm thinking about is the one from write_sha1_file(), which learned to freshen in 33d4221c7 (write_sha1_file: freshen existing objects, 2014-10-15). I suspect the issue is that read-tree populates the cache-tree index extension, and then write-tree omits the object write before it even gets to write_sha1_file(). The solution is that it should probably be calling one of the freshen() functions (possibly just replacing has_sha1_file() with check_and_freshen(), but I haven't looked). I'd definitely welcome patches in this area. > OK. I'll write a patch to add a summary of this information to the > git-gc man page. Sounds like a good idea. Thanks. -Peff
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, 2016-11-15 at 12:06 -0500, Jeff King wrote: > - when an object write is optimized out because we already have the > object, git will update the mtime on the file (loose object or > packfile) to freshen it FWIW, I am not seeing this happen when I do "git read-tree --prefix" followed by "git write-tree" using the current master (3ab2281). See the attached test script. > If you have long-running data (like, a temporary index file that might > literally sit around for days or weeks) I think that is a potential > problem. And the solution is probably to use refs in some way to point > to your objects. Agreed. This is not my current scenario. > If you're worried about a short-term operation where > somebody happens to run git-gc concurrently, I agree it's a possible > problem, but I suspect something you can ignore in practice. > > For the most part, a lot of the client-side git tools assume that one > operation is happening at a time in the repository. And I think that > largely holds for a developer working on a single clone, and things just > work in practice. > > Auto-gc makes that a little sketchier, but historically does not seem to > have really caused problems in practice. OK. I'll write a patch to add a summary of this information to the git-gc man page. Matt test-git-read-tree-write-tree-touch-object.sh Description: application/shellscript
Re: [PATCH 1/2] push: --dry-run updates submodules when --recurse-submodules=on-demand
On 11/15, Johannes Sixt wrote: > Am 15.11.2016 um 02:18 schrieb Brandon Williams: > >diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh > >index 198ce84..e6ccc30 100755 > >--- a/t/t5531-deep-submodule-push.sh > >+++ b/t/t5531-deep-submodule-push.sh > >@@ -427,7 +427,31 @@ test_expect_success 'push unpushable submodule > >recursively fails' ' > > cd submodule.git && > > git rev-parse master >../actual > > ) && > >-test_cmp expected actual > >+test_cmp expected actual && > >+git -C work reset --hard master^ > > This line looks like a clean-up to be done after the test case. You > should wrap it in test_when_finished, but outside of a sub-shell, > which looks like it's just one line earlier, before the test_cmp. K will do. > > >+' > >+ > >+test_expect_failure 'push --dry-run does not recursively update submodules' > >' > >+( > >+cd work && > >+( > >+cd gar/bage && > >+git checkout master && > >+git rev-parse master >../../../expected_submodule && > >+> junk9 && > >+git add junk9 && > >+git commit -m "Ninth junk" > >+) && > > Could you please avoid this nested sub-shell? It is fine to cd > around when you are in a sub-shell. Yes I can reorganize it to avoid the nested sub-shells. I was just trying to follow the organization of the other tests in the same file. > > >+git checkout master && > >+git rev-parse master >../expected_pub > > Broken && chain. > > >+git add gar/bage && > >+git commit -m "Ninth commit for gar/bage" && > >+git push --dry-run --recurse-submodules=on-demand ../pub.git > >master > >+) && > >+git -C submodule.git rev-parse master >actual_submodule && > >+git -C pub.git rev-parse master >actual_pub && > > All of the commands above are 'git something' that could become 'git > -C work something' and then the sub-shell would be unnecessary. I'm > not sure I would appreciate the verbosity of the result, though. > (Perhaps aligning the git subcommands after -C foo would help.) 
I'll play around with it and try to make it look pretty while trying to avoid sub-shells. I'm assuming the reason we want to avoid sub-shells is for performance reasons, right? -- Brandon Williams
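The review's suggested alternative, `git -C <dir>`, gives the same result as a sub-shell with cd without forking one. A minimal comparison:

```shell
# Throwaway repo: `git -C dir cmd` is equivalent to `(cd dir && git cmd)`.
repo=$(mktemp -d)
git init -q "$repo"
a=$(git -C "$repo" rev-parse --is-inside-work-tree)
b=$( (cd "$repo" && git rev-parse --is-inside-work-tree) )
echo "$a $b"
```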
Re: Protecting old temporary objects being reused from concurrent "git gc"?
On Tue, Nov 15, 2016 at 09:13:14AM -0500, Matt McCutchen wrote: > I want to change this to something that won't leave an inconsistent > state if interrupted. I've written code for this kind of thing before > that sets GIT_INDEX_FILE and uses a temporary index file and "git > write-tree". But I realized that if "git gc" runs concurrently, the > generated tree could be deleted before it is used and the tool would > fail. If I had a need to run "git commit-tree", it seems like I might > even end up with a commit object with a broken reference to a tree. > "git gc" normally doesn't delete objects that were created in the last > 2 weeks, but if an identical tree was added to the object database more > than 2 weeks ago by another operation and is unreferenced, it could be > reused without updating its mtime and it could still get deleted. Modern versions of git do two things to help with this: - any object which is referenced by a "recent" object (within the 2 weeks) is also considered recent. So if you create a new commit object that points to a tree, even before you reference the commit that tree is protected - when an object write is optimized out because we already have the object, git will update the mtime on the file (loose object or packfile) to freshen it This isn't perfect, though. You can decide to reference an existing object just as it is being deleted. And the pruning process itself is not atomic (and it's tricky to make it so, just because of what we're promised by the filesystem). > Is there a recommended way to avoid this kind of problem in add-on > tools? (I searched the Git documentation and the web for information > about races with "git gc" and didn't find anything useful.) If not, it > seems to be a significant design flaw in "git gc", even if the problem > is extremely rare in practice. I wonder if some of the built-in > commands may have the same problem, though I haven't tried to test > them. 
> If this is confirmed to be a known problem affecting built-in > commands, then at least I won't feel bad about introducing the > same problem into add-on tools. :/ If you have long-running data (like, a temporary index file that might literally sit around for days or weeks) I think that is a potential problem. And the solution is probably to use refs in some way to point to your objects. If you're worried about a short-term operation where somebody happens to run git-gc concurrently, I agree it's a possible problem, but I suspect something you can ignore in practice. For the most part, a lot of the client-side git tools assume that one operation is happening at a time in the repository. And I think that largely holds for a developer working on a single clone, and things just work in practice. Auto-gc makes that a little sketchier, but historically does not seem to have really caused problems in practice. For a busy multi-user server, I recommend turning off auto-gc entirely, and repacking manually with "-k" to be on the safe side. -Peff
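The add-on workflow under discussion looks roughly like this: a temporary index via GIT_INDEX_FILE plus write-tree, which creates a tree object that no ref protects from a concurrent gc once it ages past the two-week grace period. A sketch in a throwaway repo:

```shell
# Throwaway repo: build a tree through a temporary index.  The resulting
# tree object is unreferenced, which is the gc race the thread discusses.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email you@example.com && git config user.name you
echo data >file && git add file && git commit -qm init
export GIT_INDEX_FILE="$PWD/tmp-index"
git read-tree HEAD                 # seed the temp index from HEAD
echo more >file2 && git update-index --add file2
tree=$(git write-tree)             # unreferenced tree object
unset GIT_INDEX_FILE
type=$(git cat-file -t "$tree")
echo "$type"
```

Per the advice above, pointing a (temporary) ref at $tree, or at a commit wrapping it, would protect it from pruning.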
Re: [PATCH] t0021, t5615: use $PWD instead of $(pwd) in PATH-like shell variables
Hi Junio, On Mon, 14 Nov 2016, Junio C Hamano wrote: > Dscho's mention of 'still times out' may be an indication that > something unspecified on 'pu' is not ready to be merged to 'next', > but blocking all of 'pu' with a blanket statement is not useful, > and that was where my response comes from. Until the time when the test suite takes less than the insane three hours to run, I am afraid that a blanket statement on `pu` is the best I can do. Ciao, Johannes
gitweb html validation
Hi, There is a small bug in gitweb html validation: you need the following patch to pass the w3c check with the searchbox enabled. The problem lies in the input directly embedded inside a form without a wrapper, which is not valid. Best regards The following patch fixes the issue for git-2.10.2 : --- /usr/share/gitweb/gitweb.cgi.orig 2016-11-15 15:37:21.149805026 +0100 +++ /usr/share/gitweb/gitweb.cgi 2016-11-15 15:37:48.579240429 +0100 @@ -5518,6 +5518,7 @@ sub git_project_search_form { print "\n"; print $cgi->start_form(-method => 'get', -action => $my_uri) . + ''. $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" if (defined $project_filter); @@ -5529,6 +5530,7 @@ sub git_project_search_form { -checked => $search_use_regexp) . "\n" . $cgi->submit(-name => 'btnS', -value => 'Search') . + ''. $cgi->end_form() . "\n" . $cgi->a({-href => href(project => undef, searchtext => undef, project_filter => $project_filter)},
RE: [PATCH] remote-curl: don't hang when a server dies before any output
> -Original Message- > From: Jeff King [mailto:p...@peff.net] ... > I'll make that change and then try to wrap this up with a commit message. > I plan to steal your tests, if that's OK. Please do!
Re: New to git, need help!
Hey Mayank, On Tue, Nov 15, 2016 at 6:00 PM, Mayank Gupta wrote: > Hi All, > > I'm new to open source and have recently joined this mailing list. > Since I'm new at this, I think I can initially contribute to the > community by fixing some small bugs or errors but as the documentation > is too large, I don't know where to start. > So if anybody could guide me on how to go about it, it would be really > appreciated. It is really nice that you want to start contributing. We have user documentation[1] and developer documentation[2]. To contribute to a feature, you first need to learn about that feature, so go through the man pages of the tools you want to work on. The things that you would be using are documented in the technical documentation. Apart from this, you can start hunting for small errors, but I think that would be difficult in the initial stages. So read this[3]; even though it is specific to GSoC, it is helpful for any new developer. You can also search the mailing list archives for "low hanging fruit" to find things which were thought of but not done, or something like that. [1]: https://github.com/git/git/tree/master/Documentation [2]: https://github.com/git/git/tree/master/Documentation/technical [3]: https://git.github.io/SoC-2016-Microprojects/ Hope to see a patch from your side soon! :) Regards, Pranit Bauva
Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS
On Tue, Nov 15, 2016 at 01:07:18PM +0100, Heiko Voigt wrote: > On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote: > > To all macOS users on the list: > > Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully? > > Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542, > 5550, 5551, 5561, 5812. Failing how? Does apache fail to start up? Do tests fails? What does "-v" say? Is there anything interesting in httpd/error.log in the trash directory? -Peff
Re: Git status takes too long- How to improve the performance of git
On Tue, Nov 15, 2016 at 02:33:12AM -0700, ravalika wrote: > Number of files - 63883 Since you also posted this to the "Git for Windows" mailing list I assume that you are using Windows. Reduce the number of files. For example, split the repository into two: one for documentation and one for source. That's what I did with a converted repository that had too many files. Windows is unfortunately very slow when it comes to handling many files, and if I recall correctly ~3 files was in a nicely handleable range for a Git repository on Windows, but that might have changed... Cheers Heiko
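Besides splitting the repository, a couple of config knobs are commonly suggested for slow `git status` on Windows. These are illustrative, not a fix endorsed in this thread; core.fscache is honored only by Git for Windows.

```shell
# Throwaway repo; setting commonly suggested status-speed knobs.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config core.preloadindex true   # lstat the index with parallel threads
git config core.fscache true        # Git for Windows filesystem cache
v=$(git config --get core.preloadindex)
echo "$v"
```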
[PATCH v3 3/4] batch check whether submodule needs pushing into one call
We run a command for each sha1 change in a submodule. This is unnecessary since we can simply batch all sha1s we want to check into one command. Let's do it so we can speed up the check when many submodule changes are in need of checking. Signed-off-by: Heiko Voigt --- submodule.c | 63 - 1 file changed, 33 insertions(+), 30 deletions(-) diff --git a/submodule.c b/submodule.c index 769d666..e1196fd 100644 --- a/submodule.c +++ b/submodule.c @@ -507,27 +507,49 @@ static int append_sha1_to_argv(const unsigned char sha1[20], void *data) return 0; } -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20]) +static int check_has_commit(const unsigned char sha1[20], void *data) { - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) + int *has_commit = (int *) data; + + if (!lookup_commit_reference(sha1)) + *has_commit = 0; + + return 0; +} + +static int submodule_has_commits(const char *path, struct sha1_array *commits) +{ + int has_commit = 1; + + if (add_submodule_odb(path)) + return 0; + + sha1_array_for_each_unique(commits, check_has_commit, &has_commit); + return has_commit; +} + +static int submodule_needs_pushing(const char *path, struct sha1_array *commits) +{ + if (!submodule_has_commits(path, commits)) return 0; if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { struct child_process cp = CHILD_PROCESS_INIT; - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL}; struct strbuf buf = STRBUF_INIT; int needs_pushing = 0; - argv[1] = sha1_to_hex(sha1); - cp.argv = argv; + argv_array_push(&cp.args, "rev-list"); + sha1_array_for_each_unique(commits, append_sha1_to_argv, &cp.args); + argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL); + prepare_submodule_repo_env(&cp.env_array); cp.git_cmd = 1; cp.no_stdin = 1; cp.out = -1; cp.dir = path; if (start_command(&cp)) - die("Could not run 'git rev-list %s --not --remotes -n 1' command in submodule %s", - sha1_to_hex(sha1), path); + die("Could not run 'git rev-list <commits> --not --remotes -n 1' command in submodule %s", + path); if (strbuf_read(&buf, cp.out, 41)) needs_pushing = 1; finish_command(&cp); @@ -582,23 +604,6 @@ static void find_unpushed_submodule_commits(struct commit *commit, diff_tree_combined_merge(commit, 1, &rev); } -struct collect_submodule_from_sha1s_data { - char *submodule_path; - struct string_list *needs_pushing; -}; - -static int collect_submodules_from_sha1s(const unsigned char sha1[20], - void *data) -{ - struct collect_submodule_from_sha1s_data *me = - (struct collect_submodule_from_sha1s_data *) data; - - if (submodule_needs_pushing(me->submodule_path, sha1)) - string_list_insert(me->needs_pushing, me->submodule_path); - - return 0; -} - static void free_submodules_sha1s(struct string_list *submodules) { struct string_list_item *item; @@ -635,12 +640,10 @@ int find_unpushed_submodules(struct sha1_array *commits, argv_array_clear(&argv); for_each_string_list_item(submodule, &submodules) { - struct collect_submodule_from_sha1s_data data; - data.submodule_path = submodule->string; - data.needs_pushing = needs_pushing; - sha1_array_for_each_unique((struct sha1_array *) submodule->util, - collect_submodules_from_sha1s, - &data); + struct sha1_array *commits = (struct sha1_array *) submodule->util; + + if (submodule_needs_pushing(submodule->string, commits)) + string_list_insert(needs_pushing, submodule->string); } free_submodules_sha1s(&submodules); -- 2.10.1.386.gc503e45
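The single command this patch assembles can be illustrated directly from the shell: feed every submodule commit to one rev-list invocation; any output means at least one commit is missing from all remote-tracking refs, i.e. the submodule needs pushing. A throwaway repo stands in for the submodule here:

```shell
# Throwaway repo standing in for a submodule with two unpushed commits.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m one && a=$(git rev-parse HEAD)
git commit -q --allow-empty -m two && b=$(git rev-parse HEAD)
# one invocation checks both commits against all remote-tracking refs;
# no remotes exist, so at least one commit is reported as unpushed
out=$(git rev-list "$a" "$b" --not --remotes -n 1)
test -n "$out" && echo needs-pushing
```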
[PATCH v3 2/4] serialize collection of refs that contain submodule changes
We are iterating over each pushed ref and want to check whether it contains changes to submodules. Instead of immediately checking each ref let's first collect them and then do the check for all of them in one revision walk. Signed-off-by: Heiko Voigt --- submodule.c | 35 --- submodule.h | 5 +++-- transport.c | 29 + 3 files changed, 44 insertions(+), 25 deletions(-) diff --git a/submodule.c b/submodule.c index b91585e..769d666 100644 --- a/submodule.c +++ b/submodule.c @@ -500,6 +500,13 @@ static int has_remote(const char *refname, const struct object_id *oid, return 1; } +static int append_sha1_to_argv(const unsigned char sha1[20], void *data) +{ + struct argv_array *argv = (struct argv_array *) data; + argv_array_push(argv, sha1_to_hex(sha1)); + return 0; +} + static int submodule_needs_pushing(const char *path, const unsigned char sha1[20]) { if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) @@ -600,25 +607,24 @@ static void free_submodules_sha1s(struct string_list *submodules) string_list_clear(submodules, 1); } -int find_unpushed_submodules(unsigned char new_sha1[20], +int find_unpushed_submodules(struct sha1_array *commits, const char *remotes_name, struct string_list *needs_pushing) { struct rev_info rev; struct commit *commit; - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; - int argc = ARRAY_SIZE(argv) - 1; - char *sha1_copy; struct string_list submodules = STRING_LIST_INIT_DUP; struct string_list_item *submodule; + struct argv_array argv = ARGV_ARRAY_INIT; - struct strbuf remotes_arg = STRBUF_INIT; - - strbuf_addf(&remotes_arg, "--remotes=%s", remotes_name); init_revisions(&rev, NULL); - sha1_copy = xstrdup(sha1_to_hex(new_sha1)); - argv[1] = sha1_copy; - argv[3] = remotes_arg.buf; - setup_revisions(argc, argv, &rev, NULL); + + /* argv.argv[0] will be ignored by setup_revisions */ + argv_array_push(&argv, "find_unpushed_submodules"); + sha1_array_for_each_unique(commits, append_sha1_to_argv, &argv); + argv_array_push(&argv, "--not"); + argv_array_pushf(&argv, "--remotes=%s", remotes_name); + + setup_revisions(argv.argc, argv.argv, &rev, NULL); if (prepare_revision_walk(&rev)) die("revision walk setup failed"); @@ -626,8 +632,7 @@ int find_unpushed_submodules(unsigned char new_sha1[20], find_unpushed_submodule_commits(commit, &submodules); reset_revision_walk(); - free(sha1_copy); - strbuf_release(&remotes_arg); + argv_array_clear(&argv); for_each_string_list_item(submodule, &submodules) { struct collect_submodule_from_sha1s_data data; @@ -664,12 +669,12 @@ static int push_submodule(const char *path) return 1; } -int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name) +int push_unpushed_submodules(struct sha1_array *commits, const char *remotes_name) { int i, ret = 1; struct string_list needs_pushing = STRING_LIST_INIT_DUP; - if (!find_unpushed_submodules(new_sha1, remotes_name, &needs_pushing)) + if (!find_unpushed_submodules(commits, remotes_name, &needs_pushing)) return 1; for (i = 0; i < needs_pushing.nr; i++) { diff --git a/submodule.h b/submodule.h index d9e197a..9454806 100644 --- a/submodule.h +++ b/submodule.h @@ -3,6 +3,7 @@ struct diff_options; struct argv_array; +struct sha1_array; enum { RECURSE_SUBMODULES_CHECK = -4, @@ -62,9 +63,9 @@ int submodule_uses_gitfile(const char *path); int ok_to_remove_submodule(const char *path); int merge_submodule(unsigned char result[20], const char *path, const unsigned char base[20], const unsigned char a[20], const unsigned char b[20], int search); -int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name, +int find_unpushed_submodules(struct sha1_array *commits, const char *remotes_name, struct string_list *needs_pushing); -int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name); +int push_unpushed_submodules(struct sha1_array *commits, const char *remotes_name); void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir); int parallel_submodules(void); diff --git a/transport.c b/transport.c index d57e8de..f482869 100644 --- a/transport.c +++ b/transport.c @@ -949,23 +949,36 @@ int transport_push(struct transport *transport, if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && !is_bare_repository()) { struct ref *ref = remote_refs; + struct sha1_array commits = SHA1_ARRAY_INIT; + for (; ref; ref = ref->next) - if (!is_null_oid(&ref->new_oid) && -
[PATCH v3 4/4] submodule_needs_pushing() NEEDSWORK when we can not answer this question
Signed-off-by: Heiko Voigt --- submodule.c | 8 1 file changed, 8 insertions(+) diff --git a/submodule.c b/submodule.c index e1196fd..29efee9 100644 --- a/submodule.c +++ b/submodule.c @@ -531,6 +531,14 @@ static int submodule_has_commits(const char *path, struct sha1_array *commits) static int submodule_needs_pushing(const char *path, struct sha1_array *commits) { if (!submodule_has_commits(path, commits)) + /* NEEDSWORK: The correct answer here is "We do not + * know" instead of "No push needed". We currently + * proceed pushing here as if the submodules commits are + * available on a remote. Since we can not check the + * remote availability for this submodule we should + * consider changing this behavior to: Stop here and + * tell the user how to skip this check if wanted. + */ return 0; if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { -- 2.10.1.386.gc503e45
[PATCH v3 1/4] serialize collection of changed submodules
To check whether a submodule needs to be pushed we need to collect all changed submodules. Let's collect them first and then execute the possibly expensive test whether certain revisions are already pushed only once per submodule. There is further potential for optimization since we can assemble one command and only issue that instead of one call for each remote ref in the submodule. Signed-off-by: Heiko Voigt --- submodule.c | 60 1 file changed, 56 insertions(+), 4 deletions(-) diff --git a/submodule.c b/submodule.c index 6f7d883..b91585e 100644 --- a/submodule.c +++ b/submodule.c @@ -532,19 +532,34 @@ static int submodule_needs_pushing(const char *path, const unsigned char sha1[20 return 0; } +static struct sha1_array *submodule_commits(struct string_list *submodules, + const char *path) +{ + struct string_list_item *item; + + item = string_list_insert(submodules, path); + if (item->util) + return (struct sha1_array *) item->util; + + /* NEEDSWORK: should we have sha1_array_init()? */ + item->util = xcalloc(1, sizeof(struct sha1_array)); + return (struct sha1_array *) item->util; +} + static void collect_submodules_from_diff(struct diff_queue_struct *q, struct diff_options *options, void *data) { int i; - struct string_list *needs_pushing = data; + struct string_list *submodules = data; for (i = 0; i < q->nr; i++) { struct diff_filepair *p = q->queue[i]; + struct sha1_array *commits; if (!S_ISGITLINK(p->two->mode)) continue; - if (submodule_needs_pushing(p->two->path, p->two->oid.hash)) - string_list_insert(needs_pushing, p->two->path); + commits = submodule_commits(submodules, p->two->path); + sha1_array_append(commits, p->two->oid.hash); } } @@ -560,6 +575,31 @@ static void find_unpushed_submodule_commits(struct commit *commit, diff_tree_combined_merge(commit, 1, &rev); } +struct collect_submodule_from_sha1s_data { + char *submodule_path; + struct string_list *needs_pushing; +}; + +static int collect_submodules_from_sha1s(const unsigned char sha1[20], + void *data) +{ + struct collect_submodule_from_sha1s_data *me = + (struct collect_submodule_from_sha1s_data *) data; + + if (submodule_needs_pushing(me->submodule_path, sha1)) + string_list_insert(me->needs_pushing, me->submodule_path); + + return 0; +} + +static void free_submodules_sha1s(struct string_list *submodules) +{ + struct string_list_item *item; + for_each_string_list_item(item, submodules) + sha1_array_clear((struct sha1_array *) item->util); + string_list_clear(submodules, 1); +} + int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name, struct string_list *needs_pushing) { @@ -568,6 +608,8 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; int argc = ARRAY_SIZE(argv) - 1; char *sha1_copy; + struct string_list submodules = STRING_LIST_INIT_DUP; + struct string_list_item *submodule; struct strbuf remotes_arg = STRBUF_INIT; @@ -581,12 +623,22 @@ int find_unpushed_submodules(unsigned char new_sha1[20], die("revision walk setup failed"); while ((commit = get_revision(&rev)) != NULL) - find_unpushed_submodule_commits(commit, needs_pushing); + find_unpushed_submodule_commits(commit, &submodules); reset_revision_walk(); free(sha1_copy); strbuf_release(&remotes_arg); + for_each_string_list_item(submodule, &submodules) { + struct collect_submodule_from_sha1s_data data; + data.submodule_path = submodule->string; + data.needs_pushing = needs_pushing; + sha1_array_for_each_unique((struct sha1_array *) submodule->util, + collect_submodules_from_sha1s, + &data); + } + free_submodules_sha1s(&submodules); + return needs_pushing->nr; } -- 2.10.1.386.gc503e45
[PATCH v3 0/4] Speedup finding of unpushed submodules
You can find the second iteration of this series here:

  http://public-inbox.org/git/cover.1475851621.git.hvo...@hvoigt.net/

All mentioned issues should be fixed. I put the NEEDSWORK comment in a
separate patch since it seemed to me as if we did not fully agree on
that. So in case we decide against it we can just drop that patch.

Cheers
Heiko

Heiko Voigt (4):
  serialize collection of changed submodules
  serialize collection of refs that contain submodule changes
  batch check whether submodule needs pushing into one call
  submodule_needs_pushing() NEEDSWORK when we can not answer this question

 submodule.c | 120 +++-
 submodule.h |   5 ++-
 transport.c |  29 +++
 3 files changed, 118 insertions(+), 36 deletions(-)

-- 
2.10.1.386.gc503e45
Re: RFC: Enable delayed responses to Git clean/smudge filter requests
> On 15 Nov 2016, at 02:03, Eric Wong wrote:
> 
> Lars Schneider wrote:
>> Hi,
>>
>> Git always performs a clean/smudge filter on files in sequential order.
>> Sometimes a filter operation can take a noticeable amount of time.
>> This blocks the entire Git process.
> 
> I have the same problem in many places which aren't git :>
> 
>> I would like to give a filter process the possibility to answer Git with
>> "I got your request, I am processing it, ask me for the result later!".
>>
>> I see the following way to realize this:
>>
>> In unpack-trees.c:check_updates() [1] we loop through the cache
>> entries and "ask me later" could be an acceptable return value of the
>> checkout_entry() call. The loop could run until all entries returned
>> success or error.
>>
>> The filter machinery is triggered in various other places in Git and
>> all places that want to support "ask me later" would need to be patched
>> accordingly.
> 
> That all sounds reasonable.
> 
> The filter itself would need to be aware of parallelism
> if it lives for multiple objects, right?

Correct. This way Git doesn't need to deal with threading...

>> Do you think this could be a viable approach?
> 
> It'd probably require a bit of work, but yes, I think it's viable.
> 
> We already do this with curl_multi requests for parallel
> fetching from dumb HTTP servers, but that's driven by curl
> internals operating with a select/poll loop.
> 
> Perhaps the curl API could be a good example for doing this.

Thanks for the pointer!

>> Do you see a better way?
> 
> Nope, I prefer non-blocking state machines to threads for
> debuggability and determinism.

Agreed!

> Anyways, I'll plan on doing something similar (in Perl) with the
> synchronous parts of public-inbox which relies on "cat-file --batch"
> at some point... (my rotational disks are slow :<)

That sounds interesting! What changes do you have in mind for
"cat-file --batch"? We are thinking about performance improvements in
that area, too. I would be happy to help review your patches!

Thanks a lot for your RFC feedback,
Lars
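The "ask me later" loop proposed for check_updates() can be sketched in shell. Everything here is illustrative: fake_filter stands in for a long-running clean/smudge filter process and pretends that entry "b" needs one extra round-trip; entries answering "later" are simply re-queued for the next pass. A real implementation would poll/select on the filter instead of spinning.

```shell
# Illustrative only: fake_filter mimics a slow filter; it answers
# "later" the first time it is asked about entry "b", "done" otherwise.
state=$(mktemp -d)
fake_filter() {
    if [ "$1" = b ] && [ ! -e "$state/b-seen" ]; then
        touch "$state/b-seen"
        echo later
    else
        echo done
    fi
}

queue="a b c"
while [ -n "$queue" ]; do
    next=""
    for entry in $queue; do
        case $(fake_filter "$entry") in
            later) next="$next $entry" ;;  # ask again on the next pass
            done)  echo "checked out $entry" ;;
        esac
    done
    queue=$next
done
```

The loop drains after two passes: a and c complete immediately, b completes on the second round.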
Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS
> On 15 Nov 2016, at 13:07, Heiko Voigt wrote:
> 
> On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote:
>> To all macOS users on the list:
>> Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully?
> 
> Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542,
> 5550, 5551, 5561, 5812.

That's what I see, too. Apache needs to be configured in some special way
to make them work and I was wondering if anyone has figured that out
already for macOS...

However, I much prefer Peff's idea to test against real world servers:
http://public-inbox.org/git/2016092824.qqgrmhtkuw3wp...@sigill.intra.peff.net/

- Lars
Protecting old temporary objects being reused from concurrent "git gc"?
The Braid subproject management tool stores the subproject content in the
main tree and is able to switch to a different upstream revision of a
subproject by doing the equivalent of "git read-tree -m" on the
superproject tree and the two upstream trees.

The tricky part is preparing temporary trees with the upstream content
moved to the path configured for the superproject. The usual method is
"git read-tree --prefix", but using what index file? Braid currently uses
the user's actual worktree, which can leave a mess if it gets interrupted:

https://github.com/cristibalan/braid/blob/7d81da6e86e24de62a74f3ab8d880666cb343b04/lib/braid/commands/update.rb#L98

I want to change this to something that won't leave an inconsistent state
if interrupted. I've written code for this kind of thing before that sets
GIT_INDEX_FILE and uses a temporary index file and "git write-tree". But I
realized that if "git gc" runs concurrently, the generated tree could be
deleted before it is used and the tool would fail. If I had a need to run
"git commit-tree", it seems like I might even end up with a commit object
with a broken reference to a tree.

"git gc" normally doesn't delete objects that were created in the last 2
weeks, but if an identical tree was added to the object database more than
2 weeks ago by another operation and is unreferenced, it could be reused
without updating its mtime and it could still get deleted.

Is there a recommended way to avoid this kind of problem in add-on tools?
(I searched the Git documentation and the web for information about races
with "git gc" and didn't find anything useful.) If not, it seems to be a
significant design flaw in "git gc", even if the problem is extremely rare
in practice. I wonder if some of the built-in commands may have the same
problem, though I haven't tried to test them. If this is confirmed to be a
known problem affecting built-in commands, then at least I won't feel bad
about introducing the same problem into add-on tools.
:/ Thanks, Matt
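The temporary-index approach described above can be sketched as follows. The vendor/lib prefix and the throwaway demo repository are purely illustrative; the tree written at the end is exactly the kind of loose object the gc race in question could reap.

```shell
# Sketch of the GIT_INDEX_FILE + write-tree approach (names invented;
# the demo builds its own throwaway repository).
set -e
cd "$(mktemp -d)"
git init -q
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m base

# Fabricate an "upstream" tree to graft under vendor/lib/.
echo upstream-content >lib.txt
git add lib.txt
upstream_tree=$(git write-tree)
git rm -q --cached lib.txt            # leave the real index clean again

# Build the combined tree in a private index, so an interruption
# leaves the user's index and worktree untouched.
export GIT_INDEX_FILE="$PWD/.git/tmp-braid-index"
git read-tree HEAD
git read-tree --prefix=vendor/lib/ "$upstream_tree"
new_tree=$(git write-tree)            # the object "git gc" might delete
unset GIT_INDEX_FILE
git ls-tree -r --name-only "$new_tree"   # prints vendor/lib/lib.txt
```

This avoids the worktree mess on interruption, but as noted it does nothing about the mtime-reuse race with a concurrent gc.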
Re: git shortlog vs. stdin
Andreas Krey writes:

> Bug or feature?

Documented feature, but you're holding it wrong ;)

    If no revisions are passed on the command line and either standard
    input is not a terminal or there is no current branch, git shortlog
    will output a summary of the log read from standard input, without
    reference to the current repository.

(Note that you can use shortlog a la "git log --pretty=short | git shortlog")

-- 
Christian Neukirchen  http://chneukirchen.org
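The documented pipe form is easy to verify in a throwaway repository (the repo, author name, and commit messages below are invented for the demo):

```shell
cd "$(mktemp -d)"
git init -q
git -c user.name=Alice -c user.email=alice@example.com \
    commit -q --allow-empty -m one
git -c user.name=Alice -c user.email=alice@example.com \
    commit -q --allow-empty -m two

# With stdin not a terminal, shortlog summarizes the piped log instead
# of walking the current branch: one line, "2  Alice".
git log --pretty=short | git shortlog -sn
```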
git shortlog vs. stdin
Hi all,

I observed a strange and unexpected behaviour in 'git shortlog'.
When in git.git:

$ git shortlog -sn | wc
14414493 31477

but with input redirected:

$ git shortlog -sn
Date: Fri, 22 Jan 2010 07:29:21 -0800
New to git, need help!
Hi All,

I'm new to open source and have recently joined this mailing list. Since
I'm new at this, I think I can initially contribute to the community by
fixing some small bugs or errors, but as the documentation is too large,
I don't know where to start. So if anybody could guide me on how to go
about it, it would be really appreciated.

Thanks,
Mayank
Re: [PATCH v1 2/2] travis-ci: disable GIT_TEST_HTTPD for macOS
On Fri, Nov 11, 2016 at 09:22:51AM +0100, Lars Schneider wrote:
> To all macOS users on the list:
> Does anyone execute the tests with GIT_TEST_HTTPD enabled successfully?

Nope. The following tests fail for me on master: 5539, 5540, 5541, 5542,
5550, 5551, 5561, 5812.

Cheers
Heiko
Re: Git status takes too long- How to improve the performance of git
On Tue, Nov 15, 2016 at 11:24 AM, Fredrik Gustafsson wrote:
> On Tue, Nov 15, 2016 at 02:33:12AM -0700, ravalika wrote:
[...]
>> And I have experimented the following ways
>> - Setting core.ignorestat to true
>> - Git gc clean
>> - Shallow clone – Reducing number of commits
>> - Clone only one branch
>> - Git repacking - git repack -ad && git prune
>> - Cold/warm cache
>>
>> Could you please let me know, what are the ways to improve the git
>> performance ?
>> I have gone through the mailing lists.
>
> You could always check the --assume-unchanged bit, see the manual page
> for git update-index. However this is quite extreme and demanding for
> the user.

If you install a recent version, you may be able to use the untracked
cache feature. (See "core.untrackedCache" in the git config documentation
and --untracked-cache in the git update-index documentation.)
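A minimal sketch of turning the untracked cache on (requires git 2.8 or later for the core.untrackedCache knob; the throwaway repository is just for illustration):

```shell
cd "$(mktemp -d)"
git init -q
git config core.untrackedCache true   # let git status maintain the cache
git status --porcelain >/dev/null     # first run populates the cache
# later runs of "git status" can skip re-scanning unchanged directories
```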
Re: Git status takes too long- How to improve the performance of git
Hi,

On Tue, Nov 15, 2016 at 02:33:12AM -0700, ravalika wrote:
> We are using git-1.8.2 version for version control.

That's a three (almost four) year old version of git. Your first test
should be to see if an upgrade to a recent version will improve things.

> It is a centralized server and git status takes too long

A centralized server? How? git is designed to be run locally. If you're
running git on a network file system, the performance will suffer. Could
you elaborate on how your environment is setup?

> How to improve the performance of git status
>
> Git repo details:
>
> Size of the .git folder is 8.9MB
> Number of commits approx 53838 (git rev-list HEAD --count)
> Number of branches - 330
> Number of files - 63883
> Working tree clone size is 4.3GB

.git folder of 8.9 MEGABYTE and working tree of 4.3 GIGABYTE? Is this a
typo?

> time git status shows
> real 0m23.673s
> user 0m9.432s
> sys  0m3.793s
>
> then after 5 mins
> real 0m4.864s
> user 0m1.417s
> sys  0m4.710s

A slow disc and empty caches are slow. Two ways of improving this is to
have faster discs or make sure your cache is up to date. When I had a
really slow disc, I had my shell run a git status in the background to
load the cache every time I started working on a project. This is however
an ugly hack that wasn't approved to be a part of git.

> And I have experimented the following ways
> - Setting core.ignorestat to true
> - Git gc clean
> - Shallow clone – Reducing number of commits
> - Clone only one branch
> - Git repacking - git repack -ad && git prune
> - Cold/warm cache
>
> Could you please let me know, what are the ways to improve the git
> performance ?
> I have gone through the mailing lists.

You could always check the --assume-unchanged bit, see the manual page
for git update-index. However this is quite extreme and demanding for
the user.

-- 
Fredrik Gustafsson
phone: +46 733-608274
e-mail: iv...@iveqy.com
website: http://www.iveqy.com
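The --assume-unchanged bit mentioned above can be demonstrated in a throwaway repository (the repo and filename are invented for the demo). Once the bit is set, git stops stat-ing the file, so later modifications go unnoticed, which is exactly why it is demanding for the user:

```shell
set -e
cd "$(mktemp -d)"
git init -q
echo v1 >generated.bin
git add generated.bin
git -c user.name=t -c user.email=t@example.com commit -q -m add

git update-index --assume-unchanged generated.bin
echo v2 >generated.bin                # change it behind git's back
git status --porcelain                # prints nothing: the stat is skipped
```

`git ls-files -v` marks such paths with a lowercase "h"; clear the bit again with `git update-index --no-assume-unchanged generated.bin`.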
Git status takes too long- How to improve the performance of git
Hi All,

We are using git-1.8.2 version for version control. It is a centralized
server and git status takes too long.

How to improve the performance of git status?

Git repo details:

Size of the .git folder is 8.9MB
Number of commits approx 53838 (git rev-list HEAD --count)
Number of branches - 330
Number of files - 63883
Working tree clone size is 4.3GB

time git status shows

real    0m23.673s
user    0m9.432s
sys     0m3.793s

then after 5 mins

real    0m4.864s
user    0m1.417s
sys     0m4.710s

And I have experimented the following ways:

- Setting core.ignorestat to true
- Git gc clean
- Shallow clone – Reducing number of commits
- Clone only one branch
- Git repacking - git repack -ad && git prune
- Cold/warm cache

Could you please let me know, what are the ways to improve the git
performance? I have gone through the mailing lists.

Thank you,
Renuka

--
View this message in context: http://git.661346.n2.nabble.com/Git-status-takes-too-long-How-to-improve-the-performance-of-git-tp7657456.html
Sent from the git mailing list archive at Nabble.com.