date:20180130

Re: [PATCH RFC 01/24] ref-filter: get rid of goto

2018-01-30 Thread Оля Тележная

2018-01-30 23:49 GMT+03:00 Junio C Hamano :
> Оля Тележная   writes:
>
>>> one place improves readability.  If better readability is the
>>> purpose, I would even say
>>>
>>>         for (i = 0; i < used_atom_cnt; i++) {
>>>                 if (...)
>>> -                       goto need_obj;
>>> +                       break;
>>>         }
>>> -       return;
>>> +       if (used_atom_cnt <= i)
>>>                 return;
>>>
>>> -need_obj:
>>>
>>> would make the result easier to follow with a much less impact.
>>
>> It's hard for me to read the code with goto, and as I know, it's not
>> only my problem,...
>
> That sounds as if you are complaining "I wanted to get rid of goto
> and you tell me not to do so???", but read what I showed above again
> and notice that it is also getting rid of "goto".

No, I am not complaining. I tried to explain why I did everything that
way. Sorry if it was not clear enough.

>
> The main difference from your version is that the original function
> is still kept as a single unit of work, instead of two.

And I am not sure that is good: the function is too big, and it
actually does many different, separate things. If it is possible to
shorten a long function by extracting some separate logic (which is
our case: we do not request the object until that final goto
statement), I think it's a good idea, and we should do so to simplify
future reading. But if you do not agree, please explain your position
in detail, and I will change that place as you want.

Thanks.
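
For readers following along, the goto-free loop suggested in the quoted
diff can be sketched as a standalone function. This is only an
illustration: atom_needs_object() and the flags array are hypothetical
stand-ins for the real per-atom check in ref-filter.c, which is elided
as "(...)" in the quoted snippet.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the per-atom check; here an atom
 * "needs the object" when its flag is non-zero.
 */
static int atom_needs_object(int flag)
{
	return flag != 0;
}

/*
 * Returns 1 if any atom needs the object, 0 otherwise.
 * The loop index doubles as the "found" signal: if the loop ran to
 * completion, i == cnt and nothing matched.
 */
int any_atom_needs_object(const int *flags, size_t cnt)
{
	size_t i;

	for (i = 0; i < cnt; i++) {
		if (atom_needs_object(flags[i]))
			break;
	}
	if (cnt <= i)
		return 0; /* no atom matched: nothing more to do */

	/* the code that used to live under the need_obj label goes here */
	return 1;
}
```

Whether the tail stays in the same function (as above) or moves into a
helper is exactly the point under discussion; the sketch only shows
that either way the goto is not needed.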

Re: [PATCH v2 06/14] commit-graph: implement git-commit-graph --read

2018-01-30 Thread Stefan Beller

> +static void free_commit_graph(struct commit_graph **g)
> +{
> +   if (!g || !*g)
> +           return;
> +
> +   close_commit_graph(*g);
> +
> +   free(*g);
> +   *g = NULL;

nit: You may want to use FREE_AND_NULL(*g) instead.
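
A minimal sketch of what that nit amounts to, assuming a free-and-clear
macro in the spirit of git's FREE_AND_NULL. The macro name and the
struct below are hypothetical and only illustrate the pattern; they are
not copied from the patch.

```c
#include <stdlib.h>

/*
 * Hypothetical free-and-clear helper: free the pointer, then set it
 * to NULL so later code cannot use the stale value by accident.
 */
#define SKETCH_FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical stand-in for struct commit_graph. */
struct graph_sketch {
	int fd;
};

void free_graph_sketch(struct graph_sketch **g)
{
	if (!g || !*g)
		return;

	/* a real implementation would close/unmap resources here first */

	SKETCH_FREE_AND_NULL(*g);
}
```

The do/while(0) wrapper keeps the two statements together when the
macro is used as the body of an unbraced if.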

Re: [PATCH v2 02/14] graph: add commit graph design document

2018-01-30 Thread Stefan Beller

On Tue, Jan 30, 2018 at 1:39 PM, Derrick Stolee  wrote:
> Add Documentation/technical/commit-graph.txt with details of the planned
> commit graph feature, including future plans.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/technical/commit-graph.txt | 189 +++
>  1 file changed, 189 insertions(+)
>  create mode 100644 Documentation/technical/commit-graph.txt
>
> diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
> new file mode 100644
> index 0000000000..cbf88f7264
> --- /dev/null
> +++ b/Documentation/technical/commit-graph.txt
> @@ -0,0 +1,189 @@
> +Git Commit Graph Design Notes
> +=============================
> +
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows. The merge
> +base calculation shows up in many user-facing commands, such as 'merge-base'
> +or 'git show --remerge-diff' and can take minutes to compute depending on
> +history shape.

Sorry for appearing more authoritative than I am here. The --remerge-diff
flag is just floating around the mailing list and was never merged. (It is
such a cool feature, but mentioning it here would confuse users who look
for it and do not find it.)


> +There are two main costs here:
> +
> +1. Decompressing and parsing commits.
> +2. Walking the entire graph to avoid topological order mistakes.
> +
> +The commit graph file is a supplemental data structure that accelerates
> +commit graph walks. If a user downgrades or disables the 'core.commitgraph'
> +config setting, then the existing ODB is sufficient. The file is stored
> +next to packfiles either in the .git/objects/pack directory or in the pack
> +directory of an alternate.
> +
> +The commit graph file stores the commit graph structure along with some
> +extra metadata to speed up graph walks. By listing commit OIDs in lexi-
> +cographic order, we can identify an integer position for each commit and
> +refer to the parents of a commit using those integer positions. We use
> +binary search to find initial commits and then use the integer positions
> +for fast lookups during the walk.
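
The lookup described in the paragraph above might look roughly like the
following. This is a sketch under assumptions: the flat table layout,
the 20-byte OID width and the function name are mine, not taken from
the patch series.

```c
#include <string.h>

#define OID_LEN 20 /* assumed OID width in bytes */

/*
 * Binary-search a flat table of `nr` fixed-width OIDs that is sorted
 * in lexicographic (memcmp) order. Returns the integer position of
 * `oid`, or -1 if it is not in the table.
 */
int oid_position(const unsigned char *table, int nr,
		 const unsigned char *oid)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, table + (size_t)mid * OID_LEN, OID_LEN);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

Only the starting commits of a walk pay for the search; once a commit
is found, its parents are already stored as positions and need no
further searching.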
> +
> +A consumer may load the following info for a commit from the graph:
> +
> +1. The commit OID.
> +2. The list of parents, along with their integer position.
> +3. The commit date.
> +4. The root tree OID.
> +5. The generation number (see definition below).
> +
> +Values 1-4 satisfy the requirements of parse_commit_gently().
> +
> +Define the "generation number" of a commit recursively as follows:
> +
> + * A commit with no parents (a root commit) has generation number one.
> +
> + * A commit with at least one parent has generation number one more than
> +   the largest generation number among its parents.
> +
> +Equivalently, the generation number of a commit A is one more than the
> +length of a longest path from A to a root commit. The recursive definition
> +is easier to use for computation and observing the following property:
> +
> +If A and B are commits with generation numbers N and M, respectively,
> +and N <= M, then A cannot reach B. That is, we know without searching
> +that B is not an ancestor of A because it is further from a root commit
> +than A.
> +
> +Conversely, when checking if A is an ancestor of B, then we only need
> +to walk commits until all commits on the walk boundary have generation
> +number at most N. If we walk commits using a priority queue seeded by
> +generation numbers, then we always expand the boundary commit with highest
> +generation number and can easily detect the stopping condition.
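
To make the definition and the pruning property concrete, here is a
small sketch. The struct layout, the two-parent limit and the function
names are assumptions for illustration, and it uses plain recursion
rather than the priority queue the document describes.

```c
#include <stddef.h>

#define MAX_PARENTS 2
#define NO_PARENT (-1)

/* Hypothetical in-memory commit: parents are integer positions. */
struct commit_sketch {
	int parents[MAX_PARENTS]; /* NO_PARENT marks an unused slot */
	int generation;           /* 0 means "not computed yet" */
};

/* Roots get 1; everything else is 1 + the largest parent generation. */
int compute_generation(struct commit_sketch *g, int pos)
{
	int i, max = 0;

	if (g[pos].generation)
		return g[pos].generation;
	for (i = 0; i < MAX_PARENTS; i++) {
		int p = g[pos].parents[i];

		if (p != NO_PARENT) {
			int gen = compute_generation(g, p);

			if (gen > max)
				max = gen;
		}
	}
	g[pos].generation = max + 1;
	return g[pos].generation;
}

/*
 * Can `from` reach `target` by following parent links?
 * A commit whose generation is <= the target's cannot reach it
 * (unless it is the target), so that branch is cut without walking.
 */
int can_reach(struct commit_sketch *g, int from, int target)
{
	int i;

	if (from == target)
		return 1;
	if (compute_generation(g, from) <= compute_generation(g, target))
		return 0;
	for (i = 0; i < MAX_PARENTS; i++) {
		int p = g[from].parents[i];

		if (p != NO_PARENT && can_reach(g, p, target))
			return 1;
	}
	return 0;
}
```

Recursion this deep would not be acceptable on a real history; it is
used here only to keep the example short.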
> +
> +This property can be used to significantly reduce the time it takes to
> +walk commits and determine topological relationships. Without generation
> +numbers, the general heuristic is the following:
> +
> +If A and B are commits with commit time X and Y, respectively, and
> +X < Y, then A _probably_ cannot reach B.
> +
> +This heuristic is currently used whenever the computation can make
> +mistakes with topological orders (such as "git log" with default order),
> +but is not used when the topological order is required (such as merge
> +base calculations, "git log --graph").
> +
> +In practice, we expect some commits to be created recently and not stored
> +in the commit graph. We can treat these commits as having "infinite"
> +generation number and walk until reaching commits with known generation
> +number.
> +
> +Design Details
> +--------------
> +
> +- A graph file is stored in a file named 'graph-.graph' in the pack
> +  directory. This could be stored in an alternate.
> +
> +- The most-recent graph file hash is stored in a 'graph-head' file for
> +  immediate access and storing backup graphs. This could be stored in an
> +  alternate, and refers to a

[ANNOUNCE] tig-2.3.3

2018-01-30 Thread Jonas Fonseca

Hello,

A regression in 2.3.1 (and 2.3.2) related with the detection of busy loops has
been revisited in version 2.3.3.

Release notes
-------------

Bug fixes:

 - Revert "Handle \n like \r (#758)". (GH #769)
 - Fix GH #164 by catching SIGHUP.
 - Change `refs_tags` type to `size_t`.

Change summary
--------------
The diffstat and log summary for changes made in this release.

 INSTALL.adoc  |  4 ++--
 Makefile  |  2 +-
 NEWS.adoc |  9 +
 README.adoc   |  2 +-
 src/display.c | 40 
 src/refdb.c   |  2 +-
 src/tig.c | 12 
 tools/aspell.dict |  2 +-
 8 files changed, 27 insertions(+), 46 deletions(-)

Alexander Droste (1):
  Revert "Handle \n like \r (#758)" (#769)

Jonas Fonseca (3):
  Fix #164 by catching SIGHUP
  Change refs_tags type to size_t
  tig-2.3.3

harshavardhan (1):
  updated https to https (#777)

-- 
Jonas Fonseca

Re: [PATCH 00/37] removal of some c++ keywords

2018-01-30 Thread Duy Nguyen

On Wed, Jan 31, 2018 at 7:57 AM, Stefan Beller  wrote:
>> There's also C99 designator in builtin/clean.c (I thought we avoided
>> C99, I can start using this specific feature more now :D)
>
> That was a test balloon? See 512f41cfac
> (clean.c: use designated initializer, 2017-07-14)

Aww.. I thought it was in there since forever and it should be safe to
use now...

> One of the big advantages would be stricter type checking, such as
> signed/unsigned confusion, that we occasionally have.
> e.g. 61d36330b4 (prefer "!=" when checking read_in_full()
> result, 2017-09-27) or what is referenced from there 561598cfcf
> (read_pack_header: handle signed/unsigned comparison in read result,
> 2017-09-13).

We can do that even with C (at least with gcc and I guess clang as
well). The problem is it looks so bad right now that I have to turn it
off with -Wno-sign-compare

> The bugs resulting in these patches could have been caught more easily
> with C++ checking IMHO.
-- 
Duy

Re: [RFC PATCH 0/2] alternate hash test

2018-01-30 Thread Stefan Beller

On Sun, Jan 28, 2018 at 9:06 AM, brian m. carlson
 wrote:
> This series wires up an alternate hash implementation, namely
> BLAKE2b-160.  The goal is to allow us to identify tests which rely on
> the hash algorithm in use so that we can fix those tests.
>
> For this test, I picked BLAKE2b-160 for a couple reasons:
> * Debian ships a libb2-1 package which can be used easily (in other
>   words, I was lazy and didn't want to add a crypto implementation just
>   for test purposes);
> * The API of the libb2 package easily allows arbitrary hash lengths, so
>   I didn't have to manage truncation myself;
> * Our codebase isn't yet ready for a hash function larger than 20 bytes,
>   as there's still more work to do on the object_id conversions.
>
> This isn't an endorsement for or against any particular algorithm
> choice, just an artifact of the tools that were easily available to me.
> Provoking discussion of which hash to pick for NewHash is explicitly
> *not* a goal for this series.  I'm only interested in the ability to
> identify and fix tests.
>
> The first patch does no feature detection and just assumes you have
> libb2 installed.  For obvious reasons, this series is not meant for
> production use.

Thanks for writing this, I chose a slightly different approach at
https://public-inbox.org/git/20170728171817.21458-2-sbel...@google.com/
which might be quicker for local testing.

Thanks for bringing this discussion back on the list,
Stefan

Re: [PATCH 00/37] removal of some c++ keywords

2018-01-30 Thread Stefan Beller

On Tue, Jan 30, 2018 at 4:48 PM, Duy Nguyen  wrote:
> On Wed, Jan 31, 2018 at 6:01 AM, Stefan Beller  wrote:
>> On Tue, Jan 30, 2018 at 2:36 PM, Junio C Hamano  wrote:
>>> Duy Nguyen  writes:
>>>
>>>> Is it simpler (though hacky) to just  do
>>>>
>>>> #ifdef __cplusplus
>>>> #define new not_new
>>>> #define try really_try
>>>> ...
>>>>
>>>> somewhere in git-compat-util.h?
>>>
>>> Very tempting, especially given that your approach automatically
>>> would cover topics in flight without any merge conflict ;-)
>>>
>>> I agree that it is hacky and somewhat ugly, but the hackiness
>>> somehow does not bother me too much in this case; perhaps because
>>> attempting to use a C++ compiler may already be hacky in the first
>>> place?
>>>
>>> It probably depends on the reason why we are doing this topic.  If a
>>> report about our source code coming from the C++ oriented tool cite
>>> the symbol names seen by machines, then the "hacky" approach will
>>> give us "not_new" where Brandon's patch may give us "new_oid", or
>>> whatever symbol that is more appropriate for the context it appears
>>> than such an automated cute name.
>
> Well. the world after post processing is always ugly. But we could try
> "#define new new__" to get the not so ugly names. new_oid is
> definitely better regardless of c/c++ though so I could see that as a
> good cleanup.
>
>>>> Do we use any C features that are incompatible with C++? (or do we not
>>>> need to care?)
>>>
>>> Good question.
>>
>> implicit casts from void?
>> e.g. xmalloc returns a void pointer, not the type requested.
>> https://embeddedartistry.com/blog/2017/2/28/c-casting-or-oh-no-we-broke-malloc
>
> That causes lots of warnings but not errors (I bit the bullet and
> tried to compile git with g++).

And for g++ there is a flag to disable this specific set of warnings.
I think the value of using C++ analysis tools is in the LLVM/clang
world, not GNU.

> The next set of changes would be to
> reorganize nested enum/struct declarations. Even if nested, C
> considers these flat while C++ sees them in namespaces. There's some
> warnings about confusion with the new cool feature string literals,
> but that's easy to fix.
>
> There's also C99 designator in builtin/clean.c (I thought we avoided
> C99, I can start using this specific feature more now :D)

That was a test balloon? See 512f41cfac
(clean.c: use designated initializer, 2017-07-14)

One of the big advantages would be stricter type checking, such as
signed/unsigned confusion, that we occasionally have.
e.g. 61d36330b4 (prefer "!=" when checking read_in_full()
result, 2017-09-27) or what is referenced from there 561598cfcf
(read_pack_header: handle signed/unsigned comparison in read result,
2017-09-13).

The bugs resulting in these patches could have been caught more easily
with C++ checking IMHO.
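
For context, the class of bug behind those two commits can be
reproduced in a few lines. This is a sketch, not the code from either
commit: failing_read() is a hypothetical reader that always reports an
error, and the explicit cast only spells out the conversion the
compiler would apply silently in a mixed signed/unsigned comparison.

```c
#include <stddef.h>
#include <sys/types.h> /* ssize_t (POSIX) */

/* Hypothetical reader that always fails, like read() returning -1. */
static ssize_t failing_read(void)
{
	return -1;
}

/*
 * Buggy "short read" check: -1 converted to size_t becomes a huge
 * value, so the error is not detected.
 */
int short_read_buggy(size_t want)
{
	ssize_t got = failing_read();

	/* same result as the implicit conversion in "got < want" */
	return (size_t)got < want;
}

/* Safer check: test for the error first, then compare for equality. */
int short_read_checked(size_t want)
{
	ssize_t got = failing_read();

	return got < 0 || (size_t)got != want;
}
```

With gcc or clang the buggy form is what -Wsign-compare is meant to
flag when written without the cast.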

> I was stuck at the thread_local thing in index-pack.c and gave up. So
> I don't know what else we would need to change.

Thanks for experimenting!
Stefan

> --
> Duy

Re: [PATCH 00/37] removal of some c++ keywords

2018-01-30 Thread Duy Nguyen

On Wed, Jan 31, 2018 at 6:01 AM, Stefan Beller  wrote:
> On Tue, Jan 30, 2018 at 2:36 PM, Junio C Hamano  wrote:
>> Duy Nguyen  writes:
>>
>>> Is it simpler (though hacky) to just  do
>>>
>>> #ifdef __cplusplus
>>> #define new not_new
>>> #define try really_try
>>> ...
>>>
>>> somewhere in git-compat-util.h?
>>
>> Very tempting, especially given that your approach automatically
>> would cover topics in flight without any merge conflict ;-)
>>
>> I agree that it is hacky and somewhat ugly, but the hackiness
>> somehow does not bother me too much in this case; perhaps because
>> attempting to use a C++ compiler may already be hacky in the first
>> place?
>>
>> It probably depends on the reason why we are doing this topic.  If a
>> report about our source code coming from the C++ oriented tool cite
>> the symbol names seen by machines, then the "hacky" approach will
>> give us "not_new" where Brandon's patch may give us "new_oid", or
>> whatever symbol that is more appropriate for the context it appears
>> than such an automated cute name.

Well. the world after post processing is always ugly. But we could try
"#define new new__" to get the not so ugly names. new_oid is
definitely better regardless of c/c++ though so I could see that as a
good cleanup.
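
A rough sketch of that macro approach, for illustration only. The
struct and function below are hypothetical; in C the #ifdef block is
simply skipped, and in C++ the identifiers are rewritten before the
compiler proper sees them. One caveat, as far as I understand the C++
rules: redefining keywords as macros is not permitted in a translation
unit that also includes standard library headers.

```c
/* Fragment that could sit near the top of a compat header. */
#ifdef __cplusplus
#define new new__
#define try try__
#endif

/* Hypothetical code that uses C++ keywords as ordinary C names. */
struct pair_sketch {
	int old;
	int new;
};

/* Shift the current value into `old`, store `try`, return the old value. */
int swap_in(struct pair_sketch *p, int try)
{
	p->old = p->new;
	p->new = try;
	return p->old;
}
```

Renaming the identifiers in the source, as the patch series does,
avoids the caveat and gives tools meaningful symbol names.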

>>> Do we use any C features that are incompatible with C++? (or do we not
>>> need to care?)
>>
>> Good question.
>
> implicit casts from void?
> e.g. xmalloc returns a void pointer, not the type requested.
> https://embeddedartistry.com/blog/2017/2/28/c-casting-or-oh-no-we-broke-malloc

That causes lots of warnings but not errors (I bit the bullet and
tried to compile git with g++). The next set of changes would be to
reorganize nested enum/struct declarations. Even if nested, C
considers these flat while C++ sees them in namespaces. There's some
warnings about confusion with the new cool feature string literals,
but that's easy to fix.

There's also C99 designator in builtin/clean.c (I thought we avoided
C99, I can start using this specific feature more now :D)

I was stuck at the thread_local thing in index-pack.c and gave up. So
I don't know what else we would need to change.
-- 
Duy

[PATCH v7 26/31] merge-recursive: avoid clobbering untracked files with directory renames

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 42 +++--
 t/t6043-merge-rename-directories.sh |  6 +++---
 2 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 7c78dc2dc1..39e161e094 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1147,6 +1147,26 @@ static int conflict_rename_dir(struct merge_options *o,
 {
const struct diff_filespec *dest = pair->two;
 
+   if (!o->call_depth && would_lose_untracked(dest->path)) {
+   char *alt_path = unique_path(o, dest->path, rename_branch);
+
+   output(o, 1, _("Error: Refusing to lose untracked file at %s; "
+  "writing to %s instead."),
+  dest->path, alt_path);
+   /*
+* Write the file in worktree at alt_path, but not in the
+* index.  Instead, write to dest->path for the index but
+* only at the higher appropriate stage.
+*/
+   if (update_file(o, 0, &dest->oid, dest->mode, alt_path))
+   return -1;
+   free(alt_path);
+   return update_stages(o, dest->path, NULL,
+rename_branch == o->branch1 ? dest : NULL,
+rename_branch == o->branch1 ? NULL : dest);
+   }
+
+   /* Update dest->path both in index and in worktree */
if (update_file(o, 1, &dest->oid, dest->mode, dest->path))
return -1;
return 0;
@@ -1165,7 +1185,8 @@ static int handle_change_delete(struct merge_options *o,
const char *update_path = path;
int ret = 0;
 
-   if (dir_in_way(path, !o->call_depth, 0)) {
+   if (dir_in_way(path, !o->call_depth, 0) ||
+   (!o->call_depth && would_lose_untracked(path))) {
update_path = alt_path = unique_path(o, path, change_branch);
}
 
@@ -1291,6 +1312,12 @@ static int handle_file(struct merge_options *o,
dst_name = unique_path(o, rename->path, cur_branch);
output(o, 1, _("%s is a directory in %s adding as %s instead"),
   rename->path, other_branch, dst_name);
+   } else if (!o->call_depth &&
+  would_lose_untracked(rename->path)) {
+   dst_name = unique_path(o, rename->path, cur_branch);
+   output(o, 1, _("Refusing to lose untracked file at %s; "
+  "adding as %s instead"),
+  rename->path, dst_name);
}
}
if ((ret = update_file(o, 0, &rename->oid, rename->mode, dst_name)))
@@ -1416,7 +1443,18 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
char *new_path2 = unique_path(o, path, ci->branch2);
output(o, 1, _("Renaming %s to %s and %s to %s instead"),
   a->path, new_path1, b->path, new_path2);
-   remove_file(o, 0, path, 0);
+   if (would_lose_untracked(path))
+   /*
+* Only way we get here is if both renames were from
+* a directory rename AND user had an untracked file
+* at the location where both files end up after the
+* two directory renames.  See testcase 10d of t6043.
+*/
+   output(o, 1, _("Refusing to lose untracked file at "
+  "%s, even though it's in the way."),
+  path);
+   else
+   remove_file(o, 0, path, 0);
ret = update_file(o, 0, &mfi_c1.oid, mfi_c1.mode, new_path1);
if (!ret)
ret = update_file(o, 0, &mfi_c2.oid, mfi_c2.mode,
diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index b6207763cf..abb5b20f6b 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -2984,7 +2984,7 @@ test_expect_success '10b-setup: Overwrite untracked with dir rename + delete' '
)
 '
 
-test_expect_failure '10b-check: Overwrite untracked with dir rename + delete' '
+test_expect_success '10b-check: Overwrite untracked with dir rename + delete' '
(
cd 10b &&
 
@@ -3062,7 +3062,7 @@ test_expect_success '10c-setup: Overwrite untracked with dir rename/rename(1to2)
)
 '
 
-test_expect_failure '10c-check: Overwrite untracked with dir rename/rename(1to2)' '
+test_expect_success '10c-check: Overwrite untracked with dir rename/rename(1to2)' '
(
cd 10c &&
 
@@ -3137,7 +3137,7 @@ test_expect_success '10d-setup: Delete untracked with dir rename/rename(2to1)' '
)
 '
 
-test_expect_failure '10d-check: Delete
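
The patch above leans on unique_path() to pick an alternate name when
an untracked file is in the way. A sketch of how such a helper could
behave follows; it is my own illustration and is not copied from
merge-recursive.c. The "path~branch" shape, the "_N" suffix and the
callback used to test for collisions are all assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical collision test: non-zero if `path` is already in use. */
typedef int (*path_taken_fn)(const char *path);

/*
 * Build "path~branch" in `out` (slashes in the branch part flattened
 * to '_'), then append "_0", "_1", ... until the name is free.
 * Returns 0 on success, -1 if `out` is too small.
 */
int unique_path_sketch(char *out, size_t outlen, const char *path,
		       const char *branch, path_taken_fn taken)
{
	size_t base_len, i;
	int suffix = 0;
	int n = snprintf(out, outlen, "%s~%s", path, branch);

	if (n < 0 || (size_t)n >= outlen)
		return -1;
	base_len = (size_t)n;
	for (i = strlen(path) + 1; i < base_len; i++)
		if (out[i] == '/')
			out[i] = '_';
	while (taken(out)) {
		n = snprintf(out + base_len, outlen - base_len,
			     "_%d", suffix++);
		if (n < 0 || (size_t)n >= outlen - base_len)
			return -1;
	}
	return 0;
}

/* Demo predicate: pretends two candidate names are already occupied. */
int demo_path_taken(const char *path)
{
	return !strcmp(path, "y/d~topic") || !strcmp(path, "y/d~topic_0");
}
```

In the patch the interesting part is the caller: the content goes to
the alternate path in the worktree while the index keeps the original
path at a higher stage, so the untracked file is never overwritten.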

[PATCH v7 12/31] merge-recursive: move the get_renames() function

2018-01-30 Thread Elijah Newren

I want to re-use some other functions in the file without moving those
other functions or dealing with a handful of annoying split function
declarations and definitions.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 139 +++---
 1 file changed, 70 insertions(+), 69 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 9d53f30111..2028dd113b 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -537,75 +537,6 @@ struct rename {
unsigned processed:1;
 };
 
-/*
- * Get information of all renames which occurred between 'o_tree' and
- * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and
- * 'b_tree') to be able to associate the correct cache entries with
- * the rename information. 'tree' is always equal to either a_tree or b_tree.
- */
-static struct string_list *get_renames(struct merge_options *o,
-  struct tree *tree,
-  struct tree *o_tree,
-  struct tree *a_tree,
-  struct tree *b_tree,
-  struct string_list *entries)
-{
-   int i;
-   struct string_list *renames;
-   struct diff_options opts;
-
-   renames = xcalloc(1, sizeof(struct string_list));
-   if (!o->detect_rename)
-   return renames;
-
-   diff_setup(&opts);
-   opts.flags.recursive = 1;
-   opts.flags.rename_empty = 0;
-   opts.detect_rename = DIFF_DETECT_RENAME;
-   opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit :
-   o->diff_rename_limit >= 0 ? o->diff_rename_limit :
-   1000;
-   opts.rename_score = o->rename_score;
-   opts.show_rename_progress = o->show_rename_progress;
-   opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-   diff_setup_done(&opts);
-   diff_tree_oid(&o_tree->object.oid, &tree->object.oid, "", &opts);
-   diffcore_std(&opts);
-   if (opts.needed_rename_limit > o->needed_rename_limit)
-   o->needed_rename_limit = opts.needed_rename_limit;
-   for (i = 0; i < diff_queued_diff.nr; ++i) {
-   struct string_list_item *item;
-   struct rename *re;
-   struct diff_filepair *pair = diff_queued_diff.queue[i];
-   if (pair->status != 'R') {
-   diff_free_filepair(pair);
-   continue;
-   }
-   re = xmalloc(sizeof(*re));
-   re->processed = 0;
-   re->pair = pair;
-   item = string_list_lookup(entries, re->pair->one->path);
-   if (!item)
-   re->src_entry = insert_stage_data(re->pair->one->path,
-   o_tree, a_tree, b_tree, entries);
-   else
-   re->src_entry = item->util;
-
-   item = string_list_lookup(entries, re->pair->two->path);
-   if (!item)
-   re->dst_entry = insert_stage_data(re->pair->two->path,
-   o_tree, a_tree, b_tree, entries);
-   else
-   re->dst_entry = item->util;
-   item = string_list_insert(renames, pair->one->path);
-   item->util = re;
-   }
-   opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-   diff_queued_diff.nr = 0;
-   diff_flush(&opts);
-   return renames;
-}
-
 static int update_stages(struct merge_options *opt, const char *path,
 const struct diff_filespec *o,
 const struct diff_filespec *a,
@@ -1389,6 +1320,76 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
return ret;
 }
 
+/*
+ * Get information of all renames which occurred between 'o_tree' and
+ * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and
+ * 'b_tree') to be able to associate the correct cache entries with
+ * the rename information. 'tree' is always equal to either a_tree or b_tree.
+ */
+static struct string_list *get_renames(struct merge_options *o,
+  struct tree *tree,
+  struct tree *o_tree,
+  struct tree *a_tree,
+  struct tree *b_tree,
+  struct string_list *entries)
+{
+   int i;
+   struct string_list *renames;
+   struct diff_options opts;
+
+   renames = xcalloc(1, sizeof(struct string_list));
+   if (!o->detect_rename)
+   return renames;
+
+   diff_setup(&opts);
+   opts.flags.recursive = 1;
+   opts.flags.rename_empty = 0;
+   opts.detect_rename = DIFF_DETECT_RENAME;
+   opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit :
+
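
The nested conditional that sets opts.rename_limit in the moved
function reads more easily when written out as a fallback chain. A
sketch with a hypothetical function name; the 1000 default is the
value visible in the removed hunk above.

```c
/*
 * Pick the rename limit: the merge-specific setting wins if it is
 * set (>= 0), then the diff setting, then a built-in default.
 */
int pick_rename_limit(int merge_rename_limit, int diff_rename_limit)
{
	if (merge_rename_limit >= 0)
		return merge_rename_limit;
	if (diff_rename_limit >= 0)
		return diff_rename_limit;
	return 1000;
}
```

Note that 0 counts as "set" here, exactly as in the ternary form.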

[PATCH v7 09/31] directory rename detection: miscellaneous testcases to complete coverage

2018-01-30 Thread Elijah Newren

I came up with the testcases in the first eight sections before coding up
the implementation.  The testcases in this section were mostly ones I
thought of while coding/debugging, and which I was too lazy to insert
into the previous sections because I didn't want to re-label with all the
testcase references.  :-)

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 565 +++-
 1 file changed, 564 insertions(+), 1 deletion(-)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index e2db5d0ac1..b730256653 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -305,6 +305,7 @@ test_expect_failure '1d-check: Directory renames cause a rename/rename(2to1) con
 '
 
 # Testcase 1e, Renamed directory, with all filenames being renamed too
+#   (Related to testcases 9f & 9g)
 #   Commit O: z/{oldb,oldc}
 #   Commit A: y/{newb,newc}
 #   Commit B: z/{oldb,oldc,d}
@@ -593,7 +594,7 @@ test_expect_success '2b-check: Directory split into two on one side, with equal
 ###
 
 # Testcase 3a, Avoid implicit rename if involved as source on other side
-#   (Related to testcases 1c and 1f)
+#   (Related to testcases 1c, 1f, and 9h)
 #   Commit O: z/{b,c,d}
 #   Commit A: z/{b,c,d} (no change)
 #   Commit B: y/{b,c}, x/d
@@ -2308,4 +2309,566 @@ test_expect_failure '8e-check: Both sides rename, one side adds to original dire
)
 '
 
+###
+# SECTION 9: Other testcases
+#
+# This section consists of miscellaneous testcases I thought of during
+# the implementation which round out the testing.
+###
+
+# Testcase 9a, Inner renamed directory within outer renamed directory
+#   (Related to testcase 1f)
+#   Commit O: z/{b,c,d/{e,f,g}}
+#   Commit A: y/{b,c}, x/w/{e,f,g}
+#   Commit B: z/{b,c,d/{e,f,g,h},i}
+#   Expected: y/{b,c,i}, x/w/{e,f,g,h}
+#   NOTE: The only reason this one is interesting is because when a directory
+# is split into multiple other directories, we determine by the weight
+# of which one had the most paths going to it.  A naive implementation
+# of that could take the new file in commit B at z/i to x/w/i or x/i.
+
+test_expect_success '9a-setup: Inner renamed directory within outer renamed directory' '
+   test_create_repo 9a &&
+   (
+   cd 9a &&
+
+   mkdir -p z/d &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo e >z/d/e &&
+   echo f >z/d/f &&
+   echo g >z/d/g &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   mkdir x &&
+   git mv z/d x/w &&
+   git mv z y &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   echo h >z/d/h &&
+   echo i >z/i &&
+   git add z &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '9a-check: Inner renamed directory within outer renamed directory' '
+   (
+   cd 9a &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 7 out &&
+   git ls-files -u >out &&
+   test_line_count = 0 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   HEAD:y/b HEAD:y/c HEAD:y/i &&
+   git rev-parse >expect \
+   O:z/b    O:z/c    B:z/i &&
+   test_cmp expect actual &&
+
+   git rev-parse >actual \
+   HEAD:x/w/e HEAD:x/w/f HEAD:x/w/g HEAD:x/w/h &&
+   git rev-parse >expect \
+   O:z/d/e    O:z/d/f    O:z/d/g    B:z/d/h &&
+   test_cmp expect actual
+   )
+'
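
The NOTE on testcase 9a says a split directory is resolved by which
destination received the most paths. A sketch of that rule follows; it
is my own illustration, not the series' implementation. The struct,
the function name and the choice to treat a tie as "no directory
rename" are assumptions, and only the immediate parent directory of
each file is considered.

```c
#include <string.h>

/* Hypothetical record of one file rename, reduced to its directories. */
struct file_rename {
	const char *old_dir;
	const char *new_dir;
};

/*
 * For `old_dir`, return the destination directory that received the
 * most renamed paths, or NULL if there is none or the top count is
 * shared by two different destinations.
 */
const char *pick_dir_rename(const struct file_rename *r, int nr,
			    const char *old_dir)
{
	const char *best = NULL;
	int best_count = 0, tie = 0, i, j;

	for (i = 0; i < nr; i++) {
		int count = 0;

		if (strcmp(r[i].old_dir, old_dir))
			continue;
		/* count how many paths from old_dir share this destination */
		for (j = 0; j < nr; j++)
			if (!strcmp(r[j].old_dir, old_dir) &&
			    !strcmp(r[j].new_dir, r[i].new_dir))
				count++;
		if (count > best_count) {
			best = r[i].new_dir;
			best_count = count;
			tie = 0;
		} else if (count == best_count && strcmp(best, r[i].new_dir)) {
			tie = 1;
		}
	}
	return tie ? NULL : best;
}
```

Applied to 9a, z/ maps to y/ (two paths) and z/d/ maps to x/w/ (three
paths), so the new file z/i follows z/ to y/i rather than landing
under x/.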
+
+# Testcase 9b, Transitive rename with content merge
+#   (Related to testcase 1c)
+#   Commit O: z/{b,c},   x/d_1
+#   Commit A: y/{b,c},   x/d_2
+#   Commit B: z/{b,c,d_3}
+#   Expected: y/{b,c,d_merged}
+
+test_expect_success '9b-setup: Transitive rename with content merge' '
+   test_create_repo 9b &&
+   (
+   cd 9b &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   mkdir x &&
+   test_seq 1 10 >x/d &&
+   git add z x &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+

[PATCH v7 07/31] directory rename detection: more involved edge/corner testcases

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 396 
 1 file changed, 396 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index fbeb8f4316..68bd86f555 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -1508,4 +1508,400 @@ test_expect_success '6e-check: Add/add from one side' '
 #   side of history is the one doing the renaming.
 ###
 
+
+###
+# SECTION 7: More involved Edge/Corner cases
+#
+# The ruleset we have generated in the above sections seems to provide
+# well-defined merges.  But can we find edge/corner cases that either (a)
+# are harder for users to understand, or (b) have a resolution that is
+# non-intuitive or suboptimal?
+#
+# The testcases in this section dive into cases that I've tried to craft in
+# a way to find some that might be surprising to users or difficult for
+# them to understand (the next section will look at non-intuitive or
+# suboptimal merge results).  Some of the testcases are similar to ones
+# from past sections, but have been simplified to try to highlight error
+# messages using a "modified" path (due to the directory rename).  Are
+# users okay with these?
+#
+# In my opinion, testcases that are difficult to understand from this
+# section is due to difficulty in the testcase rather than the directory
+# renaming (similar to how t6042 and t6036 have difficult resolutions due
+# to the problem setup itself being complex).  And I don't think the
+# error messages are a problem.
+#
+# On the other hand, the testcases in section 8 worry me slightly more...
+###
+
+# Testcase 7a, rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file
+#   Commit O: z/{b,c}
+#   Commit A: y/{b,c}
+#   Commit B: w/b, x/c, z/d
+#   Expected: y/d, CONFLICT(rename/rename for both z/b and z/c)
+#   NOTE: There's a rename of z/ here, y/ has more renames, so z/d -> y/d.
+
+test_expect_success '7a-setup: rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file' '
+   test_create_repo 7a &&
+   (
+   cd 7a &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git mv z y &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   mkdir w &&
+   mkdir x &&
+   git mv z/b w/ &&
+   git mv z/c x/ &&
+   echo d > z/d &&
+   git add z/d &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '7a-check: rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file' '
+   (
+   cd 7a &&
+
+   git checkout A^0 &&
+
+   test_must_fail git merge -s recursive B^0 >out &&
+   test_i18ngrep "CONFLICT (rename/rename).*z/b.*y/b.*w/b" out &&
+   test_i18ngrep "CONFLICT (rename/rename).*z/c.*y/c.*x/c" out &&
+
+   git ls-files -s >out &&
+   test_line_count = 7 out &&
+   git ls-files -u >out &&
+   test_line_count = 6 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   :1:z/b :2:y/b :3:w/b :1:z/c :2:y/c :3:x/c :0:y/d &&
+   git rev-parse >expect \
+O:z/b  O:z/b  O:z/b  O:z/c  O:z/c  O:z/c  B:z/d &&
+   test_cmp expect actual &&
+
+   git hash-object >actual \
+   y/b   w/b   y/c   x/c &&
+   git rev-parse >expect \
+   O:z/b O:z/b O:z/c O:z/c &&
+   test_cmp expect actual
+   )
+'
+
+# Testcase 7b, rename/rename(2to1), but only due to transitive rename
+#   (Related to testcase 1d)
+#   Commit O: z/{b,c}, x/d_1, w/d_2
+#   Commit A: y/{b,c,d_2}, x/d_1
+#   Commit B: z/{b,c,d_1},w/d_2
+#   Expected: y/{b,c}, CONFLICT(rename/rename(2to1): x/d_1, w/d_2 -> y_d)
+
+test_expect_success '7b-setup: rename/rename(2to1), but only due to transitive rename' '
+   test_create_repo 7b &&
+   (
+   cd 7b &&
+
+   mkdir z &&
+   mkdir x &&
+   mkdir w &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d1 > x/d &&
+   echo d2 > w/d &&
+   git add z x w &&
+   test_tick &&
+

[PATCH v7 08/31] directory rename detection: testcases exploring possibly suboptimal merges

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 404 
 1 file changed, 404 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index 68bd86f555..e2db5d0ac1 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -1904,4 +1904,408 @@ test_expect_failure '7e-check: transitive rename in rename/delete AND dirs in th
)
 '
 
+###
+# SECTION 8: Suboptimal merges
+#
+# As alluded to in the last section, the ruleset we have built up for
+# detecting directory renames unfortunately has some special cases where it
+# results in slightly suboptimal or non-intuitive behavior.  This section
+# explores these cases.
+#
+# To be fair, we already had non-intuitive or suboptimal behavior for most
+# of these cases in git before introducing implicit directory rename
+# detection, but it'd be nice if there was a modified ruleset out there
+# that handled these cases a bit better.
+###
+
+# Testcase 8a, Dual-directory rename, one into the others' way
+#   Commit O. x/{a,b},   y/{c,d}
+#   Commit A. x/{a,b,e}, y/{c,d,f}
+#   Commit B. y/{a,b},   z/{c,d}
+#
+# Possible Resolutions:
+#   w/o dir-rename detection: y/{a,b,f},   z/{c,d},   x/e
+#   Currently expected:   y/{a,b,e,f}, z/{c,d}
+#   Optimal:  y/{a,b,e},   z/{c,d,f}
+#
+# Note: Both x and y got renamed and it'd be nice to detect both, and we do
+# better with directory rename detection than git did without, but the
+# simple rule from section 5 prevents me from handling this as optimally as
+# we potentially could.
+
+test_expect_success '8a-setup: Dual-directory rename, one into the others way' '
+   test_create_repo 8a &&
+   (
+   cd 8a &&
+
+   mkdir x &&
+   mkdir y &&
+   echo a >x/a &&
+   echo b >x/b &&
+   echo c >y/c &&
+   echo d >y/d &&
+   git add x y &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   echo e >x/e &&
+   echo f >y/f &&
+   git add x/e y/f &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   git mv y z &&
+   git mv x y &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '8a-check: Dual-directory rename, one into the others way' '
+   (
+   cd 8a &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 6 out &&
+   git ls-files -u >out &&
+   test_line_count = 0 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   HEAD:y/a HEAD:y/b HEAD:y/e HEAD:y/f HEAD:z/c HEAD:z/d &&
+   git rev-parse >expect \
+   O:x/a    O:x/b    A:x/e    A:y/f    O:y/c    O:y/d &&
+   test_cmp expect actual
+   )
+'
+
+# Testcase 8b, Dual-directory rename, one into the others' way, with conflicting filenames
+#   Commit O. x/{a_1,b_1}, y/{a_2,b_2}
+#   Commit A. x/{a_1,b_1,e_1}, y/{a_2,b_2,e_2}
+#   Commit B. y/{a_1,b_1}, z/{a_2,b_2}
+#
+#   w/o dir-rename detection: y/{a_1,b_1,e_2}, z/{a_2,b_2}, x/e_1
+#   Currently expected:       (same as w/o dir-rename detection)
+#   Scary:                    y/{a_1,b_1}, z/{a_2,b_2}, CONFLICT(add/add, e_1 vs. e_2)
+#   Optimal:                  y/{a_1,b_1,e_1}, z/{a_2,b_2,e_2}
+#
+# Note: Very similar to 8a, except instead of 'e' and 'f' in directories x and
+# y, both are named 'e'.  Without directory rename detection, neither file
+# moves directories.  Implement directory rename detection suboptimally, and
+# you get an add/add conflict, but both files were added in commit A, so this
+# is an add/add conflict where one side of history added both files --
+# something we can't represent in the index.  Obviously, we'd prefer the last
+# resolution, but our previous rules are too coarse to allow it.  Using both
+# the rules from section 4 and section 5 saves us from the Scary resolution,
+# making us fall back to pre-directory-rename-detection behavior for both
+# e_1 and e_2.
+
+test_expect_success '8b-setup: Dual-directory rename, one into the others way, with conflicting filenames' '
+   test_create_repo 8b &&
+   (
+   cd 8b &&
+
+   mkdir x &&
+   mkdir y &&
+   echo a1 >x/a &&
+   echo b1 >x/b &&
+   echo a2 >y/a &&
+

[PATCH v7 27/31] merge-recursive: fix overwriting dirty files involved in renames

2018-01-30 Thread Elijah Newren

This fixes an issue that existed before my directory rename detection
patches that affects both normal renames and renames implied by
directory rename detection.  Additional codepaths that only affect
overwriting of dirty files that are involved in directory rename
detection will be added in a subsequent commit.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 85 -
 merge-recursive.h   |  2 +
 t/t3501-revert-cherry-pick.sh   |  2 +-
 t/t6043-merge-rename-directories.sh |  2 +-
 t/t7607-merge-overwrite.sh  |  2 +-
 unpack-trees.c  |  4 +-
 unpack-trees.h  |  4 ++
 7 files changed, 77 insertions(+), 24 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 39e161e094..fba1a0d207 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -334,32 +334,37 @@ static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
init_tree_desc(desc, tree->buffer, tree->size);
 }
 
-static int git_merge_trees(int index_only,
+static int git_merge_trees(struct merge_options *o,
   struct tree *common,
   struct tree *head,
   struct tree *merge)
 {
int rc;
struct tree_desc t[3];
-   struct unpack_trees_options opts;
 
-   memset(&opts, 0, sizeof(opts));
-   if (index_only)
-   opts.index_only = 1;
+   memset(&o->unpack_opts, 0, sizeof(o->unpack_opts));
+   if (o->call_depth)
+   o->unpack_opts.index_only = 1;
else
-   opts.update = 1;
-   opts.merge = 1;
-   opts.head_idx = 2;
-   opts.fn = threeway_merge;
-   opts.src_index = &the_index;
-   opts.dst_index = &the_index;
-   setup_unpack_trees_porcelain(&opts, "merge");
+   o->unpack_opts.update = 1;
+   o->unpack_opts.merge = 1;
+   o->unpack_opts.head_idx = 2;
+   o->unpack_opts.fn = threeway_merge;
+   o->unpack_opts.src_index = &the_index;
+   o->unpack_opts.dst_index = &the_index;
+   setup_unpack_trees_porcelain(&o->unpack_opts, "merge");
 
init_tree_desc_from_tree(t+0, common);
init_tree_desc_from_tree(t+1, head);
init_tree_desc_from_tree(t+2, merge);
 
-   rc = unpack_trees(3, t, &opts);
+   rc = unpack_trees(3, t, &o->unpack_opts);
+   /*
+* unpack_trees NULLifies src_index, but it's used in verify_uptodate,
+* so set to the new index which will usually have modification
+* timestamp info copied over.
+*/
+   o->unpack_opts.src_index = &the_index;
cache_tree_free(&active_cache_tree);
return rc;
 }
@@ -792,6 +797,20 @@ static int would_lose_untracked(const char *path)
return !was_tracked(path) && file_exists(path);
 }
 
+static int was_dirty(struct merge_options *o, const char *path)
+{
+   struct cache_entry *ce;
+   int dirty = 1;
+
+   if (o->call_depth || !was_tracked(path))
+   return !dirty;
+
+   ce = cache_file_exists(path, strlen(path), ignore_case);
+   dirty = (ce->ce_stat_data.sd_mtime.sec > 0 &&
+verify_uptodate(ce, &o->unpack_opts) != 0);
+   return dirty;
+}
+
 static int make_room_for_path(struct merge_options *o, const char *path)
 {
int status, i;
@@ -2654,6 +2673,7 @@ static int handle_modify_delete(struct merge_options *o,
 
 static int merge_content(struct merge_options *o,
 const char *path,
+int file_in_way,
 struct object_id *o_oid, int o_mode,
 struct object_id *a_oid, int a_mode,
 struct object_id *b_oid, int b_mode,
@@ -2728,7 +2748,7 @@ static int merge_content(struct merge_options *o,
return -1;
}
 
-   if (df_conflict_remains) {
+   if (df_conflict_remains || file_in_way) {
char *new_path;
if (o->call_depth) {
remove_file_from_cache(path);
@@ -2762,6 +2782,30 @@ static int merge_content(struct merge_options *o,
return mfi.clean;
 }
 
+static int conflict_rename_normal(struct merge_options *o,
+ const char *path,
+ struct object_id *o_oid, unsigned int o_mode,
+ struct object_id *a_oid, unsigned int a_mode,
+ struct object_id *b_oid, unsigned int b_mode,
+ struct rename_conflict_info *ci)
+{
+   int clean_merge;
+   int file_in_the_way = 0;
+
+   if (was_dirty(o, path)) {
+   file_in_the_way = 1;
+   output(o, 1, _("Refusing to lose dirty file at %s"), path);
+   }
+
+   /* Merge the content and write it out */
+   clean_merge = merge_content(o, path, file_in_the_way,
+

[PATCH v7 21/31] merge-recursive: add a new hashmap for storing file collisions

2018-01-30 Thread Elijah Newren

Directory renames with the ability to merge directories open up the
possibility of add/add/add/.../add conflicts, if each of the N
directories being merged into one target directory all had a file with
the same name.  We need a way to check for and report on such
collisions; this hashmap will be used for this purpose.
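
As a rough illustration of the bookkeeping described above, here is a
minimal standalone sketch.  It is not the code from this patch: a small
fixed-size table with linear lookup stands in for git's hashmap, and every
name ending in _sketch (as well as collision_table and the MAX_* limits) is
hypothetical.  It only shows the idea of keying on the target path and
collecting every source path that would land there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TARGETS 16
#define MAX_SOURCES 8

/*
 * Hypothetical stand-in for struct collision_entry: one target path plus
 * every source path that a directory rename would move onto it.
 */
struct collision_sketch {
	const char *target_file;
	const char *source_files[MAX_SOURCES];
	size_t nr_sources;
};

struct collision_table {
	struct collision_sketch entries[MAX_TARGETS];
	size_t nr;
};

/* Linear lookup keyed on target_file; a real hashmap would hash the key. */
static struct collision_sketch *collision_sketch_find(struct collision_table *t,
						      const char *target_file)
{
	size_t i;

	assert(t && target_file);
	for (i = 0; i < t->nr; i++)
		if (!strcmp(t->entries[i].target_file, target_file))
			return &t->entries[i];
	return NULL;
}

/*
 * Record that source_file would be moved to target_file.  Returns the
 * number of sources now mapped to that target (a value above 1 means a
 * collision), or 0 if the fixed-size table is full.
 */
static size_t collision_sketch_add(struct collision_table *t,
				   const char *source_file,
				   const char *target_file)
{
	struct collision_sketch *e = collision_sketch_find(t, target_file);

	if (!e) {
		if (t->nr == MAX_TARGETS)
			return 0;
		e = &t->entries[t->nr++];
		e->target_file = target_file;
		e->nr_sources = 0;
	}
	if (e->nr_sources == MAX_SOURCES)
		return 0;
	e->source_files[e->nr_sources++] = source_file;
	return e->nr_sources;
}
```

With this sketch, adding "a/file" and then "b/file" for the same target
"c/file" returns 1 and then 2; the second return value is the point at
which a caller could report an add/add style collision once.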

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 23 +++
 merge-recursive.h |  7 +++
 2 files changed, 30 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index 9e9ad45d2a..ac968ad2ae 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -84,6 +84,29 @@ static void dir_rename_entry_init(struct dir_rename_entry *entry,
string_list_init(&entry->possible_new_dirs, 0);
 }
 
+static struct collision_entry *collision_find_entry(struct hashmap *hashmap,
+   char *target_file)
+{
+   struct collision_entry key;
+
+   hashmap_entry_init(&key, strhash(target_file));
+   key.target_file = target_file;
+   return hashmap_get(hashmap, &key, NULL);
+}
+
+static int collision_cmp(void *unused_cmp_data,
+const struct collision_entry *e1,
+const struct collision_entry *e2,
+const void *unused_keydata)
+{
+   return strcmp(e1->target_file, e2->target_file);
+}
+
+static void collision_init(struct hashmap *map)
+{
+   hashmap_init(map, (hashmap_cmp_fn) collision_cmp, NULL, 0);
+}
+
 static void flush_output(struct merge_options *o)
 {
if (o->buffer_output < 2 && o->obuf.len) {
diff --git a/merge-recursive.h b/merge-recursive.h
index d7f4cc80c1..e1be27f57c 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -37,6 +37,13 @@ struct dir_rename_entry {
struct string_list possible_new_dirs;
 };
 
+struct collision_entry {
+   struct hashmap_entry ent; /* must be the first member! */
+   char *target_file;
+   struct string_list source_files;
+   unsigned reported_already:1;
+};
+
 /* merge_trees() but with recursive ancestor consolidation */
 int merge_recursive(struct merge_options *o,
struct commit *h1,
-- 
2.16.1.106.gf69932adfe

[PATCH v7 03/31] directory rename detection: testcases to avoid taking detection too far

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 153 
 1 file changed, 153 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index b22a9052b3..8049ed5fc9 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -582,4 +582,157 @@ test_expect_success '2b-check: Directory split into two on one side, with equal
 #   messages are handled correctly.
 ###
 
+
+###
+# SECTION 3: Path in question is the source path for some rename already
+#
+# Combining cases from Section 1 and trying to handle them could lead to
+# directory renaming detection being over-applied.  So, this section
+# provides some good testcases to check that the implementation doesn't go
+# too far.
+###
+
+# Testcase 3a, Avoid implicit rename if involved as source on other side
+#   (Related to testcases 1c and 1f)
+#   Commit O: z/{b,c,d}
+#   Commit A: z/{b,c,d} (no change)
+#   Commit B: y/{b,c}, x/d
+#   Expected: y/{b,c}, x/d
+test_expect_success '3a-setup: Avoid implicit rename if involved as source on other side' '
+   test_create_repo 3a &&
+   (
+   cd 3a &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   test_tick &&
+   git commit --allow-empty -m "A" &&
+
+   git checkout B &&
+   mkdir y &&
+   mkdir x &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   git mv z/d x/ &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_success '3a-check: Avoid implicit rename if involved as source on other side' '
+   (
+   cd 3a &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 3 out &&
+
+   git rev-parse >actual \
+   HEAD:y/b HEAD:y/c HEAD:x/d &&
+   git rev-parse >expect \
+   O:z/b    O:z/c    O:z/d &&
+   test_cmp expect actual
+   )
+'
+
+# Testcase 3b, Avoid implicit rename if involved as source on other side
+#   (Related to testcases 5c and 7c, also kind of 1e and 1f)
+#   Commit O: z/{b,c,d}
+#   Commit A: y/{b,c}, x/d
+#   Commit B: z/{b,c}, w/d
+#   Expected: y/{b,c}, CONFLICT:(z/d -> x/d vs. w/d)
+#   NOTE: We're particularly checking that since z/d is already involved as
+# a source in a file rename on the same side of history, that we don't
+# get it involved in directory rename detection.  If it were, we might
+# end up with CONFLICT:(z/d -> y/d vs. x/d vs. w/d), i.e. a
+# rename/rename/rename(1to3) conflict, which is just weird.
+test_expect_success '3b-setup: Avoid implicit rename if involved as source on current side' '
+   test_create_repo 3b &&
+   (
+   cd 3b &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   mkdir y &&
+   mkdir x &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   git mv z/d x/ &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   mkdir w &&
+   git mv z/d w/ &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_success '3b-check: Avoid implicit rename if involved as source on current side' '
+   (
+   cd 3b &&
+
+   git checkout A^0 &&
+
+   test_must_fail git merge -s recursive B^0 >out &&
+   test_i18ngrep CONFLICT.*rename/rename.*z/d.*x/d.*w/d out &&
+   test_i18ngrep ! CONFLICT.*rename/rename.*y/d out &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+   git ls-files -u >out &&
+   test_line_count = 3 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+

[PATCH v7 29/31] directory rename detection: new testcases showcasing a pair of bugs

2018-01-30 Thread Elijah Newren

Add a testcase showing spurious rename/rename(1to2) conflicts occurring
due to directory rename detection.

Also add a pair of testcases dealing with moving directory hierarchies
around that were suggested by Stefan Beller as "food for thought" during
his review of an earlier patch series, but which actually uncovered a
bug.  Round things out with a test that is a cross between the two
testcases that showed existing bugs in order to make sure we aren't
merely addressing problems in isolation but in general.

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 296 
 1 file changed, 296 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index a34c57d986..3d292f0c5f 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -159,6 +159,7 @@ test_expect_success '1b-check: Merge a directory with another' '
 # Testcase 1c, Transitive renaming
 #   (Related to testcases 3a and 6d -- when should a transitive rename apply?)
 #   (Related to testcases 9c and 9d -- can transitivity repeat?)
+#   (Related to testcase 12b -- joint-transitivity?)
 #   Commit O: z/{b,c},   x/d
 #   Commit A: y/{b,c},   x/d
 #   Commit B: z/{b,c,d}
@@ -2863,6 +2864,68 @@ test_expect_failure '9g-check: Renamed directory that only contained immediate s
)
 '
 
+# Testcase 9h, Avoid implicit rename if involved as source on other side
+#   (Extremely closely related to testcase 3a)
+#   Commit O: z/{b,c,d_1}
+#   Commit A: z/{b,c,d_2}
+#   Commit B: y/{b,c}, x/d_1
+#   Expected: y/{b,c}, x/d_2
+#   NOTE: If we applied the z/ -> y/ rename to z/d, then we'd end up with
+# a rename/rename(1to2) conflict (z/d -> y/d vs. x/d)
+test_expect_success '9h-setup: Avoid dir rename on merely modified path' '
+   test_create_repo 9h &&
+   (
+   cd 9h &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   printf "1\n2\n3\n4\n5\n6\n7\n8\nd\n" >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   test_tick &&
+   echo more >>z/d &&
+   git add z/d &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   mkdir y &&
+   mkdir x &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   git mv z/d x/ &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '9h-check: Avoid dir rename on merely modified path' '
+   (
+   cd 9h &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 3 out &&
+
+   git rev-parse >actual \
+   HEAD:y/b HEAD:y/c HEAD:x/d &&
+   git rev-parse >expect \
+   O:z/b    O:z/c    A:z/d &&
+   test_cmp expect actual
+   )
+'
+
 ###
 # Rules suggested by section 9:
 #
@@ -3696,4 +3759,237 @@ test_expect_success '11f-check: Avoid deleting not-uptodate with dir rename/rena
)
 '
 
+###
+# SECTION 12: Everything else
+#
+# Tests suggested by others.  Tests added after implementation completed
+# and submitted.  Grab bag.
+###
+
+# Testcase 12a, Moving one directory hierarchy into another
+#   (Related to testcase 9a)
+#   Commit O: node1/{leaf1,leaf2}, node2/{leaf3,leaf4}
+#   Commit A: node1/{leaf1,leaf2,node2/{leaf3,leaf4}}
+#   Commit B: node1/{leaf1,leaf2,leaf5}, node2/{leaf3,leaf4,leaf6}
+#   Expected: node1/{leaf1,leaf2,leaf5,node2/{leaf3,leaf4,leaf6}}
+
+test_expect_success '12a-setup: Moving one directory hierarchy into another' '
+   test_create_repo 12a &&
+   (
+   cd 12a &&
+
+   mkdir -p node1 node2 &&
+   echo leaf1 >node1/leaf1 &&
+   echo leaf2 >node1/leaf2 &&
+   echo leaf3 >node2/leaf3 &&
+   echo leaf4 >node2/leaf4 &&
+   git add node1 node2 &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git mv node2/ node1/ &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   echo leaf5 >node1/leaf5 &&
+   echo leaf6 >node2/leaf6 &&
+   git add node1 node2 &&
+

[PATCH v7 17/31] merge-recursive: add a new hashmap for storing directory renames

2018-01-30 Thread Elijah Newren

This just adds dir_rename_entry and the associated functions; code using
these will be added in subsequent commits.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 35 +++
 merge-recursive.h |  8 
 2 files changed, 43 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index 8ac69e1cbb..3b6d0e3f70 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -49,6 +49,41 @@ static unsigned int path_hash(const char *path)
return ignore_case ? strihash(path) : strhash(path);
 }
 
+static struct dir_rename_entry *dir_rename_find_entry(struct hashmap *hashmap,
+ char *dir)
+{
+   struct dir_rename_entry key;
+
+   if (dir == NULL)
+   return NULL;
+   hashmap_entry_init(&key, strhash(dir));
+   key.dir = dir;
+   return hashmap_get(hashmap, &key, NULL);
+}
+
+static int dir_rename_cmp(void *unused_cmp_data,
+ const struct dir_rename_entry *e1,
+ const struct dir_rename_entry *e2,
+ const void *unused_keydata)
+{
+   return strcmp(e1->dir, e2->dir);
+}
+
+static void dir_rename_init(struct hashmap *map)
+{
+   hashmap_init(map, (hashmap_cmp_fn) dir_rename_cmp, NULL, 0);
+}
+
+static void dir_rename_entry_init(struct dir_rename_entry *entry,
+ char *directory)
+{
+   hashmap_entry_init(entry, strhash(directory));
+   entry->dir = directory;
+   entry->non_unique_new_dir = 0;
+   strbuf_init(&entry->new_dir, 0);
+   string_list_init(&entry->possible_new_dirs, 0);
+}
+
 static void flush_output(struct merge_options *o)
 {
if (o->buffer_output < 2 && o->obuf.len) {
diff --git a/merge-recursive.h b/merge-recursive.h
index 80d69d1401..d7f4cc80c1 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -29,6 +29,14 @@ struct merge_options {
struct string_list df_conflict_file_set;
 };
 
+struct dir_rename_entry {
+   struct hashmap_entry ent; /* must be the first member! */
+   char *dir;
+   unsigned non_unique_new_dir:1;
+   struct strbuf new_dir;
+   struct string_list possible_new_dirs;
+};
+
 /* merge_trees() but with recursive ancestor consolidation */
 int merge_recursive(struct merge_options *o,
struct commit *h1,
-- 
2.16.1.106.gf69932adfe

[PATCH v7 01/31] directory rename detection: basic testcases

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 442 
 1 file changed, 442 insertions(+)
 create mode 100755 t/t6043-merge-rename-directories.sh

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
new file mode 100755
index 00..d045f0e31e
--- /dev/null
+++ b/t/t6043-merge-rename-directories.sh
@@ -0,0 +1,442 @@
+#!/bin/sh
+
+test_description="recursive merge with directory renames"
+# includes checking of many corner cases, with a similar methodology to:
+#   t6042: corner cases with renames but not criss-cross merges
+#   t6036: corner cases with both renames and criss-cross merges
+#
+# The setup for all of them, pictorially, is:
+#
+#  A
+#  o
+# / \
+#  O o   ?
+# \ /
+#  o
+#  B
+#
+# To help make it easier to follow the flow of tests, they have been
+# divided into sections and each test will start with a quick explanation
+# of what commits O, A, and B contain.
+#
+# Notation:
+#z/{b,c}   means  files z/b and z/c both exist
+#x/d_1 means  file x/d exists with content d1.  (Purpose of the
+# underscore notation is to differentiate different
+# files that might be renamed into each other's paths.)
+
+. ./test-lib.sh
+
+
+###
+# SECTION 1: Basic cases we should be able to handle
+###
+
+# Testcase 1a, Basic directory rename.
+#   Commit O: z/{b,c}
+#   Commit A: y/{b,c}
+#   Commit B: z/{b,c,d,e/f}
+#   Expected: y/{b,c,d,e/f}
+
+test_expect_success '1a-setup: Simple directory rename detection' '
+   test_create_repo 1a &&
+   (
+   cd 1a &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git mv z y &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   echo d >z/d &&
+   mkdir z/e &&
+   echo f >z/e/f &&
+   git add z/d z/e/f &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '1a-check: Simple directory rename detection' '
+   (
+   cd 1a &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 4 out &&
+
+   git rev-parse >actual \
+   HEAD:y/b HEAD:y/c HEAD:y/d HEAD:y/e/f &&
+   git rev-parse >expect \
+   O:z/b    O:z/c    B:z/d    B:z/e/f &&
+   test_cmp expect actual &&
+
+   git hash-object y/d >actual &&
+   git rev-parse B:z/d >expect &&
+   test_cmp expect actual &&
+
+   test_must_fail git rev-parse HEAD:z/d &&
+   test_must_fail git rev-parse HEAD:z/e/f &&
+   test_path_is_missing z/d &&
+   test_path_is_missing z/e/f
+   )
+'
+
+# Testcase 1b, Merge a directory with another
+#   Commit O: z/{b,c},   y/d
+#   Commit A: z/{b,c,e}, y/d
+#   Commit B: y/{b,c,d}
+#   Expected: y/{b,c,d,e}
+
+test_expect_success '1b-setup: Merge a directory with another' '
+   test_create_repo 1b &&
+   (
+   cd 1b &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   mkdir y &&
+   echo d >y/d &&
+   git add z y &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   echo e >z/e &&
+   git add z/e &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   git mv z/b y &&
+   git mv z/c y &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '1b-check: Merge a directory with another' '
+   (
+   cd 1b &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 4 out &&
+
+   git rev-parse >actual \
+   HEAD:y/b HEAD:y/c HEAD:y/d HEAD:y/e &&
+   git rev-parse >expect \
+   O:z/b    O:z/c    O:y/d    A:z/e &&
+   test_cmp expect actual &&
+   test_must_fail git rev-parse HEAD:z/e
+   )
+'
+
+# Testcase 1c, Transitive

[PATCH v7 13/31] merge-recursive: introduce new functions to handle rename logic

2018-01-30 Thread Elijah Newren

The amount of logic in merge_trees() relative to renames was just a few
lines, but split it out into new handle_renames() and cleanup_renames()
functions to prepare for additional logic to be added to each.  No code or
logic changes, just a new place to put stuff for when the rename detection
gains additional checks.

Note that process_renames() records pointers to various information (such
as diff_filepairs) into rename_conflict_info structs.  Even though the
rename string_lists are not directly used once handle_renames() completes,
we should not immediately free the lists at the end of that function
because they store the information referenced in the rename_conflict_info,
which is used later in process_entry().  Thus the reason for a separate
cleanup_renames().
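
The lifetime issue described above can be sketched in isolation.  The
following is a simplified, standalone illustration and not the code from
this patch: plain heap-allocated string arrays stand in for git's
string_lists, the rename strings are made up, and all names ending in
_sketch (plus make_list and free_list) are hypothetical.  The point is only
that a pointer handed out by the "handle" step stays valid until the
separate "cleanup" step runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string_list of detected renames. */
struct rename_list_sketch {
	char **items;
	size_t nr;
};

struct rename_info_sketch {
	struct rename_list_sketch *head_renames;
	struct rename_list_sketch *merge_renames;
};

/* Allocate a one-element list that owns a copy of only_item. */
static struct rename_list_sketch *make_list(const char *only_item)
{
	struct rename_list_sketch *list = calloc(1, sizeof(*list));
	size_t len = strlen(only_item) + 1;

	if (!list)
		return NULL;
	list->items = calloc(1, sizeof(*list->items));
	if (list->items)
		list->items[0] = malloc(len);
	if (!list->items || !list->items[0]) {
		free(list->items);
		free(list);
		return NULL;
	}
	memcpy(list->items[0], only_item, len);
	list->nr = 1;
	return list;
}

static void free_list(struct rename_list_sketch *list)
{
	size_t i;

	if (!list)
		return;
	for (i = 0; i < list->nr; i++)
		free(list->items[i]);
	free(list->items);
	free(list);
}

/*
 * Build both lists and hand out a borrowed pointer into one of them,
 * much like conflict info that keeps pointing at rename data.  The lists
 * must therefore outlive this function.
 */
static int handle_renames_sketch(struct rename_info_sketch *ri,
				 const char **borrowed)
{
	assert(ri && borrowed);
	ri->head_renames = make_list("z/b -> y/b");
	ri->merge_renames = make_list("z/c -> x/c");
	if (!ri->head_renames || !ri->merge_renames)
		return -1;
	*borrowed = ri->head_renames->items[0];
	return 0;
}

/* Only call this once nothing uses the borrowed pointers any more. */
static void cleanup_renames_sketch(struct rename_info_sketch *ri)
{
	free_list(ri->head_renames);
	free_list(ri->merge_renames);
	ri->head_renames = NULL;
	ri->merge_renames = NULL;
}
```

A caller would run handle_renames_sketch(), use the borrowed string while
processing entries, and only then call cleanup_renames_sketch(); freeing
inside the handle step would leave the borrowed pointer dangling.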

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 43 +--
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 2028dd113b..eac3041261 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1645,6 +1645,32 @@ static int process_renames(struct merge_options *o,
return clean_merge;
 }
 
+struct rename_info {
+   struct string_list *head_renames;
+   struct string_list *merge_renames;
+};
+
+static int handle_renames(struct merge_options *o,
+ struct tree *common,
+ struct tree *head,
+ struct tree *merge,
+ struct string_list *entries,
+ struct rename_info *ri)
+{
+   ri->head_renames  = get_renames(o, head, common, head, merge, entries);
+   ri->merge_renames = get_renames(o, merge, common, head, merge, entries);
+   return process_renames(o, ri->head_renames, ri->merge_renames);
+}
+
+static void cleanup_renames(struct rename_info *re_info)
+{
+   string_list_clear(re_info->head_renames, 0);
+   string_list_clear(re_info->merge_renames, 0);
+
+   free(re_info->head_renames);
+   free(re_info->merge_renames);
+}
+
 static struct object_id *stage_oid(const struct object_id *oid, unsigned mode)
 {
return (is_null_oid(oid) || mode == 0) ? NULL: (struct object_id *)oid;
@@ -2004,7 +2030,8 @@ int merge_trees(struct merge_options *o,
}
 
if (unmerged_cache()) {
-   struct string_list *entries, *re_head, *re_merge;
+   struct string_list *entries;
+   struct rename_info re_info;
int i;
/*
 * Only need the hashmap while processing entries, so
@@ -2018,9 +2045,8 @@ int merge_trees(struct merge_options *o,
get_files_dirs(o, merge);
 
entries = get_unmerged();
-   re_head  = get_renames(o, head, common, head, merge, entries);
-   re_merge = get_renames(o, merge, common, head, merge, entries);
-   clean = process_renames(o, re_head, re_merge);
+   clean = handle_renames(o, common, head, merge, entries,
+  &re_info);
record_df_conflict_files(o, entries);
if (clean < 0)
goto cleanup;
@@ -2045,16 +2071,13 @@ int merge_trees(struct merge_options *o,
}
 
 cleanup:
-   string_list_clear(re_merge, 0);
-   string_list_clear(re_head, 0);
+   cleanup_renames(&re_info);
+
string_list_clear(entries, 1);
+   free(entries);
 
hashmap_free(&o->current_file_dir_set, 1);
 
-   free(re_merge);
-   free(re_head);
-   free(entries);
-
if (clean < 0)
return clean;
}
-- 
2.16.1.106.gf69932adfe

[PATCH v7 02/31] directory rename detection: directory splitting testcases

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 143 
 1 file changed, 143 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index d045f0e31e..b22a9052b3 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -439,4 +439,147 @@ test_expect_failure '1f-check: Split a directory into two other directories' '
 #   in section 2, plus testcases 3a and 4a.
 ###
 
+
+###
+# SECTION 2: Split into multiple directories, with equal number of paths
+#
+# Explore the splitting-a-directory rules a bit; what happens in the
+# edge cases?
+#
+# Note that there is a closely related case of a directory not being
+# split on either side of history, but being renamed differently on
+# each side.  See testcase 8e for that.
+###
+
+# Testcase 2a, Directory split into two on one side, with equal numbers of paths
+#   Commit O: z/{b,c}
+#   Commit A: y/b, w/c
+#   Commit B: z/{b,c,d}
+#   Expected: y/b, w/c, z/d, with warning about z/ -> (y/ vs. w/) conflict
+test_expect_success '2a-setup: Directory split into two on one side, with equal numbers of paths' '
+   test_create_repo 2a &&
+   (
+   cd 2a &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   mkdir y &&
+   mkdir w &&
+   git mv z/b y/ &&
+   git mv z/c w/ &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   echo d >z/d &&
+   git add z/d &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '2a-check: Directory split into two on one side, with equal numbers of paths' '
+   (
+   cd 2a &&
+
+   git checkout A^0 &&
+
+   test_must_fail git merge -s recursive B^0 >out &&
+   test_i18ngrep "CONFLICT.*directory rename split" out &&
+
+   git ls-files -s >out &&
+   test_line_count = 3 out &&
+   git ls-files -u >out &&
+   test_line_count = 0 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   :0:y/b :0:w/c :0:z/d &&
+   git rev-parse >expect \
+O:z/b  O:z/c  B:z/d &&
+   test_cmp expect actual
+   )
+'
+
+# Testcase 2b, Directory split into two on one side, with equal numbers of paths
+#   Commit O: z/{b,c}
+#   Commit A: y/b, w/c
+#   Commit B: z/{b,c}, x/d
+#   Expected: y/b, w/c, x/d; No warning about z/ -> (y/ vs. w/) conflict
+test_expect_success '2b-setup: Directory split into two on one side, with equal numbers of paths' '
+   test_create_repo 2b &&
+   (
+   cd 2b &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   mkdir y &&
+   mkdir w &&
+   git mv z/b y/ &&
+   git mv z/c w/ &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   mkdir x &&
+   echo d >x/d &&
+   git add x/d &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_success '2b-check: Directory split into two on one side, with equal numbers of paths' '
+   (
+   cd 2b &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 >out &&
+
+   git ls-files -s >out &&
+   test_line_count = 3 out &&
+   git ls-files -u >out &&
+   test_line_count = 0 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   :0:y/b :0:w/c :0:x/d &&
+   git rev-parse >expect \
+O:z/b  O:z/c  B:x/d &&
+   test_cmp expect actual &&
+   test_i18ngrep ! "CONFLICT.*directory rename split" out
+   )
+'
+
+###
+# Rules suggested by section 2:
+#
+#

[PATCH v7 04/31] directory rename detection: partially renamed directory testcase/discussion

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 107 
 1 file changed, 107 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index 8049ed5fc9..f0213f2bbd 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -735,4 +735,111 @@ test_expect_success '3b-check: Avoid implicit rename if involved as source on cu
 #   of a rename on either side of a merge.
 ###
 
+
+###
+# SECTION 4: Partially renamed directory; still exists on both sides of merge
+#
+# What if we were to attempt to do directory rename detection when someone
+# "mostly" moved a directory but still left some files around, or,
+# equivalently, fully renamed a directory in one commit and then recreated
+# that directory in a later commit adding some new files and then tried to
+# merge?
+#
+# It's hard to divine user intent in these cases, because you can argue
+# that, depending on the intermediate history of the side being merged,
+# some users will want files in that directory to automatically be
+# detected and renamed, while users with a different intermediate history
+# wouldn't want that rename to happen.
+#
+# I think that it is best to simply not have directory rename detection
+# apply to such cases.  My reasoning for this is four-fold: (1) it's
+# easiest for users in general to figure out what happened if we don't
+# apply directory rename detection in any such case, (2) it's an easy rule
+# to explain ["We don't do directory rename detection if the directory
+# still exists on both sides of the merge"], (3) we can get some hairy
+# edge/corner cases that would be really confusing and possibly not even
+# representable in the index if we were to even try, and [related to 3] (4)
+# attempting to resolve this issue of divining user intent by examining
+# intermediate history goes against the spirit of three-way merges and is a
+# path towards crazy corner cases that are far more complex than what we're
+# already dealing with.
+#
+# This section contains a test for this partially-renamed-directory case.
+###
+
+# Testcase 4a, Directory split, with original directory still present
+#   (Related to testcase 1f)
+#   Commit O: z/{b,c,d,e}
+#   Commit A: y/{b,c,d}, z/e
+#   Commit B: z/{b,c,d,e,f}
+#   Expected: y/{b,c,d}, z/{e,f}
+#   NOTE: Even though most files from z moved to y, we don't want f to follow.
+
+test_expect_success '4a-setup: Directory split, with original directory still present' '
+   test_create_repo 4a &&
+   (
+   cd 4a &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d >z/d &&
+   echo e >z/e &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   mkdir y &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   git mv z/d y/ &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   echo f >z/f &&
+   git add z/f &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_success '4a-check: Directory split, with original directory still present' '
+   (
+   cd 4a &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+   git ls-files -u >out &&
+   test_line_count = 0 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   HEAD:y/b HEAD:y/c HEAD:y/d HEAD:z/e HEAD:z/f &&
+   git rev-parse >expect \
+   O:z/b    O:z/c    O:z/d    O:z/e    B:z/f &&
+   test_cmp expect actual
+   )
+'
+
+###
+# Rules suggested by section 4:
+#
+#   Directory-rename-detection should be turned off for any directories (as
+#   a source for renames) that exist on both sides of the merge.  (The "as
+#   a source for renames" clarification is due to cases like 1c where
+#   the target directory exists on both sides and we do want the rename
+#   detection.)  But, sadly, see testcase 8b.
+###
+
 test_done
-- 
2.16.1.106.gf69932adfe
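
As a rough illustration of the rule suggested by section 4, here is a standalone sketch. It is not code from this series: dir_exists_in() and may_use_as_rename_source() are hypothetical names, and each side of the merge is modeled as a flat array of paths rather than a tree object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if any path in paths[0..n) lives under directory dir. */
static int dir_exists_in(const char *const *paths, size_t n, const char *dir)
{
	size_t len = strlen(dir);
	size_t i;

	assert(len > 0);
	for (i = 0; i < n; i++)
		if (!strncmp(paths[i], dir, len) && paths[i][len] == '/')
			return 1;
	return 0;
}

/*
 * Hypothetical helper: dir may serve as a source for directory rename
 * detection only if it is gone from at least one side of the merge.
 */
static int may_use_as_rename_source(const char *const *side_a, size_t na,
				    const char *const *side_b, size_t nb,
				    const char *dir)
{
	return !(dir_exists_in(side_a, na, dir) &&
		 dir_exists_in(side_b, nb, dir));
}
```

With the paths from testcase 4a (A: y/{b,c,d}, z/e; B: z/{b,c,d,e,f}), z exists on both sides, so the helper returns 0 and z/f stays where it is.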

[PATCH v7 19/31] merge-recursive: add get_directory_renames()

2018-01-30 Thread Elijah Newren

This populates a list of directory renames for us.  The list of
directory renames is not yet used, but will be in subsequent commits.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 155 --
 1 file changed, 152 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 40ed8e1f39..c75d3a5139 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1393,6 +1393,138 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *o,
return ret;
 }
 
+static void get_renamed_dir_portion(const char *old_path, const char *new_path,
+   char **old_dir, char **new_dir)
+{
+   char *end_of_old, *end_of_new;
+   int old_len, new_len;
+
+   *old_dir = NULL;
+   *new_dir = NULL;
+
+   /* For
+*"a/b/c/d/foo.c" -> "a/b/something-else/d/foo.c"
+* the "d/foo.c" part is the same, we just want to know that
+*"a/b/c" was renamed to "a/b/something-else"
+* so, for this example, this function returns "a/b/c" in
+* *old_dir and "a/b/something-else" in *new_dir.
+*
+* Also, if the basename of the file changed, we don't care.  We
+* want to know which portion of the directory, if any, changed.
+*/
+   end_of_old = strrchr(old_path, '/');
+   end_of_new = strrchr(new_path, '/');
+
+   if (end_of_old == NULL || end_of_new == NULL)
+   return;
+   while (*--end_of_new == *--end_of_old &&
+  end_of_old != old_path &&
+  end_of_new != new_path)
+   ; /* Do nothing; all in the while loop */
+   /*
+* We've found the first non-matching character in the directory
+* paths.  That means the current directory we were comparing
+* represents the rename.  Move end_of_old and end_of_new back
+* to the full directory name.
+*/
+   if (*end_of_old == '/')
+   end_of_old++;
+   if (*end_of_old != '/')
+   end_of_new++;
+   end_of_old = strchr(end_of_old, '/');
+   end_of_new = strchr(end_of_new, '/');
+
+   /*
+* It may have been the case that old_path and new_path were the same
+* directory all along.  Don't claim a rename if they're the same.
+*/
+   old_len = end_of_old - old_path;
+   new_len = end_of_new - new_path;
+
+   if (old_len != new_len || strncmp(old_path, new_path, old_len)) {
+   *old_dir = xstrndup(old_path, old_len);
+   *new_dir = xstrndup(new_path, new_len);
+   }
+}
+
+static struct hashmap *get_directory_renames(struct diff_queue_struct *pairs,
+struct tree *tree)
+{
+   struct hashmap *dir_renames;
+   struct hashmap_iter iter;
+   struct dir_rename_entry *entry;
+   int i;
+
+   dir_renames = malloc(sizeof(struct hashmap));
+   dir_rename_init(dir_renames);
+   for (i = 0; i < pairs->nr; ++i) {
+   struct string_list_item *item;
+   int *count;
+   struct diff_filepair *pair = pairs->queue[i];
+   char *old_dir, *new_dir;
+
+   /* File not part of directory rename if it wasn't renamed */
+   if (pair->status != 'R')
+   continue;
+
+   get_renamed_dir_portion(pair->one->path, pair->two->path,
+   &old_dir, &new_dir);
+   if (!old_dir)
+   /* Directory didn't change at all; ignore this one. */
+   continue;
+
+   entry = dir_rename_find_entry(dir_renames, old_dir);
+   if (!entry) {
+   entry = xmalloc(sizeof(struct dir_rename_entry));
+   dir_rename_entry_init(entry, old_dir);
+   hashmap_put(dir_renames, entry);
+   } else {
+   free(old_dir);
+   }
+   item = string_list_lookup(&entry->possible_new_dirs, new_dir);
+   if (!item) {
+   item = string_list_insert(&entry->possible_new_dirs,
+ new_dir);
+   item->util = xcalloc(1, sizeof(int));
+   } else {
+   free(new_dir);
+   }
+   count = item->util;
+   *count += 1;
+   }
+
+   hashmap_iter_init(dir_renames, &iter);
+   while ((entry = hashmap_iter_next(&iter))) {
+   int max = 0;
+   int bad_max = 0;
+   char *best = NULL;
+
+   for (i = 0; i < entry->possible_new_dirs.nr; i++) {
+   int *count = entry->possible_new_dirs.items[i].util;
+
+   if (*count == max)
+   bad_max = max;
+   else if (*count > max) {
+
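
The message is cut off in the middle of the loop that picks a destination for each renamed directory. The following is a minimal sketch of one way such a vote could work, assuming the winner must have a strictly higher count than every other candidate. It reuses the max/bad_max idea visible above, but struct candidate and pick_new_dir() are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct candidate {
	const char *new_dir; /* possible rename target */
	int count;           /* number of files renamed into it */
};

/*
 * Return the candidate with the strictly highest count, or NULL when
 * the top count is shared (or there are no candidates at all).
 */
static const char *pick_new_dir(const struct candidate *c, size_t n)
{
	int max = 0, bad_max = 0;
	const char *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(c[i].count > 0);
		if (c[i].count == max) {
			bad_max = max; /* tie at the current maximum */
		} else if (c[i].count > max) {
			max = c[i].count;
			best = c[i].new_dir;
		}
	}
	return bad_max == max ? NULL : best;
}
```

For testcase 2b (z/b moved to y/, z/c moved to w/), both candidates have a count of 1, so this sketch reports no winner, which matches the expectation that x/d is left alone.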

[PATCH v7 18/31] merge-recursive: make a helper function for cleanup for handle_renames

2018-01-30 Thread Elijah Newren

In anticipation of more involved cleanup to come, make a helper function
for doing the cleanup at the end of handle_renames.  Rename the already
existing cleanup_rename[s]() to final_cleanup_rename[s](), name the new
helper initial_cleanup_rename(), and leave the big comment in the code
about why we can't do all the cleanup at once.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 3b6d0e3f70..40ed8e1f39 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1704,6 +1704,12 @@ struct rename_info {
struct string_list *merge_renames;
 };
 
+static void initial_cleanup_rename(struct diff_queue_struct *pairs)
+{
+   free(pairs->queue);
+   free(pairs);
+}
+
 static int handle_renames(struct merge_options *o,
  struct tree *common,
  struct tree *head,
@@ -1734,16 +1740,13 @@ static int handle_renames(struct merge_options *o,
 * data structures are still needed and referenced in
 * process_entry().  But there are a few things we can free now.
 */
-
-   free(head_pairs->queue);
-   free(head_pairs);
-   free(merge_pairs->queue);
-   free(merge_pairs);
+   initial_cleanup_rename(head_pairs);
+   initial_cleanup_rename(merge_pairs);
 
return clean;
 }
 
-static void cleanup_rename(struct string_list *rename)
+static void final_cleanup_rename(struct string_list *rename)
 {
const struct rename *re;
int i;
@@ -1759,10 +1762,10 @@ static void cleanup_rename(struct string_list *rename)
free(rename);
 }
 
-static void cleanup_renames(struct rename_info *re_info)
+static void final_cleanup_renames(struct rename_info *re_info)
 {
-   cleanup_rename(re_info->head_renames);
-   cleanup_rename(re_info->merge_renames);
+   final_cleanup_rename(re_info->head_renames);
+   final_cleanup_rename(re_info->merge_renames);
 }
 
 static struct object_id *stage_oid(const struct object_id *oid, unsigned mode)
@@ -2165,7 +2168,7 @@ int merge_trees(struct merge_options *o,
}
 
 cleanup:
-   cleanup_renames(&re_info);
+   final_cleanup_renames(&re_info);
 
string_list_clear(entries, 1);
free(entries);
-- 
2.16.1.106.gf69932adfe

[PATCH v7 00/31] Add directory rename detection to git

2018-01-30 Thread Elijah Newren

This patchset introduces directory rename detection to merge-recursive.  See
  https://public-inbox.org/git/20171110190550.27059-1-new...@gmail.com/
for the first series (including design considerations, etc.), and follow-up
series can be found at
  https://public-inbox.org/git/20171120220209.15111-1-new...@gmail.com/
  https://public-inbox.org/git/20171121080059.32304-1-new...@gmail.com/
  https://public-inbox.org/git/20171129014237.32570-1-new...@gmail.com/
  https://public-inbox.org/git/20171228041352.27880-1-new...@gmail.com/
  https://public-inbox.org/git/20180105202711.24311-1-new...@gmail.com/

Changes since v6 (full tbdiff follows below):
  * Fix missing file argument in a testcase, pointed out by SZEDER Gábor.
  * Add whitespace in some testcases in separate git-rev-parse invocations
to make it easier to visually see what is being compared.
  * Rebased on latest origin/master (not that there were any conflicts)

Note: This series continues to depend upon en/merge-recursive-fixes, at
  least contextually.

Elijah Newren (31):
  directory rename detection: basic testcases
  directory rename detection: directory splitting testcases
  directory rename detection: testcases to avoid taking detection too
far
  directory rename detection: partially renamed directory
testcase/discussion
  directory rename detection: files/directories in the way of some
renames
  directory rename detection: testcases checking which side did the
rename
  directory rename detection: more involved edge/corner testcases
  directory rename detection: testcases exploring possibly suboptimal
merges
  directory rename detection: miscellaneous testcases to complete
coverage
  directory rename detection: tests for handling overwriting untracked
files
  directory rename detection: tests for handling overwriting dirty files
  merge-recursive: move the get_renames() function
  merge-recursive: introduce new functions to handle rename logic
  merge-recursive: fix leaks of allocated renames and diff_filepairs
  merge-recursive: make !o->detect_rename codepath more obvious
  merge-recursive: split out code for determining diff_filepairs
  merge-recursive: add a new hashmap for storing directory renames
  merge-recursive: make a helper function for cleanup for handle_renames
  merge-recursive: add get_directory_renames()
  merge-recursive: check for directory level conflicts
  merge-recursive: add a new hashmap for storing file collisions
  merge-recursive: add computation of collisions due to dir rename &
merging
  merge-recursive: check for file level conflicts then get new name
  merge-recursive: when comparing files, don't include trees
  merge-recursive: apply necessary modifications for directory renames
  merge-recursive: avoid clobbering untracked files with directory
renames
  merge-recursive: fix overwriting dirty files involved in renames
  merge-recursive: fix remaining directory rename + dirty overwrite
cases
  directory rename detection: new testcases showcasing a pair of bugs
  merge-recursive: avoid spurious rename/rename conflict from dir
renames
  merge-recursive: ensure we write updates for directory-renamed file

 merge-recursive.c   | 1212 ++-
 merge-recursive.h   |   17 +
 strbuf.c|   16 +
 strbuf.h|   16 +
 t/t3501-revert-cherry-pick.sh   |2 +-
 t/t6043-merge-rename-directories.sh | 3990 +++
 t/t7607-merge-overwrite.sh  |2 +-
 unpack-trees.c  |4 +-
 unpack-trees.h  |4 +
 9 files changed, 5148 insertions(+), 115 deletions(-)
 create mode 100755 t/t6043-merge-rename-directories.sh

 1: 2fd0812b7a !  1: 5ba69c9c7b directory rename detection: basic testcases
@@ -94,7 +94,7 @@
 +  git rev-parse >actual \
 +  HEAD:y/b HEAD:y/c HEAD:y/d HEAD:y/e/f &&
 +  git rev-parse >expect \
-+  O:z/b O:z/c B:z/d B:z/e/f &&
++  O:z/b    O:z/c    B:z/d    B:z/e/f &&
 +  test_cmp expect actual &&
 +
 +  git hash-object y/d >actual &&
@@ -161,7 +161,7 @@
 +  git rev-parse >actual \
 +  HEAD:y/b HEAD:y/c HEAD:y/d HEAD:y/e &&
 +  git rev-parse >expect \
-+  O:z/b O:z/c O:y/d A:z/e &&
++  O:z/b    O:z/c    O:y/d    A:z/e &&
 +  test_cmp expect actual &&
 +  test_must_fail git rev-parse HEAD:z/e
 +  )
@@ -219,7 +219,7 @@
 +  git rev-parse >actual \
 +  HEAD:y/b HEAD:y/c HEAD:y/d &&
 +  git rev-parse >expect \
-+  O:z/b O:z/c O:x/d &&
++  O:z/b    O:z/c    O:x/d &&
 +  test_cmp expect actual &&
 +  test_must_fail git rev-parse HEAD:x/d &&
 +

[PATCH v7 11/31] directory rename detection: tests for handling overwriting dirty files

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 458 
 1 file changed, 458 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index aa9af49edc..fbac664408 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -3238,4 +3238,462 @@ test_expect_failure '10e-check: Does git complain about untracked file that is n
)
 '
 
+###
+# SECTION 11: Handling dirty (not up-to-date) files
+#
+# unpack_trees(), upon which the recursive merge algorithm is based, aborts
+# the operation if untracked or dirty files would be deleted or overwritten
+# by the merge.  Unfortunately, unpack_trees() does not understand renames,
+# and if it doesn't abort, then it muddies up the working directory before
+# we even get to the point of detecting renames, so we need some special
+# handling.  This was true even of normal renames, but there are additional
+# codepaths that need special handling with directory renames.  Add
+# testcases for both renamed-by-directory-rename-detection and standard
+# rename cases.
+###
+
+# Testcase 11a, Avoid losing dirty contents with simple rename
+#   Commit O: z/{a,b_v1},
+#   Commit A: z/{a,c_v1}, and z/c_v1 has uncommitted mods
+#   Commit B: z/{a,b_v2}
+#   Expected: ERROR_MSG(Refusing to lose dirty file at z/c) +
+# z/a, staged version of z/c has sha1sum matching B:z/b_v2,
+# z/c~HEAD with contents of B:z/b_v2,
+# z/c with uncommitted mods on top of A:z/c_v1
+
+test_expect_success '11a-setup: Avoid losing dirty contents with simple rename' '
+   test_create_repo 11a &&
+   (
+   cd 11a &&
+
+   mkdir z &&
+   echo a >z/a &&
+   test_seq 1 10 >z/b &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git mv z/b z/c &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   echo 11 >>z/b &&
+   git add z/b &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '11a-check: Avoid losing dirty contents with simple rename' '
+   (
+   cd 11a &&
+
+   git checkout A^0 &&
+   echo stuff >>z/c &&
+
+   test_must_fail git merge -s recursive B^0 >out 2>err &&
+   test_i18ngrep "Refusing to lose dirty file at z/c" out &&
+
+   test_seq 1 10 >expected &&
+   echo stuff >>expected &&
+   test_cmp expected z/c &&
+
+   git ls-files -s >out &&
+   test_line_count = 2 out &&
+   git ls-files -u >out &&
+   test_line_count = 1 out &&
+   git ls-files -o >out &&
+   test_line_count = 4 out &&
+
+   git rev-parse >actual \
+   :0:z/a :2:z/c &&
+   git rev-parse >expect \
+O:z/a  B:z/b &&
+   test_cmp expect actual &&
+
+   git hash-object z/c~HEAD >actual &&
+   git rev-parse B:z/b >expect &&
+   test_cmp expect actual
+   )
+'
+
+# Testcase 11b, Avoid losing dirty file involved in directory rename
+#   Commit O: z/a, x/{b,c_v1}
+#   Commit A: z/{a,c_v1},  x/b,   and z/c_v1 has uncommitted mods
+#   Commit B: y/a, x/{b,c_v2}
+#   Expected: y/{a,c_v2}, x/b, z/c_v1 with uncommitted mods untracked,
+# ERROR_MSG(Refusing to lose dirty file at z/c)
+
+
+test_expect_success '11b-setup: Avoid losing dirty file involved in directory rename' '
+   test_create_repo 11b &&
+   (
+   cd 11b &&
+
+   mkdir z x &&
+   echo a >z/a &&
+   echo b >x/b &&
+   test_seq 1 10 >x/c &&
+   git add z x &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git mv x/c z/c &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   git mv z y &&
+   echo 11 >>x/c &&
+   git add x/c &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '11b-check: Avoid losing dirty file involved in directory rename' '
+   (
+   cd 11b &&
+
+   git checkout A^0 &&
+   echo stuff >>z/c &&
+
+   git

[PATCH v7 25/31] merge-recursive: apply necessary modifications for directory renames

2018-01-30 Thread Elijah Newren

This commit hooks together all the directory rename logic by making the
necessary changes to the rename struct, its dst_entry, and the
diff_filepair under consideration.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 187 +++-
 t/t6043-merge-rename-directories.sh |  50 +-
 2 files changed, 211 insertions(+), 26 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 38dc0eefaf..7c78dc2dc1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -177,6 +177,7 @@ static int oid_eq(const struct object_id *a, const struct object_id *b)
 
 enum rename_type {
RENAME_NORMAL = 0,
+   RENAME_DIR,
RENAME_DELETE,
RENAME_ONE_FILE_TO_ONE,
RENAME_ONE_FILE_TO_TWO,
@@ -607,6 +608,7 @@ struct rename {
 */
struct stage_data *src_entry;
struct stage_data *dst_entry;
+   unsigned add_turned_into_rename:1;
unsigned processed:1;
 };
 
@@ -641,6 +643,27 @@ static int update_stages(struct merge_options *opt, const char *path,
return 0;
 }
 
+static int update_stages_for_stage_data(struct merge_options *opt,
+   const char *path,
+   const struct stage_data *stage_data)
+{
+   struct diff_filespec o, a, b;
+
+   o.mode = stage_data->stages[1].mode;
+   oidcpy(&o.oid, &stage_data->stages[1].oid);
+
+   a.mode = stage_data->stages[2].mode;
+   oidcpy(&a.oid, &stage_data->stages[2].oid);
+
+   b.mode = stage_data->stages[3].mode;
+   oidcpy(&b.oid, &stage_data->stages[3].oid);
+
+   return update_stages(opt, path,
+is_null_sha1(o.oid.hash) ? NULL : &o,
+is_null_sha1(a.oid.hash) ? NULL : &a,
+is_null_sha1(b.oid.hash) ? NULL : &b);
+}
+
 static void update_entry(struct stage_data *entry,
 struct diff_filespec *o,
 struct diff_filespec *a,
@@ -1117,6 +1140,18 @@ static int merge_file_one(struct merge_options *o,
return merge_file_1(o, , , , branch1, branch2, mfi);
 }
 
+static int conflict_rename_dir(struct merge_options *o,
+  struct diff_filepair *pair,
+  const char *rename_branch,
+  const char *other_branch)
+{
+   const struct diff_filespec *dest = pair->two;
+
+   if (update_file(o, 1, &dest->oid, dest->mode, dest->path))
+   return -1;
+   return 0;
+}
+
 static int handle_change_delete(struct merge_options *o,
 const char *path, const char *old_path,
 const struct object_id *o_oid, int o_mode,
@@ -1386,6 +1421,24 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
if (!ret)
ret = update_file(o, 0, &mfi_c2.oid, mfi_c2.mode,
  new_path2);
+   /*
+* unpack_trees() actually populates the index for us for
+* "normal" rename/rename(2to1) situations so that the
+* correct entries are at the higher stages, which would
+* make the call below to update_stages_for_stage_data
+* unnecessary.  However, if either of the renames came
+* from a directory rename, then unpack_trees() will not
+* have gotten the right data loaded into the index, so we
+* need to do so now.  (While it'd be tempting to move this
+* call to update_stages_for_stage_data() to
+* apply_directory_rename_modifications(), that would break
+* our intermediate calls to would_lose_untracked() since
+* those rely on the current in-memory index.  See also the
+* big "NOTE" in update_stages()).
+*/
+   if (update_stages_for_stage_data(o, path, ci->dst_entry1))
+   ret = -1;
+
free(new_path2);
free(new_path1);
}
@@ -1919,6 +1972,111 @@ static char *check_for_directory_rename(struct merge_options *o,
return new_path;
 }
 
+static void apply_directory_rename_modifications(struct merge_options *o,
+struct diff_filepair *pair,
+char *new_path,
+struct rename *re,
+struct tree *tree,
+struct tree *o_tree,
+struct tree *a_tree,
+struct tree *b_tree,
+struct string_list *entries,
+int *clean)
+{
+   struct

[PATCH v7 31/31] merge-recursive: ensure we write updates for directory-renamed file

2018-01-30 Thread Elijah Newren

When a file is present in HEAD before the merge and the other side of the
merge does not modify that file, we try to avoid re-writing the file and
making it stat-dirty.  However, when a file is present in HEAD before the
merge and was in a directory that was renamed by the other side of the
merge, we have to move the file to a new location and re-write it.
Update the code that checks whether we can skip the update to also work in
the presence of directory renames.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 4 +---
 t/t6043-merge-rename-directories.sh | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 97859e1ab7..1e40aff51d 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2741,7 +2741,6 @@ static int merge_content(struct merge_options *o,
 
if (mfi.clean && !df_conflict_remains &&
oid_eq(&mfi.oid, a_oid) && mfi.mode == a_mode) {
-   int path_renamed_outside_HEAD;
output(o, 3, _("Skipped %s (merged same as existing)"), path);
/*
 * The content merge resulted in the same file contents we
@@ -2749,8 +2748,7 @@ static int merge_content(struct merge_options *o,
 * are recorded at the correct path (which may not be true
 * if the merge involves a rename).
 */
-   path_renamed_outside_HEAD = !path2 || !strcmp(path, path2);
-   if (!path_renamed_outside_HEAD) {
+   if (was_tracked(path)) {
add_cacheinfo(o, mfi.mode, &mfi.oid, path,
  0, (!o->call_depth), 0);
return mfi.clean;
diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index f349f69984..500fdd7755 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -3876,7 +3876,7 @@ test_expect_success '12b-setup: Moving one directory hierarchy into another' '
)
 '
 
-test_expect_failure '12b-check: Moving one directory hierarchy into another' '
+test_expect_success '12b-check: Moving one directory hierarchy into another' '
(
cd 12b &&
 
-- 
2.16.1.106.gf69932adfe

[PATCH v7 22/31] merge-recursive: add computation of collisions due to dir rename & merging

2018-01-30 Thread Elijah Newren

Directory renaming and merging can cause one or more files to be moved to
where an existing file is, or cause several files to all be moved to
the same (otherwise vacant) location.  Add checking and reporting for such
cases, falling back to no-directory-rename handling for such paths.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 123 --
 1 file changed, 120 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index ac968ad2ae..dc03b1bb54 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1425,6 +1425,31 @@ static int tree_has_path(struct tree *tree, const char *path)
   hashy, _o);
 }
 
+/*
+ * Return a new string that replaces the beginning portion (which matches
+ * entry->dir), with entry->new_dir.  In perl-speak:
+ *   new_path_name = (old_path =~ s/entry->dir/entry->new_dir/);
+ * NOTE:
+ *   Caller must ensure that old_path starts with entry->dir + '/'.
+ */
+static char *apply_dir_rename(struct dir_rename_entry *entry,
+ const char *old_path)
+{
+   struct strbuf new_path = STRBUF_INIT;
+   int oldlen, newlen;
+
+   if (entry->non_unique_new_dir)
+   return NULL;
+
+   oldlen = strlen(entry->dir);
+   newlen = entry->new_dir.len + (strlen(old_path) - oldlen) + 1;
+   strbuf_grow(&new_path, newlen);
+   strbuf_addbuf(&new_path, &entry->new_dir);
+   strbuf_addstr(&new_path, &old_path[oldlen]);
+
+   return strbuf_detach(&new_path, NULL);
+}
+
 static void get_renamed_dir_portion(const char *old_path, const char *new_path,
char **old_dir, char **new_dir)
 {
@@ -1663,6 +1688,84 @@ static struct hashmap *get_directory_renames(struct diff_queue_struct *pairs,
return dir_renames;
 }
 
+static struct dir_rename_entry *check_dir_renamed(const char *path,
+ struct hashmap *dir_renames)
+{
+   char temp[PATH_MAX];
+   char *end;
+   struct dir_rename_entry *entry;
+
+   strcpy(temp, path);
+   while ((end = strrchr(temp, '/'))) {
+   *end = '\0';
+   entry = dir_rename_find_entry(dir_renames, temp);
+   if (entry)
+   return entry;
+   }
+   return NULL;
+}
+
+static void compute_collisions(struct hashmap *collisions,
+  struct hashmap *dir_renames,
+  struct diff_queue_struct *pairs)
+{
+   int i;
+
+   /*
+* Multiple files can be mapped to the same path due to directory
+* renames done by the other side of history.  Since that other
+* side of history could have merged multiple directories into one,
+* if our side of history added the same file basename to each of
+* those directories, then all N of them would get implicitly
+* renamed by the directory rename detection into the same path,
+* and we'd get an add/add/.../add conflict, and all those adds
+* from *this* side of history.  This is not representable in the
+* index, and users aren't going to easily be able to make sense of
+* it.  So we need to provide a good warning about what's
+* happening, and fall back to no-directory-rename detection
+* behavior for those paths.
+*
+* See testcases 9e and all of section 5 from t6043 for examples.
+*/
+   collision_init(collisions);
+
+   for (i = 0; i < pairs->nr; ++i) {
+   struct dir_rename_entry *dir_rename_ent;
+   struct collision_entry *collision_ent;
+   char *new_path;
+   struct diff_filepair *pair = pairs->queue[i];
+
+   if (pair->status == 'D')
+   continue;
+   dir_rename_ent = check_dir_renamed(pair->two->path,
+  dir_renames);
+   if (!dir_rename_ent)
+   continue;
+
+   new_path = apply_dir_rename(dir_rename_ent, pair->two->path);
+   if (!new_path)
+   /*
+* dir_rename_ent->non_unique_new_dir is true, which
+* means there is no directory rename for us to use,
+* which means it won't cause us any additional
+* collisions.
+*/
+   continue;
+   collision_ent = collision_find_entry(collisions, new_path);
+   if (!collision_ent) {
+   collision_ent = xcalloc(1,
+   sizeof(struct collision_entry));
+   hashmap_entry_init(collision_ent, strhash(new_path));
+   hashmap_put(collisions, collision_ent);
+   collision_ent->target_file = new_path;
+
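
To make the prefix lookup and substitution above easier to try in isolation, here is a standalone sketch under simplifying assumptions: the directory renames live in a plain array instead of a hashmap, and the scratch copy of the path is heap-allocated rather than a fixed PATH_MAX buffer. struct dir_rename, find_dir_rename() and apply_rename() are hypothetical names, not the functions from this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct dir_rename {
	const char *dir;     /* old directory, no trailing slash */
	const char *new_dir; /* new directory, no trailing slash */
};

/* Find the rename whose old directory is the deepest ancestor of path. */
static const struct dir_rename *find_dir_rename(const struct dir_rename *r,
						size_t n, const char *path)
{
	char *temp = malloc(strlen(path) + 1);
	char *end;
	const struct dir_rename *found = NULL;
	size_t i;

	if (!temp)
		return NULL;
	strcpy(temp, path);
	while (!found && (end = strrchr(temp, '/'))) {
		*end = '\0'; /* strip one trailing path component */
		for (i = 0; i < n; i++) {
			if (!strcmp(r[i].dir, temp)) {
				found = &r[i];
				break;
			}
		}
	}
	free(temp);
	return found;
}

/* Replace the leading r->dir of old_path with r->new_dir; caller frees. */
static char *apply_rename(const struct dir_rename *r, const char *old_path)
{
	size_t oldlen = strlen(r->dir);
	size_t len;
	char *out;

	/* Same precondition as the NOTE above: old_path starts with dir + '/' */
	assert(!strncmp(old_path, r->dir, oldlen) && old_path[oldlen] == '/');
	len = strlen(r->new_dir) + strlen(old_path + oldlen) + 1;
	out = malloc(len);
	if (out)
		snprintf(out, len, "%s%s", r->new_dir, old_path + oldlen);
	return out;
}
```

Given a rename of z to y, looking up "z/sub/file.c" walks "z/sub" and then "z", finds the entry, and apply_rename() yields "y/sub/file.c". Two different source paths that map to the same result are what compute_collisions() is meant to catch.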

[PATCH v7 30/31] merge-recursive: avoid spurious rename/rename conflict from dir renames

2018-01-30 Thread Elijah Newren

If a file on one side of history was renamed, and merely modified on the
other side, then applying a directory rename to the modified side gives us
a rename/rename(1to2) conflict.  We should only apply directory renames to
pairs representing either adds or renames.

Making this change means that a directory rename testcase that was
previously reported as a rename/delete conflict will now be reported as a
modify/delete conflict.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   |  4 +--
 t/t6043-merge-rename-directories.sh | 55 +
 2 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 62e4266d21..97859e1ab7 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1960,7 +1960,7 @@ static void compute_collisions(struct hashmap *collisions,
char *new_path;
struct diff_filepair *pair = pairs->queue[i];
 
-   if (pair->status == 'D')
+   if (pair->status != 'A' && pair->status != 'R')
continue;
dir_rename_ent = check_dir_renamed(pair->two->path,
   dir_renames);
@@ -2187,7 +2187,7 @@ static struct string_list *get_renames(struct merge_options *o,
struct diff_filepair *pair = pairs->queue[i];
char *new_path; /* non-NULL only with directory renames */
 
-   if (pair->status == 'D') {
+   if (pair->status != 'A' && pair->status != 'R') {
diff_free_filepair(pair);
continue;
}
diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index 3d292f0c5f..f349f69984 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -2070,18 +2070,23 @@ test_expect_success '8b-check: Dual-directory rename, one into the others way, w
)
 '
 
-# Testcase 8c, rename+modify/delete
-#   (Related to testcases 5b and 8d)
+# Testcase 8c, modify/delete or rename+modify/delete?
+#   (Related to testcases 5b, 8d, and 9h)
 #   Commit O: z/{b,c,d}
 #   Commit A: y/{b,c}
 #   Commit B: z/{b,c,d_modified,e}
-#   Expected: y/{b,c,e}, CONFLICT(rename+modify/delete: x/d -> y/d or deleted)
+#   Expected: y/{b,c,e}, CONFLICT(modify/delete: on z/d)
 #
-#   Note: This testcase doesn't present any concerns for me...until you
-# compare it with testcases 5b and 8d.  See notes in 8d for more
-# details.
-
-test_expect_success '8c-setup: rename+modify/delete' '
+#   Note: It could easily be argued that the correct resolution here is
+# y/{b,c,e}, CONFLICT(rename/delete: z/d -> y/d vs deleted)
+# and that the modified version of d should be present in y/ after
+# the merge, just marked as conflicted.  Indeed, I previously did
+# argue that.  But applying directory renames to the side of
+# history where a file is merely modified results in spurious
+# rename/rename(1to2) conflicts -- see testcase 9h.  See also
+# notes in 8d.
+
+test_expect_success '8c-setup: modify/delete or rename+modify/delete?' '
test_create_repo 8c &&
(
cd 8c &&
@@ -2114,32 +2119,32 @@ test_expect_success '8c-setup: rename+modify/delete' '
)
 '
 
-test_expect_success '8c-check: rename+modify/delete' '
+test_expect_success '8c-check: modify/delete or rename+modify/delete' '
(
cd 8c &&
 
git checkout A^0 &&
 
test_must_fail git merge -s recursive B^0 >out &&
-   test_i18ngrep "CONFLICT (rename/delete).* z/d.*y/d" out &&
+   test_i18ngrep "CONFLICT (modify/delete).* z/d" out &&
 
git ls-files -s >out &&
-   test_line_count = 4 out &&
+   test_line_count = 5 out &&
git ls-files -u >out &&
-   test_line_count = 1 out &&
+   test_line_count = 2 out &&
git ls-files -o >out &&
test_line_count = 1 out &&
 
git rev-parse >actual \
-   :0:y/b :0:y/c :0:y/e :3:y/d &&
+   :0:y/b :0:y/c :0:y/e :1:z/d :3:z/d &&
git rev-parse >expect \
-O:z/b  O:z/c  B:z/e  B:z/d &&
+O:z/b  O:z/c  B:z/e  O:z/d  B:z/d &&
test_cmp expect actual &&
 
-   test_must_fail git rev-parse :1:y/d &&
-   test_must_fail git rev-parse :2:y/d &&
-   git ls-files -s y/d | grep ^100755 &&
-   test_path_is_file y/d
+   test_must_fail git rev-parse :2:z/d &&
+   git ls-files -s z/d | grep ^100755 &&
+   test_path_is_file z/d &&
+   test_path_is_missing y/d
)
 '
 
@@ -2153,16 +2158,6 @@ test_expect_success '8c-check: rename+modify/delete' '

[PATCH v7 23/31] merge-recursive: check for file level conflicts then get new name

2018-01-30 Thread Elijah Newren

Before trying to apply directory renames to paths within the given
directories, we want to make sure that there aren't conflicts at the
file level either.  If there aren't any, then get the new name from
any directory renames.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 174 ++--
 strbuf.c|  16 
 strbuf.h|  16 
 t/t6043-merge-rename-directories.sh |   2 +-
 4 files changed, 199 insertions(+), 9 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index dc03b1bb54..354d91d2a8 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1517,6 +1517,91 @@ static void remove_hashmap_entries(struct hashmap *dir_renames,
string_list_clear(items_to_remove, 0);
 }
 
+/*
+ * See if there is a directory rename for path, and if there are any file
+ * level conflicts for the renamed location.  If there is a rename and
+ * there are no conflicts, return the new name.  Otherwise, return NULL.
+ */
+static char *handle_path_level_conflicts(struct merge_options *o,
+const char *path,
+struct dir_rename_entry *entry,
+struct hashmap *collisions,
+struct tree *tree)
+{
+   char *new_path = NULL;
+   struct collision_entry *collision_ent;
+   int clean = 1;
+   struct strbuf collision_paths = STRBUF_INIT;
+
+   /*
+* entry has the mapping of old directory name to new directory name
+* that we want to apply to path.
+*/
+   new_path = apply_dir_rename(entry, path);
+
+   if (!new_path) {
+   /* This should only happen when entry->non_unique_new_dir is set */
+   if (!entry->non_unique_new_dir)
+   BUG("entry->non_unique_new_dir not set and !new_path");
+   output(o, 1, _("CONFLICT (directory rename split): "
+  "Unclear where to place %s because directory "
+  "%s was renamed to multiple other directories, "
+  "with no destination getting a majority of the "
+  "files."),
+  path, entry->dir);
+   clean = 0;
+   return NULL;
+   }
+
+   /*
+* The caller needs to have ensured that it has pre-populated
+* collisions with all paths that map to new_path.  Do a quick check
+* to ensure that's the case.
+*/
+   collision_ent = collision_find_entry(collisions, new_path);
+   if (collision_ent == NULL)
+   BUG("collision_ent is NULL");
+
+   /*
+* Check for one-sided add/add/.../add conflicts, i.e.
+* where implicit renames from the other side doing
+* directory rename(s) can affect this side of history
+* to put multiple paths into the same location.  Warn
+* and bail on directory renames for such paths.
+*/
+   if (collision_ent->reported_already) {
+   clean = 0;
+   } else if (tree_has_path(tree, new_path)) {
+   collision_ent->reported_already = 1;
+   strbuf_add_separated_string_list(&collision_paths, ", ",
+&collision_ent->source_files);
+   output(o, 1, _("CONFLICT (implicit dir rename): Existing "
+  "file/dir at %s in the way of implicit "
+  "directory rename(s) putting the following "
+  "path(s) there: %s."),
+  new_path, collision_paths.buf);
+   clean = 0;
+   } else if (collision_ent->source_files.nr > 1) {
+   collision_ent->reported_already = 1;
+   strbuf_add_separated_string_list(&collision_paths, ", ",
+&collision_ent->source_files);
+   output(o, 1, _("CONFLICT (implicit dir rename): Cannot map "
+  "more than one path to %s; implicit directory "
+  "renames tried to put these paths there: %s"),
+  new_path, collision_paths.buf);
+   clean = 0;
+   }
+
+   /* Free memory we no longer need */
+   strbuf_release(&collision_paths);
+   if (!clean && new_path) {
+   free(new_path);
+   return NULL;
+   }
+
+   return new_path;
+}
+
 /*
  * There are a couple things we want to do at the directory level:
  *   1. Check for both sides renaming to the same thing, in order to avoid
@@ -1766,6 +1851,59 @@ static void compute_collisions(struct hashmap *collisions,
}
 }
 
+static char *check_for_directory_rename(struct merge_options *o,
+   const char *path,
+   struct tree

[PATCH v7 16/31] merge-recursive: split out code for determining diff_filepairs

2018-01-30 Thread Elijah Newren

Create a new function, get_diffpairs(), to compute the diff_filepairs
between two trees.  While these are currently only used in
get_renames(), I want them to be available to some new functions.  No
actual logic changes yet.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 86 +--
 1 file changed, 64 insertions(+), 22 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 4e6d0c248e..8ac69e1cbb 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1321,24 +1321,15 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
 }
 
 /*
- * Get information of all renames which occurred between 'o_tree' and
- * 'tree'. We need the three trees in the merge ('o_tree', 'a_tree' and
- * 'b_tree') to be able to associate the correct cache entries with
- * the rename information. 'tree' is always equal to either a_tree or b_tree.
+ * Get the diff_filepairs changed between o_tree and tree.
  */
-static struct string_list *get_renames(struct merge_options *o,
-  struct tree *tree,
-  struct tree *o_tree,
-  struct tree *a_tree,
-  struct tree *b_tree,
-  struct string_list *entries)
+static struct diff_queue_struct *get_diffpairs(struct merge_options *o,
+  struct tree *o_tree,
+  struct tree *tree)
 {
-   int i;
-   struct string_list *renames;
+   struct diff_queue_struct *ret;
struct diff_options opts;
 
-   renames = xcalloc(1, sizeof(struct string_list));
-
diff_setup(&opts);
opts.flags.recursive = 1;
opts.flags.rename_empty = 0;
@@ -1354,10 +1345,43 @@ static struct string_list *get_renames(struct merge_options *o,
diffcore_std(&opts);
if (opts.needed_rename_limit > o->needed_rename_limit)
o->needed_rename_limit = opts.needed_rename_limit;
-   for (i = 0; i < diff_queued_diff.nr; ++i) {
+
+   ret = malloc(sizeof(struct diff_queue_struct));
+   ret->queue = diff_queued_diff.queue;
+   ret->nr = diff_queued_diff.nr;
+   /* Ignore diff_queued_diff.alloc; we won't be changing size at all */
+
+   opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+   diff_queued_diff.nr = 0;
+   diff_queued_diff.queue = NULL;
+   diff_flush(&opts);
+   return ret;
+}
+
+/*
+ * Get information of all renames which occurred in 'pairs', making use of
+ * any implicit directory renames inferred from the other side of history.
+ * We need the three trees in the merge ('o_tree', 'a_tree' and 'b_tree')
+ * to be able to associate the correct cache entries with the rename
+ * information; tree is always equal to either a_tree or b_tree.
+ */
+static struct string_list *get_renames(struct merge_options *o,
+  struct diff_queue_struct *pairs,
+  struct tree *tree,
+  struct tree *o_tree,
+  struct tree *a_tree,
+  struct tree *b_tree,
+  struct string_list *entries)
+{
+   int i;
+   struct string_list *renames;
+
+   renames = xcalloc(1, sizeof(struct string_list));
+
+   for (i = 0; i < pairs->nr; ++i) {
struct string_list_item *item;
struct rename *re;
-   struct diff_filepair *pair = diff_queued_diff.queue[i];
+   struct diff_filepair *pair = pairs->queue[i];
 
if (pair->status != 'R') {
diff_free_filepair(pair);
@@ -1382,9 +1406,6 @@ static struct string_list *get_renames(struct merge_options *o,
item = string_list_insert(renames, pair->one->path);
item->util = re;
}
-   opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-   diff_queued_diff.nr = 0;
-   diff_flush(&opts);
return renames;
 }
 
@@ -1655,15 +1676,36 @@ static int handle_renames(struct merge_options *o,
  struct string_list *entries,
  struct rename_info *ri)
 {
+   struct diff_queue_struct *head_pairs, *merge_pairs;
+   int clean;
+
ri->head_renames = NULL;
ri->merge_renames = NULL;
 
if (!o->detect_rename)
return 1;
 
-   ri->head_renames  = get_renames(o, head, common, head, merge, entries);
-   ri->merge_renames = get_renames(o, merge, common, head, merge, entries);
-   return process_renames(o, ri->head_renames, ri->merge_renames);
+   head_pairs = get_diffpairs(o, common, head);
+   merge_pairs = get_diffpairs(o, common, merge);
+
+   ri->head_renames  = get_renames(o, head_pairs, head,
+

[PATCH v7 20/31] merge-recursive: check for directory level conflicts

2018-01-30 Thread Elijah Newren

Before trying to apply directory renames to paths within the given
directories, we want to make sure that there aren't conflicts at the
directory level.  There will be additional checks at the individual
file level too, which will be added later.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 119 ++
 1 file changed, 119 insertions(+)

diff --git a/merge-recursive.c b/merge-recursive.c
index c75d3a5139..9e9ad45d2a 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1393,6 +1393,15 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *o,
return ret;
 }
 
+static int tree_has_path(struct tree *tree, const char *path)
+{
+   unsigned char hashy[20];
+   unsigned int mode_o;
+
+   return !get_tree_entry(tree->object.oid.hash, path,
+  hashy, &mode_o);
+}
+
 static void get_renamed_dir_portion(const char *old_path, const char *new_path,
char **old_dir, char **new_dir)
 {
@@ -1447,6 +1456,112 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
}
 }
 
+static void remove_hashmap_entries(struct hashmap *dir_renames,
+  struct string_list *items_to_remove)
+{
+   int i;
+   struct dir_rename_entry *entry;
+
+   for (i = 0; i < items_to_remove->nr; i++) {
+   entry = items_to_remove->items[i].util;
+   hashmap_remove(dir_renames, entry, NULL);
+   }
+   string_list_clear(items_to_remove, 0);
+}
+
+/*
+ * There are a couple things we want to do at the directory level:
+ *   1. Check for both sides renaming to the same thing, in order to avoid
+ *  implicit renaming of files that should be left in place.  (See
+ *  testcase 6b in t6043 for details.)
+ *   2. Prune directory renames if there are still files left in the
+ *  original directory.  These represent a partial directory rename,
+ *  i.e. a rename where only some of the files within the directory
+ *  were renamed elsewhere.  (Technically, this could be done earlier
+ *  in get_directory_renames(), except that would prevent us from
+ *  doing the previous check and thus failing testcase 6b.)
+ *   3. Check for rename/rename(1to2) conflicts (at the directory level).
+ *  In the future, we could potentially record this info as well and
+ *  omit reporting rename/rename(1to2) conflicts for each path within
+ *  the affected directories, thus cleaning up the merge output.
+ *   NOTE: We do NOT check for rename/rename(2to1) conflicts at the
+ * directory level, because merging directories is fine.  If it
+ * causes conflicts for files within those merged directories, then
+ * that should be detected at the individual path level.
+ */
+static void handle_directory_level_conflicts(struct merge_options *o,
+struct hashmap *dir_re_head,
+struct tree *head,
+struct hashmap *dir_re_merge,
+struct tree *merge)
+{
+   struct hashmap_iter iter;
+   struct dir_rename_entry *head_ent;
+   struct dir_rename_entry *merge_ent;
+
+   struct string_list remove_from_head = STRING_LIST_INIT_NODUP;
+   struct string_list remove_from_merge = STRING_LIST_INIT_NODUP;
+
+   hashmap_iter_init(dir_re_head, &iter);
+   while ((head_ent = hashmap_iter_next(&iter))) {
+   merge_ent = dir_rename_find_entry(dir_re_merge, head_ent->dir);
+   if (merge_ent &&
+   !head_ent->non_unique_new_dir &&
+   !merge_ent->non_unique_new_dir &&
+   !strbuf_cmp(&head_ent->new_dir, &merge_ent->new_dir)) {
+   /* 1. Renamed identically; remove it from both sides */
+   string_list_append(&remove_from_head,
+  head_ent->dir)->util = head_ent;
+   strbuf_release(&head_ent->new_dir);
+   string_list_append(&remove_from_merge,
+  merge_ent->dir)->util = merge_ent;
+   strbuf_release(&merge_ent->new_dir);
+   } else if (tree_has_path(head, head_ent->dir)) {
+   /* 2. This wasn't a directory rename after all */
+   string_list_append(&remove_from_head,
+  head_ent->dir)->util = head_ent;
+   strbuf_release(&head_ent->new_dir);
+   }
+   }
+
+   remove_hashmap_entries(dir_re_head, &remove_from_head);
+   remove_hashmap_entries(dir_re_merge, &remove_from_merge);
+
+   hashmap_iter_init(dir_re_merge, &iter);
+   while ((merge_ent = hashmap_iter_next(&iter))) {
+   head_ent = dir_rename_find_entry(dir_re_head, merge_ent->dir);
+

[PATCH v7 14/31] merge-recursive: fix leaks of allocated renames and diff_filepairs

2018-01-30 Thread Elijah Newren

get_renames() has always zero'ed out diff_queued_diff.nr while only
manually free'ing diff_filepairs that did not correspond to renames.
Further, it allocated struct renames that were tucked away in the
returned string_list.  Make sure all of these are deallocated when we
are done with them.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index eac3041261..1986af79a9 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1662,13 +1662,23 @@ static int handle_renames(struct merge_options *o,
return process_renames(o, ri->head_renames, ri->merge_renames);
 }
 
-static void cleanup_renames(struct rename_info *re_info)
+static void cleanup_rename(struct string_list *rename)
 {
-   string_list_clear(re_info->head_renames, 0);
-   string_list_clear(re_info->merge_renames, 0);
+   const struct rename *re;
+   int i;
 
-   free(re_info->head_renames);
-   free(re_info->merge_renames);
+   for (i = 0; i < rename->nr; i++) {
+   re = rename->items[i].util;
+   diff_free_filepair(re->pair);
+   }
+   string_list_clear(rename, 1);
+   free(rename);
+}
+
+static void cleanup_renames(struct rename_info *re_info)
+{
+   cleanup_rename(re_info->head_renames);
+   cleanup_rename(re_info->merge_renames);
 }
 
 static struct object_id *stage_oid(const struct object_id *oid, unsigned mode)
-- 
2.16.1.106.gf69932adfe
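
For readers following the ownership chain, here is a minimal standalone
sketch of the cleanup order the patch above describes: free the nested
pair, then the record stored in util, then the list itself.  The names
(toy_pair, toy_rename, toy_list, pairs_freed) are hypothetical stand-ins
and are not git's diff_filepair, struct rename or string_list; the
counter exists only so the behaviour can be checked.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; git's real types are diff_filepair, rename, string_list. */
struct toy_pair { int status; };
struct toy_rename { struct toy_pair *pair; };
struct toy_item { void *util; };
struct toy_list { struct toy_item *items; size_t nr; };

static int pairs_freed; /* instrumentation for this sketch only */

static void toy_free_pair(struct toy_pair *p)
{
	free(p);
	pairs_freed++;
}

/* Build a heap-allocated list of n records, each owning one pair. */
static struct toy_list *toy_make_list(size_t n)
{
	struct toy_list *list = calloc(1, sizeof(*list));
	size_t i;

	if (!list)
		abort();
	list->items = calloc(n ? n : 1, sizeof(*list->items));
	if (!list->items)
		abort();
	for (i = 0; i < n; i++) {
		struct toy_rename *re = calloc(1, sizeof(*re));

		if (!re)
			abort();
		re->pair = calloc(1, sizeof(*re->pair));
		if (!re->pair)
			abort();
		list->items[i].util = re;
	}
	list->nr = n;
	return list;
}

/* Innermost allocation first, then each record, then the container. */
static void toy_cleanup_list(struct toy_list *list)
{
	size_t i;

	for (i = 0; i < list->nr; i++) {
		struct toy_rename *re = list->items[i].util;

		toy_free_pair(re->pair);
		free(re); /* the role string_list_clear(list, 1) plays for util */
	}
	free(list->items);
	free(list);
}
```

Freeing in the opposite order would lose the only pointer to each pair
before it could be released, which is the kind of leak the patch is
closing.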

[PATCH v7 24/31] merge-recursive: when comparing files, don't include trees

2018-01-30 Thread Elijah Newren

get_renames() would look up stage data that already existed (populated
in get_unmerged(), taken from whatever unpack_trees() created), and if
it didn't exist, would call insert_stage_data() to create the necessary
entry for the given file.  The insert_stage_data() fallback becomes
much more important for directory rename detection, because that creates
a mechanism to have a file in the resulting merge that didn't exist on
either side of history.  However, insert_stage_data(), due to calling
get_tree_entry(), loaded up trees as readily as files.  We aren't
interested in comparing trees to files; the D/F conflict handling is
done elsewhere.  This code is just concerned with what entries existed
for a given path on the different sides of the merge, so create a
get_tree_entry_if_blob() helper function and use it.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 354d91d2a8..38dc0eefaf 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -418,6 +418,21 @@ static void get_files_dirs(struct merge_options *o, struct tree *tree)
read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o);
 }
 
+static int get_tree_entry_if_blob(const unsigned char *tree,
+ const char *path,
+ unsigned char *hashy,
+ unsigned int *mode_o)
+{
+   int ret;
+
+   ret = get_tree_entry(tree, path, hashy, mode_o);
+   if (S_ISDIR(*mode_o)) {
+   hashcpy(hashy, null_sha1);
+   *mode_o = 0;
+   }
+   return ret;
+}
+
 /*
  * Returns an index_entry instance which doesn't have to correspond to
  * a real cache entry in Git's index.
@@ -428,12 +443,12 @@ static struct stage_data *insert_stage_data(const char *path,
 {
struct string_list_item *item;
struct stage_data *e = xcalloc(1, sizeof(struct stage_data));
-   get_tree_entry(o->object.oid.hash, path,
-   e->stages[1].oid.hash, &e->stages[1].mode);
-   get_tree_entry(a->object.oid.hash, path,
-   e->stages[2].oid.hash, &e->stages[2].mode);
-   get_tree_entry(b->object.oid.hash, path,
-   e->stages[3].oid.hash, &e->stages[3].mode);
+   get_tree_entry_if_blob(o->object.oid.hash, path,
+  e->stages[1].oid.hash, &e->stages[1].mode);
+   get_tree_entry_if_blob(a->object.oid.hash, path,
+  e->stages[2].oid.hash, &e->stages[2].mode);
+   get_tree_entry_if_blob(b->object.oid.hash, path,
+  e->stages[3].oid.hash, &e->stages[3].mode);
item = string_list_insert(entries, path);
item->util = e;
return e;
-- 
2.16.1.106.gf69932adfe
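
The effect of the helper can be illustrated outside of git with a toy
lookup table.  This is only a sketch under stated assumptions: toy_entry,
toy_lookup() and toy_lookup_if_blob() are hypothetical names, the "tree"
is a flat array rather than a real tree object, and the TOY_IFMT and
TOY_IFDIR constants stand in for the S_ISDIR() check used in the patch.
Unlike the patch, the toy lookup zeroes its outputs on a miss so that the
mode test is always well defined.

```c
#include <stddef.h>
#include <string.h>

#define TOY_HASH_LEN 20
#define TOY_IFMT  0170000u /* file-type bits of a mode */
#define TOY_IFDIR 0040000u /* mode recorded for a directory (tree) entry */

/* Hypothetical flat "tree": one record per path. */
struct toy_entry {
	const char *path;
	unsigned int mode;
	unsigned char hash[TOY_HASH_LEN];
};

/* Returns 0 and fills hash/mode if path exists; otherwise zeroes them and returns -1. */
static int toy_lookup(const struct toy_entry *tree, size_t n, const char *path,
		      unsigned char *hash, unsigned int *mode)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!strcmp(tree[i].path, path)) {
			memcpy(hash, tree[i].hash, TOY_HASH_LEN);
			*mode = tree[i].mode;
			return 0;
		}
	}
	memset(hash, 0, TOY_HASH_LEN);
	*mode = 0;
	return -1;
}

/* Same lookup, but a directory is reported as "no entry data here". */
static int toy_lookup_if_blob(const struct toy_entry *tree, size_t n,
			      const char *path,
			      unsigned char *hash, unsigned int *mode)
{
	int ret = toy_lookup(tree, n, path, hash, mode);

	if ((*mode & TOY_IFMT) == TOY_IFDIR) {
		memset(hash, 0, TOY_HASH_LEN);
		*mode = 0;
	}
	return ret;
}
```

With this shape, a caller filling in per-stage data sees an all-zero
hash and mode 0 for a path that is a directory on one side, the same as
for a path that does not exist there, and D/F handling is left to other
code.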

[PATCH v7 15/31] merge-recursive: make !o->detect_rename codepath more obvious

2018-01-30 Thread Elijah Newren

Previously, if !o->detect_rename then get_renames() would return an
empty string_list, and then process_renames() would have nothing to
iterate over.  It seems more straightforward to simply avoid calling
either function in that case.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1986af79a9..4e6d0c248e 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1338,8 +1338,6 @@ static struct string_list *get_renames(struct merge_options *o,
struct diff_options opts;
 
renames = xcalloc(1, sizeof(struct string_list));
-   if (!o->detect_rename)
-   return renames;
 
diff_setup(&opts);
opts.flags.recursive = 1;
@@ -1657,6 +1655,12 @@ static int handle_renames(struct merge_options *o,
  struct string_list *entries,
  struct rename_info *ri)
 {
+   ri->head_renames = NULL;
+   ri->merge_renames = NULL;
+
+   if (!o->detect_rename)
+   return 1;
+
ri->head_renames  = get_renames(o, head, common, head, merge, entries);
ri->merge_renames = get_renames(o, merge, common, head, merge, entries);
return process_renames(o, ri->head_renames, ri->merge_renames);
@@ -1667,6 +1671,9 @@ static void cleanup_rename(struct string_list *rename)
const struct rename *re;
int i;
 
+   if (rename == NULL)
+   return;
+
for (i = 0; i < rename->nr; i++) {
re = rename->items[i].util;
diff_free_filepair(re->pair);
-- 
2.16.1.106.gf69932adfe
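
The control flow this patch moves to can be sketched in isolation as
follows.  All names here (toy_list, toy_info, toy_compute, toy_handle,
toy_cleanup) are hypothetical and only mimic the shape of
handle_renames() and cleanup_rename(); the call counter is there purely
to show that no work happens when detection is off.

```c
#include <stdlib.h>

struct toy_list { int nr; };
struct toy_info { struct toy_list *head; struct toy_list *merge; };

static int compute_calls; /* instrumentation for this sketch only */

static struct toy_list *toy_compute(void)
{
	struct toy_list *l = calloc(1, sizeof(*l));

	if (!l)
		abort();
	compute_calls++;
	return l;
}

/* Returns 1 ("clean") without computing anything when detection is off. */
static int toy_handle(int detect, struct toy_info *ri)
{
	ri->head = NULL;
	ri->merge = NULL;

	if (!detect)
		return 1;

	ri->head = toy_compute();
	ri->merge = toy_compute();
	return 1;
}

/* Must tolerate NULL, because the early return above leaves both lists unset. */
static void toy_cleanup_one(struct toy_list *l)
{
	if (l == NULL)
		return;
	l->nr = 0; /* stands in for the loop that dereferences the list */
	free(l);
}

static void toy_cleanup(struct toy_info *ri)
{
	toy_cleanup_one(ri->head);
	toy_cleanup_one(ri->merge);
	ri->head = NULL;
	ri->merge = NULL;
}
```

The NULL guard in the cleanup path is what makes the early return safe:
without it, the cleanup would dereference a list that was never
allocated.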

[PATCH v7 10/31] directory rename detection: tests for handling overwriting untracked files

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 367 
 1 file changed, 367 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index b730256653..aa9af49edc 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -2871,4 +2871,371 @@ test_expect_failure '9g-check: Renamed directory that only contained immediate s
 #   side of history for any implicit directory renames.
 ###
 
+###
+# SECTION 10: Handling untracked files
+#
+# unpack_trees(), upon which the recursive merge algorithm is based, aborts
+# the operation if untracked or dirty files would be deleted or overwritten
+# by the merge.  Unfortunately, unpack_trees() does not understand renames,
+# and if it doesn't abort, then it muddies up the working directory before
+# we even get to the point of detecting renames, so we need some special
+# handling, at least in the case of directory renames.
+###
+
+# Testcase 10a, Overwrite untracked: normal rename/delete
+#   Commit O: z/{b,c_1}
+#   Commit A: z/b + untracked z/c + untracked z/d
+#   Commit B: z/{b,d_1}
+#   Expected: Aborted Merge +
+#   ERROR_MSG(untracked working tree files would be overwritten by merge)
+
+test_expect_success '10a-setup: Overwrite untracked with normal rename/delete' '
+   test_create_repo 10a &&
+   (
+   cd 10a &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git rm z/c &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   git mv z/c z/d &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_success '10a-check: Overwrite untracked with normal rename/delete' '
+   (
+   cd 10a &&
+
+   git checkout A^0 &&
+   echo very >z/c &&
+   echo important >z/d &&
+
+   test_must_fail git merge -s recursive B^0 >out 2>err &&
+   test_i18ngrep "The following untracked working tree files would be overwritten by merge" err &&
+
+   git ls-files -s >out &&
+   test_line_count = 1 out &&
+   git ls-files -o >out &&
+   test_line_count = 4 out &&
+
+   echo very >expect &&
+   test_cmp expect z/c &&
+
+   echo important >expect &&
+   test_cmp expect z/d &&
+
+   git rev-parse HEAD:z/b >actual &&
+   git rev-parse O:z/b >expect &&
+   test_cmp expect actual
+   )
+'
+
+# Testcase 10b, Overwrite untracked: dir rename + delete
+#   Commit O: z/{b,c_1}
+#   Commit A: y/b + untracked y/{c,d,e}
+#   Commit B: z/{b,d_1,e}
+#   Expected: Failed Merge; y/b + untracked y/c + untracked y/d on disk +
+# z/c_1 -> z/d_1 rename recorded at stage 3 for y/d +
+#   ERROR_MSG(refusing to lose untracked file at 'y/d')
+
+test_expect_success '10b-setup: Overwrite untracked with dir rename + delete' '
+   test_create_repo 10b &&
+   (
+   cd 10b &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git rm z/c &&
+   git mv z/ y/ &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   git mv z/c z/d &&
+   echo e >z/e &&
+   git add z/e &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '10b-check: Overwrite untracked with dir rename + delete' '
+   (
+   cd 10b &&
+
+   git checkout A^0 &&
+   echo very >y/c &&
+   echo important >y/d &&
+   echo contents >y/e &&
+
+   test_must_fail git merge -s recursive B^0 >out 2>err &&
+   test_i18ngrep "CONFLICT (rename/delete).*Version B\^0 of y/d left in tree at y/d~B\^0" out &&
+   test_i18ngrep "Error: Refusing to lose untracked file at y/e; writing to y/e~B\^0 instead" out &&
+
+   git ls-files -s >out &&
+   test_line_count = 3 out &&
+   git ls-files -u >out &&
+

[PATCH v7 28/31] merge-recursive: fix remaining directory rename + dirty overwrite cases

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 26 +++---
 t/t6043-merge-rename-directories.sh |  8 
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index fba1a0d207..62e4266d21 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1320,11 +1320,23 @@ static int handle_file(struct merge_options *o,
 
add = filespec_from_entry(&other, dst_entry, stage ^ 1);
if (add) {
+   int ren_src_was_dirty = was_dirty(o, rename->path);
char *add_name = unique_path(o, rename->path, other_branch);
if (update_file(o, 0, &add->oid, add->mode, add_name))
return -1;
 
-   remove_file(o, 0, rename->path, 0);
+   if (ren_src_was_dirty) {
+   output(o, 1, _("Refusing to lose dirty file at %s"),
+  rename->path);
+   }
+   /*
+* Stupid double negatives in remove_file; it somehow manages
+* to repeatedly mess me up.  So, just for myself:
+*1) update_wd iff !ren_src_was_dirty.
+*2) no_wd iff !update_wd
+*3) so, no_wd == !!ren_src_was_dirty == ren_src_was_dirty
+*/
+   remove_file(o, 0, rename->path, ren_src_was_dirty);
dst_name = unique_path(o, rename->path, cur_branch);
} else {
if (dir_in_way(rename->path, !o->call_depth, 0)) {
@@ -1462,7 +1474,10 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
char *new_path2 = unique_path(o, path, ci->branch2);
output(o, 1, _("Renaming %s to %s and %s to %s instead"),
   a->path, new_path1, b->path, new_path2);
-   if (would_lose_untracked(path))
+   if (was_dirty(o, path))
+   output(o, 1, _("Refusing to lose dirty file at %s"),
+  path);
+   else if (would_lose_untracked(path))
/*
 * Only way we get here is if both renames were from
 * a directory rename AND user had an untracked file
@@ -2042,6 +2057,7 @@ static void apply_directory_rename_modifications(struct merge_options *o,
 {
struct string_list_item *item;
int stage = (tree == a_tree ? 2 : 3);
+   int update_wd;
 
/*
 * In all cases where we can do directory rename detection,
@@ -2052,7 +2068,11 @@ static void apply_directory_rename_modifications(struct merge_options *o,
 * saying the file would have been overwritten), but it might
 * be dirty, though.
 */
-   remove_file(o, 1, pair->two->path, 0 /* no_wd */);
+   update_wd = !was_dirty(o, pair->two->path);
+   if (!update_wd)
+   output(o, 1, _("Refusing to lose dirty file at %s"),
+  pair->two->path);
+   remove_file(o, 1, pair->two->path, !update_wd);
 
/* Find or create a new re->dst_entry */
item = string_list_lookup(entries, new_path);
diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index 89b2eacf38..a34c57d986 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -3362,7 +3362,7 @@ test_expect_success '11b-setup: Avoid losing dirty file involved in directory re
)
 '
 
-test_expect_failure '11b-check: Avoid losing dirty file involved in directory rename' '
+test_expect_success '11b-check: Avoid losing dirty file involved in directory rename' '
(
cd 11b &&
 
@@ -3504,7 +3504,7 @@ test_expect_success '11d-setup: Avoid losing not-uptodate with rename + D/F conf
)
 '
 
-test_expect_failure '11d-check: Avoid losing not-uptodate with rename + D/F conflict' '
+test_expect_success '11d-check: Avoid losing not-uptodate with rename + D/F conflict' '
(
cd 11d &&
 
@@ -3583,7 +3583,7 @@ test_expect_success '11e-setup: Avoid deleting not-uptodate with dir rename/rena
)
 '
 
-test_expect_failure '11e-check: Avoid deleting not-uptodate with dir rename/rename(1to2)/add' '
+test_expect_success '11e-check: Avoid deleting not-uptodate with dir rename/rename(1to2)/add' '
(
cd 11e &&
 
@@ -3659,7 +3659,7 @@ test_expect_success '11f-setup: Avoid deleting not-uptodate with dir rename/rena
)
 '
 
-test_expect_failure '11f-check: Avoid deleting not-uptodate with dir rename/rename(2to1)' '
+test_expect_success '11f-check: Avoid deleting not-uptodate with dir rename/rename(2to1)' '
(
cd 11f &&
 
-- 
2.16.1.106.gf69932adfe
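
The double negative called out in the comment in handle_file() can be
spelled out with a small model.  This is a sketch, not git's code:
toy_path, toy_remove_file() and toy_remove_unless_dirty() are
hypothetical, and the model assumes remove_file()'s last argument means
"do not touch the working tree copy" while ignoring everything else the
real function considers (for example o->call_depth).

```c
#include <stdbool.h>

/* Toy model of one path: present in the index and/or the working tree. */
struct toy_path {
	bool in_index;
	bool in_worktree;
	bool dirty; /* working tree copy has uncommitted changes */
};

/*
 * Mirrors the argument shape remove_file(o, clean, path, no_wd):
 * clean != 0 drops the index entry, no_wd != 0 leaves the file on disk.
 */
static void toy_remove_file(struct toy_path *p, int clean, int no_wd)
{
	if (clean)
		p->in_index = false;
	if (!no_wd)
		p->in_worktree = false;
}

/*
 * update_wd iff the file is not dirty; no_wd iff !update_wd.
 * Returns true when the on-disk copy was kept because it was dirty.
 */
static bool toy_remove_unless_dirty(struct toy_path *p)
{
	int update_wd = !p->dirty;

	toy_remove_file(p, 1, !update_wd);
	return !update_wd;
}
```

Naming the positive condition (update_wd) first and negating it once at
the call site, as apply_directory_rename_modifications() does in the
patch, keeps the intent readable even though the callee takes the
negated flag.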

[PATCH v7 06/31] directory rename detection: testcases checking which side did the rename

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 336 
 1 file changed, 336 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index ac9c3e9974..fbeb8f4316 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -1172,4 +1172,340 @@ test_expect_failure '5d-check: Directory/file/file conflict due to directory ren
 #   back to old handling.  But, sadly, see testcases 8a and 8b.
 ###
 
+
+###
+# SECTION 6: Same side of the merge was the one that did the rename
+#
+# It may sound obvious that you only want to apply implicit directory
+# renames to directories if the _other_ side of history did the renaming.
+# If you did make an implementation that didn't explicitly enforce this
+# rule, the majority of cases that would fall under this section would
+# also be solved by following the rules from the above sections.  But
+# there are still a few that stick out, so this section covers them just
+# to make sure we also get them right.
+###
+
+# Testcase 6a, Tricky rename/delete
+#   Commit O: z/{b,c,d}
+#   Commit A: z/b
+#   Commit B: y/{b,c}, z/d
+#   Expected: y/b, CONFLICT(rename/delete, z/c -> y/c vs. NULL)
+#   Note: We're just checking here that the rename of z/b and z/c to put
+# them under y/ doesn't accidentally catch z/d and make it look like
+# it is also involved in a rename/delete conflict.
+
+test_expect_success '6a-setup: Tricky rename/delete' '
+   test_create_repo 6a &&
+   (
+   cd 6a &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git rm z/c &&
+   git rm z/d &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   mkdir y &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_success '6a-check: Tricky rename/delete' '
+   (
+   cd 6a &&
+
+   git checkout A^0 &&
+
+   test_must_fail git merge -s recursive B^0 >out &&
+   test_i18ngrep "CONFLICT (rename/delete).*z/c.*y/c" out &&
+
+   git ls-files -s >out &&
+   test_line_count = 2 out &&
+   git ls-files -u >out &&
+   test_line_count = 1 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   :0:y/b :3:y/c &&
+   git rev-parse >expect \
+O:z/b  O:z/c &&
+   test_cmp expect actual
+   )
+'
+
+# Testcase 6b, Same rename done on both sides
+#   (Related to testcases 6c and 8e)
+#   Commit O: z/{b,c}
+#   Commit A: y/{b,c}
+#   Commit B: y/{b,c}, z/d
+#   Expected: y/{b,c}, z/d
+#   Note: If we did directory rename detection here, we'd move z/d into y/,
+# but B did that rename and still decided to put the file into z/,
+# so we probably shouldn't apply directory rename detection for it.
+
+test_expect_success '6b-setup: Same rename done on both sides' '
+   test_create_repo 6b &&
+   (
+   cd 6b &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git mv z y &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   git mv z y &&
+   mkdir z &&
+   echo d >z/d &&
+   git add z/d &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_success '6b-check: Same rename done on both sides' '
+   (
+   cd 6b &&
+
+   git checkout A^0 &&
+
+   git merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 3 out &&
+   git ls-files -u >out &&
+   test_line_count = 0 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   HEAD:y/b HEAD:y/c HEAD:z/d

[PATCH v7 05/31] directory rename detection: files/directories in the way of some renames

2018-01-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 330 
 1 file changed, 330 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index f0213f2bbd..ac9c3e9974 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -842,4 +842,334 @@ test_expect_success '4a-check: Directory split, with original directory still pr
 #   detection.)  But, sadly, see testcase 8b.
 ###
 
+
+###
+# SECTION 5: Files/directories in the way of subset of to-be-renamed paths
+#
+# Implicitly renaming files due to a detected directory rename could run
+# into problems if there are files or directories in the way of the paths
+# we want to rename.  Explore such cases in this section.
+###
+
+# Testcase 5a, Merge directories, other side adds files to original and target
+#   Commit O: z/{b,c},   y/d
+#   Commit A: z/{b,c,e_1,f}, y/{d,e_2}
+#   Commit B: y/{b,c,d}
+#   Expected: z/e_1, y/{b,c,d,e_2,f} + CONFLICT warning
+#   NOTE: While directory rename detection is active here causing z/f to
+# become y/f, we did not apply this for z/e_1 because that would
+# give us an add/add conflict for y/e_1 vs y/e_2.  This problem with
+# this add/add, is that both versions of y/e are from the same side
+# of history, giving us no way to represent this conflict in the
+# index.
+
+test_expect_success '5a-setup: Merge directories, other side adds files to original and target' '
+   test_create_repo 5a &&
+   (
+   cd 5a &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   mkdir y &&
+   echo d >y/d &&
+   git add z y &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   echo e1 >z/e &&
+   echo f >z/f &&
+   echo e2 >y/e &&
+   git add z/e z/f y/e &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   git mv z/b y/ &&
+   git mv z/c y/ &&
+   rmdir z &&
+   test_tick &&
+   git commit -m "B"
+   )
+'
+
+test_expect_failure '5a-check: Merge directories, other side adds files to original and target' '
+   (
+   cd 5a &&
+
+   git checkout A^0 &&
+
+   test_must_fail git merge -s recursive B^0 >out &&
+   test_i18ngrep "CONFLICT.*implicit dir rename" out &&
+
+   git ls-files -s >out &&
+   test_line_count = 6 out &&
+   git ls-files -u >out &&
+   test_line_count = 0 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >actual \
+   :0:y/b :0:y/c :0:y/d :0:y/e :0:z/e :0:y/f &&
+   git rev-parse >expect \
+O:z/b  O:z/c  O:y/d  A:y/e  A:z/e  A:z/f &&
+   test_cmp expect actual
+   )
+'
+
+# Testcase 5b, Rename/delete in order to get add/add/add conflict
+#   (Related to testcase 8d; these may appear slightly inconsistent to users;
+#Also related to testcases 7d and 7e)
+#   Commit O: z/{b,c,d_1}
+#   Commit A: y/{b,c,d_2}
+#   Commit B: z/{b,c,d_1,e}, y/d_3
+#   Expected: y/{b,c,e}, CONFLICT(add/add: y/d_2 vs. y/d_3)
+#   NOTE: If z/d_1 in commit B were to be involved in dir rename detection, as
+# we normally would since z/ is being renamed to y/, then this would be
+# a rename/delete (z/d_1 -> y/d_1 vs. deleted) AND an add/add/add
+# conflict of y/d_1 vs. y/d_2 vs. y/d_3.  Add/add/add is not
+# representable in the index, so the existence of y/d_3 needs to
+# cause us to bail on directory rename detection for that path, falling
+# back to git behavior without the directory rename detection.
+
+test_expect_success '5b-setup: Rename/delete in order to get add/add/add conflict' '
+   test_create_repo 5b &&
+   (
+   cd 5b &&
+
+   mkdir z &&
+   echo b >z/b &&
+   echo c >z/c &&
+   echo d1 >z/d &&
+   git add z &&
+   test_tick &&
+   git commit -m "O" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git rm z/d &&
+   git mv z y &&
+   echo d2 >y/d &&
+   git add y/d &&
+

Re: [PATCH v7 00/31] Add directory rename detection to git

2018-01-30 Thread Elijah Newren

On Tue, Jan 30, 2018 at 3:25 PM, Elijah Newren  wrote:
> This patchset introduces directory rename detection to merge-recursive.  See

And a meta question: Should I be trying to submit this feature some other way?

I've really appreciated all the reviews[1], and the testcases are
undoubtedly better because of them.  The code too, but the reviews
have focused almost exclusively on the testcases so far.  Is
merge-recursive just too hairy and my changes too large for folks to
stomach tackling?  Is there something different I could do?

I'm curious for any thoughts on the matter.

Thanks!
Elijah

[1] Especially Stefan who looked through all the patches, but also for
spot reviews from Junio, Johannes Schindelin, Johannes Sixt, Adam
Dinwoodie, and SZEDER Gábor.  Also, a few interested comments from
Philip Oakley and Jacob Keller.

Re: git add --all does not respect submodule.<name>.ignore

2018-01-30 Thread Jacob Keller

On Tue, Jan 30, 2018 at 1:56 PM, Stefan Beller  wrote:
> The assume-unchanged bit is a performance optimisation for power users,
> but its documentation words it in a less dangerous way, such that it sounds
> as if it is a UX feature instead of a performance thing. I'd stay away from
> that knob.
>
> Stefan

In almost all cases where someone incorrectly recommends
assume-unchanged, I think they could just use skip-worktree to get the
actual behavior they want.

Thanks,
Jake

Re: Bug Report: Subtrees and GPG Signed Commits

2018-01-30 Thread Avery Pennarun

On Tue, Jan 30, 2018 at 6:24 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>> There has not been feedback for a while on this thread.
>> I think that is because subtrees are not in anyone's hot
>> interest area currently.
>>
>> This is definitely the right place to submit bugs.
>> Looking through "git log --format="%ae %s" -S subtree",
>> it seems as if Avery (apenw...@gmail.com) was mostly
>> interested in developing subtrees, though I think he has
>> moved on. Originally it was invented by Junio, who is
>> the active maintainer of the project in 68faf68938
>> (A new merge stragety 'subtree'., 2007-02-15)
>
> Thanks for trying to help, but I have *NOTHING* to do with the "git
> subtree" subcommand (and I personally have no interest in it).  What
> I did was a subtree merge strategy (i.e. "git merge -s subtree"),
> which is totally a different thing.
>
> David Greene offered to take it over in 2015, and then we saw some
> activity by David Aguilar in 2016, but otherwise the subcommand from
> contrib/ has pretty much been dormant these days.

Strictly speaking, the 'git subtree' command does in fact use 'git
merge -s subtree' under the covers, so Junio is at least partly
responsible for giving me the idea :)

I actually have never looked into how signed commits work and although
I still use git-subtree occasionally (it hasn't needed any
maintenance, for my simple use cases), I have never used it with
signed commits.

git-subtree maintains a cache that maps commit ids in the "original
project" to their equivalents in the "merged project."  If there's
something magic about how commit ids work with signed commits, I could
imagine that causing the "Not a valid object name" problems.  Or,
git-subtree in --squash mode actually generates new commit objects
using some magic of its own.  If it were to accidentally copy a
signature into a commit that no longer matches the original, I imagine
that new object might get rejected.

Unfortunately I don't have time to look into it.  The git-subtree code
is pretty straightforward, though, so if Stephen has an hour or two to
look deeper it's probably possible to fix it up.  The tool is not
actually as magical and difficult as it might seem at first glance :)

Sorry I can't help more.

Good luck,

Avery

Re: Bug Report: Subtrees and GPG Signed Commits

2018-01-30 Thread Junio C Hamano

Stefan Beller  writes:

> There has not been feedback for a while on this thread.
> I think that is because subtrees are not in anyone's hot
> interest area currently.
>
> This is definitely the right place to submit bugs.
> Looking through "git log --format="%ae %s" -S subtree",
> it seems as if Avery (apenw...@gmail.com) was mostly
> interested in developing subtrees, though I think he has
> moved on. Originally it was invented by Junio, who is
> the active maintainer of the project in 68faf68938
> (A new merge stragety 'subtree'., 2007-02-15)

Thanks for trying to help, but I have *NOTHING* to do with the "git
subtree" subcommand (and I personally have no interest in it).  What
I did was a subtree merge strategy (i.e. "git merge -s subtree"),
which is totally a different thing.

David Greene offered to take it over in 2015, and then we saw some
activity by David Aguilar in 2016, but otherwise the subcommand from
contrib/ has pretty much been dormant these days.

Re: [PATCH 37/37] replace: rename 'new' variables

2018-01-30 Thread Junio C Hamano

Stefan Beller  writes:

> On Mon, Jan 29, 2018 at 2:37 PM, Brandon Williams  wrote:
>> Rename C++ keyword in order to bring the codebase closer to being able
>> to be compiled with a C++ compiler.
>>
>> Signed-off-by: Brandon Williams 
>> ---
>>  builtin/replace.c | 16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/builtin/replace.c b/builtin/replace.c
>> index 42cf4f62a..e48835b54 100644
>> --- a/builtin/replace.c
>> +++ b/builtin/replace.c
>> @@ -284,7 +284,7 @@ static int edit_and_replace(const char *object_ref, int force, int raw)
>>  {
>> char *tmpfile = git_pathdup("REPLACE_EDITOBJ");
>> enum object_type type;
>> -   struct object_id old, new, prev;
>> +   struct object_id old, new_oid, prev;
>
> new is a keyword that often comes with a counterpart, here `old`.
> So while at it, also rename old to old_oid ?
> Do we care about the symmetry enough to warrant additional churn for this?

Absolutely.  That is one of the reasons why the "hacky" approach is
so attractive---it does not force those who are doing conversion to
think.  With this approach, "new" in this context gets replaced with
new_oid (because this is about oid; in another codepath, "new" and
"old" might have been referring to a file and the new names for them
would have been "new_file" vs "old_file") after some thought, and
the same thought process should realize "old" must become "old_oid".

Very good point.

Re: Bug Report: Subtrees and GPG Signed Commits

2018-01-30 Thread Stefan Beller

On Tue, Jan 30, 2018 at 11:15 AM, Stephen R Guglielmo
 wrote:
> Hi, just following up on this bug report. I have not heard back. Is
> there additional information that's needed? Is there a better place to
> file bug reports?
>
> Additionally, I have confirmed that this bug still exists with git
> version 2.16.1.
>
> Thanks
>
> On Thu, Jan 18, 2018 at 11:19 AM, Stephen R Guglielmo
>  wrote:
>> Hi, just following up on this bug report. I have not heard back. Is
>> there additional information that's needed? Is there a better place to
>> file bug reports?
>>
>> Thanks
>>
>> On Sat, Jan 6, 2018 at 5:45 PM, Stephen R Guglielmo
>>  wrote:
>>> Hi all,
>>>
>>> I've noticed an issue regarding the use of `git subtree add` and `git
>>> subtree pull` when the subtree repository's commit (either HEAD or
>>> whatever commit specified by the subtree command) is signed with GPG.
>>> It seems to work properly if the commit is not signed but previous
>>> commits are.
>>>
>>> The gist of the issue is that `git subtree add` does not add the
>>> subtree properly and a "fatal: Not a valid object name" error is
>>> thrown. Running `git subtree pull` does not pull any upstream changes
>>> after that ("'subtree' was never added").
>>>
>>> I have not done extensive testing, however, below are instructions to
>>> reproduce the issue. This was tested using git version 2.15.1
>>> installed via Homebrew on MacOS. I did not test with the built-in
>>> version of git on MacOS.
>>>
>>> Thanks,
>>> Steve
>>>
>>> # Create a new repository
>>> mkdir repoA && cd repoA
>>> git init
>>> echo "Test File in Repo A" > FileA
>>> git add -A && git commit -m 'Initial commit in repo A'
>>>
>>> # Create a second repository
>>> cd .. && mkdir repoB && cd repoB
>>> git init
>>> echo "Test File in Repo B" > FileB
>>> git add -A && git commit -m 'Initial commit in repo B'
>>>
>>> # Create a signed commit in repo B
>>> echo "Signed Commit" >> FileB
>>> git commit -a -S  -m 'Signed commit in repo B'
>>>
>>> # Now, add repoB as a subtree of RepoA
>>> cd ../repoA
>>> git subtree add --prefix repoB_subtree/ ../repoB/ master --squash
>>> # Output:
>>> git fetch ../repoB/ master
>>> warning: no common commits
>>> remote: Counting objects: 6, done.
>>> remote: Compressing objects: 100% (2/2), done.
>>> remote: Total 6 (delta 0), reused 0 (delta 0)
>>> Unpacking objects: 100% (6/6), done.
>>> From ../repoB
>>>  * branch            master     -> FETCH_HEAD
>>> fatal: Not a valid object name gpg: Signature made Sat Jan  6 17:38:31 2018 EST
>>> gpg:                using RSA key 6900E9CFDD39B6A741D601F50999759F2DCF3E7C
>>> gpg: Good signature from "Stephen Robert Guglielmo (Temple University
>>> Computer Services) " [ultimate]
>>> Primary key fingerprint: 6900 E9CF DD39 B6A7 41D6  01F5 0999 759F 2DCF 3E7C
>>> 4b700b1a4ebb9e2c1011aafd6b0f720b38f059a4
>>> # Note, git exits with status 128 at this point.
>>>
>>> # FileB was in fact added and staged to repoA, despite the "fatal" above. Commit it:
>>> git commit -m 'Add repoB subtree'
>>>
>>> # Ok, let's make another commit in repoB and try a `subtree pull` instead of `subtree add`
>>> cd ../repoB
>>> echo "Another Line" >> FileB
>>> git commit -a -S -m 'Another signed commit'
>>> cd ../repoA
>>> git subtree pull --prefix repoB_subtree/ ../repoB master --squash
>>> # Output:
>>> warning: no common commits
>>> remote: Counting objects: 9, done.
>>> remote: Compressing objects: 100% (3/3), done.
>>> remote: Total 9 (delta 0), reused 0 (delta 0)
>>> Unpacking objects: 100% (9/9), done.
>>> From ../repoB
>>>  * branch            master     -> FETCH_HEAD
>>> Can't squash-merge: 'repoB_subtree' was never added.
>>> # Note, git exits with status 1 at this point.
>>>
>>> # RepoB's third commit ('Another signed commit') is not pulled into the subtree in repo A.
>>> # This can be verified by running a diff:
>>> diff -qr --exclude ".git" repoB_subtree ../repoB
>>> # Output:
>>> Files repoB_subtree/FileB and ../repoB/FileB differ

There has not been feedback for a while on this thread.
I think that is because subtrees are not in anyone's hot
interest area currently.

This is definitely the right place to submit bugs.
Looking through "git log --format="%ae %s" -S subtree",
it seems as if Avery (apenw...@gmail.com) was mostly
interested in developing subtrees, though I think he has
moved on. Originally it was invented by Junio, who is
the active maintainer of the project in 68faf68938
(A new merge stragety 'subtree'., 2007-02-15)

Thanks,
Stefan

Re: Some rough edges of core.fsmonitor

2018-01-30 Thread Ævar Arnfjörð Bjarmason


On Tue, Jan 30 2018, Ben Peart jotted:

> While some of these issues have been discussed in other threads, I
> thought I'd summarize my thoughts here.

Thanks for this & your other reply. I'm going to get to testing some of
Duy's patches soon, and if you have some other relevant WIP I'd be happy
to try them, but meanwhile replying to a few of these:

> On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote:
>> I just got around to testing this since it landed, for context some
>> previous poking of mine in [1].
>>
>> Issues / stuff I've noticed:
>>
>> 1) We end up invalidating the untracked cache because stuff in .git/
>> changed. For example:
>>
>>  01:09:24.975524 fsmonitor.c:173 fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>>  01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git'
>>  01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback '.git/config'
>>  01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback '.git/index'
>>  01:09:25.122726 fsmonitor.c:91  write fsmonitor extension successful
>>
>> Am I missing something or should we do something like:
>>
>>  diff --git a/fsmonitor.c b/fsmonitor.c
>>  index 0af7c4edba..5067b89bda 100644
>>  --- a/fsmonitor.c
>>  +++ b/fsmonitor.c
>>  @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>>
>>   static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>>   {
>>  -   int pos = index_name_pos(istate, name, strlen(name));
>>  +   int pos;
>>  +
>>  +   if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>>  +   return;
>>  +
>>  +   pos = index_name_pos(istate, name, strlen(name));
>>
>>  if (pos >= 0) {
>>  struct cache_entry *ce = istate->cache[pos];
>>
>> With that patch applied status on a large repo[2] goes from a consistent
>> ~180-200ms to ~140-150ms, since we're not invalidating some of the UC
>> structure
>>
>
> I favor making this optimization by updating
> untracked_cache_invalidate_path() so that it ignores paths under
> get_git_dir() and doesn't invalidate the untracked cache or flag the
> index as dirty.

*nod*

>> 2) We re-write out the index even though we know nothing changed
>>
>> When you first run with core.fsmonitor it needs to
>> mark_fsmonitor_clean() for every path, but is there a reason for why we
>> wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are
>> marked and we know from the hook that nothing changed? Why write out the
>> index again?
>>
>
> Writing out the index when core.fsmonitor is first turned on is
> necessary to get the index extension added with the current state of
> the dirty flags.  Given it is a one time cost, I don't think we have
> anything worth trying to optimize here.

Indeed, that makes sense. What I was showing here is even after the
initial setup we continue to write it out when we know nothing changed.

We do that anyway without fsmonitor, but this seemed like a worthwhile
thing to optimize.

>> 3) A lot of time spent reading the index (or something..)
>>
>> While the hook itself takes ~20ms (and watchman itself 1/4 of that)
>> status as a whole takes much longer. gprof reveals:
>>
>>  Each sample counts as 0.01 seconds.
>>    %   cumulative   self              self     total
>>   time   seconds   seconds    calls  ms/call  ms/call  name
>>   15.38      0.02     0.02   221690     0.00     0.00  memihash
>>   15.38      0.04     0.02   221689     0.00     0.00  create_from_disk
>>    7.69      0.05     0.01  2216897     0.00     0.00  git_bswap32
>>    7.69      0.06     0.01   222661     0.00     0.00  ce_path_match
>>    7.69      0.07     0.01   221769     0.00     0.00  hashmap_add
>>    7.69      0.08     0.01    39941     0.00     0.00  prep_exclude
>>    7.69      0.09     0.01    39940     0.00     0.00  strbuf_addch
>>    7.69      0.10     0.01        1    10.00    10.00  read_one
>>    7.69      0.11     0.01        1    10.00    10.00  refresh_index
>>    7.69      0.12     0.01        1    10.00    10.00  tweak_fsmonitor
>>    7.69      0.13     0.01                             preload_thread
>>
>> The index is 24M in this case, I guess it's unpacking it, but I wonder
>> if this couldn't be much faster if we saved away the result of the last
>> "status" in something that's quick to access, and then if nothing
>> changed we just report that, and no need to re-write the index (or just
>> write the "it was clean at this time" part).
>
> Yes, reading the index is slow.  We've made some improvements (not
> computing the SHA, not validating the sort order, etc) and have one
> more in progress that will reduce the malloc() cost.  I haven't found
> any other easy optimizations but it would be great if you

Re: [PATCH 00/37] removal of some c++ keywords

2018-01-30 Thread Stefan Beller

On Tue, Jan 30, 2018 at 2:36 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> Is it simpler (though hacky) to just  do
>>
>> #ifdef __cplusplus
>> #define new not_new
>> #define try really_try
>> ...
>>
>> somewhere in git-compat-util.h?
>
> Very tempting, especially given that your approach automatically
> would cover topics in flight without any merge conflict ;-)
>
> I agree that it is hacky and somewhat ugly, but the hackiness
> somehow does not bother me too much in this case; perhaps because
> attempting to use a C++ compiler may already be hacky in the first
> place?
>
> It probably depends on the reason why we are doing this topic.  If a
> report about our source code coming from the C++ oriented tool cite
> the symbol names seen by machines, then the "hacky" approach will
> give us "not_new" where Brandon's patch may give us "new_oid", or
> whatever symbol that is more appropriate for the context it appears
> than such an automated cute name.
>
>> Do we use any C features that are incompatible with C++? (or do we not
>> need to care?)
>
> Good question.

implicit casts from void?
e.g. xmalloc returns a void pointer, not the type requested.
https://embeddedartistry.com/blog/2017/2/28/c-casting-or-oh-no-we-broke-malloc
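
To make the void-pointer point concrete, here is a minimal sketch. It assumes nothing about git's own allocation helpers: ALLOC_TYPED, struct point and point_dup() are hypothetical names made up for illustration. It shows the one place where valid C stops being valid C++, and one way a cast can be kept out of the call sites.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only (not a git macro): allocate
 * room for n objects of the given type and cast the result explicitly,
 * so the same line compiles as C and as C++.
 */
#define ALLOC_TYPED(type, n) ((type *)malloc(sizeof(type) * (n)))

struct point {
	int x;
	int y;
};

/* Return a heap-allocated copy of *src, or NULL if allocation fails. */
struct point *point_dup(const struct point *src)
{
	struct point *copy;

	assert(src != NULL);

	/*
	 * Plain C would accept "copy = malloc(sizeof(*copy));" because
	 * void * converts implicitly to any object pointer type.  A C++
	 * compiler rejects that assignment, hence the explicit cast
	 * hidden inside the macro.
	 */
	copy = ALLOC_TYPED(struct point, 1);
	if (copy)
		memcpy(copy, src, sizeof(*copy));
	return copy;
}
```

Whether such casts are worth the churn is a separate question; the sketch only shows what a C++ compiler would insist on.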

Re: [PATCH 37/37] replace: rename 'new' variables

2018-01-30 Thread Stefan Beller

On Mon, Jan 29, 2018 at 2:37 PM, Brandon Williams  wrote:
> Rename C++ keyword in order to bring the codebase closer to being able
> to be compiled with a C++ compiler.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/replace.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/replace.c b/builtin/replace.c
> index 42cf4f62a..e48835b54 100644
> --- a/builtin/replace.c
> +++ b/builtin/replace.c
> @@ -284,7 +284,7 @@ static int edit_and_replace(const char *object_ref, int force, int raw)
>  {
> char *tmpfile = git_pathdup("REPLACE_EDITOBJ");
> enum object_type type;
> -   struct object_id old, new, prev;
> +   struct object_id old, new_oid, prev;

new is a keyword that often comes with a counterpart, here `old`.
So while at it, also rename old to old_oid ?
Do we care about the symmetry enough to warrant additional churn for this?

Stefan

Re: Some rough edges of core.fsmonitor

2018-01-30 Thread Ben Peart

While some of these issues have been discussed in other threads, I 
thought I'd summarize my thoughts here.


On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote:

> I just got around to testing this since it landed, for context some
> previous poking of mine in [1].
>
> Issues / stuff I've noticed:
>
> 1) We end up invalidating the untracked cache because stuff in .git/
> changed. For example:
>
>  01:09:24.975524 fsmonitor.c:173 fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>  01:09:24.975548 fsmonitor.c:138 fsmonitor_refresh_callback '.git'
>  01:09:24.975556 fsmonitor.c:138 fsmonitor_refresh_callback '.git/config'
>  01:09:24.975568 fsmonitor.c:138 fsmonitor_refresh_callback '.git/index'
>  01:09:25.122726 fsmonitor.c:91  write fsmonitor extension successful
>
> Am I missing something or should we do something like:
>
>  diff --git a/fsmonitor.c b/fsmonitor.c
>  index 0af7c4edba..5067b89bda 100644
>  --- a/fsmonitor.c
>  +++ b/fsmonitor.c
>  @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>
>   static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>   {
>  -   int pos = index_name_pos(istate, name, strlen(name));
>  +   int pos;
>  +
>  +   if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>  +   return;
>  +
>  +   pos = index_name_pos(istate, name, strlen(name));
>
>  if (pos >= 0) {
>  struct cache_entry *ce = istate->cache[pos];
>
> With that patch applied status on a large repo[2] goes from a consistent
> ~180-200ms to ~140-150ms, since we're not invalidating some of the UC
> structure



I favor making this optimization by updating 
untracked_cache_invalidate_path() so that it ignores paths under 
get_git_dir() and doesn't invalidate the untracked cache or flag the 
index as dirty.
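
For illustration, the "is this path under the git dir?" test that such a change would need could look roughly like the sketch below. path_is_under() is a hypothetical helper, not an existing git function, and it assumes both arguments are already normalized, worktree-relative paths using '/' as the separator.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): return 1 if "path" is "dir"
 * itself or lies somewhere below it, 0 otherwise.  Both arguments are
 * assumed to be normalized paths relative to the same root.
 */
int path_is_under(const char *path, const char *dir)
{
	size_t len;

	assert(path != NULL && dir != NULL);

	/* Ignore trailing slashes so ".git" and ".git/" behave the same. */
	len = strlen(dir);
	while (len > 0 && dir[len - 1] == '/')
		len--;

	if (strncmp(path, dir, len) != 0)
		return 0;

	/*
	 * The match must end at a component boundary; otherwise
	 * ".gitignore" would wrongly count as being under ".git".
	 */
	return path[len] == '\0' || path[len] == '/';
}
```

The boundary check is the part that is easy to get wrong with a bare prefix comparison, which is why the patch quoted above tests for ".git" and ".git/" separately.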



> 2) We re-write out the index even though we know nothing changed
>
> When you first run with core.fsmonitor it needs to
> mark_fsmonitor_clean() for every path, but is there a reason for why we
> wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are
> marked and we know from the hook that nothing changed? Why write out the
> index again?



Writing out the index when core.fsmonitor is first turned on is 
necessary to get the index extension added with the current state of the 
dirty flags.  Given it is a one time cost, I don't think we have 
anything worth trying to optimize here.



> 3) A lot of time spent reading the index (or something..)
>
> While the hook itself takes ~20ms (and watchman itself 1/4 of that)
> status as a whole takes much longer. gprof reveals:
>
>  Each sample counts as 0.01 seconds.
>    %   cumulative   self              self     total
>   time   seconds   seconds    calls  ms/call  ms/call  name
>   15.38      0.02     0.02   221690     0.00     0.00  memihash
>   15.38      0.04     0.02   221689     0.00     0.00  create_from_disk
>    7.69      0.05     0.01  2216897     0.00     0.00  git_bswap32
>    7.69      0.06     0.01   222661     0.00     0.00  ce_path_match
>    7.69      0.07     0.01   221769     0.00     0.00  hashmap_add
>    7.69      0.08     0.01    39941     0.00     0.00  prep_exclude
>    7.69      0.09     0.01    39940     0.00     0.00  strbuf_addch
>    7.69      0.10     0.01        1    10.00    10.00  read_one
>    7.69      0.11     0.01        1    10.00    10.00  refresh_index
>    7.69      0.12     0.01        1    10.00    10.00  tweak_fsmonitor
>    7.69      0.13     0.01                             preload_thread
>
> The index is 24M in this case, I guess it's unpacking it, but I wonder
> if this couldn't be much faster if we saved away the result of the last
> "status" in something that's quick to access, and then if nothing
> changed we just report that, and no need to re-write the index (or just
> write the "it was clean at this time" part).


Yes, reading the index is slow.  We've made some improvements (not 
computing the SHA, not validating the sort order, etc) and have one more 
in progress that will reduce the malloc() cost.  I haven't found any 
other easy optimizations but it would be great if you could find more! 
To make significant improvements, I'm afraid it will take more 
substantial changes to the in memory and on disk formats and updates to 
the code to take advantage of those changes.




> 4) core.fsmonitor=false behaves unexpectedly
>
> The code that reads this variable just treats it as a string, so we do a
> bunch of work for nothing (and nothing warns) if this is set and 'false'
> is executed. Any reason we couldn't do our standard boolean parsing
> here? You couldn't call your hook 0/1/true/false, but that doesn't seem
> like a big loss.
>
> 1. https://public-inbox.org/git/cacbzzx5a6op7dh_g9wofbnejh2zgnk4b34ygxa8dandqvit...@mail.gmail.com/
> 2. https://github.com/avar/2015-04-03-1M-git
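
As a rough sketch of what point 4 asks for, a value could be classified as "boolean first, hook path otherwise" along the following lines. Everything here is hypothetical: the enum, classify_fsmonitor_value() and the accepted spellings are assumptions for illustration, not git's actual config parsing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical three-way result for a core.fsmonitor-style value. */
enum fsmonitor_setting {
	FSMONITOR_DISABLED,
	FSMONITOR_ENABLED,
	FSMONITOR_HOOK_PATH
};

/* Case-insensitive comparison of two NUL-terminated strings. */
static int equals_ignore_case(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == '\0' && *b == '\0';
}

/* Return 1 if value matches any entry of a NULL-terminated word list. */
static int matches_any(const char *value, const char *const *words)
{
	size_t i;

	for (i = 0; words[i]; i++)
		if (equals_ignore_case(value, words[i]))
			return 1;
	return 0;
}

/*
 * Try the boolean spellings first; anything else is treated as the
 * path of a hook to run.  The word lists are an assumption made for
 * this sketch.
 */
enum fsmonitor_setting classify_fsmonitor_value(const char *value)
{
	static const char *const false_words[] = {
		"false", "no", "off", "0", "", NULL
	};
	static const char *const true_words[] = {
		"true", "yes", "on", "1", NULL
	};

	assert(value != NULL);

	if (matches_any(value, false_words))
		return FSMONITOR_DISABLED;
	if (matches_any(value, true_words))
		return FSMONITOR_ENABLED;
	return FSMONITOR_HOOK_PATH;
}
```

The cost is the one named in the quoted text: a hook literally called "0", "1", "true" or "false" could no longer be configured by that name. What a bare "true" should mean is left open here.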



I'm torn on this

Re: Some rough edges of core.fsmonitor

2018-01-30 Thread Ben Peart




On 1/27/2018 2:01 PM, Ævar Arnfjörð Bjarmason wrote:
> On Sat, Jan 27 2018, Duy Nguyen jotted:
>
>> On Sat, Jan 27, 2018 at 07:39:27PM +0700, Duy Nguyen wrote:
>>> On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmason
>>>  wrote:
>>>> a) no fsmonitor
>>>>
>>>>  $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>>>>  12:32:44.947651 read-cache.c:1890   performance: 0.053153609 s: read cache .git/index
>>>>  12:32:44.967943 preload-index.c:112 performance: 0.020161093 s: preload index
>>>>  12:32:44.974217 read-cache.c:1446   performance: 0.006230611 s: refresh index
>>>>
>>>> ...
>>>>
>>>> b) with fsmonitor
>>>>
>>>>  $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>>>>  12:34:23.833625 read-cache.c:1890   performance: 0.049485685 s: read cache .git/index
>>>>  12:34:23.838622 preload-index.c:112 performance: 0.001221197 s: preload index
>>>>  12:34:23.858723 fsmonitor.c:170 performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman'
>>>>  12:34:23.871532 read-cache.c:1446   performance: 0.032870818 s: refresh index
>>>
>>> Hmm.. why does refresh take longer with fsmonitor/watchman? With the
>>> help from watchman, we know what files are modified. We don't need
>>> manual stat()'ing and this line should be lower than the "no
>>> fsmonitor" case, which is 0.006230611s.
>>
>> Ahh.. my patch probably does not see that fsmonitor could be activated
>> lazily inside refresh_index() call. The patch below should fix it.
>
> Will have to get those numbers to you later, or alternatively clone
> https://github.com/avar/2015-04-03-1M-git (or some other test repo) and
> test it yourself, sorry. Don't have time to follow-up much this weekend.
>
>> But between your normal refresh time (0.020 preload + 0.006 actual
>> refresh) and fsmonitor taking 0.020 just to talk to watchman, this
>> repo seems "too small" for fsmonitor/watchman to shine.
>
> Surely that's an implementation limitation and not something inherent,
> given that watchman itself returns in 5ms?
>
> I.e. status could work like this, no?:
>
>   1. At start, record the timestamp & find out canonical state via some
>  expensive method.
>   2. Print out xyz changed, abc added etc.
>   3. Record *just* what status would report about xyz, abc etc.
>   4. On subsequent git status, just amend that information, e.g. if
>  watchman says nothing changed $(cat .git/last-status-output).
>
> We shouldn't need to be reading the entire index in the common case
> where just a few things change.



I agree that reading the entire index in the common case is rather 
expensive. It is, however, the model we have today and all the code in 
git assumes all cache entries are in memory.


We are interested in pursuing a patch series that would enable higher 
performance especially with large and/or sparse repos by making the 
index sparse, hierarchical, and incrementally readable/writable.  As you 
might expect, that is a lot of work and is far beyond what we can 
address in this patch series.
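
To make the cached-status idea quoted above a little more concrete, here is a toy sketch of its control flow only. All names (status_cache, cached_status(), the toy watcher) are hypothetical. It does not touch the index or talk to watchman, and it assumes the caller samples "now" before the scan starts so that changes made during the scan are not missed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the last full status run. */
struct status_cache {
	int valid;            /* has a full scan been recorded yet? */
	uint64_t recorded_at; /* timestamp sampled before that scan */
	const char *output;   /* what status reported at that time */
};

/* Callbacks standing in for the file watcher and the expensive scan. */
typedef int (*changed_since_fn)(uint64_t since, void *ctx);
typedef const char *(*full_scan_fn)(void *ctx);

/*
 * Reuse the recorded output when the watcher reports no changes since
 * it was recorded; otherwise fall back to the full scan and record
 * the new result.
 */
const char *cached_status(struct status_cache *cache, uint64_t now,
			  changed_since_fn changed_since,
			  full_scan_fn full_scan, void *ctx)
{
	assert(cache != NULL && changed_since != NULL && full_scan != NULL);

	if (cache->valid && !changed_since(cache->recorded_at, ctx))
		return cache->output; /* fast path: nothing changed */

	cache->output = full_scan(ctx);
	cache->recorded_at = now;
	cache->valid = 1;
	return cache->output;
}

/* Toy watcher used only to exercise the sketch. */
struct toy_watcher {
	uint64_t last_change; /* time of the most recent file change */
	int scans;            /* number of full scans performed */
};

int toy_changed_since(uint64_t since, void *ctx)
{
	struct toy_watcher *w = (struct toy_watcher *)ctx;
	return w->last_change > since;
}

const char *toy_full_scan(void *ctx)
{
	struct toy_watcher *w = (struct toy_watcher *)ctx;
	w->scans++;
	return "working tree clean";
}
```

The sketch covers only the "nothing changed" shortcut. Amending a stored result when a few paths did change, which is the harder part of the proposal, is not attempted.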



> There's also a lot of things that use status to just check "are we
> clean?", those would only need to record the last known timestamp when
> the tree was clean, and then ask watchman if there were any changes, if
> not we're done.
>
>> I'm still a bit curious that refresh index time, after excluding 0.020
>> for fsmonitor, is still 0.012s. What does it do? It should really be
>> doing nothing. Either way, read index time seems to be the elephant in
>> the room now.
>>
>> -- 8< --
>> diff --git a/read-cache.c b/read-cache.c
>> index eac74bc9f1..d60e0a8480 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1367,12 +1367,21 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>> unsigned int options = (CE_MATCH_REFRESH |
>> (really ? CE_MATCH_IGNORE_VALID : 0) |
>> (not_new ? CE_MATCH_IGNORE_MISSING : 0));
>> +   int ignore_fsmonitor = options & CE_MATCH_IGNORE_FSMONITOR;
>> const char *modified_fmt;
>> const char *deleted_fmt;
>> const char *typechange_fmt;
>> const char *added_fmt;
>> const char *unmerged_fmt;
>> -   uint64_t start = getnanotime();
>> +   uint64_t start;
>> +
>> +   /*
>> +* If fsmonitor is used, force its communication early to
>> +* accurately measure how long this function takes without it.
>> +*/
>> +   if (!ignore_fsmonitor)
>> +   refresh_fsmonitor(istate);
>> +   start = getnanotime();
>>
>> modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
>> deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
>> -- 8< --

Re: [PATCH 00/37] removal of some c++ keywords

2018-01-30 Thread Junio C Hamano

Duy Nguyen  writes:

> Is it simpler (though hacky) to just  do
>
> #ifdef __cplusplus
> #define new not_new
> #define try really_try
> ...
>
> somewhere in git-compat-util.h?

Very tempting, especially given that your approach automatically
would cover topics in flight without any merge conflict ;-)

I agree that it is hacky and somewhat ugly, but the hackiness
somehow does not bother me too much in this case; perhaps because
attempting to use a C++ compiler may already be hacky in the first
place?

It probably depends on the reason why we are doing this topic.  If a
report about our source code coming from the C++ oriented tool cite
the symbol names seen by machines, then the "hacky" approach will
give us "not_new" where Brandon's patch may give us "new_oid", or
whatever symbol that is more appropriate for the context it appears
than such an automated cute name.

> Do we use any C features that are incompatible with C++? (or do we not
> need to care?)

Good question.

Re: recommendations for log enhancement

2018-01-30 Thread Stefan Beller

On Sun, Jan 28, 2018 at 7:46 AM, 牛旭  wrote:
> Our team studies the consistent edits of git during evolution. And we find 
> several missed edits in the latest release of git. For example, there are two 
> consistent edits we have figured out from historical commits:

Thanks for studying the code of Git. It will help the project in
bettering the code.
Welcome to the Git community!
Which version do you mean by "latest release" ?


> 1) . Version: git 2.3.9 – git-2.3.10
>File: builtin/merge-tree.c
>
> dst.size = size;
> -   xdi_diff(, , , , );
> +   if (xdi_diff(, , , , ))
> +   die("unable to generate diff");
> free(src.ptr);
> free(dst.ptr);
>  }
>
> 2) .Version: git 2.3.9 – git-2.3.10
>   File: combine-diff.c
>
> -   xdi_diff_outf(_file, result_file, consume_line, ,
> - , );
> +   if (xdi_diff_outf(_file, result_file, consume_line, ,
> + , ))
> +   die("unable to generate combined diff for %s",
> +   sha1_to_hex(parent));
> free(parent_file.ptr);
>
> Those two commits both add if structure and log messages for handling the 
> return value of xdi_diff_outf().
> And in the latest release, we find one candidate that may also need log 
> statements inserted:
> 1)  File: git-2.14.2/builtin/rerere.c
>
>  1  static int diff_two(const char *file1, const char *label1,
>  2  const char *file2, const char *label2)
>  3  {
> ….
> 20  ret = xdi_diff(, , , , );
> 21
> 22  free(minus.ptr);
> 23  free(plus.ptr);
> 24  return ret;
> 25  }
> ...
> }
>
> There are more examples of consistent update and corresponding suggestions in 
> attachment. It is so nice of you to read them and share me with your opinion 
> on the correctness of our suggestion. Thanks a lot.

Thanks a lot for this suggestion and the suggestions in the attachment.

However these are less than optimal to consume for the project.
Care to make these changes as actual commits in your local repository
and then send patches?

See Documentation/SubmittingPatches
https://github.com/git/git/blob/master/Documentation/SubmittingPatches

Thanks,
Stefan

Re: 2.15.1 - merge with submodule output issue

2018-01-30 Thread Stefan Beller

On Tue, Jan 30, 2018 at 7:42 AM, FIGADERE, LAURENT
 wrote:
> Dear git,
>
> I have been using git 2.15.1 for a few weeks now.
>
> I ran a few trials; please find my output below.
>
> To reproduce:
> Use a top module git which include a submodule
> First step: from a work area, I changed selected version of submodule in 
> master branch.
> Then git add + git commit + git push
>    A new commit on master branch has been published on my origin 
> repository with the version v1.2 of submodule
>
> Second step: in my second workarea, I created a user branch mybranch, then 
> selected another release of submodule
> I added the update and then commit in mybranch.
>    A new commit with release v2.0 of submodule is in my last SHA1 of 
> mybranch
>
> Last step: in the second workarea, in mybranch, I first run ‘git fetch’ and 
> then ‘git merge origin/master’
> I got a CONFLICT message of course due to the 2 different versions of 
> submodule.
> Here the message:
> warning: Failed to merge submodule submodule (commits don't follow merge-base)
> Auto-merging submodule
> CONFLICT (submodule): Merge conflict in submodule
> Automatic merge failed; fix conflicts and then commit the result.
>
> Now I thought that the command ‘git submodule’ provided an output with both 
> versions of modules (local and remote).
> But this is not the case in my environment:
> [15:20:10] $ git submodule status
> U submodule
>
> And when I run the mergetool command I have this output:
> [14:54:41] $ git mergetool
> Merging:
> submodule
> Submodule merge conflict for 'submodule':
>   {local}: submodule commit 08f86f2404d9f8f616262971a3127e69f39f9d11
>   {remote}: submodule commit b3dd6fde4f02258b88ad0b2dba6474c126b499f7
> Use (l)ocal or (r)emote, or (a)bort?
>
> So this output is not useful for determining which version should be selected.
> Is this a bug, or am I doing something wrong?

It's not a bug, but the real feature has not been implemented yet.

Re: [PATCH v5 5/7] convert: add 'working-tree-encoding' attribute

2018-01-30 Thread Junio C Hamano

Lars Schneider  writes:

>> On 30 Jan 2018, at 21:05, Junio C Hamano  wrote:
>> 
>> tbo...@web.de writes:
>> 
>>> +   if ((conv_flags & CONV_WRITE_OBJECT) && !strcmp(enc->name, 
>>> "SHIFT-JIS")) {
>>> +   char *re_src;
>>> +   int re_src_len;
>> 
>> I think it is a bad idea to 
>> 
>> (1) not check without CONV_WRITE_OBJECT here.
>
> The idea is to perform the roundtrip check *only* if we 
> actually write to Git. In all other cases we don't care
> if the encoding roundtrips.
>
> "git checkout" is such a case where we don't care as 
> noted by Peff here:
> https://public-inbox.org/git/20171215095838.ga3...@sigill.intra.peff.net/
>
> Do you agree?

I am not sure why this is special cased and other codepaths have "if
WRITE_OBJECT then die, otherwise error" checks, so no, I do not
agree with your reasoning, at least not yet.

Re: git add --all does not respect submodule.<name>.ignore

2018-01-30 Thread Stefan Beller

On Tue, Jan 30, 2018 at 12:25 PM, Michael Scott-Nelson
 wrote:
> Giving a submodule "ignore=all" or "ignore=dirty" in .gitmodules
> successfully removes that module from `git status` queries. However,
> these same diffs are automatically added by git-add, eg `git add .` or
> `git add --all` adds the submodules that we want ignored. This seems
> inconsistent and confusing.

My prime suspect for this change would be
https://github.com/git/git/commit/5556808690ea245708fb80383be5c1afee2fb3eb

> Workflow reason:
> We want to be able to have supers and subs checked-out, make changes
> to both, but only update the SHA-1 pointer from super to sub when
> explicitly forced to do so, eg `git add -f subName`. This workflow
> prevents continuous merge conflicts from clashing SHA-1 pointers while
> still allowing `git add --all`, allowing a sort of middle ground
> between submodules and an untracked library.

For that you want to set the ignore flag locally in .git/config instead of
.gitmodules. The .gitmodules seems like a convenient place to "share
submodule config", but that is a slippery slope and I think that was a
mistake by the project.

(If you control the environment, you could also put the ignore flags into
the system wide config)

> Teaching git-add about submodule.ignore and/or teaching .gitignore
> about submodules would be awesome.

I wonder if we can address this issue with even more config options.
(sounds bad, but is the first thought I have)

> Also experimented with `git update-index --assume-unchanged subName`,
> but I believe that it does not get committed and besides also does not
> seem to have a way to `git add -f`.

The assume-unchanged bit is a performance optimisation for power users,
but its documentation words it in a less dangerous way, such that it sounds
as if it is a UX feature instead of a performance thing. I'd stay away from
that for now.

Stefan

Re: [PATCH v5 4/7] utf8: add function to detect a missing UTF-16/32 BOM

2018-01-30 Thread Junio C Hamano

Lars Schneider  writes:

> "false". Therefore, "is_missing_required_utf_bom()" might be 
> lengthy but should fit.

Thanks, that sounds a lot more understandable than the original ;-)

Re: [PATCH v2 00/14] Serialized Git Commit Graph

2018-01-30 Thread Stefan Beller

On Tue, Jan 30, 2018 at 1:39 PM, Derrick Stolee  wrote:
> Thanks to everyone who gave comments on v1. I tried my best to respond to
> all of the feedback, but may have missed some while I was doing several
> renames, including:
>
> * builtin/graph.c -> builtin/commit-graph.c
> * packed-graph.[c|h] -> commit-graph.[c|h]
> * t/t5319-graph.sh -> t/t5318-commit-graph.sh
>
> Because of these renames (and several type/function renames) the diff
> is too large to conveniently share here.
>
> Some issues that came up and are addressed:
>
> * Use  instead of  when referring to the graph-.graph
>   filenames and the contents of graph-head.
> * 32-bit timestamps will not cause undefined behavior.
> * timestamp_t is unsigned, so they are never negative.
> * The config setting "core.commitgraph" now only controls consuming the
>   graph during normal operations and will not block the commit-graph
>   plumbing command.
> * The --stdin-commits is better about sanitizing the input for strings
>   that do not parse to OIDs or are OIDs for non-commit objects.
>
> One unresolved comment that I would like consensus on is the use of
> globals to store the config setting and the graph state. I'm currently
> using the pattern from packed_git instead of putting these values in
> the_repository. However, we want to eventually remove globals like
> packed_git. Should I deviate from the pattern _now_ in order to keep
> the problem from growing, or should I keep to the known pattern?

I have a series doing the conversion in
https://github.com/stefanbeller/git/tree/object-store
that is based on 2.16.

While the commits are structured for easy review (to not miss any of
the globals that that series is based upon), I did not come up with a
good strategy for handling in-flight series that add more globals.

So I think for now you'd want to keep it as global vars, such that
it is consistent with the code base and then we'll figure out how to
do the conversion one step at a time.

Please do not feel stopped or hindered by my slow pace of working
through that series. Maybe I'll have to come up with another approach
that is better for upstream: rebasing that series is a pain, as upstream
moves rather quickly, so maybe I'll have to send it in smaller chunks.

> Finally, I tried to clean up my incorrect style as I was recreating
> these commits. Feel free to be merciless in style feedback now that the
> architecture is more stable.

ok, will do.

Thanks,
Stefan

[PATCH v2 04/14] commit-graph: implement construct_commit_graph()

2018-01-30 Thread Derrick Stolee

Teach Git to write a commit graph file by checking all packed objects
to see if they are commits, then store the file in the given pack
directory.

Signed-off-by: Derrick Stolee 
---
 Makefile   |   1 +
 commit-graph.c | 376 +
 commit-graph.h |  20 +++
 3 files changed, 397 insertions(+)
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h

diff --git a/Makefile b/Makefile
index aee5d3f7b9..894432b35b 100644
--- a/Makefile
+++ b/Makefile
@@ -773,6 +773,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 00..db2b7390c7
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,376 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "commit-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 20
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_LARGE_EDGES_NEEDED 0x80000000
+#define GRAPH_PARENT_MISSING 0x7fffffff
+#define GRAPH_EDGE_LAST_MASK 0x7fffffff
+#define GRAPH_PARENT_NONE 0x70000000
+
+#define GRAPH_LAST_EDGE 0x80000000
+
+#define GRAPH_FANOUT_SIZE (4*256)
+#define GRAPH_CHUNKLOOKUP_SIZE (5 * 12)
+#define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
+   GRAPH_OID_LEN + sizeof(struct commit_graph_header))
+
+char* get_commit_graph_filename_hash(const char *pack_dir,
+struct object_id *hash)
+{
+   size_t len;
+   struct strbuf head_path = STRBUF_INIT;
+   strbuf_addstr(&head_path, pack_dir);
+   strbuf_addstr(&head_path, "/graph-");
+   strbuf_addstr(&head_path, oid_to_hex(hash));
+   strbuf_addstr(&head_path, ".graph");
+
+   return strbuf_detach(&head_path, &len);
+}
+
+static void write_graph_chunk_fanout(struct sha1file *f,
+struct commit **commits,
+int nr_commits)
+{
+   uint32_t i, count = 0;
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+
+   /*
+* Write the first-level table (the list is sorted,
+* but we use a 256-entry lookup to be able to avoid
+* having to do eight extra binary search iterations).
+*/
+   for (i = 0; i < 256; i++) {
+   uint32_t swap_count;
+
+   while (list < last) {
+   if ((*list)->object.oid.hash[0] != i)
+   break;
+   count++;
+   list++;
+   }
+
+   swap_count = htonl(count);
+   sha1write(f, &swap_count, 4);
+   }
+}
+
+static void write_graph_chunk_oids(struct sha1file *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list, **last = commits + nr_commits;
+   for (list = commits; list < last; list++)
+   sha1write(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static int commit_pos(struct commit **commits, int nr_commits,
+ const struct object_id *oid, uint32_t *pos)
+{
+   uint32_t first = 0, last = nr_commits;
+
+   while (first < last) {
+   uint32_t mid = first + (last - first) / 2;
+   struct object_id *current;
+   int cmp;
+
+   current = &(commits[mid]->object.oid);
+   cmp = oidcmp(oid, current);
+   if (!cmp) {
+   *pos = mid;
+   return 1;
+   }
+   if (cmp > 0) {
+   first = mid + 1;
+   continue;
+   }
+   last = mid;
+   }
+
+   *pos = first;
+   return 0;
+}
+
+static void write_graph_chunk_data(struct sha1file *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+   uint32_t num_large_edges = 0;
+
+   while (list < last) {
+   struct commit_list *parent;
+   uint32_t int_id, swap_int_id;
+   uint32_t packedDate[2];
+
+   parse_commit(*list);
+   sha1write(f,
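
The comment in write_graph_chunk_fanout() above says that the 256-entry table saves eight binary search iterations (256 = 2^8). The following is a minimal sketch of that idea, not code from the patch: `build_fanout()` and `fanout_range()` are hypothetical names, the input is reduced to the first byte of each sorted ID, and the counts stay in host byte order instead of being passed through htonl() as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FANOUT_ENTRIES 256

/*
 * Build a cumulative fanout table from the first bytes of a sorted list
 * of IDs: fanout[b] is the number of IDs whose first byte is <= b.
 * (Hypothetical helper, for illustration only.)
 */
static void build_fanout(const unsigned char *first_bytes, size_t nr,
			 uint32_t fanout[FANOUT_ENTRIES])
{
	size_t pos = 0;
	int b;

	for (b = 0; b < FANOUT_ENTRIES; b++) {
		if (pos < nr)
			assert(first_bytes[pos] >= b); /* input must be sorted */
		while (pos < nr && first_bytes[pos] == b)
			pos++;
		fanout[b] = (uint32_t)pos;
	}
}

/*
 * Narrow the search for an ID whose first byte is `byte` to the
 * half-open position range [*lo, *hi). A binary search then only has
 * to look inside that range.
 */
static void fanout_range(const uint32_t fanout[FANOUT_ENTRIES],
			 unsigned char byte, uint32_t *lo, uint32_t *hi)
{
	*lo = byte ? fanout[byte - 1] : 0;
	*hi = fanout[byte];
}
```

A reader of the table would call fanout_range() with the first byte of the wanted OID and then run a binary search such as commit_pos() over the returned range only; an empty range (lo == hi) means no stored ID starts with that byte.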

[PATCH v2 10/14] commit-graph: add core.commitgraph setting

2018-01-30 Thread Derrick Stolee

The commit graph feature is controlled by the new core.commitgraph config
setting. This defaults to 0, so the feature is opt-in.

The intention of core.commitgraph is that a user can always stop checking
for or parsing commit graph files if core.commitgraph=0.

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt | 3 +++
 cache.h  | 1 +
 config.c | 5 +
 environment.c| 1 +
 4 files changed, 10 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92b..5b63559a2b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,9 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.commitgraph::
+   Enable git commit graph feature. Allows reading from .graph files.
+
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index d8b975a571..e50e447a4f 100644
--- a/cache.h
+++ b/cache.h
@@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
+extern int core_commitgraph;
 extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
diff --git a/config.c b/config.c
index e617c2018d..99153fcfdb 100644
--- a/config.c
+++ b/config.c
@@ -1223,6 +1223,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.commitgraph")) {
+   core_commitgraph = git_config_bool(var, value);
+   return 0;
+   }
+
if (!strcmp(var, "core.sparsecheckout")) {
core_apply_sparse_checkout = git_config_bool(var, value);
return 0;
diff --git a/environment.c b/environment.c
index 63ac38a46f..faa4323cc5 100644
--- a/environment.c
+++ b/environment.c
@@ -61,6 +61,7 @@ enum object_creation_mode object_creation_mode = 
OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
+int core_commitgraph;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
-- 
2.16.0.15.g9c3cf44.dirty

[PATCH v2 02/14] graph: add commit graph design document

2018-01-30 Thread Derrick Stolee

Add Documentation/technical/commit-graph.txt with details of the planned
commit graph feature, including future plans.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 189 +++
 1 file changed, 189 insertions(+)
 create mode 100644 Documentation/technical/commit-graph.txt

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
new file mode 100644
index 00..cbf88f7264
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,189 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'git show --remerge-diff' and can take minutes to compute depending on
+history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to avoid topological order mistakes.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitgraph'
+config setting, then the existing ODB is sufficient. The file is stored
+next to packfiles either in the .git/objects/pack directory or in the pack
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in lexi-
+cographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+If A and B are commits with generation numbers N and M, respectively,
+and N <= M, then A cannot reach B. That is, we know without searching
+that B is not an ancestor of A because it is further from a root commit
+than A.
+
+Conversely, when checking if A is an ancestor of B, then we only need
+to walk commits until all commits on the walk boundary have generation
+number at most N. If we walk commits using a priority queue seeded by
+generation numbers, then we always expand the boundary commit with highest
+generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+If A and B are commits with commit time X and Y, respectively, and
+X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation can make
+mistakes with topological orders (such as "git log" with default order),
+but is not used when the topological order is required (such as merge
+base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--------------
+
+- A graph file is stored in a file named 'graph-.graph' in the pack
+  directory. This could be stored in an alternate.
+
+- The most-recent graph file hash is stored in a 'graph-head' file for
+  immediate access and storing backup graphs. This could be stored in an
+  alternate, and refers to a 'graph-.graph' file in the same pack
+  directory.
+
+- The core.commitgraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object id length and hash
+  algorithm, so a future change of hash algorithm does not require a change
+  in format.
+
+Current Limitations
+-------------------
+
+- Only one graph file is used at one time. This allows the integer position
+  to seek into the single graph file. It is possible to extend the model
+  for multiple graph files, but that is currently not
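
The recursive generation-number definition in the design document above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the patch series: `struct toy_commit`, `compute_generations()`, and `cannot_reach()` are hypothetical names, each commit has at most two parents, and the array is assumed to be ordered so that every parent appears before its children.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 2
#define NO_PARENT UINT32_MAX

/* Hypothetical, simplified commit record: parents are array positions. */
struct toy_commit {
	uint32_t parents[MAX_PARENTS]; /* NO_PARENT marks an unused slot */
	uint32_t generation;           /* filled in by compute_generations() */
};

/*
 * Assign generation numbers: a root commit gets 1, any other commit
 * gets one more than the largest generation number among its parents.
 */
static void compute_generations(struct toy_commit *commits, size_t nr)
{
	size_t i;
	int p;

	for (i = 0; i < nr; i++) {
		uint32_t max_parent_gen = 0;

		for (p = 0; p < MAX_PARENTS; p++) {
			uint32_t parent = commits[i].parents[p];

			if (parent == NO_PARENT)
				continue;
			assert(parent < i); /* precondition: parents come first */
			if (commits[parent].generation > max_parent_gen)
				max_parent_gen = commits[parent].generation;
		}
		/* With no parents, max_parent_gen stays 0, giving 1. */
		commits[i].generation = max_parent_gen + 1;
	}
}

/*
 * Return 1 if generation numbers alone prove that `from` cannot reach
 * `to` (gen(from) <= gen(to) for distinct commits); 0 means a graph
 * walk is still required to decide.
 */
static int cannot_reach(const struct toy_commit *commits,
			size_t from, size_t to)
{
	return from != to &&
	       commits[from].generation <= commits[to].generation;
}
```

For a diamond-shaped history (a root, two children of the root, and a merge of both children) this yields generations 1, 2, 2, and 3. The two middle commits share generation 2, so cannot_reach() rules out reachability between them without walking any parents.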

[PATCH v2 12/14] commit-graph: read only from specific pack-indexes

2018-01-30 Thread Derrick Stolee

Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively, since we add all commits stored in a
previous commit graph.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 13 +
 builtin/commit-graph.c | 25 ++---
 commit-graph.c | 25 +++--
 commit-graph.h |  4 +++-
 packfile.c |  4 ++--
 packfile.h |  2 ++
 t/t5318-commit-graph.sh|  6 --
 7 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 7b376e9212..d0571cd896 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -43,6 +43,11 @@ OPTIONS
When used with --write and --update-head, delete the graph file
previously referenced by graph-head.
 
+--stdin-packs::
+   When used with --write, generate the new graph by walking objects
+   only in the specified packfiles and any commits in the
+   existing graph-head.
+
 EXAMPLES
 
 
@@ -65,6 +70,14 @@ $ git commit-graph --write
 $ git commit-graph --write --update-head --delete-expired
 
 
+* Write a graph file, extending the current graph file using commits
+* in , update graph-head, and delete the old graph-.graph
+* file.
++
+
+$ echo  | git commit-graph --write --update-head --delete-expired 
--stdin-packs
+
+
 * Read basic information from a graph file.
 +
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 766f09e6fc..80a409e784 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--pack-dir ]"),
N_("git commit-graph --clear [--pack-dir ]"),
N_("git commit-graph --read [--graph-hash=]"),
-   N_("git commit-graph --write [--pack-dir ] [--update-head] 
[--delete-expired]"),
+   N_("git commit-graph --write [--pack-dir ] [--update-head] 
[--delete-expired] [--stdin-packs]"),
NULL
 };
 
@@ -24,6 +24,7 @@ static struct opts_commit_graph {
int write;
int update_head;
int delete_expired;
+   int stdin_packs;
int has_existing;
struct object_id old_graph_hash;
 } opts;
@@ -114,7 +115,24 @@ static void update_head_file(const char *pack_dir, const 
struct object_id *graph
 
 static int graph_write(void)
 {
-   struct object_id *graph_hash = construct_commit_graph(opts.pack_dir);
+   struct object_id *graph_hash;
+   char **pack_indexes = NULL;
+   int num_packs = 0;
+   int size_packs = 0;
+
+   if (opts.stdin_packs) {
+   struct strbuf buf = STRBUF_INIT;
+   size_packs = 128;
+   ALLOC_ARRAY(pack_indexes, size_packs);
+
+   while (strbuf_getline(&buf, stdin) != EOF) {
+   ALLOC_GROW(pack_indexes, num_packs + 1, size_packs);
+   pack_indexes[num_packs++] = buf.buf;
+   strbuf_detach(&buf, NULL);
+   }
+   }
+
+   graph_hash = construct_commit_graph(opts.pack_dir, pack_indexes, 
num_packs);
 
if (opts.update_head)
update_head_file(opts.pack_dir, graph_hash);
@@ -122,7 +140,6 @@ static int graph_write(void)
if (graph_hash)
printf("%s\n", oid_to_hex(graph_hash));
 
-
if (opts.delete_expired && opts.update_head && opts.has_existing &&
    oidcmp(graph_hash, &opts.old_graph_hash)) {
char *old_path = get_commit_graph_filename_hash(opts.pack_dir,
@@ -153,6 +170,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
N_("update graph-head to written graph file")),
 OPT_BOOL('d', "delete-expired", &opts.delete_expired,
N_("delete expired head graph file")),
+   OPT_BOOL('s', "stdin-packs", &opts.stdin_packs,
+   N_("only scan packfiles listed by stdin")),
 { OPTION_STRING, 'H', "graph-hash", &opts.graph_hash,
N_("hash"),
N_("A hash for a specific graph file in the pack-dir."),
diff --git a/commit-graph.c b/commit-graph.c
index fc816533c6..e5a1d9ee8b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -638,7 +638,9 @@ static int if_packed_commit_add_to_list(const struct 
object_id *oid,
return 0;
 }
 
-struct object_id *construct_commit_graph(const char *pack_dir)
+struct object_id *construct_commit_graph(const char *pack_dir,
+char **pack_indexes,
+

[PATCH v2 09/14] commit-graph: teach git-commit-graph --delete-expired

2018-01-30 Thread Derrick Stolee

Teach git-commit-graph to delete the graph previously referenced by 'graph_head'
when writing a new graph file and updating 'graph_head'. This prevents
data creep from accumulating a list of useless graphs. Be careful not to
delete the graph if the file did not change.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  8 +++--
 builtin/commit-graph.c | 16 -
 t/t5318-commit-graph.sh| 66 +-
 3 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 33d6567f11..7b376e9212 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -39,6 +39,10 @@ OPTIONS
When used with --write, update the graph-head file to point to
the written graph file.
 
+--delete-expired::
+   When used with --write and --update-head, delete the graph file
+   previously referenced by graph-head.
+
 EXAMPLES
 
 
@@ -55,10 +59,10 @@ $ git commit-graph --write
 
 
 * Write a graph file for the packed commits in your local .git folder,
-* and update graph-head.
+* update graph-head, and delete the old graph-.graph file.
 +
 
-$ git commit-graph --write --update-head
+$ git commit-graph --write --update-head --delete-expired
 
 
 * Read basic information from a graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 4970dec133..766f09e6fc 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--pack-dir ]"),
N_("git commit-graph --clear [--pack-dir ]"),
N_("git commit-graph --read [--graph-hash=]"),
-   N_("git commit-graph --write [--pack-dir ] [--update-head]"),
+   N_("git commit-graph --write [--pack-dir ] [--update-head] 
[--delete-expired]"),
NULL
 };
 
@@ -23,6 +23,7 @@ static struct opts_commit_graph {
const char *graph_hash;
int write;
int update_head;
+   int delete_expired;
int has_existing;
struct object_id old_graph_hash;
 } opts;
@@ -121,6 +122,17 @@ static int graph_write(void)
if (graph_hash)
printf("%s\n", oid_to_hex(graph_hash));
 
+
+   if (opts.delete_expired && opts.update_head && opts.has_existing &&
+   oidcmp(graph_hash, &opts.old_graph_hash)) {
+   char *old_path = get_commit_graph_filename_hash(opts.pack_dir,
+   
&opts.old_graph_hash);
+   if (remove_path(old_path))
+   die("failed to remove path %s", old_path);
+
+   free(old_path);
+   }
+
free(graph_hash);
return 0;
 }
@@ -139,6 +151,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
N_("write commit graph file")),
 OPT_BOOL('u', "update-head", &opts.update_head,
N_("update graph-head to written graph file")),
+   OPT_BOOL('d', "delete-expired", &opts.delete_expired,
+   N_("delete expired head graph file")),
 { OPTION_STRING, 'H', "graph-hash", &opts.graph_hash,
N_("hash"),
N_("A hash for a specific graph file in the pack-dir."),
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 6e3b62b754..b56a6d4217 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -101,9 +101,73 @@ test_expect_success 'write graph with merges' \
  _graph_read_expect "18" "${packdir}" &&
  cmp expect output'
 
+test_expect_success 'Add more commits' \
+'for i in $(test_seq 16 20)
+ do
+echo $i >$i.txt &&
+git add $i.txt &&
+git commit -m "commit $i" &&
+git branch commits/$i
+ done &&
+ git repack'
+
+# Current graph structure:
+#
+#  20
+#   |
+#  19
+#   |
+#  18
+#   |
+#  17
+#   |
+#  16
+#   |
+#  M3
+# / |\_
+#/ 10  15
+#   /   |  |
+#  /9 M2   14
+# | |/  \  |
+# | 8 M1 | 13
+# | |/ | \_|
+# 5 7  |   12
+# | |   \__|
+# 4 6  11
+# |/__/
+# 3
+# |
+# 2
+# |
+# 1
+
+test_expect_success 'write graph with merges' \
+'graph3=$(git commit-graph --write --update-head --delete-expired) &&
+ test_path_is_file ${packdir}/graph-${graph3}.graph &&
+ test_path_is_missing ${packdir}/graph-${graph2}.graph &&
+ test_path_is_file ${packdir}/graph-${graph1}.graph &&
+ test_path_is_file ${packdir}/graph-head &&
+ echo ${graph3} >expect &&
+ cmp -n 40 expect ${packdir}/graph-head &&
+ git commit-graph --read --graph-hash=${graph3} >output &&
+ _graph_read_expect "23"

[PATCH v2 05/14] commit-graph: implement git-commit-graph --write

2018-01-30 Thread Derrick Stolee

Teach git-commit-graph to write graph files. Create new test script to verify
this command succeeds without failure.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 18 +++
 builtin/commit-graph.c | 30 
 t/t5318-commit-graph.sh| 96 ++
 3 files changed, 144 insertions(+)
 create mode 100755 t/t5318-commit-graph.sh

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index c8ea548dfb..3f3790d9a8 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -5,3 +5,21 @@ NAME
 
 git-commit-graph - Write and verify Git commit graphs (.graph files)
 
+
+SYNOPSIS
+
+[verse]
+'git commit-graph' --write  [--pack-dir ]
+
+EXAMPLES
+
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+
+$ git commit-graph --write
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 2104550d25..7affd512f1 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -6,22 +6,38 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "parse-options.h"
+#include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--pack-dir ]"),
+   N_("git commit-graph --write [--pack-dir ]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *pack_dir;
+   int write;
 } opts;
 
+static int graph_write(void)
+{
+   struct object_id *graph_hash = construct_commit_graph(opts.pack_dir);
+
+   if (graph_hash)
+   printf("%s\n", oid_to_hex(graph_hash));
+
+   free(graph_hash);
+   return 0;
+}
+
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
static struct option builtin_commit_graph_options[] = {
 { OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
N_("dir"),
N_("The pack directory to store the graph") },
+   OPT_BOOL('w', "write", &opts.write,
+   N_("write commit graph file")),
OPT_END(),
};
 
@@ -29,5 +45,19 @@ int cmd_commit_graph(int argc, const char **argv, const char 
*prefix)
usage_with_options(builtin_commit_graph_usage,
   builtin_commit_graph_options);
 
+   argc = parse_options(argc, argv, prefix,
+builtin_commit_graph_options,
+builtin_commit_graph_usage, 0);
+
+   if (!opts.pack_dir) {
+   struct strbuf path = STRBUF_INIT;
+   strbuf_addstr(&path, get_object_directory());
+   strbuf_addstr(&path, "/pack");
+   opts.pack_dir = strbuf_detach(&path, NULL);
+   }
+
+   if (opts.write)
+   return graph_write();
+
return 0;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755
index 00..6bcd1cc264
--- /dev/null
+++ b/t/t5318-commit-graph.sh
@@ -0,0 +1,96 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' \
+'rm -rf .git &&
+ mkdir full &&
+ cd full &&
+ git init &&
+ git config core.commitgraph true &&
+ git config pack.threads 1 &&
+ packdir=".git/objects/pack"'
+
+test_expect_success 'write graph with no packs' \
+'git commit-graph --write --pack-dir .'
+
+test_expect_success 'create commits and repack' \
+'for i in $(test_seq 5)
+ do
+echo $i >$i.txt &&
+git add $i.txt &&
+git commit -m "commit $i" &&
+git branch commits/$i
+ done &&
+ git repack'
+
+test_expect_success 'write graph' \
+'graph1=$(git commit-graph --write) &&
+ test_path_is_file ${packdir}/graph-${graph1}.graph'
+
+test_expect_success 'Add more commits' \
+'git reset --hard commits/3 &&
+ for i in $(test_seq 6 10)
+ do
+echo $i >$i.txt &&
+git add $i.txt &&
+git commit -m "commit $i" &&
+git branch commits/$i
+ done &&
+ git reset --hard commits/3 &&
+ for i in $(test_seq 11 15)
+ do
+echo $i >$i.txt &&
+git add $i.txt &&
+git commit -m "commit $i" &&
+git branch commits/$i
+ done &&
+ git reset --hard commits/7 &&
+ git merge commits/11 &&
+ git branch merge/1 &&
+ git reset --hard commits/8 &&
+ git merge commits/12 &&
+ git branch merge/2 &&
+ git reset --hard commits/5 &&
+ git merge commits/10 commits/15 &&
+ git branch merge/3 &&
+ git repack'
+
+# Current graph structure:
+#
+#  M3
+# / |\_
+#/ 10  15
+#   /   |  |
+#  /9 M2   14
+# | |/  \  |
+# | 8 M1 | 13
+# | |/ | \_|
+# 5 7  |   12
+# | |   \__|

[PATCH v2 07/14] commit-graph: implement git-commit-graph --update-head

2018-01-30 Thread Derrick Stolee

It is possible to have multiple commit graph files in a pack directory,
but only one is important at a time. Use a 'graph_head' file to point
to the important file. Teach git-commit-graph to write 'graph_head' upon
writing a new commit graph file.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 34 ++
 builtin/commit-graph.c | 38 +++---
 commit-graph.c | 25 +
 commit-graph.h |  2 ++
 t/t5318-commit-graph.sh| 12 ++--
 5 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 09aeaf6c82..99ced16ddc 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -12,15 +12,49 @@ SYNOPSIS
 'git commit-graph' --write  [--pack-dir ]
 'git commit-graph' --read  [--pack-dir ]
 
+OPTIONS
+---
+--pack-dir::
+   Use given directory for the location of packfiles, graph-head,
+   and graph files.
+
+--read::
+   Read a graph file given by the graph-head file and output basic
+   details about the graph file. (Cannot be combined with --write.)
+
+--graph-id::
+   When used with --read, consider the graph file graph-.graph.
+
+--write::
+   Write a new graph file to the pack directory. (Cannot be combined
+   with --read.)
+
+--update-head::
+   When used with --write, update the graph-head file to point to
+   the written graph file.
+
 EXAMPLES
 
 
+* Output the hash of the graph file pointed to by /graph-head.
++
+
+$ git commit-graph --pack-dir=
+
+
 * Write a commit graph file for the packed commits in your local .git folder.
 +
 
 $ git commit-graph --write
 
 
+* Write a graph file for the packed commits in your local .git folder,
+* and update graph-head.
++
+
+$ git commit-graph --write --update-head
+
+
 * Read basic information from a graph file.
 +
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 218740b1f8..d73cbc907d 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -11,7 +11,7 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--pack-dir ]"),
N_("git commit-graph --read [--graph-hash=]"),
-   N_("git commit-graph --write [--pack-dir ]"),
+   N_("git commit-graph --write [--pack-dir ] [--update-head]"),
NULL
 };
 
@@ -20,6 +20,9 @@ static struct opts_commit_graph {
int read;
const char *graph_hash;
int write;
+   int update_head;
+   int has_existing;
+   struct object_id old_graph_hash;
 } opts;
 
 static int graph_read(void)
@@ -30,8 +33,8 @@ static int graph_read(void)
 
if (opts.graph_hash && strlen(opts.graph_hash) == GIT_MAX_HEXSZ)
get_oid_hex(opts.graph_hash, &graph_hash);
-   else
-   die("no graph hash specified");
+   else if (!get_graph_head_hash(opts.pack_dir, &graph_hash))
+   die("no graph-head exists");
 
graph_file = get_commit_graph_filename_hash(opts.pack_dir, &graph_hash);
graph = load_commit_graph_one(graph_file, opts.pack_dir);
@@ -62,10 +65,33 @@ static int graph_read(void)
return 0;
 }
 
+static void update_head_file(const char *pack_dir, const struct object_id *graph_hash)
+{
+   struct strbuf head_path = STRBUF_INIT;
+   int fd;
+   struct lock_file lk = LOCK_INIT;
+
+   strbuf_addstr(&head_path, pack_dir);
+   strbuf_addstr(&head_path, "/");
+   strbuf_addstr(&head_path, "graph-head");
+
+   fd = hold_lock_file_for_update(&lk, head_path.buf, LOCK_DIE_ON_ERROR);
+   strbuf_release(&head_path);
+
+   if (fd < 0)
+   die_errno("unable to open graph-head");
+
+   write_in_full(fd, oid_to_hex(graph_hash), GIT_MAX_HEXSZ);
+   commit_lock_file(&lk);
+}
+
 static int graph_write(void)
 {
struct object_id *graph_hash = construct_commit_graph(opts.pack_dir);
 
+   if (opts.update_head)
+   update_head_file(opts.pack_dir, graph_hash);
+
if (graph_hash)
printf("%s\n", oid_to_hex(graph_hash));
 
@@ -83,6 +109,8 @@ int cmd_commit_graph(int argc, const char **argv, const char 
*prefix)
N_("read graph file")),
OPT_BOOL('w', "write", &opts.write,
N_("write commit graph file")),
+   OPT_BOOL('u', "update-head", &opts.update_head,
+   N_("update graph-head to written graph file")),
{ OPTION_STRING, 'H', "graph-hash", &opts.graph_hash,

[PATCH v2 13/14] commit-graph: close under reachability

2018-01-30 Thread Derrick Stolee

Teach construct_commit_graph() to walk all parents from the commits
discovered in packfiles. This prevents gaps given by loose objects or
previously-missed packfiles.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 26 ++
 t/t5318-commit-graph.sh | 14 ++
 2 files changed, 40 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index e5a1d9ee8b..cfa0415a21 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -5,6 +5,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "revision.h"
 #include "commit-graph.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
@@ -638,6 +639,29 @@ static int if_packed_commit_add_to_list(const struct 
object_id *oid,
return 0;
 }
 
+static void close_reachable(struct packed_oid_list *oids)
+{
+   int i;
+   struct rev_info revs;
+   struct commit *commit;
+   init_revisions(&revs, NULL);
+
+   for (i = 0; i < oids->num; i++) {
+   commit = lookup_commit(oids->list[i]);
+   if (commit && !parse_commit(commit))
+   revs.commits = commit_list_insert(commit, &revs.commits);
+   }
+
+   if (prepare_revision_walk(&revs))
+   die(_("revision walk setup failed"));
+
+   while ((commit = get_revision(&revs)) != NULL) {
+   ALLOC_GROW(oids->list, oids->num + 1, oids->size);
+   oids->list[oids->num] = &(commit->object.oid);
+   (oids->num)++;
+   }
+}
+
 struct object_id *construct_commit_graph(const char *pack_dir,
 char **pack_indexes,
 int nr_packs)
@@ -696,6 +720,8 @@ struct object_id *construct_commit_graph(const char 
*pack_dir,
} else {
for_each_packed_object(if_packed_commit_add_to_list, &oids, 0);
}
+
+   close_reachable(&oids);
QSORT(oids.list, oids.num, commit_compare);
 
count_distinct = 1;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b9a73f398c..2001b0b5b5 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -213,6 +213,20 @@ test_expect_success 'clear graph' \
 _graph_git_behavior commits/20 merge/1
 _graph_git_behavior commits/20 merge/2
 
+test_expect_success 'build graph from latest pack with closure' \
+'graph5=$(cat new-idx | git commit-graph --write --update-head 
--stdin-packs) &&
+ test_path_is_file ${packdir}/graph-${graph5}.graph &&
+ test_path_is_file ${packdir}/graph-${graph1}.graph &&
+ test_path_is_file ${packdir}/graph-head &&
+ echo ${graph5} >expect &&
+ cmp -n 40 expect ${packdir}/graph-head &&
+ git commit-graph --read --graph-hash=${graph5} >output &&
+ _graph_read_expect "21" "${packdir}" &&
+ cmp expect output'
+
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'setup bare repo' \
 'cd .. &&
  git clone --bare full bare &&
-- 
2.16.0.15.g9c3cf44.dirty
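
For readers following the series, the closure step in this patch can be illustrated with a toy model. This is only a sketch and not Git code: `toy_commit`, `close_reachable_toy`, and the fixed-size arrays are hypothetical stand-ins for `struct commit`, the revision walk, and `packed_oid_list`. Unlike the patch, which appends every walked commit and relies on the later sort to drop duplicates, the sketch filters duplicates with a `seen` array.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_COMMITS 16
#define TOY_MAX_PARENTS 3

/* Hypothetical stand-in for a commit: parent indexes, -1 meaning "none". */
struct toy_commit {
    int parents[TOY_MAX_PARENTS];
};

/*
 * Extend list[0..num) until it holds every commit reachable through
 * parent links, and return the new count. The caller must provide room
 * for graph_size entries and start with distinct entries.
 */
static size_t close_reachable_toy(const struct toy_commit *graph,
                                  size_t graph_size,
                                  int *list, size_t num)
{
    unsigned char seen[TOY_MAX_COMMITS] = { 0 };
    size_t i;
    int p;

    assert(graph_size <= TOY_MAX_COMMITS);

    for (i = 0; i < num; i++) {
        assert(list[i] >= 0 && (size_t)list[i] < graph_size);
        seen[list[i]] = 1;
    }

    /* The list doubles as the work queue: appended entries get visited too. */
    for (i = 0; i < num; i++) {
        for (p = 0; p < TOY_MAX_PARENTS; p++) {
            int parent = graph[list[i]].parents[p];

            if (parent < 0 || seen[parent])
                continue;
            seen[parent] = 1;
            list[num++] = parent;
        }
    }
    return num;
}
```

Starting from a single merge commit, the list grows until it contains that commit's whole history, which is the "no gaps" property the commit message is after.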

[PATCH v2 14/14] commit-graph: build graph from starting commits

2018-01-30 Thread Derrick Stolee

Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  7 ++-
 builtin/commit-graph.c | 34 +-
 commit-graph.c | 26 +++---
 commit-graph.h |  4 +++-
 t/t5318-commit-graph.sh| 18 ++
 5 files changed, 75 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index d0571cd896..3357c0cf8f 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -46,7 +46,12 @@ OPTIONS
 --stdin-packs::
When used with --write, generate the new graph by walking objects
only in the specified packfiles and any commits in the
-   existing graph-head.
+   existing graph-head. (Cannot be combined with --stdin-commits.)
+
+--stdin-commits::
+   When used with --write, generate the new graph by walking commits
+   starting at the commits specified in stdin as a list of OIDs in
+   hex, one OID per line. (Cannot be combined with --stdin-packs.)
 
 EXAMPLES
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 80a409e784..adc05f0582 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--pack-dir ]"),
N_("git commit-graph --clear [--pack-dir ]"),
N_("git commit-graph --read [--graph-hash=]"),
-   N_("git commit-graph --write [--pack-dir ] [--update-head] 
[--delete-expired] [--stdin-packs]"),
+   N_("git commit-graph --write [--pack-dir ] [--update-head] 
[--delete-expired] [--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -25,6 +25,7 @@ static struct opts_commit_graph {
int update_head;
int delete_expired;
int stdin_packs;
+   int stdin_commits;
int has_existing;
struct object_id old_graph_hash;
 } opts;
@@ -117,23 +118,36 @@ static int graph_write(void)
 {
struct object_id *graph_hash;
char **pack_indexes = NULL;
+   char **commits = NULL;
int num_packs = 0;
-   int size_packs = 0;
+   int num_commits = 0;
+   char **lines = NULL;
+   int num_lines = 0;
+   int size_lines = 0;
 
-   if (opts.stdin_packs) {
+   if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
-   size_packs = 128;
-   ALLOC_ARRAY(pack_indexes, size_packs);
+   size_lines = 128;
+   ALLOC_ARRAY(lines, size_lines);
 
while (strbuf_getline(&buf, stdin) != EOF) {
-   ALLOC_GROW(pack_indexes, num_packs + 1, size_packs);
-   pack_indexes[num_packs++] = buf.buf;
+   ALLOC_GROW(lines, num_lines + 1, size_lines);
+   lines[num_lines++] = buf.buf;
strbuf_detach(&buf, NULL);
}
-   }
 
-   graph_hash = construct_commit_graph(opts.pack_dir, pack_indexes, 
num_packs);
+   if (opts.stdin_packs) {
+   pack_indexes = lines;
+   num_packs = num_lines;
+   }
+   if (opts.stdin_commits) {
+   commits = lines;
+   num_commits = num_lines;
+   }
+   }
 
+   graph_hash = construct_commit_graph(opts.pack_dir, pack_indexes, 
num_packs,
+   commits, num_commits);
if (opts.update_head)
update_head_file(opts.pack_dir, graph_hash);
 
@@ -172,6 +186,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
N_("delete expired head graph file")),
OPT_BOOL('s', "stdin-packs", &opts.stdin_packs,
N_("only scan packfiles listed by stdin")),
+   OPT_BOOL('C', "stdin-commits", &opts.stdin_commits,
+   N_("start walk at commits listed by stdin")),
{ OPTION_STRING, 'H', "graph-hash", &opts.graph_hash,
N_("hash"),
N_("A hash for a specific graph file in the pack-dir."),
diff --git a/commit-graph.c b/commit-graph.c
index cfa0415a21..7f31a6c795 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -664,7 +664,9 @@ static void close_reachable(struct packed_oid_list *oids)
 
 struct object_id *construct_commit_graph(const char *pack_dir,
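
Since --stdin-commits takes "a list of OIDs in hex, one OID per line", a per-line check along the following lines may help picture the input handling. This is a standalone sketch under my own assumptions, not what the patch does: `parse_hex_oid_line`, `hex_value`, and the `TOY_*` constants are hypothetical names, the length is hard-coded to SHA-1, and real code would presumably use Git's own hex parsing and then confirm the object is a commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_RAWSZ 20              /* SHA-1 length in raw bytes */
#define TOY_HEXSZ (2 * TOY_RAWSZ) /* SHA-1 length in hex digits */

/* Map one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Parse one input line (without its trailing newline) as a hex OID.
 * Returns 0 and fills out[] on success, -1 if the line is not exactly
 * TOY_HEXSZ hex digits.
 */
static int parse_hex_oid_line(const char *line, unsigned char out[TOY_RAWSZ])
{
    size_t i;

    assert(line && out);

    if (strlen(line) != TOY_HEXSZ)
        return -1;

    for (i = 0; i < TOY_RAWSZ; i++) {
        int hi = hex_value(line[2 * i]);
        int lo = hex_value(line[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return 0;
}
```

Lines that fail the check could be skipped with a warning rather than aborting the whole write, but that policy choice is not something the sketch tries to settle.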

[PATCH v2 06/14] commit-graph: implement git-commit-graph --read

2018-01-30 Thread Derrick Stolee

Teach git-commit-graph to read commit graph files and summarize their contents.

Use the --read option to verify the contents of a commit graph file in the
tests.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |   7 ++
 builtin/commit-graph.c |  55 +++
 commit-graph.c | 138 -
 commit-graph.h |  25 +++
 t/t5318-commit-graph.sh|  28 ++--
 5 files changed, 247 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 3f3790d9a8..09aeaf6c82 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 
 [verse]
 'git commit-graph' --write  [--pack-dir ]
+'git commit-graph' --read  [--pack-dir ]
 
 EXAMPLES
 
@@ -20,6 +21,12 @@ EXAMPLES
 $ git commit-graph --write
 
 
+* Read basic information from a graph file.
++
+
+$ git commit-graph --read --graph-hash=
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 7affd512f1..218740b1f8 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,15 +10,58 @@
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--pack-dir ]"),
+   N_("git commit-graph --read [--graph-hash=]"),
N_("git commit-graph --write [--pack-dir ]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *pack_dir;
+   int read;
+   const char *graph_hash;
int write;
 } opts;
 
+static int graph_read(void)
+{
+   struct object_id graph_hash;
+   struct commit_graph *graph = 0;
+   const char *graph_file;
+
+   if (opts.graph_hash && strlen(opts.graph_hash) == GIT_MAX_HEXSZ)
+   get_oid_hex(opts.graph_hash, &graph_hash);
+   else
+   die("no graph hash specified");
+
+   graph_file = get_commit_graph_filename_hash(opts.pack_dir, &graph_hash);
+   graph = load_commit_graph_one(graph_file, opts.pack_dir);
+
+   if (!graph)
+   die("graph file %s does not exist", graph_file);
+
+   printf("header: %08x %02x %02x %02x %02x\n",
+   ntohl(graph->hdr->graph_signature),
+   graph->hdr->graph_version,
+   graph->hdr->hash_version,
+   graph->hdr->hash_len,
+   graph->hdr->num_chunks);
+   printf("num_commits: %u\n", graph->num_commits);
+   printf("chunks:");
+
+   if (graph->chunk_oid_fanout)
+   printf(" oid_fanout");
+   if (graph->chunk_oid_lookup)
+   printf(" oid_lookup");
+   if (graph->chunk_commit_data)
+   printf(" commit_metadata");
+   if (graph->chunk_large_edges)
+   printf(" large_edges");
+   printf("\n");
+
+   printf("pack_dir: %s\n", graph->pack_dir);
+   return 0;
+}
+
 static int graph_write(void)
 {
struct object_id *graph_hash = construct_commit_graph(opts.pack_dir);
@@ -36,8 +79,14 @@ int cmd_commit_graph(int argc, const char **argv, const char 
*prefix)
{ OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
N_("dir"),
N_("The pack directory to store the graph") },
+   OPT_BOOL('r', "read", &opts.read,
+   N_("read graph file")),
OPT_BOOL('w', "write", &opts.write,
N_("write commit graph file")),
+   { OPTION_STRING, 'H', "graph-hash", &opts.graph_hash,
+   N_("hash"),
+   N_("A hash for a specific graph file in the pack-dir."),
+   PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
OPT_END(),
};
 
@@ -49,6 +98,10 @@ int cmd_commit_graph(int argc, const char **argv, const char 
*prefix)
 builtin_commit_graph_options,
 builtin_commit_graph_usage, 0);
 
+   if (opts.write + opts.read > 1)
+   usage_with_options(builtin_commit_graph_usage,
+  builtin_commit_graph_options);
+
if (!opts.pack_dir) {
struct strbuf path = STRBUF_INIT;
strbuf_addstr(&path, get_object_directory());
@@ -56,6 +109,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
opts.pack_dir = strbuf_detach(&path, NULL);
}
 
+   if (opts.read)
+   return graph_read();
if (opts.write)
return graph_write();
 
diff --git a/commit-graph.c b/commit-graph.c
index db2b7390c7..622a650259 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -48,6 +48,142 @@ char* get_commit_graph_filename_hash(const char *pack_dir,
return

[PATCH v2 08/14] commit-graph: implement git-commit-graph --clear

2018-01-30 Thread Derrick Stolee

Teach Git to delete the current 'graph_head' file and the commit graph
it references. This is a good safety valve if somehow the file is
corrupted and needs to be recalculated. Since the commit graph is a
summary of contents already in the ODB, it can be regenerated.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 16 ++--
 builtin/commit-graph.c | 32 +++-
 t/t5318-commit-graph.sh|  7 ++-
 3 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 99ced16ddc..33d6567f11 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git commit-graph' --write  [--pack-dir ]
 'git commit-graph' --read  [--pack-dir ]
+'git commit-graph' --clear [--pack-dir ]
 
 OPTIONS
 ---
@@ -18,16 +19,21 @@ OPTIONS
Use given directory for the location of packfiles, graph-head,
and graph files.
 
+--clear::
+   Delete the graph-head file and the graph file it references.
+   (Cannot be combined with --read or --write.)
+
 --read::
Read a graph file given by the graph-head file and output basic
-   details about the graph file. (Cannot be combined with --write.)
+   details about the graph file. (Cannot be combined with --clear
+   or --write.)
 
 --graph-id::
When used with --read, consider the graph file graph-.graph.
 
 --write::
Write a new graph file to the pack directory. (Cannot be combined
-   with --read.)
+   with --clear or --read.)
 
 --update-head::
When used with --write, update the graph-head file to point to
@@ -61,6 +67,12 @@ $ git commit-graph --write --update-head
 $ git commit-graph --read --graph-hash=
 
 
+* Delete /graph-head and the file it references.
++
+
+$ git commit-graph --clear --pack-dir=
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index d73cbc907d..4970dec133 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,6 +10,7 @@
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--pack-dir ]"),
+   N_("git commit-graph --clear [--pack-dir ]"),
N_("git commit-graph --read [--graph-hash=]"),
N_("git commit-graph --write [--pack-dir ] [--update-head]"),
NULL
@@ -17,6 +18,7 @@ static char const * const builtin_commit_graph_usage[] = {
 
 static struct opts_commit_graph {
const char *pack_dir;
+   int clear;
int read;
const char *graph_hash;
int write;
@@ -25,6 +27,30 @@ static struct opts_commit_graph {
struct object_id old_graph_hash;
 } opts;
 
+static int graph_clear(void)
+{
+   struct strbuf head_path = STRBUF_INIT;
+   char *old_path;
+
+   if (!opts.has_existing)
+   return 0;
+
+   strbuf_addstr(&head_path, opts.pack_dir);
+   strbuf_addstr(&head_path, "/");
+   strbuf_addstr(&head_path, "graph-head");
+   if (remove_path(head_path.buf))
+   die("failed to remove path %s", head_path.buf);
+   strbuf_release(&head_path);
+
+   old_path = get_commit_graph_filename_hash(opts.pack_dir,
+ &opts.old_graph_hash);
+   if (remove_path(old_path))
+   die("failed to remove path %s", old_path);
+   free(old_path);
+
+   return 0;
+}
+
 static int graph_read(void)
 {
struct object_id graph_hash;
@@ -105,6 +131,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
{ OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
N_("dir"),
N_("The pack directory to store the graph") },
+   OPT_BOOL('c', "clear", &opts.clear,
+   N_("clear graph file and graph-head")),
OPT_BOOL('r', "read", &opts.read,
N_("read graph file")),
OPT_BOOL('w', "write", &opts.write,
@@ -126,7 +154,7 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
 builtin_commit_graph_options,
 builtin_commit_graph_usage, 0);
 
-   if (opts.write + opts.read > 1)
+   if (opts.write + opts.read + opts.clear > 1)
usage_with_options(builtin_commit_graph_usage,
   builtin_commit_graph_options);
 
@@ -139,6 +167,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
 
opts.has_existing = !!get_graph_head_hash(opts.pack_dir, &opts.old_graph_hash);
 
+   if (opts.clear)
+   return graph_clear();
if (opts.read)
return graph_read();
if (opts.write)
diff --git

[PATCH v2 03/14] commit-graph: create git-commit-graph builtin

2018-01-30 Thread Derrick Stolee

Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for a '--pack-dir' option.

Signed-off-by: Derrick Stolee 
---
 .gitignore |  1 +
 Documentation/git-commit-graph.txt |  7 +++
 Makefile   |  1 +
 builtin.h  |  1 +
 builtin/commit-graph.c | 33 +
 command-list.txt   |  1 +
 git.c  |  1 +
 7 files changed, 45 insertions(+)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 builtin/commit-graph.c

diff --git a/.gitignore b/.gitignore
index 833ef3b0b7..e82f90184d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /git-clone
 /git-column
 /git-commit
+/git-commit-graph
 /git-commit-tree
 /git-config
 /git-count-objects
diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
new file mode 100644
index 00..c8ea548dfb
--- /dev/null
+++ b/Documentation/git-commit-graph.txt
@@ -0,0 +1,7 @@
+git-commit-graph(1)
+
+
+NAME
+
+git-commit-graph - Write and verify Git commit graphs (.graph files)
+
diff --git a/Makefile b/Makefile
index 1a9b23b679..aee5d3f7b9 100644
--- a/Makefile
+++ b/Makefile
@@ -965,6 +965,7 @@ BUILTIN_OBJS += builtin/for-each-ref.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
+BUILTIN_OBJS += builtin/commit-graph.o
 BUILTIN_OBJS += builtin/grep.o
 BUILTIN_OBJS += builtin/hash-object.o
 BUILTIN_OBJS += builtin/help.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..079855b6d4 100644
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const 
char *prefix);
 extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_column(int argc, const char **argv, const char *prefix);
 extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644
index 00..2104550d25
--- /dev/null
+++ b/builtin/commit-graph.c
@@ -0,0 +1,33 @@
+#include "builtin.h"
+#include "cache.h"
+#include "config.h"
+#include "dir.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "packfile.h"
+#include "parse-options.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+   N_("git commit-graph [--pack-dir ]"),
+   NULL
+};
+
+static struct opts_commit_graph {
+   const char *pack_dir;
+} opts;
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+   static struct option builtin_commit_graph_options[] = {
+   { OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
+   N_("dir"),
+   N_("The pack directory to store the graph") },
+   OPT_END(),
+   };
+
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage_with_options(builtin_commit_graph_usage,
+  builtin_commit_graph_options);
+
+   return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index a1fad28fd8..835c5890be 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -34,6 +34,7 @@ git-clean   mainporcelain
 git-clone   mainporcelain   init
 git-column  purehelpers
 git-commit  mainporcelain   history
+git-commit-graphplumbingmanipulators
 git-commit-tree plumbingmanipulators
 git-config  ancillarymanipulators
 git-count-objects   ancillaryinterrogators
diff --git a/git.c b/git.c
index c870b9719c..c7b5adae7b 100644
--- a/git.c
+++ b/git.c
@@ -388,6 +388,7 @@ static struct cmd_struct commands[] = {
{ "clone", cmd_clone },
{ "column", cmd_column, RUN_SETUP_GENTLY },
{ "commit", cmd_commit, RUN_SETUP | NEED_WORK_TREE },
+   { "commit-graph", cmd_commit_graph, RUN_SETUP },
{ "commit-tree", cmd_commit_tree, RUN_SETUP },
{ "config", cmd_config, RUN_SETUP_GENTLY },
{ "count-objects", cmd_count_objects, RUN_SETUP },
-- 
2.16.0.15.g9c3cf44.dirty

[PATCH v2 01/14] commit-graph: add format document

2018-01-30 Thread Derrick Stolee

Add document specifying the binary format for commit graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.

The format automatically includes two parent positions for every
commit. This favors speed over space, since using only one position
per commit would cause an extra level of indirection for every merge
commit. (Octopus merges suffer from this indirection, but they are
very rare.)

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph-format.txt | 89 +
 1 file changed, 89 insertions(+)
 create mode 100644 Documentation/technical/commit-graph-format.txt

diff --git a/Documentation/technical/commit-graph-format.txt 
b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 00..8a987c7aa9
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,89 @@
+Git commit graph format
+===
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of their parents. We
+  reserve zero as special; it can be used to mark a generation
+  number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+== graph-*.graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks,
+hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+  The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+  Currently, the only valid version is 1.
+
+  1-byte Object Id Version (1 = SHA-1)
+
+  1-byte Object Id Length (H)
+
+  1-byte number (C) of "chunks"
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+  First 4 bytes describe chunk id. Value 0 is a terminating label.
+  Other 8 bytes provide offset in current file for chunk to start.
+  (Chunks are ordered contiguously in the file, so you can infer
+  the length using the next chunk position if necessary.)
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+  The ith entry, F[i], stores the number of OIDs with first
+  byte at most i. Thus F[255] stores the total
+  number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+  The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+* The first H bytes are for the OID of the root tree.
+* The next 8 bytes are for the int-ids of the first two parents
+  of the ith commit. Stores value 0x if no parent in that
+  position. If there are more than two parents, the second value
+  has its most-significant bit on and the other bits store an array
+  position into the Large Edge List chunk.
+* The next 8 bytes store the generation number of the commit and
+  the commit time in seconds since EPOCH. The generation number
+  uses the higher 30 bits of the first 4 bytes, while the commit
+  time uses the 32 bits of the second 4 bytes, along with the lowest
+  2 bits of the lowest byte, storing the 33rd and 34th bit of the
+  commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'})
+  This list of 4-byte values stores the second through nth parents for
+  all octopus merges. The second parent value in the commit data is a
+  negative number pointing into this list. Then iterate through this
+  list starting at that position until reaching a value with the most-
+  significant bit on. The other bits correspond to the int-id of the
+  last parent. This chunk should always be present, but may be empty.
+
+TRAILER:
+
+   H-byte HASH-checksum of all of the above.
-- 
2.16.0.15.g9c3cf44.dirty
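
The 8-byte generation/date field in the Commit Data chunk is the densest part of the format text, so here is one possible reading of it as code. This is a sketch based only on the paragraph above, not on the implementation later in the series: `pack_gen_date`, `unpack_gen_date`, and the byte helpers are hypothetical names, and it assumes "lowest 2 bits" means the low two bits of the first 4-byte word.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Load a 32-bit value stored in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * First word: 30-bit generation number in the high bits, bits 33 and 34
 * of the commit time in the low two bits.
 * Second word: the low 32 bits of the commit time.
 */
static void pack_gen_date(unsigned char out[8], uint32_t generation,
                          uint64_t commit_time)
{
    assert(generation < (UINT32_C(1) << 30));
    assert(commit_time < (UINT64_C(1) << 34));

    put_be32(out, (generation << 2) | (uint32_t)(commit_time >> 32));
    put_be32(out + 4, (uint32_t)(commit_time & UINT64_C(0xffffffff)));
}

static void unpack_gen_date(const unsigned char in[8], uint32_t *generation,
                            uint64_t *commit_time)
{
    uint32_t word0 = get_be32(in);
    uint32_t word1 = get_be32(in + 4);

    *generation = word0 >> 2;
    *commit_time = ((uint64_t)(word0 & 0x3) << 32) | word1;
}
```

With this layout a reader that only needs the generation number touches a single 4-byte word, and commit times keep working past the 32-bit limit without widening the record.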

[PATCH v2 11/14] commit: integrate commit graph with commit parsing

2018-01-30 Thread Derrick Stolee

Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date. The only loosely-expected
condition is that the commit buffer is loaded into the cache. This
was checked in log-tree.c:show_log(), but the "return;" on failure
produced unexpected results (i.e. the message line was never terminated).
The new behavior of loading the buffer when needed prevents the
unexpected behavior.

If core.commitgraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 704,766
reachable commits and is behind 'origin/master' by 19,610 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv                       |  0.42s |  0.27s | -35%  |
| rev-list --all                   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects         | 32.6s  | 27.6s  | -15%  |

Signed-off-by: Derrick Stolee 
---
 alloc.c |   1 +
 commit-graph.c  | 237 
 commit-graph.h  |  20 +++-
 commit.c|  10 +-
 commit.h|   4 +
 log-tree.c  |   3 +-
 t/t5318-commit-graph.sh |  47 ++
 7 files changed, 318 insertions(+), 4 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacd..cf4f8b61e1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
c->object.type = OBJ_COMMIT;
c->index = alloc_commit_index();
+   c->graph_pos = COMMIT_NOT_FROM_GRAPH;
return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 764e016ddb..fc816533c6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -35,6 +35,9 @@
 #define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
GRAPH_OID_LEN + sizeof(struct commit_graph_header))
 
+/* global storage */
+struct commit_graph *commit_graph = 0;
+
 struct object_id *get_graph_head_hash(const char *pack_dir, struct object_id 
*hash)
 {
struct strbuf head_filename = STRBUF_INIT;
@@ -209,6 +212,220 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file, const char *p
return graph;
 }
 
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+   char *graph_file;
+   struct object_id oid;
+   struct strbuf pack_dir = STRBUF_INIT;
+   strbuf_addstr(&pack_dir, obj_dir);
+   strbuf_add(&pack_dir, "/pack", 5);
+
+   if (!get_graph_head_hash(pack_dir.buf, &oid))
+   return;
+
+   graph_file = get_commit_graph_filename_hash(pack_dir.buf, &oid);
+
+   commit_graph = load_commit_graph_one(graph_file, pack_dir.buf);
+   strbuf_release(&pack_dir);
+}
+
+static int prepare_commit_graph_run_once = 0;
+void prepare_commit_graph(void)
+{
+   struct alternate_object_database *alt;
+   char *obj_dir;
+
+   if (prepare_commit_graph_run_once)
+   return;
+   prepare_commit_graph_run_once = 1;
+
+   obj_dir = get_object_directory();
+   prepare_commit_graph_one(obj_dir);
+   prepare_alt_odb();
+   for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next)
+   prepare_commit_graph_one(alt->path);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, 
uint32_t *pos)
+{
+   uint32_t last, first = 0;
+
+   if (oid->hash[0])
+   first = ntohl(*(uint32_t*)(g->chunk_oid_fanout + 4 * 
(oid->hash[0] - 1)));
+   last = ntohl(*(uint32_t*)(g->chunk_oid_fanout + 4 * oid->hash[0]));
+
+   while (first < last) {
+   uint32_t mid = first + (last - first) / 2;
+   const unsigned char *current;
+   int cmp;
+
+   current = g->chunk_oid_lookup + g->hdr->hash_len * mid;
+   cmp = hashcmp(oid->hash, current);
+   if (!cmp) {
+   *pos = mid;
+   return 1;
+   }
+   if (cmp > 0) {
+   first = mid + 1;
+   continue;
+   }
+   last = mid;
+   }
+
+   *pos = first;
+   return 0;
+}
+
+struct object_id *get_nth_commit_oid(struct commit_graph *g,
+uint32_t n,
+struct object_id *oid)
+{
+   hashcpy(oid->hash, g->chunk_oid_lookup + g->hdr->hash_len * n);
+   return oid;
+}
+

[PATCH v2 00/14] Serialized Git Commit Graph

2018-01-30 Thread Derrick Stolee

Thanks to everyone who gave comments on v1. I tried my best to respond to
all of the feedback, but may have missed some while I was doing several
renames, including:

* builtin/graph.c -> builtin/commit-graph.c
* packed-graph.[c|h] -> commit-graph.[c|h]
* t/t5319-graph.sh -> t/t5318-commit-graph.sh

Because of these renames (and several type/function renames) the diff
is too large to conveniently share here.

Some issues that came up and are addressed:

* Use  instead of  when referring to the graph-.graph
  filenames and the contents of graph-head.
* 32-bit timestamps will not cause undefined behavior.
* timestamp_t is unsigned, so they are never negative.
* The config setting "core.commitgraph" now only controls consuming the
  graph during normal operations and will not block the commit-graph
  plumbing command.
* The --stdin-commits is better about sanitizing the input for strings
  that do not parse to OIDs or are OIDs for non-commit objects.

One unresolved comment that I would like consensus on is the use of
globals to store the config setting and the graph state. I'm currently
using the pattern from packed_git instead of putting these values in
the_repository. However, we want to eventually remove globals like
packed_git. Should I deviate from the pattern _now_ in order to keep
the problem from growing, or should I keep to the known pattern?

Finally, I tried to clean up my incorrect style as I was recreating
these commits. Feel free to be merciless in style feedback now that the
architecture is more stable.

Thanks,
-Stolee

-- >8 --

As promised [1], this patch contains a way to serialize the commit graph.
The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base <A> <B>' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.

The current patch speeds up these calculations by injecting a check in
parse_commit_gently() to check if there is a graph file and using that
to provide the required metadata to the struct commit.

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.

Here are some performance results for a copy of the Linux repository
where 'master' has 704,766 reachable commits and is behind 'origin/master'
by 19,610 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv                       |  0.42s |  0.27s | -35%  |
| rev-list --all                   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects         | 32.6s  | 27.6s  | -15%  |
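
The "Rel %" column appears to be the relative change in wall-clock time, i.e. (after - before) / before. A minimal sketch of that arithmetic, assuming awk is available; the helper name rel_percent is made up for illustration and is not part of this series:

```shell
#!/bin/sh
# rel_percent BEFORE AFTER
# Print the relative change from BEFORE to AFTER as a whole percentage.
# (Hypothetical helper; not part of git.)
rel_percent () {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%.0f\n", (after - before) / before * 100 }'
}

rel_percent 5.9 0.7	# log --oneline --topo-order -1000
rel_percent 6.4 1.0	# rev-list --all
rel_percent 32.6 27.6	# rev-list --all --objects
```

Because the published timings are rounded, recomputing a row from the table can differ by a point from the percentage printed there.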

To test this yourself, run the following on your repo:

  git config core.commitgraph true
  git show-ref -s | git commit-graph --write --update-head --stdin-commits

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisons by toggling the 'core.commitgraph' setting.

[1] https://public-inbox.org/git/d154319e-bb9e-b300-7c37-27b1dcd2a...@jeffhostetler.com/
Re: What's cooking in git.git (Jan 2018, #03; Tue, 23)

[2] https://github.com/derrickstolee/git/pull/2
A GitHub pull request containing the latest version of this patch.

Derrick Stolee (14):
  commit-graph: add format document
  graph: add commit graph design document
  commit-graph: create git-commit-graph builtin
  commit-graph: implement construct_commit_graph()
  commit-graph: implement git-commit-graph --write
  commit-graph: implement git-commit-graph --read
  commit-graph: implement git-commit-graph --update-head
  commit-graph: implement git-commit-graph --clear
  commit-graph: teach git-commit-graph --delete-expired
  commit-graph: add core.commitgraph setting
  commit: integrate commit graph with commit parsing
  commit-graph: read only from specific pack-indexes
  commit-graph: close under reachability

Re: [PATCH v2 00/10] rebase -i: offer to recreate merge commits

2018-01-30 Thread Junio C Hamano

Johannes Schindelin  writes:

> Changes since v1:
>
> - reintroduced "sequencer: make refs generated by the `label` command
>   worktree-local" (which was squashed into "sequencer: handle autosquash
>   and post-rewrite for merge commands" by accident)

Good.

> - got rid of the universally-hated `bud` command

Universally is a bit too strong a word, unless you want to hint that
you are specifically ignoring my input ;-).

> - the no-rebase-cousins mode was made the default

Although I lack first-hand experience with this implementation, this
design decision matches my instinct.

May comment on individual patches separately, later.

Thanks.

[PATCH v5 08/10] wildmatch test: create & test files on disk in addition to in-memory

2018-01-30 Thread Ævar Arnfjörð Bjarmason

There has never been any full roundtrip testing of what git-ls-files
and other commands that use wildmatch() actually do; we've been
satisfied with just testing the underlying C function.

Because git-ls-files and friends have their own codepaths before they
call wildmatch(), the behavior of the two sometimes differs. Even when
we tested for those differences (as with [1]), there was no one place
where you could review how these two modes differ.

Now there is. We now attempt to create a file called $haystack and
match $needle against it for each pair of $needle and $haystack that
we were passing to test-wildmatch.

If we can't create the file we skip the test. This ensures that we can
run this on all platforms and not maintain some infinitely growing
whitelist of e.g. platforms that don't support certain characters in
filenames.
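
The "try to create it, otherwise skip" idea can be sketched in isolation as follows. This is only an illustration under assumptions: try_create_file is a hypothetical helper, it runs in a throwaway directory, and it does not cover every case the patch handles (for example names containing "//").

```shell
#!/bin/sh
# try_create_file NAME
# Attempt to create NAME as a regular file below the current directory.
# Return 0 if a regular file with that name exists afterwards, non-zero
# if the name cannot be used. (Hypothetical helper for illustration.)
try_create_file () {
	tc_name=$1
	# An empty name, or one ending in "/", can never be a regular file.
	case $tc_name in
	""|*/)
		return 1
		;;
	esac
	tc_path=./$tc_name
	mkdir -p "$(dirname "$tc_path")" 2>/dev/null &&
	touch "$tc_path" 2>/dev/null &&
	# "touch ." succeeds, so insist on a regular file.
	test -f "$tc_path"
}

scratch=$(mktemp -d) && cd "$scratch" || exit 1
for haystack in foo 'a[]b' 'deep/foo/bar' . 'foo/' ''
do
	if try_create_file "$haystack"
	then
		echo "created: '$haystack'"
	else
		echo "skipped: '$haystack'"
	fi
done
```

Checking the result of the creation attempt, rather than keeping a per-platform list, is what lets the same loop run unchanged on filesystems with different naming rules.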

A notable exception to this is Windows, where due to the reasons
explained in [2] the shellscript emulation layer might fake the
creation of a file such as "*", and "test -e" for it will succeed
since it just got created with some character that maps to "*", but
git ls-files won't be fooled by this.

Thus we need to skip creating certain filenames entirely on Windows.
The list here might be overly aggressive; I don't have access to a
Windows system to test this.

As a result of doing these tests we can now see the cases where these
two ways of testing wildmatch differ:

 * Creating a file called 'a[]b' and running ls-files 'a[]b' will show
   that file, but wildmatch("a[]b", "a[]b") will not match

 * wildmatch() won't match a file called \ against \, but ls-files
   will.

 * `git --glob-pathspecs ls-files 'foo**'` will match a file
   'foo/bba/arr', but wildmatch won't; pathmatch, however, will.

   This seems like a bug to me; the two are otherwise equivalent, as
   these tests show.

This also reveals the case discussed in [1]: since 2.16.0, '' is an
error as far as ls-files is concerned, but wildmatch() itself happily
accepts it.

1. 9e4e8a64c2 ("pathspec: die on empty strings as pathspec",
   2017-06-06)

2. nycvar.qro.7.76.6.1801052133380.1...@wbunaarf-fpuvaqryva.tvgsbejvaqbjf.bet
   
(https://public-inbox.org/git/?q=nycvar.QRO.7.76.6.1801052133380.1337%40wbunaarf-fpuvaqryva.tvgsbejvaqbjf.bet)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 201 ---
 1 file changed, 190 insertions(+), 11 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 3e75cb0cbe..bd11e5acb0 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -4,6 +4,72 @@ test_description='wildmatch tests'
 
 . ./test-lib.sh
 
+should_create_test_file() {
+   file=$1
+
+   case $file in
+   # `touch .` will succeed but obviously not do what we intend
+   # here.
+   ".")
+   return 1
+   ;;
+   # We cannot create a file with an empty filename.
+   "")
+   return 1
+   ;;
+   # The tests that are testing that e.g. foo//bar is matched by
+   # foo/*/bar can't be tested on filesystems since there's no
+   # way we're getting a double slash.
+   *//*)
+   return 1
+   ;;
+   # When testing the difference between foo/bar and foo/bar/ we
+   # can't test the latter.
+   */)
+   return 1
+   ;;
+   # On Windows, \ in paths is silently converted to /, which
+   # would result in the "touch" below working, but the test
+   # itself failing. See 6fd1106aa4 ("t3700: Skip a test with
+   # backslashes in pathspec", 2009-03-13) for prior art and
+   # details.
+   *\\*)
+   if ! test_have_prereq BSLASHPSPEC
+   then
+   return 1
+   fi
+   # NOTE: The ;;& bash extension is not portable, so
+   # this test needs to be at the end of the pattern
+   # list.
+   #
+   # If we want to add more conditional returns we either
+   # need a new case statement, or turn this whole thing
+   # into a series of "if" tests.
+   ;;
+   esac
+
+
+   # On Windows proper (i.e. not Cygwin) many file names which
+   # under Cygwin would be emulated don't work.
+   if test_have_prereq MINGW
+   then
+   case $file in
+   " ")
+   # Files called " " are forbidden on Windows
+   return 1
+   ;;
+   *\<*|*\>*|*:*|*\"*|*\|*|*\?*|*\**)
+   # Files with various special characters aren't
+   # allowed on Windows. Sourced from
+   # https://stackoverflow.com/a/31976060
+   return 1
+   ;;
+   esac
+   fi
+
+   return 0
+}
+
 match_with_function() {
text=$1

[PATCH v5 09/10] test-lib: add an EXPENSIVE_ON_WINDOWS prerequisite

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Add an EXPENSIVE_ON_WINDOWS prerequisite to mark those tests which are
very expensive to run on Windows, but cheap elsewhere.

Certain tests that heavily stress the filesystem or run a lot of shell
commands are disproportionately expensive on Windows. This
prerequisite will later be used by a test that runs in 4-8 seconds on
a modern Linux system, but takes almost 10 minutes on Windows.

There's no reason to skip such tests by default on other platforms,
but Windows users shouldn't need to wait around while they finish.
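
The intended behavior can be written out as a small truth table. The sketch below is a stand-in that takes plain arguments instead of consulting the real EXPENSIVE, MINGW and CYGWIN prerequisites; the function name and its "windows" argument are assumptions made for illustration.

```shell
#!/bin/sh
# expensive_on_windows_ok LONG_TESTS PLATFORM
# Return 0 when a test that is slow only on Windows should run.
# LONG_TESTS: non-empty when the user opted in to long-running tests
#             (the role GIT_TEST_LONG plays for EXPENSIVE).
# PLATFORM:   "windows" or anything else (hypothetical parameter).
expensive_on_windows_ok () {
	test -n "$1" || test "$2" != windows
}

for long in "" t
do
	for platform in linux windows
	do
		if expensive_on_windows_ok "$long" "$platform"
		then
			verdict=run
		else
			verdict=skip
		fi
		echo "long='$long' platform=$platform -> $verdict"
	done
done
```

Only the combination "no opt-in, on Windows" is skipped; every other combination runs the test.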

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/test-lib.sh | 4 
 1 file changed, 4 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 9a0a21f49a..a2703c7d36 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1132,6 +1132,10 @@ test_lazy_prereq EXPENSIVE '
test -n "$GIT_TEST_LONG"
 '
 
+test_lazy_prereq EXPENSIVE_ON_WINDOWS '
+   test_have_prereq EXPENSIVE || test_have_prereq !MINGW,!CYGWIN
+'
+
 test_lazy_prereq USR_BIN_TIME '
test -x /usr/bin/time
 '
-- 
2.15.1.424.g9478a66081

[PATCH v5 10/10] wildmatch test: mark test as EXPENSIVE_ON_WINDOWS

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Mark the newly added test which creates test files on-disk as
EXPENSIVE_ON_WINDOWS. According to [1] it takes almost ten minutes to
run this test file on Windows after this recent change, but just a few
seconds on Linux as noted in my [2].

This could be done faster by exiting earlier. However, by using this
pattern we'll emit "skip" lines for each skipped test, making it clear
in the TAP output that we're not running a lot of them, at the cost of
some overhead.

1. nycvar.qro.7.76.6.1801061337020.1...@wbunaarf-fpuvaqryva.tvgsbejvaqbjf.bet
   
(https://public-inbox.org/git/nycvar.qro.7.76.6.1801061337020.1...@wbunaarf-fpuvaqryva.tvgsbejvaqbjf.bet/)

2. 87mv1raz9p@evledraar.gmail.com
   (https://public-inbox.org/git/87mv1raz9p@evledraar.gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index bd11e5acb0..c1fc6ca730 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -109,36 +109,36 @@ match_with_ls_files() {
then
if test -e .git/created_test_file
then
-   test_expect_success "$match_function (via ls-files): match dies on '$pattern' '$text'" "
+   test_expect_success EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): match dies on '$pattern' '$text'" "
printf '%s' '$text' >expect &&
test_must_fail git$ls_files_args ls-files -z -- '$pattern'
"
else
-   test_expect_failure "$match_function (via ls-files): match skip '$pattern' '$text'" 'false'
+   test_expect_failure EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): match skip '$pattern' '$text'" 'false'
fi
elif test "$match_expect" = 1
then
if test -e .git/created_test_file
then
-   test_expect_success "$match_function (via ls-files): match '$pattern' '$text'" "
+   test_expect_success EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): match '$pattern' '$text'" "
printf '%s' '$text' >expect &&
git$ls_files_args ls-files -z -- '$pattern' >actual.raw 2>actual.err &&
$match_stdout_stderr_cmp
"
else
-   test_expect_failure "$match_function (via ls-files): match skip '$pattern' '$text'" 'false'
+   test_expect_failure EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): match skip '$pattern' '$text'" 'false'
fi
elif test "$match_expect" = 0
then
if test -e .git/created_test_file
then
-   test_expect_success "$match_function (via ls-files): no match '$pattern' '$text'" "
+   test_expect_success EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): no match '$pattern' '$text'" "
>expect &&
git$ls_files_args ls-files -z -- '$pattern' >actual.raw 2>actual.err &&
$match_stdout_stderr_cmp
"
else
-   test_expect_failure "$match_function (via ls-files): no match skip '$pattern' '$text'" 'false'
+   test_expect_failure EXPENSIVE_ON_WINDOWS "$match_function (via ls-files): no match skip '$pattern' '$text'" 'false'
fi
else
test_expect_success "PANIC: Test framework error. Unknown matches value $match_expect" 'false'
@@ -174,7 +174,7 @@ match() {
pattern=${10}
fi
 
-   test_expect_success 'cleanup after previous file test' '
+   test_expect_success EXPENSIVE_ON_WINDOWS 'cleanup after previous file test' '
if test -e .git/created_test_file
then
git reset &&
@@ -184,7 +184,7 @@ match() {
 
printf '%s' "$text" >.git/expected_test_file
 
-   test_expect_success "setup match file test for $text" '
+   test_expect_success EXPENSIVE_ON_WINDOWS "setup match file test for $text" '
file=$(cat .git/expected_test_file) &&
if should_create_test_file "$file"
then
-- 
2.15.1.424.g9478a66081

[PATCH v5 07/10] wildmatch test: perform all tests under all wildmatch() modes

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Rewrite the wildmatch() test suite so that each test now tests all
combinations of the wildmatch() WM_CASEFOLD and WM_PATHNAME flags.

Before this change some test inputs were not tested on
e.g. WM_PATHNAME. Now the function is stress tested on all possible
inputs, and for each input we declare what the result should be if the
mode is case-insensitive, or pathname matching, or case-sensitive or
not matching pathnames.
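
The "one input, four declared expectations" shape can be sketched on its own. The code below is an assumption-laden illustration, not the test suite: fake_match is a stand-in that uses shell "case" globbing instead of wildmatch(), so "*" crosses "/" in every mode (as in the flag-less pathmatch mode) and only case folding differs between modes.

```shell
#!/bin/sh
# fake_match MODE TEXT PATTERN
# Stand-in matcher (hypothetical). The "i" modes fold case with tr.
fake_match () {
	fm_text=$2 fm_pattern=$3
	case $1 in
	iwildmatch|ipathmatch)
		fm_text=$(printf '%s' "$fm_text" | tr '[:upper:]' '[:lower:]')
		fm_pattern=$(printf '%s' "$fm_pattern" | tr '[:upper:]' '[:lower:]')
		;;
	esac
	case $fm_text in
	$fm_pattern) return 0 ;;
	*) return 1 ;;
	esac
}

# check_all_modes E_WILD E_IWILD E_PATH E_IPATH TEXT PATTERN
# Each E_* is 1 (must match) or 0 (must not match) for that mode.
check_all_modes () {
	ca_text=$5 ca_pattern=$6 ca_failures=0
	for ca_pair in "wildmatch:$1" "iwildmatch:$2" "pathmatch:$3" "ipathmatch:$4"
	do
		ca_mode=${ca_pair%%:*}
		ca_expect=${ca_pair##*:}
		if fake_match "$ca_mode" "$ca_text" "$ca_pattern"
		then
			ca_got=1
		else
			ca_got=0
		fi
		if test "$ca_got" = "$ca_expect"
		then
			echo "ok - $ca_mode: '$ca_text' '$ca_pattern'"
		else
			echo "not ok - $ca_mode: '$ca_text' '$ca_pattern'"
			ca_failures=$((ca_failures + 1))
		fi
	done
	test "$ca_failures" = 0
}

check_all_modes 1 1 1 1 foo 'f*'
check_all_modes 0 1 0 1 FOO 'f*'
```

Declaring all four expectations on one line means an input can no longer be silently left untested in one of the modes.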

Also before this change, nothing was testing case-insensitive
non-pathname matching, so I've added that to test-wildmatch.c and made
use of it.

This yields a rather scary patch, but there are no functional changes
here, just more test coverage. Some now-redundant tests were deleted
as a result of this change, since they were now duplicating an earlier
test.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/helper/test-wildmatch.c |   2 +
 t/t3070-wildmatch.sh  | 478 ++
 2 files changed, 228 insertions(+), 252 deletions(-)

diff --git a/t/helper/test-wildmatch.c b/t/helper/test-wildmatch.c
index 921d7b3e7e..66d33dfcfd 100644
--- a/t/helper/test-wildmatch.c
+++ b/t/helper/test-wildmatch.c
@@ -16,6 +16,8 @@ int cmd_main(int argc, const char **argv)
return !!wildmatch(argv[3], argv[2], WM_PATHNAME | WM_CASEFOLD);
else if (!strcmp(argv[1], "pathmatch"))
return !!wildmatch(argv[3], argv[2], 0);
+   else if (!strcmp(argv[1], "ipathmatch"))
+   return !!wildmatch(argv[3], argv[2], WM_CASEFOLD);
else
return 1;
 }
diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index fe0e5103a3..3e75cb0cbe 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -4,278 +4,252 @@ test_description='wildmatch tests'
 
 . ./test-lib.sh
 
-match() {
-   if test "$1" = 1
-   then
-   test_expect_success "wildmatch: match '$2' '$3'" "
-   test-wildmatch wildmatch '$2' '$3'
-   "
-   elif test "$1" = 0
-   then
-   test_expect_success "wildmatch: no match '$2' '$3'" "
-   test_must_fail test-wildmatch wildmatch '$2' '$3'
-   "
-   else
-   test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
-   fi
-}
+match_with_function() {
+   text=$1
+   pattern=$2
+   match_expect=$3
+   match_function=$4
 
-imatch() {
-   if test "$1" = 1
+   if test "$match_expect" = 1
then
-   test_expect_success "iwildmatch: match '$2' '$3'" "
-   test-wildmatch iwildmatch '$2' '$3'
+   test_expect_success "$match_function: match '$text' '$pattern'" "
+   test-wildmatch $match_function '$text' '$pattern'
"
-   elif test "$1" = 0
+   elif test "$match_expect" = 0
then
-   test_expect_success "iwildmatch: no match '$2' '$3'" "
-   test_must_fail test-wildmatch iwildmatch '$2' '$3'
+   test_expect_success "$match_function: no match '$text' '$pattern'" "
+   test_must_fail test-wildmatch $match_function '$text' '$pattern'
"
else
-   test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
+   test_expect_success "PANIC: Test framework error. Unknown matches value $match_expect" 'false'
fi
+
 }
 
-pathmatch() {
-   if test "$1" = 1
-   then
-   test_expect_success "pathmatch: match '$2' '$3'" "
-   test-wildmatch pathmatch '$2' '$3'
-   "
-   elif test "$1" = 0
-   then
-   test_expect_success "pathmatch: no match '$2' '$3'" "
-   test_must_fail test-wildmatch pathmatch '$2' '$3'
-   "
-   else
-   test_expect_success "PANIC: Test framework error. Unknown 
matches value $1" 'false'
-   fi
+match() {
+   match_glob=$1
+   match_iglob=$2
+   match_pathmatch=$3
+   match_pathmatchi=$4
+   text=$5
+   pattern=$6
+
+   # $1: Case sensitive glob match: test-wildmatch & ls-files
+   match_with_function "$text" "$pattern" $match_glob "wildmatch"
+
+   # $2: Case insensitive glob match: test-wildmatch & ls-files
+   match_with_function "$text" "$pattern" $match_iglob "iwildmatch"
+
+   # $3: Case sensitive path match: test-wildmatch & ls-files
+   match_with_function "$text" "$pattern" $match_pathmatch "pathmatch"
+
+   # $4: Case insensitive path match: test-wildmatch & ls-files
+   match_with_function "$text" "$pattern" $match_pathmatchi "ipathmatch"
 }
 
-# Basic wildmat features
-match 1 foo foo
-match 0 foo bar
-match 1 '' ""
-match 1 foo '???'
-match 0 foo '??'
-match 1 foo '*'
-match 1 foo 'f*'
-match 0 foo '*f'
-match 1 foo '*foo*'
-match 1 foobar '*ob*a*r*'
-match 1 aaabababab '*ab'

[PATCH v5 05/10] wildmatch test: remove dead fnmatch() test code

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Remove the unused fnmatch() test parameter from the wildmatch
test. The code that used to test this was removed in 70a8fc999d ("stop
using fnmatch (either native or compat)", 2014-02-15).

As a --word-diff shows the only change to the body of the tests is the
removal of the second out of four parameters passed to match().

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 356 +--
 1 file changed, 178 insertions(+), 178 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 9691d8eda3..2f8a681c72 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -7,13 +7,13 @@ test_description='wildmatch tests'
 match() {
if test "$1" = 1
then
-   test_expect_success "wildmatch: match '$3' '$4'" "
-   test-wildmatch wildmatch '$3' '$4'
+   test_expect_success "wildmatch: match '$2' '$3'" "
+   test-wildmatch wildmatch '$2' '$3'
"
elif test "$1" = 0
then
-   test_expect_success "wildmatch: no match '$3' '$4'" "
-   ! test-wildmatch wildmatch '$3' '$4'
+   test_expect_success "wildmatch: no match '$2' '$3'" "
+   ! test-wildmatch wildmatch '$2' '$3'
"
else
test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
@@ -53,176 +53,176 @@ pathmatch() {
 }
 
 # Basic wildmat features
-match 1 1 foo foo
-match 0 0 foo bar
-match 1 1 '' ""
-match 1 1 foo '???'
-match 0 0 foo '??'
-match 1 1 foo '*'
-match 1 1 foo 'f*'
-match 0 0 foo '*f'
-match 1 1 foo '*foo*'
-match 1 1 foobar '*ob*a*r*'
-match 1 1 aaabababab '*ab'
-match 1 1 'foo*' 'foo\*'
-match 0 0 foobar 'foo\*bar'
-match 1 1 'f\oo' 'f\\oo'
-match 1 1 ball '*[al]?'
-match 0 0 ten '[ten]'
-match 0 1 ten '**[!te]'
-match 0 0 ten '**[!ten]'
-match 1 1 ten 't[a-g]n'
-match 0 0 ten 't[!a-g]n'
-match 1 1 ton 't[!a-g]n'
-match 1 1 ton 't[^a-g]n'
-match 1 x 'a]b' 'a[]]b'
-match 1 x a-b 'a[]-]b'
-match 1 x 'a]b' 'a[]-]b'
-match 0 x aab 'a[]-]b'
-match 1 x aab 'a[]a-]b'
-match 1 1 ']' ']'
+match 1 foo foo
+match 0 foo bar
+match 1 '' ""
+match 1 foo '???'
+match 0 foo '??'
+match 1 foo '*'
+match 1 foo 'f*'
+match 0 foo '*f'
+match 1 foo '*foo*'
+match 1 foobar '*ob*a*r*'
+match 1 aaabababab '*ab'
+match 1 'foo*' 'foo\*'
+match 0 foobar 'foo\*bar'
+match 1 'f\oo' 'f\\oo'
+match 1 ball '*[al]?'
+match 0 ten '[ten]'
+match 0 ten '**[!te]'
+match 0 ten '**[!ten]'
+match 1 ten 't[a-g]n'
+match 0 ten 't[!a-g]n'
+match 1 ton 't[!a-g]n'
+match 1 ton 't[^a-g]n'
+match 1 'a]b' 'a[]]b'
+match 1 a-b 'a[]-]b'
+match 1 'a]b' 'a[]-]b'
+match 0 aab 'a[]-]b'
+match 1 aab 'a[]a-]b'
+match 1 ']' ']'
 
 # Extended slash-matching features
-match 0 0 'foo/baz/bar' 'foo*bar'
-match 0 0 'foo/baz/bar' 'foo**bar'
-match 0 1 'foobazbar' 'foo**bar'
-match 1 1 'foo/baz/bar' 'foo/**/bar'
-match 1 0 'foo/baz/bar' 'foo/**/**/bar'
-match 1 0 'foo/b/a/z/bar' 'foo/**/bar'
-match 1 0 'foo/b/a/z/bar' 'foo/**/**/bar'
-match 1 0 'foo/bar' 'foo/**/bar'
-match 1 0 'foo/bar' 'foo/**/**/bar'
-match 0 0 'foo/bar' 'foo?bar'
-match 0 0 'foo/bar' 'foo[/]bar'
-match 0 0 'foo/bar' 'foo[^a-z]bar'
-match 0 0 'foo/bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
-match 1 1 'foo-bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
-match 1 0 'foo' '**/foo'
-match 1 x 'XXX/foo' '**/foo'
-match 1 0 'bar/baz/foo' '**/foo'
-match 0 0 'bar/baz/foo' '*/foo'
-match 0 0 'foo/bar/baz' '**/bar*'
-match 1 0 'deep/foo/bar/baz' '**/bar/*'
-match 0 0 'deep/foo/bar/baz/' '**/bar/*'
-match 1 0 'deep/foo/bar/baz/' '**/bar/**'
-match 0 0 'deep/foo/bar' '**/bar/*'
-match 1 0 'deep/foo/bar/' '**/bar/**'
-match 0 0 'foo/bar/baz' '**/bar**'
-match 1 0 'foo/bar/baz/x' '*/bar/**'
-match 0 0 'deep/foo/bar/baz/x' '*/bar/**'
-match 1 0 'deep/foo/bar/baz/x' '**/bar/*/*'
+match 0 'foo/baz/bar' 'foo*bar'
+match 0 'foo/baz/bar' 'foo**bar'
+match 0 'foobazbar' 'foo**bar'
+match 1 'foo/baz/bar' 'foo/**/bar'
+match 1 'foo/baz/bar' 'foo/**/**/bar'
+match 1 'foo/b/a/z/bar' 'foo/**/bar'
+match 1 'foo/b/a/z/bar' 'foo/**/**/bar'
+match 1 'foo/bar' 'foo/**/bar'
+match 1 'foo/bar' 'foo/**/**/bar'
+match 0 'foo/bar' 'foo?bar'
+match 0 'foo/bar' 'foo[/]bar'
+match 0 'foo/bar' 'foo[^a-z]bar'
+match 0 'foo/bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
+match 1 'foo-bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
+match 1 'foo' '**/foo'
+match 1 'XXX/foo' '**/foo'
+match 1 'bar/baz/foo' '**/foo'
+match 0 'bar/baz/foo' '*/foo'
+match 0 'foo/bar/baz' '**/bar*'
+match 1 'deep/foo/bar/baz' '**/bar/*'
+match 0 'deep/foo/bar/baz/' '**/bar/*'
+match 1 'deep/foo/bar/baz/' '**/bar/**'
+match 0 'deep/foo/bar' '**/bar/*'
+match 1 'deep/foo/bar/' '**/bar/**'
+match 0 'foo/bar/baz' '**/bar**'
+match 1 'foo/bar/baz/x' '*/bar/**'
+match 0 'deep/foo/bar/baz/x' '*/bar/**'
+match 1 'deep/foo/bar/baz/x' '**/bar/*/*'
 
 # Various additional tests
-match 0 0 'acrt'

[PATCH v5 06/10] wildmatch test: use test_must_fail, not ! for test-wildmatch

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Use of ! should be reserved for non-git programs that are assumed not
to fail; see README. With this change only
t/t0110-urlmatch-normalization.sh is still using this anti-pattern.
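
The reason for the distinction is that a plain "!" cannot tell an ordinary non-zero exit from a crash. The sketch below is a simplified stand-in for the idea behind test_must_fail, not the real helper from test-lib; the function name and the exact status checks are assumptions for illustration.

```shell
#!/bin/sh
# must_fail_sketch COMMAND [ARGS...]
# Succeed only when COMMAND fails in an ordinary way. Exit status 0,
# "command not found" (127) and death by signal (status above 128 in
# POSIX shells) all count as errors.
must_fail_sketch () {
	"$@" 2>/dev/null
	mf_status=$?
	if test "$mf_status" -eq 0
	then
		echo "unexpected success: $*"
		return 1
	elif test "$mf_status" -eq 127
	then
		echo "command not found: $*"
		return 1
	elif test "$mf_status" -gt 128
	then
		echo "died by signal (status $mf_status): $*"
		return 1
	fi
	return 0
}

# A plain "!" would treat the crash the same as the ordinary failure.
must_fail_sketch sh -c 'exit 1' && echo "ordinary failure: accepted"
must_fail_sketch sh -c 'kill -9 $$' || echo "crash: rejected"
must_fail_sketch true || echo "success: rejected"
```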

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 2f8a681c72..fe0e5103a3 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -13,7 +13,7 @@ match() {
elif test "$1" = 0
then
test_expect_success "wildmatch: no match '$2' '$3'" "
-   ! test-wildmatch wildmatch '$2' '$3'
+   test_must_fail test-wildmatch wildmatch '$2' '$3'
"
else
test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
@@ -29,7 +29,7 @@ imatch() {
elif test "$1" = 0
then
test_expect_success "iwildmatch: no match '$2' '$3'" "
-   ! test-wildmatch iwildmatch '$2' '$3'
+   test_must_fail test-wildmatch iwildmatch '$2' '$3'
"
else
test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
@@ -45,7 +45,7 @@ pathmatch() {
elif test "$1" = 0
then
test_expect_success "pathmatch: no match '$2' '$3'" "
-   ! test-wildmatch pathmatch '$2' '$3'
+   test_must_fail test-wildmatch pathmatch '$2' '$3'
"
else
test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
-- 
2.15.1.424.g9478a66081

[PATCH v5 04/10] wildmatch test: use a paranoia pattern from nul_match()

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Use a pattern from the nul_match() function in t7008-grep-binary.sh to
make sure that we don't just fall through to the "else" if there's an
unknown parameter.

This is something I added in commit 77f6f4406f ("grep: add a test
helper function for less verbose -f \0 tests", 2017-05-20) to grep
tests, which were modeled on these wildmatch tests, and I'm now
porting back to the original wildmatch tests.

I am not using the "say '...'; exit 1" pattern from t0000-basic.sh
because if I fail I want to run the rest of the tests (unless under
-i), and doing this makes sure we do that and don't exit right away
without fully reporting our errors.
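
The three-way dispatch can be shown without the test framework. In this sketch, describe_expectation is a hypothetical helper: anything other than 1 or 0 is reported loudly instead of silently falling through to the "no match" branch, and the caller keeps going afterwards.

```shell
#!/bin/sh
# describe_expectation VALUE
# Map an expectation flag to a test title prefix. Unknown values are
# reported and signalled with a non-zero return instead of being
# treated as "no match".
describe_expectation () {
	if test "$1" = 1
	then
		echo "match"
	elif test "$1" = 0
	then
		echo "no match"
	else
		echo "PANIC: unknown matches value $1"
		return 1
	fi
}

# The bad value in the middle is reported, and the loop still reaches
# the value after it.
for value in 1 x 0
do
	describe_expectation "$value" || true
done
```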

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 19ea64bba9..9691d8eda3 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -10,10 +10,13 @@ match() {
test_expect_success "wildmatch: match '$3' '$4'" "
test-wildmatch wildmatch '$3' '$4'
"
-   else
+   elif test "$1" = 0
+   then
test_expect_success "wildmatch: no match '$3' '$4'" "
! test-wildmatch wildmatch '$3' '$4'
"
+   else
+   test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
fi
 }
 
@@ -23,10 +26,13 @@ imatch() {
test_expect_success "iwildmatch: match '$2' '$3'" "
test-wildmatch iwildmatch '$2' '$3'
"
-   else
+   elif test "$1" = 0
+   then
test_expect_success "iwildmatch: no match '$2' '$3'" "
! test-wildmatch iwildmatch '$2' '$3'
"
+   else
+   test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
fi
 }
 
@@ -36,10 +42,13 @@ pathmatch() {
test_expect_success "pathmatch: match '$2' '$3'" "
test-wildmatch pathmatch '$2' '$3'
"
-   else
+   elif test "$1" = 0
+   then
test_expect_success "pathmatch: no match '$2' '$3'" "
! test-wildmatch pathmatch '$2' '$3'
"
+   else
+   test_expect_success "PANIC: Test framework error. Unknown matches value $1" 'false'
fi
 }
 
-- 
2.15.1.424.g9478a66081

[PATCH v5 01/10] wildmatch test: indent with tabs, not spaces

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Replace the 4-width mixed space & tab indentation in this file with
indentation with tabs as we do in most of the rest of our tests.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 54 ++--
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 163a14a1c2..27fa878f6e 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -5,39 +5,39 @@ test_description='wildmatch tests'
 . ./test-lib.sh
 
 match() {
-if [ $1 = 1 ]; then
-   test_expect_success "wildmatch: match '$3' '$4'" "
-   test-wildmatch wildmatch '$3' '$4'
-   "
-else
-   test_expect_success "wildmatch:  no match '$3' '$4'" "
-   ! test-wildmatch wildmatch '$3' '$4'
-   "
-fi
+   if [ $1 = 1 ]; then
+   test_expect_success "wildmatch: match '$3' '$4'" "
+   test-wildmatch wildmatch '$3' '$4'
+   "
+   else
+   test_expect_success "wildmatch:  no match '$3' '$4'" "
+   ! test-wildmatch wildmatch '$3' '$4'
+   "
+   fi
 }
 
 imatch() {
-if [ $1 = 1 ]; then
-   test_expect_success "iwildmatch:match '$2' '$3'" "
-   test-wildmatch iwildmatch '$2' '$3'
-   "
-else
-   test_expect_success "iwildmatch: no match '$2' '$3'" "
-   ! test-wildmatch iwildmatch '$2' '$3'
-   "
-fi
+   if [ $1 = 1 ]; then
+   test_expect_success "iwildmatch:match '$2' '$3'" "
+   test-wildmatch iwildmatch '$2' '$3'
+   "
+   else
+   test_expect_success "iwildmatch: no match '$2' '$3'" "
+   ! test-wildmatch iwildmatch '$2' '$3'
+   "
+   fi
 }
 
 pathmatch() {
-if [ $1 = 1 ]; then
-   test_expect_success "pathmatch: match '$2' '$3'" "
-   test-wildmatch pathmatch '$2' '$3'
-   "
-else
-   test_expect_success "pathmatch:  no match '$2' '$3'" "
-   ! test-wildmatch pathmatch '$2' '$3'
-   "
-fi
+   if [ $1 = 1 ]; then
+   test_expect_success "pathmatch: match '$2' '$3'" "
+   test-wildmatch pathmatch '$2' '$3'
+   "
+   else
+   test_expect_success "pathmatch:  no match '$2' '$3'" "
+   ! test-wildmatch pathmatch '$2' '$3'
+   "
+   fi
 }
 
 # Basic wildmat features
-- 
2.15.1.424.g9478a66081

[PATCH v5 03/10] wildmatch test: don't try to vertically align our output

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Don't try to vertically align the test output, which is futile anyway
under the TAP output where we're going to be emitting a number for
each test without aligning the test count.

This makes it easier for my subsequent changes to add new tests
without aligning this output.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 4d589d1f9a..19ea64bba9 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -7,11 +7,11 @@ test_description='wildmatch tests'
 match() {
if test "$1" = 1
then
-   test_expect_success "wildmatch: match '$3' '$4'" "
+   test_expect_success "wildmatch: match '$3' '$4'" "
test-wildmatch wildmatch '$3' '$4'
"
else
-   test_expect_success "wildmatch:  no match '$3' '$4'" "
+   test_expect_success "wildmatch: no match '$3' '$4'" "
! test-wildmatch wildmatch '$3' '$4'
"
fi
@@ -20,7 +20,7 @@ match() {
 imatch() {
if test "$1" = 1
then
-   test_expect_success "iwildmatch:match '$2' '$3'" "
+   test_expect_success "iwildmatch: match '$2' '$3'" "
test-wildmatch iwildmatch '$2' '$3'
"
else
@@ -33,11 +33,11 @@ imatch() {
 pathmatch() {
if test "$1" = 1
then
-   test_expect_success "pathmatch: match '$2' '$3'" "
+   test_expect_success "pathmatch: match '$2' '$3'" "
test-wildmatch pathmatch '$2' '$3'
"
else
-   test_expect_success "pathmatch:  no match '$2' '$3'" "
+   test_expect_success "pathmatch: no match '$2' '$3'" "
! test-wildmatch pathmatch '$2' '$3'
"
fi
-- 
2.15.1.424.g9478a66081

[PATCH v5 00/10] increase wildmatch test coverage

2018-01-30 Thread Ævar Arnfjörð Bjarmason

v5 has been a long time coming (20 days since I said I'd re-roll
this), but hopefully this is a version that works well for everyone,
including Windows users. Changes:

Ævar Arnfjörð Bjarmason (10):
  wildmatch test: indent with tabs, not spaces
  wildmatch test: use more standard shell style
  wildmatch test: don't try to vertically align our output
  wildmatch test: use a paranoia pattern from nul_match()
  wildmatch test: remove dead fnmatch() test code

No changes.

  wildmatch test: use test_must_fail, not ! for test-wildmatch

NEW: Fix a tiny nit I spotted while re-rolling.

  wildmatch test: perform all tests under all wildmatch() modes

The testing of various wildmatch modes got factored into a
function. It makes no difference to this patch, but makes a huge
difference in readability to the follow-up patch.

Also I stopped renaming "match" to "wildtest", I can't remember why I
did that in the first place, but no point in doing that, and this
makes things easier to review...

  wildmatch test: create & test files on disk in addition to in-memory

Almost entirely based on feedback from Johannes:

a) This is now much more friendly under -x, as little test code as
possible outside actual tests.

b) Factored out into functions

c) Gave variables better names

d) Hopefully runs under Windows now without errors, due to a blacklist
of filenames that aren't allowed on Windows. Commit message now
mentions this.

e) This should be a lot faster than before, since I factored out the
setup work being done for every test so it's only done

f) At this point I can't remember who/where this was pointed out, but
it was observed that I was using a very dangerous looking `rm -rf --
*` pattern in the old test, turns out this could be replaced with a
less scary `git clean -df`.

  test-lib: add an EXPENSIVE_ON_WINDOWS prerequisite
  wildmatch test: mark test as EXPENSIVE_ON_WINDOWS

Follow-up my 87mv1raz9p@evledraar.gmail.com from the v4 thread,
and create an EXPENSIVE_ON_WINDOWS prerequisite, which is then used
for the file tests so they're skipped on Windows by default.

Even though 8/10 should be faster now, and hopefully passes on
Windows, I still expect it to be quite slow on Windows, so let's not
run it there by default unless under GIT_TEST_LONG=1.

 t/helper/test-wildmatch.c |   2 +
 t/t3070-wildmatch.sh  | 655 +-
 t/test-lib.sh |   4 +
 3 files changed, 416 insertions(+), 245 deletions(-)

-- 
2.15.1.424.g9478a66081

[PATCH v5 02/10] wildmatch test: use more standard shell style

2018-01-30 Thread Ævar Arnfjörð Bjarmason

Change the wildmatch test to use more standard shell style, usually we
use "if test" not "if [".

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 27fa878f6e..4d589d1f9a 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -5,7 +5,8 @@ test_description='wildmatch tests'
 . ./test-lib.sh
 
 match() {
-   if [ $1 = 1 ]; then
+   if test "$1" = 1
+   then
test_expect_success "wildmatch: match '$3' '$4'" "
test-wildmatch wildmatch '$3' '$4'
"
@@ -17,7 +18,8 @@ match() {
 }
 
 imatch() {
-   if [ $1 = 1 ]; then
+   if test "$1" = 1
+   then
test_expect_success "iwildmatch:match '$2' '$3'" "
test-wildmatch iwildmatch '$2' '$3'
"
@@ -29,7 +31,8 @@ imatch() {
 }
 
 pathmatch() {
-   if [ $1 = 1 ]; then
+   if test "$1" = 1
+   then
test_expect_success "pathmatch: match '$2' '$3'" "
test-wildmatch pathmatch '$2' '$3'
"
-- 
2.15.1.424.g9478a66081

Re: [PATCH] git-svn: control destruction order to avoid segfault

2018-01-30 Thread Junio C Hamano

Eric Wong  writes:

> Todd Zullinger  wrote:
>> I'm running the tests with and without your patch as well.
>> So far I've run t9128 300 times with the patch and no
>> failures.  Without it, it's failed 3 times in only a few
>> dozen runs.  That's promising.
>
> Thanks for confirming it works on other systems.
> Pull request and patch below:
>
> The following changes since commit 5be1f00a9a701532232f57958efab4be8c959a29:
>
>   First batch after 2.16 (2018-01-23 13:21:10 -0800)
>
> are available in the Git repository at:
>
>   git://bogomips.org/git-svn.git svn-branch-segfault
>
> for you to fetch changes up to 2784b8d68faca823489949cbc69ead2f296cfc07:
>
>   git-svn: control destruction order to avoid segfault (2018-01-29 23:12:00 
> +)
>
> 
> Eric Wong (1):
>   git-svn: control destruction order to avoid segfault
>
>  git-svn.perl | 5 +
>  1 file changed, 5 insertions(+)

Thanks.  I'd actually apply this as a patch instead of pulling, as
I suspect you'd want it in 'maint' as well, though.


> -8<-
> Subject: [PATCH] git-svn: control destruction order to avoid segfault
>
> It seems necessary to control destruction ordering to avoid a
> segfault with SVN 1.9.5 when using "git svn branch".
> I've also reported the problem against libsvn-perl to Debian
> [Bug #888791], but releasing the SVN::Client instance can be
> beneficial anyways to save memory.
>
> ref: https://bugs.debian.org/888791
> Tested-by: Todd Zullinger 
> Reported-by: brian m. carlson 
> Signed-off-by: Eric Wong 
> ---
>  git-svn.perl | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/git-svn.perl b/git-svn.perl
> index 76a75d0b3d..a6b6c3e40c 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -1200,6 +1200,11 @@ sub cmd_branch {
>   $ctx->copy($src, $rev, $dst)
>   unless $_dry_run;
>  
> + # Release resources held by ctx before creating another SVN::Ra
> + # so destruction is orderly.  This seems necessary with SVN 1.9.5
> + # to avoid segfaults.
> + $ctx = undef;
> +
>   $gs->fetch_all;
>  }

Re: [PATCH] doc: mention 'git show' defaults to HEAD

2018-01-30 Thread Junio C Hamano

Todd Zullinger  writes:

> When 'git show' is called without any object it defaults to HEAD.  This
> has been true since d4ed9793fd ("Simplify common default options setup
> for built-in log family.", 2006-04-16).
>
> The SYNOPSIS suggests that the object argument is required.  Clarify
> that it is not required and note the default.
>
> Signed-off-by: Todd Zullinger 
> ---
> This was mentioned in #git today by qaz.  It seems reasonable to document the
> default.

Sounds sensible.

>
>  Documentation/git-show.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-show.txt b/Documentation/git-show.txt
> index 82a4125a2d..e73ef54017 100644
> --- a/Documentation/git-show.txt
> +++ b/Documentation/git-show.txt
> @@ -9,7 +9,7 @@ git-show - Show various types of objects
>  SYNOPSIS
>  
>  [verse]
> -'git show' [options] <object>...
> +'git show' [options] [<object>...]
>  
>  DESCRIPTION
>  ---
> @@ -35,7 +35,7 @@ This manual page describes only the most frequently used options.
>  OPTIONS
>  ---
>  <object>...::
> - The names of objects to show.
> + The names of objects to show (defaults to 'HEAD').
>   For a more complete list of ways to spell object names, see
>   "SPECIFYING REVISIONS" section in linkgit:gitrevisions[7].

Re: [RFC PATCH 0/2] alternate hash test

2018-01-30 Thread Junio C Hamano

"brian m. carlson"  writes:

> This series wires up an alternate hash implementation, namely
> BLAKE2b-160.  The goal is to allow us to identify tests which rely on
> the hash algorithm in use so that we can fix those tests.

Yay.

> Provoking discussion of which hash to pick for NewHash is explicitly
> *not* a goal for this series.  I'm only interested in the ability to
> identify and fix tests.

Double yay.

Re: [PATCH v5 4/7] utf8: add function to detect a missing UTF-16/32 BOM

2018-01-30 Thread Lars Schneider


> On 30 Jan 2018, at 20:15, Junio C Hamano  wrote:
> 
> tbo...@web.de writes:
> 
>> From: Lars Schneider 
>> 
>> If the endianness is not defined in the encoding name, then let's
>> be strict and require a BOM to avoid any encoding confusion. The
>> has_missing_utf_bom() function returns true if a required BOM is
>> missing.
>> 
>> The Unicode standard instructs to assume big-endian if there is no BOM
>> for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
>> in HTML5 recommends to assume little-endian to "deal with deployed
>> content" [3]. Strictly requiring a BOM seems to be the safest option
>> for content in Git.
> 
> I do not have strong opinion on encoding such policy-ish behaviour
> as our default, but am I alone to find that "has missing X" is a
> confusing name for a helper function?  "is missing X" (or "lacks
> X") is a bit more understandable, I guess.

That might be a German/English translation thingy but I think I get
your point. "has" implies there is something and "missing" implies
there is nothing :)

"is_missing_utf_bom()" might be even a bit unspecific as UTF-8
is usually missing a UTF BOM but the function would still return 
"false". Therefore, "is_missing_required_utf_bom()" might be 
lengthy but should fit.

OK for you?

- Lars


> 
>> +int has_missing_utf_bom(const char *enc, const char *data, size_t len)
>> +{
>> +return (
>> +   !strcmp(enc, "UTF-16") &&
>> +   !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
>> + has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
>> +) || (
>> +   !strcmp(enc, "UTF-32") &&
>> +   !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
>> + has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
>> +);
>> +}
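
For readers following the naming discussion, here is a minimal standalone
sketch of the check. The BOM byte sequences are the standard Unicode ones.
The name is_missing_required_utf_bom() is only the one proposed above, and
has_bom_prefix() is a stand-in written for this sketch, so neither should be
read as the final code in utf8.c.

```c
#include <stddef.h>
#include <string.h>

/* Standard Unicode byte order marks. */
static const char utf16_be_bom[] = { '\xFE', '\xFF' };
static const char utf16_le_bom[] = { '\xFF', '\xFE' };
static const char utf32_be_bom[] = { '\0', '\0', '\xFE', '\xFF' };
static const char utf32_le_bom[] = { '\xFF', '\xFE', '\0', '\0' };

/* Stand-in helper (assumption): true if data starts with the given BOM. */
static int has_bom_prefix(const char *data, size_t len,
			  const char *bom, size_t bom_len)
{
	return len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Returns 1 only when the encoding name leaves the endianness open
 * ("UTF-16" or "UTF-32") and the data carries neither BOM.  Any other
 * name, including "UTF-8" and "UTF-16LE", yields 0 because no BOM is
 * required there.
 */
int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16"))
		return !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
			 has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)));
	if (!strcmp(enc, "UTF-32"))
		return !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
			 has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)));
	return 0;
}
```

With this shape, UTF-8 content returns 0, which is the case that makes the
shorter name "is_missing_utf_bom()" read as misleading.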

Re: [PATCH 00/37] removal of some c++ keywords

2018-01-30 Thread Johannes Sixt


On 29.01.2018 at 23:36, Brandon Williams wrote:

> A while back there was some discussion of getting our codebase into a state
> where we could use a c++ compiler if we wanted to (for various reason like
> leveraging c++ only analysis tools, etc.).  Johannes Sixt had a very large
> patch that achieved this but it wasn't in a state where it could be upstreamed.
> I took that idea and did some removals of c++ keywords (new, template, try,
> this, etc) but broke it up into several (well maybe more than several) patches.
> I don't believe I've captured all of them in this series but this is at least
> moving a step closer to being able to compile using a c++ compiler.


Cool! The patches all look reasonable. Some keywords remain: 'delete', 
'private', 'thread_local', 'not', 'xor', 'explicit', 'export'.



> I don't know if this is something the community still wants to move towards,
> but if this is something people are still interested in, and this series is
> wanted, then we can keep doing these sort of conversions in chunks slowly.


I've rebased my patches on top of this series. They are available from

  https://github.com/j6t/git.git c-plus-plus

-- Hannes

Re: [PATCH v2 1/1] setup: recognise extensions.objectFormat

2018-01-30 Thread Junio C Hamano

Jeff King  writes:

> Putting code in master is OK; we can always refactor it. But once we
> add and document a user-facing config option like this, we have to
> support it forever. So that's really the step I was wondering about: are
> we sure this is what the user-facing config is going to look like?

Yup, that is an important distinction.

> But that's sort of my point. It appears to be working, but the
> prior-version safety they think they have is not there. I think we're
> better off erring on the side of caution here, and letting them know
> forcefully that their config is bogus.
>
>> At the same time... there's extension.partialclone in pu and it does not
>> have check on repo format.
>
> IMHO it should (and we should just do it by enforcing it for all
> extensions automatically).

Sounds good.

Re: [PATCH RFC 01/24] ref-filter: get rid of goto

2018-01-30 Thread Junio C Hamano

Оля Тележная   writes:

>> one place improves readability.  If better readability is the
>> purpose, I would even say
>>
>>  for (i = 0; i < used_atom_cnt; i++) {
>> if (...)
>> -   goto need_obj;
>> +   break;
>> }
>> -   return;
>> +   if (used_atom_cnt <= i)
>> return;
>>
>> -need_obj:
>>
>> would make the result easier to follow with a much less impact.
>
> It's hard for me to read the code with goto, and as I know, it's not
> only my problem,...

That sounds as if you are complaining "I wanted to get rid of goto
and you tell me not to do so???", but read what I showed above again
and notice that it is also getting rid of "goto".

The main difference from your version is that the original function
is still kept as a single unit of work, instead of two.
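
The pattern suggested above, reduced to a standalone sketch. The function
name, the int array, and the "negative value" predicate are illustrative
assumptions and are not taken from ref-filter.c. Only the loop shape (break
out, then compare the index with the count) follows the quoted snippet.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the atom loop: returns 1 if the path that
 * used to sit under the need_obj: label is taken, 0 for the early return.
 */
int needs_object(const int *atoms, size_t cnt)
{
	size_t i;

	for (i = 0; i < cnt; i++) {
		if (atoms[i] < 0)	/* placeholder for the real condition */
			break;		/* was: goto need_obj; */
	}
	if (cnt <= i)
		return 0;	/* loop ran to completion: no atom needs the object */

	/* the code that followed the need_obj: label would go here */
	return 1;
}
```

The function stays one unit of work, and the only state carried across the
loop is the index i.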

Re: git send-email sets date

2018-01-30 Thread Junio C Hamano

Theodore Ts'o  writes:

> If there is a From: header in the beginning of the mail body, it is
> used as the Author instead of the From: header in the mail header.  It
> would make sense if there is a Date: header in the beginning of the
> mail body, it should be used instead of Date: field in the mail header.

Just like From:, Date: and Subject: are in-body headers that are
accepted by deployed versions of "git am" (actually, "mailinfo").

Re: git send-email sets date

2018-01-30 Thread Junio C Hamano

Michal Suchánek  writes:

> git send-email sets the message date to author date.
>
> This is wrong because the message will most likely not get delivered
> when the author date differs from current time. It might give slightly
> better results with commit date instead of author date but can't it
> just skip that header and leave it to the mailer?
>
> It does not even seem to have an option to suppress adding the date
> header.

I think you are complaining about output from "git format-patch",
and the reason why the date header is recorded in the output is as
others already mentioned in this thread.

The complaint about "delivery" is misplaced because that date is not
used to drive the SMTP conversation in any way.  "git send-email"
does create its own timestamp, but that is based on the current time
and does not have anything to do with the author or committer date
of the original commit the patch message came from.  

I think we confused end-users like you by allowing the command to
drive "git format-patch" from the command line (and worse, somehow
appearing to encourage such use), which probably was a UI mistake.
We should encourage people to run two commands separately instead,
which incidentally will allow the patch messages to be proofread for
the last time before they are sent out, but also reduce this
confusion when users see that these dates from the author timestamp
are not used in the "Date:" header of received e-mails.

Re: [PATCH v1 0/5] Incremental rewrite of git-submodules: git-foreach

2018-01-30 Thread Stefan Beller

On Mon, Jan 29, 2018 at 11:34 AM, Prathamesh Chavan  wrote:
> Following series of patches focuses on porting submodule subcommand
> git-foreach from shell to C.
> An initial attempt for porting was introduced about 9 months back,
> and since then the patches have undergone many changes. Some of the
> notable discussion thread which I would like to point out is: [1]
> The previous version of this patch series which was floated is
> available at: [2].
>
> The following changes were made to that:
> * As it was observed in other submodule subcommand's ported function
>   that the number of params increased a lot, the variables quiet and
>   recursive, were replaced in the cb_foreach struct with a single
>   unsigned integer variable called flags.
>
> * To accommodate the possibility of a direct call to the functions
>   runcommand_in_submodule(), callback function
>   runcommand_in_submodule_cb() was introduced.
>
> [1]: https://public-inbox.org/git/20170419170513.16475-1-pc44...@gmail.com/T/#u
> [2]: https://public-inbox.org/git/20170807211900.15001-14-pc44...@gmail.com/
>
> As before you can find this series at:
> https://github.com/pratham-pc/git/commits/patch-series-3
>
> And its build report is available at:
> https://travis-ci.org/pratham-pc/git/builds/
> Branch: patch-series-3
> Build #202
>
> Prathamesh Chavan (5):
>   submodule foreach: correct '$path' in nested submodules from a
> subdirectory
>   submodule foreach: document '$sm_path' instead of '$path'
>   submodule foreach: clarify the '$toplevel' variable documentation
>   submodule foreach: document variable '$displaypath'
>   submodule: port submodule subcommand 'foreach' from shell to C
>
>  Documentation/git-submodule.txt |  15 ++--
>  builtin/submodule--helper.c | 151 
> 
>  git-submodule.sh|  40 +--
>  t/t7407-submodule-foreach.sh|  38 +-
>  4 files changed, 197 insertions(+), 47 deletions(-)

Thanks for bringing this series up again, my review still holds.

Thanks,
Stefan
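
The first change listed in the cover letter (folding "quiet" and "recursive"
into a single "flags" member) could look roughly like the sketch below. The
macro names, the reduced struct, and the helper function are assumptions made
for illustration; the actual patch may name and lay these out differently.

```c
/* Hypothetical flag bits; the names are not taken from the patch. */
#define OPT_QUIET     (1u << 0)
#define OPT_RECURSIVE (1u << 1)

/* Reduced stand-in for the callback struct mentioned in the cover letter. */
struct cb_foreach {
	unsigned int flags;	/* replaces separate quiet/recursive members */
};

/* Build the flags word from two booleans, as an option parser might. */
unsigned int foreach_flags(int quiet, int recursive)
{
	unsigned int flags = 0;

	if (quiet)
		flags |= OPT_QUIET;
	if (recursive)
		flags |= OPT_RECURSIVE;
	return flags;
}
```

A callee then tests "info->flags & OPT_QUIET" instead of taking one more
parameter per option, which keeps signatures stable as options are added.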

Re: [PATCH v5 5/7] convert: add 'working-tree-encoding' attribute

2018-01-30 Thread Lars Schneider

> On 30 Jan 2018, at 21:05, Junio C Hamano  wrote:
> 
> tbo...@web.de writes:
> 
>> +if ((conv_flags & CONV_WRITE_OBJECT) && !strcmp(enc->name, "SHIFT-JIS")) {
>> +char *re_src;
>> +int re_src_len;
> 
> I think it is a bad idea to 
> 
> (1) not check without CONV_WRITE_OBJECT here.

The idea is to perform the roundtrip check *only* if we 
actually write to Git. In all other cases we don't care
if the encoding roundtrips.

"git checkout" is such a case where we don't care as 
noted by Peff here:
https://public-inbox.org/git/20171215095838.ga3...@sigill.intra.peff.net/

Do you agree?

> (2) hardcode SJIS and do this always and to SJIS alone.
> 
> ...
> 
> For (2), perhaps introduce a multi-value configuration variable
> core.checkRoundtripEncoding, whose default value consists of just
> SJIS, but allow users to add or clear it?

Well, in that case I would make it simpler and make
core.checkRoundtripEncoding a boolean that applies to all encodings
if enabled. We could make it even simpler than that by removing the entire
roundtrip check. The thing is, I was not able to come up with a
sequence that would not generate an iconv error *and* not round trip.
Would that be ok for you to remove all that roundtrip checking code?

>> +re_src = reencode_string_len(dst, dst_len,
>> + enc->name, default_encoding,
>> + &re_src_len);
>> +
>> +if (!re_src || src_len != re_src_len ||
>> +memcmp(src, re_src, src_len)) {
>> +const char* msg = _("encoding '%s' from %s to %s and "
>> +"back is not the same");
>> +if (conv_flags & CONV_WRITE_OBJECT)
>> +die(msg, path, enc->name, default_encoding);
>> +else
>> +error(msg, path, enc->name, default_encoding);
> 
> The "error" side of this inner if() is dead code, I think.

Good catch. I think this code should go away if we keep the roundtrip
code and you agree with my statement above.

Thanks a lot for the review,
Lars

git add --all does not respect submodule.<name>.ignore

2018-01-30 Thread Michael Scott-Nelson

Giving a submodule "ignore=all" or "ignore=dirty" in .gitmodules
successfully removes that module from `git status` queries. However,
these same diffs are automatically added by git-add, eg `git add .` or
`git add --all` adds the submodules that we want ignored. This seems
inconsistent and confusing.

Workflow reason:
We want to be able to have supers and subs checked-out, make changes
to both, but only update the SHA-1 pointer from super to sub when
explicitly forced to do so, eg `git add -f subName`. This workflow
prevents continuous merge conflicts from clashing SHA-1 pointers while
still allowing `git add --all`, allowing a sort of middle ground
between submodules and an untracked library.

Teaching git-add about submodule.ignore and/or teaching .gitignore
about submodules would be awesome.

Also experimented with `git update-index --assume-unchanged subName`,
but I believe that it does not get committed and besides also does not
seem to have a way to `git add -f`.
---
Note: currently on git version 2.14.1, but looking through the
changelogs, did not see any changes since then that would enable this
workflow.

-Michael Scott-Nelson

Re: [PATCH 0/2] Add "git rebase --show-patch"

2018-01-30 Thread Junio C Hamano

Johannes Schindelin  writes:

> The pseudo ref certainly has an appeal. For people very familiar with
> Git's peculiarities such as FETCH_HEAD. Such as myself.
>
> For users, it is probably substantially worse an experience than having a
> cmdmode like --show-patch in the very command they were just running.

I tend to agree with that assessment.  FETCH_HEAD was a required
mechanism for commands in the toolchain to communicate and wasn't
meant as a mechanism for end-users.  I do not think it is a good 
idea to add to the pile of these special files the users would need
to look at, when we do not need to.  

Even if the internal implementation uses such a file, wrapping it
with a documented command mode would make a better UI.

Re: [PATCH v2 02/10] sequencer: introduce new commands to reset the revision

2018-01-30 Thread Stefan Beller

On Mon, Jan 29, 2018 at 2:54 PM, Johannes Schindelin
 wrote:
> In the upcoming commits, we will teach the sequencer to recreate merges.
> This will be done in a very different way from the unfortunate design of
> `git rebase --preserve-merges` (which does not allow for reordering
> commits, or changing the branch topology).
>
> The main idea is to introduce new todo list commands, to support
> labeling the current revision with a given name, resetting the current
> revision to a previous state, merging labeled revisions.
>
> This idea was developed in Git for Windows' Git garden shears (that are
> used to maintain the "thicket of branches" on top of upstream Git), and
> this patch is part of the effort to make it available to a wider
> audience, as well as to make the entire process more robust (by
> implementing it in a safe and portable language rather than a Unix shell
> script).
>
> This commit implements the commands to label, and to reset to, given
> revisions. The syntax is:
>
> label <name>
> reset <name>
>
> Internally, the `label <name>` command creates the ref
> `refs/rewritten/<name>`. This makes it possible to work with the labeled
> revisions interactively, or in a scripted fashion (e.g. via the todo
> list command `exec`).
>
> Later in this patch series, we will mark the `refs/rewritten/` refs as
> worktree-local, to allow for interactive rebases to be run in parallel in
> worktrees linked to the same repository.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  git-rebase--interactive.sh |   2 +
>  sequencer.c| 180 
> -
>  2 files changed, 179 insertions(+), 3 deletions(-)
>
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index fcedece1860..7e5281e74aa 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -162,6 +162,8 @@ s, squash  = use commit, but meld into previous 
> commit
>  f, fixup  = like \"squash\", but discard this commit's log message
>  x, exec  = run command (the rest of the line) using shell
>  d, drop  = remove commit
> +l, label  = label current HEAD with a name
> +t, reset  = reset HEAD to a label
>
>  These lines can be re-ordered; they are executed from top to bottom.
>  " | git stripspace --comment-lines >>"$todo"
> diff --git a/sequencer.c b/sequencer.c
> index 4d3f60594cb..92ca8d2adee 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -21,6 +21,8 @@
>  #include "log-tree.h"
>  #include "wt-status.h"
>  #include "hashmap.h"
> +#include "unpack-trees.h"
> +#include "worktree.h"
>
>  #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
>
> @@ -116,6 +118,13 @@ static GIT_PATH_FUNC(rebase_path_stopped_sha, 
> "rebase-merge/stopped-sha")
>  static GIT_PATH_FUNC(rebase_path_rewritten_list, 
> "rebase-merge/rewritten-list")
>  static GIT_PATH_FUNC(rebase_path_rewritten_pending,
> "rebase-merge/rewritten-pending")
> +
> +/*
> + * The path of the file listing refs that need to be deleted after the rebase
> + * finishes. This is used by the `merge` command.
> + */

So this file contains (label -> commit), which is appended in do_label,
it uses refs to store the commits in refs/rewritten.
We do not have to worry about the contents of that file getting too long,
or label re-use, because the directory containing all these helper files will
be deleted upon successful rebase in `sequencer_remove_state()`.



> +static GIT_PATH_FUNC(rebase_path_refs_to_delete, 
> "rebase-merge/refs-to-delete")
> +
>  /*
>   * The following files are written by git-rebase just after parsing the
>   * command-line (and are only consumed, not modified, by the sequencer).
> @@ -767,6 +776,8 @@ enum todo_command {
> TODO_SQUASH,
> /* commands that do something else than handling a single commit */
> TODO_EXEC,
> +   TODO_LABEL,
> +   TODO_RESET,
> /* commands that do nothing but are counted for reporting progress */
> TODO_NOOP,
> TODO_DROP,
> @@ -785,6 +796,8 @@ static struct {
> { 'f', "fixup" },
> { 's', "squash" },
> { 'x', "exec" },
> +   { 'l', "label" },
> +   { 't', "reset" },
> { 0,   "noop" },
> { 'd', "drop" },
> { 0,   NULL }
> @@ -1253,7 +1266,8 @@ static int parse_insn_line(struct todo_item *item, 
> const char *bol, char *eol)
> if (skip_prefix(bol, todo_command_info[i].str, &bol)) {
> item->command = i;
> break;
> -   } else if (bol[1] == ' ' && *bol == todo_command_info[i].c) {
> +   } else if ((bol + 1 == eol || bol[1] == ' ') &&
> +  *bol == todo_command_info[i].c) {
> bol++;
> item->command = i;
> break;
> @@ -1279,7 +1293,8 @@ static int parse_insn_line(struct todo_item *item, 
> const char *bol, char *eol)
> return
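
The parse_insn_line() hunk above lets a one-letter command stand alone at the
end of a line (bol + 1 == eol) as well as before a space. A simplified,
standalone sketch of that matching rule follows. The table is a hypothetical
subset, match_command() is a name invented here, and the full-name match uses
a plain strncmp() in place of skip_prefix().

```c
#include <stddef.h>
#include <string.h>

struct cmd_info {
	char c;			/* one-letter abbreviation, 0 if none */
	const char *str;	/* full command name */
};

/* Hypothetical subset of the todo command table. */
static const struct cmd_info cmds[] = {
	{ 'x', "exec" },
	{ 'l', "label" },
	{ 't', "reset" },
	{ 'd', "drop" },
};

/*
 * Return the index into cmds[] of the command that starts the line
 * [bol, eol), or -1 if none matches.  The abbreviation is accepted
 * when it is followed by a space or is the last character of the line.
 */
int match_command(const char *bol, const char *eol)
{
	size_t i;

	for (i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++) {
		size_t n = strlen(cmds[i].str);

		if ((size_t)(eol - bol) >= n && !strncmp(bol, cmds[i].str, n))
			return (int)i;
		if (bol < eol && *bol == cmds[i].c &&
		    (bol + 1 == eol || bol[1] == ' '))
			return (int)i;
	}
	return -1;
}
```

The order of the two tests inside the || matters: checking bol + 1 == eol
first avoids reading bol[1] past the end of the line.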

Re: [PATCH v5 5/7] convert: add 'working-tree-encoding' attribute

2018-01-30 Thread Junio C Hamano

tbo...@web.de writes:

> + if ((conv_flags & CONV_WRITE_OBJECT) && !strcmp(enc->name, "SHIFT-JIS")) {
> + char *re_src;
> + int re_src_len;

I think it is a bad idea to 

 (1) not check without CONV_WRITE_OBJECT here.
 (2) hardcode SJIS and do this always and to SJIS alone.

For (1), a fix would be obvious (and that will resurrect the dead
code below).

For (2), perhaps introduce a multi-value configuration variable
core.checkRoundtripEncoding, whose default value consists of just
SJIS, but allow users to add or clear it?

> + re_src = reencode_string_len(dst, dst_len,
> +  enc->name, default_encoding,
> +  &re_src_len);
> +
> + if (!re_src || src_len != re_src_len ||
> + memcmp(src, re_src, src_len)) {
> + const char* msg = _("encoding '%s' from %s to %s and "
> + "back is not the same");
> + if (conv_flags & CONV_WRITE_OBJECT)
> + die(msg, path, enc->name, default_encoding);
> + else
> + error(msg, path, enc->name, default_encoding);

The "error" side of this inner if() is dead code, I think.
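
One possible reading of suggestion (1), as a standalone sketch: run the
comparison regardless of the write flag and let the flag decide only how
severe a mismatch is. The enum, the function name, and the writing_object
parameter are assumptions for illustration. The real code would call die()
or error() and obtain re_src from reencode_string_len().

```c
#include <stddef.h>
#include <string.h>

enum roundtrip_result { ROUNDTRIP_OK, ROUNDTRIP_WARN, ROUNDTRIP_FATAL };

/*
 * src/src_len is the original buffer; re_src/re_src_len is the result of
 * converting to the storage encoding and back (NULL if that failed).
 * writing_object stands in for (conv_flags & CONV_WRITE_OBJECT).
 */
enum roundtrip_result check_roundtrip(const char *src, size_t src_len,
				      const char *re_src, size_t re_src_len,
				      int writing_object)
{
	if (re_src && src_len == re_src_len && !memcmp(src, re_src, src_len))
		return ROUNDTRIP_OK;

	/* mismatch: fatal when writing an object, otherwise only a warning */
	return writing_object ? ROUNDTRIP_FATAL : ROUNDTRIP_WARN;
}
```

With the flag out of the outer condition, the warning branch is reachable
again, which is the "resurrect the dead code" effect mentioned above.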
