[PATCH] mergetool: don't suggest to continue after last file

2018-08-12 Thread Nicholas Guriev
This eliminates an unnecessary prompt to continue after failed merger.
The patch uses positional parameters to count files in the list. If only
one iteration is remained, the prompt_after_failed_merge function is not
called.

Signed-off-by: Nicholas Guriev 
---
 git-mergetool.sh | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/git-mergetool.sh b/git-mergetool.sh
index c062e3d..d07c7f3 100755
--- a/git-mergetool.sh
+++ b/git-mergetool.sh
@@ -491,14 +491,16 @@ main () {
printf "%s\n" "$files"
 
rc=0
-   for i in $files
+   set -- $files
+   while test $# -ne 0
do
printf "\n"
-   if ! merge_file "$i"
+   if ! merge_file "$1"
then
rc=1
-   prompt_after_failed_merge || exit 1
+   test $# -ne 1 && prompt_after_failed_merge || exit 1
fi
+   shift
done
 
exit $rc
-- 
2.7.4



[PATCH v2] status: -i shorthand for --ignored command line option

2018-08-12 Thread Коля Гурьев
09.08.2018 18:44, Junio C Hamano пишет:
> Unlike "-u', "-i" is supported by "git commit" which shares the
> underlying implementation, and that is the historical reason why we
> never had "-i" shorthand, I think.  

git-commit supports the -u flag, its meaning is the same as for
git-status. Although the -i flag might be confused with the --include
option of git-commit, I suggest this shortening based on first letter of
the --ignored option because git-status and git-commit are different
commands and it's more obvious shortening.

> While I do understand why sometimes "-u" is useful, especially
> "-uno", to help those whose .gitignore is not managed as well as it
> should be, I am not sure why a "typical git-status" invocation would
> ask for "--ignored" that often to require such a short-hand.

The --ignored option is often used for opposing purposes, to show all
changes in working directory regardless of .gitignore files which may be
written sloppy. I've discovered that I type this option quite
frequently, and I hope my case may help others.





Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-12 Thread Christian Couder
On Mon, Aug 13, 2018 at 3:14 AM, Ramsay Jones
 wrote:
> On 12/08/18 06:11, Christian Couder wrote:

>> Because the inefficiency primarily arises when an
>> object is delitified against another object that does
>
> s/delitified/deltified/ ?

Ok, this will be in the next reroll if any.

>> not exist in the same fork, we partition objects into
>> sets that appear in the same fork, and define
>> "delta islands". When finding delta base, we do not
>> allow an object outside the same island to be
>> considered as its base.

>> --- /dev/null
>> +++ b/delta-islands.c
>> @@ -0,0 +1,493 @@
>> +#include "cache.h"
>> +#include "attr.h"
>> +#include "object.h"
>> +#include "blob.h"
>> +#include "commit.h"
>> +#include "tag.h"
>> +#include "tree.h"
>> +#include "delta.h"
>> +#include "pack.h"
>> +#include "tree-walk.h"
>> +#include "diff.h"
>> +#include "revision.h"
>> +#include "list-objects.h"
>> +#include "progress.h"
>> +#include "refs.h"
>> +#include "khash.h"
>
> I was wondering how many copies of the inline functions
> introduced by this header we had, so:
>
>   $ nm git | grep ' t ' | cut -d' ' -f3 | sort | uniq -c | sort -rn | grep kh_
> 3 kh_resize_sha1
> 3 kh_put_sha1
> 3 kh_init_sha1
> 3 kh_get_sha1
> 1 kh_resize_str
> 1 kh_resize_sha1_pos
> 1 kh_put_str
> 1 kh_put_sha1_pos
> 1 kh_init_str
> 1 kh_init_sha1_pos
> 1 kh_get_str
> 1 kh_get_sha1_pos
> 1 kh_destroy_sha1
>   $
>
> Looking at the individual object files, we see:
>
>   $ nm pack-bitmap-write.o | grep ' t ' | grep kh_
>   01cc t kh_get_sha1
>   01b7 t kh_init_sha1
>   085e t kh_put_sha1
>   0310 t kh_resize_sha1
>   $
>
> So, the two instances of the sha1 hash-map are never
> destroyed (kh_destroy_sha1 is not present in the object
> file).

This is interesting (even though it seems related to more code than
the current patch series).

As those hash maps are in 'struct bitmap_writer' and a static instance is used:

  static struct bitmap_writer writer;

it maybe ok.

>   $ nm pack-bitmap.o | grep ' t ' | grep kh_
>   02d9 t kh_destroy_sha1
>   032b t kh_get_sha1
>   0daa t kh_get_sha1_pos
>   02c4 t kh_init_sha1
>   0d95 t kh_init_sha1_pos
>   09bd t kh_put_sha1
>   1432 t kh_put_sha1_pos
>   046f t kh_resize_sha1
>   0eee t kh_resize_sha1_pos
>   $
>
> The sha1_pos hash-map is not destroyed here.

Yeah, maybe a line like:

kh_destroy_pos(b->ext_index.positions);

is missing from free_bitmap_index()?

Adding that should be in a separate patch from this series though.

>   $ nm delta-islands.o | grep ' t ' | grep kh_
>   02be t kh_get_sha1
>   0e52 t kh_get_str
>   02a9 t kh_init_sha1
>   0e3d t kh_init_str
>   0950 t kh_put_sha1
>   14e4 t kh_put_str
>   0402 t kh_resize_sha1
>   0f96 t kh_resize_str
>   $
>
> And neither the sha1 or str hash-maps are destroyed here.
> (That is not necessarily a problem, of course! ;-) )

The instances are declared as static:

  static khash_sha1 *island_marks;

  static kh_str_t *remote_islands;

so it maybe ok.

>> +struct island_bitmap {
>> + uint32_t refcount;
>> + uint32_t bits[];
>
> Use FLEX_ARRAY here? We are slowly moving toward requiring
> certain C99 features, but I can't remember a flex array
> weather-balloon patch.

This was already discussed by Junio and Peff there:

https://public-inbox.org/git/20180727130229.gb18...@sigill.intra.peff.net/

>> +};

>> +int in_same_island(const struct object_id *trg_oid, const struct object_id 
>> *src_oid)
>
> Hmm, what does the trg_ prefix stand for?
>
>> +{
>> + khiter_t trg_pos, src_pos;
>> +
>> + /* If we aren't using islands, assume everything goes together. */
>> + if (!island_marks)
>> + return 1;
>> +
>> + /*
>> +  * If we don't have a bitmap for the target, we can delta it
>
> ... Ah, OK, trg_ => target.

I am ok to replace "trg" with "target" (or maybe "dst"? or something
else) and "src" with "source" if you think it would make things
clearer.

>> +static void add_ref_to_island(const char *island_name, const struct 
>> object_id *oid)
>> +{
>> + uint64_t sha_core;
>> + struct remote_island *rl = NULL;
>> +
>> + int hash_ret;
>> + khiter_t pos = kh_put_str(remote_islands, island_name, _ret);
>> +
>> + if (hash_ret) {
>> + kh_key(remote_islands, pos) = xstrdup(island_name);
>> + kh_value(remote_islands, pos) = xcalloc(1, sizeof(struct 
>> remote_island));
>> + }
>> +
>> + rl = kh_value(remote_islands, pos);
>> + oid_array_append(>oids, oid);
>> +
>> + memcpy(_core, oid->hash, sizeof(uint64_t));
>> + rl->hash += sha_core;
>
> Hmm, so the first 64-bits of the oid of each ref that is part of
> this island is 

Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-12 Thread Ramsay Jones



On 12/08/18 06:11, Christian Couder wrote:
> From: Jeff King 
> 
> Hosting providers that allow users to "fork" existing
> repos want those forks to share as much disk space as
> possible.
> 
> Alternates are an existing solution to keep all the
> objects from all the forks into a unique central repo,
> but this can have some drawbacks. Especially when
> packing the central repo, deltas will be created
> between objects from different forks.
> 
> This can make cloning or fetching a fork much slower
> and much more CPU intensive as Git might have to
> compute new deltas for many objects to avoid sending
> objects from a different fork.
> 
> Because the inefficiency primarily arises when an
> object is delitified against another object that does

s/delitified/deltified/ ?

> not exist in the same fork, we partition objects into
> sets that appear in the same fork, and define
> "delta islands". When finding delta base, we do not
> allow an object outside the same island to be
> considered as its base.
> 
> So "delta islands" is a way to store objects from
> different forks in the same repo and packfile without
> having deltas between objects from different forks.
> 
> This patch implements the delta islands mechanism in
> "delta-islands.{c,h}", but does not yet make use of it.
> 
> A few new fields are added in 'struct object_entry'
> in "pack-objects.h" though.
> 
> The documentation will follow in a patch that actually
> uses delta islands in "builtin/pack-objects.c".
> 
> Signed-off-by: Jeff King 
> Signed-off-by: Christian Couder 
> ---
>  Makefile|   1 +
>  delta-islands.c | 493 
>  delta-islands.h |  11 ++
>  pack-objects.h  |   4 +
>  4 files changed, 509 insertions(+)
>  create mode 100644 delta-islands.c
>  create mode 100644 delta-islands.h
> 
> diff --git a/Makefile b/Makefile
> index bc4fc8eeab..e7994888e8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -841,6 +841,7 @@ LIB_OBJS += csum-file.o
>  LIB_OBJS += ctype.o
>  LIB_OBJS += date.o
>  LIB_OBJS += decorate.o
> +LIB_OBJS += delta-islands.o
>  LIB_OBJS += diffcore-break.o
>  LIB_OBJS += diffcore-delta.o
>  LIB_OBJS += diffcore-order.o
> diff --git a/delta-islands.c b/delta-islands.c
> new file mode 100644
> index 00..c0b0da6837
> --- /dev/null
> +++ b/delta-islands.c
> @@ -0,0 +1,493 @@
> +#include "cache.h"
> +#include "attr.h"
> +#include "object.h"
> +#include "blob.h"
> +#include "commit.h"
> +#include "tag.h"
> +#include "tree.h"
> +#include "delta.h"
> +#include "pack.h"
> +#include "tree-walk.h"
> +#include "diff.h"
> +#include "revision.h"
> +#include "list-objects.h"
> +#include "progress.h"
> +#include "refs.h"
> +#include "khash.h"

I was wondering how many copies of the inline functions
introduced by this header we had, so:

  $ nm git | grep ' t ' | cut -d' ' -f3 | sort | uniq -c | sort -rn | grep kh_
3 kh_resize_sha1
3 kh_put_sha1
3 kh_init_sha1
3 kh_get_sha1
1 kh_resize_str
1 kh_resize_sha1_pos
1 kh_put_str
1 kh_put_sha1_pos
1 kh_init_str
1 kh_init_sha1_pos
1 kh_get_str
1 kh_get_sha1_pos
1 kh_destroy_sha1
  $ 

Looking at the individual object files, we see:

  $ nm pack-bitmap-write.o | grep ' t ' | grep kh_
  01cc t kh_get_sha1
  01b7 t kh_init_sha1
  085e t kh_put_sha1
  0310 t kh_resize_sha1
  $ 

So, the two instances of the sha1 hash-map are never
destroyed (kh_destroy_sha1 is not present in the object
file).

  $ nm pack-bitmap.o | grep ' t ' | grep kh_
  02d9 t kh_destroy_sha1
  032b t kh_get_sha1
  0daa t kh_get_sha1_pos
  02c4 t kh_init_sha1
  0d95 t kh_init_sha1_pos
  09bd t kh_put_sha1
  1432 t kh_put_sha1_pos
  046f t kh_resize_sha1
  0eee t kh_resize_sha1_pos
  $ 

The sha1_pos hash-map is not destroyed here.

  $ nm delta-islands.o | grep ' t ' | grep kh_
  02be t kh_get_sha1
  0e52 t kh_get_str
  02a9 t kh_init_sha1
  0e3d t kh_init_str
  0950 t kh_put_sha1
  14e4 t kh_put_str
  0402 t kh_resize_sha1
  0f96 t kh_resize_str
  $ 

And neither the sha1 or str hash-maps are destroyed here.
(That is not necessarily a problem, of course! ;-) )
   
> +#include "pack-bitmap.h"
> +#include "pack-objects.h"
> +#include "delta-islands.h"
> +#include "sha1-array.h"
> +#include "config.h"
> +
> +KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
> +
> +static khash_sha1 *island_marks;
> +static unsigned island_counter;
> +static unsigned island_counter_core;
> +
> +static kh_str_t *remote_islands;
> +
> +struct remote_island {
> + uint64_t hash;
> + struct oid_array oids;
> +};
> +
> +struct island_bitmap {
> + uint32_t refcount;
> + uint32_t bits[];

Use FLEX_ARRAY 

[PATCH] t5318: avoid unnecessary command substitutions

2018-08-12 Thread SZEDER Gábor
Two tests added in dade47c06c (commit-graph: add repo arg to graph
readers, 2018-07-11) prepare the contents of 'expect' files by
'echo'ing the results of command substitutions.  That's unncessary,
avoid them by directly saving the output of the commands executed in
those command substitutions.

Signed-off-by: SZEDER Gábor 
---
 t/t5318-commit-graph.sh | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 1c148ebf21..3c1ffad491 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -444,25 +444,27 @@ test_expect_success 'setup non-the_repository tests' '
 test_expect_success 'parse_commit_in_graph works for non-the_repository' '
test-tool repository parse_commit_in_graph \
repo/.git repo "$(git -C repo rev-parse two)" >actual &&
-   echo $(git -C repo log --pretty="%ct" -1) \
-   $(git -C repo rev-parse one) >expect &&
+   {
+   git -C repo log --pretty=format:"%ct " -1 &&
+   git -C repo rev-parse one
+   } >expect &&
test_cmp expect actual &&
 
test-tool repository parse_commit_in_graph \
repo/.git repo "$(git -C repo rev-parse one)" >actual &&
-   echo $(git -C repo log --pretty="%ct" -1 one) >expect &&
+   git -C repo log --pretty="%ct" -1 one >expect &&
test_cmp expect actual
 '
 
 test_expect_success 'get_commit_tree_in_graph works for non-the_repository' '
test-tool repository get_commit_tree_in_graph \
repo/.git repo "$(git -C repo rev-parse two)" >actual &&
-   echo $(git -C repo rev-parse two^{tree}) >expect &&
+   git -C repo rev-parse two^{tree} >expect &&
test_cmp expect actual &&
 
test-tool repository get_commit_tree_in_graph \
repo/.git repo "$(git -C repo rev-parse one)" >actual &&
-   echo $(git -C repo rev-parse one^{tree}) >expect &&
+   git -C repo rev-parse one^{tree} >expect &&
test_cmp expect actual
 '
 
-- 
2.18.0.408.g42635c01bc



Re: [PATCH v5 05/21] range-diff: also show the diff between patches

2018-08-12 Thread Thomas Gummerer
Hi Dscho,

On 08/10, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin 
>
> [..]
> 
> @@ -13,15 +14,38 @@ NULL
>  int cmd_range_diff(int argc, const char **argv, const char *prefix)
>  {
>   int creation_factor = 60;
> + struct diff_options diffopt = { NULL };
>   struct option options[] = {
>   OPT_INTEGER(0, "creation-factor", _factor,
>   N_("Percentage by which creation is weighted")),
>   OPT_END()
>   };
> - int res = 0;
> + int i, j, res = 0;
>   struct strbuf range1 = STRBUF_INIT, range2 = STRBUF_INIT;
>  
> + git_config(git_diff_ui_config, NULL);
> +
> + diff_setup();
> + diffopt.output_format = DIFF_FORMAT_PATCH;
> +
>   argc = parse_options(argc, argv, NULL, options,
> +  builtin_range_diff_usage, PARSE_OPT_KEEP_UNKNOWN |
> +  PARSE_OPT_KEEP_DASHDASH | PARSE_OPT_KEEP_ARGV0);
> +
> + for (i = j = 1; i < argc && strcmp("--", argv[i]); ) {
> + int c = diff_opt_parse(, argv + i, argc - i, prefix);
> +
> + if (!c)
> + argv[j++] = argv[i++];
> + else
> + i += c;
> + }

I don't think this handles "--" quite as would be expected.  Trying to
use "git range-diff -- js/range-diff-v4...HEAD" I get:

$ ./git range-diff -- js/range-diff-v4...HEAD
error: need two commit ranges
usage: git range-diff [] .. 
..
   or: git range-diff [] ...
   or: git range-diff []   

--creation-factor 
  Percentage by which creation is weighted
--no-dual-color   color both diff and diff-between-diffs

while what I would have expected is to actually get a range diff.
This happens because after we break out of the loop we don't add the
actual ranges to argv, but just skip them instead.

I think something like the following should be squashed in to this
patch.

--->8---
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index ef3ba22e29..132574c57a 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -53,6 +53,11 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
else
i += c;
}
+   if (i < argc && !strcmp("--", argv[i])) {
+   i++; j++;
+   while (i < argc)
+   argv[j++] = argv[i++];
+   }
argc = j;
diff_setup_done();
 
--->8---

> + argc = j;
> + diff_setup_done();
> +
> + /* Make sure that there are no unparsed options */
> + argc = parse_options(argc, argv, NULL,
> +  options + ARRAY_SIZE(options) - 1, /* OPT_END */
>builtin_range_diff_usage, 0);
>  
>   if (argc == 2) {
> @@ -59,7 +83,8 @@ int cmd_range_diff(int argc, const char **argv, const char 
> *prefix)
>   usage_with_options(builtin_range_diff_usage, options);
>   }
>  
> - res = show_range_diff(range1.buf, range2.buf, creation_factor);
> + res = show_range_diff(range1.buf, range2.buf, creation_factor,
> +   );
>  
>   strbuf_release();
>   strbuf_release();
> diff --git a/range-diff.c b/range-diff.c
> index 2d94200d3..71883a4b7 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -6,6 +6,7 @@
>  #include "hashmap.h"
>  #include "xdiff-interface.h"
>  #include "linear-assignment.h"
> +#include "diffcore.h"
>  
>  struct patch_util {
>   /* For the search for an exact match */
> @@ -258,7 +259,31 @@ static const char *short_oid(struct patch_util *util)
>   return find_unique_abbrev(>oid, DEFAULT_ABBREV);
>  }
>  
> -static void output(struct string_list *a, struct string_list *b)
> +static struct diff_filespec *get_filespec(const char *name, const char *p)
> +{
> + struct diff_filespec *spec = alloc_filespec(name);
> +
> + fill_filespec(spec, _oid, 0, 0644);
> + spec->data = (char *)p;
> + spec->size = strlen(p);
> + spec->should_munmap = 0;
> + spec->is_stdin = 1;
> +
> + return spec;
> +}
> +
> +static void patch_diff(const char *a, const char *b,
> +   struct diff_options *diffopt)
> +{
> + diff_queue(_queued_diff,
> +get_filespec("a", a), get_filespec("b", b));
> +
> + diffcore_std(diffopt);
> + diff_flush(diffopt);
> +}
> +
> +static void output(struct string_list *a, struct string_list *b,
> +struct diff_options *diffopt)
>  {
>   int i = 0, j = 0;
>  
> @@ -300,6 +325,9 @@ static void output(struct string_list *a, struct 
> string_list *b)
>   printf("%d: %s ! %d: %s\n",
>  b_util->matching + 1, short_oid(a_util),
>  j + 1, short_oid(b_util));
> + if (!(diffopt->output_format & DIFF_FORMAT_NO_OUTPUT))
> + patch_diff(a->items[b_util->matching].string,

[PATCH] t5318: use 'test_cmp_bin' to compare commit-graph files

2018-08-12 Thread SZEDER Gábor
The commit-graph files are binary files, so they should not be
compared with 'test_cmp', because that might cause issues on Windows,
where 'test_cmp' is a shell function to deal with random LF-CRLF
conversions.

Use 'test_cmp_bin' instead.

Signed-off-by: SZEDER Gábor 
---
 t/t5318-commit-graph.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4f17d7701e..1c148ebf21 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -254,9 +254,9 @@ test_expect_success 'check that gc computes commit-graph' '
git config gc.writeCommitGraph true &&
git gc &&
cp $objdir/info/commit-graph commit-graph-after-gc &&
-   ! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+   ! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
git commit-graph write --reachable &&
-   test_cmp commit-graph-after-gc $objdir/info/commit-graph
+   test_cmp_bin commit-graph-after-gc $objdir/info/commit-graph
 '
 
 # the verify tests below expect the commit-graph to contain
-- 
2.18.0.408.g42635c01bc



Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-12 Thread Thomas Adam
On Sun, 12 Aug 2018 at 09:19, Nguyễn Thái Ngọc Duy  wrote:

Hi,

> +   trace_performance_leave("cache_tree_update");

I would suggest trace_performance_leave() calls use __func__ instead.
That way, there's no ambiguity if the function name ever changes.

Kindly,
Thomas


Re: Help with "fatal: unable to read ...." error during GC?

2018-08-12 Thread Duy Nguyen
On Sat, Aug 11, 2018 at 4:25 PM Jeff King  wrote:
> > I do still have these warnings and no amount of git gc/git fsck/etc.
> > has reduced them in any way:
> >
> > $ git gc
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
>
> I think these would go away via "reflog expire" (I'd have thought "git
> gc" would do so, though). I wonder if this is yet another tool that
> needs to be taught about worktree heads.

You would need "reflog expire --expire-unreachable=now" because the
default 30 days are probably too long for this case. And yes "reflog
expire --all" needs to be aware of other heads (and other per-worktree
refs in general). I'm pretty sure right now it only cares about the
current worktree's head.
-- 
Duy


[PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-12 Thread Nguyễn Thái Ngọc Duy
Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

This patch is tested with vim-colorschemes repository on a JFS partition
with case insensitive support on Linux. This repository has two files
darkBlue.vim and darkblue.vim.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 v4 removes nr_duplicates (and fixes that false warning Szeder
 reported). It also hints about case insensitivity as a cause of
 problem because it's most likely the case when this warning shows up.

 builtin/clone.c  |  1 +
 cache.h  |  1 +
 entry.c  | 28 
 t/t5601-clone.sh |  8 +++-
 unpack-trees.c   | 28 
 unpack-trees.h   |  1 +
 6 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
memset(, 0, sizeof opts);
opts.update = 1;
opts.merge = 1;
+   opts.clone = 1;
opts.fn = oneway_merge;
opts.verbose_update = (option_verbosity >= 0);
opts.src_index = _index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
unsigned force:1,
 quiet:1,
 not_new:1,
+clone:1,
 refresh_cache:1;
 };
 #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..c70340df8e 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct 
stat *st, int skiplen)
return lstat(path, st);
 }
 
+static void mark_colliding_entries(const struct checkout *state,
+  struct cache_entry *ce, struct stat *st)
+{
+   int i;
+
+   ce->ce_flags |= CE_MATCHED;
+
+#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
+   for (i = 0; i < state->istate->cache_nr; i++) {
+   struct cache_entry *dup = state->istate->cache[i];
+
+   if (dup == ce)
+   break;
+
+   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+   continue;
+
+   if (dup->ce_stat_data.sd_ino == st->st_ino) {
+   dup->ce_flags |= CE_MATCHED;
+   break;
+   }
+   }
+#endif
+}
+
 /*
  * Write the contents from ce out to the working tree.
  *
@@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce,
return -1;
}
 
+   if (state->clone)
+   mark_colliding_entries(state, ce, );
+
/*
 * We unlink the old file, to get the new one with the
 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
git hash-object -w -t tree --stdin) &&
c=$(git commit-tree -m bogus $t) &&
git update-ref refs/heads/bogus $c &&
-   git clone -b bogus . bogus
+   git clone -b bogus . bogus 2>warning
)
 '
 
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
detection' '
+   grep X icasefs/warning &&
+   grep x icasefs/warning &&
+   test_i18ngrep "the following paths have collided" icasefs/warning
+'
+
 partial_clone () {
   SERVER="$1" &&
   URL="$2" &&
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..443df048ef 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o)
state.refresh_cache = 1;
state.istate = index;
 
+   if (o->clone) {
+   state.clone = 1;
+   for (i = 0; i < index->cache_nr; i++)
+   index->cache[i]->ce_flags &= ~CE_MATCHED;
+   }
+
progress = get_progress(o);
 
if (o->update)
@@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o)
errs |= finish_delayed_checkout();
if (o->update)
git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+
+   if 

[PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-12 Thread Nguyễn Thái Ngọc Duy
We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on webkit.git, 275k
files on worktree)

performance: 0.056651714 s:  read cache .git/index
performance: 0.183101080 s:  preload index
performance: 0.008584433 s:  refresh index
performance: 0.633767589 s:   traverse_trees
performance: 0.340265448 s:   check_updates
performance: 0.381884638 s:   cache_tree_update
performance: 1.401562947 s:  unpack_trees
performance: 0.338687914 s:  write index, changed mask = 2e
performance: 0.411927922 s:traverse_trees
performance: 0.23335 s:check_updates
performance: 0.423697246 s:   unpack_trees
performance: 0.423708360 s:  diff-index
performance: 2.559524127 s: git command: git checkout -

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 cache-tree.c   | 2 ++
 unpack-trees.c | 9 -
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..105f13806f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -433,7 +433,9 @@ int cache_tree_update(struct index_state *istate, int flags)
 
if (i)
return i;
+   trace_performance_enter();
i = update_one(it, cache, entries, "", 0, , flags);
+   trace_performance_leave("cache_tree_update");
if (i < 0)
return i;
istate->cache_changed |= CACHE_TREE_CHANGED;
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..b237eaa0f2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -354,6 +354,7 @@ static int check_updates(struct unpack_trees_options *o)
struct checkout state = CHECKOUT_INIT;
int i;
 
+   trace_performance_enter();
state.force = 1;
state.quiet = 1;
state.refresh_cache = 1;
@@ -423,6 +424,7 @@ static int check_updates(struct unpack_trees_options *o)
errs |= finish_delayed_checkout();
if (o->update)
git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+   trace_performance_leave("check_updates");
return errs != 0;
 }
 
@@ -1279,6 +1281,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
if (len > MAX_UNPACK_TREES)
die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 
+   trace_performance_enter();
memset(, 0, sizeof(el));
if (!core_apply_sparse_checkout || !o->update)
o->skip_sparse_checkout = 1;
@@ -1351,7 +1354,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
}
}
 
-   if (traverse_trees(len, t, ) < 0)
+   trace_performance_enter();
+   ret = traverse_trees(len, t, );
+   trace_performance_leave("traverse_trees");
+   if (ret < 0)
goto return_failed;
}
 
@@ -1443,6 +1449,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
o->src_index = NULL;
 
 done:
+   trace_performance_leave("unpack_trees");
clear_exclude_list();
return ret;
 
-- 
2.18.0.1004.g6639190530



[PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk

2018-08-12 Thread Nguyễn Thái Ngọc Duy
This is a micro optimization that probably only shines on repos with
deep directory structure. Instead of allocating and freeing a new
cache_entry in every iteration, we reuse the last one and only update
the parts that are new each iteration.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.c | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 07456d0fb2..6deb04c163 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -694,6 +694,8 @@ static int traverse_by_cache_tree(int pos, int nr_entries, 
int nr_names,
 {
struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
struct unpack_trees_options *o = info->data;
+   struct cache_entry *tree_ce = NULL;
+   int ce_len = 0;
int i, d;
 
if (!o->merge)
@@ -708,30 +710,39 @@ static int traverse_by_cache_tree(int pos, int 
nr_entries, int nr_names,
 * get here in the first place.
 */
for (i = 0; i < nr_entries; i++) {
-   struct cache_entry *tree_ce;
-   int len, rc;
+   int new_ce_len, len, rc;
 
src[0] = o->src_index->cache[pos + i];
 
len = ce_namelen(src[0]);
-   tree_ce = xcalloc(1, cache_entry_size(len));
+   new_ce_len = cache_entry_size(len);
+
+   if (new_ce_len > ce_len) {
+   new_ce_len <<= 1;
+   tree_ce = xrealloc(tree_ce, new_ce_len);
+   memset(tree_ce, 0, new_ce_len);
+   ce_len = new_ce_len;
+
+   tree_ce->ce_flags = create_ce_flags(0);
+
+   for (d = 1; d <= nr_names; d++)
+   src[d] = tree_ce;
+   }
 
tree_ce->ce_mode = src[0]->ce_mode;
-   tree_ce->ce_flags = create_ce_flags(0);
tree_ce->ce_namelen = len;
oidcpy(_ce->oid, [0]->oid);
memcpy(tree_ce->name, src[0]->name, len + 1);
 
-   for (d = 1; d <= nr_names; d++)
-   src[d] = tree_ce;
-
rc = call_unpack_fn((const struct cache_entry * const *)src, o);
-   free(tree_ce);
-   if (rc < 0)
+   if (rc < 0) {
+   free(tree_ce);
return rc;
+   }
 
mark_ce_used(src[0], o);
}
+   free(tree_ce);
if (o->debug_unpack)
printf("Unpacked %d entries from %s to %s using cache-tree\n",
   nr_entries,
-- 
2.18.0.1004.g6639190530



[PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index

2018-08-12 Thread Nguyễn Thái Ngọc Duy
We do n-way merge by walking the source index and n trees at the same
time and add merge results to a new temporary index called o->result.
The merge result for any given path could be either

- keep_entry(): same old index entry in o->src_index is reused
- merged_entry(): either a new entry is added, or an existing one updated
- deleted_entry(): one entry from o->src_index is removed

For some reason [1] we keep making sure that the source index's
cache-tree is still valid if used by o->result: for all those
merged/deleted entries, we invalidate the same path in o->src_index,
so only cache-trees covering the "keep_entry" parts remain good.

Because of this, the cache-tree from o->src_index can be perfectly
reused in o->result. And in fact we already rely on this logic to
reuse untracked cache in edf3b90553 (unpack-trees: preserve index
extensions - 2017-05-08). Move the cache-tree to o->result before
doing cache_tree_update() to reduce hashing cost.

Since cache_tree_update() has risen up as one of the most expensive
parts in unpack_trees() after the last few patches. This does help
reduce unpack_trees() time significantly (on webkit.git):

before   after
  
0.080394752  0.051258167 s:  read cache .git/index
0.216010838  0.212106298 s:  preload index
0.008534301  0.280521764 s:  refresh index
0.251992198  0.218160442 s:   traverse_trees
0.377031383  0.374948191 s:   check_updates
0.372768105  0.037040114 s:   cache_tree_update
1.045887251  0.672031609 s:  unpack_trees
0.314983512  0.317456290 s:  write index, changed mask = 2e
0.062572653  0.038382654 s:traverse_trees
0.22544  0.42731 s:check_updates
0.073795585  0.050930053 s:   unpack_trees
0.073807557  0.051099735 s:  diff-index
1.938191592  1.614241153 s: git command: git checkout -

[1] I'm pretty sure the reason is an oversight in 34110cd4e3 (Make
'unpack_trees()' have a separate source and destination index -
2008-03-06). That patch aims to _not_ update the source index at
all. The invalidation should have been done on o->result in that
patch. But then there was no cache-tree on o->result even then so
it's pointless to do so.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 read-cache.c   | 2 ++
 unpack-trees.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 4fd35f4f37..2b5646ef26 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2763,4 +2763,6 @@ void move_index_extensions(struct index_state *dst, 
struct index_state *src)
 {
dst->untracked = src->untracked;
src->untracked = NULL;
+   dst->cache_tree = src->cache_tree;
+   src->cache_tree = NULL;
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 6deb04c163..d822662c75 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1570,6 +1570,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
 
ret = check_updates(o) ? (-2) : 0;
if (o->dst_index) {
+   move_index_extensions(>result, o->src_index);
if (!ret) {
if (!o->result.cache_tree)
o->result.cache_tree = cache_tree();
@@ -1578,7 +1579,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
  WRITE_TREE_SILENT |
  WRITE_TREE_REPAIR);
}
-   move_index_extensions(>result, o->src_index);
discard_index(o->dst_index);
*o->dst_index = o->result;
} else {
-- 
2.18.0.1004.g6639190530



[PATCH v4 1/5] trace.h: support nested performance tracing

2018-08-12 Thread Nguyễn Thái Ngọc Duy
Performance measurements are listed right now as a flat list, which is
fine when we measure big blocks. But when we start adding more and
more measurements, some of them could be just part of a bigger
measurement and a flat list gives a wrong impression that they are
executed at the same level instead of nested.

Add trace_performance_enter() and trace_performance_leave() to allow
indent these nested measurements. For now it does not help much
because the only nested thing is (lazy) name hash initialization
(e.g. called in diff-index from "git status"). This will help more
because I'm going to add some more tracing that's actually nested.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 diff-lib.c  |  4 +--
 dir.c   |  4 +--
 name-hash.c |  4 +--
 preload-index.c |  4 +--
 read-cache.c| 11 
 trace.c | 69 -
 trace.h | 15 +++
 7 files changed, 92 insertions(+), 19 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index a9f38eb5a3..1ffa22c882 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
struct object_array_entry *ent;
-   uint64_t start = getnanotime();
 
+   trace_performance_enter();
ent = revs->pending.objects;
if (diff_cache(revs, >item->oid, ent->name, cached))
exit(128);
@@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached)
diffcore_fix_diff_index(>diffopt);
diffcore_std(>diffopt);
diff_flush(>diffopt);
-   trace_performance_since(start, "diff-index");
+   trace_performance_leave("diff-index");
return 0;
 }
 
diff --git a/dir.c b/dir.c
index 21e6f2520a..c5e9fc8cea 100644
--- a/dir.c
+++ b/dir.c
@@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct 
index_state *istate,
   const char *path, int len, const struct pathspec *pathspec)
 {
struct untracked_cache_dir *untracked;
-   uint64_t start = getnanotime();
 
if (has_symlink_leading_path(path, len))
return dir->nr;
 
+   trace_performance_enter();
untracked = validate_untracked_cache(dir, len, pathspec);
if (!untracked)
/*
@@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct 
index_state *istate,
dir->nr = i;
}
 
-   trace_performance_since(start, "read directory %.*s", len, path);
+   trace_performance_leave("read directory %.*s", len, path);
if (dir->untracked) {
static int force_untracked_cache = -1;
static struct trace_key trace_untracked_stats = 
TRACE_KEY_INIT(UNTRACKED_STATS);
diff --git a/name-hash.c b/name-hash.c
index 163849831c..1fcda73cb3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -578,10 +578,10 @@ static void threaded_lazy_init_name_hash(
 
 static void lazy_init_name_hash(struct index_state *istate)
 {
-   uint64_t start = getnanotime();
 
if (istate->name_hash_initialized)
return;
+   trace_performance_enter();
hashmap_init(>name_hash, cache_entry_cmp, NULL, 
istate->cache_nr);
hashmap_init(>dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
 
@@ -602,7 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate)
}
 
istate->name_hash_initialized = 1;
-   trace_performance_since(start, "initialize name hash");
+   trace_performance_leave("initialize name hash");
 }
 
 /*
diff --git a/preload-index.c b/preload-index.c
index 4d08d44874..d7f7919ba2 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -78,7 +78,6 @@ static void preload_index(struct index_state *index,
 {
int threads, i, work, offset;
struct thread_data data[MAX_PARALLEL];
-   uint64_t start = getnanotime();
 
if (!core_preload_index)
return;
@@ -88,6 +87,7 @@ static void preload_index(struct index_state *index,
threads = 2;
if (threads < 2)
return;
+   trace_performance_enter();
if (threads > MAX_PARALLEL)
threads = MAX_PARALLEL;
offset = 0;
@@ -109,7 +109,7 @@ static void preload_index(struct index_state *index,
if (pthread_join(p->pthread, NULL))
die("unable to join threaded lstat");
}
-   trace_performance_since(start, "preload index");
+   trace_performance_leave("preload index");
 }
 #endif
 
diff --git a/read-cache.c b/read-cache.c
index e865254bea..4fd35f4f37 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1399,8 +1399,8 @@ int refresh_index(struct index_state *istate, unsigned 
int flags,
const char *typechange_fmt;
const char *added_fmt;
const char *unmerged_fmt;
-   uint64_t start = getnanotime();
 
+   trace_performance_enter();
modified_fmt = (in_porcelain 

[PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree

2018-08-12 Thread Nguyễn Thái Ngọc Duy
In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs n-way
merge. If we find out at start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and apply the deltas. The downside is of course more
fragile code since the logic in some functions are now duplicated
elsewhere.

"checkout -" with this patch on webkit.git (275k files):

baseline  new
  
0.056651714   0.080394752 s:  read cache .git/index
0.183101080   0.216010838 s:  preload index
0.008584433   0.008534301 s:  refresh index
0.633767589   0.251992198 s:   traverse_trees
0.340265448   0.377031383 s:   check_updates
0.381884638   0.372768105 s:   cache_tree_update
1.401562947   1.045887251 s:  unpack_trees
0.338687914   0.314983512 s:  write index, changed mask = 2e
0.411927922   0.062572653 s:traverse_trees
0.23335   0.22544 s:check_updates
0.423697246   0.073795585 s:   unpack_trees
0.423708360   0.073807557 s:  diff-index
2.559524127   1.938191592 s: git command: git checkout -

Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):

baselinenew
  --
0.535510167 0.556558733 s: read cache .git/index
0.3057373   0.3147105   s: initialize name hash
0.0184082   0.023558433 s: preload index
0.086910967 0.089085967 s: refresh index
7.889590767 2.191554433 s: unpack trees
0.120760833 0.131941267 s: update worktree after a merge
2.2583504   2.572663167 s: repair cache-tree
0.8916137   0.959495233 s: write index, changed mask = 28
3.405199233 0.2710663   s: unpack trees
0.000999667 0.0021554   s: update worktree after a merge
3.4063306   0.273318333 s: diff-index
16.9524923  9.462943133 s: git command: git.exe checkout

This command calls unpack_trees() twice, the first time on 2way merge
and the second 1way merge. In both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.

PS. A note about cache-tree invalidation and the use of it in this
code.

We do invalidate cache-tree in _source_ index when we add new entries
to the (temporary) "result" index. But we also use the cache-tree from
source index in this optimization. Does this mean we end up having no
cache-tree in the source index to activate this optimization?

The answer is twisted: the order of finding a good cache-tree and
invalidating it matters. In this case we check for a good cache-tree
first in all_trees_same_as_cache_tree(), then we start to merge things
and potentially invalidate that same cache-tree in the process. Since
cache-tree invalidation happens after the optimization kicks in, we're
still good. But we may lose that cache-tree at the very first
call_unpack_fn() call in traverse_by_cache_tree().

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.c | 127 +
 1 file changed, 127 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index b237eaa0f2..07456d0fb2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, 
struct name_entry *nam
return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+   struct name_entry *names,
+   struct traverse_info *info)
+{
+   struct unpack_trees_options *o = info->data;
+   int i;
+
+   if (!o->merge || dirmask != ((1 << n) - 1))
+   return 0;
+
+   for (i = 1; i < n; i++)
+   if (!are_same_oid(names, names + i))
+   return 0;
+
+   return cache_tree_matches_traversal(o->src_index->cache_tree, names, 
info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+ struct traverse_info *info)
+{
+   struct unpack_trees_options *o = info->data;
+   int len = traverse_path_len(info, names);
+   char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
+   int pos;
+
+   make_traverse_path(name, info, names);
+   name[len++] = '/';
+   name[len] = 

[PATCH v4 0/5] Speed up unpack_trees()

2018-08-12 Thread Nguyễn Thái Ngọc Duy
v4 has a bunch of changes

- 1/5 is a new one to show indented tracing. This way it's less
  misleading to read nested time measurements
- 3/5 now has the switch/restore cache_bottom logic. Junio suggested a
  check instead in his final note, but I think this is safer (yeah I'm
  scared too)
- the old 4/4 is dropped because
  - it assumes n-way logic
  - the visible time saving is not worth the tradeoff
  - Elijah gave me an idea to avoid add_index_entry() that I think
does not have n-way logic assumptions and gives better saving.
But it requires some more changes so I'm going to do it later
- 5/5 is also new and should help reduce cache_tree_update() cost.
  I wrote somewhere I was not going to work on this part, but it turns
  out just a couple lines, might as well do it now.

Interdiff

diff --git a/cache-tree.c b/cache-tree.c
index 0dbe10fc85..105f13806f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -426,7 +426,6 @@ static int update_one(struct cache_tree *it,
 
 int cache_tree_update(struct index_state *istate, int flags)
 {
-   uint64_t start = getnanotime();
struct cache_tree *it = istate->cache_tree;
struct cache_entry **cache = istate->cache;
int entries = istate->cache_nr;
@@ -434,11 +433,12 @@ int cache_tree_update(struct index_state *istate, int 
flags)
 
if (i)
return i;
+   trace_performance_enter();
i = update_one(it, cache, entries, "", 0, , flags);
+   trace_performance_leave("cache_tree_update");
if (i < 0)
return i;
istate->cache_changed |= CACHE_TREE_CHANGED;
-   trace_performance_since(start, "repair cache-tree");
return 0;
 }
 
diff --git a/cache.h b/cache.h
index e6f7ee4b64..8b447652a7 100644
--- a/cache.h
+++ b/cache.h
@@ -673,7 +673,6 @@ extern int index_name_pos(const struct index_state *, const 
char *name, int name
 #define ADD_CACHE_JUST_APPEND 8/* Append only; 
tree.c::read_tree() */
 #define ADD_CACHE_NEW_ONLY 16  /* Do not replace existing ones */
 #define ADD_CACHE_KEEP_CACHE_TREE 32   /* Do not invalidate cache-tree */
-#define ADD_CACHE_SKIP_VERIFY_PATH 64  /* Do not verify path */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int 
option);
 extern void rename_index_entry_at(struct index_state *, int pos, const char 
*new_name);
 
diff --git a/diff-lib.c b/diff-lib.c
index a9f38eb5a3..1ffa22c882 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
struct object_array_entry *ent;
-   uint64_t start = getnanotime();
 
+   trace_performance_enter();
ent = revs->pending.objects;
if (diff_cache(revs, >item->oid, ent->name, cached))
exit(128);
@@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached)
diffcore_fix_diff_index(>diffopt);
diffcore_std(>diffopt);
diff_flush(>diffopt);
-   trace_performance_since(start, "diff-index");
+   trace_performance_leave("diff-index");
return 0;
 }
 
diff --git a/dir.c b/dir.c
index 21e6f2520a..c5e9fc8cea 100644
--- a/dir.c
+++ b/dir.c
@@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct 
index_state *istate,
   const char *path, int len, const struct pathspec *pathspec)
 {
struct untracked_cache_dir *untracked;
-   uint64_t start = getnanotime();
 
if (has_symlink_leading_path(path, len))
return dir->nr;
 
+   trace_performance_enter();
untracked = validate_untracked_cache(dir, len, pathspec);
if (!untracked)
/*
@@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct 
index_state *istate,
dir->nr = i;
}
 
-   trace_performance_since(start, "read directory %.*s", len, path);
+   trace_performance_leave("read directory %.*s", len, path);
if (dir->untracked) {
static int force_untracked_cache = -1;
static struct trace_key trace_untracked_stats = 
TRACE_KEY_INIT(UNTRACKED_STATS);
diff --git a/name-hash.c b/name-hash.c
index 163849831c..1fcda73cb3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -578,10 +578,10 @@ static void threaded_lazy_init_name_hash(
 
 static void lazy_init_name_hash(struct index_state *istate)
 {
-   uint64_t start = getnanotime();
 
if (istate->name_hash_initialized)
return;
+   trace_performance_enter();
hashmap_init(>name_hash, cache_entry_cmp, NULL, 
istate->cache_nr);
hashmap_init(>dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
 
@@ -602,7 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate)
}
 
istate->name_hash_initialized = 1;
-   trace_performance_since(start, "initialize name hash");
+   trace_performance_leave("initialize name hash");
 }
 
 /*
diff 

Re: [PATCH v3] test_dir_is_empty: properly detect files with newline in name

2018-08-12 Thread Eric Sunshine
On Sun, Aug 12, 2018 at 2:33 AM William Chargin  wrote:
> > This is an abuse of test_must_fail() which is intended strictly for
> > testing 'git' invocations which might fail for reasons other than the
> > expected one (for instance, git might crash).
>
> Interesting. I didn't infer this from the docs on `test_must_fail` in
> `t/test-lib-functions.sh`. Sharness, which is supposed to be independent
> of Git, explicitly says to use `test_must_fail` instead of `!`.

This sort of knowledge is, perhaps unfortunately, spread around in too
many places. In this case, it's mentioned in t/README. The relevant
excerpt:

Don't use '! git cmd' when you want to make sure the git command
exits with failure in a controlled way by calling "die()".
Instead, use 'test_must_fail git cmd'.  This will signal a failure
if git dies in an unexpected way (e.g. segfault).

On the other hand, don't use test_must_fail for running regular
platform commands; just use '! cmd'.  We are not in the business
of verifying that the world given to us sanely works.

It probably wouldn't hurt to update the comment block above
test_must_fail() in t/test-functions-lib.sh.

> I also see other uses of `test_must_fail` throughout the codebase: e.g.,
> with `kill`, `test`, `test_cmp`, `run_sub_test_lib_test`, etc. as the
> target command. Are these invocations in error?

Looks that way, even run_sub_test_lib_test().


Re: [PATCH v3] test_dir_is_empty: properly detect files with newline in name

2018-08-12 Thread William Chargin
Thanks for the review.

> We usually avoid "touch" unless the timestamp of the file is
> significant.

Makes sense. Will change as you suggest.

> This is an abuse of test_must_fail() which is intended strictly for
> testing 'git' invocations which might fail for reasons other than the
> expected one (for instance, git might crash).

Interesting. I didn't infer this from the docs on `test_must_fail` in
`t/test-lib-functions.sh`. Sharness, which is supposed to be independent
of Git, explicitly says to use `test_must_fail` instead of `!`.
(Admittedly, the implementations are different, but only slightly:
within Git, a Valgrind error 126 is a failure, not success.)

I also see other uses of `test_must_fail` throughout the codebase: e.g.,
with `kill`, `test`, `test_cmp`, `run_sub_test_lib_test`, etc. as the
target command. Are these invocations in error?

(I'm nevertheless happy to change this as you suggest.)

I'll make these changes locally and hold on to the patch, waiting for
others' input.

Best,
WC


Re: [PATCH v3] test_dir_is_empty: properly detect files with newline in name

2018-08-12 Thread Eric Sunshine
On Sun, Aug 12, 2018 at 12:07 AM William Chargin  wrote:
> While the `test_dir_is_empty` function appears correct in most normal
> use cases, it can fail when filenames contain newlines. This patch
> changes the implementation to check that the output of `ls -a` has at
> most two lines (for `.` and `..`), which should be better behaved.
>
> The newly added unit test fails before this change and passes after it.
>
> Signed-off-by: William Chargin 
> ---
> diff --git a/t/t-basic.sh b/t/t-basic.sh
> @@ -821,6 +821,37 @@ test_expect_success 'tests clean up even on failures' "
> +test_expect_success FUNNYNAMES \
> +   'test_dir_is_empty behaves even in pathological cases' "
> +   test_expect_success 'should fail with a normal filename' '
> +   mkdir nonempty_dir &&
> +   touch nonempty_dir/some_file &&

We usually avoid "touch" unless the timestamp of the file is
significant. In this case, it isn't, so it would be more idiomatic to
say simply:

>nonempty_dir/some_file &&

> +   test_must_fail test_dir_is_empty nonempty_dir

This is an abuse of test_must_fail() which is intended strictly for
testing 'git' invocations which might fail for reasons other than the
expected one (for instance, git might crash). Instead, you should just
use '!', like this:

! test_dir_is_empty nonempty_dir

> +   test_expect_success 'should fail with dot-newline-dot filename' '
> +   mkdir pathological_dir &&
> +   touch \"pathological_dir/.
> +   .\" &&
> +   test_must_fail test_dir_is_empty pathological_dir

Same comments as above.