Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

2018-07-28 Thread Duy Nguyen
On Fri, Jul 27, 2018 at 07:52:33PM +0200, Duy Nguyen wrote:
> Just FYI I'm still trying to reduce execution time further and this
> change happens to half traverse_trees() time (which is a huge deal)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index f0be9f298d..a2e63ad5bf 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -201,7 +201,7 @@ static int do_add_entry(struct
> unpack_trees_options *o, struct cache_entry *ce,
> 
> ce->ce_flags = (ce->ce_flags & ~clear) | set;
> return add_index_entry(&o->result, ce,
> -  ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
> +  ADD_CACHE_JUST_APPEND |
> ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
>  }
> 
>  static struct cache_entry *dup_entry(const struct cache_entry *ce)
> 
> It's probably not the right thing to do of course. But perhaps we
> could do something in that direction (e.g. validate everything at the
> end of traverse_by_cache_tree...)

It's just too much computation that could be reduced. The following
patch gives more or less the same performance gain as adding
ADD_CACHE_JUST_APPEND (traverse_trees() time cut down by half).

Of these, the walking cache-tree inside add_index_entry_with_check()
is most expensive and we probably could just walk the cache-tree in
traverse_by_cache_tree() loop and do the invalidation there instead.

-- 8< --
diff --git a/cache.h b/cache.h
index 8b447652a7..e6f7ee4b64 100644
--- a/cache.h
+++ b/cache.h
@@ -673,6 +673,7 @@ extern int index_name_pos(const struct index_state *, const 
char *name, int name
 #define ADD_CACHE_JUST_APPEND 8/* Append only; 
tree.c::read_tree() */
 #define ADD_CACHE_NEW_ONLY 16  /* Do not replace existing ones */
 #define ADD_CACHE_KEEP_CACHE_TREE 32   /* Do not invalidate cache-tree */
+#define ADD_CACHE_SKIP_VERIFY_PATH 64  /* Do not verify path */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int 
option);
 extern void rename_index_entry_at(struct index_state *, int pos, const char 
*new_name);
 
diff --git a/read-cache.c b/read-cache.c
index e865254bea..b0b5df5de7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1170,6 +1170,7 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
+   int skip_verify_path = option & ADD_CACHE_SKIP_VERIFY_PATH;
int new_only = option & ADD_CACHE_NEW_ONLY;
 
if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
@@ -1210,7 +1211,7 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
 
if (!ok_to_add)
return -1;
-   if (!verify_path(ce->name, ce->ce_mode))
+   if (!skip_verify_path && !verify_path(ce->name, ce->ce_mode))
return error("Invalid path '%s'", ce->name);
 
if (!skip_df_check &&
diff --git a/unpack-trees.c b/unpack-trees.c
index f2a2db6ab8..ff6a0f2bd3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -201,6 +201,7 @@ static int do_add_entry(struct unpack_trees_options *o, 
struct cache_entry *ce,
 
ce->ce_flags = (ce->ce_flags & ~clear) | set;
return add_index_entry(&o->result, ce,
+  o->extra_add_index_flags |
   ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
 }
 
@@ -678,6 +679,25 @@ static int traverse_by_cache_tree(int pos, int nr_entries, 
int nr_names,
const char *first_name = o->src_index->cache[pos]->name;
int dirlen = (strrchr(first_name, '/') - first_name)+1;
 
+   /*
+* Try to keep add_index_entry() as fast as possible since
+* we're going to do a lot of them.
+*
+* Skipping verify_path() should totally be safe because these
+* paths are from the source index, which must have been
+* verified.
+*
+* Skipping D/F and cache-tree validation checks is trickier
+* because it assumes what n-merge code would do when all
+* trees and the index are the same. We probably could just
+* optimize those code instead (e.g. we don't invalidate that
+* many cache-tree, but the searching for them is very
+* expensive).
+*/
+   o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
+   o->extra_add_index_flags |= ADD_CACHE_KEEP_CACHE_TREE;
+   o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
+
/*
 * Do what unpack_callback() and unpack_nondirectories() normally
 * do. But we do it in one function call (for even nested trees)
@@ -721,6 +741,7 @@ static int traverse_by_cache_tree(int pos, int nr_entries, 
int nr_names,
 
mark_ce_used(src[0], o);
}
+   o->extra_add_index_flags = 0;
free(tree_ce);
trace_printf("Quick traverse over %d entries from 

Re: Git clone and case sensitivity

2018-07-28 Thread Duy Nguyen
On Sat, Jul 28, 2018 at 11:57 AM Jeff King  wrote:
> > +static int has_duplicate_icase_entries(struct index_state *istate)
> > +{
> > + struct string_list list = STRING_LIST_INIT_NODUP;
> > + int i;
> > + int found = 0;
> > +
> > + for (i = 0; i < istate->cache_nr; i++)
> > + string_list_append(&list, istate->cache[i]->name);
> > +
> > + list.cmp = strcasecmp;
> > + string_list_sort(&list);
> > +
> > + for (i = 1; i < list.nr; i++) {
> > + if (strcasecmp(list.items[i-1].string,
> > +list.items[i].string))
> > + continue;
> > + found = 1;
> > + break;
> > + }
> > + string_list_clear(&list, 0);
> > +
> > + return found;
> > +}
>
> strcasecmp() will only catch a subset of the cases. We really need to
> follow the same folding rules that the filesystem would.

True. But that's how we handle case insensitivity internally. If a
filesytem has more sophisticated folding rules then git will not work
well on that one anyway.

> For the case of clone, I actually wonder if we could detect during the
> checkout step that a file already exists. Since we know that the
> directory we started with was empty, then if it does, either:
>
>   - there's some funny case-folding going on that means two paths in the
> repository map to the same name in the filesystem; or
>
>   - somebody else is writing to the directory at the same time as us

This is exactly what my first patch does (minus the sparse checkout
part).  But without knowing the exact folding rules, I don't think we
can locate this "somebody else" who wrote the first path. So if N
paths are treated the same by this filesystem, we could only report
N-1 of them.

If we want to report just one path when this happens though, then this
works quite well.
-- 
Duy


Re: [PATCH] t5562: avoid non-portable "export FOO=bar" construct

2018-07-28 Thread Eric Sunshine
On Sat, Jul 28, 2018 at 6:51 PM Ramsay Jones
 wrote:
> Commit 6c213e863a ("http-backend: respect CONTENT_LENGTH for
> receive-pack", 2018-07-27) adds a test which uses the non-portable
> export construct. Replace it with "FOO=bar && export FOO" instead.
>
> Signed-off-by: Ramsay Jones 
> ---
> Could you please put this on top of the 'mk/http-backend-content-length'
> branch. This test tickles the new "export FOO=bar" check, so the test
> suite does not run otherwise.

The "export FOO=bar" check comes from 99680d (test-lint: detect
'export FOO=bar', 2013-07-08), so is not exactly new.

Perhaps you were thinking of [1] a0a630192d
(t/check-non-portable-shell: detect "FOO=bar shell_func", 2018-07-13),
when you wrote this, though it is not related to "export FOO=bar"
detection.

The patch itself looks fine, by the way.

[1]: 
https://public-inbox.org/git/20180713055205.32351-5-sunsh...@sunshineco.com/


[PATCH v4 1/4] t7501: add coverage for flags which imply dry runs

2018-07-28 Thread Samuel Lijin
The behavior of git commit, when doing a dry run, changes if there are
unresolved/resolved merge conflicts, but the test suite currently only
asserts that `git commit --dry-run` succeeds when all merge conflicts
are resolved.

Add tests to document the behavior of all flags (i.e. `--dry-run`,
`--short`, `--porcelain`, and `--long`) which imply a dry run when (1)
there are only unresolved merge conflicts, (2) when there are both
unresolved and resolved merge conflicts, and (3) when all merge
conflicts are resolved.

When testing behavior involving resolved merge conflicts, resolve merge
conflicts by replacing each merge conflict with completely new contents,
rather than choosing the contents associated with one of the parent
commits, since the latter decision has no bearing on the behavior of a
dry run commit invocation.

Verify that a dry run invocation of git commit does not create a new
commit by asserting that HEAD has not changed, instead of by crafting
the commit.

Signed-off-by: Samuel Lijin 
---
 t/t7501-commit.sh | 146 +-
 1 file changed, 132 insertions(+), 14 deletions(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 9dbbd01fc..e49dfd0a2 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -664,24 +664,142 @@ test_expect_success '--only works on to-be-born branch' '
test_cmp expected actual
 '
 
-test_expect_success '--dry-run with conflicts fixed from a merge' '
-   # setup two branches with conflicting information
-   # in the same file, resolve the conflict,
-   # call commit with --dry-run
-   echo "Initial contents, unimportant" >test-file &&
-   git add test-file &&
+test_expect_success 'prepare commits that can be used to trigger a merge 
conflict' '
+   # setup two branches with conflicting contents in two paths
+   echo "Initial contents, unimportant" | tee test-file1 test-file2 &&
+   git add test-file1 test-file2 &&
git commit -m "Initial commit" &&
-   echo "commit-1-state" >test-file &&
-   git commit -m "commit 1" -i test-file &&
+   echo "commit-1-state" | tee test-file1 test-file2 &&
+   git commit -m "commit 1" -i test-file1 test-file2 &&
git tag commit-1 &&
git checkout -b branch-2 HEAD^1 &&
-   echo "commit-2-state" >test-file &&
-   git commit -m "commit 2" -i test-file &&
-   ! $(git merge --no-commit commit-1) &&
-   echo "commit-2-state" >test-file &&
-   git add test-file &&
+   echo "commit-2-state" | tee test-file1 test-file2 &&
+   git commit -m "commit 2" -i test-file1 test-file2 &&
+   git tag commit-2
+'
+
+test_expect_success '--dry-run with only unresolved merge conflicts' '
+   git reset --hard commit-2 &&
+   test_must_fail git merge --no-commit commit-1 &&
+   test_must_fail git commit --dry-run &&
+   git rev-parse commit-2 >expected &&
+   git rev-parse HEAD >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success '--short with only unresolved merge conflicts' '
+   git reset --hard commit-2 &&
+   test_must_fail git merge --no-commit commit-1 &&
+   test_must_fail git commit --short &&
+   git rev-parse commit-2 >expected &&
+   git rev-parse HEAD >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success '--porcelain with only unresolved merge conflicts' '
+   git reset --hard commit-2 &&
+   test_must_fail git merge --no-commit commit-1 &&
+   test_must_fail git commit --porcelain &&
+   git rev-parse commit-2 >expected &&
+   git rev-parse HEAD >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success '--long with only unresolved merge conflicts' '
+   git reset --hard commit-2 &&
+   test_must_fail git merge --no-commit commit-1 &&
+   test_must_fail git commit --long &&
+   git rev-parse commit-2 >expected &&
+   git rev-parse HEAD >actual &&
+   test_cmp expected actual
+'
+
+test_expect_failure '--dry-run with resolved and unresolved merge conflicts' '
+   git reset --hard commit-2 &&
+   test_must_fail git merge --no-commit commit-1 &&
+   echo "resolve one merge conflict" >test-file1 &&
+   git add test-file1 &&
+   test_must_fail git commit --dry-run &&
+   git rev-parse commit-2 >expected &&
+   git rev-parse HEAD >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success '--short with resolved and unresolved merge conflicts' '
+   git reset --hard commit-2 &&
+   test_must_fail git merge --no-commit commit-1 &&
+   echo "resolve one merge conflict" >test-file1 &&
+   git add test-file1 &&
+   test_must_fail git commit --short &&
+   git rev-parse commit-2 >expected &&
+   git rev-parse HEAD >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success '--porcelain with resolved and unresolved merge conflicts' 
'
+   git reset --hard commit-2 &&
+   test_must_fail git merge --no-c

[PATCH v4 0/4] Rerolling patch series to fix t7501

2018-07-28 Thread Samuel Lijin
Following up on Junio's review from last time.

Samuel Lijin (4):
  t7501: add coverage for flags which imply dry runs
  wt-status: rename commitable to committable
  wt-status: teach wt_status_collect about merges in progress
  commit: fix exit code when doing a dry run

 builtin/commit.c  |  32 +++---
 ref-filter.c  |   3 +-
 t/t7501-commit.sh | 150 ---
 wt-status.c   | 258 --
 wt-status.h   |  13 +--
 5 files changed, 298 insertions(+), 158 deletions(-)

-- 
2.18.0



[PATCH v4 2/4] wt-status: rename commitable to committable

2018-07-28 Thread Samuel Lijin
Fix a typo in the name of the committable bit in wt_status_state.

Signed-off-by: Samuel Lijin 
---
 builtin/commit.c | 18 +-
 wt-status.c  | 10 +-
 wt-status.h  |  2 +-
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 158e3f843..32f9db33b 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -507,7 +507,7 @@ static int run_status(FILE *fp, const char *index_file, 
const char *prefix, int
wt_status_collect(s);
wt_status_print(s);
 
-   return s->commitable;
+   return s->committable;
 }
 
 static int is_a_merge(const struct commit *current_head)
@@ -653,7 +653,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
 {
struct stat statbuf;
struct strbuf committer_ident = STRBUF_INIT;
-   int commitable;
+   int committable;
struct strbuf sb = STRBUF_INIT;
const char *hook_arg1 = NULL;
const char *hook_arg2 = NULL;
@@ -870,7 +870,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
 
saved_color_setting = s->use_color;
s->use_color = 0;
-   commitable = run_status(s->fp, index_file, prefix, 1, s);
+   committable = run_status(s->fp, index_file, prefix, 1, s);
s->use_color = saved_color_setting;
} else {
struct object_id oid;
@@ -888,7 +888,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
for (i = 0; i < active_nr; i++)
if (ce_intent_to_add(active_cache[i]))
ita_nr++;
-   commitable = active_nr - ita_nr > 0;
+   committable = active_nr - ita_nr > 0;
} else {
/*
 * Unless the user did explicitly request a submodule
@@ -904,7 +904,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
if (ignore_submodule_arg &&
!strcmp(ignore_submodule_arg, "all"))
flags.ignore_submodules = 1;
-   commitable = index_differs_from(parent, &flags, 1);
+   committable = index_differs_from(parent, &flags, 1);
}
}
strbuf_release(&committer_ident);
@@ -916,7 +916,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
 * explicit --allow-empty. In the cherry-pick case, it may be
 * empty due to conflict resolution, which the user should okay.
 */
-   if (!commitable && whence != FROM_MERGE && !allow_empty &&
+   if (!committable && whence != FROM_MERGE && !allow_empty &&
!(amend && is_a_merge(current_head))) {
s->display_comment_prefix = old_display_comment_prefix;
run_status(stdout, index_file, prefix, 0, s);
@@ -1186,14 +1186,14 @@ static int parse_and_validate_options(int argc, const 
char *argv[],
 static int dry_run_commit(int argc, const char **argv, const char *prefix,
  const struct commit *current_head, struct wt_status 
*s)
 {
-   int commitable;
+   int committable;
const char *index_file;
 
index_file = prepare_index(argc, argv, prefix, current_head, 1);
-   commitable = run_status(stdout, index_file, prefix, 0, s);
+   committable = run_status(stdout, index_file, prefix, 0, s);
rollback_index_files();
 
-   return commitable ? 0 : 1;
+   return committable ? 0 : 1;
 }
 
 define_list_config_array_extra(color_status_slots, {"added"});
diff --git a/wt-status.c b/wt-status.c
index 8827a256d..18ea333a5 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -773,7 +773,7 @@ static void wt_longstatus_print_updated(struct wt_status *s)
continue;
if (!shown_header) {
wt_longstatus_print_cached_header(s);
-   s->commitable = 1;
+   s->committable = 1;
shown_header = 1;
}
wt_longstatus_print_change_data(s, WT_STATUS_UPDATED, it);
@@ -1008,7 +1008,7 @@ static void wt_longstatus_print_verbose(struct wt_status 
*s)
rev.diffopt.use_color = 0;
wt_status_add_cut_line(s->fp);
}
-   if (s->verbose > 1 && s->commitable) {
+   if (s->verbose > 1 && s->committable) {
/* print_updated() printed a header, so do we */
if (s->fp != stdout)
wt_longstatus_print_trailer(s);
@@ -1089,7 +1089,7 @@ static void show_merge_in_progress(struct wt_status *s,
 _("  (use \"git merge --abort\" to 
abort the merge)"));
}
} else {
-   s-> commitable = 1;
+

[PATCH v4 4/4] commit: fix exit code when doing a dry run

2018-07-28 Thread Samuel Lijin
In wt-status.c, the s->committable bit is set only in the call tree of
wt_longstatus_print(), and it is not always set correctly. This means
that in normal cases, if there are changes to be committed, or if there
is a merge in progress and all conflicts have been resolved, `--dry-run`
and `--long` return the correct exit code but `--short` and
`--porcelain` do not, even though all four flags imply dry run.
Moreover, if there is a merge in progress and some but not all conflicts
have been resolved, `--short` and `--porcelain` only return the correct
exit code by coincidence (because the codepaths they follow never touch
the committable bit), whereas `--dry-run` and `--long` return the wrong
exit code.

Teach wt_status_collect() to set s->committable correctly (if a merge is
in progress, committable should be set iff there are no unmerged
changes; otherwise, committable should be set iff there are changes in
the index) so that all four flags which imply dry runs return the
correct exit code in the above described situations and mark the
documenting tests as fixed.

Use the index_status field in the wt_status_change_data structs in
has_unmerged() to determine whether or not there are unmerged paths,
instead of the stagemask field, to improve readability.

Also stop setting s->committable in wt_longstatus_print_updated() and
show_merge_in_progress(), and const-ify wt_status_state in the method
signatures in those callpaths.

Signed-off-by: Samuel Lijin 
---
 t/t7501-commit.sh | 12 +++
 wt-status.c   | 80 +--
 2 files changed, 55 insertions(+), 37 deletions(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index e49dfd0a2..6dba526e6 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -99,12 +99,12 @@ test_expect_success '--dry-run with stuff to commit returns 
ok' '
git commit -m next -a --dry-run
 '
 
-test_expect_failure '--short with stuff to commit returns ok' '
+test_expect_success '--short with stuff to commit returns ok' '
echo bongo bongo bongo >>file &&
git commit -m next -a --short
 '
 
-test_expect_failure '--porcelain with stuff to commit returns ok' '
+test_expect_success '--porcelain with stuff to commit returns ok' '
echo bongo bongo bongo >>file &&
git commit -m next -a --porcelain
 '
@@ -714,7 +714,7 @@ test_expect_success '--long with only unresolved merge 
conflicts' '
test_cmp expected actual
 '
 
-test_expect_failure '--dry-run with resolved and unresolved merge conflicts' '
+test_expect_success '--dry-run with resolved and unresolved merge conflicts' '
git reset --hard commit-2 &&
test_must_fail git merge --no-commit commit-1 &&
echo "resolve one merge conflict" >test-file1 &&
@@ -747,7 +747,7 @@ test_expect_success '--porcelain with resolved and 
unresolved merge conflicts' '
test_cmp expected actual
 '
 
-test_expect_failure '--long with resolved and unresolved merge conflicts' '
+test_expect_success '--long with resolved and unresolved merge conflicts' '
git reset --hard commit-2 &&
test_must_fail git merge --no-commit commit-1 &&
echo "resolve one merge conflict" >test-file1 &&
@@ -769,7 +769,7 @@ test_expect_success '--dry-run with only resolved merge 
conflicts' '
test_cmp expected actual
 '
 
-test_expect_failure '--short with only resolved merge conflicts' '
+test_expect_success '--short with only resolved merge conflicts' '
git reset --hard commit-2 &&
test_must_fail git merge --no-commit commit-1 &&
echo "resolve all merge conflicts" | tee test-file1 test-file2 &&
@@ -780,7 +780,7 @@ test_expect_failure '--short with only resolved merge 
conflicts' '
test_cmp expected actual
 '
 
-test_expect_failure '--porcelain with only resolved merge conflicts' '
+test_expect_success '--porcelain with only resolved merge conflicts' '
git reset --hard commit-2 &&
test_must_fail git merge --no-commit commit-1 &&
echo "resolve all merge conflicts" | tee test-file1 test-file2 &&
diff --git a/wt-status.c b/wt-status.c
index af83fae68..fc239f61c 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -724,6 +724,38 @@ static void wt_status_collect_untracked(struct wt_status 
*s)
s->untracked_in_ms = (getnanotime() - t_begin) / 100;
 }
 
+static int has_unmerged(const struct wt_status *s)
+{
+   int i;
+
+   for (i = 0; i < s->change.nr; i++) {
+   struct wt_status_change_data *d = (s->change.items[i]).util;
+   if (d->index_status == DIFF_STATUS_UNMERGED)
+   return 1;
+   }
+   return 0;
+}
+
+static void wt_status_mark_committable(
+   struct wt_status *s, const struct wt_status_state *state)
+{
+   int i;
+
+   if (state->merge_in_progress) {
+   s->committable = !has_unmerged(s);
+   return;
+   }
+
+   for (i = 0; i < s->change.nr; i++) {
+   

[PATCH v4 3/4] wt-status: teach wt_status_collect about merges in progress

2018-07-28 Thread Samuel Lijin
To fix the breakages documented by t7501, the next patch in this series
will teach wt_status_collect() how to set the committable bit, instead
of having wt_longstatus_print_updated() and show_merge_in_progress() set
it (which is what currently happens). To set the committable bit
correctly, however, wt_status_collect() needs to know whether or not
there is a merge in progress (if a merge is in progress, the bit (1)
should not be set if there are unresolved merge conflicts and (2) should
be set even if the index is the same as HEAD), so teach its (two)
callers to create, initialize, and pass in
instances of wt_status_state, which records this metadata.

Since wt_longstatus_print() and show_merge_in_progress() are in the same
callpaths and currently create and init copies of wt_status_state,
remove that logic and instead pass wt_status_state through.

Make wt_status_get_state() easier to use, add a helper method to clean up
wt_status_state, const-ify as many struct pointers in method signatures
as possible, and add a FIXME for a struct pointer which should be const
but isn't (that this patch series will not address).

Signed-off-by: Samuel Lijin 
---
gitster: I kept the FIXME around because it wasn't clear whether or not
you were opposed to it. For what it's worth, there are only two
callsites that can't be const-ified because of this one item.

 builtin/commit.c |  14 ++--
 ref-filter.c |   3 +-
 wt-status.c  | 178 +++
 wt-status.h  |  11 +--
 4 files changed, 105 insertions(+), 101 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 32f9db33b..dd3e83053 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -485,6 +485,7 @@ static const char *prepare_index(int argc, const char 
**argv, const char *prefix
 static int run_status(FILE *fp, const char *index_file, const char *prefix, 
int nowarn,
  struct wt_status *s)
 {
+   struct wt_status_state state;
struct object_id oid;
 
if (s->relative_paths)
@@ -504,8 +505,10 @@ static int run_status(FILE *fp, const char *index_file, 
const char *prefix, int
s->status_format = status_format;
s->ignore_submodule_arg = ignore_submodule_arg;
 
-   wt_status_collect(s);
-   wt_status_print(s);
+   wt_status_get_state(s, &state);
+   wt_status_collect(s, &state);
+   wt_status_print(s, &state);
+   wt_status_clear_state(&state);
 
return s->committable;
 }
@@ -1295,6 +1298,7 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
static int no_renames = -1;
static const char *rename_score_arg = (const char *)-1;
static struct wt_status s;
+   struct wt_status_state state;
int fd;
struct object_id oid;
static struct option builtin_status_options[] = {
@@ -1379,7 +1383,8 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
s.rename_score = parse_rename_score(&rename_score_arg);
}
 
-   wt_status_collect(&s);
+   wt_status_get_state(&s, &state);
+   wt_status_collect(&s, &state);
 
if (0 <= fd)
update_index_if_able(&the_index, &index_lock);
@@ -1387,7 +1392,8 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
if (s.relative_paths)
s.prefix = prefix;
 
-   wt_status_print(&s);
+   wt_status_print(&s, &state);
+   wt_status_clear_state(&state);
return 0;
 }
 
diff --git a/ref-filter.c b/ref-filter.c
index 492f2b770..bc9b6b274 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1381,8 +1381,7 @@ char *get_head_description(void)
 {
struct strbuf desc = STRBUF_INIT;
struct wt_status_state state;
-   memset(&state, 0, sizeof(state));
-   wt_status_get_state(&state, 1);
+   wt_status_get_state(NULL, &state);
if (state.rebase_in_progress ||
state.rebase_interactive_in_progress) {
if (state.branch)
diff --git a/wt-status.c b/wt-status.c
index 18ea333a5..af83fae68 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -33,7 +33,7 @@ static char default_wt_status_colors[][COLOR_MAXLEN] = {
GIT_COLOR_NIL,/* WT_STATUS_ONBRANCH */
 };
 
-static const char *color(int slot, struct wt_status *s)
+static const char *color(int slot, const struct wt_status *s)
 {
const char *c = "";
if (want_color(s->use_color))
@@ -43,7 +43,7 @@ static const char *color(int slot, struct wt_status *s)
return c;
 }
 
-static void status_vprintf(struct wt_status *s, int at_bol, const char *color,
+static void status_vprintf(const struct wt_status *s, int at_bol, const char 
*color,
const char *fmt, va_list ap, const char *trail)
 {
struct strbuf sb = STRBUF_INIT;
@@ -89,7 +89,7 @@ static void status_vprintf(struct wt_status *s, int at_bol, 
const char *color,
strbuf_release(&sb);
 }
 
-void status_printf_ln(struct wt_status *s, cons

[PATCH] t5562: avoid non-portable "export FOO=bar" construct

2018-07-28 Thread Ramsay Jones


Commit 6c213e863a ("http-backend: respect CONTENT_LENGTH for
receive-pack", 2018-07-27) adds a test which uses the non-portable
export construct. Replace it with "FOO=bar && export FOO" instead.

Signed-off-by: Ramsay Jones 
---

Hi Junio,

Could you please put this on top of the 'mk/http-backend-content-length'
branch. This test tickles the new "export FOO=bar" check, so the test
suite does not run otherwise.

[If Max needs to re-roll that patch series, then he can squash this in.]

BTW, t3404.#4 fails for me, but I think you are already aware of that
test failure, right?

Thanks!

ATB,
Ramsay Jones

 t/t5562-http-backend-content-length.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t5562-http-backend-content-length.sh 
b/t/t5562-http-backend-content-length.sh
index 057dcb85d6..43570ce120 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -45,7 +45,8 @@ ssize_b100dots() {
 }
 
 test_expect_success 'setup' '
-   export HTTP_CONTENT_ENCODING="identity" &&
+   HTTP_CONTENT_ENCODING="identity" &&
+   export HTTP_CONTENT_ENCODING &&
git config http.receivepack true &&
test_commit c0 &&
test_commit c1 &&
-- 
2.18.0


[PATCH] doc: fix want-capability separator

2018-07-28 Thread Masaya Suzuki
Unlike ref advertisement, client capabilities and the first want are
separated by SP, not NUL, in the implementation. Fix the documentation
to align with the implementation. pack-protocol.txt is already fixed.

Signed-off-by: Masaya Suzuki 
---
 Documentation/technical/http-protocol.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt 
b/Documentation/technical/http-protocol.txt
index 64f49d0bb..9c5b6f0fa 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -338,11 +338,11 @@ server advertises capability `allow-tip-sha1-in-want` or
   request_end
   request_end   =  "" / "done"
 
-  want_list =  PKT-LINE(want NUL cap_list LF)
+  want_list =  PKT-LINE(want SP cap_list LF)
   *(want_pkt)
   want_pkt  =  PKT-LINE(want LF)
   want  =  "want" SP id
-  cap_list  =  *(SP capability) SP
+  cap_list  =  capability *(SP capability)
 
   have_list =  *PKT-LINE("have" SP id LF)
 
-- 
2.18.0



Re: Proposed approaches to supporting HTTP remotes in "git archive"

2018-07-28 Thread brian m. carlson
On Fri, Jul 27, 2018 at 02:47:00PM -0700, Josh Steadmon wrote:
> ## Use git-upload-archive
> 
> This approach requires adding support for the git-upload-archive
> endpoint to the HTTP backend. Clients will connect to the remote
> server's git-upload-archive service and the server will generate the
> archive which is then delivered to the client.
> 
> ### Benefits
> 
> * Matches existing "git archive" behavior for other remotes.
> 
> * Requires less bandwidth to send a compressed archive than a shallow
>   clone.
> 
> * Resulting archive does not depend in any way on the client
>   implementation.
> 
> ### Drawbacks
> 
> * Implementation is more complicated; it will require changes to (at
>   least) builtin/archive.c, http-backend.c, and
>   builtin/upload-archive.c.
> 
> * Generates more CPU load on the server when compressing archives. This
>   is potentially a DoS vector.
> 
> * Does not allow archiving from servers that don't support the
>   git-upload-archive service.

I happen to like this option because it has the potential to be driven
by a non-git client (e.g. a curl invocation).  That would be enormously
valuable, especially in cases where authentication isn't desired or an
SSH key isn't a good form of authentication.

I'm not really worried about the DoS vector because an implementation is
almost certainly going to support both SSH and HTTPS or neither, and the
DoS potential is the same either way.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Git clone and case sensitivity

2018-07-28 Thread brian m. carlson
On Sat, Jul 28, 2018 at 05:56:59AM -0400, Jeff King wrote:
> strcasecmp() will only catch a subset of the cases. We really need to
> follow the same folding rules that the filesystem would.
> 
> For the case of clone, I actually wonder if we could detect during the
> checkout step that a file already exists. Since we know that the
> directory we started with was empty, then if it does, either:
> 
>   - there's some funny case-folding going on that means two paths in the
> repository map to the same name in the filesystem; or
> 
>   - somebody else is writing to the directory at the same time as us
> 
> Either of which I think would be worth warning about. I'm not sure if we
> already lstat() the paths we're writing anyway as part of the checkout,
> so we might even get the feature "for free".

This is possible to do.  From the bug I accidentally introduced in 2.16,
we know that on clone, there is a code path that is only traversed when
we hit this case and only on case-insensitive file systems.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v3 05/10] config doc: elaborate on fetch.fsckObjects security

2018-07-28 Thread Ævar Arnfjörð Bjarmason


On Fri, Jul 27 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
>
>> +For now, the paranoid need to find some way to emulate the quarantine
>> +environment if they'd like the same protection as "push". E.g. in the
>
> We probably should mention that you can immediately prune, as these
> unwanted crufts are unreferenced.  That would probably be a lot easier
> workaround for the intended readers of this document than "find some
> way to emulate".

I'll mention that as well in v6 that "git prune" will get rid of these
objects.

For what it's worth I was imagining something like a system where you're
mirroring every push to some unpatched-git-host.com repo in-house, by
doing a local "git fetch" when you see new data, and you're paranoid
that someone's trying to introduce something like the .gitmodules
security issue to your local mirror, even if you have
transfer.fsckObjects set.

In a case like that, relying on "git prune" is much more fragile. You'd
need to implement your mirror as some loop that does (pseudocode):

while ref = poll_new_refs()
git fetch upstream
git prune --expire=now

As opposed to:

while ref = poll_new_refs()
(git fetch upstream && git prune --expire=now) &

As you might find in some event-based system. I.e. every time you fetch
you need to stop the world and run a full prune, because the potentially
malicious upstream can craft a series of ref updates where one ref
update (which you'll refuse) contains the bad data, but at that point
you have some of those blobs/trees/commits it in your object store, and
then a second ref update references that already existing data and
causes you to update the ref.

It's also much slower and I/O heavy, on an already-pruned linux.git
running 'git prune --expire=now' takes 40 seconds on my machine, as
opposed to:

while ref = poll_new_refs()
(git fetch && git push internal-mirror --mirror) &

Which could take as little time as a second for the whole operation, can
safely be run in parallel, and would be protected because the actually
published internal mirror gets its refs via receive-pack, which uses the
quarantine.


Re: [PATCH v3 10/10] fsck: test and document unknown fsck. values

2018-07-28 Thread Ævar Arnfjörð Bjarmason


On Fri, Jul 27 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
>
>> When fsck. is set to an unknown value it'll cause "fsck" to
>> die, but the same is not rue of the "fetch" and "receive"
>> variants. Document this and test for it.
>
> Interesting.  Before documenting and adding a test to cast the
> current behaviour in stone, do we need to see if the discrepancy is
> desired and designed one, or something we may want to fix?

We could change it. This is just something I ran into and figured it
should be tested / documented, and didn't feel any need to change it
myself.

The current behavior is probably more of an organically grown
accident. Maybe we should make all of these warn.

Trying to post-hoc rationalize these, it probably makes sense for
receive.* not to die, since you don't want pushes to fail if you're
tweaking this on a server and typo it, same with fetch (although less
so), whereas "fsck" tends to be more of an offline validation command.

So Yeah, I can change this, or not. What do you/others think?

>
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  Documentation/config.txt|  4 
>>  t/t5504-fetch-receive-strict.sh | 14 ++
>>  2 files changed, 18 insertions(+)
>>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 57c463c6e2..4cead6119a 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -1637,6 +1637,10 @@ In general, it is better to enumerate existing 
>> objects with problems
>>  with `fsck.skipList`, instead of listing the kind of breakages these
>>  problematic objects share to be ignored, as doing the latter will
>>  allow new instances of the same breakages go unnoticed.
>> ++
>> +Setting an unknown `fsck.` value will cause fsck to die, but
>> +doing the same for `receive.fsck.` and `fetch.fsck.`
>> +will only cause git to warn.
>>
>>  fsck.skipList::
>>  The path to a sorted list of object names (i.e. one SHA-1 per
>> diff --git a/t/t5504-fetch-receive-strict.sh 
>> b/t/t5504-fetch-receive-strict.sh
>> index 7f06b537d3..62f3569891 100755
>> --- a/t/t5504-fetch-receive-strict.sh
>> +++ b/t/t5504-fetch-receive-strict.sh
>> @@ -198,6 +198,10 @@ test_expect_success 'fetch with fetch.fsck.skipList' '
>>  git --git-dir=dst/.git fetch "file://$(pwd)" $refspec
>>  '
>>
>> +test_expect_success 'fsck. dies' '
>> +test_must_fail git -c fsck.whatEver=ignore fsck 2>err &&
>> +test_i18ngrep "Unhandled message id: whatever" err
>> +'
>>
>>  test_expect_success 'push with receive.fsck.missingEmail=warn' '
>>  commit="$(git hash-object -t commit -w --stdin > @@ -211,10 +215,15 @@ test_expect_success 'push with 
>> receive.fsck.missingEmail=warn' '
>>  git --git-dir=dst/.git config fsck.missingEmail warn &&
>>  test_must_fail git push --porcelain dst bogus &&
>>
>> +# receive.fsck. warns
>> +git --git-dir=dst/.git config \
>> +receive.fsck.whatEver error &&
>> +
>>  git --git-dir=dst/.git config \
>>  receive.fsck.missingEmail warn &&
>>  git push --porcelain dst bogus >act 2>&1 &&
>>  grep "missingEmail" act &&
>> +test_i18ngrep "Skipping unknown msg id.*whatever" act &&
>>  git --git-dir=dst/.git branch -D bogus &&
>>  git --git-dir=dst/.git config --add \
>>  receive.fsck.missingEmail ignore &&
>> @@ -235,10 +244,15 @@ test_expect_success 'fetch with 
>> fetch.fsck.missingEmail=warn' '
>>  git --git-dir=dst/.git config fsck.missingEmail warn &&
>>  test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec &&
>>
>> +# receive.fsck. warns
>> +git --git-dir=dst/.git config \
>> +fetch.fsck.whatEver error &&
>> +
>>  git --git-dir=dst/.git config \
>>  fetch.fsck.missingEmail warn &&
>>  git --git-dir=dst/.git fetch "file://$(pwd)" $refspec >act 2>&1 &&
>>  grep "missingEmail" act &&
>> +test_i18ngrep "Skipping unknown msg id.*whatever" act &&
>>  rm -rf dst &&
>>  git init dst &&
>>  git --git-dir=dst/.git config fetch.fsckobjects true &&


Re: [RFC PATCH v5 0/4] add -p: select individual hunk lines

2018-07-28 Thread Ævar Arnfjörð Bjarmason


On Thu, Jul 26 2018, Phillip Wood wrote:

> Unfortuantely v4 had test failures due to a suprious brace from a last
> minute edit to a comment that I forgot to test. This version fixes
> that, my applogies for the patch churn.
>
> I've updated this series based on Ævar's feedback on v3 (to paraphrase
> stop using '$_' so much and fix staging modified lines.). The first
> patch is functionally equivalent to the previous version but with a
> reworked implementation. Patch 2 is new, it implements correctly
> staging modified lines with a couple of limitations - see the commit
> message for more details, I'm keen to get some feedback on it. Patches
> 3 and 4 are essentially rebased and tweaked versions of patches 2 and
> 3 from the previous version.
>
> This series is based on pw/add-p-recount (f4d35a6b49 "add -p: fix
> counting empty context lines in edited patches")
>
> The motivation for this series is summed up in the first commit
> message:
>
> "When I end up editing hunks it is almost always because I want to
> stage a subset of the lines in the hunk. Doing this by editing the
> hunk is inconvenient and error prone (especially so if the patch is
> going to be reversed before being applied). Instead offer an option
> for add -p to stage individual lines. When the user presses 'l' the
> hunk is redrawn with labels by the insertions and deletions and they
> are prompted to enter a list of the lines they wish to stage. Ranges
> of lines may be specified using 'a-b' where either 'a' or 'b' may be
> omitted to mean all lines from 'a' to the end of the hunk or all lines
> from 1 upto and including 'b'."

I tested this with an eye towards what I pointed out in
https://public-inbox.org/git/878ta8vyqe@evledraar.gmail.com/

Using the same workflow (search for "So what I was expecting" in that
E-Mail) this now does the right thing in that example:

select lines? 4,10
[...]
$ git diff --staged -U1
diff --git a/README.md b/README.md
index ff990622a3..6d16f7e52b 100644
--- a/README.md
+++ b/README.md
@@ -20,3 +20,3 @@ See [Documentation/gittutorial.txt][] to get started, 
then see
 Documentation/git-.txt for documentation of each command.
-If git has been correctly installed, then the tutorial can also be
+If Git has been correctly installed, then the tutorial can also be
 read with `man gittutorial` or `git help tutorial`, and the
u git ((49703a4754...) $) $

Some other comments on this:

1) It needs to be more obvious how to exit this sub-mode, i.e. consider
this confused user:

Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,l,?]? l
select lines? ?
invalid hunk line '?'
select lines? q
invalid hunk line 'q'
select lines? exit
invalid hunk line 'exit'
select lines? quit
invalid hunk line 'quit'
select lines? :wq
invalid hunk line ':wq'
select lines? help
invalid hunk line 'help'

Just doing Ctrl+D or RET exits it. Instead "?" should print some help
related to this sub-mode showing what the syntax is, and how to exit the
sub-mode. I think it would make sense for "q" to by synonymous with
"RET", i.e. you'd need "qq" to fully exit, but I don't know...

2) I think it's confusing UI that selecting some of the lines won't
re-present the hunk to you again in line mode, but I see this is
consistent with how e.g. "e" works, it won't re-present the hunk to you
if there's still something to do, you need to exit and run "git add -p"
again.

I think it makes sense to change that and you'd either "e" or "l" and
then "n" to proceed, or continue, but that's per-se unrelated to this
feature. Just something I ran into...

3) I don't see any way around this, but we need to carefully explain
that selecting a list of things in one session is *not* the same thing
as selecting them incrementally in multiple sessions. I.e. consider this
diff:

@@ -1,3 +1,3 @@
-a
-b
-c
+1
+2
+3

If I select 1,4 I get, as expected:

@@ -1,3 +1,3 @@
-a
+1
 b
 c

And then in the next session:

  @@ -1,3 +1,3 @@
   1
1 -b
2 -c
3 +2
4 +3
select lines? 1,3

Yields, as expected:

@@ -1,3 +1,3 @@
-a
-b
+1
+2
 c

But this is not the same as redoing the whole thing as:

select lines? 1,4
select lines? 1
select lines? 3

Which instead yields:

@@ -1,3 +1,3 @@
-a
-b
+1
 c
+3

Now, rummaging through my wetware and that E-Mail from back in March I
don't see how it could work differently, and you *also* want to be able
to select one line at a time like that.

Just something that's not very intuative / hard to explain, and maybe
there should be a different syntax (e.g. 1:4) for this "swap 1 for 4"
operation, as opposed to selecting lines 1 and 4 as they appear in the
diff.

4) With that abc 123 diff noted above, why am I in two sessions allowed
to do:

@@ -1,3 +1,3 @@
1 -a
2 -b
3 -c
4 +1
5 +2
6 +3
select lines?

Re: [RFC PATCH 3/5] pack-objects: add delta-islands support

2018-07-28 Thread Christian Couder
On Sat, Jul 28, 2018 at 11:00 AM, Jeff King  wrote:
>
> On Fri, Jul 27, 2018 at 10:22:09AM -0700, Stefan Beller wrote:
>
> > > That would solve the problem that fetching torvalds/linux from GitHub
> > > yields a bigger pack than fetching it from kernel.org. But again, it's
> > > making that root fork weirdly magical. People who fetch solely from
> > > other forks won't get any benefit (and may even see worse packs).
> >
> > Thanks for the explanation.
> > I think this discussion just hints at me being dense reading the
> > documentation. After all I like the concept.
>
> I actually think it hints that the documentation in the commits
> themselves is not adequate. :)

Ok, I will try to improve it using information from the discussion threads.

Thanks,
Christian.


Re: [PATCH] config: fix case sensitive subsection names on writing

2018-07-28 Thread Jeff King
On Fri, Jul 27, 2018 at 08:52:59PM -0700, Stefan Beller wrote:

> > > + for key in "v.a.r" "V.A.r" "v.A.r" "V.a.r"
> > > + do
> > > + cp testConfig testConfig_actual &&
> > > + git config -f testConfig_actual v.a.r value2 &&
> > > + test_cmp testConfig_expect testConfig_actual
> > > + done
> > > +'
> >
> > I think you meant to use "$key" when setting the variable to value2.
> >
> > When the test_cmp fails with "v.a.r" but later succeeds and most
> > importantly succeeds with "V.a.r" (i.e. the last one), wouldn't the
> > whole thing suceed?  I think the common trick people use is to end
> > the last one with "|| return 1" in a loop inside test_expect_success.
> 
> Hah, of course this patch is not as easy.
> 
> The problem is in git_parse_source (config.c, line 755) when we call
> get_base_var it normalizes both styles, (and lower cases the dot style).
> 
> So in case of dot style, the strncasecmp is correct, but for the new
> extended style we need to strncmp.
> 
> So this is correct for extended but not any more for the former dot style.

Hmm, it looks like this has always been broken. Before your patch (but
after v2.18.0), running "git config v.A.r value2" writes:

  [V.A]
  r = value1
  r = value2

which is obviously broken (we decided it was the right section, but
didn't match the variables, leading to a weird duplicate). After your
patch we write:

  [V.A]
  r = value1
  [v "A"]
  r = value2

I.e., we do not consider them the same value at all. But that's actually
what happened before v2.18, too. I think you could almost justify that
behavior as "The section.subsection syntax is deprecated, so we match it
case-insensitively on reading but not on writing; since we never write
such sections ourselves, you are on your own for modifying such entries
if you choose to write them manually".

I dunno. Maybe that is too harsh. But this syntax has been deprecated
for 7 years, and nobody has noticed the bug until now (when _still_
nobody wants to use it, but rather we're poking at it as "gee, I wonder
if this even works").

> I wonder if we want to either (a) add another CONFIG_EVENT_SECTION
> that indicates the difference between the two, or have a flag in 'cf' that
> tells us what kind of section style we have, such that we can check
> appropriately for the case.

I'd think that the parse_event_data could carry type-specific
information. But...

> Or we could encode it in the 'cf->var' value to distinguish the old
> and new style.

Even if we fix the section-matching here, I suspect that would just
trigger a separate bug later when we match the full key (so we might add
to the correct section, but we would not correctly replace an existing
key). To fix that, the information does need to be carried for the whole
lifetime of cf->var, not just during the header event.

-Peff


Re: [RFC PATCH v5 0/4] add -p: select individual hunk lines

2018-07-28 Thread Phillip Wood
On 27/07/18 19:27, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Jul 26 2018, Phillip Wood wrote:
> 
>> From: Phillip Wood 
>>
>> Unfortuantely v4 had test failures due to a suprious brace from a last
>> minute edit to a comment that I forgot to test. This version fixes
>> that, my applogies for the patch churn.
>>
>> I've updated this series based on Ævar's feedback on v3 (to paraphrase
>> stop using '$_' so much and fix staging modified lines.). The first
>> patch is functionally equivalent to the previous version but with a
>> reworked implementation. Patch 2 is new, it implements correctly
>> staging modified lines with a couple of limitations - see the commit
>> message for more details, I'm keen to get some feedback on it. Patches
>> 3 and 4 are essentially rebased and tweaked versions of patches 2 and
>> 3 from the previous version.
> 
> I was going to review this, but can't find what it's based on, I can't
> apply 1/4 to master, next or pu. It seems to be based on some older
> version of master, e.g. 1/4 has this hunk:
> 
> + elsif ($line =~ /^l/) {
> + unless ($other =~ /l/) {
> + error_msg __("Cannot select line by 
> line\n");
> + next;
> + }
> + my $newhunk = select_lines_loop($hunk[$ix]);
> + if ($newhunk) {
> + splice @hunk, $ix, 1, $newhunk;
> + } else {
> + next;
> + }
> + }
>   elsif ($other =~ /s/ && $line =~ /^s/) {
> 
> Which seems to conflict with your 4bdd6e7ce3 ("add -p: improve error
> messages", 2018-02-13). I could have tried to manually apply this, but
> figured I'd bounce this back to you...

Yes, I wasn't sure whether to rebase or not and in the end I didn't. The
line below where you cut the cover letter message says it is based on
f4d35a6b49 "add -p: fix counting empty context lines in edited patches".
So you could just apply it there and test it. You can fetch it with

git fetch https://github.com/phillipwood/git add-i-select-lines-v5

> Having just skimmed through the patches themselves I agree with this
> approach of handling the simple case (as discussed before) and leaving
> the rest for some future change, but let's see about the details once I
> have this running.

Thanks

Phillip




Re: Git clone and case sensitivity

2018-07-28 Thread Jeff King
On Sat, Jul 28, 2018 at 07:11:05AM +0200, Duy Nguyen wrote:

> > It might be enough to just issue a warning and give an advise() hint
> > that tells the user what's going on. Then they can decide what to do
> > (hide both paths, or just work in the index, or move to a different fs,
> > or complain to upstream).
> 
> Yeah that may be the best option. Something like this perhaps? Not
> sure how much detail the advice should be here.

Yeah, something along these lines.  I agree with Simon's comment
elsewhere that this should probably mention the names. I don't know if
we'd want to offer advice pointing them to using the sparse feature to
work around it.

> +static int has_duplicate_icase_entries(struct index_state *istate)
> +{
> + struct string_list list = STRING_LIST_INIT_NODUP;
> + int i;
> + int found = 0;
> +
> + for (i = 0; i < istate->cache_nr; i++)
> + string_list_append(&list, istate->cache[i]->name);
> +
> + list.cmp = strcasecmp;
> + string_list_sort(&list);
> +
> + for (i = 1; i < list.nr; i++) {
> + if (strcasecmp(list.items[i-1].string,
> +list.items[i].string))
> + continue;
> + found = 1;
> + break;
> + }
> + string_list_clear(&list, 0);
> +
> + return found;
> +}

strcasecmp() will only catch a subset of the cases. We really need to
follow the same folding rules that the filesystem would.

For the case of clone, I actually wonder if we could detect during the
checkout step that a file already exists. Since we know that the
directory we started with was empty, then if it does, either:

  - there's some funny case-folding going on that means two paths in the
repository map to the same name in the filesystem; or

  - somebody else is writing to the directory at the same time as us

Either of which I think would be worth warning about. I'm not sure if we
already lstat() the paths we're writing anyway as part of the checkout,
so we might even get the feature "for free".

-Peff


Re: Git clone and case sensitivity

2018-07-28 Thread Simon Ruderich
On Sat, Jul 28, 2018 at 07:11:05AM +0200, Duy Nguyen wrote:
>  static int checkout(int submodule_progress)
>  {
>   struct object_id oid;
> @@ -761,6 +785,11 @@ static int checkout(int submodule_progress)
>   if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
>   die(_("unable to write new index file"));
>
> + if (ignore_case && has_duplicate_icase_entries(&the_index))
> + warning(_("This repository has paths that only differ in case\n"
> +   "and you have a case-insenitive filesystem which 
> will\n"
> +   "cause problems."));
> +
>   err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>  oid_to_hex(&oid), "1", NULL);

I think the advice message should list the problematic file
names. Even though this might be quite verbose it will help those
affected to quickly find the problematic files to either fix this
on their own or report to upstream (unless there's already an
easy way to find those files - if so it should be mentioned in
the message).

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9


Re: [PATCH v3 1/4] automatically ban strcpy()

2018-07-28 Thread Jeff King
On Fri, Jul 27, 2018 at 10:34:20AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >>  $ git rebase --onto HEAD @{-1}~3 @{-1}^0
> >
> > Interesting. I'd have probably done it with an interactive rebase:
> >
> >   $ git rebase -i HEAD~4
> >   [change first "pick" to "edit"; after stopping...]
> >   $ git reset --hard HEAD^ ;# throw away patch 1
> >   $ git am -s mbox ;# apply single patch
> >   $ git rebase --continue
> >
> > Which is really the same thing,...
> 
> I have unfounded fear for doing anything other than "edit", "commit
> --amend", "rebase --continue", or "rebase --abort" during a "rebase
> -i" session.  
> 
> Especiallly "reset --hard" with anything but HEAD.  I guess that is
> because I do not fully trust/understand how the sequencer machinery
> replays the remainder of todo tasks on top of the HEAD that is
> different from what it left me to futz with when it relinquished the
> control.

I jump around via "reset --hard" all the time during interactive
rebases. I don't recall having any issues, though that does not mean
there isn't a corner case lurking. :)

> Also "am" during "rebase -i" is scary to me, as "am" also tries to
> keep its own sequencing state.  Do you know if "rebase --continue"
> would work correctly in the above sequence if "am" conflicted, I
> gave up, and said "am --abort", for example?  I don't offhand know.

This one is trickier. I _assumed_ it would be fine to "am" during a
rebase, but I didn't actually try it (in fact, I rarely do anything
exotic with "am", since my main use is just test-applying patches on a
detached head).

It _seems_ to work with this example:

-- >8 --
git init repo
cd repo

for i in 1 2 3 4; do
  echo $i >file
  git add file
  git commit -m $i
done
git format-patch --stdout -1 >patch

git reset --hard HEAD^
GIT_EDITOR='perl -i -pe "/2$/ and s/pick/edit/"' git rebase -i HEAD~2
git am patch
git am --abort
-- 8< --

I think because "am" lives in $GIT_DIR/rebase-apply, and "rebase -i"
lives in ".git/rebase-merge". Of course "rebase" can use the
rebase-apply directory, but I think interactive-rebase never will.

So it works, but mostly by luck. :)

In my ideal world, operations like this that can be continued would be
stored in a stack, and each stack element would know its operation type.
So you could do:

  # push a rebase onto the stack
  git rebase foo

  # while stopped, you might do another operation which pushes onto the
  # stack
  git am ~/patch

  # aborting an operation (or finishing it naturally) pops it off the
  # stack; now we just have the rebase on the stack
  git am --abort

  # aborting an operation that's not at the top of the stack would
  # either be an error, or could auto-abort everybody on top
  git am ~/patch
  git rebase --abort ;# aborts the am, too!

  # you could even nest similar operations; by default we find the
  # top-most one in the stack, but you could refer to them by stack
  # position.
  #
  # put us in a rebase that stops at a conflict
  git rebase foo
  # oops, rewrite the last few commits as part of fixing the conflict
  git rebase -i HEAD~3
  # nope, let's abort the whole thing (stack level 0)
  git rebase --abort=0

  # it would also be nice to have generic commands to manipulate the
  # stack
  git op list ;# show the stack
  git op abort ;# abort the top operation, whatever it is
  git op continue ;# continue the top operation, whatever it is

I've hacked something similar to "git op continue" myself and find it
very useful, but:

  - it's intimately aware of all the possible operations, including some
custom ones that I have. I wouldn't need to if each operation
touched a well-known directory to push itself on the stack, and
provided a few commands in the stack directory for things like "run
this to abort me".

  - it sometimes behaves weirdly, because I "canceled" an operation with
"reset" or "checkout", and later I expect to continue operation X,
but find that some other operation Y was waiting. Having a stack
means it's much easier to see which operations are still hanging
around (we have this already with the prompt logic that shows things
like rebases, but again, it has to be intimate with which operations
are storing data in $GIT_DIR)

Anyway. That's something I've dreamed about for a while, but the thought
of retro-fitting the existing multi-command operations turned me off.
The current systems _usually_ works.

-Peff


Re: [RFC PATCH 3/5] pack-objects: add delta-islands support

2018-07-28 Thread Jeff King
On Fri, Jul 27, 2018 at 10:22:09AM -0700, Stefan Beller wrote:

> > That would solve the problem that fetching torvalds/linux from GitHub
> > yields a bigger pack than fetching it from kernel.org. But again, it's
> > making that root fork weirdly magical. People who fetch solely from
> > other forks won't get any benefit (and may even see worse packs).
> 
> Thanks for the explanation.
> I think this discussion just hints at me being dense reading the
> documentation. After all I like the concept.

I actually think it hints that the documentation in the commits
themselves is not adequate. :)

-Peff


Re: [PATCH v4 01/21] linear-assignment: a function to solve least-cost assignment problems

2018-07-28 Thread Thomas Gummerer
On 07/21, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin 
> 
> The problem solved by the code introduced in this commit goes like this:
> given two sets of items, and a cost matrix which says how much it
> "costs" to assign any given item of the first set to any given item of
> the second, assign all items (except when the sets have different size)
> in the cheapest way.
> 
> We use the Jonker-Volgenant algorithm to solve the assignment problem to
> answer questions such as: given two different versions of a topic branch
> (or iterations of a patch series), what is the best pairing of
> commits/patches between the different versions?
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  Makefile|   1 +
>  linear-assignment.c | 201 
>  linear-assignment.h |  22 +
>  3 files changed, 224 insertions(+)
>  create mode 100644 linear-assignment.c
>  create mode 100644 linear-assignment.h
>
> [...]
> 
> diff --git a/linear-assignment.h b/linear-assignment.h
> new file mode 100644
> index 0..fc4c502c8
> --- /dev/null
> +++ b/linear-assignment.h
> @@ -0,0 +1,22 @@
> +#ifndef HUNGARIAN_H
> +#define HUNGARIAN_H

nit: maybe s/HUNGARIAN/LINEAR_ASSIGNMENT/ in the two lines above.

> +
> +/*
> + * Compute an assignment of columns -> rows (and vice versa) such that every
> + * column is assigned to at most one row (and vice versa) minimizing the
> + * overall cost.
> + *
> + * The parameter `cost` is the cost matrix: the cost to assign column j to 
> row
> + * i is `cost[j + column_count * i].
> + *
> + * The arrays column2row and row2column will be populated with the respective
> + * assignments (-1 for unassigned, which can happen only if column_count !=
> + * row_count).
> + */
> +void compute_assignment(int column_count, int row_count, int *cost,
> + int *column2row, int *row2column);
> +
> +/* The maximal cost in the cost matrix (to prevent integer overflows). */
> +#define COST_MAX (1<<16)
> +
> +#endif
> -- 
> gitgitgadget
>