date:20181005

[PATCH] docs: typo: s/isimilar/similar/

2018-10-05 Thread Michael Witten

Signed-off-by: Michael Witten 
---
 Documentation/git-rebase.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 1fbc6ebcde..432baabe33 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -954,7 +954,7 @@ command fails, it is rescheduled immediately, with a 
helpful message how
 to proceed.
 
 The `reset` command resets the HEAD, index and worktree to the specified
-revision. It is isimilar to an `exec git reset --hard `, but
+revision. It is similar to an `exec git reset --hard `, but
 refuses to overwrite untracked files. If the `reset` command fails, it is
 rescheduled immediately, with a helpful message how to edit the todo list
 (this typically happens when a `reset` command was inserted into the todo
-- 
2.18.0

[PATCH] docs: graph: Remove unnecessary `graph_update()' call

2018-10-05 Thread Michael Witten

The sample code calls `get_revision()' followed by `graph_update()',
but the documentation and source code indicate that `get_revision()'
already calls `graph_update()' for you.

Signed-off-by: Michael Witten 
---
 Documentation/technical/api-history-graph.txt | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/technical/api-history-graph.txt 
b/Documentation/technical/api-history-graph.txt
index 18142b6d29..d9fd98d435 100644
--- a/Documentation/technical/api-history-graph.txt
+++ b/Documentation/technical/api-history-graph.txt
@@ -115,7 +115,6 @@ struct commit *commit;
 struct git_graph *graph = graph_init(opts);
 
 while ((commit = get_revision(opts)) != NULL) {
-   graph_update(graph, commit);
while (!graph_is_commit_finished(graph))
{
struct strbuf sb;
-- 
2.18.0

[PATCH] docs: typo: s/go/to/

2018-10-05 Thread Michael Witten

Signed-off-by: Michael Witten 
---
 Documentation/technical/api-history-graph.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/technical/api-history-graph.txt 
b/Documentation/technical/api-history-graph.txt
index 18142b6d29..729d6a0cf3 100644
--- a/Documentation/technical/api-history-graph.txt
+++ b/Documentation/technical/api-history-graph.txt
@@ -80,7 +80,7 @@ Calling sequence
   it is invoked.
 
 * For each commit, call `graph_next_line()` repeatedly, until
-  `graph_is_commit_finished()` returns non-zero.  Each call go
+  `graph_is_commit_finished()` returns non-zero.  Each call to
   `graph_next_line()` will output a single line of the graph.  The resulting
   lines will not contain any newlines.  `graph_next_line()` returns 1 if the
   resulting line contains the current commit, or 0 if this is merely a line
-- 
2.18.0

Re: [PATCH v6 08/10] submodule: add a helper to check if it is safe to write to .gitmodules

2018-10-05 Thread Stefan Beller

>  static int module_config(int argc, const char **argv, const char *prefix)
>  {
> +   enum {
> +   CHECK_WRITEABLE = 1
> +   } command = 0;

Can we have the default named? Then we would only use states
from within the enum?

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread Jacob Keller

On Fri, Oct 5, 2018 at 12:00 PM Jeff King  wrote:
> That's OK, too, assuming people would actually want to use it. I'm also
> OK shipping this (with the "make -j" fix you suggested) and seeing if
> anybody actually complains. I assume there are only a handful of people
> running coccicheck in the first place.
>
> -Peff

Ok. I can go this route if we have consensus on the "break it and see
if someone complains" route.

Regards,
Jake

[PATCH] builtin/grep.c: remote superflous submodule code

2018-10-05 Thread Stefan Beller

In f9ee2fcdfa (grep: recurse in-process using 'struct repository',
2017-08-02), we introduced a call to repo_read_gitmodules in builtin/grep
to simplify the submodule handling.

After ff6f1f564c4 (submodule-config: lazy-load a repository's .gitmodules
file, 2017-08-03) this is no longer necessary, but that commit did not
cleanup the whole tree, but just show cased the new way how to deal with
submodules in ls-files.

Cleanup the only remaining caller to repo_read_gitmodules outside of
submodule.c

Signed-off-by: Stefan Beller 
---

Antonio Ospite writes:
> BTW, with Stefan Beller we also identified some unneeded code which
> could have been removed to alleviate the issue, but that would not have
> solved it completely; so, I am not removing the unnecessary call to
> repo_read_gitmodules() builtin/grep.c in this series, possibly this can
> become a stand-alone change.

Here is the stand-alone change.

The patch [1] contains the lines as deleted below in the context lines
but they would not conflict as there is one empty line between the changes
in this patch in [1].

[1] https://public-inbox.org/git/20181005130601.15879-10-...@ao2.it/

 builtin/grep.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 601f801158..a6272b9c2f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -427,8 +427,6 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
if (repo_submodule_init(, superproject, path))
return 0;

-   repo_read_gitmodules();
-
/*
 * NEEDSWORK: This adds the submodule's object directory to the list of
 * alternates for the single in-memory object store.  This has some bad
-- 
2.19.0

[PATCH v5 5/7] tests: don't swallow Git errors upstream of pipes

2018-10-05 Thread Matthew DeVore

Some pipes in tests lose the exit code of git processes, which can mask
unexpected behavior like crashes. Split these pipes up so that git
commands are only at the end of pipes rather than the beginning or
middle.

The violations fixed in this patch were found in the process of fixing
pipe placement in a prior patch.

Signed-off-by: Matthew DeVore 
---
 t/t5317-pack-objects-filter-objects.sh | 156 +
 t/t5616-partial-clone.sh   |  14 ++-
 t/t6112-rev-list-filters-objects.sh| 103 
 3 files changed, 141 insertions(+), 132 deletions(-)

diff --git a/t/t5317-pack-objects-filter-objects.sh 
b/t/t5317-pack-objects-filter-objects.sh
index c093eb891..2e718f0bd 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -20,8 +20,9 @@ test_expect_success 'setup r1' '
 '
 
 test_expect_success 'verify blob count in normal packfile' '
-   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 |
-   awk -f print_2.awk |
+   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
+   >ls_files_result &&
+   awk -f print_2.awk ls_files_result |
sort >expected &&
 
git -C r1 pack-objects --rev --stdout >all.pack <<-EOF &&
@@ -29,8 +30,8 @@ test_expect_success 'verify blob count in normal packfile' '
EOF
git -C r1 index-pack ../all.pack &&
 
-   git -C r1 verify-pack -v ../all.pack |
-   grep blob |
+   git -C r1 verify-pack -v ../all.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -43,8 +44,8 @@ test_expect_success 'verify blob:none packfile has no blobs' '
EOF
git -C r1 index-pack ../filter.pack &&
 
-   git -C r1 verify-pack -v ../filter.pack |
-   grep blob |
+   git -C r1 verify-pack -v ../filter.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -53,13 +54,13 @@ test_expect_success 'verify blob:none packfile has no 
blobs' '
 '
 
 test_expect_success 'verify normal and blob:none packfiles have same 
commits/trees' '
-   git -C r1 verify-pack -v ../all.pack |
-   grep -E "commit|tree" |
+   git -C r1 verify-pack -v ../all.pack >verify_result &&
+   grep -E "commit|tree" verify_result |
awk -f print_1.awk |
sort >expected &&
 
-   git -C r1 verify-pack -v ../filter.pack |
-   grep -E "commit|tree" |
+   git -C r1 verify-pack -v ../filter.pack >verify_result &&
+   grep -E "commit|tree" verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -82,8 +83,8 @@ test_expect_success 'setup r2' '
 '
 
 test_expect_success 'verify blob count in normal packfile' '
-   git -C r2 ls-files -s large.1000 large.1 |
-   awk -f print_2.awk |
+   git -C r2 ls-files -s large.1000 large.1 >ls_files_result &&
+   awk -f print_2.awk ls_files_result |
sort >expected &&
 
git -C r2 pack-objects --rev --stdout >all.pack <<-EOF &&
@@ -91,8 +92,8 @@ test_expect_success 'verify blob count in normal packfile' '
EOF
git -C r2 index-pack ../all.pack &&
 
-   git -C r2 verify-pack -v ../all.pack |
-   grep blob |
+   git -C r2 verify-pack -v ../all.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -105,8 +106,8 @@ test_expect_success 'verify blob:limit=500 omits all blobs' 
'
EOF
git -C r2 index-pack ../filter.pack &&
 
-   git -C r2 verify-pack -v ../filter.pack |
-   grep blob |
+   git -C r2 verify-pack -v ../filter.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -120,8 +121,8 @@ test_expect_success 'verify blob:limit=1000' '
EOF
git -C r2 index-pack ../filter.pack &&
 
-   git -C r2 verify-pack -v ../filter.pack |
-   grep blob |
+   git -C r2 verify-pack -v ../filter.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -130,8 +131,8 @@ test_expect_success 'verify blob:limit=1000' '
 '
 
 test_expect_success 'verify blob:limit=1001' '
-   git -C r2 ls-files -s large.1000 |
-   awk -f print_2.awk |
+   git -C r2 ls-files -s large.1000 >ls_files_result &&
+   awk -f print_2.awk ls_files_result |
sort >expected &&
 
git -C r2 pack-objects --rev --stdout --filter=blob:limit=1001 
>filter.pack <<-EOF &&
@@ -139,8 +140,8 @@ test_expect_success 'verify blob:limit=1001' '
EOF
git -C r2 index-pack ../filter.pack &&
 
-   git -C r2 verify-pack -v ../filter.pack |
-   grep blob |
+   git -C r2 verify-pack -v ../filter.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -148,8 +149,8 @@ test_expect_success 'verify

[PATCH v5 7/7] tests: order arguments to git-rev-list properly

2018-10-05 Thread Matthew DeVore

It is a common mistake to put positional arguments before flags when
invoking git-rev-list. Order the positional arguments last.

This patch skips git-rev-list invocations which include the --not flag,
since the ordering of flags and positional arguments affects the
behavior. This patch also skips invocations of git-rev-list that occur
in command substitution in which the exit code is discarded, since
fixing those properly will require a more involved cleanup.

Signed-off-by: Matthew DeVore 
---
 t/t5616-partial-clone.sh| 26 +
 t/t5702-protocol-v2.sh  |  4 +--
 t/t6112-rev-list-filters-objects.sh | 43 ++---
 3 files changed, 44 insertions(+), 29 deletions(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index fc7aeb1ab..6ff614692 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -35,7 +35,7 @@ test_expect_success 'setup bare clone for server' '
 test_expect_success 'do partial clone 1' '
git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 
&&
 
-   git -C pc1 rev-list HEAD --quiet --objects --missing=print >revs &&
+   git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&
@@ -48,10 +48,10 @@ test_expect_success 'do partial clone 1' '
 
 # checkout master to force dynamic object fetch of blobs at HEAD.
 test_expect_success 'verify checkout with dynamic object fetch' '
-   git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
test_line_count = 4 observed &&
git -C pc1 checkout master &&
-   git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
test_line_count = 0 observed
 '
 
@@ -74,7 +74,8 @@ test_expect_success 'push new commits to server' '
 # have the new blobs.
 test_expect_success 'partial fetch inherits filter settings' '
git -C pc1 fetch origin &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+   master..origin/master >observed &&
test_line_count = 5 observed
 '
 
@@ -82,7 +83,8 @@ test_expect_success 'partial fetch inherits filter settings' '
 # we should only get 1 new blob (for the file in origin/master).
 test_expect_success 'verify diff causes dynamic object fetch' '
git -C pc1 diff master..origin/master -- file.1.txt &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+master..origin/master >observed &&
test_line_count = 4 observed
 '
 
@@ -91,7 +93,8 @@ test_expect_success 'verify diff causes dynamic object fetch' 
'
 test_expect_success 'verify blame causes dynamic object fetch' '
git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
test_cmp expect.blame observed.blame &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+   master..origin/master >observed &&
test_line_count = 0 observed
 '
 
@@ -111,7 +114,8 @@ test_expect_success 'push new commits to server for 
file.2.txt' '
 # Verify we have all the new blobs.
 test_expect_success 'override inherited filter-spec using --no-filter' '
git -C pc1 fetch --no-filter origin &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+   master..origin/master >observed &&
test_line_count = 0 observed
 '
 
@@ -133,8 +137,8 @@ test_expect_success 'push new commits to server for 
file.3.txt' '
 test_expect_success 'manual prefetch of missing objects' '
git -C pc1 fetch --filter=blob:none origin &&
 
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print \
-   >revs &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+master..origin/master >revs &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&
@@ -142,8 +146,8 @@ test_expect_success 'manual prefetch of missing objects' '
test_line_count = 6 observed.oids &&
git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" revs &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+   master..origin/master >revs &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 54727450b..11a84efff 100755
---

[PATCH v5 6/7] t9109: don't swallow Git errors upstream of pipes

2018-10-05 Thread Matthew DeVore

'git ... | foo' will mask any errors or crashes in git, so split up such
pipes in this file.

One testcase uses several separate pipe sequences in a row which are
awkward to split up. Wrap the split-up pipe in a function so the
awkwardness is not repeated. Also change that testcase's surrounding
quotes from double to single to avoid premature string interpolation.

Signed-off-by: Matthew DeVore 
---
 t/t9101-git-svn-props.sh | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/t/t9101-git-svn-props.sh b/t/t9101-git-svn-props.sh
index 8a5c8dc1a..c26c4b092 100755
--- a/t/t9101-git-svn-props.sh
+++ b/t/t9101-git-svn-props.sh
@@ -174,7 +174,8 @@ test_expect_success 'test create-ignore' "
cmp ./deeply/.gitignore create-ignore.expect &&
cmp ./deeply/nested/.gitignore create-ignore.expect &&
cmp ./deeply/nested/directory/.gitignore create-ignore.expect &&
-   git ls-files -s | grep gitignore | cmp - create-ignore-index.expect
+   git ls-files -s >ls_files_result &&
+   grep gitignore ls_files_result | cmp - create-ignore-index.expect
"
 
 cat >prop.expect <<\EOF
@@ -189,17 +190,21 @@ EOF
 # This test can be improved: since all the svn:ignore contain the same
 # pattern, it can pass even though the propget did not execute on the
 # right directory.
-test_expect_success 'test propget' "
-   git svn propget svn:ignore . | cmp - prop.expect &&
+test_expect_success 'test propget' '
+   test_propget () {
+   git svn propget $1 $2 >actual &&
+   cmp $3 actual
+   } &&
+   test_propget svn:ignore . prop.expect &&
cd deeply &&
-   git svn propget svn:ignore . | cmp - ../prop.expect &&
-   git svn propget svn:entry:committed-rev nested/directory/.keep \
- | cmp - ../prop2.expect &&
-   git svn propget svn:ignore .. | cmp - ../prop.expect &&
-   git svn propget svn:ignore nested/ | cmp - ../prop.expect &&
-   git svn propget svn:ignore ./nested | cmp - ../prop.expect &&
-   git svn propget svn:ignore .././deeply/nested | cmp - ../prop.expect
-   "
+   test_propget svn:ignore . ../prop.expect &&
+   test_propget svn:entry:committed-rev nested/directory/.keep \
+   ../prop2.expect &&
+   test_propget svn:ignore .. ../prop.expect &&
+   test_propget svn:ignore nested/ ../prop.expect &&
+   test_propget svn:ignore ./nested ../prop.expect &&
+   test_propget svn:ignore .././deeply/nested ../prop.expect
+   '
 
 cat >prop.expect <<\EOF
 Properties on '.':
@@ -218,8 +223,11 @@ Properties on 'nested/directory/.keep':
 EOF
 
 test_expect_success 'test proplist' "
-   git svn proplist . | cmp - prop.expect &&
-   git svn proplist nested/directory/.keep | cmp - prop2.expect
+   git svn proplist . >actual &&
+   cmp prop.expect actual &&
+
+   git svn proplist nested/directory/.keep >actual &&
+   cmp prop2.expect actual
"
 
 test_done
-- 
2.19.0.605.g01d371f741-goog

[PATCH v5 4/7] t/*: fix ordering of expected/observed arguments

2018-10-05 Thread Matthew DeVore

Fix various places where the ordering was obviously wrong, meaning it
was easy to find with grep.

Signed-off-by: Matthew DeVore 
---
 t/t-basic.sh   |  2 +-
 t/t0021-conversion.sh  |  4 +--
 t/t1300-config.sh  |  4 +--
 t/t1303-wacky-config.sh|  4 +--
 t/t2101-update-index-reupdate.sh   |  2 +-
 t/t3200-branch.sh  |  2 +-
 t/t3320-notes-merge-worktrees.sh   |  4 +--
 t/t3400-rebase.sh  |  8 +++---
 t/t3417-rebase-whitespace-fix.sh   |  6 ++---
 t/t3702-add-edit.sh|  4 +--
 t/t3903-stash.sh   |  8 +++---
 t/t3905-stash-include-untracked.sh |  2 +-
 t/t4025-hunk-header.sh |  2 +-
 t/t4117-apply-reject.sh|  6 ++---
 t/t4124-apply-ws-rule.sh   | 30 +++
 t/t4138-apply-ws-expansion.sh  |  2 +-
 t/t5317-pack-objects-filter-objects.sh | 34 +-
 t/t5318-commit-graph.sh|  2 +-
 t/t5701-git-serve.sh   | 14 +--
 t/t5702-protocol-v2.sh | 10 
 t/t6023-merge-file.sh  | 12 -
 t/t6027-merge-binary.sh|  4 +--
 t/t6031-merge-filemode.sh  |  2 +-
 t/t6112-rev-list-filters-objects.sh| 24 +-
 t/t7201-co.sh  |  4 +--
 t/t7406-submodule-update.sh|  8 +++---
 t/t7800-difftool.sh|  2 +-
 t/t9100-git-svn-basic.sh   |  2 +-
 t/t9133-git-svn-nested-git-repo.sh |  6 ++---
 t/t9600-cvsimport.sh   |  2 +-
 t/t9603-cvsimport-patchsets.sh |  4 +--
 t/t9604-cvsimport-timestamps.sh|  4 +--
 32 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index 850f651e4..224c098a8 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -1018,7 +1018,7 @@ test_expect_success SHA1 'validate git diff-files output 
for a know cache/work t
 :12 12 6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c 
 M path3/subp3/file3sym
 EOF
git diff-files >current &&
-   test_cmp current expected
+   test_cmp expected current
 '
 
 test_expect_success 'git update-index --refresh should succeed' '
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 308cd28f3..fd5f1ac64 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -166,10 +166,10 @@ test_expect_success expanded_in_repo '
rm -f expanded-keywords expanded-keywords-crlf &&
 
git checkout -- expanded-keywords &&
-   test_cmp expanded-keywords expected-output &&
+   test_cmp expected-output expanded-keywords &&
 
git checkout -- expanded-keywords-crlf &&
-   test_cmp expanded-keywords-crlf expected-output-crlf
+   test_cmp expected-output-crlf expanded-keywords-crlf
 '
 
 # The use of %f in a filter definition is expanded to the path to
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 5869d6cb6..e2cd50ecf 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1001,7 +1001,7 @@ EOF
 
 test_expect_success 'value continued on next line' '
git config --list > result &&
-   test_cmp result expect
+   test_cmp expect result
 '
 
 cat > .git/config <<\EOF
@@ -1882,7 +1882,7 @@ test_expect_success '--replace-all does not invent 
newlines' '
Qkey = b
EOF
git config --replace-all abc.key b &&
-   test_cmp .git/config expect
+   test_cmp expect .git/config
 '
 
 test_done
diff --git a/t/t1303-wacky-config.sh b/t/t1303-wacky-config.sh
index 3b92083e1..e664e 100755
--- a/t/t1303-wacky-config.sh
+++ b/t/t1303-wacky-config.sh
@@ -14,7 +14,7 @@ setup() {
 check() {
echo "$2" >expected
git config --get "$1" >actual 2>&1
-   test_cmp actual expected
+   test_cmp expected actual
 }
 
 # 'check section.key regex value' verifies that the entry for
@@ -22,7 +22,7 @@ check() {
 check_regex() {
echo "$3" >expected
git config --get "$1" "$2" >actual 2>&1
-   test_cmp actual expected
+   test_cmp expected actual
 }
 
 test_expect_success 'modify same key' '
diff --git a/t/t2101-update-index-reupdate.sh b/t/t2101-update-index-reupdate.sh
index 685ec4563..6c32d42c8 100755
--- a/t/t2101-update-index-reupdate.sh
+++ b/t/t2101-update-index-reupdate.sh
@@ -73,7 +73,7 @@ test_expect_success 'update-index --update from subdir' '
100644 $(git hash-object dir1/file3) 0  dir1/file3
100644 $file2 0 file2
EOF
-   test_cmp current expected
+   test_cmp expected current
 '
 
 test_expect_success 'update-index --update with pathspec' '
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 93f21ab07..478b82cf9 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -1221,7 +1221,7 @@ test_expect_success 'use --edit-description' '
EOF

[PATCH v5 2/7] Documentation: add shell guidelines

2018-10-05 Thread Matthew DeVore

Add the following guideline to Documentation/CodingGuidelines:

Break overlong lines after "&&", "||", and "|", not before
them; that way the command can continue to subsequent lines
without backslash at the end.

And the following to t/README (since it is specific to writing tests):

Pipes and $(git ...) should be avoided when they swallow exit
codes of Git processes

Signed-off-by: Matthew DeVore 
---
 Documentation/CodingGuidelines | 18 ++
 t/README   | 27 +++
 2 files changed, 45 insertions(+)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 48aa4edfb..72967deb7 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -118,6 +118,24 @@ For shell scripts specifically (not exhaustive):
do this
fi
 
+ - If a command sequence joined with && or || or | spans multiple
+   lines, put each command on a separate line and put && and || and |
+   operators at the end of each line, rather than the start. This
+   means you don't need to use \ to join lines, since the above
+   operators imply the sequence isn't finished.
+
+   (incorrect)
+   grep blob verify_pack_result \
+   | awk -f print_1.awk \
+   | sort >actual &&
+   ...
+
+   (correct)
+   grep blob verify_pack_result |
+   awk -f print_1.awk |
+   sort >actual &&
+   ...
+
  - We prefer "test" over "[ ... ]".
 
  - We do not write the noiseword "function" in front of shell
diff --git a/t/README b/t/README
index 68012d673..ab9fa4230 100644
--- a/t/README
+++ b/t/README
@@ -466,6 +466,33 @@ And here are the "don'ts:"
platform commands; just use '! cmd'.  We are not in the business
of verifying that the world given to us sanely works.
 
+ - Don't feed the output of a git command to a pipe, as in:
+
+ git -C repo ls-files |
+ xargs -n 1 basename |
+ grep foo
+
+   which will discard git's exit code and may mask a crash. In the
+   above example, all exit codes are ignored except grep's.
+
+   Instead, write the output of that command to a temporary
+   file with ">" or assign it to a variable with "x=$(git ...)" rather
+   than pipe it.
+
+ - Don't use command substitution in a way that discards git's exit
+   code. When assigning to a variable, the exit code is not discarded,
+   e.g.:
+
+ x=$(git cat-file -p $sha) &&
+ ...
+
+   is OK because a crash in "git cat-file" will cause the "&&" chain
+   to fail, but:
+
+ test "refs/heads/foo" = "$(git symbolic-ref HEAD)"
+
+   is not OK and a crash in git could go undetected.
+
  - Don't use perl without spelling it as "$PERL_PATH". This is to help
our friends on Windows where the platform Perl often adds CR before
the end of line, and they bundle Git with a version of Perl that
-- 
2.19.0.605.g01d371f741-goog

[PATCH v5 1/7] t/README: reformat Do, Don't, Keep in mind lists

2018-10-05 Thread Matthew DeVore

The list of Don'ts for test writing has grown large such that it is hard
to see at a glance which section an item is in. In other words, if I
ignore a little bit of surrounding context, the "don'ts" look like
"do's."

To make the list more readable, prefix "Don't" in front of every first
sentence in the items.

Also, the "Keep in mind" list is out of place and awkward, because it
was a very short "list" beneath two very long ones, and it seemed easy
to miss under the list of "don'ts," and it only had one item. So move
this item to the list of "do's" and phrase as "Remember..."

Signed-off-by: Matthew DeVore 
---
 t/README | 42 --
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/t/README b/t/README
index 9028b47d9..68012d673 100644
--- a/t/README
+++ b/t/README
@@ -393,13 +393,13 @@ This test harness library does the following things:
consistently when command line arguments --verbose (or -v),
--debug (or -d), and --immediate (or -i) is given.
 
-Do's, don'ts & things to keep in mind
--
+Do's & don'ts
+-
 
 Here are a few examples of things you probably should and shouldn't do
 when writing tests.
 
-Do:
+Here are the "do's:"
 
  - Put all code inside test_expect_success and other assertions.
 
@@ -444,16 +444,21 @@ Do:
Windows, where the shell (MSYS bash) mangles absolute path names.
For details, see the commit message of 4114156ae9.
 
-Don't:
+ - Remember that inside the

[PATCH v5 3/7] tests: standardize pipe placement

2018-10-05 Thread Matthew DeVore

Instead of using a line-continuation and pipe on the second line, take
advantage of the shell's implicit line continuation after a pipe
character.  So for example, instead of

some long line \
| next line

use

some long line |
next line

And add a blank line before and after the pipe where it aids readability
(it usually does).

This better matches the coding style documented in
Documentation/CodingGuidelines and used in shell scripts elsewhere in
the tree.

Signed-off-by: Matthew DeVore 
---
 t/lib-gpg.sh   |   9 +-
 t/t1006-cat-file.sh|   8 +-
 t/t1300-config.sh  |   5 +-
 t/t5317-pack-objects-filter-objects.sh | 330 ++---
 t/t5500-fetch-pack.sh  |   7 +-
 t/t5616-partial-clone.sh   |  32 ++-
 t/t6112-rev-list-filters-objects.sh| 203 ---
 7 files changed, 344 insertions(+), 250 deletions(-)

diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
index 3fe02876c..f1277bef4 100755
--- a/t/lib-gpg.sh
+++ b/t/lib-gpg.sh
@@ -57,9 +57,12 @@ then
echo | gpgsm --homedir "${GNUPGHOME}" 2>/dev/null \
--passphrase-fd 0 --pinentry-mode loopback \
--import "$TEST_DIRECTORY"/lib-gpg/gpgsm_cert.p12 &&
-   gpgsm --homedir "${GNUPGHOME}" 2>/dev/null -K \
-   | grep fingerprint: | cut -d" " -f4 | tr -d '\n' > \
-   ${GNUPGHOME}/trustlist.txt &&
+
+   gpgsm --homedir "${GNUPGHOME}" 2>/dev/null -K |
+   grep fingerprint: |
+   cut -d" " -f4 |
+   tr -d '\n' >"${GNUPGHOME}/trustlist.txt" &&
+
echo " S relax" >> ${GNUPGHOME}/trustlist.txt &&
(gpgconf --kill gpg-agent >/dev/null 2>&1 || : ) &&
echo hello | gpgsm --homedir "${GNUPGHOME}" >/dev/null \
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f19d591f..a0fa926d3 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -218,8 +218,8 @@ test_expect_success "--batch-check for a non-existent hash" 
'
 test "0042 missing
 0084 missing" = \
 "$( ( echo 0042;
- echo_without_newline 0084; ) \
-   | git cat-file --batch-check)"
+ echo_without_newline 0084; ) |
+   git cat-file --batch-check)"
 '
 
 test_expect_success "--batch for an existent and a non-existent hash" '
@@ -227,8 +227,8 @@ test_expect_success "--batch for an existent and a 
non-existent hash" '
 $tag_content
  missing" = \
 "$( ( echo $tag_sha1;
- echo_without_newline ; ) \
-   | git cat-file --batch)"
+ echo_without_newline ; ) |
+   git cat-file --batch)"
 '
 
 test_expect_success "--batch-check for an empty line" '
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index cdf1fed5d..5869d6cb6 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1770,8 +1770,9 @@ test_expect_success '--show-origin stdin with file 
include' '
cat >expect <<-EOF &&
file:$INCLUDE_DIR/stdin.include include
EOF
-   echo "[include]path=\"$INCLUDE_DIR\"/stdin.include" \
-   | git config --show-origin --includes --file - user.stdin 
>output &&
+   echo "[include]path=\"$INCLUDE_DIR\"/stdin.include" |
+   git config --show-origin --includes --file - user.stdin >output &&
+
test_cmp expect output
 '
 
diff --git a/t/t5317-pack-objects-filter-objects.sh 
b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..ce69148ec 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -20,17 +20,20 @@ test_expect_success 'setup r1' '
 '
 
 test_expect_success 'verify blob count in normal packfile' '
-   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
-   | awk -f print_2.awk \
-   | sort >expected &&
+   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 |
+   awk -f print_2.awk |
+   sort >expected &&
+
git -C r1 pack-objects --rev --stdout >all.pack <<-EOF &&
HEAD
EOF
git -C r1 index-pack ../all.pack &&
-   git -C r1 verify-pack -v ../all.pack \
-   | grep blob \
-   | awk -f print_1.awk \
-   | sort >observed &&
+
+   git -C r1 verify-pack -v ../all.pack |
+   grep blob |
+   awk -f print_1.awk |
+   sort >observed &&
+
test_cmp observed expected
 '
 
@@ -39,23 +42,27 @@ test_expect_success 'verify blob:none packfile has no 
blobs' '
HEAD
EOF
git -C r1 index-pack ../filter.pack &&
-   git -C r1 verify-pack -v ../filter.pack \

[PATCH v5 0/7] subject: Clean up tests for test_cmp arg ordering and pipe placement

2018-10-05 Thread Matthew DeVore

This version of the patchset fixes some wording and formatting issues
pointed out by Junio. The commit message in the first patch has also
been reworded.

Thank you,
Matt

diff --git a/t/README b/t/README
index 9a71d5732..ab9fa4230 100644
--- a/t/README
+++ b/t/README
@@ -394,7 +394,7 @@ This test harness library does the following things:
--debug (or -d), and --immediate (or -i) is given.
 
 Do's & don'ts
--
+-
 
 Here are a few examples of things you probably should and shouldn't do
 when writing tests.
@@ -466,8 +466,7 @@ And here are the "don'ts:"
platform commands; just use '! cmd'.  We are not in the business
of verifying that the world given to us sanely works.
 
- - Don't use Git upstream in the non-final position in a piped chain, as
-   in:
+ - Don't feed the output of a git command to a pipe, as in:
 
  git -C repo ls-files |
  xargs -n 1 basename |
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index eeedd1623..6ff614692 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -35,7 +35,7 @@ test_expect_success 'setup bare clone for server' '
 test_expect_success 'do partial clone 1' '
 git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" 
pc1 &&
 
-git -C pc1 rev-list --quiet --objects --missing=print >revs HEAD &&
+git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
 awk -f print_1.awk revs |
 sed "s/?//" |
 sort >observed.oids &&
@@ -93,8 +93,8 @@ test_expect_success 'verify diff causes dynamic object fetch' 
'
 test_expect_success 'verify blame causes dynamic object fetch' '
 git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
 test_cmp expect.blame observed.blame &&
-git -C pc1 rev-list --quiet --objects --missing=print >observed \
-master..origin/master &&
+git -C pc1 rev-list --quiet --objects --missing=print \
+master..origin/master >observed &&
 test_line_count = 0 observed
 '
 

Matthew DeVore (7):
  t/README: reformat Do, Don't, Keep in mind lists
  Documentation: add shell guidelines
  tests: standardize pipe placement
  t/*: fix ordering of expected/observed arguments
  tests: don't swallow Git errors upstream of pipes
  t9109: don't swallow Git errors upstream of pipes
  tests: order arguments to git-rev-list properly

 Documentation/CodingGuidelines |  18 ++
 t/README   |  69 +++--
 t/lib-gpg.sh   |   9 +-
 t/t-basic.sh   |   2 +-
 t/t0021-conversion.sh  |   4 +-
 t/t1006-cat-file.sh|   8 +-
 t/t1300-config.sh  |   9 +-
 t/t1303-wacky-config.sh|   4 +-
 t/t2101-update-index-reupdate.sh   |   2 +-
 t/t3200-branch.sh  |   2 +-
 t/t3320-notes-merge-worktrees.sh   |   4 +-
 t/t3400-rebase.sh  |   8 +-
 t/t3417-rebase-whitespace-fix.sh   |   6 +-
 t/t3702-add-edit.sh|   4 +-
 t/t3903-stash.sh   |   8 +-
 t/t3905-stash-include-untracked.sh |   2 +-
 t/t4025-hunk-header.sh |   2 +-
 t/t4117-apply-reject.sh|   6 +-
 t/t4124-apply-ws-rule.sh   |  30 +--
 t/t4138-apply-ws-expansion.sh  |   2 +-
 t/t5317-pack-objects-filter-objects.sh | 360 ++---
 t/t5318-commit-graph.sh|   2 +-
 t/t5500-fetch-pack.sh  |   7 +-
 t/t5616-partial-clone.sh   |  50 ++--
 t/t5701-git-serve.sh   |  14 +-
 t/t5702-protocol-v2.sh |  14 +-
 t/t6023-merge-file.sh  |  12 +-
 t/t6027-merge-binary.sh|   4 +-
 t/t6031-merge-filemode.sh  |   2 +-
 t/t6112-rev-list-filters-objects.sh| 237 +---
 t/t7201-co.sh  |   4 +-
 t/t7406-submodule-update.sh|   8 +-
 t/t7800-difftool.sh|   2 +-
 t/t9100-git-svn-basic.sh   |   2 +-
 t/t9101-git-svn-props.sh   |  34 ++-
 t/t9133-git-svn-nested-git-repo.sh |   6 +-
 t/t9600-cvsimport.sh   |   2 +-
 t/t9603-cvsimport-patchsets.sh |   4 +-
 t/t9604-cvsimport-timestamps.sh|   4 +-
 39 files changed, 568 insertions(+), 399 deletions(-)

-- 
2.19.0.605.g01d371f741-goog

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 11:22:31PM +0200, René Scharfe wrote:

> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 53914563b5..c0a1b80f4c 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -606,6 +606,12 @@ static void filter_refs(struct fetch_pack_args *args,
> > ref->match_status = REF_MATCHED;
> > *newtail = copy_ref(ref);
> > newtail = &(*newtail)->next;
> > +   /*
> > +* No need to update tip_oids with ref->old_oid; we got
> > +* here because either it was already there, or we are
> > +* in !strict mode, in which case we do not use
> > +* tip_oids at all.
> > +*/
> > } else {
> > ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
> > }
> 
> This comment puzzles me -- why ask the question it answers?
> `tip_oids` has been loaded with all `refs` at that point; adding
> more seems odd.

If you think that tip_oids is a fast lookup for what's in newlist, then
I think it is a reasonable question to ask whether new additions to
newlist need the same treatment. That was what the comment in the
original lazy-load was answering.

> I feel the code needs be simplified further; not sure how, though,
> except perhaps by using the `unfound` array added in another reply.

I agree it's not the most clear code in the world, but we may be
reaching a point of diminishing returns in discussing it further.

-Peff

[PATCH v11 6/8] list-objects-filter: use BUG rather than die

2018-10-05 Thread Matthew DeVore

In some cases in this file, BUG makes more sense than die. In such
cases, a we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore 
---
 list-objects-filter.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
if (filter_options->choice >= LOFC__COUNT)
-   die("invalid list-objects filter choice: %d",
+   BUG("invalid list-objects filter choice: %d",
filter_options->choice);
 
init_fn = s_filters[filter_options->choice];
-- 
2.19.0.605.g01d371f741-goog

[PATCH v11 5/8] revision: mark non-user-given objects instead

2018-10-05 Thread Matthew DeVore

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

This fixes a bug in that git rev-list behaved differently from git
pack-objects. pack-objects would *not* filter objects given explicitly
on the command line and rev-list would filter. This was because the two
commands used a different function to add objects to the rev_info
struct. This seems to have been an oversight, and pack-objects has the
correct behavior, so I added a test to make sure that rev-list now
behaves properly.

Signed-off-by: Matthew DeVore 
---
 list-objects.c  | 31 +
 revision.c  |  1 -
 revision.h  | 11 --
 t/t6112-rev-list-filters-objects.sh | 12 +++
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BLOB, obj,
   path->buf, >buf[pathlen],
   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct 
traversal_context *ctx,
continue;
}
 
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
+   if (S_ISDIR(entry.mode)) {
+   struct tree *t = lookup_tree(the_repository, entry.oid);
+   t->object.flags |= NOT_USER_GIVEN;
+   process_tree(ctx, t, base, entry.path);
+   }
else if (S_ISGITLINK(entry.mode))
process_gitlink(ctx, entry.oid->hash,
base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
+   else {
+   struct blob *b = lookup_blob(the_repository, entry.oid);
+   b->object.flags |= NOT_USER_GIVEN;
+   process_blob(ctx, b, base, entry.path);
+   }
}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
   base->buf, >buf[baselen],
   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
if (!failed_parse)
process_tree_contents(ctx, tree, base);
 
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
   base->buf, >buf[baselen],
   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 * an uninteresting boundary commit may not have its tree
 * parsed yet, but we are not going to show them anyway
 */
-   if (get_commit_tree(commit))
-   add_pending_tree(ctx->revs, get_commit_tree(commit));
+   if (get_commit_tree(commit)) {
+   struct tree *tree = get_commit_tree(commit);
+   tree->object.flags |= NOT_USER_GIVEN;
+   add_pending_tree(ctx->revs, tree);
+   }
ctx->show_commit(commit, ctx->show_data);
 
if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index de4dce600..72d48a17f 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info 
*revs,

[PATCH v11 8/8] list-objects-filter: implement filter tree:0

2018-10-05 Thread Matthew DeVore

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore 
---
 Documentation/rev-list-options.txt |  5 +++
 list-objects-filter-options.c  | 13 +++
 list-objects-filter-options.h  |  1 +
 list-objects-filter.c  | 49 ++
 t/t5317-pack-objects-filter-objects.sh | 28 +++
 t/t5616-partial-clone.sh   | 42 ++
 t/t6112-rev-list-filters-objects.sh| 15 
 7 files changed, 153 insertions(+)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=' similarly uses a sparse-checkout
 specification contained in .
++
+The form '--filter=tree:' omits all blobs and trees whose depth
+from the root tree is >=  (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only =0
+is supported, which omits all blobs and trees.
 
 --no-filter::
Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d259bdb2c..e8da2e858 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -49,6 +49,19 @@ static int gently_parse_list_objects_filter(
return 0;
}
 
+   } else if (skip_prefix(arg, "tree:", )) {
+   unsigned long depth;
+   if (!git_parse_ulong(v0, ) || depth != 0) {
+   if (errbuf) {
+   strbuf_addstr(
+   errbuf,
+   _("only 'tree:0' is supported"));
+   }
+   return 1;
+   }
+   filter_options->choice = LOFC_TREE_NONE;
+   return 0;
+
} else if (skip_prefix(arg, "sparse:oid=", )) {
struct object_context oc;
struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
LOFC_DISABLED = 0,
LOFC_BLOB_NONE,
LOFC_BLOB_LIMIT,
+   LOFC_TREE_NONE,
LOFC_SPARSE_OID,
LOFC_SPARSE_PATH,
LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+   struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+   enum list_objects_filter_situation filter_situation,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_trees_none_data *filter_data = filter_data_;
+
+   switch (filter_situation) {
+   default:
+   BUG("unknown filter_situation: %d", filter_situation);
+
+   case LOFS_BEGIN_TREE:
+   case LOFS_BLOB:
+   if (filter_data->omits)
+   oidset_insert(filter_data->omits, >oid);
+   return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+   case LOFS_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   }
+}
+
+static void* filter_trees_none__init(
+   struct oidset *omitted,
+   struct list_objects_filter_options *filter_options,
+   filter_object_fn *filter_fn,
+   filter_free_fn *filter_free_fn)
+{
+   struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+   d->omits = omitted;
+
+   *filter_fn = filter_trees_none;
+

[PATCH v11 7/8] list-objects-filter-options: do not over-strbuf_init

2018-10-05 Thread Matthew DeVore

The function gently_parse_list_objects_filter is either called with
errbuf=STRBUF_INIT or errbuf=NULL, but that function calls strbuf_init
when errbuf is not NULL. strbuf_init is only necessary if errbuf
contains garbage, and risks a memory leak if errbuf already has a
non-STRBUF_INIT state. It should be the caller's responsibility to make
sure errbuf is not garbage, since garbage content is easily avoidable
with STRBUF_INIT.

Signed-off-by: Matthew DeVore 
---
 list-objects-filter-options.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..d259bdb2c 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,7 +30,6 @@ static int gently_parse_list_objects_filter(
 
if (filter_options->choice) {
if (errbuf) {
-   strbuf_init(errbuf, 0);
strbuf_addstr(
errbuf,
_("multiple filter-specs cannot be combined"));
@@ -71,10 +70,9 @@ static int gently_parse_list_objects_filter(
return 0;
}
 
-   if (errbuf) {
-   strbuf_init(errbuf, 0);
+   if (errbuf)
strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
-   }
+
memset(filter_options, 0, sizeof(*filter_options));
return 1;
 }
-- 
2.19.0.605.g01d371f741-goog

[PATCH v11 4/8] rev-list: handle missing tree objects properly

2018-10-05 Thread Matthew DeVore

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.

Signed-off-by: Matthew DeVore 
---
 builtin/rev-list.c | 11 ---
 list-objects.c | 11 +--
 revision.h | 15 +
 t/t0410-partial-clone.sh   | 45 ++
 t/t5317-pack-objects-filter-objects.sh | 13 
 t/t6112-rev-list-filters-objects.sh| 22 +
 6 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 */
switch (arg_missing_action) {
case MA_ERROR:
-   die("missing blob object '%s'", oid_to_hex(>oid));
+   die("missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(>oid));
return;
 
case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
case MA_ALLOW_PROMISOR:
if (is_promisor_object(>oid))
return;
-   die("unexpected missing blob object '%s'",
-   oid_to_hex(>oid));
+   die("unexpected missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(>oid));
return;
 
default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(>oid)) {
+   if (!has_object_file(>oid)) {
finish_object__ma(obj);
return 1;
}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
init_revisions(, prefix);
revs.abbrev = DEFAULT_ABBREV;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
+   revs.do_not_die_on_missing_tree = 1;
 
/*
 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+   int failed_parse;
 
if (!revs->tree_objects)
return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, 1) < 0) {
+
+   failed_parse = parse_tree_gently(tree, 1);
+   if (failed_parse) {
if (revs->ignore_missing_links)
return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
is_promisor_object(>oid))
return;
 
-   die("bad tree object %s", oid_to_hex(>oid));
+   if (!revs->do_not_die_on_missing_tree)
+   die("bad tree object %s", oid_to_hex(>oid));
}
 
strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   process_tree_contents(ctx, tree, base);
+   if (!failed_parse)
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index 007278cc1..5910613cb 100644
--- a/revision.h
+++ b/revision.h
@@ -126,6 +126,21 @@ struct rev_info {
line_level_traverse:1,
tree_blobs_in_commit_order:1,
 
+   /*
+* Blobs are shown without regard for their existence.
+* But not so for trees: unless exclude_promisor_objects
+* is set and the tree in question is a promisor object;
+* OR ignore_missing_links is set, the revision walker
+* dies with a "bad tree object HASH" message when
+* encountering a missing tree. For callers that can
+* handle missing trees

[PATCH v11 2/8] list-objects: refactor to process_tree_contents

2018-10-05 Thread Matthew DeVore

This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
while (...) {
/* this is very indented */
}
}

we will have:

if (foo)
process_tree_contents(...);

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 68 ++
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+struct tree *tree,
+struct strbuf *base,
+const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+ struct tree *tree,
+ struct strbuf *base)
+{
+   struct tree_desc desc;
+   struct name_entry entry;
+   enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+   all_entries_interesting : entry_not_interesting;
+
+   init_tree_desc(, tree->buffer, tree->size);
+
+   while (tree_entry(, )) {
+   if (match != all_entries_interesting) {
+   match = tree_entry_interesting(, base, 0,
+  
>revs->diffopt.pathspec);
+   if (match == all_entries_not_interesting)
+   break;
+   if (match == entry_not_interesting)
+   continue;
+   }
+
+   if (S_ISDIR(entry.mode))
+   process_tree(ctx,
+lookup_tree(the_repository, entry.oid),
+base, entry.path);
+   else if (S_ISGITLINK(entry.mode))
+   process_gitlink(ctx, entry.oid->hash,
+   base, entry.path);
+   else
+   process_blob(ctx,
+lookup_blob(the_repository, entry.oid),
+base, entry.path);
+   }
+}
+
 static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
struct object *obj = >object;
struct rev_info *revs = ctx->revs;
-   struct tree_desc desc;
-   struct name_entry entry;
-   enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-   all_entries_interesting: entry_not_interesting;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   init_tree_desc(, tree->buffer, tree->size);
-
-   while (tree_entry(, )) {
-   if (match != all_entries_interesting) {
-   match = tree_entry_interesting(, base, 0,
-  >diffopt.pathspec);
-   if (match == all_entries_not_interesting)
-   break;
-   if (match == entry_not_interesting)
-   continue;
-   }
-
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
-   else if (S_ISGITLINK(entry.mode))
-   process_gitlink(ctx, entry.oid->hash, base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
-   }
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.0.605.g01d371f741-goog

[PATCH v11 1/8] list-objects: store common func args in struct

2018-10-05 Thread Matthew DeVore

This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 158 +++--
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+   struct rev_info *revs;
+   show_object_fn show_object;
+   show_commit_fn show_commit;
+   void *show_data;
+   filter_object_fn filter_fn;
+   void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 struct blob *blob,
-show_object_fn show,
 struct strbuf *path,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
struct object *obj = >object;
size_t pathlen;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-   if (!revs->blob_objects)
+   if (!ctx->revs->blob_objects)
return;
if (!obj)
die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 * may cause the actual filter to report an incomplete list
 * of missing objects.
 */
-   if (revs->exclude_promisor_objects &&
+   if (ctx->revs->exclude_promisor_objects &&
!has_object_file(>oid) &&
is_promisor_object(>oid))
return;
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BLOB, obj,
- path->buf, >buf[pathlen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BLOB, obj,
+  path->buf, >buf[pathlen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, path->buf, cb_data);
+   ctx->show_object(obj, path->buf, ctx->show_data);
strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
const unsigned char *sha1,
-   show_object_fn show,
struct strbuf *path,
-   const char *name,
-   void *cb_data)
+   const char *name)
 {
/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
-show_object_fn show,
 struct strbuf *base,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
struct object *obj = >object;
+   struct rev_info *revs = ctx->revs;
struct tree_desc desc;
struct name_entry entry;
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BEGIN_TREE, obj,
- base->buf, >buf[baselen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+  base->buf, >buf[baselen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, base->buf, cb_data);
+   ctx->show_object(obj, base->buf, ctx->show_data);
if (base->len)
strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
}
 
if (S_ISDIR(entry.mode))
-   process_tree(revs,
+   process_tree(ctx,
 lookup_tree(the_repository, entry.oid),
-

[PATCH v11 3/8] list-objects: always parse trees gently

2018-10-05 Thread Matthew DeVore

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-   int gently = revs->ignore_missing_links ||
-revs->exclude_promisor_objects;
 
if (!revs->tree_objects)
return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, gently) < 0) {
+   if (parse_tree_gently(tree, 1) < 0) {
if (revs->ignore_missing_links)
return;
 
-- 
2.19.0.605.g01d371f741-goog

[PATCH v11 0/8] filter: support for excluding all trees and blobs

2018-10-05 Thread Matthew DeVore

Here is a clean re-rollup fixing the issue I found earlier. It fixes a problem
stemming from a discarded exit code, which masked a crash in Git. The crash was
not a bug because an earlier test deleted a loose object file. The fix was to
make that test manipulate a clone rather than the original repo.

An interdiff from v10 is below.

Thank you,
Matt

diff --git a/t/t6112-rev-list-filters-objects.sh 
b/t/t6112-rev-list-filters-objects.sh
index 5a61614b1..c8e3d87c4 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -210,16 +210,21 @@ test_expect_success 'verify sparse:oid=oid-ish omits 
top-level files' '
 test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for 
trees' '
 TREE=$(git -C r3 rev-parse HEAD:dir1) &&
 
-rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+# Create a spare repo because we will be deleting objects from this 
one.
+git clone r3 r3.b &&
 
-git -C r3 rev-list --quiet --missing=print --objects HEAD 
>missing_objs 2>rev_list_err &&
+rm r3.b/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+git -C r3.b rev-list --quiet --missing=print --objects HEAD \
+>missing_objs 2>rev_list_err &&
 echo "?$TREE" >expected &&
 test_cmp expected missing_objs &&
 
 # do not complain when a missing tree cannot be parsed
 test_must_be_empty rev_list_err &&
 
-git -C r3 rev-list --missing=allow-any --objects HEAD >objs 
2>rev_list_err &&
+git -C r3.b rev-list --missing=allow-any --objects HEAD \
+>objs 2>rev_list_err &&
 ! grep $TREE objs &&
 test_must_be_empty rev_list_err
 '
@@ -228,12 +233,13 @@ test_expect_success 'rev-list W/ --missing=print and 
--missing=allow-any for tre
 
 test_expect_success 'verify tree:0 includes trees in "filtered" output' '
 git -C r3 rev-list --quiet --objects --filter-print-omitted \
---filter=tree:0 HEAD |
-awk -f print_1.awk |
+--filter=tree:0 HEAD >revs &&
+
+awk -f print_1.awk revs |
 sed s/~// |
-xargs -n1 git -C r3 cat-file -t |
-sort -u >filtered_types &&
+xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&
 
+sort -u unsorted_filtered_types >filtered_types &&
 printf "blob\ntree\n" >expected &&
 test_cmp expected filtered_types
 '

Matthew DeVore (8):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter-options: do not over-strbuf_init
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt |   5 +
 builtin/rev-list.c |  11 +-
 list-objects-filter-options.c  |  19 +-
 list-objects-filter-options.h  |   1 +
 list-objects-filter.c  |  60 ++-
 list-objects.c | 232 +
 revision.c |   1 -
 revision.h |  26 ++-
 t/t0410-partial-clone.sh   |  45 +
 t/t5317-pack-objects-filter-objects.sh |  41 +
 t/t5616-partial-clone.sh   |  42 +
 t/t6112-rev-list-filters-objects.sh|  49 ++
 12 files changed, 404 insertions(+), 128 deletions(-)

-- 
2.19.0.605.g01d371f741-goog

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-05 Thread René Scharfe

Am 05.10.2018 um 22:27 schrieb Jeff King:
> On Fri, Oct 05, 2018 at 10:13:34PM +0200, René Scharfe wrote:
> 
 -{
 -  /*
 -   * Note that this only looks at the ref lists the first time it's
 -   * called. This works out in filter_refs() because even though it may
 -   * add to "newlist" between calls, the additions will always be for
 -   * oids that are already in the set.
 -   */
>>>
>>> I don't think the subtle point this comment is making goes away. We're
>>> still growing the list in the loop that calls tip_oids_contain() (and
>>> which now calls just oidset_contains). That's OK for the reasons given
>>> here, but I think that would need to be moved down to this code:
>>>
 +  if (strict) {
 +  for (i = 0; i < nr_sought; i++) {
 +  ref = sought[i];
 +  if (!is_unmatched_ref(ref))
 +  continue;
 +
 +  add_refs_to_oidset(_oids, unmatched);
 +  add_refs_to_oidset(_oids, newlist);
 +  break;
 +  }
 +  }
>>>
>>> I.e., we need to say here why it's OK to summarize newlist in the
>>> oidset, even though we're adding to it later.
>>
>> There is already this comment:
>>
>>  /* Append unmatched requests to the list */
>>
>> And that's enough in my eyes.  The refs loop at the top splits the list
>> into matched ("the list") and unmatched, and the loop below said comment
>> adds a few more.  I see no subtlety left -- what do I miss?
> 
> It looks like tip_oids is meant as a fast lookup into what's in
> unmatched and newlist. But in the second loop we continue appending to
> newlist. Why is it OK that we do not update tip_oids when we do so?

`tip_oids` contains the object_ids of the all `refs` passed to
filter_refs().  Instead of adding them at the top of the function that
is done only when it has become clear that there are unmatched ones,
as an optimization.  (That optimization was implemented by lazy-loading
in tip_oids_contain() earlier.)  At that point the list has been split
into `newlist` and `unmatched`, so we load from them instead of `refs`.

> 
> I.e., something like this explains it:
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 53914563b5..c0a1b80f4c 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -606,6 +606,12 @@ static void filter_refs(struct fetch_pack_args *args,
>   ref->match_status = REF_MATCHED;
>   *newtail = copy_ref(ref);
>   newtail = &(*newtail)->next;
> + /*
> +  * No need to update tip_oids with ref->old_oid; we got
> +  * here because either it was already there, or we are
> +  * in !strict mode, in which case we do not use
> +  * tip_oids at all.
> +  */
>   } else {
>   ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
>   }

This comment puzzles me -- why ask the question it answers?
`tip_oids` has been loaded with all `refs` at that point; adding
more seems odd.

I feel the code needs be simplified further; not sure how, though,
except perhaps by using the `unfound` array added in another reply.

René

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 10:13:34PM +0200, René Scharfe wrote:

> >> -{
> >> -  /*
> >> -   * Note that this only looks at the ref lists the first time it's
> >> -   * called. This works out in filter_refs() because even though it may
> >> -   * add to "newlist" between calls, the additions will always be for
> >> -   * oids that are already in the set.
> >> -   */
> > 
> > I don't think the subtle point this comment is making goes away. We're
> > still growing the list in the loop that calls tip_oids_contain() (and
> > which now calls just oidset_contains). That's OK for the reasons given
> > here, but I think that would need to be moved down to this code:
> > 
> >> +  if (strict) {
> >> +  for (i = 0; i < nr_sought; i++) {
> >> +  ref = sought[i];
> >> +  if (!is_unmatched_ref(ref))
> >> +  continue;
> >> +
> >> +  add_refs_to_oidset(_oids, unmatched);
> >> +  add_refs_to_oidset(_oids, newlist);
> >> +  break;
> >> +  }
> >> +  }
> > 
> > I.e., we need to say here why it's OK to summarize newlist in the
> > oidset, even though we're adding to it later.
> 
> There is already this comment:
> 
>   /* Append unmatched requests to the list */
> 
> And that's enough in my eyes.  The refs loop at the top splits the list
> into matched ("the list") and unmatched, and the loop below said comment
> adds a few more.  I see no subtlety left -- what do I miss?

It looks like tip_oids is meant as a fast lookup into what's in
unmatched and newlist. But in the second loop we continue appending to
newlist. Why is it OK that we do not update tip_oids when we do so?

I.e., something like this explains it:

diff --git a/fetch-pack.c b/fetch-pack.c
index 53914563b5..c0a1b80f4c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -606,6 +606,12 @@ static void filter_refs(struct fetch_pack_args *args,
ref->match_status = REF_MATCHED;
*newtail = copy_ref(ref);
newtail = &(*newtail)->next;
+   /*
+* No need to update tip_oids with ref->old_oid; we got
+* here because either it was already there, or we are
+* in !strict mode, in which case we do not use
+* tip_oids at all.
+*/
} else {
ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
}

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-05 Thread René Scharfe

Am 05.10.2018 um 00:11 schrieb René Scharfe:
> Am 04.10.2018 um 23:38 schrieb Jonathan Tan:
>> I don't think the concerns are truly separated - the first loop quoted
>> needs to know that in the second loop, tip_oids is accessed only if
>> there is at least one unmatched ref.
> 
> Right, the two loops are still closely related, but only the first one
> is concerned with loading refs.
> 
> For a true separation we could first build a list of unmatched refs and
> then loop through that instead of `sought`.  Not sure if that's better,
> but maybe the overhead I imagine it would introduce isn't all that big.

Here's what I mean, on top of the other two patches:

---
 fetch-pack.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 53914563b5..7f28584bce 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -543,6 +543,8 @@ static void filter_refs(struct fetch_pack_args *args,
struct ref *newlist = NULL;
struct ref **newtail = 
struct ref *unmatched = NULL;
+   struct ref **unfound;
+   int nr_unfound = 0;
struct ref *ref, *next;
struct oidset tip_oids = OIDSET_INIT;
int i;
@@ -584,23 +586,19 @@ static void filter_refs(struct fetch_pack_args *args,
}
}
 
-   if (strict) {
-   for (i = 0; i < nr_sought; i++) {
-   ref = sought[i];
-   if (!is_unmatched_ref(ref))
-   continue;
-
-   add_refs_to_oidset(_oids, unmatched);
-   add_refs_to_oidset(_oids, newlist);
-   break;
-   }
+   ALLOC_ARRAY(unfound, nr_sought);
+   for (i = 0; i < nr_sought; i++) {
+   if (is_unmatched_ref(sought[i]))
+   unfound[nr_unfound++] = sought[i];
+   }
+   if (strict && nr_unfound) {
+   add_refs_to_oidset(_oids, unmatched);
+   add_refs_to_oidset(_oids, newlist);
}
 
/* Append unmatched requests to the list */
-   for (i = 0; i < nr_sought; i++) {
-   ref = sought[i];
-   if (!is_unmatched_ref(ref))
-   continue;
+   for (i = 0; i < nr_unfound; i++) {
+   ref = unfound[i];
 
if (!strict || oidset_contains(_oids, >old_oid)) {
ref->match_status = REF_MATCHED;
@@ -611,6 +609,7 @@ static void filter_refs(struct fetch_pack_args *args,
}
}
 
+   free(unfound);
oidset_clear(_oids);
for (ref = unmatched; ref; ref = next) {
next = ref->next;
-- 
2.19.0

Re: [PATCH v3 2/5] fetch-pack: load tip_oids eagerly iff needed

2018-10-05 Thread René Scharfe

Am 05.10.2018 um 00:07 schrieb Jeff King:
> On Thu, Oct 04, 2018 at 05:09:39PM +0200, René Scharfe wrote:
> 
>> tip_oids_contain() lazily loads refs into an oidset at its first call.
>> It abuses the internal (sub)member .map.tablesize of that oidset to
>> check if it has done that already.
>>
>> Determine if the oidset needs to be populated upfront and then do that
>> instead.  This duplicates a loop, but simplifies the existing one by
>> separating concerns between the two.
> 
> I like this approach much better than what I showed earlier. But...
> 
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index 3b317952f0..53914563b5 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -526,23 +526,6 @@ static void add_refs_to_oidset(struct oidset *oids, 
>> struct ref *refs)
>>  oidset_insert(oids, >old_oid);
>>  }
>>  
>> -static int tip_oids_contain(struct oidset *tip_oids,
>> -struct ref *unmatched, struct ref *newlist,
>> -const struct object_id *id)
>> -{
>> -/*
>> - * Note that this only looks at the ref lists the first time it's
>> - * called. This works out in filter_refs() because even though it may
>> - * add to "newlist" between calls, the additions will always be for
>> - * oids that are already in the set.
>> - */
> 
> I don't think the subtle point this comment is making goes away. We're
> still growing the list in the loop that calls tip_oids_contain() (and
> which now calls just oidset_contains). That's OK for the reasons given
> here, but I think that would need to be moved down to this code:
> 
>> +if (strict) {
>> +for (i = 0; i < nr_sought; i++) {
>> +ref = sought[i];
>> +if (!is_unmatched_ref(ref))
>> +continue;
>> +
>> +add_refs_to_oidset(_oids, unmatched);
>> +add_refs_to_oidset(_oids, newlist);
>> +break;
>> +}
>> +}
> 
> I.e., we need to say here why it's OK to summarize newlist in the
> oidset, even though we're adding to it later.

There is already this comment:

/* Append unmatched requests to the list */

And that's enough in my eyes.  The refs loop at the top splits the list
into matched ("the list") and unmatched, and the loop below said comment
adds a few more.  I see no subtlety left -- what do I miss?

René

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 10:01:31PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > There's unfortunately not a fast way of doing that. One option would be
> > to keep a counter of "ungraphed commit objects", and have callers update
> > it. Anybody admitting a pack via index-pack or unpack-objects can easily
> > get this information. Commands like fast-import can do likewise, and
> > "git commit" obviously increments it by one.
> >
> > I'm not excited about adding a new global on-disk data structure (and
> > the accompanying lock).
> 
> You don't really need a new global datastructure to solve this
> problem. It would be sufficient to have git-gc itself write out a 4-line
> text file after it runs saying how many tags, commits, trees and blobs
> it found on its last run.
>
> You can then fuzzily compare object counts v.s. commit counts for the
> purposes of deciding whether something like the commit-graph needs to be
> updated, while assuming that whatever new data you have has similar
> enough ratios of those as your existing data.

I think this is basically the same thing as Stolee's suggestion to keep
the total object count in the commit-graph file. The only difference is
here is that we know the actual ratio of commit to blobs for this
particular repository. But I don't think we need to know that. As you
said, this is fuzzy anyway, so a single number for "update the graph
when there are N new objects" is likely enough.

If you had a repository with an unusually large tree, you'd end up
rebuilding the graph more often. But I think it would probably be OK, as
we're primarily trying not to waste time doing a graph rebuild when
we've only done a small amount of other work. But if we just shoved a
ton of objects through index-pack then we did a lot of work, whether
those were commit objects or not.

-Peff

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 04:00:12PM -0400, Derrick Stolee wrote:

> On 10/5/2018 3:47 PM, Jeff King wrote:
> > On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote:
> > 
> > > > So can we really just take (total_objects - commit_graph_objects) and
> > > > compare it to some threshold?
> > > The commit-graph only stores the number of _commits_, not total objects.
> > Oh, right, of course. That does throw a monkey wrench in that line of
> > thought. ;)
> > 
> > There's unfortunately not a fast way of doing that. One option would be
> > to keep a counter of "ungraphed commit objects", and have callers update
> > it. Anybody admitting a pack via index-pack or unpack-objects can easily
> > get this information. Commands like fast-import can do likewise, and
> > "git commit" obviously increments it by one.
> > 
> > I'm not excited about adding a new global on-disk data structure (and
> > the accompanying lock).
> 
> If we want, then we can add an optional chunk to the commit-graph file that
> stores the object count.

Yeah, that's probably a saner route, since we have to do the write then
anyway.

-Peff

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Ævar Arnfjörð Bjarmason

On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote:
>
>> > So can we really just take (total_objects - commit_graph_objects) and
>> > compare it to some threshold?
>>
>> The commit-graph only stores the number of _commits_, not total objects.
>
> Oh, right, of course. That does throw a monkey wrench in that line of
> thought. ;)
>
> There's unfortunately not a fast way of doing that. One option would be
> to keep a counter of "ungraphed commit objects", and have callers update
> it. Anybody admitting a pack via index-pack or unpack-objects can easily
> get this information. Commands like fast-import can do likewise, and
> "git commit" obviously increments it by one.
>
> I'm not excited about adding a new global on-disk data structure (and
> the accompanying lock).

You don't really need a new global datastructure to solve this
problem. It would be sufficient to have git-gc itself write out a 4-line
text file after it runs saying how many tags, commits, trees and blobs
it found on its last run.

You can then fuzzily compare object counts v.s. commit counts for the
purposes of deciding whether something like the commit-graph needs to be
updated, while assuming that whatever new data you have has similar
enough ratios of those as your existing data.

That's an assumption that'll hold well enough for big repos where this
matters the most, and who tend to grow in fairly uniform ways as far as
their object type ratios go.

Databases like MySQL, PostgreSQL etc. pull similar tricks with their
fuzzy table statistics.

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee


On 10/5/2018 3:47 PM, Jeff King wrote:

On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote:


So can we really just take (total_objects - commit_graph_objects) and
compare it to some threshold?

The commit-graph only stores the number of _commits_, not total objects.

Oh, right, of course. That does throw a monkey wrench in that line of
thought. ;)

There's unfortunately not a fast way of doing that. One option would be
to keep a counter of "ungraphed commit objects", and have callers update
it. Anybody admitting a pack via index-pack or unpack-objects can easily
get this information. Commands like fast-import can do likewise, and
"git commit" obviously increments it by one.

I'm not excited about adding a new global on-disk data structure (and
the accompanying lock).


If we want, then we can add an optional chunk to the commit-graph file 
that stores the object count.

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread SZEDER Gábor

On Fri, Oct 05, 2018 at 03:02:16PM -0400, Jeff King wrote:
> On Fri, Oct 05, 2018 at 08:39:04PM +0200, SZEDER Gábor wrote:
> 
> > > It should still be a net win, since the total CPU seems to drop by a
> > > factor of 3-4.
> > 
> > Well, that's true when you have unlimited resources... :)  or it's
> > true even then, when I have just enough resources, but not much
> > contention.  After all, Coccinelle doesn't have to parse the same
> > header files over and over again.  However, on Travis CI, where who
> > knows how many other build jobs are running next to our static
> > analysis, it doesn't seem to be the case.
> > 
> > On current master with an additional 'time' in front:
> > 
> >   time make --jobs=2 coccicheck
> >   <...>
> >   695.70user 50.27system 6:27.88elapsed 192%CPU (0avgtext+0avgdata 
> > 91448maxresident)k
> >   5976inputs+2536outputs (42major+18411888minor)pagefaults 0swaps
> > 
> >   https://travis-ci.org/szeder/git/jobs/437733874#L574
> > 
> > With this patch, but without -j2 to fit into 3GB:
> > 
> >   960.50user 22.59system 16:23.74elapsed 99%CPU (0avgtext+0avgdata 
> > 1606156maxresident)k
> >   5976inputs+1320outputs (26major+4548440minor)pagefaults 0swaps
> > 
> >   https://travis-ci.org/szeder/git/jobs/437734003#L575
> > 
> > Note that both the runtime and the CPU time increased. (and RSS, of
> > course)
> 
> I'm not sure what to make of those results. Was the jump in CPU _caused_
> by the patch, or does it independently fluctuate based on other things
> happening on the Travis servers?
> 
> I.e., in the second run, do we know that the time would not have
> actually been worse with the first patch?

Runtimes tend to fluctuate quite a bit more on Travis CI compared to
my machine, but not this much, and it seems to be consistent so far.

After scripting/querying the Travis CI API a bit, I found that from
the last 100 static analysis build jobs 78 did actully run 'make
coccicheck' [1], avaraging 470s for the whole build job, with only 4
build job exceeding the 10min mark.

I had maybe 6-8 build jobs running this patch over the last 2-3 days,
I think all of them were over 15min.  (I restarted some of them, so I
don't have separate logs for all of them, hence the uncertainty.)


1 - There are a couple of canceled build jobs, and we skip the build
job of branches when they happen to match a tags.

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote:

> > So can we really just take (total_objects - commit_graph_objects) and
> > compare it to some threshold?
> 
> The commit-graph only stores the number of _commits_, not total objects.

Oh, right, of course. That does throw a monkey wrench in that line of
thought. ;)

There's unfortunately not a fast way of doing that. One option would be
to keep a counter of "ungraphed commit objects", and have callers update
it. Anybody admitting a pack via index-pack or unpack-objects can easily
get this information. Commands like fast-import can do likewise, and
"git commit" obviously increments it by one.

I'm not excited about adding a new global on-disk data structure (and
the accompanying lock).

-Peff

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 09:42:49PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >  that I imagine will create a lot of new problems. (We're just now
> > allowing C99 -- I don't even want to think about what kind of compiler
> > issues we'll run into on antique systems trying to use C++).
> 
> ...But just on this point: I was under the impression that this problem
> was way easier with C++. I.e. reason we're just now using C99 for
> portable C projects is because Microsoft for years refused to put any
> effort into updating their compiler to support newer C versions, while
> keeping up-to-date with C++, and that this has only recently started
> changing: https://en.wikipedia.org/wiki/C99#Implementations
> 
> Maybe there was some other popular vendor of C/C++ compilers that had
> the inverse of that story, but I'm not aware of any.

I'd worry about what the C++ story is on AIX, etc.

-Peff

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Ævar Arnfjörð Bjarmason



On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 09:12:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I'm not wild about declaring functions inside macros, just because it
>> > makes tools like ctags like useful (but I have certainly been guilty of
>> > it myself). I'd also worry that taking "code" as a macro parameter might
>> > not scale (what happens if the code has a comma in it?)
>>
>> There's always the option of generating the C code as part of some build
>> step and carrying around a big C file with various type-safe functions
>> that only differ in the types they operate on. It can even be committed
>> to source control.
>>
>> That sucks in some ways for sure, but is a lot friendlier for grepping,
>> ctags etc.
>
> Yeah, in a lot of ways the C preprocessor is not great for larger-scale
> code generation. I was hoping we could get away without having the
> bodies of these functions as part of the generated bit, though.
>
> I think what René showed later in the thread is not too bad in that
> respect.
>
>> I've just barely resisted the urge to include that thread where we were
>> discussing making the code C++-compiler compatible in the References
>> header :)
>
> Yes. The main thing I would want out of using C++ is type-safe,
> efficient data structures. IIRC, early versions of C++ were implemented
> via code generation, and we're basically walking down that same road.
>
> I'm not sure where the right cutoff is, though. It's nice to pick up
> the solution somebody else produced, but requiring a C++ compiler to
> build Git is a pretty big step[...]

No comment on whether git should use C++...

>  that I imagine will create a lot of new problems. (We're just now
> allowing C99 -- I don't even want to think about what kind of compiler
> issues we'll run into on antique systems trying to use C++).

...But just on this point: I was under the impression that this problem
was way easier with C++. I.e. reason we're just now using C99 for
portable C projects is because Microsoft for years refused to put any
effort into updating their compiler to support newer C versions, while
keeping up-to-date with C++, and that this has only recently started
changing: https://en.wikipedia.org/wiki/C99#Implementations

Maybe there was some other popular vendor of C/C++ compilers that had
the inverse of that story, but I'm not aware of any.

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 09:36:28PM +0200, René Scharfe wrote:

> Am 05.10.2018 um 21:08 schrieb Jeff King:
> > On Fri, Oct 05, 2018 at 08:48:27PM +0200, René Scharfe wrote:
> >> +#define DEFINE_SORT(name, type, compare)  \
> >> +static int compare##_void(const void *one, const void *two)   
> >> \
> >> +{ \
> >> +  return compare(one, two);   \
> >> +} \
> >> +static void name(type base, size_t nmemb) \
> >> +{ \
> >> +  const type dummy = NULL;\
> >> +  if (nmemb > 1)  \
> >> +  qsort(base, nmemb, sizeof(base[0]), compare##_void);\
> >> +  else if (0) \
> >> +  compare(dummy, dummy);  \
> >> +}
> > 
> > I do like that this removes the need to have the code block aspart of
> > the macro.
> > 
> > Did you measure to see if there is any runtime impact?
> 
> No, but I wouldn't expect any -- the generated code should be the same
> in most cases.
> 
> Here's an example: https://godbolt.org/z/gwXENy.

OK, that's good enough for me.

> The typed comparison function can be inlined into the one with the void
> pointers, though.

Right, that makes sense. I suspect it depends on the comparison function
being static, but in a DEFINE_SORT() world, they generally could be.

So I like this approach.

-Peff

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee


On 10/5/2018 3:21 PM, Jeff King wrote:

On Fri, Oct 05, 2018 at 09:45:47AM -0400, Derrick Stolee wrote:


My misunderstanding was that your proposed change to gc computes the
commit-graph in either of these two cases:

(1) The auto-GC threshold is met.

(2) There is no commit-graph file.

And what I hope to have instead of (2) is (3):

(3) The commit-graph file is "sufficiently behind" the tip refs.

This condition is intentionally vague at the moment. It could be that we
hint that (3) holds by saying "--post-fetch" (i.e. "We just downloaded a
pack, and it probably contains a lot of new commits") or we could create
some more complicated condition based on counting reachable commits with
infinite generation number (the number of commits not in the commit-graph
file).

I like that you are moving forward to make the commit-graph be written more
frequently, but I'm trying to push us in a direction of writing it even more
often than your proposed strategy. We should avoid creating too many
orthogonal conditions that trigger the commit-graph write, which is why I'm
pushing on your design here.

Anyone else have thoughts on this direction?

Yes, I think measuring "sufficiently behind" is the right thing.
Everything else is a proxy or heuristic, and will run into corner cases.
E.g., I have some small number of objects and then do a huge fetch, and
now my commit-graph only covers 5% of what's available.

We know how many objects are in the graph already. And it's not too
expensive to get the number of objects in the repository. We can do the
same sampling for loose objects that "gc --auto" does, and counting
packed objects just involves opening up the .idx files (that can be slow
if you have a ton of packs, but you'd want to either repack or use a
.midx in that case anyway, either of which would help here).

So can we really just take (total_objects - commit_graph_objects) and
compare it to some threshold?


The commit-graph only stores the number of _commits_, not total objects.

Azure Repos' commit-graph does store the total number of objects, and 
that is how we trigger updating the graph, so it is not unreasonable to 
use that as a heuristic.


Thanks,

-Stolee

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread René Scharfe

Am 05.10.2018 um 21:08 schrieb Jeff King:
> On Fri, Oct 05, 2018 at 08:48:27PM +0200, René Scharfe wrote:
>> +#define DEFINE_SORT(name, type, compare)\
>> +static int compare##_void(const void *one, const void *two) \
>> +{   \
>> +return compare(one, two);   \
>> +}   \
>> +static void name(type base, size_t nmemb)   \
>> +{   \
>> +const type dummy = NULL;\
>> +if (nmemb > 1)  \
>> +qsort(base, nmemb, sizeof(base[0]), compare##_void);\
>> +else if (0) \
>> +compare(dummy, dummy);  \
>> +}
> 
> I do like that this removes the need to have the code block aspart of
> the macro.
> 
> Did you measure to see if there is any runtime impact?

No, but I wouldn't expect any -- the generated code should be the same
in most cases.

Here's an example: https://godbolt.org/z/gwXENy.

> As an aside, we may need to take a "scope" argument in case somebody
> wants to do this in a non-static way.

Sure.  (They could easily wrap the static function, but a macro
parameter is simpler still.)

> It would be nice if we could make
> this "static inline", but I don't think even a clever compiler would be
> able to omit the wrapper call.

It could, if it was to inline qsort(3).  Current compilers don't do
that AFAIK, but I wouldn't be too surprised if they started to.

The typed comparison function can be inlined into the one with the void
pointers, though.

René

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 09:12:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I'm not wild about declaring functions inside macros, just because it
> > makes tools like ctags like useful (but I have certainly been guilty of
> > it myself). I'd also worry that taking "code" as a macro parameter might
> > not scale (what happens if the code has a comma in it?)
> 
> There's always the option of generating the C code as part of some build
> step and carrying around a big C file with various type-safe functions
> that only differ in the types they operate on. It can even be committed
> to source control.
> 
> That sucks in some ways for sure, but is a lot friendlier for grepping,
> ctags etc.

Yeah, in a lot of ways the C preprocessor is not great for larger-scale
code generation. I was hoping we could get away without having the
bodies of these functions as part of the generated bit, though.

I think what René showed later in the thread is not too bad in that
respect.

> I've just barely resisted the urge to include that thread where we were
> discussing making the code C++-compiler compatible in the References
> header :)

Yes. The main thing I would want out of using C++ is type-safe,
efficient data structures. IIRC, early versions of C++ were implemented
via code generation, and we're basically walking down that same road.

I'm not sure where the right cutoff is, though. It's nice to pick up
the solution somebody else produced, but requiring a C++ compiler to
build Git is a pretty big step that I imagine will create a lot of new
problems. (We're just now allowing C99 -- I don't even want to think
about what kind of compiler issues we'll run into on antique systems
trying to use C++).

-Peff

Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-05 Thread Stefan Beller

> > But given that we transport the version in env variables, we'd
> > need to be extra careful if we just did not see the version
> > in the --remote=. above?
>
> Sorry, I'm not sure I understand this. What about env variables requires
> caution?

By locally transporting the version via env variables means the absence of
"version X" lines in the GIT_TRACE output is not a clear indication
of version 0, I would think. It is a strong indicator, but now we'd have to dig
and see if the env variables were set outside?


> >
> > > I suppose if we are strict about serving from a single endpoint, the
> > > version registry makes more sense, and individual operations can declare
> > > acceptable version numbers before calling any network code?
> >
> > Ah yeah, that makes sense!
>
> Thinking out loud here. Please let me know if I say something stupid :)
>
> So we'll have (up to) three pieces of version information we'll care
> about for version negotiation:
>
> 1. (maybe) a client-side protocol.version config entry

(and in case we don't, we have it implicit ly hardcoded, as
we have to choose the default for users that don't care themselves about
this setting)

> 2. a list of acceptable proto versions for the currently running
>operation on the client
> 3. a list of acceptable proto versions for the server endpoint that
>handles the request

Yes that matches my understanding. The issue is between (1) and (2)
as (1) is in a generic config, whereas (2) is specific to the command,
such that it may differ. And as a user you may want to express things
like: "use the highest version", which is done by setting (1) to "version 2"
despite (2) not having support of all commands for v2.

> According to the doc on protocol.version: "If set, clients will attempt
> to communicate with a server using the specified protocol version. If
> unset, no attempt will be made by the client to communicate using a
> particular protocol version, this results in protocol version 0 being
> used."
>
> So, if protocol.version is not set, or set to 0, the client should not
> attempt any sort of version negotiation.

Yes, as version 0 is unaware of versions, i.e. there are old installations
out there where all the versioning code is not there, so in case of an
old client the new server *must* speak v0 to be able to communicate
(and vice versa).


> Otherwise, the client prefers a
> particular version, but we don't guarantee that they will actually use
> that version after the (unspecified) version negotiation procedure.
>
> If protocol.version is set to something other than 0, we construct a
> list of acceptable versions for the given operation. If the
> protocol.version entry is present in that list, we move it to the front
> of the list to note that it is the preferred version. We send all of
> these, in order, in the request.
>
> When the server endpoint begins to handle a request, it first constructs
> a list of acceptable versions. If the client specifies a list of
> versions, we check them one-by-one to see if they are acceptable. If we
> find a match, we use that version. If we don't find any matches or if
> the client did not send a version list, we default to v0.
>
> Seem reasonable?

This sounds super reasonable!

Thanks
Stefan

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 09:45:47AM -0400, Derrick Stolee wrote:

> My misunderstanding was that your proposed change to gc computes the
> commit-graph in either of these two cases:
> 
> (1) The auto-GC threshold is met.
> 
> (2) There is no commit-graph file.
> 
> And what I hope to have instead of (2) is (3):
> 
> (3) The commit-graph file is "sufficiently behind" the tip refs.
> 
> This condition is intentionally vague at the moment. It could be that we
> hint that (3) holds by saying "--post-fetch" (i.e. "We just downloaded a
> pack, and it probably contains a lot of new commits") or we could create
> some more complicated condition based on counting reachable commits with
> infinite generation number (the number of commits not in the commit-graph
> file).
> 
> I like that you are moving forward to make the commit-graph be written more
> frequently, but I'm trying to push us in a direction of writing it even more
> often than your proposed strategy. We should avoid creating too many
> orthogonal conditions that trigger the commit-graph write, which is why I'm
> pushing on your design here.
> 
> Anyone else have thoughts on this direction?

Yes, I think measuring "sufficiently behind" is the right thing.
Everything else is a proxy or heuristic, and will run into corner cases.
E.g., I have some small number of objects and then do a huge fetch, and
now my commit-graph only covers 5% of what's available.

We know how many objects are in the graph already. And it's not too
expensive to get the number of objects in the repository. We can do the
same sampling for loose objects that "gc --auto" does, and counting
packed objects just involves opening up the .idx files (that can be slow
if you have a ton of packs, but you'd want to either repack or use a
.midx in that case anyway, either of which would help here).

So can we really just take (total_objects - commit_graph_objects) and
compare it to some threshold?

-Peff

Re: [PATCH] grep: provide a noop --recursive option

2018-10-05 Thread Stefan Beller

On Fri, Oct 5, 2018 at 6:05 AM Mischa POSLAWSKY  wrote:
>
> Junio C Hamano wrote 2018-10-05 1:19 (-0700):
> > Stefan Beller  writes:
> >
> > > git-grep is always file/tree recursive, but there is --recurse-submodules
> > > which is off by default. Instead of providing a short alias to a noop,
> > > we could use -r for submodules. (And if you happen to have no
> > > submodules, this is a noop for you)
> >
> > I am not sure if it is an overall win for those who do have and use
> > submodules to easily be able to go recursive with a short-and-sweet
> > 'r', or even they want to work inside one project at a time most of
> > the time.  If the latter, then using 'r' for recurse-submodules is
> > going to be a mistake (besides, other commands that have 'recursive'
> > typically use 'r' for its shorthand,and 'r' does not stand for
> > 'recurse-submodules' for them).
>
> Personally I would welcome a shorthand for --recurse-submodules,
> especially if --r^I no longer completes to this.

The new switch differs by one dash, so I'd think the double dashed
version would still autocomplete.

Unrelated to this, but more to submodules:
There is submodule.recurse which you may want to set.
Would you be interested in a more specific config option there?
(i.e. grep.recurseSubmodules to only apply to grep recursing into
submodules, just like fetch.recurseSubmodules only applies to fetch)

> It is also closer to the behaviour provided by grep -r as that recurses
> into submodules as well.

That sort of makes for the grep case, but not for other commands.
See the related discussion at
https://public-inbox.org/git/20180907064026.gb172...@aiede.svl.corp.google.com/

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Ævar Arnfjörð Bjarmason



On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 12:59:02AM +0200, René Scharfe wrote:
>
>> We could also do something like this to reduce the amount of manual
>> casting, but do we want to?  (Macro at the bottom, three semi-random
>> examples at the top.)
>> [...]
>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 5f2e90932f..f9e78d69a2 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -1066,6 +1066,18 @@ static inline void sane_qsort(void *base, size_t 
>> nmemb, size_t size,
>>  qsort(base, nmemb, size, compar);
>>  }
>>
>> +#define DEFINE_SORT(name, elemtype, one, two, code) \
>> +static int name##_compare(const void *one##_v_, const void *two##_v_)   
>> \
>> +{   \
>> +elemtype const *one = one##_v_; \
>> +elemtype const *two = two##_v_; \
>> +code;   \
>> +}   \
>> +static void name(elemtype *array, size_t n) \
>> +{   \
>> +QSORT(array, n, name##_compare);\
>> +}
>
> Interesting. When I saw the callers of this macro, I first thought you
> were just removing the casts from the comparison function, but the real
> value here is the matching QSORT() wrapper which provides the type
> safety.
>
> I'm not wild about declaring functions inside macros, just because it
> makes tools like ctags like useful (but I have certainly been guilty of
> it myself). I'd also worry that taking "code" as a macro parameter might
> not scale (what happens if the code has a comma in it?)

There's always the option of generating the C code as part of some build
step and carrying around a big C file with various type-safe functions
that only differ in the types they operate on. It can even be committed
to source control.

That sucks in some ways for sure, but is a lot friendlier for grepping,
ctags etc.

I've just barely resisted the urge to include that thread where we were
discussing making the code C++-compiler compatible in the References
header :)

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 08:48:27PM +0200, René Scharfe wrote:

> If the comparison function has proper types then we need to declare a
> version with void pointer parameters as well to give to qsort(3).  I
> think using cast function pointers is undefined.  Perhaps like this?

I think it's undefined, too, though we have many instances already.

> +#define DEFINE_SORT(name, type, compare) \
> +static int compare##_void(const void *one, const void *two)  \
> +{\
> + return compare(one, two);   \
> +}\
> +static void name(type base, size_t nmemb)\
> +{\
> + const type dummy = NULL;\
> + if (nmemb > 1)  \
> + qsort(base, nmemb, sizeof(base[0]), compare##_void);\
> + else if (0) \
> + compare(dummy, dummy);  \
> +}

I do like that this removes the need to have the code block aspart of
the macro.

Did you measure to see if there is any runtime impact?

As an aside, we may need to take a "scope" argument in case somebody
wants to do this in a non-static way. It would be nice if we could make
this "static inline", but I don't think even a clever compiler would be
able to omit the wrapper call.

-Peff

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 08:39:04PM +0200, SZEDER Gábor wrote:

> > It should still be a net win, since the total CPU seems to drop by a
> > factor of 3-4.
> 
> Well, that's true when you have unlimited resources... :)  or it's
> true even then, when I have just enough resources, but not much
> contention.  After all, Coccinelle doesn't have to parse the same
> header files over and over again.  However, on Travis CI, where who
> knows how many other build jobs are running next to our static
> analysis, it doesn't seem to be the case.
> 
> On current master with an additional 'time' in front:
> 
>   time make --jobs=2 coccicheck
>   <...>
>   695.70user 50.27system 6:27.88elapsed 192%CPU (0avgtext+0avgdata 
> 91448maxresident)k
>   5976inputs+2536outputs (42major+18411888minor)pagefaults 0swaps
> 
>   https://travis-ci.org/szeder/git/jobs/437733874#L574
> 
> With this patch, but without -j2 to fit into 3GB:
> 
>   960.50user 22.59system 16:23.74elapsed 99%CPU (0avgtext+0avgdata 
> 1606156maxresident)k
>   5976inputs+1320outputs (26major+4548440minor)pagefaults 0swaps
> 
>   https://travis-ci.org/szeder/git/jobs/437734003#L575
> 
> Note that both the runtime and the CPU time increased. (and RSS, of
> course)

I'm not sure what to make of those results. Was the jump in CPU _caused_
by the patch, or does it independently fluctuate based on other things
happening on the Travis servers?

I.e., in the second run, do we know that the time would not have
actually been worse with the first patch?

-Peff

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 08:50:50PM +0200, SZEDER Gábor wrote:

> On Fri, Oct 05, 2018 at 12:59:01PM -0400, Jeff King wrote:
> > On Fri, Oct 05, 2018 at 04:53:35PM +, Keller, Jacob E wrote:
> > 
> > > > Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> > > > doesn't feel like an exorbitant request for a developer-only tool these
> > > > days, but I have noticed some people on the list tend to have lousier
> > > > machines than I do. ;)
> > > > 
> > > > -Peff
> > > 
> > > It's probably not worth trying to make this more complicated and scale
> > > up how many files we do at once based on the amount of available
> > > memory on the system...
> > 
> > Yeah, that sounds too complicated. At most I'd give a Makefile knob to
> > say "spatch in batches of $(N)". But I'd prefer to avoid even that
> > complexity if we can.
> 
> But perhaps one more if-else, e.g.:
> 
>   if test -n "$(COCCICHECK_ALL_AT_ONCE)"; then \
>   
>   else
>   
>   fi
> 
> would be an acceptable compromise?  Dunno.

That's OK, too, assuming people would actually want to use it. I'm also
OK shipping this (with the "make -j" fix you suggested) and seeing if
anybody actually complains. I assume there are only a handful of people
running coccicheck in the first place.

-Peff

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread SZEDER Gábor

On Fri, Oct 05, 2018 at 12:59:01PM -0400, Jeff King wrote:
> On Fri, Oct 05, 2018 at 04:53:35PM +, Keller, Jacob E wrote:
> 
> > > Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> > > doesn't feel like an exorbitant request for a developer-only tool these
> > > days, but I have noticed some people on the list tend to have lousier
> > > machines than I do. ;)
> > > 
> > > -Peff
> > 
> > It's probably not worth trying to make this more complicated and scale
> > up how many files we do at once based on the amount of available
> > memory on the system...
> 
> Yeah, that sounds too complicated. At most I'd give a Makefile knob to
> say "spatch in batches of $(N)". But I'd prefer to avoid even that
> complexity if we can.

But perhaps one more if-else, e.g.:

  if test -n "$(COCCICHECK_ALL_AT_ONCE)"; then \
  
  else
  
  fi

would be an acceptable compromise?  Dunno.

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread René Scharfe

Am 05.10.2018 um 18:51 schrieb Jeff King:
> On Fri, Oct 05, 2018 at 12:59:02AM +0200, René Scharfe wrote:
> 
>> We could also do something like this to reduce the amount of manual
>> casting, but do we want to?  (Macro at the bottom, three semi-random
>> examples at the top.)
>> [...]
>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 5f2e90932f..f9e78d69a2 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -1066,6 +1066,18 @@ static inline void sane_qsort(void *base, size_t 
>> nmemb, size_t size,
>>  qsort(base, nmemb, size, compar);
>>  }
>>  
>> +#define DEFINE_SORT(name, elemtype, one, two, code) \
>> +static int name##_compare(const void *one##_v_, const void *two##_v_)   
>> \
>> +{   \
>> +elemtype const *one = one##_v_; \
>> +elemtype const *two = two##_v_; \
>> +code;   \
>> +}   \
>> +static void name(elemtype *array, size_t n) \
>> +{   \
>> +QSORT(array, n, name##_compare);\
>> +}
> 
> Interesting. When I saw the callers of this macro, I first thought you
> were just removing the casts from the comparison function, but the real
> value here is the matching QSORT() wrapper which provides the type
> safety.

Indeed.

> I'm not wild about declaring functions inside macros, just because it
> makes tools like ctags like useful (but I have certainly been guilty of
> it myself). I'd also worry that taking "code" as a macro parameter might
> not scale (what happens if the code has a comma in it?)

It works fine, as long as the comma is surrounded by parentheses, so
function calls with more than one parameter are fine without any change.

> I think we can address that last part by switching the definition order.
> Like:
> 
>   #define DEFINE_SORT(name, elemtype, one, two) \
>   static int name##_compare(const void *, const void *);\
>   static void name(elemtype *array, size_t n)   \
>   { \
>   QSORT(array, n, name##_compare);\
>   } \
>   static int name##_compare(const void *one##_v_, const void *two##_v_) \
>   { \
>   elemtype const *one = one##_v_; \
>   elemtype const *two = two##_v_; \
> 
> And then expecting the caller to do:
> 
>   DEFINE_SORT(foo, struct foo, a, b)
>  /* code goes here */
>   }
> 
> The unbalanced braces are nasty, though (and likely to screw up editor
> formatting, highlighting, etc).

Adding an extra pair of parentheses if needed is also not ideal, but has
less downsides, I think.

> I wonder if it would be possible to just declare the comparison function
> with its real types, and then teach QSORT() to do a type check. That
> would require typeof() at least, but it would be OK for the type-check
> to be available only to gcc/clang users, I think.
> 
> I'm not quite sure what that type-check would look like, but I was
> thinking something along the lines of (inside the QSORT macro):
> 
>   do {
> /* this will yield a type mismatch if fed the wrong function */
> int (*check)(const typeof(array), const typeof(array)) = compar;
> sane_qsort(array, n, sizeof(*array), n);
>   } while (0)
> 
> I have no idea if that even comes close to compiling, though.

If the comparison function has proper types then we need to declare a
version with void pointer parameters as well to give to qsort(3).  I
think using cast function pointers is undefined.  Perhaps like this?

---
 bisect.c  | 11 +--
 commit-graph.c|  8 
 commit-reach.c| 12 +++-
 git-compat-util.h | 14 ++
 4 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/bisect.c b/bisect.c
index e8b17cf7e1..1fc6278c6b 100644
--- a/bisect.c
+++ b/bisect.c
@@ -192,17 +192,16 @@ struct commit_dist {
int distance;
 };
 
-static int compare_commit_dist(const void *a_, const void *b_)
+static int compare_commit_dist(const struct commit_dist *a,
+  const struct commit_dist *b)
 {
-   struct commit_dist *a, *b;
-
-   a = (struct commit_dist *)a_;
-   b = (struct commit_dist *)b_;
if (a->distance != b->distance)
return b->distance - a->distance; /* desc sort */
return oidcmp(>commit->object.oid, >commit->object.oid);
 }
 
+DEFINE_SORT(sort_by_commit_dist, struct commit_dist *, compare_commit_dist)
+
 static

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread SZEDER Gábor

On Fri, Oct 05, 2018 at 12:25:17PM -0400, Jeff King wrote:
> On Fri, Oct 05, 2018 at 02:40:48PM +0200, SZEDER Gábor wrote:
> 
> > On Thu, Oct 04, 2018 at 07:17:47PM -0700, Jacob Keller wrote:
> > > Junio, do you want me to update the commit message on my side with the
> > > memory concerns? Or could you update it to mention memory as a noted
> > > trade off.
> > 
> > We have been running 'make -j2 coccicheck' in the static analysis
> > build job on Travis CI, which worked just fine so far.  The Travis CI
> > build environments have 3GB of memory available [1], but, as shown in
> > [2], with this patch the memory consumption jumps up to about
> > 1.3-1.8GB for each of those jobs.  So with two parallel jobs we will
> > very likely bump into this limit.
> > 
> > So this patch should definitely change that build script to run only a
> > single job.
> 
> It should still be a net win, since the total CPU seems to drop by a
> factor of 3-4.

Well, that's true when you have unlimited resources... :)  or it's
true even then, when I have just enough resources, but not much
contention.  After all, Coccinelle doesn't have to parse the same
header files over and over again.  However, on Travis CI, where who
knows how many other build jobs are running next to our static
analysis, it doesn't seem to be the case.

On current master with an additional 'time' in front:

  time make --jobs=2 coccicheck
  <...>
  695.70user 50.27system 6:27.88elapsed 192%CPU (0avgtext+0avgdata 
91448maxresident)k
  5976inputs+2536outputs (42major+18411888minor)pagefaults 0swaps

  https://travis-ci.org/szeder/git/jobs/437733874#L574

With this patch, but without -j2 to fit into 3GB:

  960.50user 22.59system 16:23.74elapsed 99%CPU (0avgtext+0avgdata 
1606156maxresident)k
  5976inputs+1320outputs (26major+4548440minor)pagefaults 0swaps

  https://travis-ci.org/szeder/git/jobs/437734003#L575

Note that both the runtime and the CPU time increased. (and RSS, of
course)

> Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> doesn't feel like an exorbitant request for a developer-only tool these
> days, but I have noticed some people on the list tend to have lousier
> machines than I do. ;)
> 
> -Peff

Re: [PATCH v4 2/7] Documentation: add shell guidelines

2018-10-05 Thread Matthew DeVore

On Fri, Oct 5, 2018 at 9:48 AM Junio C Hamano  wrote:
>
> Matthew DeVore  writes:
>
> > Add the following guideline to Documentation/CodingGuidelines:
> >
> >   &&, ||, and | should appear at the end of lines, not the
> >   beginning, and the \ line continuation character should be
> >   omitted
>
> "should be omitted" sounds as if it is the norm to have such a
> character, but it is not.  The text in the actual patch body does a
> much better job than this.
>
> Perhaps
>
> Break overlong lines after "&&", "||", and "|", not before
> them; that way the command can continue to subsequent lines
> without backslash at the end.
That sounds good. Fixed.

> > + - If a command sequence joined with && or || or | spans multiple
> > +   lines, put each command on a separate line and put && and || and |
> > +   operators at the end of each line, rather than the start. This
> > +   means you don't need to use \ to join lines, since the above
> > +   operators imply the sequence isn't finished.
>
> Correct.  Even though I wonder if we need to say the last sentence,
> which is rather obvious, patch is already written and I do not see
> much point editing this further.
ack

> > @@ -466,6 +466,34 @@ And here are the "don'ts:"
> > platform commands; just use '! cmd'.  We are not in the business
> > of verifying that the world given to us sanely works.
> >
> > + - Don't use Git upstream in the non-final position in a piped chain, as
> > +   in:
>
> "upstream in the non-final position" is a bit redundant, isn't it?
>
>   - Don't feed the output of 'git' to a pipe, as in:

Done, but I changed it to "Don't feed the output of a git command to a
pipe, as in:" since referring to it as "git command" without the
quotes seems rather common in this file.

Thank you for the review!

Re: [PATCH v4 5/7] tests: don't swallow Git errors upstream of pipes

2018-10-05 Thread Matthew DeVore

I just realized that the changes to t9101 should actually be part of
the next patch (6/7), not this one. I've fixed that for the next
re-roll.

Re: [PATCH v4 5/7] tests: don't swallow Git errors upstream of pipes

2018-10-05 Thread Matthew DeVore

On Fri, Oct 5, 2018 at 9:48 AM Junio C Hamano  wrote:
>
> Hopefully this is not a blind mechanical patch, as introduction of
> unexpected temporary files in the working tree could interfere with
> later tests (e.g. they may expect exact set of untracked files, and
> these new temporary files would b unexpected additions).
>
> Thanks.

I reviewed the files modified in this patch. For the svn one, there
are other scratch files being created in the cwd before mine, so my
additions should be fine, I admit I don't fully understand the
commands being run in that test...after the temp files are created, I
see git svn proplist and git svn propget being executed... the public
documentation for them does not seem to contradict my assumption.
(http://svnbook.red-bean.com/en/1.6/svn.ref.svn.c.propget.html)

As for the other test files in this patch, all git invocations are
either "git clone" or "git init " or "git -C ..." which means
none of them care about extra files in the cwd. After I searched for
Git commands I read through the files somewhat briskly and looked for
any globs or suspicious invocations of awk, grep, or other commands
(we really do use a limited number of commands in the tests I saw :)
and did not find anything that depended on existence or non-existence
of unnamed files.

[ANNOUNCE] Git for Windows 2.19.1

2018-10-05 Thread Johannes Schindelin

Dear Git users,

It is my pleasure to announce that Git for Windows 2.19.1 is available from:

https://gitforwindows.org/

Changes since Git for Windows v2.19.0 (September 11th 2018)

New Features

  * Comes with Git v2.19.1.
  * Comes with Git LFS v2.5.2.
  * Comes with Git Credential Manager v1.17.2.
  * When FSCache is enabled, commands such as add, commit, and reset
are now much faster.
  * Sublime Text, Atom, and even the new user-specific VS Code
installations can now by used as Git's default editor.
  * Comes with Git Credential Manager v1.18.0.

Bug Fixes

  * Several corner case bugs were fixed in the built-in rebase/stash
commands.
  * An occasional crash in git gc (which had been introduced into
v2.19.0) has been fixed.

Filename | SHA-256
 | ---
Git-2.19.1-64-bit.exe | 
5e11205840937dd4dfa4a2a7943d08da7443faa41d92ccc5dafbb4f82e724793
Git-2.19.1-32-bit.exe | 
a94ab28c2a9c20cf1ef061a2acce5b2de572773e5efed0e994d89380f458d52a
PortableGit-2.19.1-64-bit.7z.exe | 
419fc2141b5d17c5ad73d9b2306cd3b2f5114c50bd67558cdc538da2c8e67b66
PortableGit-2.19.1-32-bit.7z.exe | 
03b993c9aa336259c293eaf3f123f6591a9c8f2939d2129810332d5f64d553a6
MinGit-2.19.1-64-bit.zip | 
f89e103a41bda8e12efeaab198a8c20bb4a84804683862da518ee2cb66a5a5b3
MinGit-2.19.1-32-bit.zip | 
9bde728fe03f66a022b3e41408902ccfceb56a34067db1f35d6509375b9be922
MinGit-2.19.1-busybox-64-bit.zip | 
aa2c0dd240c9db66a85d8633fb9f90cbcc10cb00c86cc37880bdde691c1362de
MinGit-2.19.1-busybox-32-bit.zip | 
179c7810e6f388e37a6b4073e9cfc2d109598061787dc913b3813b78fecafa97
Git-2.19.1-64-bit.tar.bz2 | 
7b4de9b248dcec6f28210f14af2cf1eb31e1400e3c4f139a277fcdd7691486d2
Git-2.19.1-32-bit.tar.bz2 | 
4ae9ce6fe7a5fe4b149dc8e24f1a7db3dff5285225e10aa6c0e39284ad0548bd
pdbs-for-git-64-bit-2.19.1.1.11a3092e18-1.zip | 
856c9fb0eb8b2ffb6b6b4b3c2f5ffb37add31c60aed1662b0784114d57444d5e
pdbs-for-git-32-bit-2.19.1.1.11a3092e18-1.zip | 
83191c41776fc3c4d16e2abd686d881d36f5c5e9320a34826fcee33dd74f8a0d

Ciao,
Johannes

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 04:53:35PM +, Keller, Jacob E wrote:

> > Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> > doesn't feel like an exorbitant request for a developer-only tool these
> > days, but I have noticed some people on the list tend to have lousier
> > machines than I do. ;)
> > 
> > -Peff
> 
> It's probably not worth trying to make this more complicated and scale
> up how many files we do at once based on the amount of available
> memory on the system...

Yeah, that sounds too complicated. At most I'd give a Makefile knob to
say "spatch in batches of $(N)". But I'd prefer to avoid even that
complexity if we can.

-Peff

[Announce] Git 2.14.5, 2.15.3, 2.16.5, 2.17.2, 2.18.1, and 2.19.1

2018-10-05 Thread Junio C Hamano

These releases fix a security flaw (CVE-2018-17456), which allowed an
attacker to execute arbitrary code by crafting a malicious .gitmodules
file in a project cloned with --recurse-submodules.

When running "git clone --recurse-submodules", Git parses the supplied
.gitmodules file for a URL field and blindly passes it as an argument
to a "git clone" subprocess.  If the URL field is set to a string that
begins with a dash, this "git clone" subprocess interprets the URL as
an option.  This can lead to executing an arbitrary script shipped in
the superproject as the user who ran "git clone".

In addition to fixing the security issue for the user running "clone",
the 2.17.2, 2.18.1 and 2.19.1 releases have an "fsck" check which can
be used to detect such malicious repository content when fetching or
accepting a push. See "transfer.fsckObjects" in git-config(1).

Credit for finding and fixing this vulnerability goes to joernchen
and Jeff King, respectively.

P.S. Folks at Microsoft tried to follow the known exploit recipe on
Git for Windows (but not Cygwin or other Git implementations on
Windows) and found that the recipe (or its variants they can think
of) would not make their system vulnerable.  This is due to the fact
that the type of submodule path require by the known exploit recipe
cannot be created on Windows.  Nonetheless, it is possible we have
missed some exploitation path and users are encouraged to upgrade.

Re: Is there some script to find un-delta-able objects?

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 06:44:25PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Some version of the former. Ones where we haven't found any (or much of)
> useful deltas yet. E.g. say I had a repository with a lot of files
> generated by this command at various points in the history:
> 
> dd if=/dev/urandom of=file.binary count=1024 bs=1024
> 
> Some script similar to git-sizer which could report that the
> packed+compressed+delta'd version of the 10 *.binary files I had in my
> history had a 1:1 ratio of how large they were in .git, v.s. how large
> the sum of each file retrieved by "git show" was (i.e. uncompressed,
> un-delta'd).

You can get the uncompressed and on-disk sizes with:

  git cat-file --batch-all-objects \
--batch-check='%(objectname) %(objectsize) %(objectsize:disk)'

and then compare the sizes/ratios however you like. If you want just a
subset of the blobs, drop the "--batch-all-objects" and just feed the
object names or even "HEAD:filename" on stdin).

> That doesn't mean that tomorrow I won't commit 10 new objects which
> would have a really good delta ratio to those 10 existing files,
> bringing the ratio to ~1:2, but if I had some report like:
> 
>  
> 
> For a given repo that could be fed into .gitattributes to say we
> shouldn't bother to delta files of certain extensions.

I don't know of a tool that does that, but I think a modest application
of perl to the cat-file output would produce it.

-Peff

RE: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread Keller, Jacob E

> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Friday, October 05, 2018 9:25 AM
> To: SZEDER Gábor 
> Cc: Jacob Keller ; Keller, Jacob E
> ; Git mailing list 
> Subject: Re: [PATCH v3] coccicheck: process every source file at once
> 
> On Fri, Oct 05, 2018 at 02:40:48PM +0200, SZEDER Gábor wrote:
> 
> > On Thu, Oct 04, 2018 at 07:17:47PM -0700, Jacob Keller wrote:
> > > Junio, do you want me to update the commit message on my side with the
> > > memory concerns? Or could you update it to mention memory as a noted
> > > trade off.
> >
> > We have been running 'make -j2 coccicheck' in the static analysis
> > build job on Travis CI, which worked just fine so far.  The Travis CI
> > build environments have 3GB of memory available [1], but, as shown in
> > [2], with this patch the memory consumption jumps up to about
> > 1.3-1.8GB for each of those jobs.  So with two parallel jobs we will
> > very likely bump into this limit.
> >
> > So this patch should definitely change that build script to run only a
> > single job.
> 
> It should still be a net win, since the total CPU seems to drop by a
> factor of 3-4.
> 
> Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> doesn't feel like an exorbitant request for a developer-only tool these
> days, but I have noticed some people on the list tend to have lousier
> machines than I do. ;)
> 
> -Peff

It's probably not worth trying to make this more complicated and scale up how 
many files we do at once based on the amount of available memory on the 
system...

Thanks,
Jake

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 12:59:02AM +0200, René Scharfe wrote:

> We could also do something like this to reduce the amount of manual
> casting, but do we want to?  (Macro at the bottom, three semi-random
> examples at the top.)
> [...]
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 5f2e90932f..f9e78d69a2 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1066,6 +1066,18 @@ static inline void sane_qsort(void *base, size_t 
> nmemb, size_t size,
>   qsort(base, nmemb, size, compar);
>  }
>  
> +#define DEFINE_SORT(name, elemtype, one, two, code)  \
> +static int name##_compare(const void *one##_v_, const void *two##_v_)
> \
> +{\
> + elemtype const *one = one##_v_; \
> + elemtype const *two = two##_v_; \
> + code;   \
> +}\
> +static void name(elemtype *array, size_t n)  \
> +{\
> + QSORT(array, n, name##_compare);\
> +}

Interesting. When I saw the callers of this macro, I first thought you
were just removing the casts from the comparison function, but the real
value here is the matching QSORT() wrapper which provides the type
safety.

I'm not wild about declaring functions inside macros, just because it
makes tools like ctags like useful (but I have certainly been guilty of
it myself). I'd also worry that taking "code" as a macro parameter might
not scale (what happens if the code has a comma in it?)

I think we can address that last part by switching the definition order.
Like:

  #define DEFINE_SORT(name, elemtype, one, two) \
  static int name##_compare(const void *, const void *);\
  static void name(elemtype *array, size_t n)   \
  { \
QSORT(array, n, name##_compare);\
  } \
  static int name##_compare(const void *one##_v_, const void *two##_v_) \
  { \
elemtype const *one = one##_v_; \
elemtype const *two = two##_v_; \

And then expecting the caller to do:

  DEFINE_SORT(foo, struct foo, a, b)
 /* code goes here */
  }

The unbalanced braces are nasty, though (and likely to screw up editor
formatting, highlighting, etc).

I wonder if it would be possible to just declare the comparison function
with its real types, and then teach QSORT() to do a type check. That
would require typeof() at least, but it would be OK for the type-check
to be available only to gcc/clang users, I think.

I'm not quite sure what that type-check would look like, but I was
thinking something along the lines of (inside the QSORT macro):

  do {
/* this will yield a type mismatch if fed the wrong function */
int (*check)(const typeof(array), const typeof(array)) = compar;
sane_qsort(array, n, sizeof(*array), n);
  } while (0)

I have no idea if that even comes close to compiling, though.

-Peff

Re: [PATCH v4 2/7] Documentation: add shell guidelines

2018-10-05 Thread Junio C Hamano

Matthew DeVore  writes:

> Add the following guideline to Documentation/CodingGuidelines:
>
>   &&, ||, and | should appear at the end of lines, not the
>   beginning, and the \ line continuation character should be
>   omitted

"should be omitted" sounds as if it is the norm to have such a
character, but it is not.  The text in the actual patch body does a
much better job than this.

Perhaps

Break overlong lines after "&&", "||", and "|", not before
them; that way the command can continue to subsequent lines
without backslash at the end.

> And the following to t/README (since it is specific to writing tests):
>
>   pipes and $(git ...) should be avoided when they swallow exit
>   codes of Git processes

Good.

> Signed-off-by: Matthew DeVore 
> ---
>  Documentation/CodingGuidelines | 18 ++
>  t/README   | 28 
>  2 files changed, 46 insertions(+)
>
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> index 48aa4edfb..72967deb7 100644
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -118,6 +118,24 @@ For shell scripts specifically (not exhaustive):
>   do this
>   fi
>  
> + - If a command sequence joined with && or || or | spans multiple
> +   lines, put each command on a separate line and put && and || and |
> +   operators at the end of each line, rather than the start. This
> +   means you don't need to use \ to join lines, since the above
> +   operators imply the sequence isn't finished.

Correct.  Even though I wonder if we need to say the last sentence,
which is rather obvious, patch is already written and I do not see
much point editing this further.

> + (incorrect)
> + grep blob verify_pack_result \
> + | awk -f print_1.awk \
> + | sort >actual &&
> + ...
> +
> + (correct)
> + grep blob verify_pack_result |
> + awk -f print_1.awk |
> + sort >actual &&
> + ...
> +
>   - We prefer "test" over "[ ... ]".
>  
>   - We do not write the noiseword "function" in front of shell
> diff --git a/t/README b/t/README
> index 85024aba6..9a71d5732 100644
> --- a/t/README
> +++ b/t/README
> @@ -466,6 +466,34 @@ And here are the "don'ts:"
> platform commands; just use '! cmd'.  We are not in the business
> of verifying that the world given to us sanely works.
>  
> + - Don't use Git upstream in the non-final position in a piped chain, as
> +   in:

"upstream in the non-final position" is a bit redundant, isn't it?

  - Don't feed the output of 'git' to a pipe, as in:

> +
> + git -C repo ls-files |
> + xargs -n 1 basename |
> + grep foo
> +
> +   which will discard git's exit code and may mask a crash. In the
> +   above example, all exit codes are ignored except grep's.

Good.

> +   Instead, write the output of that command to a temporary
> +   file with ">" or assign it to a variable with "x=$(git ...)" rather
> +   than pipe it.
> +
> + - Don't use command substitution in a way that discards git's exit
> +   code. When assigning to a variable, the exit code is not discarded,
> +   e.g.:
> +
> + x=$(git cat-file -p $sha) &&
> + ...
> +
> +   is OK because a crash in "git cat-file" will cause the "&&" chain
> +   to fail, but:
> +
> + test "refs/heads/foo" = "$(git symbolic-ref HEAD)"
> +
> +   is not OK and a crash in git could go undetected.

Good.

>   - Don't use perl without spelling it as "$PERL_PATH". This is to help
> our friends on Windows where the platform Perl often adds CR before
> the end of line, and they bundle Git with a version of Perl that

Re: [PATCH v4 5/7] tests: don't swallow Git errors upstream of pipes

2018-10-05 Thread Junio C Hamano

Matthew DeVore  writes:

> Some pipes in tests lose the exit code of git processes, which can mask
> unexpected behavior like crashes. Split these pipes up so that git
> commands are only at the end of pipes rather than the beginning or
> middle.
>
> The violations fixed in this patch were found in the process of fixing
> pipe placement in a prior patch.

Hopefully this is not a blind mechanical patch, as introduction of
unexpected temporary files in the working tree could interfere with
later tests (e.g. they may expect exact set of untracked files, and
these new temporary files would b unexpected additions).

Thanks.

Re: Is there some script to find un-delta-able objects?

2018-10-05 Thread Junio C Hamano

Jeff King  writes:

> On Fri, Oct 05, 2018 at 04:20:27PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I.e. something to generate the .gitattributes file using this format:
>> 
>> https://git-scm.com/docs/gitattributes#_packing_objects
>> 
>> Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
>> there's some repo scanner utility to spew this out for a given repo.
>
> I'm not sure what you mean by "un-delta-able" objects. Do you mean ones
> where we're not likely to find a delta? Or ones where Git will not try
> to look for a delta?
>
> If the latter, I think the only rules are the "-delta" attribute and the
> object size. You should be able to use git-check-attr and "git-cat-file"
> to get that info.
>
> If the former, I don't know how you would know. We can only report on
> what isn't a delta _yet_.

I am reasonably sure that the question is about solving the former
so that "-delta" attribute is set appropriately.

Iniitially, I thought that it is likely an undeltifiable object has
higher randomness than deltifiable ones and that can be exploited,
but if you have such a highly random blob A (and no other object
like it) in the repository and then later acquire another blob B
that happens to share most of the data with A, then A and B by
themselves will pass the "highly random" test but still yet each can
be expressed as a delta derived from the other.  So your "what isn't
a delta yet" is a reasonable assessment of what mechanically can be
known.  

Knowledge/heuristic like "No two '*.gpg' files are expected to be
alike" needs something more than the randomness of individual files,
I guess.

Re: [PATCH v3 1/3] help: redirect to aliased commands for "git cmd --help"

2018-10-05 Thread Junio C Hamano

Rasmus Villemoes  writes:

>> It also is strange to count from (0); if the patchset is rerolled
>> again, I'd prefer to see these start counting from (1), in which
>> case this item will become (3).
>
> If you prefer, I can send a v4.

Sure, if you prefer, you can send a v4 for me to look at and queue.

Thanks.

Re: [PATCH] branch: trivial style fix

2018-10-05 Thread Junio C Hamano

Tao Qingyun  writes:

> ---
>  builtin/branch.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/branch.c b/builtin/branch.c
> index c396c41533..52aad0f602 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -222,10 +222,11 @@ static int delete_branches(int argc, const char **argv, 
> int force, int kinds,
>   if (!head_rev)
>   die(_("Couldn't look up commit object for HEAD"));
>   }
> - for (i = 0; i < argc; i++, strbuf_reset()) {
> + for (i = 0; i < argc; i++) {
>   char *target = NULL;
>   int flags = 0;
>  
> + strbuf_reset();

This is not "trivial" nor "style fix", though.

It is not "trivial" because it also changes the behaviour.  Instead
of resetting the strbuf each time after the loop body runs, the new
code resets it before the loop body, even for the 0-th iteration
where the strbuf hasn't even been used at all.  It is not a "style
fix" because we do not have a particular reason to avoid using the
comma operator to perform two simple expressions with side effects
inside a single expression.

>   strbuf_branchname(, argv[i], allowed_interpret);
>   free(name);
>   name = mkpathdup(fmt, bname.buf);
> @@ -716,8 +717,7 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
>   print_columns(, colopts, NULL);
>   string_list_clear(, 0);
>   return 0;
> - }
> - else if (edit_description) {
> + } else if (edit_description) {

This one is a style fix that is trivial.  It does not change the
semantics a bit, and we do prefer "else if" to be on the same line
as the closing "}" of the earlier "if" that opened the matching "{".

>   const char *branch_name;
>   struct strbuf branch_ref = STRBUF_INIT;

Re: Is there some script to find un-delta-able objects?

2018-10-05 Thread Ævar Arnfjörð Bjarmason

On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 04:20:27PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I.e. something to generate the .gitattributes file using this format:
>>
>> https://git-scm.com/docs/gitattributes#_packing_objects
>>
>> Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
>> there's some repo scanner utility to spew this out for a given repo.
>
> I'm not sure what you mean by "un-delta-able" objects. Do you mean ones
> where we're not likely to find a delta? Or ones where Git will not try
> to look for a delta?
>
> If the latter, I think the only rules are the "-delta" attribute and the
> object size. You should be able to use git-check-attr and "git-cat-file"
> to get that info.
>
> If the former, I don't know how you would know. We can only report on
> what isn't a delta _yet_.

Some version of the former. Ones where we haven't found any (or much of)
useful deltas yet. E.g. say I had a repository with a lot of files
generated by this command at various points in the history:

dd if=/dev/urandom of=file.binary count=1024 bs=1024

Some script similar to git-sizer which could report that the
packed+compressed+delta'd version of the 10 *.binary files I had in my
history had a 1:1 ratio of how large they were in .git, v.s. how large
the sum of each file retrieved by "git show" was (i.e. uncompressed,
un-delta'd).

That doesn't mean that tomorrow I won't commit 10 new objects which
would have a really good delta ratio to those 10 existing files,
bringing the ratio to ~1:2, but if I had some report like:

For a given repo that could be fed into .gitattributes to say we
shouldn't bother to delta files of certain extensions.

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 02:40:48PM +0200, SZEDER Gábor wrote:

> On Thu, Oct 04, 2018 at 07:17:47PM -0700, Jacob Keller wrote:
> > Junio, do you want me to update the commit message on my side with the
> > memory concerns? Or could you update it to mention memory as a noted
> > trade off.
> 
> We have been running 'make -j2 coccicheck' in the static analysis
> build job on Travis CI, which worked just fine so far.  The Travis CI
> build environments have 3GB of memory available [1], but, as shown in
> [2], with this patch the memory consumption jumps up to about
> 1.3-1.8GB for each of those jobs.  So with two parallel jobs we will
> very likely bump into this limit.
> 
> So this patch should definitely change that build script to run only a
> single job.

It should still be a net win, since the total CPU seems to drop by a
factor of 3-4.

Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
doesn't feel like an exorbitant request for a developer-only tool these
days, but I have noticed some people on the list tend to have lousier
machines than I do. ;)

-Peff

Re: [PATCH] branch: trivial style fix

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 05:52:14PM +0800, Tao Qingyun wrote:

> diff --git a/builtin/branch.c b/builtin/branch.c
> index c396c41533..52aad0f602 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -222,10 +222,11 @@ static int delete_branches(int argc, const char **argv, 
> int force, int kinds,
>   if (!head_rev)
>   die(_("Couldn't look up commit object for HEAD"));
>   }
> - for (i = 0; i < argc; i++, strbuf_reset()) {
> + for (i = 0; i < argc; i++) {
>   char *target = NULL;
>   int flags = 0;
>  
> + strbuf_reset();
>   strbuf_branchname(, argv[i], allowed_interpret);
>   free(name);
>   name = mkpathdup(fmt, bname.buf);

This one isn't just style: it switches the reset from the end of the
loop to the beginning of the loop. So we'd need to make sure that we are
not depending on its contents going into the first iteration, nor coming
out of the last one.

Looking at the surrounding code, I think it is OK.

> @@ -716,8 +717,7 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
>   print_columns(, colopts, NULL);
>   string_list_clear(, 0);
>   return 0;
> - }
> - else if (edit_description) {
> + } else if (edit_description) {
>   const char *branch_name;
>   struct strbuf branch_ref = STRBUF_INIT;

And this one is obviously correct.

-Peff

Re: Is there some script to find un-delta-able objects?

2018-10-05 Thread Jeff King

On Fri, Oct 05, 2018 at 04:20:27PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I.e. something to generate the .gitattributes file using this format:
> 
> https://git-scm.com/docs/gitattributes#_packing_objects
> 
> Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
> there's some repo scanner utility to spew this out for a given repo.

I'm not sure what you mean by "un-delta-able" objects. Do you mean ones
where we're not likely to find a delta? Or ones where Git will not try
to look for a delta?

If the latter, I think the only rules are the "-delta" attribute and the
object size. You should be able to use git-check-attr and "git-cat-file"
to get that info.

If the former, I don't know how you would know. We can only report on
what isn't a delta _yet_.

-Peff

[PATCH v3 0/1] Teach the builtin rebase about the builtin interactive rebase

2018-10-05 Thread Johannes Schindelin via GitGitGadget

The builtin rebase and the builtin interactive rebase have been developed
independently, on purpose: Google Summer of Code rules specifically state
that students have to work on independent projects, they cannot collaborate
on the same project.

The reason is probably the very fine tradition in academia to prohibit
teamwork, which makes grading easier (at the expense of not exactly
preparing the students for the real world, unless they want to stay in
academia).

One fallout is that the rebase-in-c and rebase-i-in-c patches cause no merge
conflicts but a royal number of tests in the test suite to fail.

It is easy to explain why: rebase-in-c was developed under the assumption
that all rebase backends are implemented in Unix shell script and can be
sourced via . git-rebase--, which is no longer true with 
rebase-i-in-c, where git-rebase--interactive is a hard-linked builtin.

This patch fixes that.

Note: while this patch targets pk/rebase-in-c-6-final, it will not work
correctly without ag/rebase-i-in-c. So my suggestion is to rewrite the 
pk/rebas-in-c-6-final branch by first merging ag/rebase-i-in-c, then
applying this here patch, and only then cherry-pick "rebase: default to
using the builtin rebase".

Changes since v2:

 * Prepare for the break command, by skipping the call to finish_rebase() 
   for interactive rebases altogether (the built-in interactive rebase
   already takes care of that).
 * Remove a no-longer-used case arm (skipped by the newly-introduced code).

Changes since v1:

 * replaced the too-terse commit message by a copy-edited version of this
   cover letter (leaving out only the rant about disallowing teamwork).

Johannes Schindelin (1):
  builtin rebase: prepare for builtin rebase -i

 builtin/rebase.c | 87 +---
 1 file changed, 83 insertions(+), 4 deletions(-)


base-commit: 67e0df2f391ec4177942a4d8b70e530aa5653c0d
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-23%2Fdscho%2Frebase-in-c-6-final-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-23/dscho/rebase-in-c-6-final-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/23

Range-diff vs v2:

 1:  5403014be7 ! 1:  db1652ef3e builtin rebase: prepare for builtin rebase -i
 @@ -18,6 +18,15 @@
  
  This patch fixes that.
  
 +Please note that we also skip the finish_rebase() call for interactive
 +rebases because the built-in interactive rebase already takes care of
 +that. This is needed to support the upcoming `break` command that 
wants
 +to interrupt the rebase with exit code 0 (and naturally wants to keep
 +the state directory intact when doing so).
 +
 +While at it, remove the `case` arm for the interactive rebase that is
 +now skipped in favor of the short-cut to the built-in rebase.
 +
  Signed-off-by: Johannes Schindelin 
  
  diff --git a/builtin/rebase.c b/builtin/rebase.c
 @@ -117,6 +126,17 @@
add_var(_snippet, "GIT_DIR", absolute_path(get_git_dir()));
add_var(_snippet, "state_dir", opts->state_dir);
   
 +@@
 +  backend = "git-rebase--am";
 +  backend_func = "git_rebase__am";
 +  break;
 +- case REBASE_INTERACTIVE:
 +- backend = "git-rebase--interactive";
 +- backend_func = "git_rebase__interactive";
 +- break;
 +  case REBASE_MERGE:
 +  backend = "git-rebase--merge";
 +  backend_func = "git_rebase__merge";
  @@
argv[0] = script_snippet.buf;
   
 @@ -124,4 +144,8 @@
  +finished_rebase:
if (opts->dont_finish_rebase)
; /* do nothing */
 ++ else if (opts->type == REBASE_INTERACTIVE)
 ++ ; /* interactive rebase cleans up after itself */
else if (status == 0) {
 +  if (!file_exists(state_dir_path("stopped-sha", opts)))
 +  finish_rebase(opts);

-- 
gitgitgadget

[PATCH v3 1/1] builtin rebase: prepare for builtin rebase -i

2018-10-05 Thread Johannes Schindelin via GitGitGadget

From: Johannes Schindelin 

The builtin rebase and the builtin interactive rebase have been
developed independently, on purpose: Google Summer of Code rules
specifically state that students have to work on independent projects,
they cannot collaborate on the same project.

One fallout is that the rebase-in-c and rebase-i-in-c patches cause no
merge conflicts but a royal number of tests in the test suite to fail.

It is easy to explain why: rebase-in-c was developed under the
assumption that all rebase backends are implemented in Unix shell script
and can be sourced via `. git-rebase--`, which is no longer
true with rebase-i-in-c, where git-rebase--interactive is a hard-linked
builtin.

This patch fixes that.

Please note that we also skip the finish_rebase() call for interactive
rebases because the built-in interactive rebase already takes care of
that. This is needed to support the upcoming `break` command that wants
to interrupt the rebase with exit code 0 (and naturally wants to keep
the state directory intact when doing so).

While at it, remove the `case` arm for the interactive rebase that is
now skipped in favor of the short-cut to the built-in rebase.

Signed-off-by: Johannes Schindelin 
---
 builtin/rebase.c | 87 +---
 1 file changed, 83 insertions(+), 4 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 1a697d70c9..20f7159cf2 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -326,6 +326,13 @@ static void add_var(struct strbuf *buf, const char *name, 
const char *value)
}
 }
 
+static const char *resolvemsg =
+N_("Resolve all conflicts manually, mark them as resolved with\n"
+"\"git add/rm \", then run \"git rebase --continue\".\n"
+"You can instead skip this commit: run \"git rebase --skip\".\n"
+"To abort and get back to the state before \"git rebase\", run "
+"\"git rebase --abort\".");
+
 static int run_specific_rebase(struct rebase_options *opts)
 {
const char *argv[] = { NULL, NULL };
@@ -333,6 +340,79 @@ static int run_specific_rebase(struct rebase_options *opts)
int status;
const char *backend, *backend_func;
 
+   if (opts->type == REBASE_INTERACTIVE) {
+   /* Run builtin interactive rebase */
+   struct child_process child = CHILD_PROCESS_INIT;
+
+   argv_array_pushf(_array, "GIT_CHERRY_PICK_HELP=%s",
+resolvemsg);
+   if (!(opts->flags & REBASE_INTERACTIVE_EXPLICIT)) {
+   argv_array_push(_array, "GIT_EDITOR=:");
+   opts->autosquash = 0;
+   }
+
+   child.git_cmd = 1;
+   argv_array_push(, "rebase--interactive");
+
+   if (opts->action)
+   argv_array_pushf(, "--%s", opts->action);
+   if (opts->keep_empty)
+   argv_array_push(, "--keep-empty");
+   if (opts->rebase_merges)
+   argv_array_push(, "--rebase-merges");
+   if (opts->rebase_cousins)
+   argv_array_push(, "--rebase-cousins");
+   if (opts->autosquash)
+   argv_array_push(, "--autosquash");
+   if (opts->flags & REBASE_VERBOSE)
+   argv_array_push(, "--verbose");
+   if (opts->flags & REBASE_FORCE)
+   argv_array_push(, "--no-ff");
+   if (opts->restrict_revision)
+   argv_array_pushf(,
+"--restrict-revision=^%s",
+
oid_to_hex(>restrict_revision->object.oid));
+   if (opts->upstream)
+   argv_array_pushf(, "--upstream=%s",
+
oid_to_hex(>upstream->object.oid));
+   if (opts->onto)
+   argv_array_pushf(, "--onto=%s",
+oid_to_hex(>onto->object.oid));
+   if (opts->squash_onto)
+   argv_array_pushf(, "--squash-onto=%s",
+oid_to_hex(opts->squash_onto));
+   if (opts->onto_name)
+   argv_array_pushf(, "--onto-name=%s",
+opts->onto_name);
+   argv_array_pushf(, "--head-name=%s",
+opts->head_name ?
+opts->head_name : "detached HEAD");
+   if (opts->strategy)
+   argv_array_pushf(, "--strategy=%s",
+opts->strategy);
+   if (opts->strategy_opts)
+   argv_array_pushf(, "--strategy-opts=%s",
+opts->strategy_opts);
+   if (opts->switch_to)
+   argv_array_pushf(, "--switch-to=%s",
+

Re: [PATCH 1/1] rebase -i: introduce the 'break' command

2018-10-05 Thread Johannes Schindelin

Hi Junio,

On Fri, 5 Oct 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" 
> writes:
> 
> > From: Johannes Schindelin 
> >
> > The 'edit' command can be used to cherry-pick a commit and then
> > immediately drop out of the interactive rebase, with exit code 0, to let
> > the user amend the commit, or test it, or look around.
> >
> > Sometimes this functionality would come in handy *without*
> > cherry-picking a commit, e.g. to interrupt the interactive rebase even
> > before cherry-picking a commit, or immediately after an 'exec' or a
> > 'merge'.
> >
> > This commit introduces that functionality, as the spanking new 'break' 
> > command.
> >
> > Suggested-by: Stefan Beller 
> > Signed-off-by: Johannes Schindelin 
> > ---
> 
> If one wants to emulate this with the versions of Git that are
> currently deployed, would it be sufficient to insert "exec false"
> instead of "break"?

There is one crucial difference: the exit code.

If you are scripting around `git rebase -i` (and I do, heavily), then that
is quite a difference: who's to say that the rebase "failed" because of
that `exec false`, or if it failed another `exec` unexpectedly?

> The reason I am asking is *not* to imply that we do not need this
> new feature.  It is because I vaguely recall seeing a request to add
> 'pause' to the insn set and "exec false" was mentioned as a more
> general alternative long time ago.  I am trying to see if this is a
> recurring request/wish, because it would reinforce that this new
> feature would be a good addition if that is the case.
> 
> I suspect that "exec false" would give a message that looks like a
> complaint ("'false' failed so we are giving you control back to fix
> things" or something like that), and having a dedicated way to pause
> the execution without alarming the user is a good idea.
> 
> I think the earlier request asked for 'pause' (I didn't dig the list
> archive very carefully, though),

No need to: I mentioned it in the cover letter. Here is the link again,
for your convenience:
https://public-inbox.org/git/20180118183618.39853-3-sbel...@google.com/

> and 'stop' may also be a possible verb, but I tend to agree with this
> patch that 'break' is probably the best choice, simply because it begins
> with 'b' in the abbreviated form, a letter that is not yet used by
> others (unlike 'pause' or 'stop' that would want 'p' and 's' that are
> already taken)..
> 
> Here is a tangent, but I think the description of "-x " in "git
> rebase --continue" should mention that a failing command would
> interrupt the sequencer.  That fact about "exec" command is given
> much later in the last part of the "interactive mode" section of the
> manual, so technically our docs are not being incomplete, but the
> current description is not helpful to those who are looking for
> substring "exec" from the beginning of the documentation to find out
> if the exit status of the command affects the way commits are
> replayed (which is what I was doing when imagining how users would
> emulate this feature with deployed versions of Git).
> 
> Perhaps something as simple as this...
> 
>  Documentation/git-rebase.txt | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 0e20a66e73..0fc5a851b5 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -420,7 +420,8 @@ idea unless you know what you are doing (see BUGS below).
>  --exec ::
>   Append "exec " after each line creating a commit in the
>   final history.  will be interpreted as one or more shell
> - commands.
> + commands, and interrupts the rebase session when it exits with
> + non-zero status.

Good initial version. I would like it to be a bit more precise about who
exits with what status. How about this:

+   commands. Any command that fails will interrupt the rebase,
+   with exit code 1.

>  +
>  You may execute several commands by either using one instance of `--exec`
>  with several commands:
> 
> 
> Also, it seems that this has some interaction with the topics in
> flight; the added test does pass when queued on top of 'master', but
> fails when merged to 'pu'.  I didn't look into the details as I am
> not fully online yet.

I had a similar issue in a preliminary revision, and had to unset
GIT_EDITOR to fix it. Probably the culprit here is the same; I could
imagine that core.editor was set by another topic.

[... clicketiclick, debug debug debug, 1.5h later...]

Actually, this is not the problem. The problem is that the interactive
rebase exits with status 0 but does not leave a `stopped-sha` file behind,
and the builtin rebase mistakes that for a sign that it can clean up the
state dir.

However, we definitely do not want to leave that file, because it
indicates a fixup or squash with merge conflicts was left behind.

Taking a step back, it appears that we do a whole lot of work for nothing
in the

Re: [PATCH v4 1/7] t/README: reformat Do, Don't, Keep in mind lists

2018-10-05 Thread Matthew DeVore

On Thu, Oct 4, 2018 at 11:15 PM Junio C Hamano  wrote:
>
> Matthew DeVore  writes:
>
> > -Do's, don'ts & things to keep in mind
> > +Do's & don'ts
> >  -
>
> We may not format this with AsciiDoc, but please shorten the
> underline so that it aligns with the line above that it applies to.
>
Thanks - Done. I will send a re-roll within the next 8 hours assuming
no further comments.

Is there some script to find un-delta-able objects?

2018-10-05 Thread Ævar Arnfjörð Bjarmason

I.e. something to generate the .gitattributes file using this format:

https://git-scm.com/docs/gitattributes#_packing_objects

Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
there's some repo scanner utility to spew this out for a given repo.

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Ævar Arnfjörð Bjarmason



On Fri, Oct 05 2018, Derrick Stolee wrote:

> On 10/5/2018 9:05 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Fri, Oct 05 2018, Derrick Stolee wrote:
>>
>>> On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote:
 I don't have time to polish this up for submission now, but here's a WIP
 patch that implements this, highlights:

* There's a gc.clone.autoDetach=false default setting which overrides
  gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a
  --cloning option to indicate this).
>>> I'll repeat that it could make sense to do the same thing on clone
>>> _and_ fetch. Perhaps a "--post-fetch" flag would be good here to
>>> communicate that we just downloaded a pack from a remote.
>> I don't think that makes sense, but let's talk about why, because maybe
>> I've missed something, you're certainly more familiar with the
>> commit-graph than I am.
>>
>> The reason to do it on clone as a special-case or when the file is
>> missing, is because we know the file is desired (via the GC config), and
>> presumably is expected to help performance, and we have 0% of it. So by
>> going from 0% to 100% on clone we'll get fast --contains and other
>> goodies the graph helps with.
>>
>> But when we're doing a fetch, or really anything else that runs "git gc
>> --auto" we can safely assume that we have a recent enough graph, because
>> it will have been run whenever auto-gc kicked in.
>>
>> I.e.:
>>
>>  # Slow, if we assume background forked commit-graph generation
>>  # (which I'm avoiding)
>>  git clone x && cd x && git tag --contains
>>  # Fast enough, since we have an existing commit-graph
>>  cd x && git fetch && git tag --contains
>>
>> I *do* think it might make sense to in general split off parts of "gc
>> --auto" that we'd like to be more aggressive about, simply because the
>> ratio of how long it takes to do, and how much it helps with performance
>> makes more sense than a full repack, which is what the current heuristic
>> is based on.
>>
>> And maybe when we run in that mode we should run in the foreground, but
>> I don't see why git-fetch should be a special case there, and in this
>> regard, the gc.clone.autoDetach=false setting I've made doesn't make
>> much sence. I.e. maybe we should also skip forking to the background in
>> such a mode when we trigger such a "mini gc" via git-commit or whatever.
>
> My misunderstanding was that your proposed change to gc computes the
> commit-graph in either of these two cases:
>
> (1) The auto-GC threshold is met.
>
> (2) There is no commit-graph file.
>
> And what I hope to have instead of (2) is (3):
>
> (3) The commit-graph file is "sufficiently behind" the tip refs.
>
> This condition is intentionally vague at the moment. It could be that
> we hint that (3) holds by saying "--post-fetch" (i.e. "We just
> downloaded a pack, and it probably contains a lot of new commits") or
> we could create some more complicated condition based on counting
> reachable commits with infinite generation number (the number of
> commits not in the commit-graph file).
>
> I like that you are moving forward to make the commit-graph be written
> more frequently, but I'm trying to push us in a direction of writing
> it even more often than your proposed strategy. We should avoid
> creating too many orthogonal conditions that trigger the commit-graph
> write, which is why I'm pushing on your design here.
>
> Anyone else have thoughts on this direction?

Ah. I see. I think #3 makes perfect sense, but probably makes sense to
do as a follow-up, or maybe you'd like to stick a patch on top of the
series I have when I send it. I don't know how to write the "I'm not
quite happy about the commit graph" code :)

What I will do is refactor gc.c a bit and leave it in a state where it's
going to be really easy to change the existing "we have no commit graph,
and thus should do the optimization step" to have some more complex
condition instead of "we have no commit graph", i.e. your "we just
grabbed a lot of data".

Also, I'll drop the gc.clone.autoDetach=false setting and name it
something more general. maybe gc.AutoDetachOnBigOptimization=false?
Anyway something more generic so that "clone" will always pass in some
option saying "expect a large % commit graph update" (100% in its case),
and then in "fetch" we could have some detection of how big what we just
got from the server is, and do the same.

This seems to be to be the most general thing that would make sense, and
could also be extended e.g. to "git commit" and other users of gc
--auto. If I started with a README file in an empty repo, and then made
a commit where I added 1 million files all in one commit, in which case
we'd (depending on that setting) also block in the foreground and
generate the commit-graph.

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee


On 10/5/2018 9:05 AM, Ævar Arnfjörð Bjarmason wrote:

On Fri, Oct 05 2018, Derrick Stolee wrote:


On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote:

I don't have time to polish this up for submission now, but here's a WIP
patch that implements this, highlights:

   * There's a gc.clone.autoDetach=false default setting which overrides
 gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a
 --cloning option to indicate this).

I'll repeat that it could make sense to do the same thing on clone
_and_ fetch. Perhaps a "--post-fetch" flag would be good here to
communicate that we just downloaded a pack from a remote.

I don't think that makes sense, but let's talk about why, because maybe
I've missed something, you're certainly more familiar with the
commit-graph than I am.

The reason to do it on clone as a special-case or when the file is
missing, is because we know the file is desired (via the GC config), and
presumably is expected to help performance, and we have 0% of it. So by
going from 0% to 100% on clone we'll get fast --contains and other
goodies the graph helps with.

But when we're doing a fetch, or really anything else that runs "git gc
--auto" we can safely assume that we have a recent enough graph, because
it will have been run whenever auto-gc kicked in.

I.e.:

 # Slow, if we assume background forked commit-graph generation
 # (which I'm avoiding)
 git clone x && cd x && git tag --contains
 # Fast enough, since we have an existing commit-graph
 cd x && git fetch && git tag --contains

I *do* think it might make sense to in general split off parts of "gc
--auto" that we'd like to be more aggressive about, simply because the
ratio of how long it takes to do, and how much it helps with performance
makes more sense than a full repack, which is what the current heuristic
is based on.

And maybe when we run in that mode we should run in the foreground, but
I don't see why git-fetch should be a special case there, and in this
regard, the gc.clone.autoDetach=false setting I've made doesn't make
much sence. I.e. maybe we should also skip forking to the background in
such a mode when we trigger such a "mini gc" via git-commit or whatever.


My misunderstanding was that your proposed change to gc computes the 
commit-graph in either of these two cases:


(1) The auto-GC threshold is met.

(2) There is no commit-graph file.

And what I hope to have instead of (2) is (3):

(3) The commit-graph file is "sufficiently behind" the tip refs.

This condition is intentionally vague at the moment. It could be that we 
hint that (3) holds by saying "--post-fetch" (i.e. "We just downloaded a 
pack, and it probably contains a lot of new commits") or we could create 
some more complicated condition based on counting reachable commits with 
infinite generation number (the number of commits not in the 
commit-graph file).


I like that you are moving forward to make the commit-graph be written 
more frequently, but I'm trying to push us in a direction of writing it 
even more often than your proposed strategy. We should avoid creating 
too many orthogonal conditions that trigger the commit-graph write, 
which is why I'm pushing on your design here.


Anyone else have thoughts on this direction?

Thanks,

-Stolee

Re: [BUG] Error while trying to git apply a patch; works with patch -p1

2018-10-05 Thread SZEDER Gábor

On Fri, Oct 05, 2018 at 09:17:54AM -0300, Eneas Queiroz wrote:
> Em qui, 4 de out de 2018 às 18:45, SZEDER Gábor  
> escreveu:
> >
> > On Thu, Oct 04, 2018 at 06:01:11PM -0300, Eneas Queiroz wrote:
> > > I've sent this to the list 2 days ago, but I can't find it in the list
> > > archives, so I'm sending it again without files attached.  I apologize
> > > if this is a duplicate. One should be able to reproduce this with the
> > > current PR files, but if not, I can provide them.
> > >
> > > I've hit a strange error while trying to apply a patch from github
> > > here: https://github.com/openwrt/openwrt/pull/965
> > >
> > > 965.patch:452: trailing whitespace.
> > >
> > > 965.patch:559: space before tab in indent.
> > >  -o $(SHLIBNAME_FULL) \
> > > 965.patch:560: space before tab in indent.
> > >  $$ALLSYMSFLAGS $$SHOBJECTS $$NOALLSYMSFLAGS $$LIBDEPS; \
> > > 965.patch:564: space before tab in indent.
> > > -o $(SHLIBNAME_FULL) \
> > > 965.patch:2334: trailing whitespace.
> > >
> > > error: package/libs/openssl/patches/100-Configure-afalg-support.patch:
> > > No such file or directory
> > > error: package/libs/openssl/patches/110-openwrt_targets.patch: No such
> > > file or directory
> > > error: package/libs/openssl/patches/120-fix_link_segfault.patch: No
> > > such file or directory
> > > error: 
> > > package/libs/openssl/patches/1.1.0/100-Configure-afalg-support.patch:
> > > No such file or directory
> > > error: package/libs/openssl/patches/1.1.0/110-openwrt_targets.patch:
> > > No such file or directory
> > >
> > > If you get the patch file from
> > > https://github.com/openwrt/openwrt/pull/965.patch and apply it with
> > > git apply, it fails.  If I apply the same file with patch -p1, it
> > > works fine.  I've tried it with git 2.16.4 and 2.19, and they both
> > > fail with the same error, and at least 2 more people have confirmed
> > > it.
> > >
> > > git apply fails even when using git format-patch -13 --stdout as a
> > > source, so it is not github doing something weird.
> > >
> > > The file is a series of 13 patches.  If I split the series after the
> >
> > So this is no _a_ patch, then, but a mailbox of patches.  'git apply'
> > is supposed to apply a single a patch; apparently 'patch' is more
> > lenient.
> >
> > Have you tried 'git am'?
> >
> > > 3rd patch, it works.
> > > Also, if I use https://github.com/openwrt/openwrt/pull/965.diff, it also 
> > > works.
> > >
> > > I'm not subscribed to the list, so please CC me.
> > >
> > > Cheers,
> > >
> > > Eneas
> 
> Thanks for the reply.  Is that expected behavior?  git apply seems to
> work in most similar cases.

Oh, I'm surprised by that; but I rarely use 'git apply' directly, as I
apply even single patches with 'git am'.

> git am works fine--although it complains a lot about whitespace errors
> when patching *.patch output files.
> 
> I know they're just warnings, but perhaps git apply/am should suppress
> them when creating/patching a *.patch file, since context lines in
> '*.patch' files all start with a space.

'git am' and 'git apply' don't know or care about filetypes, and
neither does the rest of Git for that matter, unless you explicitly
tell them to.  With a proper .gitattributes entry you can tell them to
ignore all or only specific types of whitespace errors in '*.patch'
files, e.g.:

  *.patch -whitespace

or

  *.patch whitespace=-space-before-tab,-blank-at-eol

[PATCH v6 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-05 Thread Antonio Ospite

When the .gitmodules file is not available in the working tree, try
using the content from the index and from the current branch. This
covers the case when the file is part of the repository but for some
reason it is not checked out, for example because of a sparse checkout.

This makes it possible to use at least the 'git submodule' commands
which *read* the gitmodules configuration file without fully populating
the working tree.

Writing to .gitmodules will still require that the file is checked out,
so check for that before calling config_set_in_gitmodules_file_gently.

Add a similar check also in git-submodule.sh::cmd_add() to anticipate
the eventual failure of the "git submodule add" command when .gitmodules
is not safely writeable; this prevents the command from leaving the
repository in a spurious state (e.g. the submodule repository was cloned
but .gitmodules was not updated because
config_set_in_gitmodules_file_gently failed).

Moreover, since config_from_gitmodules() now accesses the global object
store, it is necessary to protect all code paths which call the function
against concurrent access to the global object store. Currently this
only happens in builtin/grep.c::grep_submodules(), so call
grep_read_lock() before invoking code involving
config_from_gitmodules().

Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
from .gitmodules succeeds and that writing to it fails when the file is
not checked out.

NOTE: there is one rare case where this new feature does not work
properly yet: nested submodules without .gitmodules in their working
tree.  This has been documented with a warning and a test_expect_failure
item in t7814, and in this case the current behavior is not altered: no
config is read.

Signed-off-by: Antonio Ospite 
---
 builtin/grep.c | 17 +-
 builtin/submodule--helper.c|  6 +-
 git-submodule.sh   |  5 ++
 submodule-config.c | 31 +-
 t/t7411-submodule-config.sh| 26 -
 t/t7416-submodule-sparse-gitmodules.sh | 78 ++
 t/t7814-grep-recurse-submodules.sh | 16 ++
 7 files changed, 172 insertions(+), 7 deletions(-)
 create mode 100755 t/t7416-submodule-sparse-gitmodules.sh

diff --git a/builtin/grep.c b/builtin/grep.c
index 601f801158..7da8fef31a 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -421,11 +421,23 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
struct repository submodule;
int hit;
 
-   if (!is_submodule_active(superproject, path))
+   /*
+* NEEDSWORK: submodules functions need to be protected because they
+* access the object store via config_from_gitmodules(): the latter
+* uses get_oid() which, for now, relies on the global the_repository
+* object.
+*/
+   grep_read_lock();
+
+   if (!is_submodule_active(superproject, path)) {
+   grep_read_unlock();
return 0;
+   }
 
-   if (repo_submodule_init(, superproject, path))
+   if (repo_submodule_init(, superproject, path)) {
+   grep_read_unlock();
return 0;
+   }
 
repo_read_gitmodules();
 
@@ -439,7 +451,6 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
 * store is no longer global and instead is a member of the repository
 * object.
 */
-   grep_read_lock();
add_to_alternates_memory(submodule.objects->objectdir);
grep_read_unlock();
 
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 28f3ccca6d..5f8a804a6e 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2154,8 +2154,12 @@ static int module_config(int argc, const char **argv, 
const char *prefix)
return print_config_from_gitmodules(the_repository, argv[1]);
 
/* Equivalent to ACTION_SET in builtin/config.c */
-   if (argc == 3)
+   if (argc == 3) {
+   if (!is_writing_gitmodules_ok())
+   die(_("please make sure that the .gitmodules file is in 
the working tree"));
+
return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
+   }
 
usage_with_options(git_submodule_helper_usage, module_config_options);
 }
diff --git a/git-submodule.sh b/git-submodule.sh
index 0805fadf47..f5124bbf23 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -159,6 +159,11 @@ cmd_add()
shift
done
 
+   if ! git submodule--helper config --check-writeable >/dev/null 2>&1
+   then
+die "$(eval_gettext "please make sure that the .gitmodules 
file is in the working tree")"
+   fi
+
if test -n "$reference_path"
then
is_absolute_path "$reference_path" ||
diff --git a/submodule-config.c b/submodule-config.c
index 8659d97e06..69bebb721e 100644

[PATCH v6 08/10] submodule: add a helper to check if it is safe to write to .gitmodules

2018-10-05 Thread Antonio Ospite

Introduce a helper function named is_writing_gitmodules_ok() to verify
that the .gitmodules file is safe to write.

The function name follows the scheme of is_staging_gitmodules_ok().

The two symbolic constants GITMODULES_INDEX and GITMODULES_HEAD are used
to get help from the C preprocessor in preventing typos, especially for
future users.

This is in preparation for a future change which teaches git how to read
.gitmodules from the index or from the current branch if the file is not
available in the working tree.

The rationale behind the check is that writing to .gitmodules requires
the file to be present in the working tree, unless a brand new
.gitmodules is being created (in which case the .gitmodules file would
not exist at all: neither in the working tree nor in the index or in the
current branch).

Expose the functionality also via a "submodule-helper config
--check-writeable" command, as git scripts may want to perform the check
before modifying submodules configuration.

Signed-off-by: Antonio Ospite 
---
 builtin/submodule--helper.c | 24 +++-
 cache.h |  2 ++
 submodule.c | 18 ++
 submodule.h |  1 +
 t/t7411-submodule-config.sh | 31 +++
 5 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e1bdca8f0b..28f3ccca6d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2127,6 +2127,28 @@ static int check_name(int argc, const char **argv, const 
char *prefix)
 
 static int module_config(int argc, const char **argv, const char *prefix)
 {
+   enum {
+   CHECK_WRITEABLE = 1
+   } command = 0;
+
+   struct option module_config_options[] = {
+   OPT_CMDMODE(0, "check-writeable", ,
+   N_("check if it is safe to write to the .gitmodules 
file"),
+   CHECK_WRITEABLE),
+   OPT_END()
+   };
+   const char *const git_submodule_helper_usage[] = {
+   N_("git submodule--helper config name [value]"),
+   N_("git submodule--helper config --check-writeable"),
+   NULL
+   };
+
+   argc = parse_options(argc, argv, prefix, module_config_options,
+git_submodule_helper_usage, PARSE_OPT_KEEP_ARGV0);
+
+   if (argc == 1 && command == CHECK_WRITEABLE)
+   return is_writing_gitmodules_ok() ? 0 : -1;
+
/* Equivalent to ACTION_GET in builtin/config.c */
if (argc == 2)
return print_config_from_gitmodules(the_repository, argv[1]);
@@ -2135,7 +2157,7 @@ static int module_config(int argc, const char **argv, 
const char *prefix)
if (argc == 3)
return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
 
-   die("submodule--helper config takes 1 or 2 arguments: name [value]");
+   usage_with_options(git_submodule_helper_usage, module_config_options);
 }
 
 #define SUPPORT_SUPER_PREFIX (1<<0)
diff --git a/cache.h b/cache.h
index d508f3d4f8..2d495fc800 100644
--- a/cache.h
+++ b/cache.h
@@ -486,6 +486,8 @@ static inline enum object_type object_type(unsigned int 
mode)
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
 #define GITMODULES_FILE ".gitmodules"
+#define GITMODULES_INDEX ":.gitmodules"
+#define GITMODULES_HEAD "HEAD:.gitmodules"
 #define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
 #define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
 #define GIT_NOTES_DISPLAY_REF_ENVIRONMENT "GIT_NOTES_DISPLAY_REF"
diff --git a/submodule.c b/submodule.c
index 3b23e76e55..bd2506c5ba 100644
--- a/submodule.c
+++ b/submodule.c
@@ -51,6 +51,24 @@ int is_gitmodules_unmerged(const struct index_state *istate)
return 0;
 }
 
+/*
+ * Check if the .gitmodules file is safe to write.
+ *
+ * Writing to the .gitmodules file requires that the file exists in the
+ * working tree or, if it doesn't, that a brand new .gitmodules file is going
+ * to be created (i.e. it's neither in the index nor in the current branch).
+ *
+ * It is not safe to write to .gitmodules if it's not in the working tree but
+ * it is in the index or in the current branch, because writing new values
+ * (and staging them) would blindly overwrite ALL the old content.
+ */
+int is_writing_gitmodules_ok(void)
+{
+   struct object_id oid;
+   return file_exists(GITMODULES_FILE) ||
+   (get_oid(GITMODULES_INDEX, ) < 0 && 
get_oid(GITMODULES_HEAD, ) < 0);
+}
+
 /*
  * Check if the .gitmodules file has unstaged modifications.  This must be
  * checked before allowing modifications to the .gitmodules file with the
diff --git a/submodule.h b/submodule.h
index e452919aa4..7a22f71cb9 100644
--- a/submodule.h
+++ b/submodule.h
@@ -40,6 +40,7 @@ struct submodule_update_strategy {
 #define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL}
 
 int

[PATCH v6 05/10] submodule--helper: add a new 'config' subcommand

2018-10-05 Thread Antonio Ospite

Add a new 'config' subcommand to 'submodule--helper', this extra level
of indirection makes it possible to add some flexibility to how the
submodules configuration is handled.

Signed-off-by: Antonio Ospite 
---
 builtin/submodule--helper.c | 14 ++
 t/t7411-submodule-config.sh | 27 +++
 2 files changed, 41 insertions(+)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 40844870cf..e1bdca8f0b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2125,6 +2125,19 @@ static int check_name(int argc, const char **argv, const 
char *prefix)
return 0;
 }
 
+static int module_config(int argc, const char **argv, const char *prefix)
+{
+   /* Equivalent to ACTION_GET in builtin/config.c */
+   if (argc == 2)
+   return print_config_from_gitmodules(the_repository, argv[1]);
+
+   /* Equivalent to ACTION_SET in builtin/config.c */
+   if (argc == 3)
+   return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
+
+   die("submodule--helper config takes 1 or 2 arguments: name [value]");
+}
+
 #define SUPPORT_SUPER_PREFIX (1<<0)
 
 struct cmd_struct {
@@ -2154,6 +2167,7 @@ static struct cmd_struct commands[] = {
{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
{"is-active", is_active, 0},
{"check-name", check_name, 0},
+   {"config", module_config, 0},
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index b1f3c6489b..791245f18d 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -134,4 +134,31 @@ test_expect_success 'error in history in 
fetchrecursesubmodule lets continue' '
)
 '
 
+test_expect_success 'reading submodules config with "submodule--helper 
config"' '
+   (cd super &&
+   echo "../submodule" >expect &&
+   git submodule--helper config submodule.submodule.url >actual &&
+   test_cmp expect actual
+   )
+'
+
+test_expect_success 'writing submodules config with "submodule--helper 
config"' '
+   (cd super &&
+   echo "new_url" >expect &&
+   git submodule--helper config submodule.submodule.url "new_url" 
&&
+   git submodule--helper config submodule.submodule.url >actual &&
+   test_cmp expect actual
+   )
+'
+
+test_expect_success 'overwriting unstaged submodules config with 
"submodule--helper config"' '
+   test_when_finished "git -C super checkout .gitmodules" &&
+   (cd super &&
+   echo "newer_url" >expect &&
+   git submodule--helper config submodule.submodule.url 
"newer_url" &&
+   git submodule--helper config submodule.submodule.url >actual &&
+   test_cmp expect actual
+   )
+'
+
 test_done
-- 
2.19.0

[PATCH v6 10/10] t/helper: add test-submodule-nested-repo-config

2018-10-05 Thread Antonio Ospite

Add a test tool to exercise config_from_gitmodules(), in particular for
the case of nested submodules.

Add also a test to document that reading the submoudles config of nested
submodules does not work yet when the .gitmodules file is not in the
working tree but it still in the index.

This is because the git API does not always make it possible access the
object store  of an arbitrary repository (see get_oid() usage in
config_from_gitmodules()).

When this git limitation gets fixed the aforementioned use case will be
supported too.

Signed-off-by: Antonio Ospite 
---
 Makefile |  1 +
 t/helper/test-submodule-nested-repo-config.c | 30 +
 t/helper/test-tool.c |  1 +
 t/helper/test-tool.h |  1 +
 t/t7411-submodule-config.sh  | 34 
 5 files changed, 67 insertions(+)
 create mode 100644 t/helper/test-submodule-nested-repo-config.c

diff --git a/Makefile b/Makefile
index 13e1c52478..fe8587cd8c 100644
--- a/Makefile
+++ b/Makefile
@@ -737,6 +737,7 @@ TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
 TEST_BUILTINS_OBJS += test-submodule-config.o
+TEST_BUILTINS_OBJS += test-submodule-nested-repo-config.o
 TEST_BUILTINS_OBJS += test-subprocess.o
 TEST_BUILTINS_OBJS += test-urlmatch-normalization.o
 TEST_BUILTINS_OBJS += test-wildmatch.o
diff --git a/t/helper/test-submodule-nested-repo-config.c 
b/t/helper/test-submodule-nested-repo-config.c
new file mode 100644
index 00..a31e2a9bea
--- /dev/null
+++ b/t/helper/test-submodule-nested-repo-config.c
@@ -0,0 +1,30 @@
+#include "test-tool.h"
+#include "submodule-config.h"
+
+static void die_usage(int argc, const char **argv, const char *msg)
+{
+   fprintf(stderr, "%s\n", msg);
+   fprintf(stderr, "Usage: %s  \n", argv[0]);
+   exit(1);
+}
+
+int cmd__submodule_nested_repo_config(int argc, const char **argv)
+{
+   struct repository submodule;
+
+   if (argc < 3)
+   die_usage(argc, argv, "Wrong number of arguments.");
+
+   setup_git_directory();
+
+   if (repo_submodule_init(, the_repository, argv[1])) {
+   die_usage(argc, argv, "Submodule not found.");
+   }
+
+   /* Read the config of _child_ submodules. */
+   print_config_from_gitmodules(, argv[2]);
+
+   submodule_free(the_repository);
+
+   return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index b87a8c1f22..3b473bccd8 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -42,6 +42,7 @@ static struct test_cmd cmds[] = {
{ "strcmp-offset", cmd__strcmp_offset },
{ "string-list", cmd__string_list },
{ "submodule-config", cmd__submodule_config },
+   { "submodule-nested-repo-config", cmd__submodule_nested_repo_config },
{ "subprocess", cmd__subprocess },
{ "urlmatch-normalization", cmd__urlmatch_normalization },
{ "wildmatch", cmd__wildmatch },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e074957279..3ca351230c 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -38,6 +38,7 @@ int cmd__sigchain(int argc, const char **argv);
 int cmd__strcmp_offset(int argc, const char **argv);
 int cmd__string_list(int argc, const char **argv);
 int cmd__submodule_config(int argc, const char **argv);
+int cmd__submodule_nested_repo_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__wildmatch(int argc, const char **argv);
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index 2cfabb18bc..89690b7adb 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -216,4 +216,38 @@ test_expect_success 'reading submodules config from the 
current branch when .git
)
 '
 
+test_expect_success 'reading nested submodules config' '
+   (cd super &&
+   git init submodule/nested_submodule &&
+   echo "a" >submodule/nested_submodule/a &&
+   git -C submodule/nested_submodule add a &&
+   git -C submodule/nested_submodule commit -m "add a" &&
+   git -C submodule submodule add ./nested_submodule &&
+   git -C submodule add nested_submodule &&
+   git -C submodule commit -m "added nested_submodule" &&
+   git add submodule &&
+   git commit -m "updated submodule" &&
+   echo "./nested_submodule" >expect &&
+   test-tool submodule-nested-repo-config \
+   submodule submodule.nested_submodule.url >actual &&
+   test_cmp expect actual
+   )
+'
+
+# When this test eventually passes, before turning it into
+# test_expect_success, remember to replace the test_i18ngrep below with
+# a "test_must_be_empty warning" to be sure that the

[PATCH v6 02/10] submodule: factor out a config_set_in_gitmodules_file_gently function

2018-10-05 Thread Antonio Ospite

Introduce a new config_set_in_gitmodules_file_gently() function to write
config values to the .gitmodules file.

This is in preparation for a future change which will use the function
to write to the .gitmodules file in a more controlled way instead of
using "git config -f .gitmodules".

The purpose of the change is mainly to centralize the code that writes
to the .gitmodules file to avoid some duplication.

The naming follows git_config_set_in_file_gently() but the git_ prefix
is removed to communicate that this is not a generic git-config API.

Signed-off-by: Antonio Ospite 
---
 submodule-config.c | 12 
 submodule-config.h |  1 +
 submodule.c| 10 +++---
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 823bc76812..8659d97e06 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -707,6 +707,18 @@ int print_config_from_gitmodules(struct repository *repo, 
const char *key)
return 0;
 }
 
+int config_set_in_gitmodules_file_gently(const char *key, const char *value)
+{
+   int ret;
+
+   ret = git_config_set_in_file_gently(GITMODULES_FILE, key, value);
+   if (ret < 0)
+   /* Maybe the user already did that, don't error out here */
+   warning(_("Could not update .gitmodules entry %s"), key);
+
+   return ret;
+}
+
 struct fetch_config {
int *max_children;
int *recurse_submodules;
diff --git a/submodule-config.h b/submodule-config.h
index 031747ccf8..4dc9b0771c 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -49,6 +49,7 @@ const struct submodule *submodule_from_path(struct repository 
*r,
const char *path);
 void submodule_free(struct repository *r);
 int print_config_from_gitmodules(struct repository *repo, const char *key);
+int config_set_in_gitmodules_file_gently(const char *key, const char *value);
 
 /*
  * Returns 0 if the name is syntactically acceptable as a submodule "name"
diff --git a/submodule.c b/submodule.c
index b53cb6e9c4..3b23e76e55 100644
--- a/submodule.c
+++ b/submodule.c
@@ -89,6 +89,7 @@ int update_path_in_gitmodules(const char *oldpath, const char 
*newpath)
 {
struct strbuf entry = STRBUF_INIT;
const struct submodule *submodule;
+   int ret;
 
if (!file_exists(GITMODULES_FILE)) /* Do nothing without .gitmodules */
return -1;
@@ -104,14 +105,9 @@ int update_path_in_gitmodules(const char *oldpath, const 
char *newpath)
strbuf_addstr(, "submodule.");
strbuf_addstr(, submodule->name);
strbuf_addstr(, ".path");
-   if (git_config_set_in_file_gently(GITMODULES_FILE, entry.buf, newpath) 
< 0) {
-   /* Maybe the user already did that, don't error out here */
-   warning(_("Could not update .gitmodules entry %s"), entry.buf);
-   strbuf_release();
-   return -1;
-   }
+   ret = config_set_in_gitmodules_file_gently(entry.buf, newpath);
strbuf_release();
-   return 0;
+   return ret;
 }
 
 /*
-- 
2.19.0

[PATCH v6 04/10] t7411: be nicer to future tests and really clean things up

2018-10-05 Thread Antonio Ospite

Tests 5 and 7 in t/t7411-submodule-config.sh add two commits with
invalid lines in .gitmodules but then only the second commit is removed.

This may affect future subsequent tests if they assume that the
.gitmodules file has no errors.

Remove both the commits as soon as they are not needed anymore.

Signed-off-by: Antonio Ospite 
---
 t/t7411-submodule-config.sh | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index f2cd1f4a2c..b1f3c6489b 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -83,6 +83,8 @@ Submodule name: 'submodule' for path 'submodule'
 EOF
 
 test_expect_success 'error in history of one submodule config lets continue, 
stderr message contains blob ref' '
+   ORIG=$(git -C super rev-parse HEAD) &&
+   test_when_finished "git -C super reset --hard $ORIG" &&
(cd super &&
cp .gitmodules .gitmodules.bak &&
echo "  value = \"" >>.gitmodules &&
@@ -115,6 +117,8 @@ test_expect_success 'using different treeishs works' '
 '
 
 test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
+   ORIG=$(git -C super rev-parse HEAD) &&
+   test_when_finished "git -C super reset --hard $ORIG" &&
(cd super &&
git config -f .gitmodules \
submodule.submodule.fetchrecursesubmodules blabla &&
@@ -126,8 +130,7 @@ test_expect_success 'error in history in 
fetchrecursesubmodule lets continue' '
HEAD b \
HEAD submodule \
>actual &&
-   test_cmp expect_error actual  &&
-   git reset --hard HEAD^
+   test_cmp expect_error actual
)
 '
 
-- 
2.19.0

[PATCH v6 01/10] submodule: add a print_config_from_gitmodules() helper

2018-10-05 Thread Antonio Ospite

Add a new print_config_from_gitmodules() helper function to print values
from .gitmodules just like "git config -f .gitmodules" would.

This will be used by a new submodule--helper subcommand to be able to
access the .gitmodules file in a more controlled way.

Signed-off-by: Antonio Ospite 
---
 submodule-config.c | 25 +
 submodule-config.h |  1 +
 2 files changed, 26 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index e04ba756d9..823bc76812 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -682,6 +682,31 @@ void submodule_free(struct repository *r)
submodule_cache_clear(r->submodule_cache);
 }
 
+static int config_print_callback(const char *var, const char *value, void 
*cb_data)
+{
+   char *wanted_key = cb_data;
+
+   if (!strcmp(wanted_key, var))
+   printf("%s\n", value);
+
+   return 0;
+}
+
+int print_config_from_gitmodules(struct repository *repo, const char *key)
+{
+   int ret;
+   char *store_key;
+
+   ret = git_config_parse_key(key, _key, NULL);
+   if (ret < 0)
+   return CONFIG_INVALID_KEY;
+
+   config_from_gitmodules(config_print_callback, repo, store_key);
+
+   free(store_key);
+   return 0;
+}
+
 struct fetch_config {
int *max_children;
int *recurse_submodules;
diff --git a/submodule-config.h b/submodule-config.h
index dc7278eea4..031747ccf8 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -48,6 +48,7 @@ const struct submodule *submodule_from_path(struct repository 
*r,
const struct object_id 
*commit_or_tree,
const char *path);
 void submodule_free(struct repository *r);
+int print_config_from_gitmodules(struct repository *repo, const char *key);
 
 /*
  * Returns 0 if the name is syntactically acceptable as a submodule "name"
-- 
2.19.0

[PATCH v6 07/10] t7506: clean up .gitmodules properly before setting up new scenario

2018-10-05 Thread Antonio Ospite

In t/t7506-status-submodule.sh at some point a new scenario is set up to
test different things, in particular new submodules are added which are
meant to completely replace the previous ones.

However before calling the "git submodule add" commands for the new
layout, the .gitmodules file is removed only from the working tree still
leaving the previous content in current branch.

This can break if, in the future, "git submodule add" starts
differentiating between the following two cases:

  - .gitmodules is not in the working tree but it is in the current
branch (it may not be safe to add new submodules in this case);

  - .gitmodules is neither in the working tree nor anywhere in the
current branch (it is safe to add new submodules).

Since the test intends to get rid of .gitmodules anyways, let's
completely remove it from the current branch, to actually start afresh
in the new scenario.

This is more future-proof and does not break current tests.

Signed-off-by: Antonio Ospite 
---
 t/t7506-status-submodule.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
index 943708fb04..08629a6e70 100755
--- a/t/t7506-status-submodule.sh
+++ b/t/t7506-status-submodule.sh
@@ -325,7 +325,8 @@ test_expect_success 'setup superproject with untracked file 
in nested submodule'
(
cd super &&
git clean -dfx &&
-   rm .gitmodules &&
+   git rm .gitmodules &&
+   git commit -m "remove .gitmodules" &&
git submodule add -f ./sub1 &&
git submodule add -f ./sub2 &&
git submodule add -f ./sub1 sub3 &&
-- 
2.19.0

[PATCH v6 06/10] submodule: use the 'submodule--helper config' command

2018-10-05 Thread Antonio Ospite

Use the 'submodule--helper config' command in git-submodules.sh to avoid
referring explicitly to .gitmodules by the hardcoded file path.

This makes it possible to access the submodules configuration in a more
controlled way.

Signed-off-by: Antonio Ospite 
---
 git-submodule.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 1b568e29b9..0805fadf47 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -72,7 +72,7 @@ get_submodule_config () {
value=$(git config submodule."$name"."$option")
if test -z "$value"
then
-   value=$(git config -f .gitmodules submodule."$name"."$option")
+   value=$(git submodule--helper config 
submodule."$name"."$option")
fi
printf '%s' "${value:-$default}"
 }
@@ -283,11 +283,11 @@ or you are unsure what this means choose another name 
with the '--name' option."
git add --no-warn-embedded-repo $force "$sm_path" ||
die "$(eval_gettext "Failed to add submodule '\$sm_path'")"
 
-   git config -f .gitmodules submodule."$sm_name".path "$sm_path" &&
-   git config -f .gitmodules submodule."$sm_name".url "$repo" &&
+   git submodule--helper config submodule."$sm_name".path "$sm_path" &&
+   git submodule--helper config submodule."$sm_name".url "$repo" &&
if test -n "$branch"
then
-   git config -f .gitmodules submodule."$sm_name".branch "$branch"
+   git submodule--helper config submodule."$sm_name".branch 
"$branch"
fi &&
git add --force .gitmodules ||
die "$(eval_gettext "Failed to register submodule '\$sm_path'")"
-- 
2.19.0

[PATCH v6 00/10] Make submodules work if .gitmodules is not checked out

2018-10-05 Thread Antonio Ospite

Hi,

this series teaches git to try and read the .gitmodules file from the
index (:.gitmodules) or from the current branch (HEAD:.gitmodules) when
the file is not readily available in the working tree.

This can be used, together with sparse checkouts, to enable submodule
usage with programs like vcsh[1] which manage multiple repositories with
their working trees sharing the same path.

[1] https://github.com/RichiH/vcsh


Thanks to SZEDER Gábor we found out that the changes in patch 9 could
allow to access the object store in an inconsistent way when using
multi-threading in "git grep --recurse-submodules", this has been dealt
with in this revision.

BTW, with Stefan Beller we also identified some unneeded code which
could have been removed to alleviate the issue, but that would not have
solved it completely; so, I am not removing the unnecessary call to
repo_read_gitmodules() builtin/grep.c in this series, possibly this can
become a stand-alone change.

The problems from above also made me investigate what implications the
current use of a global object store had on my new addition, and now
I know that there is one case which I cannot support just yet: nested
submodules without .gitmodules in their working tree.

This case has been documented with a warning and test_expect_failure
items in tests, and hitting the unsupported case does not alter the
current behavior of git.

Apart form patch 9 and 10 there are no major changes to the previous
iteration.

IMHO we are in a place where the problem has been analyzed with enough
depth, the limitations have been identified and dealt with in a way that
should not affect current users nor diminish the usefulness of the new
code.

v5 of the series is here:
https://public-inbox.org/git/20180917140940.3839-1-...@ao2.it/

v4 of the series is here:
https://public-inbox.org/git/20180824132951.8000-1-...@ao2.it/

v3 of the series is here:
https://public-inbox.org/git/20180814110525.17801-1-...@ao2.it/

v2 of the series is here:
https://public-inbox.org/git/20180802134634.10300-1-...@ao2.it/

v1 of the series, with some background info, is here:
https://public-inbox.org/git/20180514105823.8378-1-...@ao2.it/

Changes since v5:

  * print_config_from_gitmodules() in patch 1 now accepts a struct
repository argument.

  * submodule--helper config in patch 5 has been adjusted to use the new
signature of print_config_from_gitmodules().

  * In patch 9 the grep read lock in builtin/grep.c now covers all code
paths involving config_from_gitmodules().

FTR git-grep is the only place where config_from_gitmodules() is
called from multi-threaded code.

  * Patch 9 also documents the rare case that cannot be supported just
yet, and adds a warning to the user.

  * In patch 9, config_from_gitmodules() now does not read any config
when the config source is not specified.(I added a catch-all "else"
block) This match more closely the behavior of the old code using
git_config_from_file.

  * Added a new test tool in patch 10 to exercise config_read_config()
in a more direct way, passing an arbitrary repository.

Admittedly, patch 10 performs a similar test to the one added to
t7814 in patch 9, so I'd be OK with dropping patch 10 if it is too
specific.

Thank you,
   Antonio

Antonio Ospite (10):
  submodule: add a print_config_from_gitmodules() helper
  submodule: factor out a config_set_in_gitmodules_file_gently function
  t7411: merge tests 5 and 6
  t7411: be nicer to future tests and really clean things up
  submodule--helper: add a new 'config' subcommand
  submodule: use the 'submodule--helper config' command
  t7506: clean up .gitmodules properly before setting up new scenario
  submodule: add a helper to check if it is safe to write to .gitmodules
  submodule: support reading .gitmodules when it's not in the working
tree
  t/helper: add test-submodule-nested-repo-config

 Makefile |   1 +
 builtin/grep.c   |  17 ++-
 builtin/submodule--helper.c  |  40 ++
 cache.h  |   2 +
 git-submodule.sh |  13 +-
 submodule-config.c   |  68 -
 submodule-config.h   |   2 +
 submodule.c  |  28 +++-
 submodule.h  |   1 +
 t/helper/test-submodule-nested-repo-config.c |  30 
 t/helper/test-tool.c |   1 +
 t/helper/test-tool.h |   1 +
 t/t7411-submodule-config.sh  | 141 +--
 t/t7416-submodule-sparse-gitmodules.sh   |  78 ++
 t/t7506-status-submodule.sh  |   3 +-
 t/t7814-grep-recurse-submodules.sh   |  16 +++
 16 files changed, 410 insertions(+), 32 deletions(-)
 create mode 100644 t/helper/test-submodule-nested-repo-config.c
 create mode 100755

[PATCH v6 03/10] t7411: merge tests 5 and 6

2018-10-05 Thread Antonio Ospite

Tests 5 and 6 check for the effects of the same commit, merge the two
tests to make it more straightforward to clean things up after the test
has finished.

The cleanup will be added in a future commit.

Signed-off-by: Antonio Ospite 
---
 t/t7411-submodule-config.sh | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index 0bde5850ac..f2cd1f4a2c 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -82,29 +82,21 @@ Submodule name: 'a' for path 'b'
 Submodule name: 'submodule' for path 'submodule'
 EOF
 
-test_expect_success 'error in one submodule config lets continue' '
+test_expect_success 'error in history of one submodule config lets continue, 
stderr message contains blob ref' '
(cd super &&
cp .gitmodules .gitmodules.bak &&
echo "  value = \"" >>.gitmodules &&
git add .gitmodules &&
mv .gitmodules.bak .gitmodules &&
git commit -m "add error" &&
-   test-tool submodule-config \
-   HEAD b \
-   HEAD submodule \
-   >actual &&
-   test_cmp expect_error actual
-   )
-'
-
-test_expect_success 'error message contains blob reference' '
-   (cd super &&
sha1=$(git rev-parse HEAD) &&
test-tool submodule-config \
HEAD b \
HEAD submodule \
-   2>actual_err &&
-   test_i18ngrep "submodule-blob $sha1:.gitmodules" actual_err 
>/dev/null
+   >actual \
+   2>actual_stderr &&
+   test_cmp expect_error actual &&
+   test_i18ngrep "submodule-blob $sha1:.gitmodules" actual_stderr 
>/dev/null
)
 '
 
-- 
2.19.0

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Ævar Arnfjörð Bjarmason



On Fri, Oct 05 2018, Derrick Stolee wrote:

> On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote:
>> I don't have time to polish this up for submission now, but here's a WIP
>> patch that implements this, highlights:
>>
>>   * There's a gc.clone.autoDetach=false default setting which overrides
>> gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a
>> --cloning option to indicate this).
>
> I'll repeat that it could make sense to do the same thing on clone
> _and_ fetch. Perhaps a "--post-fetch" flag would be good here to
> communicate that we just downloaded a pack from a remote.

I don't think that makes sense, but let's talk about why, because maybe
I've missed something, you're certainly more familiar with the
commit-graph than I am.

The reason to do it on clone as a special-case or when the file is
missing, is because we know the file is desired (via the GC config), and
presumably is expected to help performance, and we have 0% of it. So by
going from 0% to 100% on clone we'll get fast --contains and other
goodies the graph helps with.

But when we're doing a fetch, or really anything else that runs "git gc
--auto" we can safely assume that we have a recent enough graph, because
it will have been run whenever auto-gc kicked in.

I.e.:

# Slow, if we assume background forked commit-graph generation
# (which I'm avoiding)
git clone x && cd x && git tag --contains
# Fast enough, since we have an existing commit-graph
cd x && git fetch && git tag --contains

I *do* think it might make sense to in general split off parts of "gc
--auto" that we'd like to be more aggressive about, simply because the
ratio of how long it takes to do, and how much it helps with performance
makes more sense than a full repack, which is what the current heuristic
is based on.

And maybe when we run in that mode we should run in the foreground, but
I don't see why git-fetch should be a special case there, and in this
regard, the gc.clone.autoDetach=false setting I've made doesn't make
much sence. I.e. maybe we should also skip forking to the background in
such a mode when we trigger such a "mini gc" via git-commit or whatever.

>>   * A clone of say git.git with gc.writeCommitGraph=true looks like:
>>
>> [...]
>> Receiving objects: 100% (255262/255262), 100.49 MiB | 17.78 MiB/s, done.
>> Resolving deltas: 100% (188947/188947), done.
>> Computing commit graph generation numbers: 100% (55210/55210), done.
>
> This looks like good UX. Thanks for the progress here!
>
>>   * The 'git gc --auto' command also knows to (only) run the commit-graph
>> (and space is left for future optimization steps) if general GC isn't
>> needed, but we need "optimization":
>>
>> $ rm .git/objects/info/commit-graph; ~/g/git/git --exec-path=$PWD -c 
>> gc.writeCommitGraph=true -c gc.autoDetach=false gc --auto;
>> Annotating commits in commit graph: 341229, done.
>> Computing commit graph generation numbers: 100% (165969/165969), done.
>> $
>
> Will this also trigger a full commit-graph rewrite on every 'git
> commit' command?

Nope, because "git commit" can safely be assumed to have some
commit-graph anyway, and I'm just special casing the case where it
doesn't exist.

But if it doesn't exist and you do a "git commit" then "gc --auto" will
be run, and we'll fork to the background and generate it...

>  Or is there some way we can compute the staleness of
> the commit-graph in order to only update if we get too far ahead?
> Previously, this was solved by relying on the auto-GC threshold.

So re the "I don't think that makes sense..." at the start of my E-Mail,
isn't it fine to rely on the default thresholds here, or should we be
more aggressive?

>>   * The patch to gc.c looks less scary with -w, most of it is indenting
>> the existing pack-refs etc. with a "!auto_gc || should_gc" condition.
>>
>>   * I added a commit_graph_exists() exists function and only care if I
>> get ENOENT for the purposes of this gc mode. This would need to be
>> tweaked for the incremental mode Derrick talks about, but if we just
>> set "should_optimize" that'll also work as far as gc --auto is
>> concerned (e.g. on fetch, am etc.)
>
> The incremental mode would operate the same as split-index, which
> means we will still look for .git/objects/info/commit-graph. That file
> may point us to more files.

Ah!

>> +int commit_graph_exists(const char *graph_file)
>> +{
>> +struct stat st;
>> +if (stat(graph_file, )) {
>> +if (errno == ENOENT)
>> +return 0;
>> +else
>> +return -1;
>> +}
>> +return 1;
>> +}
>> +
>
> This method serves a very similar purpose to
> generation_numbers_enabled(), except your method only cares about the
> file existing. It ignores information like `core.commitGraph`, which
> should keep us from doing anything with the commit-graph file if
> false.
>
> Nothing about your

Re: [PATCH] grep: provide a noop --recursive option

2018-10-05 Thread Mischa POSLAWSKY

Junio C Hamano wrote 2018-10-05 1:19 (-0700):
> Stefan Beller  writes:
> 
> > git-grep is always file/tree recursive, but there is --recurse-submodules
> > which is off by default. Instead of providing a short alias to a noop,
> > we could use -r for submodules. (And if you happen to have no
> > submodules, this is a noop for you)
> 
> I am not sure if it is an overall win for those who do have and use
> submodules to easily be able to go recursive with a short-and-sweet
> 'r', or even they want to work inside one project at a time most of
> the time.  If the latter, then using 'r' for recurse-submodules is
> going to be a mistake (besides, other commands that have 'recursive'
> typically use 'r' for its shorthand,and 'r' does not stand for
> 'recurse-submodules' for them).

Personally I would welcome a shorthand for --recurse-submodules,
especially if --r^I no longer completes to this.

It is also closer to the behaviour provided by grep -r as that recurses
into submodules as well.

Regards,
-- 
Mischa

Re: [PATCH] grep: provide a noop --recursive option

2018-10-05 Thread Ævar Arnfjörð Bjarmason



On Fri, Oct 05 2018, Junio C Hamano wrote:

> René Scharfe  writes:
>
>>
>> Recognize -r and --recursive as synonyms for --max-depth=-1 for
>> compatibility with GNU grep; it's still the default for git grep.
>>
>> This also adds --no-recursive as synonym for --max-depth=0 for free,
>> which is welcome for completeness and consistency.
>>
>> Fix the description for --max-depth, while we're at it -- negative
>> values other than -1 actually disable recursion, i.e. they are
>> equivalent to --max-depth=0.
>> ...
>> diff --git a/builtin/grep.c b/builtin/grep.c
>> index 601f801158..f6e127f0bc 100644
>> --- a/builtin/grep.c
>> +++ b/builtin/grep.c
>> @@ -811,6 +811,8 @@ int cmd_grep(int argc, const char **argv, const char 
>> *prefix)
>>  GREP_BINARY_NOMATCH),
>>  OPT_BOOL(0, "textconv", _textconv,
>>   N_("process binary files with textconv filters")),
>> +OPT_SET_INT('r', "recursive", _depth,
>> +N_("search in subdirectories (default)"), -1),
>
> Wow.
>
> I didn't think of this trick to let OPT_SET_INT() to grok --no-* and
> set the variable to 0.  Being able to do this without a custom
> callback is certainly very nice.
>
> The patch looks good.

FWIW I'm not going to carry this series forward, I wrote it more as a
"here's how this could be done". But if Christoph / René wants to hack
on it...

Re: [PATCH v3] coccicheck: process every source file at once

2018-10-05 Thread SZEDER Gábor

On Thu, Oct 04, 2018 at 07:17:47PM -0700, Jacob Keller wrote:
> Junio, do you want me to update the commit message on my side with the
> memory concerns? Or could you update it to mention memory as a noted
> trade off.

We have been running 'make -j2 coccicheck' in the static analysis
build job on Travis CI, which worked just fine so far.  The Travis CI
build environments have 3GB of memory available [1], but, as shown in
[2], with this patch the memory consumption jumps up to about
1.3-1.8GB for each of those jobs.  So with two parallel jobs we will
very likely bump into this limit.

So this patch should definitely change that build script to run only a
single job.


1 - 
https://docs.travis-ci.com/user/common-build-problems/#my-build-script-is-killed-without-any-error
2 - https://public-inbox.org/git/20181003101658.GM23446@localhost/


> > >  Makefile | 6 ++
> > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Makefile b/Makefile
> > > index df1df9db78da..da692ece9e12 100644
> > > --- a/Makefile
> > > +++ b/Makefile
> > > @@ -2715,10 +2715,8 @@ endif
> > >  %.cocci.patch: %.cocci $(COCCI_SOURCES)
> > > @echo '' SPATCH $<; \
> > > ret=0; \
> > > -   for f in $(COCCI_SOURCES); do \
> > > -   $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
> > > -   { ret=$$?; break; }; \
> > > -   done >$@+ 2>$@.log; \
> > > +   $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 
> > > 2>$@.log; \
> > > +   ret=$$?; \
> > > if test $$ret != 0; \
> > > then \
> > > cat $@.log; \
> > > --
> > > 2.18.0.219.gaf81d287a9da
> > >

Re: [BUG] Error while trying to git apply a patch; works with patch -p1

2018-10-05 Thread Eneas Queiroz

Em qui, 4 de out de 2018 às 18:45, SZEDER Gábor  escreveu:
>
> On Thu, Oct 04, 2018 at 06:01:11PM -0300, Eneas Queiroz wrote:
> > I've sent this to the list 2 days ago, but I can't find it in the list
> > archives, so I'm sending it again without files attached.  I apologize
> > if this is a duplicate. One should be able to reproduce this with the
> > current PR files, but if not, I can provide them.
> >
> > I've hit a strange error while trying to apply a patch from github
> > here: https://github.com/openwrt/openwrt/pull/965
> >
> > 965.patch:452: trailing whitespace.
> >
> > 965.patch:559: space before tab in indent.
> >  -o $(SHLIBNAME_FULL) \
> > 965.patch:560: space before tab in indent.
> >  $$ALLSYMSFLAGS $$SHOBJECTS $$NOALLSYMSFLAGS $$LIBDEPS; \
> > 965.patch:564: space before tab in indent.
> > -o $(SHLIBNAME_FULL) \
> > 965.patch:2334: trailing whitespace.
> >
> > error: package/libs/openssl/patches/100-Configure-afalg-support.patch:
> > No such file or directory
> > error: package/libs/openssl/patches/110-openwrt_targets.patch: No such
> > file or directory
> > error: package/libs/openssl/patches/120-fix_link_segfault.patch: No
> > such file or directory
> > error: package/libs/openssl/patches/1.1.0/100-Configure-afalg-support.patch:
> > No such file or directory
> > error: package/libs/openssl/patches/1.1.0/110-openwrt_targets.patch:
> > No such file or directory
> >
> > If you get the patch file from
> > https://github.com/openwrt/openwrt/pull/965.patch and apply it with
> > git apply, it fails.  If I apply the same file with patch -p1, it
> > works fine.  I've tried it with git 2.16.4 and 2.19, and they both
> > fail with the same error, and at least 2 more people have confirmed
> > it.
> >
> > git apply fails even when using git format-patch -13 --stdout as a
> > source, so it is not github doing something weird.
> >
> > The file is a series of 13 patches.  If I split the series after the
>
> So this is no _a_ patch, then, but a mailbox of patches.  'git apply'
> is supposed to apply a single a patch; apparently 'patch' is more
> lenient.
>
> Have you tried 'git am'?
>
> > 3rd patch, it works.
> > Also, if I use https://github.com/openwrt/openwrt/pull/965.diff, it also 
> > works.
> >
> > I'm not subscribed to the list, so please CC me.
> >
> > Cheers,
> >
> > Eneas

Thanks for the reply.  Is that expected behavior?  git apply seems to
work in most similar cases.

git am works fine--although it complains a lot about whitespace errors
when patching *.patch output files.

I know they're just warnings, but perhaps git apply/am should suppress
them when creating/patching a *.patch file, since context lines in
'*.patch' files all start with a space.

Cheers,

Eneas

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-05 Thread Derrick Stolee


On 10/4/2018 6:59 PM, René Scharfe wrote:

Am 01.10.2018 um 22:37 schrieb René Scharfe:

Am 01.10.2018 um 21:26 schrieb Derrick Stolee:

Good catch! I'm disappointed that we couldn't use type-checking here, as
it is quite difficult to discover that the types are wrong here.

Generics in C are hard, and type checking traditionally falls by the
wayside.  You could use macros for that, like klib [*] does, but that
has its own downsides (more object text, debugging the sort macros
themselves is harder).

[*] https://github.com/attractivechaos/klib/blob/master/ksort.h

We could also do something like this to reduce the amount of manual
casting, but do we want to?  (Macro at the bottom, three semi-random
examples at the top.)


I like the idea! It certainly can assist in some of the repeat work when 
preparing to QSORT, and make it less error-prone.



---
  bisect.c  | 11 +++
  commit-graph.c|  9 ++---
  commit-reach.c| 12 +---
  git-compat-util.h | 12 
  4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/bisect.c b/bisect.c
index e8b17cf7e1..06be3a3c15 100644
--- a/bisect.c
+++ b/bisect.c
@@ -192,16 +192,11 @@ struct commit_dist {
int distance;
  };
  
-static int compare_commit_dist(const void *a_, const void *b_)

-{
-   struct commit_dist *a, *b;
-
-   a = (struct commit_dist *)a_;
-   b = (struct commit_dist *)b_;
+DEFINE_SORT(sort_by_commit_dist, struct commit_dist, a, b, {
if (a->distance != b->distance)
return b->distance - a->distance; /* desc sort */
return oidcmp(>commit->object.oid, >commit->object.oid);
-}
+})
  
  static struct commit_list *best_bisection_sorted(struct commit_list *list, int nr)

  {
@@ -223,7 +218,7 @@ static struct commit_list *best_bisection_sorted(struct 
commit_list *list, int n
array[cnt].distance = distance;
cnt++;
}
-   QSORT(array, cnt, compare_commit_dist);
+   sort_by_commit_dist(array, cnt);
for (p = list, i = 0; i < cnt; i++) {
struct object *obj = &(array[i].commit->object);
  
diff --git a/commit-graph.c b/commit-graph.c

index 7f4519ec3b..a2202414e0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -550,12 +550,7 @@ static void write_graph_chunk_large_edges(struct hashfile 
*f,
}
  }
  
-static int commit_compare(const void *_a, const void *_b)

-{
-   const struct object_id *a = (const struct object_id *)_a;
-   const struct object_id *b = (const struct object_id *)_b;
-   return oidcmp(a, b);
-}
+DEFINE_SORT(sort_oids, struct object_id, a, b, return oidcmp(a, b))
  
  struct packed_commit_list {

struct commit **list;
@@ -780,7 +775,7 @@ void write_commit_graph(const char *obj_dir,
  
  	close_reachable();
  
-	QSORT(oids.list, oids.nr, commit_compare);

+   sort_oids(oids.list, oids.nr);
  
  	count_distinct = 1;

for (i = 1; i < oids.nr; i++) {
diff --git a/commit-reach.c b/commit-reach.c
index 2f5e592d16..3aef47c3dd 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -527,17 +527,15 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
return is_descendant_of(commit, list);
  }
  
-static int compare_commits_by_gen(const void *_a, const void *_b)

-{
-   const struct commit *a = *(const struct commit * const *)_a;
-   const struct commit *b = *(const struct commit * const *)_b;
-
+DEFINE_SORT(sort_commits_by_gen, struct commit *, ap, bp, {
+   const struct commit *a = *ap;
+   const struct commit *b = *bp;
if (a->generation < b->generation)
return -1;
if (a->generation > b->generation)
return 1;
return 0;
-}
+})


Here, to make the macro version compile you need to cast ap and bp, 
which gives us a level of type-safety that wasn't there before. That can 
help us find errors at compile-time!


  
  int can_all_from_reach_with_flag(struct object_array *from,

 unsigned int with_flag,
@@ -580,7 +578,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
nr_commits++;
}
  
-	QSORT(list, nr_commits, compare_commits_by_gen);

+   sort_commits_by_gen(list, nr_commits);
  
  	for (i = 0; i < nr_commits; i++) {

/* DFS from list[i] */
diff --git a/git-compat-util.h b/git-compat-util.h
index 5f2e90932f..f9e78d69a2 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1066,6 +1066,18 @@ static inline void sane_qsort(void *base, size_t nmemb, 
size_t size,
qsort(base, nmemb, size, compar);
  }
  
+#define DEFINE_SORT(name, elemtype, one, two, code)			\

+static int name##_compare(const void *one##_v_, const void *two##_v_)  \
+{  \
+   elemtype const *one = one##_v_; \
+   elemtype const *two = two##_v_; \

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Derrick Stolee


On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote:

I don't have time to polish this up for submission now, but here's a WIP
patch that implements this, highlights:

  * There's a gc.clone.autoDetach=false default setting which overrides
gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a
--cloning option to indicate this).


I'll repeat that it could make sense to do the same thing on clone _and_ 
fetch. Perhaps a "--post-fetch" flag would be good here to communicate 
that we just downloaded a pack from a remote.



  * A clone of say git.git with gc.writeCommitGraph=true looks like:

[...]
Receiving objects: 100% (255262/255262), 100.49 MiB | 17.78 MiB/s, done.
Resolving deltas: 100% (188947/188947), done.
Computing commit graph generation numbers: 100% (55210/55210), done.


This looks like good UX. Thanks for the progress here!


  * The 'git gc --auto' command also knows to (only) run the commit-graph
(and space is left for future optimization steps) if general GC isn't
needed, but we need "optimization":

$ rm .git/objects/info/commit-graph; ~/g/git/git --exec-path=$PWD -c 
gc.writeCommitGraph=true -c gc.autoDetach=false gc --auto;
Annotating commits in commit graph: 341229, done.
Computing commit graph generation numbers: 100% (165969/165969), done.
$


Will this also trigger a full commit-graph rewrite on every 'git commit' 
command? Or is there some way we can compute the staleness of the 
commit-graph in order to only update if we get too far ahead? 
Previously, this was solved by relying on the auto-GC threshold.



  * The patch to gc.c looks less scary with -w, most of it is indenting
the existing pack-refs etc. with a "!auto_gc || should_gc" condition.

  * I added a commit_graph_exists() exists function and only care if I
get ENOENT for the purposes of this gc mode. This would need to be
tweaked for the incremental mode Derrick talks about, but if we just
set "should_optimize" that'll also work as far as gc --auto is
concerned (e.g. on fetch, am etc.)


The incremental mode would operate the same as split-index, which means 
we will still look for .git/objects/info/commit-graph. That file may 
point us to more files.



+int commit_graph_exists(const char *graph_file)
+{
+   struct stat st;
+   if (stat(graph_file, )) {
+   if (errno == ENOENT)
+   return 0;
+   else
+   return -1;
+   }
+   return 1;
+}
+


This method serves a very similar purpose to 
generation_numbers_enabled(), except your method only cares about the 
file existing. It ignores information like `core.commitGraph`, which 
should keep us from doing anything with the commit-graph file if false.


Nothing about your method is specific to the commit-graph file, since 
you provide a filename as a parameter. It could easily be "int 
file_exists(const char *filename)".


Thanks,

-Stolee

LIFESTYLE EXPO TOKYO 2019 [January] - Here's why you should exhibit!

2018-10-05 Thread Eiichi Hasegawa LIFESTYLE EXPO TOKYO Show Management

Dear International Sales & Marketing Director,
Zhejiang Wuchuan Industrial Co., Ltd

Hello, this is Eiichi Hasegawa from LIFESTYLE EXPO TOKYO Show Management.
Are you considering expanding your business to Japan, Asia-Pacific?

If yes, this is your chance to exhibit at LIFESTYLE EXPO TOKYO 2019 [January]!
>> https://www.lifestyle-expo-spring.jp/en/

- 3 Key Reasons Why You Should Exhibit at LIFESTYLE EXPO TOKYO -

1. January is the best time to approach Japanese buyers.
Japan's fiscal year starts in April, so January is your chance to have your 
products selected in the buyers' budget for 2019.

2. You can approach the buyers before anyone else does.
This will be the year's first trade show for the buyers.
They will be more motivated to purchase and conduct business for the coming 
year and their new budget (fiscal 2019 starts from April).
  
3. Meet and approach buyers from other industries to further expand your 
business.
Exhibitors can meet a huge variety of visitors from different industries 
because LIFESTYLE EXPO TOKYO consists of 6 specialised shows.
You can take advantage of the synergy effects by meeting many key 
buyers/distributors who are visiting the other shows.
　 
< Visitor Profile of LIFESTYLE EXPO TOKYO 2019 [January] >
==
- GIFTEX TOKYO 2019 - 2nd Variety-Gifts Expo 
  (Gift Shops, Lifestyle Shops, Department Stores, etc.)
- 2nd Baby & Kids Expo Tokyo 
  (Baby and Kids Stores, Toy Stores, Gift Shops, etc.)
- 2nd Health & Beauty Goods Expo Tokyo
  (Beauty Shops, Salons/Spas/Hotels, Drug Stores, etc.)
- 2nd Fashion Goods & Accessories Expo Tokyo 
  (Apparel Shops, Bags/Shoes Shops, Department Stores, etc.)
- INTERIOR TOKYO 2019 - 2nd Interior Products & Furniture Expo
  (Interior Shops, Lifestyle Shops, Hotels/Restaurants, etc.)
- 2nd Table & Kitchenware Expo Tokyo 
  (Lifestyle Shops, Gift Shops, Department Stores, etc.)



If you would like to expand/promote your business to Japan/Asia-Pacific, don't 
miss this great opportunity!

Please fill in the Reply Form below to receive detailed information on 
exhibiting.
Show Management will assist you with the latest booth availability, cost 
estimation and any other inquiries.

== Reply Form 
mailto:lifestyle-...@reedexpo.co.jp
Company Name:
Contact Person:
Email:
TEL:
Main Products:
Your Request
(  ) Cost Information
(  ) Available Booth Locations
(  ) Information on Packaged Booths
(  ) Other ( )
===


We look forward to hearing from you soon.


Sincerely,


Eiichi Hasegawa (Mr.), Chisato Miyawaki (Ms.), Mikako Shimada (Ms.)
Qu Jun (Mr.), Choi Ilyong (Mr.)
LIFESTYLE EXPO TOKYO Show Management
Reed Exhibitions Japan Ltd.
TEL: +81-3-3349-8505
mailto:lifestyle-...@reedexpo.co.jp
--
LIFESTYLE EXPO TOKYO 2019 [January] 
Dates: January 30 (Wed.) - February 1 (Fri.), 2019
Venue: Makuhari Messe, Japan
https://www.lifestyle-expo-spring.jp/en/
--
ID:E36-G1402-0075














This message is delivered to you to provide details of exhibitions and 
conferences organized, co-organized, or managed by Reed Exhibitions Japan Ltd.
If you would like to change your contact information, or prefer not to receive 
further information on this exhibition/conference, please follow the directions 
below.


Please click the URL below and follow the directions on the website to update 
your e-mail and other information.
https://contact.reedexpo.co.jp/expo/REED/?lg=en=ch=CHANGE


Please reply to this mail changing the subject to "REMOVE FROM LIST".
You will not receive any further information on this exhibition/conference.

Re: [PATCH v3 1/3] help: redirect to aliased commands for "git cmd --help"

2018-10-05 Thread Rasmus Villemoes

On 2018-10-05 10:19, Junio C Hamano wrote:
> Rasmus Villemoes  writes:
> 
>>
>> I believe that printing the "is aliased to" message also in case (2) has
>> value: Depending on pager setup, or if the user has help.format=web, the
>> message is still present immediately above the prompt when the user
>> quits the pager/returns to the terminal. That serves as an explanation
>> for why one was redirected to "man git-cherry-pick" from "git cp
>> --help", and if cp is actually 'cherry-pick -n', it reminds the user
>> that using cp has some flag implicitly set before firing off the next
>> command.
>>
>> It also provides some useful info in case we end up erroring out, either
>> in the "bad alias string" check, or in the "No manual entry for gitbar"
>> case.
> 
> These two paragraphs were misleading, because they sounded as if you
> were lamenting that you were somehow forbidden from doing so even
> though you believe doing it is the right thing.
> 
> But that is not what is happening.  I think we should update the (2)
> above to mention what you actually do in the code, perhaps like so:

Yes, what I wrote was probably better placed below ---.

> and hopefully remain visible when help.format=web is used,
>"git bar --help" errors out, or the manpage of "git bar" is
>short enough. It may not help if the help shows manpage on

or, as in my case, the pager does not clear the terminal. I even think
that's the default behaviour (due to X in $LESS) - at least, I don't
have any magic in the environment or .gitconfig to achieve this. So it's
not visible while the man page is shown in the pager, but upon exit from
the pager.

> It also is strange to count from (0); if the patchset is rerolled
> again, I'd prefer to see these start counting from (1), in which
> case this item will become (3).

If you prefer, I can send a v4.

Rasmus

[PATCH] branch: trivial style fix

2018-10-05 Thread Tao Qingyun

---
 builtin/branch.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index c396c41533..52aad0f602 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -222,10 +222,11 @@ static int delete_branches(int argc, const char **argv, 
int force, int kinds,
if (!head_rev)
die(_("Couldn't look up commit object for HEAD"));
}
-   for (i = 0; i < argc; i++, strbuf_reset()) {
+   for (i = 0; i < argc; i++) {
char *target = NULL;
int flags = 0;
 
+   strbuf_reset();
strbuf_branchname(, argv[i], allowed_interpret);
free(name);
name = mkpathdup(fmt, bname.buf);
@@ -716,8 +717,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
print_columns(, colopts, NULL);
string_list_clear(, 0);
return 0;
-   }
-   else if (edit_description) {
+   } else if (edit_description) {
const char *branch_name;
struct strbuf branch_ref = STRBUF_INIT;
 
-- 
2.19.0

Re: [PATCH 1/1] rebase -i: introduce the 'break' command

2018-10-05 Thread Jacob Keller

On Fri, Oct 5, 2018 at 1:17 AM Junio C Hamano  wrote:
> If one wants to emulate this with the versions of Git that are
> currently deployed, would it be sufficient to insert "exec false"
> instead of "break"?
>
> The reason I am asking is *not* to imply that we do not need this
> new feature.  It is because I vaguely recall seeing a request to add
> 'pause' to the insn set and "exec false" was mentioned as a more
> general alternative long time ago.  I am trying to see if this is a
> recurring request/wish, because it would reinforce that this new
> feature would be a good addition if that is the case.
>
> I suspect that "exec false" would give a message that looks like a
> complaint ("'false' failed so we are giving you control back to fix
> things" or something like that), and having a dedicated way to pause
> the execution without alarming the user is a good idea.
>
> I think the earlier request asked for 'pause' (I didn't dig the list
> archive very carefully, though), and 'stop' may also be a possible
> verb, but I tend to agree with this patch that 'break' is probably
> the best choice, simply because it begins with 'b' in the
> abbreviated form, a letter that is not yet used by others (unlike
> 'pause' or 'stop' that would want 'p' and 's' that are already
> taken)..
>

Yea. I use "exec false" all the time for this purpose, but it's a bit
confusing, and it does cause rebase to indicate that a command failed.

I think adding a builtin command to do this is a good idea, and I
think break is a reasonable verb, (especially considering the
shorthand "b").

Regards,
Jake

Re: [PATCH v3 1/3] help: redirect to aliased commands for "git cmd --help"

2018-10-05 Thread Junio C Hamano

Rasmus Villemoes  writes:

> As discussed in the thread for v1 of this patch [1] [2], this changes the
> rules for "git foo --help" when foo is an alias.
>
> (0) When invoked as "git help foo", we continue to print the "foo is
> aliased to bar" message and nothing else.
>
> (1) If foo is an alias for a shell command, print "foo is aliased to
> !bar" as usual.
>
> (2) Otherwise, break the alias string into words, and pretend that "git
> word0 --help" was called.
>
> At least for me, getting the man page for git-cherry-pick directly with
> "git cp --help" is more useful (and how I expect an alias to behave)
> than the short "is aliased to" notice. It is also consistent with
> "--help" generally providing more comprehensive help than "-h".
>
> I believe that printing the "is aliased to" message also in case (2) has
> value: Depending on pager setup, or if the user has help.format=web, the
> message is still present immediately above the prompt when the user
> quits the pager/returns to the terminal. That serves as an explanation
> for why one was redirected to "man git-cherry-pick" from "git cp
> --help", and if cp is actually 'cherry-pick -n', it reminds the user
> that using cp has some flag implicitly set before firing off the next
> command.
>
> It also provides some useful info in case we end up erroring out, either
> in the "bad alias string" check, or in the "No manual entry for gitbar"
> case.

These two paragraphs were misleading, because they sounded as if you
were lamenting that you were somehow forbidden from doing so even
though you believe doing it is the right thing.

But that is not what is happening.  I think we should update the (2)
above to mention what you actually do in the code, perhaps like so:

(2) Otherwise, show "foo is aliased to bar" to the standard
error stream, and then break the alias string into words and
pretend as if "git word[0] --help" were called.  The former
is necessary to help users when 'foo' is aliased to a
command with an option (e.g. "[alias] cp = cherry-pick -n"),
and hopefully remain visible when help.format=web is used,
"git bar --help" errors out, or the manpage of "git bar" is
short enough. It may not help if the help shows manpage on
the terminal as usual, though.

As we explain why we show the alias information before going to the
manpage in the item itself and a brief discussion of pros-and-cons,
we can safely lose the "I believe..."  paragraph, which looks
somewhat out of place in a log message.

It also is strange to count from (0); if the patchset is rerolled
again, I'd prefer to see these start counting from (1), in which
case this item will become (3).

> [1] https://public-inbox.org/git/20180926102636.30691-1...@rasmusvillemoes.dk/
> [2] https://public-inbox.org/git/20180926184914.gc30...@sigill.intra.peff.net/
>
> Signed-off-by: Rasmus Villemoes 
> ---
>  builtin/help.c | 34 +++---
>  1 file changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/help.c b/builtin/help.c
> index 8d4f6dd301..e0e3fe62e9 100644
> --- a/builtin/help.c
> +++ b/builtin/help.c
> @@ -415,9 +415,37 @@ static const char *check_git_cmd(const char* cmd)
>  
>   alias = alias_lookup(cmd);
>   if (alias) {
> - printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
> - free(alias);
> - exit(0);
> + const char **argv;
> + int count;
> +
> + /*
> +  * handle_builtin() in git.c rewrites "git cmd --help"
> +  * to "git help --exclude-guides cmd", so we can use
> +  * exclude_guides to distinguish "git cmd --help" from
> +  * "git help cmd". In the latter case, or if cmd is an
> +  * alias for a shell command, just print the alias
> +  * definition.
> +  */
> + if (!exclude_guides || alias[0] == '!') {
> + printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
> + free(alias);
> + exit(0);
> + }
> + /*
> +  * Otherwise, we pretend that the command was "git
> +  * word0 --help". We use split_cmdline() to get the
> +  * first word of the alias, to ensure that we use the
> +  * same rules as when the alias is actually
> +  * used. split_cmdline() modifies alias in-place.
> +  */
> + fprintf_ln(stderr, _("'%s' is aliased to '%s'"), cmd, alias);
> + count = split_cmdline(alias, );
> + if (count < 0)
> + die(_("bad alias.%s string: %s"), cmd,
> + split_cmdline_strerror(count));
> + free(argv);
> + UNLEAK(alias);
> + return alias;
>   }
>  
>   if (exclude_guides)

Re: [PATCH] grep: provide a noop --recursive option

2018-10-05 Thread Junio C Hamano

Stefan Beller  writes:

> git-grep is always file/tree recursive, but there is --recurse-submodules
> which is off by default. Instead of providing a short alias to a noop,
> we could use -r for submodules. (And if you happen to have no
> submodules, this is a noop for you)

I am not sure if it is an overall win for those who do have and use
submodules to easily be able to go recursive with a short-and-sweet
'r', or even they want to work inside one project at a time most of
the time.  If the latter, then using 'r' for recurse-submodules is
going to be a mistake (besides, other commands that have 'recursive'
typically use 'r' for its shorthand,and 'r' does not stand for
'recurse-submodules' for them).

1 2 >

1 - 100 of 109 matches

Mail list logo