date:20180511

On Sat, May 12, 2018 at 02:08:48PM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> >> $ git grep -v second
> >> $ git grep --not -e second
> >>
> >> may hit all lines in this message (except for the obvious two
> >> lines), but we cannot say which column we found a hit.  I am
> >> wondering if it is too grave a sin to report "the whole line is what
> >> satisfied the criteria given" and say the match lies at column #1.
> >
> > I think that is sensible. I previously was opposed to this because I
> > thought that it would be too difficult to script around the 'sometimes
> > we have columns but other times not' and 'I gave --column' but have to
> > check whether or not they are really there.
>
> I am not sure if you really got what I meant.  I am suggesting that
> "git grep -v --column second" should report that the entire line has
> hit for each and every line that does not have "second" on it, which
> is a good answer and eliminate "sometimes there is column answer (or
> answer to -o query) and sometimes not" at the same time.

I re-read your note and understand more clearly now what your suggestion
is. To ensure that we're in agreement, do you mean:

  1. '--column -v' will _never_ give a column, but will never die(),
  either

  2. '--column --[and | or | not]' will never give a column, but will
  also never die(), either.

I think that _those_ semantics are sensible, and I apologize for
misinterpreting your original note.

> > In other terms:
> >
> >   * not giving '--column' will _never_ give a column,
> >   * '--column --invert' will _always_ die(), and
> >   * '--column --[and | or | not]' will _never_ give a column.
>
> So this is completely opposite from what I would have expected. to
> somebody who said "I think that is sensible." over there.

Thanks,
Taylor

Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

Taylor Blau  writes:

>> $ git grep -v second
>> $ git grep --not -e second
>>
>> may hit all lines in this message (except for the obvious two
>> lines), but we cannot say which column we found a hit.  I am
>> wondering if it is too grave a sin to report "the whole line is what
>> satisfied the criteria given" and say the match lies at column #1.
>
> I think that is sensible. I previously was opposed to this because I
> thought that it would be too difficult to script around the 'sometimes
> we have columns but other times not' and 'I gave --column' but have to
> check whether or not they are really there.

I am not sure if you really got what I meant.  I am suggesting that
"git grep -v --column second" should report that the entire line has
hit for each and every line that does not have "second" on it, which
is a good answer and eliminate "sometimes there is column answer (or
answer to -o query) and sometimes not" at the same time.

> In other terms:
>
>   * not giving '--column' will _never_ give a column,
>   * '--column --invert' will _always_ die(), and
>   * '--column --[and | or | not]' will _never_ give a column.

So this is completely opposite from what I would have expected. to
somebody who said "I think that is sensible." over there.

Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

On Thu, May 10, 2018 at 09:04:34AM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > This check we should retain and change the wording to mention '--and',
> > '--or', and '--not' specifically.
>
> Why are these problematic in the first place?  If I said
>
> $ git grep -e first --and -e these
> $ git grep -e first --and --not -e those
> $ git grep -e first --or -e those
>
> I'd expect that the first line of this paragraph will hit, and the
> first hit for these three are "these", "first" and "first",
> respectively.  Most importantly, in the last one, "--or" can be
> omitted and the whole thing stops being "extended", so rejecting
> extended as a whole does not make much sense.

Agreed that this is what I would expect, too. The trouble is that we
never do a compilation step from containing --and, --or, --not to a
POSIX regexp. So, if we're extended, we have to assume that there might
not be an answer.

Given this, I don't think we should categorically die() when we
encounter this (more below in the next hunk of this mail). I think this
thread has established that there are certainly cases where we cannot
provide a meaningful answer, (--not at the top-level, for instance) but
there are cases where we can (as you indicate above).

One day, I would like to support --column with --and, --or, or --not in
cases where there _is_ a definite answer. That said, omitting this
information for now and at least not die()-ing I think is worth taking
this patch earlier, and leaving some #leftoverbits.

> $ git grep -v second
> $ git grep --not -e second
>
> may hit all lines in this message (except for the obvious two
> lines), but we cannot say which column we found a hit.  I am
> wondering if it is too grave a sin to report "the whole line is what
> satisfied the criteria given" and say the match lies at column #1.

I think that is sensible. I previously was opposed to this because I
thought that it would be too difficult to script around the 'sometimes
we have columns but other times not' and 'I gave --column' but have to
check whether or not they are really there.

I no longer believe that my above argument is sound. It simplifies the
matter greatly to simply not share columns when we don't have a good
answer, and do when we do.

In other terms:

  * not giving '--column' will _never_ give a column,
  * '--column --invert' will _always_ die(), and
  * '--column --[and | or | not]' will _never_ give a column.

Thanks,
Taylor

[PATCH v2 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

Teach GNU grep(1)'s '-o' ('--only-matching') to 'git-grep'. This option
prints only the matching components of each line. It writes multiple
lines if more than one match exists on a given line.

For example:

  $ /git grep -on --column git -- README.md | head -3
  README.md:15:56:git
  README.md:18:20:git
  README.md:19:16:git

By using show_line_header(), 'git grep --only-matching' correctly
respects the '--heading' option:

  $ git grep -on --column --heading git -- README.md | head -4
  README.md
  15:56:git
  18:20:git
  19:16:git

We mirror GNU grep's behavior when given -A, -B, or -C with
--only-matching, by displaying only the matching portion(s) of a line,
ignoring contextual line(s), but displaying '--' (context separator)
line(s).

Notably: when show_line() is called on a line that contains _multiple_
matches, we keep track of a relative offset from the beginning of the
line and increment 'cno' in subsequent calls to show_line_header() when
the expression is not extended. In the extended case, we do not do this,
because the column of the first match is undefined, thus relative
offsets are meaningless.

Signed-off-by: Taylor Blau 
---
 Documentation/git-grep.txt |  6 +++-
 builtin/grep.c |  1 +
 grep.c | 34 +--
 grep.h |  1 +
 t/t7810-grep.sh| 69 ++
 5 files changed, 107 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index c48a578cb1..5c09abec4a 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -20,7 +20,7 @@ SYNOPSIS
   [-c | --count] [--all-match] [-q | --quiet]
   [--max-depth ]
   [--color[=] | --no-color]
-  [--break] [--heading] [-p | --show-function]
+  [--break] [--heading] [-o | --only-matching] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
   [--threads ]
@@ -223,6 +223,10 @@ providing this option will cause it to die.
Show the filename above the matches in that file instead of
at the start of each shown line.
 
+--o::
+--only-matching::
+   Prints only the matching part of the lines.
+
 -p::
 --show-function::
Show the preceding line that contains the function name of
diff --git a/builtin/grep.c b/builtin/grep.c
index f9f516dfc4..0507ac335a 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -851,6 +851,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
N_("print empty line between matches from different 
files")),
OPT_BOOL(0, "heading", ,
N_("show filename only once above matches from same 
file")),
+   OPT_BOOL('o', "only-matching", _matching, N_("show 
only matches")),
OPT_GROUP(""),
OPT_CALLBACK('C', "context", , N_("n"),
N_("show  context lines before and after matches"),
diff --git a/grep.c b/grep.c
index 36bf7cf08d..9297fde643 100644
--- a/grep.c
+++ b/grep.c
@@ -1422,12 +1422,24 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
opt->output(opt, "\n", 1);
}
}
+   if (opt->only_matching && sign != ':') {
+   /*
+* If we're given '--only-matching' and the line is a contextual
+* one (i.e., we're given '-A', '-B', or '-C'), mark the line as
+* shown (to advance opt->last_shown), but do not show it (since
+* we are given '--only-matching').
+*/
+   opt->last_shown = lno;
+   return;
+   }
show_line_header(opt, name, lno, cno, sign);
-   if (opt->color) {
+   if (opt->color || opt->only_matching) {
regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_BODY;
int ch = *eol;
int eflags = 0;
+   int first = 1;
+   int offset = 1;
 
if (sign == ':')
match_color = opt->color_match_selected;
@@ -1444,16 +1456,32 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
if (match.rm_so == match.rm_eo)
break;
 
-   output_color(opt, bol, match.rm_so, line_color);
+   if (!opt->only_matching) {
+   output_color(opt, bol, match.rm_so, line_color);
+   } else if (!first) {
+   /*
+* We are given --only-matching, and this is not
+* the first match on a line. Reprint the
+* newline and header before showing another
+* match.
+*/
+

[PATCH v2 0/2] builtin/grep.c: teach '-o', '--only-matching'

Hi,

Attached is the second re-roll of my series to add GNU grep's
'--only-matching' to git-grep.

The main thing that has changed since last time is our handling of
-{A,B,C}. Previously, as Peff points out in [1], we handle this in a
buggy way different than GNU.

I agree that although 'git grep -C -o ...' is an unusual invocation,
it is useful to (1) maintain as much consistency as reasonably makes
sense, and (2) to at least not be buggy.

I have also responded to Eric's suggestions in [2], and [3].

Thanks as always for your kind review :-).


Thanks,
Taylor

[1]: https://public-inbox.org/git/20180510064014.ga31...@sigill.intra.peff.net
[2]: 
https://public-inbox.org/git/capig+csrjww4-7vj6wk8aofnb20bqucsooysjdpci1r5vb8...@mail.gmail.com
[3]: 
https://public-inbox.org/git/capig+crbbz+qtqgiw_wq9e-groa-wtevp1vcrqmj5yqj8ty...@mail.gmail.com

Taylor Blau (2):
  grep.c: extract show_line_header()
  builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

 Documentation/git-grep.txt |  6 ++-
 builtin/grep.c |  1 +
 grep.c | 78 +++---
 grep.h |  1 +
 t/t7810-grep.sh| 69 +
 5 files changed, 132 insertions(+), 23 deletions(-)

--
2.17.0

[PATCH v2 1/2] grep.c: extract show_line_header()

Teach 'git-grep(1)' how to print a line header multiple times from
show_line() in preparation for it learning '--only-matching'.

show_line_header() comprises the code in show_line() executed in order
to produce a line header. It is a one-for-one extraction of the existing
implementation.

For now, show_line_header() provides no benefit over the change before
this patch. The following patch will conditionally call
show_line_header() multiple times per invocation to show_line(), which
is the desired benefit of this change.

Signed-off-by: Taylor Blau 
---
 grep.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/grep.c b/grep.c
index 5ba1b05526..36bf7cf08d 100644
--- a/grep.c
+++ b/grep.c
@@ -1369,26 +1369,9 @@ static int next_match(struct grep_opt *opt, char *bol, 
char *eol,
return hit;
 }
 
-static void show_line(struct grep_opt *opt, char *bol, char *eol,
- const char *name, unsigned lno, unsigned cno, char sign)
+static void show_line_header(struct grep_opt *opt, const char *name,
+ unsigned lno, unsigned cno, char sign)
 {
-   int rest = eol - bol;
-   const char *match_color, *line_color = NULL;
-
-   if (opt->file_break && opt->last_shown == 0) {
-   if (opt->show_hunk_mark)
-   opt->output(opt, "\n", 1);
-   } else if (opt->pre_context || opt->post_context || opt->funcbody) {
-   if (opt->last_shown == 0) {
-   if (opt->show_hunk_mark) {
-   output_color(opt, "--", 2, opt->color_sep);
-   opt->output(opt, "\n", 1);
-   }
-   } else if (lno > opt->last_shown + 1) {
-   output_color(opt, "--", 2, opt->color_sep);
-   opt->output(opt, "\n", 1);
-   }
-   }
if (opt->heading && opt->last_shown == 0) {
output_color(opt, name, strlen(name), opt->color_filename);
opt->output(opt, "\n", 1);
@@ -1417,6 +1400,29 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
output_color(opt, buf, strlen(buf), opt->color_columnno);
output_sep(opt, sign);
}
+}
+
+static void show_line(struct grep_opt *opt, char *bol, char *eol,
+ const char *name, unsigned lno, unsigned cno, char sign)
+{
+   int rest = eol - bol;
+   const char *match_color, *line_color = NULL;
+
+   if (opt->file_break && opt->last_shown == 0) {
+   if (opt->show_hunk_mark)
+   opt->output(opt, "\n", 1);
+   } else if (opt->pre_context || opt->post_context || opt->funcbody) {
+   if (opt->last_shown == 0) {
+   if (opt->show_hunk_mark) {
+   output_color(opt, "--", 2, opt->color_sep);
+   opt->output(opt, "\n", 1);
+   }
+   } else if (lno > opt->last_shown + 1) {
+   output_color(opt, "--", 2, opt->color_sep);
+   opt->output(opt, "\n", 1);
+   }
+   }
+   show_line_header(opt, name, lno, cno, sign);
if (opt->color) {
regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_BODY;
-- 
2.17.0

[PATCH v6 4/7] grep.c: display column number of first match

To prepare for 'git grep' learning '--column', teach grep.c's
show_line() how to show the column of the first match on non-context
line.

Signed-off-by: Taylor Blau 
---
 grep.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/grep.c b/grep.c
index fb0fa23231..f3fe416791 100644
--- a/grep.c
+++ b/grep.c
@@ -1364,7 +1364,7 @@ static int next_match(struct grep_opt *opt, char *bol, 
char *eol,
 }
 
 static void show_line(struct grep_opt *opt, char *bol, char *eol,
- const char *name, unsigned lno, char sign)
+ const char *name, unsigned lno, unsigned cno, char sign)
 {
int rest = eol - bol;
const char *match_color, *line_color = NULL;
@@ -1399,6 +1399,17 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
output_color(opt, buf, strlen(buf), opt->color_lineno);
output_sep(opt, sign);
}
+   /*
+* Treat 'cno' as the 1-indexed offset from the start of a non-context
+* line to its first match. Otherwise, 'cno' is 0 indicating that we are
+* being called with a context line.
+*/
+   if (opt->columnnum && cno) {
+   char buf[32];
+   xsnprintf(buf, sizeof(buf), "%d", cno);
+   output_color(opt, buf, strlen(buf), opt->color_columnno);
+   output_sep(opt, sign);
+   }
if (opt->color) {
regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_BODY;
@@ -1504,7 +1515,7 @@ static void show_funcname_line(struct grep_opt *opt, 
struct grep_source *gs,
break;
 
if (match_funcname(opt, gs, bol, eol)) {
-   show_line(opt, bol, eol, gs->name, lno, '=');
+   show_line(opt, bol, eol, gs->name, lno, 0, '=');
break;
}
}
@@ -1569,7 +1580,7 @@ static void show_pre_context(struct grep_opt *opt, struct 
grep_source *gs,
 
while (*eol != '\n')
eol++;
-   show_line(opt, bol, eol, gs->name, cur, sign);
+   show_line(opt, bol, eol, gs->name, cur, 0, sign);
bol = eol + 1;
cur++;
}
@@ -1833,7 +1844,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
show_pre_context(opt, gs, bol, eol, lno);
else if (opt->funcname)
show_funcname_line(opt, gs, bol, lno);
-   show_line(opt, bol, eol, gs->name, lno, ':');
+   show_line(opt, bol, eol, gs->name, lno, match.rm_so+1, 
':');
last_hit = lno;
if (opt->funcbody)
show_function = 1;
@@ -1862,7 +1873,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
/* If the last hit is within the post context,
 * we need to show this line.
 */
-   show_line(opt, bol, eol, gs->name, lno, '-');
+   show_line(opt, bol, eol, gs->name, lno, match.rm_so+1, 
'-');
}
 
next_line:
-- 
2.17.0

[PATCH v6 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

Teach 'git-grep(1)' a new option, '--column', to show the column
number of the first match on a non-context line. This makes it possible
to teach 'contrib/git-jump/git-jump' how to seek to the first matching
position of a grep match in your editor, and allows similar additional
scripting capabilities.

For example:

  $ git grep -n --column foo | head -n3
  .clang-format:51:14:# myFunction(foo, bar, baz);
  .clang-format:64:7:# int foo();
  .clang-format:75:8:# void foo()

Signed-off-by: Taylor Blau 
---
 Documentation/git-grep.txt |  7 ++-
 builtin/grep.c |  4 
 grep.c |  5 +++--
 t/t7810-grep.sh| 39 ++
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 18b494731f..cec4665df5 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -13,7 +13,7 @@ SYNOPSIS
   [-v | --invert-match] [-h|-H] [--full-name]
   [-E | --extended-regexp] [-G | --basic-regexp]
   [-P | --perl-regexp]
-  [-F | --fixed-strings] [-n | --line-number]
+  [-F | --fixed-strings] [-n | --line-number] [--column]
   [-l | --files-with-matches] [-L | --files-without-match]
   [(-O | --open-files-in-pager) []]
   [-z | --null]
@@ -169,6 +169,11 @@ providing this option will cause it to die.
 --line-number::
Prefix the line number to matching lines.
 
+--column::
+   Prefix the 1-indexed byte-offset of the first match from the start of 
the
+   matching line. This option is incompatible with '--invert-match', and
+   ignored with expressions using '--and', '--or', '--not'.
+
 -l::
 --files-with-matches::
 --name-only::
diff --git a/builtin/grep.c b/builtin/grep.c
index 5f32d2ce84..f9f516dfc4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
GREP_PATTERN_TYPE_PCRE),
OPT_GROUP(""),
OPT_BOOL('n', "line-number", , N_("show line 
numbers")),
+   OPT_BOOL(0, "column", , N_("show column number of 
first match")),
OPT_NEGBIT('h', NULL, , N_("don't show 
filenames"), 1),
OPT_BIT('H', NULL, , N_("show filenames"), 1),
OPT_NEGBIT(0, "full-name", ,
@@ -,6 +1112,9 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
hit = grep_objects(, , the_repository, );
}
 
+   if (opt.columnnum && opt.invert)
+   die(_("--column and --invert-match cannot be combined"));
+
if (num_threads)
hit |= wait_all();
if (hit && show_in_pager)
diff --git a/grep.c b/grep.c
index f3fe416791..7396b49a2d 100644
--- a/grep.c
+++ b/grep.c
@@ -1402,9 +1402,10 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
/*
 * Treat 'cno' as the 1-indexed offset from the start of a non-context
 * line to its first match. Otherwise, 'cno' is 0 indicating that we are
-* being called with a context line.
+* being called with a context line, or we are --extended, and cannot
+* always show an answer.
 */
-   if (opt->columnnum && cno) {
+   if (opt->columnnum && sign == ':' && !opt->extended) {
char buf[32];
xsnprintf(buf, sizeof(buf), "%d", cno);
output_color(opt, buf, strlen(buf), opt->color_columnno);
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 1797f632a3..491b2e044a 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -99,6 +99,40 @@ do
test_cmp expected actual
'
 
+   test_expect_success "grep -w $L (with --column)" '
+   {
+   echo ${HC}file:5:foo mmap bar
+   echo ${HC}file:14:foo_mmap bar mmap
+   echo ${HC}file:5:foo mmap bar_mmap
+   echo ${HC}file:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep --column -w -e mmap $H >actual &&
+   test_cmp expected actual
+   '
+
+   test_expect_success "grep -w $L (with --column, -C)" '
+   {
+   echo ${HC}file:5:foo mmap bar
+   echo ${HC}file-foo_mmap bar
+   echo ${HC}file:14:foo_mmap bar mmap
+   echo ${HC}file:5:foo mmap bar_mmap
+   echo ${HC}file:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep --column -w -C1 -e mmap $H >actual &&
+   test_cmp expected actual
+   '
+
+   test_expect_success "grep -w $L (with --line-number, --column)" '
+   {
+   echo ${HC}file:1:5:foo mmap bar
+   echo ${HC}file:3:14:foo_mmap bar mmap
+   echo

[PATCH v6 7/7] contrib/git-jump/git-jump: jump to match column in addition to line

Take advantage of 'git-grep(1)''s new option, '--column' in order to
teach Peff's 'git-jump' script how to jump to the correct column for any
given match.

'git-grep(1)''s output is in the correct format for Vim's jump list, so
no additional cleanup is necessary.

Signed-off-by: Taylor Blau 
---
 contrib/git-jump/README   | 12 ++--
 contrib/git-jump/git-jump |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/contrib/git-jump/README b/contrib/git-jump/README
index 4484bda410..2f618a7f97 100644
--- a/contrib/git-jump/README
+++ b/contrib/git-jump/README
@@ -25,6 +25,13 @@ git-jump will feed this to the editor:
 foo.c:2: printf("hello word!\n");
 ---
 
+Or, when running 'git jump grep', column numbers will also be emitted,
+e.g. `git jump grep "hello"` would return:
+
+---
+foo.c:2:9: printf("hello word!\n");
+---
+
 Obviously this trivial case isn't that interesting; you could just open
 `foo.c` yourself. But when you have many changes scattered across a
 project, you can use the editor's support to "jump" from point to point.
@@ -35,7 +42,8 @@ Git-jump can generate four types of interesting lists:
 
   2. The beginning of any merge conflict markers.
 
-  3. Any grep matches.
+  3. Any grep matches, including the column of the first match on a
+ line.
 
   4. Any whitespace errors detected by `git diff --check`.
 
@@ -82,7 +90,7 @@ which does something similar to `git jump grep`. However, it 
is limited
 to positioning the cursor to the correct line in only the first file,
 leaving you to locate subsequent hits in that file or other files using
 the editor or pager. By contrast, git-jump provides the editor with a
-complete list of files and line numbers for each match.
+complete list of files, lines, and a column number for each match.
 
 
 Limitations
diff --git a/contrib/git-jump/git-jump b/contrib/git-jump/git-jump
index 80ab0590bc..931b0fe3a9 100755
--- a/contrib/git-jump/git-jump
+++ b/contrib/git-jump/git-jump
@@ -52,7 +52,7 @@ mode_merge() {
 # editor shows them to us in the status bar.
 mode_grep() {
cmd=$(git config jump.grepCmd)
-   test -n "$cmd" || cmd="git grep -n"
+   test -n "$cmd" || cmd="git grep -n --column"
$cmd "$@" |
perl -pe '
s/[ \t]+/ /g;
-- 
2.17.0

[PATCH v6 6/7] grep.c: add configuration variables to show matched option

To support git-grep(1)'s new option, '--column', document and teach
grep.c how to interpret relevant configuration options, similar to those
associated with '--line-number'.

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt   | 5 +
 Documentation/git-grep.txt | 3 +++
 grep.c | 6 ++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6e8d969f52..b3c861c5c3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1159,6 +1159,8 @@ color.grep.::
function name lines (when using `-p`)
 `lineNumber`;;
line number prefix (when using `-n`)
+`column`;;
+   column number prefix (when using `--column`)
 `match`;;
matching text (same as setting `matchContext` and `matchSelected`)
 `matchContext`;;
@@ -1708,6 +1710,9 @@ gitweb.snapshot::
 grep.lineNumber::
If set to true, enable `-n` option by default.
 
+grep.column::
+   If set to true, enable the `--column` option by default.
+
 grep.patternType::
Set the default matching behavior. Using a value of 'basic', 'extended',
'fixed', or 'perl' will enable the `--basic-regexp`, 
`--extended-regexp`,
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index cec4665df5..c48a578cb1 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -44,6 +44,9 @@ CONFIGURATION
 grep.lineNumber::
If set to true, enable `-n` option by default.
 
+grep.column::
+   If set to true, enable the `--column` option by default.
+
 grep.patternType::
Set the default matching behavior. Using a value of 'basic', 'extended',
'fixed', or 'perl' will enable the `--basic-regexp`, 
`--extended-regexp`,
diff --git a/grep.c b/grep.c
index 7396b49a2d..5ba1b05526 100644
--- a/grep.c
+++ b/grep.c
@@ -96,6 +96,10 @@ int grep_config(const char *var, const char *value, void *cb)
opt->linenum = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var, "grep.column")) {
+   opt->columnnum = git_config_bool(var, value);
+   return 0;
+   }
 
if (!strcmp(var, "grep.fullname")) {
opt->relative = !git_config_bool(var, value);
@@ -112,6 +116,8 @@ int grep_config(const char *var, const char *value, void 
*cb)
color = opt->color_function;
else if (!strcmp(var, "color.grep.linenumber"))
color = opt->color_lineno;
+   else if (!strcmp(var, "color.grep.column"))
+   color = opt->color_columnno;
else if (!strcmp(var, "color.grep.matchcontext"))
color = opt->color_match_context;
else if (!strcmp(var, "color.grep.matchselected"))
-- 
2.17.0

[PATCH v6 1/7] Documentation/config.txt: camel-case lineNumber for consistency

lineNumber has casing that is inconsistent with surrounding options,
like color.grep.matchContext, and color.grep.matchSelected. Re-case this
documentation in order to be consistent with the text around it, and to
ensure that new entries are consistent, too.

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2659153cb3..6e8d969f52 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1157,7 +1157,7 @@ color.grep.::
filename prefix (when not using `-h`)
 `function`;;
function name lines (when using `-p`)
-`linenumber`;;
+`lineNumber`;;
line number prefix (when using `-n`)
 `match`;;
matching text (same as setting `matchContext` and `matchSelected`)
-- 
2.17.0

[PATCH v6 2/7] grep.c: expose matched column in match_line()

When calling match_line(), callers presently cannot determine the
relative offset of the match because match_line() discards the
'regmatch_t' that contains this information.

Instead, teach match_line() to take in a 'regmatch_t *' so that callers
can inspect the match's starting and ending offset from the beginning of
the line. This additional argument has no effect when opt->extended is
non-zero.

We will later pass the starting offset from 'regmatch_t *' to
show_line() in order to display the column number of the first match.

Signed-off-by: Taylor Blau 
---
 grep.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 65b90c10a3..1c25782355 100644
--- a/grep.c
+++ b/grep.c
@@ -1299,17 +1299,17 @@ static int match_expr(struct grep_opt *opt, char *bol, 
char *eol,
 }
 
 static int match_line(struct grep_opt *opt, char *bol, char *eol,
- enum grep_context ctx, int collect_hits)
+ regmatch_t *match, enum grep_context ctx,
+ int collect_hits)
 {
struct grep_pat *p;
-   regmatch_t match;
 
if (opt->extended)
return match_expr(opt, bol, eol, ctx, collect_hits);
 
/* we do not call with collect_hits without being extended */
for (p = opt->pattern_list; p; p = p->next) {
-   if (match_one_pattern(p, bol, eol, ctx, , 0))
+   if (match_one_pattern(p, bol, eol, ctx, match, 0))
return 1;
}
return 0;
@@ -1699,6 +1699,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
int try_lookahead = 0;
int show_function = 0;
struct userdiff_driver *textconv = NULL;
+   regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_HEAD;
xdemitconf_t xecfg;
 
@@ -1788,7 +1789,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
if ((ctx == GREP_CONTEXT_HEAD) && (eol == bol))
ctx = GREP_CONTEXT_BODY;
 
-   hit = match_line(opt, bol, eol, ctx, collect_hits);
+   hit = match_line(opt, bol, eol, , ctx, collect_hits);
*eol = ch;
 
if (collect_hits)
-- 
2.17.0

[PATCH v6 3/7] grep.[ch]: extend grep_opt to allow showing matched column

To support showing the matched column when calling 'git-grep(1)', teach
'grep_opt' the normal set of options to configure the default behavior
and colorization of this feature.

Signed-off-by: Taylor Blau 
---
 grep.c | 3 +++
 grep.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/grep.c b/grep.c
index 1c25782355..fb0fa23231 100644
--- a/grep.c
+++ b/grep.c
@@ -46,6 +46,7 @@ void init_grep_defaults(void)
color_set(opt->color_filename, "");
color_set(opt->color_function, "");
color_set(opt->color_lineno, "");
+   color_set(opt->color_columnno, "");
color_set(opt->color_match_context, GIT_COLOR_BOLD_RED);
color_set(opt->color_match_selected, GIT_COLOR_BOLD_RED);
color_set(opt->color_selected, "");
@@ -155,6 +156,7 @@ void grep_init(struct grep_opt *opt, const char *prefix)
opt->extended_regexp_option = def->extended_regexp_option;
opt->pattern_type_option = def->pattern_type_option;
opt->linenum = def->linenum;
+   opt->columnnum = def->columnnum;
opt->max_depth = def->max_depth;
opt->pathname = def->pathname;
opt->relative = def->relative;
@@ -164,6 +166,7 @@ void grep_init(struct grep_opt *opt, const char *prefix)
color_set(opt->color_filename, def->color_filename);
color_set(opt->color_function, def->color_function);
color_set(opt->color_lineno, def->color_lineno);
+   color_set(opt->color_columnno, def->color_columnno);
color_set(opt->color_match_context, def->color_match_context);
color_set(opt->color_match_selected, def->color_match_selected);
color_set(opt->color_selected, def->color_selected);
diff --git a/grep.h b/grep.h
index 399381c908..08a0b391c5 100644
--- a/grep.h
+++ b/grep.h
@@ -127,6 +127,7 @@ struct grep_opt {
int prefix_length;
regex_t regexp;
int linenum;
+   int columnnum;
int invert;
int ignore_case;
int status_only;
@@ -159,6 +160,7 @@ struct grep_opt {
char color_filename[COLOR_MAXLEN];
char color_function[COLOR_MAXLEN];
char color_lineno[COLOR_MAXLEN];
+   char color_columnno[COLOR_MAXLEN];
char color_match_context[COLOR_MAXLEN];
char color_match_selected[COLOR_MAXLEN];
char color_selected[COLOR_MAXLEN];
-- 
2.17.0

[PATCH v6 0/7] Teach '--column' to 'git-grep(1)'

Hi,

Attached is my sixth re-roll of a series to add '--column' to 'git
grep'.

The main change since v5 is supporting --column with queries containing
--and, --or, or --not. Previously, I had chosen to die() in this case
since there isn't always a good answer to "what is the first column of
?" but have gone back on this for two reasons:

  1. It is important not to regress calls to git-jump/contrib/git-jump
  that contain --and, --or, or --not.

  2. It is not that hard to detect the absence of column data in scripts.
 Likewise, git-jump will happily accept lines with or without
 columnar information, and Vim will accept it as-is.

So, let's support --column and only die() when also given
--invert-match. When we don't have a good answer, print nothing.

Thanks,
Taylor

Taylor Blau (7):
  Documentation/config.txt: camel-case lineNumber for consistency
  grep.c: expose matched column in match_line()
  grep.[ch]: extend grep_opt to allow showing matched column
  grep.c: display column number of first match
  builtin/grep.c: add '--column' option to 'git-grep(1)'
  grep.c: add configuration variables to show matched option
  contrib/git-jump/git-jump: jump to match column in addition to line

 Documentation/config.txt   |  7 ++-
 Documentation/git-grep.txt | 10 +-
 builtin/grep.c |  4 
 contrib/git-jump/README| 12 ++--
 contrib/git-jump/git-jump  |  2 +-
 grep.c | 40 +-
 grep.h |  2 ++
 t/t7810-grep.sh| 39 +
 8 files changed, 102 insertions(+), 14 deletions(-)

Inter-diff (since v5):

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index dc8f76ce99..c48a578cb1 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -173,8 +173,9 @@ providing this option will cause it to die.
Prefix the line number to matching lines.

 --column::
-   Prefix the 1-indexed byte-offset of the first match on non-context 
lines. This
-   option is incompatible with '--invert-match', and extended expressions.
+   Prefix the 1-indexed byte-offset of the first match from the start of 
the
+   matching line. This option is incompatible with '--invert-match', and
+   ignored with expressions using '--and', '--or', '--not'.

 -l::
 --files-with-matches::
diff --git a/grep.c b/grep.c
index 5d904810ad..5ba1b05526 100644
--- a/grep.c
+++ b/grep.c
@@ -1001,9 +1001,6 @@ static void compile_grep_patterns_real(struct grep_opt 
*opt)
else if (!opt->extended && !opt->debug)
return;

-   if (opt->columnnum && opt->extended)
-   die(_("--column and extended expressions cannot be combined"));
-
p = opt->pattern_list;
if (p)
opt->pattern_expression = compile_pattern_expr();
@@ -1411,9 +1408,10 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
/*
 * Treat 'cno' as the 1-indexed offset from the start of a non-context
 * line to its first match. Otherwise, 'cno' is 0 indicating that we are
-* being called with a context line.
+* being called with a context line, or we are --extended, and cannot
+* always show an answer.
 */
-   if (opt->columnnum && cno) {
+   if (opt->columnnum && sign == ':' && !opt->extended) {
char buf[32];
xsnprintf(buf, sizeof(buf), "%d", cno);
output_color(opt, buf, strlen(buf), opt->color_columnno);
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index aa56b21ed9..491b2e044a 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -110,6 +110,18 @@ do
test_cmp expected actual
'

+   test_expect_success "grep -w $L (with --column, -C)" '
+   {
+   echo ${HC}file:5:foo mmap bar
+   echo ${HC}file-foo_mmap bar
+   echo ${HC}file:14:foo_mmap bar mmap
+   echo ${HC}file:5:foo mmap bar_mmap
+   echo ${HC}file:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep --column -w -C1 -e mmap $H >actual &&
+   test_cmp expected actual
+   '
+
test_expect_success "grep -w $L (with --line-number, --column)" '
{
echo ${HC}file:1:5:foo mmap bar
@@ -1617,9 +1629,4 @@ test_expect_success 'grep does not allow --column, 
--invert-match' '
test_i18ngrep "\-\-column and \-\-invert-match cannot be combined" err
 '

-test_expect_success 'grep does not allow --column, extended' '
-   test_must_fail git grep --column --not -e pat 2>err &&
-   test_i18ngrep "\-\-column and extended expressions cannot be combined" 
err
-'
-
 test_done

--
2.17.0

Bug report for git push

2018-05-11 Thread Surenkumar Nihalani

Hi everyone!

I am facing this spurious issue (not easily reproducible and usually a retry 
fixes it) with git push:

Warning: Permanently added 'github.com,192.30.255.112' (RSA) to the list of 
known hosts.
Counting objects: 8, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 971 bytes | 971.00 KiB/s, done.
Total 8 (delta 7), reused 0 (delta 0)
remote: Resolving deltas:   0% (0/7)
remote: Resolving deltas:  14% (1/7)
remote: Resolving deltas:  28% (2/7)
remote: Resolving deltas:  42% (3/7)
remote: Resolving deltas:  57% (4/7)
remote: Resolving deltas:  71% (5/7)
remote: Resolving deltas:  85% (6/7)
remote: Resolving deltas: 100% (7/7)
remote: Resolving deltas: 100% (7/7), completed with 7 local objects.
error: failed to push some refs to 'g...@github.com:quora/qcore.git'
[May 12 12:14 AM remote._get_push_info] Error lines received while fetching: 
error: failed to push some refs to 'g...@github.com:quora/qcore.git'
Push flags:  1040
Push summary: [remote rejected] (cannot lock ref 'refs/heads/master': is at 
cf2cc0c147d8215ec87d3ddaf32f0b2c58630423 but expected 
fdda486ad43a6e6b5dc5f2795ce27124e0686752)

Remote repo rejected your commit.


This is happening in git version 2.17.0

I've tried searching stack overflow and the git mailing list but the answers 
aren't recent enough or don’t seem to be permanent fixes.
How do I fix this issue?

Re: [PATCH] doc: fix config API documentation about config_with_options

Brandon Williams  writes:

> On 05/09, Antonio Ospite wrote:
>> In commit dc8441fdb ("config: don't implicitly use gitdir or commondir",
>> 2017-06-14) the function git_config_with_options was renamed to
>> config_with_options to better reflect the fact that it does not access
>> the git global config or the repo config by default.
>> 
>> However Documentation/technical/api-config.txt still refers to the
>> previous name, fix that.
>> 
>> While at it also update the documentation about the extra parameters,
>> because they too changed since the initial definition.
>> 
>> Signed-off-by: Antonio Ospite 
>> ---
>> 
>> Patch based on the maint branch.
>
> Thanks for updating the docs.  Maybe one day we can migrate these docs
> to the source files themselves, making it easier to keep up to date.
> For now this is good :)

Thanks, both.

Re: [PATCH] git-submodule.sh: try harder to fetch a submodule

Jonathan Nieder  writes:

>> HEAD is allowed by the protocol spec and would happen, if HEAD points at a
>> ref, that this user cannot see (due to ACLs for example).
>
> A more typical example would be if the ref simply doesn't exist (i.e.,
> is a branch yet to be born).

Indeed this is interesting.  At first glance I thought this was
about underlying "git clone" failing to grab things from a
repository with unborn HEAD, but that part works perfectly OK.  And
if this failed clone were a full-repository clone that tried to grab
even HEAD, then it is likely that we got the tip we need to populate
the submodule's working tree (or the remote is useless for that in
the first place).  So the "allow to try even harder" is probably a
good direction to go in.

>>  # is not reachable from a ref.
>>  is_tip_reachable "$sm_path" "$sha1" ||
>>  fetch_in_submodule "$sm_path" $depth ||
>
> Is keeping the '||' at the end of this line intended?

Good question.  It used to be

guard1 || action1 || die
guard2 || action2 || die

Even after a successful exit from "action1", the code used to try
the second attempt, and the patch is removing the first die, making
the whole thing into

guard1 || action1 ||
guard2 || action2 || die

which suggests a grave regression, doesn't it?  "action1" (a whole
repository fetch) may not pull down the needed commit even the fetch
operation itself may succeed, in which case "guard2" notices that
the tip is still not here and "action2" (an exact SHA-1 fetch) tries
to pull down the exact thing as the last resort.

So it probably should be more like

guard1 || action1 || warn
guard2 || action2 || die

so that no matter what the outcome of the action1 is, the second set
gets executed.

Re: [PATCH] git-submodule.sh: try harder to fetch a submodule

On Fri, May 11, 2018 at 4:28 PM, Jonathan Nieder  wrote:
> Hi,
>
> Stefan Beller wrote:
>
>> This is the logical continuum of fb43e31f2b4 (submodule: try harder to
>> fetch needed sha1 by direct fetching sha1, 2016-02-23) and fixes it as
>> some assumptions were not correct.
>
> Interesting.
>
> I think what would help most is an example set of commands I can use
> to reproduce this (bonus points if in the form of a test).

I tried coming up with a test in git-core that produces a remote similar to
Gerrit, and let me tell you, it's not Git that is weird here. ;)

>> > If $sha1 was not part of the default fetch ... fail ourselves here
>> assumes that the fetch_in_submodule only fails when the serverside does
>
> I'm having some trouble with the formatting here.  Is the part
> preceded by '>' a quote, and if so a quote from what?

The quote is from fb43e31f2b4.

>> There are other failures, why such a fetch may fail, such as
>> fatal: Couldn't find remote ref HEAD
>> which can happen if the remote side doesn't advertise HEAD. Not advertising
>
> nit: it can be useful to have a blank line before and after such
> example output to help both my eyes and tools like "git log
> --format='%w(100)%b'" to understand the formatting.

Why would you use this formatting?

>
>> HEAD is allowed by the protocol spec and would happen, if HEAD points at a
>> ref, that this user cannot see (due to ACLs for example).
>
> A more typical example would be if the ref simply doesn't exist (i.e.,
> is a branch yet to be born).

Oh, I checked that but not for the submodule case, let me check that.

>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index 24914963ca2..13b378a6c8f 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -614,7 +614,6 @@ cmd_update()
>>   # is not reachable from a ref.
>>   is_tip_reachable "$sm_path" "$sha1" ||
>>   fetch_in_submodule "$sm_path" $depth ||
>
> Is keeping the '||' at the end of this line intended?

yes, as we only want to run the code below if there was some error.

>> - die "$(eval_gettext "Unable to fetch in 
>> submodule path '\$displaypath'")"
>>
>>   # Now we tried the usual fetch, but $sha1 may
>>   # not be reachable from any of the refs
>>   is_tip_reachable "$sm_path" "$sha1" ||
>>   fetch_in_submodule "$sm_path" $depth "$sha1" ||
>>   die "$(eval_gettext "Fetched in submodule path 
>> '\$displaypath', but it did not contain \$sha1. Direct fetching of that 
>> commit failed.")"
>
> Should this error message be changed?

I don't think so?

Proposal

2018-05-11 Thread Zeliha Omer Faruk




--
Hello

Greetings to you please i have a business proposal for you contact me
for more detailes asap thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey

Re: [PATCH] git-submodule.sh: try harder to fetch a submodule

2018-05-11 Thread Jonathan Nieder

Hi,

Stefan Beller wrote:

> This is the logical continuum of fb43e31f2b4 (submodule: try harder to
> fetch needed sha1 by direct fetching sha1, 2016-02-23) and fixes it as
> some assumptions were not correct.

Interesting.

I think what would help most is an example set of commands I can use
to reproduce this (bonus points if in the form of a test).

> > If $sha1 was not part of the default fetch ... fail ourselves here
> assumes that the fetch_in_submodule only fails when the serverside does

I'm having some trouble with the formatting here.  Is the part
preceded by '>' a quote, and if so a quote from what?

> not support fetching by sha1.
>
> There are other failures, why such a fetch may fail, such as
> fatal: Couldn't find remote ref HEAD
> which can happen if the remote side doesn't advertise HEAD. Not advertising

nit: it can be useful to have a blank line before and after such
example output to help both my eyes and tools like "git log
--format='%w(100)%b'" to understand the formatting.

> HEAD is allowed by the protocol spec and would happen, if HEAD points at a
> ref, that this user cannot see (due to ACLs for example).

A more typical example would be if the ref simply doesn't exist (i.e.,
is a branch yet to be born).

> So do try even harder for a submodule by ignoring the exit code of the
> first fetch and rather relying on the following is_tip_reachable to
> see if we try fetching again.
>
> Signed-off-by: Stefan Beller 
> ---
>  git-submodule.sh | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 24914963ca2..13b378a6c8f 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -614,7 +614,6 @@ cmd_update()
>   # is not reachable from a ref.
>   is_tip_reachable "$sm_path" "$sha1" ||
>   fetch_in_submodule "$sm_path" $depth ||

Is keeping the '||' at the end of this line intended?

> - die "$(eval_gettext "Unable to fetch in 
> submodule path '\$displaypath'")"
>  
>   # Now we tried the usual fetch, but $sha1 may
>   # not be reachable from any of the refs
>   is_tip_reachable "$sm_path" "$sha1" ||
>   fetch_in_submodule "$sm_path" $depth "$sha1" ||
>   die "$(eval_gettext "Fetched in submodule path 
> '\$displaypath', but it did not contain \$sha1. Direct fetching of that 
> commit failed.")"

Should this error message be changed?

Thanks,
Jonathan

[PATCH] git-submodule.sh: try harder to fetch a submodule

This is the logical continuum of fb43e31f2b4 (submodule: try harder to
fetch needed sha1 by direct fetching sha1, 2016-02-23) and fixes it as
some assumptions were not correct.

> If $sha1 was not part of the default fetch ... fail ourselves here
assumes that the fetch_in_submodule only fails when the serverside does
not support fetching by sha1.

There are other failures, why such a fetch may fail, such as
fatal: Couldn't find remote ref HEAD
which can happen if the remote side doesn't advertise HEAD. Not advertising
HEAD is allowed by the protocol spec and would happen, if HEAD points at a
ref, that this user cannot see (due to ACLs for example).

So do try even harder for a submodule by ignoring the exit code of the
first fetch and rather relying on the following is_tip_reachable to
see if we try fetching again.

Signed-off-by: Stefan Beller 
---
 git-submodule.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 24914963ca2..13b378a6c8f 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -614,7 +614,6 @@ cmd_update()
# is not reachable from a ref.
is_tip_reachable "$sm_path" "$sha1" ||
fetch_in_submodule "$sm_path" $depth ||
-   die "$(eval_gettext "Unable to fetch in 
submodule path '\$displaypath'")"
 
# Now we tried the usual fetch, but $sha1 may
# not be reachable from any of the refs
-- 
2.17.0.582.gccdcbd54c44.dirty

Re: [PATCH 7/9] completion: drop the hard coded list of config vars

On Thu, May 10, 2018 at 10:19 AM Nguyễn Thái Ngọc Duy 
wrote:
> The new help option --config-for-completion is a machine friendlier
> version of --config where all the placeholders and wildcards are
> dropped, leaving only the good, completable prefixes for
> git-completion.bash to consume.

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/builtin/help.c b/builtin/help.c
> @@ -446,9 +447,13 @@ int cmd_help(int argc, const char **argv, const char
*prefix)
> +   int for_human = show_config == 1;
> +
> +   if (for_human)
> +   setup_pager();
> +   list_config_help(for_human);
> +   if (for_human)
> +   printf("\n%s\n", _("'git help config' for more
information"));

Simpler to read, perhaps:

 if (!for_human)
 list_config_help(0);
 else {
 setup_pager();
 list_config_help(1);
 printf(...);
 }

Re: [PATCH 6/9] am: move advice.amWorkDir parsing back to advice.c

On Thu, May 10, 2018 at 10:19 AM Nguyễn Thái Ngọc Duy 
wrote:
> The only benefit from this move (apart from cleaner code) is that
> advice.amWorkDir should now show up in `git help --config`. There
> should be no regression since advice config is always read by the
> git_default_config().

> While at there, use advise() like other code. We now get "hint: "
> prefix and the output is stderr instead of stdout (which is also the
> reason for the test update because stderr is checked in a following
> test and the extra advice can fail it).

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/builtin/am.c b/builtin/am.c
> @@ -1827,15 +1827,12 @@ static void am_run(struct am_state *state, int
resume)
>  if (apply_status) {
> -   int advice_amworkdir = 1;


Nit: Maybe also delete the blank line following the removed declaration...

>  printf_ln(_("Patch failed at %s %.*s"),
msgnum(state),
>  linelen(state->msg), state->msg);

Re: [PATCH 4/9] help: add --config to list all available config

On Thu, May 10, 2018 at 10:20 AM Nguyễn Thái Ngọc Duy 
wrote:
> Sometimes it helps to list all available config vars so the user can
> search for something they want. The config man page can also be used
> but it's harder to search if you want to focus on the variable name,
> for example.

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/builtin/help.c b/builtin/help.c
> @@ -44,6 +45,7 @@ static struct option builtin_help_options[] = {
>  OPT_BOOL('g', "guides", _guides, N_("print list of useful
guides")),
> +   OPT_BOOL('c', "config", _config, N_("print list recognized
config variables")),

s/list/& of/

Though, simpler might be better: "print all configuration variable names"

> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> @@ -76,6 +76,24 @@ print_command_list () {
> +print_config_list() {
> +   cat < +static const char *config_name_list[] = {
> +EOF
> +   grep '^[a-zA-Z].*\..*::$' Documentation/config.txt |
> +   grep -v deprecated |
> +   sed 's/::$//; s/,  */\n/g' |

Nit: "grep -v" and "sed" could be combined:

 sed '/deprecated/d; s/::$//; s/,  */\n/g' |

> +   sort |
> +   while read line
> +   do
> +   echo "  \"$line\","
> +   done
> +   cat < +   NULL,
> +};
> +EOF
> diff --git a/help.c b/help.c
> @@ -409,6 +409,54 @@ void list_common_guides_help(void)
> +void list_config_help(void)
> +{
> +   [...]
> +   for (p = config_name_list; *p; p++) {
> +   const char *var = *p;
> +
> +   for (e = slot_expansions; e->prefix; e++) {
> +   struct strbuf sb = STRBUF_INIT;
> +
> +   strbuf_addf(, "%s.%s", e->prefix,
e->placeholder);
> +   if (strcasecmp(var, sb.buf))
> +   continue;

Isn't this "continue" leaking the strbuf? It seems that it would be easier
to declare the strbuf once outside the loop, strbuf_reset() it at the top
of the loop, and finally strbuf_release() it after the loop exits.

> +   e->fn(e->prefix);
> +   strbuf_release();
> +   e->found++;
> +   break;
> +   }
> +   if (!e->prefix)
> +   puts(var);
> +   }

Re: [PATCH 3/9] fsck: factor out msg_id_info[] lazy initialization code

On Thu, May 10, 2018 at 10:19 AM Nguyễn Thái Ngọc Duy 
wrote:
> This array will be used by some other function than parse_msg_id() in
> the following commit. Factor out this prep code so it could be called
> from that one.

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/fsck.c b/fsck.c
> @@ -84,26 +84,32 @@ static struct {
> -static int parse_msg_id(const char *text)
> +static void prepare_msg_ids(void)
>   {
> -   if (!msg_id_info[0].downcased) {
> -   /* convert id_string to lower case, without underscores.
*/
> -   for (i = 0; i < FSCK_MSG_MAX; i++) {
> -   [...]
> -   }
> +   /* convert id_string to lower case, without underscores. */
> +   for (i = 0; i < FSCK_MSG_MAX; i++) {
> +   [...]
>  }
> +}
> +
> +static int parse_msg_id(const char *text)
> +{
> +   if (!msg_id_info[0].downcased)
> +   prepare_msg_ids();

If you move the "if (!msg_id_info...)" conditional into the new
parpare_msg_ids() function, then it becomes self-contained; it takes care
of avoiding double-initialization so callers don't have to worry or know
about it. (Doing so would also make the diff less noisy.)

Not at all worth a re-roll.

Re: Implementing reftable in Git

2018-05-11 Thread David Turner

On Fri, 2018-05-11 at 11:31 +0200, Michael Haggerty wrote:
> On Wed, May 9, 2018 at 4:33 PM, Christian Couder
>  wrote:
> > I might start working on implementing reftable in Git soon.
> > [...]
> 
> Nice. It'll be great to have a reftable implementation in git core
> (and ideally libgit2, as well). It seems to me that it could someday
> become the new default reference storage method. The file format is
> considerably more complicated than the current loose/packed scheme,
> which is definitely a disadvantage (for example, for other Git
> implementations). But implementing it *with good performance and
> without races* might be no more complicated than the current scheme.

I am somewhat concerned about perf, because as I recall, we have a
bunch of code which effectively load all refs, which will be more
expensive with reftable than packed-refs (though maybe cheaper than
loose refs).  But maybe we have eliminated this code or can work around
it.

> Testing will be important. There are already many tests specifically
> about testing loose/packed reference storage. These will always have
> to run against repositories that are forced to use that reference
> scheme. And there will need to be new tests specifically about the
> reftable scheme. Both classes of tests should be run every time. That
> much is pretty obvious.
> 
> But currently, there are a lot of tests that assume the loose/packed
> reference format on disk even though the tests are not really related
> to references at all. ISTM that these should be converted to work at
> a
> higher level, for example using `for-each-ref`, `rev-parse`, etc. to
> examine references rather than reading reference files directly. That
> way the tests should run correctly regardless of which scheme is in
> use.

I agree with that, and I think some of my patches from years ago
attempted to do that.  I probably should have broken those out into a
separate series so that they could have been applied separately.

> And since it's too expensive to run the whole test suite with both
> reference storage schemes, it seems to me that the reference storage
> scheme that is used while running the scheme-neutral tests should be
> easy to choose at runtime.

I ran the whole suite with both schemes during my testing, and I think
it was quite valuable in flushing out bugs.

> David Turner did some analogous work for wiring up and testing his
> proposed LMDB ref storage backend that might be useful [1]. I'm CCing
> him, since he might have thoughts on this topic.

Inline, above.

Re: [PATCH] doc: fix config API documentation about config_with_options

2018-05-11 Thread Antonio Ospite

On Wed, 9 May 2018 10:19:50 -0700
Brandon Williams  wrote:

> On 05/09, Antonio Ospite wrote:
> > In commit dc8441fdb ("config: don't implicitly use gitdir or commondir",
> > 2017-06-14) the function git_config_with_options was renamed to
> > config_with_options to better reflect the fact that it does not access
> > the git global config or the repo config by default.
> > 
> > However Documentation/technical/api-config.txt still refers to the
> > previous name, fix that.
> > 
> > While at it also update the documentation about the extra parameters,
> > because they too changed since the initial definition.
> > 
> > Signed-off-by: Antonio Ospite 
> > ---
> > 
> > Patch based on the maint branch.
> 
> Thanks for updating the docs.  Maybe one day we can migrate these docs
> to the source files themselves, making it easier to keep up to date.
> For now this is good :)
> 

Should I resend the patch to gits...@pobox.com with your Acked-by?

Thanks,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?

[PATCH v2 05/12] commit-graph: verify fanout and lookup table

While running 'git commit-graph verify', verify that the object IDs
are listed in lexicographic order and that the fanout table correctly
navigates into that list of object IDs.

Add tests that check these corruptions are caught by the verify
subcommand. Most of the tests check the full output matches the exact
error we inserted, but since our OID order test triggers incorrect
fanout values (with possibly different numbers of output lines) we
focus only that the correct error is written in that case.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 36 
 t/t5318-commit-graph.sh | 31 +++
 2 files changed, 67 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 4dfff7e752..b0fd1d5320 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -848,6 +848,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct commit_graph *g)
 {
+   uint32_t i, cur_fanout_pos = 0;
+   struct object_id prev_oid, cur_oid;
+
if (!g) {
graph_report("no commit-graph file loaded");
return 1;
@@ -862,5 +865,38 @@ int verify_commit_graph(struct commit_graph *g)
if (!g->chunk_commit_data)
graph_report("commit-graph is missing the Commit Data chunk");
 
+   if (verify_commit_graph_error)
+   return 1;
+
+   for (i = 0; i < g->num_commits; i++) {
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   if (i && oidcmp(_oid, _oid) >= 0)
+   graph_report("commit-graph has incorrect oid order: %s 
then %s",
+oid_to_hex(_oid),
+oid_to_hex(_oid));
+
+   oidcpy(_oid, _oid);
+
+   while (cur_oid.hash[0] > cur_fanout_pos) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+   if (i != fanout_value)
+   graph_report("commit-graph has incorrect fanout 
value: fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+   }
+
+   while (cur_fanout_pos < 256) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+
+   if (g->num_commits != fanout_value)
+   graph_report("commit-graph has incorrect fanout value: 
fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0cb88232fa..6fb306b0da 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -293,4 +293,35 @@ test_expect_success 'detect too small chunk-count' '
grep "missing the Commit Data chunk" verify-errors
 '
 
+test_expect_success 'detect incorrect chunk lookup value' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 25 "\01" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep -v "^\+" err > verify-errors &&
+   test_line_count = 1 verify-errors &&
+   grep "improper chunk offset" verify-errors
+'
+
+test_expect_success 'detect incorrect fanout' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 128 "\01" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep -v "^\+" err > verify-errors &&
+   test_line_count = 1 verify-errors &&
+   grep "fanout value" verify-errors
+'
+
+test_expect_success 'detect incorrect OID order' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 1272 "\01" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep "incorrect oid order" err
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6

[PATCH v2 04/12] commit-graph: parse commit from chosen graph

Before verifying a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d2db20e49a..4dfff7e752 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -309,7 +309,7 @@ static int find_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
}
 }
 
-int parse_commit_in_graph(struct commit *item)
+int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item)
 {
uint32_t pos;
 
@@ -317,9 +317,21 @@ int parse_commit_in_graph(struct commit *item)
return 0;
if (item->object.parsed)
return 1;
+
+   if (find_commit_in_graph(item, g, ))
+   return fill_commit_in_graph(item, g, pos);
+
+   return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+   if (!core_commit_graph)
+   return 0;
+
prepare_commit_graph();
-   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
-   return fill_commit_in_graph(item, commit_graph, pos);
+   if (commit_graph)
+   return parse_commit_in_graph_one(commit_graph, item);
return 0;
 }
 
-- 
2.16.2.329.gfb62395de6

[PATCH v2 12/12] commit-graph: update design document

The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Also remove the section on lazy-loading trees, as that was completed
in an earlier patch series.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 22 --
 1 file changed, 22 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index e1a883eb46..c664acbd76 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
 - 'log --topo-order'
 - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.16.2.329.gfb62395de6

[PATCH v2 09/12] fsck: verify commit-graph

If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-fsck.txt |  3 +++
 builtin/fsck.c | 21 +
 t/t5318-commit-graph.sh| 21 ++---
 3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or 
other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 -
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index ef78c6c00c..a6d5045b77 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -16,6 +16,7 @@
 #include "streaming.h"
 #include "decorate.h"
 #include "packfile.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN  0x0002
@@ -45,6 +46,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -815,5 +817,24 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
 
check_connectivity();
+
+   if (core_commit_graph) {
+   struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+   const char *verify_argv[] = { "commit-graph", "verify", NULL, 
NULL, NULL, NULL };
+   commit_graph_verify.argv = verify_argv;
+   commit_graph_verify.git_cmd = 1;
+
+   if (run_command(_graph_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+
+   prepare_alt_odb();
+   for (alt = alt_odb_list; alt; alt = alt->next) {
+   verify_argv[2] = "--object-dir";
+   verify_argv[3] = alt->path;
+   if (run_command(_graph_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+   }
+   }
+
return errors_found;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5ab268a024..91c8406d97 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -205,6 +205,16 @@ test_expect_success 'build graph from commits with append' 
'
 graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph using --reachable' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git commit-graph write --reachable &&
+   test_path_is_file $objdir/info/commit-graph &&
+   graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
cd "$TRASH_DIRECTORY" &&
git clone --bare --no-local full bare &&
@@ -335,7 +345,7 @@ test_expect_success 'detect OID not in object database' '
cd "$TRASH_DIRECTORY/full" &&
cp $objdir/info/commit-graph commit-graph-backup &&
test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
-   corrupt_data $objdir/info/commit-graph 1134 "\01" &&
+   corrupt_data $objdir/info/commit-graph 1134 "\00" &&
test_must_fail git commit-graph verify 2>err &&
grep -v "^\+" err > verify-errors &&
test_line_count = 3 verify-errors &&
@@ -348,7 +358,7 @@ test_expect_success 'detect incorrect tree OID' '
cd "$TRASH_DIRECTORY/full" &&
cp $objdir/info/commit-graph commit-graph-backup &&
test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
-   corrupt_data $objdir/info/commit-graph 1312 "\01" &&
+   corrupt_data $objdir/info/commit-graph 1312 "\00" &&
test_must_fail git commit-graph verify 2>err &&
grep -v "^\+" err > verify-errors &&
test_line_count = 1 verify-errors &&
@@ -382,10 +392,15 @@ test_expect_success 'detect incorrect commit date and 
generation number' '
cp $objdir/info/commit-graph commit-graph-backup &&
test_when_finished

[PATCH v2 06/12] commit: force commit to parse from object database

In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit.c | 13 +
 commit.h |  1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 1d28677dfb..7c92350373 100644
--- a/commit.c
+++ b/commit.c
@@ -392,7 +392,7 @@ int parse_commit_buffer(struct commit *item, const void 
*buffer, unsigned long s
return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -403,17 +403,17 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return -1;
if (item->object.parsed)
return 0;
-   if (parse_commit_in_graph(item))
+   if (use_commit_graph && parse_commit_in_graph(item))
return 0;
buffer = read_sha1_file(item->object.oid.hash, , );
if (!buffer)
return quiet_on_missing ? -1 :
error("Could not read %s",
-oid_to_hex(>object.oid));
+   oid_to_hex(>object.oid));
if (type != OBJ_COMMIT) {
free(buffer);
return error("Object %s not a commit",
-oid_to_hex(>object.oid));
+   oid_to_hex(>object.oid));
}
ret = parse_commit_buffer(item, buffer, size, 0);
if (save_commit_buffer && !ret) {
@@ -424,6 +424,11 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+   return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
if (parse_commit(item))
diff --git a/commit.h b/commit.h
index b5afde1ae9..5fde74fcd7 100644
--- a/commit.h
+++ b/commit.h
@@ -73,6 +73,7 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.16.2.329.gfb62395de6

[PATCH v2 07/12] commit-graph: load a root tree from specific graph

When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b0fd1d5320..5bb93e533c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -357,14 +357,20 @@ static struct tree *load_tree_for_commit(struct 
commit_graph *g, struct commit *
return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+const struct commit *c)
 {
if (c->maybe_tree)
return c->maybe_tree;
if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
BUG("get_commit_tree_in_graph called from non-commit-graph 
commit");
 
-   return load_tree_for_commit(commit_graph, (struct commit *)c);
+   return load_tree_for_commit(g, (struct commit *)c);
+}
+
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+   return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.16.2.329.gfb62395de6

[PATCH v2 11/12] gc: automatically write commit-graph files

The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, write the commit-graph file
by default during standard garbage collection operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to turn this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt | 6 ++
 Documentation/git-gc.txt | 4 
 builtin/gc.c | 8 
 3 files changed, 18 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 11f027194e..9a3abd87e7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1553,6 +1553,12 @@ gc.autoDetach::
Make `git gc --auto` return immediately and run in background
if the system supports it. Default is true.
 
+gc.commitGraph::
+   If true, then gc will rewrite the commit-graph file after any
+   change to the object database. If '--auto' is used, then the
+   commit-graph will not be updated unless the threshold is met.
+   See linkgit:git-commit-graph[1] for details.
+
 gc.logExpiry::
If the file gc.log exists, then `git gc --auto` won't run
unless that file is more than 'gc.logExpiry' old.  Default is
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 571b5a7e3c..17dd654a59 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -119,6 +119,10 @@ The optional configuration variable `gc.packRefs` 
determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable 'gc.commitGraph' determines if
+'git gc' runs 'git commit-graph write'. This can be set to a boolean
+value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index 77fa720bd0..8403445738 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -34,6 +34,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_commit_graph = 0;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -46,6 +47,7 @@ static struct argv_array repack = ARGV_ARRAY_INIT;
 static struct argv_array prune = ARGV_ARRAY_INIT;
 static struct argv_array prune_worktrees = ARGV_ARRAY_INIT;
 static struct argv_array rerere = ARGV_ARRAY_INIT;
+static struct argv_array commit_graph = ARGV_ARRAY_INIT;
 
 static struct tempfile *pidfile;
 static struct lock_file log_lock;
@@ -121,6 +123,7 @@ static void gc_config(void)
git_config_get_int("gc.aggressivedepth", _depth);
git_config_get_int("gc.auto", _auto_threshold);
git_config_get_int("gc.autopacklimit", _auto_pack_limit);
+   git_config_get_bool("gc.commitgraph", _commit_graph);
git_config_get_bool("gc.autodetach", _auto);
git_config_get_expiry("gc.pruneexpire", _expire);
git_config_get_expiry("gc.worktreepruneexpire", 
_worktrees_expire);
@@ -374,6 +377,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
argv_array_pushl(, "prune", "--expire", NULL);
argv_array_pushl(_worktrees, "worktree", "prune", "--expire", 
NULL);
argv_array_pushl(, "rerere", "gc", NULL);
+   argv_array_pushl(_graph, "commit-graph", "write", "--reachable", 
NULL);
 
/* default expiry time, overwritten in gc_config */
gc_config();
@@ -480,6 +484,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (pack_garbage.nr > 0)
clean_pack_garbage();
 
+   if (gc_commit_graph)
+   if (run_command_v_opt(commit_graph.argv, RUN_GIT_CMD))
+   return error(FAILED_RUN, commit_graph.argv[0]);
+
if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
"run 'git prune' to remove them."));
-- 
2.16.2.329.gfb62395de6

[PATCH v2 10/12] commit-graph: add '--reachable' option

When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc' which will call
'git commit-graph write --reachable' after performing cleanup of the
object database.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  8 ++--
 builtin/commit-graph.c | 41 ++
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index a222cfab08..cc1715a823 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in 
packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with --stdin-commits or --reachable.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+--stdin-packs or --reachable.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with --stdin-commits
+or --stind-packs.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index af3101291f..7cb94a4813 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,13 +3,14 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "refs.h"
 #include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph verify [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -24,12 +25,13 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
+   int reachable;
int stdin_packs;
int stdin_commits;
int append;
@@ -113,6 +115,25 @@ static int graph_read(int argc, const char **argv)
return 0;
 }
 
+struct hex_list {
+   char **hex_strs;
+   int hex_nr;
+   int hex_alloc;
+};
+
+static int add_ref_to_list(const char *refname,
+  const struct object_id *oid,
+  int flags, void *cb_data)
+{
+   struct hex_list *list = (struct hex_list*)cb_data;
+
+   ALLOC_GROW(list->hex_strs, list->hex_nr + 1, list->hex_alloc);
+   list->hex_strs[list->hex_nr] = xcalloc(GIT_MAX_HEXSZ + 1, 1);
+   strcpy(list->hex_strs[list->hex_nr], oid_to_hex(oid));
+   list->hex_nr++;
+   return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
const char **pack_indexes = NULL;
@@ -127,6 +148,8 @@ static int graph_write(int argc, const char **argv)
OPT_STRING(0, "object-dir", _dir,
N_("dir"),
N_("The object directory to store the graph")),
+   OPT_BOOL(0, "reachable", ,
+   N_("start walk at all refs")),
OPT_BOOL(0, "stdin-packs", _packs,
N_("scan pack-indexes listed by stdin for commits")),
OPT_BOOL(0, "stdin-commits", _commits,
@@ -140,8 +163,8 @@ static int graph_write(int argc, const char **argv)
 builtin_commit_graph_write_options,
 builtin_commit_graph_write_usage, 0);
 
-   if (opts.stdin_packs && opts.stdin_commits)
-   die(_("cannot use both --stdin-commits and --stdin-packs"));
+   if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+   die(_("use at most one of --reachable, --stdin-commits, or 
--stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
@@ -164,6 +187,16 @@ static int graph_write(int argc, const char **argv)
commit_hex = lines;
commits_nr = lines_nr;
}
+   } else if (opts.reachable) {
+   struct hex_list list;

[PATCH v2 08/12] commit-graph: verify commit contents against odb

When running 'git commit-graph verify', compare the contents of the
commits that are loaded from the commit-graph file with commits that are
loaded directly from the object database. This includes checking the
root tree object ID, commit date, and parents.

Parse the commit from the graph during the initial loop through the
object IDs to guarantee we parse from the commit-graph file.

In addition, verify the generation number calculation is correct for all
commits in the commit-graph file.

While testing, we discovered that mutating the integer value for a
parent to be outside the accepted range causes a segmentation fault. Add
a new check in insert_parent_or_die() that prevents this fault. Check
for that error during the test, both in the typical parents and in the
list of parents for octopus merges.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 100 
 t/t5318-commit-graph.sh |  64 +++
 2 files changed, 164 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 5bb93e533c..a15ad9710d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -237,6 +237,10 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
 {
struct commit *c;
struct object_id oid;
+
+   if (pos >= g->num_commits)
+   die("invalide parent position %"PRIu64, pos);
+
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit();
if (!c)
@@ -875,6 +879,8 @@ int verify_commit_graph(struct commit_graph *g)
return 1;
 
for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit;
+
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
if (i && oidcmp(_oid, _oid) >= 0)
@@ -892,6 +898,10 @@ int verify_commit_graph(struct commit_graph *g)
 
cur_fanout_pos++;
}
+
+   graph_commit = lookup_commit(_oid);
+   if (!parse_commit_in_graph_one(g, graph_commit))
+   graph_report("failed to parse %s from commit-graph", 
oid_to_hex(_oid));
}
 
while (cur_fanout_pos < 256) {
@@ -904,5 +914,95 @@ int verify_commit_graph(struct commit_graph *g)
cur_fanout_pos++;
}
 
+   if (verify_commit_graph_error)
+   return 1;
+
+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit, *odb_commit;
+   struct commit_list *graph_parents, *odb_parents;
+   int num_parents = 0;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   graph_commit = lookup_commit(_oid);
+   odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
+   if (parse_commit_internal(odb_commit, 0, 0)) {
+   graph_report("failed to parse %s from object database", 
oid_to_hex(_oid));
+   continue;
+   }
+
+   if (oidcmp(_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+  get_commit_tree_oid(odb_commit)))
+   graph_report("root tree object ID for commit %s in 
commit-graph is %s != %s",
+oid_to_hex(_oid),
+
oid_to_hex(get_commit_tree_oid(graph_commit)),
+
oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+   if (graph_commit->date != odb_commit->date)
+   graph_report("commit date for commit %s in commit-graph 
is %"PRItime" != %"PRItime"",
+oid_to_hex(_oid),
+graph_commit->date,
+odb_commit->date);
+
+
+   graph_parents = graph_commit->parents;
+   odb_parents = odb_commit->parents;
+
+   while (graph_parents) {
+   num_parents++;
+
+   if (odb_parents == NULL)
+   graph_report("commit-graph parent list for 
commit %s is too long (%d)",
+oid_to_hex(_oid),
+num_parents);
+
+   if (oidcmp(_parents->item->object.oid, 
_parents->item->object.oid))
+   graph_report("commit-graph parent for %s is %s 
!= %s",
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid),
+
oid_to_hex(_parents->item->object.oid));
+
+   graph_parents = graph_parents->next;
+   odb_parents = odb_parents->next;
+   }
+
+   if (odb_parents != NULL)
+

[PATCH v2 03/12] commit-graph: test that 'verify' finds corruption

Add test cases to t5318-commit-graph.sh that corrupt the commit-graph
file and check that the 'git commit-graph verify' command fails. These
tests verify the header and chunk information is checked carefully.

Helped-by: Martin Ågren 
Signed-off-by: Derrick Stolee 
---
 t/t5318-commit-graph.sh | 53 +
 1 file changed, 53 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 6ca451dfd2..0cb88232fa 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -240,4 +240,57 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
 '
 
+# usage: corrupt_data   []
+corrupt_data() {
+   file=$1
+   pos=$2
+   data="${3:-\0}"
+   printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc
+}
+
+test_expect_success 'detect bad signature' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 0 "\0" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep -v "^\+" err > verify-errors &&
+   test_line_count = 1 verify-errors &&
+   grep "graph signature" verify-errors
+'
+
+test_expect_success 'detect bad version number' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 4 "\02" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep -v "^\+" err > verify-errors &&
+   test_line_count = 1 verify-errors &&
+   grep "graph version" verify-errors
+'
+
+test_expect_success 'detect bad hash version' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 5 "\02" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep -v "^\+" err > verify-errors &&
+   test_line_count = 1 verify-errors &&
+   grep "hash version" verify-errors
+'
+
+test_expect_success 'detect too small chunk-count' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 6 "\01" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep -v "^\+" err > verify-errors &&
+   test_line_count = 2 verify-errors &&
+   grep "missing the OID Lookup chunk" verify-errors &&
+   grep "missing the Commit Data chunk" verify-errors
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6

[PATCH v2 02/12] commit-graph: verify file header information

During a run of 'git commit-graph verify', list the issues with the
header information in the commit-graph file. Some of this information
is inferred from the loaded 'struct commit_graph'. Some header
information is checked as part of load_commit_graph_one().

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index b25aaed128..d2db20e49a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -818,7 +818,37 @@ void write_commit_graph(const char *obj_dir,
oids.nr = 0;
 }
 
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+   va_list ap;
+   struct strbuf sb = STRBUF_INIT;
+   verify_commit_graph_error = 1;
+
+   va_start(ap, fmt);
+   strbuf_vaddf(, fmt, ap);
+
+   fprintf(stderr, "%s\n", sb.buf);
+   strbuf_release();
+   va_end(ap);
+}
+
 int verify_commit_graph(struct commit_graph *g)
 {
-   return !g;
+   if (!g) {
+   graph_report("no commit-graph file loaded");
+   return 1;
+   }
+
+   verify_commit_graph_error = 0;
+
+   if (!g->chunk_oid_fanout)
+   graph_report("commit-graph is missing the OID Fanout chunk");
+   if (!g->chunk_oid_lookup)
+   graph_report("commit-graph is missing the OID Lookup chunk");
+   if (!g->chunk_commit_data)
+   graph_report("commit-graph is missing the Commit Data chunk");
+
+   return verify_commit_graph_error;
 }
-- 
2.16.2.329.gfb62395de6

[PATCH v2 01/12] commit-graph: add 'verify' subcommand

If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

During review, we noticed that a FREE_AND_NULL(graph_name) was
placed after a possible 'return', and this pattern was also in
graph_read(). Fix that case, too.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  6 ++
 builtin/commit-graph.c | 40 +-
 commit-graph.c |  5 +
 commit-graph.h |  2 ++
 t/t5318-commit-graph.sh| 10 ++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 4c97b555cc..a222cfab08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 
 [verse]
 'git commit-graph read' [--object-dir ]
+'git commit-graph verify' [--object-dir ]
 'git commit-graph write'  [--object-dir ]
 
 
@@ -52,6 +53,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..af3101291f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,10 +8,16 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
+   N_("git commit-graph verify [--object-dir ]"),
N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
+static const char * const builtin_commit_graph_verify_usage[] = {
+   N_("git commit-graph verify [--object-dir ]"),
+   NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
N_("git commit-graph read [--object-dir ]"),
NULL
@@ -29,6 +35,36 @@ static struct opts_commit_graph {
int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+   struct commit_graph *graph = 0;
+   char *graph_name;
+
+   static struct option builtin_commit_graph_verify_options[] = {
+   OPT_STRING(0, "object-dir", _dir,
+  N_("dir"),
+  N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_verify_options,
+builtin_commit_graph_verify_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   graph_name = get_commit_graph_filename(opts.obj_dir);
+   graph = load_commit_graph_one(graph_name);
+   FREE_AND_NULL(graph_name);
+
+   if (!graph)
+   return 0;
+
+   return verify_commit_graph(graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
struct commit_graph *graph = NULL;
@@ -50,10 +86,10 @@ static int graph_read(int argc, const char **argv)
 
graph_name = get_commit_graph_filename(opts.obj_dir);
graph = load_commit_graph_one(graph_name);
+   FREE_AND_NULL(graph_name);
 
if (!graph)
die("graph file %s does not exist", graph_name);
-   FREE_AND_NULL(graph_name);
 
printf("header: %08x %d %d %d %d\n",
ntohl(*(uint32_t*)graph->data),
@@ -160,6 +196,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
 PARSE_OPT_STOP_AT_NON_OPTION);
 
if (argc > 0) {
+   if (!strcmp(argv[0], "verify"))
+   return graph_verify(argc, argv);
if (!strcmp(argv[0], "read"))
return graph_read(argc, argv);
if (!strcmp(argv[0], "write"))
diff --git a/commit-graph.c b/commit-graph.c
index a8c337dd77..b25aaed128 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -817,3 +817,8 @@ void write_commit_graph(const char *obj_dir,
oids.alloc = 0;
oids.nr = 0;
 }
+
+int verify_commit_graph(struct commit_graph *g)
+{
+   return !g;
+}
diff --git a/commit-graph.h b/commit-graph.h
index

[PATCH v2 00/12] Integrate commit-graph into fsck and gc

I'm sending this v2 re-roll rather quickly after the previous version
because Martin provided a framework to add tests to the 'verify'
subcommand. I took that framework and added tests for the other checks
of the commit-graph data. This also found some interesting things about
the command:

1. There were some segfaults because we were not checking for bad data
   carefully enough.

2. To avoid segfaults, we will now terminate the check early if we find
   problems earlier in the file, such as in the header, or OID lookup.

3. We were not writing newlines between reports. This now happens by
   default in graph_report().

The integration into 'fetch' is dropped (thanks Ævar!).

Derrick Stolee (12):
  commit-graph: add 'verify' subcommand
  commit-graph: verify file header information
  commit-graph: test that 'verify' finds corruption
  commit-graph: parse commit from chosen graph
  commit-graph: verify fanout and lookup table
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: verify commit contents against odb
  fsck: verify commit-graph
  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  commit-graph: update design document

 Documentation/config.txt |   6 +
 Documentation/git-commit-graph.txt   |  14 ++-
 Documentation/git-fsck.txt   |   3 +
 Documentation/git-gc.txt |   4 +
 Documentation/technical/commit-graph.txt |  22 
 builtin/commit-graph.c   |  81 -
 builtin/fsck.c   |  21 
 builtin/gc.c |   8 ++
 commit-graph.c   | 199 ++-
 commit-graph.h   |   2 +
 commit.c |  13 +-
 commit.h |   1 +
 t/t5318-commit-graph.sh  | 173 +++
 13 files changed, 509 insertions(+), 38 deletions(-)


base-commit: 34fdd433396ee0e3ef4de02eb2189f8226eafe4e
-- 
2.16.2.329.gfb62395de6

Re: [PATCH] alloc: allow arbitrary repositories for alloc functions

On Fri, May 11, 2018 at 3:17 PM Stefan Beller  wrote:
> diff --git a/commit.c b/commit.c
> @@ -296,6 +297,17 @@ void free_commit_buffer(struct commit *commit)
> +void relase_commit_memory(struct commit *c)

s/relase/release/

RE: [Best Practices Request] clean/smudge configuration

2018-05-11 Thread Randall S. Becker



On May 10, 2018 10:27 PM, Junio C Hamano wrote:
> "Randall S. Becker"  writes:
> 
> > What if we create a ../.gitconfig like ../.gitattributes, that is
> > loaded before .git/config?
> 
> You should not forget one of the two reasons why clean/smudge/diff etc.
> drivers must be given with a step of redirection, split between attributes and
> config.  "This path holds text from ancient macs" at the logical level may be
> expressed in .gitattributes, and then "when checking out text from ancient
> macs, process the blob data with this program to turn it into a form suitable
> for working tree" is given as a smudge filter program, that is (1) suitable 
> for
> the platform _you_ as the writer of the config file is using *AND* (2)
> something you are confortable running on behalf of you.
> 
> We *deliberately* lack a mechanism to pay attention to in-tree config that
> gets propagated to those who get them via "git clone", exactly because
> otherwise your upstream may not just specify a "smudge" program that your
> platform would be unable to run, but can specify a "smudge" program that
> pretends to be a smudger, but is actually a program that installs a keylogger
> and posts your login password and bank details to this mailing list ;-)
> 
> So, no, I do not think in-tree configuration will fly.

I agree, which is why I asked the original question instead of posting it as a 
formal patch. We would probably get a brand new CVE implementing the proposed 
proto-changes even if the original intent was internal trusted repositories 
only. That's why I asked this as a "Best Practices" type question, which I 
think I have a better idea on now. The actual situation is rather cool from a 
DevOps point of view, and whatever the ultimate solution is, might make for a 
nice presentation at some future conference .

Cheers and thanks,
Randall

[PATCH] alloc: allow arbitrary repositories for alloc functions

We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.

We record all memory allocation in alloc.c, and free them in
clear_alloc_state, which is called for all repositories except
the_repository.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---

> This might seem a bit bikesheddy, but I wouldn't call it
> free_tag_buffer(), since what's being freed is not the buffer of the
> object itself, but just a string. If you want such a function, I would
> just call it release_tag_memory() to match release_commit_memory().

So you would replace the last commit with a patch like this?

Thanks,
Stefan

Notes:
diff to what is currently queued:
diff --git c/commit.c w/commit.c
index 612ccf7b053..f3a5872c393 100644
--- c/commit.c
+++ w/commit.c
@@ -297,11 +297,15 @@ void free_commit_buffer(struct commit *commit)
}
 }

-void release_commit_memory(struct commit *c)
+void relase_commit_memory(struct commit *c)
 {
+   c->tree = NULL;
+   c->index = 0;
free_commit_buffer(c);
free_commit_list(c->parents);
/* TODO: what about commit->util? */
+
+   c->object.parsed = 0;
 }

 const void *detach_commit_buffer(struct commit *commit, unsigned long 
*sizep)
diff --git c/commit.h w/commit.h
index 2d764ab7d8e..366c151e0cb 100644
--- c/commit.h
+++ w/commit.h
@@ -103,7 +103,7 @@ void free_commit_buffer(struct commit *);
  * Release memory related to a commit, including the parent list and
  * any cached object buffer.
  */
-void release_commit_memory(struct commit *c);
+void relase_commit_memory(struct commit *c);

 /*
  * Disassociate any cached object buffer from the commit, but do not free 
it.
diff --git c/object.c w/object.c
index 9d5b10d5a20..a7d1fd4a20b 100644
--- c/object.c
+++ w/object.c
@@ -526,9 +526,9 @@ void parsed_object_pool_clear(struct parsed_object_pool 
*o)
if (obj->type == OBJ_TREE)
free_tree_buffer((struct tree*)obj);
else if (obj->type == OBJ_COMMIT)
-   release_commit_memory((struct commit*)obj);
+   relase_commit_memory((struct commit*)obj);
else if (obj->type == OBJ_TAG)
-   free_tag_buffer((struct tag*)obj);
+   release_tag_memory((struct tag*)obj);
}

FREE_AND_NULL(o->obj_hash);
diff --git c/tag.c w/tag.c
index 254352c30c6..7c12426b4ea 100644
--- c/tag.c
+++ w/tag.c
@@ -116,9 +116,12 @@ static timestamp_t parse_tag_date(const char *buf, 
const char *tail)
return parse_timestamp(dateptr, NULL, 10);
 }

-void free_tag_buffer(struct tag *t)
+void release_tag_memory(struct tag *t)
 {
free(t->tag);
+   t->tagged = NULL;
+   t->object.parsed = 0;
+   t->date = 0;
 }

 int parse_tag_buffer(struct tag *item, const void *data, unsigned long 
size)
diff --git c/tag.h w/tag.h
index b241fe67bc5..9057d76a506 100644
--- c/tag.h
+++ w/tag.h
@@ -15,7 +15,7 @@ struct tag {
 extern struct tag *lookup_tag(const struct object_id *oid);
 extern int parse_tag_buffer(struct tag *item, const void *data, unsigned 
long size);
 extern int parse_tag(struct tag *item);
-extern void free_tag_buffer(struct tag *t);
+extern void release_tag_memory(struct tag *t);
 extern struct object *deref_tag(struct object *, const char *, int);
 extern struct object *deref_tag_noverify(struct object *);
 extern int gpg_verify_tag(const struct object_id *oid,

 alloc.c   | 65 ++-
 alloc.h   | 19 ++
 blame.c   |  1 +
 blob.c|  1 +
 cache.h   | 16 
 commit.c  | 12 +
 commit.h  |  6 +
 merge-recursive.c |  1 +
 object.c  | 42 --
 object.h  |  8 ++
 tag.c |  9 +++
 tag.h |  1 +
 tree.c|  1 +
 13 files changed, 140 insertions(+), 42 deletions(-)
 create mode 100644 alloc.h

diff --git a/alloc.c b/alloc.c
index 277dadd221b..714df633169 100644
--- a/alloc.c
+++ b/alloc.c
@@ -4,8 +4,7 @@
  * Copyright (C) 2006 Linus Torvalds
  *
  * The standard malloc/free wastes too much space for objects, partly because
- * it maintains all the allocation infrastructure (which isn't needed, since
- * we never free an object descriptor anyway), but even more because it ends
+ * it maintains all the allocation infrastructure, but even more because it 
ends
  * up with maximal alignment because it doesn't know what the object

Re: [PATCH v2 0/4] Fix mem leaks of recent object store conversions.

On Fri, May 11, 2018 at 1:37 AM, Jeff King  wrote:
> On Thu, May 10, 2018 at 12:58:45PM -0700, Stefan Beller wrote:
>
>> This series replaces the two commits that were queued on 
>> sb/object-store-replace,
>> fixing memory leaks that were recently introduced.
>>
>> Compared to v1, I merged the two independent series from yesterday,
>> rewrote the commit message to clear up Junios confusion and addresses Peffs
>> comments for the packfiles as well.
>
> Mostly. :)
>
> My one remaining complaint is that the bitmap code may hold on to a
> dangling pointer to a packed_git after this series.

Ok, I'll look into that.

>
> I think that is part of a larger problem, though, which is that the
> bitmap code's globals need to be part of the struct raw_object_store.
> I think this can already cause problems before your series if we were to
> try to use bitmaps in both a superproject and a submodule in the same
> process, though I think we'd at least hit the "ignoring extra bitmap
> file" code path in open_pack_bitmap_1(). So right now it's an annoyance,
> but after your series it becomes a potential segfault.

Ok, maybe we'll need to convert bitmaps into the object store for that.

Thanks for the pointer,
Stefan

Re: Regression in patch add?

2018-05-11 Thread Phillip Wood

On 11/05/18 03:47, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>> Yes, I think it probably makes sense to do that. Originally I didn't
>> count empty lines as context lines in case the user accidentally added
>> some empty lines at the end of the hunk but if 'git apply' does then I
>> think 'git add -p' should as well
> 
> I am not sure if "adding to the tail" should be tolerated, but in
> any case, newer GNU diff can show an empty unaffected line as an
> empty line (unlike traditional unified context format in which such
> a line is expressed as a line with a lone SP on it), which is
> allowed as "implementation defined" by POSIX.1 [*1*]. Modern "git
> apply" knows about this.

Thanks for the reference, I hadn't realized the space was optional.

> If "add -p" parses a patch, it should learn to do so, too.

I'm about to go off line for a while, I'll send a fix when I'm back up
and running at next month (unfortunately the reroll of pw/add-p-select
will have to wait until then as well)

Best Wishes

Phillip


> 
> [Reference]
> 
> *1* http://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html
> 
>>
>>> Meanwhile, I can easily configure my editor not to do this for `*.diff` 
>>> files.
>>>
>>> Thanks for your help, Phillip and Martin!
>>
>> Thanks for posting an example so we could test it, it makes it much
>> easier to track the problem down
>>
>> Best Wishes
>>
>> Phillip
>>
>>> Mahmoud, does this also explain your problem as per your original post?
>>>

[PATCH 4/4] mark_parents_uninteresting(): avoid most allocation

Commit 941ba8db57 (Eliminate recursion in setting/clearing
marks in commit list, 2012-01-14) used a clever double-loop
to avoid allocations for single-parent chains of history.
However, it did so only when following parents of parents
(which was an uncommon case), and _always_ incurred at least
one allocation to populate the list of pending parents in
the first place.

We can turn this into zero-allocation in the common case by
iterating directly over the initial parent list, and then
following up on any pending items we might have discovered.

Signed-off-by: Jeff King 
---
Again, try "-w" for more readability.

 revision.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/revision.c b/revision.c
index 89ff9a99ce..cbe041128e 100644
--- a/revision.c
+++ b/revision.c
@@ -115,32 +115,38 @@ static void commit_stack_clear(struct commit_stack *stack)
stack->nr = stack->alloc = 0;
 }
 
-void mark_parents_uninteresting(struct commit *commit)
+static void mark_one_parent_uninteresting(struct commit *commit,
+ struct commit_stack *pending)
 {
-   struct commit_stack pending = COMMIT_STACK_INIT;
struct commit_list *l;
 
+   if (commit->object.flags & UNINTERESTING)
+   return;
+   commit->object.flags |= UNINTERESTING;
+
+   /*
+* Normally we haven't parsed the parent
+* yet, so we won't have a parent of a parent
+* here. However, it may turn out that we've
+* reached this commit some other way (where it
+* wasn't uninteresting), in which case we need
+* to mark its parents recursively too..
+*/
for (l = commit->parents; l; l = l->next)
-   commit_stack_push(, l->item);
+   commit_stack_push(pending, l->item);
+}
 
-   while (pending.nr > 0) {
-   struct commit *commit = commit_stack_pop();
+void mark_parents_uninteresting(struct commit *commit)
+{
+   struct commit_stack pending = COMMIT_STACK_INIT;
+   struct commit_list *l;
 
-   if (commit->object.flags & UNINTERESTING)
-   return;
-   commit->object.flags |= UNINTERESTING;
+   for (l = commit->parents; l; l = l->next)
+   mark_one_parent_uninteresting(l->item, );
 
-   /*
-* Normally we haven't parsed the parent
-* yet, so we won't have a parent of a parent
-* here. However, it may turn out that we've
-* reached this commit some other way (where it
-* wasn't uninteresting), in which case we need
-* to mark its parents recursively too..
-*/
-   for (l = commit->parents; l; l = l->next)
-   commit_stack_push(, l->item);
-   }
+   while (pending.nr > 0)
+   mark_one_parent_uninteresting(commit_stack_pop(),
+ );
 
commit_stack_clear();
 }
-- 
2.17.0.988.gec4b43b3e5

[PATCH 3/4] mark_parents_uninteresting(): replace list with stack

The mark_parents_uninteresting() function uses two loops:
an outer one to process our queue of pending parents, and an
inner one to process first-parent chains. This is a clever
optimization from 941ba8db57 (Eliminate recursion in
setting/clearing marks in commit list, 2012-01-14) to limit
the number of linked-list allocations when following
single-parent chains.

Unfortunately, this makes the result a little hard to read.
Let's replace the list with a stack. Then we don't have to
worry about doing this double-loop optimization, as we'll
just reuse the top element of the stack as we pop/push.

Signed-off-by: Jeff King 
---
The diff makes a lot more sense with "-w".

 revision.c | 67 +++---
 1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/revision.c b/revision.c
index cee4f3a4b4..89ff9a99ce 100644
--- a/revision.c
+++ b/revision.c
@@ -92,38 +92,57 @@ void mark_tree_uninteresting(struct tree *tree)
mark_tree_contents_uninteresting(tree);
 }
 
-void mark_parents_uninteresting(struct commit *commit)
+struct commit_stack {
+   struct commit **items;
+   size_t nr, alloc;
+};
+#define COMMIT_STACK_INIT { NULL, 0, 0 }
+
+static void commit_stack_push(struct commit_stack *stack, struct commit 
*commit)
 {
-   struct commit_list *parents = NULL, *l;
+   ALLOC_GROW(stack->items, stack->nr + 1, stack->alloc);
+   stack->items[stack->nr++] = commit;
+}
 
-   for (l = commit->parents; l; l = l->next)
-   commit_list_insert(l->item, );
+static struct commit *commit_stack_pop(struct commit_stack *stack)
+{
+   return stack->nr ? stack->items[--stack->nr] : NULL;
+}
 
-   while (parents) {
-   struct commit *commit = pop_commit();
+static void commit_stack_clear(struct commit_stack *stack)
+{
+   FREE_AND_NULL(stack->items);
+   stack->nr = stack->alloc = 0;
+}
 
-   while (commit) {
-   if (commit->object.flags & UNINTERESTING)
-   break;
+void mark_parents_uninteresting(struct commit *commit)
+{
+   struct commit_stack pending = COMMIT_STACK_INIT;
+   struct commit_list *l;
 
-   commit->object.flags |= UNINTERESTING;
+   for (l = commit->parents; l; l = l->next)
+   commit_stack_push(, l->item);
 
-   /*
-* Normally we haven't parsed the parent
-* yet, so we won't have a parent of a parent
-* here. However, it may turn out that we've
-* reached this commit some other way (where it
-* wasn't uninteresting), in which case we need
-* to mark its parents recursively too..
-*/
-   if (!commit->parents)
-   break;
+   while (pending.nr > 0) {
+   struct commit *commit = commit_stack_pop();
 
-   for (l = commit->parents->next; l; l = l->next)
-   commit_list_insert(l->item, );
-   commit = commit->parents->item;
-   }
+   if (commit->object.flags & UNINTERESTING)
+   return;
+   commit->object.flags |= UNINTERESTING;
+
+   /*
+* Normally we haven't parsed the parent
+* yet, so we won't have a parent of a parent
+* here. However, it may turn out that we've
+* reached this commit some other way (where it
+* wasn't uninteresting), in which case we need
+* to mark its parents recursively too..
+*/
+   for (l = commit->parents; l; l = l->next)
+   commit_stack_push(, l->item);
}
+
+   commit_stack_clear();
 }
 
 static void add_pending_object_with_path(struct rev_info *revs,
-- 
2.17.0.988.gec4b43b3e5

[PATCH 2/4] mark_parents_uninteresting(): drop missing object check

We allow UNINTERESTING objects in a traversal to be
unavailable. As part of this, mark_parents_uninteresting()
checks whether we have a particular uninteresting parent; if
not, we will mark it as "parsed" so that later code skips
it.

This code is redundant and even a little bit harmful, so
let's drop it.

It's redundant because when our parse_object() call in
add_parents_to_list() fails, we already quietly skip
UNINTERESTING parents. This redundancy is a historical
artifact. The mark_parents_uninteresting() protection is
from 454fbbcde3 (git-rev-list: allow missing objects when
the parent is marked UNINTERESTING, 2005-07-10). Much later,
aeeae1b771 (revision traversal: allow UNINTERESTING objects
to be missing, 2009-01-27) covered more cases by making the
actual parse more gentle.

  As an aside, even if this weren't redundant, it would be
  insufficient. The gentle parsing handles both missing and
  corrupted objects, whereas the has_object_file() check
  we're getting rid of covers only missing ones.

And the code we're dropping is harmful for two reasons:

  1. We spend extra time on the object lookup, even though
 we don't actually need the information at this point
 (and will just repeat that lookup later when we parse
 for the common case that we _do_ have the object).

  2. It "lies" about the commit by setting the parsed flag,
 even though we didn't load any useful data into the
 struct. This shouldn't matter for the UNINTERESTING
 case, but we may later clear our flags and do another
 traversal in the same process. While pretty unlikely,
 it's possible that we could then look at the same
 commit without the UNINTERESTING flag, in which case
 we'd produce the wrong result (we'd think it's a commit
 with no parents, when in fact we should probably die
 due to the missing object).

Signed-off-by: Jeff King 
---
 revision.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/revision.c b/revision.c
index ef70f69f08..cee4f3a4b4 100644
--- a/revision.c
+++ b/revision.c
@@ -103,18 +103,6 @@ void mark_parents_uninteresting(struct commit *commit)
struct commit *commit = pop_commit();
 
while (commit) {
-   /*
-* A missing commit is ok iff its parent is marked
-* uninteresting.
-*
-* We just mark such a thing parsed, so that when
-* it is popped next time around, we won't be trying
-* to parse it and get an error.
-*/
-   if (!commit->object.parsed &&
-   !has_object_file(>object.oid))
-   commit->object.parsed = 1;
-
if (commit->object.flags & UNINTERESTING)
break;
 
-- 
2.17.0.988.gec4b43b3e5

[PATCH 1/4] mark_tree_contents_uninteresting(): drop missing object check

It's generally acceptable for UNINTERESTING objects in a
traversal to be unavailable (e.g., see aeeae1b771). When
marking trees UNINTERESTING, we access the object database
twice: once to check if the object is missing (and return
quietly if it is), and then again to actually parse it.

We can instead just try to parse; if that fails, we can then
return quietly. That halves the effort we spend on locating
the object.

Note that this isn't _exactly_ the same as the original
behavior, as the parse failure could be due to other
problems than a missing object: it could be corrupted, in
which case the original code would have died. But the new
behavior is arguably better, as it covers the object being
unavailable for any reason. We'll also still issue a warning
to stderr in such a case.

Signed-off-by: Jeff King 
---
 revision.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index 1cff11833e..ef70f69f08 100644
--- a/revision.c
+++ b/revision.c
@@ -52,12 +52,9 @@ static void mark_tree_contents_uninteresting(struct tree 
*tree)
 {
struct tree_desc desc;
struct name_entry entry;
-   struct object *obj = >object;
 
-   if (!has_object_file(>oid))
+   if (parse_tree_gently(tree, 1) < 0)
return;
-   if (parse_tree(tree) < 0)
-   die("bad tree %s", oid_to_hex(>oid));
 
init_tree_desc(, tree->buffer, tree->size);
while (tree_entry(, )) {
-- 
2.17.0.988.gec4b43b3e5

[PATCH 0/4] a few mark_parents_uninteresting cleanups

This is a follow-up to the discussion from February:

  
https://public-inbox.org/git/1519522496-73090-1-git-send-email-dsto...@microsoft.com/

There I theorized that some of these extra has_object_file() checks were
unnecessary. After poking around a bit, I've convinced myself that this
is the case, so here are some patches.

After Stolee's fix in ebbed3ba04 (revision.c: reduce object database
queries, 2018-02-24), I doubt this will provide any noticeable speedup.
IMHO the value is just in simplifying the logic.

The first two patches are the actual has_object_file simplifications.
The second two are an attempt to fix some head-scratching I had while
reading mark_parents_uninteresting(). I hope the result is easier to
follow, but I may have just shuffled one confusion for another. :)

  [1/4]: mark_tree_contents_uninteresting(): drop missing object check
  [2/4]: mark_parents_uninteresting(): drop missing object check
  [3/4]: mark_parents_uninteresting(): replace list with stack
  [4/4]: mark_parents_uninteresting(): avoid most allocation

 revision.c | 90 ++
 1 file changed, 50 insertions(+), 40 deletions(-)

-Peff

Re: Patch add: previous hunk across file boundaries

On Fri, May 11, 2018 at 09:53:52AM -0500, Robert Dailey wrote:

> I noticed that when stepping into a new file while doing `git add -p`,
> pressing `k` or `K` does not go back to the previous file. Is this a
> bug? Is there a setting for it? I googled & checked out the git docs,
> I didn't find any specific information on this.

Nope, this is just the way that "-p" is implemented. It considers each
file independently (another artifact of this is that after completing
the hunks in a file the changes are staged, even if you quit while
looking at a hunk in another file).

I agree that it would be nicer to consider the whole set atomically,
with both "save and quit" and "quit but do not save" options. The way it
works now is mostly due to the way that "add -p" grew out of the whole
"add -i" interactive system, where you'd invoke the patch-adder on
specific files from a menu.

So no, there's not a way to do what you want right now short of writing
patch.

-Peff

Re: [PATCH] fast-export: avoid NULL pointer arithmetic

On Fri, May 11, 2018 at 03:34:19PM +0200, Duy Nguyen wrote:

> On Fri, May 11, 2018 at 03:11:46PM +0200, Duy Nguyen wrote:
> > Back to fast-export, can we just allocate a new int on heap and point
> > it there? Allocating small pieces becomes quite cheap and fast with
> > mem-pool.h and we can avoid this storing integer in pointer business.
> 
> Something like this seems to work, but we use 4-ish more bytes per
> object, or 100MB overhead on a repo with 25M objects. I think it's a
> reasonable trade off.

I'm not sure I agree. 4 bytes per object certainly isn't the end of the
world, but what was the problem we were solving in the first place? Just
that we weren't comfortable with the round-trip from uintptr_t to void
and back? Is this actually a problem on real platforms? If not, it seems
silly to incur a run-time cost.

-Peff

Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch

2018-05-11 Thread Martin Ågren

On 11 May 2018 at 19:23, Derrick Stolee  wrote:

> Martin's initial test cases are wonderful. I've adapted them to test the
> other conditions in the verify_commit_graph() method and caught some
> interesting behavior in the process. I'm preparing v2 so we can investigate
> the direction of the tests.

Cool, I'm glad you found them useful. One thought I had was that you
could possibly write the tests such that you introduce errors from the
back of the file. That might enable you to do less of the "backup
commit-graph file and restore it"-dance. Just a thought.

Martin

Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch


On 5/10/2018 3:22 PM, Stefan Beller wrote:

On Thu, May 10, 2018 at 12:05 PM, Martin Ågren  wrote:

On 10 May 2018 at 19:34, Derrick Stolee  wrote:


Commits 01-07 focus on the 'git commit-graph verify' subcommand. These
are ready for full, rigorous review.

I don't know about "full" and "rigorous", but I tried to offer my thoughts.

A lingering feeling I have is that users could possibly benefit from
seeing "the commit-graph has a bad foo" a bit more than just "the
commit-graph is bad". But adding "the bar is baz, should have been
frotz" might not bring that much. Maybe you could keep the translatable
string somewhat simple, then, if the more technical data could be useful
to Git developers, dump it in a non-translated format. (I guess it could
be hidden behind a debug switch, but let's take one step at a time..)
This is nothing I feel strongly about.


  t/t5318-commit-graph.sh  |  25 +

I wonder about tests. Some tests seem to use `dd` to corrupt files and
check that it gets caught. Doing this in a a hash-agnostic way could be
tricky, but maybe not impossible. I guess we could do something
probabilistic, like "set the first two bytes of the very last OID to
zero -- surely all OIDs can't start with 16 zero bits". Hmm, that might
still require knowing the size of the OIDs...

I hope to find time to do some more hands-on testing of this to see that
errors actually do get caught.

Given that the commit graph is secondary data, the users work around
to quickly get back to a well working repository is most likely to remove
the file and regenerate it.
As a developer who wants to fix the bug, a stacktrace/datadump and the
history of git commands might be most valuable, but I agree we should
hide that behind a debug flag.

Packfiles and loose objects are primary data, which means that those
need a more advanced way to diagnose and repair them, so I would imagine
the commit graph fsck is closer to bitmaps fsck, which I would have suspected
to be found in t5310, but a quick read doesn't reveal many tests that are
checking for integrity. So I guess the test coverage here is ok, (although we
should always ask for more)


My main goal is to help developers figure out what is wrong with a file, 
and then we can use other diagnostic tools to discover how it got into 
that state.


Martin's initial test cases are wonderful. I've adapted them to test the 
other conditions in the verify_commit_graph() method and caught some 
interesting behavior in the process. I'm preparing v2 so we can 
investigate the direction of the tests.


Thanks,
-Stolee

Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch


On 5/10/2018 3:17 PM, Ævar Arnfjörð Bjarmason wrote:

On Thu, May 10 2018, Derrick Stolee wrote:


The behavior in this patch series does the following:

1. Near the end of 'git gc', run 'git commit-graph write'. The location
of this code assumes that a 'git gc --auto' has not terminated early
due to not meeting the auto threshold.

2. At the end of 'git fetch', run 'git commit-graph write'. This means
that every reachable commit will be in the commit-graph after a
a successful fetch, which seems a reasonable frequency. Then, the
only times we would be missing a reachable commit is after creating
one locally. There is a problem with the current patch, though: every
'git fetch' call runs 'git commit-graph write', even if there were no
ref updates or objects downloaded. Is there a simple way to detect if
the fetch was non-trivial?

One obvious problem with this approach: if we compute this during 'gc'
AND 'fetch', there will be times where a 'fetch' calls 'gc' and triggers
two commit-graph writes. If I were to abandon one of these patches, it
would be the 'fetch' integration. A 'git gc' really wants to delete all
references to unreachable commits, and without updating the commit-graph
we may still have commit data in the commit-graph file that is not in
the object database. In fact, deleting commits from the object database
but not from the commit-graph will cause 'git commit-graph verify' to
fail!

I welcome discussion on these ideas, as we are venturing out of the
"pure data structure" world and into the "user experience" world. I am
less confident in my skills in this world, but the feature is worthless
if it does not improve the user experience.

I really like #1 here, but I wonder why #2 is necessary.

I.e. is it critical for the performance of the commit graph feature that
it be kept really up-to-date, moreso than other things that rely on gc
--auto (e.g. the optional bitmap index)?


It is not critical. The feature has been designed to have recent commits 
not in the file. For simplicity, it is probably best to limit ourselves 
to writing after a non-trivial 'gc'.



Even if that's the case, I think something that does this via gc --auto
is a much better option. I.e. now we have gc.auto & gc.autoPackLimit, if
the answer to my question above is "yes" this could also be accomplished
by introducing a new graph-specific gc.* setting, and --auto would just
update the graph more often, but leave the rest.


This is an excellent idea for a follow-up series, if we find we want the 
commit-graph written more frequently. For now, I'm satisfied with one 
place where it is automatically computed.


I'll drop the fetch integration in my v2 series.

Thanks,
-Stolee

[PATCH] commit.h: rearrange 'index' to shrink struct commit

On linux 64-bit architecture, pahole finds that there's a 4 bytes
padding after 'index'. Moving it to the end reduces this struct's size
from 72 to 64 bytes (because of another 4 bytes padding after
graph_pos). On linux 32-bit, the struct size remains 52 bytes like
before.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 commit.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit.h b/commit.h
index e57ae4b583..fd1cbe7263 100644
--- a/commit.h
+++ b/commit.h
@@ -19,11 +19,11 @@ struct commit_list {
 struct commit {
struct object object;
void *util;
-   unsigned int index;
timestamp_t date;
struct commit_list *parents;
struct tree *tree;
uint32_t graph_pos;
+   unsigned int index;
 };
 
 extern int save_commit_buffer;
-- 
2.17.0.705.g3525833791

Re: [PATCH v7 07/13] completion: implement and use --list-cmds=main,others

On Fri, May 11, 2018 at 4:06 PM, SZEDER Gábor  wrote:
> On Thu, May 10, 2018 at 10:46 AM, Nguyễn Thái Ngọc Duy
>  wrote:
>> Instead of parsing "git help -a" output, which is tricky to get right,
>> less elegant and also slow,
>
> Is it tricky?  It seems rather straightforward.

But there are traps. [1] is a reported one. And since you're parsing
what's meant for human reader, this code could easily break in the
future.

[1] 
https://public-inbox.org/git/CAPQz56bts8zFfUHyKJqnefQoH97L5TTA-k3be=5hsdeebcm...@mail.gmail.com/

> Is it slow?  Well, there is a pipe and an egrep, sure, but thanks to
> caching it's only run once, so basically doesn't matter.

This I agree is an exaggeration. Ultimately though I am breaking down
and providing the information pieces to
__git_list_porcelain_commands() and it happens to benefit this one
too. Perhaps I should rephrase my commit to say this.
-- 
Duy

[PATCH v4] add status config and command line options for rename detection

2018-05-11 Thread Ben Peart

After performing a merge that has conflicts git status will, by default,
attempt to detect renames which causes many objects to be examined.  In a
virtualized repo, those objects do not exist locally so the rename logic
triggers them to be fetched from the server. This results in the status call
taking hours to complete on very large repos vs seconds with this patch.

Add a new config status.renames setting to enable turning off rename
detection during status and commit.  This setting will default to the value
of diff.renames.

Add a new config status.renamelimit setting to to enable bounding the time
spent finding out inexact renames during status and commit.  This setting
will default to the value of diff.renamelimit.

Add --no-renames command line option to status that enables overriding the
config setting from the command line. Add --find-renames[=] command line
option to status that enables detecting renames and optionally setting the
similarity index.

Reviewed-by: Elijah Newren 
Original-Patch-by: Alejandro Pauly 
Signed-off-by: Ben Peart 
---

Notes:
Base Ref: commit dc6b1d92ca9c0c538daa244e3910bb8b2a50d959 
(em/status-rename-config)
Web-Diff: https://github.com/benpeart/git/commit/95974d512b
Checkout: git fetch https://github.com/benpeart/git status-renames-v4 && 
git checkout 95974d512b

### Interdiff (v3..v4):

### Patches

 Documentation/config.txt |  12 
 Documentation/git-status.txt |  10 
 builtin/commit.c |  42 +
 diff.c   |   2 +-
 diff.h   |   1 +
 t/t7525-status-rename.sh | 113 +++
 wt-status.c  |  12 
 wt-status.h  |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100755 t/t7525-status-rename.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2659153cb3..4f1ead 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3119,6 +3119,18 @@ status.displayCommentPrefix::
behavior of linkgit:git-status[1] in Git 1.8.4 and previous.
Defaults to false.
 
+status.renameLimit::
+   The number of files to consider when performing rename detection
+   in linkgit:git-status[1] and linkgit:git-commit[1]. Defaults to
+   the value of diff.renameLimit.
+
+status.renames::
+   Whether and how Git detects renames in linkgit:git-status[1] and
+   linkgit:git-commit[1] .  If set to "false", rename detection is
+   disabled. If set to "true", basic rename detection is enabled.
+   If set to "copies" or "copy", Git will detect copies, as well.
+   Defaults to the value of diff.renames.
+
 status.showStash::
If set to true, linkgit:git-status[1] will display the number of
entries currently stashed away.
diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index c16e27e63d..c4467ffb98 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -135,6 +135,16 @@ ignored, then the directory is not shown, but all contents 
are shown.
Display or do not display detailed ahead/behind counts for the
branch relative to its upstream branch.  Defaults to true.
 
+--renames::
+--no-renames::
+   Turn on/off rename detection regardless of user configuration.
+   See also linkgit:git-diff[1] `--no-renames`.
+
+--find-renames[=]::
+   Turn on rename detection, optionally setting the similarity
+   threshold.
+   See also linkgit:git-diff[1] `--find-renames`.
+
 ...::
See the 'pathspec' entry in linkgit:gitglossary[7].
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 5240f11225..b50e33ef48 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -143,6 +143,16 @@ static int opt_parse_m(const struct option *opt, const 
char *arg, int unset)
return 0;
 }
 
+static int opt_parse_rename_score(const struct option *opt, const char *arg, 
int unset)
+{
+   const char **value = opt->value;
+   if (arg != NULL && *arg == '=')
+   arg = arg + 1;
+
+   *value = arg;
+   return 0;
+}
+
 static void determine_whence(struct wt_status *s)
 {
if (file_exists(git_path_merge_head()))
@@ -1259,11 +1269,31 @@ static int git_status_config(const char *k, const char 
*v, void *cb)
return error(_("Invalid untracked files mode '%s'"), v);
return 0;
}
+   if (!strcmp(k, "diff.renamelimit")) {
+   if (s->rename_limit == -1)
+   s->rename_limit = git_config_int(k, v);
+   return 0;
+   }
+   if (!strcmp(k, "status.renamelimit")) {
+   s->rename_limit = git_config_int(k, v);
+   return 0;
+   }
+   if (!strcmp(k, "diff.renames")) {
+   if (s->detect_rename == -1)
+   s->detect_rename =

Re: [PATCH 04/12] commit-graph: verify fanout and lookup table


On 5/10/2018 2:29 PM, Martin Ågren wrote:

On 10 May 2018 at 19:34, Derrick Stolee  wrote:

While running 'git commit-graph verify', verify that the object IDs
are listed in lexicographic order and that the fanout table correctly
navigates into that list of object IDs.

Signed-off-by: Derrick Stolee 
---
  commit-graph.c | 33 +
  1 file changed, 33 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index ce11af1d20..b4c146c423 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -839,6 +839,9 @@ static int verify_commit_graph_error;

  int verify_commit_graph(struct commit_graph *g)
  {
+   uint32_t i, cur_fanout_pos = 0;
+   struct object_id prev_oid, cur_oid;
+
 if (!g) {
 graph_report(_("no commit-graph file loaded"));
 return 1;
@@ -853,5 +856,35 @@ int verify_commit_graph(struct commit_graph *g)
 if (!g->chunk_commit_data)
 graph_report(_("commit-graph is missing the Commit Data 
chunk"));

+   for (i = 0; i < g->num_commits; i++) {
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   if (i > 0 && oidcmp(_oid, _oid) >= 0)
+   graph_report(_("commit-graph has incorrect oid order: %s 
then %s"),

Minor: I think our style would prefer s/i > 0/i/.

I suppose the second check should be s/>=/>/, but it's not like it
should matter. ;-)


It shouldn't, but a duplicate commit is still an incorrect oid order.


I wonder if this is a message that would virtually never make sense to a
user, but more to a developer. Leave it untranslated to make sure any
bug reports to the list are readable to us?


Yeah, I'll make all of the errors untranslated since they are for 
developer debugging, not end-user information.



+
+   oid_to_hex(_oid),
+   oid_to_hex(_oid));

Hmm, these two lines do not actually achieve anything?


It's a formatting bug: They complete the statement starting with 
'graph_report(' above.





+   oidcpy(_oid, _oid);
+
+   while (cur_oid.hash[0] > cur_fanout_pos) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+   if (i != fanout_value)
+   graph_report(_("commit-graph has incorrect fanout 
value: fanout[%d] = %u != %u"),
+cur_fanout_pos, fanout_value, i);

Same though re `_()`, even more so because of the more technical
notation.


+
+   cur_fanout_pos++;
+   }
+   }
+
+   while (cur_fanout_pos < 256) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+
+   if (g->num_commits != fanout_value)
+   graph_report(_("commit-graph has incorrect fanout value: 
fanout[%d] = %u != %u"),
+cur_fanout_pos, fanout_value, i);

Same here. Or maybe these should just give a translated user-readable
basic idea of what is wrong and skip the details?


As someone who is responsible for probably inserting these problems, I 
think the details are important.


Thanks,
-Stolee

Re: [PATCH v7 12/13] completion: let git provide the completable command list

2018-05-11 Thread SZEDER Gábor

On Thu, May 10, 2018 at 10:46 AM, Nguyễn Thái Ngọc Duy
 wrote:
> Instead of maintaining a separate list of command classification,
> which often could go out of date, let's centralize the information
> back in git.
>
> While the function in git-completion.bash implies "list porcelain
> commands", that's not exactly what it does. It gets all commands (aka
> --list-cmds=main,others) then exclude certain non-porcelain ones. We
> could almost recreate this list two lists list-mainporcelain and
> others. The non-porcelain-but-included-anyway is added by the third
> category list-complete.
>
> list-complete does not recreate exactly the command list before this
> patch though. The following commands are not part of neither
> list-mainporcelain nor list-complete and as a result no longer
> completes:
>
> - annotate obsolete, discouraged to use
> - difftool-helper  not an end user command
> - filter-branchnot often used
> - get-tar-commit-idnot often used
> - imap-sendnot often used
> - interpreter-trailers not for interactive use
> - lost-found   obsolete
> - p4   too short and probably not often used (*)
> - peek-remote  deprecated
> - svn  same category as p4 (*)
> - tar-tree obsolete
> - verify-commitnot often used

'git name-rev' is plumbing as well.

I think this commit should be split into two:

  - first do the unequivocally beneficial thing and get rid of the
long, hard-coded command list in __git_list_porcelain_commands(),
while keeping its output unchanged,

  - then do the arguable thing and change the list of commands.

> Note that the current completion script incorrectly classifies
> filter-branch as porcelain and t9902 tests this behavior. We keep it
> this way in t9902 because this test does not really care which
> particular command is porcelain or plubmbing, they're just names.
>
> (*) to be fair, send-email command which is in the same
> foreignscminterface group as svn and p4 does get completion, just
> because it's used by git and kernel development. So maybe should
> include them.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  command-list.txt   |  37 
>  contrib/completion/git-completion.bash | 117 ++---
>  t/t9902-completion.sh  |   5 +-
>  3 files changed, 48 insertions(+), 111 deletions(-)

> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index 4e724a5b76..908692ea52 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -835,18 +835,30 @@ __git_complete_strategy ()
>  }
>
>  __git_commands () {
> -   if test -n "${GIT_TESTING_COMMAND_COMPLETION:-}"
> -   then
> -   printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}"
> -   else
> -   git --list-cmds=main,others
> -   fi
> +   case "$1" in
> +   porcelain)
> +   if test -n "$GIT_TESTING_PORCELAIN_COMMAND_LIST"
> +   then
> +   printf "%s" "$GIT_TESTING_PORCELAIN_COMMAND_LIST"
> +   else
> +   git 
> --list-cmds=list-mainporcelain,others,list-complete
> +   fi
> +   ;;
> +   all)
> +   if test -n "$GIT_TESTING_ALL_COMMAND_LIST"
> +   then
> +   printf "%s" "$GIT_TESTING_ALL_COMMAND_LIST"
> +   else
> +   git --list-cmds=main,others
> +   fi
> +   ;;
> +   esac
>  }
>
> -__git_list_all_commands ()
> +__git_list_commands ()

Please add comments before these functions to document their mandatory
argument.

>  {
> local i IFS=" "$'\n'
> -   for i in $(__git_commands)
> +   for i in $(__git_commands $1)
> do
> case $i in
> *--*) : helper pattern;;

Is this loop to exclude helper commands with doubledash in their name
still necessary?

Patch add: previous hunk across file boundaries

2018-05-11 Thread Robert Dailey

I noticed that when stepping into a new file while doing `git add -p`,
pressing `k` or `K` does not go back to the previous file. Is this a
bug? Is there a setting for it? I googled & checked out the git docs,
I didn't find any specific information on this.

Re: [PATCH v3] add status config and command line options for rename detection

2018-05-11 Thread Elijah Newren

Hi Ben,

On Fri, May 11, 2018 at 5:56 AM, Ben Peart  wrote:
> After performing a merge that has conflicts git status will, by default,
> attempt to detect renames which causes many objects to be examined.  In a
> virtualized repo, those objects do not exist locally so the rename logic
> triggers them to be fetched from the server. This results in the status call
> taking hours to complete on very large repos vs seconds with this patch.
>
> Add a new config status.renames setting to enable turning off rename detection
> during status and commit.  This setting will default to the value of
> diff.renames.
>
> Add a new config status.renamelimit setting to to enable bounding the time
> spent finding out inexact renames during status and commit.  This setting will
> default to the value of diff.renamelimit.
>
> Add --no-renames command line option to status that enables overriding the
> config setting from the command line. Add --find-renames[=] command line
> option to status that enables detecting renames and optionally setting the
> similarity index.

Any chance I could get you to re-wrap this at a smaller column width?
Doesn't fit in my (80-char) terminal when I run `git log`; a couple
lines run over by a couple characters.  (Sorry for not noticing this
earlier)

> This patch depends on em/status-rename-config

I'd leave this line for the notes.  It's useful information now, but
won't be to someone looking at the commit a year from now, so it
probably doesn't belong in the commit message.


With those two changes:
  Reviewed-by: Elijah Newren

Re: [PATCH v7 07/13] completion: implement and use --list-cmds=main,others

2018-05-11 Thread SZEDER Gábor

On Fri, May 11, 2018 at 4:06 PM, SZEDER Gábor  wrote:
> OTOH you don't mention the most important reason, namely that the
> completion script contains a long hard-coded list of the names of all
> known plumbing commands to filter out, which is redundant with the
> categorization in 'command-list.txt', is a maintenance burden, and
> tends to get out-of-sync when new plumbing commands are added.

Oops, sorry, got lost among the many patches, please just forget this
paragraph.

Re: [PATCH v7 07/13] completion: implement and use --list-cmds=main,others

2018-05-11 Thread SZEDER Gábor

On Thu, May 10, 2018 at 10:46 AM, Nguyễn Thái Ngọc Duy
 wrote:
> Instead of parsing "git help -a" output, which is tricky to get right,
> less elegant and also slow,

Is it tricky?  It seems rather straightforward.
Is it slow?  Well, there is a pipe and an egrep, sure, but thanks to
caching it's only run once, so basically doesn't matter.

OTOH you don't mention the most important reason, namely that the
completion script contains a long hard-coded list of the names of all
known plumbing commands to filter out, which is redundant with the
categorization in 'command-list.txt', is a maintenance burden, and
tends to get out-of-sync when new plumbing commands are added.

> make git provide the list in a
> machine-friendly form. This adds two separate listing types, main and
> others, instead of just "all" for more flexibility.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  contrib/completion/git-completion.bash |  2 +-
>  git.c  |  4 
>  help.c | 32 ++
>  help.h |  4 
>  4 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index 3556838759..62ca8641f4 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -839,7 +839,7 @@ __git_commands () {
> then
> printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}"
> else
> -   git help -a|egrep '^  [a-zA-Z0-9]'
> +   git --list-cmds=main,others
> fi
>  }
>
> diff --git a/git.c b/git.c
> index 376a59b97f..10907f7266 100644
> --- a/git.c
> +++ b/git.c
> @@ -56,6 +56,10 @@ static int list_cmds(const char *spec)
>
> if (match_token(spec, len, "builtins"))
> list_builtins(, 0);
> +   else if (match_token(spec, len, "main"))
> +   list_all_main_cmds();
> +   else if (match_token(spec, len, "others"))
> +   list_all_other_cmds();
> else
> die(_("unsupported command listing type '%s'"), spec);
> spec += len;
> diff --git a/help.c b/help.c
> index 2d6a3157f8..d5ce9dfcbb 100644
> --- a/help.c
> +++ b/help.c
> @@ -297,6 +297,38 @@ void list_common_cmds_help(void)
> print_cmd_by_category(common_categories);
>  }
>
> +void list_all_main_cmds(struct string_list *list)
> +{
> +   struct cmdnames main_cmds, other_cmds;
> +   int i;
> +
> +   memset(_cmds, 0, sizeof(main_cmds));
> +   memset(_cmds, 0, sizeof(other_cmds));
> +   load_command_list("git-", _cmds, _cmds);
> +
> +   for (i = 0; i < main_cmds.cnt; i++)
> +   string_list_append(list, main_cmds.names[i]->name);
> +
> +   clean_cmdnames(_cmds);
> +   clean_cmdnames(_cmds);
> +}
> +
> +void list_all_other_cmds(struct string_list *list)
> +{
> +   struct cmdnames main_cmds, other_cmds;
> +   int i;
> +
> +   memset(_cmds, 0, sizeof(main_cmds));
> +   memset(_cmds, 0, sizeof(other_cmds));
> +   load_command_list("git-", _cmds, _cmds);
> +
> +   for (i = 0; i < other_cmds.cnt; i++)
> +   string_list_append(list, other_cmds.names[i]->name);
> +
> +   clean_cmdnames(_cmds);
> +   clean_cmdnames(_cmds);
> +}
> +
>  int is_in_cmdlist(struct cmdnames *c, const char *s)
>  {
> int i;
> diff --git a/help.h b/help.h
> index b21d7c94e8..97e6c0965e 100644
> --- a/help.h
> +++ b/help.h
> @@ -1,6 +1,8 @@
>  #ifndef HELP_H
>  #define HELP_H
>
> +struct string_list;
> +
>  struct cmdnames {
> int alloc;
> int cnt;
> @@ -17,6 +19,8 @@ static inline void mput_char(char c, unsigned int num)
>  }
>
>  extern void list_common_cmds_help(void);
> +extern void list_all_main_cmds(struct string_list *list);
> +extern void list_all_other_cmds(struct string_list *list);
>  extern const char *help_unknown_cmd(const char *cmd);
>  extern void load_command_list(const char *prefix,
>   struct cmdnames *main_cmds,
> --
> 2.17.0.705.g3525833791
>

Re: [PATCH] fast-export: avoid NULL pointer arithmetic

On Fri, May 11, 2018 at 03:11:46PM +0200, Duy Nguyen wrote:
> Back to fast-export, can we just allocate a new int on heap and point
> it there? Allocating small pieces becomes quite cheap and fast with
> mem-pool.h and we can avoid this storing integer in pointer business.

Something like this seems to work, but we use 4-ish more bytes per
object, or 100MB overhead on a repo with 25M objects. I think it's a
reasonable trade off.

-- 8< --
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 530df12f05..de593035b1 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -21,6 +21,7 @@
 #include "quote.h"
 #include "remote.h"
 #include "blob.h"
+#include "mem-pool.h"
 
 static const char *fast_export_usage[] = {
N_("git fast-export [rev-list-opts]"),
@@ -38,6 +39,7 @@ static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct refspec *refspecs;
 static int refspecs_nr;
 static int anonymize;
+static struct mem_pool int_pool = MEM_POOL_INIT(2 * 1024 * 1024);
 
 static int parse_opt_signed_tag_mode(const struct option *opt,
 const char *arg, int unset)
@@ -156,20 +158,22 @@ static void anonymize_path(struct strbuf *out, const char 
*path,
}
 }
 
-/* Since intptr_t is C99, we do not use it here */
-static inline uint32_t *mark_to_ptr(uint32_t mark)
+static inline uint32_t ptr_to_mark(void *mark)
 {
-   return ((uint32_t *)NULL) + mark;
-}
-
-static inline uint32_t ptr_to_mark(void * mark)
-{
-   return (uint32_t *)mark - (uint32_t *)NULL;
+   if (!mark)
+   BUG("not marked!");
+   return *(uint32_t *)mark;
 }
 
 static inline void mark_object(struct object *object, uint32_t mark)
 {
-   add_decoration(, object, mark_to_ptr(mark));
+   uint32_t *ptr = lookup_decoration(, object);
+
+   if (!ptr)
+   ptr = mem_pool_alloc(_pool, sizeof(uint32_t));
+
+   *ptr = mark;
+   add_decoration(, object, ptr);
 }
 
 static inline void mark_next_object(struct object *object)
diff --git a/fast-import.c b/fast-import.c
index 34edf3fb8f..ce5ce2081f 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -300,8 +300,7 @@ static int global_argc;
 static const char **global_argv;
 
 /* Memory pools */
-static struct mem_pool fi_mem_pool =  {NULL, 2*1024*1024 -
-  sizeof(struct mp_block), 0 };
+static struct mem_pool fi_mem_pool = MEM_POOL_INIT(2*1024*1024);
 
 /* Atom management */
 static unsigned int atom_table_sz = 4451;
diff --git a/mem-pool.h b/mem-pool.h
index 829ad58ecf..bccbd3f224 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -21,6 +21,8 @@ struct mem_pool {
size_t pool_alloc;
 };
 
+#define MEM_POOL_INIT(block_size) { NULL, (block_size) - sizeof(struct 
mp_block), 0 }
+
 /*
  * Alloc memory from the mem_pool.
  */
-- 8< --

Re: [PATCH] fast-export: avoid NULL pointer arithmetic

On Fri, May 11, 2018 at 10:56 AM, Jeff King  wrote:
> On Fri, May 11, 2018 at 08:19:59AM +0200, Duy Nguyen wrote:
>
>> On Fri, May 11, 2018 at 6:49 AM, Junio C Hamano  wrote:
>> > Junio C Hamano  writes:
>> >
>> >> René Scharfe  writes:
>> >>
>>  But it somehow feels backwards in spirit to me, as the reason why we
>>  use "void *" there in the decoration field is because we expect that
>>  we'd have a pointer to some struture most of the time, and we have
>>  to occasionally store a small integer there.
>> >>>
>> >>> Yes, fast-export seems to be the only place that stores an integer as
>> >>> a decoration.
>> >>
>> >> With the decoration subsystem that might be the case, but I think
>> >> we have other codepaths where "void * .util" field in the structure
>> >> is used to store (void *)1, expecting that a normal allocation will
>> >> never yield a pointer that is indistinguishable from that value.
>> >
>> > I was looking at a different topic and noticed that bisect.c uses
>> > commit->util (which is of type "void *") to always store an int (it
>> > never stores a pointer and there is no mixing).  This one is equally
>> > unportable as fast-export after your fix.
>> >
>>
>> In both cases we should be able to use commit-slab instead of
>> commit->util. We could go even further and kill "util" pointer but
>> that's more work.
>
> I would love it if we could kill the util pointer in favor of
> commit-slab. Unfortunately the fast-export case is decorating non-commit
> objects, too.

Right. The "util" thing was a side discussion.

Back to fast-export, can we just allocate a new int on heap and point
it there? Allocating small pieces becomes quite cheap and fast with
mem-pool.h and we can avoid this storing integer in pointer business.
-- 
Duy

[PATCH v3] add status config and command line options for rename detection

2018-05-11 Thread Ben Peart

After performing a merge that has conflicts git status will, by default,
attempt to detect renames which causes many objects to be examined.  In a
virtualized repo, those objects do not exist locally so the rename logic
triggers them to be fetched from the server. This results in the status call
taking hours to complete on very large repos vs seconds with this patch.

Add a new config status.renames setting to enable turning off rename detection
during status and commit.  This setting will default to the value of
diff.renames.

Add a new config status.renamelimit setting to to enable bounding the time
spent finding out inexact renames during status and commit.  This setting will
default to the value of diff.renamelimit.

Add --no-renames command line option to status that enables overriding the
config setting from the command line. Add --find-renames[=] command line
option to status that enables detecting renames and optionally setting the
similarity index.

This patch depends on em/status-rename-config

Original-Patch-by: Alejandro Pauly 
Signed-off-by: Ben Peart 
---

Notes:
Base Ref: commit dc6b1d92ca9c0c538daa244e3910bb8b2a50d959 
(em/status-rename-config)
Web-Diff: https://github.com/benpeart/git/commit/5bac43610b
Checkout: git fetch https://github.com/benpeart/git status-renames-v3 && 
git checkout 5bac43610b

### Interdiff (v2..v3):

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9c8eca05b1..4f1ead 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3120,14 +3120,16 @@ status.displayCommentPrefix::
Defaults to false.

 status.renameLimit::
-   The number of files to consider when performing rename detection;
-   if not specified, defaults to the value of diff.renameLimit.
+   The number of files to consider when performing rename detection
+   in linkgit:git-status[1] and linkgit:git-commit[1]. Defaults to
+   the value of diff.renameLimit.

 status.renames::
-   Whether and how Git detects renames.  If set to "false",
-   rename detection is disabled. If set to "true", basic rename
-   detection is enabled.  If set to "copies" or "copy", Git will
-   detect copies, as well.  Defaults to the value of diff.renames.
+   Whether and how Git detects renames in linkgit:git-status[1] and
+   linkgit:git-commit[1] .  If set to "false", rename detection is
+   disabled. If set to "true", basic rename detection is enabled.
+   If set to "copies" or "copy", Git will detect copies, as well.
+   Defaults to the value of diff.renames.

 status.showStash::
If set to true, linkgit:git-status[1] will display the number of
diff --git a/builtin/commit.c b/builtin/commit.c
index db886277f4..b50e33ef48 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1373,6 +1373,7 @@ int cmd_status(int argc, const char **argv, const 
char *prefix)
if (no_renames != -1)
s.detect_rename = !no_renames;
if ((intptr_t)rename_score_arg != -1) {
+   if (s.detect_rename < DIFF_DETECT_RENAME)
s.detect_rename = DIFF_DETECT_RENAME;
if (rename_score_arg)
s.rename_score = parse_rename_score(_score_arg);
diff --git a/t/t7525-status-rename.sh b/t/t7525-status-rename.sh
old mode 100644
new mode 100755
index 311df8038a..ef8b1b3078
--- a/t/t7525-status-rename.sh
+++ b/t/t7525-status-rename.sh
@@ -10,14 +10,13 @@ test_expect_success 'setup' '
git commit -m"Adding original file." &&
mv original renamed &&
echo 2 >> renamed &&
-   git add .
-'
-
-cat >.gitignore <<\EOF
+   git add . &&
+   cat >.gitignore <<-\EOF
.gitignore
expect*
actual*
EOF
+'

 test_expect_success 'status no-options' '
git status >actual &&
@@ -63,7 +62,18 @@ test_expect_success 'status status.renames=true' '
test_i18ngrep "renamed:" actual
 '

-test_expect_success 'status config overriden' '
+test_expect_success 'commit honors status.renames=false' '
+   git -c status.renames=false commit --dry-run >actual &&
+   test_i18ngrep "deleted:" actual &&
+   test_i18ngrep "new file:" actual
+'
+
+test_expect_success 'commit honors status.renames=true' '
+   git -c status.renames=true commit --dry-run >actual &&
+   test_i18ngrep "renamed:" actual
+'
+
+test_expect_success 'status config overridden' '
git -c status.renames=true status --no-renames >actual &&
test_i18ngrep "deleted:" actual &&
test_i18ngrep "new file:" actual
@@ -87,4 +97,17 @@ test_expect_success 'status score=01%' '
test_i18ngrep "renamed:" actual
 '

+test_expect_success 'copies not overridden by find-rename' '

Re: [PATCH v2] add status config and command line options for rename detection

2018-05-11 Thread Ben Peart




On 5/10/2018 6:31 PM, Elijah Newren wrote:

Hi Ben,

On Thu, May 10, 2018 at 12:09 PM, Ben Peart  wrote:

On 5/10/2018 12:19 PM, Elijah Newren wrote:

On Thu, May 10, 2018 at 7:16 AM, Ben Peart 
wrote:



Given the example perf impact is arbitrary (the actual example that
triggered this patch took status from 2+ hours to seconds) and can't be
replicated using the current performance tools in git, I'm just going drop
the specific numbers.  I believe the patch is worth while just to give users
the flexibility to control these behaviors.


Your parenthetical statement of timing going from hours to seconds I
think would be great; I don't think we need precise numbers.


+   if ((intptr_t)rename_score_arg != -1) {
+   s.detect_rename = DIFF_DETECT_RENAME;



I'd still prefer this was a
  if (s.detect_rename < DIFF_DETECT_RENAME)
  s.detect_rename = DIFF_DETECT_RENAME;

If a user specifies they are willing to pay for copy detection, but
then just passes --find-renames=40% because they want to find more
renames, it seems odd to disable copy detection to me.



I agree and will change it. It is unfortunate this will behave differently
than it does with merge.  Fixing the merge behavior to match is outside the
scope of this patch.


I agree that changing merge is outside the scope of this patch, but
I'm curious what change you have in mind for it to "make it match".
In particular, merge-recursive.c already has (or will shortly have)
+   if (opts.detect_rename > DIFF_DETECT_RENAME)
+   opts.detect_rename = DIFF_DETECT_RENAME;
from your commit 85b460305ce7 ("merge: add merge.renames config
setting", 2018-05-02), 


This is a good point that I missed.  With that recent change to merge, 
it no longer matters that the settings parsing code caps detect_rename 
at DIFF_DETECT_RENAME because it will cap it later anyway so there is no 
need to change the merge option behavior.



The one place copy detection does make sense inside a merge is for the
diffstat shown at the end (from builtin/merge.c), but it currently
isn't controlled by any configuration setting at all.  When it is
hooked up, it'd probably store the value separately from
merge-recursive's internal o->{diff,merge}_detect_rename anyway,
because builtin/merge.c's diffstat should be controlled by the
relevant confiig settings and flags (merge.renames, diff.renames,
-Xfind-renames, etc.) regardless of which merge strategy (recursive,
resolve, octopus, ours, ort) is employed.  And when that is hooked up,
I agree with you that it should look like what you've done with
status.renames here.  In fact, if you'd like to take a crack at it, I
think you'd do a great job.  :-)  If not, it's on my list of things to
do.



Thanks but I'll leave that to you. :)  I have a large backlog of patches 
I would like to see pushed through the mailing list into master.  We've 
been sitting on this one for over a year.  If the current rate is any 
indication, it will take man years to get caught up.



Testcases look good.  It'd be nice to also add a few testcases where
copy detection is turned on -- in particular, I'd like to see one with
--find-renames=$DIFFERENT_THAN_DEFAULT being passed when
merge.renames=copies.



OK.  I also added tests to verify the settings correctly impact commit.


Nice!

Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch


On 5/10/2018 4:45 PM, Martin Ågren wrote:

On 10 May 2018 at 21:22, Stefan Beller  wrote:

On Thu, May 10, 2018 at 12:05 PM, Martin Ågren  wrote:

I hope to find time to do some more hands-on testing of this to see that
errors actually do get caught.

Packfiles and loose objects are primary data, which means that those
need a more advanced way to diagnose and repair them, so I would imagine
the commit graph fsck is closer to bitmaps fsck, which I would have suspected
to be found in t5310, but a quick read doesn't reveal many tests that are
checking for integrity. So I guess the test coverage here is ok, (although we
should always ask for more)

Since I'm wrapping up for today, I'm posting some simple tests that I
assembled. The last of these showcases one or two problems with the
current error-reporting. Depending on the error, there can be *lots* of
errors reported and there are no new-lines, so the result on stdout can
be a wall of not-very-legible text.

Some of these might not make sense. I just started going through the
documentation on the format, causing some sort of corruption in each
field. Maybe this can be helpful somehow.

Martin

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 82f95eb11f..a7e48db2de 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -255,4 +255,49 @@ test_expect_success 'git fsck (checks commit-graph)' '
git fsck
  '
  
+# usage: corrupt_data   []

+corrupt_data() {
+   file=$1
+   pos=$2
+   data="${3:-\0}"
+   printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc
+}
+
+test_expect_success 'detect bad signature' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 0 "\0" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep "graph signature" err
+'
+
+test_expect_success 'detect bad version number' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 4 "\02" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep "graph version" err
+'
+
+test_expect_success 'detect bad hash version' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 5 "\02" &&
+   test_must_fail git commit-graph verify 2>err &&
+   grep "hash version" err
+'
+
+test_expect_success 'detect too small chunk-count' '
+   cd "$TRASH_DIRECTORY/full" &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   corrupt_data $objdir/info/commit-graph 6 "\01" &&
+   test_must_fail git commit-graph verify 2>err &&
+   cat err
+'
+
+
  test_done



Martin: thank you so much for these test examples, and for running them 
to find out about the newline issue. This is really helpful.

Re: [PATCH] pager: set COLUMNS to term_columns()

On Fri, May 11, 2018 at 11:25 AM, Jeff King  wrote:
> --- a/pager.c
> +++ b/pager.c
> @@ -109,10 +109,15 @@ void setup_pager(void)
> return;
>
> /*
> -* force computing the width of the terminal before we redirect
> -* the standard output to the pager.
> +* After we redirect standard output, we won't be able to use an ioctl
> +* to get the terminal size. Let's grab it now, and then set $COLUMNS
> +* to communicate it to any sub-processes.
>  */
> -   (void) term_columns();
> +   {
> +   char buf[64];
> +   xsnprintf(buf, sizeof(buf), "%d", term_columns());
> +   setenv("COLUMNS", buf, 0);

I wonder if this affects bash being a subprocess. E.g. if COLUMNS is
exported will it still dynamically adjust COLUMNS? A quick test shows
that it adjusts $COLUMNS anyway, so even if we need to launch a shell
we should be good.

> +   }
>
> setenv("GIT_PAGER_IN_USE", "true", 1);
>
> --
> 2.17.0.984.g9b00a423a4
>
-- 
Duy

[PATCH] column: fix off-by-one default width

By default we want to fill the whole screen if possible, but we do not
want to use up _all_ terminal columns because the last character is
going hit the border, push the cursor over and wrap. Keep it at
default value zero, which will make print_columns() set the width at
term_columns() - 1.

This affects the test in t7004 because effective column width before
was 40 but now 39 so we need to compensate it by one or the output at
39 columns has a different layout.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/column.c | 1 -
 t/t7004-tag.sh   | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/column.c b/builtin/column.c
index 0c3223d64b..5228ccf37a 100644
--- a/builtin/column.c
+++ b/builtin/column.c
@@ -42,7 +42,6 @@ int cmd_column(int argc, const char **argv, const char 
*prefix)
git_config(column_config, NULL);
 
memset(, 0, sizeof(copts));
-   copts.width = term_columns();
copts.padding = 1;
argc = parse_options(argc, argv, "", options, builtin_column_usage, 0);
if (argc)
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index e3f1e014aa..d7b319e919 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -363,7 +363,7 @@ test_expect_success 'tag -l  -l  works, 
as our buggy documenta
 '
 
 test_expect_success 'listing tags in column' '
-   COLUMNS=40 git tag -l --column=row >actual &&
+   COLUMNS=41 git tag -l --column=row >actual &&
cat >expected <<\EOF &&
 a1  aa1 cba t210t211
 v0.2.1  v1.0v1.0.1  v1.1.3
-- 
2.17.0.705.g3525833791

Re: Implementing reftable in Git

2018-05-11 Thread Michael Haggerty

On Wed, May 9, 2018 at 4:33 PM, Christian Couder
wrote:
> I might start working on implementing reftable in Git soon.
> [...]

Nice. It'll be great to have a reftable implementation in git core
(and ideally libgit2, as well). It seems to me that it could someday
become the new default reference storage method. The file format is
considerably more complicated than the current loose/packed scheme,
which is definitely a disadvantage (for example, for other Git
implementations). But implementing it *with good performance and
without races* might be no more complicated than the current scheme.

Testing will be important. There are already many tests specifically
about testing loose/packed reference storage. These will always have
to run against repositories that are forced to use that reference
scheme. And there will need to be new tests specifically about the
reftable scheme. Both classes of tests should be run every time. That
much is pretty obvious.

But currently, there are a lot of tests that assume the loose/packed
reference format on disk even though the tests are not really related
to references at all. ISTM that these should be converted to work at a
higher level, for example using `for-each-ref`, `rev-parse`, etc. to
examine references rather than reading reference files directly. That
way the tests should run correctly regardless of which scheme is in
use.

And since it's too expensive to run the whole test suite with both
reference storage schemes, it seems to me that the reference storage
scheme that is used while running the scheme-neutral tests should be
easy to choose at runtime.

David Turner did some analogous work for wiring up and testing his
proposed LMDB ref storage backend that might be useful [1]. I'm CCing
him, since he might have thoughts on this topic.

Regarding the reftable spec itself:

I recently gave a little internal talk about it, and while preparing
the talk I noticed a couple of things that should maybe be tweaked:

* The spec proposes to change `$GIT_DIR/refs`, which is currently a
directory that holds the loose refs, into a file that holds the table
of contents of reftable files comprising the full set of references.
This was my suggestion. I was thinking that this would prevent old
refs code from being used accidentally on a reftable-enabled
repository, while still enabling old versions of Git recognize this as
a git directory [2]. I think that the latter is important to make
things like `git rev-parse --git-dir` work correctly, even if the
installed version of git can't actually *read* the repository.

The problem is that `is_git_directory()` checks not only whether
`$GIT_DIR/refs` exists, but also whether it is executable (i.e., since
it is normally a directory, that it is searchable). It would be silly
to make the reftable table of contents executable, so this doesn't
seem like a good approach after all.

So probably `$GIT_DIR/refs` should continue to be a directory. If
it's there, it would probably make sense to place the reftable files
and maybe the ToC inside of it. We would have to rely on older Git
versions refusing to work in the directory because its `config` file
has an unrecognized `core.repositoryFormatVersion`, but that should be
OK I think.

* The scheme for naming reftable files [3] is, I believe, just a
suggestion as far as the spec is concerned (except for the use of
`.ref`/`.log` file extensions). It might be more less unwieldy to use
`%d` rather than `%08d`, and more convenient to name compacted files
to `${min_update_index}-${max_update_index}_${n}.{ref,log}` to make it
clearer to see by inspection what each file contains. That would also
make it unnecessary, in most cases, to insert a `_${n}` to make the
filename unique.

Michael

[1] https://github.com/dturner-tw/git/tree/dturner/pluggable-backends
[2]
https://github.com/git/git/blob/ccdcbd54c4475c2238b310f7113ab3075b5abc9c/setup.c#L309-L347
[3]
https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#layout

https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#compaction
[4]
https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#footer

[PATCH] pager: set COLUMNS to term_columns()

On Fri, May 11, 2018 at 10:43:45AM +0200, Duy Nguyen wrote:

> > As an aside, I was confused while looking into this because I _thought_
> > I had COLUMNS set:
> >
> >   $ echo $COLUMNS
> >   119
> >
> > But it turns out that bash sets that by default (if you have the
> > checkwinsize option on) but does not export it. ;)
> 
> Yep. This confused me too and I was "f*ck it I don't want to deal with
> this tty crap again". I'll leave the term_columns() fix to you then,
> and limit this patch to the "while at there" part which should be
> fixed anyway.

Heh. OK, here it is. I wondered why we didn't notice this earlier and
elsewhere (like in "git -p help -a"). The answer is that git-tag is the
only program that uses the column filter. Everybody else does it
in-process.

-- >8 --
Subject: [PATCH] pager: set COLUMNS to term_columns()

After we invoke the pager, our stdout goes to a pipe, not the
terminal, meaning we can no longer use an ioctl to get the
terminal width. For that reason, ad6c3739a3 (pager: find out
the terminal width before spawning the pager, 2012-02-12)
started caching the terminal width.

But that cache is only an in-process variable. Any programs
we spawn will also not be able to run that ioctl, but won't
have access to our cache. They'll end up falling back to our
80-column default.

You can see the problem with:

  git tag --column=row

Since git-tag spawns a pager these days, its spawned
git-column helper will see neither the terminal on stdout
nor a useful COLUMNS value (assuming you do not export it
from your shell already). And you'll end up with 80-column
output in the pager, regardless of your terminal size.

We can fix this by setting COLUMNS right before spawning the
pager. That fixes this case, as well as any more complicated
ones (e.g., a paged program spawns another script which then
generates columnized output).

Signed-off-by: Jeff King 
---
 pager.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/pager.c b/pager.c
index 92b23e6cd1..226828f178 100644
--- a/pager.c
+++ b/pager.c
@@ -109,10 +109,15 @@ void setup_pager(void)
return;

/*
-* force computing the width of the terminal before we redirect
-* the standard output to the pager.
+* After we redirect standard output, we won't be able to use an ioctl
+* to get the terminal size. Let's grab it now, and then set $COLUMNS
+* to communicate it to any sub-processes.
 */
-   (void) term_columns();
+   {
+   char buf[64];
+   xsnprintf(buf, sizeof(buf), "%d", term_columns());
+   setenv("COLUMNS", buf, 0);
+   }

setenv("GIT_PAGER_IN_USE", "true", 1);

-- 
2.17.0.984.g9b00a423a4

Re: [PATCH v2] commit-graph: fix UX issue when .lock file exists

On Thu, May 10, 2018 at 05:42:52PM +, Derrick Stolee wrote:

> We use the lockfile API to avoid multiple Git processes from writing to
> the commit-graph file in the .git/objects/info directory. In some cases,
> this directory may not exist, so we check for its existence.
> [...]

This version looks good to me. Thanks!

-Peff

Re: [PATCH] fast-export: avoid NULL pointer arithmetic

On Fri, May 11, 2018 at 08:19:59AM +0200, Duy Nguyen wrote:

> On Fri, May 11, 2018 at 6:49 AM, Junio C Hamano  wrote:
> > Junio C Hamano  writes:
> >
> >> René Scharfe  writes:
> >>
>  But it somehow feels backwards in spirit to me, as the reason why we
>  use "void *" there in the decoration field is because we expect that
>  we'd have a pointer to some struture most of the time, and we have
>  to occasionally store a small integer there.
> >>>
> >>> Yes, fast-export seems to be the only place that stores an integer as
> >>> a decoration.
> >>
> >> With the decoration subsystem that might be the case, but I think
> >> we have other codepaths where "void * .util" field in the structure
> >> is used to store (void *)1, expecting that a normal allocation will
> >> never yield a pointer that is indistinguishable from that value.
> >
> > I was looking at a different topic and noticed that bisect.c uses
> > commit->util (which is of type "void *") to always store an int (it
> > never stores a pointer and there is no mixing).  This one is equally
> > unportable as fast-export after your fix.
> >
> 
> In both cases we should be able to use commit-slab instead of
> commit->util. We could go even further and kill "util" pointer but
> that's more work.

I would love it if we could kill the util pointer in favor of
commit-slab. Unfortunately the fast-export case is decorating non-commit
objects, too.

I'm not sure how possible/easy it would be to have an "object slab".
Some complications are:

  1. It would increase the size of "struct tree" and "struct blob" by at
 least 4 bytes (possibly 8, due to alignment) to store the slab
 index. Commits would stay the same, but my experience is that most
 repositories have 5-10 times as many trees/blobs as commits.

 Some of that memory might be reclaimable. E.g., if we moved
 tree->buffer and tree->size into a slab, callers which don't use
 them would actually come out ahead (and more callers than you might
 expect could do this, since many only need to look at the tree
 contents inside a single function).

  2. If we allocate the indices for all types as a single sequence, then
 callers who only care about a specific type will have very sparse
 slabs (so they'll over-allocate, and it will have poor cache
 behavior).

 It might be possible to do something more clever. E.g., give each
 object type its own index counter, but then for a full object-slab,
 store per-type slabs and dereference obj->type to know which slab
 to look in.

 There'd be some complication with any_object. We'd probably need an
 OBJ_NONE slab, and then to allocate a type-specific index when the
 type is assigned. There's already a central point for this in
 object_as_type(). But we'd also have to migrate any
 previously-stored slab data (stored when it was OBJ_NONE), which
 implies having a global list of slabs.

So I dunno. It seems do-able but complicated.

-Peff

Re: [PATCH] tag: fix column output not using all terminal space

On Fri, May 11, 2018 at 10:28 AM, Jeff King  wrote:
> On Fri, May 11, 2018 at 09:56:02AM +0200, Nguyễn Thái Ngọc Duy wrote:
>
>> git-tag runs a separate git-column command via run_column_filter().
>> This makes the new 'git-column' process fail to pick up the terminal
>> width for some reason and fall back to default width. Just explicitly
>> pass terminal width and avoid this terminal width detection business
>> in subprocesses.
>
> I think "some reason" is that we start the pager before running "git
> column". Running "git --no-pager tag --column=row" seems to fix it.
>
> It doesn't seem to have anything to do with the pager program itself.
> Doing:
>
>   # use sh to avoid optimizing out pager invocation
>   GIT_PAGER='sh -c cat' git tag --column=row
>
> shows the same problem. It looks like we force term_columns() to run
> before invoking the pager in order to cache the value. That makes sense,
> since TIOCGWINSZ on stdout is not going to be valid after we start
> piping it to the pager. But of course our git-column sub-process won't
> see that; the value is cached only in memory.
>
> So I think the approach of communicating the width to the sub-process is
> the right one. But I think we'd probably want to do so through the
> $COLUMNS variable, rather than passing a command-line option. That would
> fix the same bug for other cases where we might have multiple layers of
> sub-processes (e.g., if we pipe to the pager and then run a hook which
> columnizes output).
>
> Something like this seems to make it work for me:
>
> diff --git a/pager.c b/pager.c
> index 92b23e6cd1..c4f3412a84 100644
> --- a/pager.c
> +++ b/pager.c
> @@ -162,8 +162,12 @@ int term_columns(void)
>  #ifdef TIOCGWINSZ
> else {
> struct winsize ws;
> -   if (!ioctl(1, TIOCGWINSZ, ) && ws.ws_col)
> +   if (!ioctl(1, TIOCGWINSZ, ) && ws.ws_col) {
> +   char buf[64];
> term_columns_at_startup = ws.ws_col;
> +   xsnprintf(buf, sizeof(buf), "%d", ws.ws_col);
> +   setenv("COLUMNS", buf, 0);
> +   }
> }
>  #endif
>
>
> though perhaps that should go into setup_pager(), which is what is
> actually making stdout inaccessible.
>
> As an aside, I was confused while looking into this because I _thought_
> I had COLUMNS set:
>
>   $ echo $COLUMNS
>   119
>
> But it turns out that bash sets that by default (if you have the
> checkwinsize option on) but does not export it. ;)

Yep. This confused me too and I was "f*ck it I don't want to deal with
this tty crap again". I'll leave the term_columns() fix to you then,
and limit this patch to the "while at there" part which should be
fixed anyway.
-- 
Duy

Re: [PATCH v2 0/4] Fix mem leaks of recent object store conversions.

On Thu, May 10, 2018 at 12:58:45PM -0700, Stefan Beller wrote:

> This series replaces the two commits that were queued on 
> sb/object-store-replace,
> fixing memory leaks that were recently introduced.
> 
> Compared to v1, I merged the two independent series from yesterday,
> rewrote the commit message to clear up Junios confusion and addresses Peffs
> comments for the packfiles as well.

Mostly. :)

My one remaining complaint is that the bitmap code may hold on to a
dangling pointer to a packed_git after this series.

I think that is part of a larger problem, though, which is that the
bitmap code's globals need to be part of the struct raw_object_store.
I think this can already cause problems before your series if we were to
try to use bitmaps in both a superproject and a submodule in the same
process, though I think we'd at least hit the "ignoring extra bitmap
file" code path in open_pack_bitmap_1(). So right now it's an annoyance,
but after your series it becomes a potential segfault.

-Peff

Re: [PATCH] tag: fix column output not using all terminal space

On Fri, May 11, 2018 at 09:56:02AM +0200, Nguyễn Thái Ngọc Duy wrote:

> git-tag runs a separate git-column command via run_column_filter().
> This makes the new 'git-column' process fail to pick up the terminal
> width for some reason and fall back to default width. Just explicitly
> pass terminal width and avoid this terminal width detection business
> in subprocesses.

I think "some reason" is that we start the pager before running "git
column". Running "git --no-pager tag --column=row" seems to fix it.

It doesn't seem to have anything to do with the pager program itself.
Doing:

  # use sh to avoid optimizing out pager invocation
  GIT_PAGER='sh -c cat' git tag --column=row

shows the same problem. It looks like we force term_columns() to run
before invoking the pager in order to cache the value. That makes sense,
since TIOCGWINSZ on stdout is not going to be valid after we start
piping it to the pager. But of course our git-column sub-process won't
see that; the value is cached only in memory.

So I think the approach of communicating the width to the sub-process is
the right one. But I think we'd probably want to do so through the
$COLUMNS variable, rather than passing a command-line option. That would
fix the same bug for other cases where we might have multiple layers of
sub-processes (e.g., if we pipe to the pager and then run a hook which
columnizes output).

Something like this seems to make it work for me:

diff --git a/pager.c b/pager.c
index 92b23e6cd1..c4f3412a84 100644
--- a/pager.c
+++ b/pager.c
@@ -162,8 +162,12 @@ int term_columns(void)
 #ifdef TIOCGWINSZ
else {
struct winsize ws;
-   if (!ioctl(1, TIOCGWINSZ, ) && ws.ws_col)
+   if (!ioctl(1, TIOCGWINSZ, ) && ws.ws_col) {
+   char buf[64];
term_columns_at_startup = ws.ws_col;
+   xsnprintf(buf, sizeof(buf), "%d", ws.ws_col);
+   setenv("COLUMNS", buf, 0);
+   }
}
 #endif

though perhaps that should go into setup_pager(), which is what is
actually making stdout inaccessible.

As an aside, I was confused while looking into this because I _thought_
I had COLUMNS set:

  $ echo $COLUMNS
  119

But it turns out that bash sets that by default (if you have the
checkwinsize option on) but does not export it. ;)

-Peff

[PATCH] tag: fix column output not using all terminal space

git-tag runs a separate git-column command via run_column_filter().
This makes the new 'git-column' process fail to pick up the terminal
width for some reason and fall back to default width. Just explicitly
pass terminal width and avoid this terminal width detection business
in subprocesses.

While at there, fix an off-by-one column setting in git-column. We do
not want to use up _all_ terminal columns because the last character
is going hit the border and wrap. Keep it at term_columns() - 1 like
print_columns() does. This affects the test in t7004 because effective
column width before was 40 but now 39 so we need to adjust it back.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/column.c | 2 +-
 column.c | 2 ++
 t/t7004-tag.sh   | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/column.c b/builtin/column.c
index 0c3223d64b..182c84f778 100644
--- a/builtin/column.c
+++ b/builtin/column.c
@@ -42,7 +42,7 @@ int cmd_column(int argc, const char **argv, const char 
*prefix)
git_config(column_config, NULL);
 
memset(, 0, sizeof(copts));
-   copts.width = term_columns();
+   copts.width = term_columns() - 1;
copts.padding = 1;
argc = parse_options(argc, argv, "", options, builtin_column_usage, 0);
if (argc)
diff --git a/column.c b/column.c
index 49ab85b769..382537b324 100644
--- a/column.c
+++ b/column.c
@@ -381,6 +381,8 @@ int run_column_filter(int colopts, const struct 
column_options *opts)
argv_array_pushf(argv, "--raw-mode=%d", colopts);
if (opts && opts->width)
argv_array_pushf(argv, "--width=%d", opts->width);
+   else
+   argv_array_pushf(argv, "--width=%d", term_columns() - 1);
if (opts && opts->indent)
argv_array_pushf(argv, "--indent=%s", opts->indent);
if (opts && opts->padding)
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index e3f1e014aa..d7b319e919 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -363,7 +363,7 @@ test_expect_success 'tag -l  -l  works, 
as our buggy documenta
 '
 
 test_expect_success 'listing tags in column' '
-   COLUMNS=40 git tag -l --column=row >actual &&
+   COLUMNS=41 git tag -l --column=row >actual &&
cat >expected <<\EOF &&
 a1  aa1 cba t210t211
 v0.2.1  v1.0v1.0.1  v1.1.3
-- 
2.17.0.705.g3525833791

Re: [PATCH 1/9] Add and use generic name->id mapping code for color slot parsing

On Thu, May 10, 2018 at 7:16 PM, Stefan Beller  wrote:
> On Thu, May 10, 2018 at 7:19 AM, Nguyễn Thái Ngọc Duy  
> wrote:
>
>>  7 files changed, 82 insertions(+), 112 deletions(-)
>
> Nice!
>
>
>>
>> +static const char *color_branch_slots[] = {
>> +   [BRANCH_COLOR_RESET]= "reset",
>
> In 512f41cfac5 (clean.c: use designated initializer, 2017-07-14)
> we thought we'll do it once and see if anyone complains
> (and it shipped v2.15.0, 2017-10-29), and so far
> nobody complained half a year later. So designated initializers
> are all good now?

I don't know :) but it's worth pushing. I don't think this series
would land on the next release, which gives people a couple more
months to complain about C99. It does save me time and if it causes
problem, removing designated initializers is trivial

> Do we want to mention this decision in the commit message?
>
> If so, the patch looks good!
> Thanks,
> Stefan
-- 
Duy

[PATCH v3] pack-format.txt: more details on pack file format

The current document mentions OBJ_* constants without their actual
values. A git developer would know these are from cache.h but that's
not very friendly to a person who wants to read this file to implement
a pack file parser.

Similarly, the deltified representation is not documented at all (the
"document" is basically patch-delta.c). Translate that C code to
English with a bit more about what ofs-delta and ref-delta mean.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Diff from v2

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 00351cb822..70a99fd142 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -56,27 +56,37 @@ blob. However to save space, an object could be stored 
as a "delta" of
 another "base" object. These representations are assigned new types
 ofs-delta and ref-delta, which is only valid in a pack file.
 
-Both ofs-delta and ref-delta store the "delta" against another
-object. The difference between them is, ref-delta directly encodes
-20-byte base object name. If the base object is in the same pack,
-ofs-delta encodes the offset of the base object in the pack instead.
+Both ofs-delta and ref-delta store the "delta" to be applied to
+another object (called 'base object') to reconstruct the object. The
+difference between them is, ref-delta directly encodes 20-byte base
+object name. If the base object is in the same pack, ofs-delta encodes
+the offset of the base object in the pack instead.
+
+The base object could also be deltified if it's in the same pack.
+Ref-delta can also refer to an object outside the pack (i.e. the
+so-called "thin pack"). When stored on disk however, the pack should
+be self contained to avoid cyclic dependency.
 
 The delta data is a sequence of instructions to reconstruct an object
-from the base object. Each instruction appends more and more data to
-the target object until it's complete. There are two supported
-instructions so far: one for copy a byte range from the source object
-and one for inserting new data embedded in the instruction itself.
+from the base object. If the base object is deltified, it must be
+converted to canonical form first. Each instruction appends more and
+more data to the target object until it's complete. There are two
+supported instructions so far: one for copy a byte range from the
+source object and one for inserting new data embedded in the
+instruction itself.
 
 Each instruction has variable length. Instruction type is determined
 by the seventh bit of the first octet. The following diagrams follow
 the convention in RFC 1951 (Deflate compressed data format).
 
+ Instruction to copy from base object
+
   
+--+-+-+-+-+---+---+---+
   | 1xxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | 
size3 |
   
+--+-+-+-+-+---+---+---+
 
 This is the instruction format to copy a byte range from the source
-object. It encodes the offset to copy from any the number of bytes to
+object. It encodes the offset to copy from and the number of bytes to
 copy. Offset and size are in little-endian order.
 
 All offset and size bytes are optional. This is to reduce the
@@ -99,6 +109,8 @@ In its most compact form, this instruction only takes up 
one byte
 values zero. There is another exception: size zero is automatically
 converted to 0x1.
 
+ Instruction to add new data
+
   +--++
   | 0xxx |data|
   +--++
@@ -108,6 +120,8 @@ object. The following data is appended to the target 
object. The first
 seven bits of the first octet determines the size of data in
 bytes. The size must be non-zero.
 
+ Reserved instruction
+
   +--+
   |  |
   +--+

 Documentation/technical/pack-format.txt | 92 +
 cache.h |  5 ++
 2 files changed, 97 insertions(+)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 8e5bf60be3..70a99fd142 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -36,6 +36,98 @@ Git pack format
 
   - The trailer records 20-byte SHA-1 checksum of all of the above.
 
+=== Object types
+
+Valid object types are:
+
+- OBJ_COMMIT (1)
+- OBJ_TREE (2)
+- OBJ_BLOB (3)
+- OBJ_TAG (4)
+- OBJ_OFS_DELTA (6)
+- OBJ_REF_DELTA (7)
+
+Type 5 is reserved for future expansion. Type 0 is invalid.
+
+=== Deltified representation
+
+Conceptually there are only four object types: commit, tree,

Re: [PATCH v2] pack-format.txt: more details on pack file format

On Thu, May 10, 2018 at 7:06 PM, Stefan Beller  wrote:
>> +=== Deltified representation
>> +
>> +Conceptually there are only four object types: commit, tree, tag and
>> +blob. However to save space, an object could be stored as a "delta" of
>> +another "base" object. These representations are assigned new types
>> +ofs-delta and ref-delta, which is only valid in a pack file.
>
> ...only valid...
>
> as opposed to loose objects or as opposed to referencing cross-packs?
> I would think the former, not the latter.

Yeah. This is pretty much an implementation detail of a pack. The
"real" type is always blob/commit/tree/tag. But you only see this when
you dig deep down in pack-related code.

>> +Both ofs-delta and ref-delta store the "delta" against another
>> +object. The difference between them is, ref-delta directly encodes
>> +20-byte base object name. If the base object is in the same pack,
>> +ofs-delta encodes the offset of the base object in the pack instead.
>
> Reading this paragraph clears up the question from before.
> The ref delta is a delta to another "reference by hash id (sha1)".
> What abbreviation is OFS? OFfSet ?

I guess so. I never bothered to track down the source for that.

>> +The delta data is a sequence of instructions to reconstruct an object
>> +from the base object.
>
> As said before the base object is of type 1..4, we do not "delta-on-delta"
> yet, but to construct the object we have to create the base object first,
> which itself can be represented as a deltified object leading to a delta
> chain.

Yeah that's the delta chain concept. I'll just make a note here about
base object potentially being a delta object as well.
-- 
Duy

Re: [PATCH v2] add status config and command line options for rename detection

Ben Peart  writes:

>  Documentation/config.txt | 10 
>  Documentation/git-status.txt | 10 
>  builtin/commit.c | 41 
>  diff.c   |  2 +-
>  diff.h   |  1 +
>  t/t7525-status-rename.sh | 90 
>  wt-status.c  | 12 +
>  wt-status.h  |  4 +-
>  8 files changed, 168 insertions(+), 2 deletions(-)
>  create mode 100644 t/t7525-status-rename.sh

I'll mark the new script as executable (otherwise the test will not
even start).

Re: [PATCH 0/9] completion: avoid hard coding config var list

On Fri, May 11, 2018 at 8:00 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> While at there, this config list is also available to the user via
>> `git help --config` if you can't remember the exact config name you
>> want and don't want to go through that big git-config man page.
>
> Makes a reader anticipate a new user of levenshtein code, perhaps
> ;-)

Why would you go and write that for? Now I want to do it :-) Yeah I
think config name typo has been a common complaint (including from
me).
-- 
Duy

Re: [PATCH] fast-export: avoid NULL pointer arithmetic

On Fri, May 11, 2018 at 6:49 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> René Scharfe  writes:
>>
 But it somehow feels backwards in spirit to me, as the reason why we
 use "void *" there in the decoration field is because we expect that
 we'd have a pointer to some struture most of the time, and we have
 to occasionally store a small integer there.
>>>
>>> Yes, fast-export seems to be the only place that stores an integer as
>>> a decoration.
>>
>> With the decoration subsystem that might be the case, but I think
>> we have other codepaths where "void * .util" field in the structure
>> is used to store (void *)1, expecting that a normal allocation will
>> never yield a pointer that is indistinguishable from that value.
>
> I was looking at a different topic and noticed that bisect.c uses
> commit->util (which is of type "void *") to always store an int (it
> never stores a pointer and there is no mixing).  This one is equally
> unportable as fast-export after your fix.
>

In both cases we should be able to use commit-slab instead of
commit->util. We could go even further and kill "util" pointer but
that's more work.
-- 
Duy

Re: [PATCH 0/9] completion: avoid hard coding config var list