Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Sat, May 12, 2018 at 02:08:48PM +0900, Junio C Hamano wrote: > Taylor Blauwrites: > > >> $ git grep -v second > >> $ git grep --not -e second > >> > >> may hit all lines in this message (except for the obvious two > >> lines), but we cannot say which column we found a hit. I am > >> wondering if it is too grave a sin to report "the whole line is what > >> satisfied the criteria given" and say the match lies at column #1. > > > > I think that is sensible. I previously was opposed to this because I > > thought that it would be too difficult to script around the 'sometimes > > we have columns but other times not' and 'I gave --column' but have to > > check whether or not they are really there. > > I am not sure if you really got what I meant. I am suggesting that > "git grep -v --column second" should report that the entire line has > hit for each and every line that does not have "second" on it, which > is a good answer and eliminate "sometimes there is column answer (or > answer to -o query) and sometimes not" at the same time. I re-read your note and understand more clearly now what your suggestion is. To ensure that we're in agreement, do you mean: 1. '--column -v' will _never_ give a column, but will never die(), either 2. '--column --[and | or | not]' will never give a column, but will also never die(), either. I think that _those_ semantics are sensible, and I apologize for misinterpreting your original note. > > In other terms: > > > > * not giving '--column' will _never_ give a column, > > * '--column --invert' will _always_ die(), and > > * '--column --[and | or | not]' will _never_ give a column. > > So this is completely opposite from what I would have expected. to > somebody who said "I think that is sensible." over there. Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Taylor Blauwrites: >> $ git grep -v second >> $ git grep --not -e second >> >> may hit all lines in this message (except for the obvious two >> lines), but we cannot say which column we found a hit. I am >> wondering if it is too grave a sin to report "the whole line is what >> satisfied the criteria given" and say the match lies at column #1. > > I think that is sensible. I previously was opposed to this because I > thought that it would be too difficult to script around the 'sometimes > we have columns but other times not' and 'I gave --column' but have to > check whether or not they are really there. I am not sure if you really got what I meant. I am suggesting that "git grep -v --column second" should report that the entire line has hit for each and every line that does not have "second" on it, which is a good answer and eliminate "sometimes there is column answer (or answer to -o query) and sometimes not" at the same time. > In other terms: > > * not giving '--column' will _never_ give a column, > * '--column --invert' will _always_ die(), and > * '--column --[and | or | not]' will _never_ give a column. So this is completely opposite from what I would have expected. to somebody who said "I think that is sensible." over there.
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Thu, May 10, 2018 at 09:04:34AM +0900, Junio C Hamano wrote: > Taylor Blauwrites: > > > This check we should retain and change the wording to mention '--and', > > '--or', and '--not' specifically. > > Why are these problematic in the first place? If I said > > $ git grep -e first --and -e these > $ git grep -e first --and --not -e those > $ git grep -e first --or -e those > > I'd expect that the first line of this paragraph will hit, and the > first hit for these three are "these", "first" and "first", > respectively. Most importantly, in the last one, "--or" can be > omitted and the whole thing stops being "extended", so rejecting > extended as a whole does not make much sense. Agreed that this is what I would expect, too. The trouble is that we never do a compilation step from containing --and, --or, --not to a POSIX regexp. So, if we're extended, we have to assume that there might not be an answer. Given this, I don't think we should categorically die() when we encounter this (more below in the next hunk of this mail). I think this thread has established that there are certainly cases where we cannot provide a meaningful answer, (--not at the top-level, for instance) but there are cases where we can (as you indicate above). One day, I would like to support --column with --and, --or, or --not in cases where there _is_ a definite answer. That said, omitting this information for now and at least not die()-ing I think is worth taking this patch earlier, and leaving some #leftoverbits. > $ git grep -v second > $ git grep --not -e second > > may hit all lines in this message (except for the obvious two > lines), but we cannot say which column we found a hit. I am > wondering if it is too grave a sin to report "the whole line is what > satisfied the criteria given" and say the match lies at column #1. I think that is sensible. I previously was opposed to this because I thought that it would be too difficult to script around the 'sometimes we have columns but other times not' and 'I gave --column' but have to check whether or not they are really there. I no longer believe that my above argument is sound. It simplifies the matter greatly to simply not share columns when we don't have a good answer, and do when we do. In other terms: * not giving '--column' will _never_ give a column, * '--column --invert' will _always_ die(), and * '--column --[and | or | not]' will _never_ give a column. Thanks, Taylor
[PATCH v2 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'
Teach GNU grep(1)'s '-o' ('--only-matching') to 'git-grep'. This option prints only the matching components of each line. It writes multiple lines if more than one match exists on a given line. For example: $ /git grep -on --column git -- README.md | head -3 README.md:15:56:git README.md:18:20:git README.md:19:16:git By using show_line_header(), 'git grep --only-matching' correctly respects the '--heading' option: $ git grep -on --column --heading git -- README.md | head -4 README.md 15:56:git 18:20:git 19:16:git We mirror GNU grep's behavior when given -A, -B, or -C with --only-matching, by displaying only the matching portion(s) of a line, ignoring contextual line(s), but displaying '--' (context separator) line(s). Notably: when show_line() is called on a line that contains _multiple_ matches, we keep track of a relative offset from the beginning of the line and increment 'cno' in subsequent calls to show_line_header() when the expression is not extended. In the extended case, we do not do this, because the column of the first match is undefined, thus relative offsets are meaningless. Signed-off-by: Taylor Blau--- Documentation/git-grep.txt | 6 +++- builtin/grep.c | 1 + grep.c | 34 +-- grep.h | 1 + t/t7810-grep.sh| 69 ++ 5 files changed, 107 insertions(+), 4 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index c48a578cb1..5c09abec4a 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -20,7 +20,7 @@ SYNOPSIS [-c | --count] [--all-match] [-q | --quiet] [--max-depth ] [--color[=] | --no-color] - [--break] [--heading] [-p | --show-function] + [--break] [--heading] [-o | --only-matching] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] [--threads ] @@ -223,6 +223,10 @@ providing this option will cause it to die. Show the filename above the matches in that file instead of at the start of each shown line. +--o:: +--only-matching:: + Prints only the matching part of the lines. + -p:: --show-function:: Show the preceding line that contains the function name of diff --git a/builtin/grep.c b/builtin/grep.c index f9f516dfc4..0507ac335a 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -851,6 +851,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) N_("print empty line between matches from different files")), OPT_BOOL(0, "heading", , N_("show filename only once above matches from same file")), + OPT_BOOL('o', "only-matching", _matching, N_("show only matches")), OPT_GROUP(""), OPT_CALLBACK('C', "context", , N_("n"), N_("show context lines before and after matches"), diff --git a/grep.c b/grep.c index 36bf7cf08d..9297fde643 100644 --- a/grep.c +++ b/grep.c @@ -1422,12 +1422,24 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, opt->output(opt, "\n", 1); } } + if (opt->only_matching && sign != ':') { + /* +* If we're given '--only-matching' and the line is a contextual +* one (i.e., we're given '-A', '-B', or '-C'), mark the line as +* shown (to advance opt->last_shown), but do not show it (since +* we are given '--only-matching'). +*/ + opt->last_shown = lno; + return; + } show_line_header(opt, name, lno, cno, sign); - if (opt->color) { + if (opt->color || opt->only_matching) { regmatch_t match; enum grep_context ctx = GREP_CONTEXT_BODY; int ch = *eol; int eflags = 0; + int first = 1; + int offset = 1; if (sign == ':') match_color = opt->color_match_selected; @@ -1444,16 +1456,32 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, if (match.rm_so == match.rm_eo) break; - output_color(opt, bol, match.rm_so, line_color); + if (!opt->only_matching) { + output_color(opt, bol, match.rm_so, line_color); + } else if (!first) { + /* +* We are given --only-matching, and this is not +* the first match on a line. Reprint the +* newline and header before showing another +* match. +*/ +
[PATCH v2 0/2] builtin/grep.c: teach '-o', '--only-matching'
Hi, Attached is the second re-roll of my series to add GNU grep's '--only-matching' to git-grep. The main thing that has changed since last time is our handling of -{A,B,C}. Previously, as Peff points out in [1], we handle this in a buggy way different than GNU. I agree that although 'git grep -C -o ...' is an unusual invocation, it is useful to (1) maintain as much consistency as reasonably makes sense, and (2) to at least not be buggy. I have also responded to Eric's suggestions in [2], and [3]. Thanks as always for your kind review :-). Thanks, Taylor [1]: https://public-inbox.org/git/20180510064014.ga31...@sigill.intra.peff.net [2]: https://public-inbox.org/git/capig+csrjww4-7vj6wk8aofnb20bqucsooysjdpci1r5vb8...@mail.gmail.com [3]: https://public-inbox.org/git/capig+crbbz+qtqgiw_wq9e-groa-wtevp1vcrqmj5yqj8ty...@mail.gmail.com Taylor Blau (2): grep.c: extract show_line_header() builtin/grep.c: teach '-o', '--only-matching' to 'git-grep' Documentation/git-grep.txt | 6 ++- builtin/grep.c | 1 + grep.c | 78 +++--- grep.h | 1 + t/t7810-grep.sh| 69 + 5 files changed, 132 insertions(+), 23 deletions(-) -- 2.17.0
[PATCH v2 1/2] grep.c: extract show_line_header()
Teach 'git-grep(1)' how to print a line header multiple times from show_line() in preparation for it learning '--only-matching'. show_line_header() comprises the code in show_line() executed in order to produce a line header. It is a one-for-one extraction of the existing implementation. For now, show_line_header() provides no benefit over the change before this patch. The following patch will conditionally call show_line_header() multiple times per invocation to show_line(), which is the desired benefit of this change. Signed-off-by: Taylor Blau--- grep.c | 44 +--- 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/grep.c b/grep.c index 5ba1b05526..36bf7cf08d 100644 --- a/grep.c +++ b/grep.c @@ -1369,26 +1369,9 @@ static int next_match(struct grep_opt *opt, char *bol, char *eol, return hit; } -static void show_line(struct grep_opt *opt, char *bol, char *eol, - const char *name, unsigned lno, unsigned cno, char sign) +static void show_line_header(struct grep_opt *opt, const char *name, + unsigned lno, unsigned cno, char sign) { - int rest = eol - bol; - const char *match_color, *line_color = NULL; - - if (opt->file_break && opt->last_shown == 0) { - if (opt->show_hunk_mark) - opt->output(opt, "\n", 1); - } else if (opt->pre_context || opt->post_context || opt->funcbody) { - if (opt->last_shown == 0) { - if (opt->show_hunk_mark) { - output_color(opt, "--", 2, opt->color_sep); - opt->output(opt, "\n", 1); - } - } else if (lno > opt->last_shown + 1) { - output_color(opt, "--", 2, opt->color_sep); - opt->output(opt, "\n", 1); - } - } if (opt->heading && opt->last_shown == 0) { output_color(opt, name, strlen(name), opt->color_filename); opt->output(opt, "\n", 1); @@ -1417,6 +1400,29 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, output_color(opt, buf, strlen(buf), opt->color_columnno); output_sep(opt, sign); } +} + +static void show_line(struct grep_opt *opt, char *bol, char *eol, + const char *name, unsigned lno, unsigned cno, char sign) +{ + int rest = eol - bol; + const char *match_color, *line_color = NULL; + + if (opt->file_break && opt->last_shown == 0) { + if (opt->show_hunk_mark) + opt->output(opt, "\n", 1); + } else if (opt->pre_context || opt->post_context || opt->funcbody) { + if (opt->last_shown == 0) { + if (opt->show_hunk_mark) { + output_color(opt, "--", 2, opt->color_sep); + opt->output(opt, "\n", 1); + } + } else if (lno > opt->last_shown + 1) { + output_color(opt, "--", 2, opt->color_sep); + opt->output(opt, "\n", 1); + } + } + show_line_header(opt, name, lno, cno, sign); if (opt->color) { regmatch_t match; enum grep_context ctx = GREP_CONTEXT_BODY; -- 2.17.0
[PATCH v6 4/7] grep.c: display column number of first match
To prepare for 'git grep' learning '--column', teach grep.c's show_line() how to show the column of the first match on non-context line. Signed-off-by: Taylor Blau--- grep.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/grep.c b/grep.c index fb0fa23231..f3fe416791 100644 --- a/grep.c +++ b/grep.c @@ -1364,7 +1364,7 @@ static int next_match(struct grep_opt *opt, char *bol, char *eol, } static void show_line(struct grep_opt *opt, char *bol, char *eol, - const char *name, unsigned lno, char sign) + const char *name, unsigned lno, unsigned cno, char sign) { int rest = eol - bol; const char *match_color, *line_color = NULL; @@ -1399,6 +1399,17 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, output_color(opt, buf, strlen(buf), opt->color_lineno); output_sep(opt, sign); } + /* +* Treat 'cno' as the 1-indexed offset from the start of a non-context +* line to its first match. Otherwise, 'cno' is 0 indicating that we are +* being called with a context line. +*/ + if (opt->columnnum && cno) { + char buf[32]; + xsnprintf(buf, sizeof(buf), "%d", cno); + output_color(opt, buf, strlen(buf), opt->color_columnno); + output_sep(opt, sign); + } if (opt->color) { regmatch_t match; enum grep_context ctx = GREP_CONTEXT_BODY; @@ -1504,7 +1515,7 @@ static void show_funcname_line(struct grep_opt *opt, struct grep_source *gs, break; if (match_funcname(opt, gs, bol, eol)) { - show_line(opt, bol, eol, gs->name, lno, '='); + show_line(opt, bol, eol, gs->name, lno, 0, '='); break; } } @@ -1569,7 +1580,7 @@ static void show_pre_context(struct grep_opt *opt, struct grep_source *gs, while (*eol != '\n') eol++; - show_line(opt, bol, eol, gs->name, cur, sign); + show_line(opt, bol, eol, gs->name, cur, 0, sign); bol = eol + 1; cur++; } @@ -1833,7 +1844,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle show_pre_context(opt, gs, bol, eol, lno); else if (opt->funcname) show_funcname_line(opt, gs, bol, lno); - show_line(opt, bol, eol, gs->name, lno, ':'); + show_line(opt, bol, eol, gs->name, lno, match.rm_so+1, ':'); last_hit = lno; if (opt->funcbody) show_function = 1; @@ -1862,7 +1873,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle /* If the last hit is within the post context, * we need to show this line. */ - show_line(opt, bol, eol, gs->name, lno, '-'); + show_line(opt, bol, eol, gs->name, lno, match.rm_so+1, '-'); } next_line: -- 2.17.0
[PATCH v6 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Teach 'git-grep(1)' a new option, '--column', to show the column number of the first match on a non-context line. This makes it possible to teach 'contrib/git-jump/git-jump' how to seek to the first matching position of a grep match in your editor, and allows similar additional scripting capabilities. For example: $ git grep -n --column foo | head -n3 .clang-format:51:14:# myFunction(foo, bar, baz); .clang-format:64:7:# int foo(); .clang-format:75:8:# void foo() Signed-off-by: Taylor Blau--- Documentation/git-grep.txt | 7 ++- builtin/grep.c | 4 grep.c | 5 +++-- t/t7810-grep.sh| 39 ++ 4 files changed, 52 insertions(+), 3 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 18b494731f..cec4665df5 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -13,7 +13,7 @@ SYNOPSIS [-v | --invert-match] [-h|-H] [--full-name] [-E | --extended-regexp] [-G | --basic-regexp] [-P | --perl-regexp] - [-F | --fixed-strings] [-n | --line-number] + [-F | --fixed-strings] [-n | --line-number] [--column] [-l | --files-with-matches] [-L | --files-without-match] [(-O | --open-files-in-pager) []] [-z | --null] @@ -169,6 +169,11 @@ providing this option will cause it to die. --line-number:: Prefix the line number to matching lines. +--column:: + Prefix the 1-indexed byte-offset of the first match from the start of the + matching line. This option is incompatible with '--invert-match', and + ignored with expressions using '--and', '--or', '--not'. + -l:: --files-with-matches:: --name-only:: diff --git a/builtin/grep.c b/builtin/grep.c index 5f32d2ce84..f9f516dfc4 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) GREP_PATTERN_TYPE_PCRE), OPT_GROUP(""), OPT_BOOL('n', "line-number", , N_("show line numbers")), + OPT_BOOL(0, "column", , N_("show column number of first match")), OPT_NEGBIT('h', NULL, , N_("don't show filenames"), 1), OPT_BIT('H', NULL, , N_("show filenames"), 1), OPT_NEGBIT(0, "full-name", , @@ -,6 +1112,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix) hit = grep_objects(, , the_repository, ); } + if (opt.columnnum && opt.invert) + die(_("--column and --invert-match cannot be combined")); + if (num_threads) hit |= wait_all(); if (hit && show_in_pager) diff --git a/grep.c b/grep.c index f3fe416791..7396b49a2d 100644 --- a/grep.c +++ b/grep.c @@ -1402,9 +1402,10 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, /* * Treat 'cno' as the 1-indexed offset from the start of a non-context * line to its first match. Otherwise, 'cno' is 0 indicating that we are -* being called with a context line. +* being called with a context line, or we are --extended, and cannot +* always show an answer. */ - if (opt->columnnum && cno) { + if (opt->columnnum && sign == ':' && !opt->extended) { char buf[32]; xsnprintf(buf, sizeof(buf), "%d", cno); output_color(opt, buf, strlen(buf), opt->color_columnno); diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 1797f632a3..491b2e044a 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -99,6 +99,40 @@ do test_cmp expected actual ' + test_expect_success "grep -w $L (with --column)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -e mmap $H >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --column, -C)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file-foo_mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -C1 -e mmap $H >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --line-number, --column)" ' + { + echo ${HC}file:1:5:foo mmap bar + echo ${HC}file:3:14:foo_mmap bar mmap + echo
[PATCH v6 7/7] contrib/git-jump/git-jump: jump to match column in addition to line
Take advantage of 'git-grep(1)''s new option, '--column' in order to teach Peff's 'git-jump' script how to jump to the correct column for any given match. 'git-grep(1)''s output is in the correct format for Vim's jump list, so no additional cleanup is necessary. Signed-off-by: Taylor Blau--- contrib/git-jump/README | 12 ++-- contrib/git-jump/git-jump | 2 +- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/contrib/git-jump/README b/contrib/git-jump/README index 4484bda410..2f618a7f97 100644 --- a/contrib/git-jump/README +++ b/contrib/git-jump/README @@ -25,6 +25,13 @@ git-jump will feed this to the editor: foo.c:2: printf("hello word!\n"); --- +Or, when running 'git jump grep', column numbers will also be emitted, +e.g. `git jump grep "hello"` would return: + +--- +foo.c:2:9: printf("hello word!\n"); +--- + Obviously this trivial case isn't that interesting; you could just open `foo.c` yourself. But when you have many changes scattered across a project, you can use the editor's support to "jump" from point to point. @@ -35,7 +42,8 @@ Git-jump can generate four types of interesting lists: 2. The beginning of any merge conflict markers. - 3. Any grep matches. + 3. Any grep matches, including the column of the first match on a + line. 4. Any whitespace errors detected by `git diff --check`. @@ -82,7 +90,7 @@ which does something similar to `git jump grep`. However, it is limited to positioning the cursor to the correct line in only the first file, leaving you to locate subsequent hits in that file or other files using the editor or pager. By contrast, git-jump provides the editor with a -complete list of files and line numbers for each match. +complete list of files, lines, and a column number for each match. Limitations diff --git a/contrib/git-jump/git-jump b/contrib/git-jump/git-jump index 80ab0590bc..931b0fe3a9 100755 --- a/contrib/git-jump/git-jump +++ b/contrib/git-jump/git-jump @@ -52,7 +52,7 @@ mode_merge() { # editor shows them to us in the status bar. mode_grep() { cmd=$(git config jump.grepCmd) - test -n "$cmd" || cmd="git grep -n" + test -n "$cmd" || cmd="git grep -n --column" $cmd "$@" | perl -pe ' s/[ \t]+/ /g; -- 2.17.0
[PATCH v6 6/7] grep.c: add configuration variables to show matched option
To support git-grep(1)'s new option, '--column', document and teach grep.c how to interpret relevant configuration options, similar to those associated with '--line-number'. Signed-off-by: Taylor Blau--- Documentation/config.txt | 5 + Documentation/git-grep.txt | 3 +++ grep.c | 6 ++ 3 files changed, 14 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 6e8d969f52..b3c861c5c3 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1159,6 +1159,8 @@ color.grep.:: function name lines (when using `-p`) `lineNumber`;; line number prefix (when using `-n`) +`column`;; + column number prefix (when using `--column`) `match`;; matching text (same as setting `matchContext` and `matchSelected`) `matchContext`;; @@ -1708,6 +1710,9 @@ gitweb.snapshot:: grep.lineNumber:: If set to true, enable `-n` option by default. +grep.column:: + If set to true, enable the `--column` option by default. + grep.patternType:: Set the default matching behavior. Using a value of 'basic', 'extended', 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index cec4665df5..c48a578cb1 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -44,6 +44,9 @@ CONFIGURATION grep.lineNumber:: If set to true, enable `-n` option by default. +grep.column:: + If set to true, enable the `--column` option by default. + grep.patternType:: Set the default matching behavior. Using a value of 'basic', 'extended', 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, diff --git a/grep.c b/grep.c index 7396b49a2d..5ba1b05526 100644 --- a/grep.c +++ b/grep.c @@ -96,6 +96,10 @@ int grep_config(const char *var, const char *value, void *cb) opt->linenum = git_config_bool(var, value); return 0; } + if (!strcmp(var, "grep.column")) { + opt->columnnum = git_config_bool(var, value); + return 0; + } if (!strcmp(var, "grep.fullname")) { opt->relative = !git_config_bool(var, value); @@ -112,6 +116,8 @@ int grep_config(const char *var, const char *value, void *cb) color = opt->color_function; else if (!strcmp(var, "color.grep.linenumber")) color = opt->color_lineno; + else if (!strcmp(var, "color.grep.column")) + color = opt->color_columnno; else if (!strcmp(var, "color.grep.matchcontext")) color = opt->color_match_context; else if (!strcmp(var, "color.grep.matchselected")) -- 2.17.0
[PATCH v6 1/7] Documentation/config.txt: camel-case lineNumber for consistency
lineNumber has casing that is inconsistent with surrounding options, like color.grep.matchContext, and color.grep.matchSelected. Re-case this documentation in order to be consistent with the text around it, and to ensure that new entries are consistent, too. Signed-off-by: Taylor Blau--- Documentation/config.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 2659153cb3..6e8d969f52 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1157,7 +1157,7 @@ color.grep.:: filename prefix (when not using `-h`) `function`;; function name lines (when using `-p`) -`linenumber`;; +`lineNumber`;; line number prefix (when using `-n`) `match`;; matching text (same as setting `matchContext` and `matchSelected`) -- 2.17.0
[PATCH v6 2/7] grep.c: expose matched column in match_line()
When calling match_line(), callers presently cannot determine the relative offset of the match because match_line() discards the 'regmatch_t' that contains this information. Instead, teach match_line() to take in a 'regmatch_t *' so that callers can inspect the match's starting and ending offset from the beginning of the line. This additional argument has no effect when opt->extended is non-zero. We will later pass the starting offset from 'regmatch_t *' to show_line() in order to display the column number of the first match. Signed-off-by: Taylor Blau--- grep.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/grep.c b/grep.c index 65b90c10a3..1c25782355 100644 --- a/grep.c +++ b/grep.c @@ -1299,17 +1299,17 @@ static int match_expr(struct grep_opt *opt, char *bol, char *eol, } static int match_line(struct grep_opt *opt, char *bol, char *eol, - enum grep_context ctx, int collect_hits) + regmatch_t *match, enum grep_context ctx, + int collect_hits) { struct grep_pat *p; - regmatch_t match; if (opt->extended) return match_expr(opt, bol, eol, ctx, collect_hits); /* we do not call with collect_hits without being extended */ for (p = opt->pattern_list; p; p = p->next) { - if (match_one_pattern(p, bol, eol, ctx, , 0)) + if (match_one_pattern(p, bol, eol, ctx, match, 0)) return 1; } return 0; @@ -1699,6 +1699,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle int try_lookahead = 0; int show_function = 0; struct userdiff_driver *textconv = NULL; + regmatch_t match; enum grep_context ctx = GREP_CONTEXT_HEAD; xdemitconf_t xecfg; @@ -1788,7 +1789,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle if ((ctx == GREP_CONTEXT_HEAD) && (eol == bol)) ctx = GREP_CONTEXT_BODY; - hit = match_line(opt, bol, eol, ctx, collect_hits); + hit = match_line(opt, bol, eol, , ctx, collect_hits); *eol = ch; if (collect_hits) -- 2.17.0
[PATCH v6 3/7] grep.[ch]: extend grep_opt to allow showing matched column
To support showing the matched column when calling 'git-grep(1)', teach 'grep_opt' the normal set of options to configure the default behavior and colorization of this feature. Signed-off-by: Taylor Blau--- grep.c | 3 +++ grep.h | 2 ++ 2 files changed, 5 insertions(+) diff --git a/grep.c b/grep.c index 1c25782355..fb0fa23231 100644 --- a/grep.c +++ b/grep.c @@ -46,6 +46,7 @@ void init_grep_defaults(void) color_set(opt->color_filename, ""); color_set(opt->color_function, ""); color_set(opt->color_lineno, ""); + color_set(opt->color_columnno, ""); color_set(opt->color_match_context, GIT_COLOR_BOLD_RED); color_set(opt->color_match_selected, GIT_COLOR_BOLD_RED); color_set(opt->color_selected, ""); @@ -155,6 +156,7 @@ void grep_init(struct grep_opt *opt, const char *prefix) opt->extended_regexp_option = def->extended_regexp_option; opt->pattern_type_option = def->pattern_type_option; opt->linenum = def->linenum; + opt->columnnum = def->columnnum; opt->max_depth = def->max_depth; opt->pathname = def->pathname; opt->relative = def->relative; @@ -164,6 +166,7 @@ void grep_init(struct grep_opt *opt, const char *prefix) color_set(opt->color_filename, def->color_filename); color_set(opt->color_function, def->color_function); color_set(opt->color_lineno, def->color_lineno); + color_set(opt->color_columnno, def->color_columnno); color_set(opt->color_match_context, def->color_match_context); color_set(opt->color_match_selected, def->color_match_selected); color_set(opt->color_selected, def->color_selected); diff --git a/grep.h b/grep.h index 399381c908..08a0b391c5 100644 --- a/grep.h +++ b/grep.h @@ -127,6 +127,7 @@ struct grep_opt { int prefix_length; regex_t regexp; int linenum; + int columnnum; int invert; int ignore_case; int status_only; @@ -159,6 +160,7 @@ struct grep_opt { char color_filename[COLOR_MAXLEN]; char color_function[COLOR_MAXLEN]; char color_lineno[COLOR_MAXLEN]; + char color_columnno[COLOR_MAXLEN]; char color_match_context[COLOR_MAXLEN]; char color_match_selected[COLOR_MAXLEN]; char color_selected[COLOR_MAXLEN]; -- 2.17.0
[PATCH v6 0/7] Teach '--column' to 'git-grep(1)'
Hi, Attached is my sixth re-roll of a series to add '--column' to 'git grep'. The main change since v5 is supporting --column with queries containing --and, --or, or --not. Previously, I had chosen to die() in this case since there isn't always a good answer to "what is the first column of ?" but have gone back on this for two reasons: 1. It is important not to regress calls to git-jump/contrib/git-jump that contain --and, --or, or --not. 2. It is not that hard to detect the absence of column data in scripts. Likewise, git-jump will happily accept lines with or without columnar information, and Vim will accept it as-is. So, let's support --column and only die() when also given --invert-match. When we don't have a good answer, print nothing. Thanks, Taylor Taylor Blau (7): Documentation/config.txt: camel-case lineNumber for consistency grep.c: expose matched column in match_line() grep.[ch]: extend grep_opt to allow showing matched column grep.c: display column number of first match builtin/grep.c: add '--column' option to 'git-grep(1)' grep.c: add configuration variables to show matched option contrib/git-jump/git-jump: jump to match column in addition to line Documentation/config.txt | 7 ++- Documentation/git-grep.txt | 10 +- builtin/grep.c | 4 contrib/git-jump/README| 12 ++-- contrib/git-jump/git-jump | 2 +- grep.c | 40 +- grep.h | 2 ++ t/t7810-grep.sh| 39 + 8 files changed, 102 insertions(+), 14 deletions(-) Inter-diff (since v5): diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index dc8f76ce99..c48a578cb1 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -173,8 +173,9 @@ providing this option will cause it to die. Prefix the line number to matching lines. --column:: - Prefix the 1-indexed byte-offset of the first match on non-context lines. This - option is incompatible with '--invert-match', and extended expressions. + Prefix the 1-indexed byte-offset of the first match from the start of the + matching line. This option is incompatible with '--invert-match', and + ignored with expressions using '--and', '--or', '--not'. -l:: --files-with-matches:: diff --git a/grep.c b/grep.c index 5d904810ad..5ba1b05526 100644 --- a/grep.c +++ b/grep.c @@ -1001,9 +1001,6 @@ static void compile_grep_patterns_real(struct grep_opt *opt) else if (!opt->extended && !opt->debug) return; - if (opt->columnnum && opt->extended) - die(_("--column and extended expressions cannot be combined")); - p = opt->pattern_list; if (p) opt->pattern_expression = compile_pattern_expr(); @@ -1411,9 +1408,10 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, /* * Treat 'cno' as the 1-indexed offset from the start of a non-context * line to its first match. Otherwise, 'cno' is 0 indicating that we are -* being called with a context line. +* being called with a context line, or we are --extended, and cannot +* always show an answer. */ - if (opt->columnnum && cno) { + if (opt->columnnum && sign == ':' && !opt->extended) { char buf[32]; xsnprintf(buf, sizeof(buf), "%d", cno); output_color(opt, buf, strlen(buf), opt->color_columnno); diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index aa56b21ed9..491b2e044a 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -110,6 +110,18 @@ do test_cmp expected actual ' + test_expect_success "grep -w $L (with --column, -C)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file-foo_mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -C1 -e mmap $H >actual && + test_cmp expected actual + ' + test_expect_success "grep -w $L (with --line-number, --column)" ' { echo ${HC}file:1:5:foo mmap bar @@ -1617,9 +1629,4 @@ test_expect_success 'grep does not allow --column, --invert-match' ' test_i18ngrep "\-\-column and \-\-invert-match cannot be combined" err ' -test_expect_success 'grep does not allow --column, extended' ' - test_must_fail git grep --column --not -e pat 2>err && - test_i18ngrep "\-\-column and extended expressions cannot be combined" err -' - test_done -- 2.17.0
Bug report for git push
Hi everyone! I am facing this spurious issue (not easily reproducible and usually a retry fixes it) with git push: Warning: Permanently added 'github.com,192.30.255.112' (RSA) to the list of known hosts. Counting objects: 8, done. Delta compression using up to 2 threads. Compressing objects: 100% (8/8), done. Writing objects: 100% (8/8), 971 bytes | 971.00 KiB/s, done. Total 8 (delta 7), reused 0 (delta 0) remote: Resolving deltas: 0% (0/7) remote: Resolving deltas: 14% (1/7) remote: Resolving deltas: 28% (2/7) remote: Resolving deltas: 42% (3/7) remote: Resolving deltas: 57% (4/7) remote: Resolving deltas: 71% (5/7) remote: Resolving deltas: 85% (6/7) remote: Resolving deltas: 100% (7/7) remote: Resolving deltas: 100% (7/7), completed with 7 local objects. error: failed to push some refs to 'g...@github.com:quora/qcore.git' [May 12 12:14 AM remote._get_push_info] Error lines received while fetching: error: failed to push some refs to 'g...@github.com:quora/qcore.git' Push flags: 1040 Push summary: [remote rejected] (cannot lock ref 'refs/heads/master': is at cf2cc0c147d8215ec87d3ddaf32f0b2c58630423 but expected fdda486ad43a6e6b5dc5f2795ce27124e0686752) Remote repo rejected your commit. This is happening in git version 2.17.0 I've tried searching stack overflow and the git mailing list but the answers aren't recent enough or don’t seem to be permanent fixes. How do I fix this issue?
Re: [PATCH] doc: fix config API documentation about config_with_options
Brandon Williamswrites: > On 05/09, Antonio Ospite wrote: >> In commit dc8441fdb ("config: don't implicitly use gitdir or commondir", >> 2017-06-14) the function git_config_with_options was renamed to >> config_with_options to better reflect the fact that it does not access >> the git global config or the repo config by default. >> >> However Documentation/technical/api-config.txt still refers to the >> previous name, fix that. >> >> While at it also update the documentation about the extra parameters, >> because they too changed since the initial definition. >> >> Signed-off-by: Antonio Ospite >> --- >> >> Patch based on the maint branch. > > Thanks for updating the docs. Maybe one day we can migrate these docs > to the source files themselves, making it easier to keep up to date. > For now this is good :) Thanks, both.
Re: [PATCH] git-submodule.sh: try harder to fetch a submodule
Jonathan Niederwrites: >> HEAD is allowed by the protocol spec and would happen, if HEAD points at a >> ref, that this user cannot see (due to ACLs for example). > > A more typical example would be if the ref simply doesn't exist (i.e., > is a branch yet to be born). Indeed this is interesting. At first glance I thought this was about underlying "git clone" failing to grab things from a repository with unborn HEAD, but that part works perfectly OK. And if this failed clone were a full-repository clone that tried to grab even HEAD, then it is likely that we got the tip we need to populate the submodule's working tree (or the remote is useless for that in the first place). So the "allow to try even harder" is probably a good direction to go in. >> # is not reachable from a ref. >> is_tip_reachable "$sm_path" "$sha1" || >> fetch_in_submodule "$sm_path" $depth || > > Is keeping the '||' at the end of this line intended? Good question. It used to be guard1 || action1 || die guard2 || action2 || die Even after a successful exit from "action1", the code used to try the second attempt, and the patch is removing the first die, making the whole thing into guard1 || action1 || guard2 || action2 || die which suggests a grave regression, doesn't it? "action1" (a whole repository fetch) may not pull down the needed commit even the fetch operation itself may succeed, in which case "guard2" notices that the tip is still not here and "action2" (an exact SHA-1 fetch) tries to pull down the exact thing as the last resort. So it probably should be more like guard1 || action1 || warn guard2 || action2 || die so that no matter what the outcome of the action1 is, the second set gets executed.
Re: [PATCH] git-submodule.sh: try harder to fetch a submodule
On Fri, May 11, 2018 at 4:28 PM, Jonathan Niederwrote: > Hi, > > Stefan Beller wrote: > >> This is the logical continuum of fb43e31f2b4 (submodule: try harder to >> fetch needed sha1 by direct fetching sha1, 2016-02-23) and fixes it as >> some assumptions were not correct. > > Interesting. > > I think what would help most is an example set of commands I can use > to reproduce this (bonus points if in the form of a test). I tried coming up with a test in git-core that produces a remote similar to Gerrit, and let me tell you, it's not Git that is weird here. ;) >> > If $sha1 was not part of the default fetch ... fail ourselves here >> assumes that the fetch_in_submodule only fails when the serverside does > > I'm having some trouble with the formatting here. Is the part > preceded by '>' a quote, and if so a quote from what? The quote is from fb43e31f2b4. >> There are other failures, why such a fetch may fail, such as >> fatal: Couldn't find remote ref HEAD >> which can happen if the remote side doesn't advertise HEAD. Not advertising > > nit: it can be useful to have a blank line before and after such > example output to help both my eyes and tools like "git log > --format='%w(100)%b'" to understand the formatting. Why would you use this formatting? > >> HEAD is allowed by the protocol spec and would happen, if HEAD points at a >> ref, that this user cannot see (due to ACLs for example). > > A more typical example would be if the ref simply doesn't exist (i.e., > is a branch yet to be born). Oh, I checked that but not for the submodule case, let me check that. >> diff --git a/git-submodule.sh b/git-submodule.sh >> index 24914963ca2..13b378a6c8f 100755 >> --- a/git-submodule.sh >> +++ b/git-submodule.sh >> @@ -614,7 +614,6 @@ cmd_update() >> # is not reachable from a ref. >> is_tip_reachable "$sm_path" "$sha1" || >> fetch_in_submodule "$sm_path" $depth || > > Is keeping the '||' at the end of this line intended? yes, as we only want to run the code below if there was some error. >> - die "$(eval_gettext "Unable to fetch in >> submodule path '\$displaypath'")" >> >> # Now we tried the usual fetch, but $sha1 may >> # not be reachable from any of the refs >> is_tip_reachable "$sm_path" "$sha1" || >> fetch_in_submodule "$sm_path" $depth "$sha1" || >> die "$(eval_gettext "Fetched in submodule path >> '\$displaypath', but it did not contain \$sha1. Direct fetching of that >> commit failed.")" > > Should this error message be changed? I don't think so?
Proposal
-- Hello Greetings to you please i have a business proposal for you contact me for more detailes asap thanks. Best Regards, Miss.Zeliha ömer faruk Esentepe Mahallesi Büyükdere Caddesi Kristal Kule Binasi No:215 Sisli - Istanbul, Turkey
Re: [PATCH] git-submodule.sh: try harder to fetch a submodule
Hi, Stefan Beller wrote: > This is the logical continuum of fb43e31f2b4 (submodule: try harder to > fetch needed sha1 by direct fetching sha1, 2016-02-23) and fixes it as > some assumptions were not correct. Interesting. I think what would help most is an example set of commands I can use to reproduce this (bonus points if in the form of a test). > > If $sha1 was not part of the default fetch ... fail ourselves here > assumes that the fetch_in_submodule only fails when the serverside does I'm having some trouble with the formatting here. Is the part preceded by '>' a quote, and if so a quote from what? > not support fetching by sha1. > > There are other failures, why such a fetch may fail, such as > fatal: Couldn't find remote ref HEAD > which can happen if the remote side doesn't advertise HEAD. Not advertising nit: it can be useful to have a blank line before and after such example output to help both my eyes and tools like "git log --format='%w(100)%b'" to understand the formatting. > HEAD is allowed by the protocol spec and would happen, if HEAD points at a > ref, that this user cannot see (due to ACLs for example). A more typical example would be if the ref simply doesn't exist (i.e., is a branch yet to be born). > So do try even harder for a submodule by ignoring the exit code of the > first fetch and rather relying on the following is_tip_reachable to > see if we try fetching again. > > Signed-off-by: Stefan Beller> --- > git-submodule.sh | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/git-submodule.sh b/git-submodule.sh > index 24914963ca2..13b378a6c8f 100755 > --- a/git-submodule.sh > +++ b/git-submodule.sh > @@ -614,7 +614,6 @@ cmd_update() > # is not reachable from a ref. > is_tip_reachable "$sm_path" "$sha1" || > fetch_in_submodule "$sm_path" $depth || Is keeping the '||' at the end of this line intended? > - die "$(eval_gettext "Unable to fetch in > submodule path '\$displaypath'")" > > # Now we tried the usual fetch, but $sha1 may > # not be reachable from any of the refs > is_tip_reachable "$sm_path" "$sha1" || > fetch_in_submodule "$sm_path" $depth "$sha1" || > die "$(eval_gettext "Fetched in submodule path > '\$displaypath', but it did not contain \$sha1. Direct fetching of that > commit failed.")" Should this error message be changed? Thanks, Jonathan
[PATCH] git-submodule.sh: try harder to fetch a submodule
This is the logical continuum of fb43e31f2b4 (submodule: try harder to fetch needed sha1 by direct fetching sha1, 2016-02-23) and fixes it as some assumptions were not correct. > If $sha1 was not part of the default fetch ... fail ourselves here assumes that the fetch_in_submodule only fails when the serverside does not support fetching by sha1. There are other failures, why such a fetch may fail, such as fatal: Couldn't find remote ref HEAD which can happen if the remote side doesn't advertise HEAD. Not advertising HEAD is allowed by the protocol spec and would happen, if HEAD points at a ref, that this user cannot see (due to ACLs for example). So do try even harder for a submodule by ignoring the exit code of the first fetch and rather relying on the following is_tip_reachable to see if we try fetching again. Signed-off-by: Stefan Beller--- git-submodule.sh | 1 - 1 file changed, 1 deletion(-) diff --git a/git-submodule.sh b/git-submodule.sh index 24914963ca2..13b378a6c8f 100755 --- a/git-submodule.sh +++ b/git-submodule.sh @@ -614,7 +614,6 @@ cmd_update() # is not reachable from a ref. is_tip_reachable "$sm_path" "$sha1" || fetch_in_submodule "$sm_path" $depth || - die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")" # Now we tried the usual fetch, but $sha1 may # not be reachable from any of the refs -- 2.17.0.582.gccdcbd54c44.dirty
Re: [PATCH 7/9] completion: drop the hard coded list of config vars
On Thu, May 10, 2018 at 10:19 AM Nguyễn Thái Ngọc Duywrote: > The new help option --config-for-completion is a machine friendlier > version of --config where all the placeholders and wildcards are > dropped, leaving only the good, completable prefixes for > git-completion.bash to consume. > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > diff --git a/builtin/help.c b/builtin/help.c > @@ -446,9 +447,13 @@ int cmd_help(int argc, const char **argv, const char *prefix) > + int for_human = show_config == 1; > + > + if (for_human) > + setup_pager(); > + list_config_help(for_human); > + if (for_human) > + printf("\n%s\n", _("'git help config' for more information")); Simpler to read, perhaps: if (!for_human) list_config_help(0); else { setup_pager(); list_config_help(1); printf(...); }
Re: [PATCH 6/9] am: move advice.amWorkDir parsing back to advice.c
On Thu, May 10, 2018 at 10:19 AM Nguyễn Thái Ngọc Duywrote: > The only benefit from this move (apart from cleaner code) is that > advice.amWorkDir should now show up in `git help --config`. There > should be no regression since advice config is always read by the > git_default_config(). > While at there, use advise() like other code. We now get "hint: " > prefix and the output is stderr instead of stdout (which is also the > reason for the test update because stderr is checked in a following > test and the extra advice can fail it). > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > diff --git a/builtin/am.c b/builtin/am.c > @@ -1827,15 +1827,12 @@ static void am_run(struct am_state *state, int resume) > if (apply_status) { > - int advice_amworkdir = 1; Nit: Maybe also delete the blank line following the removed declaration... > printf_ln(_("Patch failed at %s %.*s"), msgnum(state), > linelen(state->msg), state->msg);
Re: [PATCH 4/9] help: add --config to list all available config
On Thu, May 10, 2018 at 10:20 AM Nguyễn Thái Ngọc Duywrote: > Sometimes it helps to list all available config vars so the user can > search for something they want. The config man page can also be used > but it's harder to search if you want to focus on the variable name, > for example. > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > diff --git a/builtin/help.c b/builtin/help.c > @@ -44,6 +45,7 @@ static struct option builtin_help_options[] = { > OPT_BOOL('g', "guides", _guides, N_("print list of useful guides")), > + OPT_BOOL('c', "config", _config, N_("print list recognized config variables")), s/list/& of/ Though, simpler might be better: "print all configuration variable names" > diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh > @@ -76,6 +76,24 @@ print_command_list () { > +print_config_list() { > + cat < +static const char *config_name_list[] = { > +EOF > + grep '^[a-zA-Z].*\..*::$' Documentation/config.txt | > + grep -v deprecated | > + sed 's/::$//; s/, */\n/g' | Nit: "grep -v" and "sed" could be combined: sed '/deprecated/d; s/::$//; s/, */\n/g' | > + sort | > + while read line > + do > + echo " \"$line\"," > + done > + cat < + NULL, > +}; > +EOF > diff --git a/help.c b/help.c > @@ -409,6 +409,54 @@ void list_common_guides_help(void) > +void list_config_help(void) > +{ > + [...] > + for (p = config_name_list; *p; p++) { > + const char *var = *p; > + > + for (e = slot_expansions; e->prefix; e++) { > + struct strbuf sb = STRBUF_INIT; > + > + strbuf_addf(, "%s.%s", e->prefix, e->placeholder); > + if (strcasecmp(var, sb.buf)) > + continue; Isn't this "continue" leaking the strbuf? It seems that it would be easier to declare the strbuf once outside the loop, strbuf_reset() it at the top of the loop, and finally strbuf_release() it after the loop exits. > + e->fn(e->prefix); > + strbuf_release(); > + e->found++; > + break; > + } > + if (!e->prefix) > + puts(var); > + }
Re: [PATCH 3/9] fsck: factor out msg_id_info[] lazy initialization code
On Thu, May 10, 2018 at 10:19 AM Nguyễn Thái Ngọc Duywrote: > This array will be used by some other function than parse_msg_id() in > the following commit. Factor out this prep code so it could be called > from that one. > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > diff --git a/fsck.c b/fsck.c > @@ -84,26 +84,32 @@ static struct { > -static int parse_msg_id(const char *text) > +static void prepare_msg_ids(void) > { > - if (!msg_id_info[0].downcased) { > - /* convert id_string to lower case, without underscores. */ > - for (i = 0; i < FSCK_MSG_MAX; i++) { > - [...] > - } > + /* convert id_string to lower case, without underscores. */ > + for (i = 0; i < FSCK_MSG_MAX; i++) { > + [...] > } > +} > + > +static int parse_msg_id(const char *text) > +{ > + if (!msg_id_info[0].downcased) > + prepare_msg_ids(); If you move the "if (!msg_id_info...)" conditional into the new parpare_msg_ids() function, then it becomes self-contained; it takes care of avoiding double-initialization so callers don't have to worry or know about it. (Doing so would also make the diff less noisy.) Not at all worth a re-roll.
Re: Implementing reftable in Git
On Fri, 2018-05-11 at 11:31 +0200, Michael Haggerty wrote: > On Wed, May 9, 2018 at 4:33 PM, Christian Couder >wrote: > > I might start working on implementing reftable in Git soon. > > [...] > > Nice. It'll be great to have a reftable implementation in git core > (and ideally libgit2, as well). It seems to me that it could someday > become the new default reference storage method. The file format is > considerably more complicated than the current loose/packed scheme, > which is definitely a disadvantage (for example, for other Git > implementations). But implementing it *with good performance and > without races* might be no more complicated than the current scheme. I am somewhat concerned about perf, because as I recall, we have a bunch of code which effectively load all refs, which will be more expensive with reftable than packed-refs (though maybe cheaper than loose refs). But maybe we have eliminated this code or can work around it. > Testing will be important. There are already many tests specifically > about testing loose/packed reference storage. These will always have > to run against repositories that are forced to use that reference > scheme. And there will need to be new tests specifically about the > reftable scheme. Both classes of tests should be run every time. That > much is pretty obvious. > > But currently, there are a lot of tests that assume the loose/packed > reference format on disk even though the tests are not really related > to references at all. ISTM that these should be converted to work at > a > higher level, for example using `for-each-ref`, `rev-parse`, etc. to > examine references rather than reading reference files directly. That > way the tests should run correctly regardless of which scheme is in > use. I agree with that, and I think some of my patches from years ago attempted to do that. I probably should have broken those out into a separate series so that they could have been applied separately. > And since it's too expensive to run the whole test suite with both > reference storage schemes, it seems to me that the reference storage > scheme that is used while running the scheme-neutral tests should be > easy to choose at runtime. I ran the whole suite with both schemes during my testing, and I think it was quite valuable in flushing out bugs. > David Turner did some analogous work for wiring up and testing his > proposed LMDB ref storage backend that might be useful [1]. I'm CCing > him, since he might have thoughts on this topic. Inline, above.
Re: [PATCH] doc: fix config API documentation about config_with_options
On Wed, 9 May 2018 10:19:50 -0700 Brandon Williamswrote: > On 05/09, Antonio Ospite wrote: > > In commit dc8441fdb ("config: don't implicitly use gitdir or commondir", > > 2017-06-14) the function git_config_with_options was renamed to > > config_with_options to better reflect the fact that it does not access > > the git global config or the repo config by default. > > > > However Documentation/technical/api-config.txt still refers to the > > previous name, fix that. > > > > While at it also update the documentation about the extra parameters, > > because they too changed since the initial definition. > > > > Signed-off-by: Antonio Ospite > > --- > > > > Patch based on the maint branch. > > Thanks for updating the docs. Maybe one day we can migrate these docs > to the source files themselves, making it easier to keep up to date. > For now this is good :) > Should I resend the patch to gits...@pobox.com with your Acked-by? Thanks, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
[PATCH v2 05/12] commit-graph: verify fanout and lookup table
While running 'git commit-graph verify', verify that the object IDs are listed in lexicographic order and that the fanout table correctly navigates into that list of object IDs. Add tests that check these corruptions are caught by the verify subcommand. Most of the tests check the full output matches the exact error we inserted, but since our OID order test triggers incorrect fanout values (with possibly different numbers of output lines) we focus only that the correct error is written in that case. Signed-off-by: Derrick Stolee--- commit-graph.c | 36 t/t5318-commit-graph.sh | 31 +++ 2 files changed, 67 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 4dfff7e752..b0fd1d5320 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -848,6 +848,9 @@ static void graph_report(const char *fmt, ...) int verify_commit_graph(struct commit_graph *g) { + uint32_t i, cur_fanout_pos = 0; + struct object_id prev_oid, cur_oid; + if (!g) { graph_report("no commit-graph file loaded"); return 1; @@ -862,5 +865,38 @@ int verify_commit_graph(struct commit_graph *g) if (!g->chunk_commit_data) graph_report("commit-graph is missing the Commit Data chunk"); + if (verify_commit_graph_error) + return 1; + + for (i = 0; i < g->num_commits; i++) { + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); + + if (i && oidcmp(_oid, _oid) >= 0) + graph_report("commit-graph has incorrect oid order: %s then %s", +oid_to_hex(_oid), +oid_to_hex(_oid)); + + oidcpy(_oid, _oid); + + while (cur_oid.hash[0] > cur_fanout_pos) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + if (i != fanout_value) + graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u", +cur_fanout_pos, fanout_value, i); + + cur_fanout_pos++; + } + } + + while (cur_fanout_pos < 256) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + + if (g->num_commits != fanout_value) + graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u", +cur_fanout_pos, fanout_value, i); + + cur_fanout_pos++; + } + return verify_commit_graph_error; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 0cb88232fa..6fb306b0da 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -293,4 +293,35 @@ test_expect_success 'detect too small chunk-count' ' grep "missing the Commit Data chunk" verify-errors ' +test_expect_success 'detect incorrect chunk lookup value' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 25 "\01" && + test_must_fail git commit-graph verify 2>err && + grep -v "^\+" err > verify-errors && + test_line_count = 1 verify-errors && + grep "improper chunk offset" verify-errors +' + +test_expect_success 'detect incorrect fanout' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 128 "\01" && + test_must_fail git commit-graph verify 2>err && + grep -v "^\+" err > verify-errors && + test_line_count = 1 verify-errors && + grep "fanout value" verify-errors +' + +test_expect_success 'detect incorrect OID order' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 1272 "\01" && + test_must_fail git commit-graph verify 2>err && + grep "incorrect oid order" err +' + test_done -- 2.16.2.329.gfb62395de6
[PATCH v2 04/12] commit-graph: parse commit from chosen graph
Before verifying a commit-graph file against the object database, we need to parse all commits from the given commit-graph file. Create parse_commit_in_graph_one() to target a given struct commit_graph. Signed-off-by: Derrick Stolee--- commit-graph.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index d2db20e49a..4dfff7e752 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -309,7 +309,7 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin } } -int parse_commit_in_graph(struct commit *item) +int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item) { uint32_t pos; @@ -317,9 +317,21 @@ int parse_commit_in_graph(struct commit *item) return 0; if (item->object.parsed) return 1; + + if (find_commit_in_graph(item, g, )) + return fill_commit_in_graph(item, g, pos); + + return 0; +} + +int parse_commit_in_graph(struct commit *item) +{ + if (!core_commit_graph) + return 0; + prepare_commit_graph(); - if (commit_graph && find_commit_in_graph(item, commit_graph, )) - return fill_commit_in_graph(item, commit_graph, pos); + if (commit_graph) + return parse_commit_in_graph_one(commit_graph, item); return 0; } -- 2.16.2.329.gfb62395de6
[PATCH v2 12/12] commit-graph: update design document
The commit-graph feature is now integrated with 'fsck' and 'gc', so remove those items from the "Future Work" section of the commit-graph design document. Also remove the section on lazy-loading trees, as that was completed in an earlier patch series. Signed-off-by: Derrick Stolee--- Documentation/technical/commit-graph.txt | 22 -- 1 file changed, 22 deletions(-) diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index e1a883eb46..c664acbd76 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -118,9 +118,6 @@ Future Work - The commit graph feature currently does not honor commit grafts. This can be remedied by duplicating or refactoring the current graft logic. -- The 'commit-graph' subcommand does not have a "verify" mode that is - necessary for integration with fsck. - - After computing and storing generation numbers, we must make graph walks aware of generation numbers to gain the performance benefits they enable. This will mostly be accomplished by swapping a commit-date-ordered @@ -130,25 +127,6 @@ Future Work - 'log --topo-order' - 'tag --merged' -- Currently, parse_commit_gently() requires filling in the root tree - object for a commit. This passes through lookup_tree() and consequently - lookup_object(). Also, it calls lookup_commit() when loading the parents. - These method calls check the ODB for object existence, even if the - consumer does not need the content. For example, we do not need the - tree contents when computing merge bases. Now that commit parsing is - removed from the computation time, these lookup operations are the - slowest operations keeping graph walks from being fast. Consider - loading these objects without verifying their existence in the ODB and - only loading them fully when consumers need them. Consider a method - such as "ensure_tree_loaded(commit)" that fully loads a tree before - using commit->tree. - -- The current design uses the 'commit-graph' subcommand to generate the graph. - When this feature stabilizes enough to recommend to most users, we should - add automatic graph writes to common operations that create many commits. - For example, one could compute a graph on 'clone', 'fetch', or 'repack' - commands. - - A server could provide a commit graph file as part of the network protocol to avoid extra calculations by clients. This feature is only of benefit if the user is willing to trust the file, because verifying the file is correct -- 2.16.2.329.gfb62395de6
[PATCH v2 09/12] fsck: verify commit-graph
If core.commitGraph is true, verify the contents of the commit-graph during 'git fsck' using the 'git commit-graph verify' subcommand. Run this check on all alternates, as well. We use a new process for two reasons: 1. The subcommand decouples the details of loading and verifying a commit-graph file from the other fsck details. 2. The commit-graph verification requires the commits to be loaded in a specific order to guarantee we parse from the commit-graph file for some objects and from the object database for others. Signed-off-by: Derrick Stolee--- Documentation/git-fsck.txt | 3 +++ builtin/fsck.c | 21 + t/t5318-commit-graph.sh| 21 ++--- 3 files changed, 42 insertions(+), 3 deletions(-) diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt index b9f060e3b2..ab9a93fb9b 100644 --- a/Documentation/git-fsck.txt +++ b/Documentation/git-fsck.txt @@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives (i.e., you can just remove them and do an 'rsync' with some other site in the hopes that somebody else has the object you have corrupted). +If core.commitGraph is true, the commit-graph file will also be inspected +using 'git commit-graph verify'. See linkgit:git-commit-graph[1]. + Extracted Diagnostics - diff --git a/builtin/fsck.c b/builtin/fsck.c index ef78c6c00c..a6d5045b77 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -16,6 +16,7 @@ #include "streaming.h" #include "decorate.h" #include "packfile.h" +#include "run-command.h" #define REACHABLE 0x0001 #define SEEN 0x0002 @@ -45,6 +46,7 @@ static int name_objects; #define ERROR_REACHABLE 02 #define ERROR_PACK 04 #define ERROR_REFS 010 +#define ERROR_COMMIT_GRAPH 020 static const char *describe_object(struct object *obj) { @@ -815,5 +817,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) } check_connectivity(); + + if (core_commit_graph) { + struct child_process commit_graph_verify = CHILD_PROCESS_INIT; + const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL, NULL }; + commit_graph_verify.argv = verify_argv; + commit_graph_verify.git_cmd = 1; + + if (run_command(_graph_verify)) + errors_found |= ERROR_COMMIT_GRAPH; + + prepare_alt_odb(); + for (alt = alt_odb_list; alt; alt = alt->next) { + verify_argv[2] = "--object-dir"; + verify_argv[3] = alt->path; + if (run_command(_graph_verify)) + errors_found |= ERROR_COMMIT_GRAPH; + } + } + return errors_found; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 5ab268a024..91c8406d97 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -205,6 +205,16 @@ test_expect_success 'build graph from commits with append' ' graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2 +test_expect_success 'build graph using --reachable' ' + cd "$TRASH_DIRECTORY/full" && + git commit-graph write --reachable && + test_path_is_file $objdir/info/commit-graph && + graph_read_expect "11" "large_edges" +' + +graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1 +graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2 + test_expect_success 'setup bare repo' ' cd "$TRASH_DIRECTORY" && git clone --bare --no-local full bare && @@ -335,7 +345,7 @@ test_expect_success 'detect OID not in object database' ' cd "$TRASH_DIRECTORY/full" && cp $objdir/info/commit-graph commit-graph-backup && test_when_finished mv commit-graph-backup $objdir/info/commit-graph && - corrupt_data $objdir/info/commit-graph 1134 "\01" && + corrupt_data $objdir/info/commit-graph 1134 "\00" && test_must_fail git commit-graph verify 2>err && grep -v "^\+" err > verify-errors && test_line_count = 3 verify-errors && @@ -348,7 +358,7 @@ test_expect_success 'detect incorrect tree OID' ' cd "$TRASH_DIRECTORY/full" && cp $objdir/info/commit-graph commit-graph-backup && test_when_finished mv commit-graph-backup $objdir/info/commit-graph && - corrupt_data $objdir/info/commit-graph 1312 "\01" && + corrupt_data $objdir/info/commit-graph 1312 "\00" && test_must_fail git commit-graph verify 2>err && grep -v "^\+" err > verify-errors && test_line_count = 1 verify-errors && @@ -382,10 +392,15 @@ test_expect_success 'detect incorrect commit date and generation number' ' cp $objdir/info/commit-graph commit-graph-backup && test_when_finished
[PATCH v2 06/12] commit: force commit to parse from object database
In anticipation of verifying commit-graph file contents against the object database, create parse_commit_internal() to allow side-stepping the commit-graph file and parse directly from the object database. Due to the use of generation numbers, this method should not be called unless the intention is explicit in avoiding commits from the commit-graph file. Signed-off-by: Derrick Stolee--- commit.c | 13 + commit.h | 1 + 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/commit.c b/commit.c index 1d28677dfb..7c92350373 100644 --- a/commit.c +++ b/commit.c @@ -392,7 +392,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s return 0; } -int parse_commit_gently(struct commit *item, int quiet_on_missing) +int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph) { enum object_type type; void *buffer; @@ -403,17 +403,17 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing) return -1; if (item->object.parsed) return 0; - if (parse_commit_in_graph(item)) + if (use_commit_graph && parse_commit_in_graph(item)) return 0; buffer = read_sha1_file(item->object.oid.hash, , ); if (!buffer) return quiet_on_missing ? -1 : error("Could not read %s", -oid_to_hex(>object.oid)); + oid_to_hex(>object.oid)); if (type != OBJ_COMMIT) { free(buffer); return error("Object %s not a commit", -oid_to_hex(>object.oid)); + oid_to_hex(>object.oid)); } ret = parse_commit_buffer(item, buffer, size, 0); if (save_commit_buffer && !ret) { @@ -424,6 +424,11 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing) return ret; } +int parse_commit_gently(struct commit *item, int quiet_on_missing) +{ + return parse_commit_internal(item, quiet_on_missing, 1); +} + void parse_commit_or_die(struct commit *item) { if (parse_commit(item)) diff --git a/commit.h b/commit.h index b5afde1ae9..5fde74fcd7 100644 --- a/commit.h +++ b/commit.h @@ -73,6 +73,7 @@ struct commit *lookup_commit_reference_by_name(const char *name); struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name); int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph); +int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph); int parse_commit_gently(struct commit *item, int quiet_on_missing); static inline int parse_commit(struct commit *item) { -- 2.16.2.329.gfb62395de6
[PATCH v2 07/12] commit-graph: load a root tree from specific graph
When lazy-loading a tree for a commit, it will be important to select the tree from a specific struct commit_graph. Create a new method that specifies the commit-graph file and use that in get_commit_tree_in_graph(). Signed-off-by: Derrick Stolee--- commit-graph.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index b0fd1d5320..5bb93e533c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -357,14 +357,20 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit * return c->maybe_tree; } -struct tree *get_commit_tree_in_graph(const struct commit *c) +static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g, +const struct commit *c) { if (c->maybe_tree) return c->maybe_tree; if (c->graph_pos == COMMIT_NOT_FROM_GRAPH) BUG("get_commit_tree_in_graph called from non-commit-graph commit"); - return load_tree_for_commit(commit_graph, (struct commit *)c); + return load_tree_for_commit(g, (struct commit *)c); +} + +struct tree *get_commit_tree_in_graph(const struct commit *c) +{ + return get_commit_tree_in_graph_one(commit_graph, c); } static void write_graph_chunk_fanout(struct hashfile *f, -- 2.16.2.329.gfb62395de6
[PATCH v2 11/12] gc: automatically write commit-graph files
The commit-graph file is a very helpful feature for speeding up git operations. In order to make it more useful, write the commit-graph file by default during standard garbage collection operations. Add a 'gc.commitGraph' config setting that triggers writing a commit-graph file after any non-trivial 'git gc' command. Defaults to false while the commit-graph feature matures. We specifically do not want to turn this on by default until the commit-graph feature is fully integrated with history-modifying features like shallow clones. Signed-off-by: Derrick Stolee--- Documentation/config.txt | 6 ++ Documentation/git-gc.txt | 4 builtin/gc.c | 8 3 files changed, 18 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 11f027194e..9a3abd87e7 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1553,6 +1553,12 @@ gc.autoDetach:: Make `git gc --auto` return immediately and run in background if the system supports it. Default is true. +gc.commitGraph:: + If true, then gc will rewrite the commit-graph file after any + change to the object database. If '--auto' is used, then the + commit-graph will not be updated unless the threshold is met. + See linkgit:git-commit-graph[1] for details. + gc.logExpiry:: If the file gc.log exists, then `git gc --auto` won't run unless that file is more than 'gc.logExpiry' old. Default is diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index 571b5a7e3c..17dd654a59 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -119,6 +119,10 @@ The optional configuration variable `gc.packRefs` determines if it within all non-bare repos or it can be set to a boolean value. This defaults to true. +The optional configuration variable 'gc.commitGraph' determines if +'git gc' runs 'git commit-graph write'. This can be set to a boolean +value. This defaults to false. + The optional configuration variable `gc.aggressiveWindow` controls how much time is spent optimizing the delta compression of the objects in the repository when the --aggressive option is specified. The larger diff --git a/builtin/gc.c b/builtin/gc.c index 77fa720bd0..8403445738 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -34,6 +34,7 @@ static int aggressive_depth = 50; static int aggressive_window = 250; static int gc_auto_threshold = 6700; static int gc_auto_pack_limit = 50; +static int gc_commit_graph = 0; static int detach_auto = 1; static timestamp_t gc_log_expire_time; static const char *gc_log_expire = "1.day.ago"; @@ -46,6 +47,7 @@ static struct argv_array repack = ARGV_ARRAY_INIT; static struct argv_array prune = ARGV_ARRAY_INIT; static struct argv_array prune_worktrees = ARGV_ARRAY_INIT; static struct argv_array rerere = ARGV_ARRAY_INIT; +static struct argv_array commit_graph = ARGV_ARRAY_INIT; static struct tempfile *pidfile; static struct lock_file log_lock; @@ -121,6 +123,7 @@ static void gc_config(void) git_config_get_int("gc.aggressivedepth", _depth); git_config_get_int("gc.auto", _auto_threshold); git_config_get_int("gc.autopacklimit", _auto_pack_limit); + git_config_get_bool("gc.commitgraph", _commit_graph); git_config_get_bool("gc.autodetach", _auto); git_config_get_expiry("gc.pruneexpire", _expire); git_config_get_expiry("gc.worktreepruneexpire", _worktrees_expire); @@ -374,6 +377,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix) argv_array_pushl(, "prune", "--expire", NULL); argv_array_pushl(_worktrees, "worktree", "prune", "--expire", NULL); argv_array_pushl(, "rerere", "gc", NULL); + argv_array_pushl(_graph, "commit-graph", "write", "--reachable", NULL); /* default expiry time, overwritten in gc_config */ gc_config(); @@ -480,6 +484,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix) if (pack_garbage.nr > 0) clean_pack_garbage(); + if (gc_commit_graph) + if (run_command_v_opt(commit_graph.argv, RUN_GIT_CMD)) + return error(FAILED_RUN, commit_graph.argv[0]); + if (auto_gc && too_many_loose_objects()) warning(_("There are too many unreachable loose objects; " "run 'git prune' to remove them.")); -- 2.16.2.329.gfb62395de6
[PATCH v2 10/12] commit-graph: add '--reachable' option
When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc' which will call 'git commit-graph write --reachable' after performing cleanup of the object database. Signed-off-by: Derrick Stolee--- Documentation/git-commit-graph.txt | 8 ++-- builtin/commit-graph.c | 41 ++ 2 files changed, 43 insertions(+), 6 deletions(-) diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index a222cfab08..cc1715a823 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles. + With the `--stdin-packs` option, generate the new commit graph by walking objects only in the specified pack-indexes. (Cannot be combined -with --stdin-commits.) +with --stdin-commits or --reachable.) + With the `--stdin-commits` option, generate the new commit graph by walking commits starting at the commits specified in stdin as a list of OIDs in hex, one OID per line. (Cannot be combined with ---stdin-packs.) +--stdin-packs or --reachable.) ++ +With the `--reachable` option, generate the new commit graph by walking +commits starting at all refs. (Cannot be combined with --stdin-commits +or --stind-packs.) + With the `--append` option, include all commits that are present in the existing commit-graph file. diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index af3101291f..7cb94a4813 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -3,13 +3,14 @@ #include "dir.h" #include "lockfile.h" #include "parse-options.h" +#include "refs.h" #include "commit-graph.h" static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph [--object-dir ]"), N_("git commit-graph read [--object-dir ]"), N_("git commit-graph verify [--object-dir ]"), - N_("git commit-graph write [--object-dir ] [--append] [--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), NULL }; @@ -24,12 +25,13 @@ static const char * const builtin_commit_graph_read_usage[] = { }; static const char * const builtin_commit_graph_write_usage[] = { - N_("git commit-graph write [--object-dir ] [--append] [--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), NULL }; static struct opts_commit_graph { const char *obj_dir; + int reachable; int stdin_packs; int stdin_commits; int append; @@ -113,6 +115,25 @@ static int graph_read(int argc, const char **argv) return 0; } +struct hex_list { + char **hex_strs; + int hex_nr; + int hex_alloc; +}; + +static int add_ref_to_list(const char *refname, + const struct object_id *oid, + int flags, void *cb_data) +{ + struct hex_list *list = (struct hex_list*)cb_data; + + ALLOC_GROW(list->hex_strs, list->hex_nr + 1, list->hex_alloc); + list->hex_strs[list->hex_nr] = xcalloc(GIT_MAX_HEXSZ + 1, 1); + strcpy(list->hex_strs[list->hex_nr], oid_to_hex(oid)); + list->hex_nr++; + return 0; +} + static int graph_write(int argc, const char **argv) { const char **pack_indexes = NULL; @@ -127,6 +148,8 @@ static int graph_write(int argc, const char **argv) OPT_STRING(0, "object-dir", _dir, N_("dir"), N_("The object directory to store the graph")), + OPT_BOOL(0, "reachable", , + N_("start walk at all refs")), OPT_BOOL(0, "stdin-packs", _packs, N_("scan pack-indexes listed by stdin for commits")), OPT_BOOL(0, "stdin-commits", _commits, @@ -140,8 +163,8 @@ static int graph_write(int argc, const char **argv) builtin_commit_graph_write_options, builtin_commit_graph_write_usage, 0); - if (opts.stdin_packs && opts.stdin_commits) - die(_("cannot use both --stdin-commits and --stdin-packs")); + if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1) + die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs")); if (!opts.obj_dir) opts.obj_dir = get_object_directory(); @@ -164,6 +187,16 @@ static int graph_write(int argc, const char **argv) commit_hex = lines; commits_nr = lines_nr; } + } else if (opts.reachable) { + struct hex_list list;
[PATCH v2 08/12] commit-graph: verify commit contents against odb
When running 'git commit-graph verify', compare the contents of the commits that are loaded from the commit-graph file with commits that are loaded directly from the object database. This includes checking the root tree object ID, commit date, and parents. Parse the commit from the graph during the initial loop through the object IDs to guarantee we parse from the commit-graph file. In addition, verify the generation number calculation is correct for all commits in the commit-graph file. While testing, we discovered that mutating the integer value for a parent to be outside the accepted range causes a segmentation fault. Add a new check in insert_parent_or_die() that prevents this fault. Check for that error during the test, both in the typical parents and in the list of parents for octopus merges. Signed-off-by: Derrick Stolee--- commit-graph.c | 100 t/t5318-commit-graph.sh | 64 +++ 2 files changed, 164 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 5bb93e533c..a15ad9710d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -237,6 +237,10 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g, { struct commit *c; struct object_id oid; + + if (pos >= g->num_commits) + die("invalide parent position %"PRIu64, pos); + hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos); c = lookup_commit(); if (!c) @@ -875,6 +879,8 @@ int verify_commit_graph(struct commit_graph *g) return 1; for (i = 0; i < g->num_commits; i++) { + struct commit *graph_commit; + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); if (i && oidcmp(_oid, _oid) >= 0) @@ -892,6 +898,10 @@ int verify_commit_graph(struct commit_graph *g) cur_fanout_pos++; } + + graph_commit = lookup_commit(_oid); + if (!parse_commit_in_graph_one(g, graph_commit)) + graph_report("failed to parse %s from commit-graph", oid_to_hex(_oid)); } while (cur_fanout_pos < 256) { @@ -904,5 +914,95 @@ int verify_commit_graph(struct commit_graph *g) cur_fanout_pos++; } + if (verify_commit_graph_error) + return 1; + + for (i = 0; i < g->num_commits; i++) { + struct commit *graph_commit, *odb_commit; + struct commit_list *graph_parents, *odb_parents; + int num_parents = 0; + + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); + + graph_commit = lookup_commit(_oid); + odb_commit = (struct commit *)create_object(cur_oid.hash, alloc_commit_node()); + if (parse_commit_internal(odb_commit, 0, 0)) { + graph_report("failed to parse %s from object database", oid_to_hex(_oid)); + continue; + } + + if (oidcmp(_commit_tree_in_graph_one(g, graph_commit)->object.oid, + get_commit_tree_oid(odb_commit))) + graph_report("root tree object ID for commit %s in commit-graph is %s != %s", +oid_to_hex(_oid), + oid_to_hex(get_commit_tree_oid(graph_commit)), + oid_to_hex(get_commit_tree_oid(odb_commit))); + + if (graph_commit->date != odb_commit->date) + graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime"", +oid_to_hex(_oid), +graph_commit->date, +odb_commit->date); + + + graph_parents = graph_commit->parents; + odb_parents = odb_commit->parents; + + while (graph_parents) { + num_parents++; + + if (odb_parents == NULL) + graph_report("commit-graph parent list for commit %s is too long (%d)", +oid_to_hex(_oid), +num_parents); + + if (oidcmp(_parents->item->object.oid, _parents->item->object.oid)) + graph_report("commit-graph parent for %s is %s != %s", +oid_to_hex(_oid), + oid_to_hex(_parents->item->object.oid), + oid_to_hex(_parents->item->object.oid)); + + graph_parents = graph_parents->next; + odb_parents = odb_parents->next; + } + + if (odb_parents != NULL) +
[PATCH v2 03/12] commit-graph: test that 'verify' finds corruption
Add test cases to t5318-commit-graph.sh that corrupt the commit-graph file and check that the 'git commit-graph verify' command fails. These tests verify the header and chunk information is checked carefully. Helped-by: Martin ÅgrenSigned-off-by: Derrick Stolee --- t/t5318-commit-graph.sh | 53 + 1 file changed, 53 insertions(+) diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 6ca451dfd2..0cb88232fa 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -240,4 +240,57 @@ test_expect_success 'git commit-graph verify' ' git commit-graph verify >output ' +# usage: corrupt_data [] +corrupt_data() { + file=$1 + pos=$2 + data="${3:-\0}" + printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc +} + +test_expect_success 'detect bad signature' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 0 "\0" && + test_must_fail git commit-graph verify 2>err && + grep -v "^\+" err > verify-errors && + test_line_count = 1 verify-errors && + grep "graph signature" verify-errors +' + +test_expect_success 'detect bad version number' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 4 "\02" && + test_must_fail git commit-graph verify 2>err && + grep -v "^\+" err > verify-errors && + test_line_count = 1 verify-errors && + grep "graph version" verify-errors +' + +test_expect_success 'detect bad hash version' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 5 "\02" && + test_must_fail git commit-graph verify 2>err && + grep -v "^\+" err > verify-errors && + test_line_count = 1 verify-errors && + grep "hash version" verify-errors +' + +test_expect_success 'detect too small chunk-count' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 6 "\01" && + test_must_fail git commit-graph verify 2>err && + grep -v "^\+" err > verify-errors && + test_line_count = 2 verify-errors && + grep "missing the OID Lookup chunk" verify-errors && + grep "missing the Commit Data chunk" verify-errors +' + test_done -- 2.16.2.329.gfb62395de6
[PATCH v2 02/12] commit-graph: verify file header information
During a run of 'git commit-graph verify', list the issues with the header information in the commit-graph file. Some of this information is inferred from the loaded 'struct commit_graph'. Some header information is checked as part of load_commit_graph_one(). Signed-off-by: Derrick Stolee--- commit-graph.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/commit-graph.c b/commit-graph.c index b25aaed128..d2db20e49a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -818,7 +818,37 @@ void write_commit_graph(const char *obj_dir, oids.nr = 0; } +static int verify_commit_graph_error; + +static void graph_report(const char *fmt, ...) +{ + va_list ap; + struct strbuf sb = STRBUF_INIT; + verify_commit_graph_error = 1; + + va_start(ap, fmt); + strbuf_vaddf(, fmt, ap); + + fprintf(stderr, "%s\n", sb.buf); + strbuf_release(); + va_end(ap); +} + int verify_commit_graph(struct commit_graph *g) { - return !g; + if (!g) { + graph_report("no commit-graph file loaded"); + return 1; + } + + verify_commit_graph_error = 0; + + if (!g->chunk_oid_fanout) + graph_report("commit-graph is missing the OID Fanout chunk"); + if (!g->chunk_oid_lookup) + graph_report("commit-graph is missing the OID Lookup chunk"); + if (!g->chunk_commit_data) + graph_report("commit-graph is missing the Commit Data chunk"); + + return verify_commit_graph_error; } -- 2.16.2.329.gfb62395de6
[PATCH v2 01/12] commit-graph: add 'verify' subcommand
If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. During review, we noticed that a FREE_AND_NULL(graph_name) was placed after a possible 'return', and this pattern was also in graph_read(). Fix that case, too. Signed-off-by: Derrick Stolee--- Documentation/git-commit-graph.txt | 6 ++ builtin/commit-graph.c | 40 +- commit-graph.c | 5 + commit-graph.h | 2 ++ t/t5318-commit-graph.sh| 10 ++ 5 files changed, 62 insertions(+), 1 deletion(-) diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 4c97b555cc..a222cfab08 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -10,6 +10,7 @@ SYNOPSIS [verse] 'git commit-graph read' [--object-dir ] +'git commit-graph verify' [--object-dir ] 'git commit-graph write' [--object-dir ] @@ -52,6 +53,11 @@ existing commit-graph file. Read a graph file given by the commit-graph file and output basic details about the graph file. Used for debugging purposes. +'verify':: + +Read the commit-graph file and verify its contents against the object +database. Used to check for corrupted data. + EXAMPLES diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 37420ae0fd..af3101291f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -8,10 +8,16 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph [--object-dir ]"), N_("git commit-graph read [--object-dir ]"), + N_("git commit-graph verify [--object-dir ]"), N_("git commit-graph write [--object-dir ] [--append] [--stdin-packs|--stdin-commits]"), NULL }; +static const char * const builtin_commit_graph_verify_usage[] = { + N_("git commit-graph verify [--object-dir ]"), + NULL +}; + static const char * const builtin_commit_graph_read_usage[] = { N_("git commit-graph read [--object-dir ]"), NULL @@ -29,6 +35,36 @@ static struct opts_commit_graph { int append; } opts; + +static int graph_verify(int argc, const char **argv) +{ + struct commit_graph *graph = 0; + char *graph_name; + + static struct option builtin_commit_graph_verify_options[] = { + OPT_STRING(0, "object-dir", _dir, + N_("dir"), + N_("The object directory to store the graph")), + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, +builtin_commit_graph_verify_options, +builtin_commit_graph_verify_usage, 0); + + if (!opts.obj_dir) + opts.obj_dir = get_object_directory(); + + graph_name = get_commit_graph_filename(opts.obj_dir); + graph = load_commit_graph_one(graph_name); + FREE_AND_NULL(graph_name); + + if (!graph) + return 0; + + return verify_commit_graph(graph); +} + static int graph_read(int argc, const char **argv) { struct commit_graph *graph = NULL; @@ -50,10 +86,10 @@ static int graph_read(int argc, const char **argv) graph_name = get_commit_graph_filename(opts.obj_dir); graph = load_commit_graph_one(graph_name); + FREE_AND_NULL(graph_name); if (!graph) die("graph file %s does not exist", graph_name); - FREE_AND_NULL(graph_name); printf("header: %08x %d %d %d %d\n", ntohl(*(uint32_t*)graph->data), @@ -160,6 +196,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) PARSE_OPT_STOP_AT_NON_OPTION); if (argc > 0) { + if (!strcmp(argv[0], "verify")) + return graph_verify(argc, argv); if (!strcmp(argv[0], "read")) return graph_read(argc, argv); if (!strcmp(argv[0], "write")) diff --git a/commit-graph.c b/commit-graph.c index a8c337dd77..b25aaed128 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -817,3 +817,8 @@ void write_commit_graph(const char *obj_dir, oids.alloc = 0; oids.nr = 0; } + +int verify_commit_graph(struct commit_graph *g) +{ + return !g; +} diff --git a/commit-graph.h b/commit-graph.h index
[PATCH v2 00/12] Integrate commit-graph into fsck and gc
I'm sending this v2 re-roll rather quickly after the previous version because Martin provided a framework to add tests to the 'verify' subcommand. I took that framework and added tests for the other checks of the commit-graph data. This also found some interesting things about the command: 1. There were some segfaults because we were not checking for bad data carefully enough. 2. To avoid segfaults, we will now terminate the check early if we find problems earlier in the file, such as in the header, or OID lookup. 3. We were not writing newlines between reports. This now happens by default in graph_report(). The integration into 'fetch' is dropped (thanks Ævar!). Derrick Stolee (12): commit-graph: add 'verify' subcommand commit-graph: verify file header information commit-graph: test that 'verify' finds corruption commit-graph: parse commit from chosen graph commit-graph: verify fanout and lookup table commit: force commit to parse from object database commit-graph: load a root tree from specific graph commit-graph: verify commit contents against odb fsck: verify commit-graph commit-graph: add '--reachable' option gc: automatically write commit-graph files commit-graph: update design document Documentation/config.txt | 6 + Documentation/git-commit-graph.txt | 14 ++- Documentation/git-fsck.txt | 3 + Documentation/git-gc.txt | 4 + Documentation/technical/commit-graph.txt | 22 builtin/commit-graph.c | 81 - builtin/fsck.c | 21 builtin/gc.c | 8 ++ commit-graph.c | 199 ++- commit-graph.h | 2 + commit.c | 13 +- commit.h | 1 + t/t5318-commit-graph.sh | 173 +++ 13 files changed, 509 insertions(+), 38 deletions(-) base-commit: 34fdd433396ee0e3ef4de02eb2189f8226eafe4e -- 2.16.2.329.gfb62395de6
Re: [PATCH] alloc: allow arbitrary repositories for alloc functions
On Fri, May 11, 2018 at 3:17 PM Stefan Bellerwrote: > diff --git a/commit.c b/commit.c > @@ -296,6 +297,17 @@ void free_commit_buffer(struct commit *commit) > +void relase_commit_memory(struct commit *c) s/relase/release/
RE: [Best Practices Request] clean/smudge configuration
On May 10, 2018 10:27 PM, Junio C Hamano wrote: > "Randall S. Becker"writes: > > > What if we create a ../.gitconfig like ../.gitattributes, that is > > loaded before .git/config? > > You should not forget one of the two reasons why clean/smudge/diff etc. > drivers must be given with a step of redirection, split between attributes and > config. "This path holds text from ancient macs" at the logical level may be > expressed in .gitattributes, and then "when checking out text from ancient > macs, process the blob data with this program to turn it into a form suitable > for working tree" is given as a smudge filter program, that is (1) suitable > for > the platform _you_ as the writer of the config file is using *AND* (2) > something you are confortable running on behalf of you. > > We *deliberately* lack a mechanism to pay attention to in-tree config that > gets propagated to those who get them via "git clone", exactly because > otherwise your upstream may not just specify a "smudge" program that your > platform would be unable to run, but can specify a "smudge" program that > pretends to be a smudger, but is actually a program that installs a keylogger > and posts your login password and bank details to this mailing list ;-) > > So, no, I do not think in-tree configuration will fly. I agree, which is why I asked the original question instead of posting it as a formal patch. We would probably get a brand new CVE implementing the proposed proto-changes even if the original intent was internal trusted repositories only. That's why I asked this as a "Best Practices" type question, which I think I have a better idea on now. The actual situation is rather cool from a DevOps point of view, and whatever the ultimate solution is, might make for a nice presentation at some future conference . Cheers and thanks, Randall
[PATCH] alloc: allow arbitrary repositories for alloc functions
We have to convert all of the alloc functions at once, because alloc_report uses a funky macro for reporting. It is better for the sake of mechanical conversion to convert multiple functions at once rather than changing the structure of the reporting function. We record all memory allocation in alloc.c, and free them in clear_alloc_state, which is called for all repositories except the_repository. Signed-off-by: Stefan BellerSigned-off-by: Junio C Hamano --- > This might seem a bit bikesheddy, but I wouldn't call it > free_tag_buffer(), since what's being freed is not the buffer of the > object itself, but just a string. If you want such a function, I would > just call it release_tag_memory() to match release_commit_memory(). So you would replace the last commit with a patch like this? Thanks, Stefan Notes: diff to what is currently queued: diff --git c/commit.c w/commit.c index 612ccf7b053..f3a5872c393 100644 --- c/commit.c +++ w/commit.c @@ -297,11 +297,15 @@ void free_commit_buffer(struct commit *commit) } } -void release_commit_memory(struct commit *c) +void relase_commit_memory(struct commit *c) { + c->tree = NULL; + c->index = 0; free_commit_buffer(c); free_commit_list(c->parents); /* TODO: what about commit->util? */ + + c->object.parsed = 0; } const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep) diff --git c/commit.h w/commit.h index 2d764ab7d8e..366c151e0cb 100644 --- c/commit.h +++ w/commit.h @@ -103,7 +103,7 @@ void free_commit_buffer(struct commit *); * Release memory related to a commit, including the parent list and * any cached object buffer. */ -void release_commit_memory(struct commit *c); +void relase_commit_memory(struct commit *c); /* * Disassociate any cached object buffer from the commit, but do not free it. diff --git c/object.c w/object.c index 9d5b10d5a20..a7d1fd4a20b 100644 --- c/object.c +++ w/object.c @@ -526,9 +526,9 @@ void parsed_object_pool_clear(struct parsed_object_pool *o) if (obj->type == OBJ_TREE) free_tree_buffer((struct tree*)obj); else if (obj->type == OBJ_COMMIT) - release_commit_memory((struct commit*)obj); + relase_commit_memory((struct commit*)obj); else if (obj->type == OBJ_TAG) - free_tag_buffer((struct tag*)obj); + release_tag_memory((struct tag*)obj); } FREE_AND_NULL(o->obj_hash); diff --git c/tag.c w/tag.c index 254352c30c6..7c12426b4ea 100644 --- c/tag.c +++ w/tag.c @@ -116,9 +116,12 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail) return parse_timestamp(dateptr, NULL, 10); } -void free_tag_buffer(struct tag *t) +void release_tag_memory(struct tag *t) { free(t->tag); + t->tagged = NULL; + t->object.parsed = 0; + t->date = 0; } int parse_tag_buffer(struct tag *item, const void *data, unsigned long size) diff --git c/tag.h w/tag.h index b241fe67bc5..9057d76a506 100644 --- c/tag.h +++ w/tag.h @@ -15,7 +15,7 @@ struct tag { extern struct tag *lookup_tag(const struct object_id *oid); extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size); extern int parse_tag(struct tag *item); -extern void free_tag_buffer(struct tag *t); +extern void release_tag_memory(struct tag *t); extern struct object *deref_tag(struct object *, const char *, int); extern struct object *deref_tag_noverify(struct object *); extern int gpg_verify_tag(const struct object_id *oid, alloc.c | 65 ++- alloc.h | 19 ++ blame.c | 1 + blob.c| 1 + cache.h | 16 commit.c | 12 + commit.h | 6 + merge-recursive.c | 1 + object.c | 42 -- object.h | 8 ++ tag.c | 9 +++ tag.h | 1 + tree.c| 1 + 13 files changed, 140 insertions(+), 42 deletions(-) create mode 100644 alloc.h diff --git a/alloc.c b/alloc.c index 277dadd221b..714df633169 100644 --- a/alloc.c +++ b/alloc.c @@ -4,8 +4,7 @@ * Copyright (C) 2006 Linus Torvalds * * The standard malloc/free wastes too much space for objects, partly because - * it maintains all the allocation infrastructure (which isn't needed, since - * we never free an object descriptor anyway), but even more because it ends + * it maintains all the allocation infrastructure, but even more because it ends * up with maximal alignment because it doesn't know what the object
Re: [PATCH v2 0/4] Fix mem leaks of recent object store conversions.
On Fri, May 11, 2018 at 1:37 AM, Jeff Kingwrote: > On Thu, May 10, 2018 at 12:58:45PM -0700, Stefan Beller wrote: > >> This series replaces the two commits that were queued on >> sb/object-store-replace, >> fixing memory leaks that were recently introduced. >> >> Compared to v1, I merged the two independent series from yesterday, >> rewrote the commit message to clear up Junios confusion and addresses Peffs >> comments for the packfiles as well. > > Mostly. :) > > My one remaining complaint is that the bitmap code may hold on to a > dangling pointer to a packed_git after this series. Ok, I'll look into that. > > I think that is part of a larger problem, though, which is that the > bitmap code's globals need to be part of the struct raw_object_store. > I think this can already cause problems before your series if we were to > try to use bitmaps in both a superproject and a submodule in the same > process, though I think we'd at least hit the "ignoring extra bitmap > file" code path in open_pack_bitmap_1(). So right now it's an annoyance, > but after your series it becomes a potential segfault. Ok, maybe we'll need to convert bitmaps into the object store for that. Thanks for the pointer, Stefan
Re: Regression in patch add?
On 11/05/18 03:47, Junio C Hamano wrote: > Phillip Woodwrites: > >> Yes, I think it probably makes sense to do that. Originally I didn't >> count empty lines as context lines in case the user accidentally added >> some empty lines at the end of the hunk but if 'git apply' does then I >> think 'git add -p' should as well > > I am not sure if "adding to the tail" should be tolerated, but in > any case, newer GNU diff can show an empty unaffected line as an > empty line (unlike traditional unified context format in which such > a line is expressed as a line with a lone SP on it), which is > allowed as "implementation defined" by POSIX.1 [*1*]. Modern "git > apply" knows about this. Thanks for the reference, I hadn't realized the space was optional. > If "add -p" parses a patch, it should learn to do so, too. I'm about to go off line for a while, I'll send a fix when I'm back up and running at next month (unfortunately the reroll of pw/add-p-select will have to wait until then as well) Best Wishes Phillip > > [Reference] > > *1* http://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html > >> >>> Meanwhile, I can easily configure my editor not to do this for `*.diff` >>> files. >>> >>> Thanks for your help, Phillip and Martin! >> >> Thanks for posting an example so we could test it, it makes it much >> easier to track the problem down >> >> Best Wishes >> >> Phillip >> >>> Mahmoud, does this also explain your problem as per your original post? >>>
[PATCH 4/4] mark_parents_uninteresting(): avoid most allocation
Commit 941ba8db57 (Eliminate recursion in setting/clearing marks in commit list, 2012-01-14) used a clever double-loop to avoid allocations for single-parent chains of history. However, it did so only when following parents of parents (which was an uncommon case), and _always_ incurred at least one allocation to populate the list of pending parents in the first place. We can turn this into zero-allocation in the common case by iterating directly over the initial parent list, and then following up on any pending items we might have discovered. Signed-off-by: Jeff King--- Again, try "-w" for more readability. revision.c | 44 +--- 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/revision.c b/revision.c index 89ff9a99ce..cbe041128e 100644 --- a/revision.c +++ b/revision.c @@ -115,32 +115,38 @@ static void commit_stack_clear(struct commit_stack *stack) stack->nr = stack->alloc = 0; } -void mark_parents_uninteresting(struct commit *commit) +static void mark_one_parent_uninteresting(struct commit *commit, + struct commit_stack *pending) { - struct commit_stack pending = COMMIT_STACK_INIT; struct commit_list *l; + if (commit->object.flags & UNINTERESTING) + return; + commit->object.flags |= UNINTERESTING; + + /* +* Normally we haven't parsed the parent +* yet, so we won't have a parent of a parent +* here. However, it may turn out that we've +* reached this commit some other way (where it +* wasn't uninteresting), in which case we need +* to mark its parents recursively too.. +*/ for (l = commit->parents; l; l = l->next) - commit_stack_push(, l->item); + commit_stack_push(pending, l->item); +} - while (pending.nr > 0) { - struct commit *commit = commit_stack_pop(); +void mark_parents_uninteresting(struct commit *commit) +{ + struct commit_stack pending = COMMIT_STACK_INIT; + struct commit_list *l; - if (commit->object.flags & UNINTERESTING) - return; - commit->object.flags |= UNINTERESTING; + for (l = commit->parents; l; l = l->next) + mark_one_parent_uninteresting(l->item, ); - /* -* Normally we haven't parsed the parent -* yet, so we won't have a parent of a parent -* here. However, it may turn out that we've -* reached this commit some other way (where it -* wasn't uninteresting), in which case we need -* to mark its parents recursively too.. -*/ - for (l = commit->parents; l; l = l->next) - commit_stack_push(, l->item); - } + while (pending.nr > 0) + mark_one_parent_uninteresting(commit_stack_pop(), + ); commit_stack_clear(); } -- 2.17.0.988.gec4b43b3e5
[PATCH 3/4] mark_parents_uninteresting(): replace list with stack
The mark_parents_uninteresting() function uses two loops: an outer one to process our queue of pending parents, and an inner one to process first-parent chains. This is a clever optimization from 941ba8db57 (Eliminate recursion in setting/clearing marks in commit list, 2012-01-14) to limit the number of linked-list allocations when following single-parent chains. Unfortunately, this makes the result a little hard to read. Let's replace the list with a stack. Then we don't have to worry about doing this double-loop optimization, as we'll just reuse the top element of the stack as we pop/push. Signed-off-by: Jeff King--- The diff makes a lot more sense with "-w". revision.c | 67 +++--- 1 file changed, 43 insertions(+), 24 deletions(-) diff --git a/revision.c b/revision.c index cee4f3a4b4..89ff9a99ce 100644 --- a/revision.c +++ b/revision.c @@ -92,38 +92,57 @@ void mark_tree_uninteresting(struct tree *tree) mark_tree_contents_uninteresting(tree); } -void mark_parents_uninteresting(struct commit *commit) +struct commit_stack { + struct commit **items; + size_t nr, alloc; +}; +#define COMMIT_STACK_INIT { NULL, 0, 0 } + +static void commit_stack_push(struct commit_stack *stack, struct commit *commit) { - struct commit_list *parents = NULL, *l; + ALLOC_GROW(stack->items, stack->nr + 1, stack->alloc); + stack->items[stack->nr++] = commit; +} - for (l = commit->parents; l; l = l->next) - commit_list_insert(l->item, ); +static struct commit *commit_stack_pop(struct commit_stack *stack) +{ + return stack->nr ? stack->items[--stack->nr] : NULL; +} - while (parents) { - struct commit *commit = pop_commit(); +static void commit_stack_clear(struct commit_stack *stack) +{ + FREE_AND_NULL(stack->items); + stack->nr = stack->alloc = 0; +} - while (commit) { - if (commit->object.flags & UNINTERESTING) - break; +void mark_parents_uninteresting(struct commit *commit) +{ + struct commit_stack pending = COMMIT_STACK_INIT; + struct commit_list *l; - commit->object.flags |= UNINTERESTING; + for (l = commit->parents; l; l = l->next) + commit_stack_push(, l->item); - /* -* Normally we haven't parsed the parent -* yet, so we won't have a parent of a parent -* here. However, it may turn out that we've -* reached this commit some other way (where it -* wasn't uninteresting), in which case we need -* to mark its parents recursively too.. -*/ - if (!commit->parents) - break; + while (pending.nr > 0) { + struct commit *commit = commit_stack_pop(); - for (l = commit->parents->next; l; l = l->next) - commit_list_insert(l->item, ); - commit = commit->parents->item; - } + if (commit->object.flags & UNINTERESTING) + return; + commit->object.flags |= UNINTERESTING; + + /* +* Normally we haven't parsed the parent +* yet, so we won't have a parent of a parent +* here. However, it may turn out that we've +* reached this commit some other way (where it +* wasn't uninteresting), in which case we need +* to mark its parents recursively too.. +*/ + for (l = commit->parents; l; l = l->next) + commit_stack_push(, l->item); } + + commit_stack_clear(); } static void add_pending_object_with_path(struct rev_info *revs, -- 2.17.0.988.gec4b43b3e5
[PATCH 2/4] mark_parents_uninteresting(): drop missing object check
We allow UNINTERESTING objects in a traversal to be unavailable. As part of this, mark_parents_uninteresting() checks whether we have a particular uninteresting parent; if not, we will mark it as "parsed" so that later code skips it. This code is redundant and even a little bit harmful, so let's drop it. It's redundant because when our parse_object() call in add_parents_to_list() fails, we already quietly skip UNINTERESTING parents. This redundancy is a historical artifact. The mark_parents_uninteresting() protection is from 454fbbcde3 (git-rev-list: allow missing objects when the parent is marked UNINTERESTING, 2005-07-10). Much later, aeeae1b771 (revision traversal: allow UNINTERESTING objects to be missing, 2009-01-27) covered more cases by making the actual parse more gentle. As an aside, even if this weren't redundant, it would be insufficient. The gentle parsing handles both missing and corrupted objects, whereas the has_object_file() check we're getting rid of covers only missing ones. And the code we're dropping is harmful for two reasons: 1. We spend extra time on the object lookup, even though we don't actually need the information at this point (and will just repeat that lookup later when we parse for the common case that we _do_ have the object). 2. It "lies" about the commit by setting the parsed flag, even though we didn't load any useful data into the struct. This shouldn't matter for the UNINTERESTING case, but we may later clear our flags and do another traversal in the same process. While pretty unlikely, it's possible that we could then look at the same commit without the UNINTERESTING flag, in which case we'd produce the wrong result (we'd think it's a commit with no parents, when in fact we should probably die due to the missing object). Signed-off-by: Jeff King--- revision.c | 12 1 file changed, 12 deletions(-) diff --git a/revision.c b/revision.c index ef70f69f08..cee4f3a4b4 100644 --- a/revision.c +++ b/revision.c @@ -103,18 +103,6 @@ void mark_parents_uninteresting(struct commit *commit) struct commit *commit = pop_commit(); while (commit) { - /* -* A missing commit is ok iff its parent is marked -* uninteresting. -* -* We just mark such a thing parsed, so that when -* it is popped next time around, we won't be trying -* to parse it and get an error. -*/ - if (!commit->object.parsed && - !has_object_file(>object.oid)) - commit->object.parsed = 1; - if (commit->object.flags & UNINTERESTING) break; -- 2.17.0.988.gec4b43b3e5
[PATCH 1/4] mark_tree_contents_uninteresting(): drop missing object check
It's generally acceptable for UNINTERESTING objects in a traversal to be unavailable (e.g., see aeeae1b771). When marking trees UNINTERESTING, we access the object database twice: once to check if the object is missing (and return quietly if it is), and then again to actually parse it. We can instead just try to parse; if that fails, we can then return quietly. That halves the effort we spend on locating the object. Note that this isn't _exactly_ the same as the original behavior, as the parse failure could be due to other problems than a missing object: it could be corrupted, in which case the original code would have died. But the new behavior is arguably better, as it covers the object being unavailable for any reason. We'll also still issue a warning to stderr in such a case. Signed-off-by: Jeff King--- revision.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/revision.c b/revision.c index 1cff11833e..ef70f69f08 100644 --- a/revision.c +++ b/revision.c @@ -52,12 +52,9 @@ static void mark_tree_contents_uninteresting(struct tree *tree) { struct tree_desc desc; struct name_entry entry; - struct object *obj = >object; - if (!has_object_file(>oid)) + if (parse_tree_gently(tree, 1) < 0) return; - if (parse_tree(tree) < 0) - die("bad tree %s", oid_to_hex(>oid)); init_tree_desc(, tree->buffer, tree->size); while (tree_entry(, )) { -- 2.17.0.988.gec4b43b3e5
[PATCH 0/4] a few mark_parents_uninteresting cleanups
This is a follow-up to the discussion from February: https://public-inbox.org/git/1519522496-73090-1-git-send-email-dsto...@microsoft.com/ There I theorized that some of these extra has_object_file() checks were unnecessary. After poking around a bit, I've convinced myself that this is the case, so here are some patches. After Stolee's fix in ebbed3ba04 (revision.c: reduce object database queries, 2018-02-24), I doubt this will provide any noticeable speedup. IMHO the value is just in simplifying the logic. The first two patches are the actual has_object_file simplifications. The second two are an attempt to fix some head-scratching I had while reading mark_parents_uninteresting(). I hope the result is easier to follow, but I may have just shuffled one confusion for another. :) [1/4]: mark_tree_contents_uninteresting(): drop missing object check [2/4]: mark_parents_uninteresting(): drop missing object check [3/4]: mark_parents_uninteresting(): replace list with stack [4/4]: mark_parents_uninteresting(): avoid most allocation revision.c | 90 ++ 1 file changed, 50 insertions(+), 40 deletions(-) -Peff
Re: Patch add: previous hunk across file boundaries
On Fri, May 11, 2018 at 09:53:52AM -0500, Robert Dailey wrote: > I noticed that when stepping into a new file while doing `git add -p`, > pressing `k` or `K` does not go back to the previous file. Is this a > bug? Is there a setting for it? I googled & checked out the git docs, > I didn't find any specific information on this. Nope, this is just the way that "-p" is implemented. It considers each file independently (another artifact of this is that after completing the hunks in a file the changes are staged, even if you quit while looking at a hunk in another file). I agree that it would be nicer to consider the whole set atomically, with both "save and quit" and "quit but do not save" options. The way it works now is mostly due to the way that "add -p" grew out of the whole "add -i" interactive system, where you'd invoke the patch-adder on specific files from a menu. So no, there's not a way to do what you want right now short of writing patch. -Peff
Re: [PATCH] fast-export: avoid NULL pointer arithmetic
On Fri, May 11, 2018 at 03:34:19PM +0200, Duy Nguyen wrote: > On Fri, May 11, 2018 at 03:11:46PM +0200, Duy Nguyen wrote: > > Back to fast-export, can we just allocate a new int on heap and point > > it there? Allocating small pieces becomes quite cheap and fast with > > mem-pool.h and we can avoid this storing integer in pointer business. > > Something like this seems to work, but we use 4-ish more bytes per > object, or 100MB overhead on a repo with 25M objects. I think it's a > reasonable trade off. I'm not sure I agree. 4 bytes per object certainly isn't the end of the world, but what was the problem we were solving in the first place? Just that we weren't comfortable with the round-trip from uintptr_t to void and back? Is this actually a problem on real platforms? If not, it seems silly to incur a run-time cost. -Peff
Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
On 11 May 2018 at 19:23, Derrick Stoleewrote: > Martin's initial test cases are wonderful. I've adapted them to test the > other conditions in the verify_commit_graph() method and caught some > interesting behavior in the process. I'm preparing v2 so we can investigate > the direction of the tests. Cool, I'm glad you found them useful. One thought I had was that you could possibly write the tests such that you introduce errors from the back of the file. That might enable you to do less of the "backup commit-graph file and restore it"-dance. Just a thought. Martin
Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
On 5/10/2018 3:22 PM, Stefan Beller wrote: On Thu, May 10, 2018 at 12:05 PM, Martin Ågrenwrote: On 10 May 2018 at 19:34, Derrick Stolee wrote: Commits 01-07 focus on the 'git commit-graph verify' subcommand. These are ready for full, rigorous review. I don't know about "full" and "rigorous", but I tried to offer my thoughts. A lingering feeling I have is that users could possibly benefit from seeing "the commit-graph has a bad foo" a bit more than just "the commit-graph is bad". But adding "the bar is baz, should have been frotz" might not bring that much. Maybe you could keep the translatable string somewhat simple, then, if the more technical data could be useful to Git developers, dump it in a non-translated format. (I guess it could be hidden behind a debug switch, but let's take one step at a time..) This is nothing I feel strongly about. t/t5318-commit-graph.sh | 25 + I wonder about tests. Some tests seem to use `dd` to corrupt files and check that it gets caught. Doing this in a a hash-agnostic way could be tricky, but maybe not impossible. I guess we could do something probabilistic, like "set the first two bytes of the very last OID to zero -- surely all OIDs can't start with 16 zero bits". Hmm, that might still require knowing the size of the OIDs... I hope to find time to do some more hands-on testing of this to see that errors actually do get caught. Given that the commit graph is secondary data, the users work around to quickly get back to a well working repository is most likely to remove the file and regenerate it. As a developer who wants to fix the bug, a stacktrace/datadump and the history of git commands might be most valuable, but I agree we should hide that behind a debug flag. Packfiles and loose objects are primary data, which means that those need a more advanced way to diagnose and repair them, so I would imagine the commit graph fsck is closer to bitmaps fsck, which I would have suspected to be found in t5310, but a quick read doesn't reveal many tests that are checking for integrity. So I guess the test coverage here is ok, (although we should always ask for more) My main goal is to help developers figure out what is wrong with a file, and then we can use other diagnostic tools to discover how it got into that state. Martin's initial test cases are wonderful. I've adapted them to test the other conditions in the verify_commit_graph() method and caught some interesting behavior in the process. I'm preparing v2 so we can investigate the direction of the tests. Thanks, -Stolee
Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
On 5/10/2018 3:17 PM, Ævar Arnfjörð Bjarmason wrote: On Thu, May 10 2018, Derrick Stolee wrote: The behavior in this patch series does the following: 1. Near the end of 'git gc', run 'git commit-graph write'. The location of this code assumes that a 'git gc --auto' has not terminated early due to not meeting the auto threshold. 2. At the end of 'git fetch', run 'git commit-graph write'. This means that every reachable commit will be in the commit-graph after a a successful fetch, which seems a reasonable frequency. Then, the only times we would be missing a reachable commit is after creating one locally. There is a problem with the current patch, though: every 'git fetch' call runs 'git commit-graph write', even if there were no ref updates or objects downloaded. Is there a simple way to detect if the fetch was non-trivial? One obvious problem with this approach: if we compute this during 'gc' AND 'fetch', there will be times where a 'fetch' calls 'gc' and triggers two commit-graph writes. If I were to abandon one of these patches, it would be the 'fetch' integration. A 'git gc' really wants to delete all references to unreachable commits, and without updating the commit-graph we may still have commit data in the commit-graph file that is not in the object database. In fact, deleting commits from the object database but not from the commit-graph will cause 'git commit-graph verify' to fail! I welcome discussion on these ideas, as we are venturing out of the "pure data structure" world and into the "user experience" world. I am less confident in my skills in this world, but the feature is worthless if it does not improve the user experience. I really like #1 here, but I wonder why #2 is necessary. I.e. is it critical for the performance of the commit graph feature that it be kept really up-to-date, moreso than other things that rely on gc --auto (e.g. the optional bitmap index)? It is not critical. The feature has been designed to have recent commits not in the file. For simplicity, it is probably best to limit ourselves to writing after a non-trivial 'gc'. Even if that's the case, I think something that does this via gc --auto is a much better option. I.e. now we have gc.auto & gc.autoPackLimit, if the answer to my question above is "yes" this could also be accomplished by introducing a new graph-specific gc.* setting, and --auto would just update the graph more often, but leave the rest. This is an excellent idea for a follow-up series, if we find we want the commit-graph written more frequently. For now, I'm satisfied with one place where it is automatically computed. I'll drop the fetch integration in my v2 series. Thanks, -Stolee
[PATCH] commit.h: rearrange 'index' to shrink struct commit
On linux 64-bit architecture, pahole finds that there's a 4 bytes padding after 'index'. Moving it to the end reduces this struct's size from 72 to 64 bytes (because of another 4 bytes padding after graph_pos). On linux 32-bit, the struct size remains 52 bytes like before. Signed-off-by: Nguyễn Thái Ngọc Duy--- commit.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commit.h b/commit.h index e57ae4b583..fd1cbe7263 100644 --- a/commit.h +++ b/commit.h @@ -19,11 +19,11 @@ struct commit_list { struct commit { struct object object; void *util; - unsigned int index; timestamp_t date; struct commit_list *parents; struct tree *tree; uint32_t graph_pos; + unsigned int index; }; extern int save_commit_buffer; -- 2.17.0.705.g3525833791
Re: [PATCH v7 07/13] completion: implement and use --list-cmds=main,others
On Fri, May 11, 2018 at 4:06 PM, SZEDER Gáborwrote: > On Thu, May 10, 2018 at 10:46 AM, Nguyễn Thái Ngọc Duy > wrote: >> Instead of parsing "git help -a" output, which is tricky to get right, >> less elegant and also slow, > > Is it tricky? It seems rather straightforward. But there are traps. [1] is a reported one. And since you're parsing what's meant for human reader, this code could easily break in the future. [1] https://public-inbox.org/git/CAPQz56bts8zFfUHyKJqnefQoH97L5TTA-k3be=5hsdeebcm...@mail.gmail.com/ > Is it slow? Well, there is a pipe and an egrep, sure, but thanks to > caching it's only run once, so basically doesn't matter. This I agree is an exaggeration. Ultimately though I am breaking down and providing the information pieces to __git_list_porcelain_commands() and it happens to benefit this one too. Perhaps I should rephrase my commit to say this. -- Duy
[PATCH v4] add status config and command line options for rename detection
After performing a merge that has conflicts git status will, by default, attempt to detect renames which causes many objects to be examined. In a virtualized repo, those objects do not exist locally so the rename logic triggers them to be fetched from the server. This results in the status call taking hours to complete on very large repos vs seconds with this patch. Add a new config status.renames setting to enable turning off rename detection during status and commit. This setting will default to the value of diff.renames. Add a new config status.renamelimit setting to to enable bounding the time spent finding out inexact renames during status and commit. This setting will default to the value of diff.renamelimit. Add --no-renames command line option to status that enables overriding the config setting from the command line. Add --find-renames[=] command line option to status that enables detecting renames and optionally setting the similarity index. Reviewed-by: Elijah NewrenOriginal-Patch-by: Alejandro Pauly Signed-off-by: Ben Peart --- Notes: Base Ref: commit dc6b1d92ca9c0c538daa244e3910bb8b2a50d959 (em/status-rename-config) Web-Diff: https://github.com/benpeart/git/commit/95974d512b Checkout: git fetch https://github.com/benpeart/git status-renames-v4 && git checkout 95974d512b ### Interdiff (v3..v4): ### Patches Documentation/config.txt | 12 Documentation/git-status.txt | 10 builtin/commit.c | 42 + diff.c | 2 +- diff.h | 1 + t/t7525-status-rename.sh | 113 +++ wt-status.c | 12 wt-status.h | 4 +- 8 files changed, 194 insertions(+), 2 deletions(-) create mode 100755 t/t7525-status-rename.sh diff --git a/Documentation/config.txt b/Documentation/config.txt index 2659153cb3..4f1ead 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3119,6 +3119,18 @@ status.displayCommentPrefix:: behavior of linkgit:git-status[1] in Git 1.8.4 and previous. Defaults to false. +status.renameLimit:: + The number of files to consider when performing rename detection + in linkgit:git-status[1] and linkgit:git-commit[1]. Defaults to + the value of diff.renameLimit. + +status.renames:: + Whether and how Git detects renames in linkgit:git-status[1] and + linkgit:git-commit[1] . If set to "false", rename detection is + disabled. If set to "true", basic rename detection is enabled. + If set to "copies" or "copy", Git will detect copies, as well. + Defaults to the value of diff.renames. + status.showStash:: If set to true, linkgit:git-status[1] will display the number of entries currently stashed away. diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index c16e27e63d..c4467ffb98 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -135,6 +135,16 @@ ignored, then the directory is not shown, but all contents are shown. Display or do not display detailed ahead/behind counts for the branch relative to its upstream branch. Defaults to true. +--renames:: +--no-renames:: + Turn on/off rename detection regardless of user configuration. + See also linkgit:git-diff[1] `--no-renames`. + +--find-renames[=]:: + Turn on rename detection, optionally setting the similarity + threshold. + See also linkgit:git-diff[1] `--find-renames`. + ...:: See the 'pathspec' entry in linkgit:gitglossary[7]. diff --git a/builtin/commit.c b/builtin/commit.c index 5240f11225..b50e33ef48 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -143,6 +143,16 @@ static int opt_parse_m(const struct option *opt, const char *arg, int unset) return 0; } +static int opt_parse_rename_score(const struct option *opt, const char *arg, int unset) +{ + const char **value = opt->value; + if (arg != NULL && *arg == '=') + arg = arg + 1; + + *value = arg; + return 0; +} + static void determine_whence(struct wt_status *s) { if (file_exists(git_path_merge_head())) @@ -1259,11 +1269,31 @@ static int git_status_config(const char *k, const char *v, void *cb) return error(_("Invalid untracked files mode '%s'"), v); return 0; } + if (!strcmp(k, "diff.renamelimit")) { + if (s->rename_limit == -1) + s->rename_limit = git_config_int(k, v); + return 0; + } + if (!strcmp(k, "status.renamelimit")) { + s->rename_limit = git_config_int(k, v); + return 0; + } + if (!strcmp(k, "diff.renames")) { + if (s->detect_rename == -1) + s->detect_rename =
Re: [PATCH 04/12] commit-graph: verify fanout and lookup table
On 5/10/2018 2:29 PM, Martin Ågren wrote: On 10 May 2018 at 19:34, Derrick Stoleewrote: While running 'git commit-graph verify', verify that the object IDs are listed in lexicographic order and that the fanout table correctly navigates into that list of object IDs. Signed-off-by: Derrick Stolee --- commit-graph.c | 33 + 1 file changed, 33 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index ce11af1d20..b4c146c423 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -839,6 +839,9 @@ static int verify_commit_graph_error; int verify_commit_graph(struct commit_graph *g) { + uint32_t i, cur_fanout_pos = 0; + struct object_id prev_oid, cur_oid; + if (!g) { graph_report(_("no commit-graph file loaded")); return 1; @@ -853,5 +856,35 @@ int verify_commit_graph(struct commit_graph *g) if (!g->chunk_commit_data) graph_report(_("commit-graph is missing the Commit Data chunk")); + for (i = 0; i < g->num_commits; i++) { + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); + + if (i > 0 && oidcmp(_oid, _oid) >= 0) + graph_report(_("commit-graph has incorrect oid order: %s then %s"), Minor: I think our style would prefer s/i > 0/i/. I suppose the second check should be s/>=/>/, but it's not like it should matter. ;-) It shouldn't, but a duplicate commit is still an incorrect oid order. I wonder if this is a message that would virtually never make sense to a user, but more to a developer. Leave it untranslated to make sure any bug reports to the list are readable to us? Yeah, I'll make all of the errors untranslated since they are for developer debugging, not end-user information. + + oid_to_hex(_oid), + oid_to_hex(_oid)); Hmm, these two lines do not actually achieve anything? It's a formatting bug: They complete the statement starting with 'graph_report(' above. + oidcpy(_oid, _oid); + + while (cur_oid.hash[0] > cur_fanout_pos) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + if (i != fanout_value) + graph_report(_("commit-graph has incorrect fanout value: fanout[%d] = %u != %u"), +cur_fanout_pos, fanout_value, i); Same though re `_()`, even more so because of the more technical notation. + + cur_fanout_pos++; + } + } + + while (cur_fanout_pos < 256) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + + if (g->num_commits != fanout_value) + graph_report(_("commit-graph has incorrect fanout value: fanout[%d] = %u != %u"), +cur_fanout_pos, fanout_value, i); Same here. Or maybe these should just give a translated user-readable basic idea of what is wrong and skip the details? As someone who is responsible for probably inserting these problems, I think the details are important. Thanks, -Stolee
Re: [PATCH v7 12/13] completion: let git provide the completable command list
On Thu, May 10, 2018 at 10:46 AM, Nguyễn Thái Ngọc Duywrote: > Instead of maintaining a separate list of command classification, > which often could go out of date, let's centralize the information > back in git. > > While the function in git-completion.bash implies "list porcelain > commands", that's not exactly what it does. It gets all commands (aka > --list-cmds=main,others) then exclude certain non-porcelain ones. We > could almost recreate this list two lists list-mainporcelain and > others. The non-porcelain-but-included-anyway is added by the third > category list-complete. > > list-complete does not recreate exactly the command list before this > patch though. The following commands are not part of neither > list-mainporcelain nor list-complete and as a result no longer > completes: > > - annotate obsolete, discouraged to use > - difftool-helper not an end user command > - filter-branchnot often used > - get-tar-commit-idnot often used > - imap-sendnot often used > - interpreter-trailers not for interactive use > - lost-found obsolete > - p4 too short and probably not often used (*) > - peek-remote deprecated > - svn same category as p4 (*) > - tar-tree obsolete > - verify-commitnot often used 'git name-rev' is plumbing as well. I think this commit should be split into two: - first do the unequivocally beneficial thing and get rid of the long, hard-coded command list in __git_list_porcelain_commands(), while keeping its output unchanged, - then do the arguable thing and change the list of commands. > Note that the current completion script incorrectly classifies > filter-branch as porcelain and t9902 tests this behavior. We keep it > this way in t9902 because this test does not really care which > particular command is porcelain or plubmbing, they're just names. > > (*) to be fair, send-email command which is in the same > foreignscminterface group as svn and p4 does get completion, just > because it's used by git and kernel development. So maybe should > include them. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > command-list.txt | 37 > contrib/completion/git-completion.bash | 117 ++--- > t/t9902-completion.sh | 5 +- > 3 files changed, 48 insertions(+), 111 deletions(-) > diff --git a/contrib/completion/git-completion.bash > b/contrib/completion/git-completion.bash > index 4e724a5b76..908692ea52 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -835,18 +835,30 @@ __git_complete_strategy () > } > > __git_commands () { > - if test -n "${GIT_TESTING_COMMAND_COMPLETION:-}" > - then > - printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}" > - else > - git --list-cmds=main,others > - fi > + case "$1" in > + porcelain) > + if test -n "$GIT_TESTING_PORCELAIN_COMMAND_LIST" > + then > + printf "%s" "$GIT_TESTING_PORCELAIN_COMMAND_LIST" > + else > + git > --list-cmds=list-mainporcelain,others,list-complete > + fi > + ;; > + all) > + if test -n "$GIT_TESTING_ALL_COMMAND_LIST" > + then > + printf "%s" "$GIT_TESTING_ALL_COMMAND_LIST" > + else > + git --list-cmds=main,others > + fi > + ;; > + esac > } > > -__git_list_all_commands () > +__git_list_commands () Please add comments before these functions to document their mandatory argument. > { > local i IFS=" "$'\n' > - for i in $(__git_commands) > + for i in $(__git_commands $1) > do > case $i in > *--*) : helper pattern;; Is this loop to exclude helper commands with doubledash in their name still necessary?
Patch add: previous hunk across file boundaries
I noticed that when stepping into a new file while doing `git add -p`, pressing `k` or `K` does not go back to the previous file. Is this a bug? Is there a setting for it? I googled & checked out the git docs, I didn't find any specific information on this.
Re: [PATCH v3] add status config and command line options for rename detection
Hi Ben, On Fri, May 11, 2018 at 5:56 AM, Ben Peartwrote: > After performing a merge that has conflicts git status will, by default, > attempt to detect renames which causes many objects to be examined. In a > virtualized repo, those objects do not exist locally so the rename logic > triggers them to be fetched from the server. This results in the status call > taking hours to complete on very large repos vs seconds with this patch. > > Add a new config status.renames setting to enable turning off rename detection > during status and commit. This setting will default to the value of > diff.renames. > > Add a new config status.renamelimit setting to to enable bounding the time > spent finding out inexact renames during status and commit. This setting will > default to the value of diff.renamelimit. > > Add --no-renames command line option to status that enables overriding the > config setting from the command line. Add --find-renames[=] command line > option to status that enables detecting renames and optionally setting the > similarity index. Any chance I could get you to re-wrap this at a smaller column width? Doesn't fit in my (80-char) terminal when I run `git log`; a couple lines run over by a couple characters. (Sorry for not noticing this earlier) > This patch depends on em/status-rename-config I'd leave this line for the notes. It's useful information now, but won't be to someone looking at the commit a year from now, so it probably doesn't belong in the commit message. With those two changes: Reviewed-by: Elijah Newren
Re: [PATCH v7 07/13] completion: implement and use --list-cmds=main,others
On Fri, May 11, 2018 at 4:06 PM, SZEDER Gáborwrote: > OTOH you don't mention the most important reason, namely that the > completion script contains a long hard-coded list of the names of all > known plumbing commands to filter out, which is redundant with the > categorization in 'command-list.txt', is a maintenance burden, and > tends to get out-of-sync when new plumbing commands are added. Oops, sorry, got lost among the many patches, please just forget this paragraph.
Re: [PATCH v7 07/13] completion: implement and use --list-cmds=main,others
On Thu, May 10, 2018 at 10:46 AM, Nguyễn Thái Ngọc Duywrote: > Instead of parsing "git help -a" output, which is tricky to get right, > less elegant and also slow, Is it tricky? It seems rather straightforward. Is it slow? Well, there is a pipe and an egrep, sure, but thanks to caching it's only run once, so basically doesn't matter. OTOH you don't mention the most important reason, namely that the completion script contains a long hard-coded list of the names of all known plumbing commands to filter out, which is redundant with the categorization in 'command-list.txt', is a maintenance burden, and tends to get out-of-sync when new plumbing commands are added. > make git provide the list in a > machine-friendly form. This adds two separate listing types, main and > others, instead of just "all" for more flexibility. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > contrib/completion/git-completion.bash | 2 +- > git.c | 4 > help.c | 32 ++ > help.h | 4 > 4 files changed, 41 insertions(+), 1 deletion(-) > > diff --git a/contrib/completion/git-completion.bash > b/contrib/completion/git-completion.bash > index 3556838759..62ca8641f4 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -839,7 +839,7 @@ __git_commands () { > then > printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}" > else > - git help -a|egrep '^ [a-zA-Z0-9]' > + git --list-cmds=main,others > fi > } > > diff --git a/git.c b/git.c > index 376a59b97f..10907f7266 100644 > --- a/git.c > +++ b/git.c > @@ -56,6 +56,10 @@ static int list_cmds(const char *spec) > > if (match_token(spec, len, "builtins")) > list_builtins(, 0); > + else if (match_token(spec, len, "main")) > + list_all_main_cmds(); > + else if (match_token(spec, len, "others")) > + list_all_other_cmds(); > else > die(_("unsupported command listing type '%s'"), spec); > spec += len; > diff --git a/help.c b/help.c > index 2d6a3157f8..d5ce9dfcbb 100644 > --- a/help.c > +++ b/help.c > @@ -297,6 +297,38 @@ void list_common_cmds_help(void) > print_cmd_by_category(common_categories); > } > > +void list_all_main_cmds(struct string_list *list) > +{ > + struct cmdnames main_cmds, other_cmds; > + int i; > + > + memset(_cmds, 0, sizeof(main_cmds)); > + memset(_cmds, 0, sizeof(other_cmds)); > + load_command_list("git-", _cmds, _cmds); > + > + for (i = 0; i < main_cmds.cnt; i++) > + string_list_append(list, main_cmds.names[i]->name); > + > + clean_cmdnames(_cmds); > + clean_cmdnames(_cmds); > +} > + > +void list_all_other_cmds(struct string_list *list) > +{ > + struct cmdnames main_cmds, other_cmds; > + int i; > + > + memset(_cmds, 0, sizeof(main_cmds)); > + memset(_cmds, 0, sizeof(other_cmds)); > + load_command_list("git-", _cmds, _cmds); > + > + for (i = 0; i < other_cmds.cnt; i++) > + string_list_append(list, other_cmds.names[i]->name); > + > + clean_cmdnames(_cmds); > + clean_cmdnames(_cmds); > +} > + > int is_in_cmdlist(struct cmdnames *c, const char *s) > { > int i; > diff --git a/help.h b/help.h > index b21d7c94e8..97e6c0965e 100644 > --- a/help.h > +++ b/help.h > @@ -1,6 +1,8 @@ > #ifndef HELP_H > #define HELP_H > > +struct string_list; > + > struct cmdnames { > int alloc; > int cnt; > @@ -17,6 +19,8 @@ static inline void mput_char(char c, unsigned int num) > } > > extern void list_common_cmds_help(void); > +extern void list_all_main_cmds(struct string_list *list); > +extern void list_all_other_cmds(struct string_list *list); > extern const char *help_unknown_cmd(const char *cmd); > extern void load_command_list(const char *prefix, > struct cmdnames *main_cmds, > -- > 2.17.0.705.g3525833791 >
Re: [PATCH] fast-export: avoid NULL pointer arithmetic
On Fri, May 11, 2018 at 03:11:46PM +0200, Duy Nguyen wrote: > Back to fast-export, can we just allocate a new int on heap and point > it there? Allocating small pieces becomes quite cheap and fast with > mem-pool.h and we can avoid this storing integer in pointer business. Something like this seems to work, but we use 4-ish more bytes per object, or 100MB overhead on a repo with 25M objects. I think it's a reasonable trade off. -- 8< -- diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 530df12f05..de593035b1 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -21,6 +21,7 @@ #include "quote.h" #include "remote.h" #include "blob.h" +#include "mem-pool.h" static const char *fast_export_usage[] = { N_("git fast-export [rev-list-opts]"), @@ -38,6 +39,7 @@ static struct string_list extra_refs = STRING_LIST_INIT_NODUP; static struct refspec *refspecs; static int refspecs_nr; static int anonymize; +static struct mem_pool int_pool = MEM_POOL_INIT(2 * 1024 * 1024); static int parse_opt_signed_tag_mode(const struct option *opt, const char *arg, int unset) @@ -156,20 +158,22 @@ static void anonymize_path(struct strbuf *out, const char *path, } } -/* Since intptr_t is C99, we do not use it here */ -static inline uint32_t *mark_to_ptr(uint32_t mark) +static inline uint32_t ptr_to_mark(void *mark) { - return ((uint32_t *)NULL) + mark; -} - -static inline uint32_t ptr_to_mark(void * mark) -{ - return (uint32_t *)mark - (uint32_t *)NULL; + if (!mark) + BUG("not marked!"); + return *(uint32_t *)mark; } static inline void mark_object(struct object *object, uint32_t mark) { - add_decoration(, object, mark_to_ptr(mark)); + uint32_t *ptr = lookup_decoration(, object); + + if (!ptr) + ptr = mem_pool_alloc(_pool, sizeof(uint32_t)); + + *ptr = mark; + add_decoration(, object, ptr); } static inline void mark_next_object(struct object *object) diff --git a/fast-import.c b/fast-import.c index 34edf3fb8f..ce5ce2081f 100644 --- a/fast-import.c +++ b/fast-import.c @@ -300,8 +300,7 @@ static int global_argc; static const char **global_argv; /* Memory pools */ -static struct mem_pool fi_mem_pool = {NULL, 2*1024*1024 - - sizeof(struct mp_block), 0 }; +static struct mem_pool fi_mem_pool = MEM_POOL_INIT(2*1024*1024); /* Atom management */ static unsigned int atom_table_sz = 4451; diff --git a/mem-pool.h b/mem-pool.h index 829ad58ecf..bccbd3f224 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -21,6 +21,8 @@ struct mem_pool { size_t pool_alloc; }; +#define MEM_POOL_INIT(block_size) { NULL, (block_size) - sizeof(struct mp_block), 0 } + /* * Alloc memory from the mem_pool. */ -- 8< --
Re: [PATCH] fast-export: avoid NULL pointer arithmetic
On Fri, May 11, 2018 at 10:56 AM, Jeff Kingwrote: > On Fri, May 11, 2018 at 08:19:59AM +0200, Duy Nguyen wrote: > >> On Fri, May 11, 2018 at 6:49 AM, Junio C Hamano wrote: >> > Junio C Hamano writes: >> > >> >> René Scharfe writes: >> >> >> But it somehow feels backwards in spirit to me, as the reason why we >> use "void *" there in the decoration field is because we expect that >> we'd have a pointer to some struture most of the time, and we have >> to occasionally store a small integer there. >> >>> >> >>> Yes, fast-export seems to be the only place that stores an integer as >> >>> a decoration. >> >> >> >> With the decoration subsystem that might be the case, but I think >> >> we have other codepaths where "void * .util" field in the structure >> >> is used to store (void *)1, expecting that a normal allocation will >> >> never yield a pointer that is indistinguishable from that value. >> > >> > I was looking at a different topic and noticed that bisect.c uses >> > commit->util (which is of type "void *") to always store an int (it >> > never stores a pointer and there is no mixing). This one is equally >> > unportable as fast-export after your fix. >> > >> >> In both cases we should be able to use commit-slab instead of >> commit->util. We could go even further and kill "util" pointer but >> that's more work. > > I would love it if we could kill the util pointer in favor of > commit-slab. Unfortunately the fast-export case is decorating non-commit > objects, too. Right. The "util" thing was a side discussion. Back to fast-export, can we just allocate a new int on heap and point it there? Allocating small pieces becomes quite cheap and fast with mem-pool.h and we can avoid this storing integer in pointer business. -- Duy
[PATCH v3] add status config and command line options for rename detection
After performing a merge that has conflicts git status will, by default, attempt to detect renames which causes many objects to be examined. In a virtualized repo, those objects do not exist locally so the rename logic triggers them to be fetched from the server. This results in the status call taking hours to complete on very large repos vs seconds with this patch. Add a new config status.renames setting to enable turning off rename detection during status and commit. This setting will default to the value of diff.renames. Add a new config status.renamelimit setting to to enable bounding the time spent finding out inexact renames during status and commit. This setting will default to the value of diff.renamelimit. Add --no-renames command line option to status that enables overriding the config setting from the command line. Add --find-renames[=] command line option to status that enables detecting renames and optionally setting the similarity index. This patch depends on em/status-rename-config Original-Patch-by: Alejandro PaulySigned-off-by: Ben Peart --- Notes: Base Ref: commit dc6b1d92ca9c0c538daa244e3910bb8b2a50d959 (em/status-rename-config) Web-Diff: https://github.com/benpeart/git/commit/5bac43610b Checkout: git fetch https://github.com/benpeart/git status-renames-v3 && git checkout 5bac43610b ### Interdiff (v2..v3): diff --git a/Documentation/config.txt b/Documentation/config.txt index 9c8eca05b1..4f1ead 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3120,14 +3120,16 @@ status.displayCommentPrefix:: Defaults to false. status.renameLimit:: - The number of files to consider when performing rename detection; - if not specified, defaults to the value of diff.renameLimit. + The number of files to consider when performing rename detection + in linkgit:git-status[1] and linkgit:git-commit[1]. Defaults to + the value of diff.renameLimit. status.renames:: - Whether and how Git detects renames. If set to "false", - rename detection is disabled. If set to "true", basic rename - detection is enabled. If set to "copies" or "copy", Git will - detect copies, as well. Defaults to the value of diff.renames. + Whether and how Git detects renames in linkgit:git-status[1] and + linkgit:git-commit[1] . If set to "false", rename detection is + disabled. If set to "true", basic rename detection is enabled. + If set to "copies" or "copy", Git will detect copies, as well. + Defaults to the value of diff.renames. status.showStash:: If set to true, linkgit:git-status[1] will display the number of diff --git a/builtin/commit.c b/builtin/commit.c index db886277f4..b50e33ef48 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1373,6 +1373,7 @@ int cmd_status(int argc, const char **argv, const char *prefix) if (no_renames != -1) s.detect_rename = !no_renames; if ((intptr_t)rename_score_arg != -1) { + if (s.detect_rename < DIFF_DETECT_RENAME) s.detect_rename = DIFF_DETECT_RENAME; if (rename_score_arg) s.rename_score = parse_rename_score(_score_arg); diff --git a/t/t7525-status-rename.sh b/t/t7525-status-rename.sh old mode 100644 new mode 100755 index 311df8038a..ef8b1b3078 --- a/t/t7525-status-rename.sh +++ b/t/t7525-status-rename.sh @@ -10,14 +10,13 @@ test_expect_success 'setup' ' git commit -m"Adding original file." && mv original renamed && echo 2 >> renamed && - git add . -' - -cat >.gitignore <<\EOF + git add . && + cat >.gitignore <<-\EOF .gitignore expect* actual* EOF +' test_expect_success 'status no-options' ' git status >actual && @@ -63,7 +62,18 @@ test_expect_success 'status status.renames=true' ' test_i18ngrep "renamed:" actual ' -test_expect_success 'status config overriden' ' +test_expect_success 'commit honors status.renames=false' ' + git -c status.renames=false commit --dry-run >actual && + test_i18ngrep "deleted:" actual && + test_i18ngrep "new file:" actual +' + +test_expect_success 'commit honors status.renames=true' ' + git -c status.renames=true commit --dry-run >actual && + test_i18ngrep "renamed:" actual +' + +test_expect_success 'status config overridden' ' git -c status.renames=true status --no-renames >actual && test_i18ngrep "deleted:" actual && test_i18ngrep "new file:" actual @@ -87,4 +97,17 @@ test_expect_success 'status score=01%' ' test_i18ngrep "renamed:" actual ' +test_expect_success 'copies not overridden by find-rename' '
Re: [PATCH v2] add status config and command line options for rename detection
On 5/10/2018 6:31 PM, Elijah Newren wrote: Hi Ben, On Thu, May 10, 2018 at 12:09 PM, Ben Peartwrote: On 5/10/2018 12:19 PM, Elijah Newren wrote: On Thu, May 10, 2018 at 7:16 AM, Ben Peart wrote: Given the example perf impact is arbitrary (the actual example that triggered this patch took status from 2+ hours to seconds) and can't be replicated using the current performance tools in git, I'm just going drop the specific numbers. I believe the patch is worth while just to give users the flexibility to control these behaviors. Your parenthetical statement of timing going from hours to seconds I think would be great; I don't think we need precise numbers. + if ((intptr_t)rename_score_arg != -1) { + s.detect_rename = DIFF_DETECT_RENAME; I'd still prefer this was a if (s.detect_rename < DIFF_DETECT_RENAME) s.detect_rename = DIFF_DETECT_RENAME; If a user specifies they are willing to pay for copy detection, but then just passes --find-renames=40% because they want to find more renames, it seems odd to disable copy detection to me. I agree and will change it. It is unfortunate this will behave differently than it does with merge. Fixing the merge behavior to match is outside the scope of this patch. I agree that changing merge is outside the scope of this patch, but I'm curious what change you have in mind for it to "make it match". In particular, merge-recursive.c already has (or will shortly have) + if (opts.detect_rename > DIFF_DETECT_RENAME) + opts.detect_rename = DIFF_DETECT_RENAME; from your commit 85b460305ce7 ("merge: add merge.renames config setting", 2018-05-02), This is a good point that I missed. With that recent change to merge, it no longer matters that the settings parsing code caps detect_rename at DIFF_DETECT_RENAME because it will cap it later anyway so there is no need to change the merge option behavior. The one place copy detection does make sense inside a merge is for the diffstat shown at the end (from builtin/merge.c), but it currently isn't controlled by any configuration setting at all. When it is hooked up, it'd probably store the value separately from merge-recursive's internal o->{diff,merge}_detect_rename anyway, because builtin/merge.c's diffstat should be controlled by the relevant confiig settings and flags (merge.renames, diff.renames, -Xfind-renames, etc.) regardless of which merge strategy (recursive, resolve, octopus, ours, ort) is employed. And when that is hooked up, I agree with you that it should look like what you've done with status.renames here. In fact, if you'd like to take a crack at it, I think you'd do a great job. :-) If not, it's on my list of things to do. Thanks but I'll leave that to you. :) I have a large backlog of patches I would like to see pushed through the mailing list into master. We've been sitting on this one for over a year. If the current rate is any indication, it will take man years to get caught up. Testcases look good. It'd be nice to also add a few testcases where copy detection is turned on -- in particular, I'd like to see one with --find-renames=$DIFFERENT_THAN_DEFAULT being passed when merge.renames=copies. OK. I also added tests to verify the settings correctly impact commit. Nice!
Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
On 5/10/2018 4:45 PM, Martin Ågren wrote: On 10 May 2018 at 21:22, Stefan Bellerwrote: On Thu, May 10, 2018 at 12:05 PM, Martin Ågren wrote: I hope to find time to do some more hands-on testing of this to see that errors actually do get caught. Packfiles and loose objects are primary data, which means that those need a more advanced way to diagnose and repair them, so I would imagine the commit graph fsck is closer to bitmaps fsck, which I would have suspected to be found in t5310, but a quick read doesn't reveal many tests that are checking for integrity. So I guess the test coverage here is ok, (although we should always ask for more) Since I'm wrapping up for today, I'm posting some simple tests that I assembled. The last of these showcases one or two problems with the current error-reporting. Depending on the error, there can be *lots* of errors reported and there are no new-lines, so the result on stdout can be a wall of not-very-legible text. Some of these might not make sense. I just started going through the documentation on the format, causing some sort of corruption in each field. Maybe this can be helpful somehow. Martin diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 82f95eb11f..a7e48db2de 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -255,4 +255,49 @@ test_expect_success 'git fsck (checks commit-graph)' ' git fsck ' +# usage: corrupt_data [] +corrupt_data() { + file=$1 + pos=$2 + data="${3:-\0}" + printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc +} + +test_expect_success 'detect bad signature' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 0 "\0" && + test_must_fail git commit-graph verify 2>err && + grep "graph signature" err +' + +test_expect_success 'detect bad version number' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 4 "\02" && + test_must_fail git commit-graph verify 2>err && + grep "graph version" err +' + +test_expect_success 'detect bad hash version' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 5 "\02" && + test_must_fail git commit-graph verify 2>err && + grep "hash version" err +' + +test_expect_success 'detect too small chunk-count' ' + cd "$TRASH_DIRECTORY/full" && + cp $objdir/info/commit-graph commit-graph-backup && + test_when_finished mv commit-graph-backup $objdir/info/commit-graph && + corrupt_data $objdir/info/commit-graph 6 "\01" && + test_must_fail git commit-graph verify 2>err && + cat err +' + + test_done Martin: thank you so much for these test examples, and for running them to find out about the newline issue. This is really helpful.
Re: [PATCH] pager: set COLUMNS to term_columns()
On Fri, May 11, 2018 at 11:25 AM, Jeff Kingwrote: > --- a/pager.c > +++ b/pager.c > @@ -109,10 +109,15 @@ void setup_pager(void) > return; > > /* > -* force computing the width of the terminal before we redirect > -* the standard output to the pager. > +* After we redirect standard output, we won't be able to use an ioctl > +* to get the terminal size. Let's grab it now, and then set $COLUMNS > +* to communicate it to any sub-processes. > */ > - (void) term_columns(); > + { > + char buf[64]; > + xsnprintf(buf, sizeof(buf), "%d", term_columns()); > + setenv("COLUMNS", buf, 0); I wonder if this affects bash being a subprocess. E.g. if COLUMNS is exported will it still dynamically adjust COLUMNS? A quick test shows that it adjusts $COLUMNS anyway, so even if we need to launch a shell we should be good. > + } > > setenv("GIT_PAGER_IN_USE", "true", 1); > > -- > 2.17.0.984.g9b00a423a4 > -- Duy
[PATCH] column: fix off-by-one default width
By default we want to fill the whole screen if possible, but we do not want to use up _all_ terminal columns because the last character is going hit the border, push the cursor over and wrap. Keep it at default value zero, which will make print_columns() set the width at term_columns() - 1. This affects the test in t7004 because effective column width before was 40 but now 39 so we need to compensate it by one or the output at 39 columns has a different layout. Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/column.c | 1 - t/t7004-tag.sh | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/builtin/column.c b/builtin/column.c index 0c3223d64b..5228ccf37a 100644 --- a/builtin/column.c +++ b/builtin/column.c @@ -42,7 +42,6 @@ int cmd_column(int argc, const char **argv, const char *prefix) git_config(column_config, NULL); memset(, 0, sizeof(copts)); - copts.width = term_columns(); copts.padding = 1; argc = parse_options(argc, argv, "", options, builtin_column_usage, 0); if (argc) diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh index e3f1e014aa..d7b319e919 100755 --- a/t/t7004-tag.sh +++ b/t/t7004-tag.sh @@ -363,7 +363,7 @@ test_expect_success 'tag -l -l works, as our buggy documenta ' test_expect_success 'listing tags in column' ' - COLUMNS=40 git tag -l --column=row >actual && + COLUMNS=41 git tag -l --column=row >actual && cat >expected <<\EOF && a1 aa1 cba t210t211 v0.2.1 v1.0v1.0.1 v1.1.3 -- 2.17.0.705.g3525833791
Re: Implementing reftable in Git
On Wed, May 9, 2018 at 4:33 PM, Christian Couderwrote: > I might start working on implementing reftable in Git soon. > [...] Nice. It'll be great to have a reftable implementation in git core (and ideally libgit2, as well). It seems to me that it could someday become the new default reference storage method. The file format is considerably more complicated than the current loose/packed scheme, which is definitely a disadvantage (for example, for other Git implementations). But implementing it *with good performance and without races* might be no more complicated than the current scheme. Testing will be important. There are already many tests specifically about testing loose/packed reference storage. These will always have to run against repositories that are forced to use that reference scheme. And there will need to be new tests specifically about the reftable scheme. Both classes of tests should be run every time. That much is pretty obvious. But currently, there are a lot of tests that assume the loose/packed reference format on disk even though the tests are not really related to references at all. ISTM that these should be converted to work at a higher level, for example using `for-each-ref`, `rev-parse`, etc. to examine references rather than reading reference files directly. That way the tests should run correctly regardless of which scheme is in use. And since it's too expensive to run the whole test suite with both reference storage schemes, it seems to me that the reference storage scheme that is used while running the scheme-neutral tests should be easy to choose at runtime. David Turner did some analogous work for wiring up and testing his proposed LMDB ref storage backend that might be useful [1]. I'm CCing him, since he might have thoughts on this topic. Regarding the reftable spec itself: I recently gave a little internal talk about it, and while preparing the talk I noticed a couple of things that should maybe be tweaked: * The spec proposes to change `$GIT_DIR/refs`, which is currently a directory that holds the loose refs, into a file that holds the table of contents of reftable files comprising the full set of references. This was my suggestion. I was thinking that this would prevent old refs code from being used accidentally on a reftable-enabled repository, while still enabling old versions of Git recognize this as a git directory [2]. I think that the latter is important to make things like `git rev-parse --git-dir` work correctly, even if the installed version of git can't actually *read* the repository. The problem is that `is_git_directory()` checks not only whether `$GIT_DIR/refs` exists, but also whether it is executable (i.e., since it is normally a directory, that it is searchable). It would be silly to make the reftable table of contents executable, so this doesn't seem like a good approach after all. So probably `$GIT_DIR/refs` should continue to be a directory. If it's there, it would probably make sense to place the reftable files and maybe the ToC inside of it. We would have to rely on older Git versions refusing to work in the directory because its `config` file has an unrecognized `core.repositoryFormatVersion`, but that should be OK I think. * The scheme for naming reftable files [3] is, I believe, just a suggestion as far as the spec is concerned (except for the use of `.ref`/`.log` file extensions). It might be more less unwieldy to use `%d` rather than `%08d`, and more convenient to name compacted files to `${min_update_index}-${max_update_index}_${n}.{ref,log}` to make it clearer to see by inspection what each file contains. That would also make it unnecessary, in most cases, to insert a `_${n}` to make the filename unique. Michael [1] https://github.com/dturner-tw/git/tree/dturner/pluggable-backends [2] https://github.com/git/git/blob/ccdcbd54c4475c2238b310f7113ab3075b5abc9c/setup.c#L309-L347 [3] https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#layout https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#compaction [4] https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#footer
[PATCH] pager: set COLUMNS to term_columns()
On Fri, May 11, 2018 at 10:43:45AM +0200, Duy Nguyen wrote: > > As an aside, I was confused while looking into this because I _thought_ > > I had COLUMNS set: > > > > $ echo $COLUMNS > > 119 > > > > But it turns out that bash sets that by default (if you have the > > checkwinsize option on) but does not export it. ;) > > Yep. This confused me too and I was "f*ck it I don't want to deal with > this tty crap again". I'll leave the term_columns() fix to you then, > and limit this patch to the "while at there" part which should be > fixed anyway. Heh. OK, here it is. I wondered why we didn't notice this earlier and elsewhere (like in "git -p help -a"). The answer is that git-tag is the only program that uses the column filter. Everybody else does it in-process. -- >8 -- Subject: [PATCH] pager: set COLUMNS to term_columns() After we invoke the pager, our stdout goes to a pipe, not the terminal, meaning we can no longer use an ioctl to get the terminal width. For that reason, ad6c3739a3 (pager: find out the terminal width before spawning the pager, 2012-02-12) started caching the terminal width. But that cache is only an in-process variable. Any programs we spawn will also not be able to run that ioctl, but won't have access to our cache. They'll end up falling back to our 80-column default. You can see the problem with: git tag --column=row Since git-tag spawns a pager these days, its spawned git-column helper will see neither the terminal on stdout nor a useful COLUMNS value (assuming you do not export it from your shell already). And you'll end up with 80-column output in the pager, regardless of your terminal size. We can fix this by setting COLUMNS right before spawning the pager. That fixes this case, as well as any more complicated ones (e.g., a paged program spawns another script which then generates columnized output). Signed-off-by: Jeff King--- pager.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/pager.c b/pager.c index 92b23e6cd1..226828f178 100644 --- a/pager.c +++ b/pager.c @@ -109,10 +109,15 @@ void setup_pager(void) return; /* -* force computing the width of the terminal before we redirect -* the standard output to the pager. +* After we redirect standard output, we won't be able to use an ioctl +* to get the terminal size. Let's grab it now, and then set $COLUMNS +* to communicate it to any sub-processes. */ - (void) term_columns(); + { + char buf[64]; + xsnprintf(buf, sizeof(buf), "%d", term_columns()); + setenv("COLUMNS", buf, 0); + } setenv("GIT_PAGER_IN_USE", "true", 1); -- 2.17.0.984.g9b00a423a4
Re: [PATCH v2] commit-graph: fix UX issue when .lock file exists
On Thu, May 10, 2018 at 05:42:52PM +, Derrick Stolee wrote: > We use the lockfile API to avoid multiple Git processes from writing to > the commit-graph file in the .git/objects/info directory. In some cases, > this directory may not exist, so we check for its existence. > [...] This version looks good to me. Thanks! -Peff
Re: [PATCH] fast-export: avoid NULL pointer arithmetic
On Fri, May 11, 2018 at 08:19:59AM +0200, Duy Nguyen wrote: > On Fri, May 11, 2018 at 6:49 AM, Junio C Hamanowrote: > > Junio C Hamano writes: > > > >> René Scharfe writes: > >> > But it somehow feels backwards in spirit to me, as the reason why we > use "void *" there in the decoration field is because we expect that > we'd have a pointer to some struture most of the time, and we have > to occasionally store a small integer there. > >>> > >>> Yes, fast-export seems to be the only place that stores an integer as > >>> a decoration. > >> > >> With the decoration subsystem that might be the case, but I think > >> we have other codepaths where "void * .util" field in the structure > >> is used to store (void *)1, expecting that a normal allocation will > >> never yield a pointer that is indistinguishable from that value. > > > > I was looking at a different topic and noticed that bisect.c uses > > commit->util (which is of type "void *") to always store an int (it > > never stores a pointer and there is no mixing). This one is equally > > unportable as fast-export after your fix. > > > > In both cases we should be able to use commit-slab instead of > commit->util. We could go even further and kill "util" pointer but > that's more work. I would love it if we could kill the util pointer in favor of commit-slab. Unfortunately the fast-export case is decorating non-commit objects, too. I'm not sure how possible/easy it would be to have an "object slab". Some complications are: 1. It would increase the size of "struct tree" and "struct blob" by at least 4 bytes (possibly 8, due to alignment) to store the slab index. Commits would stay the same, but my experience is that most repositories have 5-10 times as many trees/blobs as commits. Some of that memory might be reclaimable. E.g., if we moved tree->buffer and tree->size into a slab, callers which don't use them would actually come out ahead (and more callers than you might expect could do this, since many only need to look at the tree contents inside a single function). 2. If we allocate the indices for all types as a single sequence, then callers who only care about a specific type will have very sparse slabs (so they'll over-allocate, and it will have poor cache behavior). It might be possible to do something more clever. E.g., give each object type its own index counter, but then for a full object-slab, store per-type slabs and dereference obj->type to know which slab to look in. There'd be some complication with any_object. We'd probably need an OBJ_NONE slab, and then to allocate a type-specific index when the type is assigned. There's already a central point for this in object_as_type(). But we'd also have to migrate any previously-stored slab data (stored when it was OBJ_NONE), which implies having a global list of slabs. So I dunno. It seems do-able but complicated. -Peff
Re: [PATCH] tag: fix column output not using all terminal space
On Fri, May 11, 2018 at 10:28 AM, Jeff Kingwrote: > On Fri, May 11, 2018 at 09:56:02AM +0200, Nguyễn Thái Ngọc Duy wrote: > >> git-tag runs a separate git-column command via run_column_filter(). >> This makes the new 'git-column' process fail to pick up the terminal >> width for some reason and fall back to default width. Just explicitly >> pass terminal width and avoid this terminal width detection business >> in subprocesses. > > I think "some reason" is that we start the pager before running "git > column". Running "git --no-pager tag --column=row" seems to fix it. > > It doesn't seem to have anything to do with the pager program itself. > Doing: > > # use sh to avoid optimizing out pager invocation > GIT_PAGER='sh -c cat' git tag --column=row > > shows the same problem. It looks like we force term_columns() to run > before invoking the pager in order to cache the value. That makes sense, > since TIOCGWINSZ on stdout is not going to be valid after we start > piping it to the pager. But of course our git-column sub-process won't > see that; the value is cached only in memory. > > So I think the approach of communicating the width to the sub-process is > the right one. But I think we'd probably want to do so through the > $COLUMNS variable, rather than passing a command-line option. That would > fix the same bug for other cases where we might have multiple layers of > sub-processes (e.g., if we pipe to the pager and then run a hook which > columnizes output). > > Something like this seems to make it work for me: > > diff --git a/pager.c b/pager.c > index 92b23e6cd1..c4f3412a84 100644 > --- a/pager.c > +++ b/pager.c > @@ -162,8 +162,12 @@ int term_columns(void) > #ifdef TIOCGWINSZ > else { > struct winsize ws; > - if (!ioctl(1, TIOCGWINSZ, ) && ws.ws_col) > + if (!ioctl(1, TIOCGWINSZ, ) && ws.ws_col) { > + char buf[64]; > term_columns_at_startup = ws.ws_col; > + xsnprintf(buf, sizeof(buf), "%d", ws.ws_col); > + setenv("COLUMNS", buf, 0); > + } > } > #endif > > > though perhaps that should go into setup_pager(), which is what is > actually making stdout inaccessible. > > As an aside, I was confused while looking into this because I _thought_ > I had COLUMNS set: > > $ echo $COLUMNS > 119 > > But it turns out that bash sets that by default (if you have the > checkwinsize option on) but does not export it. ;) Yep. This confused me too and I was "f*ck it I don't want to deal with this tty crap again". I'll leave the term_columns() fix to you then, and limit this patch to the "while at there" part which should be fixed anyway. -- Duy
Re: [PATCH v2 0/4] Fix mem leaks of recent object store conversions.
On Thu, May 10, 2018 at 12:58:45PM -0700, Stefan Beller wrote: > This series replaces the two commits that were queued on > sb/object-store-replace, > fixing memory leaks that were recently introduced. > > Compared to v1, I merged the two independent series from yesterday, > rewrote the commit message to clear up Junios confusion and addresses Peffs > comments for the packfiles as well. Mostly. :) My one remaining complaint is that the bitmap code may hold on to a dangling pointer to a packed_git after this series. I think that is part of a larger problem, though, which is that the bitmap code's globals need to be part of the struct raw_object_store. I think this can already cause problems before your series if we were to try to use bitmaps in both a superproject and a submodule in the same process, though I think we'd at least hit the "ignoring extra bitmap file" code path in open_pack_bitmap_1(). So right now it's an annoyance, but after your series it becomes a potential segfault. -Peff
Re: [PATCH] tag: fix column output not using all terminal space
On Fri, May 11, 2018 at 09:56:02AM +0200, Nguyễn Thái Ngọc Duy wrote: > git-tag runs a separate git-column command via run_column_filter(). > This makes the new 'git-column' process fail to pick up the terminal > width for some reason and fall back to default width. Just explicitly > pass terminal width and avoid this terminal width detection business > in subprocesses. I think "some reason" is that we start the pager before running "git column". Running "git --no-pager tag --column=row" seems to fix it. It doesn't seem to have anything to do with the pager program itself. Doing: # use sh to avoid optimizing out pager invocation GIT_PAGER='sh -c cat' git tag --column=row shows the same problem. It looks like we force term_columns() to run before invoking the pager in order to cache the value. That makes sense, since TIOCGWINSZ on stdout is not going to be valid after we start piping it to the pager. But of course our git-column sub-process won't see that; the value is cached only in memory. So I think the approach of communicating the width to the sub-process is the right one. But I think we'd probably want to do so through the $COLUMNS variable, rather than passing a command-line option. That would fix the same bug for other cases where we might have multiple layers of sub-processes (e.g., if we pipe to the pager and then run a hook which columnizes output). Something like this seems to make it work for me: diff --git a/pager.c b/pager.c index 92b23e6cd1..c4f3412a84 100644 --- a/pager.c +++ b/pager.c @@ -162,8 +162,12 @@ int term_columns(void) #ifdef TIOCGWINSZ else { struct winsize ws; - if (!ioctl(1, TIOCGWINSZ, ) && ws.ws_col) + if (!ioctl(1, TIOCGWINSZ, ) && ws.ws_col) { + char buf[64]; term_columns_at_startup = ws.ws_col; + xsnprintf(buf, sizeof(buf), "%d", ws.ws_col); + setenv("COLUMNS", buf, 0); + } } #endif though perhaps that should go into setup_pager(), which is what is actually making stdout inaccessible. As an aside, I was confused while looking into this because I _thought_ I had COLUMNS set: $ echo $COLUMNS 119 But it turns out that bash sets that by default (if you have the checkwinsize option on) but does not export it. ;) -Peff
[PATCH] tag: fix column output not using all terminal space
git-tag runs a separate git-column command via run_column_filter(). This makes the new 'git-column' process fail to pick up the terminal width for some reason and fall back to default width. Just explicitly pass terminal width and avoid this terminal width detection business in subprocesses. While at there, fix an off-by-one column setting in git-column. We do not want to use up _all_ terminal columns because the last character is going hit the border and wrap. Keep it at term_columns() - 1 like print_columns() does. This affects the test in t7004 because effective column width before was 40 but now 39 so we need to adjust it back. Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/column.c | 2 +- column.c | 2 ++ t/t7004-tag.sh | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/builtin/column.c b/builtin/column.c index 0c3223d64b..182c84f778 100644 --- a/builtin/column.c +++ b/builtin/column.c @@ -42,7 +42,7 @@ int cmd_column(int argc, const char **argv, const char *prefix) git_config(column_config, NULL); memset(, 0, sizeof(copts)); - copts.width = term_columns(); + copts.width = term_columns() - 1; copts.padding = 1; argc = parse_options(argc, argv, "", options, builtin_column_usage, 0); if (argc) diff --git a/column.c b/column.c index 49ab85b769..382537b324 100644 --- a/column.c +++ b/column.c @@ -381,6 +381,8 @@ int run_column_filter(int colopts, const struct column_options *opts) argv_array_pushf(argv, "--raw-mode=%d", colopts); if (opts && opts->width) argv_array_pushf(argv, "--width=%d", opts->width); + else + argv_array_pushf(argv, "--width=%d", term_columns() - 1); if (opts && opts->indent) argv_array_pushf(argv, "--indent=%s", opts->indent); if (opts && opts->padding) diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh index e3f1e014aa..d7b319e919 100755 --- a/t/t7004-tag.sh +++ b/t/t7004-tag.sh @@ -363,7 +363,7 @@ test_expect_success 'tag -l -l works, as our buggy documenta ' test_expect_success 'listing tags in column' ' - COLUMNS=40 git tag -l --column=row >actual && + COLUMNS=41 git tag -l --column=row >actual && cat >expected <<\EOF && a1 aa1 cba t210t211 v0.2.1 v1.0v1.0.1 v1.1.3 -- 2.17.0.705.g3525833791
Re: [PATCH 1/9] Add and use generic name->id mapping code for color slot parsing
On Thu, May 10, 2018 at 7:16 PM, Stefan Bellerwrote: > On Thu, May 10, 2018 at 7:19 AM, Nguyễn Thái Ngọc Duy > wrote: > >> 7 files changed, 82 insertions(+), 112 deletions(-) > > Nice! > > >> >> +static const char *color_branch_slots[] = { >> + [BRANCH_COLOR_RESET]= "reset", > > In 512f41cfac5 (clean.c: use designated initializer, 2017-07-14) > we thought we'll do it once and see if anyone complains > (and it shipped v2.15.0, 2017-10-29), and so far > nobody complained half a year later. So designated initializers > are all good now? I don't know :) but it's worth pushing. I don't think this series would land on the next release, which gives people a couple more months to complain about C99. It does save me time and if it causes problem, removing designated initializers is trivial > Do we want to mention this decision in the commit message? > > If so, the patch looks good! > Thanks, > Stefan -- Duy
[PATCH v3] pack-format.txt: more details on pack file format
The current document mentions OBJ_* constants without their actual values. A git developer would know these are from cache.h but that's not very friendly to a person who wants to read this file to implement a pack file parser. Similarly, the deltified representation is not documented at all (the "document" is basically patch-delta.c). Translate that C code to English with a bit more about what ofs-delta and ref-delta mean. Signed-off-by: Nguyễn Thái Ngọc Duy--- Diff from v2 diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 00351cb822..70a99fd142 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -56,27 +56,37 @@ blob. However to save space, an object could be stored as a "delta" of another "base" object. These representations are assigned new types ofs-delta and ref-delta, which is only valid in a pack file. -Both ofs-delta and ref-delta store the "delta" against another -object. The difference between them is, ref-delta directly encodes -20-byte base object name. If the base object is in the same pack, -ofs-delta encodes the offset of the base object in the pack instead. +Both ofs-delta and ref-delta store the "delta" to be applied to +another object (called 'base object') to reconstruct the object. The +difference between them is, ref-delta directly encodes 20-byte base +object name. If the base object is in the same pack, ofs-delta encodes +the offset of the base object in the pack instead. + +The base object could also be deltified if it's in the same pack. +Ref-delta can also refer to an object outside the pack (i.e. the +so-called "thin pack"). When stored on disk however, the pack should +be self contained to avoid cyclic dependency. The delta data is a sequence of instructions to reconstruct an object -from the base object. Each instruction appends more and more data to -the target object until it's complete. There are two supported -instructions so far: one for copy a byte range from the source object -and one for inserting new data embedded in the instruction itself. +from the base object. If the base object is deltified, it must be +converted to canonical form first. Each instruction appends more and +more data to the target object until it's complete. There are two +supported instructions so far: one for copy a byte range from the +source object and one for inserting new data embedded in the +instruction itself. Each instruction has variable length. Instruction type is determined by the seventh bit of the first octet. The following diagrams follow the convention in RFC 1951 (Deflate compressed data format). + Instruction to copy from base object + +--+-+-+-+-+---+---+---+ | 1xxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 | +--+-+-+-+-+---+---+---+ This is the instruction format to copy a byte range from the source -object. It encodes the offset to copy from any the number of bytes to +object. It encodes the offset to copy from and the number of bytes to copy. Offset and size are in little-endian order. All offset and size bytes are optional. This is to reduce the @@ -99,6 +109,8 @@ In its most compact form, this instruction only takes up one byte values zero. There is another exception: size zero is automatically converted to 0x1. + Instruction to add new data + +--++ | 0xxx |data| +--++ @@ -108,6 +120,8 @@ object. The following data is appended to the target object. The first seven bits of the first octet determines the size of data in bytes. The size must be non-zero. + Reserved instruction + +--+ | | +--+ Documentation/technical/pack-format.txt | 92 + cache.h | 5 ++ 2 files changed, 97 insertions(+) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 8e5bf60be3..70a99fd142 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -36,6 +36,98 @@ Git pack format - The trailer records 20-byte SHA-1 checksum of all of the above. +=== Object types + +Valid object types are: + +- OBJ_COMMIT (1) +- OBJ_TREE (2) +- OBJ_BLOB (3) +- OBJ_TAG (4) +- OBJ_OFS_DELTA (6) +- OBJ_REF_DELTA (7) + +Type 5 is reserved for future expansion. Type 0 is invalid. + +=== Deltified representation + +Conceptually there are only four object types: commit, tree,
Re: [PATCH v2] pack-format.txt: more details on pack file format
On Thu, May 10, 2018 at 7:06 PM, Stefan Bellerwrote: >> +=== Deltified representation >> + >> +Conceptually there are only four object types: commit, tree, tag and >> +blob. However to save space, an object could be stored as a "delta" of >> +another "base" object. These representations are assigned new types >> +ofs-delta and ref-delta, which is only valid in a pack file. > > ...only valid... > > as opposed to loose objects or as opposed to referencing cross-packs? > I would think the former, not the latter. Yeah. This is pretty much an implementation detail of a pack. The "real" type is always blob/commit/tree/tag. But you only see this when you dig deep down in pack-related code. >> +Both ofs-delta and ref-delta store the "delta" against another >> +object. The difference between them is, ref-delta directly encodes >> +20-byte base object name. If the base object is in the same pack, >> +ofs-delta encodes the offset of the base object in the pack instead. > > Reading this paragraph clears up the question from before. > The ref delta is a delta to another "reference by hash id (sha1)". > What abbreviation is OFS? OFfSet ? I guess so. I never bothered to track down the source for that. >> +The delta data is a sequence of instructions to reconstruct an object >> +from the base object. > > As said before the base object is of type 1..4, we do not "delta-on-delta" > yet, but to construct the object we have to create the base object first, > which itself can be represented as a deltified object leading to a delta > chain. Yeah that's the delta chain concept. I'll just make a note here about base object potentially being a delta object as well. -- Duy
Re: [PATCH v2] add status config and command line options for rename detection
Ben Peartwrites: > Documentation/config.txt | 10 > Documentation/git-status.txt | 10 > builtin/commit.c | 41 > diff.c | 2 +- > diff.h | 1 + > t/t7525-status-rename.sh | 90 > wt-status.c | 12 + > wt-status.h | 4 +- > 8 files changed, 168 insertions(+), 2 deletions(-) > create mode 100644 t/t7525-status-rename.sh I'll mark the new script as executable (otherwise the test will not even start).
Re: [PATCH 0/9] completion: avoid hard coding config var list
On Fri, May 11, 2018 at 8:00 AM, Junio C Hamanowrote: > Nguyễn Thái Ngọc Duy writes: > >> While at there, this config list is also available to the user via >> `git help --config` if you can't remember the exact config name you >> want and don't want to go through that big git-config man page. > > Makes a reader anticipate a new user of levenshtein code, perhaps > ;-) Why would you go and write that for? Now I want to do it :-) Yeah I think config name typo has been a common complaint (including from me). -- Duy
Re: [PATCH] fast-export: avoid NULL pointer arithmetic
On Fri, May 11, 2018 at 6:49 AM, Junio C Hamanowrote: > Junio C Hamano writes: > >> René Scharfe writes: >> But it somehow feels backwards in spirit to me, as the reason why we use "void *" there in the decoration field is because we expect that we'd have a pointer to some struture most of the time, and we have to occasionally store a small integer there. >>> >>> Yes, fast-export seems to be the only place that stores an integer as >>> a decoration. >> >> With the decoration subsystem that might be the case, but I think >> we have other codepaths where "void * .util" field in the structure >> is used to store (void *)1, expecting that a normal allocation will >> never yield a pointer that is indistinguishable from that value. > > I was looking at a different topic and noticed that bisect.c uses > commit->util (which is of type "void *") to always store an int (it > never stores a pointer and there is no mixing). This one is equally > unportable as fast-export after your fix. > In both cases we should be able to use commit-slab instead of commit->util. We could go even further and kill "util" pointer but that's more work. -- Duy
Re: [PATCH 0/9] completion: avoid hard coding config var list
Nguyễn Thái Ngọc Duywrites: > While at there, this config list is also available to the user via > `git help --config` if you can't remember the exact config name you > want and don't want to go through that big git-config man page. Makes a reader anticipate a new user of levenshtein code, perhaps ;-)