regarding fix on "git clone $there $here"

2018-05-08 Thread Leslie Wang
Dear git experts,

Recently we try to upgrade ubuntu from 17.10 to 18.04, then we found one 
inconsistent behavior on git clone.

At 2.14.1 or 2.15.1, if I run command like 
 - mkdir /tmp/111
 - git clone g...@github.com:111/111 /tmp/111

because it will failure, then /tmp/111 will be removed automatically.

However, at latest 2.17.0 which is part of ubuntu 18.04, seems like git clone 
failure will not auto remove this folder. I notice 2.16.2 and 2.17.0 release 
note includes this fix. So just wonder to know if prior behavior was think of 
bug, and this fix has change the behavior. 

 * "git clone $there $here" is allowed even when here directory exists
   as long as it is an empty directory, but the command incorrectly
   removed it upon a failure of the operation.

Thanks & Regards
Leslie Wang

Re: [PATCH 3/5] refs.c: drop dead code checking lock status in `delete_pseudoref()`

2018-05-08 Thread Martin Ågren
On 8 May 2018 at 20:10, Jeff King  wrote:
> On Sun, May 06, 2018 at 04:10:29PM +0200, Martin Ågren wrote:
>> Unlike in the previous patch, this function is not prepared for
>> indicating errors via a `strbuf err`, so let's just drop the dead code.
>> Any improved error-handling can be added later.
>
> I suspect this ought to eventually be converted to return an error, to
> match the rest of the refs code. And in that sense, this is a step
> backwards versus leaving the current die_errno in place and dropping the
> LOCK_DIE_ON_ERROR flag. But it probably doesn't matter much either way,
> and whoever does the conversion later can deal with it.

Thank you for your comments on this series. It's much appreciated. I
will be rerolling this series with extended commit messages about the
lock-handling. Looking over this code again, I notice that a larger
cleanup of the error-handling might be warranted. It would indeed feel
better to do a minor part of that, instead of taking a small step in the
wrong direction...

Martin


Re: [PATCH/RFC] completion: complete all possible -no-

2018-05-08 Thread Aaron Schrab

At 17:24 +0200 08 May 2018, Duy Nguyen  wrote:

It took me so long to reply partly because I remember seeing some guy
doing clever trick with tab completion that also shows a short help
text in addition to the complete words. I could not find that again
and from my reading (also internet searching) it's probably not
possible to do this without trickery.


Was that perhaps using zsh rather than bash? Below is some of the 
display from its git completion (this is likely affected somewhat by my 
configuration).  The group descriptions (lines that begin with 
"Completing") appear in a different color, and are not available for 
selection.


1113$ git c
Completing alias
ci   -- alias for 'commit -v'
cia  -- alias for 'commit -v -a'
co   -- alias for 'checkout'
conf -- alias for 'config'
Completing main porcelain command
checkout -- checkout branch or paths to working tree
cherry-pick  -- apply changes introduced by some existing commits
citool   -- graphical alternative to git commit
clean-- remove untracked files from working tree
clone-- clone repository into new directory
commit   -- record changes to repository
Completing ancillary manipulator command
config   -- get and set repository or global options
Completing ancillary interrogator command
cherry   -- find commits not merged upstream
count-objects-- count unpacked objects and display their disk consumption
Completing plumbing manipulator command
checkout-index   -- copy files from index to working directory
commit-tree  -- create new commit object
Completing plumbing interrogator command
cat-file -- provide content or type information for repository objects

1114$ git commit -
Completing option
--all  -a   -- stage all modified and deleted paths
--allow-empty   -- allow recording an empty commit
--allow-empty-message   -- allow recording a commit with an empty 
message
--amend -- amend the tip of the current branch
--author-- override the author name used in the commit


[PATCH v5 2/7] grep.c: expose matched column in match_line()

2018-05-08 Thread Taylor Blau
When calling match_line(), callers presently cannot determine the
relative offset of the match because match_line() discards the
'regmatch_t' that contains this information.

Instead, teach match_line() to take in a 'regmatch_t *' so that callers
can inspect the match's starting and ending offset from the beginning of
the line. This additional argument has no effect when opt->extended is
non-zero.

We will later pass the starting offset from 'regmatch_t *' to
show_line() in order to display the column number of the first match.

Signed-off-by: Taylor Blau 
---
 grep.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 65b90c10a3..1c25782355 100644
--- a/grep.c
+++ b/grep.c
@@ -1299,17 +1299,17 @@ static int match_expr(struct grep_opt *opt, char *bol, 
char *eol,
 }
 
 static int match_line(struct grep_opt *opt, char *bol, char *eol,
- enum grep_context ctx, int collect_hits)
+ regmatch_t *match, enum grep_context ctx,
+ int collect_hits)
 {
struct grep_pat *p;
-   regmatch_t match;
 
if (opt->extended)
return match_expr(opt, bol, eol, ctx, collect_hits);
 
/* we do not call with collect_hits without being extended */
for (p = opt->pattern_list; p; p = p->next) {
-   if (match_one_pattern(p, bol, eol, ctx, , 0))
+   if (match_one_pattern(p, bol, eol, ctx, match, 0))
return 1;
}
return 0;
@@ -1699,6 +1699,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
int try_lookahead = 0;
int show_function = 0;
struct userdiff_driver *textconv = NULL;
+   regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_HEAD;
xdemitconf_t xecfg;
 
@@ -1788,7 +1789,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
if ((ctx == GREP_CONTEXT_HEAD) && (eol == bol))
ctx = GREP_CONTEXT_BODY;
 
-   hit = match_line(opt, bol, eol, ctx, collect_hits);
+   hit = match_line(opt, bol, eol, , ctx, collect_hits);
*eol = ch;
 
if (collect_hits)
-- 
2.17.0



[PATCH v5 4/7] grep.c: display column number of first match

2018-05-08 Thread Taylor Blau
To prepare for 'git grep' learning '--column', teach grep.c's
show_line() how to show the column of the first match on non-context
line.

Signed-off-by: Taylor Blau 
---
 grep.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/grep.c b/grep.c
index fb0fa23231..f3fe416791 100644
--- a/grep.c
+++ b/grep.c
@@ -1364,7 +1364,7 @@ static int next_match(struct grep_opt *opt, char *bol, 
char *eol,
 }
 
 static void show_line(struct grep_opt *opt, char *bol, char *eol,
- const char *name, unsigned lno, char sign)
+ const char *name, unsigned lno, unsigned cno, char sign)
 {
int rest = eol - bol;
const char *match_color, *line_color = NULL;
@@ -1399,6 +1399,17 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
output_color(opt, buf, strlen(buf), opt->color_lineno);
output_sep(opt, sign);
}
+   /*
+* Treat 'cno' as the 1-indexed offset from the start of a non-context
+* line to its first match. Otherwise, 'cno' is 0 indicating that we are
+* being called with a context line.
+*/
+   if (opt->columnnum && cno) {
+   char buf[32];
+   xsnprintf(buf, sizeof(buf), "%d", cno);
+   output_color(opt, buf, strlen(buf), opt->color_columnno);
+   output_sep(opt, sign);
+   }
if (opt->color) {
regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_BODY;
@@ -1504,7 +1515,7 @@ static void show_funcname_line(struct grep_opt *opt, 
struct grep_source *gs,
break;
 
if (match_funcname(opt, gs, bol, eol)) {
-   show_line(opt, bol, eol, gs->name, lno, '=');
+   show_line(opt, bol, eol, gs->name, lno, 0, '=');
break;
}
}
@@ -1569,7 +1580,7 @@ static void show_pre_context(struct grep_opt *opt, struct 
grep_source *gs,
 
while (*eol != '\n')
eol++;
-   show_line(opt, bol, eol, gs->name, cur, sign);
+   show_line(opt, bol, eol, gs->name, cur, 0, sign);
bol = eol + 1;
cur++;
}
@@ -1833,7 +1844,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
show_pre_context(opt, gs, bol, eol, lno);
else if (opt->funcname)
show_funcname_line(opt, gs, bol, lno);
-   show_line(opt, bol, eol, gs->name, lno, ':');
+   show_line(opt, bol, eol, gs->name, lno, match.rm_so+1, 
':');
last_hit = lno;
if (opt->funcbody)
show_function = 1;
@@ -1862,7 +1873,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
/* If the last hit is within the post context,
 * we need to show this line.
 */
-   show_line(opt, bol, eol, gs->name, lno, '-');
+   show_line(opt, bol, eol, gs->name, lno, match.rm_so+1, 
'-');
}
 
next_line:
-- 
2.17.0



[PATCH v5 6/7] grep.c: add configuration variables to show matched option

2018-05-08 Thread Taylor Blau
To support git-grep(1)'s new option, '--column', document and teach
grep.c how to interpret relevant configuration options, similar to those
associated with '--line-number'.

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt   | 5 +
 Documentation/git-grep.txt | 3 +++
 grep.c | 6 ++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6e8d969f52..b3c861c5c3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1159,6 +1159,8 @@ color.grep.::
function name lines (when using `-p`)
 `lineNumber`;;
line number prefix (when using `-n`)
+`column`;;
+   column number prefix (when using `--column`)
 `match`;;
matching text (same as setting `matchContext` and `matchSelected`)
 `matchContext`;;
@@ -1708,6 +1710,9 @@ gitweb.snapshot::
 grep.lineNumber::
If set to true, enable `-n` option by default.
 
+grep.column::
+   If set to true, enable the `--column` option by default.
+
 grep.patternType::
Set the default matching behavior. Using a value of 'basic', 'extended',
'fixed', or 'perl' will enable the `--basic-regexp`, 
`--extended-regexp`,
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 75f1561112..dc8f76ce99 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -44,6 +44,9 @@ CONFIGURATION
 grep.lineNumber::
If set to true, enable `-n` option by default.
 
+grep.column::
+   If set to true, enable the `--column` option by default.
+
 grep.patternType::
Set the default matching behavior. Using a value of 'basic', 'extended',
'fixed', or 'perl' will enable the `--basic-regexp`, 
`--extended-regexp`,
diff --git a/grep.c b/grep.c
index f4228c23ac..5d904810ad 100644
--- a/grep.c
+++ b/grep.c
@@ -96,6 +96,10 @@ int grep_config(const char *var, const char *value, void *cb)
opt->linenum = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var, "grep.column")) {
+   opt->columnnum = git_config_bool(var, value);
+   return 0;
+   }
 
if (!strcmp(var, "grep.fullname")) {
opt->relative = !git_config_bool(var, value);
@@ -112,6 +116,8 @@ int grep_config(const char *var, const char *value, void 
*cb)
color = opt->color_function;
else if (!strcmp(var, "color.grep.linenumber"))
color = opt->color_lineno;
+   else if (!strcmp(var, "color.grep.column"))
+   color = opt->color_columnno;
else if (!strcmp(var, "color.grep.matchcontext"))
color = opt->color_match_context;
else if (!strcmp(var, "color.grep.matchselected"))
-- 
2.17.0



[PATCH v5 3/7] grep.[ch]: extend grep_opt to allow showing matched column

2018-05-08 Thread Taylor Blau
To support showing the matched column when calling 'git-grep(1)', teach
'grep_opt' the normal set of options to configure the default behavior
and colorization of this feature.

Signed-off-by: Taylor Blau 
---
 grep.c | 3 +++
 grep.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/grep.c b/grep.c
index 1c25782355..fb0fa23231 100644
--- a/grep.c
+++ b/grep.c
@@ -46,6 +46,7 @@ void init_grep_defaults(void)
color_set(opt->color_filename, "");
color_set(opt->color_function, "");
color_set(opt->color_lineno, "");
+   color_set(opt->color_columnno, "");
color_set(opt->color_match_context, GIT_COLOR_BOLD_RED);
color_set(opt->color_match_selected, GIT_COLOR_BOLD_RED);
color_set(opt->color_selected, "");
@@ -155,6 +156,7 @@ void grep_init(struct grep_opt *opt, const char *prefix)
opt->extended_regexp_option = def->extended_regexp_option;
opt->pattern_type_option = def->pattern_type_option;
opt->linenum = def->linenum;
+   opt->columnnum = def->columnnum;
opt->max_depth = def->max_depth;
opt->pathname = def->pathname;
opt->relative = def->relative;
@@ -164,6 +166,7 @@ void grep_init(struct grep_opt *opt, const char *prefix)
color_set(opt->color_filename, def->color_filename);
color_set(opt->color_function, def->color_function);
color_set(opt->color_lineno, def->color_lineno);
+   color_set(opt->color_columnno, def->color_columnno);
color_set(opt->color_match_context, def->color_match_context);
color_set(opt->color_match_selected, def->color_match_selected);
color_set(opt->color_selected, def->color_selected);
diff --git a/grep.h b/grep.h
index 399381c908..08a0b391c5 100644
--- a/grep.h
+++ b/grep.h
@@ -127,6 +127,7 @@ struct grep_opt {
int prefix_length;
regex_t regexp;
int linenum;
+   int columnnum;
int invert;
int ignore_case;
int status_only;
@@ -159,6 +160,7 @@ struct grep_opt {
char color_filename[COLOR_MAXLEN];
char color_function[COLOR_MAXLEN];
char color_lineno[COLOR_MAXLEN];
+   char color_columnno[COLOR_MAXLEN];
char color_match_context[COLOR_MAXLEN];
char color_match_selected[COLOR_MAXLEN];
char color_selected[COLOR_MAXLEN];
-- 
2.17.0



[PATCH v5 7/7] contrib/git-jump/git-jump: jump to match column in addition to line

2018-05-08 Thread Taylor Blau
Take advantage of 'git-grep(1)''s new option, '--column' in order to
teach Peff's 'git-jump' script how to jump to the correct column for any
given match.

'git-grep(1)''s output is in the correct format for Vim's jump list, so
no additional cleanup is necessary.

Signed-off-by: Taylor Blau 
---
 contrib/git-jump/README   | 12 ++--
 contrib/git-jump/git-jump |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/contrib/git-jump/README b/contrib/git-jump/README
index 4484bda410..2f618a7f97 100644
--- a/contrib/git-jump/README
+++ b/contrib/git-jump/README
@@ -25,6 +25,13 @@ git-jump will feed this to the editor:
 foo.c:2: printf("hello word!\n");
 ---
 
+Or, when running 'git jump grep', column numbers will also be emitted,
+e.g. `git jump grep "hello"` would return:
+
+---
+foo.c:2:9: printf("hello word!\n");
+---
+
 Obviously this trivial case isn't that interesting; you could just open
 `foo.c` yourself. But when you have many changes scattered across a
 project, you can use the editor's support to "jump" from point to point.
@@ -35,7 +42,8 @@ Git-jump can generate four types of interesting lists:
 
   2. The beginning of any merge conflict markers.
 
-  3. Any grep matches.
+  3. Any grep matches, including the column of the first match on a
+ line.
 
   4. Any whitespace errors detected by `git diff --check`.
 
@@ -82,7 +90,7 @@ which does something similar to `git jump grep`. However, it 
is limited
 to positioning the cursor to the correct line in only the first file,
 leaving you to locate subsequent hits in that file or other files using
 the editor or pager. By contrast, git-jump provides the editor with a
-complete list of files and line numbers for each match.
+complete list of files, lines, and a column number for each match.
 
 
 Limitations
diff --git a/contrib/git-jump/git-jump b/contrib/git-jump/git-jump
index 80ab0590bc..931b0fe3a9 100755
--- a/contrib/git-jump/git-jump
+++ b/contrib/git-jump/git-jump
@@ -52,7 +52,7 @@ mode_merge() {
 # editor shows them to us in the status bar.
 mode_grep() {
cmd=$(git config jump.grepCmd)
-   test -n "$cmd" || cmd="git grep -n"
+   test -n "$cmd" || cmd="git grep -n --column"
$cmd "$@" |
perl -pe '
s/[ \t]+/ /g;
-- 
2.17.0


[PATCH v5 1/7] Documentation/config.txt: camel-case lineNumber for consistency

2018-05-08 Thread Taylor Blau
lineNumber has casing that is inconsistent with surrounding options,
like color.grep.matchContext, and color.grep.matchSelected. Re-case this
documentation in order to be consistent with the text around it, and to
ensure that new entries are consistent, too.

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2659153cb3..6e8d969f52 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1157,7 +1157,7 @@ color.grep.::
filename prefix (when not using `-h`)
 `function`;;
function name lines (when using `-p`)
-`linenumber`;;
+`lineNumber`;;
line number prefix (when using `-n`)
 `match`;;
matching text (same as setting `matchContext` and `matchSelected`)
-- 
2.17.0



[PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-08 Thread Taylor Blau
Teach 'git-grep(1)' a new option, '--column', to show the column
number of the first match on a non-context line. This makes it possible
to teach 'contrib/git-jump/git-jump' how to seek to the first matching
position of a grep match in your editor, and allows similar additional
scripting capabilities.

For example:

  $ git grep -n --column foo | head -n3
  .clang-format:51:14:# myFunction(foo, bar, baz);
  .clang-format:64:7:# int foo();
  .clang-format:75:8:# void foo()

Signed-off-by: Taylor Blau 
---
 Documentation/git-grep.txt |  6 +-
 builtin/grep.c |  4 
 grep.c |  3 +++
 t/t7810-grep.sh| 32 
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 18b494731f..75f1561112 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -13,7 +13,7 @@ SYNOPSIS
   [-v | --invert-match] [-h|-H] [--full-name]
   [-E | --extended-regexp] [-G | --basic-regexp]
   [-P | --perl-regexp]
-  [-F | --fixed-strings] [-n | --line-number]
+  [-F | --fixed-strings] [-n | --line-number] [--column]
   [-l | --files-with-matches] [-L | --files-without-match]
   [(-O | --open-files-in-pager) []]
   [-z | --null]
@@ -169,6 +169,10 @@ providing this option will cause it to die.
 --line-number::
Prefix the line number to matching lines.
 
+--column::
+   Prefix the 1-indexed byte-offset of the first match on non-context 
lines. This
+   option is incompatible with '--invert-match', and extended expressions.
+
 -l::
 --files-with-matches::
 --name-only::
diff --git a/builtin/grep.c b/builtin/grep.c
index 5f32d2ce84..f9f516dfc4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
GREP_PATTERN_TYPE_PCRE),
OPT_GROUP(""),
OPT_BOOL('n', "line-number", , N_("show line 
numbers")),
+   OPT_BOOL(0, "column", , N_("show column number of 
first match")),
OPT_NEGBIT('h', NULL, , N_("don't show 
filenames"), 1),
OPT_BIT('H', NULL, , N_("show filenames"), 1),
OPT_NEGBIT(0, "full-name", ,
@@ -,6 +1112,9 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
hit = grep_objects(, , the_repository, );
}
 
+   if (opt.columnnum && opt.invert)
+   die(_("--column and --invert-match cannot be combined"));
+
if (num_threads)
hit |= wait_all();
if (hit && show_in_pager)
diff --git a/grep.c b/grep.c
index f3fe416791..f4228c23ac 100644
--- a/grep.c
+++ b/grep.c
@@ -995,6 +995,9 @@ static void compile_grep_patterns_real(struct grep_opt *opt)
else if (!opt->extended && !opt->debug)
return;
 
+   if (opt->columnnum && opt->extended)
+   die(_("--column and extended expressions cannot be combined"));
+
p = opt->pattern_list;
if (p)
opt->pattern_expression = compile_pattern_expr();
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 1797f632a3..aa56b21ed9 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -99,6 +99,28 @@ do
test_cmp expected actual
'
 
+   test_expect_success "grep -w $L (with --column)" '
+   {
+   echo ${HC}file:5:foo mmap bar
+   echo ${HC}file:14:foo_mmap bar mmap
+   echo ${HC}file:5:foo mmap bar_mmap
+   echo ${HC}file:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep --column -w -e mmap $H >actual &&
+   test_cmp expected actual
+   '
+
+   test_expect_success "grep -w $L (with --line-number, --column)" '
+   {
+   echo ${HC}file:1:5:foo mmap bar
+   echo ${HC}file:3:14:foo_mmap bar mmap
+   echo ${HC}file:4:5:foo mmap bar_mmap
+   echo ${HC}file:5:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep -n --column -w -e mmap $H >actual &&
+   test_cmp expected actual
+   '
+
test_expect_success "grep -w $L" '
{
echo ${HC}file:1:foo mmap bar
@@ -1590,4 +1612,14 @@ test_expect_success 'grep does not report i-t-a and 
assume unchanged with -L' '
test_cmp expected actual
 '
 
+test_expect_success 'grep does not allow --column, --invert-match' '
+   test_must_fail git grep --column --invert-match pat 2>err &&
+   test_i18ngrep "\-\-column and \-\-invert-match cannot be combined" err
+'
+
+test_expect_success 'grep does not allow --column, extended' '
+   test_must_fail git grep --column --not -e pat 2>err &&
+   test_i18ngrep "\-\-column and extended 

[PATCH v5 0/7] Teach '--column' to 'git-grep(1)'

2018-05-08 Thread Taylor Blau
Hi,

Attached is my fifth re-roll of the series to add '--column' to
'git-grep(1)'.

The main changes are from a René's concerns in [1]. He points out that
'--column' with certain extended expressions can produce nonsense
(particularly because we leave the regmatch_t uninitialized).

> So at least the combination of extended matches and --column should error
> out.  Supporting it would be better, of course.  That could get tricky for
> negations, though (e.g. git grep --not -e foo).

I have opted for the die() option, instead of supporting extended
matches with --column. The problem with extended matches in this case is
that _sometimes_ --column can produce sensible results, and other times
it cannot.

For instance, an extended expression containing a single atom
has a known answer for --column, but one containing a negative atom does
only sometimes. Specifically: if an NOT atom is at the top-level, we
_never_ have a sensible answer for --column, but only sometimes do when
it is not at the top-level.

So, this leaves us two choices: (1) don't support --column and extended
expressions, or (2) support --column with extended expressions by not
printing columnar information when we don't have an answer. Option (2)
requires callers to perform deep inspection of their extended
expressions, and determine whether or not there is a answer that Git
could produce. This is too much to ask a caller to reasonably consider
when scripting. On the other hand, option (1) does not allow the caller
to do as much under certain circumstances, but simplifies their lives
when scripting, etc. For those reasons, let's pick (1).

Beyond that, here is a summary of what has changed since last time:

  * die() when given --extended, or compiling to an extended grammar,
like in the case of 'git grep --column --not -e foo' [1].

  * Clarify patch 5/7 and indicate the intended purpose of supporting
'--column' [2].

  * Clarify that '--column' gives a 1-indexed _byte_ offset, nothing
else [3,5].

  * Remove a dangling reference to '--column-number' [4].

  * Clean up contrib/git-jump/README to Ævar's suggestion [6].


Thanks as always for your kind review.


Thanks,
Taylor

[1]: https://public-inbox.org/git/d030b4ee-5a92-4863-a29c-de2642bfa...@web.de
[2]: 
https://public-inbox.org/git/CACsJy8BdJ0=gwzqvfsqy-vjtzvt4uznzrapyxryxx2wnzal...@mail.gmail.com
[3]: 
https://public-inbox.org/git/20bc9baf-a85e-f00e-859e-e796ab432...@talktalk.net
[4]: https://public-inbox.org/git/87efioy8f9@evledraar.gmail.com
[5]: https://public-inbox.org/git/xmqqk1sfpn9j@gitster-ct.c.googlers.com
[6]: https://public-inbox.org/git/87d0y8y84q@evledraar.gmail.com/

Taylor Blau (7):
  Documentation/config.txt: camel-case lineNumber for consistency
  grep.c: expose matched column in match_line()
  grep.[ch]: extend grep_opt to allow showing matched column
  grep.c: display column number of first match
  builtin/grep.c: add '--column' option to 'git-grep(1)'
  grep.c: add configuration variables to show matched option
  contrib/git-jump/git-jump: jump to match column in addition to line

 Documentation/config.txt   |  7 ++-
 Documentation/git-grep.txt |  9 +++-
 builtin/grep.c |  4 
 contrib/git-jump/README| 12 +--
 contrib/git-jump/git-jump  |  2 +-
 grep.c | 42 ++
 grep.h |  2 ++
 t/t7810-grep.sh| 32 +
 8 files changed, 96 insertions(+), 14 deletions(-)

Inter-diff (since v5):

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index d451cd8883..dc8f76ce99 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -173,7 +173,8 @@ providing this option will cause it to die.
Prefix the line number to matching lines.

 --column::
-   Prefix the 1-indexed column number of the first match on non-context 
lines.
+   Prefix the 1-indexed byte-offset of the first match on non-context 
lines. This
+   option is incompatible with '--invert-match', and extended expressions.

 -l::
 --files-with-matches::
diff --git a/builtin/grep.c b/builtin/grep.c
index 5c83f17759..f9f516dfc4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1112,6 +1112,9 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
hit = grep_objects(, , the_repository, );
}

+   if (opt.columnnum && opt.invert)
+   die(_("--column and --invert-match cannot be combined"));
+
if (num_threads)
hit |= wait_all();
if (hit && show_in_pager)
diff --git a/contrib/git-jump/README b/contrib/git-jump/README
index 7630e16854..2f618a7f97 100644
--- a/contrib/git-jump/README
+++ b/contrib/git-jump/README
@@ -25,6 +25,13 @@ git-jump will feed this to the editor:
 foo.c:2: printf("hello word!\n");
 ---

+Or, when running 'git jump grep', column numbers will also be emitted,
+e.g. `git jump 

[PATCHv2 0/4] Rebooting pc/submodule-helper-foreach

2018-05-08 Thread Stefan Beller
v2:
* rebased onto origin/master
* dropped leftover "toplevel" variable from experimentation
* reworded the commit message for the first patch extensively
* dropped the third patch
* see "branch-diff" below.

v1:
The "What's cooking" email carried this series for some time now:
> * pc/submodule-helper-foreach (2018-02-02) 5 commits
>  - submodule: port submodule subcommand 'foreach' from shell to C
> - submodule foreach: document variable '$displaypath'
>  - submodule foreach: clarify the '$toplevel' variable documentation
>  - submodule foreach: document '$sm_path' instead of '$path'
>  - submodule foreach: correct '$path' in nested submodules from a subdirectory
> 
>  Expecting a response to review comments
>  e.g. cf. <20180206150044.1bffbb573c088d38c8e44...@google.com>

I reworded the commit message of the first patch and nearly confused
myself again, as "toplevel" doesn't refer to the "topmost" superproject,
just the direct superproject of the submodule.

However I think the code of the first patch is correct as originally presented.
Just the wording of the commit message was changed to explain the reasoning
more extensively.

With this series, we get
* keep the invariant of $toplevel + $path to be an absolute path that is
  correctly pointing at the submodule. "git -C $toplevel config" + $name
  allowing to ask configuration of the submodule.  
* $displaypath will have the relative path form $pwd to the submodule root.
* better documentation.

In later patches we could add $topmost, that points at the superproject
in which the command was started from, and $path_from_topmost, that would
be the relative path from $topmost to the submodule, potentially skipping
intermediate superprojects.

Thanks,
Stefan

Prathamesh Chavan (4):
  submodule foreach: correct '$path' in nested submodules from a
subdirectory
  submodule foreach: document '$sm_path' instead of '$path'
  submodule foreach: document variable '$displaypath'
  submodule: port submodule subcommand 'foreach' from shell to C

 Documentation/git-submodule.txt |  15 ++--
 builtin/submodule--helper.c | 144 
 git-submodule.sh|  40 +
 t/t7407-submodule-foreach.sh|  38 -
 4 files changed, 190 insertions(+), 47 deletions(-)

-- 
2.17.0.255.g8bfb7c0704

1:  0c5f405db24 ! 1:  85f91b48379 submodule foreach: correct '$path' in nested 
submodules from a subdirectory
@@ -12,45 +12,38 @@
 submodule path inside the submodule. This value is of little use and is
 hard to document.
 
-There are three different possible solutions that have more value:
-(a) The path value is documented as the path from the toplevel of the
-superproject to the mount point of the submodule. If 'the' refers 
to
-the superproject holding this submodule ('sub' holding 'nested'),
-the path would be expected to be path='nested'.
-(b) In case 'the' superproject is referring to the toplevel, which
-is the superproject in which the original command was invoked,
-then path is expected to be path='sub/nested'.
-(c) The documentation explains $path as [...] "relative to the
-superproject", following 091a6eb0fe (submodule: drop the
-top-level requirement, 2013-06-16), such that the nested submodule
-would be expected as path='../sub/nested', when "the" superproject
-is the superproject, where the command was run from
-(d) or the value of path='nested' is expected if we take the
-intermediate superproject into account. [This is the same as
-(a); it highlights that the documentation is not clear, but
-technically correct if we were to revert 091a6eb0fe.]
+Also, in git-submodule.txt, $path is documented to be the "name of the
+submodule directory relative to the superproject", but "the
+superproject" is ambiguous.
 
-The behavior for (c) was introduced in 091a6eb0fe (submodule: drop the
-top-level requirement, 2013-06-16) the intent for $path seemed to be
-relative to $cwd to the submodule worktree, but that did not work for
-nested submodules, as the intermittent submodules were not included in
-the path.
+To resolve both these issues, we could:
+(a) Change "the superproject" to "its immediate superproject", so
+$path would be "nested" instead of "../nested".
+(b) Change "the superproject" to "the superproject the original
+command was run from", so $path would be "sub/nested" instead of
+"../nested".
+(c) Change "the superproject" to "the directory the original command
+was run from", so $path would be "../sub/nested" instead of
+"../nested".
 
-If we were to fix the meaning of the $path using (a), (d) such that 
"path"
-is "the path from the 

[PATCH 1/4] submodule foreach: correct '$path' in nested submodules from a subdirectory

2018-05-08 Thread Stefan Beller
From: Prathamesh Chavan 

When running 'git submodule foreach --recursive' from a subdirectory of
your repository, nested submodules get a bogus value for $path:
For a submodule 'sub' that contains a nested submodule 'nested',
running 'git -C dir submodule foreach echo $path' from the root of the
superproject would report path='../nested' for the nested submodule.
The first part '../' is derived from the logic computing the relative
path from $pwd to the root of the superproject. The second part is the
submodule path inside the submodule. This value is of little use and is
hard to document.

Also, in git-submodule.txt, $path is documented to be the "name of the
submodule directory relative to the superproject", but "the
superproject" is ambiguous.

To resolve both these issues, we could:
(a) Change "the superproject" to "its immediate superproject", so
$path would be "nested" instead of "../nested".
(b) Change "the superproject" to "the superproject the original
command was run from", so $path would be "sub/nested" instead of
"../nested".
(c) Change "the superproject" to "the directory the original command
was run from", so $path would be "../sub/nested" instead of
"../nested".

The behavior for (c) was attempted to be introduced in 091a6eb0fe
(submodule: drop the top-level requirement, 2013-06-16) with the intent
for $path to be relative from $pwd to the submodule worktree, but that
did not work for nested submodules, as the intermittent submodules
were not included in the path.

If we were to fix the meaning of the $path using (a), we would break
any existing submodule user that runs foreach from non-root of the
superproject as the non-nested submodule '../sub' would change its
path to 'sub'.

If we were to fix the meaning of $path using (b), then we would break
any user that uses nested submodules (even from the root directory)
as the 'nested' would become 'sub/nested'.

If we were to fix the meaning of $path using (c), then we would break
the same users as in (b) as 'nested' would become 'sub/nested' from
the root directory of the superproject.

All groups can be found in the wild.  The author has no data if one group
outweighs the other by large margin, and offending each one seems equally
bad at first.  However in the authors imagination it is better to go with
(a) as running from a sub directory sounds like it is carried out by a
human rather than by some automation task.  With a human on the keyboard
the feedback loop is short and the changed behavior can be adapted to
quickly unlike some automation that can break silently.

Discussed-with: Ramsay Jones 
Signed-off-by: Prathamesh Chavan 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
---
 git-submodule.sh |  1 -
 t/t7407-submodule-foreach.sh | 36 ++--
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 24914963ca2..331d71c908b 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -345,7 +345,6 @@ cmd_foreach()
prefix="$prefix$sm_path/"
sanitize_submodule_env
cd "$sm_path" &&
-   sm_path=$(git submodule--helper relative-path 
"$sm_path" "$wt_prefix") &&
# we make $path available to scripts ...
path=$sm_path &&
if test $# -eq 1
diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 6ba5daf42ee..5144cc6926b 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -82,9 +82,9 @@ test_expect_success 'test basic "submodule foreach" usage' '
 
 cat >expect  expect <

[PATCH 2/4] submodule foreach: document '$sm_path' instead of '$path'

2018-05-08 Thread Stefan Beller
From: Prathamesh Chavan 

As using a variable '$path' may be harmful to users due to
capitalization issues, see 64394e3ae9 (git-submodule.sh: Don't
use $path variable in eval_gettext string, 2012-04-17). Adjust
the documentation to advocate for using $sm_path,  which contains
the same value. We still make the 'path' variable available and
document it as a deprecated synonym of 'sm_path'.

Discussed-with: Ramsay Jones 
Signed-off-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
Signed-off-by: Junio C Hamano 
---
 Documentation/git-submodule.txt | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 630999f41a9..066c7b6c18e 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -183,12 +183,15 @@ information too.
 
 foreach [--recursive] ::
Evaluates an arbitrary shell command in each checked out submodule.
-   The command has access to the variables $name, $path, $sha1 and
+   The command has access to the variables $name, $sm_path, $sha1 and
$toplevel:
$name is the name of the relevant submodule section in `.gitmodules`,
-   $path is the name of the submodule directory relative to the
-   superproject, $sha1 is the commit as recorded in the superproject,
-   and $toplevel is the absolute path to the top-level of the superproject.
+   $sm_path is the path of the submodule as recorded in the immediate
+   superproject, $sha1 is the commit as recorded in the immediate
+   superproject, and $toplevel is the absolute path to the top-level
+   of the immediate superproject.
+   Note that to avoid conflicts with '$PATH' on Windows, the '$path'
+   variable is now a deprecated synonym of '$sm_path' variable.
Any submodules defined in the superproject but not checked out are
ignored by this command. Unless given `--quiet`, foreach prints the name
of each submodule before evaluating the command.
-- 
2.17.0.255.g8bfb7c0704



[PATCH 3/4] submodule foreach: document variable '$displaypath'

2018-05-08 Thread Stefan Beller
From: Prathamesh Chavan 

It was observed that the variable '$displaypath' was accessible but
undocumented. Hence, document it.

Discussed-with: Ramsay Jones 
Signed-off-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
Signed-off-by: Junio C Hamano 
---
 Documentation/git-submodule.txt |  8 +---
 t/t7407-submodule-foreach.sh| 22 +++---
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 066c7b6c18e..500dfdafd16 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -183,11 +183,13 @@ information too.
 
 foreach [--recursive] ::
Evaluates an arbitrary shell command in each checked out submodule.
-   The command has access to the variables $name, $sm_path, $sha1 and
-   $toplevel:
+   The command has access to the variables $name, $sm_path, $displaypath,
+   $sha1 and $toplevel:
$name is the name of the relevant submodule section in `.gitmodules`,
$sm_path is the path of the submodule as recorded in the immediate
-   superproject, $sha1 is the commit as recorded in the immediate
+   superproject, $displaypath contains the relative path from the
+   current working directory to the submodules root directory,
+   $sha1 is the commit as recorded in the immediate
superproject, and $toplevel is the absolute path to the top-level
of the immediate superproject.
Note that to avoid conflicts with '$PATH' on Windows, the '$path'
diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 5144cc6926b..77729ac4aa1 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -82,16 +82,16 @@ test_expect_success 'test basic "submodule foreach" usage' '
 
 cat >expect <../../actual
+   git submodule foreach "echo 
\$toplevel-\$name-\$sm_path-\$displaypath-\$sha1" >../../actual
) &&
test_i18ncmp expect actual
 '
@@ -206,25 +206,25 @@ submodulesha1=$(cd 
clone2/nested1/nested2/nested3/submodule && git rev-parse HEA
 
 cat >expect <../../actual
+   git submodule foreach --recursive "echo toplevel: \$toplevel 
name: \$name path: \$sm_path displaypath: \$displaypath hash: \$sha1" 
>../../actual
) &&
test_i18ncmp expect actual
 '
-- 
2.17.0.255.g8bfb7c0704



[PATCH 4/4] submodule: port submodule subcommand 'foreach' from shell to C

2018-05-08 Thread Stefan Beller
From: Prathamesh Chavan 

This aims to make git-submodule foreach a builtin. 'foreach' is ported to
the submodule--helper, and submodule--helper is called from
git-submodule.sh.

Helped-by: Brandon Williams 
Mentored-by: Christian Couder 
Mentored-by: Stefan Beller 
Signed-off-by: Prathamesh Chavan 
Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 144 
 git-submodule.sh|  39 +-
 2 files changed, 145 insertions(+), 38 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c2403a915ff..4b0b773c06b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -439,6 +439,149 @@ static void for_each_listed_submodule(const struct 
module_list *list,
fn(list->entries[i], cb_data);
 }
 
+struct cb_foreach {
+   int argc;
+   const char **argv;
+   const char *prefix;
+   int quiet;
+   int recursive;
+};
+#define CB_FOREACH_INIT { 0 }
+
+static void runcommand_in_submodule_cb(const struct cache_entry *list_item,
+  void *cb_data)
+{
+   struct cb_foreach *info = cb_data;
+   const char *path = list_item->name;
+   const struct object_id *ce_oid = _item->oid;
+
+   const struct submodule *sub;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   char *displaypath;
+
+   displaypath = get_submodule_displaypath(path, info->prefix);
+
+   sub = submodule_from_path(the_repository, _oid, path);
+
+   if (!sub)
+   die(_("No url found for submodule path '%s' in .gitmodules"),
+   displaypath);
+
+   if (!is_submodule_populated_gently(path, NULL))
+   goto cleanup;
+
+   prepare_submodule_repo_env(_array);
+
+   /*
+   * For the purpose of executing  in the submodule,
+   * separate shell is used for the purpose of running the
+   * child process.
+   */
+   cp.use_shell = 1;
+   cp.dir = path;
+
+   /*
+   * NEEDSWORK: the command currently has access to the variables $name,
+   * $sm_path, $displaypath, $sha1 and $toplevel only when the command
+   * contains a single argument. This is done for maintaining a faithful
+   * translation from shell script.
+   */
+   if (info->argc == 1) {
+   char *toplevel = xgetcwd();
+   struct strbuf sb = STRBUF_INIT;
+
+   argv_array_pushf(_array, "name=%s", sub->name);
+   argv_array_pushf(_array, "sm_path=%s", path);
+   argv_array_pushf(_array, "displaypath=%s", displaypath);
+   argv_array_pushf(_array, "sha1=%s",
+   oid_to_hex(ce_oid));
+   argv_array_pushf(_array, "toplevel=%s", toplevel);
+
+   /*
+   * Since the path variable was accessible from the script
+   * before porting, it is also made available after porting.
+   * The environment variable "PATH" has a very special purpose
+   * on windows. And since environment variables are
+   * case-insensitive in windows, it interferes with the
+   * existing PATH variable. Hence, to avoid that, we expose
+   * path via the args argv_array and not via env_array.
+   */
+   sq_quote_buf(, path);
+   argv_array_pushf(, "path=%s; %s",
+sb.buf, info->argv[0]);
+   strbuf_release();
+   free(toplevel);
+   } else {
+   argv_array_pushv(, info->argv);
+   }
+
+   if (!info->quiet)
+   printf(_("Entering '%s'\n"), displaypath);
+
+   if (info->argv[0] && run_command())
+   die(_("run_command returned non-zero status for %s\n."),
+   displaypath);
+
+   if (info->recursive) {
+   struct child_process cpr = CHILD_PROCESS_INIT;
+
+   cpr.git_cmd = 1;
+   cpr.dir = path;
+   prepare_submodule_repo_env(_array);
+
+   argv_array_pushl(, "--super-prefix", NULL);
+   argv_array_pushf(, "%s/", displaypath);
+   argv_array_pushl(, "submodule--helper", "foreach", 
"--recursive",
+   NULL);
+
+   if (info->quiet)
+   argv_array_push(, "--quiet");
+
+   argv_array_pushv(, info->argv);
+
+   if (run_command())
+   die(_("run_command returned non-zero status while"
+   "recursing in the nested submodules of %s\n."),
+   displaypath);
+   }
+
+cleanup:
+   free(displaypath);
+}
+
+static int module_foreach(int argc, const char **argv, const char *prefix)
+{
+   struct 

Re: [PATCH 2/2] Documentation: render revisions correctly under Asciidoctor

2018-05-08 Thread brian m. carlson
On Tue, May 08, 2018 at 08:31:10PM +0200, Martin Ågren wrote:
> On 8 May 2018 at 03:13, brian m. carlson  wrote:
> > Since this patch fixes the present issue, I'd like to leave it as it is
> > and run through a cleanup series a bit later that catches all the
> > literal indented blocks and marks them explicitly to avoid this issue in
> > the future.
> 
> Sounds like a plan. :-) As noted elsewhere, I have no immediate idea how
> to identify which of the 13000 tab-indented lines are really part of
> diagrams and ascii-art. Counting the number of spaces? Anyway, thanks
> for this mini-series.

I have an utterly terrible Ruby one-liner which I wrote in a few
minutes.  It has some false positives, but not enough that I'd mind
sorting through them.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2] mailmap: update brian m. carlson's email address

2018-05-08 Thread brian m. carlson
On Tue, May 08, 2018 at 10:34:58AM +0530, Kaartic Sivaraam wrote:
> Just to be sure, you're meaning the use of `git log --use-mailmap` when
> you mean `git log` in the log message. Am I right? Or did you mean `git
> shortlog` ?
> 
> I'm asking this because I think the `git log` output doesn't consider
> the mailmap file by default.

Correct, it doesn't.  In my case, I was using --pretty='%aN <%aE>',
which is how I noticed it in the first place.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 06/28] t0000: annotate with SHA1 prerequisite

2018-05-08 Thread brian m. carlson
On Tue, May 08, 2018 at 08:28:47PM +0200, Martin Ågren wrote:
> On 8 May 2018 at 01:40, brian m. carlson  wrote:
> > As I mentioned in an earlier email, I plan to set an environment
> > variable for the algorithms in use and then do something like:
> >
> >   test "$tree" = "$(test-tool hash-helper --output known-tree)"
> >
> > where "known-tree" is some key we can use to look up the SHA-1 or
> > NewHash value, and we've specified we want the output format (as opposed
> > to input format or on-disk format).
> 
> My first thought was, can't we introduce such a helper already now? It
> would not check with the environment, but would always output SHA-1
> values. Thinking about it some more, maybe the exact usage/interface
> would be a bit volatile in the beginning, making it just as good an idea
> to gain some more experience and feeling first.

I think your second thought is spot on.  As we move on in this work, it
will become more obvious what functionality we need.

As a note, the benefit of using the prerequisite is in development work.
It's useful to know that all but a certain set of tests pass in the
codebase, because that gives me confidence that Git as a whole is mostly
functional as a result of my refactors.

Because I'm using a hash that may not be the final NewHash, it doesn't
make sense for me to compute translation tables.  Even if I did go
through that work, I couldn't submit it upstream, because nobody is
interested in carrying code for brian's development hash.  A helper that
output SHA-1 wouldn't be useful to me, because the tests would still
fail, and I wouldn't know what was failing because it was broken and
what just depended on SHA-1.

So part of the idea with the prerequisite is to provide code that can be
included in the main project and provides benefit to those that are
doing future development work.  The test helper approach becomes more
viable once we're clearer about what variations we need from the helper
and what hash we're actually using.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 01/28] t/test-lib: add an SHA1 prerequisite

2018-05-08 Thread brian m. carlson
On Tue, May 08, 2018 at 08:26:05PM +0200, Martin Ågren wrote:
> On 8 May 2018 at 01:30, brian m. carlson  wrote:
> > On Mon, May 07, 2018 at 12:10:39PM +0200, Martin Ågren wrote:
> >> Do we actually need more SHA-1-related prereqs, at least long-term, in
> >> which case we would want to find a more specific name for this one now?
> >> Is this SHA1_STORAGE, or some much better name than that?
> >
> > We may.  The transition plan anticipates several states:
> 
> "We may" as in, "we may need more SHA1-FOO prereqs later", or "we may
> want this to be SHA1-BAR"?

As in, we may need additional prerequisites.

> I do not feel entirely relaxed about a reasoning such as "this prereq
> will soon go away again, so we do not need to think too much about its
> name and meaning" (heavily paraphrased and possibly a bit pointed, but
> hopefully not too dishonest).

I think "SHA1" is short and reasonable considering that it's basically
stating, "This test depends on Git using SHA-1."  That's all we're
stating here.

I agree that the expected lifetime of the code should not impact its
design or naming in this case.  As someone who does maintenance for a
living, I'm all too aware that code lives far longer than its expected
lifetime.

> I guess a counter-argument might be "sure, if only we knew which
> SHA1-FOOs we will need. Only time and experience will tell." You've
> certainly spent way more brain-cycles on this than I have, and most
> likely more than anyone else on this list.
> 
> Maybe we want to document the transition-ness of this in the code and/or
> the commit message. Not only "transition" in the sense of the big
> transition, but in the sense of "this will probably go away long before
> the transition is completed."

Sure.  I can fix this up in a reroll.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 8/8] gpg-interface: handle alternative signature types

2018-05-08 Thread brian m. carlson
On Tue, May 08, 2018 at 09:28:14AM -0400, Jeff King wrote:
> OK, so my question then is: what does just-gpgsm support look like?
> 
> Do we literally add gpgsm.program? My thought was that taking us the
> first step towards a more generic config scheme would prevent us having
> to backtrack later.

I think the signingtool prefix is fine, or something similar.  My "just
gpgsm" proposal is literally just "check for PGP header" and "check for
CMS header" in parse_signature and dispatch appropriately.

> There are also more CMS signers than gpgsm (and I know Ben is working on
> a tool). So it feels a little ugly to make it "gpgsm.program", since it
> really is a more generic format.

Okay, so signingtool.cms.program?  signingtool.smime.program?

I suppose Ben still intends to use the same command-line interface as
for gpgsm.

> Or would you be happy if we just turned the matcher into a whole-line
> substring or regex match?

A first line regex would probably be fine, if you want to go that way.
That, I think, is generic enough that we can make use of it down the
line, since it distinguishes all known formats, TTBOMK.

It would be nice if we could still continue to use gpg without having to
add specific configuration for it, at least for compatibility reasons.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [GSoC] A blog about 'git stash' project

2018-05-08 Thread Paul-Sebastian Ungureanu

Hello,

On 08.05.2018 07:00, Taylor Blau wrote:

On Fri, May 04, 2018 at 12:48:21AM +0300, Paul-Sebastian Ungureanu wrote:

Hello everybody,

The community bonding period started. It is well known that for a greater
rate of success, it is recommended to send weekly reports regarding project
status. But, what would be a good form for such a report? I, for one, think
that starting a blog is one of the best options because all the content will
be stored in one place. Without further ado, I would like you to present my
blog [1].


Hi Paul, and welcome to Git! I am looking forward to reading your
patches and any additional writing posted on your blog.


It is a pleasure to be here. Thank you!


Any feedback is greatly appreciated! Thank you!


Do you have a RSS feed that I can consume in a feed reader?


Yes. It can be found here [1].

[1]
https://ungps.github.io/atom.xml
Best,
Paul


Re: [PATCH] t6050-replace: don't disable stdin for the whole test script

2018-05-08 Thread Johannes Schindelin
Hi Gábor,

On Mon, 7 May 2018, SZEDER Gábor wrote:

> The test script 't6050-replace.sh' starts off with redirecting the whole
> test script's stdin from /dev/null.  This redirection has been there
> since the test script was introduced in a3e8267225 (replace_object: add
> a test case, 2009-01-23), but the commit message doesn't explain why it
> was deemed necessary.  AFAICT, it has never been necessary, and t6050
> runs just fine and succeeds even without it, not only the current
> version but past versions as well.
> 
> Besides being unnecessary, this redirection is also harmful, as it
> prevents the test helper functions 'test_pause' and 'debug' from working
> properly in t6050, because we can't enter any commands to the shell and
> the debugger, respectively.

The redirection might have been necessary before 781f76b1582 (test-lib:
redirect stdin of tests, 2011-12-15), but it definitely is not necessary
now.

Thanks,
Dscho

Re: [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-08 Thread Stefan Beller
On Tue, May 8, 2018 at 1:04 PM, Jonathan Tan  wrote:
> On Tue,  8 May 2018 12:37:36 -0700
> Stefan Beller  wrote:
>
>> +void clear_alloc_state(struct alloc_state *s)
>> +{
>> + while (s->slab_nr > 0) {
>> + s->slab_nr--;
>> + free(s->slabs[s->slab_nr]);
>> + }
>
> I should have caught this earlier, but you need to free s->slabs itself
> too.

ok.

>
>> +void release_tree_node(struct tree *t);
>> +void release_commit_node(struct commit *c);
>> +void release_tag_node(struct tag *t);
>
> Do these really need to be defined in alloc.c? I would think that it
> would be sufficient to define them as static in object.c.
>
> Having said that, opinions differ (e.g. Duy said he thinks that release_
> goes with alloc_ [1]) so I'm OK either way.

I would have preferred static as well, but went with Duys suggestion of
having it in alloc.c.

I can change that.

Thanks,
Stefan


Re: [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-08 Thread Jonathan Tan
On Tue,  8 May 2018 12:37:36 -0700
Stefan Beller  wrote:

> +void clear_alloc_state(struct alloc_state *s)
> +{
> + while (s->slab_nr > 0) {
> + s->slab_nr--;
> + free(s->slabs[s->slab_nr]);
> + }

I should have caught this earlier, but you need to free s->slabs itself
too.

> +void release_tree_node(struct tree *t);
> +void release_commit_node(struct commit *c);
> +void release_tag_node(struct tag *t);

Do these really need to be defined in alloc.c? I would think that it
would be sufficient to define them as static in object.c.

Having said that, opinions differ (e.g. Duy said he thinks that release_
goes with alloc_ [1]) so I'm OK either way.

[1] 
https://public-inbox.org/git/cacsjy8d-e-bff3s+lqamfwb-w8opkjrffrv9o5s3ku+m5aa...@mail.gmail.com/


[PATCH v3 09/13] alloc: add repository argument to alloc_report

2018-05-08 Thread Stefan Beller
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index f031ce422d9..28b85b22144 100644
--- a/alloc.c
+++ b/alloc.c
@@ -105,7 +105,7 @@ static void report(const char *name, unsigned int count, 
size_t size)
 #define REPORT(name, type) \
 report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
 
-void alloc_report(void)
+void alloc_report_the_repository(void)
 {
REPORT(blob, struct blob);
REPORT(tree, struct tree);
diff --git a/cache.h b/cache.h
index 2d60359a964..01cc207d218 100644
--- a/cache.h
+++ b/cache.h
@@ -1774,7 +1774,8 @@ extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node_the_repository(void);
 #define alloc_object_node(r) alloc_object_node_##r()
 extern void *alloc_object_node_the_repository(void);
-extern void alloc_report(void);
+#define alloc_report(r) alloc_report_##r()
+extern void alloc_report_the_repository(void);
 extern unsigned int alloc_commit_index(void);
 
 /* pkt-line.c */
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 06/13] alloc: add repository argument to alloc_commit_node

2018-05-08 Thread Stefan Beller
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 alloc.c   | 2 +-
 blame.c   | 2 +-
 cache.h   | 3 ++-
 commit.c  | 2 +-
 merge-recursive.c | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/alloc.c b/alloc.c
index 2c8d1430758..9e2b897ec1d 100644
--- a/alloc.c
+++ b/alloc.c
@@ -88,7 +88,7 @@ unsigned int alloc_commit_index(void)
return count++;
 }
 
-void *alloc_commit_node(void)
+void *alloc_commit_node_the_repository(void)
 {
struct commit *c = alloc_node(_state, sizeof(struct commit));
c->object.type = OBJ_COMMIT;
diff --git a/blame.c b/blame.c
index dfa24473dc6..ba9b18e7542 100644
--- a/blame.c
+++ b/blame.c
@@ -161,7 +161,7 @@ static struct commit *fake_working_tree_commit(struct 
diff_options *opt,
 
read_cache();
time();
-   commit = alloc_commit_node();
+   commit = alloc_commit_node(the_repository);
commit->object.parsed = 1;
commit->date = now;
parent_tail = >parents;
diff --git a/cache.h b/cache.h
index 1717d07a2c5..bf6e8c87d83 100644
--- a/cache.h
+++ b/cache.h
@@ -1768,7 +1768,8 @@ void encode_85(char *buf, const unsigned char *data, int 
bytes);
 extern void *alloc_blob_node_the_repository(void);
 #define alloc_tree_node(r) alloc_tree_node_##r()
 extern void *alloc_tree_node_the_repository(void);
-extern void *alloc_commit_node(void);
+#define alloc_commit_node(r) alloc_commit_node_##r()
+extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
diff --git a/commit.c b/commit.c
index 9106acf0aad..a9a43e79bae 100644
--- a/commit.c
+++ b/commit.c
@@ -51,7 +51,7 @@ struct commit *lookup_commit(const struct object_id *oid)
struct object *obj = lookup_object(oid->hash);
if (!obj)
return create_object(the_repository, oid->hash,
-alloc_commit_node());
+alloc_commit_node(the_repository));
return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624da..6dac8908648 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -98,7 +98,7 @@ static struct tree *shift_tree_object(struct tree *one, 
struct tree *two,
 
 static struct commit *make_virtual_commit(struct tree *tree, const char 
*comment)
 {
-   struct commit *commit = alloc_commit_node();
+   struct commit *commit = alloc_commit_node(the_repository);
 
set_merge_remote_desc(commit, comment, (struct object *)commit);
commit->tree = tree;
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 10/13] alloc: add repository argument to alloc_commit_index

2018-05-08 Thread Stefan Beller
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 alloc.c  | 4 ++--
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/alloc.c b/alloc.c
index 28b85b22144..277dadd221b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -82,7 +82,7 @@ void *alloc_object_node_the_repository(void)
 
 static struct alloc_state commit_state;
 
-unsigned int alloc_commit_index(void)
+unsigned int alloc_commit_index_the_repository(void)
 {
static unsigned int count;
return count++;
@@ -92,7 +92,7 @@ void *alloc_commit_node_the_repository(void)
 {
struct commit *c = alloc_node(_state, sizeof(struct commit));
c->object.type = OBJ_COMMIT;
-   c->index = alloc_commit_index();
+   c->index = alloc_commit_index(the_repository);
return c;
 }
 
diff --git a/cache.h b/cache.h
index 01cc207d218..0e6c5dd5639 100644
--- a/cache.h
+++ b/cache.h
@@ -1776,7 +1776,8 @@ extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node_the_repository(void);
 #define alloc_report(r) alloc_report_##r()
 extern void alloc_report_the_repository(void);
-extern unsigned int alloc_commit_index(void);
+#define alloc_commit_index(r) alloc_commit_index_##r()
+extern unsigned int alloc_commit_index_the_repository(void);
 
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
diff --git a/object.c b/object.c
index b8c3f923c51..a365a910859 100644
--- a/object.c
+++ b/object.c
@@ -162,7 +162,7 @@ void *object_as_type(struct object *obj, enum object_type 
type, int quiet)
return obj;
else if (obj->type == OBJ_NONE) {
if (type == OBJ_COMMIT)
-   ((struct commit *)obj)->index = alloc_commit_index();
+   ((struct commit *)obj)->index = 
alloc_commit_index(the_repository);
obj->type = type;
return obj;
}
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 03/13] object: add repository argument to grow_object_hash

2018-05-08 Thread Stefan Beller
From: Jonathan Nieder 

Add a repository argument to allow the caller of grow_object_hash to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 2de029275bc..91edc30770c 100644
--- a/object.c
+++ b/object.c
@@ -116,7 +116,8 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-static void grow_object_hash(void)
+#define grow_object_hash(r) grow_object_hash_##r()
+static void grow_object_hash_the_repository(void)
 {
int i;
/*
@@ -147,7 +148,7 @@ void *create_object_the_repository(const unsigned char 
*sha1, void *o)
hashcpy(obj->oid.hash, sha1);
 
if (the_repository->parsed_objects->obj_hash_size - 1 <= 
the_repository->parsed_objects->nr_objs * 2)
-   grow_object_hash();
+   grow_object_hash(the_repository);
 
insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
the_repository->parsed_objects->obj_hash_size);
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 04/13] alloc: add repository argument to alloc_blob_node

2018-05-08 Thread Stefan Beller
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 alloc.c | 2 +-
 blob.c  | 2 +-
 cache.h | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacdd..6c5c376a25a 100644
--- a/alloc.c
+++ b/alloc.c
@@ -49,7 +49,7 @@ static inline void *alloc_node(struct alloc_state *s, size_t 
node_size)
 }
 
 static struct alloc_state blob_state;
-void *alloc_blob_node(void)
+void *alloc_blob_node_the_repository(void)
 {
struct blob *b = alloc_node(_state, sizeof(struct blob));
b->object.type = OBJ_BLOB;
diff --git a/blob.c b/blob.c
index 85c2143f299..9e64f301895 100644
--- a/blob.c
+++ b/blob.c
@@ -9,7 +9,7 @@ struct blob *lookup_blob(const struct object_id *oid)
struct object *obj = lookup_object(oid->hash);
if (!obj)
return create_object(the_repository, oid->hash,
-alloc_blob_node());
+alloc_blob_node(the_repository));
return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/cache.h b/cache.h
index 3a4d80e92bf..2258e611275 100644
--- a/cache.h
+++ b/cache.h
@@ -1764,7 +1764,8 @@ int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
 /* alloc.c */
-extern void *alloc_blob_node(void);
+#define alloc_blob_node(r) alloc_blob_node_##r()
+extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 08/13] alloc: add repository argument to alloc_object_node

2018-05-08 Thread Stefan Beller
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 alloc.c  | 2 +-
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 290250e3595..f031ce422d9 100644
--- a/alloc.c
+++ b/alloc.c
@@ -73,7 +73,7 @@ void *alloc_tag_node_the_repository(void)
 }
 
 static struct alloc_state object_state;
-void *alloc_object_node(void)
+void *alloc_object_node_the_repository(void)
 {
struct object *obj = alloc_node(_state, sizeof(union 
any_object));
obj->type = OBJ_NONE;
diff --git a/cache.h b/cache.h
index 32f340cde59..2d60359a964 100644
--- a/cache.h
+++ b/cache.h
@@ -1772,7 +1772,8 @@ extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node_the_repository(void);
 #define alloc_tag_node(r) alloc_tag_node_##r()
 extern void *alloc_tag_node_the_repository(void);
-extern void *alloc_object_node(void);
+#define alloc_object_node(r) alloc_object_node_##r()
+extern void *alloc_object_node_the_repository(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
 
diff --git a/object.c b/object.c
index 91edc30770c..b8c3f923c51 100644
--- a/object.c
+++ b/object.c
@@ -180,7 +180,7 @@ struct object *lookup_unknown_object(const unsigned char 
*sha1)
struct object *obj = lookup_object(sha1);
if (!obj)
obj = create_object(the_repository, sha1,
-   alloc_object_node());
+   alloc_object_node(the_repository));
return obj;
 }
 
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 12/13] object: allow create_object to handle arbitrary repositories

2018-05-08 Thread Stefan Beller
Reviewed-by: Jonathan Tan 
Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object.c | 12 ++--
 object.h |  3 +--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index 0fcd6f6df42..49b952e9299 100644
--- a/object.c
+++ b/object.c
@@ -139,7 +139,7 @@ static void grow_object_hash(struct repository *r)
r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object_the_repository(const unsigned char *sha1, void *o)
+void *create_object(struct repository *r, const unsigned char *sha1, void *o)
 {
struct object *obj = o;
 
@@ -147,12 +147,12 @@ void *create_object_the_repository(const unsigned char 
*sha1, void *o)
obj->flags = 0;
hashcpy(obj->oid.hash, sha1);
 
-   if (the_repository->parsed_objects->obj_hash_size - 1 <= 
the_repository->parsed_objects->nr_objs * 2)
-   grow_object_hash(the_repository);
+   if (r->parsed_objects->obj_hash_size - 1 <= r->parsed_objects->nr_objs 
* 2)
+   grow_object_hash(r);
 
-   insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
-   the_repository->parsed_objects->obj_hash_size);
-   the_repository->parsed_objects->nr_objs++;
+   insert_obj_hash(obj, r->parsed_objects->obj_hash,
+   r->parsed_objects->obj_hash_size);
+   r->parsed_objects->nr_objs++;
return obj;
 }
 
diff --git a/object.h b/object.h
index 2cb0b241083..b41d7a3accb 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,7 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-#define create_object(r, s, o) create_object_##r(s, o)
-extern void *create_object_the_repository(const unsigned char *sha1, void 
*obj);
+extern void *create_object(struct repository *r, const unsigned char *sha1, 
void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-08 Thread Stefan Beller
We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.

We record all memory allocation in alloc.c, and free them in
clear_alloc_state, which is called for all repositories except
the_repository.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 alloc.c   | 79 +--
 alloc.h   | 23 ++
 blame.c   |  1 +
 blob.c|  1 +
 cache.h   | 16 --
 commit.c  |  1 +
 merge-recursive.c |  1 +
 object.c  | 37 --
 object.h  |  8 +
 tag.c |  1 +
 tree.c|  1 +
 11 files changed, 127 insertions(+), 42 deletions(-)
 create mode 100644 alloc.h

diff --git a/alloc.c b/alloc.c
index 277dadd221b..4ecf0f160f4 100644
--- a/alloc.c
+++ b/alloc.c
@@ -4,8 +4,7 @@
  * Copyright (C) 2006 Linus Torvalds
  *
  * The standard malloc/free wastes too much space for objects, partly because
- * it maintains all the allocation infrastructure (which isn't needed, since
- * we never free an object descriptor anyway), but even more because it ends
+ * it maintains all the allocation infrastructure, but even more because it 
ends
  * up with maximal alignment because it doesn't know what the object alignment
  * for the new allocation is.
  */
@@ -15,6 +14,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
@@ -30,8 +30,25 @@ struct alloc_state {
int count; /* total number of nodes allocated */
int nr;/* number of nodes left in current allocation */
void *p;   /* first free node in current allocation */
+
+   /* bookkeeping of allocations */
+   void **slabs;
+   int slab_nr, slab_alloc;
 };
 
+void *allocate_alloc_state(void)
+{
+   return xcalloc(1, sizeof(struct alloc_state));
+}
+
+void clear_alloc_state(struct alloc_state *s)
+{
+   while (s->slab_nr > 0) {
+   s->slab_nr--;
+   free(s->slabs[s->slab_nr]);
+   }
+}
+
 static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 {
void *ret;
@@ -39,63 +56,76 @@ static inline void *alloc_node(struct alloc_state *s, 
size_t node_size)
if (!s->nr) {
s->nr = BLOCKING;
s->p = xmalloc(BLOCKING * node_size);
+
+   ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
+   s->slabs[s->slab_nr++] = s->p;
}
s->nr--;
s->count++;
ret = s->p;
s->p = (char *)s->p + node_size;
memset(ret, 0, node_size);
+
return ret;
 }
 
-static struct alloc_state blob_state;
-void *alloc_blob_node_the_repository(void)
+void *alloc_blob_node(struct repository *r)
 {
-   struct blob *b = alloc_node(_state, sizeof(struct blob));
+   struct blob *b = alloc_node(r->parsed_objects->blob_state, 
sizeof(struct blob));
b->object.type = OBJ_BLOB;
return b;
 }
 
-static struct alloc_state tree_state;
-void *alloc_tree_node_the_repository(void)
+void *alloc_tree_node(struct repository *r)
 {
-   struct tree *t = alloc_node(_state, sizeof(struct tree));
+   struct tree *t = alloc_node(r->parsed_objects->tree_state, 
sizeof(struct tree));
t->object.type = OBJ_TREE;
return t;
 }
 
-static struct alloc_state tag_state;
-void *alloc_tag_node_the_repository(void)
+void *alloc_tag_node(struct repository *r)
 {
-   struct tag *t = alloc_node(_state, sizeof(struct tag));
+   struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct 
tag));
t->object.type = OBJ_TAG;
return t;
 }
 
-static struct alloc_state object_state;
-void *alloc_object_node_the_repository(void)
+void *alloc_object_node(struct repository *r)
 {
-   struct object *obj = alloc_node(_state, sizeof(union 
any_object));
+   struct object *obj = alloc_node(r->parsed_objects->object_state, 
sizeof(union any_object));
obj->type = OBJ_NONE;
return obj;
 }
 
-static struct alloc_state commit_state;
-
-unsigned int alloc_commit_index_the_repository(void)
+unsigned int alloc_commit_index(struct repository *r)
 {
-   static unsigned int count;
-   return count++;
+   return r->parsed_objects->commit_count++;
 }
 
-void *alloc_commit_node_the_repository(void)
+void *alloc_commit_node(struct repository *r)
 {
-   struct commit *c = alloc_node(_state, sizeof(struct commit));
+   struct commit *c = alloc_node(r->parsed_objects->commit_state, 
sizeof(struct commit));
c->object.type = OBJ_COMMIT;
-   c->index = alloc_commit_index(the_repository);
+   c->index = alloc_commit_index(r);
return c;
 }
 
+void release_tree_node(struct tree *t)
+{
+

[PATCH v3 07/13] alloc: add repository argument to alloc_tag_node

2018-05-08 Thread Stefan Beller
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tag.c   | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 9e2b897ec1d..290250e3595 100644
--- a/alloc.c
+++ b/alloc.c
@@ -65,7 +65,7 @@ void *alloc_tree_node_the_repository(void)
 }
 
 static struct alloc_state tag_state;
-void *alloc_tag_node(void)
+void *alloc_tag_node_the_repository(void)
 {
struct tag *t = alloc_node(_state, sizeof(struct tag));
t->object.type = OBJ_TAG;
diff --git a/cache.h b/cache.h
index bf6e8c87d83..32f340cde59 100644
--- a/cache.h
+++ b/cache.h
@@ -1770,7 +1770,8 @@ extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node_the_repository(void);
 #define alloc_commit_node(r) alloc_commit_node_##r()
 extern void *alloc_commit_node_the_repository(void);
-extern void *alloc_tag_node(void);
+#define alloc_tag_node(r) alloc_tag_node_##r()
+extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
diff --git a/tag.c b/tag.c
index 7150b759d66..02ef4eaafc0 100644
--- a/tag.c
+++ b/tag.c
@@ -94,7 +94,7 @@ struct tag *lookup_tag(const struct object_id *oid)
struct object *obj = lookup_object(oid->hash);
if (!obj)
return create_object(the_repository, oid->hash,
-alloc_tag_node());
+alloc_tag_node(the_repository));
return object_as_type(obj, OBJ_TAG, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 11/13] object: allow grow_object_hash to handle arbitrary repositories

2018-05-08 Thread Stefan Beller
Reviewed-by: Jonathan Tan 
Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index a365a910859..0fcd6f6df42 100644
--- a/object.c
+++ b/object.c
@@ -116,27 +116,27 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-#define grow_object_hash(r) grow_object_hash_##r()
-static void grow_object_hash_the_repository(void)
+static void grow_object_hash(struct repository *r)
 {
int i;
/*
 * Note that this size must always be power-of-2 to match hash_obj
 * above.
 */
-   int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 
? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
+   int new_hash_size = r->parsed_objects->obj_hash_size < 32 ? 32 : 2 * 
r->parsed_objects->obj_hash_size;
struct object **new_hash;
 
new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-   for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
-   struct object *obj = 
the_repository->parsed_objects->obj_hash[i];
+   for (i = 0; i < r->parsed_objects->obj_hash_size; i++) {
+   struct object *obj = r->parsed_objects->obj_hash[i];
+
if (!obj)
continue;
insert_obj_hash(obj, new_hash, new_hash_size);
}
-   free(the_repository->parsed_objects->obj_hash);
-   the_repository->parsed_objects->obj_hash = new_hash;
-   the_repository->parsed_objects->obj_hash_size = new_hash_size;
+   free(r->parsed_objects->obj_hash);
+   r->parsed_objects->obj_hash = new_hash;
+   r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object_the_repository(const unsigned char *sha1, void *o)
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 05/13] alloc: add repository argument to alloc_tree_node

2018-05-08 Thread Stefan Beller
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tree.c  | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 6c5c376a25a..2c8d1430758 100644
--- a/alloc.c
+++ b/alloc.c
@@ -57,7 +57,7 @@ void *alloc_blob_node_the_repository(void)
 }
 
 static struct alloc_state tree_state;
-void *alloc_tree_node(void)
+void *alloc_tree_node_the_repository(void)
 {
struct tree *t = alloc_node(_state, sizeof(struct tree));
t->object.type = OBJ_TREE;
diff --git a/cache.h b/cache.h
index 2258e611275..1717d07a2c5 100644
--- a/cache.h
+++ b/cache.h
@@ -1766,7 +1766,8 @@ void encode_85(char *buf, const unsigned char *data, int 
bytes);
 /* alloc.c */
 #define alloc_blob_node(r) alloc_blob_node_##r()
 extern void *alloc_blob_node_the_repository(void);
-extern void *alloc_tree_node(void);
+#define alloc_tree_node(r) alloc_tree_node_##r()
+extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
diff --git a/tree.c b/tree.c
index 63730e3fb46..58cf19b4fa8 100644
--- a/tree.c
+++ b/tree.c
@@ -197,7 +197,7 @@ struct tree *lookup_tree(const struct object_id *oid)
struct object *obj = lookup_object(oid->hash);
if (!obj)
return create_object(the_repository, oid->hash,
-alloc_tree_node());
+alloc_tree_node(the_repository));
return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



[PATCH v3 01/13] repository: introduce parsed objects field

2018-05-08 Thread Stefan Beller
Convert the existing global cache for parsed objects (obj_hash) into
repository-specific parsed object caches. Existing code that uses
obj_hash are modified to use the parsed object cache of
the_repository; future patches will use the parsed object caches of
other repositories.

Another future use case for a pool of objects is ease of memory management
in revision walking: If we can free the rev-list related memory early in
pack-objects (e.g. part of repack operation) then it could lower memory
pressure significantly when running on large repos. While this has been
discussed on the mailing list lately, this series doesn't implement this.

Signed-off-by: Stefan Beller 
---
 object.c | 63 +---
 object.h |  8 +++
 repository.c |  7 ++
 repository.h |  9 
 4 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/object.c b/object.c
index 5044d08e96c..f7c624a7ba6 100644
--- a/object.c
+++ b/object.c
@@ -8,17 +8,14 @@
 #include "object-store.h"
 #include "packfile.h"
 
-static struct object **obj_hash;
-static int nr_objs, obj_hash_size;
-
 unsigned int get_max_object_index(void)
 {
-   return obj_hash_size;
+   return the_repository->parsed_objects->obj_hash_size;
 }
 
 struct object *get_indexed_object(unsigned int idx)
 {
-   return obj_hash[idx];
+   return the_repository->parsed_objects->obj_hash[idx];
 }
 
 static const char *object_type_strings[] = {
@@ -90,15 +87,16 @@ struct object *lookup_object(const unsigned char *sha1)
unsigned int i, first;
struct object *obj;
 
-   if (!obj_hash)
+   if (!the_repository->parsed_objects->obj_hash)
return NULL;
 
-   first = i = hash_obj(sha1, obj_hash_size);
-   while ((obj = obj_hash[i]) != NULL) {
+   first = i = hash_obj(sha1,
+the_repository->parsed_objects->obj_hash_size);
+   while ((obj = the_repository->parsed_objects->obj_hash[i]) != NULL) {
if (!hashcmp(sha1, obj->oid.hash))
break;
i++;
-   if (i == obj_hash_size)
+   if (i == the_repository->parsed_objects->obj_hash_size)
i = 0;
}
if (obj && i != first) {
@@ -107,7 +105,8 @@ struct object *lookup_object(const unsigned char *sha1)
 * that we do not need to walk the hash table the next
 * time we look for it.
 */
-   SWAP(obj_hash[i], obj_hash[first]);
+   SWAP(the_repository->parsed_objects->obj_hash[i],
+the_repository->parsed_objects->obj_hash[first]);
}
return obj;
 }
@@ -124,19 +123,19 @@ static void grow_object_hash(void)
 * Note that this size must always be power-of-2 to match hash_obj
 * above.
 */
-   int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
+   int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 
? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
struct object **new_hash;
 
new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-   for (i = 0; i < obj_hash_size; i++) {
-   struct object *obj = obj_hash[i];
+   for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+   struct object *obj = 
the_repository->parsed_objects->obj_hash[i];
if (!obj)
continue;
insert_obj_hash(obj, new_hash, new_hash_size);
}
-   free(obj_hash);
-   obj_hash = new_hash;
-   obj_hash_size = new_hash_size;
+   free(the_repository->parsed_objects->obj_hash);
+   the_repository->parsed_objects->obj_hash = new_hash;
+   the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object(const unsigned char *sha1, void *o)
@@ -147,11 +146,12 @@ void *create_object(const unsigned char *sha1, void *o)
obj->flags = 0;
hashcpy(obj->oid.hash, sha1);
 
-   if (obj_hash_size - 1 <= nr_objs * 2)
+   if (the_repository->parsed_objects->obj_hash_size - 1 <= 
the_repository->parsed_objects->nr_objs * 2)
grow_object_hash();
 
-   insert_obj_hash(obj, obj_hash, obj_hash_size);
-   nr_objs++;
+   insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
+   the_repository->parsed_objects->obj_hash_size);
+   the_repository->parsed_objects->nr_objs++;
return obj;
 }
 
@@ -431,8 +431,8 @@ void clear_object_flags(unsigned flags)
 {
int i;
 
-   for (i=0; i < obj_hash_size; i++) {
-   struct object *obj = obj_hash[i];
+   for (i=0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+   struct object *obj = 
the_repository->parsed_objects->obj_hash[i];
if (obj)
obj->flags &= ~flags;
}
@@ 

[PATCH v3 00/13] object store: alloc

2018-05-08 Thread Stefan Beller
v3:

* I used the (soon to be renamed?) branch-diff tool to attach a diff below
  between v2 and v3 
  
* fixed comment in patch 1
* correctly free objects and its hashmap in the last patch.
* drop free'ing the commit->util pointer as we do not know where
  it points to.

v2:
* I decided to stick with alloc.c and not migrate it to the mem-pool for now.
  The reasoning for that is that mem-pool.h would introduce some alignment
  waste, which I really did not want to.
* renamed to struct parsed_object_pool;
* free'd the additional memory of trees and commits.
* do not special case the_repository for allocation purposes
* corrected commit messages
* I used the (soon to be renamed?) branch-diff tool to attach a diff below.
  (I still need to get used to that format, I find an interdiff of the
   branches easier to read, but that would not yield the commit messages)



v1:
This applies on top of sb/oid-object-info and is the logical continuum of
the series that it builds on; this brings the object store into more of
Gits code, removing global state, such that reasoning about the state of
the in-memory representation of the repository is easier.

My original plan was to convert lookup_commit_graft as the next series,
which would be similar to lookup_replace_object, as in sb/object-store-replace.
The grafts and shallow mechanism are very close to each other, such that
they need to be converted at the same time, both depending on the
"parsed object store" that is introduced in this commit.

The next series will then convert code in {object/blob/tree/commit/tag}.c
hopefully finishing the lookup_* functions.

I also debated if it is worth converting alloc.c via this patch series
or if it might make more sense to use the new mem-pool by Jameson[1].

I vaguely wonder about the performance impact, as the object allocation
code seemed to be relevant in the past.

[1] https://public-inbox.org/git/20180430153122.243976-1-jam...@microsoft.com/

Any comments welcome,
Thanks,
Stefan

Jonathan Nieder (1):
  object: add repository argument to grow_object_hash

Stefan Beller (12):
  repository: introduce parsed objects field
  object: add repository argument to create_object
  alloc: add repository argument to alloc_blob_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_commit_index
  object: allow grow_object_hash to handle arbitrary repositories
  object: allow create_object to handle arbitrary repositories
  alloc: allow arbitrary repositories for alloc functions

 alloc.c   |  79 ++---
 alloc.h   |  23 ++
 blame.c   |   3 +-
 blob.c|   5 ++-
 cache.h   |   9 
 commit.c  |   4 +-
 merge-recursive.c |   3 +-
 object.c  | 108 ++
 object.h  |  18 +++-
 repository.c  |   7 +++
 repository.h  |   9 
 tag.c |   4 +-
 tree.c|   4 +-
 13 files changed, 208 insertions(+), 68 deletions(-)
 create mode 100644 alloc.h

-- 
2.17.0.255.g8bfb7c0704

1:  9efc685875b ! 1:  f8e521c7c11 repository: introduce parsed objects field
@@ -15,7 +15,6 @@
 discussed on the mailing list lately, this series doesn't implement 
this.
 
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano 
 
 diff --git a/object.c b/object.c
 --- a/object.c
@@ -224,13 +223,6 @@
 --- a/repository.h
 +++ b/repository.h
 @@
-   char *commondir;
- 
-   /*
--   * Holds any information related to accessing the raw object content.
-+   * Holds any information needed to retrieve the raw content
-+   * of objects. The object_parser uses this to get object
-+   * content which it then parses.
 */
struct raw_object_store *objects;
  
2:  0d41290a9e6 = 2:  55c555b32eb object: add repository argument to 
create_object
3:  0242ec870f5 = 3:  f1661c9e46a object: add repository argument to 
grow_object_hash
4:  9a6aeee10db = 4:  f72b25db946 alloc: add repository argument to 
alloc_blob_node
5:  f7ed8da3909 = 5:  87b7ddda195 alloc: add repository argument to 
alloc_tree_node
6:  253f1bf5c2a = 6:  4480e916bdf alloc: add repository argument to 
alloc_commit_node
7:  4f8d3dfd460 = 7:  c3aa2a7c252 alloc: add repository argument to 
alloc_tag_node
8:  6ce5d5b0f0e = 8:  59d33cfaff2 alloc: add repository argument to 
alloc_object_node
9:  104f158fc37 = 9:  2ba78c289c1 alloc: add repository argument to alloc_report
10:  38d90052c29 = 10:  10ce6c44d4b alloc: add repository argument to 
alloc_commit_index
11:  eae3dea5763 = 11:  eae95e75b0b object: allow grow_object_hash to handle 
arbitrary repositories

[PATCH v3 02/13] object: add repository argument to create_object

2018-05-08 Thread Stefan Beller
Add a repository argument to allow the callers of create_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 blob.c   | 4 +++-
 commit.c | 3 ++-
 object.c | 5 +++--
 object.h | 3 ++-
 tag.c| 3 ++-
 tree.c   | 3 ++-
 6 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/blob.c b/blob.c
index fa2ab4f7a74..85c2143f299 100644
--- a/blob.c
+++ b/blob.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "blob.h"
+#include "repository.h"
 
 const char *blob_type = "blob";
 
@@ -7,7 +8,8 @@ struct blob *lookup_blob(const struct object_id *oid)
 {
struct object *obj = lookup_object(oid->hash);
if (!obj)
-   return create_object(oid->hash, alloc_blob_node());
+   return create_object(the_repository, oid->hash,
+alloc_blob_node());
return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/commit.c b/commit.c
index ca474a7c112..9106acf0aad 100644
--- a/commit.c
+++ b/commit.c
@@ -50,7 +50,8 @@ struct commit *lookup_commit(const struct object_id *oid)
 {
struct object *obj = lookup_object(oid->hash);
if (!obj)
-   return create_object(oid->hash, alloc_commit_node());
+   return create_object(the_repository, oid->hash,
+alloc_commit_node());
return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/object.c b/object.c
index f7c624a7ba6..2de029275bc 100644
--- a/object.c
+++ b/object.c
@@ -138,7 +138,7 @@ static void grow_object_hash(void)
the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object(const unsigned char *sha1, void *o)
+void *create_object_the_repository(const unsigned char *sha1, void *o)
 {
struct object *obj = o;
 
@@ -178,7 +178,8 @@ struct object *lookup_unknown_object(const unsigned char 
*sha1)
 {
struct object *obj = lookup_object(sha1);
if (!obj)
-   obj = create_object(sha1, alloc_object_node());
+   obj = create_object(the_repository, sha1,
+   alloc_object_node());
return obj;
 }
 
diff --git a/object.h b/object.h
index cecda7da370..2cb0b241083 100644
--- a/object.h
+++ b/object.h
@@ -93,7 +93,8 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-extern void *create_object(const unsigned char *sha1, void *obj);
+#define create_object(r, s, o) create_object_##r(s, o)
+extern void *create_object_the_repository(const unsigned char *sha1, void 
*obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
diff --git a/tag.c b/tag.c
index 3d37c1bd251..7150b759d66 100644
--- a/tag.c
+++ b/tag.c
@@ -93,7 +93,8 @@ struct tag *lookup_tag(const struct object_id *oid)
 {
struct object *obj = lookup_object(oid->hash);
if (!obj)
-   return create_object(oid->hash, alloc_tag_node());
+   return create_object(the_repository, oid->hash,
+alloc_tag_node());
return object_as_type(obj, OBJ_TAG, 0);
 }
 
diff --git a/tree.c b/tree.c
index 1c68ea586bd..63730e3fb46 100644
--- a/tree.c
+++ b/tree.c
@@ -196,7 +196,8 @@ struct tree *lookup_tree(const struct object_id *oid)
 {
struct object *obj = lookup_object(oid->hash);
if (!obj)
-   return create_object(oid->hash, alloc_tree_node());
+   return create_object(the_repository, oid->hash,
+alloc_tree_node());
return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



Re: [PATCH] pack-format.txt: more details on pack file format

2018-05-08 Thread Stefan Beller
>>> +Deltified representation
>>
>> Does this refer to OFS delta as well as REF deltas?
>
> Yes. Both OFS and REF deltas have the same "body" which is what this
> part is about. The differences between OFS and REF deltas are not
> described (in fact I don't think we describe what OFS and REF deltas
> are at all).

Maybe we should?

>
>>> is a sequence of one byte command optionally
>>> +followed by more data for the command. The following commands are
>>> +recognized:
>>
>> So a Deltified representation of an object is a 6 or 7 in the 3 bit type
>> and then the length. Then a command is shown how to construct
>> the object based on other objects. Can there be more commands?
>>
>>> +- If bit 7 is set, the remaining bits in the command byte specifies
>>> +  how to extract copy offset and size to copy. The following must be
>>> +  evaluated in this exact order:
>>
>> So there are 2 modes, and the high bit indicates which mode is used.
>> You start describing the more complicated mode first,
>> maybe give names to both of them? "direct copy" (below) and
>> "compressed copy with offset" ?
>
> I started to update this more because even this text is hard to get
> even to me. So let's get the background first.
>
> We have a source object somewhere (the object name comes from ofs/ref
> delta's header), basically we have the whole content. This delta
> thingy tells us how to use that source object to create a new (target)
> object.
>
> The delta is actually a sequence of instructions (of variable length).

The previous paragraph and this sentence are great for my understanding.
thanks! (Maybe keep it in a similar form around?)

> One is for copying from the source object.

ok that makes sense. I can think of it as a "HTTP range request", just
optimized for packfiles and the source is inside the same pack.
So it would say "Goto object  and copy bytes 13-168 here"

> The other copies from the
> delta itself

itself means the same object here, that we are describing here?
or does it mean other deltas?

> (e.g. this is new data in the target which is not
> available anywhere in the source object to copy from).




>
> The instruction looks like this
>
> bit  0123   4  5  6
>   +--+++++--+--+--+
>   | 1xxx | offset | offset | offset | offset | size | size | size |
>   +--+++++--+--+--+
>
> Here you can see it in its full form, each box represents a byte. The
> first byte has bit 7 set as mentioned. We can see here that offsets
> (where to copy from in the source object) takes 4 bytes and size (how
> many bytes to copy) takes 3. Offset size size is in LSB order.
>
> The "xxx" part lets us shrink this down.

.. by indicating how much prefix we can skip and assume it be all zero(?)

> If the offset can fit in
> 16 bits, there's no reason to waste the last two bytes describing
> zero. Each 'x' marks whether the corresponding byte is present.

So for a full instruction (as above), we'd have to

1  111 <4 bytes offset> <3 bytes size>

for smaller instructions we have

1 1100 100 <2 bytes offset> <1 byte size>
and here the offset is in range 0..64k and
the size is 1-255 or 0x1 ?


Modes to skip bytes in between are not allowed, e.g.
1 1101 101 < 3 bytes of offsets> <2 bytes of size>
and the missing bytes would be assumed to be 0?

> The
> bit number is in the first row. So if you have offset 255 and size 1,
> the instruction is three bytes 10010001b, 255,

Oh it is the other way round, the size will be just one byte,
indicating we can have a range of 1-255 or 0x1 and an
offset of 0..255.

>
> I think this is a corner case in this format. I think Nico meant to
> specify consecutive bytes: if size is 2 bytes then you have to specify
> _both_ of them even if the first byte could be zero and omitted.

So it is not a mutually exclusive group, but a sequence (similar as in
git-bisect), where we start with 0 and end with exactly one edge
in between (sort of, we can also start with 1, then we have to have
all 1s)

> The implementation detail is, if bit 6 is set but bit 4 is not, then
> the size value is pretty much random. It's only when bit 4 is set that
> we first clear out "size" and start adding bits to it.

That sounds similar to what I spelled out above.

Thanks for taking on the documentation here.
The box with numbers really helped me!

Stefan


Re: [PATCH v3 09/12] get_short_oid / peel_onion: ^{tree} should be tree, not treeish

2018-05-08 Thread Ævar Arnfjörð Bjarmason

On Tue, May 08 2018, Jeff King wrote:

> On Mon, May 07, 2018 at 01:08:46PM +0900, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  writes:
>>
>> > Right, and I'm with you so far, this makes sense to me for all existing
>> > uses of the peel syntax, otherwise v2.17.0^{tree} wouldn't be the same
>> > as rev-parse v2.17.0^{tree}^{tree}...
>>
>> More importantly, you could spell v2.17.0 part of the above with a
>> short hexadecimal string.  And that string should be naming some
>> tree-ish, the most important thing being that it is *NOT* required
>> to be a tree (and practically, it is likely that the user has a
>> tree-ish that is *NOT* a tree).
>>
>> I guess I have a reaction to the title
>>
>> "get_short_oid/peel_onion: ^{tree} should be tree"
>>
>> "X^{tree}" should *RESULT* in a tree, but it should *REQUIRE* X to
>> be a tree-ish.  It is unclear "should be tree" is about the former
>> and I read (perhaps mis-read) it as saying "it should require X to
>> be a tree"---that statement is utterly incorrect as we agreed above.
>
> FWIW, I had the same feeling as you when reading this, that this commit
> (and the one after) are doing the wrong thing. And these paragraphs sum
> it up. The "^{tree}" is about asking us to peel to a tree, not about
> resolving X in the first place. We can use it as a _hint_ when resolving
> X, but the correct hint is "something that can be peeled to a tree", not
> "is definitely a tree".

Maybe I'm just being dense, but I still don't get from this & Junio's
E-Mails what the general rule should be.

I think a response to the part after "leaving that aside" of my upthread
E-Mail
(https://public-inbox.org/git/87lgczyfq6@evledraar.gmail.com/) would
help me out.

Not to belabor the point, but here's a patch I came up with to
revisions.txt that's a WIP version of something that would describe the
worldview after this v3:

diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
index dfcc49c72c..0bf68f4ad2 100644
--- a/Documentation/revisions.txt
+++ b/Documentation/revisions.txt
@@ -143,29 +143,52 @@ thing no matter the case.
   '{caret}1{caret}1{caret}1'.  See below for an illustration of
   the usage of this form.

 '{caret}{}', e.g. 'v0.99.8{caret}\{commit\}'::
   A suffix '{caret}' followed by an object type name enclosed in
   brace pair means dereference the object at '' recursively until
   an object of type '' is found or the object cannot be
-  dereferenced anymore (in which case, barf).
+  dereferenced anymore (in which case either return that object type, or 
barf).
   For example, if '' is a commit-ish, '{caret}\{commit\}'
   describes the corresponding commit object.
   Similarly, if '' is a tree-ish, '{caret}\{tree\}'
   describes the corresponding tree object.
   '{caret}0'
   is a short-hand for '{caret}\{commit\}'.
 +
 'rev{caret}\{object\}' can be used to make sure 'rev' names an
 object that exists, without requiring 'rev' to be a tag, and
 without dereferencing 'rev'; because a tag is already an object,
 it does not have to be dereferenced even once to get to an object.
 +
 'rev{caret}\{tag\}' can be used to ensure that 'rev' identifies an
 existing tag object.
++
+Object types that don't have a 1=1 mapping to other object types
+cannot be dereferenced with the peel syntax, and will return an
+error. E.g. '{caret}{commit}' or '{caret}{tree}' is
+allowed because a tag can only point to one commit, and a commit can
+only point to one tree. But '{caret}{blob}' will always
+produce an error since trees in general don't 1=1 map to blobs, even
+though the specific '' in question might only contain one
+blob. Note that '{caret}{blob}' is not an error if '' is
+a tag that points directly to a blob, since that again becomes
+unambiguous.
++
+'{caret}{}' takes on a different meaning when '' is a
+SHA-1 that's ambiguous within the object store. In that case we don't
+have a 1=1 mapping anymore. E.g. e8f2 in git.git can refer to multiple
+objects of all the different object types. In that case
+{caret}{} should always be an error to be consistent with the
+logic above, but that wouldn't be useful to anybody. Instead it'll
+fall back to being selector syntax for the given object types,
+e.g. e8f2{caret}{tag} will (as of writing this) return the v2.17.0
+tag, and {caret}{commit}, {caret}{tree} and {caret}{blob} will return
+commit, tree and blob objects, respectively.
+
[...]

My understanding of what you two are saying is that somehow the peel
semantics should be preserved when we take this beyond the 1=1 mapping
case, but I don't see how if we run with that how we wouldn't need to
introduce the concept of blobish for consistency as I noted upthread.

So it would be very useful to me if you or someone who understands the
behavior you & 

Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-08 Thread Stefan Beller
On Tue, May 8, 2018 at 8:00 AM, Duy Nguyen  wrote:
> On Tue, May 8, 2018 at 12:59 AM, Stefan Beller  wrote:
>> @@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o)
>>  void parsed_object_pool_clear(struct parsed_object_pool *o)
>>  {
>> /*
>> -* TOOD free objects in o->obj_hash.
>> -*
>>  * As objects are allocated in slabs (see alloc.c), we do
>>  * not need to free each object, but each slab instead.
>> +*
>> +* Before doing so, we need to free any additional memory
>> +* the objects may hold.
>>  */
>> +   unsigned i;
>> +
>> +   for (i = 0; i < o->obj_hash_size; i++) {
>> +   struct object *obj = o->obj_hash[i];
>> +
>> +   if (!obj)
>> +   continue;
>> +
>> +   if (obj->type == OBJ_TREE) {
>> +   free(((struct tree*)obj)->buffer);
>
> It would be nicer to keep this in separate functions, e.g.
> release_tree_node() and release_commit_node() to go with
> alloc_xxx_node().

ok, I can introduce that, although it seems unnecessary complicated
for now.

On top of this series I started an experiment (which rewrites alloc
and object.c a whole lot more; for performance reasons), which gets
rid of the multiple alloc_states. There will be only one allocation for
one repository, it can allocate across multiple types without alignment
overhead. It will reduce memory footprint of obj_hash by half, via
storing indexes instead of pointers in there.
That said, the experiment shall not influence the
direction of this series. Will fix.

>> +   } else if (obj->type == OBJ_COMMIT) {
>> +   free_commit_list(((struct commit*)obj)->parents);
>> +   free(&((struct commit*)obj)->util);
>> +   }
>> +   }
>
> I still don't see who frees obj_hash[] (or at least clears it if not
> freed). If I'm going to use this to free memory in pack-objects then
> I'd really prefer obj_hash[] freed because it's a big _big_ array.

gah!

> Just to be clear, what I mean is
>
> FREE_AND_NULL(o->obj_hash);
> o->obj_hash_size = 0;

ok, I just put it here, just before the calls
to clear_alloc_state()s.


Re: [PATCH 2/2] Documentation: render revisions correctly under Asciidoctor

2018-05-08 Thread Martin Ågren
On 8 May 2018 at 03:13, brian m. carlson  wrote:
> On Mon, May 07, 2018 at 06:11:43AM +0200, Martin Ågren wrote:
>> Excellent. These two patches fix my original problem and seem like the
>> obviously correct approach (in hindsight ;-) ). I wonder if the diagrams
>> earlier in the file should be tackled also. Right now, one has a huge
>> number of dots instead of the four you added; the other has none. They
>> do render fine though, so that'd only be about consistency in the
>> original .txt-file.
>
> Yeah, the number of dots has to be at least four, but can be any
> matching number.
>
> Since this patch fixes the present issue, I'd like to leave it as it is
> and run through a cleanup series a bit later that catches all the
> literal indented blocks and marks them explicitly to avoid this issue in
> the future.

Sounds like a plan. :-) As noted elsewhere, I have no immediate idea how
to identify which of the 13000 tab-indented lines are really part of
diagrams and ascii-art. Counting the number of spaces? Anyway, thanks
for this mini-series.

Martin


Re: [PATCH 06/28] t0000: annotate with SHA1 prerequisite

2018-05-08 Thread Martin Ågren
On 8 May 2018 at 01:40, brian m. carlson  wrote:
> On Mon, May 07, 2018 at 12:24:13PM +0200, Martin Ågren wrote:
>> This could be more centralized at the top of the file, more likely to be
>> noticed during a `make test` and easier to adapt in the future. Maybe
>> something like this at the top of the file:
>>
>> if hash for storage is SHA-1 then
>> knowntree=7bb943559a305bdd6bdee2cef6e5df2413c3d30a
>> path0=f87290f8eb2cbbea7857214459a0739927eab154
>> 
>> else
>> skip_all='unknown hash, but you really need to expand this test'
>> # or even error out!
>> fi
>
> As I mentioned in an earlier email, I plan to set an environment
> variable for the algorithms in use and then do something like:
>
>   test "$tree" = "$(test-tool hash-helper --output known-tree)"
>
> where "known-tree" is some key we can use to look up the SHA-1 or
> NewHash value, and we've specified we want the output format (as opposed
> to input format or on-disk format).

My first thought was, can't we introduce such a helper already now? It
would not check with the environment, but would always output SHA-1
values. Thinking about it some more, maybe the exact usage/interface
would be a bit volatile in the beginning, making it just as good an idea
to gain some more experience and feeling first.

>> When we add NewHash, we get to add an "else if". Otherwise, we'd be
>> plugging in NewHash without testing some very basic stuff.
>>
>> Ok, so `skip_all` is quite hard, but skipping certain basic tests also
>> feels really shaky. Adding a new hash algo without adapting this test
>> should be a no-go, IMHO.
>
> I agree that this test needs to be updated for NewHash, but since we
> haven't decided what that's going to be, it makes sense during
> development to still test the rest of the code if possible so that we
> know what does and doesn't work.
>
> This is a first step in making it obvious what doesn't work, and when we
> know what the data is supposed to be, we can adjust it by fixing the
> tests so that it works in all cases.

A lingering feeling is, how do we find all these tests that we "hide"
(for lack of a better word)? Maybe introduce some standardized comment
keyword. Or use a more grep-friendly prereq-name. Or, if all occurances
of "SHA1" (outside of t-*sha*) are expected to go away real soon
now, maybe this is not an issue.

Anyway, thanks for putting up with me, and thanks a big lot for driving
this major undertaking forward.

Martin


Re: [PATCH 01/28] t/test-lib: add an SHA1 prerequisite

2018-05-08 Thread Martin Ågren
On 8 May 2018 at 01:30, brian m. carlson  wrote:
> On Mon, May 07, 2018 at 12:10:39PM +0200, Martin Ågren wrote:
>> On 7 May 2018 at 01:17, brian m. carlson  
>> wrote:
>> > Add an SHA1 prerequisite to annotate both of these types of tests and
>> > disable them when we're using a different hash.  In the future, we can
>> > create versions of these tests which handle both SHA-1 and NewHash.
>>
>> Minor nit: s/can/can and should/
>
> I agree with that sentiment.  I can change that.

To be clear, this was an "if you have independent reasons for rerolling
this, then maybe consider possibly doing this".

>> So SHA1 means roughly "git hash-object uses SHA-1, so supposedly
>> everything on disk is SHA-1." I could imagine one or two different
>> meanings: "Git was compiled with support for SHA-1 [oids]."
>
> Currently it means that, yes.  It may specialize to mean, "git emits
> SHA-1 output, regardless of the format on disk."  See cases 1 and 2
> below.
>
>> Do we actually need more SHA-1-related prereqs, at least long-term, in
>> which case we would want to find a more specific name for this one now?
>> Is this SHA1_STORAGE, or some much better name than that?
>
> We may.  The transition plan anticipates several states:

"We may" as in, "we may need more SHA1-FOO prereqs later", or "we may
want this to be SHA1-BAR"?

> 1. Store data in NewHash, but input and output are SHA-1.
> 2. Store data in NewHash; output is SHA-1; input can be either.
> 3. Store data and output in NewHash; input can be either.
> 4. All NewHash.
>
> At this point, I'm working on getting the tests to handle case 4, as
> that's actually the easiest to fix (because if things are wrong, the
> code tends to segfault).  Case 1 will be next, in which case, we may
> need SHA1_STORAGE or such.

> I plan to make the SHA1 prerequisite go away or at least be far less
> used in a few releases.  Once we know what NewHash is going to be, it's
> pretty easy to write tooling and translation tables that do the
> conversion for most tests, and a test helper can simply emit the right
> output for whichever algorithm we're using in that situation, whether
> that's the on-disk algorithm, the input algorithm, or the output
> algorithm.

I do not feel entirely relaxed about a reasoning such as "this prereq
will soon go away again, so we do not need to think too much about its
name and meaning" (heavily paraphrased and possibly a bit pointed, but
hopefully not too dishonest).

I guess a counter-argument might be "sure, if only we knew which
SHA1-FOOs we will need. Only time and experience will tell." You've
certainly spent way more brain-cycles on this than I have, and most
likely more than anyone else on this list.

Maybe we want to document the transition-ness of this in the code and/or
the commit message. Not only "transition" in the sense of the big
transition, but in the sense of "this will probably go away long before
the transition is completed."

>> I am thinking for example about a repo with NewHash that gets pushed to
>> and fetched from a SHA-1 server, see hash-function-transition.txt, goal
>> 1b. We'd want to always test that SHA-1-related functionality in git.
>> (But only until the day when someone defines a prereq such as "SHA1" to
>> be able to test a git that was compiled without any traces of SHA-1
>> whatsoever.)
>
> I anticipate that by the time we get to that point, the SHA1
> prerequisite will be long gone and can be reused for that purpose,
> should we need it.


Re: [PATCH] pack-format.txt: more details on pack file format

2018-05-08 Thread Duy Nguyen
On Tue, May 8, 2018 at 8:21 PM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Tue, May 08 2018, Nguyễn Thái Ngọc Duy wrote:
>
>> The current document mentions OBJ_* constants without their actual
>> values. A git developer would know these are from cache.h but that's
>> not very friendly to a person who wants to read this file to implement
>> a pack file parser.
>>
>> Similarly, the deltified representation is not documented at all (the
>> "document" is basically patch-delta.c). Translate that C code in
>> English.
>
> Thanks, will drop my version from v4, although a comment for the enum in
> cache.h pointing the reader to these docs would be very useful.

True. I will add some together with the pack-format.txt update.
-- 
Duy


Re: [PATCH 0/5] getting rid of most "static struct lock_file"s

2018-05-08 Thread Jeff King
On Sun, May 06, 2018 at 04:10:26PM +0200, Martin Ågren wrote:

> This series addresses two classes of "static struct lock_file", removing
> the staticness: Those locks that already live inside a function, and
> those that can simply be moved into the function they are used from.
> 
> The first three patches are some cleanups I noticed along the way, where
> we first take a lock using LOCK_DIE_ON_ERROR, then check whether we got
> it.
> 
> After this series, we have a small number of "static struct lock_file"
> left, namely those locks that are used from within more than one
> function. Thus, after this series, when one stumbles upon a static
> lock_file, it should be a clear warning that the lock is being
> used by more than one function.

These all look fine to me. The commit messages all made perfect sense to
me, but it sounds like some people weren't aware of the new
post-076aa2cbda rules. So maybe it makes sense to reference that even in
the earlier commits, and to explicitly say that it's safe to convert
even in the case where the lock_file goes out of scope while still
active.

The only dangerous thing left to check for is anybody holding onto a
pointer-to-lockfile. The only such pointer declared outside of a
parameter list is in create_reflock(), and there it's just to
temporarily coerce a void pointer. So unless somebody is doing something
really tricky (putting a pointer-to-lock in a "void *"), I think these
conversions all have to be trivially correct (famous last words...).

-Peff


Re: [PATCH] pack-format.txt: more details on pack file format

2018-05-08 Thread Duy Nguyen
On Tue, May 8, 2018 at 7:23 PM, Stefan Beller  wrote:
>>  While at there, I also add some text about this obscure delta format.
>>  We occasionally have questions about this on the mailing list if I
>>  remember correctly.
>
> Let me see if I can understand it, as I am not well versed in the
> delta format, so ideally I would understand it from the patch here?

Well yes. I don't expect my first version to be that easy to
understand. This is where you come in to help ;-)

>> +Valid object types are:
>> +
>> +- OBJ_COMMIT (1)
>> +- OBJ_TREE (2)
>> +- OBJ_BLOB (3)
>> +- OBJ_TAG (4)
>> +- OBJ_OFS_DELTA (6)
>> +- OBJ_REF_DELTA (7)
>> +
>> +Type 5 is reserved for future expansion.
>
> and type 0 as well, but that is not spelled out?

type 0 is invalid. I think in some encoding it's not even possible to
encode zero. Anyway yes it should be spelled out.

>
>> +Deltified representation
>
> Does this refer to OFS delta as well as REF deltas?

Yes. Both OFS and REF deltas have the same "body" which is what this
part is about. The differences between OFS and REF deltas are not
described (in fact I don't think we describe what OFS and REF deltas
are at all).

>> is a sequence of one byte command optionally
>> +followed by more data for the command. The following commands are
>> +recognized:
>
> So a Deltified representation of an object is a 6 or 7 in the 3 bit type
> and then the length. Then a command is shown how to construct
> the object based on other objects. Can there be more commands?
>
>> +- If bit 7 is set, the remaining bits in the command byte specifies
>> +  how to extract copy offset and size to copy. The following must be
>> +  evaluated in this exact order:
>
> So there are 2 modes, and the high bit indicates which mode is used.
> You start describing the more complicated mode first,
> maybe give names to both of them? "direct copy" (below) and
> "compressed copy with offset" ?

I started to update this more because even this text is hard to get
even to me. So let's get the background first.

We have a source object somewhere (the object name comes from ofs/ref
delta's header), basically we have the whole content. This delta
thingy tells us how to use that source object to create a new (target)
object.

The delta is actually a sequence of instructions (of variable length).
One is for copying from the source object. The other copies from the
delta itself (e.g. this is new data in the target which is not
available anywhere in the source object to copy from). The last bit of
the first byte determines what instruction type it is.


>> +  - If bit 0 is set, the following byte contains bits 0-7 of the copy
>> +offset (this also resets all other bits in the copy offset to
>> +zero).
>> +  - If bit 1 is set, the following byte contains bits 8-15 of the copy
>> +offset.
>> +  - If bit 2 is set, the following byte contains bits 16-23 of the
>> +copy offset.
>> +  - If bit 3 is set, the following byte contains bits 24-31 of the
>> +copy offset.
>
> I assume these bits are exclusive, i.e. if bit 3 is set, bits 0-2 are not
> allowed to be set. What happens if they are set, do we care?
>
> If bit 3 is set, then the following byte contains 24-31 of the copy offset,
> where is the rest? Do I wait for another command byte with
> bits 2,1,0 to learn about the body offsets, or do they follow the
> following byte? Something like:
>
>   "If bit 3 is set, then the next 4 bytes are the copy offset,
>   in network byte order"

My first attempt at "translating to English" is like a constructing C
from assembly: it's horrible.

The instruction looks like this

bit  0123   4  5  6
  +--+++++--+--+--+
  | 1xxx | offset | offset | offset | offset | size | size | size |
  +--+++++--+--+--+

Here you can see it in its full form, each box represents a byte. The
first byte has bit 7 set as mentioned. We can see here that offsets
(where to copy from in the source object) takes 4 bytes and size (how
many bytes to copy) takes 3. Offset size size is in LSB order.

The "xxx" part lets us shrink this down. If the offset can fit in
16 bits, there's no reason to waste the last two bytes describing
zero. Each 'x' marks whether the corresponding byte is present. The
bit number is in the first row. So if you have offset 255 and size 1,
the instruction is three bytes 10010001b, 255, 1. The octets on "bit
column" 1, 2, 3, 5 and 6 are missing because the corresponding bit in
the first bit is not set.

>> +  - If bit 4 is set, the following byte contains bits 0-7 of the copy
>> +size (this also resets all other bits in the copy size to zero_.
>> +  - If bit 5 is set, the following byte contains bits 8-15 of the copy
>> +size.
>> +  - If bit 6 is set, the following byte contains bits 16-23 of the
>> +copy size.
>
> bits 4-7 seem to be another 

Re: [PATCH] pack-format.txt: more details on pack file format

2018-05-08 Thread Ævar Arnfjörð Bjarmason

On Tue, May 08 2018, Nguyễn Thái Ngọc Duy wrote:

> The current document mentions OBJ_* constants without their actual
> values. A git developer would know these are from cache.h but that's
> not very friendly to a person who wants to read this file to implement
> a pack file parser.
>
> Similarly, the deltified representation is not documented at all (the
> "document" is basically patch-delta.c). Translate that C code in
> English.

Thanks, will drop my version from v4, although a comment for the enum in
cache.h pointing the reader to these docs would be very useful.


Re: [PATCH 4/5] lock_file: make function-local locks non-static

2018-05-08 Thread Jeff King
On Mon, May 07, 2018 at 05:24:05PM +0200, Duy Nguyen wrote:

>  -   static struct lock_file lock;
>  +   struct lock_file lock = LOCK_INIT;
> >>>
> >>> Is it really safe to do this? I vaguely remember something about
> >>> (global) linked list and signal handling which could trigger any time
> >>> and probably at atexit() time too (i.e. die()). You don't want to
> >>> depend on stack-based variables in that case.
> >>
> >> So I dug in a bit more about this. The original implementation does
> >> not allow stack-based lock files at all in 415e96c8b7 ([PATCH]
> >> Implement git-checkout-cache -u to update stat information in the
> >> cache. - 2005-05-15). The situation has changed since 422a21c6a0
> >> (tempfile: remove deactivated list entries - 2017-09-05). At the end
> >> of that second commit, Jeff mentioned "We can clean them up
> >> individually" which I guess is what these patches do. Though I do not
> >> know if we need to make sure to call "release" function or something/
> >> Either way you need more explanation and assurance than just "we can
> >> drop their staticness" in the commit mesage.
> >
> > Thank you Duy for your comments. How about I write the commit message
> > like so:
> 
> +Jeff. Since he made it possible to remove lock file from the global
> linked list, he probably knows well what to check when switching from
> a static lock file to a stack-local one.

It should be totally safe. If you look at "struct lock_file", it is now
simply a pointer to a tempfile allocated on the heap (in fact, I thought
about getting rid of lock_file entirely, but the diff is noisy and it
actually has some value as an abstraction over a pure tempfile).

If you fail to call a release function, it will just hang around until
program exit, which is more or less what the static version would do.
The big difference is that if we re-enter the function while still
holding the lock, then the static version would BUG() on trying to use
the already-active lockfile. Whereas after this series, we'd try to
create a new lockfile and say "woah, somebody else is holding the lock".

> >   After 076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05),
> >   we can have lockfiles on the stack. These `struct lock_file`s are local
> >   to their respective functions and we can drop their staticness.
> >
> >   Each of these users either commits or rolls back the lock in every
> >   codepath, with these possible exceptions:
> >
> > * We bail using a call to `die()` or `exit()`. The lock will be
> >   cleaned up automatically.
> >
> > * We return early from a function `cmd_foo()` in builtin/, i.e., we
> >   are just about to exit. The lock will be cleaned up automatically.
> 
> There are also signals which can be caught and run on its own stack (I
> think) so whatever variable on the current stack should be safe, I
> guess.

Yes, the stack variables should all be intact during an exit or a
signal.

> >   If I have missed some codepath where we do not exit, yet leave a locked
> >   lock around, that was so also before this patch. If we would later
> >   re-enter the same function, then before this patch, we would be retaking
> >   a lock for the very same `struct lock_file`, which feels awkward, but to
> >   the best of my reading has well-defined behavior. Whereas after this
> >   patch, we would attempt to take the lock with a completely fresh `struct
> >   lock_file`. In both cases, the result would simply be that the lock can
> >   not be taken, which is a situation we already handle.
> 
> There is a difference here, if the lock is not released properly,
> previously the lockfile is still untouched. If it's on stack, it may
> be overwritten which can corrupt the linked list to get to the next
> lock file.  (and this is about calling the function in question just
> _once_ not the second time).

The only bits on the stack are just a pointer to the list item. So the
linked list is fine if it goes out of scope while the tempfile is still
active. That was the point of 076aa2cbd.

-Peff


Re: [PATCH 3/5] refs.c: drop dead code checking lock status in `delete_pseudoref()`

2018-05-08 Thread Jeff King
On Sun, May 06, 2018 at 04:10:29PM +0200, Martin Ågren wrote:

> After taking the lock we check whether we got it and die otherwise. But
> since we take the lock using `LOCK_DIE_ON_ERROR`, we would already have
> died.
> 
> Unlike in the previous patch, this function is not prepared for
> indicating errors via a `strbuf err`, so let's just drop the dead code.
> Any improved error-handling can be added later.

I suspect this ought to eventually be converted to return an error, to
match the rest of the refs code. And in that sense, this is a step
backwards versus leaving the current die_errno in place and dropping the
LOCK_DIE_ON_ERROR flag. But it probably doesn't matter much either way,
and whoever does the conversion later can deal with it.

-Peff


Re: [PATCH 2/5] refs.c: do not die if locking fails in `write_pseudoref()`

2018-05-08 Thread Jeff King
On Mon, May 07, 2018 at 05:12:27AM -0600, David Turner wrote:

> >Right. After commit 076aa2cbda (tempfile: auto-allocate tempfiles on
> >heap, 2017-09-05) this is safe though. Quite a few locks have already
> >been moved to the stack, e.g., in 14bca6c63c (sequencer: make lockfiles
> >non-static, 2018-02-27) and 02ae242fdd (checkout-index: simplify
> >locking
> >logic, 2017-10-05).  I could add a note to the commit message to make
> >this clear, like "After 076aa2cbda, locks no longer need to be static."
> 
> I am going to reply now to keep the thread moving, but I am on my
> phone with bad connectivity (few cell towers in Bears Ears), so I
> can't really check the code. Feel free to disregard if I am still
> wrong.
> 
> I saw that patch, but I thought the new logic required that cleanup
> funtions be called before the lock goes out of scope.

No, it should be fine. After 422a21c6a0 (tempfile: remove deactivated
list entries, 2017-09-05) it became _possible_ to use a non-static
tempfile. But it was dangerous, because if you failed to clean up, bad
things would happen. So right after that in 076aa2cbda we switched to
using the heap, which means the tempfile code takes full ownership, and
the local lockfile variable is just a pointer to that storage.

-Peff


Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-08 Thread Jonathan Tan
On Mon,  7 May 2018 15:59:16 -0700
Stefan Beller  wrote:

> + for (i = 0; i < o->obj_hash_size; i++) {
> + struct object *obj = o->obj_hash[i];
> +
> + if (!obj)
> + continue;
> +
> + if (obj->type == OBJ_TREE) {
> + free(((struct tree*)obj)->buffer);
> + } else if (obj->type == OBJ_COMMIT) {
> + free_commit_list(((struct commit*)obj)->parents);
> + free(&((struct commit*)obj)->util);
> + }

Besides the other comments by Peff and Duy, should the "tag" field of a
tag object be freed too? It is allocated by xmemdupz in tag.c, and is
not assigned to by any other code (verified by renaming it and then
fixing the compile errors one by one).

Other than that, and other than my small comment on patch 1, this patch
set looks good to me.


Re: [PATCH 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

2018-05-08 Thread Jeff King
On Sat, May 05, 2018 at 08:49:43AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > +test_expect_success 'grep --only-matching --heading' '
> > +   git grep --only-matching --heading --line-number --column mmap file 
> > >actual &&
> > +   test_cmp expected actual
> > +'
> > +
> >  cat >expected < >  hello.c
> >  4:int main(int argc, const char **argv)
> 
> We should test this a lot more, I think a good way to do that would be
> to extend this series by first importing GNU grep's -o tests, see
> http://git.savannah.gnu.org/cgit/grep.git/tree/tests/foad1 they are
> license-compatible. Then change the grep_test() function to call git
> grep instead.

I'm trying to figure out what GNU grep's tests are actually checking
that we don't have. I see:

 - they check that "-i" returns the actual found string in its original
   case. This seems like a subset of finding a non-trivial regex. I.e.,
   "foo.*" should find "foobar". We probably should have a test like
   that.

 - they test multiple hits on the same line, which seems like an
   important and easy-to-screw-up case; we do that, too.

 - they test everything other thing with and without "-i" because those
   are apparently separate code paths in GNU grep, but I don't think
   that applies to us.

 - they test each case with "-b", but we don't have that (we do test
   with "--column", which is good)

 - they test with "-n", which we do here (we don't test without, but
   that seems like an unlikely bug, knowing how it is implemented)

 - they test with -H, but that is already the default for git-grep

 - they test with context (-C3) for us. It looks like GNU grep omits
   context lines with "-o", but we show a bunch of blank lines. This is
   I guess a bug (though it seems kind of an odd combination to specify
   in the first place)

So I think it probably makes sense to just pick through the list I just
wrote and write our own tests than to try to import GNU grep's specific
tests (and there's a ton of other unrelated tests in that file that may
or may not even run with git-grep).

> It should also be tested with the various grep.patternType options to
> make sure it works with basic, extended, perl, fixed etc.

Hmm, this code is all external to the actual matching. So unless one of
those is totally screwing up the regmatch_t return, I'm not sure that's
accomplishing much (and if one of them is, it's totally broken for
coloring, "-c", and maybe other features).

We've usually taken a pretty white-box approach to our testing, covering
things that seem likely to go wrong given the general problem space and
our implementation. But maybe I'm missing something in particular that
you think might be tricky.

-Peff


Re: [PATCH] pack-format.txt: more details on pack file format

2018-05-08 Thread Stefan Beller
On Tue, May 8, 2018 at 8:56 AM, Nguyễn Thái Ngọc Duy  wrote:
> The current document mentions OBJ_* constants without their actual
> values. A git developer would know these are from cache.h but that's
> not very friendly to a person who wants to read this file to implement
> a pack file parser.
>
> Similarly, the deltified representation is not documented at all (the
> "document" is basically patch-delta.c). Translate that C code in
> English.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  I noticed that these object type values are not documented in
>  pack-format.txt so here's my attempt to improve it.

Thanks for sending this patch!

>
>  While at there, I also add some text about this obscure delta format.
>  We occasionally have questions about this on the mailing list if I
>  remember correctly.

Let me see if I can understand it, as I am not well versed in the
delta format, so ideally I would understand it from the patch here?

>
>  Documentation/technical/pack-format.txt | 41 +
>  1 file changed, 41 insertions(+)
>
> diff --git a/Documentation/technical/pack-format.txt 
> b/Documentation/technical/pack-format.txt
> index 8e5bf60be3..2c7d5c0e74 100644
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> @@ -36,6 +36,47 @@ Git pack format
>
>- The trailer records 20-byte SHA-1 checksum of all of the above.
>
> +Valid object types are:
> +
> +- OBJ_COMMIT (1)
> +- OBJ_TREE (2)
> +- OBJ_BLOB (3)
> +- OBJ_TAG (4)
> +- OBJ_OFS_DELTA (6)
> +- OBJ_REF_DELTA (7)
> +
> +Type 5 is reserved for future expansion.

and type 0 as well, but that is not spelled out?

> +Deltified representation

Does this refer to OFS delta as well as REF deltas?

> is a sequence of one byte command optionally
> +followed by more data for the command. The following commands are
> +recognized:

So a Deltified representation of an object is a 6 or 7 in the 3 bit type
and then the length. Then a command is shown how to construct
the object based on other objects. Can there be more commands?

> +- If bit 7 is set, the remaining bits in the command byte specifies
> +  how to extract copy offset and size to copy. The following must be
> +  evaluated in this exact order:

So there are 2 modes, and the high bit indicates which mode is used.
You start describing the more complicated mode first,
maybe give names to both of them? "direct copy" (below) and
"compressed copy with offset" ?


> +  - If bit 0 is set, the following byte contains bits 0-7 of the copy
> +offset (this also resets all other bits in the copy offset to
> +zero).
> +  - If bit 1 is set, the following byte contains bits 8-15 of the copy
> +offset.
> +  - If bit 2 is set, the following byte contains bits 16-23 of the
> +copy offset.
> +  - If bit 3 is set, the following byte contains bits 24-31 of the
> +copy offset.

I assume these bits are exclusive, i.e. if bit 3 is set, bits 0-2 are not
allowed to be set. What happens if they are set, do we care?

If bit 3 is set, then the following byte contains 24-31 of the copy offset,
where is the rest? Do I wait for another command byte with
bits 2,1,0 to learn about the body offsets, or do they follow the
following byte? Something like:

  "If bit 3 is set, then the next 4 bytes are the copy offset,
  in network byte order"


> +  - If bit 4 is set, the following byte contains bits 0-7 of the copy
> +size (this also resets all other bits in the copy size to zero_.
> +  - If bit 5 is set, the following byte contains bits 8-15 of the copy
> +size.
> +  - If bit 6 is set, the following byte contains bits 16-23 of the
> +copy size.

bits 4-7 seem to be another group of mutually exclusive bits.
The same question as above:
If bit 6 is set, where are bits 0-15 of the copy size?

> +
> +  Copy size zero means 0x1 bytes.

This is an interesting caveat. So we can only copy 1-0x1 bytes,
and cannot express to copy 0 bytes?

> The data from source object at
> +  the given copy offset is copied back to the destination buffer.
> +
> +- If bit 7 is not set, it is the copy size in bytes. The following
> +  bytes are copied to destination buffer
> +- Command byte zero is reserved for future expansion.

Thanks,
Stefan


Re: [PATCH v2 01/13] repository: introduce parsed objects field

2018-05-08 Thread Jonathan Tan
On Mon,  7 May 2018 15:59:04 -0700
Stefan Beller  wrote:

>   /*
> -  * Holds any information related to accessing the raw object content.
> +  * Holds any information needed to retrieve the raw content
> +  * of objects. The object_parser uses this to get object
> +  * content which it then parses.

Update this comment - there is no more object_parser. (Maybe just delete
the last sentence, or specifically name some of the functions that
access this field.)

>*/
>   struct raw_object_store *objects;

Other than that, this patch looks good.


Re: [PATCH 1/2] gitk: show part of submodule log instead of empty pane when listing trees

2018-05-08 Thread Stefan Beller
On Tue, May 8, 2018 at 5:22 AM, Alex Riesen
 wrote:
> From: Alex Riesen 
>
> Currently, the submodules either are not shown at all (if listing a
> committed tree) or a Tcl error appears (when clicking on a submodule
> from the index list).

I do not understand where this appears, yet.
Where do I have to click to see the effects of this patch?

>
> This will make it show first arbitrarily chosen number of commits,
> which might be only marginally better.
>
> Signed-off-by: Alex Riesen 
> ---
>  gitk | 42 --
>  1 file changed, 32 insertions(+), 10 deletions(-)
>
> diff --git a/gitk b/gitk
> index a14d7a1..d34833f 100755
> --- a/gitk
> +++ b/gitk
> @@ -7627,9 +7627,10 @@ proc gettreeline {gtf id} {
> if {$i < 0} continue
> set fname [string range $line [expr {$i+1}] end]
> set line [string range $line 0 [expr {$i-1}]]
> -   if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue
> +   set objtype [lindex $line 1]
> +   if {$diffids ne $nullid2 && $objtype ne "blob" && $objtype ne 
> "commit" } { continue }
> set sha1 [lindex $line 2]
> -   lappend treeidlist($id) $sha1
> +   lappend treeidlist($id) "$sha1 $objtype"
> }
> if {[string index $fname 0] eq "\""} {
> set fname [lindex $fname 0]
> @@ -7659,21 +7660,42 @@ proc showfile {f} {
>  global ctext_file_names ctext_file_lines
>  global ctext commentend
>
> +set submodlog "git\\ log\\ --format='%h\\ %aN:\\ %s'\\ -100"

Do we want to respect the config option diff.submodule here?
The -100 is chosen rather arbitrarily. Ideally we'd only walk to the
previous entry?

> +set fcmt ""
>  set i [lsearch -exact $treefilelist($diffids) $f]
>  if {$i < 0} {
> puts "oops, $f not in list for id $diffids"
> return
>  }
>  if {$diffids eq $nullid} {
> -   if {[catch {set bf [open $f r]} err]} {
> -   puts "oops, can't read $f: $err"
> -   return
> +   if {[file isdirectory $f]} {
> +   # a submodule
> +   if {[catch {set bf [open "| sh -c cd\\ \"$f\"&&$submodlog" r]} 
> err]} {

Can we have $submodlog use the "git -C  command"
option, then we could save the "cd &&" part, which might even
save us from spawning a shell?

Thanks,
Stefan


Re: What's cooking in git.git (May 2018, #01; Mon, 7)

2018-05-08 Thread Ben Peart



On 5/7/2018 10:58 AM, Junio C Hamano wrote:


* bp/merge-rename-config (2018-05-04) 3 commits
  - merge: pass aggressive when rename detection is turned off
  - merge: add merge.renames config setting
  - merge: update documentation for {merge,diff}.renameLimit
  (this branch uses en/rename-directory-detection-reboot.)



It isn't specifically called out here but is it safe to assume this is 
also headed to next behind en/rename-directory-detection-reboot?





Re: [PATCH/RFC] completion: complete all possible -no-

2018-05-08 Thread Stefan Beller
On Tue, May 8, 2018 at 8:24 AM, Duy Nguyen  wrote:

> I'm arguing about this because I want to see your reaction, because
> I'm thinking of doing the very same thing for config completion. Right
> now "git config " gives you two pages of all available config
> variables. I'm thinking that we "git config " just shows the
> groups, e.g.
>
>> ~/w/git $ git config
> add.  interactive.
> advice.   log.
> alias.mailmap.
> am.   man.
>
> Only when you do "git config log." that it shows you log.*

How cool is that? I'd love it.


[PATCH] pack-format.txt: more details on pack file format

2018-05-08 Thread Nguyễn Thái Ngọc Duy
The current document mentions OBJ_* constants without their actual
values. A git developer would know these are from cache.h but that's
not very friendly to a person who wants to read this file to implement
a pack file parser.

Similarly, the deltified representation is not documented at all (the
"document" is basically patch-delta.c). Translate that C code in
English.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 I noticed that these object type values are not documented in
 pack-format.txt so here's my attempt to improve it.

 While at there, I also add some text about this obscure delta format.
 We occasionally have questions about this on the mailing list if I
 remember correctly.

 Documentation/technical/pack-format.txt | 41 +
 1 file changed, 41 insertions(+)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 8e5bf60be3..2c7d5c0e74 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -36,6 +36,47 @@ Git pack format
 
   - The trailer records 20-byte SHA-1 checksum of all of the above.
 
+Valid object types are:
+
+- OBJ_COMMIT (1)
+- OBJ_TREE (2)
+- OBJ_BLOB (3)
+- OBJ_TAG (4)
+- OBJ_OFS_DELTA (6)
+- OBJ_REF_DELTA (7)
+
+Type 5 is reserved for future expansion.
+
+Deltified representation is a sequence of one byte command optionally
+followed by more data for the command. The following commands are
+recognized:
+
+- If bit 7 is set, the remaining bits in the command byte specifies
+  how to extract copy offset and size to copy. The following must be
+  evaluated in this exact order:
+  - If bit 0 is set, the following byte contains bits 0-7 of the copy
+offset (this also resets all other bits in the copy offset to
+zero).
+  - If bit 1 is set, the following byte contains bits 8-15 of the copy
+offset.
+  - If bit 2 is set, the following byte contains bits 16-23 of the
+copy offset.
+  - If bit 3 is set, the following byte contains bits 24-31 of the
+copy offset.
+  - If bit 4 is set, the following byte contains bits 0-7 of the copy
+size (this also resets all other bits in the copy size to zero_.
+  - If bit 5 is set, the following byte contains bits 8-15 of the copy
+size.
+  - If bit 6 is set, the following byte contains bits 16-23 of the
+copy size.
+
+  Copy size zero means 0x1 bytes. The data from source object at
+  the given copy offset is copied back to the destination buffer.
+
+- If bit 7 is not set, it is the copy size in bytes. The following
+  bytes are copied to destination buffer
+- Command byte zero is reserved for future expansion.
+
 == Original (version 1) pack-*.idx files have the following format:
 
   - The header consists of 256 4-byte network byte order
-- 
2.17.0.705.g3525833791



Re: [PATCH v3 04/12] cache.h: add comment explaining the order in object_type

2018-05-08 Thread Duy Nguyen
On Tue, May 1, 2018 at 8:40 PM, Ævar Arnfjörð Bjarmason
 wrote:
> The order in the enum might seem arbitrary, and isn't explained by
> 72518e9c26 ("more lightweight revalidation while reusing deflated
> stream in packing", 2006-09-03) which added it.
>
> Derrick Stolee suggested that it's ordered topologically in
> 5f8b1ec1-258d-1acc-133e-a7c248b40...@gmail.com. Makes sense to me, add
> that as a comment.
>
> Helped-by: Derrick Stolee 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  cache.h | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/cache.h b/cache.h
> index 77b7acebb6..354903c3ea 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -376,6 +376,14 @@ extern void free_name_hash(struct index_state *istate);
>  enum object_type {
> OBJ_BAD = -1,
> OBJ_NONE = 0,
> +   /*
> +* Why have our our "real" object types in this order? They're
> +* ordered topologically:
> +*
> +* tag(4)-> commit(1), tree(2), blob(3)
> +* commit(1) -> tree(2)
> +* tree(2)   -> blob(3)
> +*/

I think it's more important that these constants are part of the pack
file format. Even if it follows some order now, when a new object type
comes, you cannot just reorder to keep things look nice because then
you break pack file access.

I'm afraid this comment suggests that these numbers are just about
order, which is very wrong.

> OBJ_COMMIT = 1,
> OBJ_TREE = 2,
> OBJ_BLOB = 3,
-- 
Duy


Re: [PATCH/RFC] completion: complete all possible -no-

2018-05-08 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 7:36 AM, Eric Sunshine  wrote:
> I haven't looked at the implementation, so this may be an entirely
> stupid suggestion, but would it be possible to instead render the
> completions as?
>
> % git checkout --
> --[no-]conflict=   --[no-]patch
> --[no-]detach  --[no-]progress
> --[no-]ignore-other-worktrees  --[no-]quiet
> --[no-]ignore-skip-worktree-bits   --[no-]recurse-submodules
> --[no-]merge   --theirs
> --[no-]orphan= --[no-]track
> --ours
>
> This would address the problem of the --no-* options taking double the
> screen space.

It took me so long to reply partly because I remember seeing some guy
doing clever trick with tab completion that also shows a short help
text in addition to the complete words. I could not find that again
and from my reading (also internet searching) it's probably not
possible to do this without trickery.

> It's also more intuitive than that lone and somewhat weird-looking
> "--no-" suggestion.

It's not that weird if you think about file path completion, where you
complete one path component at a time not full path, bash just does
not show you full paths to everything.

I'm arguing about this because I want to see your reaction, because
I'm thinking of doing the very same thing for config completion. Right
now "git config " gives you two pages of all available config
variables. I'm thinking that we "git config " just shows the
groups, e.g.

> ~/w/git $ git config
add.  interactive.
advice.   log.
alias.mailmap.
am.   man.

Only when you do "git config log." that it shows you log.*
-- 
Duy


Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-08 Thread Duy Nguyen
On Tue, May 8, 2018 at 12:59 AM, Stefan Beller  wrote:
> @@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o)
>  void parsed_object_pool_clear(struct parsed_object_pool *o)
>  {
> /*
> -* TOOD free objects in o->obj_hash.
> -*
>  * As objects are allocated in slabs (see alloc.c), we do
>  * not need to free each object, but each slab instead.
> +*
> +* Before doing so, we need to free any additional memory
> +* the objects may hold.
>  */
> +   unsigned i;
> +
> +   for (i = 0; i < o->obj_hash_size; i++) {
> +   struct object *obj = o->obj_hash[i];
> +
> +   if (!obj)
> +   continue;
> +
> +   if (obj->type == OBJ_TREE) {
> +   free(((struct tree*)obj)->buffer);

It would be nicer to keep this in separate functions, e.g.
release_tree_node() and release_commit_node() to go with
alloc_xxx_node().

> +   } else if (obj->type == OBJ_COMMIT) {
> +   free_commit_list(((struct commit*)obj)->parents);
> +   free(&((struct commit*)obj)->util);
> +   }
> +   }

I still don't see who frees obj_hash[] (or at least clears it if not
freed). If I'm going to use this to free memory in pack-objects then
I'd really prefer obj_hash[] freed because it's a big _big_ array.

Just to be clear, what I mean is

FREE_AND_NULL(o->obj_hash);
o->obj_hash_size = 0;

> +
> +   clear_alloc_state(o->blob_state);
> +   clear_alloc_state(o->tree_state);
> +   clear_alloc_state(o->commit_state);
> +   clear_alloc_state(o->tag_state);
> +   clear_alloc_state(o->object_state);
>  }
-- 
Duy


Re: [PATCH v3 06/12] get_short_oid: sort ambiguous objects by type, then SHA-1

2018-05-08 Thread Jeff King
On Tue, May 01, 2018 at 06:40:10PM +, Ævar Arnfjörð Bjarmason wrote:

> Change the output emitted when an ambiguous object is encountered so
> that we show tags first, then commits, followed by trees, and finally
> blobs. Within each type we show objects in hashcmp() order. Before
> this change the objects were only ordered by hashcmp().
> 
> The reason for doing this is that the output looks better as a result,
> e.g. the v2.17.0 tag before this change on "git show e8f2" would
> display:

FWIW, I agree that this gives a big improvement to readability.

-Peff


Re: [PATCH v3 11/12] config doc: document core.disambiguate

2018-05-08 Thread Jeff King
On Tue, May 01, 2018 at 06:40:15PM +, Ævar Arnfjörð Bjarmason wrote:

> The core.disambiguate variable was added in
> 5b33cb1fd7 ("get_short_sha1: make default disambiguation
> configurable", 2016-09-27) but never documented.

Thanks, this seems reasonable. It was originally added as a tool to let
people experiment with different defaults, and I never really expected
it to be something normal people would set. But I'm not sure if anybody
really did much experimentation (I still suspect that setting it to
"commit" or "committish" would make most people happy).

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 2659153cb3..14a3d57e77 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -910,6 +910,19 @@ core.abbrev::
>   abbreviated object names to stay unique for some time.
>   The minimum length is 4.
>  
> +core.disambiguate::
> + If Git is given a SHA-1 that's ambigous it'll suggest what
> + objects you might mean. By default it'll print out all
> + potential objects with that prefix regardless of their
> + type. This setting, along with the `^{}` peel syntax
> + (see linkgit:gitrevisions[7]), allows for narrowing that down.

This isn't just about what we print, but also about excluding objects
from consideration that don't match.

> +Is set to `none` by default to show all object types. Can also be
> +`commit` (peel syntax: `$sha1^{commit}`), `committish` (commits and
> +tags), `tree` (peel: `$sha1^{tree}`), `treeish` (everything except
> +blobs, peel syntax: `$sha1:`), `blob` (peel: `$sha1^{blob}`) or `tag`
> +(peel: `$sha1^{tag}`). The peel syntax will override any config value.

These peel references would need updating pending the discussion over
the earlier patches.

I suspect there are other things besides peel syntax which may override
this. It's really just the fallback when the caller does not give the
lookup machinery any other context. Certainly the peel specifiers are
one way to get syntax, but I think there are others. Grepping for
GET_OID_, I see that the revision dot syntax infers committish context,
as does anything that passes REVARG_COMMITTISH (so git-log, for
example).

-Peff


Re: [PATCH 4/8] push tests: assert re-pushing annotated tags

2018-05-08 Thread SZEDER Gábor

> diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
> index c9a2011915..71fc902062 100755
> --- a/t/t5516-fetch-push.sh
> +++ b/t/t5516-fetch-push.sh
> @@ -965,35 +965,43 @@ test_expect_success 'push into aliased refs 
> (inconsistent)' '
>   )
>  '
>  
> -test_expect_success 'push requires --force to update lightweight tag' '
> - mk_test testrepo heads/master &&
> - mk_child testrepo child1 &&
> - mk_child testrepo child2 &&
> - (
> - cd child1 &&
> - git tag Tag &&
> - git push ../child2 Tag &&
> - >file1 &&
> - git add file1 &&
> - git commit -m "file1" &&
> - git tag -f Tag &&
> - test_must_fail git push ../child2 Tag &&
> - git push --force ../child2 Tag &&
> - git tag -f Tag HEAD~ &&
> - test_must_fail git push ../child2 Tag &&
> - git push --force ../child2 Tag &&
> - git tag -f Tag &&
> - test_must_fail git push ../child2 "refs/tags/*:refs/tags/*" &&
> - git push --force ../child2 "refs/tags/*:refs/tags/*" &&
> - git tag -f Tag HEAD~ &&
> - git push ../child2 "+refs/tags/*:refs/tags/*" &&
> - git tag -f Tag &&
> - git push --no-force ../child2 "+refs/tags/*:refs/tags/*" &&
> - git tag -f Tag HEAD~ &&
> - test_must_fail git push ../child2 tag Tag &&
> - git push --force ../child2 tag Tag
> - )
> -'
> +test_force_push_tag () {
> + tag_type_description=$1
> + tag_args=$2
> +
> + test_expect_success "push requires --force to update 
> $tag_type_description" "
> + mk_test testrepo heads/master &&
> + mk_child testrepo child1 &&
> + mk_child testrepo child2 &&
> + (
> + cd child1 &&
> + git tag Tag &&
> + git push ../child2 Tag &&
> + >file1 &&
> + git add file1 &&
> + git commit -m 'file1' &&
> + git tag $tag_args Tag &&
> + test_must_fail git push ../child2 Tag &&
> + git push --force ../child2 Tag &&
> + git tag $tag_args Tag HEAD~ &&
> + test_must_fail git push ../child2 Tag &&
> + git push --force ../child2 Tag &&
> + git tag $tag_args Tag &&
> + test_must_fail git push ../child2 
> 'refs/tags/*:refs/tags/*' &&
> + git push --force ../child2 'refs/tags/*:refs/tags/*' &&
> + git tag $tag_args Tag HEAD~ &&
> + git push ../child2 '+refs/tags/*:refs/tags/*' &&
> + git tag $tag_args Tag &

There is that unwanted "function" at the end of the line.

Interstingly, the test does pass when run with dash, but fails the
chain-lint tests when run with Bash, even though it's in a subshell.

> + git push --no-force ../child2 
> '+refs/tags/*:refs/tags/*' &&
> + git tag $tag_args Tag HEAD~ &&
> + test_must_fail git push ../child2 tag Tag &&
> + git push --force ../child2 tag Tag
> + )
> + "
> +}
> +
> +test_force_push_tag "lightweight tag" "-f"
> +test_force_push_tag "annotated tag" "-f -a -m'msg'"
>  
>  test_expect_success 'push --porcelain' '
>   mk_empty testrepo &&


Re: [PATCH v3 13/15] git_config_set: make use of the config parser's event stream

2018-05-08 Thread Jeff King
On Tue, May 08, 2018 at 09:42:48AM -0400, Jeff King wrote:

> On Mon, Apr 09, 2018 at 10:32:20AM +0200, Johannes Schindelin wrote:
> 
> > +static int store_aux_event(enum config_event_t type,
> > +  size_t begin, size_t end, void *data)
> > +{
> > +   struct config_store_data *store = data;
> > +
> > +   ALLOC_GROW(store->parsed, store->parsed_nr + 1, store->parsed_alloc);
> > +   store->parsed[store->parsed_nr].begin = begin;
> > +   store->parsed[store->parsed_nr].end = end;
> > +   store->parsed[store->parsed_nr].type = type;
> > +   store->parsed_nr++;
> > +
> > +   if (type == CONFIG_EVENT_SECTION) {
> > +   if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
> > +   BUG("Invalid section name '%s'", cf->var.buf);
> 
> I triggered this BUG today while playing around. Here's a minimal
> reproduction:
> 
>   echo '[broken' >config
>   git config --file=config a.b c
> 
> I'm not sure if it should simply be a die() and not a BUG(), since
> it depends on the input. Or if it is a BUG and we expected an earlier
> part of the code (like the event generator) to catch this broken case
> before we get to this function.

By the way, one side effect of BUG() here is that we call abort(), which
means that our atexit handlers don't run. And a crufty "config.lock"
file is left that prevents running the command again.

In our discussion elsewhere of having BUG() just call exit(), I'm not
sure if we'd want it to skip those cleanups or not (it's helpful to
not run them if you're trying to debug, but otherwise is annoying).

-Peff


Re: [PATCH v3 13/15] git_config_set: make use of the config parser's event stream

2018-05-08 Thread Jeff King
On Mon, Apr 09, 2018 at 10:32:20AM +0200, Johannes Schindelin wrote:

> +static int store_aux_event(enum config_event_t type,
> +size_t begin, size_t end, void *data)
> +{
> + struct config_store_data *store = data;
> +
> + ALLOC_GROW(store->parsed, store->parsed_nr + 1, store->parsed_alloc);
> + store->parsed[store->parsed_nr].begin = begin;
> + store->parsed[store->parsed_nr].end = end;
> + store->parsed[store->parsed_nr].type = type;
> + store->parsed_nr++;
> +
> + if (type == CONFIG_EVENT_SECTION) {
> + if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
> + BUG("Invalid section name '%s'", cf->var.buf);

I triggered this BUG today while playing around. Here's a minimal
reproduction:

  echo '[broken' >config
  git config --file=config a.b c

I'm not sure if it should simply be a die() and not a BUG(), since
it depends on the input. Or if it is a BUG and we expected an earlier
part of the code (like the event generator) to catch this broken case
before we get to this function.

-Peff


Re: [PATCH 2/2] gitk: add an option to run gitk on an item in the file list

2018-05-08 Thread Alex Riesen
Bert Wesarg, Tue, May 08, 2018 15:17:03 +0200:
> On Tue, May 8, 2018 at 2:22 PM, Alex Riesen  
> wrote:
> > +proc flist_gitk {} {
> > +global flist_menu_file findstring gdttype
> > +
> > +set x [shellquote $flist_menu_file]
> 
> this needs to handle cdup, i.e., if gitk is run from a subdirectory,
> all paths needs to be prefixed with $cdup, like: [file join $cdup
> $flist_menu_file]

That, indeed, is easily done:

set x [shellquote [file join $cdup $flist_menu_file]]
if {[file isdirectory $flist_menu_file]} {
exec sh -c "cd $x&" &
} else {
exec gitk -- $x &
}


It just doesn't seem to work: gitk does not find any commits!
Maybe that's because the file panel lists only files for the current
subdirectory (without the path from the repo top-level)? E.g.

mkdir tst
cd tst
git init
mkdir a
touch top-file a/sub-file
git add -A ; git commit -m.
cd a
gitk

Gitk lists only sub-file.

Frankly, this listing limited to just a sub-directory confuses me a bit. Is
there anyway to get to display full repository without changing to the top
level?


---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
https://www.avast.com/antivirus



Re: [PATCH 8/8] gpg-interface: handle alternative signature types

2018-05-08 Thread Jeff King
On Mon, May 07, 2018 at 11:06:50PM +, brian m. carlson wrote:

> I think my main objection to this series is that it is generic in a way
> that isn't necessarily useful.  We know there are essentially only two
> formats of PEM-style signatures: OpenPGP and CMS[0].  Even if there are
> more, they aren't intrinsically useful, because our codebase can only
> handle GnuPG-style tools, and those are the only formats GnuPG-style
> tools really support (although, as you point out, other tools could
> mimic the interface).
> 
> I think if we aren't going to implement some sort of interface that's
> generically useful for all signing tools, it would be better to simply
> say that we support gpg and gpgsm and have signingtool.gpg.program and
> signingtool.gpgsm.program and hard-code the logic for those two formats.
> That way we don't have a generic interface that's really only useful for
> PEM-style tools, when we know it likely won't be useful for other tools
> as well.  We can add a more generic interface when we have more varied
> tools to support and we know more about what the requirements will be.

OK, so my question then is: what does just-gpgsm support look like?

Do we literally add gpgsm.program? My thought was that taking us the
first step towards a more generic config scheme would prevent us having
to backtrack later.

There are also more CMS signers than gpgsm (and I know Ben is working on
a tool). So it feels a little ugly to make it "gpgsm.program", since it
really is a more generic format.

Or would you be happy if we just turned the matcher into a whole-line
substring or regex match?

> This doesn't address Junio's concern about whether adding CMS support is
> the right direction to go.  I personally think OpenPGP is the right
> direction for most open-source projects, but I know some companies want
> to use CMS internally and I'm not intrinsically opposed to that[1].
> That decision is ultimately up to Junio, though.

My guess is that fragmentation isn't likely to be much of a problem in
practice, because the tool choice generally falls along
culture/community boundaries. I'd expect that open source projects are
never going to choose CMS, because the centralized cert management is
awful. But it's exactly what many closed-source enterprises want, and
they will literally choose "no signing" over wrestling with PGP.

I'd be much more worried about the open source world splitting into
"signify" and "gpg" camps or similar. OTOH, I just don't see it as all
that big a deal. It's a project decision, and it may even allow for some
healthy competition between standards.

-Peff


Re: [PATCH 2/2] gitk: add an option to run gitk on an item in the file list

2018-05-08 Thread Bert Wesarg
On Tue, May 8, 2018 at 2:22 PM, Alex Riesen
 wrote:
> From: Alex Riesen 
>
> Similar to a git gui feature which visualizes history in a submodule,
> the submodules cause the gitk be started inside the submodule.
>
> Signed-off-by: Alex Riesen 
> ---
>  gitk | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/gitk b/gitk
> index d34833f..1ec545e 100755
> --- a/gitk
> +++ b/gitk
> @@ -2682,6 +2682,7 @@ proc makewindow {} {
> {mc "External diff" command {external_diff}}
> {mc "Blame parent commit" command {external_blame 1}}
> {mc "Copy path" command {clipboard clear; clipboard append 
> $flist_menu_file}}
> +   {mc "Run gitk on this" command {flist_gitk}}
>  }
>  $flist_menu configure -tearoff 0
>
> @@ -3555,6 +3556,17 @@ proc flist_hl {only} {
>  set gdttype [mc "touching paths:"]
>  }
>
> +proc flist_gitk {} {
> +global flist_menu_file findstring gdttype
> +
> +set x [shellquote $flist_menu_file]

this needs to handle cdup, i.e., if gitk is run from a subdirectory,
all paths needs to be prefixed with $cdup, like: [file join $cdup
$flist_menu_file]

Bert

> +if {[file isdirectory $flist_menu_file]} {
> +   exec sh -c "cd $x&" &
> +} else {
> +   exec gitk -- $x &
> +}
> +}
> +
>  proc gitknewtmpdir {} {
>  global diffnum gitktmpdir gitdir env
>


Re: main url for linking to git source?

2018-05-08 Thread brian m. carlson
On Mon, May 07, 2018 at 10:46:58PM -0400, Konstantin Ryabitsev wrote:
> On Tue, May 08, 2018 at 01:51:30AM +, brian m. carlson wrote:
> > I think I would also prefer a list of available repositories over a
> > hard-coded choice.  It may be that some places (say, Australia) have
> > better bandwidth to one over the other, and users will be able to have a
> > better experience with certain mirrors.
> > 
> > While I'm sympathetic to the idea of referring to kernel.org because
> > it's open-source and non-profit, users outside of North America are
> > likely to have a less stellar experience with its mirrors, since they're
> > all in North America.
> 
> I'm a bit worried that I'll come across as some kind of annoying pedant,
> but git.kernel.org is actually 6 different systems available in US,
> Europe, Hong Kong, and Australia. :)

That is very interesting to know.  Perhaps it would be useful to update
https://www.kernel.org/category/about.html?

I did specifically look to see where various mirrors' servers were
located, although I didn't actually test for kernel.org since you
specified on the website.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH 2/2] gitk: add an option to run gitk on an item in the file list

2018-05-08 Thread Alex Riesen
From: Alex Riesen 

Similar to a git gui feature which visualizes history in a submodule,
the submodules cause the gitk be started inside the submodule.

Signed-off-by: Alex Riesen 
---
 gitk | 12 
 1 file changed, 12 insertions(+)

diff --git a/gitk b/gitk
index d34833f..1ec545e 100755
--- a/gitk
+++ b/gitk
@@ -2682,6 +2682,7 @@ proc makewindow {} {
{mc "External diff" command {external_diff}}
{mc "Blame parent commit" command {external_blame 1}}
{mc "Copy path" command {clipboard clear; clipboard append 
$flist_menu_file}}
+   {mc "Run gitk on this" command {flist_gitk}}
 }
 $flist_menu configure -tearoff 0
 
@@ -3555,6 +3556,17 @@ proc flist_hl {only} {
 set gdttype [mc "touching paths:"]
 }
 
+proc flist_gitk {} {
+global flist_menu_file findstring gdttype
+
+set x [shellquote $flist_menu_file]
+if {[file isdirectory $flist_menu_file]} {
+   exec sh -c "cd $x&" &
+} else {
+   exec gitk -- $x &
+}
+}
+
 proc gitknewtmpdir {} {
 global diffnum gitktmpdir gitdir env
 
-- 
2.17.0.rc1.56.gb9824190bd


---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
https://www.avast.com/antivirus



[PATCH 1/2] gitk: show part of submodule log instead of empty pane when listing trees

2018-05-08 Thread Alex Riesen
From: Alex Riesen 

Currently, the submodules either are not shown at all (if listing a
committed tree) or a Tcl error appears (when clicking on a submodule
from the index list).

This will make it show first arbitrarily chosen number of commits,
which might be only marginally better.

Signed-off-by: Alex Riesen 
---
 gitk | 42 --
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/gitk b/gitk
index a14d7a1..d34833f 100755
--- a/gitk
+++ b/gitk
@@ -7627,9 +7627,10 @@ proc gettreeline {gtf id} {
if {$i < 0} continue
set fname [string range $line [expr {$i+1}] end]
set line [string range $line 0 [expr {$i-1}]]
-   if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue
+   set objtype [lindex $line 1]
+   if {$diffids ne $nullid2 && $objtype ne "blob" && $objtype ne 
"commit" } { continue }
set sha1 [lindex $line 2]
-   lappend treeidlist($id) $sha1
+   lappend treeidlist($id) "$sha1 $objtype"
}
if {[string index $fname 0] eq "\""} {
set fname [lindex $fname 0]
@@ -7659,21 +7660,42 @@ proc showfile {f} {
 global ctext_file_names ctext_file_lines
 global ctext commentend
 
+set submodlog "git\\ log\\ --format='%h\\ %aN:\\ %s'\\ -100"
+set fcmt ""
 set i [lsearch -exact $treefilelist($diffids) $f]
 if {$i < 0} {
puts "oops, $f not in list for id $diffids"
return
 }
 if {$diffids eq $nullid} {
-   if {[catch {set bf [open $f r]} err]} {
-   puts "oops, can't read $f: $err"
-   return
+   if {[file isdirectory $f]} {
+   # a submodule
+   if {[catch {set bf [open "| sh -c cd\\ \"$f\"&&$submodlog" r]} 
err]} {
+   puts "oops, can't read submodule $f: $err"
+   return
+   }
+} else {
+   if {[catch {set bf [open $f r]} err]} {
+   puts "oops, can't read $f: $err"
+   return
+   }
}
 } else {
-   set blob [lindex $treeidlist($diffids) $i]
-   if {[catch {set bf [open [concat | git cat-file blob $blob] r]} err]} {
-   puts "oops, error reading blob $blob: $err"
-   return
+   set bo [lindex $treeidlist($diffids) $i]
+   set blob [lindex $bo 0]
+   set objtype [lindex $bo 1]
+   if { "$objtype" eq "blob" } {
+   if {[catch {set bf [open [concat | git cat-file blob $blob] r]} 
err]} {
+   puts "oops, error reading blob $blob: $err"
+   return
+   }
+   } else {
+   # also a submodule
+   if {[catch {set bf [open "| sh -c cd\\ \"$f\"&&$submodlog\\ $blob" 
r]} err]} {
+   puts "oops, error reading submodule commit: $err"
+   return
+   }
+   set fcmt "/"
}
 }
 fconfigure $bf -blocking 0 -encoding [get_path_encoding $f]
@@ -7683,7 +7705,7 @@ proc showfile {f} {
 lappend ctext_file_names $f
 lappend ctext_file_lines [lindex [split $commentend "."] 0]
 $ctext insert end "\n"
-$ctext insert end "$f\n" filesep
+$ctext insert end "$f$fcmt\n" filesep
 $ctext config -state disabled
 $ctext yview $commentend
 settabs 0
-- 
2.17.0.rc1.56.gb9824190bd


---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
https://www.avast.com/antivirus



[PATCH 0/2] gitk: improve handling of submodules in the file list panel

2018-05-08 Thread Alex Riesen
Currently, the submodule entries in the file list panel are mostly ignored.
This series attempts to improve the situation by showing part of submodule
history when focusing it in the file list panel and by adding a menu element
to start gitk in the submodule (similar to git gui).

  [1/2]: gitk: show part of submodule log instead of empty pane when listing 
trees
  [2/2]: gitk: add an option to run gitk on an item in the file list

 gitk | 54 --
 1 file changed, 44 insertions(+), 10 deletions(-)

---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
https://www.avast.com/antivirus



Re: [RFC PATCH v4 1/3] Teach remote add the --remote-tags option

2018-05-08 Thread Kaartic Sivaraam
On Tuesday 01 May 2018 10:29 PM, Wink Saville wrote:
> When --remote-tags is passed to `git remote add` the tagopt is set to
> --remote-tags and a second fetch line is added so tags are placed in
> a separate hierarchy per remote.
> 

I find '--remote' in the option name to be redundant given that it is an
option to `git remote add`. I guess '--namespace-tags' would be a better
alternative as it seems to convey the meaning more directly to the user.


-- 
Sivaraam

QUOTE:

“The most valuable person on any team is the person who makes everyone
else on the team more valuable, not the person who knows the most.”

  - Joel Spolsky


Sivaraam?

You possibly might have noticed that my signature recently changed from
'Kaartic' to 'Sivaraam' both of which are parts of my name. I find the
new signature to be better for several reasons one of which is that the
former signature has a lot of ambiguities in the place I live as it is a
common name (NOTE: it's not a common spelling, just a common name). So,
I switched signatures before it's too late.

That said, I won't mind you calling me 'Kaartic' if you like it [of
course ;-)]. You can always call me using either of the names.


KIND NOTE TO THE NATIVE ENGLISH SPEAKER:

As I'm not a native English speaker myself, there might be mistaeks in
my usage of English. I apologise for any mistakes that I make.

It would be "helpful" if you take the time to point out the mistakes.

It would be "super helpful" if you could provide suggestions about how
to correct those mistakes.

Thanks in advance!



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 3/8] push tests: add more testing for forced tag pushing

2018-05-08 Thread Kaartic Sivaraam
On Monday 30 April 2018 01:50 AM, Ævar Arnfjörð Bjarmason wrote:
> diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
> index 15c8d5a734..c9a2011915 100755
> --- a/t/t5516-fetch-push.sh
> +++ b/t/t5516-fetch-push.sh
> @@ -981,7 +981,17 @@ test_expect_success 'push requires --force to update 
> lightweight tag' '
>   git push --force ../child2 Tag &&
>   git tag -f Tag HEAD~ &&
>   test_must_fail git push ../child2 Tag &&
> - git push --force ../child2 Tag
> + git push --force ../child2 Tag &&
> + git tag -f Tag &&
> + test_must_fail git push ../child2 "refs/tags/*:refs/tags/*" &&
> + git push --force ../child2 "refs/tags/*:refs/tags/*" &&
> + git tag -f Tag HEAD~ &&
> + git push ../child2 "+refs/tags/*:refs/tags/*" &&
> + git tag -f Tag &&
> + git push --no-force ../child2 "+refs/tags/*:refs/tags/*" &&
> + git tag -f Tag HEAD~ &&
> + test_must_fail git push ../child2 tag Tag &&
> + git push --force ../child2 tag Tag

As a person who came to know about the "tag " refspec for the
first time while seeing this patch, I found it a little hard to parse
the following two lines of the test:

test_must_fail git push ../child2 tag Tag &&
git push --force ../child2 tag Tag

Maybe some other name than "Tag" for the example would have made it
easier for the person reading it. Something like "foo"/"bar" etc.


-- 
Sivaraam

QUOTE:

“The most valuable person on any team is the person who makes everyone
else on the team more valuable, not the person who knows the most.”

  - Joel Spolsky


Sivaraam?

You possibly might have noticed that my signature recently changed from
'Kaartic' to 'Sivaraam' both of which are parts of my name. I find the
new signature to be better for several reasons one of which is that the
former signature has a lot of ambiguities in the place I live as it is a
common name (NOTE: it's not a common spelling, just a common name). So,
I switched signatures before it's too late.

That said, I won't mind you calling me 'Kaartic' if you like it [of
course ;-)]. You can always call me using either of the names.


KIND NOTE TO THE NATIVE ENGLISH SPEAKER:

As I'm not a native English speaker myself, there might be mistaeks in
my usage of English. I apologise for any mistakes that I make.

It would be "helpful" if you take the time to point out the mistakes.

It would be "super helpful" if you could provide suggestions about how
to correct those mistakes.

Thanks in advance!



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-08 Thread Jeff King
On Mon, May 07, 2018 at 03:59:16PM -0700, Stefan Beller wrote:

> @@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o)
>  void parsed_object_pool_clear(struct parsed_object_pool *o)
> [...]
> + for (i = 0; i < o->obj_hash_size; i++) {
> + struct object *obj = o->obj_hash[i];
> +
> + if (!obj)
> + continue;
> +
> + if (obj->type == OBJ_TREE) {
> + free(((struct tree*)obj)->buffer);
> + } else if (obj->type == OBJ_COMMIT) {
> + free_commit_list(((struct commit*)obj)->parents);
> + free(&((struct commit*)obj)->util);
> + }
> + }

Coverity complains about this final free(). I think the "&" is doing an
incorrect extra level of indirection?

That said, I'm not sure if it is safe to blindly free the util field. We
don't necessarily know what downstream code has pointed it to. It may
not be allocated memory[1], or it may even be a more complicated data
structure that has sub-components that need freeing[2].

In the long run, it may be worth trying to get rid of this util field
completely, in favor of having callers use a commit_slab. That has
better memory-ownership semantics, and it would save 8 bytes in struct
commit.

[1] Grepping for "commit->util =", sequencer.c seems to assign pointers
into other arrays, as well as the "(void *)1".

[2] Most assignments seem to be flex-structs, but blame.c assigns a
linked list.

-Peff


Re: [PATCH 3/8] push tests: add more testing for forced tag pushing

2018-05-08 Thread Kaartic Sivaraam
On Tuesday 08 May 2018 08:49 AM, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
>> I couldn't quite get what you meant by "(but not the other way
>> around)".  Did you mean
>>
>>  $ git push --force ../child2 refs/tags/*:refs/tags/*
>>
>> should not become non-forcing version because of the (lack of)
>> prefix on the refspec does not trump the --force command line
>> option?

When I was reading the commit message, I had the same doubt about what
"(but not the other way around)" actually meant but I assumed it meant
that "the `--no-force` in the command line does not override the '+' in
the refspec". Maybe the commit message could be updated to clarify this?


-- 
Sivaraam

QUOTE:

“The most valuable person on any team is the person who makes everyone
else on the team more valuable, not the person who knows the most.”

  - Joel Spolsky


Sivaraam?

You possibly might have noticed that my signature recently changed from
'Kaartic' to 'Sivaraam' both of which are parts of my name. I find the
new signature to be better for several reasons one of which is that the
former signature has a lot of ambiguities in the place I live as it is a
common name (NOTE: it's not a common spelling, just a common name). So,
I switched signatures before it's too late.

That said, I won't mind you calling me 'Kaartic' if you like it [of
course ;-)]. You can always call me using either of the names.


KIND NOTE TO THE NATIVE ENGLISH SPEAKER:

As I'm not a native English speaker myself, there might be mistaeks in
my usage of English. I apologise for any mistakes that I make.

It would be "helpful" if you take the time to point out the mistakes.

It would be "super helpful" if you could provide suggestions about how
to correct those mistakes.

Thanks in advance!



signature.asc
Description: OpenPGP digital signature


Re: Weak option parsing in `git submodule`

2018-05-08 Thread Kaartic Sivaraam
On Tuesday 08 May 2018 12:35 AM, Stefan Beller wrote:
>> The lack of checking for the reason behind why `git add` fails seems to
>> be the reason behind that weird message.
> 
> (from the man page)
> git submodule [--quiet] add [] [--]  []
> 
> When options are given after  or  we can count
> the arguments and know something is up. (The number of arguments
> must be 1 or 2. If it is 3 or above, something fishy is going on), which
> I would suggest as a first step.
> 
>> Ways to fix this:
>>
>> 1. Fix this particular issue by adding a '--' after the '--no-warn-
>> embedded-repo' option in the above check. But that would also
>> require that we allow other parts of the script to accept weird
>> paths such as '--path'. Not so useful/helpful.
>>
>> 2. Check for the actual return value of `git add` in the check and
>> act accordingly. Also, check if there are unnecessary arguments for
>> `submodule add`.
> 
> The second part of this suggestion seems to me as the way to go.
> Do you want to write a patch?
> 

Incidentally, I was writing a patch to check for the return value of
`git add` to fix the particular issue I noted in my initial message.
Then I was in a dilemma as to whether this was the right way to do it.
So, I thought it would be better to ask before continuing with the patch
and hence started this thread. I wasn't counting the arguments to `git
submodule add` at that time.

Now that I'm ensured that my initial approach is not the worst way to do
things and as I'm about to write a patch for this, I'll sum up what I'm
about to achieve in the short-term fix patch, for the sake of clarity.

1. Check the return value of `git add ...` and throw an error
   appropriately.

2. Check the no. of arguments to `submodule add` and throw an
   error if there are more arguments than there should be.

   I require a little clarification with this. How should this
   be done. Does checking whether the number of arguments after
is not more than one do the job? Or am I missing
   something?


>> 3. Tighten option parsing in the `git-submodule` script to ensure
>> this doesn't happen in the long term and helps users to get more
>> relevant error messages.
>>
>> I find 3 to be a long term solution but not sure if it's worth trying
>> as there are efforts towards porting `git submodule` to C. So, I guess
>> we should at least do 2 as a short-term solution.
> 
> Yeah, bringing it to C, would be nice as a long term solution instead
> of piling up more and more shell features. :)
> 

Hope the day it is brought into C comes soon.


> Thanks for such a well written bug report with suggested bug fixes. :)

You're welcome :-)


-- 
Sivaraam

QUOTE:

“The most valuable person on any team is the person who makes everyone
else on the team more valuable, not the person who knows the most.”

  - Joel Spolsky


Sivaraam?

You possibly might have noticed that my signature recently changed from
'Kaartic' to 'Sivaraam' both of which are parts of my name. I find the
new signature to be better for several reasons one of which is that the
former signature has a lot of ambiguities in the place I live as it is a
common name (NOTE: it's not a common spelling, just a common name). So,
I switched signatures before it's too late.

That said, I won't mind you calling me 'Kaartic' if you like it [of
course ;-)]. You can always call me using either of the names.


KIND NOTE TO THE NATIVE ENGLISH SPEAKER:

As I'm not a native English speaker myself, there might be mistaeks in
my usage of English. I apologise for any mistakes that I make.

It would be "helpful" if you take the time to point out the mistakes.

It would be "super helpful" if you could provide suggestions about how
to correct those mistakes.

Thanks in advance!



signature.asc
Description: OpenPGP digital signature


Re: main url for linking to git source?

2018-05-08 Thread Kaartic Sivaraam
On Monday 07 May 2018 11:45 PM, Stefan Beller wrote:
>> I could see arguments both ways, so I thought I'd take a straw poll of
>> what people on the list think.
>  
> Junios reply below focuses on the URL passed to git-clone, which
> is only found at https://git-scm.com/downloads (?)
> 
> There I would try to mirror Junios list of "public repositories"
> https://git-blame.blogspot.com/p/git-public-repositories.html
> without officially endorsing one over another.
>
FWIW, I also seem to support this suggestion as it's not opinionated.


> For those links that link to web pages, I am ok with any of the
> hosting providers, maybe even taking the one with the prettiest
> web page. Maybe we want to reword those sections to rely
> more on indirection, e.g. "get the source[link to the source page],
> checkout the next branch", without the quick web link to a web page
> showing the 'next' tree.

Seems to be a nice suggestion to avoid the "main/official" url issue.

To add a little more, it might be better replace the "Source code" link
with a link to Junio's list of public repositories stated above. Also,
it might be better to rename the link to "Public repositories containing
the source".


-- 
Sivaraam

QUOTE:

“The most valuable person on any team is the person who makes everyone
else on the team more valuable, not the person who knows the most.”

  - Joel Spolsky


Sivaraam?

You possibly might have noticed that my signature recently changed from
'Kaartic' to 'Sivaraam' both of which are parts of my name. I find the
new signature to be better for several reasons one of which is that the
former signature has a lot of ambiguities in the place I live as it is a
common name (NOTE: it's not a common spelling, just a common name). So,
I switched signatures before it's too late.

That said, I won't mind you calling me 'Kaartic' if you like it [of
course ;-)]. You can always call me using either of the names.


KIND NOTE TO THE NATIVE ENGLISH SPEAKER:

As I'm not a native English speaker myself, there might be mistaeks in
my usage of English. I apologise for any mistakes that I make.

It would be "helpful" if you take the time to point out the mistakes.

It would be "super helpful" if you could provide suggestions about how
to correct those mistakes.

Thanks in advance!



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v4 2/7] grep.c: expose matched column in match_line()

2018-05-08 Thread René Scharfe
Am 05.05.2018 um 04:42 schrieb Taylor Blau:
> When calling match_line(), callers presently cannot determine the
> relative offset of the match because match_line() discards the
> 'regmatch_t' that contains this information.
> 
> Instead, teach match_line() to take in a 'regmatch_t *' so that callers
> can inspect the match's starting and ending offset from the beginning of
> the line. This additional argument has no effect when opt->extended is
> non-zero.
> 
> We will later pass the starting offset from 'regmatch_t *' to
> show_line() in order to display the column number of the first match.
> 
> Signed-off-by: Taylor Blau 
> ---
>   grep.c | 9 +
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/grep.c b/grep.c
> index 65b90c10a3..1c25782355 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1299,17 +1299,17 @@ static int match_expr(struct grep_opt *opt, char 
> *bol, char *eol,
>   }
>   
>   static int match_line(struct grep_opt *opt, char *bol, char *eol,
> -   enum grep_context ctx, int collect_hits)
> +   regmatch_t *match, enum grep_context ctx,
> +   int collect_hits)
>   {
>   struct grep_pat *p;
> - regmatch_t match;
>   
>   if (opt->extended)
>   return match_expr(opt, bol, eol, ctx, collect_hits);

If ->extended is set then match won't be touched...

>   
>   /* we do not call with collect_hits without being extended */
>   for (p = opt->pattern_list; p; p = p->next) {
> - if (match_one_pattern(p, bol, eol, ctx, , 0))
> + if (match_one_pattern(p, bol, eol, ctx, match, 0))
>   return 1;
>   }
>   return 0;
> @@ -1699,6 +1699,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
> grep_source *gs, int colle
>   int try_lookahead = 0;
>   int show_function = 0;
>   struct userdiff_driver *textconv = NULL;
> + regmatch_t match;
>   enum grep_context ctx = GREP_CONTEXT_HEAD;
>   xdemitconf_t xecfg;
>   
> @@ -1788,7 +1789,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
> grep_source *gs, int colle
>   if ((ctx == GREP_CONTEXT_HEAD) && (eol == bol))
>   ctx = GREP_CONTEXT_BODY;
>   
> - hit = match_line(opt, bol, eol, ctx, collect_hits);
> + hit = match_line(opt, bol, eol, , ctx, collect_hits);
>   *eol = ch;
>   
>   if (collect_hits)
> 

... which leaves it uninitialized.

So at least the combination of extended matches and --column should error
out.  Supporting it would be better, of course.  That could get tricky for
negations, though (e.g. git grep --not -e foo).

René