Re: [PoC -- do not apply 3/3] test-tree-bitmap: replace ewah with custom rle encoding

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 09:58:51AM +0900, Junio C Hamano wrote:

> > +static void bitmap_to_rle(struct strbuf *out, struct bitmap *bitmap)
> > +{
> > +   int curval = 0; /* count zeroes, then ones, then zeroes, etc */
> > +   size_t run = 0;
> > +   size_t word;
> > +   size_t orig_len = out->len;
> > +
> > +   for (word = 0; word < bitmap->word_alloc; word++) {
> > +   int bit;
> > +
> > +   for (bit = 0; bit < BITS_IN_EWORD; bit++) {
> > +   int val = !!(bitmap->words[word] & (((eword_t)1) << 
> > bit));
> > +   if (val == curval)
> > +   run++;
> > +   else {
> > +   strbuf_add_varint(out, run);
> > +   curval = 1 - curval; /* flip 0/1 */
> > +   run = 1;
> > +   }
> > +   }
> 
> OK.  I find it a bit disturbing to see that the loop knows a bit too
> much about how "struct bitmap" is implemented, but that is a complaint
> against the bitmap API, not this new user of the API.

Heh, again, this is not really meant to be production code. I'm not at
all happy about inventing a new compressed bitmap format here, and I'd
want to investigate the state of the art a bit more. In particular, the
worst case here is quite bad, and I wonder if there are formats that can
select the best encoding when writing a bitmap (naive RLE when it's
good, something else other times).

I also suspect part of why this does better is that other formats are
optimized less for our case. We really don't care about setting or
looking at a few bits part way through a bitmap. Our bitmaps are small
enough that we don't mind streaming through a whole one. It's just that
we have so _many_ of them that we want to be meticulous about wasted
bytes.

Whatever format we choose, I think it would become part of the bitmap.c
file, and internal details would be OK to access there. I just put it
here to keep the patch simple.

> We do not try to handle the case where bitmap has bits that is not
> multiple of BITS_IN_EWORD and instead pretend that size of such a
> bitmap can be rounded up, because we ignore trailing 0-bit anyway,
> and we know the "struct bitmap" would pad with 0-bit at the tail?

Right. We do not know the "real" number of zero bits at all. It's just
assumed that there are infinite zeroes trailing off the end (and this is
how "struct bitmap" works, since it is the one that does not bother to
keep a separate size pointer).

> > +   /*
> > +* ugh, varint does not seem to have a way to prevent reading past
> > +* the end of the buffer. We'll do a length check after each one,
> > +* so the worst case is bounded.
> > +*/
> 
> Sorry about that :-).

:) We may want to address that. I know we did some hardening about
reading off the end of .pack and .idx files. But it seems like any user
of decode_varint() may read up to 16 bytes past the end of a buffer.

We seem to only use them for the $GIT_DIR/index, though. Anybody with a
"struct hashfile" result at least has a 20-byte trailer we can
accidentally read from. But I wouldn't be surprised if there's a way to
trick it in practice.

-Peff


Bug: manpage for `git ls-files` uses instead of

2018-10-10 Thread Lily Ballard
`git ls-files` takes zero or more s, but the manpage (and the `-h` 
output) lists it as taking zero or more s instead. This is confusing as 
 is documented in `man git` as a filename, without any magic. But a 
pathspec has a lot of special handling. The gitglossary entry for pathspec does 
say that `git ls-files` takes it, but nobody is going to know to look there in 
the first place if they look at `git ls-files` and see that it takes files. 

I haven’t checked any other commands that the glossary lists as taking 
pathspecs (except `git add`, which does properly list it as taking pathspecs), 
so I don’t know if any of the other commands incorrectly list themselves as 
taking files when they take pathspecs.

git version 2.19.1.

-Lily Ballard


Re: [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 09:48:53AM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The one difference is the sort order: git's diff output is
> > in tree-sort order, so a subtree "foo" sorts like "foo/",
> > which is after "foo.bar". Whereas the bitmap path list has a
> > true byte sort, which puts "foo.bar" after "foo".
> 
> If we truly cared, it is easy enough to fix by having a custom
> comparison function in 1/3 used in collect_paths() phase.

Yep. I thought about doing it just so I could drop this "one difference"
note, but I got lazy.

Running this on linux.git, I do see a few other differences. It looks
like my code does actually compute lists of touched paths for some
merges (presumably using "-c"). That wasn't intended, and it would
actually make my timings less good, but my goal was just to get a rough
idea on size here (but see below).

> > +   /* dump it while we have the sorted order in memory */
> > +   for (i = 0; i < n; i++) {
> > +   printf("%s", sorted[i]->path);
> > +   putchar('\0');
> > +   }
> 
> With printf("%s%c", sorted[i]->path, '\0'); you can lose the braces.

Heh, I didn't really expect review at that level. I'm not even sure this
is a good direction to go versus something like the bloom filters (or
even a more full --raw cache). But if it is, this code is mostly
throw-away anyway, as we'd want to integrate it with the actual diff
code.

My original goal had mostly been to get an idea of the size, and the
"dump" half was there to verify that the results were roughly sane. But
it actually works for rough timing, too. I can generate roughly the same
results as "rev-list --all | diff-tree --stdin -t --name-only" in about
300ms, as opposed to 33s. So that's good.

But it's also a slight cheat, since I'm not actually traversing the
commits, but rather just opening up the bitmaps in the order we wrote
them. ;)

Actually walking the commits (and not looking at the trees) takes ~7s,
so it would at least be more like 33s versus 7.3s. With core.commitgraph,
it's more like 1.1s, so imagine 27s versus 1.4s, I guess.

That's also neglecting any load/lookup time for actual random-access to
the bitmaps. I doubt that's more than a few hundred ms, but that's just
a made-up number.

So I think the rough timings are favorable, but the real proof would
actually be using it from a revision walk, which I haven't written.

> > +   putchar('\0');
> > +
> > free(sorted);
> >  }
> >  
> > @@ -142,6 +150,8 @@ static void generate_bitmap(struct diff_queue_struct *q,
> >  
> > ewah = bitmap_to_ewah(bitmap);
> > ewah_serialize_strbuf(ewah, );
> > +
> > +   fwrite(data->commit->object.oid.hash, 1, GIT_SHA1_RAWSZ, stdout);
> > fwrite(out.buf, 1, out.len, stdout);
> 
> OK, so per commit, we have ewah bitmap that records the "changed
> paths" after the commit object name.  Makes sense.

Yeah. This format, btw, is garbage. It was just the smallest and
simplest thing I could think of that would work for my case. We'd want
random-access to the bitmaps for each commit, probably via an index
block in the commit-graph file.

> And the list of paths are based on the "one" side of the filepair.
> When we do an equivalent of "git show X", we see "diff-tree X~1 X"
> and by collecting the "one" side (i.e. subset of paths in the tree
> of X~1 that were modified when going to X) we say "commit X changed
> these paths".  Makes sense, too.

I didn't think too hard on whether we'd need to look at the "two" side
ever. I turned off renames, so we'd see deletions via the "one". I feel
like we'd miss additions in that case, though, but from my results, we
do not seem to.

-Peff


Re: `--rebase-merges' still failing badly

2018-10-10 Thread Michael Witten
On Thu, 11 Oct 2018 08:01:40 +0900, Junio wrote:

> Michael Witten  writes:
>
>> On Wed, 10 Oct 2018 14:43:46 +0900, Junio wrote:
>>
>>> We haven't seen  much complaints and breakages  reported against the
>>> two big "rewrite in C" topics  around "rebase"; perhaps it is a good
>>> time to merge  them to 'next' soonish  to cook them for  a few weeks
>>> before moving them to 'master'?
>>
>> In my opinion, the `--rebase-merges' feature has been broken since the
>> beginning, and the builtin version should  be fixed before it is moved
>> ahead.
>
> [...]
>
> If "rebase-merges" has been broken since  the beginning, as long as the
> "rewrite in C" topics  around "rebase" do not make it  even worse, I do
> not think it is a good move  to block the topics moving forward. If the
> feature were so  broken that it is not practically  useful, then people
> wouldn't be using it  in the versions of Git before  the rewrite, so it
> won't harm  anybody if  the same  feature in  the rewritten  version is
> equally (or even  more severely) broken, as long as  the other parts of
> the feature works at least equally well compared to the older version.
>
> We are not in the business of hostage taking.
>
> What  *should*  block  the  rewrited  version  is  a  regression,  i.e.
> something that used  to work well no longer works  or works differently
> in such a way that established workflows need to be adjusted.
>
> [...] I do not think that is a reason to keep "rewrite in C" waiting in
> 'pu'.

* Your logic  is appealing,  and I  nearly pursuaded  myself by  the same
  reasoning to submit my email as  a separate discussion, as you suggest.
  However, what convinced me otherwise is the following:

  The  closer you  move  the rewrite  to  a fast-forward-only  public
  branch  name, the  more  likely downstream  projects  are going  to
  set  up  new,  long-lived  releases around  this  very  useful  but
  nevertheless broken feature.

  The moment you announce a new release, there are going to be a bunch of
  people who grab that release and then  NEVER look back, and so the rest
  of us will be stuck with this problem for who knows how long.

  So, not only is this an appeal  to the authors to fix this problem, but
  its also  an appeal  to you to  make sure that the  next  major release
  includes the fix.

* Also, I say the following without irony or tongue in cheek:

  Maybe, no one  has complained  because  few people  are using  this
  feature yet, or  their commit summaries are  simplistic, or they've
  got workarounds (as I've got).

  Not  only must  this feature  be turned  on explicitly,  but `git'  has
  existed for  over a decade  *without* it;  users who are  interested in
  sophisticated management of commit history have already developed other
  ways  to achieve  the  same result  (I  know I  did),  or their  commit
  messages are  so simplistic that  the bug  is never triggered,  or they
  just plan around it by automatically running a quick search/replace for
  the offending characters or for the irritating "labels".

  If the last decade has shown  us anything, it's that git's fundamentals
  are  so good  that programmers  can get  around any  bug on  their own,
  without having to appeal to others  for help. And, what is a programmer
  if not someone who is used to making things Just Work [Damnit]?

  As an illustration,  consider the recent `break' command  that is being
  added to the repertoire of `git  rebase -i'. Hell, I (and probably many
  others) have been doing that for YEARS with:

  x false

  No need for a "new" command. I bet that 10 years from now,  people will
  *still* be using their own ways,  and will *still* be totally oblivious
  to the existence of `break'.

  That is to say, I wouldn't put much faith in the degree to which people
  report issues. The programming world has a lot of itchy backs, and just
  as many personal inventions for scratching them.

As always, thanks for taking the time to review everyone's input.

Sincerely,
Michael Witten


Re: builtin stash/rebase, was Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Johannes Schindelin  writes:

> https://github.com/git-for-windows/git/commit/6bc7024aecdb1aeb2760c519f7b26e6e5ef21051
> fixup! builtin rebase: support `-C` and `--whitespace=`

For c7ee2134d4 (rebase-in-c-4-opts); a single liner that is
obviously correct.

> https://github.com/git-for-windows/git/commit/1e6a1c510ffeae5bb0a4bda7f0528a8213728837
> fixup! builtin rebase: support `--gpg-sign` option

For 28a02c5a79 (rebase-in-c-4-opts); the change looks correct (see a
separate message).

> https://github.com/git-for-windows/git/commit/ddb6e5ca19d5cdd318bc4bcbb7f7f3fb0892c8cc
> fixup! rebase -i: implement the main part of interactive rebase as a 
> builtin

You said Alban already has this in the update, which I took
yesterday, so I'll ignore this one.

> https://github.com/git-for-windows/git/commit/2af24038a95a3879aa0c29d91a43180b9465247e
> fixup! stash: convert apply to builtin

I think we are expecting another round of update, so I'll ignore
this one for now, too.

> Speaking about the two `rebase` ones: they are simple fixup! commits,
> could I trouble you to fetch and cherry-pick them into `pu`, or would you
> prefer if I sent another iteration of `rebase-in-c-4-opts`?

Rebuilding 4 will involve rebuilding all the later ones anyway, so
I'll just try doing it myself and report back if I saw issues.
Thanks.



Re: [PATCH] fixup! builtin rebase: support `--gpg-sign` option

2018-10-10 Thread Junio C Hamano
Junio C Hamano  writes:

> From: Johannes Schindelin 
> Date: Thu, 27 Sep 2018 14:48:17 +0200
>
> The `--gpg-sign` option takes an *optional* argument, not a mandatory
> one.
>
> This was discovered as part of the investigation of
> https://github.com/git-for-windows/git/issues/1836.
>
> Signed-off-by: Johannes Schindelin 
> ---
>
>  * I am sending this out as I want to mimize the number of
>non-trivial changes that come into my tree without hitting the
>list archive.
>
>  builtin/rebase.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index a28bfbd62f..43bc6f7915 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -1030,8 +1030,9 @@ int cmd_rebase(int argc, const char **argv, const char 
> *prefix)
>   OPT_BOOL(0, "autosquash", ,
>N_("move commits that begin with "
>   "squash!/fixup! under -i")),
> - OPT_STRING('S', "gpg-sign", _sign,
> -N_("gpg-sign?"), N_("GPG-sign commits")),
> + { OPTION_STRING, 'S', "gpg-sign", _sign, N_("key-id"),
> + N_("GPG-sign commits"),
> + PARSE_OPT_OPTARG, NULL, (intptr_t) "" },

This allows "--gpg-sign" in addition to "--gpg-sign=ARG" and stores
an empty string in place of ARG when accepting the option without
arg.  

The result in gpg_sign is used by checking if the pointer is
non-NULL and formatted with xstrfmt("-S%s", gpg_sign).

The change looks correct; will squash into what has been queued on
'pu'.

Thanks.

>   OPT_STRING_LIST(0, "whitespace", ,
>   N_("whitespace"), N_("passed to 'git apply'")),
>   OPT_SET_INT('C', 0, _c, N_("passed to 'git apply'"),


[PATCH] fixup! builtin rebase: support `--gpg-sign` option

2018-10-10 Thread Junio C Hamano
From: Johannes Schindelin 
Date: Thu, 27 Sep 2018 14:48:17 +0200

The `--gpg-sign` option takes an *optional* argument, not a mandatory
one.

This was discovered as part of the investigation of
https://github.com/git-for-windows/git/issues/1836.

Signed-off-by: Johannes Schindelin 
---

 * I am sending this out as I want to mimize the number of
   non-trivial changes that come into my tree without hitting the
   list archive.

 builtin/rebase.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index a28bfbd62f..43bc6f7915 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -1030,8 +1030,9 @@ int cmd_rebase(int argc, const char **argv, const char 
*prefix)
OPT_BOOL(0, "autosquash", ,
 N_("move commits that begin with "
"squash!/fixup! under -i")),
-   OPT_STRING('S', "gpg-sign", _sign,
-  N_("gpg-sign?"), N_("GPG-sign commits")),
+   { OPTION_STRING, 'S', "gpg-sign", _sign, N_("key-id"),
+   N_("GPG-sign commits"),
+   PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
OPT_STRING_LIST(0, "whitespace", ,
N_("whitespace"), N_("passed to 'git apply'")),
OPT_SET_INT('C', 0, _c, N_("passed to 'git apply'"),
-- 
2.19.1-328-g5a0cc8aca7



Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Stefan Beller  writes:

>> * pw/diff-color-moved-ws-fix (2018-10-04) 5 commits
> I would suggest merging to 'next'.

OK.

>> * sb/strbuf-h-update (2018-09-29) 1 commit
> The patch as-is just adds names everywhere.
> I'd be happy to resend with either
> (a) not enforcing names everywhere, but only as needed or
> (b) having names everywhere, capitalizing them NAMES in
> the doc comment.
>
> I am tempted to ask for
> (c) take as-is, defer the rewording of doc strings for a follow up patch.

As long as the planned update eventually comes before all of us
forget, (c) is fine by me.  I'll mark it to be merged to 'next' for
now, and follow through that plan, unless somebody else stops me
before it happens.

>> * sb/submodule-recursive-fetch-gets-the-tip (2018-09-12) 9 commits
> Will resend after a local review.

OK.

Thanks for helping me in updating the status for various topics.


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Phillip Wood  writes:

> On 10/10/2018 06:43, Junio C Hamano wrote:
>> Here are the topics that have been cooking.  Commits prefixed with
>> '-' are only in 'pu' (proposed updates) while commits prefixed with
>> '+' are in 'next'.  The ones marked with '.' do not appear in any of
>> the integration branches, but I am still holding onto them.
>>
>> * pw/diff-color-moved-ws-fix (2018-10-04) 5 commits
>>   - diff --color-moved: fix a memory leak
>>   - diff --color-moved-ws: fix another memory leak
>>   - diff --color-moved-ws: fix a memory leak
>>   - diff --color-moved-ws: fix out of bounds string access
>>   - diff --color-moved-ws: fix double free crash
>>
>>   Various fixes to "diff --color-moved-ws".
>>
>>   What's the status of this topic?
>
> I think it is ready for next - Stefan was happy with the last iteration.

Thanks.


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Johannes Sixt  writes:

> Am 10.10.18 um 07:43 schrieb Junio C Hamano:
>> We haven't seen much complaints and breakages reported against the
>> two big "rewrite in C" topics around "rebase"; perhaps it is a good
>> time to merge them to 'next' soonish to cook them for a few weeks
>> before moving them to 'master'?
>
> Please let me express my sincerest gratitude to Alban, Joel,
> Paul-Sebastian, Pratik, and Dscho. It is such a pleasure to work with
> the builtin rebase and stash commands on Windows now. I am using them
> since a month or two, and they work extremely well for me.
>
> Thank you all for your hard work!

OK.  With another Ack from Dscho, I'd feel safe to merge the
"rebase" topics 'next' and start cooking.  "stash" seems to be
almost there but I think it deserves a chance for a final touch-up
before hitting 'next' (see another thread with Thomas).

Thanks.


Re: [RFC PATCH 6/6] commit-reach: fix first-parent heuristic

2018-10-10 Thread Jonathan Nieder
Hi,

Derrick Stolee wrote:

>  commit-reach.c| 4 +++-
>  t/t6600-test-reach.sh | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)

I like this testing technique, and the test passes for me.

Except: if I put

CC = cc -m32
NO_OPENSSL = YesPlease
NO_CURL = YesPlease

in config.mak (the first line to force 32-bit pointers, the others
to avoid some dependencies on libs that I don't have 32-bit versions
of), then the test fails for me:

 $ ./t6600-test-reach.sh -v -x -i
 [...]
 expecting success:
 cp commit-graph-full .git/objects/info/commit-graph &&
 run_and_check_trace2 can_all_from_reach_with_flag num_walked 19 input \
 "test-tool reach can_all_from_reach"

 ++ cp commit-graph-full .git/objects/info/commit-graph
 ++ run_and_check_trace2 can_all_from_reach_with_flag num_walked 19 input 
'test-tool reach can_all_from_r
 each'
 ++ CATEGORY=can_all_from_reach_with_flag
 ++ KEY=num_walked
 ++ VALUE=19
 ++ INPUT=input
 ++ COMMAND='test-tool reach can_all_from_reach'
 +++ pwd
 ++ GIT_TR2_PERFORMANCE='/usr/local/google/home/jrn/src/git/t/trash 
directory.t6600-test-reach/perf-log.t
 xt'
 ++ test-tool reach can_all_from_reach
 can_all_from_reach(X,Y):1
 ++ cat perf-log.txt
 ++ grep 'category:can_all_from_reach_with_flag key:num_walked value:19'
 error: last command exited with $?=1
 not ok 11 - can_all_from_reach:perf
 #
 #   cp commit-graph-full .git/objects/info/commit-graph &&
 #   run_and_check_trace2 can_all_from_reach_with_flag num_walked 
19 input \
 #   "test-tool reach can_all_from_reach"
 #

When I cat perf-log.txt, I see

  ..category:can_all_from_reach_with_flag key:num_walked value:20

instead of the expected 19.

Known issue?

Thanks,
Jonathan


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Thomas Gummerer  writes:

> There was a v9 of this series [*1*], which hasn't been picked up yet.
> Was that intentional, or an oversight?

;-) Yes, I often miss patches that are buried in other discussions,
but this time, it was quite deliberate.  I saw comments that pointed
out at least one thing that needs to be fixed before the series can
move forward, so I skipped that iteration, anticipating another
round of update.

Also, I was waiting for [*3*] to be answered.

> I left some comments on that iteration.  Some were just style nits,
> but I think at least [*2*] should be addressed before we merge this
> down to master, not sure if any of my other comments apply to v8 as
> well.  I'm happy to send fixup patches, or a patches on top of
> this series for that and my other comments, should they apply to v8,
> or wait for Paul-Sebastian to send a re-roll.  What do you prefer?

The ideal from my point of view is to see responses to your comments
in the original thread (which is about 1300 messages ago in the list
archive by now) by Paul-Sebastian, possibly responded by you and/or
others, resulting in a concensus on what the right update for the
patches should be, finally followed by v10, which hopefully would be
the final one.

> [*1*]: 
> [*2*]: <20180930174848.ge2...@hank.intra.tgummerer.com>

[*3*] 


[PATCH 2/2] Only make bloom filter for first parent

2018-10-10 Thread Jonathan Tan
Signed-off-by: Jonathan Tan 
---
 commit-graph.c |  4 ++--
 revision.c | 20 
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 90b0b3df90..d21d555611 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -782,9 +782,9 @@ static void fill_bloom_filter(struct bloom_filter *bf,
 
for (i = 0; i < commit_nr; i++) {
struct commit *commit = commits[i];
-   struct commit_list *parent;
+   struct commit_list *parent = commit->parents;
 
-   for (parent = commit->parents; parent; parent = parent->next)
+   if (parent)
add_changes_to_bloom_filter(bf, parent->item, commit, i,
);
 
diff --git a/revision.c b/revision.c
index c84a997928..5a433a5878 100644
--- a/revision.c
+++ b/revision.c
@@ -539,11 +539,11 @@ static int check_maybe_different_in_bloom_filter(struct 
rev_info *revs,
 }
 
 static int rev_compare_tree(struct rev_info *revs,
-   struct commit *parent, struct commit *commit)
+   struct commit *parent, struct commit *commit, int 
nth_parent)
 {
struct tree *t1 = get_commit_tree(parent);
struct tree *t2 = get_commit_tree(commit);
-   int bloom_ret;
+   int bloom_ret = 1;
 
if (!t1)
return REV_TREE_NEW;
@@ -568,17 +568,21 @@ static int rev_compare_tree(struct rev_info *revs,
return REV_TREE_SAME;
}
 
-   bloom_ret = check_maybe_different_in_bloom_filter(revs, parent, commit);
-   if (bloom_ret == 0)
-   return REV_TREE_SAME;
+   if (!nth_parent) {
+   bloom_ret = check_maybe_different_in_bloom_filter(revs, parent, 
commit);
+   if (bloom_ret == 0)
+   return REV_TREE_SAME;
+   }
 
tree_difference = REV_TREE_SAME;
revs->pruning.flags.has_changes = 0;
if (diff_tree_oid(>object.oid, >object.oid, "",
   >pruning) < 0)
return REV_TREE_DIFFERENT;
-   if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
-   bloom_filter_count_false_positive++;
+   if (!nth_parent) {
+   if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
+   bloom_filter_count_false_positive++;
+   }
return tree_difference;
 }
 
@@ -776,7 +780,7 @@ static void try_to_simplify_commit(struct rev_info *revs, 
struct commit *commit)
die("cannot simplify commit %s (because of %s)",
oid_to_hex(>object.oid),
oid_to_hex(>object.oid));
-   switch (rev_compare_tree(revs, p, commit)) {
+   switch (rev_compare_tree(revs, p, commit, nth_parent)) {
case REV_TREE_SAME:
if (!revs->simplify_history || !relevant_commit(p)) {
/* Even if a merge with an uninteresting
-- 
2.19.0.271.gfe8321ec05.dirty



[PATCH 1/2] One filter per commit

2018-10-10 Thread Jonathan Tan
Signed-off-by: Jonathan Tan 
---
 bloom-filter.c | 31 ++-
 bloom-filter.h | 12 
 commit-graph.c | 26 --
 revision.c |  9 +++--
 4 files changed, 37 insertions(+), 41 deletions(-)

diff --git a/bloom-filter.c b/bloom-filter.c
index 7dce0e35fa..39b453908f 100644
--- a/bloom-filter.c
+++ b/bloom-filter.c
@@ -1,14 +1,17 @@
 #include "cache.h"
 #include "bloom-filter.h"
 
-void bloom_filter_init(struct bloom_filter *bf, uint32_t bit_size)
+void bloom_filter_init(struct bloom_filter *bf, uint32_t commit_nr, uint32_t 
bit_size)
 {
if (bit_size % CHAR_BIT)
BUG("invalid size for bloom filter");
+   if (bit_size > 1024)
+   BUG("aborting: the bit size is per commit, not for the whole 
filter");
 
bf->nr_entries = 0;
+   bf->commit_nr = commit_nr;
bf->bit_size = bit_size;
-   bf->bits = xmalloc(bit_size / CHAR_BIT);
+   bf->bits = xcalloc(1, commit_nr * bit_size / CHAR_BIT);
 }
 
 void bloom_filter_free(struct bloom_filter *bf)
@@ -19,24 +22,24 @@ void bloom_filter_free(struct bloom_filter *bf)
 }
 
 
-void bloom_filter_set_bits(struct bloom_filter *bf, const uint32_t *offsets,
+static void bloom_filter_set_bits(struct bloom_filter *bf, uint32_t graph_pos, 
const uint32_t *offsets,
   int nr_offsets, int nr_entries)
 {
int i;
for (i = 0; i < nr_offsets; i++) {
-   uint32_t byte_offset = (offsets[i] % bf->bit_size) / CHAR_BIT;
+   uint32_t byte_offset = (offsets[i] % bf->bit_size + graph_pos * 
bf->bit_size) / CHAR_BIT;
unsigned char mask = 1 << offsets[i] % CHAR_BIT;
bf->bits[byte_offset] |= mask;
}
bf->nr_entries += nr_entries;
 }
 
-int bloom_filter_check_bits(struct bloom_filter *bf, const uint32_t *offsets,
+static int bloom_filter_check_bits(struct bloom_filter *bf, uint32_t 
graph_pos, const uint32_t *offsets,
int nr)
 {
int i;
for (i = 0; i < nr; i++) {
-   uint32_t byte_offset = (offsets[i] % bf->bit_size) / CHAR_BIT;
+   uint32_t byte_offset = (offsets[i] % bf->bit_size + graph_pos * 
bf->bit_size) / CHAR_BIT;
unsigned char mask = 1 << offsets[i] % CHAR_BIT;
if (!(bf->bits[byte_offset] & mask))
return 0;
@@ -45,19 +48,19 @@ int bloom_filter_check_bits(struct bloom_filter *bf, const 
uint32_t *offsets,
 }
 
 
-void bloom_filter_add_hash(struct bloom_filter *bf, const unsigned char *hash)
+void bloom_filter_add_hash(struct bloom_filter *bf, uint32_t graph_pos, const 
unsigned char *hash)
 {
uint32_t offsets[GIT_MAX_RAWSZ / sizeof(uint32_t)];
hashcpy((unsigned char*)offsets, hash);
-   bloom_filter_set_bits(bf, offsets,
+   bloom_filter_set_bits(bf, graph_pos, offsets,
 the_hash_algo->rawsz / sizeof(*offsets), 1);
 }
 
-int bloom_filter_check_hash(struct bloom_filter *bf, const unsigned char *hash)
+int bloom_filter_check_hash(struct bloom_filter *bf, uint32_t graph_pos, const 
unsigned char *hash)
 {
uint32_t offsets[GIT_MAX_RAWSZ / sizeof(uint32_t)];
hashcpy((unsigned char*)offsets, hash);
-   return bloom_filter_check_bits(bf, offsets,
+   return bloom_filter_check_bits(bf, graph_pos, offsets,
the_hash_algo->rawsz / sizeof(*offsets));
 }
 
@@ -80,11 +83,12 @@ int bloom_filter_load(struct bloom_filter *bf)
return -1;
 
read_in_full(fd, >nr_entries, sizeof(bf->nr_entries));
+   read_in_full(fd, >commit_nr, sizeof(bf->commit_nr));
read_in_full(fd, >bit_size, sizeof(bf->bit_size));
if (bf->bit_size % CHAR_BIT)
BUG("invalid size for bloom filter");
-   bf->bits = xmalloc(bf->bit_size / CHAR_BIT);
-   read_in_full(fd, bf->bits, bf->bit_size / CHAR_BIT);
+   bf->bits = xmalloc(bf->commit_nr * bf->bit_size / CHAR_BIT);
+   read_in_full(fd, bf->bits, bf->commit_nr * bf->bit_size / CHAR_BIT);
 
close(fd);
 
@@ -96,8 +100,9 @@ void bloom_filter_write(struct bloom_filter *bf)
int fd = xopen(git_path_bloom(), O_WRONLY | O_CREAT | O_TRUNC, 0666);
 
write_in_full(fd, >nr_entries, sizeof(bf->nr_entries));
+   write_in_full(fd, >commit_nr, sizeof(bf->commit_nr));
write_in_full(fd, >bit_size, sizeof(bf->bit_size));
-   write_in_full(fd, bf->bits, bf->bit_size / CHAR_BIT);
+   write_in_full(fd, bf->bits, bf->commit_nr * bf->bit_size / CHAR_BIT);
 
close(fd);
 }
diff --git a/bloom-filter.h b/bloom-filter.h
index 94d0af1708..607649b8db 100644
--- a/bloom-filter.h
+++ b/bloom-filter.h
@@ -5,30 +5,26 @@
 
 struct bloom_filter {
uint32_t nr_entries;
+   uint32_t commit_nr;
uint32_t bit_size;
unsigned char *bits;
 };
 
 
-void bloom_filter_init(struct bloom_filter *bf, uint32_t bit_size);
+void 

[PATCH 0/2] Per-commit filter proof of concept

2018-10-10 Thread Jonathan Tan
I did my own experiments on top of what Szeder provided - the first
patch is to have one fixed-size bloom filter per commit, and the second
patch makes that bloom filter apply to only the first parent of each
commit. The results are:

  Original (szeder's)
  $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) time ./git commit-graph write
  0:10.28
  $ ls -l .git/objects/info/bloom
  8388616
  $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \
core.commitgraph=true rev-list --count --full-history HEAD -- \
t/valgrind/valgrind.sh
  886
  bloom filter total queries: 66459 definitely not: 65276 maybe: 1183 false 
positives: 297 fp ratio: 0.004469
  0:00.24

  With patch 1
  $ GIT_USE_POC_BLOOM_FILTER=256 time ./git commit-graph write
  0:16.22
  $ ls -l .git/objects/info/bloom
  1832620
  $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \
core.commitgraph=true rev-list --count --full-history HEAD -- \
t/valgrind/valgrind.sh
  886
  bloom filter total queries: 66459 definitely not: 46637 maybe: 19822 false 
positives: 18936 fp ratio: 0.284928
  0:01.53

  With patch 2
  $ GIT_USE_POC_BLOOM_FILTER=256 time ./git commit-graph write
  0:06.70
  $ ls -l .git/objects/info/bloom
  1832620
  $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \
core.commitgraph=true rev-list --count --full-history HEAD -- \
t/valgrind/valgrind.sh
  886
  bloom filter total queries: 53096 definitely not: 52989 maybe: 107 false 
positives: 89 fp ratio: 0.001676
  0:01.29

For comparison, a non-GIT_USE_POC_BLOOM_FILTER rev-list takes 3.517
seconds.

I haven't investigated why patch 1 takes longer than the original to
create the bloom filter.

Using per-commit filters and restricting the bloom filter to a single
parent increases the relative power of the filter in omitting tree
inspections compared to the original (107/53096 vs 1183/66459), but the
lack of coverage w.r.t. the non-first parents had a more significant
effect than I thought (1.29s vs .24s). It might be best to have one
filter for each (commit, parent) pair (or, at least, the first two
parents of each commit - we probably don't need to care that much about
octopus merges) - this would take up more disk space than if we only
store filters for the first parent, but is still less than the original
example of storing information for all commits in one filter.

There are more possibilities like dynamic filter sizing, different
hashing, and hashing to support wildcard matches, which I haven't looked
into.

Jonathan Tan (2):
  One filter per commit
  Only make bloom filter for first parent

 bloom-filter.c | 31 ++-
 bloom-filter.h | 12 
 commit-graph.c | 30 ++
 revision.c | 29 +++--
 4 files changed, 51 insertions(+), 51 deletions(-)

-- 
2.19.0.271.gfe8321ec05.dirty



Re: [PATCH 2/3] ls-remote: release memory instead of UNLEAK

2018-10-10 Thread Junio C Hamano
Olga Telezhnaya  writes:

> Use ref_array_clear() to release memory instead of UNLEAK macros.
>
> Signed-off-by: Olga Telezhnaia 
> ---
>  builtin/ls-remote.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

OK, this is immediately before the command exits, and we have a way
to clear and release the resource, so it is obvious we should use
it.

Good.


>
> diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
> index 1a25df7ee15b4..6a0cdec30d2d7 100644
> --- a/builtin/ls-remote.c
> +++ b/builtin/ls-remote.c
> @@ -151,6 +151,6 @@ int cmd_ls_remote(int argc, const char **argv, const char 
> *prefix)
>   }
>  
>   UNLEAK(sorting);
> - UNLEAK(ref_array);
> + ref_array_clear(_array);
>   return status;
>  }
>
> --
> https://github.com/git/git/pull/538


Re: [PATCH 3/3] ref-filter: free item->value and item->value->s

2018-10-10 Thread Junio C Hamano
Olga Telezhnaya  writes:

> Release item->value.
> Initialize item->value->s dynamically and then release its resources.
> Release some local variables.

Again, "why" is lacking.

> @@ -1373,36 +1379,31 @@ static void fill_remote_ref_details(struct used_atom 
> *atom, const char *refname,
>   }
>   } else if (atom->u.remote_ref.option == RR_TRACKSHORT) {
>   if (stat_tracking_info(branch, _ours, _theirs,
> -NULL, AHEAD_BEHIND_FULL) < 0)
> +NULL, AHEAD_BEHIND_FULL) < 0) {
> + *s = xstrdup("");
>   return;

It is a bit sad that we need to sprinkle xstrdup() all over the
place, but I do not offhand think of a better alternative to ensure
that it is safe to blindly free the .s field.

> - if (explicit)
> - *s = xstrdup(remote);
> - else
> - *s = "";
> + *s = explicit ? xstrdup(remote) : xstrdup("");

Next time, please avoid mixing this kind of unrelated changes with
the main theme (i.e. "the original had allocated and static pieces
of memory pointed by the same variable, which made it impossible to
blindly free it, so make sure everything is allocated").  It makes
it harder to let reviewers' eyes coast over the patch.

I say "Next time" because the change is already written this time,
and I already spent time to see it was an OK change.  By the way,

*s = xstrdup(explicit ? remote : "");

is probably shorter.

> @@ -1562,10 +1566,11 @@ static int populate_value(struct ref_array_item *ref, 
> struct strbuf *err)
>   if (!refname)
>   continue;
>   }
> + free((char *)v->s); // we will definitely re-init it on 
> the next line

No // comment, please.

>  static void free_array_item(struct ref_array_item *item)
>  {
>   free((char *)item->symref);
> + if (item->value) {
> + free((char *)item->value->s);
> + free(item->value);
> + }
>   free(item);
>  }

OK.


Re: [PATCH 1/3] ref-filter: free memory from used_atom

2018-10-10 Thread Junio C Hamano
Olga Telezhnaya  writes:

> Release memory from used_atom variable.

That much readers would know from a quick look of the patch text.

Without knowing what you are aiming at, it is impossible to judge if
the patch is a good change.

Seeing FREE_AND_NULL(array->items) in the same function makes me
think that the designer of ref_array_clear() function would want
this sequence of events to work correctly in an ideal future:

 * Do a ref-filter operation by calling filter_refs(), receiving the
   result into an array..

 * Do another ref-filter by calling filter_refs(), using different
   criteria, receiving the result into a different array.

 * Iterate over the resulting refs in the first array, and call
   format_ref_array_item().

 * ref_array_clear() the first array, as the caller is done with it.

 * Iterate over the resulting refs in the second array, and call
   format_ref_array_item().

 * ref_array_clear() the second array, as the caller is done with
   it.

However, I think it would make it impossible for the second call to
work correctly if this code freed used_atom without clearing, and
not re-initializing the used_atom_cnt etc.

If on the other hand, the only thing you are interested in is to
just discard pieces of memory we no longer use, and you are not
interested in helping to move us closer to the world in which we can
call filter_refs() twice, then the change this patch makes is
sufficient.

And the place to answer the "what are you aiming at?" question is in
the proposed commit log message.

In an ideal future, I _think_ the file-scope static variables in
ref-filter.c like used_atom and used_atom_cnt should become fields
of a new structure (say "struct ref_filter"), with initializer and
uninitializer ref_filter_new() and ref_filter_destroy().  When that
happens, I think FREE_AND_NULL(used_atom) + used_atom_cnt=0 should
become part of ref_filter_destroy(), not part of ref_array_clear().

But we are not there yet, and a clean-up patch like this does not
have to be a step towards that goal.

> Signed-off-by: Olga Telezhnaia 
> ---
>  ref-filter.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index e1bcb4ca8a197..1b71d08a43a84 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1996,6 +1996,9 @@ void ref_array_clear(struct ref_array *array)
>  {
>   int i;
>  
> + for (i = 0; i < used_atom_cnt; i++)
> + free((char *)used_atom[i].name);
> + free(used_atom);
>   for (i = 0; i < array->nr; i++)
>   free_array_item(array->items[i]);
>   FREE_AND_NULL(array->items);
>
> --
> https://github.com/git/git/pull/538


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 10:59:45PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Callers who _are_ prepared to act on the exit code probably ought to
> > just use --auto-exit-code in their invocation.
> >
> > That said, I'm not entirely opposed to the matching config. There's
> > enough history here that somebody might want a sledgehammer setting to
> > go back to the old behavior.
> 
> If it's not a config option then as git is upgraded I'll need to change
> my across-server invocation to be some variant of checking git version,
> then etiher using the --auto-exit-code option or not (which'll error on
> older gits). Easier to be able to just drop in a config setting before
> the upgrade.

Yeah, that's the "there's enough history here" that I was referring to,
but I hadn't quite thought through a concrete example. That makes sense.

(Though I also think the other part of the thread is reasonable, too,
where we'd just have a command to abstract away "cat .git/gc.log" into
"git gc --show-detached-log" or something).

-Peff


Re: [PATCH v2 1/1] branch: introduce --show-current display option

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 10:54:32PM +0200, Daniels Umanovskis wrote:

> When called with --show-current, git branch will print the current
> branch name and terminate. Only the actual name gets printed,
> without refs/heads. In detached HEAD state, nothing is output.

I also wondered what happens in an unborn-branch state (i.e., we are on
refs/heads/master, but have not yet made any commits).

In that case, resolve_ref_unsafe() will return the branch name, but a
null oid. And you'll print that. Which seems sensible.

> Intended both for scripting and interactive/informative use.
> Unlike git branch --list, no filtering is needed to just get the
> branch name.

We should not advertise this to be used for scripting. The git-branch
command is porcelain, and we reserve the right to change its output
between versions, or based on user config. Fortunately, there is already
a blessed way to get this in a script, which is:

  git symbolic-ref [--short] HEAD

I'm not opposed to having "branch --show-current" as a more user-facing
alternative for people who want to query the state. I do wonder if
people would want it to show extra information, like:

  - if we're detached, from where (like the normal "branch --list"
shows)

  - are we on an unborn branch

  - are we ahead/behind an upstream (like "branch -v")

I guess that's slowly reinventing the first line of "git status -b".
Maybe that's a good thing; possibly this should just be a way to get
that data without doing the rest of git-status, and it's perhaps more
discoverable since it's part of git-branch. I dunno.

It just seems like in its current form it might be in an uncanny valley
where it is not quite scriptable plumbing, but not as informative as
other porcelain.

-Peff


Re: [PATCH 2/2] push: add an advice on unqualified push

2018-10-10 Thread Jeff King
On Thu, Oct 11, 2018 at 06:54:15AM +0900, Junio C Hamano wrote:

> > I'm not sure about saying "branch or tag" in the first bullet. It's
> > friendlier to most users, but less technically correct (if you said
> > "notes/foo", I believe we'd match an existing "refs/notes/foo", because
> > it's really just using the normal lookup rules).
> 
> An alternative may be "looking for a ref that matches %s on the
> remote side".  I am no longer a total newbie, so I cannot tell how
> well that message would help one to connect notes/foo one just typed
> with refs/notes/foo that potentially exists on the remote side.

Yeah. Really, it would be nice to imply that it somehow does the same
DWIM lookup that we do for local refs. But I didn't know how to say
that. Possibly we could refer to the documentation, but it's buried in
git-rev-parse.

> > Also, as an aside, I wonder if we should allow "heads/foo" to work as
> > "refs/heads/foo" (even when no such ref already exists). But that is
> > totally orthogonal to changing the message.
> 
> I am neutral on this point but agree that it is better done outside
> this patch.

Yeah, definitely. I would almost call it a leftover bit, but I think the
subtlety is not in the code, but in whether it is a good thing to be
doing (i.e., too many false positives).

-Peff


Re: [PATCH 2/2] push: add an advice on unqualified push

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 11:23:25PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I wonder if we could reword the first paragraph to be a little less
> > confusing, and spell out what we tried already. E.g., something like:
> >
> >   The destination you provided is not a full refname (i.e., starting
> >   with "ref"). Git tried to guess what you meant by:
> >
> > - looking for a matching branch or tag on the remote side
> >
> > - looking at the refname of the local source
> >
> >   but neither worked.
> >
> >   The  part of the refspec is a commit object.
> >   Did you mean...
> 
> Yeah that makes sense. I was trying to avoid touching the existing
> wording to make this more surgical, but you came up with it, and since
> you don't like it I'll just change that too.

I certainly know the feeling of trying to avoid wording bikeshed
discussions. But in this instance, please feel free to aggressively
rewrite that old message. ;)

What I wrote above was off-the-cuff, and I also do not mind if you use
it as a starting point to make improvements (or take it wholesale if you
really like it).

> > I think it would probably be OK to put the first paragraph in its own
> > variable. I know we try to avoid translation lego, but I'd think
> > paragraphs are separate units. Or are you worried about how to get them
> > into the same advise() call? I don't know that we need to, but we could
> > also plug one into the other with a "%s" (and leave a translator note).
> 
> To be honest from being on the code side of a much bigger i18n effort
> (the MediaWiki/WikiMedia software) back in the early days of my career I
> just do this sort of thing reflexively, because from experience when I
> started trying to simplify stuff by making assumptions I was wrong every
> time.
> [...]

OK, I'm happy to defer to your judgement here. I have very little
translation experience myself.

> > Can we just do it as:
> >
> >   if (advice_push_ambiguous_ref_name) {
> > struct object_id oid;
> > enum object_type type;
> >
> > if (get_oid(...))
> >etc...
> >   }
> >
> > instead? That pushes your indentation one level in, but I think the
> > whole conditional body might be cleaner in a helper function anyway.
> 
> I started out with that and found myself really constrained by the 72
> char ceiling which I'm already smashing through with these ~95 character
> lines (but at least it's under 100!). But sure, we can do with 8 more.

That's why I suggested the helper function. :) I'm also not opposed to
pulling messages out to static file-level variables, even if they're
only used once. Sometimes it's nice to have them left-aligned (or close
to it) to see how they'll actually look in a terminal.

-Peff


Re: [PATCH v5 17/23] userdiff.c: remove implicit dependency on the_index

2018-10-10 Thread Jeff King
On Thu, Oct 11, 2018 at 07:14:31AM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > I get why you're doing it: your topic here only cares about removing
> > index dependencies, so you did the minimal thing to move that forward.
> >
> > But if you think about what this function is doing, it is quite clearly
> > dependent on the whole repository, since the userdiff config we're
> > looking up may come from repo config.
> 
> In the case of userdiff that is pretty much limited to read-only
> operation, I fully agree, but in more general cases, we would need
> to pass both the repository and an in-core index separately, I would
> say.  Imagine doing a partial commit, where we construct a separate
> istate that is not the "repo's index" and use that to write out a
> tree object to be wrapped in a new commit, and update the current
> branch ref.

Yeah, agreed. I was actually puzzled at first why userdiff needs to know
about the index at all. But the answer is that the attr code may read
.gitattributes out of the index. It's _possible_ somebody would want to
do that with an index besides the normal repo one, but I find it
somewhat unlikely. I think my instinct there is based on it being
"read-only", as you said.

One thing that confused me even more is that diff_options now has a
"struct repository" field in it. I get how that saves passing it around,
but I also wonder if it may run into similar issues at some point. I'm
perfectly willing to punt on it until it actually comes up in practice,
though.

-Peff


Re: [PATCH] range-diff: allow to diff files regardless submodule

2018-10-10 Thread brian m. carlson
On Wed, Oct 10, 2018 at 08:09:16AM -0700, Lucas De Marchi wrote:
> Do like it's done in grep so mode doesn't end up as
> 016, which means range-diff doesn't work if one has
> "submodule.diff = log" in the configuration. Without this
> while using range-diff I only get a
> 
> Submodule a 000...000 (new submodule)
> 
> instead of the diff between the revisions.
> 
> Signed-off-by: Lucas De Marchi 
> ---
>  range-diff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/range-diff.c b/range-diff.c
> index 60edb2f518..bd8083f2d1 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -354,7 +354,7 @@ static struct diff_filespec *get_filespec(const char 
> *name, const char *p)
>  {
>   struct diff_filespec *spec = alloc_filespec(name);
>  
> - fill_filespec(spec, _oid, 0, 0644);
> + fill_filespec(spec, _oid, 0, 0100644);

If we have a system that has different mode values from the common Unix
ones, is this still correct or does it need to change?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] gc: remove redundant check for gc_auto_threshold

2018-10-10 Thread Brandon Casey
On Wed, Oct 10, 2018 at 4:38 PM Junio C Hamano  wrote:
>
> Brandon Casey  writes:
>
> > ...  Again, I don't feel strongly about it, but I'm not
> > sure this change actually improves the code.
>
> Yeah, in the context of the current caller, this is a safe change
> that does not break anybody and reduces the number of instructions
> executed in this codepath.  A mistaken caller may be added in the
> future that fails to check auto-threashold beforehand, but that
> won't lead to anything bad like looping for a large number of times,
> so as long as the API contract into this helper function is clear
> that callers are responsible to check beforehand, it is still not
> too bad.
>
> So, I'd throw this into "Meh - I won't regret applying it, but it is
> not the end of the world if I forget to apply it, either" pile.
>
> I _think_ a change that actually improves the code would be to
> restructure so that there is a helper that is responsible for
> guestimating the number of loose objects, and another that uses the
> helper to see if there are too many loose objects.  The latter is
> the only one tha needs to know about auto-threashold.  But we are
> not in immdiate need for such a clean-up, I guess, unless somebody
> is actively looking into revamping how auto-gc works and doing a
> preparatory clean-up.

Agreed on all points, and as usual, said better than I could :-)

-Brandon


Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-10 Thread Josh Steadmon
On 2018.10.05 12:25, Stefan Beller wrote:
> > > > I suppose if we are strict about serving from a single endpoint, the
> > > > version registry makes more sense, and individual operations can declare
> > > > acceptable version numbers before calling any network code?
> > >
> > > Ah yeah, that makes sense!
> >
> > Thinking out loud here. Please let me know if I say something stupid :)
> >
> > So we'll have (up to) three pieces of version information we'll care
> > about for version negotiation:
> >
> > 1. (maybe) a client-side protocol.version config entry
> 
> (and in case we don't, we have it implicit ly hardcoded, as
> we have to choose the default for users that don't care themselves about
> this setting)
> 
> > 2. a list of acceptable proto versions for the currently running
> >operation on the client
> > 3. a list of acceptable proto versions for the server endpoint that
> >handles the request
> 
> Yes that matches my understanding. The issue is between (1) and (2)
> as (1) is in a generic config, whereas (2) is specific to the command,
> such that it may differ. And as a user you may want to express things
> like: "use the highest version", which is done by setting (1) to "version 2"
> despite (2) not having support of all commands for v2.
> 
> > According to the doc on protocol.version: "If set, clients will attempt
> > to communicate with a server using the specified protocol version. If
> > unset, no attempt will be made by the client to communicate using a
> > particular protocol version, this results in protocol version 0 being
> > used."
> >
> > So, if protocol.version is not set, or set to 0, the client should not
> > attempt any sort of version negotiation.
> 
> Yes, as version 0 is unaware of versions, i.e. there are old installations
> out there where all the versioning code is not there, so in case of an
> old client the new server *must* speak v0 to be able to communicate
> (and vice versa).
> 
> 
> > Otherwise, the client prefers a
> > particular version, but we don't guarantee that they will actually use
> > that version after the (unspecified) version negotiation procedure.
> >
> > If protocol.version is set to something other than 0, we construct a
> > list of acceptable versions for the given operation. If the
> > protocol.version entry is present in that list, we move it to the front
> > of the list to note that it is the preferred version. We send all of
> > these, in order, in the request.
> >
> > When the server endpoint begins to handle a request, it first constructs
> > a list of acceptable versions. If the client specifies a list of
> > versions, we check them one-by-one to see if they are acceptable. If we
> > find a match, we use that version. If we don't find any matches or if
> > the client did not send a version list, we default to v0.
> >
> > Seem reasonable?
> 
> This sounds super reasonable!

So this runs into problems with remote-curl (and possibly other remote
helpers):

builtin/push.c can declare whatever allowed versions it wants, but those
are not carried over when remote-curl is started to actually talk to the
remote. What's worse, remote-curl starts its HTTP connection before it
knows what command it's actually acting as a helper for.

For now, I'm going to try adding an --allowed_versions flag for the
remote helpers, but if anyone has a better idea, let me know.


Thanks,
Josh


Re: [PATCH 0/2] branch: introduce --current display option

2018-10-10 Thread brian m. carlson
On Wed, Oct 10, 2018 at 05:59:05AM +0900, Junio C Hamano wrote:
> I do not offhand know if we want "show the current one only" option
> that is "command mode" sitting next to "list", "delete", "rename"
> etc., or "limit the operation to the one that is currently cheked
> out".  If we want the former, the name of the option must *NOT* be
> just "current".  Have a verb in its name to avoid it from getting
> mistaken as a botched attempt to do the latter.  Somethng like
> "--show-current", "--list-current", "--display-current", etc.

I had considered sending a patch with this option spelled "--show".
This is certainly a highly desired feature (hence my intent to send a
patch), and I think there's room for both a porcelain (this series) and
a plumbing (git rev-parse --abbrev-ref) version.

> Even if we were doing the latter (i.e. focused "this is only for
> listing/showing"), if we do not want to close the door to later
> extend the concept of "current" to the former (i.e. "--show-current"
> becomes a convenience synonym for "--list --current-only") we also
> need to think about what to do with the detached HEAD state.  When
> the concept of "current" is extended to become "usually an operation
> can work on multiple branches but we are limiting it to the current
> one", detached HEAD state is conceptually "not having any current
> branch".  We could fail the operation (i.e. you told me to distim
> the branch but there is no such branch) or make it a silent no-op
> (i.e. you told me to distim no branch, so nothing happened and there
> is no error).

What I would suggest is the same thing git status shows: "HEAD (detached
at...)".  I'll admit it isn't strictly a branch, but that's what most
people will want to see, I expect.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] doc: move git-rev-parse from porcelain to plumbing

2018-10-10 Thread Junio C Hamano
Daniels Umanovskis  writes:

> On 10/11/18 12:26 AM, Junio C Hamano wrote:
>> Among the remaining ones in the list, cherry and get-tar-commit-id
>> are probably better classified as plumbing.  I do not know why
>> cherry is marked for completion; perhaps some crazy people use that
>> on the command line?
>
> I think cherry could go either way, get-tar-commit-id is definitely
> plumbing. Would you like me to fix those two on the same patch then?

No, what you sent for rev-parse is already good.  A separate patch
that addresses other ones can be discussed as an orthogonal matter.
It even may deserve to make them two separate patches, as I
anticipate that some people would resist marking "cherry" as
plumbing, so that only one can be applied while dropping the other
as/if needed.

Thanks.



Re: [PATCH] gc: remove redundant check for gc_auto_threshold

2018-10-10 Thread Junio C Hamano
Brandon Casey  writes:

> ...  Again, I don't feel strongly about it, but I'm not
> sure this change actually improves the code.

Yeah, in the context of the current caller, this is a safe change
that does not break anybody and reduces the number of instructions
executed in this codepath.  A mistaken caller may be added in the
future that fails to check auto-threashold beforehand, but that
won't lead to anything bad like looping for a large number of times,
so as long as the API contract into this helper function is clear
that callers are responsible to check beforehand, it is still not
too bad.

So, I'd throw this into "Meh - I won't regret applying it, but it is
not the end of the world if I forget to apply it, either" pile.

I _think_ a change that actually improves the code would be to
restructure so that there is a helper that is responsible for
guestimating the number of loose objects, and another that uses the
helper to see if there are too many loose objects.  The latter is
the only one tha needs to know about auto-threashold.  But we are
not in immdiate need for such a clean-up, I guess, unless somebody
is actively looking into revamping how auto-gc works and doing a
preparatory clean-up.



Re: [PATCH] doc: fix a typo and clarify a sentence

2018-10-10 Thread Junio C Hamano
Mihir Mehta  writes:

> -Just in case if you are doing something exotic, it should be
> +Just in case you are doing something exotic, it should be

Thanks.  Somehow I didn't notice this change earlier, but it looks
good, too.

Will queue.


[PATCH] diff.c: pass sign_index to emit_line_ws_markup

2018-10-10 Thread Stefan Beller
Instead of passing the sign directly to emit_line_ws_markup, pass only the
index to lookup the sign in diff_options->output_indicators.

Signed-off-by: Stefan Beller 
---

I still have this patch laying around, it simplifies the diff code
a tiny bit.

Stefan

 diff.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index f0c7557b40..9e895f2191 100644
--- a/diff.c
+++ b/diff.c
@@ -1202,10 +1202,11 @@ static void dim_moved_lines(struct diff_options *o)
 static void emit_line_ws_markup(struct diff_options *o,
const char *set_sign, const char *set,
const char *reset,
-   char sign, const char *line, int len,
+   int sign_index, const char *line, int len,
unsigned ws_rule, int blank_at_eof)
 {
const char *ws = NULL;
+   int sign = o->output_indicators[sign_index];
 
if (o->ws_error_highlight & ws_rule) {
ws = diff_get_color_opt(o, DIFF_WHITESPACE);
@@ -1285,8 +1286,7 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
set = diff_get_color_opt(o, DIFF_FILE_OLD);
}
emit_line_ws_markup(o, set_sign, set, reset,
-   
o->output_indicators[OUTPUT_INDICATOR_CONTEXT],
-   line, len,
+   OUTPUT_INDICATOR_CONTEXT, line, len,
flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0);
break;
case DIFF_SYMBOL_PLUS:
@@ -1330,8 +1330,7 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
flags &= ~DIFF_SYMBOL_CONTENT_WS_MASK;
}
emit_line_ws_markup(o, set_sign, set, reset,
-   o->output_indicators[OUTPUT_INDICATOR_NEW],
-   line, len,
+   OUTPUT_INDICATOR_NEW, line, len,
flags & DIFF_SYMBOL_CONTENT_WS_MASK,
flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF);
break;
@@ -1375,8 +1374,7 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
set = diff_get_color_opt(o, DIFF_CONTEXT_DIM);
}
emit_line_ws_markup(o, set_sign, set, reset,
-   o->output_indicators[OUTPUT_INDICATOR_OLD],
-   line, len,
+   OUTPUT_INDICATOR_OLD, line, len,
flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
break;
case DIFF_SYMBOL_WORDS_PORCELAIN:
-- 
2.19.0



Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, SZEDER Gábor wrote:

> On Thu, Oct 04, 2018 at 11:09:58PM -0700, Junio C Hamano wrote:
>> SZEDER Gábor  writes:
>>
>> >> git-gc - Cleanup unnecessary files and optimize the local repository
>> >>
>> >> Creating these indexes like the commit-graph falls under "optimize the
>> >> local repository",
>> >
>> > But it doesn't fall under "cleanup unnecessary files", which the
>> > commit-graph file is, since, strictly speaking, it's purely
>> > optimization.
>>
>> I won't be actively engaged in this discussion soon, but I must say
>> that "git gc" doing "garbage collection" is merely an implementation
>> detail of optimizing the repository for further use.  And from that
>> point of view, what needs to be updated is the synopsis of the
>> git-gc doc.  It states "X and Y" above, but it actually is "Y by
>> doing X and other things".
>
> Well, then perhaps the name of the command should be updated, too, to
> better reflect what it actually does...

I don't disagree, but between "git gc" being a longstanding thing called
not just by us , but third parties expecting it to a be a general swiss
army knife for "make repo better" (so for a new tool they'd need to
update their code), and general name bikeshedding I think it's best if
we just proceed for the sake of argument with the assumption that none
of us find the name confusing in this context.

At least that's the discussion I'm interested in. I.e. whether it makes
conceptual / structural sense for this stuff to live in the same place /
function in the code. We can always argue about the name of the function
as a separate topic.

>> I understand your "by definition there is no garbage immediately
>> after clone" position, and also I would understand if you find it
>> (perhaps philosophically) disturbing that "git clone" may give users
>> a suboptimal repository that immediately needs optimizing [*1*].
>>
>> But that bridge was crossed long time ago ever since pack transfer
>> was invented.  The data source sends only the pack data stream, and
>> the receiving end is responsible for spending cycles to build .idx
>> file.  Theoretically, .pack should be all that is needed---you
>> should be able to locate any necessary object by parsing the .pack
>> file every time you open it, and .idx is mere optimization.  You can
>> think of the .midx and graph files the same way.
>
> I don't think this is a valid comparison, because, practically, Git
> just didn't work after I deleted all pack index files.  So while they
> can be easily (re)generated, they are essential to make pack files
> usable.  The commit-graph and .midx files, however, can be safely
> deleted, and everything keeps working as before.

For "things that would run in 20ms now run in 30 seconds" (actual
numbers on a repo I have) values of "keeps working".

So I think this line gets blurred somewhat. In practice if you're
expecting the graph to be there to run the sort of commands that most
benefit from it it's essential that it be generated, not some nice
optional extra.

> OTOH, this is an excellent comparison, and I do think of the .midx and
> graph files the same way as the pack index files.  During a clone, the
> pack index file isn't generated by running a separate 'git gc
> (--auto)', but by clone (or fetch-pack?) running 'git index-pack'.
> The way I see it that should be the case for these other files as well.
>
> And it is much simpler, shorter, and cleaner to either run 'git
> commit-graph ...' or even to call write_commit_graph_reachable()
> directly from cmd_clone(), than to bolt on another option and config
> variable on 'git gc' [1] to coax it into some kind of an "after clone"
> mode, that it shouldn't be doing in the first place.  At least for
> now, so when we'll eventually get as far ...
>
>> I would not be surprised by a future in which the initial index-pack
>> that is responsible for receiving the incoming pack stream and
>> storing that in .pack file(s) while creating corresponding .idx
>> file(s) becomes also responsible for building .midx and graph files
>> in the same pass, or at least smaller number of passes.  Once we
>> gain experience and confidence with these new auxiliary files, that
>> ought to happen naturally.  And at that point, we won't be having
>> this discussion---we'd all happily run index-pack to receive the
>> pack data, because that is pretty much the fundamental requirement
>> to make use of the data.
>
> ... that what you wrote here becomes a reality (and I fully agree that
> this is what we should ultimately aim for), then we won't have that
> option and config variable still lying around and requiring
> maintenance because of backwards compatibility.

We'll still have the use-case of wanting to turn on
gc.writeCommitGraph=true or equivalent and wanting previously-cloned
repositories to "catch up" and get a commit graph ASAP (but not do a
full repack).

This is why my patch tries to unify those two codepaths, i.e. so I can
turn this on 

Re: `--rebase-merges' still failing badly

2018-10-10 Thread Junio C Hamano
Michael Witten  writes:

> On Wed, 10 Oct 2018 14:43:46 +0900, Junio wrote:
>
>> We haven't seen  much complaints and breakages  reported against the
>> two big "rewrite in C" topics  around "rebase"; perhaps it is a good
>> time to merge  them to 'next' soonish  to cook them for  a few weeks
>> before moving them to 'master'?
>
> In my opinion, the `--rebase-merges' feature has been broken since the
> beginning, and the builtin version should  be fixed before it is moved
> ahead.

I'll omit the remainder of the message not because I disagree with
your suggested improvements to "rebase-merges" (that conversation
should happen primarily with Dscho), but because I need to react to
the above three lines.

If "rebase-merges" has been broken since the beginning, as long as
the "rewrite in C" topics around "rebase" do not make it even worse,
I do not think it is a good move to block the topics moving forward.
If the feature were so broken that it is not practically useful,
then people wouldn't be using it in the versions of Git before the
rewrite, so it won't harm anybody if the same feature in the rewritten
version is equally (or even more severely) broken, as long as the
other parts of the feature works at least equally well compared to
the older version.

We are not in the business of hostage taking.

What *should* block the rewrited version is a regression,
i.e. something that used to work well no longer works or works
differently in such a way that established workflows need to be
adjusted.

In any case, suggestions to improve "rebase-merges" is a very much
welcome thing to be discussed on the list, so thanks for raising the
issue.  What I wanted to say is that I do not think that is a reason
to keep "rewrite in C" waiting in 'pu'.



Re: [PATCH] doc: move git-rev-parse from porcelain to plumbing

2018-10-10 Thread Daniels Umanovskis
On 10/11/18 12:26 AM, Junio C Hamano wrote:
> Among the remaining ones in the list, cherry and get-tar-commit-id
> are probably better classified as plumbing.  I do not know why
> cherry is marked for completion; perhaps some crazy people use that
> on the command line?

I think cherry could go either way, get-tar-commit-id is definitely
plumbing. Would you like me to fix those two on the same patch then?



Re: [PATCH v6 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-10 Thread Stefan Beller
On Wed, Oct 10, 2018 at 11:56 AM Antonio Ospite  wrote:
>
> On Mon, 8 Oct 2018 15:19:00 -0700
> Stefan Beller  wrote:
>
> > > +test_expect_success 'not writing gitmodules config file when it is not 
> > > checked out' '
> > > +test_must_fail git -C super submodule--helper config 
> > > submodule.submodule.url newurl
> >
> > This only checks the exit code, do we also want to check for
> >
> > test_path_is_missing .gitmodules ?
> >
>
> OK, I agree, let's re-check also *after* we tried and failed to set
> a config value, just to be sure that the code does not get accidentally
> changed in the future to create the file. I'll add the check.
>
> > > +test_expect_success 'initialising submodule when the gitmodules config 
> > > is not checked out' '
> > > +   git -C super submodule init
> > > +'
> > > +
> > > +test_expect_success 'showing submodule summary when the gitmodules 
> > > config is not checked out' '
> > > +   git -C super submodule summary
> > > +'
> >
> > Same for these, is the exit code enough, or do we want to look at
> > specific things?
> >
>
> Except for the "summary" test which was not even exercising the
> config_from_gitmodule path,  checking exist status should be sufficient
> to verify that "submodule--helper config" does not fail, but we can
> surely do better.
>
> I will add checks to confirm that not only the commands exited without
> errors but they also achieved the desired effect, to validate the actual
> high-level use case advertised by the test file. This should be more
> future-proof.
>
> And I think I'll merge the summary and the update tests.
>
> > > +
> > > +test_expect_success 'updating submodule when the gitmodules config is 
> > > not checked out' '
> > > +   (cd submodule &&
> > > +   echo file2 >file2 &&
> > > +   git add file2 &&
> > > +   git commit -m "add file2 to submodule"
> > > +   ) &&
> > > +   git -C super submodule update
> >
> > git status would want to be clean afterwards?
>
> Mmh, this should have been "submodule update --remote" in the first
> place to have any effect, I'll take the chance and rewrite this test in
> a different way and also check the effect of the update operation, and
> the repository status.
>
> I'll be something like this:
>
> ORIG_SUBMODULE=$(git -C submodule rev-parse HEAD)
> ORIG_UPSTREAM=$(git -C upstream rev-parse HEAD)
> ORIG_SUPER=$(git -C super rev-parse HEAD)
>
> test_expect_success 're-updating submodule when the gitmodules config is not 
> checked out' '
> test_when_finished "git -C submodule reset --hard $ORIG_SUBMODULE;
> git -C upstream reset --hard $ORIG_UPSTREAM;
> git -C super reset --hard $ORIG_SUPER;
> git -C upstream submodule update --remote;
> git -C super pull;
> git -C super submodule update --remote" &&
> (cd submodule &&
> echo file2 >file2 &&
> git add file2 &&
> test_tick &&
> git commit -m "add file2 to submodule"
> ) &&
> (cd upstream &&
> git submodule update --remote &&
> git add submodule &&
> test_tick &&
> git commit -m "Update submodule"
> ) &&
> git -C super pull &&
> # The --for-status options reads the gitmdoules config

gitmodules

> git -C super submodule summary --for-status >actual &&
> cat >expect <<-\EOF &&
> * submodule 951c301...a939200 (1):

hardcoding hash values burdens the plan to migrate to another
hash function,

rev1=$(git -C submodule rev-parse --short HEAD^)
rev2=$(git -C submodule rev-parse --short HEAD)

and then use ${rev1}..${rev2} ?


>   < add file2 to submodule
>
> EOF
> test_cmp expect actual &&
> # Test that the update actually succeeds
> test_path_is_missing super/submodule/file2 &&
> git -C super submodule update &&
> test_cmp submodule/file2 super/submodule/file2 &&
> git -C super status --short >output &&
> test_must_be_empty output
> '
>
> Maybe a little overkill?

Wow, very thorough! You might call it overkill, but now that you have it...

> The "upstream" repo will be added in test 1 to better clarify the roles
> of the involved repositories.
>
> The commit ids should be stable because of test_tick, shouldn't they?

Yes, but see
Documentation/technical/hash-function-transition.txt
that a couple people are working on. Let's be nice to them. :-)

Stefan


Re: [PATCH] builtin/grep.c: remote superflous submodule code

2018-10-10 Thread Stefan Beller
On Tue, Oct 9, 2018 at 5:10 PM Jonathan Tan  wrote:
>
> > It claimed that grep would still need some explicit handling, but that is
> > not the call to repo_read_gitmodules (applying this patch on top of
> > ff6f1f564c4 still keep the test suite happy, specifically
> > t7814-grep-recurse-submodules, which contains a test
> > "grep history with moved submoules")
>
> Firstly, spelling of "remove" and "superfluous" in the commit title.
>
> I don't think the "grep history with moved submodules" test exercises
> much. That test only tests the superproject > submodule case, but we
> need a superproject > submodule > sub-submodule case, because what is
> being removed is a call to repo_read_gitmodules() on a repository
> ("struct repository submodule") that has a superproject ("struct
> repository *superproject"). In other words, we need a submodule that has
> its own gitmodules.

Right; we do have a test 'grep and nested submodules', which still passes.
I added another test, that would grep through nested submodules in
the history (not checked out), but that would not work on nested submodules
with or without this patch applied. (As the nested submodule is not checked
out, is_submodule_active(repo, path) would return false and we'd not dive
into the nested submodule.

I looked into ao/submodule-wo-gitmodules-checked-out, as that touches
this area of code as well and promises to allow working with submodules
when .gitmodules is not checked out, it doesn't help this use case, either.

That is (as Antonio diagnosed), due to get_oid not working with a repository
handle, yet.

> > The special handling is the call to gitmodules_config_oid which was added
> > already in 74ed43711f (grep: enable recurse-submodules to work on
> >  objects, 2016-12-16), but then was still named
> > gitmodules_config_sha1.
>
> If you're stating that gitmodules_config_oid() is where the .gitmodules
> file is lazily loaded, it doesn't seem to be that way, because that
> function works only on the_repository (at least on 'master' and 'next').

yes, that is why nested submodules do not work currently when they
are not in the working tree.

>
> > This is a resend of origin/sb/grep-submodule-cleanup,
> > and I think picking ff6f1f564c4 as the base for the series would
> > also be appropriate.
>
> Any particular reason why you suggest that commit (which is more than a
> year old)? It seems that basing this on 'master' is fine.

After more analysis, I think we'd want to wait for Antonios series to land
and then build on top of that, while also getting get_oid converted.

Regarding this patch, let's retract it for now and revisit it once we have
more submodule infrastructure working.

Thanks,
Stefan


Re: [PATCH v3 1/2] commit-graph write: add progress output

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, SZEDER Gábor wrote:

> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>
>> >>   for (i = 0; i < oids->nr; i++) {
>> >> + display_progress(progress, ++j);
>>
>> [...]
>>
>> > This display_progress() call, however, doesn't seem to be necessary.
>> > First, it counts all commits for a second time, resulting in the ~2x
>> > difference compared to the actual number of commits, and then causing
>> > my confusion.  Second, all what this loop is doing is setting a flag
>> > in commits that were already looked up and parsed in the above loops.
>> > IOW this loop is very fast, and the progress indicator jumps from
>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
>> > progress indicator at all.
>>
>> You're right, I tried this patch on top:
>
> [...]
>
>> And on a large repo with around 3 million commits the 3rd progress bar
>> doesn't kick in.
>>
>> But if I apply this on top:
>>
> [...]
>>
>> I.e. start the timer after 1/4 of a second instead of 1 second, I get
>> that progress bar.
>>
>> So I'm inclined to keep it. It just needs to be 4x the size before it's
>> noticeably hanging for 1 second.
>
> Just to clarify: are you worried about a 1 second hang in an approx. 12
> million commit repository?  If so, then I'm unconvinced, that's not
> even a blip on the radar, and the misleading numbers are far worse.

It's not a blip on the runtime, but the point of these progress bars in
general is so we don't have a UI where there's no UI differnce between
git hanging and just doing work in some tight loop in the background,
and even 1 second when you're watching something is noticeable if it
stalls.

Also it's 1 second on a server where I had 128G of RAM. I think even a
"trivial" flag change like this would very much change if e.g. the
system was under memory pressure or was swapping.

And as noted code like this tends to change over time, that loop might
get more expensive, so let's future proof by having all the loops over N
call the progress code.

When I wrote this the intent was just "report progress". So that it's
counting anything is just an implementation detail of how progress.c
works now.

This was the reference to Duy's patch, i.e. instead of spewing numbers
at the user here let's just render a spinner. Then we no longer need to
make judgement calls about which loop over N is expensive right now, and
which one isn't, and if any of them will result in reporting a 2N number
while the user might be more familiar with or expecting N.

>> That repo isn't all that big compared to what we've heard about out
>> there, and inner loops like this have a tendency to accumulate some more
>> code over time without a re-visit of why we weren't monitoring progress
>> there.
>>
>> But maybe we can fix the message. We say "Annotating commits in commit
>> grap", not "Counting" or whatever. I was trying to find something that
>> didn't imply that we were doing this once. One can annotate a thing more
>> than once, but maybe ther's a better way to explain this...
>
> IMO just remove it.


Re: [PATCH] gc: remove redundant check for gc_auto_threshold

2018-10-10 Thread Brandon Casey
On Wed, Oct 10, 2018 at 12:32 PM Ævar Arnfjörð Bjarmason
 wrote:
>
> Checking gc_auto_threshold in too_many_loose_objects() was added in
> 17815501a8 ("git-gc --auto: run "repack -A -d -l" as necessary.",
> 2007-09-17) when need_to_gc() itself was also reliant on
> gc_auto_pack_limit before its early return:
>
> gc_auto_threshold <= 0 && gc_auto_pack_limit <= 0
>
> When that check was simplified to just checking "gc_auto_threshold <=
> 0" in b14d255ba8 ("builtin-gc.c: allow disabling all auto-gc'ing by
> assigning 0 to gc.auto", 2008-03-19) this unreachable code should have
> been removed. We only call too_many_loose_objects() from within
> need_to_gc() itself, which will return if this condition holds, and in
> cmd_gc() which will return before ever getting to "auto_gc &&
> too_many_loose_objects()" if "auto_gc && !need_to_gc()" is true
> earlier in the function.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>
> I had this in my tree as part of some general gc cleanups I was
> working on, but since it's trivially considered as a stand-alone topic
> and unlikely to conflict with anything I or anyone else has planned
> I'm sending it as a one-off.

Hmm, yeah you're right that the check seems to be redundant for the
current uses of too_many_loose_objects().  I don't feel strongly about
it, but I think an argument could be made that it makes sense for
too_many_loose_object() and too_many_packs() to each inspect the
configuration variable that controls them and detect when they're
disabled, rather than having one of them require that the check be
done beforehand.  Again, I don't feel strongly about it, but I'm not
sure this change actually improves the code.

-Brandon


[PATCH] doc: fix a typo and clarify a sentence

2018-10-10 Thread Mihir Mehta
I noticed that git-merge-base was unlikely to actually be a git command,
and tried it in my shell. Seeing that it doesn't work, I cleaned up two
places in the docs where it appears.

Signed-off-by: Mihir Mehta 
---
 Documentation/git-diff.txt  | 4 ++--
 Documentation/howto/update-hook-example.txt | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index b180f1fa5..030f162f3 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -72,10 +72,10 @@ two blob objects, or changes between two files on disk.
This form is to view the changes on the branch containing
and up to the second , starting at a common ancestor
of both .  "git diff A\...B" is equivalent to
-   "git diff $(git-merge-base A B) B".  You can omit any one
+   "git diff $(git merge-base A B) B".  You can omit any one
of , which has the same effect as using HEAD instead.
 
-Just in case if you are doing something exotic, it should be
+Just in case you are doing something exotic, it should be
 noted that all of the  in the above description, except
 in the last two forms that use ".." notations, can be any
 .
diff --git a/Documentation/howto/update-hook-example.txt 
b/Documentation/howto/update-hook-example.txt
index a5193b1e5..89821ec74 100644
--- a/Documentation/howto/update-hook-example.txt
+++ b/Documentation/howto/update-hook-example.txt
@@ -80,7 +80,7 @@ case "$1" in
   info "The branch '$1' is new..."
 else
   # updating -- make sure it is a fast-forward
-  mb=$(git-merge-base "$2" "$3")
+  mb=$(git merge-base "$2" "$3")
   case "$mb,$2" in
 "$2,$mb") info "Update is fast-forward" ;;
*)noff=y; info "This is not a fast-forward update.";;
-- 
2.19.0



Re: [PATCH] doc: move git-rev-parse from porcelain to plumbing

2018-10-10 Thread Junio C Hamano
Daniels Umanovskis  writes:

> git-rev-parse mostly seems like plumbing, and is more usd in
> scripts than in regular use. Online it's often mentioned as
> a plumbing command. Nonetheless it's listed under porcelain
> interrogators in `man git`. It seems appropriate to formally
> move git-rev-parse to plumbing interrogators.

Correct.  "ancillary" category ended up with full of Porcelain, it
seems, but there still are plumbing commands there, and this is a
prime example that should not be mixed up with Porcelain commands.

Among the remaining ones in the list, cherry and get-tar-commit-id
are probably better classified as plumbing.  I do not know why
cherry is marked for completion; perhaps some crazy people use that
on the command line?


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> Which is what I'm doing by running "gc --auto" across a set of servers
> and looking at the exit code. If it's been failing I get an error, if
> there's no need to gc nothing happens, and if it hasn't been failing and
> it just so happens that it's time to GC then fine, now was as good a
> time as any.

For this, a simple "git gc --detached-status" would work.  It sounds
like the bonus gc --auto run was a side effect instead of being a
requirement.

> So if we assume that for the sake of argument there's no point in a
> --detached-status either. My only reason for ever caring about that
> status is when I run "gc --auto" and it says it can't fork() itself so
> it fails. Since I'm using "gc --auto" I have zero reason to even ask
> that question unless I'm OK with kicking off a gc run as a side-effect,
> so why split up the two? It just introduces a race condition for no
> benefit.

What I am trying to do is design an interface which is simple to
explain in the manual.  The existing "git gc --auto" interface since
detaching was introduced is super confusing to me, so I'm going
through the thought exercise of "If we were starting over, what would
we build instead?"

Part of the answer to that question might include a

--report-from-the-last-time-you-detached

option.  I'm still failing to come up of a case where the answer to
that question would include a

--report-from-the-last-time-you-detached-and-if-it-went-okay\
-then-run-another-detached-gc

option.

In other words, I think our disconnect is that you are describing
things in terms of "I have been happy with the existing git gc --auto
detaching behavior, so how can we maintain something as close as
possible to that"?  And I am trying to describe things in terms of
"What is the simplest, most maintainable, and easiest to explain way
to keep Ævar's servers working well"?

Jonathan


Re: [PATCH v3 1/2] commit-graph write: add progress output

2018-10-10 Thread SZEDER Gábor
On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Oct 10 2018, SZEDER Gábor wrote:

> >>for (i = 0; i < oids->nr; i++) {
> >> +  display_progress(progress, ++j);
> 
> [...]
> 
> > This display_progress() call, however, doesn't seem to be necessary.
> > First, it counts all commits for a second time, resulting in the ~2x
> > difference compared to the actual number of commits, and then causing
> > my confusion.  Second, all what this loop is doing is setting a flag
> > in commits that were already looked up and parsed in the above loops.
> > IOW this loop is very fast, and the progress indicator jumps from
> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> > progress indicator at all.
> 
> You're right, I tried this patch on top:

[...] 

> And on a large repo with around 3 million commits the 3rd progress bar
> doesn't kick in.
> 
> But if I apply this on top:
> 
[...]
> 
> I.e. start the timer after 1/4 of a second instead of 1 second, I get
> that progress bar.
> 
> So I'm inclined to keep it. It just needs to be 4x the size before it's
> noticeably hanging for 1 second.

Just to clarify: are you worried about a 1 second hang in an approx. 12
million commit repository?  If so, then I'm unconvinced, that's not
even a blip on the radar, and the misleading numbers are far worse.

> That repo isn't all that big compared to what we've heard about out
> there, and inner loops like this have a tendency to accumulate some more
> code over time without a re-visit of why we weren't monitoring progress
> there.
> 
> But maybe we can fix the message. We say "Annotating commits in commit
> grap", not "Counting" or whatever. I was trying to find something that
> didn't imply that we were doing this once. One can annotate a thing more
> than once, but maybe ther's a better way to explain this...

IMO just remove it.



Re: none

2018-10-10 Thread Junio C Hamano
Mihir Mehta  writes:

> Thanks, Junio. Instead of removing that part of the patch, I opted to
> expand it to make it a little clearer (in my opinion) than it was
> before. Let me know if this works.

I am mildly negative on that change.  "Omitting both would give an
empty diff" would be understandable to anybody who understands that
an omitted end of dot-dot is substituted with HEAD *and* thinks what
range HEAD..HEAD means, so it is just an additional noise to them,
and to those who do not want to waste time on thinking, it is a
statement that reads as if "it will be an error" without saying why
it is an error.  So overall, it seems, at least to me, that the
additional text adds negative value.

So, I dunno.


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Jonathan Nieder wrote:

> Junio C Hamano wrote:
>> Jonathan Nieder  writes:
>
>>> Perhaps this reporting could also print the message from a previous
>>> run, so you could write:
>>>
>>> git gc --detached-status || exit
>>> git gc --auto; # perhaps also passing --detach
>>>
>>> (Names still open for bikeshedding.)
>>
>> When the command is given --detached-exit-code/status option, what
>> does it do?  Does it perform the "did an earlier run left gc.log?"
>> and report the result and nothing else?  In other words, is it a
>> pure replacement for "test -e .git/gc.log"?
>
> My intent was the latter.  In other words, in the idiom
>
>   do_something_async &
>   ... a lot of time passes ...
>   wait
>
> it is something like the replacement for "wait".
>
> More precisely,
>
>   git gc --detached-status || exit
>
> would mean something like
>
>   if test -e .git/gc.log  # Error from previous gc --detach?
>   then
>   cat >&2 .git/gc.log # Report the error.
>   exit 1
>   fi
>
>>  Or does it do some of
>> the "auto-gc" prep logic like guestimating loose object count and
>> have that also in its exit status (e.g. "from the gc.log left
>> behind, we know that we failed to reduce loose object count down
>> sufficiently after finding there are more than 6700 earlier, but now
>> we do not have that many loose object, so there is nothing to
>> complain about the presence of gc.log")?
>
> Depending on the use case, a user might want to avoid losing
> information about the results of a previous "git gc --detach" run,
> even if they no longer apply.  For example, a user might want to
> collect the error message for monitoring or later log analysis, to
> track down intermittent gc errors that go away on their own.
>
> A separate possible use case might be a
>
>   git gc --needs-auto-gc
>
> command that detects whether an auto gc is needed.  With that, a
> caller that only wants to learn about errors if auto gc is needed
> could run
>
>   if git gc --needs-auto-gc
>   then
>   git gc --detached-status || exit
>   fi
>
>> I am bad at naming myself, but worse at guessing what others meant
>> with a new thing that was given a new name whose name is fuzzy,
>> so... ;-)
>
> No problem.  I'm mostly trying to tease out more details about the use
> case.

Likewise, so don't take the following as an assertion of fact, but more
of a fact-finding mission:

We could add something like this --detached-status / --needs-auto-gc,
but I don't need it, and frankly I can't think of a reason for why
anyone would want to use these.

The entire point of having gc --auto in the first place is that you
don't care when exactly GC happens, you're happy with whenever git
decides it's needed.

So why would anyone need a --needs-auto-gc? If your criteria for doing
GC exactly matches that of gc --auto then ... you just run gc --auto, if
it isn't (e.g. if you're using Microsoft's Windows repo) you're not
using gc --auto in the first place, and neither --needs-auto-gc nor
--auto is useful to you.

So maybe I'm missing something here, but a --needs-auto-gc just seems
like a gratuitous exposure of an internal implementation detail whose
only actionable result is doing what we're doing with "gc --auto" now,
i.e. just run gc.

Which is what I'm doing by running "gc --auto" across a set of servers
and looking at the exit code. If it's been failing I get an error, if
there's no need to gc nothing happens, and if it hasn't been failing and
it just so happens that it's time to GC then fine, now was as good a
time as any.

So if we assume that for the sake of argument there's no point in a
--detached-status either. My only reason for ever caring about that
status is when I run "gc --auto" and it says it can't fork() itself so
it fails. Since I'm using "gc --auto" I have zero reason to even ask
that question unless I'm OK with kicking off a gc run as a side-effect,
so why split up the two? It just introduces a race condition for no
benefit.


Re: [PATCH v5 17/23] userdiff.c: remove implicit dependency on the_index

2018-10-10 Thread Junio C Hamano
Jeff King  writes:

> I get why you're doing it: your topic here only cares about removing
> index dependencies, so you did the minimal thing to move that forward.
>
> But if you think about what this function is doing, it is quite clearly
> dependent on the whole repository, since the userdiff config we're
> looking up may come from repo config.

In the case of userdiff that is pretty much limited to read-only
operation, I fully agree, but in more general cases, we would need
to pass both the repository and an in-core index separately, I would
say.  Imagine doing a partial commit, where we construct a separate
istate that is not the "repo's index" and use that to write out a
tree object to be wrapped in a new commit, and update the current
branch ref.


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-10 Thread SZEDER Gábor
On Thu, Oct 04, 2018 at 11:09:58PM -0700, Junio C Hamano wrote:
> SZEDER Gábor  writes:
> 
> >> git-gc - Cleanup unnecessary files and optimize the local repository
> >> 
> >> Creating these indexes like the commit-graph falls under "optimize the
> >> local repository",
> >
> > But it doesn't fall under "cleanup unnecessary files", which the
> > commit-graph file is, since, strictly speaking, it's purely
> > optimization.
> 
> I won't be actively engaged in this discussion soon, but I must say
> that "git gc" doing "garbage collection" is merely an implementation
> detail of optimizing the repository for further use.  And from that
> point of view, what needs to be updated is the synopsis of the
> git-gc doc.  It states "X and Y" above, but it actually is "Y by
> doing X and other things".

Well, then perhaps the name of the command should be updated, too, to
better reflect what it actually does...

> I understand your "by definition there is no garbage immediately
> after clone" position, and also I would understand if you find it
> (perhaps philosophically) disturbing that "git clone" may give users
> a suboptimal repository that immediately needs optimizing [*1*].
> 
> But that bridge was crossed long time ago ever since pack transfer
> was invented.  The data source sends only the pack data stream, and
> the receiving end is responsible for spending cycles to build .idx
> file.  Theoretically, .pack should be all that is needed---you
> should be able to locate any necessary object by parsing the .pack
> file every time you open it, and .idx is mere optimization.  You can
> think of the .midx and graph files the same way.

I don't think this is a valid comparison, because, practically, Git
just didn't work after I deleted all pack index files.  So while they
can be easily (re)generated, they are essential to make pack files
usable.  The commit-graph and .midx files, however, can be safely
deleted, and everything keeps working as before.

OTOH, this is an excellent comparison, and I do think of the .midx and
graph files the same way as the pack index files.  During a clone, the
pack index file isn't generated by running a separate 'git gc
(--auto)', but by clone (or fetch-pack?) running 'git index-pack'.
The way I see it that should be the case for these other files as well.

And it is much simpler, shorter, and cleaner to either run 'git
commit-graph ...' or even to call write_commit_graph_reachable()
directly from cmd_clone(), than to bolt on another option and config
variable on 'git gc' [1] to coax it into some kind of an "after clone"
mode, that it shouldn't be doing in the first place.  At least for
now, so when we'll eventually get as far ...

> I would not be surprised by a future in which the initial index-pack
> that is responsible for receiving the incoming pack stream and
> storing that in .pack file(s) while creating corresponding .idx
> file(s) becomes also responsible for building .midx and graph files
> in the same pass, or at least smaller number of passes.  Once we
> gain experience and confidence with these new auxiliary files, that
> ought to happen naturally.  And at that point, we won't be having
> this discussion---we'd all happily run index-pack to receive the
> pack data, because that is pretty much the fundamental requirement
> to make use of the data.

... that what you wrote here becomes a reality (and I fully agree that
this is what we should ultimately aim for), then we won't have that
option and config variable still lying around and requiring
maintenance because of backwards compatibility.

1 - https://public-inbox.org/git/87in2hgzin@evledraar.gmail.com/

> [Footnote]
> 
> *1* Even without considering these recent invention of auxiliary
> files, cloning from a sloppily packed server whose primary focus
> is to avoid spending cycles by not computing better deltas will
> give the cloner a suboptimal repository.  If we truly want to
> have an optimized repository ready to be used after cloning, we
> should run an equivalent of "repack -a -d -f" immediately after
> "git clone".

I noticed a few times that I got surprisingly large packs from GitHub,
e.g. there is over 70% size difference between --single-branch cloning
v2.19.0 from GitHub and from my local clone or from kernel.org (~95MB
vs. ~55MB vs ~52MB).  After running 'git repack -a -d -f' they all end
up at ~65MB, which is a nice size reduction for the clone from GitHub,
but the others just gained 10-13 more MBs.



Re: [PATCH v3 1/2] commit-graph write: add progress output

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, SZEDER Gábor wrote:

> On Mon, Sep 17, 2018 at 03:33:35PM +, Ævar Arnfjörð Bjarmason wrote:
>> $ git -c gc.writeCommitGraph=true gc
>> [...]
>> Annotating commits in commit graph: 1565573, done.
>> Computing commit graph generation numbers: 100% (782484/782484), done.
>
> While poking around 'commit-graph.c' in my Bloom filter experiment, I
> saw similar numbers like above, and was confused by the much higher
> than expected number of annotated commits.  It's about twice as much
> as the number of commits in the repository, or the number shown on the
> very next line.
>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 8a1bec7b8a..2c5d996194 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> -static void close_reachable(struct packed_oid_list *oids)
>> +static void close_reachable(struct packed_oid_list *oids, int 
>> report_progress)
>>  {
>>  int i;
>>  struct commit *commit;
>> +struct progress *progress = NULL;
>> +int j = 0;
>>
>> +if (report_progress)
>> +progress = start_delayed_progress(
>> +_("Annotating commits in commit graph"), 0);
>>  for (i = 0; i < oids->nr; i++) {
>> +display_progress(progress, ++j);
>>  commit = lookup_commit(the_repository, >list[i]);
>>  if (commit)
>>  commit->object.flags |= UNINTERESTING;
>> @@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
>>   * closure.
>>   */
>>  for (i = 0; i < oids->nr; i++) {
>> +display_progress(progress, ++j);
>>  commit = lookup_commit(the_repository, >list[i]);
>>
>>  if (commit && !parse_commit(commit))
>> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list 
>> *oids)
>>  }
>
> The above loops have already counted all the commits, and, more
> importantly, did all the hard work that takes time and makes the
> progress indicator useful.
>
>>  for (i = 0; i < oids->nr; i++) {
>> +display_progress(progress, ++j);

[...]

> This display_progress() call, however, doesn't seem to be necessary.
> First, it counts all commits for a second time, resulting in the ~2x
> difference compared to the actual number of commits, and then causing
> my confusion.  Second, all what this loop is doing is setting a flag
> in commits that were already looked up and parsed in the above loops.
> IOW this loop is very fast, and the progress indicator jumps from
> ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> progress indicator at all.

You're right, I tried this patch on top:

diff --git a/commit-graph.c b/commit-graph.c
index a686758603..cccd83de72 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -655,12 +655,16 @@ static void close_reachable(struct packed_oid_list 
*oids, int report_progress)
if (commit)
commit->object.flags |= UNINTERESTING;
}
+   stop_progress(); j = 0;

/*
 * As this loop runs, oids->nr may grow, but not more
 * than the number of missing commits in the reachable
 * closure.
 */
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Annotating commits in commit graph 2"), 0);
for (i = 0; i < oids->nr; i++) {
display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
@@ -668,7 +672,11 @@ static void close_reachable(struct packed_oid_list 
*oids, int report_progress)
if (commit && !parse_commit(commit))
add_missing_parents(oids, commit);
}
+   stop_progress(); j = 0;

+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Annotating commits in commit graph 3"), 0);
for (i = 0; i < oids->nr; i++) {
display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);

And on a large repo with around 3 million commits the 3rd progress bar
doesn't kick in.

But if I apply this on top:

diff --git a/progress.c b/progress.c
index 5a99c9fbf0..89cc705bf7 100644
--- a/progress.c
+++ b/progress.c
@@ -58,8 +58,8 @@ static void set_progress_signal(void)
sa.sa_flags = SA_RESTART;
sigaction(SIGALRM, , NULL);

-   v.it_interval.tv_sec = 1;
-   v.it_interval.tv_usec = 0;
+   v.it_interval.tv_sec = 0;
+   v.it_interval.tv_usec = 25;
v.it_value = v.it_interval;
setitimer(ITIMER_REAL, , NULL);
 }

I.e. start the timer after 1/4 of a second instead of 1 second, I get
that progress bar.

So I'm inclined to keep it. It just needs to be 4x the size before it's
noticeably hanging for 1 second.

That repo isn't all that big compared to what we've heard about out
there, and inner loops like this have a 

Re: [PATCH 2/2] push: add an advice on unqualified push

2018-10-10 Thread Junio C Hamano
Jeff King  writes:

>> Fix both of those, now the message will look like this instead:
>> 
>> $ ./git-push avar v2.19.0^{commit}:newbranch -n
>> error: unable to push to unqualified destination: newbranch
>> hint: The destination refspec neither matches an existing
>> hint: ref on the remote nor begins with refs/, and we are
>> hint: unable to guess a prefix based on the source ref.
>> hint:
>> hint: The  part of the refspec is a commit object.
>> hint: Did you mean to create a new branch by pushing to
>> hint: 'v2.19.0^{commit}:refs/heads/newbranch'?
>> error: failed to push some refs to 'g...@github.com:avar/git.git'
>> 
>> When trying to push a tag, tree or a blob we suggest that perhaps the
>> user meant to push them to refs/tags/ instead.
>
> This is much better, and I love the customized behavior based on the
> object type.
>
> I wonder if we could reword the first paragraph to be a little less
> confusing, and spell out what we tried already. E.g., something like:
>
>   The destination you provided is not a full refname (i.e., starting
>   with "ref"). Git tried to guess what you meant by:

s|ref|refs/|; I fully agree that "unqualified destination" was a
poor way to communicate the failure to those who would likely hit
this error path, because somebody who can ell what's qualified and
what's not would not be triggering the error in the first place.

> - looking for a matching branch or tag on the remote side
>
> - looking at the refname of the local source
>
>   but neither worked.
>
>   The  part of the refspec is a commit object.
>   Did you mean...

Looks great.

> I'm not sure about saying "branch or tag" in the first bullet. It's
> friendlier to most users, but less technically correct (if you said
> "notes/foo", I believe we'd match an existing "refs/notes/foo", because
> it's really just using the normal lookup rules).

An alternative may be "looking for a ref that matches %s on the
remote side".  I am no longer a total newbie, so I cannot tell how
well that message would help one to connect notes/foo one just typed
with refs/notes/foo that potentially exists on the remote side.

> Also, as an aside, I wonder if we should allow "heads/foo" to work as
> "refs/heads/foo" (even when no such ref already exists). But that is
> totally orthogonal to changing the message.

I am neutral on this point but agree that it is better done outside
this patch.


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> Perhaps this reporting could also print the message from a previous
>> run, so you could write:
>>
>>  git gc --detached-status || exit
>>  git gc --auto; # perhaps also passing --detach
>>
>> (Names still open for bikeshedding.)
>
> When the command is given --detached-exit-code/status option, what
> does it do?  Does it perform the "did an earlier run left gc.log?"
> and report the result and nothing else?  In other words, is it a
> pure replacement for "test -e .git/gc.log"?

My intent was the latter.  In other words, in the idiom

do_something_async &
... a lot of time passes ...
wait

it is something like the replacement for "wait".

More precisely,

git gc --detached-status || exit

would mean something like

if test -e .git/gc.log  # Error from previous gc --detach?
then
cat >&2 .git/gc.log # Report the error.
exit 1
fi

>  Or does it do some of
> the "auto-gc" prep logic like guestimating loose object count and
> have that also in its exit status (e.g. "from the gc.log left
> behind, we know that we failed to reduce loose object count down
> sufficiently after finding there are more than 6700 earlier, but now
> we do not have that many loose object, so there is nothing to
> complain about the presence of gc.log")?

Depending on the use case, a user might want to avoid losing
information about the results of a previous "git gc --detach" run,
even if they no longer apply.  For example, a user might want to
collect the error message for monitoring or later log analysis, to
track down intermittent gc errors that go away on their own.

A separate possible use case might be a

git gc --needs-auto-gc

command that detects whether an auto gc is needed.  With that, a
caller that only wants to learn about errors if auto gc is needed
could run

if git gc --needs-auto-gc
then
git gc --detached-status || exit
fi

> I am bad at naming myself, but worse at guessing what others meant
> with a new thing that was given a new name whose name is fuzzy,
> so... ;-)

No problem.  I'm mostly trying to tease out more details about the use
case.

Thanks,
Jonathan


[PATCH v5 9/9] builtin/fetch: check for submodule updates for non branch fetches

2018-10-10 Thread Stefan Beller
Gerrit, the code review tool, has a different workflow than our mailing
list based approach. Usually users upload changes to a Gerrit server and
continuous integration and testing happens by bots. Sometimes however a
user wants to checkout a change locally and look at it locally. For this
use case, Gerrit offers a command line snippet to copy and paste to your
terminal, which looks like

  git fetch https:///gerrit refs/changes/ &&
  git checkout FETCH_HEAD

For Gerrit changes that contain changing submodule gitlinks, it would be
easy to extend both the fetch and checkout with the '--recurse-submodules'
flag, such that this command line snippet would produce the state of a
change locally.

However the functionality added in the previous patch, which would
ensure that we fetch the objects in the submodule that the gitlink pointed
at, only works for remote tracking branches so far, not for FETCH_HEAD.

Make sure that fetching a superproject to its FETCH_HEAD, also respects
the existence checks for objects in the submodule recursion.

Signed-off-by: Stefan Beller 
---
 builtin/fetch.c | 5 -
 t/t5526-fetch-submodules.sh | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3b03ad3bd..f2d9e548bf 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -894,11 +894,14 @@ static int store_updated_refs(const char *raw_url, const 
char *remote_name,
rc |= update_local_ref(ref, what, rm, ,
   summary_width);
free(ref);
-   } else
+   } else {
+   check_for_new_submodule_commits(>old_oid);
format_display(, '*',
   *kind ? kind : "branch", NULL,
   *what ? what : "HEAD",
   "FETCH_HEAD", summary_width);
+   }
+
if (note.len) {
if (verbosity >= 0 && !shown_url) {
fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index af12c50e7d..a509eabb04 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they 
are not reachable" '
git update-ref refs/changes/2 $D &&
(
cd downstream &&
-   git fetch --recurse-submodules --recurse-submodules-default 
on-demand origin refs/changes/2:refs/heads/my_branch &&
+   git fetch --recurse-submodules origin refs/changes/2 &&
git -C submodule cat-file -t $C &&
git checkout --recurse-submodules FETCH_HEAD
)
-- 
2.19.0



[PATCH v5 3/9] submodule.c: sort changed_submodule_names before searching it

2018-10-10 Thread Stefan Beller
We can string_list_insert() to maintain sorted-ness of the
list as we find new items, or we can string_list_append() to
build an unsorted list and sort it at the end just once.

As we do not rely on the sortedness while building the
list, we pick the "append and sort at the end" as it
has better worst case execution times.

Signed-off-by: Stefan Beller 
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 0de9e2800a..22c64bd855 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp,
case RECURSE_SUBMODULES_DEFAULT:
case RECURSE_SUBMODULES_ON_DEMAND:
if (!submodule ||
-   !unsorted_string_list_lookup(
+   !string_list_lookup(
_submodule_names,
submodule->name))
continue;
@@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r,
/* default value, "--submodule-prefix" and its value are added later */
 
calculate_changed_submodule_paths();
+   string_list_sort(_submodule_names);
run_processes_parallel(max_parallel_jobs,
   get_next_submodule,
   fetch_start_failure,
-- 
2.19.0



[PATCH v5 2/9] submodule.c: fix indentation

2018-10-10 Thread Stefan Beller
The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller 
---
 submodule.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index b53cb6e9c4..0de9e2800a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
if (!submodule) {
const char *name = default_name_or_path(ce->name);
if (name) {
-   default_submodule.path = default_submodule.name 
= name;
+   default_submodule.path = name;
+   default_submodule.name = name;
submodule = _submodule;
}
}
@@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
default:
case RECURSE_SUBMODULES_DEFAULT:
case RECURSE_SUBMODULES_ON_DEMAND:
-   if (!submodule || 
!unsorted_string_list_lookup(_submodule_names,
-submodule->name))
+   if (!submodule ||
+   !unsorted_string_list_lookup(
+   _submodule_names,
+   submodule->name))
continue;
default_argv = "on-demand";
break;
-- 
2.19.0



[PATCH v5 1/9] sha1-array: provide oid_array_filter

2018-10-10 Thread Stefan Beller
Helped-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
---
 Documentation/technical/api-oid-array.txt |  5 +
 sha1-array.c  | 17 +
 sha1-array.h  |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/technical/api-oid-array.txt 
b/Documentation/technical/api-oid-array.txt
index 9febfb1d52..c97428c2c3 100644
--- a/Documentation/technical/api-oid-array.txt
+++ b/Documentation/technical/api-oid-array.txt
@@ -48,6 +48,11 @@ Functions
is not sorted, this function has the side effect of sorting
it.
 
+`oid_array_filter`::
+   Apply the callback function `want` to each entry in the array,
+   retaining only the entries for which the function returns true.
+   Preserve the order of the entries that are retained.
+
 Examples
 
 
diff --git a/sha1-array.c b/sha1-array.c
index b94e0ec0f5..d922e94e3f 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
}
return 0;
 }
+
+void oid_array_filter(struct oid_array *array,
+ for_each_oid_fn want,
+ void *cb_data)
+{
+   unsigned nr = array->nr, src, dst;
+   struct object_id *oids = array->oid;
+
+   for (src = dst = 0; src < nr; src++) {
+   if (want([src], cb_data)) {
+   if (src != dst)
+   oidcpy([dst], [src]);
+   dst++;
+   }
+   }
+   array->nr = dst;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf95017..55d016c4bf 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
 int oid_array_for_each_unique(struct oid_array *array,
  for_each_oid_fn fn,
  void *data);
+void oid_array_filter(struct oid_array *array,
+ for_each_oid_fn want,
+ void *cbdata);
 
 #endif /* SHA1_ARRAY_H */
-- 
2.19.0



[PATCH v5 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct

2018-10-10 Thread Stefan Beller
The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller 
---
 submodule.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 22c64bd855..17103379ba 100644
--- a/submodule.c
+++ b/submodule.c
@@ -25,7 +25,7 @@
 #include "commit-reach.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
+
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1110,7 +1110,22 @@ void check_for_new_submodule_commits(struct object_id 
*oid)
oid_array_append(_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(void)
+struct submodule_parallel_fetch {
+   int count;
+   struct argv_array args;
+   struct repository *r;
+   const char *prefix;
+   int command_line_option;
+   int default_option;
+   int quiet;
+   int result;
+
+   struct string_list changed_submodule_names;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, 
STRING_LIST_INIT_DUP }
+
+static void calculate_changed_submodule_paths(
+   struct submodule_parallel_fetch *spf)
 {
struct argv_array argv = ARGV_ARRAY_INIT;
struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1148,7 +1163,8 @@ static void calculate_changed_submodule_paths(void)
continue;
 
if (!submodule_has_commits(path, commits))
-   string_list_append(_submodule_names, 
name->string);
+   string_list_append(>changed_submodule_names,
+  name->string);
}
 
free_submodules_oids(_submodules);
@@ -1185,18 +1201,6 @@ int submodule_touches_in_range(struct object_id 
*excl_oid,
return ret;
 }
 
-struct submodule_parallel_fetch {
-   int count;
-   struct argv_array args;
-   struct repository *r;
-   const char *prefix;
-   int command_line_option;
-   int default_option;
-   int quiet;
-   int result;
-};
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
-
 static int get_fetch_recurse_config(const struct submodule *submodule,
struct submodule_parallel_fetch *spf)
 {
@@ -1257,7 +1261,7 @@ static int get_next_submodule(struct child_process *cp,
case RECURSE_SUBMODULES_ON_DEMAND:
if (!submodule ||
!string_list_lookup(
-   _submodule_names,
+   >changed_submodule_names,
submodule->name))
continue;
default_argv = "on-demand";
@@ -1349,8 +1353,8 @@ int fetch_populated_submodules(struct repository *r,
argv_array_push(, "--recurse-submodules-default");
/* default value, "--submodule-prefix" and its value are added later */
 
-   calculate_changed_submodule_paths();
-   string_list_sort(_submodule_names);
+   calculate_changed_submodule_paths();
+   string_list_sort(_submodule_names);
run_processes_parallel(max_parallel_jobs,
   get_next_submodule,
   fetch_start_failure,
@@ -1359,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
 
argv_array_clear();
 out:
-   string_list_clear(_submodule_names, 1);
+   string_list_clear(_submodule_names, 1);
return spf.result;
 }
 
-- 
2.19.0



[PATCH v5 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-10 Thread Stefan Beller
This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that "git fetch" of the submodule
actually doesn't need to be run in the submodules worktree. So let's run
it in its git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

This patch leaks the cp->dir in get_next_submodule, as any further
callback in run_processes_parallel doesn't have access to the child
process any more. In an early iteration of this patch, the function
get_submodule_repo_for directly returned the string containing the
git directory, which would be a better design choice for this patch.

However the next patch both fixes the memory leak of cp->dir and also has
a use case for using the full repository handle of the submodule, so
it makes sense to introduce the get_submodule_repo_for here already.

Signed-off-by: Stefan Beller 
---
 submodule.c | 51 +++--
 t/t5526-fetch-submodules.sh |  7 -
 2 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/submodule.c b/submodule.c
index dd478ed70b..3f791f2277 100644
--- a/submodule.c
+++ b/submodule.c
@@ -481,6 +481,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1227,6 +1233,29 @@ static int get_fetch_recurse_config(const struct 
submodule *submodule,
return spf->default_option;
 }
 
+static struct repository *get_submodule_repo_for(struct repository *r,
+const struct submodule *sub)
+{
+   struct repository *ret = xmalloc(sizeof(*ret));
+
+   if (repo_submodule_init(ret, r, sub)) {
+   /*
+* No entry in .gitmodules? Technically not a submodule,
+* but historically we supported repositories that happen to be
+* in-place where a gitlink is. Keep supporting them.
+*/
+   struct strbuf gitdir = STRBUF_INIT;
+   strbuf_repo_worktree_path(, r, "%s/.git", sub->path);
+   if (repo_init(ret, gitdir.buf, NULL)) {
+   strbuf_release();
+   return NULL;
+   }
+   strbuf_release();
+   }
+
+   return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1234,12 +1263,11 @@ static int get_next_submodule(struct child_process *cp,
struct submodule_parallel_fetch *spf = data;
 
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-   struct strbuf submodule_path = STRBUF_INIT;
-   struct strbuf submodule_git_dir = STRBUF_INIT;
struct strbuf submodule_prefix = STRBUF_INIT;
const struct cache_entry *ce = spf->r->index->cache[spf->count];
-   const char *git_dir, *default_argv;
+   const char *default_argv;
const struct submodule *submodule;
+   struct repository *repo;
struct submodule default_submodule = SUBMODULE_INIT;
 
if (!S_ISGITLINK(ce->ce_mode))
@@ -1274,16 +1302,12 @@ static int get_next_submodule(struct child_process *cp,
continue;
}
 
-   strbuf_repo_worktree_path(_path, spf->r, "%s", 
ce->name);
-   strbuf_addf(_git_dir, "%s/.git", submodule_path.buf);
strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name);
-   git_dir = read_gitfile(submodule_git_dir.buf);
-   if (!git_dir)
-   git_dir = submodule_git_dir.buf;
-   if (is_directory(git_dir)) {
+   repo = get_submodule_repo_for(spf->r, submodule);
+   if (repo) {
child_process_init(cp);
-   cp->dir = strbuf_detach(_path, NULL);
-   prepare_submodule_repo_env(>env_array);
+   prepare_submodule_repo_env_in_gitdir(>env_array);
+   cp->dir = xstrdup(repo->gitdir);
cp->git_cmd = 1;
if (!spf->quiet)
strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1293,10 +1317,11 @@ static int get_next_submodule(struct child_process *cp,
argv_array_push(>args, default_argv);
argv_array_push(>args, "--submodule-prefix");

[PATCH v5 8/9] fetch: retry fetching submodules if needed objects were not fetched

2018-10-10 Thread Stefan Beller
Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C " (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/).

Retry fetching a submodule by object name if the object id that the
superproject points to, cannot be found.

This retrying does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step. A later patch will fix this.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

Signed-off-by: Stefan Beller 
---
 builtin/fetch.c |   9 +-
 submodule.c | 185 ++--
 t/t5526-fetch-submodules.sh |  16 
 3 files changed, 177 insertions(+), 33 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0696abfc2a..e3b03ad3bd 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -707,8 +707,7 @@ static int update_local_ref(struct ref *ref,
what = _("[new ref]");
}
 
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF)
check_for_new_submodule_commits(>new_oid);
r = s_update_ref(msg, ref, 0);
format_display(display, r ? '!' : '*', what,
@@ -723,8 +722,7 @@ static int update_local_ref(struct ref *ref,
strbuf_add_unique_abbrev(, >object.oid, 
DEFAULT_ABBREV);
strbuf_addstr(, "..");
strbuf_add_unique_abbrev(, >new_oid, 
DEFAULT_ABBREV);
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF)
check_for_new_submodule_commits(>new_oid);
r = s_update_ref("fast-forward", ref, 1);
format_display(display, r ? '!' : ' ', quickref.buf,
@@ -738,8 +736,7 @@ static int update_local_ref(struct ref *ref,
strbuf_add_unique_abbrev(, >object.oid, 
DEFAULT_ABBREV);
strbuf_addstr(, "...");
strbuf_add_unique_abbrev(, >new_oid, 
DEFAULT_ABBREV);
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF)
check_for_new_submodule_commits(>new_oid);
r = s_update_ref("forced-update", ref, 1);
format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 3f791f2277..05799362e0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1127,8 +1127,12 @@ struct submodule_parallel_fetch {
int result;
 
struct string_list changed_submodule_names;
+   struct get_next_submodule_task **retry;
+   int retry_nr, retry_alloc;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, 
STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+ STRING_LIST_INIT_DUP, \
+ NULL, 0, 0}
 
 static void calculate_changed_submodule_paths(
struct submodule_parallel_fetch *spf)
@@ -1233,6 +1237,56 @@ static int get_fetch_recurse_config(const struct 
submodule *submodule,
return spf->default_option;
 }
 
+struct get_next_submodule_task {
+   struct repository *repo;
+   const struct submodule *sub;
+   unsigned free_sub : 1; /* Do we need to free the submodule? */
+   struct oid_array *commits;
+};
+
+static const struct submodule *get_default_submodule(const char *path)
+{
+   struct submodule *ret = NULL;
+   const char *name = default_name_or_path(path);
+
+   if (!name)
+   return NULL;
+
+   ret = xmalloc(sizeof(*ret));
+   

[PATCH v5 6/9] repository: repo_submodule_init to take a submodule struct

2018-10-10 Thread Stefan Beller
When constructing a struct repository for a submodule for some revision
of the superproject where the submodule is not contained in the index,
it may not be present in the working tree currently either. In that
siutation giving a 'path' argument is not useful. Upgrade the
repo_submodule_init function to take a struct submodule instead.

While we are at it, overhaul the repo_submodule_init function by renaming
the submodule repository struct, which is to be initialized, to a name
that is not confused with the struct submodule as easily.

Also move its documentation into the header file.

Signed-off-by: Stefan Beller 
---
 builtin/grep.c  | 17 ++---
 builtin/ls-files.c  | 12 +++-
 builtin/submodule--helper.c |  2 +-
 repository.c| 27 ++-
 repository.h| 11 +--
 5 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 601f801158..81c53c862b 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -418,16 +418,19 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
  const struct object_id *oid,
  const char *filename, const char *path)
 {
-   struct repository submodule;
+   struct repository subrepo;
+   const struct submodule *sub = submodule_from_path(superproject,
+ _oid, path);
+
int hit;
 
if (!is_submodule_active(superproject, path))
return 0;
 
-   if (repo_submodule_init(, superproject, path))
+   if (repo_submodule_init(, superproject, sub))
return 0;
 
-   repo_read_gitmodules();
+   repo_read_gitmodules();
 
/*
 * NEEDSWORK: This adds the submodule's object directory to the list of
@@ -440,7 +443,7 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
 * object.
 */
grep_read_lock();
-   add_to_alternates_memory(submodule.objects->objectdir);
+   add_to_alternates_memory(subrepo.objects->objectdir);
grep_read_unlock();
 
if (oid) {
@@ -465,14 +468,14 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
 
init_tree_desc(, data, size);
hit = grep_tree(opt, pathspec, , , base.len,
-   object->type == OBJ_COMMIT, );
+   object->type == OBJ_COMMIT, );
strbuf_release();
free(data);
} else {
-   hit = grep_cache(opt, , pathspec, 1);
+   hit = grep_cache(opt, , pathspec, 1);
}
 
-   repo_clear();
+   repo_clear();
return hit;
 }
 
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7f9919a362..4d1649c1b3 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -206,17 +206,19 @@ static void show_files(struct repository *repo, struct 
dir_struct *dir);
 static void show_submodule(struct repository *superproject,
   struct dir_struct *dir, const char *path)
 {
-   struct repository submodule;
+   struct repository subrepo;
+   const struct submodule *sub = submodule_from_path(superproject,
+ _oid, path);
 
-   if (repo_submodule_init(, superproject, path))
+   if (repo_submodule_init(, superproject, sub))
return;
 
-   if (repo_read_index() < 0)
+   if (repo_read_index() < 0)
die("index file corrupt");
 
-   show_files(, dir);
+   show_files(, dir);
 
-   repo_clear();
+   repo_clear();
 }
 
 static void show_ce(struct repository *repo, struct dir_struct *dir,
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 247881189f..8214e77688 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2038,7 +2038,7 @@ static int ensure_core_worktree(int argc, const char 
**argv, const char *prefix)
if (!sub)
BUG("We could get the submodule handle before?");
 
-   if (repo_submodule_init(, the_repository, path))
+   if (repo_submodule_init(, the_repository, sub))
die(_("could not get a repository handle for submodule '%s'"), 
path);
 
if (!repo_config_get_string(, "core.worktree", )) {
diff --git a/repository.c b/repository.c
index 5dd1486718..aabe64ee5d 100644
--- a/repository.c
+++ b/repository.c
@@ -166,30 +166,23 @@ int repo_init(struct repository *repo,
return -1;
 }
 
-/*
- * Initialize 'submodule' as the submodule given by 'path' in parent repository
- * 'superproject'.
- * Return 0 upon success and a non-zero value upon failure.
- */
-int repo_submodule_init(struct repository *submodule,
+int repo_submodule_init(struct repository *subrepo,
struct repository 

[PATCH v5 5/9] submodule.c: do not copy around submodule list

2018-10-10 Thread Stefan Beller
'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

By doing so we'll have access to the util pointer for longer that
contains the commits that we need to fetch, which will be
useful in a later patch.

Signed-off-by: Stefan Beller 
---
 submodule.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index 17103379ba..dd478ed70b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1128,8 +1128,7 @@ static void calculate_changed_submodule_paths(
struct submodule_parallel_fetch *spf)
 {
struct argv_array argv = ARGV_ARRAY_INIT;
-   struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-   const struct string_list_item *name;
+   struct string_list_item *name;
 
/* No need to check if there are no submodules configured */
if (!submodule_from_path(the_repository, NULL, NULL))
@@ -1146,9 +1145,9 @@ static void calculate_changed_submodule_paths(
 * Collect all submodules (whether checked out or not) for which new
 * commits have been recorded upstream in "changed_submodule_names".
 */
-   collect_changed_submodules(_submodules, );
+   collect_changed_submodules(>changed_submodule_names, );
 
-   for_each_string_list_item(name, _submodules) {
+   for_each_string_list_item(name, >changed_submodule_names) {
struct oid_array *commits = name->util;
const struct submodule *submodule;
const char *path = NULL;
@@ -1162,12 +1161,14 @@ static void calculate_changed_submodule_paths(
if (!path)
continue;
 
-   if (!submodule_has_commits(path, commits))
-   string_list_append(>changed_submodule_names,
-  name->string);
+   if (submodule_has_commits(path, commits)) {
+   oid_array_clear(commits);
+   *name->string = '\0';
+   }
}
 
-   free_submodules_oids(_submodules);
+   string_list_remove_empty_items(>changed_submodule_names, 1);
+
argv_array_clear();
oid_array_clear(_tips_before_fetch);
oid_array_clear(_tips_after_fetch);
@@ -1363,7 +1364,7 @@ int fetch_populated_submodules(struct repository *r,
 
argv_array_clear();
 out:
-   string_list_clear(_submodule_names, 1);
+   free_submodules_oids(_submodule_names);
return spf.result;
 }
 
-- 
2.19.0



[PATCH v5 0/9] fetch: make sure submodule oids are fetched

2018-10-10 Thread Stefan Beller
This is nearly the same as sent in [1], with one commit message fixed.
[1] https://public-inbox.org/git/20180925194755.105578-1-sbel...@google.com/
and replaces sb/submodule-recursive-fetch-gets-the-tip.

Thanks,
Stefan

Stefan Beller (9):
  sha1-array: provide oid_array_filter
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule.c: move global changed_submodule_names into fetch submodule
struct
  submodule.c: do not copy around submodule list
  repository: repo_submodule_init to take a submodule struct
  submodule: fetch in submodules git directory instead of in worktree
  fetch: retry fetching submodules if needed objects were not fetched
  builtin/fetch: check for submodule updates for non branch fetches

 Documentation/technical/api-oid-array.txt |   5 +
 builtin/fetch.c   |  14 +-
 builtin/grep.c|  17 +-
 builtin/ls-files.c|  12 +-
 builtin/submodule--helper.c   |   2 +-
 repository.c  |  27 +--
 repository.h  |  11 +-
 sha1-array.c  |  17 ++
 sha1-array.h  |   3 +
 submodule.c   | 275 +-
 t/t5526-fetch-submodules.sh   |  23 +-
 11 files changed, 311 insertions(+), 95 deletions(-)

-- 
2.19.0



[PATCH] doc: move git-rev-parse from porcelain to plumbing

2018-10-10 Thread Daniels Umanovskis
git-rev-parse mostly seems like plumbing, and is more usd in
scripts than in regular use. Online it's often mentioned as
a plumbing command. Nonetheless it's listed under porcelain
interrogators in `man git`. It seems appropriate to formally
move git-rev-parse to plumbing interrogators.

Signed-off-by: Daniels Umanovskis 
---
 command-list.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/command-list.txt b/command-list.txt
index c36ea3c18..e6364e167 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -153,7 +153,7 @@ git-rerere  
ancillaryinterrogators
 git-reset   mainporcelain   worktree
 git-revert  mainporcelain
 git-rev-listplumbinginterrogators
-git-rev-parse   ancillaryinterrogators
+git-rev-parse   plumbinginterrogators
 git-rm  mainporcelain   worktree
 git-send-email  foreignscminterface 
complete
 git-send-pack   synchingrepositories
-- 
2.19.1.330.g93276587c.dirty



Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Junio C Hamano
Jonathan Nieder  writes:

> Perhaps this reporting could also print the message from a previous
> run, so you could write:
>
>   git gc --detached-status || exit
>   git gc --auto; # perhaps also passing --detach
>
> (Names still open for bikeshedding.)

When the command is given --detached-exit-code/status option, what
does it do?  Does it perform the "did an earlier run left gc.log?"
and report the result and nothing else?  In other words, is it a
pure replacement for "test -e .git/gc.log"?  Or does it do some of
the "auto-gc" prep logic like guestimating loose object count and
have that also in its exit status (e.g. "from the gc.log left
behind, we know that we failed to reduce loose object count down
sufficiently after finding there are more than 6700 earlier, but now
we do not have that many loose object, so there is nothing to
complain about the presence of gc.log")?

I am bad at naming myself, but worse at guessing what others meant
with a new thing that was given a new name whose name is fuzzy,
so... ;-)


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Johannes Sixt

Am 10.10.18 um 07:43 schrieb Junio C Hamano:

We haven't seen much complaints and breakages reported against the
two big "rewrite in C" topics around "rebase"; perhaps it is a good
time to merge them to 'next' soonish to cook them for a few weeks
before moving them to 'master'?


Please let me express my sincerest gratitude to Alban, Joel, 
Paul-Sebastian, Pratik, and Dscho. It is such a pleasure to work with 
the builtin rebase and stash commands on Windows now. I am using them 
since a month or two, and they work extremely well for me.


Thank you all for your hard work!

-- Hannes


Re: [PATCH 2/2] push: add an advice on unqualified push

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Jeff King wrote:

> On Wed, Oct 10, 2018 at 10:41:45AM +, Ævar Arnfjörð Bjarmason wrote:
>
>> Improve the error message added in f8aae12034 ("push: allow
>> unqualified dest refspecs to DWIM", 2008-04-23), which before this
>> change looks like this:
>>
>> $ git push avar v2.19.0^{commit}:newbranch -n
>> error: unable to push to unqualified destination: newbranch
>> The destination refspec neither matches an existing ref on the remote nor
>> begins with refs/, and we are unable to guess a prefix based on the 
>> source ref.
>> error: failed to push some refs to 'g...@github.com:avar/git.git'
>
> Thanks for looking into this. Despite being largely responsible for that
> message myself, I always cringe when I see it because it's so opaque.
>
>> This message needed to be read very carefully to spot how to fix the
>> error, i.e. to push to refs/heads/newbranch, and it didn't use the
>> advice system (since initial addition of the error predated it).
>>
>> Fix both of those, now the message will look like this instead:
>>
>> $ ./git-push avar v2.19.0^{commit}:newbranch -n
>> error: unable to push to unqualified destination: newbranch
>> hint: The destination refspec neither matches an existing
>> hint: ref on the remote nor begins with refs/, and we are
>> hint: unable to guess a prefix based on the source ref.
>> hint:
>> hint: The  part of the refspec is a commit object.
>> hint: Did you mean to create a new branch by pushing to
>> hint: 'v2.19.0^{commit}:refs/heads/newbranch'?
>> error: failed to push some refs to 'g...@github.com:avar/git.git'
>>
>> When trying to push a tag, tree or a blob we suggest that perhaps the
>> user meant to push them to refs/tags/ instead.
>
> This is much better, and I love the customized behavior based on the
> object type.
>
> I wonder if we could reword the first paragraph to be a little less
> confusing, and spell out what we tried already. E.g., something like:
>
>   The destination you provided is not a full refname (i.e., starting
>   with "ref"). Git tried to guess what you meant by:
>
> - looking for a matching branch or tag on the remote side
>
> - looking at the refname of the local source
>
>   but neither worked.
>
>   The  part of the refspec is a commit object.
>   Did you mean...

Yeah that makes sense. I was trying to avoid touching the existing
wording to make this more surgical, but you came up with it, and since
you don't like it I'll just change that too.

> I'm not sure about saying "branch or tag" in the first bullet. It's
> friendlier to most users, but less technically correct (if you said
> "notes/foo", I believe we'd match an existing "refs/notes/foo", because
> it's really just using the normal lookup rules).
>
> Also, as an aside, I wonder if we should allow "heads/foo" to work as
> "refs/heads/foo" (even when no such ref already exists). But that is
> totally orthogonal to changing the message.
>
>> The if/else duplication for all of OBJ_{COMMIT,TAG,TREE,BLOB} is
>> unfortunate, but is required to correctly mark the messages for
>> translation.
>
> I think it would probably be OK to put the first paragraph in its own
> variable. I know we try to avoid translation lego, but I'd think
> paragraphs are separate units. Or are you worried about how to get them
> into the same advise() call? I don't know that we need to, but we could
> also plug one into the other with a "%s" (and leave a translator note).

To be honest from being on the code side of a much bigger i18n effort
(the MediaWiki/WikiMedia software) back in the early days of my career I
just do this sort of thing reflexively, because from experience when I
started trying to simplify stuff by making assumptions I was wrong every
time.

Although in that case there were >100+ languages, so maybe we can get
away with this.

In this case one red flag I see is that we make a reference to "the
source ref" in the first paragraph, and then in the second we'll either
talk about "commit", "tag" or "blob" etc. Now imagine a language where
those words have different genders, and where even secondary references
to those things ("the source ref") spill over and need to be changed
too.

You also get languages where a message that stretches multiple
paragraphs flows more naturally if the wording is re-arranged, even
between paragraphs. This is why document translation systems generally
split things by sections at best (not paragraphs), or just by whole
documents.

>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 1546833213..fd455e2739 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -320,6 +320,13 @@ advice.*::
>>  tries to overwrite a remote ref that points at an
>>  object that is not a commit-ish, or make the remote
>>  ref point at an object that is not a commit-ish.
>> +pushAmbigiousRefName::
>> +Shown 

Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> Right. I know. What I mean is now I can (and do) use it to run 'git gc
> --auto' across our server fleet and see whether I have any of #3, or
> whether it's all #1 or #2. If there's nothing to do in #1 that's fine,
> and it just so happens that I'll run gc due to #2 that's also fine, but
> I'd like to see if gc really is stuck.
>
> This of course relies on them having other users / scripts doing normal
> git commands which would trigger previous 'git gc --auto' runs.
>
> I.e. with your change that command:
>
> git gc --auto
>
> Would change to something like:
>
> git gc --auto && ! test -e .git/gc.log
>
> Which, as noted is a bit of a nasty breaker of the encapsulation

That helps.  What if we package up the "test -e .git/gc.log" bit
*without* having the side effect of running git gc --auto, so that you
can run

if ! git gc --detached-exit-code
then
... handle the error ...
fi
git gc --auto; # perhaps also with --detach

?

I'm not great at naming options, so the --detached-exit-code name is
bikesheddable.  What I really mean to ask about is: what if the status
reporting goes in a separate command from running gc --auto?

Perhaps this reporting could also print the message from a previous
run, so you could write:

git gc --detached-status || exit
git gc --auto; # perhaps also passing --detach

(Names still open for bikeshedding.)

Thanks and hope that helps,
Jonathan


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode'
>> configuration option to optionally bring back the 'git gc --auto' exit
>> code behavior as it existed between 2.6.3..2.19.0 (inclusive).
>
> Hm.  Can you tell me more about the use case where this would be
> helpful to you?  That would help us come up with a better name for it.

>From the E-Mail linked from the commit message[1] (I opted not to put
this in, because it was getting a bit long:

Right. I know. What I mean is now I can (and do) use it to run 'git gc
--auto' across our server fleet and see whether I have any of #3, or
whether it's all #1 or #2. If there's nothing to do in #1 that's fine,
and it just so happens that I'll run gc due to #2 that's also fine, but
I'd like to see if gc really is stuck.

This of course relies on them having other users / scripts doing normal
git commands which would trigger previous 'git gc --auto' runs.

I.e. with your change that command:

git gc --auto

Would change to something like:

git gc --auto && ! test -e .git/gc.log

Which, as noted is a bit of a nasty breaker of the encapsulation, so
now:

git gc --auto --auto-exit-code

Or just a variant of that which will have dropped the config in-place in
/etc/gitconfig, and then as before:

git gc --auto

1. https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Jeff King wrote:

> On Wed, Oct 10, 2018 at 07:27:32PM +, Ævar Arnfjörð Bjarmason wrote:
>
>> Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode'
>> configuration option to optionally bring back the 'git gc --auto' exit
>> code behavior as it existed between 2.6.3..2.19.0 (inclusive).
>>
>> This was changed in 3029970275 ("gc: do not return error for prior
>> errors in daemonized mode", 2018-07-16). The motivation for that patch
>> was to appease 3rd party tools whose treatment of the 'git gc --auto'
>> exit code is different from that of git core where it has always been
>> ignored.
>
> OK. I wouldn't want to use this myself, but I think you've made clear
> why you find it useful. So I don't mind making it an optional behavior
> (and it probably beats you trying to poke at the logfile yourself).

[...]

> I'm not sure if the config is going to actually help that much, though.
> The callers within Git will generally ignore the exit code anyway. So
> for those cases, setting it will at best do nothing, and at worst it may
> confuse the few stragglers (e.g., the git-svn one under recent
> discussion).

Yeah git internals don't care, but we've never advertised the
combination of --auto and gc.autoDetach=true as being something
internal-only, so e.g. I wrote stuff expecting errors, and one might run
"git gc --auto" in a repo whose .git/objects state is uncertain to see
if it needed repack (and have a shell integration that reports
failures...).

> Callers who _are_ prepared to act on the exit code probably ought to
> just use --auto-exit-code in their invocation.
>
> That said, I'm not entirely opposed to the matching config. There's
> enough history here that somebody might want a sledgehammer setting to
> go back to the old behavior.

If it's not a config option then as git is upgraded I'll need to change
my across-server invocation to be some variant of checking git version,
then etiher using the --auto-exit-code option or not (which'll error on
older gits). Easier to be able to just drop in a config setting before
the upgrade.


[PATCH v2 1/1] branch: introduce --show-current display option

2018-10-10 Thread Daniels Umanovskis
When called with --show-current, git branch will print the current
branch name and terminate. Only the actual name gets printed,
without refs/heads. In detached HEAD state, nothing is output.

Intended both for scripting and interactive/informative use.
Unlike git branch --list, no filtering is needed to just get the
branch name.

Signed-off-by: Daniels Umanovskis 
---
 Documentation/git-branch.txt |  6 +-
 builtin/branch.c | 21 --
 t/t3203-branch-output.sh | 41 
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index bf5316ffa..0babb9b1b 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -9,7 +9,7 @@ SYNOPSIS
 
 [verse]
 'git branch' [--color[=] | --no-color] [-r | -a]
-   [--list] [-v [--abbrev= | --no-abbrev]]
+   [--list] [--show-current] [-v [--abbrev= | --no-abbrev]]
[--column[=] | --no-column] [--sort=]
[(--merged | --no-merged) []]
[--contains []]
@@ -160,6 +160,10 @@ This option is only applicable in non-verbose mode.
branch --list 'maint-*'`, list only the branches that match
the pattern(s).
 
+--show-current::
+   Print the name of the current branch. In detached HEAD state,
+   nothing is printed.
+
 -v::
 -vv::
 --verbose::
diff --git a/builtin/branch.c b/builtin/branch.c
index c396c4153..ab03073b2 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -443,6 +443,17 @@ static void print_ref_list(struct ref_filter *filter, 
struct ref_sorting *sortin
free(to_free);
 }
 
+static void print_current_branch_name()
+{
+   const char *refname = resolve_ref_unsafe("HEAD", 0, NULL, NULL);
+   const char *shortname;
+   if (refname == NULL || !strcmp(refname, "HEAD"))
+   return;
+   if (!skip_prefix(refname, "refs/heads/", ))
+   die(_("unexpected symbolic ref for HEAD: %s"), refname);
+   puts(shortname);
+}
+
 static void reject_rebase_or_bisect_branch(const char *target)
 {
struct worktree **worktrees = get_worktrees(0);
@@ -581,6 +592,7 @@ static int edit_branch_description(const char *branch_name)
 int cmd_branch(int argc, const char **argv, const char *prefix)
 {
int delete = 0, rename = 0, copy = 0, force = 0, list = 0;
+   int show_current = 0;
int reflog = 0, edit_description = 0;
int quiet = 0, unset_upstream = 0;
const char *new_upstream = NULL;
@@ -620,6 +632,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
OPT_BIT('c', "copy", , N_("copy a branch and its reflog"), 
1),
OPT_BIT('C', NULL, , N_("copy a branch, even if target 
exists"), 2),
OPT_BOOL('l', "list", , N_("list branch names")),
+   OPT_BOOL(0, "show-current", _current, N_("show current 
branch name")),
OPT_BOOL(0, "create-reflog", , N_("create the branch's 
reflog")),
OPT_BOOL(0, "edit-description", _description,
 N_("edit the description for the branch")),
@@ -662,14 +675,15 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
argc = parse_options(argc, argv, prefix, options, builtin_branch_usage,
 0);
 
-   if (!delete && !rename && !copy && !edit_description && !new_upstream 
&& !unset_upstream && argc == 0)
+   if (!delete && !rename && !copy && !edit_description && !new_upstream &&
+   !show_current && !unset_upstream && argc == 0)
list = 1;
 
if (filter.with_commit || filter.merge != REF_FILTER_MERGED_NONE || 
filter.points_at.nr ||
filter.no_commit)
list = 1;
 
-   if (!!delete + !!rename + !!copy + !!new_upstream +
+   if (!!delete + !!rename + !!copy + !!new_upstream + !!show_current +
list + unset_upstream > 1)
usage_with_options(builtin_branch_usage, options);
 
@@ -697,6 +711,9 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
if (!argc)
die(_("branch name required"));
return delete_branches(argc, argv, delete > 1, filter.kind, 
quiet);
+   } else if (show_current) {
+   print_current_branch_name();
+   return 0;
} else if (list) {
/*  git branch --local also shows HEAD when it is detached */
if ((filter.kind & FILTER_REFS_BRANCHES) && filter.detached)
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index ee6787614..e9bc3b05f 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -100,6 +100,47 @@ test_expect_success 'git branch -v pattern does not show 
branch summaries' '
test_must_fail git branch -v branch*
 '
 
+test_expect_success 'git branch `--show-current` shows current branch' '
+   cat >expect <<-\EOF &&
+ 

Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode'
> configuration option to optionally bring back the 'git gc --auto' exit
> code behavior as it existed between 2.6.3..2.19.0 (inclusive).

Hm.  Can you tell me more about the use case where this would be
helpful to you?  That would help us come up with a better name for it.

Thanks,
Jonathan


Re: [PATCH 2/2] push: add an advice on unqualified push

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 10:41:45AM +, Ævar Arnfjörð Bjarmason wrote:

> Improve the error message added in f8aae12034 ("push: allow
> unqualified dest refspecs to DWIM", 2008-04-23), which before this
> change looks like this:
> 
> $ git push avar v2.19.0^{commit}:newbranch -n
> error: unable to push to unqualified destination: newbranch
> The destination refspec neither matches an existing ref on the remote nor
> begins with refs/, and we are unable to guess a prefix based on the 
> source ref.
> error: failed to push some refs to 'g...@github.com:avar/git.git'

Thanks for looking into this. Despite being largely responsible for that
message myself, I always cringe when I see it because it's so opaque.

> This message needed to be read very carefully to spot how to fix the
> error, i.e. to push to refs/heads/newbranch, and it didn't use the
> advice system (since initial addition of the error predated it).
> 
> Fix both of those, now the message will look like this instead:
> 
> $ ./git-push avar v2.19.0^{commit}:newbranch -n
> error: unable to push to unqualified destination: newbranch
> hint: The destination refspec neither matches an existing
> hint: ref on the remote nor begins with refs/, and we are
> hint: unable to guess a prefix based on the source ref.
> hint:
> hint: The  part of the refspec is a commit object.
> hint: Did you mean to create a new branch by pushing to
> hint: 'v2.19.0^{commit}:refs/heads/newbranch'?
> error: failed to push some refs to 'g...@github.com:avar/git.git'
> 
> When trying to push a tag, tree or a blob we suggest that perhaps the
> user meant to push them to refs/tags/ instead.

This is much better, and I love the customized behavior based on the
object type.

I wonder if we could reword the first paragraph to be a little less
confusing, and spell out what we tried already. E.g., something like:

  The destination you provided is not a full refname (i.e., starting
  with "ref"). Git tried to guess what you meant by:

- looking for a matching branch or tag on the remote side

- looking at the refname of the local source

  but neither worked.

  The  part of the refspec is a commit object.
  Did you mean...

I'm not sure about saying "branch or tag" in the first bullet. It's
friendlier to most users, but less technically correct (if you said
"notes/foo", I believe we'd match an existing "refs/notes/foo", because
it's really just using the normal lookup rules).

Also, as an aside, I wonder if we should allow "heads/foo" to work as
"refs/heads/foo" (even when no such ref already exists). But that is
totally orthogonal to changing the message.

> The if/else duplication for all of OBJ_{COMMIT,TAG,TREE,BLOB} is
> unfortunate, but is required to correctly mark the messages for
> translation.

I think it would probably be OK to put the first paragraph in its own
variable. I know we try to avoid translation lego, but I'd think
paragraphs are separate units. Or are you worried about how to get them
into the same advise() call? I don't know that we need to, but we could
also plug one into the other with a "%s" (and leave a translator note).

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1546833213..fd455e2739 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -320,6 +320,13 @@ advice.*::
>   tries to overwrite a remote ref that points at an
>   object that is not a commit-ish, or make the remote
>   ref point at an object that is not a commit-ish.
> + pushAmbigiousRefName::
> + Shown when linkgit:git-push[1] gives up trying to
> + guess based on the source and destination refs what
> + remote ref namespace the source belongs in, but where
> + we can still suggest that the user push to either
> + refs/heads/* or refs/tags/* based on the type of the
> + source object.

I guess you could argue that this is "ambiguous", but usually we'd use
that term to mean "there were two branches that matched on the other
side". But here it's actually "no branches matched" (actually, it makes
me wonder what we'd do pushing "foo" when that name is ambiguous on the
other side).

So I wonder if this ought to be pushUnqualifiedRefname or something.

> @@ -1046,13 +1047,60 @@ static int match_explicit(struct ref *src, struct ref 
> *dst,
>   else if ((dst_guess = guess_ref(dst_value, matched_src))) {
>   matched_dst = make_linked_ref(dst_guess, dst_tail);
>   free(dst_guess);
> - } else
> - error(_("unable to push to unqualified destination: 
> %s\n"
> - "The destination refspec neither matches an "
> - "existing ref on the remote nor\n"
> - "begins with refs/, and we are unable to "
> -  

Re: [PATCH 1/2] i18n: remote.c: mark error(...) messages for translation

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 10:41:44AM +, Ævar Arnfjörð Bjarmason wrote:

> Mark up the error(...) messages in remote.c for translation. The likes
> of "unable to push to unqualified destination" are relatively big
> parts of the UI, i.e. error messages shown when "git push" fails.
> 
> I don't think any of these are plumbing, an the entire test suite
> passes when running the tests with GIT_GETTEXT_POISON=1 (after
> building with GETTEXT_POISON).

So obviously the second patch is much more interesting, and I focused
most of my comments there. ;)

But this one seems like an obvious improvement.

-Peff


[PATCH v2 0/1] branch: introduce --show-current display option

2018-10-10 Thread Daniels Umanovskis
v2 reroll of a previously-discussed patch. Thanks to everyone for their
comments. Based on feedback:

1. Command is now a verb: git branch --show-current.

2. Changed to gitster's suggested implementation: nothing is printed
 if HEAD does not point to a symbolic ref. A fatal
 error if HEAD is a symbolic ref but does not start with refs/heads/.

3. Added a test to show this works with worktrees

A process question to the list. The patch adds a new localizable string
that gets output in case of repository corruption. I happen to speak a
couple of the languages that have po files. Is it accepted practice to
also include po edits in my patch in such a case, or should that be
left to the regular l10n workflow?

Daniels Umanovskis (1):
  branch: introduce --show-current display option

 Documentation/git-branch.txt |  6 +-
 builtin/branch.c | 21 --
 t/t3203-branch-output.sh | 41 
 3 files changed, 65 insertions(+), 3 deletions(-)

-- 
2.19.1.330.g93276587c.dirty



Re: [PATCH v3 1/2] commit-graph write: add progress output

2018-10-10 Thread SZEDER Gábor
On Mon, Sep 17, 2018 at 03:33:35PM +, Ævar Arnfjörð Bjarmason wrote:
> $ git -c gc.writeCommitGraph=true gc
> [...]
> Annotating commits in commit graph: 1565573, done.
> Computing commit graph generation numbers: 100% (782484/782484), done.

While poking around 'commit-graph.c' in my Bloom filter experiment, I
saw similar numbers like above, and was confused by the much higher
than expected number of annotated commits.  It's about twice as much
as the number of commits in the repository, or the number shown on the
very next line.

> diff --git a/commit-graph.c b/commit-graph.c
> index 8a1bec7b8a..2c5d996194 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> -static void close_reachable(struct packed_oid_list *oids)
> +static void close_reachable(struct packed_oid_list *oids, int 
> report_progress)
>  {
>   int i;
>   struct commit *commit;
> + struct progress *progress = NULL;
> + int j = 0;
>  
> + if (report_progress)
> + progress = start_delayed_progress(
> + _("Annotating commits in commit graph"), 0);
>   for (i = 0; i < oids->nr; i++) {
> + display_progress(progress, ++j);
>   commit = lookup_commit(the_repository, >list[i]);
>   if (commit)
>   commit->object.flags |= UNINTERESTING;
> @@ -604,6 +616,7 @@ static void close_reachable(struct packed_oid_list *oids)
>* closure.
>*/
>   for (i = 0; i < oids->nr; i++) {
> + display_progress(progress, ++j);
>   commit = lookup_commit(the_repository, >list[i]);
>  
>   if (commit && !parse_commit(commit))
> @@ -611,19 +624,28 @@ static void close_reachable(struct packed_oid_list 
> *oids)
>   }

The above loops have already counted all the commits, and, more
importantly, did all the hard work that takes time and makes the
progress indicator useful.

>   for (i = 0; i < oids->nr; i++) {
> + display_progress(progress, ++j);

This display_progress() call, however, doesn't seem to be necessary.
First, it counts all commits for a second time, resulting in the ~2x
difference compared to the actual number of commits, and then causing
my confusion.  Second, all what this loop is doing is setting a flag
in commits that were already looked up and parsed in the above loops.
IOW this loop is very fast, and the progress indicator jumps from
~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
progress indicator at all.

>   commit = lookup_commit(the_repository, >list[i]);
>  
>   if (commit)
>   commit->object.flags &= ~UNINTERESTING;
>   }
> + stop_progress();
>  }


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Tim Schumacher

On 10.10.18 07:43, Junio C Hamano wrote:

* ts/alias-of-alias (2018-09-17) 3 commits
   (merged to 'next' on 2018-10-09 at ac19b4730b)
  + t0014: introduce an alias testing suite
  + alias: show the call history when an alias is looping
  + alias: add support for aliases of an alias

  An alias that expands to another alias has so far been forbidden,
  but now it is allowed to create such an alias.

  Will merge to 'master'.

Oh well, I still have the changed comment stored locally.
I guess that has to wait for another time.

Anyways, thanks for pulling this in.

PS: I hope that this E-Mail is formatted correctly. Thunderbird
received an update and now it doesn't show me plain text when
composing an E-Mail.


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 07:27:32PM +, Ævar Arnfjörð Bjarmason wrote:

> Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode'
> configuration option to optionally bring back the 'git gc --auto' exit
> code behavior as it existed between 2.6.3..2.19.0 (inclusive).
> 
> This was changed in 3029970275 ("gc: do not return error for prior
> errors in daemonized mode", 2018-07-16). The motivation for that patch
> was to appease 3rd party tools whose treatment of the 'git gc --auto'
> exit code is different from that of git core where it has always been
> ignored.

OK. I wouldn't want to use this myself, but I think you've made clear
why you find it useful. So I don't mind making it an optional behavior
(and it probably beats you trying to poke at the logfile yourself).

I'm not sure if the config is going to actually help that much, though.
The callers within Git will generally ignore the exit code anyway. So
for those cases, setting it will at best do nothing, and at worst it may
confuse the few stragglers (e.g., the git-svn one under recent
discussion).

Callers who _are_ prepared to act on the exit code probably ought to
just use --auto-exit-code in their invocation.

That said, I'm not entirely opposed to the matching config. There's
enough history here that somebody might want a sledgehammer setting to
go back to the old behavior.

-Peff


[PATCH] gc doc: mention the commit-graph in the intro

2018-10-10 Thread Ævar Arnfjörð Bjarmason
Explicitly mention in the intro that we may be writing supplemental
data structures such as the commit-graph during "gc", i.e. to call out
the "optimize" part of what this command does, it doesn't just
"collect garbage" as the "gc" name might imply.

Past changes have updated the intro to reflect new commands, such as
mentioning "worktree" in b586a96a39 ("gc.txt: more details about what
gc does", 2018-03-15). So let's elaborate on what was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27).

See also
https://public-inbox.org/git/87tvm3go42@evledraar.gmail.com/ (follow-up
replies) for an on-list discussion about what "gc" does.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

In light of the linked thread let's see how controversial this is as a
stand-alone :)

 Documentation/git-gc.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index f5bc98ccb3..c20ee6c789 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -17,7 +17,8 @@ Runs a number of housekeeping tasks within the current 
repository,
 such as compressing file revisions (to reduce disk space and increase
 performance), removing unreachable objects which may have been
 created from prior invocations of 'git add', packing refs, pruning
-reflog, rerere metadata or stale working trees.
+reflog, rerere metadata or stale working trees. May also update ancillary
+indexes such as the commit-graph.
 
 Users are encouraged to run this task on a regular basis within
 each repository to maintain good disk space utilization and good
-- 
2.19.1.390.gf3a00b506f



[PATCH] revert & cherry-pick: run git gc --auto

2018-10-10 Thread Ævar Arnfjörð Bjarmason
Expand on the work started in 095c741edd ("commit: run git gc --auto
just before the post-commit hook", 2018-02-28) to run "gc --auto" in
more commands where new objects can be created.

The notably missing commands are now "rebase" and "stash". Both are
being rewritten in C, so any use of "gc --auto" there can wait for
that.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

After reading the "Users are encouraged to run this task..." paragraph
in the git-gc manpage I was wondering if due to gc --auto all over the
place now (including recently in git-commit with a patch of mine) if
we shouldn't change that advice.

I'm meaning to send some doc changes to git-gc.txt, but in the
meantime let's address this low-hanging fruit of running gc --auto
when we revert or cherry-pick commits, which can like git-commit
create a significant amount of loose objects.

 builtin/revert.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/builtin/revert.c b/builtin/revert.c
index 9a66720cfc..1b20902910 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -209,6 +209,7 @@ int cmd_revert(int argc, const char **argv, const char 
*prefix)
 {
struct replay_opts opts = REPLAY_OPTS_INIT;
int res;
+   const char *argv_gc_auto[] = {"gc", "--auto", NULL};
 
if (isatty(0))
opts.edit = 1;
@@ -217,6 +218,7 @@ int cmd_revert(int argc, const char **argv, const char 
*prefix)
res = run_sequencer(argc, argv, );
if (res < 0)
die(_("revert failed"));
+   run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
return res;
 }
 
@@ -224,11 +226,13 @@ int cmd_cherry_pick(int argc, const char **argv, const 
char *prefix)
 {
struct replay_opts opts = REPLAY_OPTS_INIT;
int res;
+   const char *argv_gc_auto[] = {"gc", "--auto", NULL};
 
opts.action = REPLAY_PICK;
sequencer_init_config();
res = run_sequencer(argc, argv, );
if (res < 0)
die(_("cherry-pick failed"));
+   run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
return res;
 }
-- 
2.19.1.390.gf3a00b506f



[PATCH] gc: remove redundant check for gc_auto_threshold

2018-10-10 Thread Ævar Arnfjörð Bjarmason
Checking gc_auto_threshold in too_many_loose_objects() was added in
17815501a8 ("git-gc --auto: run "repack -A -d -l" as necessary.",
2007-09-17) when need_to_gc() itself was also reliant on
gc_auto_pack_limit before its early return:

gc_auto_threshold <= 0 && gc_auto_pack_limit <= 0

When that check was simplified to just checking "gc_auto_threshold <=
0" in b14d255ba8 ("builtin-gc.c: allow disabling all auto-gc'ing by
assigning 0 to gc.auto", 2008-03-19) this unreachable code should have
been removed. We only call too_many_loose_objects() from within
need_to_gc() itself, which will return if this condition holds, and in
cmd_gc() which will return before ever getting to "auto_gc &&
too_many_loose_objects()" if "auto_gc && !need_to_gc()" is true
earlier in the function.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

I had this in my tree as part of some general gc cleanups I was
working on, but since it's trivially considered as a stand-alone topic
and unlikely to conflict with anything I or anyone else has planned
I'm sending it as a one-off.

 builtin/gc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 2b592260e9..5f25a35dfc 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -157,9 +157,6 @@ static int too_many_loose_objects(void)
int num_loose = 0;
int needed = 0;
 
-   if (gc_auto_threshold <= 0)
-   return 0;
-
dir = opendir(git_path("objects/17"));
if (!dir)
return 0;
-- 
2.19.1.390.gf3a00b506f



[PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Ævar Arnfjörð Bjarmason
Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode'
configuration option to optionally bring back the 'git gc --auto' exit
code behavior as it existed between 2.6.3..2.19.0 (inclusive).

This was changed in 3029970275 ("gc: do not return error for prior
errors in daemonized mode", 2018-07-16). The motivation for that patch
was to appease 3rd party tools whose treatment of the 'git gc --auto'
exit code is different from that of git core where it has always been
ignored.

That means that out of the three modes gc --auto will operate in:

 1. gc --auto has nothing to do
 2. gc --auto has something to do, will fork and try to do it
 3. gc --auto has something to do, but notices that gc has been failing
before when forked and can't do anything now.

We started exiting with zero in the case of #3, instead of
non-zero (see [1] for more details). As noted by the docs being added
here the #3 case is relatively rare, so I think it's fine to change
this as the default with the assumption that the use-case for tools
like the "repo" tool noted in commit 3029970275 above are more common
than not.

But it left us without any option to have "git gc --auto" tell us
about this failure except by starting to either parse its output, or
for the caller to start breaking the encapsulation and starting to
check for .git/gc.log themselves.

Let's instead provide an option to exit with non-zero when we have
errors to tell the user about, and provide a configuration option so
that it can be dropped in-place in anticipation of upgrading to Git
version 2.20 without having to make using --auto-exit-code conditional
on the git version in use.

1. https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

> On Wed, Oct 10, 2018 at 09:51:52AM -0700, Jonathan Nieder wrote:
>
>> Ævar Arnfjörð Bjarmason wrote:
>> 
>> > I'm just saying it's hard in this case to remove one piece without the
>> > whole Jenga tower collapsing, and it's probably a good idea in some of
>> > these cases to pester the user about what he wants, but probably not via
>> > gc --auto emitting the same warning every time, e.g. in one of these
>> > threads I suggested maybe "git status" should emit this.
>> 
>> I have to say, I don't have a lot of sympathy for this.
>> 
>> I've been running with the patches I sent before for a while now, and
>> the behavior that they create is great.  I think we can make further
>> refinements on top.  To put it another way, I haven't actually
>> experienced any bad knock-on effects, and I think other feature
>> requests can be addressed separately.
>
> I think there may be some miscommunication here. The Jenga tower above
> is referring (I think) to Jonathan Tan's original patch to drop the
> warning entirely, which does have some unwanted side effects.
>
> Your patches are much less controversial, I think, and are in next and
> marked as "will merge to master" in the last "what's cooking".

[Junio: This goes on top of gitster/jn/gc-auto]

I thought the jn/gc-auto topic was still in "pu", my fault for not
paying attention. It seems the general consensus is against my notion
of what should be the default (which is fine), but since (as noted in
the patch) it seems yucky to start breaking the encapsulation of
gc.log, especially as it's looking more and more likely that it'll be
an implementation detail we might drop, here's a patch on top of
jn/gc-auto to optionally restore the old behavior.

 Documentation/config.txt | 28 
 Documentation/git-gc.txt |  7 +++
 builtin/gc.c |  7 ++-
 t/t6500-gc.sh|  2 ++
 4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5b72684999..e37a463bf8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1635,6 +1635,34 @@ gc.autoDetach::
Make `git gc --auto` return immediately and run in background
if the system supports it. Default is true.
 
+gc.autoExitCode::
+   Make `git gc --auto` return non-zero if it would have
+   demonized itself (see `gc.autoDetach`) due to a needed gc, but
+   a 'gc.log' is found within the `gc.logExpiry` with an error
+   from a previous run.
++
+When 'git gc' is run with the default of `gc.autoDetach=true` a
+failure might have been noted in the 'gc.log' from a previously
+detached `--auto` run. If the failure is within the time configured in
+`gc.logExpiry` the `--auto` run will abort early and report the error
+in the 'gc.log'.
++
+From version 2.6.3 to version 2.19 of Git encountering this error
+would cause 'git gc' to exit with non-zero, but this was deemed to be
+a hassle for third-party tools to handle since it rarely happens, and
+they usually don't assume that 'git gc --auto' can fail. Therefore
+since version 2.20 of Git 'git gc --auto' will always exit with zero
+if it would have demonized itself, even when encountering 

Subscriber

2018-10-10 Thread Noômen B . Hassin
Need for update subscribing to mailing list from @git
Thanks

Sent from my Evertek

Re: What's so special about objects/17/ ?

2018-10-10 Thread Stefan Beller
On Tue, Oct 9, 2018 at 6:10 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
> >> Oh, I think I misled you by saying "more important".
> >> ...
> > I do challenge the decision to take a hardcoded value, though, ...
>
> I do not find any reason why you need to say "though" here.

I caught myself using lots of filler-words lately.

  Though, however, I think, I would guess, IMHO
  fills a lot of space without saying much.

I'll reduce that.

>  If you
> understood the message you are responding to that use of hardcoded
> value was chosen not to help the end-user experience, it should have
> been clear that we are in agreement.

We are, but for different reasons.

> I also sometimes find certain people here are unnecessarily
> combative in their discussion.  It this just some language issue?

certain people? ;-)
I have issues with ambiguity in communication directed towards me,
which is why I sometimes try to be very direct and blunt.
Other times I strive on ambiguity as well (mostly in my reviews).


Re: `--rebase-merges' still failing badly

2018-10-10 Thread Michael Witten
On Wed, 10 Oct 2018 18:51:17 -, Michael Witten wrote:

> merge -# Same as merge -C abcde r1

That should be:

  merge -# Same as `merge r1'


Re: [PATCH v6 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-10 Thread Antonio Ospite
On Mon, 8 Oct 2018 15:19:00 -0700
Stefan Beller  wrote:

> > +test_expect_success 'not writing gitmodules config file when it is not 
> > checked out' '
> > +test_must_fail git -C super submodule--helper config 
> > submodule.submodule.url newurl
> 
> This only checks the exit code, do we also want to check for
> 
> test_path_is_missing .gitmodules ?
>

OK, I agree, let's re-check also *after* we tried and failed to set
a config value, just to be sure that the code does not get accidentally
changed in the future to create the file. I'll add the check.

> > +test_expect_success 'initialising submodule when the gitmodules config is 
> > not checked out' '
> > +   git -C super submodule init
> > +'
> > +
> > +test_expect_success 'showing submodule summary when the gitmodules config 
> > is not checked out' '
> > +   git -C super submodule summary
> > +'
> 
> Same for these, is the exit code enough, or do we want to look at
> specific things?
>

Except for the "summary" test which was not even exercising the
config_from_gitmodule path,  checking exist status should be sufficient
to verify that "submodule--helper config" does not fail, but we can
surely do better.

I will add checks to confirm that not only the commands exited without
errors but they also achieved the desired effect, to validate the actual
high-level use case advertised by the test file. This should be more
future-proof.

And I think I'll merge the summary and the update tests.

> > +
> > +test_expect_success 'updating submodule when the gitmodules config is not 
> > checked out' '
> > +   (cd submodule &&
> > +   echo file2 >file2 &&
> > +   git add file2 &&
> > +   git commit -m "add file2 to submodule"
> > +   ) &&
> > +   git -C super submodule update
> 
> git status would want to be clean afterwards?

Mmh, this should have been "submodule update --remote" in the first
place to have any effect, I'll take the chance and rewrite this test in
a different way and also check the effect of the update operation, and
the repository status.

I'll be something like this:

ORIG_SUBMODULE=$(git -C submodule rev-parse HEAD)
ORIG_UPSTREAM=$(git -C upstream rev-parse HEAD)
ORIG_SUPER=$(git -C super rev-parse HEAD)

test_expect_success 're-updating submodule when the gitmodules config is not 
checked out' '
test_when_finished "git -C submodule reset --hard $ORIG_SUBMODULE;
git -C upstream reset --hard $ORIG_UPSTREAM;
git -C super reset --hard $ORIG_SUPER;
git -C upstream submodule update --remote;
git -C super pull;
git -C super submodule update --remote" &&
(cd submodule &&
echo file2 >file2 &&
git add file2 &&
test_tick &&
git commit -m "add file2 to submodule"
) &&
(cd upstream &&
git submodule update --remote &&
git add submodule &&
test_tick &&
git commit -m "Update submodule"
) &&
git -C super pull &&
# The --for-status options reads the gitmdoules config
git -C super submodule summary --for-status >actual &&
cat >expect <<-\EOF &&
* submodule 951c301...a939200 (1):
  < add file2 to submodule

EOF
test_cmp expect actual &&
# Test that the update actually succeeds
test_path_is_missing super/submodule/file2 &&
git -C super submodule update &&
test_cmp submodule/file2 super/submodule/file2 &&
git -C super status --short >output &&
test_must_be_empty output
'

Maybe a little overkill?

The "upstream" repo will be added in test 1 to better clarify the roles
of the involved repositories.

The commit ids should be stable because of test_tick, shouldn't they?

Thanks for the comments, they helped improving the quality of the tests
once again.

I'll wait a few days before sending a v7, hopefully someone will find
time to take another look at patch 9 and comment also on patch 10, and
give an opinion on the "mergeability" status of the whole patchset.

Ciao ciao,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Stefan Beller
> * pw/diff-color-moved-ws-fix (2018-10-04) 5 commits
>  - diff --color-moved: fix a memory leak
>  - diff --color-moved-ws: fix another memory leak
>  - diff --color-moved-ws: fix a memory leak
>  - diff --color-moved-ws: fix out of bounds string access
>  - diff --color-moved-ws: fix double free crash
>
>  Various fixes to "diff --color-moved-ws".
>
>  What's the status of this topic?

Per [1] ("The whole series is
Reviewed-by: Stefan Beller "),
I would suggest merging to 'next'.

[1] 
https://public-inbox.org/git/CAGZ79kbamUK=d+-ejy9vopdivzf7ovongz1zx9y04vr3hnm...@mail.gmail.com/

> * sb/strbuf-h-update (2018-09-29) 1 commit
>  - strbuf.h: format according to coding guidelines
>
>  Code clean-up to serve as a BCP example.
>
>  What's the status of this one after the discussion thread stopped here?
>  cf. 

I was waiting for more discussion and stricter guidelines,
which never happened.

The only controversial issue about this patch is whether we want
to name all parameters or only when we feel like it.

Peff did not seem to care about this particular detail
https://public-inbox.org/git/20180929073827.gd2...@sigill.intra.peff.net/

You suggested to embrace it further and use caps for the parameter
names in the docs comment.
https://public-inbox.org/git/xmqq8t3lb8uu@gitster-ct.c.googlers.com/

The patch as-is just adds names everywhere.
I'd be happy to resend with either
(a) not enforcing names everywhere, but only as needed or
(b) having names everywhere, capitalizing them NAMES in
the doc comment.

I am tempted to ask for
(c) take as-is, defer the rewording of doc strings for a follow up patch.

> * sb/grep-submodule-cleanup (2018-10-10) 1 commit
>  - builtin/grep.c: remove superfluous submodule code
>
>  Code clean-up.
>
>  cf. <20181010001037.74709-1-jonathanta...@google.com>

Will resend.


> * bw/submodule-name-to-dir (2018-08-10) 2 commits
>  - submodule: munge paths to submodule git directories
>  - submodule: create helper to build paths to submodule gitdirs
>
>  In modern repository layout, the real body of a cloned submodule
>  repository is held in .git/modules/ of the superproject, indexed by
>  the submodule name.  URLencode the submodule name before computing
>  the name of the directory to make sure they form a flat namespace.
>
>  Kicked back to 'pu', expecting further work on the topic.
>  cf. 

Thanks.

>
> * sb/submodule-move-head-with-corruption (2018-08-28) 2 commits
>  - submodule.c: warn about missing submodule git directories
>  - t2013: add test for missing but active submodule
>
>  Will discard and wait for a cleaned-up rewrite.
>  cf. <20180907195349.ga103...@aiede.svl.corp.google.com>

Yeah I think discarding this is the right move.

> * sb/submodule-recursive-fetch-gets-the-tip (2018-09-12) 9 commits
>  - builtin/fetch: check for submodule updates for non branch fetches
>  - fetch: retry fetching submodules if sha1 were not fetched
>  - submodule: fetch in submodules git directory instead of in worktree
>  - submodule.c: do not copy around submodule list
>  - submodule: move global changed_submodule_names into fetch submodule struct
>  - submodule.c: sort changed_submodule_names before searching it
>  - submodule.c: fix indentation
>  - sha1-array: provide oid_array_filter
>  - string-list: add string_list_{pop, last} functions
>
>  "git fetch --recurse-submodules" may not fetch the necessary commit
>  that is bound to the superproject, which is getting corrected.
>
>  Expecting a reroll.
>  cf. 

is fixed in
https://public-inbox.org/git/20180917213559.126404-7-sbel...@google.com/

>  cf. 

That is fixed locally

>  cf. 

That has been addressed via
https://public-inbox.org/git/20180925194755.105578-1-sbel...@google.com/

Will resend after a local review.

> * pk/rebase-in-c-6-final (2018-10-09) 1 commit
>  - rebase: default to using the builtin rebase
>  (this branch uses ag/rebase-i-in-c, 
> js/rebase-in-c-5.5-work-with-rebase-i-in-c, pk/rebase-in-c, 
> pk/rebase-in-c-2-basic, pk/rebase-in-c-3-acts, pk/rebase-in-c-4-opts and 
> pk/rebase-in-c-5-test; is tangled with ag/sequencer-reduce-rewriting-todo, 
> jc/rebase-in-c-5-test-typofix and js/rebase-i-break.)
>
>  The final step of rewriting "rebase -i" in C.
>
>  Undecided.
>  I've been using this (i.e. the whole "rebase -i" and "rebase"
>  rewritten in C) in my personal build, and I also know users on
>  Windows port have been using it with the last feature release.  I
>  am tempted to merge the whole thing to 'next' soonish.
>
>  Opinions?  It's the last chance to remove any existing and avoid
>  any future "oops, that was wrong, and here is a fix-up"
>  embarrassment in these topics.

Yes, please merge to next.

Stefan


`--rebase-merges' still failing badly

2018-10-10 Thread Michael Witten
On Wed, 10 Oct 2018 14:43:46 +0900, Junio wrote:

> We haven't seen  much complaints and breakages  reported against the
> two big "rewrite in C" topics  around "rebase"; perhaps it is a good
> time to merge  them to 'next' soonish  to cook them for  a few weeks
> before moving them to 'master'?

In my opinion, the `--rebase-merges' feature has been broken since the
beginning, and the builtin version should  be fixed before it is moved
ahead. In short: "labels" are brittle; see below for tests.

Also, here are some quick *additional* thoughts:

* Labels should be simply "r0", "r1", ... "rN".

  * The current, long label names are just cumbersome.
  * The embedded comments are already more than enough.
  * "r" is short for "revision" or "reset" or "remember", etc.
  * "r" is  located on a  QWERTY keyboard such that  it's very
easy to type "rN", where "N" is a number.

* Why is the command "label" and not "branch"? Every other related
  command looks  like a normal  git command: "reset"  and "merge".
  Make it "branch".

* In my experience, there's a lot of this boiler plate:

  pick 12345
  label r1
  reset r0
  merge r1

  How about instead, use git's existing ideas:

  pick 12345
  reset r0
  merge ORIG_HEAD

  Or, maybe git in general  should treat `-' as `ORIG_HEAD' (which
  would be similar to how `git checkout' understands `-'), thereby
  allowing a very quick idiomatic string of commands:

  pick 12345
  reset r0
  merge -

  In truth, I don't really know the semantics of `ORIG_HEAD', so
  maybe those should be nailed down and documented more clearly;
  I would like it to work as in the following:

  pick 12345
 # label r1 (pretend)
  reset r0   # Store r1 in ORIG_HEAD
  pick 67890 # Do NOT touch ORIG_HEAD
  merge -# Same as merge -C abcde r1

  Anyway, this  kind of unspoken  behavior would make  *writing* a
  new history by hand much more pleasant.

* Why not just `--merges' instead of `--rebase-merges'? Or, better
  yet,  just make  it  the default  behavior;  the special  option
  should instead be:

  --flatten

  This option would simply tell `git rebase' to prepare an initial
  todo list without merges.

Thanks for this great feature.

I'm only complaining so much because it's such a useful feature, and I
want it  to be  even better, because  I'll  probably use it  A LOT; it
should have been available since the start as a natural consequence of
the way git works.

Sincerely,
Michael Witten

---

Unfortunately,   both  the   legacy   version  and   the  rewrite   of
`--rebase-merges'  display  a  bug  that  makes  this  feature  fairly
unusable in  practice; it tries  to create  a "label" (i.e.,  a branch
name) from a commit log summary  line, and the result is often invalid
(or just  plain irritating to work  with). In particular, it  fails on
typical characters, including at least these:

:/\?.*[]

To see this, first define some POSIX shell functions:

test()
{
(
set -e
summary=$1
d=/tmp/repo # WARNING. CHANGE IF NECESSARY.
rm -rf "$d"; mkdir -p "$d"; cd "$d"
git init -q
echo a > a; git add a; git commit -q -m a
git branch base
echo b > b; git add b; git commit -q -m b
git reset -q --hard HEAD^
git merge -q --no-ff -m "$summary" ORIG_HEAD
git log --graph --oneline
git rebase --rebase-merges base
); status=$?
echo
return "$status"
}

Test()
{
if test "$@" 1>/dev/null 2>&1; then
echo 'good'; return 0
else
echo 'fail'; return 1
fi
}

Then, try various commit summaries (see below for results):

test c
test 'combine these into a merge: a and b'
Test ab:
Test a:b
Test :
Test a/b
Test 'Now supports /regex/'
Test ab/
Test /ab
Test /
Test 'a\b'
Test '\'
Test 'Maybe this works?'
Test '?'
Test 'This does not work.'
Test 'This works. Strange!'
Test .git
Test .
Test 'Cast each pointer to *void'
Test '*'
Test 'return a[1] not a[0]'
Test '[ does not work'
Test '['
Test '] does work'
Test ']'

Here are the results of pasting the above commands into my terminal:

$ test c
warning: templates not found in ../install/share/git-core/templates
*   1992d07 (HEAD -> master) c
|\
| * 34555b5 b
|/
* 338db9b (base) a
Successfully rebased and updated refs/heads/master.

$ test 'combine these into a merge: a and b'
warning: templates not found in ../install/share/git-core/templates
*   4202c49 (HEAD -> master) combine these into a merge: a and b

Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> I'm just saying it's hard in this case to remove one piece without the
>> whole Jenga tower collapsing, and it's probably a good idea in some of
>> these cases to pester the user about what he wants, but probably not via
>> gc --auto emitting the same warning every time, e.g. in one of these
>> threads I suggested maybe "git status" should emit this.
>
> I have to say, I don't have a lot of sympathy for this.
>
> I've been running with the patches I sent before for a while now, and
> the behavior that they create is great.  I think we can make further
> refinements on top.  To put it another way, I haven't actually
> experienced any bad knock-on effects, and I think other feature
> requests can be addressed separately.
>
> I do have sympathy for some wishes for changes to "git gc --auto"
> behavior (I think it should be synchronous regardless of config and
> the asynchrony should move to being requested explicitly through a
> command line option by the callers within Git) but I don't understand
> why this holds up a change that IMHO is wholly positive for users.
>
> To put it another way, I am getting the feeling that the objections to
> that series were theoretical, while the practical benefits of the
> patch are pretty immediate and real.  I'm happy to help anyone who
> wants to polish it but time has shown no one is working on that, so...

[I wrote this before seeing Jeff's reply, but just to bo clear...]

Yes, like Jeff says I'm not referring to your gitster/jn/gc-auto with
this "Jenga tower" comment.

Re that patch: I've said what I think about tools printing error
messages saying "I can't do stuff" while not returning a non-zero exit
code, so I won't repeat that here. But whatever anyone thinks of that
it's ultimately a rather trivial detail, and doesn't have any knock-on
effects on the rest of git-gc behavior.

I'm talking about the "gc: do not warn about too many loose objects"
patch and similar approaches. FWIW what I'm describing in
<878t36f3ed@evledraar.gmail.com> isn't some theoretical concern. In
some large repositories at work that experience a lot of branch churn
and have fetch.prune / fetch.pruneTags turned on active checkouts very
quickly get to the default 6700 limit.

I've currently found that gc.pruneExpire=4.days.ago is close to a sweet
spot of avoiding that issue for now, while not e.g. gc-ing a loose
object someone committed on Friday before the same time on Monday, but
before I tweaked that, but with the default of 2.weeks we'd much more
regularly see the problem described in [1].

But as noted in the various GC threads linked from this one that sort of
solution within the confines of the current implementation and
configuration promises we've made, which lead to all sorts of stupidity.

1. https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/


Re: Git svn bug on merging svn branches

2018-10-10 Thread Andreas Heiduk
Hello,

Am 10.10.2018 um 01:38 schrieb Артем Семенов:
> Hello.
> 
> Git svn bug on merging svn branches:
> 
> Svn repository (branches tag trunk).
> 
> 1. Add a some file by svn tools.
> 2. Create a new branch by svn tools (e.g. br1) .
> 3. Create a new branch by svn tools on branch br1 (e.g. br2).
> 4. Add some changes to file f1 in branch br1. Commit by svn tools.
> 5. Clone repository by git svn.
> 6. Create two local branches – br1_svn (on origin/br1) and br2_svn (on
> origin/br2);
> 7. Checkout to br2_svn. Add some changes (e.g add file f2). Execute git
> add, git commit.
> 8. Execute “git merge br1_svn”.
> 9. Checkout to br1_svn.
> 10. Execute “git svn info” - URL refers to br1. (URL:
> https://127.0.0.1/svn/branchtest/branches/br1)
> 11. Execute “git merge br2_svn”.
> 12. Execute “git svn info” - URL refers to br2. (URL:
> https://127.0.0.1/svn/branchtest/branches/br2)

The "CAVEAT" section in the git-svn manual already contains some text about
your case:

   If you do merge, note the following rule: git svn dcommit will attempt
   to commit on top of the SVN commit named in

   git log --grep=^git-svn-id: --first-parent -1

   You must therefore ensure that the most recent commit of the branch you
   want to dcommit to is the first parent of the merge. Chaos will ensue
   otherwise, especially if the first parent is an older commit on the
   same SVN branch.

The paragraphs before these lines give more reasons to avoid a non-linear
history in SVN branches.

Best regards
Andreas Heiduk


Re: [PATCH 0/2] branch: introduce --current display option

2018-10-10 Thread Stefan Beller
> I'd be happy to submit a documentation patch for discussion that
> formally moves rev-parse to plumbing.

I'd be happy to see such a patch.


Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change

2018-10-10 Thread Stefan Beller
On Wed, Oct 10, 2018 at 8:26 AM Phillip Wood  wrote:
>
> On 09/10/2018 22:10, Stefan Beller wrote:
> >> As I said above I've more or less come to the view that the correctness
> >> of pythonic indentation is orthogonal to move detection as it affects
> >> all additions, not just those that correspond to moved lines.
> >
> > Makes sense.
>
> Right so are you happy for we to re-roll with a single
> allow-indentation-change mode based on my RFC?

I am happy with that.


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 09:51:52AM -0700, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
> 
> > I'm just saying it's hard in this case to remove one piece without the
> > whole Jenga tower collapsing, and it's probably a good idea in some of
> > these cases to pester the user about what he wants, but probably not via
> > gc --auto emitting the same warning every time, e.g. in one of these
> > threads I suggested maybe "git status" should emit this.
> 
> I have to say, I don't have a lot of sympathy for this.
> 
> I've been running with the patches I sent before for a while now, and
> the behavior that they create is great.  I think we can make further
> refinements on top.  To put it another way, I haven't actually
> experienced any bad knock-on effects, and I think other feature
> requests can be addressed separately.

I think there may be some miscommunication here. The Jenga tower above
is referring (I think) to Jonathan Tan's original patch to drop the
warning entirely, which does have some unwanted side effects.

Your patches are much less controversial, I think, and are in next and
marked as "will merge to master" in the last "what's cooking".

-Peff


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> I'm just saying it's hard in this case to remove one piece without the
> whole Jenga tower collapsing, and it's probably a good idea in some of
> these cases to pester the user about what he wants, but probably not via
> gc --auto emitting the same warning every time, e.g. in one of these
> threads I suggested maybe "git status" should emit this.

I have to say, I don't have a lot of sympathy for this.

I've been running with the patches I sent before for a while now, and
the behavior that they create is great.  I think we can make further
refinements on top.  To put it another way, I haven't actually
experienced any bad knock-on effects, and I think other feature
requests can be addressed separately.

I do have sympathy for some wishes for changes to "git gc --auto"
behavior (I think it should be synchronous regardless of config and
the asynchrony should move to being requested explicitly through a
command line option by the callers within Git) but I don't understand
why this holds up a change that IMHO is wholly positive for users.

To put it another way, I am getting the feeling that the objections to
that series were theoretical, while the practical benefits of the
patch are pretty immediate and real.  I'm happy to help anyone who
wants to polish it but time has shown no one is working on that, so...

Thanks,
Jonathan


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Martin Langhoff
On Wed, Oct 10, 2018 at 8:21 AM Junio C Hamano  wrote:
> We probably can keep the "let's not run for a day" safety while
> pretending that "git gc -auto" succeeded for callers like "git svn"
> so that these callers do not hae to do "eval { ... }" to hide our
> exit code, no?
>
> I think that is what Jonathan's patch (jn/gc-auto) does.

+1

`--auto` means "DTRT, but remember you're running as part of a larger
process; don't error out unless it's critical".

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: [PATCH 0/2] branch: introduce --current display option

2018-10-10 Thread Daniels Umanovskis
On 10/10/18 5:03 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> I'm mildly negative on this because git-rev-parse is plumbing, but
> git-branch is porcelain [..]
> 
> We also list git-rev-parse as porcelain, just under "Porcelain / Ancillary
> Commands / Interrogators".
> 
> Should we just move it to plumbing? I don't know.

>From my perspective as a Git user, not developer, git-rev-parse is
between plumbing and porcelain, but much more plumbing. It's listed as
porcelain but is connected to the plumbing git-rev-list, and for the
most part it does things incomprehensible without understanding Git
internals. Then it also has a bunch of options that are very useful in
scripts but unrelated to revisions, here I mean --git-dir or
--is-inside-work-tree.

I'd be happy to submit a documentation patch for discussion that
formally moves rev-parse to plumbing.

> Also, as much as our current scripting interface can be very confusing
> (you might not think "get current branch" is under rev-parse), I can't
> help but think that adding two different ways to spew out the exact same
> thing to two different commands is heading in the wrong
> direction.

Agreed, so I'm very much inclined to move forward with Junio's preferred
solution on this, which would also act differently by only outputting
the branch when you're really on a branch, and being silent in e.g.
detached HEAD.


[PATCH v8 6/7] ieot: add Index Entry Offset Table (IEOT) extension

2018-10-10 Thread Ben Peart
From: Ben Peart 

This patch enables addressing the CPU cost of loading the index by adding
additional data to the index that will allow us to efficiently multi-
thread the loading and conversion of cache entries.

It accomplishes this by adding an (optional) index extension that is a
table of offsets to blocks of cache entries in the index file.  To make
this work for V4 indexes, when writing the cache entries, it periodically
"resets" the prefix-compression by encoding the current entry as if the
path name for the previous entry is completely different and saves the
offset of that entry in the IEOT.  Basically, with V4 indexes, it
generates offsets into blocks of prefix-compressed entries.

Signed-off-by: Ben Peart 
---
 Documentation/technical/index-format.txt |  18 +++
 read-cache.c | 196 ++-
 2 files changed, 211 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index 6bc2d90f7f..7c4d67aa6a 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -337,3 +337,21 @@ The remaining data of each directory block is grouped by 
type:
 
SHA-1("TREE" +  +
"REUC" + )
+
+== Index Entry Offset Table
+
+  The Index Entry Offset Table (IEOT) is used to help address the CPU
+  cost of loading the index by enabling multi-threading the process of
+  converting cache entries from the on-disk format to the in-memory format.
+  The signature for this extension is { 'I', 'E', 'O', 'T' }.
+
+  The extension consists of:
+
+  - 32-bit version (currently 1)
+
+  - A number of index offset entries each consisting of:
+
+- 32-bit offset from the begining of the file to the first cache entry
+   in this block of entries.
+
+- 32-bit count of cache entries in this block
diff --git a/read-cache.c b/read-cache.c
index 2214b3153d..3ace29d58f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -45,6 +45,7 @@
 #define CACHE_EXT_UNTRACKED 0x554E5452   /* "UNTR" */
 #define CACHE_EXT_FSMONITOR 0x46534D4E   /* "FSMN" */
 #define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945 /* "EOIE" */
+#define CACHE_EXT_INDEXENTRYOFFSETTABLE 0x49454F54 /* "IEOT" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1696,6 +1697,7 @@ static int read_index_extension(struct index_state 
*istate,
read_fsmonitor_extension(istate, data, sz);
break;
case CACHE_EXT_ENDOFINDEXENTRIES:
+   case CACHE_EXT_INDEXENTRYOFFSETTABLE:
/* already handled in do_read_index() */
break;
default:
@@ -1888,6 +1890,23 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
return ondisk_size + entries * per_entry;
 }
 
+struct index_entry_offset
+{
+   /* starting byte offset into index file, count of index entries in this 
block */
+   int offset, nr;
+};
+
+struct index_entry_offset_table
+{
+   int nr;
+   struct index_entry_offset entries[FLEX_ARRAY];
+};
+
+#ifndef NO_PTHREADS
+static struct index_entry_offset_table *read_ieot_extension(const char *mmap, 
size_t mmap_size, size_t offset);
+static void write_ieot_extension(struct strbuf *sb, struct 
index_entry_offset_table *ieot);
+#endif
+
 static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, size_t offset);
 
@@ -1929,6 +1948,15 @@ static void *load_index_extensions(void *_data)
return NULL;
 }
 
+/*
+ * Mostly randomly chosen maximum thread counts: we
+ * cap the parallelism to online_cpus() threads, and we want
+ * to have at least 1 cache entries per thread for it to
+ * be worth starting a thread.
+ */
+
+#define THREAD_COST(1)
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -2521,6 +2549,9 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
int drop_cache_tree = istate->drop_cache_tree;
off_t offset;
+   int ieot_blocks = 1;
+   struct index_entry_offset_table *ieot = NULL;
+   int nr, nr_threads;
 
for (i = removed = extended = 0; i < entries; i++) {
if (cache[i]->ce_flags & CE_REMOVE)
@@ -2554,10 +2585,44 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
if (ce_write(, newfd, , sizeof(hdr)) < 0)
return -1;
 
+#ifndef NO_PTHREADS
+   nr_threads = git_config_get_index_threads();
+   if (nr_threads != 1) {
+   int ieot_blocks, cpus;
+
+   /*
+* ensure default number of ieot blocks maps evenly to the
+ 

[PATCH v8 5/7] read-cache: load cache extensions on a worker thread

2018-10-10 Thread Ben Peart
From: Ben Peart 

This patch helps address the CPU cost of loading the index by loading
the cache extensions on a worker thread in parallel with loading the cache
entries.

In some cases, loading the extensions takes longer than loading the
cache entries so this patch utilizes the new EOIE to start the thread to
load the extensions before loading all the cache entries in parallel.

This is possible because the current extensions don't access the cache
entries in the index_state structure so are OK that they don't all exist
yet.

The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
extensions don't even get a pointer to the index so don't have access to the
cache entries.

CACHE_EXT_LINK only uses the index_state to initialize the split index.
CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
update and dirty flags.

I used p0002-read-cache.sh to generate some performance data:

Test w/100,000 files reduced the time by 0.53%
Test w/1,000,000 files reduced the time by 27.78%

Signed-off-by: Ben Peart 
---
 read-cache.c | 95 +++-
 1 file changed, 79 insertions(+), 16 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 4781515252..2214b3153d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -23,6 +23,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "fsmonitor.h"
+#include "thread-utils.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1890,6 +1891,44 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
 static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, size_t offset);
 
+struct load_index_extensions
+{
+#ifndef NO_PTHREADS
+   pthread_t pthread;
+#endif
+   struct index_state *istate;
+   const char *mmap;
+   size_t mmap_size;
+   unsigned long src_offset;
+};
+
+static void *load_index_extensions(void *_data)
+{
+   struct load_index_extensions *p = _data;
+   unsigned long src_offset = p->src_offset;
+
+   while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
+   /* After an array of active_nr index entries,
+* there can be arbitrary number of extended
+* sections, each of which is prefixed with
+* extension name (4-byte) and section length
+* in 4-byte network byte order.
+*/
+   uint32_t extsize = get_be32(p->mmap + src_offset + 4);
+   if (read_index_extension(p->istate,
+p->mmap + src_offset,
+p->mmap + src_offset + 8,
+extsize) < 0) {
+   munmap((void *)p->mmap, p->mmap_size);
+   die(_("index file corrupt"));
+   }
+   src_offset += 8;
+   src_offset += extsize;
+   }
+
+   return NULL;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1900,6 +1939,11 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
const char *mmap;
size_t mmap_size;
const struct cache_entry *previous_ce = NULL;
+   struct load_index_extensions p;
+   size_t extension_offset = 0;
+#ifndef NO_PTHREADS
+   int nr_threads;
+#endif
 
if (istate->initialized)
return istate->cache_nr;
@@ -1936,6 +1980,30 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
istate->initialized = 1;
 
+   p.istate = istate;
+   p.mmap = mmap;
+   p.mmap_size = mmap_size;
+
+#ifndef NO_PTHREADS
+   nr_threads = git_config_get_index_threads();
+   if (!nr_threads)
+   nr_threads = online_cpus();
+
+   if (nr_threads > 1) {
+   extension_offset = read_eoie_extension(mmap, mmap_size);
+   if (extension_offset) {
+   int err;
+
+   p.src_offset = extension_offset;
+   err = pthread_create(, NULL, 
load_index_extensions, );
+   if (err)
+   die(_("unable to create load_index_extensions 
thread: %s"), strerror(err));
+
+   nr_threads--;
+   }
+   }
+#endif
+
if (istate->version == 4) {
mem_pool_init(>ce_mem_pool,
  
estimate_cache_size_from_compressed(istate->cache_nr));
@@ -1960,22 +2028,17 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
istate->timestamp.sec = st.st_mtime;
istate->timestamp.nsec = ST_MTIME_NSEC(st);
 

[PATCH v8 7/7] read-cache: load cache entries on worker threads

2018-10-10 Thread Ben Peart
From: Ben Peart 

This patch helps address the CPU cost of loading the index by utilizing
the Index Entry Offset Table (IEOT) to divide loading and conversion of
the cache entries across multiple threads in parallel.

I used p0002-read-cache.sh to generate some performance data:

Test w/100,000 files reduced the time by 32.24%
Test w/1,000,000 files reduced the time by -4.77%

Note that on the 1,000,000 files case, multi-threading the cache entry parsing
does not yield a performance win.  This is because the cost to parse the
index extensions in this repo, far outweigh the cost of loading the cache
entries.

The high cost of parsing the index extensions is driven by the cache tree
and the untracked cache extensions. As this is currently the longest pole,
any reduction in this time will reduce the overall index load times so is
worth further investigation in another patch series.

Signed-off-by: Ben Peart 
---
 read-cache.c | 230 ++-
 1 file changed, 193 insertions(+), 37 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 3ace29d58f..7acc2c86f4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1720,7 +1720,8 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *create_from_disk(struct index_state *istate,
+static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
+   unsigned int version,
struct ondisk_cache_entry *ondisk,
unsigned long *ent_size,
const struct cache_entry 
*previous_ce)
@@ -1737,7 +1738,7 @@ static struct cache_entry *create_from_disk(struct 
index_state *istate,
 * number of bytes to be stripped from the end of the previous name,
 * and the bytes to append to the result, to come up with its name.
 */
-   int expand_name_field = istate->version == 4;
+   int expand_name_field = version == 4;
 
/* On-disk flags are just 16 bits */
flags = get_be16(>flags);
@@ -1761,16 +1762,17 @@ static struct cache_entry *create_from_disk(struct 
index_state *istate,
const unsigned char *cp = (const unsigned char *)name;
size_t strip_len, previous_len;
 
-   previous_len = previous_ce ? previous_ce->ce_namelen : 0;
+   /* If we're at the begining of a block, ignore the previous 
name */
strip_len = decode_varint();
-   if (previous_len < strip_len) {
-   if (previous_ce)
+   if (previous_ce) {
+   previous_len = previous_ce->ce_namelen;
+   if (previous_len < strip_len)
die(_("malformed name field in the index, near 
path '%s'"),
-   previous_ce->name);
-   else
-   die(_("malformed name field in the index in the 
first path"));
+   previous_ce->name);
+   copy_len = previous_len - strip_len;
+   } else {
+   copy_len = 0;
}
-   copy_len = previous_len - strip_len;
name = (const char *)cp;
}
 
@@ -1780,7 +1782,7 @@ static struct cache_entry *create_from_disk(struct 
index_state *istate,
len += copy_len;
}
 
-   ce = mem_pool__ce_alloc(istate->ce_mem_pool, len);
+   ce = mem_pool__ce_alloc(ce_mem_pool, len);
 
ce->ce_stat_data.sd_ctime.sec = get_be32(>ctime.sec);
ce->ce_stat_data.sd_mtime.sec = get_be32(>mtime.sec);
@@ -1948,6 +1950,52 @@ static void *load_index_extensions(void *_data)
return NULL;
 }
 
+/*
+ * A helper function that will load the specified range of cache entries
+ * from the memory mapped file and add them to the given index.
+ */
+static unsigned long load_cache_entry_block(struct index_state *istate,
+   struct mem_pool *ce_mem_pool, int offset, int nr, const 
char *mmap,
+   unsigned long start_offset, const struct cache_entry 
*previous_ce)
+{
+   int i;
+   unsigned long src_offset = start_offset;
+
+   for (i = offset; i < offset + nr; i++) {
+   struct ondisk_cache_entry *disk_ce;
+   struct cache_entry *ce;
+   unsigned long consumed;
+
+   disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
+   ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, 
, previous_ce);
+   set_index_entry(istate, i, ce);
+
+   src_offset += consumed;
+   previous_ce = ce;
+   }
+   return src_offset - start_offset;
+}
+
+static unsigned long load_all_cache_entries(struct index_state 

[PATCH v8 4/7] config: add new index.threads config setting

2018-10-10 Thread Ben Peart
From: Ben Peart 

Add support for a new index.threads config setting which will be used to
control the threading code in do_read_index().  A value of 0 will tell the
index code to automatically determine the correct number of threads to use.
A value of 1 will make the code single threaded.  A value greater than 1
will set the maximum number of threads to use.

For testing purposes, this setting can be overwritten by setting the
GIT_TEST_INDEX_THREADS= environment variable to a value greater than 0.

Signed-off-by: Ben Peart 
---
 Documentation/config.txt |  7 +++
 config.c | 18 ++
 config.h |  1 +
 t/README |  5 +
 t/t1700-split-index.sh   |  5 +
 5 files changed, 36 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ad0f4510c3..8fd973b76b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2413,6 +2413,13 @@ imap::
The configuration variables in the 'imap' section are described
in linkgit:git-imap-send[1].
 
+index.threads::
+   Specifies the number of threads to spawn when loading the index.
+   This is meant to reduce index load time on multiprocessor machines.
+   Specifying 0 or 'true' will cause Git to auto-detect the number of
+   CPU's and set the number of threads accordingly. Specifying 1 or
+   'false' will disable multithreading. Defaults to 'true'.
+
 index.version::
Specify the version with which new index files should be
initialized.  This does not affect existing repositories.
diff --git a/config.c b/config.c
index 3461993f0a..2ee29f6f86 100644
--- a/config.c
+++ b/config.c
@@ -2289,6 +2289,24 @@ int git_config_get_fsmonitor(void)
return 0;
 }
 
+int git_config_get_index_threads(void)
+{
+   int is_bool, val = 0;
+
+   val = git_env_ulong("GIT_TEST_INDEX_THREADS", 0);
+   if (val)
+   return val;
+
+   if (!git_config_get_bool_or_int("index.threads", _bool, )) {
+   if (is_bool)
+   return val ? 0 : 1;
+   else
+   return val;
+   }
+
+   return 0; /* auto */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
diff --git a/config.h b/config.h
index ab46e0165d..a06027e69b 100644
--- a/config.h
+++ b/config.h
@@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
+extern int git_config_get_index_threads(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/t/README b/t/README
index 3ea6c85460..8f5c0620ea 100644
--- a/t/README
+++ b/t/README
@@ -327,6 +327,11 @@ GIT_TEST_COMMIT_GRAPH=, when true, forces the 
commit-graph to
 be written after every 'git commit' command, and overrides the
 'core.commitGraph' setting to true.
 
+GIT_TEST_INDEX_THREADS= enables exercising the multi-threaded loading
+of the index for the whole test suite by bypassing the default number of
+cache entries and thread minimums. Setting this to 1 will make the
+index loading single threaded.
+
 Naming Tests
 
 
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 8e17f8e7a0..ef9349bd70 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -6,7 +6,12 @@ test_description='split index mode tests'
 
 # We need total control of index splitting here
 sane_unset GIT_TEST_SPLIT_INDEX
+
+# Testing a hard coded SHA against an index with an extension
+# that can vary from run to run is problematic so we disable
+# those extensions.
 sane_unset GIT_FSMONITOR_TEST
+sane_unset GIT_TEST_INDEX_THREADS
 
 test_expect_success 'enable split index' '
git config splitIndex.maxPercentChange 100 &&
-- 
2.18.0.windows.1



[PATCH v8 2/7] read-cache: clean up casting and byte decoding

2018-10-10 Thread Ben Peart
From: Ben Peart 

This patch does a clean up pass to minimize the casting required to work
with the memory mapped index (mmap).

It also makes the decoding of network byte order more consistent by using
get_be32() where possible.

Signed-off-by: Ben Peart 
---
 read-cache.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 583a4fb1f8..6ba99e2c96 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1650,7 +1650,7 @@ int verify_index_checksum;
 /* Allow fsck to force verification of the cache entry order. */
 int verify_ce_order;
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
git_hash_ctx c;
unsigned char hash[GIT_MAX_RAWSZ];
@@ -1674,7 +1674,7 @@ static int verify_hdr(struct cache_header *hdr, unsigned 
long size)
 }
 
 static int read_index_extension(struct index_state *istate,
-   const char *ext, void *data, unsigned long sz)
+   const char *ext, const char *data, unsigned 
long sz)
 {
switch (CACHE_EXT(ext)) {
case CACHE_EXT_TREE:
@@ -1889,8 +1889,8 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
int fd, i;
struct stat st;
unsigned long src_offset;
-   struct cache_header *hdr;
-   void *mmap;
+   const struct cache_header *hdr;
+   const char *mmap;
size_t mmap_size;
const struct cache_entry *previous_ce = NULL;
 
@@ -1918,7 +1918,7 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
die_errno("unable to map index file");
close(fd);
 
-   hdr = mmap;
+   hdr = (const struct cache_header *)mmap;
if (verify_hdr(hdr, mmap_size) < 0)
goto unmap;
 
@@ -1943,7 +1943,7 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
struct cache_entry *ce;
unsigned long consumed;
 
-   disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
src_offset);
+   disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
ce = create_from_disk(istate, disk_ce, , previous_ce);
set_index_entry(istate, i, ce);
 
@@ -1961,21 +1961,20 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
 * in 4-byte network byte order.
 */
uint32_t extsize;
-   memcpy(, (char *)mmap + src_offset + 4, 4);
-   extsize = ntohl(extsize);
+   extsize = get_be32(mmap + src_offset + 4);
if (read_index_extension(istate,
-(const char *) mmap + src_offset,
-(char *) mmap + src_offset + 8,
+mmap + src_offset,
+mmap + src_offset + 8,
 extsize) < 0)
goto unmap;
src_offset += 8;
src_offset += extsize;
}
-   munmap(mmap, mmap_size);
+   munmap((void *)mmap, mmap_size);
return istate->cache_nr;
 
 unmap:
-   munmap(mmap, mmap_size);
+   munmap((void *)mmap, mmap_size);
die("index file corrupt");
 }
 
-- 
2.18.0.windows.1



[PATCH v8 1/7] read-cache.c: optimize reading index format v4

2018-10-10 Thread Ben Peart
From: Nguyễn Thái Ngọc Duy 

Index format v4 requires some more computation to assemble a path
based on a previous one. The current code is not very efficient
because

 - it doubles memory copy, we assemble the final path in a temporary
   first before putting it back to a cache_entry

 - strbuf_remove() in expand_name_field() is not exactly a good fit
   for stripping a part at the end, _setlen() would do the same job
   and is much cheaper.

 - the open-coded loop to find the end of the string in
   expand_name_field() can't beat an optimized strlen()

This patch avoids the temporary buffer and writes directly to the new
cache_entry, which addresses the first two points. The last point
could also be avoided if the total string length fits in the first 12
bits of ce_flags, if not we fall back to strlen().

Running "test-tool read-cache 100" on webkit.git (275k files), reading
v2 only takes 4.226 seconds, while v4 takes 5.711 seconds, 35% more
time. The patch reduces read time on v4 to 4.319 seconds.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 read-cache.c | 128 ---
 1 file changed, 60 insertions(+), 68 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 8d04d78a58..583a4fb1f8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1713,63 +1713,24 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool,
-  struct ondisk_cache_entry 
*ondisk,
-  unsigned int flags,
-  const char *name,
-  size_t len)
-{
-   struct cache_entry *ce = mem_pool__ce_alloc(mem_pool, len);
-
-   ce->ce_stat_data.sd_ctime.sec = get_be32(>ctime.sec);
-   ce->ce_stat_data.sd_mtime.sec = get_be32(>mtime.sec);
-   ce->ce_stat_data.sd_ctime.nsec = get_be32(>ctime.nsec);
-   ce->ce_stat_data.sd_mtime.nsec = get_be32(>mtime.nsec);
-   ce->ce_stat_data.sd_dev   = get_be32(>dev);
-   ce->ce_stat_data.sd_ino   = get_be32(>ino);
-   ce->ce_mode  = get_be32(>mode);
-   ce->ce_stat_data.sd_uid   = get_be32(>uid);
-   ce->ce_stat_data.sd_gid   = get_be32(>gid);
-   ce->ce_stat_data.sd_size  = get_be32(>size);
-   ce->ce_flags = flags & ~CE_NAMEMASK;
-   ce->ce_namelen = len;
-   ce->index = 0;
-   hashcpy(ce->oid.hash, ondisk->sha1);
-   memcpy(ce->name, name, len);
-   ce->name[len] = '\0';
-   return ce;
-}
-
-/*
- * Adjacent cache entries tend to share the leading paths, so it makes
- * sense to only store the differences in later entries.  In the v4
- * on-disk format of the index, each on-disk cache entry stores the
- * number of bytes to be stripped from the end of the previous name,
- * and the bytes to append to the result, to come up with its name.
- */
-static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
-{
-   const unsigned char *ep, *cp = (const unsigned char *)cp_;
-   size_t len = decode_varint();
-
-   if (name->len < len)
-   die("malformed name field in the index");
-   strbuf_remove(name, name->len - len, len);
-   for (ep = cp; *ep; ep++)
-   ; /* find the end */
-   strbuf_add(name, cp, ep - cp);
-   return (const char *)ep + 1 - cp_;
-}
-
-static struct cache_entry *create_from_disk(struct mem_pool *mem_pool,
+static struct cache_entry *create_from_disk(struct index_state *istate,
struct ondisk_cache_entry *ondisk,
unsigned long *ent_size,
-   struct strbuf *previous_name)
+   const struct cache_entry 
*previous_ce)
 {
struct cache_entry *ce;
size_t len;
const char *name;
unsigned int flags;
+   size_t copy_len;
+   /*
+* Adjacent cache entries tend to share the leading paths, so it makes
+* sense to only store the differences in later entries.  In the v4
+* on-disk format of the index, each on-disk cache entry stores the
+* number of bytes to be stripped from the end of the previous name,
+* and the bytes to append to the result, to come up with its name.
+*/
+   int expand_name_field = istate->version == 4;
 
/* On-disk flags are just 16 bits */
flags = get_be16(>flags);
@@ -1789,21 +1750,54 @@ static struct cache_entry *create_from_disk(struct 
mem_pool *mem_pool,
else
name = ondisk->name;
 
-   if (!previous_name) {
-   /* v3 and earlier */
-   if (len == CE_NAMEMASK)
-   len = strlen(name);
-   ce = cache_entry_from_ondisk(mem_pool, 

[PATCH v8 3/7] eoie: add End of Index Entry (EOIE) extension

2018-10-10 Thread Ben Peart
From: Ben Peart 

The End of Index Entry (EOIE) is used to locate the end of the variable
length index entries and the beginning of the extensions. Code can take
advantage of this to quickly locate the index extensions without having
to parse through all of the index entries.

The EOIE extension is always written out to the index file including to
the shared index when using the split index feature. Because it is always
written out, the SHA checksums in t/t1700-split-index.sh were updated
to reflect its inclusion.

It is written as an optional extension to ensure compatibility with other
git implementations that do not yet support it.  It is always written out
to ensure it is available as often as possible to speed up index operations.

Because it must be able to be loaded before the variable length cache
entries and other index extensions, this extension must be written last.
The signature for this extension is { 'E', 'O', 'I', 'E' }.

The extension consists of:

- 32-bit offset to the end of the index entries

- 160-bit SHA-1 over the extension types and their sizes (but not
their contents).  E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:

SHA-1("TREE" +  +
"REUC" + )

Signed-off-by: Ben Peart 
---
 Documentation/technical/index-format.txt |  23 
 read-cache.c | 158 +--
 t/t1700-split-index.sh   |   8 +-
 3 files changed, 177 insertions(+), 12 deletions(-)

diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index db3572626b..6bc2d90f7f 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -314,3 +314,26 @@ The remaining data of each directory block is grouped by 
type:
 
   - An ewah bitmap, the n-th bit indicates whether the n-th index entry
 is not CE_FSMONITOR_VALID.
+
+== End of Index Entry
+
+  The End of Index Entry (EOIE) is used to locate the end of the variable
+  length index entries and the begining of the extensions. Code can take
+  advantage of this to quickly locate the index extensions without having
+  to parse through all of the index entries.
+
+  Because it must be able to be loaded before the variable length cache
+  entries and other index extensions, this extension must be written last.
+  The signature for this extension is { 'E', 'O', 'I', 'E' }.
+
+  The extension consists of:
+
+  - 32-bit offset to the end of the index entries
+
+  - 160-bit SHA-1 over the extension types and their sizes (but not
+   their contents).  E.g. if we have "TREE" extension that is N-bytes
+   long, "REUC" extension that is M-bytes long, followed by "EOIE",
+   then the hash would be:
+
+   SHA-1("TREE" +  +
+   "REUC" + )
diff --git a/read-cache.c b/read-cache.c
index 6ba99e2c96..4781515252 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -43,6 +43,7 @@
 #define CACHE_EXT_LINK 0x6c696e6b/* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452   /* "UNTR" */
 #define CACHE_EXT_FSMONITOR 0x46534D4E   /* "FSMN" */
+#define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945 /* "EOIE" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1693,6 +1694,9 @@ static int read_index_extension(struct index_state 
*istate,
case CACHE_EXT_FSMONITOR:
read_fsmonitor_extension(istate, data, sz);
break;
+   case CACHE_EXT_ENDOFINDEXENTRIES:
+   /* already handled in do_read_index() */
+   break;
default:
if (*ext < 'A' || 'Z' < *ext)
return error("index uses %.4s extension, which we do 
not understand",
@@ -1883,6 +1887,9 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
return ondisk_size + entries * per_entry;
 }
 
+static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
+static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, size_t offset);
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -2190,11 +2197,15 @@ static int ce_write(git_hash_ctx *context, int fd, void 
*data, unsigned int len)
return 0;
 }
 
-static int write_index_ext_header(git_hash_ctx *context, int fd,
- unsigned int ext, unsigned int sz)
+static int write_index_ext_header(git_hash_ctx *context, git_hash_ctx 
*eoie_context,
+ int fd, unsigned int ext, unsigned int sz)
 {
ext = htonl(ext);
sz = htonl(sz);
+   if (eoie_context) {
+   the_hash_algo->update_fn(eoie_context, , 4);
+   the_hash_algo->update_fn(eoie_context, , 4);
+   }
return 

[PATCH v8 0/7] speed up index load through parallelization

2018-10-10 Thread Ben Peart
From: Ben Peart 

Fixed issues identified in review the most impactful probably being plugging
some leaks and improved error handling.  Also added better error messages
and some code cleanup to code I'd touched.

The biggest change in the interdiff is the impact of renaming ieot_offset to
ieot_start and ieot_work to ieot_blocks in hopes of making it easier to read
and understand the code.

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/6caa0bac46
Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v8 
&& git checkout 6caa0bac46


### Interdiff (v7..v8):

diff --git a/read-cache.c b/read-cache.c
index 14402a0738..7acc2c86f4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1901,7 +1901,7 @@ struct index_entry_offset
 struct index_entry_offset_table
 {
int nr;
-   struct index_entry_offset entries[0];
+   struct index_entry_offset entries[FLEX_ARRAY];
 };
 
 #ifndef NO_PTHREADS
@@ -1935,9 +1935,7 @@ static void *load_index_extensions(void *_data)
 * extension name (4-byte) and section length
 * in 4-byte network byte order.
 */
-   uint32_t extsize;
-   memcpy(, p->mmap + src_offset + 4, 4);
-   extsize = ntohl(extsize);
+   uint32_t extsize = get_be32(p->mmap + src_offset + 4);
if (read_index_extension(p->istate,
 p->mmap + src_offset,
 p->mmap + src_offset + 8,
@@ -2015,8 +2013,8 @@ struct load_cache_entries_thread_data
int offset;
const char *mmap;
struct index_entry_offset_table *ieot;
-   int ieot_offset;/* starting index into the ieot array */
-   int ieot_work;  /* count of ieot entries to process */
+   int ieot_start; /* starting index into the ieot array */
+   int ieot_blocks;/* count of ieot entries to process */
unsigned long consumed; /* return # of bytes in index file processed */
 };
 
@@ -2030,8 +2028,9 @@ static void *load_cache_entries_thread(void *_data)
int i;
 
/* iterate across all ieot blocks assigned to this thread */
-   for (i = p->ieot_offset; i < p->ieot_offset + p->ieot_work; i++) {
-   p->consumed += load_cache_entry_block(p->istate, 
p->ce_mem_pool, p->offset, p->ieot->entries[i].nr, p->mmap, 
p->ieot->entries[i].offset, NULL);
+   for (i = p->ieot_start; i < p->ieot_start + p->ieot_blocks; i++) {
+   p->consumed += load_cache_entry_block(p->istate, p->ce_mem_pool,
+   p->offset, p->ieot->entries[i].nr, p->mmap, 
p->ieot->entries[i].offset, NULL);
p->offset += p->ieot->entries[i].nr;
}
return NULL;
@@ -2040,48 +2039,45 @@ static void *load_cache_entries_thread(void *_data)
 static unsigned long load_cache_entries_threaded(struct index_state *istate, 
const char *mmap, size_t mmap_size,
unsigned long src_offset, int nr_threads, struct 
index_entry_offset_table *ieot)
 {
-   int i, offset, ieot_work, ieot_offset, err;
+   int i, offset, ieot_blocks, ieot_start, err;
struct load_cache_entries_thread_data *data;
unsigned long consumed = 0;
-   int nr;
 
/* a little sanity checking */
if (istate->name_hash_initialized)
BUG("the name hash isn't thread safe");
 
mem_pool_init(>ce_mem_pool, 0);
-   data = xcalloc(nr_threads, sizeof(struct 
load_cache_entries_thread_data));
 
/* ensure we have no more threads than we have blocks to process */
if (nr_threads > ieot->nr)
nr_threads = ieot->nr;
-   data = xcalloc(nr_threads, sizeof(struct 
load_cache_entries_thread_data));
+   data = xcalloc(nr_threads, sizeof(*data));
 
-   offset = ieot_offset = 0;
-   ieot_work = DIV_ROUND_UP(ieot->nr, nr_threads);
+   offset = ieot_start = 0;
+   ieot_blocks = DIV_ROUND_UP(ieot->nr, nr_threads);
for (i = 0; i < nr_threads; i++) {
struct load_cache_entries_thread_data *p = [i];
-   int j;
+   int nr, j;
 
-   if (ieot_offset + ieot_work > ieot->nr)
-   ieot_work = ieot->nr - ieot_offset;
+   if (ieot_start + ieot_blocks > ieot->nr)
+   ieot_blocks = ieot->nr - ieot_start;
 
p->istate = istate;
p->offset = offset;
p->mmap = mmap;
p->ieot = ieot;
-   p->ieot_offset = ieot_offset;
-   p->ieot_work = ieot_work;
+   p->ieot_start = ieot_start;
+   p->ieot_blocks = ieot_blocks;
 
/* create a mem_pool for each thread */
nr = 0;
-   for (j = p->ieot_offset; j < p->ieot_offset + p->ieot_work; j++)
+   for (j = p->ieot_start; j < p->ieot_start + p->ieot_blocks; j++)
 

  1   2   >