Re: commit-graph is cool (overcoming add_missing_tags() perf issues)

2018-10-30 Thread Elijah Newren
On Tue, Oct 30, 2018 at 9:23 AM Ævar Arnfjörð Bjarmason
 wrote:
>
> On Wed, Oct 17, 2018 at 8:41 PM Elijah Newren  wrote:
> > (And in the mean time I gave the user a one-liner to nuke his
> > local-only tags that I suspect he doesn't need.)
>
> Just a note that you can usually set 'fetch.pruneTags=true' these days
> to make that happen.

TIL.  Thanks for the heads up.


Re: commit-graph is cool (overcoming add_missing_tags() perf issues)

2018-10-30 Thread Elijah Newren
On Tue, Oct 30, 2018 at 7:22 AM Derrick Stolee  wrote:
>
> On 10/17/2018 2:00 PM, Elijah Newren wrote:
> > Hi,
> >
> > Just wanted to give a shout-out for the commit-graph work and how
> > impressive it is.  I had an internal report from a user that git
> > pushes containing only one new tiny commit were taking over a minute
> > (in a moderate size repo with good network connectivity). After
> > digging for a while, I noticed three unusual things about the repo[1]:
> >* he had push.followTags set to true
> >* upstream repo had about 20k tags (despite only 55k commits)
> >* his repo had an additional 2.5k tags, but none of these were in
> >  the history of the branches he was pushing and thus would not be
> >  included in any pushes.
> >
> > Digging in, almost all the time was CPU-bound and spent in
> > add_missing_tags()[2].  If I'm reading the code correctly, it appears
> > that function loops over each tag, calling in_merge_bases_many() once
> > per tag.  Thus, for his case, we were potentially walking all of
> > history of the main branch 2.5k times.  That seemed rather suboptimal.
>
> Elijah,
>
> Do you still have this repo around? Could you by chance test the
> performance with the new algorithm for add_missing_tags() in [1]?
> Specifically, please test it without a commit-graph file, since your
> data shape already makes use of generation numbers pretty well.
>
> Thanks,
> -Stolee

I nuked it, but turns out I had a backup that I found after digging
around for a bit.  I'll post a comment on the series with the results.

By the way, I've been running pu for about a week with this tweak:

diff --git a/revision.c b/revision.c
index b5108b75ab..d20c687e71 100644
--- a/revision.c
+++ b/revision.c
@@ -1457,6 +1457,7 @@ void repo_init_revisions(struct repository *r,
revs->pruning.change = file_change;
revs->pruning.change_fn_data = revs;
revs->sort_order = REV_SORT_IN_GRAPH_ORDER;
+   revs->topo_order = 1;
revs->dense = 1;
revs->prefix = prefix;
revs->max_age = -1;

Only ran into one small problem once, and it wasn't commit-graph
related; rather it was related to my above patch and needing to not
have topo_order be set.  (I just bailed and used my older
system-installed git from /usr/bin/ in that one case.)  So, I think
the commit-graph stuff is looking pretty good, and I find the recent
thread on further improvements with corrected commit date (among other
possibilities) very intriguing...even if I haven't had much time to
comment or test recently.


Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-30 Thread Chris Webster
Resending in text mode.

On Tue, Oct 30, 2018 at 10:20 PM Chris Webster  wrote:
>
> On Tue, Oct 30, 2018 at 9:54 PM Junio C Hamano  wrote:
>>
>> "chris via GitGitGadget"  writes:
>>
>> > From: chris 
>> >
>> > Use File::Spec->devnull() for output redirection to avoid messages
>> > when Windows version of Perl is first in path.  The message 'The
>>
>> Dscho, "Windows version of Perl is first in path" somehow feels
>> contradicting with what one of the topics I saw from you were trying
>> to enforce (or, at least, "set as the supported configuration").
>>
>> I am guessing that the Perl you are building and shipping with Git
>> for Windows would yield what the shell that ends up running the
>> scriptlet `git config --get-color $key` prefers when asked for
>> File::Spec->devnull(), and nothing will break with this patch even
>> if that is "/dev/null", but I thought I'd double check.
>>
>> Thanks.
>>
This problem originally showed up in the
https://github.com/so-fancy/diff-so-fancy project, which has a copy of
DiffHighlight.pm.   That project allows diffsofancy (perl) to be run
from the command line without requiring the bash environment ((well ,
sort of) including the associated perl).
>
>> > system cannot find the path specified.' is displayed each time git is
>> > run to get colors.
>> >
>> > Signed-off-by: Chris. Webster 
>> > ---
>> >  contrib/diff-highlight/DiffHighlight.pm | 7 ++-
>> >  1 file changed, 6 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/contrib/diff-highlight/DiffHighlight.pm 
>> > b/contrib/diff-highlight/DiffHighlight.pm
>> > index 536754583..7440aa1c4 100644
>> > --- a/contrib/diff-highlight/DiffHighlight.pm
>> > +++ b/contrib/diff-highlight/DiffHighlight.pm
>> > @@ -4,6 +4,11 @@ use 5.008;
>> >  use warnings FATAL => 'all';
>> >  use strict;
>> >
>> > +# Use the correct value for both UNIX and Windows (/dev/null vs nul)
>> > +use File::Spec;
>> > +
>> > +my $NULL = File::Spec->devnull();
>> > +
>> >  # Highlight by reversing foreground and background. You could do
>> >  # other things like bold or underline if you prefer.
>> >  my @OLD_HIGHLIGHT = (
>> > @@ -134,7 +139,7 @@ sub highlight_stdin {
>> >  # fallback, which means we will work even if git can't be run.
>> >  sub color_config {
>> >   my ($key, $default) = @_;
>> > - my $s = `git config --get-color $key 2>/dev/null`;
>> > + my $s = `git config --get-color $key 2>$NULL`;
>> >   return length($s) ? $s : $default;
>> >  }


Re: [PATCH 2/3] check_stream_sha1(): handle input underflow

2018-10-30 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Oct 31, 2018 at 01:03:39AM -0400, Jeff King wrote:
>
>> Phew. I almost just deleted all of the above, because now I think I'm
>> ready to write that comment you asked for. ;) But I left it since maybe
>> it makes sense to follow my thought process.
>
> So here it is in a more succinct form.

Thanks.

> +
> + /*
> +  * Unlike the loose object case, we do not have to worry here
> +  * about running out of input bytes and spinning infinitely. If
> +  * we get Z_BUF_ERROR due to too few input bytes, then we'll
> +  * replenish them in the next use_pack() call when we loop. If
> +  * we truly hit the end of the pack (i.e., because it's corrupt
> +  * or truncated), then use_pack() catches that and will die().
> +  */
>   if (status != Z_OK && status != Z_BUF_ERROR) {
>   git_inflate_end(>z);
>   st->z_state = z_error;

Reads well.  Will apply.


Re: [PATCH 2/3] check_stream_sha1(): handle input underflow

2018-10-30 Thread Jeff King
On Wed, Oct 31, 2018 at 01:03:39AM -0400, Jeff King wrote:

> Phew. I almost just deleted all of the above, because now I think I'm
> ready to write that comment you asked for. ;) But I left it since maybe
> it makes sense to follow my thought process.

So here it is in a more succinct form.

-Peff

-- >8 --
Subject: [PATCH] read_istream_pack_non_delta(): document input handling

Twice now we have scratched our heads about why the loose streaming code
needs the protection added by 692f0bc7ae (avoid infinite loop in
read_istream_loose, 2013-03-25), but the similar code in its pack
counterpart does not.

The short answer is that use_pack() will die before it lets us run out
of bytes. Note that this could mean reading garbage (including the
trailing hash) from the packfile in some cases of corruption, but that's
OK. zlib will notice and complain (and if not, certainly the end result
will not match the object hash we expect).

Let's leave a comment this time to document our findings.

Signed-off-by: Jeff King 
---
 streaming.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/streaming.c b/streaming.c
index d1e6b2dce6..ac7c7a22f9 100644
--- a/streaming.c
+++ b/streaming.c
@@ -408,6 +408,15 @@ static read_method_decl(pack_non_delta)
st->z_state = z_done;
break;
}
+
+   /*
+* Unlike the loose object case, we do not have to worry here
+* about running out of input bytes and spinning infinitely. If
+* we get Z_BUF_ERROR due to too few input bytes, then we'll
+* replenish them in the next use_pack() call when we loop. If
+* we truly hit the end of the pack (i.e., because it's corrupt
+* or truncated), then use_pack() catches that and will die().
+*/
if (status != Z_OK && status != Z_BUF_ERROR) {
git_inflate_end(>z);
st->z_state = z_error;
-- 
2.19.1.1298.g19f18f2a22



Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-30 Thread Junio C Hamano
> Subject: Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

As this is only about contrib/diff-highlight, please make it clear
that it is the area the patch affects on its title, i.e.

Subject: diff-highlight: use File::Spec->devnull(), not /dev/null

or something like that.

> From: chris 

Please make this line read like

From: Chris Webster 

i.e. the author should be the person who is signing off that patch.

> Use File::Spec->devnull() for output redirection to avoid messages
> when Windows version of Perl is first in path.  The message 'The
> system cannot find the path specified.' is displayed each time git is
> run to get colors.
>
> Signed-off-by: Chris. Webster 
> ---
>  contrib/diff-highlight/DiffHighlight.pm | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

There are a handful more instances of /dev/null found if you do

$ git grep /dev/null -- \*.pl \*.pm

The one in perl/Git.pm must be shared by scripts written in Perl, so
it may be worth giving the same tweak to it, like this patch does to
the highlight script.

> diff --git a/contrib/diff-highlight/DiffHighlight.pm 
> b/contrib/diff-highlight/DiffHighlight.pm
> index 536754583..7440aa1c4 100644
> --- a/contrib/diff-highlight/DiffHighlight.pm
> +++ b/contrib/diff-highlight/DiffHighlight.pm
> @@ -4,6 +4,11 @@ use 5.008;
>  use warnings FATAL => 'all';
>  use strict;
>  
> +# Use the correct value for both UNIX and Windows (/dev/null vs nul)
> +use File::Spec;
> +
> +my $NULL = File::Spec->devnull();
> +
>  # Highlight by reversing foreground and background. You could do
>  # other things like bold or underline if you prefer.
>  my @OLD_HIGHLIGHT = (
> @@ -134,7 +139,7 @@ sub highlight_stdin {
>  # fallback, which means we will work even if git can't be run.
>  sub color_config {
>   my ($key, $default) = @_;
> - my $s = `git config --get-color $key 2>/dev/null`;
> + my $s = `git config --get-color $key 2>$NULL`;
>   return length($s) ? $s : $default;
>  }


Re: [PATCH 2/3] check_stream_sha1(): handle input underflow

2018-10-30 Thread Jeff King
On Wed, Oct 31, 2018 at 01:44:25PM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >> See 692f0bc7 to find who did the fix you stole from, and for what
> >> kind of breakage the original fix was made.
> >
> > Heh. I did not dig into it, but actually thought "I'll bet Junio had to
> > get this right when he wrote the streaming code. No wonder he spotted my
> > mistake so quickly!".
> >
> >> By the way, a very similar loop for pack_non_delta istream iterates
> >> while total_read is smaller than sz, but it does not have the same
> >> check upon BUF_ERROR to see if we've read everything.
> >
> > Indeed. Did you find that one by inspection, or did you peek at:
> >
> >   https://public-inbox.org/git/20130325202114.gd16...@sigill.intra.peff.net/
> 
> I looked for BUF_ERROR in the streaming.c and found two instances in
> a very similar looking loop with a subtle differnce, and the
> difference was due to one of them getting fixed in the past while
> the other one was left intact as written at its inception.
> 
> I should have looked for that message to read the part below
> three-dash mark.  Or we may want to transplant that comment somehow
> to the function so next person will not be puzzled like I did?

Hmm. Reading that function, I am not sure if it actually might need
fixing.

Might we actually get Z_BUF_ERROR asking for more input if zlib reads to
the end of the pack window? That is probably quite unlikely in practice,
but in theory you could feed a very large buffer for the output and use
a very small pack window.

So I do not think we can use the same logic in that loop. But at the
same time, what prevents use_pack() from getting to the very end of the
pack and saying "I have no bytes left for you"? And then we'd loop
infinitely, feeding zlib nothing.

I'm not sure what the solution is. I do not think this works:

diff --git a/streaming.c b/streaming.c
index d1e6b2dce6..a92a85ed10 100644
--- a/streaming.c
+++ b/streaming.c
@@ -394,6 +394,9 @@ static read_method_decl(pack_non_delta)
mapped = use_pack(st->u.in_pack.pack, ,
  st->u.in_pack.pos, >z.avail_in);
 
+   if (!st->z.avail_in)
+   break;
+
st->z.next_out = (unsigned char *)buf + total_read;
st->z.avail_out = sz - total_read;
st->z.next_in = mapped;

because we may have read to the very end but still have bytes to output.

Though hrm. I think use_pack() will always tell us about the trailing
20-byte hash in the "avail" window. Which means we should never
legitimately get to 0 there, because it means that either:

  1. We're reading the trailing hash, which cannot possibly be right (in
 most cases I'd expect zlib to barf at that point anyway, but of
 course it's possible to have a hash that is valid zlib data ;) ).

  2. We're truncated _before_ the hash, so we really did read to EOF,
 and there are no more bytes. I suspect we may actually detect this
 case upon opening the pack (since we do peek at the trailer then),
 but again that could be fooled by coincidence.

I guess that's not the whole story, though. use_pack() tries to promise
at least 20 bytes (to simplify some of the other parsing routines). So
we shouldn't actually ever get "0" here. If we really are that close to
the end of the pack, we'd hit this logic in use_pack:

  if (offset > (p->pack_size - the_hash_algo->rawsz))
die("offset beyond end of packfile (truncated pack?)");

So actually, I think this code is OK as-is. We will always have at least
20 bytes of input, or use_pack() will die.

Phew. I almost just deleted all of the above, because now I think I'm
ready to write that comment you asked for. ;) But I left it since maybe
it makes sense to follow my thought process.

-Peff


Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-30 Thread Junio C Hamano
"chris via GitGitGadget"  writes:

> From: chris 
>
> Use File::Spec->devnull() for output redirection to avoid messages
> when Windows version of Perl is first in path.  The message 'The

Dscho, "Windows version of Perl is first in path" somehow feels
contradicting with what one of the topics I saw from you were trying
to enforce (or, at least, "set as the supported configuration").

I am guessing that the Perl you are building and shipping with Git
for Windows would yield what the shell that ends up running the
scriptlet `git config --get-color $key` prefers when asked for
File::Spec->devnull(), and nothing will break with this patch even
if that is "/dev/null", but I thought I'd double check.

Thanks.

> system cannot find the path specified.' is displayed each time git is
> run to get colors.
>
> Signed-off-by: Chris. Webster 
> ---
>  contrib/diff-highlight/DiffHighlight.pm | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/diff-highlight/DiffHighlight.pm 
> b/contrib/diff-highlight/DiffHighlight.pm
> index 536754583..7440aa1c4 100644
> --- a/contrib/diff-highlight/DiffHighlight.pm
> +++ b/contrib/diff-highlight/DiffHighlight.pm
> @@ -4,6 +4,11 @@ use 5.008;
>  use warnings FATAL => 'all';
>  use strict;
>  
> +# Use the correct value for both UNIX and Windows (/dev/null vs nul)
> +use File::Spec;
> +
> +my $NULL = File::Spec->devnull();
> +
>  # Highlight by reversing foreground and background. You could do
>  # other things like bold or underline if you prefer.
>  my @OLD_HIGHLIGHT = (
> @@ -134,7 +139,7 @@ sub highlight_stdin {
>  # fallback, which means we will work even if git can't be run.
>  sub color_config {
>   my ($key, $default) = @_;
> - my $s = `git config --get-color $key 2>/dev/null`;
> + my $s = `git config --get-color $key 2>$NULL`;
>   return length($s) ? $s : $default;
>  }


Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-30 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Oct 30, 2018 at 11:26:36AM -0700, chris via GitGitGadget wrote:
>
>> From: chris 
>
> You might want to adjust your user.name. :)

Yes, absolutely.  We'd want to see that the From: line and one of
the Signed-off-by: lines are idential.

>> Use File::Spec->devnull() for output redirection to avoid messages
>> when Windows version of Perl is first in path.  The message 'The
>> system cannot find the path specified.' is displayed each time git is
>> run to get colors.
>
> Thanks, makes sense, and the patch looks good to me.

Yup, and we already use File::Spec everywhere anyway, so this is not
a new dependency, either.  Which is very good.


Re: [PATCH 2/3] check_stream_sha1(): handle input underflow

2018-10-30 Thread Junio C Hamano
Jeff King  writes:

>> See 692f0bc7 to find who did the fix you stole from, and for what
>> kind of breakage the original fix was made.
>
> Heh. I did not dig into it, but actually thought "I'll bet Junio had to
> get this right when he wrote the streaming code. No wonder he spotted my
> mistake so quickly!".
>
>> By the way, a very similar loop for pack_non_delta istream iterates
>> while total_read is smaller than sz, but it does not have the same
>> check upon BUF_ERROR to see if we've read everything.
>
> Indeed. Did you find that one by inspection, or did you peek at:
>
>   https://public-inbox.org/git/20130325202114.gd16...@sigill.intra.peff.net/

I looked for BUF_ERROR in the streaming.c and found two instances in
a very similar looking loop with a subtle differnce, and the
difference was due to one of them getting fixed in the past while
the other one was left intact as written at its inception.

I should have looked for that message to read the part below
three-dash mark.  Or we may want to transplant that comment somehow
to the function so next person will not be puzzled like I did?


Re: [PATCH 1/2] ls-remote: do not send ref prefixes for patterns

2018-10-30 Thread Junio C Hamano
Jeff King  writes:

> Since b4be74105f (ls-remote: pass ref prefixes when requesting a
> remote's refs, 2018-03-15), "ls-remote foo" will pass "refs/heads/foo",
> "refs/tags/foo", etc to the transport code in an attempt to let the
> other side reduce the size of its advertisement.

Jonathan, seeing 2b554353 ("fetch: send "refs/tags/" prefix upon CLI
refspecs", 2018-06-05), I am guessing that you are doing the proto v2
work inherited from Brandon?  Having to undo this is unfortunate, but
I agree with this patch that we need to do this until ref prefix learns
to grok wildcards.

> Unfortunately this is not correct, as ls-remote patterns do not follow
> the usual ref lookup rules, and are in fact tail-matched. So we could
> find "refs/heads/foo" or "refs/heads/a/much/deeper/foo" or even
> "refs/another/hierarchy/foo".
>
> Since we can't pass a prefix and there's not yet a v2 extension for
> matching wildcards, we must disable this feature to keep the same
> behavior as v1.
>
> Reported-by: Jon Simons 
> Signed-off-by: Jeff King 
> ---
>  builtin/ls-remote.c  | 8 
>  t/t5512-ls-remote.sh | 9 +
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
> index 6a0cdec30d..5faa8459d9 100644
> --- a/builtin/ls-remote.c
> +++ b/builtin/ls-remote.c
> @@ -88,15 +88,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
> *prefix)
>   int i;
>   pattern = xcalloc(argc, sizeof(const char *));
>   for (i = 1; i < argc; i++) {
> - const char *glob;
>   pattern[i - 1] = xstrfmt("*/%s", argv[i]);
> -
> - glob = strchr(argv[i], '*');
> - if (glob)
> - argv_array_pushf(_prefixes, "%.*s",
> -  (int)(glob - argv[i]), 
> argv[i]);
> - else
> - expand_ref_prefix(_prefixes, argv[i]);
>   }
>   }
>  
> diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
> index bc5703ff9b..ca1b7e5bc1 100755
> --- a/t/t5512-ls-remote.sh
> +++ b/t/t5512-ls-remote.sh
> @@ -302,4 +302,13 @@ test_expect_success 'ls-remote works outside repository' 
> '
>   nongit git ls-remote dst.git
>  '
>  
> +test_expect_success 'ls-remote patterns work with all protocol versions' '
> + git for-each-ref --format="%(objectname)%(refname)" \
> + refs/heads/master refs/remotes/origin/master >expect &&
> + git -c protocol.version=1 ls-remote . master >actual.v1 &&
> + test_cmp expect actual.v1 &&
> + git -c protocol.version=2 ls-remote . master >actual.v2 &&
> + test_cmp expect actual.v2
> +'
> +
>  test_done


Re: [PATCH 2/3] check_stream_sha1(): handle input underflow

2018-10-30 Thread Jeff King
On Wed, Oct 31, 2018 at 01:23:54PM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The bug comes from commit f6371f9210 (sha1_file: add
> > read_loose_object() function, 2017-01-13), which
> > reimplemented some of the existing loose object functions.
> > So it's worth checking if this bug was inherited from any of
> > those. The answers seems to be no. The two obvious
> > candidates are both OK:
> >
> >   1. unpack_sha1_rest(); this doesn't need to loop on
> >  Z_BUF_ERROR at all, since it allocates the expected
> >  output buffer in advance (which we can't do since we're
> >  explicitly streaming here)
> >
> >   2. check_object_signature(); the streaming path relies on
> >  the istream interface, which uses read_istream_loose()
> >  for this case. That function uses a similar "is our
> >  output buffer full" check with Z_BUF_ERROR (which is
> >  where I stole it from for this patch!)
> 
> See 692f0bc7 to find who did the fix you stole from, and for what
> kind of breakage the original fix was made.

Heh. I did not dig into it, but actually thought "I'll bet Junio had to
get this right when he wrote the streaming code. No wonder he spotted my
mistake so quickly!".

> By the way, a very similar loop for pack_non_delta istream iterates
> while total_read is smaller than sz, but it does not have the same
> check upon BUF_ERROR to see if we've read everything.

Indeed. Did you find that one by inspection, or did you peek at:

  https://public-inbox.org/git/20130325202114.gd16...@sigill.intra.peff.net/

? :)

-Peff


Re: Using --word-diff breaks --color-moved

2018-10-30 Thread Junio C Hamano
james harvey  writes:

> If you use both "--word-diff" and "--color-moved", regardless of the
> order of arguments, "--word-diff" takes precedence and "--color-moved"
> isn't allowed to do anything.
>
> I think "--color-moved" should have precedence over "--word-diff".  I
> cannot think of a scenario where a user would supply both options, and
> actually want "--word-diff" to take precedence.

I am not sure if I follow.  If these two cannot work well together,
then we should just reject the request as asking for incompatible
combination of options while we are parsing the command line
arguments, rather than arguing which one should trump the other
one---that would simply lead to "in my opinion, word-diff is more
important" vs "in mine, color-moved is more important", no?



[PATCH 2/2] ls-remote: pass heads/tags prefixes to transport

2018-10-30 Thread Jeff King
Unlike its arbitrary text patterns, the --heads and --tags
options to ls-remote are true prefixes. We can pass this
information to the transport code. If the v2 protocol is in
use, that will reduce the size of the ref advertisement.

Note that the test added here succeeds both before and after
the patch. This is an optimization, not a bug-fix; it's just
making sure we didn't break anything.

Signed-off-by: Jeff King 
---
 builtin/ls-remote.c  | 5 +
 t/t5512-ls-remote.sh | 9 +
 2 files changed, 14 insertions(+)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 5faa8459d9..1d7f1f5ce2 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -92,6 +92,11 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
}
}
 
+   if (flags & REF_TAGS)
+   argv_array_push(_prefixes, "refs/tags/");
+   if (flags & REF_HEADS)
+   argv_array_push(_prefixes, "refs/heads/");
+
remote = remote_get(dest);
if (!remote) {
if (dest)
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index ca1b7e5bc1..91ee6841c1 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -311,4 +311,13 @@ test_expect_success 'ls-remote patterns work with all 
protocol versions' '
test_cmp expect actual.v2
 '
 
+test_expect_success 'ls-remote prefixes work with all protocol versions' '
+   git for-each-ref --format="%(objectname)%(refname)" \
+   refs/heads/ refs/tags/ >expect &&
+   git -c protocol.version=1 ls-remote --heads --tags . >actual.v1 &&
+   test_cmp expect actual.v1 &&
+   git -c protocol.version=2 ls-remote --heads --tags . >actual.v2 &&
+   test_cmp expect actual.v2
+'
+
 test_done
-- 
2.19.1.1298.g19f18f2a22


[PATCH 1/2] ls-remote: do not send ref prefixes for patterns

2018-10-30 Thread Jeff King
Since b4be74105f (ls-remote: pass ref prefixes when requesting a
remote's refs, 2018-03-15), "ls-remote foo" will pass "refs/heads/foo",
"refs/tags/foo", etc to the transport code in an attempt to let the
other side reduce the size of its advertisement.

Unfortunately this is not correct, as ls-remote patterns do not follow
the usual ref lookup rules, and are in fact tail-matched. So we could
find "refs/heads/foo" or "refs/heads/a/much/deeper/foo" or even
"refs/another/hierarchy/foo".

Since we can't pass a prefix and there's not yet a v2 extension for
matching wildcards, we must disable this feature to keep the same
behavior as v1.

Reported-by: Jon Simons 
Signed-off-by: Jeff King 
---
 builtin/ls-remote.c  | 8 
 t/t5512-ls-remote.sh | 9 +
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 6a0cdec30d..5faa8459d9 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -88,15 +88,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
int i;
pattern = xcalloc(argc, sizeof(const char *));
for (i = 1; i < argc; i++) {
-   const char *glob;
pattern[i - 1] = xstrfmt("*/%s", argv[i]);
-
-   glob = strchr(argv[i], '*');
-   if (glob)
-   argv_array_pushf(_prefixes, "%.*s",
-(int)(glob - argv[i]), 
argv[i]);
-   else
-   expand_ref_prefix(_prefixes, argv[i]);
}
}
 
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index bc5703ff9b..ca1b7e5bc1 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -302,4 +302,13 @@ test_expect_success 'ls-remote works outside repository' '
nongit git ls-remote dst.git
 '
 
+test_expect_success 'ls-remote patterns work with all protocol versions' '
+   git for-each-ref --format="%(objectname)%(refname)" \
+   refs/heads/master refs/remotes/origin/master >expect &&
+   git -c protocol.version=1 ls-remote . master >actual.v1 &&
+   test_cmp expect actual.v1 &&
+   git -c protocol.version=2 ls-remote . master >actual.v2 &&
+   test_cmp expect actual.v2
+'
+
 test_done
-- 
2.19.1.1298.g19f18f2a22



Re: [PATCH 2/3] check_stream_sha1(): handle input underflow

2018-10-30 Thread Junio C Hamano
Jeff King  writes:

> The bug comes from commit f6371f9210 (sha1_file: add
> read_loose_object() function, 2017-01-13), which
> reimplemented some of the existing loose object functions.
> So it's worth checking if this bug was inherited from any of
> those. The answers seems to be no. The two obvious
> candidates are both OK:
>
>   1. unpack_sha1_rest(); this doesn't need to loop on
>  Z_BUF_ERROR at all, since it allocates the expected
>  output buffer in advance (which we can't do since we're
>  explicitly streaming here)
>
>   2. check_object_signature(); the streaming path relies on
>  the istream interface, which uses read_istream_loose()
>  for this case. That function uses a similar "is our
>  output buffer full" check with Z_BUF_ERROR (which is
>  where I stole it from for this patch!)

See 692f0bc7 to find who did the fix you stole from, and for what
kind of breakage the original fix was made.

By the way, a very similar loop for pack_non_delta istream iterates
while total_read is smaller than sz, but it does not have the same
check upon BUF_ERROR to see if we've read everything.




[PATCH 0/2] ls-remote and v2 ref prefixes

2018-10-30 Thread Jeff King
This series fixes a bug where ls-remote sends a ref-advertisement prefix
when it shouldn't, and then optimizes a spot where it doesn't send one
but could.

  [1/2]: ls-remote: do not send ref prefixes for patterns
  [2/2]: ls-remote: pass heads/tags prefixes to transport

 builtin/ls-remote.c  | 13 +
 t/t5512-ls-remote.sh | 18 ++
 2 files changed, 23 insertions(+), 8 deletions(-)

-Peff


Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-30 Thread Jeff King
On Tue, Oct 30, 2018 at 11:26:36AM -0700, chris via GitGitGadget wrote:

> From: chris 

You might want to adjust your user.name. :)

> Use File::Spec->devnull() for output redirection to avoid messages
> when Windows version of Perl is first in path.  The message 'The
> system cannot find the path specified.' is displayed each time git is
> run to get colors.

Thanks, makes sense, and the patch looks good to me.

-Peff


Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering

2018-10-30 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> An alternative approach which was rejected at the time (because it
> interfered with the then-ongoing work to compile Git for Windows using MS
> Visual C++) would patch the make_environment_block() function to skip the
> specified environment variables before handing down the environment block to
> the spawned process. Currently it would interfere with the mingw-utf-8-env
> patch series I sent earlier today
> [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].

So the rejected approach that was not used in this series would
interfere with the other one, but I do not have to worry about it
because this series does not use that approach?  I had to read the
six lines above twice to figure out that it essentially is saying "I
shouldn't care", but please let me know if I misread the paragraph
and I need to care ;-)

> While at it, this patch series also cleans up house and moves the
> Windows-specific core.* variable handling to compat/mingw.c rather than
> cluttering environment.c and config.c with things that e.g. developers on
> Linux do not want to care about.

Or Macs.  When I skimmed this series earlier, I found that patches 2
& 3 sensibly implemented to achieve this goal.

>
> Johannes Schindelin (4):
>   config: rename `dummy` parameter to `cb` in git_default_config()
>   Allow for platform-specific core.* config settings
>   Move Windows-specific config settings into compat/mingw.c
>   mingw: unset PERL5LIB by default
>
>  Documentation/config.txt |  6 
>  cache.h  |  8 -
>  compat/mingw.c   | 58 +++-
>  compat/mingw.h   |  3 ++
>  config.c | 18 ---
>  environment.c|  1 -
>  git-compat-util.h|  8 +
>  t/t0029-core-unsetenvvars.sh | 30 +++
>  8 files changed, 109 insertions(+), 23 deletions(-)
>  create mode 100755 t/t0029-core-unsetenvvars.sh
>
>
> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> Published-As: 
> https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
> pr-62/dscho/perl5lib-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/62


Re: [PATCH 1/3] commit-reach: implement get_reachable_subset

2018-10-30 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> +struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
> +  struct commit **to, int nr_to,
> +  int reachable_flag)

This is OR'ed into object.flags, and I somoehow expected to see it
as 'unsigned int', not a signed one.

> +{
> + struct commit **item;
> + struct commit *current;
> + struct commit_list *found_commits = NULL;
> + struct commit **to_last = to + nr_to;
> + struct commit **from_last = from + nr_from;
> + uint32_t min_generation = GENERATION_NUMBER_INFINITY;
> + int num_to_find = 0;
> +
> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> +
> + for (item = to; item < to_last; item++) {
> + struct commit *c = *item;
> + 
> + parse_commit(c);
> + if (c->generation < min_generation)
> + min_generation = c->generation;
> +
> + if (!(c->object.flags & PARENT1)) {
> + c->object.flags |= PARENT1;
> + num_to_find++;
> + }
> + }
> +
> + for (item = from; item < from_last; item++) {
> + struct commit *c = *item;
> + if (!(c->object.flags & PARENT2)) {
> + c->object.flags |= PARENT2;
> + parse_commit(c);
> +
> + prio_queue_put(, *item);
> + }
> + }

OK, we marked "to" with PARENT1 and counted them in num_to_find
without dups.  We also marked "from" with PARENT2 and threw them in
the "queue" without dups.

Mental note: the caller must guarantee that everybody reachable from
"to" and "from" have PARENT1 and PARENT2 clear.  This might deserve
to be in the comment before the function.

> + while (num_to_find && (current = prio_queue_get()) != NULL) {
> + struct commit_list *parents;
> +
> + if (current->object.flags & PARENT1) {
> + current->object.flags &= ~PARENT1;
> + current->object.flags |= reachable_flag;
> + commit_list_insert(current, _commits);
> + num_to_find--;

What is in "queue" are all reachable from "from" so we found this
object in the "to" set is reachable.  One down.

> + }
> +
> + for (parents = current->parents; parents; parents = 
> parents->next) {
> + struct commit *p = parents->item;
> +
> + parse_commit(p);
> +
> + if (p->generation < min_generation)
> + continue;

Any commit reachable from "from" whose generation is too small
(i.e. old) can never reach any of "to", because all "to" are at
generation "min_generation" or greater.  Makes sense.

> + if (p->object.flags & PARENT2)
> + continue;

If the object is painted with PARENT2, we know we have it in queue
and traversing to find more of its ancestors.  Avoid recursing into
it twice.  Makes sense.

> + p->object.flags |= PARENT2;
> + prio_queue_put(, p);

Otherwise, explore this parent.

> + }
> + }
> +
> + clear_commit_marks_many(nr_to, to, PARENT1);

Among "to", the ones that were not found still have PARENT1.
Because we do not propagate this flag down the ancestry chain like
merge-base discovery, this only has to clear just one level.

> + clear_commit_marks_many(nr_from, from, PARENT2);

And then we smudged everything that are reachable from "from"
with this flag.  In the worst case, i.e. when num_to_find is still
positive here, we would have painted all commits down to the root,
and we need to clear all of them.

> + return found_commits;
> +}

OK, this all makes sense.  Unlike merge-base traversals, this does
not have to traverse from the "to" side at all, which makes it quite
simpler and straight-forward.

I do wonder if we can now reimplement in_merge_bases_many() in terms
of this helper, and if that gives us a better performance.  It asks
"is 'commit', i.e. a single 'to', an ancestor of, i.e. reachable
from, one of the 'references', i.e.  'from'"?

> diff --git a/commit-reach.h b/commit-reach.h
> index 7d313e297..43bd50a70 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -74,4 +74,14 @@ int can_all_from_reach_with_flag(struct object_array *from,
>  int can_all_from_reach(struct commit_list *from, struct commit_list *to,
>  int commit_date_cutoff);
>  
> +
> +/*
> + * Return a list of commits containing the commits in the 'to' array
> + * that are reachable from at least one commit in the 'from' array.
> + * Also add the given 'flag' to each of the commits in the returned list.
> + */
> +struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
> +  struct commit **to, int nr_to,
> 

Re: [PATCH 0/3] Make add_missing_tags() linear

2018-10-30 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> Add a new method in commit-reach.c called get_reachable_subset() which does
> a many-to-many reachability test. Starting at the 'from' commits, walk until
> the generation is below the smallest generation in the 'to' commits, or all
> 'to' commits have been discovered. This performs only one commit walk for
> the entire add_missing_tags() method, giving linear performance in the worst
> case.

;-)

I think in_merge_bases_many() was an attempt to do one half of this
(i.e. it checks only one 'to' against main 'from' to see if any of
them reach).  I wonder why we didn't extend it to multiple 'to' back
when we invented it.

In any case, good to see this code optimized.

Thanks.


Re: [PATCH v3 0/5] am/rebase: share read_author_script()

2018-10-30 Thread Junio C Hamano
Phillip Wood  writes:

> From: Phillip Wood 
>
> Thanks to Junio for the feedback on v2. I've updated patch 4 based on
> those comments, the rest are unchanged.

Hmph, all these five patches seem to be identical to what I have in
'pu'.  Did you send the right version?

> v1 cover letter:
>
> This is a follow up to pw/rebase-i-author-script-fix, it reduces code
> duplication and improves rebase's parsing of the author script. After
> this I'll do another series to share the code to write the author
> script.
>
> Phillip Wood (5):
>   am: don't die in read_author_script()
>   am: improve author-script error reporting
>   am: rename read_author_script()
>   add read_author_script() to libgit
>   sequencer: use read_author_script()
>
>  builtin/am.c |  60 ++--
>  sequencer.c  | 192 ---
>  sequencer.h  |   3 +
>  3 files changed, 128 insertions(+), 127 deletions(-)


Re: [PATCH v3 2/2] worktree: rename is_worktree_locked to worktree_lock_reason

2018-10-30 Thread Junio C Hamano
nbelakov...@gmail.com writes:

> From: Nickolai Belakovski 
>
> A function prefixed with 'is_' would be expected to return a boolean,
> however this function returns a string.
>
> Signed-off-by: Nickolai Belakovski 
> ---

Given that there is a clear documentation in worktree.h, and a
pointer that is not NULL is true in "if/while(ptr)", I'd say this
change is a borderline "meh".

I'll queue this on top of 1/2 and try merging to the integration
topics to see if it interacts with any other topics in flight.
Since the patch has already been written, let's not waste the effort
if there is no conflict with anybody else (otherwise I'd discard
this one---a "meh" patch is not worth having to worry about conflict
resolution).

Thanks.

>  builtin/worktree.c | 10 +-
>  worktree.c |  2 +-
>  worktree.h |  4 ++--
>  3 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index c4abbde2b..5e8402617 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -245,7 +245,7 @@ static void validate_worktree_add(const char *path, const 
> struct add_opts *opts)
>   if (!wt)
>   goto done;
>  
> - locked = !!is_worktree_locked(wt);
> + locked = !!worktree_lock_reason(wt);
>   if ((!locked && opts->force) || (locked && opts->force > 1)) {
>   if (delete_git_dir(wt->id))
>   die(_("unable to re-add worktree '%s'"), path);
> @@ -682,7 +682,7 @@ static int lock_worktree(int ac, const char **av, const 
> char *prefix)
>   if (is_main_worktree(wt))
>   die(_("The main working tree cannot be locked or unlocked"));
>  
> - old_reason = is_worktree_locked(wt);
> + old_reason = worktree_lock_reason(wt);
>   if (old_reason) {
>   if (*old_reason)
>   die(_("'%s' is already locked, reason: %s"),
> @@ -714,7 +714,7 @@ static int unlock_worktree(int ac, const char **av, const 
> char *prefix)
>   die(_("'%s' is not a working tree"), av[0]);
>   if (is_main_worktree(wt))
>   die(_("The main working tree cannot be locked or unlocked"));
> - if (!is_worktree_locked(wt))
> + if (!worktree_lock_reason(wt))
>   die(_("'%s' is not locked"), av[0]);
>   ret = unlink_or_warn(git_common_path("worktrees/%s/locked", wt->id));
>   free_worktrees(worktrees);
> @@ -787,7 +787,7 @@ static int move_worktree(int ac, const char **av, const 
> char *prefix)
>   validate_no_submodules(wt);
>  
>   if (force < 2)
> - reason = is_worktree_locked(wt);
> + reason = worktree_lock_reason(wt);
>   if (reason) {
>   if (*reason)
>   die(_("cannot move a locked working tree, lock reason: 
> %s\nuse 'move -f -f' to override or unlock first"),
> @@ -900,7 +900,7 @@ static int remove_worktree(int ac, const char **av, const 
> char *prefix)
>   if (is_main_worktree(wt))
>   die(_("'%s' is a main working tree"), av[0]);
>   if (force < 2)
> - reason = is_worktree_locked(wt);
> + reason = worktree_lock_reason(wt);
>   if (reason) {
>   if (*reason)
>   die(_("cannot remove a locked working tree, lock 
> reason: %s\nuse 'remove -f -f' to override or unlock first"),
> diff --git a/worktree.c b/worktree.c
> index b0d0b5426..befdbe7fa 100644
> --- a/worktree.c
> +++ b/worktree.c
> @@ -235,7 +235,7 @@ int is_main_worktree(const struct worktree *wt)
>   return !wt->id;
>  }
>  
> -const char *is_worktree_locked(struct worktree *wt)
> +const char *worktree_lock_reason(struct worktree *wt)
>  {
>   assert(!is_main_worktree(wt));
>  
> diff --git a/worktree.h b/worktree.h
> index 6b12a3cf6..55d449b6a 100644
> --- a/worktree.h
> +++ b/worktree.h
> @@ -10,7 +10,7 @@ struct worktree {
>   char *path;
>   char *id;
>   char *head_ref; /* NULL if HEAD is broken or detached */
> - char *lock_reason;  /* private - use is_worktree_locked */
> + char *lock_reason;  /* private - use worktree_lock_reason */
>   struct object_id head_oid;
>   int is_detached;
>   int is_bare;
> @@ -60,7 +60,7 @@ extern int is_main_worktree(const struct worktree *wt);
>   * Return the reason string if the given worktree is locked or NULL
>   * otherwise.
>   */
> -extern const char *is_worktree_locked(struct worktree *wt);
> +extern const char *worktree_lock_reason(struct worktree *wt);
>  
>  #define WT_VALIDATE_WORKTREE_MISSING_OK (1 << 0)


Re: [PATCH v3 1/2] worktree: update documentation for lock_reason and lock_reason_valid

2018-10-30 Thread Junio C Hamano
nbelakov...@gmail.com writes:

> From: Nickolai Belakovski 
>
> Clarify that these fields are to be considered implementation details
> and direct the reader to use the is_worktree_locked function to retrieve
> said information.
>
> Signed-off-by: Nickolai Belakovski 
> ---
>  worktree.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/worktree.h b/worktree.h
> index df3fc30f7..6b12a3cf6 100644
> --- a/worktree.h
> +++ b/worktree.h
> @@ -10,12 +10,12 @@ struct worktree {
>   char *path;
>   char *id;
>   char *head_ref; /* NULL if HEAD is broken or detached */
> - char *lock_reason;  /* internal use */
> + char *lock_reason;  /* private - use is_worktree_locked */

s/use /used by/, probably.

>   struct object_id head_oid;
>   int is_detached;
>   int is_bare;
>   int is_current;
> - int lock_reason_valid;
> + int lock_reason_valid; /* private */
>  };

These annotations to the two fields are not wrong per-se, but I have
a feeling that it would equally be important to document what the
other "non-private" fields mean, if peeking them *is* the API this
subsystem offers.

Thanks.


Using --word-diff breaks --color-moved

2018-10-30 Thread james harvey
If you use both "--word-diff" and "--color-moved", regardless of the
order of arguments, "--word-diff" takes precedence and "--color-moved"
isn't allowed to do anything.

I think "--color-moved" should have precedence over "--word-diff".  I
cannot think of a scenario where a user would supply both options, and
actually want "--word-diff" to take precedence.  If I'm not thinking
of a scenario where this wouldn't be desired, perhaps whichever is
first as an argument could take precedence.

(The same behavior happens if 4+ lines are moved and
"--color-moved{default=zebra}" is used, but below
"--color-moved=plain" is used to be a smaller testcase.)

Given the following setup:

$ cat << EOF > file
> a
> b
> c
> EOF
$ git add file
$ git commit -m "Added file."
$ cat << EOF > file
> b
> c
> a
> EOF

You can have moved lines colorization:
$ git diff --color-moved=plain
...
[oldMovedColor]-a
b
c
[newMovedColor]+a

You can diff based on words:
$ git diff --word-diff
...
[oldColor][-a-]
b
c
[newColor][+a+}

But, you cannot diff based on words, and have moved lines colorization:
$ git diff --color-moved=plain --word-diff
$ git diff --word-diff
...
[oldColor][-a-]
b
c
[newColor][+a+}


Re: [PATCH 12/12] fsck: mark strings for translation

2018-10-30 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano  wrote:
>>> SZEDER Gábor  writes:
 Nguyễn Thái Ngọc Duy wrote:

> -fprintf(stderr, "%s in %s %s: %s\n",
> -msg_type, printable_type(obj), describe_object(obj), err);
> +fprintf_ln(stderr, _("%s in %s %s: %s"),

 Are the (f)printf() -> (f)printf_ln() changes all over
 'builtin/fsck.c' really necessary to mark strings for translation?
>>>
>>> It is beyond absolute minimum but I saw it argued here that this
>>> makes it easier to manage the .po and .pot files if your message
>>> strings do not end with LF, a you are much less likely to _add_
>>> unneeded LF to the translated string than _lose_ LF at the end of
>>> translated string.
[...]
> As Jonathan pointed out in the follow-up message[1] this sort of thing
> is checked for in msgfmt, so sure, let's strip the \n, but it's really
> not something we need to worry about. Likewise with translators turning
> "%s" into "%d" (or "% s") or whatever.

IMHO the advantage of leaving the \n out in the message is not so much
that we don't trust translators but more that where we can make their
lives easier, we should.

In other words, I'm glad the patch does that, and Ævar, I agree.

Thanks, both.

Jonathan

> 1. 
> https://public-inbox.org/git/cacsjy8acuy9fziiehgc7mel4i+xp6u0pmh1rgor-wznhyt1...@mail.gmail.com/


[PATCH 2/3] check_stream_sha1(): handle input underflow

2018-10-30 Thread Jeff King
This commit fixes an infinite loop when fscking large
truncated loose objects.

The check_stream_sha1() function takes an mmap'd loose
object buffer and streams 4k of output at a time, checking
its sha1. The loop quits when we've output enough bytes (we
know the size from the object header), or when zlib tells us
anything except Z_OK or Z_BUF_ERROR.

The latter is expected because zlib may run out of room in
our 4k buffer, and that is how it tells us to process the
output and loop again.

But Z_BUF_ERROR also covers another case: one in which zlib
cannot make forward progress because it needs more _input_.
This should never happen in this loop, because though we're
streaming the output, we have the entire deflated input
available in the mmap'd buffer. But since we don't check
this case, we'll just loop infinitely if we do see a
truncated object, thinking that zlib is asking for more
output space.

It's tempting to fix this by checking stream->avail_in as
part of the loop condition (and quitting if all of our bytes
have been consumed). But that assumes that once zlib has
consumed the input, there is nothing left to do.  That's not
necessarily the case: it may have read our input into its
internal state, but still have bytes to output.

Instead, let's continue on Z_BUF_ERROR only when we see the
case we're expecting: the previous round filled our output
buffer completely. If it didn't (and we still saw
Z_BUF_ERROR), we know something is wrong and should break
out of the loop.

The bug comes from commit f6371f9210 (sha1_file: add
read_loose_object() function, 2017-01-13), which
reimplemented some of the existing loose object functions.
So it's worth checking if this bug was inherited from any of
those. The answers seems to be no. The two obvious
candidates are both OK:

  1. unpack_sha1_rest(); this doesn't need to loop on
 Z_BUF_ERROR at all, since it allocates the expected
 output buffer in advance (which we can't do since we're
 explicitly streaming here)

  2. check_object_signature(); the streaming path relies on
 the istream interface, which uses read_istream_loose()
 for this case. That function uses a similar "is our
 output buffer full" check with Z_BUF_ERROR (which is
 where I stole it from for this patch!)

Reported-by: Ævar Arnfjörð Bjarmason 
Helped-by: Junio C Hamano 
Signed-off-by: Jeff King 
---
 sha1-file.c |  3 ++-
 t/t1450-fsck.sh | 19 +++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..2daf7d9935 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2199,7 +2199,8 @@ static int check_stream_sha1(git_zstream *stream,
 * see the comment in unpack_sha1_rest for details.
 */
while (total_read <= size &&
-  (status == Z_OK || status == Z_BUF_ERROR)) {
+  (status == Z_OK ||
+   (status == Z_BUF_ERROR && !stream->avail_out))) {
stream->next_out = buf;
stream->avail_out = sizeof(buf);
if (size - total_read < stream->avail_out)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 3421f12e8a..b5677d26a4 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -683,6 +683,25 @@ test_expect_success 'fsck detects trailing loose garbage 
(large blob)' '
test_i18ngrep "garbage.*$blob" out
 '
 
+test_expect_success 'fsck detects truncated loose object' '
+   # make it big enough that we know we will truncate in the data
+   # portion, not the header
+   test-tool genrandom truncate 4096 >file &&
+   blob=$(git hash-object -w file) &&
+   file=$(sha1_file $blob) &&
+   test_when_finished "remove_object $blob" &&
+   test_copy_bytes 1024 <"$file" >tmp &&
+   rm "$file" &&
+   mv -f tmp "$file" &&
+
+   # check both regular and streaming code paths
+   test_must_fail git fsck 2>out &&
+   test_i18ngrep corrupt.*$blob out &&
+
+   test_must_fail git -c core.bigfilethreshold=128 fsck 2>out &&
+   test_i18ngrep corrupt.*$blob out
+'
+
 # for each of type, we have one version which is referenced by another object
 # (and so while unreachable, not dangling), and another variant which really is
 # dangling.
-- 
2.19.1.1235.g6b27db57c2



[PATCH 3/3] cat-file: handle streaming failures consistently

2018-10-30 Thread Jeff King
There are three ways to convince cat-file to stream a blob:

  - cat-file -p $blob

  - cat-file blob $blob

  - echo $batch | cat-file --batch

In the first two, we simply exit with the error code of
streaw_blob_to_fd(). That means that an error will cause us
to exit with "-1" (which we try to avoid) without printing
any kind of error message (which is confusing to the user).

Instead, let's match the third case, which calls die() on an
error. Unfortunately we cannot be more specific, as
stream_blob_to_fd() does not tell us whether the problem was
on reading (e.g., a corrupt object) or on writing (e.g.,
ENOSPC). That might be an opportunity for future work, but
for now we will at least exit with a sane message and exit
code.

Signed-off-by: Jeff King 
---
 builtin/cat-file.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 8d97c84725..0d403eb77d 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -50,6 +50,13 @@ static int filter_object(const char *path, unsigned mode,
return 0;
 }
 
+static int stream_blob(const struct object_id *oid)
+{
+   if (stream_blob_to_fd(1, oid, NULL, 0))
+   die("unable to stream %s to stdout", oid_to_hex(oid));
+   return 0;
+}
+
 static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
int unknown_type)
 {
@@ -132,7 +139,7 @@ static int cat_one_file(int opt, const char *exp_type, 
const char *obj_name,
}
 
if (type == OBJ_BLOB)
-   return stream_blob_to_fd(1, , NULL, 0);
+   return stream_blob();
buf = read_object_file(, , );
if (!buf)
die("Cannot read object %s", obj_name);
@@ -155,7 +162,7 @@ static int cat_one_file(int opt, const char *exp_type, 
const char *obj_name,
oidcpy(_oid, );
 
if (oid_object_info(the_repository, _oid, NULL) == 
OBJ_BLOB)
-   return stream_blob_to_fd(1, _oid, NULL, 0);
+   return stream_blob(_oid);
/*
 * we attempted to dereference a tag to a blob
 * and failed; there may be new dereference
@@ -319,8 +326,9 @@ static void print_object_or_die(struct batch_options *opt, 
struct expand_data *d
BUG("invalid cmdmode: %c", opt->cmdmode);
batch_write(opt, contents, size);
free(contents);
-   } else if (stream_blob_to_fd(1, oid, NULL, 0) < 0)
-   die("unable to stream %s to stdout", oid_to_hex(oid));
+   } else {
+   stream_blob(oid);
+   }
}
else {
enum object_type type;
-- 
2.19.1.1235.g6b27db57c2


[PATCH 1/3] t1450: check large blob in trailing-garbage test

2018-10-30 Thread Jeff King
Commit cce044df7f (fsck: detect trailing garbage in all
object types, 2017-01-13) added two tests of trailing
garbage in a loose object file: one with a commit and one
with a blob. The point of having two is that blobs would
follow a different code path that streamed the contents,
instead of loading it into a buffer as usual.

At the time, merely being a blob was enough to trigger the
streaming code path. But since 7ac4f3a007 (fsck: actually
fsck blob data, 2018-05-02), we now only stream blobs that
are actually large. So since then, the streaming code path
is not tested at all for this case.

We can restore the original intent of the test by tweaking
core.bigFileThreshold to make our small blob seem large.
There's no easy way to externally verify that we followed
the streaming code path, but I did check before/after using
a temporary debug statement.

Signed-off-by: Jeff King 
---
I prepared this series on master, but it occurs to me you may want to
apply patch 2 on top of f6371f9210 or thereabouts, which introduced the
bug it fixes. If so, then obviously this one doesn't make sense back
then, and should go on top of 7ac4f3a007. It should be semantically
independent, though there may be a minor text conflict.

 t/t1450-fsck.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 0f2dd26f74..3421f12e8a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -673,13 +673,13 @@ test_expect_success 'fsck detects trailing loose garbage 
(commit)' '
test_i18ngrep "garbage.*$commit" out
 '
 
-test_expect_success 'fsck detects trailing loose garbage (blob)' '
+test_expect_success 'fsck detects trailing loose garbage (large blob)' '
blob=$(echo trailing | git hash-object -w --stdin) &&
file=$(sha1_file $blob) &&
test_when_finished "remove_object $blob" &&
chmod +w "$file" &&
echo garbage >>"$file" &&
-   test_must_fail git fsck 2>out &&
+   test_must_fail git -c core.bigfilethreshold=5 fsck 2>out &&
test_i18ngrep "garbage.*$blob" out
 '
 
-- 
2.19.1.1235.g6b27db57c2



Re: Infinite loop regression in git-fsck in v2.12.0

2018-10-30 Thread Jeff King
On Tue, Oct 30, 2018 at 06:56:03PM -0400, Jeff King wrote:

> > >   while (total_read <= size &&
> > > +stream->avail_in > 0 &&
> > >  (status == Z_OK || status == Z_BUF_ERROR)) {
> > >   stream->next_out = buf;
> > >   stream->avail_out = sizeof(buf);
> > 
> > Hmph.  If the last round consumed the final input byte and needed
> > output space of N bytes, but only M (< N) bytes of the output space
> > was available, then it would have reduced both avail_in and
> > avail_out down to zero and yielded Z_BUF_ERROR, no?  Or would zlib
> > refrain from consuming that final byte (leaving avail_in to at least
> > one) and give us Z_BUF_ERROR in such a case?
> 
> Hmm, yeah, good thinking. I think zlib could consume that final byte
> into its internal buffer.
> 
> As part of my digging, I looked at how the loose streaming code handles
> this. It checks that when we see Z_BUF_ERROR, we actually did run out of
> output bytes (so if we didn't, then we know it's not the case we
> expected to be looping on).
> 
> I have some patches almost ready to send; I'll use that technique.

And here they are.

  [1/3]: t1450: check large blob in trailing-garbage test
  [2/3]: check_stream_sha1(): handle input underflow
  [3/3]: cat-file: handle streaming failures consistently

 builtin/cat-file.c | 16 
 sha1-file.c|  3 ++-
 t/t1450-fsck.sh| 23 +--
 3 files changed, 35 insertions(+), 7 deletions(-)

-Peff


Re: Infinite loop regression in git-fsck in v2.12.0

2018-10-30 Thread Jeff King
On Tue, Oct 30, 2018 at 10:56:22PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > So maybe a good approach would be that we'd annotate all those test
> > whose fsck fails with "this is how it should fail", and run those tests
> > under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting
> > that no tests other than those marked as failing the fsck check at the
> > end fail it.
> [...]
> Jeff: Gotta turn in for the night, but maybe Something you're maybe
> interested in carrying forward for this fix? It's not that much work to
> mark up the failing tests, there's 10-20 of them from some quick
> eyeballing.

For this fix, I'd much rather add a specific test to the existing fsck
tests. Otherwise, we're relying on what a bunch of other tests happen to
be doing now, but there's little hope that they won't get refactored in
a way that puts a gap in our test coverage.

IOW, I think of things like GIT_TEST_FSCK as a kind of shotgun approach.
They may find things, and we should fix them and make sure it runs
clean. But ultimately, specific cases of interest should get their own
tests.

-Peff


Re: [RFC v1] Add virtual file system settings and hook proc

2018-10-30 Thread Junio C Hamano
Ben Peart  writes:

> diff --git a/config.c b/config.c
> index 4051e38823..96e05ee0f1 100644
> --- a/config.c
> +++ b/config.c
> ...
> @@ -2307,6 +2311,37 @@ int git_config_get_index_threads(void)
>   return 0; /* auto */
>  }
>  
> +int git_config_get_virtualfilesystem(void)
> +{
> + if (git_config_get_pathname("core.virtualfilesystem", 
> _virtualfilesystem))
> + core_virtualfilesystem = getenv("GIT_TEST_VIRTUALFILESYSTEM");
> +
> + if (core_virtualfilesystem && !*core_virtualfilesystem)
> + core_virtualfilesystem = NULL;
> +
> + if (core_virtualfilesystem) {
> + /*
> +  * Some git commands spawn helpers and redirect the index to a 
> different
> +  * location.  These include "difftool -d" and the sequencer
> +  * (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) 
> and others.
> +  * In those instances we don't want to update their temporary 
> index with
> +  * our virtualization data.
> +  */
> + char *default_index_file = xstrfmt("%s/%s", 
> the_repository->gitdir, "index");
> + int should_run_hook = !strcmp(default_index_file, 
> the_repository->index_file);
> +
> + free(default_index_file);
> + if (should_run_hook) {
> + /* virtual file system relies on the sparse checkout 
> logic so force it on */
> + core_apply_sparse_checkout = 1;
> + return 1;
> + }
> + core_virtualfilesystem = NULL;

It would be a small leak if this came from config_get_pathname(),
but if it came from $GIT_TEST_VFS env, we cannot free it X-<.

A helper function called *_get_X() that does not return X as its
return value or updating the location pointed by its *dst parameter,
and instead only stores its finding in a global variable feels
somewhat odd.  It smells more like "find out", "probe", "check",
etc.

> diff --git a/dir.c b/dir.c
> index 47c2fca8dc..3097b0e446 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -21,6 +21,7 @@
>  #include "ewah/ewok.h"
>  #include "fsmonitor.h"
>  #include "submodule-config.h"
> +#include "virtualfilesystem.h"
>  
>  /*
>   * Tells read_directory_recursive how a file or directory should be treated.
> @@ -1109,6 +1110,18 @@ int is_excluded_from_list(const char *pathname,
> struct exclude_list *el, struct index_state *istate)
>  {
>   struct exclude *exclude;
> +
> + /*
> +  * The virtual file system data is used to prevent git from traversing
> +  * any part of the tree that is not in the virtual file system.  Return
> +  * 1 to exclude the entry if it is not found in the virtual file system,
> +  * else fall through to the regular excludes logic as it may further 
> exclude.
> +  */

This comment will sit better immediately in front of the call to "is
excluded from vfs?" helper function.

> + if (*dtype == DT_UNKNOWN)
> + *dtype = get_dtype(NULL, istate, pathname, pathlen);

We try to defer paying cost to determine unknown *dtype as late as
possible by having this call in last_exclude_matching_from_list(),
and not here.  If we are doing this, we probably should update the
callpaths that call last_exclude_matching_from_list() to make the
caller responsible for doing get_dtype() and drop the lazy finding
of dtype from the callee.  Alternatively, the new "is excluded from
vfs" helper can learn to do the lazy get_dtype() just like the
existing last_exclude_matching_from_list() does.  I suspect the
latter may be simpler.

> + if (is_excluded_from_virtualfilesystem(pathname, pathlen, *dtype) > 0)
> + return 1;
> +
>   exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
> dtype, el, istate);
>   if (exclude)
> @@ -1324,8 +1337,20 @@ struct exclude *last_exclude_matching(struct 
> dir_struct *dir,
>  int is_excluded(struct dir_struct *dir, struct index_state *istate,
>   const char *pathname, int *dtype_p)
>  {
> - struct exclude *exclude =
> - last_exclude_matching(dir, istate, pathname, dtype_p);
> + struct exclude *exclude;
> +
> + /*
> +  * The virtual file system data is used to prevent git from traversing
> +  * any part of the tree that is not in the virtual file system.  Return
> +  * 1 to exclude the entry if it is not found in the virtual file system,
> +  * else fall through to the regular excludes logic as it may further 
> exclude.
> +  */
> + if (*dtype_p == DT_UNKNOWN)
> + *dtype_p = get_dtype(NULL, istate, pathname, strlen(pathname));

Exactly the same comment as above.

> + if (is_excluded_from_virtualfilesystem(pathname, strlen(pathname), 
> *dtype_p) > 0)
> + return 1;
> +
> + exclude = last_exclude_matching(dir, istate, pathname, dtype_p);
>   if (exclude)
>   

Re: Infinite loop regression in git-fsck in v2.12.0

2018-10-30 Thread Jeff King
On Wed, Oct 31, 2018 at 07:28:00AM +0900, Junio C Hamano wrote:

> > So we need to distinguish those cases. I think this is the simplest fix:
> >
> > diff --git a/sha1-file.c b/sha1-file.c
> > index dd0b6aa873..a7ff5fe25d 100644
> > --- a/sha1-file.c
> > +++ b/sha1-file.c
> > @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream,
> >  * see the comment in unpack_sha1_rest for details.
> >  */
> > while (total_read <= size &&
> > +  stream->avail_in > 0 &&
> >(status == Z_OK || status == Z_BUF_ERROR)) {
> > stream->next_out = buf;
> > stream->avail_out = sizeof(buf);
> 
> Hmph.  If the last round consumed the final input byte and needed
> output space of N bytes, but only M (< N) bytes of the output space
> was available, then it would have reduced both avail_in and
> avail_out down to zero and yielded Z_BUF_ERROR, no?  Or would zlib
> refrain from consuming that final byte (leaving avail_in to at least
> one) and give us Z_BUF_ERROR in such a case?

Hmm, yeah, good thinking. I think zlib could consume that final byte
into its internal buffer.

As part of my digging, I looked at how the loose streaming code handles
this. It checks that when we see Z_BUF_ERROR, we actually did run out of
output bytes (so if we didn't, then we know it's not the case we
expected to be looping on).

I have some patches almost ready to send; I'll use that technique.

-Peff


Re: Infinite loop regression in git-fsck in v2.12.0

2018-10-30 Thread Junio C Hamano
Jeff King  writes:

> The problem isn't actually a sha1 mismatch, though that's what
> parse_object() will report. The issue is actually that the file is
> truncated. So zlib does not say "this is corrupt", but rather "I need
> more bytes to keep going". And unfortunately it returns Z_BUF_ERROR both
> for "I need more bytes" (in which we know we are truncated, because we
> fed the whole mmap'd file in the first place) as well as "I need more
> output buffer space" (which just means we should keep looping!).
>
> So we need to distinguish those cases. I think this is the simplest fix:
>
> diff --git a/sha1-file.c b/sha1-file.c
> index dd0b6aa873..a7ff5fe25d 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream,
>* see the comment in unpack_sha1_rest for details.
>*/
>   while (total_read <= size &&
> +stream->avail_in > 0 &&
>  (status == Z_OK || status == Z_BUF_ERROR)) {
>   stream->next_out = buf;
>   stream->avail_out = sizeof(buf);

Hmph.  If the last round consumed the final input byte and needed
output space of N bytes, but only M (< N) bytes of the output space
was available, then it would have reduced both avail_in and
avail_out down to zero and yielded Z_BUF_ERROR, no?  Or would zlib
refrain from consuming that final byte (leaving avail_in to at least
one) and give us Z_BUF_ERROR in such a case?

> This works just by checking that we are making forward progress in the
> output buffer. I think that would _probably_ be OK for this case, since
> we know we have all of the input available. But in a case where we're
> feeding the input in a stream, it would not be. It's possible there that
> we would not create any output in one round, but would do so after
> feeding more input bytes.

Yes, exactly.

> I think the patch I showed above addresses the root cause more directly.
> I'll wrap that up in a real commit, but I think there may be some
> related work:
>
>   - "git show 19f9c827" does complain with "sha1 mismatch" (which isn't
> strictly correct, but is probably good enough). However, "git
> cat-file blob 19f9c827" exits non-zero without printing anything. It
> probably should complain more loudly.
>
>   - the offending loop comes from f6371f9210. But that commit was mostly
> cargo-culting other parts of sha1-file.c. I'm worried that this bug
> exists elsewhere, too. I'll dig around to see if I can find other
> instances.

Thanks.


[PATCH 21/24] commit-graph: convert remaining function to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Convert all functions to handle arbitrary repositories in commit-graph.c
that are used by functions taking a repository argument already.

Notable exclusion is write_commit_graph and its local functions as that
only works on the_repository.

Signed-off-by: Stefan Beller 
---
 commit-graph.c | 40 
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..f78a8e96b5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -292,7 +292,8 @@ static int bsearch_graph(struct commit_graph *g, struct 
object_id *oid, uint32_t
g->chunk_oid_lookup, g->hash_len, pos);
 }
 
-static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+static struct commit_list **insert_parent_or_die(struct repository *r,
+struct commit_graph *g,
 uint64_t pos,
 struct commit_list **pptr)
 {
@@ -303,7 +304,7 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
die("invalid parent position %"PRIu64, pos);
 
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
-   c = lookup_commit(the_repository, );
+   c = lookup_commit(r, );
if (!c)
die(_("could not find commit %s"), oid_to_hex());
c->graph_pos = pos;
@@ -317,7 +318,9 @@ static void fill_commit_graph_info(struct commit *item, 
struct commit_graph *g,
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
-static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, 
uint32_t pos)
+static int fill_commit_in_graph(struct repository *r,
+   struct commit *item,
+   struct commit_graph *g, uint32_t pos)
 {
uint32_t edge_value;
uint32_t *parent_data_ptr;
@@ -341,13 +344,13 @@ static int fill_commit_in_graph(struct commit *item, 
struct commit_graph *g, uin
edge_value = get_be32(commit_data + g->hash_len);
if (edge_value == GRAPH_PARENT_NONE)
return 1;
-   pptr = insert_parent_or_die(g, edge_value, pptr);
+   pptr = insert_parent_or_die(r, g, edge_value, pptr);
 
edge_value = get_be32(commit_data + g->hash_len + 4);
if (edge_value == GRAPH_PARENT_NONE)
return 1;
if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
-   pptr = insert_parent_or_die(g, edge_value, pptr);
+   pptr = insert_parent_or_die(r, g, edge_value, pptr);
return 1;
}
 
@@ -355,7 +358,7 @@ static int fill_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
do {
edge_value = get_be32(parent_data_ptr);
-   pptr = insert_parent_or_die(g,
+   pptr = insert_parent_or_die(r, g,
edge_value & GRAPH_EDGE_LAST_MASK,
pptr);
parent_data_ptr++;
@@ -374,7 +377,9 @@ static int find_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
}
 }
 
-static int parse_commit_in_graph_one(struct commit_graph *g, struct commit 
*item)
+static int parse_commit_in_graph_one(struct repository *r,
+struct commit_graph *g,
+struct commit *item)
 {
uint32_t pos;
 
@@ -382,7 +387,7 @@ static int parse_commit_in_graph_one(struct commit_graph 
*g, struct commit *item
return 1;
 
if (find_commit_in_graph(item, g, ))
-   return fill_commit_in_graph(item, g, pos);
+   return fill_commit_in_graph(r, item, g, pos);
 
return 0;
 }
@@ -391,7 +396,7 @@ int parse_commit_in_graph(struct repository *r, struct 
commit *item)
 {
if (!prepare_commit_graph(r))
return 0;
-   return parse_commit_in_graph_one(r->objects->commit_graph, item);
+   return parse_commit_in_graph_one(r, r->objects->commit_graph, item);
 }
 
 void load_commit_graph_info(struct repository *r, struct commit *item)
@@ -403,19 +408,22 @@ void load_commit_graph_info(struct repository *r, struct 
commit *item)
fill_commit_graph_info(item, r->objects->commit_graph, pos);
 }
 
-static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit 
*c)
+static struct tree *load_tree_for_commit(struct repository *r,
+struct commit_graph *g,
+struct commit *c)
 {
struct object_id oid;
const unsigned char *commit_data = g->chunk_commit_data +
   GRAPH_DATA_WIDTH * (c->graph_pos);
 
hashcpy(oid.hash, commit_data);
-   

[PATCH 22/24] commit: make free_commit_buffer and release_commit_memory repository agnostic

2018-10-30 Thread Stefan Beller
Pass the object pool to free_commit_buffer and release_commit_memory,
such that we can eliminate access to 'the_repository'.

Also remove the TODO in release_commit_memory, as commit->util was
removed in 9d2c97016f (commit.h: delete 'util' field in struct commit,
2018-05-19)

Signed-off-by: Stefan Beller 
---
 builtin/fsck.c | 3 ++-
 builtin/log.c  | 6 --
 builtin/rev-list.c | 3 ++-
 commit.c   | 9 -
 commit.h   | 4 ++--
 object.c   | 2 +-
 6 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 06eb421720..c476ac6983 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -382,7 +382,8 @@ static int fsck_obj(struct object *obj, void *buffer, 
unsigned long size)
if (obj->type == OBJ_TREE)
free_tree_buffer((struct tree *)obj);
if (obj->type == OBJ_COMMIT)
-   free_commit_buffer((struct commit *)obj);
+   free_commit_buffer(the_repository->parsed_objects,
+  (struct commit *)obj);
return err;
 }
 
diff --git a/builtin/log.c b/builtin/log.c
index 061d4fd864..64c2649c7c 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -395,7 +395,8 @@ static int cmd_log_walk(struct rev_info *rev)
 * We may show a given commit multiple times when
 * walking the reflogs.
 */
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
free_commit_list(commit->parents);
commit->parents = NULL;
}
@@ -1922,7 +1923,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
open_next_file(rev.numbered_files ? NULL : commit, NULL, 
, quiet))
die(_("Failed to create output files"));
shown = log_tree_commit(, commit);
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
 
/* We put one extra blank line between formatted
 * patches and this flag is used by log-tree code
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5064d08e1b..a8867f0c4b 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -197,7 +197,8 @@ static void finish_commit(struct commit *commit, void *data)
free_commit_list(commit->parents);
commit->parents = NULL;
}
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
 }
 
 static inline void finish_object__ma(struct object *obj)
diff --git a/commit.c b/commit.c
index 7d2f3a9a93..4fe74aa4bc 100644
--- a/commit.c
+++ b/commit.c
@@ -328,10 +328,10 @@ void repo_unuse_commit_buffer(struct repository *r,
free((void *)buffer);
 }
 
-void free_commit_buffer(struct commit *commit)
+void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit)
 {
struct commit_buffer *v = buffer_slab_peek(
-   the_repository->parsed_objects->buffer_slab, commit);
+   pool->buffer_slab, commit);
if (v) {
FREE_AND_NULL(v->buffer);
v->size = 0;
@@ -354,13 +354,12 @@ struct object_id *get_commit_tree_oid(const struct commit 
*commit)
return _commit_tree(commit)->object.oid;
 }
 
-void release_commit_memory(struct commit *c)
+void release_commit_memory(struct parsed_object_pool *pool, struct commit *c)
 {
c->maybe_tree = NULL;
c->index = 0;
-   free_commit_buffer(c);
+   free_commit_buffer(pool, c);
free_commit_list(c->parents);
-   /* TODO: what about commit->util? */
 
c->object.parsed = 0;
 }
diff --git a/commit.h b/commit.h
index 2e6b799b26..d2779a23f6 100644
--- a/commit.h
+++ b/commit.h
@@ -140,7 +140,7 @@ void repo_unuse_commit_buffer(struct repository *r,
 /*
  * Free any cached object buffer associated with the commit.
  */
-void free_commit_buffer(struct commit *);
+void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
 
 struct tree *get_commit_tree(const struct commit *);
 struct object_id *get_commit_tree_oid(const struct commit *);
@@ -149,7 +149,7 @@ struct object_id *get_commit_tree_oid(const struct commit 
*);
  * Release memory related to a commit, including the parent list and
  * any cached object buffer.
  */
-void release_commit_memory(struct commit *c);
+void release_commit_memory(struct parsed_object_pool *pool, struct commit *c);
 
 /*
  * Disassociate any cached object buffer from the commit, but do not free it.
diff --git a/object.c b/object.c
index 003f870484..c4170d2d0c 100644
--- a/object.c
+++ b/object.c
@@ -540,7 +540,7 @@ void parsed_object_pool_clear(struct parsed_object_pool 

[PATCH 13/24] commit-reach: prepare get_merge_bases to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Similarly to previous patches, the get_merge_base functions are used
often in the code base, which makes migrating them hard.

Implement the new functions, prefixed with 'repo_' and hide the old
functions behind a wrapper macro.

Signed-off-by: Stefan Beller 
---
 commit-reach.c| 24 ++---
 commit-reach.h| 26 +++---
 .../coccinelle/the_repository.pending.cocci   | 27 +++
 3 files changed, 57 insertions(+), 20 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index bf7a513991..3be5526957 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -258,23 +258,27 @@ static struct commit_list *get_merge_bases_many_0(struct 
repository *r,
return result;
 }
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-int n,
-struct commit **twos)
+struct commit_list *repo_get_merge_bases_many(struct repository *r,
+ struct commit *one,
+ int n,
+ struct commit **twos)
 {
-   return get_merge_bases_many_0(the_repository, one, n, twos, 1);
+   return get_merge_bases_many_0(r, one, n, twos, 1);
 }
 
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos)
+struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
+   struct commit *one,
+   int n,
+   struct commit **twos)
 {
-   return get_merge_bases_many_0(the_repository, one, n, twos, 0);
+   return get_merge_bases_many_0(r, one, n, twos, 0);
 }
 
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
+struct commit_list *repo_get_merge_bases(struct repository *r,
+struct commit *one,
+struct commit *two)
 {
-   return get_merge_bases_many_0(the_repository, one, 1, , 1);
+   return get_merge_bases_many_0(r, one, 1, , 1);
 }
 
 /*
diff --git a/commit-reach.h b/commit-reach.h
index 122a23a24d..c647447263 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -8,17 +8,23 @@ struct commit_list;
 struct contains_cache;
 struct ref_filter;
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-int n,
-struct commit **twos);
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos);
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
-struct commit_list *get_octopus_merge_bases(struct commit_list *in);
-
+struct commit_list *repo_get_merge_bases(struct repository *r,
+struct commit *rev1,
+struct commit *rev2);
+struct commit_list *repo_get_merge_bases_many(struct repository *r,
+ struct commit *one, int n,
+ struct commit **twos);
 /* To be used only when object flags after this call no longer matter */
-struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, 
struct commit **twos);
+struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
+   struct commit *one, int n,
+   struct commit **twos);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define get_merge_bases(r1, r2)   repo_get_merge_bases(the_repository, 
r1, r2)
+#define get_merge_bases_many(one, n, two) 
repo_get_merge_bases_many(the_repository, one, n, two)
+#define get_merge_bases_many_dirty(one, n, twos) 
repo_get_merge_bases_many_dirty(the_repository, one, n, twos)
+#endif
+
+struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
 int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference);
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index b185fe0a1d..5e037fe428 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -64,3 +64,30 @@ expression E;
 - parse_commit(
 + repo_parse_commit(the_repository,
   E)
+
+@@
+expression E;
+expression F;
+@@
+- get_merge_bases(
++ repo_get_merge_bases(the_repository,
+  E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- 

[PATCH 20/24] submodule: don't add submodule as odb for push

2018-10-30 Thread Stefan Beller
In push_submodule(), because we do not actually need access to objects
in the submodule, do not invoke add_submodule_odb().
(for_each_remote_ref_submodule() does not require access to those
objects, and the actual push is done by spawning another process,
which handles object access by itself.)

This code of push_submodule() is exercised in t5531 and continues
to work, showing that the submodule odbc is not needed.

Signed-off-by: Stefan Beller 
---
 submodule.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0fdda45252..da1ac1e6f8 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1024,9 +1024,6 @@ static int push_submodule(const char *path,
  const struct string_list *push_options,
  int dry_run)
 {
-   if (add_submodule_odb(path))
-   return 1;
-
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
argv_array_push(, "push");
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 17/24] commit: prepare logmsg_reencode to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 commit.h|  8 
 contrib/coccinelle/the_repository.pending.cocci |  9 +
 pretty.c| 13 +++--
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/commit.h b/commit.h
index 57375e3239..2e6b799b26 100644
--- a/commit.h
+++ b/commit.h
@@ -180,6 +180,14 @@ extern int has_non_ascii(const char *text);
 extern const char *logmsg_reencode(const struct commit *commit,
   char **commit_encoding,
   const char *output_encoding);
+const char *repo_logmsg_reencode(struct repository *r,
+const struct commit *commit,
+char **commit_encoding,
+const char *output_encoding);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define logmsg_reencode(c, enc, out) repo_logmsg_reencode(the_repository, c, 
enc, out)
+#endif
+
 extern const char *skip_blank_lines(const char *msg);
 
 /** Removes the first commit from a list sorted by date, and adds all
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 516f19ffee..f5b42cfc62 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -123,3 +123,12 @@ expression F;
 - unuse_commit_buffer(
 + repo_unuse_commit_buffer(the_repository,
   E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- logmsg_reencode(
++ repo_logmsg_reencode(the_repository,
+  E, F, G);
diff --git a/pretty.c b/pretty.c
index 8ca29e9281..b359b68750 100644
--- a/pretty.c
+++ b/pretty.c
@@ -595,14 +595,15 @@ static char *replace_encoding_header(char *buf, const 
char *encoding)
return strbuf_detach(, NULL);
 }
 
-const char *logmsg_reencode(const struct commit *commit,
-   char **commit_encoding,
-   const char *output_encoding)
+const char *repo_logmsg_reencode(struct repository *r,
+const struct commit *commit,
+char **commit_encoding,
+const char *output_encoding)
 {
static const char *utf8 = "UTF-8";
const char *use_encoding;
char *encoding;
-   const char *msg = get_commit_buffer(commit, NULL);
+   const char *msg = repo_get_commit_buffer(r, commit, NULL);
char *out;
 
if (!output_encoding || !*output_encoding) {
@@ -630,7 +631,7 @@ const char *logmsg_reencode(const struct commit *commit,
 * the cached copy from get_commit_buffer, we need to duplicate 
it
 * to avoid munging the cached copy.
 */
-   if (msg == get_cached_commit_buffer(the_repository, commit, 
NULL))
+   if (msg == get_cached_commit_buffer(r, commit, NULL))
out = xstrdup(msg);
else
out = (char *)msg;
@@ -644,7 +645,7 @@ const char *logmsg_reencode(const struct commit *commit,
 */
out = reencode_string(msg, output_encoding, use_encoding);
if (out)
-   unuse_commit_buffer(commit, msg);
+   repo_unuse_commit_buffer(r, commit, msg);
}
 
/*
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 18/24] pretty: prepare format_commit_message to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 contrib/coccinelle/the_repository.pending.cocci | 10 ++
 pretty.c| 15 ---
 pretty.h|  7 ++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index f5b42cfc62..2ee702ecf7 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -132,3 +132,13 @@ expression G;
 - logmsg_reencode(
 + repo_logmsg_reencode(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+expression G;
+expression H;
+@@
+- format_commit_message(
++ repo_format_commit_message(the_repository,
+  E, F, G, H);
diff --git a/pretty.c b/pretty.c
index b359b68750..3240495308 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1508,9 +1508,10 @@ void userformat_find_requirements(const char *fmt, 
struct userformat_want *w)
strbuf_release();
 }
 
-void format_commit_message(const struct commit *commit,
-  const char *format, struct strbuf *sb,
-  const struct pretty_print_context *pretty_ctx)
+void repo_format_commit_message(struct repository *r,
+   const struct commit *commit,
+   const char *format, struct strbuf *sb,
+   const struct pretty_print_context *pretty_ctx)
 {
struct format_commit_context context;
const char *output_enc = pretty_ctx->output_encoding;
@@ -1524,9 +1525,9 @@ void format_commit_message(const struct commit *commit,
 * convert a commit message to UTF-8 first
 * as far as 'format_commit_item' assumes it in UTF-8
 */
-   context.message = logmsg_reencode(commit,
- _encoding,
- utf8);
+   context.message = repo_logmsg_reencode(r, commit,
+  _encoding,
+  utf8);
 
strbuf_expand(sb, format, format_commit_item, );
rewrap_message_tail(sb, , 0, 0, 0);
@@ -1550,7 +1551,7 @@ void format_commit_message(const struct commit *commit,
}
 
free(context.commit_encoding);
-   unuse_commit_buffer(commit, context.message);
+   repo_unuse_commit_buffer(r, commit, context.message);
 }
 
 static void pp_header(struct pretty_print_context *pp,
diff --git a/pretty.h b/pretty.h
index 7359d318a9..e6625269cf 100644
--- a/pretty.h
+++ b/pretty.h
@@ -103,9 +103,14 @@ void pp_remainder(struct pretty_print_context *pp, const 
char **msg_p,
  * Put the result to "sb".
  * Please use this function for custom formats.
  */
-void format_commit_message(const struct commit *commit,
+void repo_format_commit_message(struct repository *r,
+   const struct commit *commit,
const char *format, struct strbuf *sb,
const struct pretty_print_context *context);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define format_commit_message(c, f, s, con) \
+   repo_format_commit_message(the_repository, c, f, s, con)
+#endif
 
 /*
  * Parse given arguments from "arg", check it for correctness and
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 19/24] submodule: use submodule repos for object lookup

2018-10-30 Thread Stefan Beller
This converts the 'show_submodule_header' function to use
the repository API properly, such that the submodule objects
are not added to the main object store.

Signed-off-by: Stefan Beller 
---
 submodule.c | 76 ++---
 1 file changed, 61 insertions(+), 15 deletions(-)

diff --git a/submodule.c b/submodule.c
index d9d3046689..0fdda45252 100644
--- a/submodule.c
+++ b/submodule.c
@@ -443,7 +443,7 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, struct diff_options 
*o)
+static void print_submodule_summary(struct repository *r, struct rev_info 
*rev, struct diff_options *o)
 {
static const char format[] = "  %m %s";
struct strbuf sb = STRBUF_INIT;
@@ -454,7 +454,8 @@ static void print_submodule_summary(struct rev_info *rev, 
struct diff_options *o
ctx.date_mode = rev->date_mode;
ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(, 0);
-   format_commit_message(commit, format, , );
+   repo_format_commit_message(r, commit, format, ,
+ );
strbuf_addch(, '\n');
if (commit->object.flags & SYMMETRIC_LEFT)
diff_emit_submodule_del(o, sb.buf);
@@ -481,14 +482,47 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
-/* Helper function to display the submodule header line prior to the full
- * summary output. If it can locate the submodule objects directory it will
- * attempt to lookup both the left and right commits and put them into the
- * left and right pointers.
+/*
+ * Initialize 'out' based on the provided submodule path.
+ *
+ * Unlike repo_submodule_init, this tolerates submodules not present
+ * in .gitmodules. This function exists only to preserve historical behavior,
+ *
+ * Returns 0 on success, -1 when the submodule is not present.
  */
-static void show_submodule_header(struct diff_options *o, const char *path,
+static struct repository *open_submodule(const char *path)
+{
+   struct strbuf sb = STRBUF_INIT;
+   struct repository *out = xmalloc(sizeof(*out));
+
+   if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) {
+   strbuf_release();
+   free(out);
+   return NULL;
+   }
+
+   out->submodule_prefix = xstrfmt("%s%s/",
+   the_repository->submodule_prefix ?
+   the_repository->submodule_prefix :
+   "", path);
+
+   strbuf_release();
+   return out;
+}
+
+/*
+ * Helper function to display the submodule header line prior to the full
+ * summary output.
+ *
+ * If it can locate the submodule git directory it will create a repository
+ * handle for the submodule and lookup both the left and right commits and
+ * put them into the left and right pointers.
+ */
+static void show_submodule_header(struct diff_options *o,
+   const char *path,
struct object_id *one, struct object_id *two,
unsigned dirty_submodule,
+   struct repository *sub,
struct commit **left, struct commit **right,
struct commit_list **merge_bases)
 {
@@ -507,7 +541,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
else if (is_null_oid(two))
message = "(submodule deleted)";
 
-   if (add_submodule_odb(path)) {
+   if (!sub) {
if (!message)
message = "(commits not present)";
goto output_header;
@@ -517,8 +551,8 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
 * Attempt to lookup the commit references, and determine if this is
 * a fast forward or fast backwards update.
 */
-   *left = lookup_commit_reference(the_repository, one);
-   *right = lookup_commit_reference(the_repository, two);
+   *left = lookup_commit_reference(sub, one);
+   *right = lookup_commit_reference(sub, two);
 
/*
 * Warn about missing commits in the submodule project, but only if
@@ -528,7 +562,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
 (!is_null_oid(two) && !*right))
message = "(commits not present)";
 
-   *merge_bases = get_merge_bases(*left, *right);
+   *merge_bases = repo_get_merge_bases(sub, *left, *right);
if (*merge_bases) {
if ((*merge_bases)->item == *left)
fast_forward = 1;
@@ -562,16 +596,18 @@ void show_submodule_summary(struct diff_options *o, const 
char *path,
struct rev_info rev;
struct commit *left = 

[PATCH 12/24] commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit-reach.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index ab2bb1e5d5..bf7a513991 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -216,7 +216,8 @@ static int remove_redundant(struct repository *r, struct 
commit **array, int cnt
return filled;
 }
 
-static struct commit_list *get_merge_bases_many_0(struct commit *one,
+static struct commit_list *get_merge_bases_many_0(struct repository *r,
+ struct commit *one,
  int n,
  struct commit **twos,
  int cleanup)
@@ -226,7 +227,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
struct commit_list *result;
int cnt, i;
 
-   result = merge_bases_many(the_repository, one, n, twos);
+   result = merge_bases_many(r, one, n, twos);
for (i = 0; i < n; i++) {
if (one == twos[i])
return result;
@@ -249,7 +250,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(the_repository, rslt, cnt);
+   cnt = remove_redundant(r, rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
commit_list_insert_by_date(rslt[i], );
@@ -261,19 +262,19 @@ struct commit_list *get_merge_bases_many(struct commit 
*one,
 int n,
 struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 1);
+   return get_merge_bases_many_0(the_repository, one, n, twos, 1);
 }
 
 struct commit_list *get_merge_bases_many_dirty(struct commit *one,
   int n,
   struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 0);
+   return get_merge_bases_many_0(the_repository, one, n, twos, 0);
 }
 
 struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
 {
-   return get_merge_bases_many_0(one, 1, , 1);
+   return get_merge_bases_many_0(the_repository, one, 1, , 1);
 }
 
 /*
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 14/24] commit-reach: prepare in_merge_bases[_many] to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit-reach.c  | 15 +--
 commit-reach.h  | 12 ++--
 contrib/coccinelle/the_repository.pending.cocci | 16 
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 3be5526957..88a65f854f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -312,16 +312,17 @@ int is_descendant_of(struct commit *commit, struct 
commit_list *with_commit)
 /*
  * Is "commit" an ancestor of one of the "references"?
  */
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference)
+int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
+int nr_reference, struct commit **reference)
 {
struct commit_list *bases;
int ret = 0, i;
uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 
-   if (parse_commit(commit))
+   if (repo_parse_commit(r, commit))
return ret;
for (i = 0; i < nr_reference; i++) {
-   if (parse_commit(reference[i]))
+   if (repo_parse_commit(r, reference[i]))
return ret;
if (reference[i]->generation < min_generation)
min_generation = reference[i]->generation;
@@ -330,7 +331,7 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return ret;
 
-   bases = paint_down_to_common(the_repository, commit,
+   bases = paint_down_to_common(r, commit,
 nr_reference, reference,
 commit->generation);
if (commit->object.flags & PARENT2)
@@ -344,9 +345,11 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
 /*
  * Is "commit" an ancestor of (i.e. reachable from) the "reference"?
  */
-int in_merge_bases(struct commit *commit, struct commit *reference)
+int repo_in_merge_bases(struct repository *r,
+   struct commit *commit,
+   struct commit *reference)
 {
-   return in_merge_bases_many(commit, 1, );
+   return repo_in_merge_bases_many(r, commit, 1, );
 }
 
 struct commit_list *reduce_heads(struct commit_list *heads)
diff --git a/commit-reach.h b/commit-reach.h
index c647447263..6b16f12467 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -27,8 +27,16 @@ struct commit_list *repo_get_merge_bases_many_dirty(struct 
repository *r,
 struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference);
-int in_merge_bases(struct commit *commit, struct commit *reference);
+int repo_in_merge_bases(struct repository *r,
+   struct commit *commit,
+   struct commit *reference);
+int repo_in_merge_bases_many(struct repository *r,
+struct commit *commit,
+int nr_reference, struct commit **reference);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define in_merge_bases(c1, c2) repo_in_merge_bases(the_repository, c1, c2)
+#define in_merge_bases_many(c1, n, cs) 
repo_in_merge_bases_many(the_repository, c1, n, cs)
+#endif
 
 /*
  * Takes a list of commits and returns a new list where those
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 5e037fe428..8c6a71bf64 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -91,3 +91,19 @@ expression G;
 + repo_get_merge_bases_many_dirty(the_repository,
   E, F, G);
 
+@@
+expression E;
+expression F;
+@@
+- in_merge_bases(
++ repo_in_merge_bases(the_repository,
+  E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- in_merge_bases_many(
++ repo_in_merge_bases_many(the_repository,
+  E, F, G);
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 16/24] commit: prepare repo_unuse_commit_buffer to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit.c| 6 --
 commit.h| 7 ++-
 contrib/coccinelle/the_repository.pending.cocci | 8 
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 4034def16c..7d2f3a9a93 100644
--- a/commit.c
+++ b/commit.c
@@ -318,10 +318,12 @@ const void *repo_get_commit_buffer(struct repository *r,
return ret;
 }
 
-void unuse_commit_buffer(const struct commit *commit, const void *buffer)
+void repo_unuse_commit_buffer(struct repository *r,
+ const struct commit *commit,
+ const void *buffer)
 {
struct commit_buffer *v = buffer_slab_peek(
-   the_repository->parsed_objects->buffer_slab, commit);
+   r->parsed_objects->buffer_slab, commit);
if (!(v && v->buffer == buffer))
free((void *)buffer);
 }
diff --git a/commit.h b/commit.h
index 591a77a5bb..57375e3239 100644
--- a/commit.h
+++ b/commit.h
@@ -130,7 +130,12 @@ const void *repo_get_commit_buffer(struct repository *r,
  * from an earlier call to get_commit_buffer.  The buffer may or may not be
  * freed by this call; callers should not access the memory afterwards.
  */
-void unuse_commit_buffer(const struct commit *, const void *buffer);
+void repo_unuse_commit_buffer(struct repository *r,
+ const struct commit *,
+ const void *buffer);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define unuse_commit_buffer(c, b) repo_unuse_commit_buffer(the_repository, c, 
b)
+#endif
 
 /*
  * Free any cached object buffer associated with the commit.
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 4018e6eaf7..516f19ffee 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -115,3 +115,11 @@ expression F;
 - get_commit_buffer(
 + repo_get_commit_buffer(the_repository,
   E, F);
+
+@@
+expression E;
+expression F;
+@@
+- unuse_commit_buffer(
++ repo_unuse_commit_buffer(the_repository,
+  E, F);
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 15/24] commit: prepare get_commit_buffer to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit.c| 8 +---
 commit.h| 7 ++-
 contrib/coccinelle/the_repository.pending.cocci | 8 
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 7a931d7fd4..4034def16c 100644
--- a/commit.c
+++ b/commit.c
@@ -297,13 +297,15 @@ const void *get_cached_commit_buffer(struct repository 
*r, const struct commit *
return v->buffer;
 }
 
-const void *get_commit_buffer(const struct commit *commit, unsigned long 
*sizep)
+const void *repo_get_commit_buffer(struct repository *r,
+  const struct commit *commit,
+  unsigned long *sizep)
 {
-   const void *ret = get_cached_commit_buffer(the_repository, commit, 
sizep);
+   const void *ret = get_cached_commit_buffer(r, commit, sizep);
if (!ret) {
enum object_type type;
unsigned long size;
-   ret = read_object_file(>object.oid, , );
+   ret = repo_read_object_file(r, >object.oid, , 
);
if (!ret)
die("cannot read commit object %s",
oid_to_hex(>object.oid));
diff --git a/commit.h b/commit.h
index 08935f9a19..591a77a5bb 100644
--- a/commit.h
+++ b/commit.h
@@ -117,7 +117,12 @@ const void *get_cached_commit_buffer(struct repository *, 
const struct commit *,
  * from disk. The resulting memory should not be modified, and must be given
  * to unuse_commit_buffer when the caller is done.
  */
-const void *get_commit_buffer(const struct commit *, unsigned long *size);
+const void *repo_get_commit_buffer(struct repository *r,
+  const struct commit *,
+  unsigned long *size);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define get_commit_buffer(c, s) repo_get_commit_buffer(the_repository, c, s)
+#endif
 
 /*
  * Tell the commit subsytem that we are done with a particular commit buffer.
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 8c6a71bf64..4018e6eaf7 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -107,3 +107,11 @@ expression G;
 - in_merge_bases_many(
 + repo_in_merge_bases_many(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+@@
+- get_commit_buffer(
++ repo_get_commit_buffer(the_repository,
+  E, F);
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 24/24] t/helper/test-repository: celebrate independence from the_repository

2018-10-30 Thread Stefan Beller
dade47c06c (commit-graph: add repo arg to graph readers, 2018-07-11)
brought more independence from the_repository to the commit graph, however
it was not completely independent of the_repository, as the previous
patches show.

To ensure we're not accessing the_repository by accident, we'd ideally
assign NULL to the_repository to trigger a segfault on access.

We currently have a temporary hack in cache.h, which relies on
the_hash_algo (which is a short form of the_repository->hash_algo) to
be set, so we cannot do that. The next best thing is to set all fields of
the_repository to 0, so any accidental access is more likely to be found.

Signed-off-by: Stefan Beller 
---
 cache.h|  2 ++
 t/helper/test-repository.c | 10 ++
 2 files changed, 12 insertions(+)

diff --git a/cache.h b/cache.h
index f7fabdde8f..9f535040af 100644
--- a/cache.h
+++ b/cache.h
@@ -1035,6 +1035,8 @@ static inline int hashcmp(const unsigned char *sha1, 
const unsigned char *sha2)
 *
 * This will need to be extended or ripped out when we learn about
 * hashes of different sizes.
+*
+* When ripping this out, see TODO in test-repository.c.
 */
if (the_hash_algo->rawsz != 20)
BUG("hash size not yet supported by hashcmp");
diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index 6a84a53efb..f7f8618445 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -17,6 +17,11 @@ static void test_parse_commit_in_graph(const char *gitdir, 
const char *worktree,
 
setup_git_env(gitdir);
 
+   memset(the_repository, 0, sizeof(*the_repository));
+
+   /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
+   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
if (repo_init(, gitdir, worktree))
die("Couldn't init repo");
 
@@ -43,6 +48,11 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
 
setup_git_env(gitdir);
 
+   memset(the_repository, 0, sizeof(*the_repository));
+
+   /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
+   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
if (repo_init(, gitdir, worktree))
die("Couldn't init repo");
 
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 23/24] path.h: make REPO_GIT_PATH_FUNC repository agnostic

2018-10-30 Thread Stefan Beller
git_pathdup uses the_repository internally, but the macro
REPO_GIT_PATH_FUNC is specifically made for arbitrary repositories.
Switch to repo_git_path which works on arbitrary repositories.

Signed-off-by: Stefan Beller 
---
 path.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/path.h b/path.h
index b654ea8ff5..651e6157fc 100644
--- a/path.h
+++ b/path.h
@@ -165,7 +165,7 @@ extern void report_linked_checkout_garbage(void);
const char *git_path_##var(struct repository *r) \
{ \
if (!r->cached_paths.var) \
-   r->cached_paths.var = git_pathdup(filename); \
+   r->cached_paths.var = repo_git_path(r, filename); \
return r->cached_paths.var; \
}
 
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 10/24] commit-reach.c: allow merge_bases_many to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit-reach.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 080ae0a758..4cf471bfaf 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -95,7 +95,9 @@ static struct commit_list *paint_down_to_common(struct 
repository *r,
return result;
 }
 
-static struct commit_list *merge_bases_many(struct commit *one, int n, struct 
commit **twos)
+static struct commit_list *merge_bases_many(struct repository *r,
+   struct commit *one, int n,
+   struct commit **twos)
 {
struct commit_list *list = NULL;
struct commit_list *result = NULL;
@@ -110,14 +112,14 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return commit_list_insert(one, );
}
 
-   if (parse_commit(one))
+   if (repo_parse_commit(r, one))
return NULL;
for (i = 0; i < n; i++) {
-   if (parse_commit(twos[i]))
+   if (repo_parse_commit(r, twos[i]))
return NULL;
}
 
-   list = paint_down_to_common(the_repository, one, n, twos, 0);
+   list = paint_down_to_common(r, one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -224,7 +226,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
struct commit_list *result;
int cnt, i;
 
-   result = merge_bases_many(one, n, twos);
+   result = merge_bases_many(the_repository, one, n, twos);
for (i = 0; i < n; i++) {
if (one == twos[i])
return result;
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 04/24] object-store: allow read_object_file_extended to read from arbitrary repositories

2018-10-30 Thread Stefan Beller
read_object_file_extended is not widely used, so migrate it all at once.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object-store.h |  5 +++--
 sha1-file.c| 11 ++-
 streaming.c|  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/object-store.h b/object-store.h
index 63b7605a3e..3d98a682b2 100644
--- a/object-store.h
+++ b/object-store.h
@@ -161,12 +161,13 @@ void sha1_file_name(struct repository *r, struct strbuf 
*buf, const unsigned cha
 
 void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned 
long *size);
 
-extern void *read_object_file_extended(const struct object_id *oid,
+extern void *read_object_file_extended(struct repository *r,
+  const struct object_id *oid,
   enum object_type *type,
   unsigned long *size, int lookup_replace);
 static inline void *read_object_file(const struct object_id *oid, enum 
object_type *type, unsigned long *size)
 {
-   return read_object_file_extended(oid, type, size, 1);
+   return read_object_file_extended(the_repository, oid, type, size, 1);
 }
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
diff --git a/sha1-file.c b/sha1-file.c
index 856e000ee1..c5b704aec5 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1403,7 +1403,8 @@ int pretend_object_file(void *buf, unsigned long len, 
enum object_type type,
  * deal with them should arrange to call read_object() and give error
  * messages themselves.
  */
-void *read_object_file_extended(const struct object_id *oid,
+void *read_object_file_extended(struct repository *r,
+   const struct object_id *oid,
enum object_type *type,
unsigned long *size,
int lookup_replace)
@@ -1413,10 +1414,10 @@ void *read_object_file_extended(const struct object_id 
*oid,
const char *path;
struct stat st;
const struct object_id *repl = lookup_replace ?
-   lookup_replace_object(the_repository, oid) : oid;
+   lookup_replace_object(r, oid) : oid;
 
errno = 0;
-   data = read_object(the_repository, repl->hash, type, size);
+   data = read_object(r, repl->hash, type, size);
if (data)
return data;
 
@@ -1428,11 +1429,11 @@ void *read_object_file_extended(const struct object_id 
*oid,
die(_("replacement %s not found for %s"),
oid_to_hex(repl), oid_to_hex(oid));
 
-   if (!stat_sha1_file(the_repository, repl->hash, , ))
+   if (!stat_sha1_file(r, repl->hash, , ))
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
 
-   if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL)
+   if ((p = has_packed_and_bad(r, repl->hash)) != NULL)
die(_("packed object %s (stored in %s) is corrupt"),
oid_to_hex(repl), p->pack_name);
 
diff --git a/streaming.c b/streaming.c
index d1e6b2dce6..c843a1230f 100644
--- a/streaming.c
+++ b/streaming.c
@@ -490,7 +490,7 @@ static struct stream_vtbl incore_vtbl = {
 
 static open_method_decl(incore)
 {
-   st->u.incore.buf = read_object_file_extended(oid, type, >size, 0);
+   st->u.incore.buf = read_object_file_extended(the_repository, oid, type, 
>size, 0);
st->u.incore.read_ptr = 0;
st->vtbl = _vtbl;
 
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 03/24] packfile: allow has_packed_and_bad to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
has_packed_and_bad is not widely used, so just migrate it all at once.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 packfile.c  | 5 +++--
 packfile.h  | 2 +-
 sha1-file.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/packfile.c b/packfile.c
index f2850a00b5..63f8dc7b98 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1138,12 +1138,13 @@ void mark_bad_packed_object(struct packed_git *p, const 
unsigned char *sha1)
p->num_bad_objects++;
 }
 
-const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
+const struct packed_git *has_packed_and_bad(struct repository *r,
+   const unsigned char *sha1)
 {
struct packed_git *p;
unsigned i;
 
-   for (p = the_repository->objects->packed_git; p; p = p->next)
+   for (p = r->objects->packed_git; p; p = p->next)
for (i = 0; i < p->num_bad_objects; i++)
if (hasheq(sha1,
   p->bad_object_sha1 + the_hash_algo->rawsz * 
i))
diff --git a/packfile.h b/packfile.h
index 6c4037605d..d70c6d9afb 100644
--- a/packfile.h
+++ b/packfile.h
@@ -146,7 +146,7 @@ extern int packed_object_info(struct repository *r,
  off_t offset, struct object_info *);
 
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
-extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
+extern const struct packed_git *has_packed_and_bad(struct repository *r, const 
unsigned char *sha1);
 
 /*
  * Iff a pack file in the given repository contains the object named by sha1,
diff --git a/sha1-file.c b/sha1-file.c
index b8ce21cbaf..856e000ee1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1432,7 +1432,7 @@ void *read_object_file_extended(const struct object_id 
*oid,
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
 
-   if ((p = has_packed_and_bad(repl->hash)) != NULL)
+   if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL)
die(_("packed object %s (stored in %s) is corrupt"),
oid_to_hex(repl), p->pack_name);
 
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 02/24] sha1_file: allow read_object to read objects in arbitrary repositories

2018-10-30 Thread Stefan Beller
Allow read_object (a file local functon in sha1_file) to
handle arbitrary repositories by passing the repository down
to oid_object_info_extended.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 sha1-file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..b8ce21cbaf 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1361,7 +1361,9 @@ int oid_object_info(struct repository *r,
return type;
 }
 
-static void *read_object(const unsigned char *sha1, enum object_type *type,
+static void *read_object(struct repository *r,
+const unsigned char *sha1,
+enum object_type *type,
 unsigned long *size)
 {
struct object_id oid;
@@ -1373,7 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum 
object_type *type,
 
hashcpy(oid.hash, sha1);
 
-   if (oid_object_info_extended(the_repository, , , 0) < 0)
+   if (oid_object_info_extended(r, , , 0) < 0)
return NULL;
return content;
 }
@@ -1414,7 +1416,7 @@ void *read_object_file_extended(const struct object_id 
*oid,
lookup_replace_object(the_repository, oid) : oid;
 
errno = 0;
-   data = read_object(repl->hash, type, size);
+   data = read_object(the_repository, repl->hash, type, size);
if (data)
return data;
 
@@ -1755,7 +1757,7 @@ int force_object_loose(const struct object_id *oid, 
time_t mtime)
 
if (has_loose_object(oid))
return 0;
-   buf = read_object(oid->hash, , );
+   buf = read_object(the_repository, oid->hash, , );
if (!buf)
return error(_("cannot read sha1_file for %s"), 
oid_to_hex(oid));
hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), len) + 
1;
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 11/24] commit-reach.c: allow remove_redundant to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit-reach.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 4cf471bfaf..ab2bb1e5d5 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -156,7 +156,7 @@ struct commit_list *get_octopus_merge_bases(struct 
commit_list *in)
return ret;
 }
 
-static int remove_redundant(struct commit **array, int cnt)
+static int remove_redundant(struct repository *r, struct commit **array, int 
cnt)
 {
/*
 * Some commit in the array may be an ancestor of
@@ -174,7 +174,7 @@ static int remove_redundant(struct commit **array, int cnt)
ALLOC_ARRAY(filled_index, cnt - 1);
 
for (i = 0; i < cnt; i++)
-   parse_commit(array[i]);
+   repo_parse_commit(r, array[i]);
for (i = 0; i < cnt; i++) {
struct commit_list *common;
uint32_t min_generation = array[i]->generation;
@@ -190,7 +190,7 @@ static int remove_redundant(struct commit **array, int cnt)
if (array[j]->generation < min_generation)
min_generation = array[j]->generation;
}
-   common = paint_down_to_common(the_repository, array[i], filled,
+   common = paint_down_to_common(r, array[i], filled,
  work, min_generation);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
@@ -249,7 +249,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(rslt, cnt);
+   cnt = remove_redundant(the_repository, rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
commit_list_insert_by_date(rslt[i], );
@@ -370,7 +370,7 @@ struct commit_list *reduce_heads(struct commit_list *heads)
p->item->object.flags &= ~STALE;
}
}
-   num_head = remove_redundant(array, num_head);
+   num_head = remove_redundant(the_repository, array, num_head);
for (i = 0; i < num_head; i++)
tail = _list_insert(array[i], tail)->next;
free(array);
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 07/24] object: parse_object to honor its repository argument

2018-10-30 Thread Stefan Beller
In 8e4b0b6047 (object.c: allow parse_object to handle
arbitrary repositories, 2018-06-28), we forgot to pass the
repository down to the read_object_file.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/object.c b/object.c
index e54160550c..003f870484 100644
--- a/object.c
+++ b/object.c
@@ -259,8 +259,8 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid)
if (obj && obj->parsed)
return obj;
 
-   if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
-   (!obj && has_object_file(oid) &&
+   if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
+   (!obj && repo_has_object_file(r, oid) &&
 oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
if (check_object_signature(repl, NULL, 0, NULL) < 0) {
error(_("sha1 mismatch %s"), oid_to_hex(oid));
@@ -270,7 +270,7 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid)
return lookup_object(r, oid->hash);
}
 
-   buffer = read_object_file(oid, , );
+   buffer = repo_read_object_file(r, oid, , );
if (buffer) {
if (check_object_signature(repl, buffer, size, type_name(type)) 
< 0) {
free(buffer);
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 05/24] object-store: prepare read_object_file to deal with arbitrary repositories

2018-10-30 Thread Stefan Beller
As read_object_file is a widely used function (which is also regularly used
in new code in flight between master..pu), changing its signature is painful
is hard, as other series in flight rely on the original signature. It would
burden the maintainer if we'd just change the signature.

Introduce repo_read_object_file which takes the repository argument, and
hide the original read_object_file as a macro behind
NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to
e675765235 (diff.c: remove implicit dependency on the_index, 2018-09-21)

Add a coccinelle patch to convert existing callers, but do not apply
the resulting patch from 'make coccicheck' to keep the diff of this
patch small.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 contrib/coccinelle/the_repository.pending.cocci | 13 +
 object-store.h  | 10 --
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 contrib/coccinelle/the_repository.pending.cocci

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
new file mode 100644
index 00..3c7fa70502
--- /dev/null
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -0,0 +1,13 @@
+// This file is used for the ongoing refactoring of
+// bringing the index or repository struct in all of
+// our code base.
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- read_object_file(
++ repo_read_object_file(the_repository,
+  E, F, G)
+
diff --git a/object-store.h b/object-store.h
index 3d98a682b2..00a64622e6 100644
--- a/object-store.h
+++ b/object-store.h
@@ -165,10 +165,16 @@ extern void *read_object_file_extended(struct repository 
*r,
   const struct object_id *oid,
   enum object_type *type,
   unsigned long *size, int lookup_replace);
-static inline void *read_object_file(const struct object_id *oid, enum 
object_type *type, unsigned long *size)
+static inline void *repo_read_object_file(struct repository *r,
+ const struct object_id *oid,
+ enum object_type *type,
+ unsigned long *size)
 {
-   return read_object_file_extended(the_repository, oid, type, size, 1);
+   return read_object_file_extended(r, oid, type, size, 1);
 }
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define read_object_file(oid, type, size) 
repo_read_object_file(the_repository, oid, type, size)
+#endif
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
 int oid_object_info(struct repository *r, const struct object_id *, unsigned 
long *);
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 08/24] commit: allow parse_commit* to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Just like the previous commit, parse_commit and friends are used a lot
and are found in new patches, so we cannot change their signature easily.

Re-introduce these function prefixed with 'repo_' that take a repository
argument and keep the original as a shallow macro.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 commit.c  | 18 --
 commit.h  | 17 +
 .../coccinelle/the_repository.pending.cocci   | 24 +++
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/commit.c b/commit.c
index dc8a39d52a..7a931d7fd4 100644
--- a/commit.c
+++ b/commit.c
@@ -443,7 +443,10 @@ int parse_commit_buffer(struct repository *r, struct 
commit *item, const void *b
return 0;
 }
 
-int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
+int repo_parse_commit_internal(struct repository *r,
+  struct commit *item,
+  int quiet_on_missing,
+  int use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -454,9 +457,9 @@ int parse_commit_internal(struct commit *item, int 
quiet_on_missing, int use_com
return -1;
if (item->object.parsed)
return 0;
-   if (use_commit_graph && parse_commit_in_graph(the_repository, item))
+   if (use_commit_graph && parse_commit_in_graph(r, item))
return 0;
-   buffer = read_object_file(>object.oid, , );
+   buffer = repo_read_object_file(r, >object.oid, , );
if (!buffer)
return quiet_on_missing ? -1 :
error("Could not read %s",
@@ -467,18 +470,19 @@ int parse_commit_internal(struct commit *item, int 
quiet_on_missing, int use_com
 oid_to_hex(>object.oid));
}
 
-   ret = parse_commit_buffer(the_repository, item, buffer, size, 0);
+   ret = parse_commit_buffer(r, item, buffer, size, 0);
if (save_commit_buffer && !ret) {
-   set_commit_buffer(the_repository, item, buffer, size);
+   set_commit_buffer(r, item, buffer, size);
return 0;
}
free(buffer);
return ret;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int repo_parse_commit_gently(struct repository *r,
+struct commit *item, int quiet_on_missing)
 {
-   return parse_commit_internal(item, quiet_on_missing, 1);
+   return repo_parse_commit_internal(r, item, quiet_on_missing, 1);
 }
 
 void parse_commit_or_die(struct commit *item)
diff --git a/commit.h b/commit.h
index 1d260d62f5..08935f9a19 100644
--- a/commit.h
+++ b/commit.h
@@ -79,12 +79,21 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct repository *r, struct commit *item, const void 
*buffer, unsigned long size, int check_graph);
-int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
-int parse_commit_gently(struct commit *item, int quiet_on_missing);
-static inline int parse_commit(struct commit *item)
+int repo_parse_commit_internal(struct repository *r, struct commit *item,
+  int quiet_on_missing, int use_commit_graph);
+int repo_parse_commit_gently(struct repository *r,
+struct commit *item,
+int quiet_on_missing);
+static inline int repo_parse_commit(struct repository *r, struct commit *item)
 {
-   return parse_commit_gently(item, 0);
+   return repo_parse_commit_gently(r, item, 0);
 }
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define parse_commit_internal(item, quiet, use) 
repo_parse_commit_internal(the_repository, item, quiet, use)
+#define parse_commit_gently(item, quiet) 
repo_parse_commit_gently(the_repository, item, quiet)
+#define parse_commit(item) repo_parse_commit(the_repository, item)
+#endif
+
 void parse_commit_or_die(struct commit *item);
 
 struct buffer_slab;
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 46f3a1b23a..b185fe0a1d 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -40,3 +40,27 @@ expression F;
 - has_object_file_with_flags(
 + repo_has_object_file_with_flags(the_repository,
   E)
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- parse_commit_internal(
++ repo_parse_commit_internal(the_repository,
+  E, F, G)
+
+@@
+expression E;
+expression F;
+@@
+- parse_commit_gently(
++ repo_parse_commit_gently(the_repository,
+  E, F)
+
+@@
+expression E;
+@@
+- parse_commit(
++ repo_parse_commit(the_repository,
+  E)
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 06/24] object-store: prepare has_{sha1, object}_file[_with_flags] to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 .../coccinelle/the_repository.pending.cocci   | 29 +++
 object-store.h| 22 ++
 sha1-file.c   | 15 ++
 3 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 3c7fa70502..46f3a1b23a 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -11,3 +11,32 @@ expression G;
 + repo_read_object_file(the_repository,
   E, F, G)
 
+@@
+expression E;
+@@
+- has_sha1_file(
++ repo_has_sha1_file(the_repository,
+  E)
+
+@@
+expression E;
+expression F;
+@@
+- has_sha1_file_with_flags(
++ repo_has_sha1_file_with_flags(the_repository,
+  E)
+
+@@
+expression E;
+@@
+- has_object_file(
++ repo_has_object_file(the_repository,
+  E)
+
+@@
+expression E;
+expression F;
+@@
+- has_object_file_with_flags(
++ repo_has_object_file_with_flags(the_repository,
+  E)
diff --git a/object-store.h b/object-store.h
index 00a64622e6..2b5e6ff1ed 100644
--- a/object-store.h
+++ b/object-store.h
@@ -212,15 +212,27 @@ int read_loose_object(const char *path,
  * object_info. OBJECT_INFO_SKIP_CACHED is automatically set; pass
  * nonzero flags to also set other flags.
  */
-extern int has_sha1_file_with_flags(const unsigned char *sha1, int flags);
-static inline int has_sha1_file(const unsigned char *sha1)
+int repo_has_sha1_file_with_flags(struct repository *r,
+ const unsigned char *sha1, int flags);
+static inline int repo_has_sha1_file(struct repository *r,
+const unsigned char *sha1)
 {
-   return has_sha1_file_with_flags(sha1, 0);
+   return repo_has_sha1_file_with_flags(r, sha1, 0);
 }
 
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define has_sha1_file_with_flags(sha1, flags) 
repo_has_sha1_file_with_flags(the_repository, sha1, flags)
+#define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1)
+#endif
+
 /* Same as the above, except for struct object_id. */
-extern int has_object_file(const struct object_id *oid);
-extern int has_object_file_with_flags(const struct object_id *oid, int flags);
+int repo_has_object_file(struct repository *r, const struct object_id *oid);
+int repo_has_object_file_with_flags(struct repository *r,
+   const struct object_id *oid, int flags);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define has_object_file(oid) repo_has_object_file(the_repository, oid)
+#define has_object_file_with_flags(oid, flags) 
repo_has_object_file_with_flags(the_repository, oid, flags)
+#endif
 
 /*
  * Return true iff an alternate object database has a loose object
diff --git a/sha1-file.c b/sha1-file.c
index c5b704aec5..e77273ccfd 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1768,24 +1768,27 @@ int force_object_loose(const struct object_id *oid, 
time_t mtime)
return ret;
 }
 
-int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
+int repo_has_sha1_file_with_flags(struct repository *r,
+ const unsigned char *sha1, int flags)
 {
struct object_id oid;
if (!startup_info->have_repository)
return 0;
hashcpy(oid.hash, sha1);
-   return oid_object_info_extended(the_repository, , NULL,
+   return oid_object_info_extended(r, , NULL,
flags | OBJECT_INFO_SKIP_CACHED) >= 0;
 }
 
-int has_object_file(const struct object_id *oid)
+int repo_has_object_file(struct repository *r,
+const struct object_id *oid)
 {
-   return has_sha1_file(oid->hash);
+   return repo_has_sha1_file(r, oid->hash);
 }
 
-int has_object_file_with_flags(const struct object_id *oid, int flags)
+int repo_has_object_file_with_flags(struct repository *r,
+   const struct object_id *oid, int flags)
 {
-   return has_sha1_file_with_flags(oid->hash, flags);
+   return repo_has_sha1_file_with_flags(r, oid->hash, flags);
 }
 
 static void check_tree(const void *buf, size_t size)
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCH 09/24] commit-reach.c: allow paint_down_to_common to handle arbitrary repositories

2018-10-30 Thread Stefan Beller
As the function is file local and not widely used, migrate it all at once.

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index a9da65c462..080ae0a758 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -30,7 +30,8 @@ static int queue_has_nonstale(struct prio_queue *queue)
 }
 
 /* all input commits in one and twos[] must have been parsed! */
-static struct commit_list *paint_down_to_common(struct commit *one, int n,
+static struct commit_list *paint_down_to_common(struct repository *r,
+   struct commit *one, int n,
struct commit **twos,
int min_generation)
 {
@@ -83,7 +84,7 @@ static struct commit_list *paint_down_to_common(struct commit 
*one, int n,
parents = parents->next;
if ((p->object.flags & flags) == flags)
continue;
-   if (parse_commit(p))
+   if (repo_parse_commit(r, p))
return NULL;
p->object.flags |= flags;
prio_queue_put(, p);
@@ -116,7 +117,7 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return NULL;
}
 
-   list = paint_down_to_common(one, n, twos, 0);
+   list = paint_down_to_common(the_repository, one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -187,8 +188,8 @@ static int remove_redundant(struct commit **array, int cnt)
if (array[j]->generation < min_generation)
min_generation = array[j]->generation;
}
-   common = paint_down_to_common(array[i], filled, work,
- min_generation);
+   common = paint_down_to_common(the_repository, array[i], filled,
+ work, min_generation);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
for (j = 0; j < filled; j++)
@@ -322,7 +323,9 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return ret;
 
-   bases = paint_down_to_common(commit, nr_reference, reference, 
commit->generation);
+   bases = paint_down_to_common(the_repository, commit,
+nr_reference, reference,
+commit->generation);
if (commit->object.flags & PARENT2)
ret = 1;
clear_commit_marks(commit, all_flags);
-- 
2.19.1.930.g4563a0d9d0-goog



[PATCHv2 00/24] Bring more repository handles into our code base]

2018-10-30 Thread Stefan Beller
This resends sb/more-repo-in-api

On Thu, Oct 25, 2018 at 2:14 AM SZEDER Gábor  wrote:
> On Tue, Oct 16, 2018 at 04:35:49PM -0700, Stefan Beller wrote:
> > This converts the 'show_submodule_header' function to use
> > the repository API properly, such that the submodule objects
> > are not added to the main object store.
>
> This patch breaks the test suite with 'GIT_TEST_COMMIT_GRAPH=1', in
> particular 't4041-diff-submodule-option.sh' fails with:
> [...]

I debugged into this and the last four patches will fix it.

I also picked up the patch for pending semantic patches, as the
first patch, can I have your sign off, please?

This also addresses Jonathans feedback in
https://public-inbox.org/git/20181019203750.110741-1-jonathanta...@google.com/

A range-diff is below.

Thanks,
Stefan

SZEDER Gábor (1):
  Makefile: add pending semantic patches

Stefan Beller (23):
  sha1_file: allow read_object to read objects in arbitrary repositories
  packfile: allow has_packed_and_bad to handle arbitrary repositories
  object-store: allow read_object_file_extended to read from arbitrary
repositories
  object-store: prepare read_object_file to deal with arbitrary
repositories
  object-store: prepare has_{sha1, object}_file[_with_flags] to handle
arbitrary repositories
  object: parse_object to honor its repository argument
  commit: allow parse_commit* to handle arbitrary repositories
  commit-reach.c: allow paint_down_to_common to handle arbitrary
repositories
  commit-reach.c: allow merge_bases_many to handle arbitrary
repositories
  commit-reach.c: allow remove_redundant to handle arbitrary
repositories
  commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary
repositories
  commit-reach: prepare get_merge_bases to handle arbitrary repositories
  commit-reach: prepare in_merge_bases[_many] to handle arbitrary
repositories
  commit: prepare get_commit_buffer to handle arbitrary repositories
  commit: prepare repo_unuse_commit_buffer to handle arbitrary
repositories
  commit: prepare logmsg_reencode to handle arbitrary repositories
  pretty: prepare format_commit_message to handle arbitrary repositories
  submodule: use submodule repos for object lookup
  submodule: don't add submodule as odb for push
  commit-graph: convert remaining function to handle arbitrary
repositories
  commit: make free_commit_buffer and release_commit_memory repository
agnostic
  path.h: make REPO_GIT_PATH_FUNC repository agnostic
  t/helper/test-repository: celebrate independence from the_repository

 Makefile  |   6 +-
 builtin/fsck.c|   3 +-
 builtin/log.c |   6 +-
 builtin/rev-list.c|   3 +-
 cache.h   |   2 +
 commit-graph.c|  40 +++--
 commit-reach.c|  73 +
 commit-reach.h|  38 +++--
 commit.c  |  41 ++---
 commit.h  |  43 +-
 .../coccinelle/the_repository.pending.cocci   | 144 ++
 object-store.h|  35 -
 object.c  |   8 +-
 packfile.c|   5 +-
 packfile.h|   2 +-
 path.h|   2 +-
 pretty.c  |  28 ++--
 pretty.h  |   7 +-
 sha1-file.c   |  34 +++--
 streaming.c   |   2 +-
 submodule.c   |  79 +++---
 t/helper/test-repository.c|  10 ++
 22 files changed, 459 insertions(+), 152 deletions(-)
 create mode 100644 contrib/coccinelle/the_repository.pending.cocci

git range-diff origin/sb/more-repo-in-api...
[...] # rebased to origin/master
  -:  -- > 116:  4ede3d42df Seventh batch for 2.20
  -:  -- > 117:  b1de196491 Makefile: add pending semantic patches
  1:  1b9b5c695e = 118:  f1be5eb487 sha1_file: allow read_object to read 
objects in arbitrary repositories
  2:  33b94066f2 = 119:  9100a5705d packfile: allow has_packed_and_bad to 
handle arbitrary repositories
  3:  5217b6b1e1 = 120:  a4ad7791da object-store: allow 
read_object_file_extended to read from arbitrary repositories
  4:  2b7239b55b ! 121:  5d7b3a6dd9 object-store: prepare read_object_file to 
deal with arbitrary repositories
@@ -19,10 +19,10 @@
 Signed-off-by: Stefan Beller 
 Signed-off-by: Junio C Hamano 
 
- diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
+ diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
  new file mode 100644
  --- 

[PATCH 01/24] Makefile: add pending semantic patches

2018-10-30 Thread Stefan Beller
From: SZEDER Gábor 

There are basically two main use cases for semantic patches:

  - To avoid undesirable code patterns, e.g. we should not use
sha1_to_hex(oid.hash) or strbuf_addf(, "fixed string"), but
use oid_to_hex() or strbuf_addstr(, "fixed string")
instead.  Note that in these cases we don't remove the
functions sha1_to_hex() or strbuf_addf(), because there are
good reasons to use them in other scenarios.

Our semantic patches under 'contrib/coccinelle/' fall into
this category, and we have 'make coccicheck' and the static
analysis build job on Travis CI to catch these undesirable
code patterns preferably early, and to prevent them from
entering our codebase.

  - To perform one-off code transformations, e.g. to modify a
function's name and/or signature and convert all its
callsites; see e.g. commits abef9020e3 (sha1_file: convert
sha1_object_info* to object_id, 2018-03-12) and b4f5aca40e
(sha1_file: convert read_sha1_file to struct object_id,
2018-03-12).

To allows semantic patches of the second category, we'll introduce
the concept of "pending" semantic patches, stored in
'contrib/coccinelle/.pending.cocci' files, modifying
'make coccicheck' to skip them, and adding the new 'make
coccicheck-pending' target to make it convenient to apply them.

[Missing: SZEDERs sign off, so I also do not sign off]
---
 Makefile | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index b08d5ea258..6ea2bbd7f5 100644
--- a/Makefile
+++ b/Makefile
@@ -2739,9 +2739,11 @@ endif
then \
echo '' SPATCH result: $@; \
fi
-coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci))
+coccicheck: $(addsuffix .patch,$(filter-out %.pending.cocci,$(wildcard 
contrib/coccinelle/*.cocci)))
 
-.PHONY: coccicheck
+coccicheck-pending: $(addsuffix .patch,$(wildcard 
contrib/coccinelle/*.pending.cocci))
+
+.PHONY: coccicheck coccicheck-pending
 
 ### Installation rules
 
-- 
2.19.1.930.g4563a0d9d0-goog



Re: Infinite loop regression in git-fsck in v2.12.0

2018-10-30 Thread Ævar Arnfjörð Bjarmason


On Tue, Oct 30 2018, Ævar Arnfjörð Bjarmason wrote:

> The test is easy, just add a 'git fsck' at the end of t5000-tar-tree.sh,
> but more generally it seems having something like GIT_TEST_FSCK=true is
> a good idea. We do a bunch of stress testing of the object store in the
> test suite that we're unlikely to encounter in the wild.
>
> Of course my idea of how to do that in my
> <20181030184331.27264-3-ava...@gmail.com> would be counterproductive,
> i.e. it seems we want to catch all the cases where there's a bad fsck,
> just that it returns in a certain way.
>
> So maybe a good approach would be that we'd annotate all those test
> whose fsck fails with "this is how it should fail", and run those tests
> under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting
> that no tests other than those marked as failing the fsck check at the
> end fail it.

WIP patch for doing that:

diff --git a/Makefile b/Makefile
index b08d5ea258..ca624c381f 100644
--- a/Makefile
+++ b/Makefile
@@ -723,6 +723,7 @@ TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
 TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
+TEST_BUILTINS_OBJS += test-env-bool.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 5df8b682aa..c4481085c4 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
{ "dump-fsmonitor", cmd__dump_fsmonitor },
{ "dump-split-index", cmd__dump_split_index },
{ "dump-untracked-cache", cmd__dump_untracked_cache },
+   { "env-bool", cmd__env_bool },
{ "example-decorate", cmd__example_decorate },
{ "genrandom", cmd__genrandom },
{ "hashmap", cmd__hashmap },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 71f470b871..f7845fbc56 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -13,6 +13,7 @@ int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
 int cmd__dump_split_index(int argc, const char **argv);
 int cmd__dump_untracked_cache(int argc, const char **argv);
+int cmd__env_bool(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
index 635918505d..92fbce2920 100755
--- a/t/t1305-config-include.sh
+++ b/t/t1305-config-include.sh
@@ -313,4 +313,8 @@ test_expect_success 'include cycles are detected' '
test_i18ngrep "exceeded maximum include depth" stderr
 '

+GIT_FSCK_FAILS=true
+GIT_FSCK_FAILS_TEST='
+   test_i18ngrep "exceeded maximum include depth" fsck.err
+'
 test_done
diff --git a/t/t3103-ls-tree-misc.sh b/t/t3103-ls-tree-misc.sh
index 14520913af..06abf84ef4 100755
--- a/t/t3103-ls-tree-misc.sh
+++ b/t/t3103-ls-tree-misc.sh
@@ -22,4 +22,10 @@ test_expect_success 'ls-tree fails with non-zero exit 
code on broken tree' '
test_must_fail git ls-tree -r HEAD
 '

+GIT_FSCK_FAILS=true
+GIT_FSCK_FAILS_TEST='
+   test_i18ngrep "invalid sha1 pointer in cache-tree" fsck.err &&
+   test_i18ngrep "broken link from" fsck.out &&
+   test_i18ngrep "missing tree" fsck.out
+'
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 897e6fcc94..d4ebb94998 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -454,6 +454,8 @@ GIT_EXIT_OK=
 trap 'die' EXIT
 trap 'exit $?' INT

+GIT_FSCK_FAILS=
+
 # The user-facing functions are loaded from a separate file so that
 # test_perf subshells can have them too
 . "$TEST_DIRECTORY/test-lib-functions.sh"
@@ -790,6 +792,25 @@ test_at_end_hook_ () {
 }

 test_done () {
+   if test_have_prereq TEST_FSCK
+   then
+   desc='git fsck at end (due to GIT_TEST_FSCK)'
+   if test -n "$GIT_FSCK_FAILS"
+   then
+   test_expect_success "$desc (expected to fail)" '
+   test_must_fail git fsck 2>fsck.err >fsck.out
+   '
+   test_expect_success "$descriptor (expected to fail) -- 
assert failure mode" "
+   test_path_exists fsck.err &&
+   test_path_exists fsck.out &&
+   $GIT_FSCK_FAILS_TEST
+   "
+   else
+   test_expect_success "$desc" '
+   git fsck
+   '
+   fi
+   fi

Re: Infinite loop regression in git-fsck in v2.12.0

2018-10-30 Thread Jeff King
On Tue, Oct 30, 2018 at 09:03:24PM +0100, Ævar Arnfjörð Bjarmason wrote:

> While playing around with having a GIT_TEST_FSCK=true as I suggested in
> https://public-inbox.org/git/20181030184331.27264-3-ava...@gmail.com/ I
> found that we've had an infinite loop in git-fsck since c68b489e56
> ("fsck: parse loose object paths directly", 2017-01-13)
> 
> In particular in the while() loop added by f6371f9210 ("sha1_file: add
> read_loose_object() function", 2017-01-13) in the check_stream_sha1()
> function.
> 
> To reproduce just:
> 
> (
> cd t &&
> ./t5000-tar-tree.sh -d &&
> git -C trash\ directory.t5000-tar-tree/ fsck
> )

Thanks, I was easily able to reproduce.

> Before we'd print:
> 
> error: sha1 mismatch 19f9c8273ec45a8938e6999cb59b3ff66739902a
> error: 19f9c8273ec45a8938e6999cb59b3ff66739902a: object corrupt or missing
> Checking object directories: 100% (256/256), done.
> missing blob 19f9c8273ec45a8938e6999cb59b3ff66739902a

The problem isn't actually a sha1 mismatch, though that's what
parse_object() will report. The issue is actually that the file is
truncated. So zlib does not say "this is corrupt", but rather "I need
more bytes to keep going". And unfortunately it returns Z_BUF_ERROR both
for "I need more bytes" (in which we know we are truncated, because we
fed the whole mmap'd file in the first place) as well as "I need more
output buffer space" (which just means we should keep looping!).

So we need to distinguish those cases. I think this is the simplest fix:

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..a7ff5fe25d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream,
 * see the comment in unpack_sha1_rest for details.
 */
while (total_read <= size &&
+  stream->avail_in > 0 &&
   (status == Z_OK || status == Z_BUF_ERROR)) {
stream->next_out = buf;
stream->avail_out = sizeof(buf);

> I have no idea if this makes sense, but this fixes it and we pass all
> the fsck tests with it:
> 
> diff --git a/sha1-file.c b/sha1-file.c
> index dd0b6aa873..fffc31458e 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -2182,7 +2182,7 @@ static int check_stream_sha1(git_zstream *stream,
>   git_hash_ctx c;
>   unsigned char real_sha1[GIT_MAX_RAWSZ];
>   unsigned char buf[4096];
> - unsigned long total_read;
> + unsigned long total_read, last_total_read;
>   int status = Z_OK;
> 
>   the_hash_algo->init_fn();
> @@ -2193,6 +2193,7 @@ static int check_stream_sha1(git_zstream *stream,
>* do not count against the object's content size.
>*/
>   total_read = stream->total_out - strlen(hdr) - 1;
> + last_total_read = total_read;

This works just by checking that we are making forward progress in the
output buffer. I think that would _probably_ be OK for this case, since
we know we have all of the input available. But in a case where we're
feeding the input in a stream, it would not be. It's possible there that
we would not create any output in one round, but would do so after
feeding more input bytes.

I think the patch I showed above addresses the root cause more directly.
I'll wrap that up in a real commit, but I think there may be some
related work:

  - "git show 19f9c827" does complain with "sha1 mismatch" (which isn't
strictly correct, but is probably good enough). However, "git
cat-file blob 19f9c827" exits non-zero without printing anything. It
probably should complain more loudly.

  - the offending loop comes from f6371f9210. But that commit was mostly
cargo-culting other parts of sha1-file.c. I'm worried that this bug
exists elsewhere, too. I'll dig around to see if I can find other
instances.

-Peff


Infinite loop regression in git-fsck in v2.12.0

2018-10-30 Thread Ævar Arnfjörð Bjarmason
While playing around with having a GIT_TEST_FSCK=true as I suggested in
https://public-inbox.org/git/20181030184331.27264-3-ava...@gmail.com/ I
found that we've had an infinite loop in git-fsck since c68b489e56
("fsck: parse loose object paths directly", 2017-01-13)

In particular in the while() loop added by f6371f9210 ("sha1_file: add
read_loose_object() function", 2017-01-13) in the check_stream_sha1()
function.

To reproduce just:

(
cd t &&
./t5000-tar-tree.sh -d &&
git -C trash\ directory.t5000-tar-tree/ fsck
)

Before we'd print:

error: sha1 mismatch 19f9c8273ec45a8938e6999cb59b3ff66739902a
error: 19f9c8273ec45a8938e6999cb59b3ff66739902a: object corrupt or missing
Checking object directories: 100% (256/256), done.
missing blob 19f9c8273ec45a8938e6999cb59b3ff66739902a

Now we just hang on:

Checking object directories:   9% (24/256)

I have no idea if this makes sense, but this fixes it and we pass all
the fsck tests with it:

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..fffc31458e 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2182,7 +2182,7 @@ static int check_stream_sha1(git_zstream *stream,
git_hash_ctx c;
unsigned char real_sha1[GIT_MAX_RAWSZ];
unsigned char buf[4096];
-   unsigned long total_read;
+   unsigned long total_read, last_total_read;
int status = Z_OK;

the_hash_algo->init_fn();
@@ -2193,6 +2193,7 @@ static int check_stream_sha1(git_zstream *stream,
 * do not count against the object's content size.
 */
total_read = stream->total_out - strlen(hdr) - 1;
+   last_total_read = total_read;

/*
 * This size comparison must be "<=" to read the final zlib packets;
@@ -2207,6 +2208,9 @@ static int check_stream_sha1(git_zstream *stream,
status = git_inflate(stream, Z_FINISH);
the_hash_algo->update_fn(, buf, stream->next_out - buf);
total_read += stream->next_out - buf;
+   if (last_total_read == total_read)
+   return -1;
+   last_total_read = total_read;
}
git_inflate_end(stream);


I.e. we get into a loop where total_read isn't increasing. We no longer
print "sha1 mismatch" but maybe that's an emergent effect of something
else. Haven't checked.

The test is easy, just add a 'git fsck' at the end of t5000-tar-tree.sh,
but more generally it seems having something like GIT_TEST_FSCK=true is
a good idea. We do a bunch of stress testing of the object store in the
test suite that we're unlikely to encounter in the wild.

Of course my idea of how to do that in my
<20181030184331.27264-3-ava...@gmail.com> would be counterproductive,
i.e. it seems we want to catch all the cases where there's a bad fsck,
just that it returns in a certain way.

So maybe a good approach would be that we'd annotate all those test
whose fsck fails with "this is how it should fail", and run those tests
under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting
that no tests other than those marked as failing the fsck check at the
end fail it.


[RFC v1] Add virtual file system settings and hook proc

2018-10-30 Thread Ben Peart
From: Ben Peart 

On index load, clear/set the skip worktree bits based on the virtual
file system data. Use virtual file system data to update skip-worktree
bit in unpack-trees. Use virtual file system data to exclude files and
folders not explicitly requested.

Signed-off-by: Ben Peart 
---

We have taken several steps to make git perform well on very large repos.
Some of those steps include: improving underlying algorithms, utilizing
multi-threading where possible, and simplifying the behavior of some commands.
These changes typically benefit all git repos to varying degrees.  While
these optimizations all help, they are insufficient to provide adequate
performance on the very large repos we often work with.

To make git perform well on the very largest repos, we had to make more
significant changes.  The biggest performance win by far is the work we have
done to make git operations O(modified) instead of O(size of repo).  This
takes advantage of the fact that the number of files a developer has modified
is a tiny fraction of the overall repo size.

We accomplished this by utilizing the existing internal logic for the skip
worktree bit and excludes to tell git to ignore all files and folders other
than those that have been modified.  This logic is driven by an external
process that monitors writes to the repo and communicates the list of files
and folders with changes to git via the virtual file system hook in this patch.

The external process maintains a list of files and folders that have been
modified.  When git runs, it requests the list of files and folders that
have been modified via the virtual file system hook.  Git then sets/clears
the skip-worktree bit on the cache entries and builds a hashmap of the
modified files/folders that is used by the excludes logic to avoid scanning
the entire repo looking for changes and untracked files.

With this system, we have been able to make local git command performance on
extremely large repos (millions of files, 1/2 million folders) entirely
manageable (30 second checkout, 3.5 seconds status, 4 second add, 7 second
commit, etc).

Our desire is to eliminate all custom patches in our fork of git.  To that
end, I'm submitting this as an RFC to see how much interest there is and how
much willingness to take this type of change into git.

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/c06b290d2f
Checkout: git fetch https://github.com/benpeart/git virtual-filesystem-v1 
&& git checkout c06b290d2f

 Documentation/config.txt |   8 +
 Documentation/githooks.txt   |  20 ++
 Makefile |   1 +
 cache.h  |   1 +
 config.c |  37 +++-
 config.h |   1 +
 dir.c|  34 +++-
 environment.c|   1 +
 read-cache.c |   2 +
 t/README |   3 +
 t/t1092-virtualfilesystem.sh | 354 +++
 t/t1092/virtualfilesystem|  23 +++
 unpack-trees.c   |  23 ++-
 virtualfilesystem.c  | 308 ++
 virtualfilesystem.h  |  25 +++
 15 files changed, 833 insertions(+), 8 deletions(-)
 create mode 100755 t/t1092-virtualfilesystem.sh
 create mode 100755 t/t1092/virtualfilesystem
 create mode 100644 virtualfilesystem.c
 create mode 100644 virtualfilesystem.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 09e95e9e98..dd4b834375 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -441,6 +441,14 @@ core.fsmonitor::
avoiding unnecessary processing of files that have not changed.
See the "fsmonitor-watchman" section of linkgit:githooks[5].
 
+core.virtualFilesystem::
+   If set, the value of this variable is used as a command which
+   will identify all files and directories that are present in
+   the working directory.  Git will only track and update files
+   listed in the virtual file system.  Using the virtual file system
+   will supersede the sparse-checkout settings which will be ignored.
+   See the "virtual file system" section of linkgit:githooks[6].
+
 core.trustctime::
If false, the ctime differences between the index and the
working tree are ignored; useful when the inode change time
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 959044347e..87f9ad2a77 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -492,6 +492,26 @@ This hook is invoked by `git-p4 submit`. It takes no 
parameters and nothing
 from standard input. Exiting with non-zero status from this script prevent
 `git-p4 submit` from launching. Run `git-p4 submit --help` for details.
 
+virtualFilesystem
+~~
+
+"Virtual File System" allows populating the working directory sparsely.
+The projection data is typically automatically generated by an external
+process.  Git will limit what files it 

[PATCH 0/1] mingw: fix isatty() after dup2()

2018-10-30 Thread Johannes Schindelin via GitGitGadget
This patch has been looong in the waiting (well over a year) for being sent
to the Git mailing list; it is a fixup for a change of isatty() on Windows
that had long made it into git.git's master branch.

Johannes Schindelin (1):
  mingw: fix isatty() after dup2()

 compat/mingw.h   |  3 +++
 compat/winansi.c | 12 
 2 files changed, 15 insertions(+)


base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-61%2Fdscho%2Fmingw-isatty-and-dup2-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-61/dscho/mingw-isatty-and-dup2-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/61
-- 
gitgitgadget


[PATCH 1/1] mingw: fix isatty() after dup2()

2018-10-30 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Since a9b8a09c3c30 (mingw: replace isatty() hack, 2016-12-22), we handle
isatty() by special-casing the stdin/stdout/stderr file descriptors,
caching the return value. However, we missed the case where dup2()
overrides the respective file descriptor.

That poses a problem e.g. where the `show` builtin asks for a pager very
early, the `setup_pager()` function sets the pager depending on the
return value of `isatty()` and then redirects stdout. Subsequently,
`cmd_log_init_finish()` calls `setup_pager()` *again*. What should
happen now is that `isatty()` reports that stdout is *not* a TTY and
consequently stdout should be left alone.

Let's override dup2() to handle this appropriately.

This fixes https://github.com/git-for-windows/git/issues/1077

Signed-off-by: Johannes Schindelin 
---
 compat/mingw.h   |  3 +++
 compat/winansi.c | 12 
 2 files changed, 15 insertions(+)

diff --git a/compat/mingw.h b/compat/mingw.h
index f31dcff2b..b04556ce0 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -390,6 +390,9 @@ int mingw_raise(int sig);
 int winansi_isatty(int fd);
 #define isatty winansi_isatty
 
+int winansi_dup2(int oldfd, int newfd);
+#define dup2 winansi_dup2
+
 void winansi_init(void);
 HANDLE winansi_get_osfhandle(int fd);
 
diff --git a/compat/winansi.c b/compat/winansi.c
index a11a0f16d..f4f08237f 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -474,6 +474,18 @@ static void die_lasterr(const char *fmt, ...)
va_end(params);
 }
 
+#undef dup2
+int winansi_dup2(int oldfd, int newfd)
+{
+   int ret = dup2(oldfd, newfd);
+
+   if (!ret && newfd >= 0 && newfd <= 2)
+   fd_is_interactive[newfd] = oldfd < 0 || oldfd > 2 ?
+   0 : fd_is_interactive[oldfd];
+
+   return ret;
+}
+
 static HANDLE duplicate_handle(HANDLE hnd)
 {
HANDLE hresult, hproc = GetCurrentProcess();
-- 
gitgitgadget


[PATCH v2 0/3] index-pack: test updates

2018-10-30 Thread Ævar Arnfjörð Bjarmason
I'd still probalby like to have core.checkCollisions as a config knob
to be able to turn it off, but let's see what Jeff comes up with once
he finishes his WIP cache patch.

In the meantime 1-3/4 of my series is obviously correct test fixes
which I'd like queued up first.

Ævar Arnfjörð Bjarmason (3):
  pack-objects test: modernize style
  pack-objects tests: don't leave test .git corrupt at end
  index-pack tests: don't leave test repo dirty at end

 t/t1060-object-corruption.sh |  4 ++-
 t/t5300-pack-object.sh   | 47 +++-
 2 files changed, 28 insertions(+), 23 deletions(-)

-- 
2.19.1.899.g0250525e69



[PATCH v2 1/3] pack-objects test: modernize style

2018-10-30 Thread Ævar Arnfjörð Bjarmason
Modernize the quoting and indentation style of two tests added in
8685da4256 ("don't ever allow SHA1 collisions to exist by fetching a
pack", 2007-03-20), and of a subsequent one added in
4614043c8f ("index-pack: use streaming interface for collision test on
large blobs", 2012-05-24) which had copied the style of the first two.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t5300-pack-object.sh | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 6c620cd540..a0309e4bab 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -475,22 +475,22 @@ test_expect_success 'pack-objects in too-many-packs mode' 
'
 # two tests at the end of this file.
 #
 
-test_expect_success \
-'fake a SHA1 hash collision' \
-'long_a=$(git hash-object a | sed -e "s!^..!&/!") &&
- long_b=$(git hash-object b | sed -e "s!^..!&/!") &&
- test -f   .git/objects/$long_b &&
- cp -f .git/objects/$long_a \
-   .git/objects/$long_b'
+test_expect_success 'fake a SHA1 hash collision' '
+   long_a=$(git hash-object a | sed -e "s!^..!&/!") &&
+   long_b=$(git hash-object b | sed -e "s!^..!&/!") &&
+   test -f .git/objects/$long_b &&
+   cp -f   .git/objects/$long_a \
+   .git/objects/$long_b
+'
 
-test_expect_success \
-'make sure index-pack detects the SHA1 collision' \
-'test_must_fail git index-pack -o bad.idx test-3.pack 2>msg &&
- test_i18ngrep "SHA1 COLLISION FOUND" msg'
+test_expect_success 'make sure index-pack detects the SHA1 collision' '
+   test_must_fail git index-pack -o bad.idx test-3.pack 2>msg &&
+   test_i18ngrep "SHA1 COLLISION FOUND" msg
+'
 
-test_expect_success \
-'make sure index-pack detects the SHA1 collision (large blobs)' \
-'test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx 
test-3.pack 2>msg &&
- test_i18ngrep "SHA1 COLLISION FOUND" msg'
+test_expect_success 'make sure index-pack detects the SHA1 collision (large 
blobs)' '
+   test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx 
test-3.pack 2>msg &&
+   test_i18ngrep "SHA1 COLLISION FOUND" msg
+'
 
 test_done
-- 
2.19.1.899.g0250525e69



[PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end

2018-10-30 Thread Ævar Arnfjörð Bjarmason
Change the pack-objects tests to not leave their .git directory
corrupt and the end.

In 2fca19fbb5 ("fix multiple issues with t5300", 2010-02-03) a comment
was added warning against adding any subsequent tests, but since
4614043c8f ("index-pack: use streaming interface for collision test on
large blobs", 2012-05-24) the comment has drifted away from the code,
mentioning two test, when we actually have three.

Instead of having this warning let's just create a new .git directory
specifically for these tests.

As an aside, it would be interesting to instrument the test suite to
run a "git fsck" at the very end (in "test_done"). That would have
errored before this change, and may find other issues #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t5300-pack-object.sh | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index a0309e4bab..410a09b0dd 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -468,29 +468,32 @@ test_expect_success 'pack-objects in too-many-packs mode' 
'
git fsck
 '
 
-#
-# WARNING!
-#
-# The following test is destructive.  Please keep the next
-# two tests at the end of this file.
-#
-
-test_expect_success 'fake a SHA1 hash collision' '
-   long_a=$(git hash-object a | sed -e "s!^..!&/!") &&
-   long_b=$(git hash-object b | sed -e "s!^..!&/!") &&
-   test -f .git/objects/$long_b &&
-   cp -f   .git/objects/$long_a \
-   .git/objects/$long_b
+test_expect_success 'setup: fake a SHA1 hash collision' '
+   git init corrupt &&
+   (
+   cd corrupt &&
+   long_a=$(git hash-object -w ../a | sed -e "s!^..!&/!") &&
+   long_b=$(git hash-object -w ../b | sed -e "s!^..!&/!") &&
+   test -f .git/objects/$long_b &&
+   cp -f   .git/objects/$long_a \
+   .git/objects/$long_b
+   )
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision' '
-   test_must_fail git index-pack -o bad.idx test-3.pack 2>msg &&
-   test_i18ngrep "SHA1 COLLISION FOUND" msg
+   (
+   cd corrupt &&
+   test_must_fail git index-pack -o ../bad.idx ../test-3.pack 
2>msg &&
+   test_i18ngrep "SHA1 COLLISION FOUND" msg
+   )
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision (large 
blobs)' '
-   test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx 
test-3.pack 2>msg &&
-   test_i18ngrep "SHA1 COLLISION FOUND" msg
+   (
+   cd corrupt &&
+   test_must_fail git -c core.bigfilethreshold=1 index-pack -o 
../bad.idx ../test-3.pack 2>msg &&
+   test_i18ngrep "SHA1 COLLISION FOUND" msg
+   )
 '
 
 test_done
-- 
2.19.1.899.g0250525e69



[PATCH v2 3/3] index-pack tests: don't leave test repo dirty at end

2018-10-30 Thread Ævar Arnfjörð Bjarmason
Change a test added in 51054177b3 ("index-pack: detect local
corruption in collision check", 2017-04-01) so that the repository
isn't left dirty at the end.

Due to the caveats explained in 720dae5a19 ("config doc: elaborate on
fetch.fsckObjects security", 2018-07-27) even a "fetch" that fails
will write to the local object store, so let's copy the bit-error test
directory before running this test.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t1060-object-corruption.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
index ac1f189fd2..4feb65157d 100755
--- a/t/t1060-object-corruption.sh
+++ b/t/t1060-object-corruption.sh
@@ -117,8 +117,10 @@ test_expect_failure 'clone --local detects misnamed 
objects' '
 '
 
 test_expect_success 'fetch into corrupted repo with index-pack' '
+   cp -R bit-error bit-error-cp &&
+   test_when_finished "rm -rf bit-error-cp" &&
(
-   cd bit-error &&
+   cd bit-error-cp &&
test_must_fail git -c transfer.unpackLimit=1 \
fetch ../no-bit-error 2>stderr &&
test_i18ngrep ! -i collision stderr
-- 
2.19.1.899.g0250525e69



[PATCH 4/4] mingw: unset PERL5LIB by default

2018-10-30 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Git for Windows ships with its own Perl interpreter, and insists on
using it, so it will most likely wreak havoc if PERL5LIB is set before
launching Git.

Let's just unset that environment variables when spawning processes.

To make this feature extensible (and overrideable), there is a new
config setting `core.unsetenvvars` that allows specifying a
comma-separated list of names to unset before spawning processes.

Reported by Gabriel Fuhrmann.

Signed-off-by: Johannes Schindelin 
---
 Documentation/config.txt |  6 ++
 compat/mingw.c   | 35 ++-
 t/t0029-core-unsetenvvars.sh | 30 ++
 3 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100755 t/t0029-core-unsetenvvars.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 09e95e9e9..f338f0b2c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -921,6 +921,12 @@ relatively high IO latencies.  When enabled, Git will do 
the
 index comparison to the filesystem data in parallel, allowing
 overlapping IO's.  Defaults to true.
 
+core.unsetenvvars::
+   Windows-only: comma-separated list of environment variables'
+   names that need to be unset before spawning any other process.
+   Defaults to `PERL5LIB` to account for the fact that Git for
+   Windows insists on using its own Perl interpreter.
+
 core.createObject::
You can set this to 'link', in which case a hardlink followed by
a delete of the source are used to make sure that object creation
diff --git a/compat/mingw.c b/compat/mingw.c
index 272d5e11e..181e74c23 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -212,6 +212,7 @@ enum hide_dotfiles_type {
 };
 
 static enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY;
+static char *unset_environment_variables;
 
 int mingw_core_config(const char *var, const char *value, void *cb)
 {
@@ -223,6 +224,12 @@ int mingw_core_config(const char *var, const char *value, 
void *cb)
return 0;
}
 
+   if (!strcmp(var, "core.unsetenvvars")) {
+   free(unset_environment_variables);
+   unset_environment_variables = xstrdup(value);
+   return 0;
+   }
+
return 0;
 }
 
@@ -1180,6 +1187,27 @@ static wchar_t *make_environment_block(char **deltaenv)
return wenvblk;
 }
 
+static void do_unset_environment_variables(void)
+{
+   static int done;
+   char *p = unset_environment_variables;
+
+   if (done || !p)
+   return;
+   done = 1;
+
+   for (;;) {
+   char *comma = strchr(p, ',');
+
+   if (comma)
+   *comma = '\0';
+   unsetenv(p);
+   if (!comma)
+   break;
+   p = comma + 1;
+   }
+}
+
 struct pinfo_t {
struct pinfo_t *next;
pid_t pid;
@@ -1198,9 +1226,12 @@ static pid_t mingw_spawnve_fd(const char *cmd, const 
char **argv, char **deltaen
wchar_t wcmd[MAX_PATH], wdir[MAX_PATH], *wargs, *wenvblk = NULL;
unsigned flags = CREATE_UNICODE_ENVIRONMENT;
BOOL ret;
+   HANDLE cons;
+
+   do_unset_environment_variables();
 
/* Determine whether or not we are associated to a console */
-   HANDLE cons = CreateFile("CONOUT$", GENERIC_WRITE,
+   cons = CreateFile("CONOUT$", GENERIC_WRITE,
FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL, NULL);
if (cons == INVALID_HANDLE_VALUE) {
@@ -2437,6 +2468,8 @@ void mingw_startup(void)
/* fix Windows specific environment settings */
setup_windows_environment();
 
+   unset_environment_variables = xstrdup("PERL5LIB");
+
/* initialize critical section for waitpid pinfo_t list */
InitializeCriticalSection(_cs);
 
diff --git a/t/t0029-core-unsetenvvars.sh b/t/t0029-core-unsetenvvars.sh
new file mode 100755
index 0..24ce46a6e
--- /dev/null
+++ b/t/t0029-core-unsetenvvars.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='test the Windows-only core.unsetenvvars setting'
+
+. ./test-lib.sh
+
+if ! test_have_prereq MINGW
+then
+   skip_all='skipping Windows-specific tests'
+   test_done
+fi
+
+test_expect_success 'setup' '
+   mkdir -p "$TRASH_DIRECTORY/.git/hooks" &&
+   write_script "$TRASH_DIRECTORY/.git/hooks/pre-commit" <<-\EOF
+   echo $HOBBES >&2
+   EOF
+'
+
+test_expect_success 'core.unsetenvvars works' '
+   HOBBES=Calvin &&
+   export HOBBES &&
+   git commit --allow-empty -m with 2>err &&
+   grep Calvin err &&
+   git -c core.unsetenvvars=FINDUS,HOBBES,CALVIN \
+   commit --allow-empty -m without 2>err &&
+   ! grep Calvin err
+'
+
+test_done
-- 
gitgitgadget


[PATCH 1/4] config: rename `dummy` parameter to `cb` in git_default_config()

2018-10-30 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

This is the convention elsewhere (and prepares for the case where we may
need to pass callback data).

Signed-off-by: Johannes Schindelin 
---
 config.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index 4051e3882..3687c6783 100644
--- a/config.c
+++ b/config.c
@@ -1448,13 +1448,13 @@ static int git_default_mailmap_config(const char *var, 
const char *value)
return 0;
 }
 
-int git_default_config(const char *var, const char *value, void *dummy)
+int git_default_config(const char *var, const char *value, void *cb)
 {
if (starts_with(var, "core."))
return git_default_core_config(var, value);
 
if (starts_with(var, "user."))
-   return git_ident_config(var, value, dummy);
+   return git_ident_config(var, value, cb);
 
if (starts_with(var, "i18n."))
return git_default_i18n_config(var, value);
-- 
gitgitgadget



[PATCH 3/4] Move Windows-specific config settings into compat/mingw.c

2018-10-30 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Signed-off-by: Johannes Schindelin 
---
 cache.h|  8 
 compat/mingw.c | 18 ++
 config.c   |  8 
 environment.c  |  1 -
 4 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/cache.h b/cache.h
index f7fabdde8..0ce95c5a8 100644
--- a/cache.h
+++ b/cache.h
@@ -906,14 +906,6 @@ int use_optional_locks(void);
 extern char comment_line_char;
 extern int auto_comment_line_char;
 
-/* Windows only */
-enum hide_dotfiles_type {
-   HIDE_DOTFILES_FALSE = 0,
-   HIDE_DOTFILES_TRUE,
-   HIDE_DOTFILES_DOTGITONLY
-};
-extern enum hide_dotfiles_type hide_dotfiles;
-
 enum log_refs_config {
LOG_REFS_UNSET = -1,
LOG_REFS_NONE = 0,
diff --git a/compat/mingw.c b/compat/mingw.c
index 293f286af..272d5e11e 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -6,6 +6,7 @@
 #include "../run-command.h"
 #include "../cache.h"
 #include "win32/lazyload.h"
+#include "../config.h"
 
 #define HCAST(type, handle) ((type)(intptr_t)handle)
 
@@ -203,8 +204,25 @@ static int ask_yes_no_if_possible(const char *format, ...)
}
 }
 
+/* Windows only */
+enum hide_dotfiles_type {
+   HIDE_DOTFILES_FALSE = 0,
+   HIDE_DOTFILES_TRUE,
+   HIDE_DOTFILES_DOTGITONLY
+};
+
+static enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY;
+
 int mingw_core_config(const char *var, const char *value, void *cb)
 {
+   if (!strcmp(var, "core.hidedotfiles")) {
+   if (value && !strcasecmp(value, "dotgitonly"))
+   hide_dotfiles = HIDE_DOTFILES_DOTGITONLY;
+   else
+   hide_dotfiles = git_config_bool(var, value);
+   return 0;
+   }
+
return 0;
 }
 
diff --git a/config.c b/config.c
index 646b6cca9..5bf19a23c 100644
--- a/config.c
+++ b/config.c
@@ -1344,14 +1344,6 @@ static int git_default_core_config(const char *var, 
const char *value, void *cb)
return 0;
}
 
-   if (!strcmp(var, "core.hidedotfiles")) {
-   if (value && !strcasecmp(value, "dotgitonly"))
-   hide_dotfiles = HIDE_DOTFILES_DOTGITONLY;
-   else
-   hide_dotfiles = git_config_bool(var, value);
-   return 0;
-   }
-
if (!strcmp(var, "core.partialclonefilter")) {
return git_config_string(_partial_clone_filter_default,
 var, value);
diff --git a/environment.c b/environment.c
index 3f3c8746c..30ecd4e78 100644
--- a/environment.c
+++ b/environment.c
@@ -71,7 +71,6 @@ int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
-enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
 
 #ifndef PROTECT_HFS_DEFAULT
-- 
gitgitgadget



[PATCH 0/4] mingw: prevent external PERL5LIB from interfering

2018-10-30 Thread Johannes Schindelin via GitGitGadget
In Git for Windows, we do not support running the Perl scripts with any
random Perl interpreter. Instead, we insist on using the shipped one (except
for MinGit, where we do not even ship the Perl scripts, to save on space).

However, if Git is called from, say, a Perl script running in a different
Perl interpreter with appropriately configured PERL5LIB, it messes with
Git's ability to run its Perl scripts.

For that reason, we devised the presented method of defining a list of
environment variables (via core.unsetEnvVars) that would then be unset
before spawning any process, defaulting to PERL5LIB.

An alternative approach which was rejected at the time (because it
interfered with the then-ongoing work to compile Git for Windows using MS
Visual C++) would patch the make_environment_block() function to skip the
specified environment variables before handing down the environment block to
the spawned process. Currently it would interfere with the mingw-utf-8-env
patch series I sent earlier today
[https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].

While at it, this patch series also cleans up house and moves the
Windows-specific core.* variable handling to compat/mingw.c rather than
cluttering environment.c and config.c with things that e.g. developers on
Linux do not want to care about.

Johannes Schindelin (4):
  config: rename `dummy` parameter to `cb` in git_default_config()
  Allow for platform-specific core.* config settings
  Move Windows-specific config settings into compat/mingw.c
  mingw: unset PERL5LIB by default

 Documentation/config.txt |  6 
 cache.h  |  8 -
 compat/mingw.c   | 58 +++-
 compat/mingw.h   |  3 ++
 config.c | 18 ---
 environment.c|  1 -
 git-compat-util.h|  8 +
 t/t0029-core-unsetenvvars.sh | 30 +++
 8 files changed, 109 insertions(+), 23 deletions(-)
 create mode 100755 t/t0029-core-unsetenvvars.sh


base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-62/dscho/perl5lib-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/62
-- 
gitgitgadget


[PATCH 2/4] Allow for platform-specific core.* config settings

2018-10-30 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

In the Git for Windows project, we have ample precendent for config
settings that apply to Windows, and to Windows only.

Let's formalize this concept by introducing a platform_core_config()
function that can be #define'd in a platform-specific manner.

This will allow us to contain platform-specific code better, as the
corresponding variables no longer need to be exported so that they can
be defined in environment.c and be set in config.c

Signed-off-by: Johannes Schindelin 
---
 compat/mingw.c| 5 +
 compat/mingw.h| 3 +++
 config.c  | 6 +++---
 git-compat-util.h | 8 
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 81ef24286..293f286af 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -203,6 +203,11 @@ static int ask_yes_no_if_possible(const char *format, ...)
}
 }
 
+int mingw_core_config(const char *var, const char *value, void *cb)
+{
+   return 0;
+}
+
 /* Normalizes NT paths as returned by some low-level APIs. */
 static wchar_t *normalize_ntpath(wchar_t *wbuf)
 {
diff --git a/compat/mingw.h b/compat/mingw.h
index f31dcff2b..e9d2b9cdd 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -11,6 +11,9 @@ typedef _sigset_t sigset_t;
 #undef _POSIX_THREAD_SAFE_FUNCTIONS
 #endif
 
+extern int mingw_core_config(const char *var, const char *value, void *cb);
+#define platform_core_config mingw_core_config
+
 /*
  * things that are not available in header files
  */
diff --git a/config.c b/config.c
index 3687c6783..646b6cca9 100644
--- a/config.c
+++ b/config.c
@@ -1093,7 +1093,7 @@ int git_config_color(char *dest, const char *var, const 
char *value)
return 0;
 }
 
-static int git_default_core_config(const char *var, const char *value)
+static int git_default_core_config(const char *var, const char *value, void 
*cb)
 {
/* This needs a better name */
if (!strcmp(var, "core.filemode")) {
@@ -1363,7 +1363,7 @@ static int git_default_core_config(const char *var, const 
char *value)
}
 
/* Add other config variables here and to Documentation/config.txt. */
-   return 0;
+   return platform_core_config(var, value, cb);
 }
 
 static int git_default_i18n_config(const char *var, const char *value)
@@ -1451,7 +1451,7 @@ static int git_default_mailmap_config(const char *var, 
const char *value)
 int git_default_config(const char *var, const char *value, void *cb)
 {
if (starts_with(var, "core."))
-   return git_default_core_config(var, value);
+   return git_default_core_config(var, value, cb);
 
if (starts_with(var, "user."))
return git_ident_config(var, value, cb);
diff --git a/git-compat-util.h b/git-compat-util.h
index 96a3f86d8..3a08d9916 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -342,6 +342,14 @@ typedef uintmax_t timestamp_t;
 #define _PATH_DEFPATH "/usr/local/bin:/usr/bin:/bin"
 #endif
 
+#ifndef platform_core_config
+static inline int noop_core_config(const char *var, const char *value, void 
*cb)
+{
+   return 0;
+}
+#define platform_core_config noop_core_config
+#endif
+
 #ifndef has_dos_drive_prefix
 static inline int git_has_dos_drive_prefix(const char *path)
 {
-- 
gitgitgadget



[PATCH 0/1] DiffHighlight.pm: Use correct /dev/null for UNIX and Windows

2018-10-30 Thread Chris. Webster via GitGitGadget
Use File::Spec->devnull() for output redirection to avoid messages when
Windows version of Perl is first in path. The message 'The system cannot
find the path specified.' is displayed each time git is run to get colors.

chris (1):
  Use correct /dev/null for UNIX and Windows

 contrib/diff-highlight/DiffHighlight.pm | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)


base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-59%2Fwebstech%2Fmaster-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-59/webstech/master-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/59
-- 
gitgitgadget


[PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-30 Thread chris via GitGitGadget
From: chris 

Use File::Spec->devnull() for output redirection to avoid messages
when Windows version of Perl is first in path.  The message 'The
system cannot find the path specified.' is displayed each time git is
run to get colors.

Signed-off-by: Chris. Webster 
---
 contrib/diff-highlight/DiffHighlight.pm | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/diff-highlight/DiffHighlight.pm 
b/contrib/diff-highlight/DiffHighlight.pm
index 536754583..7440aa1c4 100644
--- a/contrib/diff-highlight/DiffHighlight.pm
+++ b/contrib/diff-highlight/DiffHighlight.pm
@@ -4,6 +4,11 @@ use 5.008;
 use warnings FATAL => 'all';
 use strict;
 
+# Use the correct value for both UNIX and Windows (/dev/null vs nul)
+use File::Spec;
+
+my $NULL = File::Spec->devnull();
+
 # Highlight by reversing foreground and background. You could do
 # other things like bold or underline if you prefer.
 my @OLD_HIGHLIGHT = (
@@ -134,7 +139,7 @@ sub highlight_stdin {
 # fallback, which means we will work even if git can't be run.
 sub color_config {
my ($key, $default) = @_;
-   my $s = `git config --get-color $key 2>/dev/null`;
+   my $s = `git config --get-color $key 2>$NULL`;
return length($s) ? $s : $default;
 }
 
-- 
gitgitgadget


Re: [PATCH v2 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-10-30 Thread Phillip Wood

On 27/10/2018 22:29, Alban Gruin wrote:

This refactors sequencer_add_exec_commands() to work on a todo_list to
avoid redundant reads and writes to the disk.

An obvious way to do this would be to insert the `exec' command between
the other commands, and reparse it once this is done.  This is not what
is done here.  Instead, the command is appended to the buffer once, and
a new list of items is created.  Items from the old list are copied to
it, and new `exec' items are appended when necessary.  This eliminates
the need to reparse the todo list, but this also means its buffer cannot
be directly written to the disk, hence todo_list_write_to_disk().


I'd reword this slightly, maybe

Instead of just inserting the `exec' command between the other commands, 
and re-parsing the buffer at the end the exec command is appended to the 
buffer once, and a new list of items is created.  Items from the old 
list are copied across and new `exec' items are appended when necessary. 
 This eliminates the need to reparse the buffer, but this also means we 
have to use todo_list_write_to_disk() to write the file.



sequencer_add_exec_commands() still reads the todo list from the disk,
as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
todo_list structure, and reparses it at the end.


I think the saying 'reparses' is confusing as that is what we're trying 
to avoid.



complete_action() still uses sequencer_add_exec_commands() for now.
This will be changed in a future commit.

Signed-off-by: Alban Gruin 
---
  sequencer.c | 69 +
  1 file changed, 49 insertions(+), 20 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index e12860c047..12a3efeca8 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4216,6 +4216,50 @@ int sequencer_make_script(FILE *out, int argc, const 
char **argv,
return 0;
  }
  
+static void todo_list_add_exec_commands(struct todo_list *todo_list,

+   const char *commands)
+{
+   struct strbuf *buf = _list->buf;
+   const char *old_buf = buf->buf;
+   size_t commands_len = strlen(commands + strlen("exec ")) - 1;
+   int i, first = 1, nr = 0, alloc = 0;


Minor nit pick, I think it is clearer if first is initialized just 
before the loop as it is in the deleted code below.



+   struct todo_item *items = NULL,
+   base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
+
+   strbuf_addstr(buf, commands);
+   base_item.offset_in_buf = buf->len - commands_len - 1;
+   base_item.arg = buf->buf + base_item.offset_in_buf;


I think if the user gives --exec more than once on the command line then 
commands will contain more than one exec command so this needs to parse 
commands and create one todo_item for each command.



+
+   /*
+* Insert  after every pick. Here, fixup/squash chains
+* are considered part of the pick, so we insert the commands *after*
+* those chains if there are any.
+*/
+   for (i = 0; i < todo_list->nr; i++) {
+   enum todo_command command = todo_list->items[i].command;
+   if (todo_list->items[i].arg)
+   todo_list->items[i].arg = todo_list->items[i].arg - 
old_buf + buf->buf;
+
+   if (command == TODO_PICK && !first) {
+   ALLOC_GROW(items, nr + 1, alloc);
+   memcpy(items + nr++, _item, sizeof(struct 
todo_item));


I think it would be clearer to say
items[nr++] = base_item;
rather than using memcpy. This applies below and to some of the other 
patches as well. Also this needs to loop over all the base_items if the 
user gave --exec more than once on the command line.


Best Wishes

Phillip


+   }
+
+   ALLOC_GROW(items, nr + 1, alloc);
+   memcpy(items + nr++, todo_list->items + i, sizeof(struct 
todo_item));
+   first = 0;
+   }
+
+   /* insert or append final  */
+   ALLOC_GROW(items, nr + 1, alloc);
+   memcpy(items + nr++, _item, sizeof(struct todo_item));
+
+   FREE_AND_NULL(todo_list->items);
+   todo_list->items = items;
+   todo_list->nr = nr;
+   todo_list->alloc = alloc;
+}
+
  /*
   * Add commands after pick and (series of) squash/fixup commands
   * in the todo list.
@@ -4224,10 +4268,7 @@ int sequencer_add_exec_commands(const char *commands)
  {
const char *todo_file = rebase_path_todo();
struct todo_list todo_list = TODO_LIST_INIT;
-   struct todo_item *item;
-   struct strbuf *buf = _list.buf;
-   size_t offset = 0, commands_len = strlen(commands);
-   int i, first;
+   int res;
  
  	if (strbuf_read_file(_list.buf, todo_file, 0) < 0)

return error(_("could not read '%s'."), todo_file);
@@ -4237,23 +4278,11 @@ int sequencer_add_exec_commands(const char *commands)
return error(_("unusable todo list: '%s'"), todo_file);

Re: [PATCH v2 04/16] sequencer: introduce todo_list_write_to_file()

2018-10-30 Thread Phillip Wood

Hi Alban

I like the direction this is going, it is an improvement on re-scanning 
the list at the end of each function.


On 27/10/2018 22:29, Alban Gruin wrote:

This introduce a new function to recreate the text of a todo list from
its commands, and then to write it to the disk.  This will be useful in
the future, the buffer of a todo list won’t be treated as a strict
mirror of the todo file by some of its functions once they will be
refactored.


I'd suggest rewording this slightly, maybe something like

This introduces a new function to recreate the text of a todo list from
its commands and write it to a file. This will be useful as the next few 
commits will change the use of the buffer in struct todo_list so it will 
no-longer be a mirror of the file on disk.



This functionnality can already be found in todo_list_transform(), but


s/functionnality/functionality/


it is specifically made to replace the buffer of a todo list, which is
not the desired behaviour.  Thus, the part of todo_list_transform() that
actually creates the buffer is moved to a new function,
todo_list_to_strbuf().  The rest is unused, and so is dropped.

todo_list_write_to_file() can also take care to append the help text to


s/care to append/care of appending/


the buffer before writing it to the disk, or to write only the first n
items of the list.


Why/when do we only want to write a subset of the items?


Signed-off-by: Alban Gruin 
---
  sequencer.c | 59 -
  sequencer.h |  4 +++-
  2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 07296f1f57..0dd45677b1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4256,24 +4256,27 @@ int sequencer_add_exec_commands(const char *commands)
return i;
  }
  
-void todo_list_transform(struct todo_list *todo_list, unsigned flags)

+static void todo_list_to_strbuf(struct todo_list *todo_list, struct strbuf 
*buf,
+   int num, unsigned flags)
  {
-   struct strbuf buf = STRBUF_INIT;
struct todo_item *item;
-   int i;
+   int i, max = todo_list->nr;
  
-	for (item = todo_list->items, i = 0; i < todo_list->nr; i++, item++) {

+   if (num > 0 && num < max)
+   max = num;
+
+   for (item = todo_list->items, i = 0; i < max; i++, item++) {
/* if the item is not a command write it and continue */
if (item->command >= TODO_COMMENT) {
-   strbuf_addf(, "%.*s\n", item->arg_len, item->arg);
+   strbuf_addf(buf, "%.*s\n", item->arg_len, item->arg);
continue;
}
  
  		/* add command to the buffer */

if (flags & TODO_LIST_ABBREVIATE_CMDS)
-   strbuf_addch(, command_to_char(item->command));
+   strbuf_addch(buf, command_to_char(item->command));
else
-   strbuf_addstr(, command_to_string(item->command));
+   strbuf_addstr(buf, command_to_string(item->command));
  
  		/* add commit id */

if (item->commit) {
@@ -4283,27 +4286,46 @@ void todo_list_transform(struct todo_list *todo_list, 
unsigned flags)
  
  			if (item->command == TODO_MERGE) {

if (item->flags & TODO_EDIT_MERGE_MSG)
-   strbuf_addstr(, " -c");
+   strbuf_addstr(buf, " -c");
else
-   strbuf_addstr(, " -C");
+   strbuf_addstr(buf, " -C");
}
  
-			strbuf_addf(, " %s", oid);

+   strbuf_addf(buf, " %s", oid);
}
  
  		/* add all the rest */

if (!item->arg_len)
-   strbuf_addch(, '\n');
+   strbuf_addch(buf, '\n');
else
-   strbuf_addf(, " %.*s\n", item->arg_len, item->arg);
+   strbuf_addf(buf, " %.*s\n", item->arg_len, item->arg);
}
+}
  
-	strbuf_reset(_list->buf);

-   strbuf_add(_list->buf, buf.buf, buf.len);
+int todo_list_write_to_file(struct todo_list *todo_list, const char *file,
+   const char *shortrevisions, const char *shortonto,
+   int command_count, int append_help, int num, 
unsigned flags)


This is a really long argument list which makes it easy for callers to 
get the parameters in the wrong order. I think append_help could 
probably be folded into the flags, I'm not sure what the command_count 
is used for but I've only read the first few patches. Maybe it would be 
better to pass a struct so we have named fields.


Best Wishes

Phillip


+{
+   int edit_todo = !(shortrevisions && shortonto), res;
+   struct strbuf buf = STRBUF_INIT;
+
+   todo_list_to_strbuf(todo_list, , 

Re: commit-graph is cool (overcoming add_missing_tags() perf issues)

2018-10-30 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 17, 2018 at 8:41 PM Elijah Newren  wrote:
> (And in the mean time I gave the user a one-liner to nuke his
> local-only tags that I suspect he doesn't need.)

Just a note that you can usually set 'fetch.pruneTags=true' these days
to make that happen.


Re: [PATCH v2] completion: use builtin completion for format-patch

2018-10-30 Thread Duy Nguyen
On Tue, Oct 30, 2018 at 7:39 AM Denton Liu  wrote:
>
> This patch offloads completion functionality for format-patch to
> __gitcomp_builtin. In addition to this, send-email was borrowing some
> completion options from format-patch so those options are moved into
> send-email's completions.
>
> Signed-off-by: Denton Liu 
> ---
>
> I ran t9902-completion.sh on this patch and it seems to pass. Thus, this
> should address the concerns about losing some completion options in
> send-email.
>
> ---
>  contrib/completion/git-completion.bash | 34 +++---
>  1 file changed, 14 insertions(+), 20 deletions(-)
>
> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index d63d2dffd..cb4ef6723 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1531,15 +1531,6 @@ _git_fetch ()
> __git_complete_remote_or_refspec
>  }
>
> -__git_format_patch_options="
> -   --stdout --attach --no-attach --thread --thread= --no-thread
> -   --numbered --start-number --numbered-files --keep-subject --signoff
> -   --signature --no-signature --in-reply-to= --cc= --full-index --binary
> -   --not --all --cover-letter --no-prefix --src-prefix= --dst-prefix=
> -   --inline --suffix= --ignore-if-in-upstream --subject-prefix=
> -   --output-directory --reroll-count --to= --quiet --notes
> -"
> -
>  _git_format_patch ()
>  {
> case "$cur" in
> @@ -1550,7 +1541,7 @@ _git_format_patch ()
> return
> ;;
> --*)
> -   __gitcomp "$__git_format_patch_options"
> +   __gitcomp_builtin format-patch

We do want to keep some extra options back because __gitcomp_builtin
cannot show all supported options of format-patch. The reason is a
subset of options is handled by setup_revisions() which does not have
--git-completion-helper support.

> @@ -2080,16 +2071,19 @@ _git_send_email ()
> return
> ;;
> --*)
> -   __gitcomp "--annotate --bcc --cc --cc-cmd --chain-reply-to
> -   --compose --confirm= --dry-run --envelope-sender
> -   --from --identity
> -   --in-reply-to --no-chain-reply-to 
> --no-signed-off-by-cc
> -   --no-suppress-from --no-thread --quiet --reply-to
> -   --signed-off-by-cc --smtp-pass --smtp-server
> -   --smtp-server-port --smtp-encryption= --smtp-user
> -   --subject --suppress-cc= --suppress-from --thread --to
> -   --validate --no-validate
> -   $__git_format_patch_options"
> +   __gitcomp "--all --annotate --attach --bcc --binary --cc 
> --cc= --cc-cmd
> +   --chain-reply-to --compose --confirm= --cover-letter 
> --dry-run
> +   --dst-prefix= --envelope-sender --from --full-index 
> --identity
> +   --ignore-if-in-upstream --inline --in-reply-to 
> --in-reply-to=
> +   --keep-subject --no-attach --no-chain-reply-to 
> --no-prefix
> +   --no-signature --no-signed-off-by-cc 
> --no-suppress-from --not --notes
> +   --no-thread --no-validate --numbered --numbered-files
> +   --output-directory --quiet --reply-to --reroll-count 
> --signature
> +   --signed-off-by-cc --signoff --smtp-encryption= 
> --smtp-pass
> +   --smtp-server --smtp-server-port --smtp-user 
> --src-prefix=
> +   --start-number --stdout --subject --subject-prefix= 
> --suffix=
> +   --suppress-cc= --suppress-from --thread --thread= 
> --to --to=
> +   --validate"

I have no comment about this. In an ideal world, sendemail.perl could
be taught to support --git-completion-helper but I don't think my
little remaining Perl knowledge (or time) is enough to do it. Perhaps
this will do. I don't know.
-- 
Duy


Re: Lost changes after merge

2018-10-30 Thread Rafael Ascensão
The commits you mentioned are not present on the new pastes.

On Tue, Oct 30, 2018 at 03:46:28AM +0100, Gray King wrote:
> Sorry, seems the link has been expired, here is the new one:
> * Before merge run `git log --format="%h %p %d" -n 20 --all --graph`:
>

One thing I noticed, is that you're using %p with --graph. And --graph
enables parent rewriting. Which may surprise you if you don't know what
it does.

But apart from that, and assuming you only did `git merge f087081868`
everything looks normal between those two pastes.

Cheers,
Rafael Ascensão


Re: commit-graph is cool (overcoming add_missing_tags() perf issues)

2018-10-30 Thread Derrick Stolee

On 10/17/2018 2:00 PM, Elijah Newren wrote:

Hi,

Just wanted to give a shout-out for the commit-graph work and how
impressive it is.  I had an internal report from a user that git
pushes containing only one new tiny commit were taking over a minute
(in a moderate size repo with good network connectivity). After
digging for a while, I noticed three unusual things about the repo[1]:
   * he had push.followTags set to true
   * upstream repo had about 20k tags (despite only 55k commits)
   * his repo had an additional 2.5k tags, but none of these were in
 the history of the branches he was pushing and thus would not be
 included in any pushes.

Digging in, almost all the time was CPU-bound and spent in
add_missing_tags()[2].  If I'm reading the code correctly, it appears
that function loops over each tag, calling in_merge_bases_many() once
per tag.  Thus, for his case, we were potentially walking all of
history of the main branch 2.5k times.  That seemed rather suboptimal.


Elijah,

Do you still have this repo around? Could you by chance test the 
performance with the new algorithm for add_missing_tags() in [1]? 
Specifically, please test it without a commit-graph file, since your 
data shape already makes use of generation numbers pretty well.


Thanks,
-Stolee

[1] https://public-inbox.org/git/pull.60.git.gitgitgad...@gmail.com/T/#t


[PATCH 2/3] test-reach: test get_reachable_subset

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The get_reachable_subset() method returns the list of commits in
the 'to' array that are reachable from at least one commit in the
'from' array. Add tests that check this method works in a few
cases:

1. All commits in the 'to' list are reachable. This exercises the
   early-termination condition.

2. Some commits in the 'to' list are reachable. This exercises the
   loop-termination condition.

3. No commits in the 'to' list are reachable. This exercises the
   NULL return condition.

Signed-off-by: Derrick Stolee 
---
 t/helper/test-reach.c | 34 
 t/t6600-test-reach.sh | 52 +++
 2 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 08d2ea68e..a0272178b 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -32,8 +32,8 @@ int cmd__reach(int ac, const char **av)
struct commit *A, *B;
struct commit_list *X, *Y;
struct object_array X_obj = OBJECT_ARRAY_INIT;
-   struct commit **X_array;
-   int X_nr, X_alloc;
+   struct commit **X_array, **Y_array;
+   int X_nr, X_alloc, Y_nr, Y_alloc;
struct strbuf buf = STRBUF_INIT;
struct repository *r = the_repository;
 
@@ -44,9 +44,10 @@ int cmd__reach(int ac, const char **av)
 
A = B = NULL;
X = Y = NULL;
-   X_nr = 0;
-   X_alloc = 16;
+   X_nr = Y_nr = 0;
+   X_alloc = Y_alloc = 16;
ALLOC_ARRAY(X_array, X_alloc);
+   ALLOC_ARRAY(Y_array, Y_alloc);
 
while (strbuf_getline(, stdin) != EOF) {
struct object_id oid;
@@ -92,6 +93,8 @@ int cmd__reach(int ac, const char **av)
 
case 'Y':
commit_list_insert(c, );
+   ALLOC_GROW(Y_array, Y_nr + 1, Y_alloc);
+   Y_array[Y_nr++] = c;
break;
 
default:
@@ -136,6 +139,29 @@ int cmd__reach(int ac, const char **av)
filter.with_commit_tag_algo = 0;
 
printf("%s(_,A,X,_):%d\n", av[1], commit_contains(, A, 
X, ));
+   } else if (!strcmp(av[1], "get_reachable_subset")) {
+   const int reachable_flag = 1;
+   int i, count = 0;
+   struct commit_list *current;
+   struct commit_list *list = get_reachable_subset(X_array, X_nr,
+   Y_array, Y_nr,
+   reachable_flag);
+   printf("get_reachable_subset(X,Y)\n");
+   for (current = list; current; current = current->next) {
+   if (!(list->item->object.flags & reachable_flag))
+   die(_("commit %s is not marked reachable"),
+   oid_to_hex(>item->object.oid));
+   count++;
+   }
+   for (i = 0; i < Y_nr; i++) {
+   if (Y_array[i]->object.flags & reachable_flag)
+   count--;
+   }
+
+   if (count < 0)
+   die(_("too many commits marked reachable"));
+
+   print_sorted_commit_ids(list);
}
 
exit(0);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index ae94b27f7..a0c64e617 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -265,4 +265,56 @@ test_expect_success 'commit_contains:miss' '
test_three_modes commit_contains --tag
 '
 
+test_expect_success 'get_reachable_subset:all' '
+   cat >input <<-\EOF &&
+   X:commit-9-1
+   X:commit-8-3
+   X:commit-7-5
+   X:commit-6-6
+   X:commit-1-7
+   Y:commit-3-3
+   Y:commit-1-7
+   Y:commit-5-6
+   EOF
+   (
+   echo "get_reachable_subset(X,Y)" &&
+   git rev-parse commit-3-3 \
+ commit-1-7 \
+ commit-5-6 | sort
+   ) >expect &&
+   test_three_modes get_reachable_subset
+'
+
+test_expect_success 'get_reachable_subset:some' '
+   cat >input <<-\EOF &&
+   X:commit-9-1
+   X:commit-8-3
+   X:commit-7-5
+   X:commit-1-7
+   Y:commit-3-3
+   Y:commit-1-7
+   Y:commit-5-6
+   EOF
+   (
+   echo "get_reachable_subset(X,Y)" &&
+   git rev-parse commit-3-3 \
+ commit-1-7 | sort
+   ) >expect &&
+   test_three_modes get_reachable_subset
+'
+
+test_expect_success 'get_reachable_subset:none' '
+   cat >input <<-\EOF &&
+   X:commit-9-1
+   X:commit-8-3
+   X:commit-7-5
+   X:commit-1-7
+   Y:commit-9-3
+   Y:commit-7-6
+   Y:commit-2-8
+   EOF
+   echo "get_reachable_subset(X,Y)" >expect &&
+   test_three_modes get_reachable_subset
+'
+
 

[PATCH 3/3] remote: make add_missing_tags() linear

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The add_missing_tags() method currently has quadratic behavior.
This is due to a linear number (based on number of tags T) of
calls to in_merge_bases_many, which has linear performance (based
on number of commits C in the repository).

Replace this O(T * C) algorithm with an O(T + C) algorithm by
using get_reachable_subset(). We ignore the return list and focus
instead on the reachable_flag assigned to the commits we care
about, because we need to interact with the tag ref and not just
the commit object.

Signed-off-by: Derrick Stolee 
---
 remote.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/remote.c b/remote.c
index 81f4f01b0..b850f2feb 100644
--- a/remote.c
+++ b/remote.c
@@ -1205,9 +1205,36 @@ static void add_missing_tags(struct ref *src, struct ref 
**dst, struct ref ***ds
 * sent to the other side.
 */
if (sent_tips.nr) {
+   const int reachable_flag = 1;
+   struct commit_list *found_commits;
+   struct commit **src_commits;
+   int nr_src_commits = 0, alloc_src_commits = 16;
+   ALLOC_ARRAY(src_commits, alloc_src_commits);
+
for_each_string_list_item(item, _tag) {
struct ref *ref = item->util;
+   struct commit *commit;
+
+   if (is_null_oid(>new_oid))
+   continue;
+   commit = lookup_commit_reference_gently(the_repository,
+   >new_oid,
+   1);
+   if (!commit)
+   /* not pushing a commit, which is not an error 
*/
+   continue;
+
+   ALLOC_GROW(src_commits, nr_src_commits + 1, 
alloc_src_commits);
+   src_commits[nr_src_commits++] = commit;
+   }
+
+   found_commits = get_reachable_subset(sent_tips.tip, 
sent_tips.nr,
+src_commits, 
nr_src_commits,
+reachable_flag);
+
+   for_each_string_list_item(item, _tag) {
struct ref *dst_ref;
+   struct ref *ref = item->util;
struct commit *commit;
 
if (is_null_oid(>new_oid))
@@ -1223,7 +1250,7 @@ static void add_missing_tags(struct ref *src, struct ref 
**dst, struct ref ***ds
 * Is this tag, which they do not have, reachable from
 * any of the commits we are sending?
 */
-   if (!in_merge_bases_many(commit, sent_tips.nr, 
sent_tips.tip))
+   if (!(commit->object.flags & reachable_flag))
continue;
 
/* Add it in */
@@ -1231,7 +1258,12 @@ static void add_missing_tags(struct ref *src, struct ref 
**dst, struct ref ***ds
oidcpy(_ref->new_oid, >new_oid);
dst_ref->peer_ref = copy_ref(ref);
}
+
+   clear_commit_marks_many(nr_src_commits, src_commits, 
reachable_flag);
+   free(src_commits);
+   free_commit_list(found_commits);
}
+
string_list_clear(_tag, 0);
free(sent_tips.tip);
 }
-- 
gitgitgadget


[PATCH 0/3] Make add_missing_tags() linear

2018-10-30 Thread Derrick Stolee via GitGitGadget
As reported earlier [1], the add_missing_tags() method in remote.c has
quadratic performance. Some of that performance is curbed due to the
generation-number cutoff in in_merge_bases_many(). However, that fix doesn't
help users without a commit-graph, and it can still be painful if that
cutoff is sufficiently low compared to the tags we are using for
reachability testing.

Add a new method in commit-reach.c called get_reachable_subset() which does
a many-to-many reachability test. Starting at the 'from' commits, walk until
the generation is below the smallest generation in the 'to' commits, or all
'to' commits have been discovered. This performs only one commit walk for
the entire add_missing_tags() method, giving linear performance in the worst
case.

Tests are added in t6600-test-reach.sh to ensure get_reachable_subset()
works independently of its application in add_missing_tags().

Thanks, -Stolee

[1] 
https://public-inbox.org/git/cabpp-becpsoxudovjbdg_3w9wus102rw+e+qpmd4g3qyd-q...@mail.gmail.com/

Derrick Stolee (3):
  commit-reach: implement get_reachable_subset
  test-reach: test get_reachable_subset
  remote: make add_missing_tags() linear

 commit-reach.c| 70 +++
 commit-reach.h| 10 +++
 remote.c  | 34 -
 t/helper/test-reach.c | 34 ++---
 t/t6600-test-reach.sh | 52 
 5 files changed, 195 insertions(+), 5 deletions(-)


base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-60%2Fderrickstolee%2Fadd-missing-tags-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-60/derrickstolee/add-missing-tags-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/60
-- 
gitgitgadget


[PATCH 1/3] commit-reach: implement get_reachable_subset

2018-10-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The existing reachability algorithms in commit-reach.c focus on
finding merge-bases or determining if all commits in a set X can
reach at least one commit in a set Y. However, for two commits sets
X and Y, we may also care about which commits in Y are reachable
from at least one commit in X.

Implement get_reachable_subset() which answers this question. Given
two arrays of commits, 'from' and 'to', return a commit_list with
every commit from the 'to' array that is reachable from at least
one commit in the 'from' array.

The algorithm is a simple walk starting at the 'from' commits, using
the PARENT2 flag to indicate "this commit has already been added to
the walk queue". By marking the 'to' commits with the PARENT1 flag,
we can determine when we see a commit from the 'to' array. We remove
the PARENT1 flag as we add that commit to the result list to avoid
duplicates.

The order of the resulting list is a reverse of the order that the
commits are discovered in the walk.

There are a couple shortcuts to avoid walking more than we need:

1. We determine the minimum generation number of commits in the
   'to' array. We do not walk commits with generation number
   below this minimum.

2. We count how many distinct commits are in the 'to' array, and
   decrement this count when we discover a 'to' commit during the
   walk. If this number reaches zero, then we can terminate the
   walk.

Tests will be added using the 'test-tool reach' helper in a
subsequent commit.

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 70 ++
 commit-reach.h | 10 
 2 files changed, 80 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index 9f79ce0a2..a98532ecc 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -688,3 +688,73 @@ int can_all_from_reach(struct commit_list *from, struct 
commit_list *to,
object_array_clear(_objs);
return result;
 }
+
+struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
+struct commit **to, int nr_to,
+int reachable_flag)
+{
+   struct commit **item;
+   struct commit *current;
+   struct commit_list *found_commits = NULL;
+   struct commit **to_last = to + nr_to;
+   struct commit **from_last = from + nr_from;
+   uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+   int num_to_find = 0;
+
+   struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+
+   for (item = to; item < to_last; item++) {
+   struct commit *c = *item;
+   
+   parse_commit(c);
+   if (c->generation < min_generation)
+   min_generation = c->generation;
+
+   if (!(c->object.flags & PARENT1)) {
+   c->object.flags |= PARENT1;
+   num_to_find++;
+   }
+   }
+
+   for (item = from; item < from_last; item++) {
+   struct commit *c = *item;
+   if (!(c->object.flags & PARENT2)) {
+   c->object.flags |= PARENT2;
+   parse_commit(c);
+
+   prio_queue_put(, *item);
+   }
+   }
+
+   while (num_to_find && (current = prio_queue_get()) != NULL) {
+   struct commit_list *parents;
+
+   if (current->object.flags & PARENT1) {
+   current->object.flags &= ~PARENT1;
+   current->object.flags |= reachable_flag;
+   commit_list_insert(current, _commits);
+   num_to_find--;
+   }
+
+   for (parents = current->parents; parents; parents = 
parents->next) {
+   struct commit *p = parents->item;
+
+   parse_commit(p);
+
+   if (p->generation < min_generation)
+   continue;
+
+   if (p->object.flags & PARENT2)
+   continue;
+
+   p->object.flags |= PARENT2;
+   prio_queue_put(, p);
+   }
+   }
+
+   clear_commit_marks_many(nr_to, to, PARENT1);
+   clear_commit_marks_many(nr_from, from, PARENT2);
+
+   return found_commits;
+}
+
diff --git a/commit-reach.h b/commit-reach.h
index 7d313e297..43bd50a70 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -74,4 +74,14 @@ int can_all_from_reach_with_flag(struct object_array *from,
 int can_all_from_reach(struct commit_list *from, struct commit_list *to,
   int commit_date_cutoff);
 
+
+/*
+ * Return a list of commits containing the commits in the 'to' array
+ * that are reachable from at least one commit in the 'from' array.
+ * Also add the given 'flag' to each of the commits in the returned list.
+ */
+struct commit_list *get_reachable_subset(struct commit 

Git Test Coverage Report (Tuesday, Oct 30)

2018-10-30 Thread Derrick Stolee
This coverage report uses a new build definition that runs the four 
suites in parallel, then combines the results into the report below. You 
can see the build that generated this report at [1].


Thanks,
-Stolee

[1] https://git.visualstudio.com/git/_build/results?buildId=235=logs

---

pu: 7f50cad4af91f508e14338e87c5911f990d0a938
jch: 13c38c20c0c9afee77ede86d423027eb79c51ced
next: 53cd27f688b0bae2a5f2d8f5cea6ba63c6e4a781
master: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
master@{1}: c670b1f876521c9f7cd40184bf7ed05aad843433

Uncovered code in 'pu' not in 'jch'
--

builtin/blame.c
7f50cad4af builtin/blame.c    200) 
repo_unuse_commit_buffer(the_repository, commit, message);
74e8221b52 builtin/blame.c    924) blame_date_width = sizeof("Thu Oct 19 
16:00");

74e8221b52 builtin/blame.c    925) break;

builtin/describe.c
7f50cad4af builtin/describe.c 257) repo_parse_commit(the_repository, p);

builtin/fsck.c
372738935a builtin/fsck.c 622) fprintf_ln(stderr, _("Checking %s link"), 
head_ref_name);

372738935a builtin/fsck.c 627) return error(_("invalid %s"), head_ref_name);

builtin/grep.c
c22b820141 builtin/grep.c  429) grep_read_unlock();

builtin/pack-objects.c
7f50cad4af builtin/pack-objects.c 2832) if 
(!repo_has_object_file(the_repository, >oid) && 
is_promisor_object(>oid))


builtin/reflog.c
c9ef0d95eb builtin/reflog.c 585) all_worktrees = 0;
c9ef0d95eb builtin/reflog.c 621) continue;

date.c
74e8221b52  113) die("Timestamp too large for this system: %"PRItime, time);
74e8221b52  216) if (tm->tm_mon == human_tm->tm_mon) {
74e8221b52  217) if (tm->tm_mday > human_tm->tm_mday) {
74e8221b52  219) } else if (tm->tm_mday == human_tm->tm_mday) {
74e8221b52  220) hide.date = hide.wday = 1;
74e8221b52  221) } else if (tm->tm_mday + 5 > human_tm->tm_mday) {
74e8221b52  223) hide.date = 1;
74e8221b52  231) gettimeofday(, NULL);
74e8221b52  232) show_date_relative(time, tz, , buf);
74e8221b52  233) return;
74e8221b52  246) hide.seconds = 1;
74e8221b52  247) hide.tz |= !hide.date;
74e8221b52  248) hide.wday = hide.time = !hide.year;
74e8221b52  262) strbuf_rtrim(buf);
74e8221b52  287) gettimeofday(, NULL);
74e8221b52  290) human_tz = local_time_tzoffset(now.tv_sec, _tm);
74e8221b52  886) static int auto_date_style(void)
74e8221b52  888) return (isatty(1) || pager_in_use()) ? DATE_HUMAN : 
DATE_NORMAL;

74e8221b52  909) return DATE_HUMAN;
74e8221b52  911) return auto_date_style();

fsck.c
7f50cad4af  858) repo_unuse_commit_buffer(the_repository, commit, buffer);
7f50cad4af  878) repo_read_object_file(the_repository,
7f50cad4af  879)   >object.oid, , );

http-push.c
7f50cad4af 1635) if (!repo_has_object_file(the_repository, _oid))
7f50cad4af 1642) if (!repo_has_object_file(the_repository, 
_ref->old_oid))


merge-recursive.c
4cdc48e412 1585) return -1;
4cdc48e412 1588) return -1;
4cdc48e412 1594) return -1;
4cdc48e412 1596) if (update_file(o, 1, b_oid, b_mode, collide_path))
4cdc48e412 1597) return -1;
4cdc48e412 1664) return -1;
4cdc48e412 1667) return -1;
4cdc48e412 1670) return -1;
b58ae691c0 1703) return -1;
387361a6a7 1738) return -1;
387361a6a7 1786) return -1;
387361a6a7 1795) new_path = unique_path(o, a->path, ci->branch1);
387361a6a7 1796) output(o, 1, _("Refusing to lose untracked file"
387361a6a7 1802) return -1;
387361a6a7 1805) return -1;
387361a6a7 1815) return -1;
387361a6a7 1831) return -1;
387361a6a7 1834) return -1;

midx.c
1dcd9f2043  184) return;

negotiator/default.c
7f50cad4af  71) if (repo_parse_commit(the_repository, commit))

packfile.c
dc7d664335  350) close_midx(o->multi_pack_index);
dc7d664335  351) o->multi_pack_index = NULL;

refs.c
3a3b9d8cde  657) return 0;

refs/files-backend.c
remote.c
879b6a9e6f 1140) return error(_("dst ref %s receives from more than one 
src."),


revision.c
7f50cad4af  726) if (repo_parse_commit(the_repository, p) < 0)

sequencer.c
7f50cad4af 1624) repo_unuse_commit_buffer(the_repository, head_commit,
7f50cad4af 3865) repo_unuse_commit_buffer(the_repository,

sha1-array.c
7fdf90d541 91) oidcpy([dst], [src]);

submodule-config.c
bcbc780d14 740) return CONFIG_INVALID_KEY;
45f5ef3d77 755) warning(_("Could not update .gitmodules entry %s"), key);

submodule.c
b303ef65e7  524) the_repository->submodule_prefix :
060675d4fc 1378) strbuf_release();
183be9660a 1501) struct get_next_submodule_task *task = task_cb;
183be9660a 1505) get_next_submodule_task_release(task);
183be9660a 1532) return 0;
183be9660a 1536) goto out;
183be9660a 1551) return 0;

tree.c
7f50cad4af 108) if (repo_parse_commit(the_repository, commit))

worktree.c
3a3b9d8cde 495) return -1;
3a3b9d8cde 508) return -1;
3a3b9d8cde 517) return -1;
ab3e1f78ae 537) break;

wrapper.c
5efde212fc  70) die("Out of memory, malloc failed (tried to allocate %" 
PRIuMAX " bytes)",
5efde212fc  73) error("Out of memory, malloc failed (tried to allocate 
%" PRIuMAX " bytes)",


Commits introducing uncovered code:
Ævar Arnfjörð Bjarmason  879b6a9e6: i18n: remote.c: mark error(...) 

Re: [PATCH v3 1/3] [Outreachy] t3903-stash: test without configured user name

2018-10-30 Thread Slavica Djukic



On 26-Oct-18 3:13 AM, Junio C Hamano wrote:

Slavica Djukic  writes:


From: Slavica 

Please make sure this matches your sign-off below.


This is part of enhancement request that ask for 'git stash' to work
even if 'user.name' and 'user.email' are not configured.
Due to an implementation detail, git-stash undesirably requires
'user.name' and 'user.email' to be set, but shouldn't.
The issue is discussed here:
https://public-inbox.org/git/87o9debty4@evledraar.gmail.com/T/#u.

As the four lines above summarize the issue being highlighted by the
expect-failure rather well, the last two lines are unnecessary.
Please remove them.  Alternatively, you can place them after the
three-dash lines we see below.


Signed-off-by: Slavica Djukic 
---
  t/t3903-stash.sh | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 9e06494ba0..ae2c905343 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -1156,4 +1156,18 @@ test_expect_success 'stash --  works with binary 
files' '
test_path_is_file subdir/untracked
  '
  
+test_expect_failure 'stash works when user.name and user.email are not set' '

+test_commit 1 &&

Just being curious, but do we need a fresh commit created at this
point in the test?  Many tests before this one begin with "git reset"
and then run "git stash" without ever creating commit themselves,
instead relying on the fact that there already is at least one
commit created in the "setup" phase of the test that a "stash"
created can be made relative to.  I do not think this test is all
that special in that regard to require its own commit.


No, we don't need fresh commit here. Thank you for this and all other 
suggestions.


I've changed test according to them.




+test_config user.useconfigonly true &&
+test_config stash.usebuiltin true &&
+sane_unset GIT_AUTHOR_NAME &&
+sane_unset GIT_AUTHOR_EMAIL &&
+sane_unset GIT_COMMITTER_NAME &&
+sane_unset GIT_COMMITTER_EMAIL &&
+test_unconfig user.email &&

There are trailing whitespaces on the line above.  Please remove.

Also, Don't be original in the form alone---all other tests in this
file indent with a leading HT, not four SPs.  Please match the style
of surrounding code.


+test_unconfig user.name &&
+echo changed >1.t &&
+git stash
+'
+
  test_done

Thanks.  Please do not reroll the next round at too rapid a pace.


I've taken my time for next round, I am working on 2/3 and 3/3 parts as 
well.

I wouldn't have sent this patch if I understood you well in previous reply.

Thank you,

Slavica.







Re: [PATCH v7 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-30 Thread Antonio Ospite
On Tue, 30 Oct 2018 10:57:09 +0100 (STD)
Johannes Schindelin  wrote:

> Hi Antonio,
>

Hi Johannes,

> On Thu, 25 Oct 2018, Antonio Ospite wrote:
> 
> > diff --git a/t/t7418-submodule-sparse-gitmodules.sh 
> > b/t/t7418-submodule-sparse-gitmodules.sh
> > new file mode 100755
[...]
> > +   echo "$PWD/submodule" >expect &&
> 
> Would you mind squashing this fixup in?
> 
> -- snip --
> diff --git a/t/t7418-submodule-sparse-gitmodules.sh 
> b/t/t7418-submodule-sparse-gitmodules.sh
> index 21a86b89c6cb..3f7f27188313 100755
> --- a/t/t7418-submodule-sparse-gitmodules.sh
> +++ b/t/t7418-submodule-sparse-gitmodules.sh
> @@ -55,7 +55,7 @@ test_expect_success 'initialising submodule when the 
> gitmodules config is not ch
>   test_must_fail git -C super config submodule.submodule.url &&
>   git -C super submodule init &&
>   git -C super config submodule.submodule.url >actual &&
> - echo "$PWD/submodule" >expect &&
> + echo "$(pwd)/submodule" >expect &&
>   test_cmp expect actual
>  '
>  
> -- snap --
> 
> On Windows, `$PWD` and `$(pwd)` are *not* synonymous. The former
> reflects the "Unix path" which is understood by the Bash script (and
> only by the Bash script, *not* by `git.exe`!) while the latter refers to
> the actual Windows path.
>

I see, this is also mentioned in t/README, I had overlooked that part.
Thank you for reporting.

> Without this fix, your new test case will fail on Windows all the time,
> see e.g.
> https://git-for-windows.visualstudio.com/git/_build/results?buildId=22913=logs
> 

Junio, what is the plan for 'ao/submodule-wo-gitmodules-checked-out'?
I see it's not in next yet; do you want me to resend the whole series
with this fixup in or would it be less overhead for you to apply it
directly to patch 9/10 from v7 of the series?

Thank you,
   Antonio

P.S. I was wondering if it is worth having patchset versions mentioned
somewhere in pu/, maybe in merge commits if not in branch names?
Or at least in whats-cooking.txt next to the date.

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?


[PATCH v3 3/5] am: rename read_author_script()

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d42b725273..991d13f9a2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
struct strbuf buf = STRBUF_INIT;
@@ -441,7 +441,7 @@ static void am_load(struct am_state *state)
BUG("state file 'last' does not exist");
state->last = strtol(sb.buf, NULL, 10);
 
-   if (read_author_script(state) < 0)
+   if (read_am_author_script(state) < 0)
die(_("could not parse author script"));
 
read_commit_msg(state);
-- 
2.19.1



[PATCH v3 4/5] add read_author_script() to libgit

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v1:
 - added comment above read_author_script()
 - rebased to reflect changes added in patch 2

 builtin/am.c |  86 +
 sequencer.c  | 105 +++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 991d13f9a2..c5373158c0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct 
am_state *state,
die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append  at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-   while (*buf) {
-   struct string_list_item *item;
-   char *np;
-   char *cp = strchr(buf, '=');
-   if (!cp) {
-   np = strchrnul(buf, '\n');
-   return error(_("unable to parse '%.*s'"),
-(int) (np - buf), buf);
-   }
-   np = strchrnul(cp, '\n');
-   *cp++ = '\0';
-   item = string_list_append(list, buf);
-
-   buf = np + (*np == '\n');
-   *np = '\0';
-   cp = sq_dequote(cp);
-   if (!cp)
-   return error(_("unable to dequote value of '%s'"),
-item->string);
-   item->util = xstrdup(cp);
-   }
-   return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
-   struct strbuf buf = STRBUF_INIT;
-   struct string_list kv = STRING_LIST_INIT_DUP;
-   int retval = -1; /* assume failure */
-   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-   int fd;
 
assert(!state->author_name);
assert(!state->author_email);
assert(!state->author_date);
 
-   fd = open(filename, O_RDONLY);
-   if (fd < 0) {
-   if (errno == ENOENT)
-   return 0;
-   return error_errno(_("could not open '%s' for reading"),
-  filename);
-   }
-   strbuf_read(, fd, 0);
-   close(fd);
-   if (parse_key_value_squoted(buf.buf, ))
-   goto finish;
-
-   for (i = 0; i < kv.nr; i++) {
-   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-   if (name_i >= 0)
-   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
-   else
-   name_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-   if (email_i >= 0)
-   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
-   else
-   email_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-   if (date_i >= 0)
-   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
-   else
-   date_i = i;
-   } else {
-   err = error(_("unknown variable '%s'"),
-   kv.items[i].string);
-   }
-   }
-   if (name_i == -2)
-   error(_("missing 'GIT_AUTHOR_NAME'"));
-   if (email_i == -2)
-   error(_("missing 'GIT_AUTHOR_EMAIL'"));
-   if (date_i == -2)
-   error(_("missing 'GIT_AUTHOR_DATE'"));
-   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-   goto finish;
-   state->author_name = kv.items[name_i].util;
-   state->author_email = kv.items[email_i].util;
-   state->author_date = kv.items[date_i].util;
-   retval = 0;
-finish:
-   string_list_clear(, !!retval);
-   strbuf_release();
-   return retval;
+   return read_author_script(filename, >author_name,
+ >author_email, >author_date, 1);
 }
 
 /**
diff --git a/sequencer.c b/sequencer.c
index dc2c58d464..3530dbeb6c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -660,6 +660,111 @@ static int 

[PATCH v3 1/5] am: don't die in read_author_script()

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

The caller is already prepared to handle errors returned from this
function so there is no need for it to die if it cannot read the file.

Suggested-by: Eric Sunshine 
Signed-off-by: Phillip Wood 
---
 builtin/am.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..b68578bc3f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
if (fd < 0) {
if (errno == ENOENT)
return 0;
-   die_errno(_("could not open '%s' for reading"), filename);
+   return error_errno(_("could not open '%s' for reading"),
+  filename);
}
strbuf_read(, fd, 0);
close(fd);
-- 
2.19.1



[PATCH v3 5/5] sequencer: use read_author_script()

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

Use the new function added in the last commit to read the author
script, updating read_env_script() and read_author_ident(). We now
have a single code path that reads the author script for am and all
flavors of rebase. This changes the behavior of read_env_script() as
previously it would set any environment variables that were in the
author-script file. Now it is an error if the file contains other
variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and
GIT_AUTHOR_DATE are missing. This is what am and the non interactive
version of rebase have been doing for several years so hopefully it
will not cause a problem for interactive rebase users. The advantage
is that we are reusing existing code from am which uses sq_dequote()
to properly dequote variables. This fixes potential problems with user
edited scripts as read_env_script() which did not track quotes
properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v1
 - use argv_array_pushf() as suggested by Eric
 - fixed strbuf handling as suggested by Eric
 - fix comments and commit message to reflect changed behavior of
   read_env_script()

 sequencer.c | 97 -
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 3530dbeb6c..987542f67c 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, 
char **email, char **date,
 }
 
 /*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-   /* Skip any empty lines in case the file was hand edited */
-   while (n > 0 && s[--n] == '\n')
-   ; /* empty */
-   if (n > 0 && s[n] != '\'')
-   return 1;
-
-   return 0;
-}
-
-/*
- * Read a list of environment variable assignments (such as the author-script
- * file) into an environment block. Returns -1 on error, 0 otherwise.
+ * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a
+ * file with shell quoting into struct argv_array. Returns -1 on
+ * error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
-   struct strbuf script = STRBUF_INIT;
-   int i, count = 0, sq_bug;
-   const char *p2;
-   char *p;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  , , , 0))
return -1;
-   /* write_author_script() used to quote incorrectly */
-   sq_bug = quoting_is_broken(script.buf, script.len);
-   for (p = script.buf; *p; p++)
-   if (sq_bug && skip_prefix(p, "'''", ))
-   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
-   else if (skip_prefix(p, "'\\''", ))
-   strbuf_splice(, p - script.buf, p2 - p, "'", 1);
-   else if (*p == '\'')
-   strbuf_splice(, p-- - script.buf, 1, "", 0);
-   else if (*p == '\n') {
-   *p = '\0';
-   count++;
-   }
 
-   for (i = 0, p = script.buf; i < count; i++) {
-   argv_array_push(env, p);
-   p += strlen(p) + 1;
-   }
+   argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
+   argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
+   argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
+   free(name);
+   free(email);
+   free(date);
 
return 0;
 }
@@ -833,54 +804,28 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author  timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-   const char *keys[] = {
-   "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-   };
struct strbuf out = STRBUF_INIT;
-   char *in, *eol;
-   const char *val[3];
-   int i = 0;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  , , , 0))
return NULL;
 
-   /* dequote values and construct 

[PATCH v3 2/5] am: improve author-script error reporting

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

If there are errors in a user edited author-script there was no
indication of what was wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 49 +++--
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index b68578bc3f..d42b725273 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
struct string_list_item *item;
char *np;
char *cp = strchr(buf, '=');
-   if (!cp)
-   return -1;
+   if (!cp) {
+   np = strchrnul(buf, '\n');
+   return error(_("unable to parse '%.*s'"),
+(int) (np - buf), buf);
+   }
np = strchrnul(cp, '\n');
*cp++ = '\0';
item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
*np = '\0';
cp = sq_dequote(cp);
if (!cp)
-   return -1;
+   return error(_("unable to dequote value of '%s'"),
+item->string);
item->util = xstrdup(cp);
}
return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
struct strbuf buf = STRBUF_INIT;
struct string_list kv = STRING_LIST_INIT_DUP;
int retval = -1; /* assume failure */
+   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
int fd;
 
assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
if (parse_key_value_squoted(buf.buf, ))
goto finish;
 
-   if (kv.nr != 3 ||
-   strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-   strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-   strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+   for (i = 0; i < kv.nr; i++) {
+   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+   if (name_i >= 0)
+   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
+   else
+   name_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+   if (email_i >= 0)
+   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
+   else
+   email_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+   if (date_i >= 0)
+   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
+   else
+   date_i = i;
+   } else {
+   err = error(_("unknown variable '%s'"),
+   kv.items[i].string);
+   }
+   }
+   if (name_i == -2)
+   error(_("missing 'GIT_AUTHOR_NAME'"));
+   if (email_i == -2)
+   error(_("missing 'GIT_AUTHOR_EMAIL'"));
+   if (date_i == -2)
+   error(_("missing 'GIT_AUTHOR_DATE'"));
+   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
goto finish;
-   state->author_name = kv.items[0].util;
-   state->author_email = kv.items[1].util;
-   state->author_date = kv.items[2].util;
+   state->author_name = kv.items[name_i].util;
+   state->author_email = kv.items[email_i].util;
+   state->author_date = kv.items[date_i].util;
retval = 0;
 finish:
string_list_clear(, !!retval);
-- 
2.19.1



[PATCH v3 0/5] am/rebase: share read_author_script()

2018-10-30 Thread Phillip Wood
From: Phillip Wood 

Thanks to Junio for the feedback on v2. I've updated patch 4 based on
those comments, the rest are unchanged.

v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.

Phillip Wood (5):
  am: don't die in read_author_script()
  am: improve author-script error reporting
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  60 ++--
 sequencer.c  | 192 ---
 sequencer.h  |   3 +
 3 files changed, 128 insertions(+), 127 deletions(-)

-- 
2.19.1



Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)

2018-10-30 Thread Johannes Schindelin
Hi Junio,

On Tue, 30 Oct 2018, Junio C Hamano wrote:

> Eric Sunshine  writes:
> 
> > On Mon, Oct 29, 2018 at 10:10 PM Junio C Hamano  wrote:
> >> How's this?
> >>
> >> On platforms with recent cURL library, http.sslBackend configuration
> >> variable can be used to choose different SSL backend at runtime.
> >
> > s/choose/& a/
> >
> >> The Windows port uses this mechanism to switch between OpenSSL and
> >> Secure Channel while talking over the HTTPS protocol.
> 
> Thanks.
> 
> By the way, I am beginning to like this phrasing quite a lot.  It
> encourages those with other ports like Linux and Mac to see if they
> want a similar runtime switching by singling out Windows port, which
> was what I wanted to achive with the original one, but I think the
> updated text does so more effectively.

I like it, too!

Thanks,
Dscho


Re: [RFC PATCH] remote: add --fetch option to git remote set-url

2018-10-30 Thread Junio C Hamano
Denton Liu  writes:

> On Mon, Oct 29, 2018 at 02:57:28PM +0900, Junio C Hamano wrote:
> ...
>> Earlier you had a check like this:
>> 
>> > +  if (push_mode && fetch_mode)
>> > +  die(_("--push --fetch doesn't make sense"));
>> 
>> If a request to "--fetch" is ignored when "--add" is given, for
>> example, shouldn't the combination also be diagnosed as "not making
>> sense, we'd ignore your wish to use the --fetch option"?  Similarly
>> for the case where there already is pushurl defined for the remote.

Clarification.  Here I am suggesting that a part of the logic in the
earlier assignment to move_fetch_to_push should become a similar
call to die(), detecting a competing and unsatisfiable wish, rather
than getting silently ignored.

>> This is a different tangent on the same line, but it could be that
>> the user wants to have two (or more) pushURLs because the user wants
>> to push to two remotes at the same time with "git push this-remote",
>> so silently ignoring "--force" may not be the right thing in the

Correction.  s/--force/--fetch/ was what I meant here.

>> first place.  We may instead need to make the value of URL to an
>> extra pushURL entry (if we had one, we now have two).

Also, additionally, since there is no use to have two or more URL,
because unlike "git push $there" that can push to two places,
fetching from two places into the same set of remote-tracking
branches would not make sense, --move-pushURL-to-URL-and-set-pushURL
operation that is the symmetry of what the patch under discussion
proposes should fail instead of creating an extra URL entry, breaking
an apparent symmetry.

> Perhaps I should motivate the use-case for this option. There have been
> times when I've had the URL set to something but no pushURL. I've
> wanted to only change the fetching URL and keep the pushing URL the same

URL (plus optionally pushURL) split was done because most everybody
were fetching from and pushing to the same place; it would not have
made any sense to have fetchURL and pushURL that are separate, as
that would have forced everybody to have both, when majority of the
users would have to set them to the same value.

Quite honestly, tweaking URL and/or pushURL is not something you'd
do every three months or more frequently, so I do not particularly
feel sympathetic to the cause of this patch, which would allow
setting one to the value that happens to be set to the other one
while setting an arbitrary new value to the other one.  Once the
user's need deviates from that single niche pattern, the user needs
to update both, and setting these two independently is quite simple
and straight-forward in the first place.  And that same simple pattern
to set two independently can be used without learning this new option.

> My implementation of --fetch is supposed to emulate what would happen if
> git were implemented with fetchurl/pushurl instead. Does the patch make
> more sense in this context?

Hmph.  As a system that has fetchURL and pushURL independently and
forces everybody to set both does not make much sense in the first
place, the patch making sense in that context is not a very strong
reason to support it.

> Please let me know if you think that the concept behind this patch is a
> good idea. Thanks!

If I don't then I do not have to let you know, then ;-) 

I do not particularly think it is a horrible idea, but I do not see
that it would be a feature that helps particuarly wide audience.


Re: [PATCH v7 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-30 Thread Johannes Schindelin
Hi Antonio,

On Thu, 25 Oct 2018, Antonio Ospite wrote:

> diff --git a/t/t7418-submodule-sparse-gitmodules.sh 
> b/t/t7418-submodule-sparse-gitmodules.sh
> new file mode 100755
> index 00..21a86b89c6
> --- /dev/null
> +++ b/t/t7418-submodule-sparse-gitmodules.sh
> @@ -0,0 +1,122 @@
> +#!/bin/sh
> +#
> +# Copyright (C) 2018  Antonio Ospite 
> +#
> +
> +test_description='Test reading/writing .gitmodules when not in the working 
> tree
> +
> +This test verifies that, when .gitmodules is in the current branch but is not
> +in the working tree reading from it still works but writing to it does not.
> +
> +The test setup uses a sparse checkout, however the same scenario can be set 
> up
> +also by committing .gitmodules and then just removing it from the filesystem.
> +'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'sparse checkout setup which hides .gitmodules' '
> + git init upstream &&
> + git init submodule &&
> + (cd submodule &&
> + echo file >file &&
> + git add file &&
> + test_tick &&
> + git commit -m "Add file"
> + ) &&
> + (cd upstream &&
> + git submodule add ../submodule &&
> + test_tick &&
> + git commit -m "Add submodule"
> + ) &&
> + git clone upstream super &&
> + (cd super &&
> + cat >.git/info/sparse-checkout <<-\EOF &&
> + /*
> + !/.gitmodules
> + EOF
> + git config core.sparsecheckout true &&
> + git read-tree -m -u HEAD &&
> + test_path_is_missing .gitmodules
> + )
> +'
> +
> +test_expect_success 'reading gitmodules config file when it is not checked 
> out' '
> + echo "../submodule" >expect &&
> + git -C super submodule--helper config submodule.submodule.url >actual &&
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'not writing gitmodules config file when it is not 
> checked out' '
> + test_must_fail git -C super submodule--helper config 
> submodule.submodule.url newurl &&
> + test_path_is_missing super/.gitmodules
> +'
> +
> +test_expect_success 'initialising submodule when the gitmodules config is 
> not checked out' '
> + test_must_fail git -C super config submodule.submodule.url &&
> + git -C super submodule init &&
> + git -C super config submodule.submodule.url >actual &&
> + echo "$PWD/submodule" >expect &&

Would you mind squashing this fixup in?

-- snip --
diff --git a/t/t7418-submodule-sparse-gitmodules.sh 
b/t/t7418-submodule-sparse-gitmodules.sh
index 21a86b89c6cb..3f7f27188313 100755
--- a/t/t7418-submodule-sparse-gitmodules.sh
+++ b/t/t7418-submodule-sparse-gitmodules.sh
@@ -55,7 +55,7 @@ test_expect_success 'initialising submodule when the 
gitmodules config is not ch
test_must_fail git -C super config submodule.submodule.url &&
git -C super submodule init &&
git -C super config submodule.submodule.url >actual &&
-   echo "$PWD/submodule" >expect &&
+   echo "$(pwd)/submodule" >expect &&
test_cmp expect actual
 '
 
-- snap --

On Windows, `$PWD` and `$(pwd)` are *not* synonymous. The former
reflects the "Unix path" which is understood by the Bash script (and
only by the Bash script, *not* by `git.exe`!) while the latter refers to
the actual Windows path.

Without this fix, your new test case will fail on Windows all the time,
see e.g.
https://git-for-windows.visualstudio.com/git/_build/results?buildId=22913=logs

Thanks,
Johannes


> + test_cmp expect actual
> +'
> +
> +test_expect_success 'updating submodule when the gitmodules config is not 
> checked out' '
> + test_path_is_missing super/submodule/file &&
> + git -C super submodule update &&
> + test_cmp submodule/file super/submodule/file
> +'
> +
> +ORIG_SUBMODULE=$(git -C submodule rev-parse HEAD)
> +ORIG_UPSTREAM=$(git -C upstream rev-parse HEAD)
> +ORIG_SUPER=$(git -C super rev-parse HEAD)
> +
> +test_expect_success 're-updating submodule when the gitmodules config is not 
> checked out' '
> + test_when_finished "git -C submodule reset --hard $ORIG_SUBMODULE;
> + git -C upstream reset --hard $ORIG_UPSTREAM;
> + git -C super reset --hard $ORIG_SUPER;
> + git -C upstream submodule update --remote;
> + git -C super pull;
> + git -C super submodule update --remote" &&
> + (cd submodule &&
> + echo file2 >file2 &&
> + git add file2 &&
> + test_tick &&
> + git commit -m "Add file2 to submodule"
> + ) &&
> + (cd upstream &&
> + git submodule update --remote &&
> + git add submodule &&
> + test_tick &&
> + git commit -m "Update submodule"
> + ) &&
> + git -C super pull &&
> + # The --for-status options reads the gitmodules config
> + git -C super submodule 

[PATCH 1/2] t7800: fix quoting

2018-10-30 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

When passing a command-line to call an external diff command to the
difftool, we must be prepared for paths containing special characters,
e.g. backslashes in the temporary directory's path on Windows.

This patch is needed in preparation for the next commit, which will
make the MinGW version of Git *not* rewrite TMP to use forward slashes
instead of backslashes.

Signed-off-by: Johannes Schindelin 
---
 t/t7800-difftool.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index 562bd215a..22b9199d5 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -332,7 +332,7 @@ test_expect_success 'difftool --extcmd cat arg1' '
 test_expect_success 'difftool --extcmd cat arg2' '
echo branch >expect &&
git difftool --no-prompt \
-   --extcmd sh\ -c\ \"cat\ \$2\" branch >actual &&
+   --extcmd sh\ -c\ \"cat\ \\\"\$2\\\"\" branch >actual &&
test_cmp expect actual
 '
 
-- 
gitgitgadget



[PATCH 2/2] mingw: reencode environment variables on the fly (UTF-16 <-> UTF-8)

2018-10-30 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

On Windows, the authoritative environment is encoded in UTF-16. In Git
for Windows, we convert that to UTF-8 (because UTF-16 is *such* a
foreign idea to Git that its source code is unprepared for it).

Previously, out of performance concerns, we converted the entire
environment to UTF-8 in one fell swoop at the beginning, and upon
putenv() and run_command() converted it back.

Having a private copy of the environment comes with its own perils: when
a library used by Git's source code tries to modify the environment, it
does not really work (in Git for Windows' case, libcurl, see
https://github.com/git-for-windows/git/compare/bcad1e6d58^...bcad1e6d58^2
for a glimpse of the issues).

Hence, it makes our environment handling substantially more robust if we
switch to on-the-fly-conversion in `getenv()`/`putenv()` calls. Based
on an initial version in the MSVC context by Jeff Hostetler, this patch
makes it so.

Surprisingly, this has a *positive* effect on speed: at the time when
the current code was written, we tested the performance, and there were
*so many* `getenv()` calls that it seemed better to convert everything
in one go. In the meantime, though, Git has obviously been cleaned up a
bit with regards to `getenv()` calls so that the Git processes spawned
by the test suite use an average of only 40 `getenv()`/`putenv()` calls
over the process lifetime.

Speaking of the entire test suite: the total time spent in the
re-encoding in the current code takes about 32.4 seconds (out of 113
minutes runtime), whereas the code introduced in this patch takes only
about 8.2 seconds in total. Not much, but it proves that we need not be
concerned about the performance impact introduced by this patch.

Helped-by: Jeff Hostetler 
Signed-off-by: Johannes Schindelin 
---
 compat/mingw.c | 280 +
 compat/mingw.h |  32 +-
 2 files changed, 196 insertions(+), 116 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 44264fe3f..0cd432441 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1071,44 +1071,121 @@ static char *path_lookup(const char *cmd, int exe_only)
return prog;
 }
 
-static int do_putenv(char **env, const char *name, int size, int free_old);
+static const wchar_t *wcschrnul(const wchar_t *s, wchar_t c)
+{
+   while (*s && *s != c)
+   s++;
+   return s;
+}
+
+/* Compare only keys */
+static int wenvcmp(const void *a, const void *b)
+{
+   wchar_t *p = *(wchar_t **)a, *q = *(wchar_t **)b;
+   size_t p_len, q_len;
+
+   /* Find the keys */
+   p_len = wcschrnul(p, L'=') - p;
+   q_len = wcschrnul(q, L'=') - q;
+
+   /* If the length differs, include the shorter key's NUL */
+   if (p_len < q_len)
+   p_len++;
+   else if (p_len > q_len)
+   p_len = q_len + 1;
+
+   return _wcsnicmp(p, q, p_len);
+}
 
-/* used number of elements of environ array, including terminating NULL */
-static int environ_size = 0;
-/* allocated size of environ array, in bytes */
-static int environ_alloc = 0;
+/* We need a stable sort to convert the environment between UTF-16 <-> UTF-8 */
+#ifndef INTERNAL_QSORT
+#include "qsort.c"
+#endif
 
 /*
- * Create environment block suitable for CreateProcess. Merges current
- * process environment and the supplied environment changes.
+ * Build an environment block combining the inherited environment
+ * merged with the given list of settings.
+ *
+ * Values of the form "KEY=VALUE" in deltaenv override inherited values.
+ * Values of the form "KEY" in deltaenv delete inherited values.
+ *
+ * Multiple entries in deltaenv for the same key are explicitly allowed.
+ *
+ * We return a contiguous block of UNICODE strings with a final trailing
+ * zero word.
  */
 static wchar_t *make_environment_block(char **deltaenv)
 {
-   wchar_t *wenvblk = NULL;
-   char **tmpenv;
-   int i = 0, size = environ_size, wenvsz = 0, wenvpos = 0;
+   wchar_t *wenv = GetEnvironmentStringsW(), *wdeltaenv, *result, *p;
+   size_t wlen, s, delta_size, size;
+
+   wchar_t **array = NULL;
+   size_t alloc = 0, nr = 0, i;
+
+   size = 1; /* for extra NUL at the end */
+
+   /* If there is no deltaenv to apply, simply return a copy. */
+   if (!deltaenv || !*deltaenv) {
+   for (p = wenv; p && *p; ) {
+   size_t s = wcslen(p) + 1;
+   size += s;
+   p += s;
+   }
+
+   ALLOC_ARRAY(result, size);
+   memcpy(result, wenv, size * sizeof(*wenv));
+   FreeEnvironmentStringsW(wenv);
+   return result;
+   }
+
+   /*
+* If there is a deltaenv, let's accumulate all keys into `array`,
+* sort them using the stable git_qsort() and then copy, skipping
+* duplicate keys
+*/
+   for (p = wenv; p && *p; ) {
+   ALLOC_GROW(array, nr + 1, 

  1   2   >