Re: [PATCH 1/2] color: downgrade "always" to "auto" only for on-disk configuration

2017-10-13 Thread Junio C Hamano
Jeff King  writes:

> All of the regressions people have actually _noticed_ stem from my
> 136c8c8b8f in v2.14.2. So I think it is a viable option to try to go
> back to the pre-v2.14.2 state. I.e.:
> ...
> That takes us back to the pre-regression state. The ancient bug from
> 4c7f1819 still exists, but that would be OK for v2.15. We'd probably
> want to bump the -rc cycle a bit to give more confidence that (2) caught
> everything.

Yes, I think that is the approach I was pushing initially with the
jc/ref-filter-colors-fix topic that was later retracted; the result
of your 4-patch series more or less matches that one, modulo that I
didn't treat for-each-ref as a plumbing.  I do share the worry that
it is hard to make sure that these post-revert adjustment caught
everything; after all, that was a major part of the reason why my
earlier attempt was retracted.  I still think this is the _right_
direction to go in, even though it is harder to get right.

> Post-release, we would either:
> ...
> But we could punt on that part until after the release. The only thing
> we'd need to decide on now is that first set of reversions. What I
> really _don't_ want to do is ship v2.15 with "always works like auto"
> and then flip that back in v2.16.

True.  Let's see what others think.  I know Jonathan is running
the fork at $work with "downgrade always to auto" patches, and while
I think both approaches would probably work well in practice, I have
preference for this "harder but right" approach, so I'd want to see
different views discussed on the list before we decide.

Thanks.


Re: [PATCH] revision: quit pruning diff more quickly when possible

2017-10-13 Thread Junio C Hamano
Jeff King  writes:

> Here it is cleaned up and with a commit message. There's another case
> that can be optimized, too: --remove-empty with an all-deletions commit.
> That's probably even more obscure and pathological, but it was easy to
> cover in the same breath.

This one looks good.  It appears that again you guys had all the fun
while I was offline ;-).  And I am happy to see that we didn't veer
in the direction to optimize for a wrong case by keeping track of
what trees we already saw and things like that, of course.

Thanks.

> Subject: revision: quit pruning diff more quickly when possible
>
> When the revision traversal machinery is given a pathspec,
> we must compute the parent-diff for each commit to determine
> which ones are TREESAME. We set the QUICK diff flag to avoid
> looking at more entries than we need; we really just care
> whether there are any changes at all.
>
> But there is one case where we want to know a bit more: if
> --remove-empty is set, we care about finding cases where the
> change consists only of added entries (in which case we may
> prune the parent in try_to_simplify_commit()). To cover that
> case, our file_add_remove() callback does not quit the diff
> upon seeing an added entry; it keeps looking for other types
> of entries.
>
> But this means when --remove-empty is not set (and it is not
> by default), we compute more of the diff than is necessary.
> You can see this in a pathological case where a commit adds
> a very large number of entries, and we limit based on a
> broad pathspec. E.g.:
>
>   perl -e '
> chomp(my $blob = `git hash-object -w --stdin  for my $a (1..1000) {
>   for my $b (1..1000) {
> print "100644 $blob\t$a/$b\n";
>   }
> }
>   ' | git update-index --index-info
>   git commit -qm add
>
>   git rev-list HEAD -- .
>
> This case takes about 100ms now, but after this patch only
> needs 6ms. That's not a huge improvement, but it's easy to
> get and it protects us against even more pathological cases
> (e.g., going from 1 million to 10 million files would take
> ten times as long with the current code, but not increase at
> all after this patch).
>
> This is reported to minorly speed-up pathspec limiting in
> real world repositories (like the 100-million-file Windows
> repository), but probably won't make a noticeable difference
> outside of pathological setups.
>
> This patch actually covers the case without --remove-empty,
> and the case where we see only deletions. See the in-code
> comment for details.
>
> Note that we have to add a new member to the diff_options
> struct so that our callback can see the value of
> revs->remove_empty_trees. This callback parameter could be
> passed to the "add_remove" and "change" callbacks, but
> there's not much point. They already receive the
> diff_options struct, and doing it this way avoids having to
> update the function signature of the other callbacks
> (arguably the format_callback and output_prefix functions
> could benefit from the same simplification).
>
> Signed-off-by: Jeff King 
> ---
>  diff.h |  1 +
>  revision.c | 16 +---
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/diff.h b/diff.h
> index 7dcfcfbef7..4a34d256f1 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -180,6 +180,7 @@ struct diff_options {
>   pathchange_fn_t pathchange;
>   change_fn_t change;
>   add_remove_fn_t add_remove;
> + void *change_fn_data;
>   diff_format_fn_t format_callback;
>   void *format_callback_data;
>   diff_prefix_fn_t output_prefix;
> diff --git a/revision.c b/revision.c
> index 8fd222f3bf..a3f245e2cc 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -399,8 +399,16 @@ static struct commit *one_relevant_parent(const struct 
> rev_info *revs,
>   * if the whole diff is removal of old data, and otherwise
>   * REV_TREE_DIFFERENT (of course if the trees are the same we
>   * want REV_TREE_SAME).
> - * That means that once we get to REV_TREE_DIFFERENT, we do not
> - * have to look any further.
> + *
> + * The only time we care about the distinction is when
> + * remove_empty_trees is in effect, in which case we care only about
> + * whether the whole change is REV_TREE_NEW, or if there's another type
> + * of change. Which means we can stop the diff early in either of these
> + * cases:
> + *
> + *   1. We're not using remove_empty_trees at all.
> + *
> + *   2. We saw anything except REV_TREE_NEW.
>   */
>  static int tree_difference = REV_TREE_SAME;
>  
> @@ -411,9 +419,10 @@ static void file_add_remove(struct diff_options *options,
>   const char *fullpath, unsigned dirty_submodule)
>  {
>   int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD;
> + struct rev_info *revs = options->change_fn_data;
>  
>   tree_difference |= diff;
> - if (tree_difference == REV_TREE_DIFFERENT)
> + if (!revs->remove_empty_trees || tree_difference != REV_TREE_NEW)
>   

Re: [PATCH] diff: alias -q to --quiet

2017-10-13 Thread Junio C Hamano
Jeff King  writes:

> So there are two separate questions/tasks:
>
>   1. Should we remove the special handling of "-q" leftover from this
>  deprecation? I think the answer is yes.
>
>   2. Should we teach the diff machinery as a whole to treat "-q" as a
>  synonym for "--quiet".

Good questions.  And thanks for archaeology.

The topic #1 above is something that should have happened when "-q" stopped 
working
as "--diff-filter=d", and we probably should have started to error
out then, so that scripts that relied on the original behaviour
would have been forced to update.  That did not happen which was a
grave mistake.

By doing so, we would have made sure any script that uses "-q" died
out, and after a while, we can talk about reusing it for other
purposes, like the topic #2 above.

Is it worth making "-q" error out while doing #1 and keep it error
out for a few years?  I have a feeling that the answer might be
unfortunately yes _if_ we want to also do #2.  Even though we broke
"-q" for the scripts who wanted to see it ignore only the removals 4
years ago and left it broken since then.  Removals are much rarer
than modifications and additions, so it wouldn't be surprising if
the users of these scripts simply did not notice the old breakage,
but if we made "-q" to mean "--quiet" without doing #1, they will
break, as all diffs these scripts work on will suddenly give an
empty output.

If we aren't doing #2, then I do not think we need to make "-q"
error out when we do #1, though.

In any case, if we were to do both of the above two, they must
happen in that order, not the other way around.

Thanks.



Re: [PATCH 3/3] branch: forbid refs/heads/HEAD

2017-10-13 Thread Junio C Hamano
Junio C Hamano  writes:

>> Should we test that:
>>
>>   git update-ref refs/heads/HEAD HEAD^
>>
>> continues to work?
>
> Perhaps.  Also we may want to make sure that "git branch -D HEAD"
> still works as a way to recover from historical mistakes.

The only difference is improved tests where we use show-ref to make
sure refs/heads/HEAD does not exist when it shouldn't, exercise
update-ref to create and delete refs/heads/HEAD, and also make sure
it can be deleted with "git branch -d".

-- >8 --
strbuf_check_branch_ref() is the central place where many codepaths
see if a proposed name is suitable for the name of a branch.  It was
designed to allow us to get stricter than the check_refname_format()
check used for refnames in general, and we already use it to reject
a branch whose name begins with a '-'.

Use it to also reject "HEAD" as a branch name.

Helped-by: Jeff King 
Signed-off-by: Junio C Hamano 
---
 sha1_name.c |  2 +-
 t/t1430-bad-ref-name.sh | 23 +++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/sha1_name.c b/sha1_name.c
index c7c5ab376c..1b8c503095 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1332,7 +1332,7 @@ void strbuf_branchname(struct strbuf *sb, const char 
*name, unsigned allowed)
 int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 {
strbuf_branchname(sb, name, INTERPRET_BRANCH_LOCAL);
-   if (name[0] == '-')
+   if (*name == '-' || !strcmp(name, "HEAD"))
return -1;
strbuf_splice(sb, 0, 0, "refs/heads/", 11);
return check_refname_format(sb->buf, 0);
diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
index e88349c8a0..4556aa66b8 100755
--- a/t/t1430-bad-ref-name.sh
+++ b/t/t1430-bad-ref-name.sh
@@ -331,4 +331,27 @@ test_expect_success 'update-ref --stdin -z fails delete 
with bad ref name' '
grep "fatal: invalid ref format: ~a" err
 '
 
+test_expect_success 'branch rejects HEAD as a branch name' '
+   test_must_fail git branch HEAD HEAD^ &&
+   test_must_fail git show-ref refs/heads/HEAD
+'
+
+test_expect_success 'checkout -b rejects HEAD as a branch name' '
+   test_must_fail git checkout -B HEAD HEAD^ &&
+   test_must_fail git show-ref refs/heads/HEAD
+'
+
+test_expect_success 'update-ref can operate on refs/heads/HEAD' '
+   git update-ref refs/heads/HEAD HEAD^ &&
+   git show-ref refs/heads/HEAD &&
+   git update-ref -d refs/heads/HEAD &&
+   test_must_fail git show-ref refs/heads/HEAD
+'
+
+test_expect_success 'branch -d can remove refs/heads/HEAD' '
+   git update-ref refs/heads/HEAD HEAD^ &&
+   git branch -d HEAD &&
+   test_must_fail git show-ref refs/heads/HEAD
+'
+
 test_done
-- 
2.15.0-rc1-158-gbd035ae683



Re: [PATCH 3/3] branch: forbid refs/heads/HEAD

2017-10-13 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Oct 13, 2017 at 02:11:32PM +0900, Junio C Hamano wrote:
>
>> strbuf_check_branch_ref() is the central place where many codepaths
>> see if a proposed name is suitable for the name of a branch.  It was
>> designed to allow us to get stricter than the check_refname_format()
>> check used for refnames in general, and we already use it to reject
>> a branch whose name begins with a '-'.
>> 
>> Use it to also reject "HEAD" as a branch name.
>
> Heh, I just pointed somebody to this a day or two ago as #leftoverbit. I
> guess it's taken now. :)

Didn't notice /remember it; sorry about that.  I can retract it if
you want, but perhaps they cannot unsee it ;-)

>> diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
>> index e88349c8a0..3ecb2eab0c 100755
>> --- a/t/t1430-bad-ref-name.sh
>> +++ b/t/t1430-bad-ref-name.sh
>> @@ -331,4 +331,12 @@ test_expect_success 'update-ref --stdin -z fails delete 
>> with bad ref name' '
>>  grep "fatal: invalid ref format: ~a" err
>>  '
>>  
>> +test_expect_success 'branch rejects HEAD as a branch name' '
>> +test_must_fail git branch HEAD HEAD^
>> +'
>> +
>> +test_expect_success 'checkout -b rejects HEAD as a branch name' '
>> +test_must_fail git checkout -B HEAD HEAD^
>> +'
>
> Should we test that:
>
>   git update-ref refs/heads/HEAD HEAD^
>
> continues to work?

Perhaps.  Also we may want to make sure that "git branch -D HEAD"
still works as a way to recover from historical mistakes.


Re: [PATCH v2 2/3] for-each-ref: let upstream/push optionally remote ref name

2017-10-13 Thread Junio C Hamano
Kevin Daudt  writes:

> On Thu, Oct 05, 2017 at 02:19:15PM +0200, Johannes Schindelin wrote:
>> From: J Wyman 
>> [..] 
>> 
>> diff --git a/Documentation/git-for-each-ref.txt 
>> b/Documentation/git-for-each-ref.txt
>> index 39f50bd53eb..22767025850 100644
>> --- a/Documentation/git-for-each-ref.txt
>> +++ b/Documentation/git-for-each-ref.txt
>> @@ -142,8 +142,9 @@ upstream::
>>  encountered. Append `:track,nobracket` to show tracking
>>  information without brackets (i.e "ahead N, behind M").
>>  +
>> -Also respects `:remotename` to state the name of the *remote* instead of
>> -the ref.
>> +Also respects `:remotename` to state the name of the *remote* instead
>> +of the ref, and `:remoteref` to state the name of the *reference* as
>> +locally known by the remote.
>
> What does "locally known by the remote" mean in this sentence?

Good question.  I cannot offhand offer a better and concise
phrasing, but I think can explain what it wants to describe ;-).

For a local branch we have (say, 'topic'), there may be different
things outside this repository that regularly interact with it.

We may update 'topic' with work others did, by

git checkout topic && git pull

without any extra command line argument to "git pull".  We are
pulling from the 'upstream' of the 'topic', which is a branch in a
remote repository, and the (nick)name of the remote is :remotename.
When we do this pull, we would grab one of the branches the remote
has and merge it into our 'topic'.  IOW, when the above command line
is written in longhand, the "git pull" step may look like this [*1*]:

git pull origin devel

if we were building on top of the 'devel' branch at the 'origin'
remote.  The full refname of that branch, 'refs/heads/devel', is
:remoteref, and that is the reference locally known to the 'origin'
remote.

The "locally known by the remote" is an attempt by the patch authors
to stress the fact that the result is not refs/remotes/origin/devel
(which is where _you_ would keep a local copy of that reference in
this repository).

Two other things outside this repository that interact with 'topic'
are where the result of our work is pushed out to, which is a branch
at the 'push' remote.


[Footnotes]

*1* Modulo the details of other refs fetched that are unrelated to
the resulting merge and local copies stored as remote-tracking
branches in this repository.




Re: [PATCH v3 07/10] connect: tell server that the client understands v1

2017-10-13 Thread Brandon Williams
On 10/10, Jonathan Tan wrote:
> On Tue,  3 Oct 2017 13:15:04 -0700
> Brandon Williams  wrote:
> 
> > 2. ssh://, file://
> >Set 'GIT_PROTOCOL' environment variable with the desired protocol
> >version.  With the file:// transport, 'GIT_PROTOCOL' can be set
> >explicitly in the locally running git-upload-pack or git-receive-pack
> >processes.  With the ssh:// transport and OpenSSH compliant ssh
> >programs, 'GIT_PROTOCOL' can be sent across ssh by using '-o
> >SendEnv=GIT_PROTOCOL' and having the server whitelist this
> >environment variable.
> 
> In your commit message, also mention what GIT_PROTOCOL contains
> (version=?). (From this commit message alone, I would have expected a
> lone integer, but that is not the case.)
> 
> Same comment for the commit message of PATCH v3 08/10.

I'll update the commit msgs.

-- 
Brandon Williams


Re: [PATCH v3 03/10] protocol: introduce protocol extention mechanisms

2017-10-13 Thread Brandon Williams
On 10/10, Jonathan Nieder wrote:
> Hi,
> 
> Brandon Williams wrote:
> 
> > Given some of this discussion though, maybe we want to change the
> > semantics of 'protocol.version' so that both servers and clients respect
> > it.  As of right now, these patches have the server always allow
> > protocol v0 and v1?  Though that doesn't do much right now since v1 is
> > the same as v0.
> 
> I strongly prefer not to do that.
> 
> If we want to make the advertised protocol versions on the server side
> configurable, I think it should be independent from the configuration
> for protocol versions to use on the client side.  Rationale:
> 
>  - As a client-side user, I may want to (after reporting the bug, of
>course!) blacklist certain protocol versions to work around server
>bugs.  But this should not affect my behavior as a server --- in
>my role as a server, these server-side bugs have no effect on me.
> 
>  - As a server operator, I may want to (after reporting the bug, of
>course!) blacklist certain protocol versions to work around client
>bugs.  But this should not affect my behavior as a client --- in my
>role as a client, these client-side bugs have no effect on me.
> 
> Making the client-side case configurable seems important since Git is
> widely used in environments where it may not be easy to control the
> deployed version (so having configuration as an escape hatch is
> important).
> 
> Making the server-side case configurable seems less important since
> Git server operators usually have tight control over the deployed Git
> version and can apply a targetted fix or workaround.

This is fine with me.  Realistically, as I mentioned, all of this is
unimportant at the moment as it doesn't prevent us from moving forward
with the transition or with implementing a v2.  If we do get to a point
in the future where we need to explicitly care about blacklisting or
whitelisting protocol versions from the config then we can take care of
that then.  The important thing is that servers won't die if they see
multiple 'version=?' entries or unknown values of '?' in GIT_PROTOCOL.


-- 
Brandon Williams


Re: [PATCH] Documentation: document Extra Parameters

2017-10-13 Thread Brandon Williams
On 10/10, Jonathan Tan wrote:
> Document the server support for Extra Parameters, additional information
> that the client can send in its first message to the server during a
> Git client-server interaction.
> 
> Signed-off-by: Jonathan Tan 
> ---
> I noticed that Documentation/technical was not updated in this patch
> series. It probably should, and here is a suggestion on how to do it.
> 
> Also, I'm not sure what to call the extra sendable information. I'm open
> to suggestions for a better name than Extra Parameters.

Thanks for writing this up.

> ---
>  Documentation/technical/http-protocol.txt |  8 ++
>  Documentation/technical/pack-protocol.txt | 43 
> ++-
>  2 files changed, 44 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/technical/http-protocol.txt 
> b/Documentation/technical/http-protocol.txt
> index 1c561bdd9..a0e45f288 100644
> --- a/Documentation/technical/http-protocol.txt
> +++ b/Documentation/technical/http-protocol.txt
> @@ -219,6 +219,10 @@ smart server reply:
> S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
> S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
>  
> +The client may send Extra Parameters (see
> +Documentation/technical/pack-protocol.txt) as a colon-separated string
> +in the Git-Protocol HTTP header.
> +
>  Dumb Server Response
>  
>  Dumb servers MUST respond with the dumb server reply format.
> @@ -269,7 +273,11 @@ the C locale ordering.  The stream SHOULD include the 
> default ref
>  named `HEAD` as the first ref.  The stream MUST include capability
>  declarations behind a NUL on the first ref.
>  
> +The returned response contains "version 1" if "version=1" was sent as an
> +Extra Parameter.
> +
>smart_reply =  PKT-LINE("# service=$servicename" LF)
> +  *1("version 1")
>ref_list
>""
>ref_list=  empty_list / non_empty_list
> diff --git a/Documentation/technical/pack-protocol.txt 
> b/Documentation/technical/pack-protocol.txt
> index ed1eae8b8..f9ebfb23e 100644
> --- a/Documentation/technical/pack-protocol.txt
> +++ b/Documentation/technical/pack-protocol.txt
> @@ -39,6 +39,19 @@ communicates with that invoked process over the SSH 
> connection.
>  The file:// transport runs the 'upload-pack' or 'receive-pack'
>  process locally and communicates with it over a pipe.
>  
> +Extra Parameters
> +
> +
> +The protocol provides a mechanism in which clients can send additional
> +information in its first message to the server. These are called "Extra
> +Parameters", and are supported by the Git, SSH, and HTTP protocols.
> +
> +Each Extra Parameter takes the form of `=` or ``.
> +
> +Servers that receive any such Extra Parameters MUST ignore all
> +unrecognized keys. Currently, the only Extra Parameter recognized is
> +"version=1".
> +
>  Git Transport
>  -
>  
> @@ -46,18 +59,25 @@ The Git transport starts off by sending the command and 
> repository
>  on the wire using the pkt-line format, followed by a NUL byte and a
>  hostname parameter, terminated by a NUL byte.
>  
> -   0032git-upload-pack /project.git\0host=myserver.com\0
> +   0033git-upload-pack /project.git\0host=myserver.com\0
> +
> +The transport may send Extra Parameters by adding an additional NUL
> +byte, and then adding one or more NUL-terminated strings:
> +
> +   003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
>  
>  --
> -   git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
> +   git-proto-request = request-command SP pathname NUL
> +[ host-parameter NUL [ NUL extra-parameters ] ]

This should probably be "[ host-parameter NUL ] [ NUL extra-parameters ]"
because we don't want to require sending a host parameter in order to
send extra parameters.

> request-command   = "git-upload-pack" / "git-receive-pack" /
>  "git-upload-archive"   ; case sensitive
> pathname  = *( %x01-ff ) ; exclude NUL
> host-parameter= "host=" hostname [ ":" port ]
> +   extra-parameters  = 1*extra-parameter
> +   extra-parameter   = 1*( %x01-ff ) NUL
>  --
>  
> -Only host-parameter is allowed in the git-proto-request. Clients
> -MUST NOT attempt to send additional parameters. It is used for the
> +host-parameter is used for the
>  git-daemon name based virtual hosting.  See --interpolated-path
>  option to git daemon, with the %H/%CH format characters.
>  
> @@ -117,6 +137,12 @@ we execute it without the leading '/'.
>v
> ssh u...@example.com "git-upload-pack '~alice/project.git'"
>  
> +Depending on the value of the `protocol.version` configuration variable,
> +Git may attempt to send Extra Parameters as a colon-separated string in
> +the GIT_PROTOCOL environment variable. This is done only if
> +the `ssh.variant` configuration variable indicates that the 

Re: [PATCH v3 05/10] upload-pack, receive-pack: introduce protocol version 1

2017-10-13 Thread Brandon Williams
On 10/10, Jonathan Tan wrote:
> On Tue,  3 Oct 2017 13:15:02 -0700
> Brandon Williams  wrote:
> 
> > +   switch (determine_protocol_version_server()) {
> > +   case protocol_v1:
> > +   if (advertise_refs || !stateless_rpc)
> > +   packet_write_fmt(1, "version 1\n");
> > +   /*
> > +* v1 is just the original protocol with a version string,
> > +* so just fall through after writing the version string.
> > +*/
> 
> Peff sent out at least one patch [1] that reformats fallthrough comments
> to be understood by GCC. It's probably worth doing here too.
> 
> In this case, I would do the 2-comment system that Peff suggested:
> 
>   /*
>* v1 is just the original protocol with a version string,
>* so just fall through after writing the version string.
>*/
>   if (advertise_refs || !stateless_rpc)
>   packet_write_fmt(1, "version 1\n");
>   /* fallthrough */
> 
> (I put the first comment before the code, so it doesn't look so weird.)

Sounds good.

> 
> [1] 
> https://public-inbox.org/git/20170921062541.ew67gyvrmb2ot...@sigill.intra.peff.net/
> 
> > +   case protocol_v0:
> > +   break;
> > +   default:

I'm also going to change this to from default to
'protocol_unknown_version' that way we get a compiler error instead of a
run-time BUG when introducing a new protocol version number.

> > +   BUG("unknown protocol version");
> > +   }

-- 
Brandon Williams


Re: [PATCH v3 04/10] daemon: recognize hidden request arguments

2017-10-13 Thread Brandon Williams
On 10/10, Jonathan Tan wrote:
> On Tue,  3 Oct 2017 13:15:01 -0700
> Brandon Williams  wrote:
> 
> >  /*
> >   * Read the host as supplied by the client connection.
> 
> The return value is probably worth documenting. Something like "Returns
> a pointer to the character *after* the NUL byte terminating the host
> argument, or extra_args if there is no host argument."

I can add that comment.

> 
> >   */
> > -static void parse_host_arg(struct hostinfo *hi, char *extra_args, int 
> > buflen)
> > +static char *parse_host_arg(struct hostinfo *hi, char *extra_args, int 
> > buflen)
> 
> [snip]
> 
> > +static void parse_extra_args(struct hostinfo *hi, struct argv_array *env,
> > +char *extra_args, int buflen)
> > +{
> > +   const char *end = extra_args + buflen;
> > +   struct strbuf git_protocol = STRBUF_INIT;
> > +
> > +   /* First look for the host argument */
> > +   extra_args = parse_host_arg(hi, extra_args, buflen);
> 
> This works, but is a bit loose in what it accepts. I think it's better
> to be tighter - in particular, if there is no host argument, we
> shouldn't be looking for extra args.

I disagree, you shouldn't be precluded from using protocol v2 if you
don't include a host argument.

> 
> > +
> > +   /* Look for additional arguments places after a second NUL byte */
> > +   for (; extra_args < end; extra_args += strlen(extra_args) + 1) {
> 
> Assuming that the host argument exists, this for loop should start at
> extra_args + 1 to skip the second NUL byte. This currently works
> because this code is lenient towards empty strings.

Being lenient towards empty strings is fine, I don't see any reason why
we should disallow them.  Also, this code already
requires that there be the second NUL byte because if there isn't then
the code which parses the host argument would fail out.

> 
> > +   const char *arg = extra_args;
> > +
> > +   /*
> > +* Parse the extra arguments, adding most to 'git_protocol'
> > +* which will be used to set the 'GIT_PROTOCOL' envvar in the
> > +* service that will be run.
> > +*
> > +* If there ends up being a particular arg in the future that
> > +* git-daemon needs to parse specificly (like the 'host' arg)
> > +* then it can be parsed here and not added to 'git_protocol'.
> > +*/
> > +   if (*arg) {
> > +   if (git_protocol.len > 0)
> > +   strbuf_addch(_protocol, ':');
> > +   strbuf_addstr(_protocol, arg);
> > +   }
> > +   }
> 
> But I think we shouldn't be lenient towards empty strings.

Why not? I see no issue with allowing them.  In fact if we error out we
could be painting ourselves into a corner much like how we did with the
host parsing logic.

> 
> > +
> > +   if (git_protocol.len > 0)
> > +   argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=%s",
> > +git_protocol.buf);
> > +   strbuf_release(_protocol);
> >  }

-- 
Brandon Williams


Re: [PATCH] sequencer.c: unify error messages

2017-10-13 Thread René Scharfe
Am 13.10.2017 um 19:51 schrieb Ralf Thielow:
> When ftruncate() in rearrange_squash() fails, we write
> that we couldn't finish the operation on the todo file.
> It is more accurate to write that we couldn't truncate
> as we do in other calls of ftruncate().

Would it make sense to factor out rewriting the to-do file to avoid
code duplication in the first place?

> While at there, remove a full stop in another error message.
> 
> Signed-off-by: Ralf Thielow 
> ---
>   sequencer.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index e258bb646..b0e6459a5 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2948,9 +2948,9 @@ int rearrange_squash(void)
>   if (fd < 0)
>   res = error_errno(_("could not open '%s'"), todo_file);
>   else if (write(fd, buf.buf, buf.len) < 0)
> - res = error_errno(_("could not read '%s'."), todo_file);
> + res = error_errno(_("could not read '%s'"), todo_file);
   
That should read "write", right?

>   else if (ftruncate(fd, buf.len) < 0)
> - res = error_errno(_("could not finish '%s'"),
> + res = error_errno(_("could not truncate '%s'"),
>  todo_file);

Hmm, why call ftruncate(2) instead of opening the file with O_TRUNC?

>   close(fd);
>   strbuf_release();
> 


[PATCH v2 2/2] diff: finish removal of deprecated -q option

2017-10-13 Thread Anthony Sottile
Functionality was removed in c48f6816f0 but the cli option was not removed.

Signed-off-by: Anthony Sottile 
---
 builtin/diff-files.c | 2 --
 builtin/diff.c   | 2 --
 diff.h   | 4 +---
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index e88493f..b0ff251 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -37,8 +37,6 @@ int cmd_diff_files(int argc, const char **argv, const char 
*prefix)
rev.max_count = 2;
else if (!strcmp(argv[1], "--theirs"))
rev.max_count = 3;
-   else if (!strcmp(argv[1], "-q"))
-   options |= DIFF_SILENT_ON_REMOVED;
else
usage(diff_files_usage);
argv++; argc--;
diff --git a/builtin/diff.c b/builtin/diff.c
index f5bbd4d..96513e8 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -227,8 +227,6 @@ static int builtin_diff_files(struct rev_info *revs, int 
argc, const char **argv
revs->max_count = 2;
else if (!strcmp(argv[1], "--theirs"))
revs->max_count = 3;
-   else if (!strcmp(argv[1], "-q"))
-   options |= DIFF_SILENT_ON_REMOVED;
else if (!strcmp(argv[1], "-h"))
usage(builtin_diff_usage);
else
diff --git a/diff.h b/diff.h
index aca150b..c9d71e1 100644
--- a/diff.h
+++ b/diff.h
@@ -65,7 +65,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct 
diff_options *opt, void *data)
 #define DIFF_OPT_BINARY  (1 <<  2)
 #define DIFF_OPT_TEXT(1 <<  3)
 #define DIFF_OPT_FULL_INDEX  (1 <<  4)
-#define DIFF_OPT_SILENT_ON_REMOVE(1 <<  5)
+/* (1 << 5) unused */
 #define DIFF_OPT_FIND_COPIES_HARDER  (1 <<  6)
 #define DIFF_OPT_FOLLOW_RENAMES  (1 <<  7)
 #define DIFF_OPT_RENAME_EMPTY(1 <<  8)
@@ -374,8 +374,6 @@ extern void diff_warn_rename_limit(const char *varname, int 
needed, int degraded
  */
 extern const char *diff_aligned_abbrev(const struct object_id *sha1, int);
 
-/* do not report anything on removed paths */
-#define DIFF_SILENT_ON_REMOVED 01
 /* report racily-clean paths as modified */
 #define DIFF_RACY_IS_MODIFIED 02
 extern int run_diff_files(struct rev_info *revs, unsigned int option);
-- 
2.7.4



[PATCH v2 1/2] diff: alias -q to --quiet

2017-10-13 Thread Anthony Sottile
Previously, `-q` was silently ignored:

Before:

$ git diff -q -- Documentation/; echo $?
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index a88c767..aa6e724 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -587,6 +587,7 @@ ifndef::git-log[]
That is, it exits with 1 if there were differences and
0 means no differences.

+-q::
 --quiet::
Disable all output of the program. Implies `--exit-code`.
 endif::git-log[]
0
$

After:
$ ./git diff -q -- Documentation/; echo $?
1
$

Signed-off-by: Anthony Sottile 
---
 Documentation/diff-options.txt | 1 +
 diff.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index a88c767..aa6e724 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -587,6 +587,7 @@ ifndef::git-log[]
That is, it exits with 1 if there were differences and
0 means no differences.
 
+-q::
 --quiet::
Disable all output of the program. Implies `--exit-code`.
 endif::git-log[]
diff --git a/diff.c b/diff.c
index 69f0357..13dfc3e 100644
--- a/diff.c
+++ b/diff.c
@@ -4751,7 +4751,7 @@ int diff_opt_parse(struct diff_options *options,
}
else if (!strcmp(arg, "--exit-code"))
DIFF_OPT_SET(options, EXIT_WITH_STATUS);
-   else if (!strcmp(arg, "--quiet"))
+   else if (!strcmp(arg, "-q") || !strcmp(arg, "--quiet"))
DIFF_OPT_SET(options, QUICK);
else if (!strcmp(arg, "--ext-diff"))
DIFF_OPT_SET(options, ALLOW_EXTERNAL);
-- 
2.7.4



Re: [PATCH] diff: alias -q to --quiet

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 09:44:15AM -0700, Anthony Sottile wrote:

> Previously, `-q` was silently ignored:

I'm not sure if is totally ignored. Normally if we have an unknown
options we'd complain:

  $ git diff -x
  error: invalid option: -x

but we don't with "-q". Why?

In builtin/diff.c:471, we can see:

  else if (!strcmp(argv[1], "-q"))
  options |= DIFF_SILENT_ON_REMOVED;

So it _does_ do something, just not what you expected.

But wait, that's not the whole story. We convert "-q" into
SILENT_ON_REMOVED in git-diff-files and in git-diff (when we're acting
like diff-files). But nobody ever seems to check it!

Running "git log -p -SSILENT_ON_REMOVED" turns up two interesting
commits:

  - 95a7c546b0 (diff: deprecate -q option to diff-files, 2013-07-17)

  - c48f6816f0 (diff: remove "diff-files -q" in a version of Git in a
distant future, 2013-07-18).

So we dropped "-q" a few years ago and added a deprecation notice. "git
tag --contains 95a7c546b0" tells us that happened in v1.8.5, which
shipped in Nov 2013.

And then in v2.0.0 (May 2014) we tried to drop "-q" completely. Looking
over c48f6816f0, I _think_ it's a mistake that "-q" became a silent noop
there. That commit should have ripped out the remaining bits that set
the SILENT_ON_REMOVED flag, and "-q" would have become an error.

So there are two separate questions/tasks:

  1. Should we remove the special handling of "-q" leftover from this
 deprecation? I think the answer is yes.

  2. Should we teach the diff machinery as a whole to treat "-q" as a
 synonym for "--quiet".

 Probably yes, but it's less clear to me that this won't have funny
 interactions. Are there other commands which use the diff-options
 parser via setup_revisions(), but expect "-q" to be left in the
 output so that they can handle it themselves?

 It looks like git-log does so, but it pulls the "-q" out before
 handing the remainder to setup_revisions(). And anyway, it just
 converts the option into a quiet diff (though it does in a way
 that's different than the rest of the diff code -- that might bear
 investigating on its own).

 Grepping for 'q' and OPT__QUIET, I don't see any others, but I
 didn't spend very much time digging.

-Peff


[PATCH] sequencer.c: unify error messages

2017-10-13 Thread Ralf Thielow
When ftruncate() in rearrange_squash() fails, we write
that we couldn't finish the operation on the todo file.
It is more accurate to write that we couldn't truncate
as we do in other calls of ftruncate().

While at there, remove a full stop in another error message.

Signed-off-by: Ralf Thielow 
---
 sequencer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index e258bb646..b0e6459a5 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2948,9 +2948,9 @@ int rearrange_squash(void)
if (fd < 0)
res = error_errno(_("could not open '%s'"), todo_file);
else if (write(fd, buf.buf, buf.len) < 0)
-   res = error_errno(_("could not read '%s'."), todo_file);
+   res = error_errno(_("could not read '%s'"), todo_file);
else if (ftruncate(fd, buf.len) < 0)
-   res = error_errno(_("could not finish '%s'"),
+   res = error_errno(_("could not truncate '%s'"),
   todo_file);
close(fd);
strbuf_release();
-- 
2.15.0.rc0.296.g7b26d72



Re: Can I remove multiple stashed states at a time?

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 11:58:12AM +0900, 小川恭史 wrote:

> I want to remove multiple stashed states at a time.
> 
> But "git stash drop " removes only one stashed state at a time
> and "git stash clear" remove all.
> 
> Can I do that?

There isn't a way to do it through "git stash", I don't think. The stash
feature is backed by a reflog, and the underlying machinery is capable
of it. E.g.:

  git reflog delete --updateref --rewrite refs/stash@{1} refs/stash@{2} 
refs/stash@{3}

works. But that's getting a bit more familiar with the innards of stash
than users should be (both because they shouldn't need to, and because
the underlying storage may be subject to change).

Probably "git stash drop" should learn to take multiple stash arguments
and pass them all along to "reflog delete".

You can also just do:

  for i in 1 2 3; do
 git stash drop $i
  done

of course. It's less efficient than a single invocation (since it has to
rewrite the whole reflog each time), but unless you have thousands of
stashes, I doubt the difference is all that noticeable.

-Peff


[PATCH 4/4] tag: respect color.ui config

2017-10-13 Thread Jeff King
Since 11b087adfd (ref-filter: consult want_color() before
emitting colors, 2017-07-13), we expect that setting
"color.ui" to "always" will enable color tag formats even
without a tty.  As that commit was built on top of
136c8c8b8f (color: check color.ui in git_default_config(),
2017-07-13) from the same series, we didn't need to touch
tag's config parsing at all.

However, since we reverted 136c8c8b8f, we now need to
explicitly call git_color_default_config() to make this
work.

Let's do so, and also restore the test dropped in 0c88bf5050
(provide --color option for all ref-filter users,
2017-10-03). That commit swapped out our "color.ui=always"
test for "--color" in preparation for "always" going away.
But since it is here to stay, we should test both cases.

Note that for-each-ref also lost its color.ui support as
part of reverting 136c8c8b8f. But as a plumbing command, it
should _not_ respect the color.ui config. Since it also
gained a --color option in 0c88bf5050, that's the correct
way to ask it for color. We'll continue to test that, and
confirm that "color.ui" is not respected.

Signed-off-by: Jeff King 
---
 builtin/tag.c   | 2 +-
 t/t6300-for-each-ref.sh | 5 +
 t/t7004-tag.sh  | 6 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 695cb0778e..b38329b593 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -158,7 +158,7 @@ static int git_tag_config(const char *var, const char 
*value, void *cb)
 
if (starts_with(var, "column."))
return git_column_config(var, value, "tag", );
-   return git_default_config(var, value, cb);
+   return git_color_default_config(var, value, cb);
 }
 
 static void write_tag_body(int fd, const struct object_id *oid)
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 416ff7d0b8..3aa534933e 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -442,6 +442,11 @@ test_expect_success '--color can override tty check' '
test_cmp expected.color actual
 '
 
+test_expect_success 'color.ui=always does not override tty check' '
+   git -c color.ui=always for-each-ref --format="$color_format" >actual &&
+   test_cmp expected.bare actual
+'
+
 cat >expected <<\EOF
 heads/master
 tags/master
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 4e62c505fc..a9af2de996 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -1918,6 +1918,12 @@ test_expect_success '--color overrides auto-color' '
test_cmp expect.color actual
 '
 
+test_expect_success 'color.ui=always overrides auto-color' '
+   git -c color.ui=always tag $color_args >actual.raw &&
+   test_decode_color actual &&
+   test_cmp expect.color actual
+'
+
 test_expect_success 'setup --merged test tags' '
git tag mergetest-1 HEAD~2 &&
git tag mergetest-2 HEAD~1 &&
-- 
2.15.0.rc1.395.ga4290b5804


[PATCH 3/4] Revert "color: check color.ui in git_default_config()"

2017-10-13 Thread Jeff King
This reverts commit 136c8c8b8fa39f1315713248473dececf20f8fe7.

That commit was trying to address a bug caused by 4c7f1819b3
(make color.ui default to 'auto', 2013-06-10), in which
plumbing like diff-tree defaulted to "auto" color, but did
not respect a "color.ui" directive to disable it.

But it also meant that we started respecting "color.ui" set
to "always". This was a known problem, but 4c7f1819b3 argued
that nobody ought to be doing that. However, that turned out
to be wrong, and we got a number of bug reports related to
"add -p" regressing in v2.14.2.

Let's revert 136c8c8b8, fixing the regression to "add -p".
This leaves the problem from 4c7f1819b3 unfixed, but:

  1. It's a pretty obscure problem in the first place. I
 only noticed it while working on the color code, and we
 haven't got a single bug report or complaint about it.

  2. We can make a more moderate fix on top by respecting
 "never" but not "always" for plumbing commands. This
 is just the minimal fix to go back to the working state
 we had before v2.14.2.

Note that this isn't a pure revert. We now have a test in
t3701 which shows off the "add -p" regression. This can be
flipped to success.

Signed-off-by: Jeff King 
---
 builtin/branch.c   | 2 +-
 builtin/clean.c| 3 ++-
 builtin/grep.c | 2 +-
 builtin/show-branch.c  | 2 +-
 color.c| 8 
 config.c   | 4 
 diff.c | 3 +++
 t/t3701-add-interactive.sh | 2 +-
 8 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index b67593288c..79dc9181fd 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -93,7 +93,7 @@ static int git_branch_config(const char *var, const char 
*value, void *cb)
return config_error_nonbool(var);
return color_parse(value, branch_colors[slot]);
}
-   return git_default_config(var, value, cb);
+   return git_color_default_config(var, value, cb);
 }
 
 static const char *branch_get_color(enum color_branch ix)
diff --git a/builtin/clean.c b/builtin/clean.c
index 733b6d3745..189e20628c 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -126,7 +126,8 @@ static int git_clean_config(const char *var, const char 
*value, void *cb)
return 0;
}
 
-   return git_default_config(var, value, cb);
+   /* inspect the color.ui config variable and others */
+   return git_color_default_config(var, value, cb);
 }
 
 static const char *clean_get_color(enum color_clean ix)
diff --git a/builtin/grep.c b/builtin/grep.c
index 19e23946ac..2d65f27d01 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -275,7 +275,7 @@ static int wait_all(void)
 static int grep_cmd_config(const char *var, const char *value, void *cb)
 {
int st = grep_config(var, value, cb);
-   if (git_default_config(var, value, cb) < 0)
+   if (git_color_default_config(var, value, cb) < 0)
st = -1;
 
if (!strcmp(var, "grep.threads")) {
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 84547d6fba..6fa1f62a88 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -554,7 +554,7 @@ static int git_show_branch_config(const char *var, const 
char *value, void *cb)
return 0;
}
 
-   return git_default_config(var, value, cb);
+   return git_color_default_config(var, value, cb);
 }
 
 static int omit_in_dense(struct commit *commit, struct commit **rev, int n)
diff --git a/color.c b/color.c
index 9ccd954d6b..9a9261ac16 100644
--- a/color.c
+++ b/color.c
@@ -368,6 +368,14 @@ int git_color_config(const char *var, const char *value, 
void *cb)
return 0;
 }
 
+int git_color_default_config(const char *var, const char *value, void *cb)
+{
+   if (git_color_config(var, value, cb) < 0)
+   return -1;
+
+   return git_default_config(var, value, cb);
+}
+
 void color_print_strbuf(FILE *fp, const char *color, const struct strbuf *sb)
 {
if (*color)
diff --git a/config.c b/config.c
index 4831c12735..adb7d7a3e5 100644
--- a/config.c
+++ b/config.c
@@ -16,7 +16,6 @@
 #include "string-list.h"
 #include "utf8.h"
 #include "dir.h"
-#include "color.h"
 
 struct config_source {
struct config_source *prev;
@@ -1351,9 +1350,6 @@ int git_default_config(const char *var, const char 
*value, void *dummy)
if (starts_with(var, "advice."))
return git_default_advice_config(var, value);
 
-   if (git_color_config(var, value, dummy) < 0)
-   return -1;
-
if (!strcmp(var, "pager.color") || !strcmp(var, "color.pager")) {
pager_use_color = git_config_bool(var,value);
return 0;
diff --git a/diff.c b/diff.c
index 69f03570ad..30b2e271b0 100644
--- a/diff.c
+++ b/diff.c
@@ -358,6 +358,9 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
return 0;
 

[PATCH 2/4] Revert "t6006: drop "always" color config tests"

2017-10-13 Thread Jeff King
This reverts commit c5bdfe677cfab5b2e87771c35565d44d3198efda.

That commit was done primarily to prepare for the weakening
of "always" in 6be4595edb (color: make "always" the same as
"auto" in config, 2017-10-03). But since we've now reverted
6be4595edb, there's no need for us to remove "-c
color.ui=always" from the tests. And in fact it's a good
idea to restore these tests, to make sure that "always"
continues to work.

Signed-off-by: Jeff King 
---
 t/t6006-rev-list-format.sh | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index 25a9c65dc5..98be78b4a2 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -208,11 +208,26 @@ do
has_no_color actual
'
 
+   test_expect_success "$desc enables colors for color.diff" '
+   git -c color.diff=always log --format=$color -1 >actual &&
+   has_color actual
+   '
+
+   test_expect_success "$desc enables colors for color.ui" '
+   git -c color.ui=always log --format=$color -1 >actual &&
+   has_color actual
+   '
+
test_expect_success "$desc respects --color" '
git log --format=$color -1 --color >actual &&
has_color actual
'
 
+   test_expect_success "$desc respects --no-color" '
+   git -c color.ui=always log --format=$color -1 --no-color 
>actual &&
+   has_no_color actual
+   '
+
test_expect_success TTY "$desc respects --color=auto (stdout is tty)" '
test_terminal git log --format=$color -1 --color=auto >actual &&
has_color actual
@@ -225,11 +240,6 @@ do
has_no_color actual
)
'
-
-   test_expect_success TTY "$desc respects --no-color" '
-   test_terminal git log --format=$color -1 --no-color >actual &&
-   has_no_color actual
-   '
 done
 
 test_expect_success '%C(always,...) enables color even without tty' '
-- 
2.15.0.rc1.395.ga4290b5804



[PATCH 1/4] Revert "color: make "always" the same as "auto" in config"

2017-10-13 Thread Jeff King
This reverts commit 6be4595edb8e5b616c6e8b9fbc78b0f831fa2a87.

That commit weakened the "always" setting of color config so
that it acted as "auto". This was meant to solve regressions
in v2.14.2 in which setting "color.ui=always" in the on-disk
config broke scripts like add--interactive, because the
plumbing diff commands began to generate color output.

This was due to 136c8c8b8f (color: check color.ui in
git_default_config(), 2017-07-13), which was in turn trying
to fix issues caused by 4c7f1819b3 (make color.ui default to
'auto', 2013-06-10). But in weakening "always", we created
even more problems, as people expect to be able to use "git
-c color.ui=always" to force color (especially because some
commands don't have their own --color flag). We can fix that
by special-casing the command-line "-c", but now things are
getting pretty confusing.

Instead of piling hacks upon hacks, let's start peeling off
the hacks. The first step is dropping the weakening of
"always", which this revert does.

Note that we could actually revert the whole series merged
in by da15b78e52642bd45fd5513abfdf2e58a6f4. Most of that
series consists of preparations to the tests to handle the
weakening of "-c color.ui=always". But it's worth keeping
for a few reasons:

  - there are some other preparatory cleanups, like
e433749d86 (test-terminal: set TERM=vt100, 2017-10-03)

  - it adds "--color" options more consistently in
0c88bf5050 (provide --color option for all ref-filter
users, 2017-10-03)

  - some of the cases dropping "-c" end up being more robust
and realistic tests, as in 01c94e9001 (t7508: use
test_terminal for color output, 2017-10-03)

  - the preferred tool for overriding config is "--color",
and we should be modeling that consistently

We can individually revert the few commits necessary to
restore some useful tests (which will be done on top of this
patch).

Note that this isn't a pure revert; we'll keep the test
added in t3701, but mark it as failure for now.

Signed-off-by: Jeff King 
---
 Documentation/config.txt   | 35 ++-
 color.c|  2 +-
 t/t3701-add-interactive.sh |  2 +-
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b53c994d0a..1ac0ae6adb 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1058,10 +1058,10 @@ clean.requireForce::
 
 color.branch::
A boolean to enable/disable color in the output of
-   linkgit:git-branch[1]. May be set to `false` (or `never`) to
-   disable color entirely, `auto` (or `true` or `always`) in which
-   case colors are used only when the output is to a terminal.  If
-   unset, then the value of `color.ui` is used (`auto` by default).
+   linkgit:git-branch[1]. May be set to `always`,
+   `false` (or `never`) or `auto` (or `true`), in which case colors are 
used
+   only when the output is to a terminal. If unset, then the
+   value of `color.ui` is used (`auto` by default).
 
 color.branch.::
Use customized color for branch coloration. `` is one of
@@ -1072,11 +1072,12 @@ color.branch.::
 
 color.diff::
Whether to use ANSI escape sequences to add color to patches.
-   If this is set to `true` or `auto`, linkgit:git-diff[1],
+   If this is set to `always`, linkgit:git-diff[1],
linkgit:git-log[1], and linkgit:git-show[1] will use color
-   when output is to the terminal. The value `always` is a
-   historical synonym for `auto`.  If unset, then the value of
-   `color.ui` is used (`auto` by default).
+   for all patches.  If it is set to `true` or `auto`, those
+   commands will only use color when output is to the terminal.
+   If unset, then the value of `color.ui` is used (`auto` by
+   default).
 +
 This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
@@ -1140,12 +1141,12 @@ color.grep.::
 --
 
 color.interactive::
-   When set to `true` or `auto`, use colors for interactive prompts
+   When set to `always`, always use colors for interactive prompts
and displays (such as those used by "git-add --interactive" and
-   "git-clean --interactive") when the output is to the terminal.
-   When false (or `never`), never show colors. The value `always`
-   is a historical synonym for `auto`.  If unset, then the value of
-   `color.ui` is used (`auto` by default).
+   "git-clean --interactive"). When false (or `never`), never.
+   When set to `true` or `auto`, use colors only when the output is
+   to the terminal. If unset, then the value of `color.ui` is
+   used (`auto` by default).
 
 color.interactive.::
Use customized color for 'git add --interactive' and 'git clean
@@ -1192,10 +1193,10 @@ color.ui::
configuration to set a default for the `--color` option.  Set 

[PATCH 0/4] peeling back color.ui=always hacks

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 09:06:38AM -0400, Jeff King wrote:

> > I think that it is too late, regardless of our release cycle.
> > 
> > "Plumbing never looks at color.ui" implies that "plumbing must not
> > get color.ui=auto from 4c7f1819", but given that 4c7f1819 is from
> > 2013, I'd be surprised if we stopped coloring output from plumbing
> > without getting any complaints from third-party script writers.
> 
> I agree that 4c7f1819 is the root of things. But there also weren't a
> lot of people complaining about it. I only noticed it as part of other
> work I was doing, and (perhaps foolishly) tried to clean it up.
> 
> All of the regressions people have actually _noticed_ stem from my
> 136c8c8b8f in v2.14.2. So I think it is a viable option to try to go
> back to the pre-v2.14.2 state. I.e.:
> 
>   1. Revert my 20120618c1 (color: downgrade "always" to "auto" only for
>  on-disk configuration, 2017-10-10) and the test changes that came
>  with it.
> 
>   2. Teach for-each-ref and tag to use git_color_default_config(). These
>  are the two I _know_ need this treatment as part of the series that
>  contained 136c8c8b8f. As you've noted there may be others, but I'd
>  be surprised if there are many. There hasn't been a lot of color
>  work in the last few months besides what I've done.
> 
>   3. Revert 136c8c8b8f.
> 
> That takes us back to the pre-regression state. The ancient bug from
> 4c7f1819 still exists, but that would be OK for v2.15. We'd probably
> want to bump the -rc cycle a bit to give more confidence that (2) caught
> everything.

Here's a series which does that. I'm torn on the correct direction, but
I took the time to prepare these patches because I wanted to have two
concrete alternatives to examine, not hand-waving about what one of the
solutions would look like.

I'm adding Jonathan back to the cc as somebody who was interested in the
whole sequence (I think these patches should stand alone as explaining
their reasoning, but you may want to look back in the thread a little
for context).

  [1/4]: Revert "color: make "always" the same as "auto" in config"
  [2/4]: Revert "t6006: drop "always" color config tests"
  [3/4]: Revert "color: check color.ui in git_default_config()"
  [4/4]: tag: respect color.ui config

 Documentation/config.txt   | 35 ++-
 builtin/branch.c   |  2 +-
 builtin/clean.c|  3 ++-
 builtin/grep.c |  2 +-
 builtin/show-branch.c  |  2 +-
 builtin/tag.c  |  2 +-
 color.c| 10 +-
 config.c   |  4 
 diff.c |  3 +++
 t/t6006-rev-list-format.sh | 20 +++-
 t/t6300-for-each-ref.sh|  5 +
 t/t7004-tag.sh |  6 ++
 12 files changed, 62 insertions(+), 32 deletions(-)

-Peff


[PATCH] diff: alias -q to --quiet

2017-10-13 Thread Anthony Sottile
Previously, `-q` was silently ignored:

Before:

$ git diff -q -- Documentation/; echo $?
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index a88c767..aa6e724 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -587,6 +587,7 @@ ifndef::git-log[]
That is, it exits with 1 if there were differences and
0 means no differences.

+-q::
 --quiet::
Disable all output of the program. Implies `--exit-code`.
 endif::git-log[]
0
$

After:
$ ./git diff -q -- Documentation/; echo $?
1
$

Signed-off-by: Anthony Sottile 
---
 Documentation/diff-options.txt | 1 +
 diff.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index a88c767..aa6e724 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -587,6 +587,7 @@ ifndef::git-log[]
That is, it exits with 1 if there were differences and
0 means no differences.
 
+-q::
 --quiet::
Disable all output of the program. Implies `--exit-code`.
 endif::git-log[]
diff --git a/diff.c b/diff.c
index 69f0357..13dfc3e 100644
--- a/diff.c
+++ b/diff.c
@@ -4751,7 +4751,7 @@ int diff_opt_parse(struct diff_options *options,
}
else if (!strcmp(arg, "--exit-code"))
DIFF_OPT_SET(options, EXIT_WITH_STATUS);
-   else if (!strcmp(arg, "--quiet"))
+   else if (!strcmp(arg, "-q") || !strcmp(arg, "--quiet"))
DIFF_OPT_SET(options, QUICK);
else if (!strcmp(arg, "--ext-diff"))
DIFF_OPT_SET(options, ALLOW_EXTERNAL);
-- 
2.7.4



Re: [PATCH v2 2/3] for-each-ref: let upstream/push optionally remote ref name

2017-10-13 Thread Kevin Daudt
On Thu, Oct 05, 2017 at 02:19:15PM +0200, Johannes Schindelin wrote:
> From: J Wyman 
> [..] 
> 
> diff --git a/Documentation/git-for-each-ref.txt 
> b/Documentation/git-for-each-ref.txt
> index 39f50bd53eb..22767025850 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -142,8 +142,9 @@ upstream::
>   encountered. Append `:track,nobracket` to show tracking
>   information without brackets (i.e "ahead N, behind M").
>  +
> -Also respects `:remotename` to state the name of the *remote* instead of
> -the ref.
> +Also respects `:remotename` to state the name of the *remote* instead
> +of the ref, and `:remoteref` to state the name of the *reference* as
> +locally known by the remote.

What does "locally known by the remote" mean in this sentence?

>  +
>  Has no effect if the ref does not have tracking information associated
>  with it.  All the options apart from `nobracket` are mutually exclusive,
> @@ -152,9 +153,9 @@ but if used together the last option is selected.


Re: [PATCH] revision: quit pruning diff more quickly when possible

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 11:37:50AM -0400, Derrick Stolee wrote:

> Thanks, Peff. This patch looks good to me.
> 
> I tried a few other things like adding a flag DIFF_OPT_HAS_ANY_CHANGE next
> to DIFF_OPT_HAS_CHANGES that we could check in diff_can_quit_early() but it
> had side-effects that broke existing tests. From this exploration, it does
> seem necessary to be aware of 'remove_empty_trees'.

Keep in mind that the regular diff_change callbacks already handle this
case[1].

The file_change callbacks are specific to the revision machinery's
pruning diff, and intentionally hold back the HAS_CHANGES flag.

-Peff

[1] I tried "git diff-tree --root -r --quiet 45546f17e" on the bomb
repo, and it went quickly. Dropping --quiet makes it take a really
long time.


Re: [PATCH] revision: quit pruning diff more quickly when possible

2017-10-13 Thread Derrick Stolee

On 10/13/2017 11:27 AM, Jeff King wrote:

On Fri, Oct 13, 2017 at 10:26:46AM -0400, Jeff King wrote:


On Fri, Oct 13, 2017 at 10:25:10AM -0400, Derrick Stolee wrote:


This does appear to be the problem. The missing DIFF_OPT_HAS_CHANGES is
causing diff_can_quit_early() to return false. Due to the corner-case of the
bug it seems it will not be a huge performance improvement in most cases.
Still worth fixing and I'm looking at your suggestions to try and learn this
area better.

Yeah, I just timed some pathspec limits on linux.git, and it makes at
best a fraction of a percent improvement (but any improvement is well
within run-to-run noise). Which is not surprising.

I agree it's worth fixing, though.

Here it is cleaned up and with a commit message. There's another case
that can be optimized, too: --remove-empty with an all-deletions commit.
That's probably even more obscure and pathological, but it was easy to
cover in the same breath.

I didn't bother making a perf script, since this really isn't indicative
of real-world performance. If we wanted to do perf regression tests
here, I think the best path forward would be:

   1. Make sure there the perf tests cover pathspecs (maybe in p0001?).

   2. Make it easy to run the whole perf suite against a "bomb" repo.
  This surely isn't the only slow thing of interest.

-- >8 --
Subject: revision: quit pruning diff more quickly when possible

When the revision traversal machinery is given a pathspec,
we must compute the parent-diff for each commit to determine
which ones are TREESAME. We set the QUICK diff flag to avoid
looking at more entries than we need; we really just care
whether there are any changes at all.

But there is one case where we want to know a bit more: if
--remove-empty is set, we care about finding cases where the
change consists only of added entries (in which case we may
prune the parent in try_to_simplify_commit()). To cover that
case, our file_add_remove() callback does not quit the diff
upon seeing an added entry; it keeps looking for other types
of entries.

But this means when --remove-empty is not set (and it is not
by default), we compute more of the diff than is necessary.
You can see this in a pathological case where a commit adds
a very large number of entries, and we limit based on a
broad pathspec. E.g.:

   perl -e '
 chomp(my $blob = `git hash-object -w --stdin remove_empty_trees. This callback parameter could be
passed to the "add_remove" and "change" callbacks, but
there's not much point. They already receive the
diff_options struct, and doing it this way avoids having to
update the function signature of the other callbacks
(arguably the format_callback and output_prefix functions
could benefit from the same simplification).

Signed-off-by: Jeff King 
---
  diff.h |  1 +
  revision.c | 16 +---
  2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/diff.h b/diff.h
index 7dcfcfbef7..4a34d256f1 100644
--- a/diff.h
+++ b/diff.h
@@ -180,6 +180,7 @@ struct diff_options {
pathchange_fn_t pathchange;
change_fn_t change;
add_remove_fn_t add_remove;
+   void *change_fn_data;
diff_format_fn_t format_callback;
void *format_callback_data;
diff_prefix_fn_t output_prefix;
diff --git a/revision.c b/revision.c
index 8fd222f3bf..a3f245e2cc 100644
--- a/revision.c
+++ b/revision.c
@@ -399,8 +399,16 @@ static struct commit *one_relevant_parent(const struct 
rev_info *revs,
   * if the whole diff is removal of old data, and otherwise
   * REV_TREE_DIFFERENT (of course if the trees are the same we
   * want REV_TREE_SAME).
- * That means that once we get to REV_TREE_DIFFERENT, we do not
- * have to look any further.
+ *
+ * The only time we care about the distinction is when
+ * remove_empty_trees is in effect, in which case we care only about
+ * whether the whole change is REV_TREE_NEW, or if there's another type
+ * of change. Which means we can stop the diff early in either of these
+ * cases:
+ *
+ *   1. We're not using remove_empty_trees at all.
+ *
+ *   2. We saw anything except REV_TREE_NEW.
   */
  static int tree_difference = REV_TREE_SAME;
  
@@ -411,9 +419,10 @@ static void file_add_remove(struct diff_options *options,

const char *fullpath, unsigned dirty_submodule)
  {
int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD;
+   struct rev_info *revs = options->change_fn_data;
  
  	tree_difference |= diff;

-   if (tree_difference == REV_TREE_DIFFERENT)
+   if (!revs->remove_empty_trees || tree_difference != REV_TREE_NEW)
DIFF_OPT_SET(options, HAS_CHANGES);
  }
  
@@ -1351,6 +1360,7 @@ void init_revisions(struct rev_info *revs, const char *prefix)

DIFF_OPT_SET(>pruning, QUICK);
revs->pruning.add_remove = file_add_remove;
revs->pruning.change = file_change;
+   revs->pruning.change_fn_data = revs;
revs->sort_order = 

[PATCH] revision: quit pruning diff more quickly when possible

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 10:26:46AM -0400, Jeff King wrote:

> On Fri, Oct 13, 2017 at 10:25:10AM -0400, Derrick Stolee wrote:
> 
> > This does appear to be the problem. The missing DIFF_OPT_HAS_CHANGES is
> > causing diff_can_quit_early() to return false. Due to the corner-case of the
> > bug it seems it will not be a huge performance improvement in most cases.
> > Still worth fixing and I'm looking at your suggestions to try and learn this
> > area better.
> 
> Yeah, I just timed some pathspec limits on linux.git, and it makes at
> best a fraction of a percent improvement (but any improvement is well
> within run-to-run noise). Which is not surprising.
> 
> I agree it's worth fixing, though.

Here it is cleaned up and with a commit message. There's another case
that can be optimized, too: --remove-empty with an all-deletions commit.
That's probably even more obscure and pathological, but it was easy to
cover in the same breath.

I didn't bother making a perf script, since this really isn't indicative
of real-world performance. If we wanted to do perf regression tests
here, I think the best path forward would be:

  1. Make sure there the perf tests cover pathspecs (maybe in p0001?).

  2. Make it easy to run the whole perf suite against a "bomb" repo.
 This surely isn't the only slow thing of interest.

-- >8 --
Subject: revision: quit pruning diff more quickly when possible

When the revision traversal machinery is given a pathspec,
we must compute the parent-diff for each commit to determine
which ones are TREESAME. We set the QUICK diff flag to avoid
looking at more entries than we need; we really just care
whether there are any changes at all.

But there is one case where we want to know a bit more: if
--remove-empty is set, we care about finding cases where the
change consists only of added entries (in which case we may
prune the parent in try_to_simplify_commit()). To cover that
case, our file_add_remove() callback does not quit the diff
upon seeing an added entry; it keeps looking for other types
of entries.

But this means when --remove-empty is not set (and it is not
by default), we compute more of the diff than is necessary.
You can see this in a pathological case where a commit adds
a very large number of entries, and we limit based on a
broad pathspec. E.g.:

  perl -e '
chomp(my $blob = `git hash-object -w --stdin remove_empty_trees. This callback parameter could be
passed to the "add_remove" and "change" callbacks, but
there's not much point. They already receive the
diff_options struct, and doing it this way avoids having to
update the function signature of the other callbacks
(arguably the format_callback and output_prefix functions
could benefit from the same simplification).

Signed-off-by: Jeff King 
---
 diff.h |  1 +
 revision.c | 16 +---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/diff.h b/diff.h
index 7dcfcfbef7..4a34d256f1 100644
--- a/diff.h
+++ b/diff.h
@@ -180,6 +180,7 @@ struct diff_options {
pathchange_fn_t pathchange;
change_fn_t change;
add_remove_fn_t add_remove;
+   void *change_fn_data;
diff_format_fn_t format_callback;
void *format_callback_data;
diff_prefix_fn_t output_prefix;
diff --git a/revision.c b/revision.c
index 8fd222f3bf..a3f245e2cc 100644
--- a/revision.c
+++ b/revision.c
@@ -399,8 +399,16 @@ static struct commit *one_relevant_parent(const struct 
rev_info *revs,
  * if the whole diff is removal of old data, and otherwise
  * REV_TREE_DIFFERENT (of course if the trees are the same we
  * want REV_TREE_SAME).
- * That means that once we get to REV_TREE_DIFFERENT, we do not
- * have to look any further.
+ *
+ * The only time we care about the distinction is when
+ * remove_empty_trees is in effect, in which case we care only about
+ * whether the whole change is REV_TREE_NEW, or if there's another type
+ * of change. Which means we can stop the diff early in either of these
+ * cases:
+ *
+ *   1. We're not using remove_empty_trees at all.
+ *
+ *   2. We saw anything except REV_TREE_NEW.
  */
 static int tree_difference = REV_TREE_SAME;
 
@@ -411,9 +419,10 @@ static void file_add_remove(struct diff_options *options,
const char *fullpath, unsigned dirty_submodule)
 {
int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD;
+   struct rev_info *revs = options->change_fn_data;
 
tree_difference |= diff;
-   if (tree_difference == REV_TREE_DIFFERENT)
+   if (!revs->remove_empty_trees || tree_difference != REV_TREE_NEW)
DIFF_OPT_SET(options, HAS_CHANGES);
 }
 
@@ -1351,6 +1360,7 @@ void init_revisions(struct rev_info *revs, const char 
*prefix)
DIFF_OPT_SET(>pruning, QUICK);
revs->pruning.add_remove = file_add_remove;
revs->pruning.change = file_change;
+   revs->pruning.change_fn_data = revs;
revs->sort_order = REV_SORT_IN_GRAPH_ORDER;

Re: git-clone causes out of memory

2017-10-13 Thread Derrick Stolee

On 10/13/2017 10:26 AM, Jeff King wrote:

On Fri, Oct 13, 2017 at 10:25:10AM -0400, Derrick Stolee wrote:


This does appear to be the problem. The missing DIFF_OPT_HAS_CHANGES is
causing diff_can_quit_early() to return false. Due to the corner-case of the
bug it seems it will not be a huge performance improvement in most cases.
Still worth fixing and I'm looking at your suggestions to try and learn this
area better.

Yeah, I just timed some pathspec limits on linux.git, and it makes at
best a fraction of a percent improvement (but any improvement is well
within run-to-run noise). Which is not surprising.

I agree it's worth fixing, though.

-Peff


I just ran a first-level folder history in the Windows repository and 
`git rev-list HEAD -- windows >/dev/null` went from 0.43s to 0.04s. So, 
a good percentage improvement but even on this enormous repo the bug 
doesn't present an issue to human users.


Thanks,
-Stolee


Re: git-clone causes out of memory

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 10:25:10AM -0400, Derrick Stolee wrote:

> This does appear to be the problem. The missing DIFF_OPT_HAS_CHANGES is
> causing diff_can_quit_early() to return false. Due to the corner-case of the
> bug it seems it will not be a huge performance improvement in most cases.
> Still worth fixing and I'm looking at your suggestions to try and learn this
> area better.

Yeah, I just timed some pathspec limits on linux.git, and it makes at
best a fraction of a percent improvement (but any improvement is well
within run-to-run noise). Which is not surprising.

I agree it's worth fixing, though.

-Peff


Re: git-clone causes out of memory

2017-10-13 Thread Derrick Stolee

On 10/13/2017 10:20 AM, Jeff King wrote:

On Fri, Oct 13, 2017 at 10:10:18AM -0400, Jeff King wrote:


Hmm. So this patch makes it go fast:

diff --git a/revision.c b/revision.c
index d167223e69..b52ea4e9d8 100644
--- a/revision.c
+++ b/revision.c
@@ -409,7 +409,7 @@ static void file_add_remove(struct diff_options *options,
int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD;
  
  	tree_difference |= diff;

-   if (tree_difference == REV_TREE_DIFFERENT)
+   if (tree_difference & REV_TREE_DIFFERENT)
DIFF_OPT_SET(options, HAS_CHANGES);
  }
  


But that essentially makes the conditional a noop (since we know we set
either NEW or OLD above and DIFFERENT is the union of those flags).

I'm not sure I understand why file_add_remove() would ever want to avoid
setting HAS_CHANGES (certainly its companion file_change() always does).
This goes back to Junio's dd47aa3133 (try-to-simplify-commit: use
diff-tree --quiet machinery., 2007-03-14).

Maybe I am missing something, but AFAICT this was always buggy. But
since it only affects adds and deletes, maybe nobody noticed? I'm also
not sure if it only causes a slowdown, or if this could cause us to
erroneously mark something as TREESAME which isn't (I _do_ think people
would have noticed that).

Answering my own question a little, there is a hint in the comment
a few lines above:

   /*
* The goal is to get REV_TREE_NEW as the result only if the
* diff consists of all '+' (and no other changes), REV_TREE_OLD
* if the whole diff is removal of old data, and otherwise
* REV_TREE_DIFFERENT (of course if the trees are the same we
* want REV_TREE_SAME).
* That means that once we get to REV_TREE_DIFFERENT, we do not
* have to look any further.
*/

So my patch above is breaking that. But it's not clear from that comment
why we care about knowing the different between NEW, OLD, and DIFFERENT.

Grepping around for REV_TREE_NEW and REV_TREE_OLD, I think the answer is
in try_to_simplify_commit():

  case REV_TREE_NEW:
   if (revs->remove_empty_trees &&
   rev_same_tree_as_empty(revs, p)) {
   /* We are adding all the specified
* paths from this parent, so the
* history beyond this parent is not
* interesting.  Remove its parents
* (they are grandparents for us).
* IOW, we pretend this parent is a
* "root" commit.
*/
   if (parse_commit(p) < 0)
   die("cannot simplify commit %s (invalid %s)",
   oid_to_hex(>object.oid),
   oid_to_hex(>object.oid));
   p->parents = NULL;
   }
 /* fallthrough */
 case REV_TREE_OLD:
 case REV_TREE_DIFFERENT:

So when --remove-empty is not in effect (and it's not by default), we
don't care about OLD vs NEW, and we should be able to optimize further.

-Peff


This does appear to be the problem. The missing DIFF_OPT_HAS_CHANGES is 
causing diff_can_quit_early() to return false. Due to the corner-case of 
the bug it seems it will not be a huge performance improvement in most 
cases. Still worth fixing and I'm looking at your suggestions to try and 
learn this area better.


It will speed up natural cases of someone adding or renaming a folder 
with a lot of contents, including someone initializing a repository from 
an existing codebase. This git bomb case happens to add all of the 
folders at once, which is why the performance is so noticeable. I use a 
version of this "exponential file growth" for testing that increases the 
folder-depth one commit at a time, so the contents are rather small in 
the first "add", hence not noticing this issue.


Thanks,
-Stolee


Re: git-clone causes out of memory

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 10:10:18AM -0400, Jeff King wrote:

> Hmm. So this patch makes it go fast:
> 
> diff --git a/revision.c b/revision.c
> index d167223e69..b52ea4e9d8 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -409,7 +409,7 @@ static void file_add_remove(struct diff_options *options,
>   int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD;
>  
>   tree_difference |= diff;
> - if (tree_difference == REV_TREE_DIFFERENT)
> + if (tree_difference & REV_TREE_DIFFERENT)
>   DIFF_OPT_SET(options, HAS_CHANGES);
>  }
>  
> 
> But that essentially makes the conditional a noop (since we know we set
> either NEW or OLD above and DIFFERENT is the union of those flags).
> 
> I'm not sure I understand why file_add_remove() would ever want to avoid
> setting HAS_CHANGES (certainly its companion file_change() always does).
> This goes back to Junio's dd47aa3133 (try-to-simplify-commit: use
> diff-tree --quiet machinery., 2007-03-14).
> 
> Maybe I am missing something, but AFAICT this was always buggy. But
> since it only affects adds and deletes, maybe nobody noticed? I'm also
> not sure if it only causes a slowdown, or if this could cause us to
> erroneously mark something as TREESAME which isn't (I _do_ think people
> would have noticed that).

Answering my own question a little, there is a hint in the comment
a few lines above:

  /*
   * The goal is to get REV_TREE_NEW as the result only if the
   * diff consists of all '+' (and no other changes), REV_TREE_OLD
   * if the whole diff is removal of old data, and otherwise
   * REV_TREE_DIFFERENT (of course if the trees are the same we
   * want REV_TREE_SAME).
   * That means that once we get to REV_TREE_DIFFERENT, we do not
   * have to look any further.
   */

So my patch above is breaking that. But it's not clear from that comment
why we care about knowing the different between NEW, OLD, and DIFFERENT.

Grepping around for REV_TREE_NEW and REV_TREE_OLD, I think the answer is
in try_to_simplify_commit():

 case REV_TREE_NEW:
  if (revs->remove_empty_trees &&
  rev_same_tree_as_empty(revs, p)) {
  /* We are adding all the specified
   * paths from this parent, so the
   * history beyond this parent is not
   * interesting.  Remove its parents
   * (they are grandparents for us).
   * IOW, we pretend this parent is a
   * "root" commit.
   */
  if (parse_commit(p) < 0)
  die("cannot simplify commit %s (invalid %s)",
  oid_to_hex(>object.oid),
  oid_to_hex(>object.oid));
  p->parents = NULL;
  }
/* fallthrough */
case REV_TREE_OLD:
case REV_TREE_DIFFERENT:

So when --remove-empty is not in effect (and it's not by default), we
don't care about OLD vs NEW, and we should be able to optimize further.

-Peff


Re: git-clone causes out of memory

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 09:56:36AM -0400, Jeff King wrote:

> On Fri, Oct 13, 2017 at 09:55:15AM -0400, Derrick Stolee wrote:
> 
> > > We should be comparing an empty tree and d0/d0/d0/d0 (or however deep
> > > your pathspec goes). We should be able to see immediately that the entry
> > > is not present between the two and not bother descending. After all,
> > > we've set the QUICK flag in init_revisions(). So the real question is
> > > why QUICK is not kicking in.
> > 
> > I'm struggling to understand your meaning. We want to walk from root to
> > d0/d0/d0/d0, but there is no reason to walk beyond that tree. But maybe
> > that's what the QUICK flag is supposed to do.
> 
> Yes, that's exactly what it is for. When we see the first difference we
> should say "aha, the caller only wanted to know whether there was a
> difference, not what it was" and return immediately. See
> diff_can_quit_early().

Hmm. So this patch makes it go fast:

diff --git a/revision.c b/revision.c
index d167223e69..b52ea4e9d8 100644
--- a/revision.c
+++ b/revision.c
@@ -409,7 +409,7 @@ static void file_add_remove(struct diff_options *options,
int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD;
 
tree_difference |= diff;
-   if (tree_difference == REV_TREE_DIFFERENT)
+   if (tree_difference & REV_TREE_DIFFERENT)
DIFF_OPT_SET(options, HAS_CHANGES);
 }
 

But that essentially makes the conditional a noop (since we know we set
either NEW or OLD above and DIFFERENT is the union of those flags).

I'm not sure I understand why file_add_remove() would ever want to avoid
setting HAS_CHANGES (certainly its companion file_change() always does).
This goes back to Junio's dd47aa3133 (try-to-simplify-commit: use
diff-tree --quiet machinery., 2007-03-14).

Maybe I am missing something, but AFAICT this was always buggy. But
since it only affects adds and deletes, maybe nobody noticed? I'm also
not sure if it only causes a slowdown, or if this could cause us to
erroneously mark something as TREESAME which isn't (I _do_ think people
would have noticed that).

-Peff


Re: git-clone causes out of memory

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 09:55:15AM -0400, Derrick Stolee wrote:

> > We should be comparing an empty tree and d0/d0/d0/d0 (or however deep
> > your pathspec goes). We should be able to see immediately that the entry
> > is not present between the two and not bother descending. After all,
> > we've set the QUICK flag in init_revisions(). So the real question is
> > why QUICK is not kicking in.
> 
> I'm struggling to understand your meaning. We want to walk from root to
> d0/d0/d0/d0, but there is no reason to walk beyond that tree. But maybe
> that's what the QUICK flag is supposed to do.

Yes, that's exactly what it is for. When we see the first difference we
should say "aha, the caller only wanted to know whether there was a
difference, not what it was" and return immediately. See
diff_can_quit_early().

-Peff


Re: git-clone causes out of memory

2017-10-13 Thread Derrick Stolee

On 10/13/2017 9:50 AM, Jeff King wrote:

On Fri, Oct 13, 2017 at 09:39:14AM -0400, Derrick Stolee wrote:


Since I don't understand enough about the consumers to diff_tree_oid() (and
the fact that the recursive behavior may be wanted in some cases), I think
we can fix this in builtin/rev-list.c with this simple diff:

---

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ded1577424..b2e8e02cc8 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -285,6 +285,9 @@ int cmd_rev_list(int argc, const char **argv, const char
*prefix)

     git_config(git_default_config, NULL);
     init_revisions(, prefix);
+
+   revs.pruning.flags = revs.pruning.flags & ~DIFF_OPT_RECURSIVE;
+


(Note: I'm running tests now and see that this breaks behavior. 
Definitely not the solution we want.)



Hmm, this feels wrong, because we _do_ want to recurse down and follow
the pathspec to see if there are real changes.

We should be comparing an empty tree and d0/d0/d0/d0 (or however deep
your pathspec goes). We should be able to see immediately that the entry
is not present between the two and not bother descending. After all,
we've set the QUICK flag in init_revisions(). So the real question is
why QUICK is not kicking in.

-Peff


I'm struggling to understand your meaning. We want to walk from root to 
d0/d0/d0/d0, but there is no reason to walk beyond that tree. But maybe 
that's what the QUICK flag is supposed to do.


Thanks,
-Stolee


Re: git-clone causes out of memory

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 09:39:14AM -0400, Derrick Stolee wrote:

> Since I don't understand enough about the consumers to diff_tree_oid() (and
> the fact that the recursive behavior may be wanted in some cases), I think
> we can fix this in builtin/rev-list.c with this simple diff:
> 
> ---
> 
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index ded1577424..b2e8e02cc8 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -285,6 +285,9 @@ int cmd_rev_list(int argc, const char **argv, const char
> *prefix)
> 
>     git_config(git_default_config, NULL);
>     init_revisions(, prefix);
> +
> +   revs.pruning.flags = revs.pruning.flags & ~DIFF_OPT_RECURSIVE;
> +

Hmm, this feels wrong, because we _do_ want to recurse down and follow
the pathspec to see if there are real changes.

We should be comparing an empty tree and d0/d0/d0/d0 (or however deep
your pathspec goes). We should be able to see immediately that the entry
is not present between the two and not bother descending. After all,
we've set the QUICK flag in init_revisions(). So the real question is
why QUICK is not kicking in.

-Peff


Re: git-clone causes out of memory

2017-10-13 Thread Derrick Stolee

On 10/13/2017 9:15 AM, Derrick Stolee wrote:

On 10/13/2017 8:44 AM, Jeff King wrote:

On Fri, Oct 13, 2017 at 03:12:43PM +0300, Constantine wrote:


On 13.10.2017 15:04, Junio C Hamano wrote:

Christian Couder  writes:


Yeah, but perhaps Git could be smarter when rev-listing too and avoid
processing files or directories it has already seen?

Aren't you suggesting to optimize for a wrong case?

Anything that is possible with a software should be considered as a 
possible
usecase. It's in fact a DoS attack. Imagine there's a server that 
using git
to process something, and now there's a way to knock down this 
server. It's

also bad from a promotional stand point.

But the point is that you'd have the same problem with any repository
that had 10^7 files in it. Yes, it's convenient for the attacker that
there are only 9 objects, but fundamentally it's pretty easy for an
attacker to construct repositories that have large trees (or very deep
trees -- that's what causes stack exhaustion in some cases).

Note too that this attack almost always comes down to the diff code
(which is why it kicks in for pathspec limiting) which has to actually
expand the tree. Most "normal" server-side operations (like accepting
pushes or serving fetches) operate only on the object graph and _do_
avoid processing already-seen objects.

As soon as servers start trying to checkout or diff, though, the attack
surface gets quite large. And you really need to start thinking about
having resource limits and quotas for CPU and memory use of each process
(and group by requesting user, IP, repo, etc).

I think the main thing Git could be doing here is to limit the size of
the tree (both width and depth). But arbitrary limits like that have a
way of being annoying, and I think it just pushes resource-exhaustion
attacks off a little (e.g., can you construct a blob that behaves badly
with the "--patch"?).

-Peff


I'm particularly interested in why `git rev-list HEAD -- [path]` gets 
slower in this case, because I wrote the history algorithm used by 
VSTS. In our algorithm, we only walk the list of objects from commit 
to the tree containing the path item. For example, in the path 
d0/d0/d0, we would only walk:


    commit --root--> tree --d0--> tree --d0--> tree [parse oid for d0 
entry]


From this, we can determine the TREESAME relationship by parsing four 
objects without parsing all contents below d0/d0/d0.


The reason we have exponential behavior in `git rev-list` is because 
we are calling diff_tree_oid() in tree-diff.c recursively without 
short-circuiting on equal OIDs.


I will prepare a patch that adds this OID-equal short-circuit to avoid 
this exponential behavior. I'll model my patch against a similar patch 
in master:


    commit d12a8cf0af18804c2000efc7a0224da631e04cd1 unpack-trees: 
avoid duplicate ODB lookups during checkout


It will also significantly speed up rev-list calls for short paths in 
deep repositories. It will not be very measurable in the git or Linux 
repos because their shallow folder structure.


Thanks,
-Stolee


Since I don't understand enough about the consumers to diff_tree_oid() 
(and the fact that the recursive behavior may be wanted in some cases), 
I think we can fix this in builtin/rev-list.c with this simple diff:


---

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ded1577424..b2e8e02cc8 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -285,6 +285,9 @@ int cmd_rev_list(int argc, const char **argv, const 
char *prefix)


    git_config(git_default_config, NULL);
    init_revisions(, prefix);
+
+   revs.pruning.flags = revs.pruning.flags & ~DIFF_OPT_RECURSIVE;
+
    revs.abbrev = DEFAULT_ABBREV;
    revs.commit_format = CMIT_FMT_UNSPECIFIED;
    argc = setup_revisions(argc, argv, , NULL);



Re: git-clone causes out of memory

2017-10-13 Thread Derrick Stolee

On 10/13/2017 8:44 AM, Jeff King wrote:

On Fri, Oct 13, 2017 at 03:12:43PM +0300, Constantine wrote:


On 13.10.2017 15:04, Junio C Hamano wrote:

Christian Couder  writes:


Yeah, but perhaps Git could be smarter when rev-listing too and avoid
processing files or directories it has already seen?

Aren't you suggesting to optimize for a wrong case?


Anything that is possible with a software should be considered as a possible
usecase. It's in fact a DoS attack. Imagine there's a server that using git
to process something, and now there's a way to knock down this server. It's
also bad from a promotional stand point.

But the point is that you'd have the same problem with any repository
that had 10^7 files in it. Yes, it's convenient for the attacker that
there are only 9 objects, but fundamentally it's pretty easy for an
attacker to construct repositories that have large trees (or very deep
trees -- that's what causes stack exhaustion in some cases).

Note too that this attack almost always comes down to the diff code
(which is why it kicks in for pathspec limiting) which has to actually
expand the tree. Most "normal" server-side operations (like accepting
pushes or serving fetches) operate only on the object graph and _do_
avoid processing already-seen objects.

As soon as servers start trying to checkout or diff, though, the attack
surface gets quite large. And you really need to start thinking about
having resource limits and quotas for CPU and memory use of each process
(and group by requesting user, IP, repo, etc).

I think the main thing Git could be doing here is to limit the size of
the tree (both width and depth). But arbitrary limits like that have a
way of being annoying, and I think it just pushes resource-exhaustion
attacks off a little (e.g., can you construct a blob that behaves badly
with the "--patch"?).

-Peff


I'm particularly interested in why `git rev-list HEAD -- [path]` gets 
slower in this case, because I wrote the history algorithm used by VSTS. 
In our algorithm, we only walk the list of objects from commit to the 
tree containing the path item. For example, in the path d0/d0/d0, we 
would only walk:


    commit --root--> tree --d0--> tree --d0--> tree [parse oid for d0 
entry]


From this, we can determine the TREESAME relationship by parsing four 
objects without parsing all contents below d0/d0/d0.


The reason we have exponential behavior in `git rev-list` is because we 
are calling diff_tree_oid() in tree-diff.c recursively without 
short-circuiting on equal OIDs.


I will prepare a patch that adds this OID-equal short-circuit to avoid 
this exponential behavior. I'll model my patch against a similar patch 
in master:


    commit d12a8cf0af18804c2000efc7a0224da631e04cd1 unpack-trees: avoid 
duplicate ODB lookups during checkout


It will also significantly speed up rev-list calls for short paths in 
deep repositories. It will not be very measurable in the git or Linux 
repos because their shallow folder structure.


Thanks,
-Stolee


Re: [PATCH 3/3] branch: forbid refs/heads/HEAD

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 02:11:32PM +0900, Junio C Hamano wrote:

> strbuf_check_branch_ref() is the central place where many codepaths
> see if a proposed name is suitable for the name of a branch.  It was
> designed to allow us to get stricter than the check_refname_format()
> check used for refnames in general, and we already use it to reject
> a branch whose name begins with a '-'.
> 
> Use it to also reject "HEAD" as a branch name.

Heh, I just pointed somebody to this a day or two ago as #leftoverbit. I
guess it's taken now. :)

Related to this: should we do a better job of confirming that the
refname is available for use?

If you do:

  git branch foo
  git checkout -b foo/bar

then "foo/bar" is not available. And for "checkout -b", we'd notice when
we tried to create the ref. But for:

  git checkout --orphan foo/bar

we'd update HEAD with a non-viable name, and only find out later during
"git commit". That's not the end of the world, but it might be nice to
complain when writing the symlink.

Largely orthogonal to the problem you're solving here, but I suspect it
may touch the same code, so it might be worth thinking about while we're
here.

> diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
> index e88349c8a0..3ecb2eab0c 100755
> --- a/t/t1430-bad-ref-name.sh
> +++ b/t/t1430-bad-ref-name.sh
> @@ -331,4 +331,12 @@ test_expect_success 'update-ref --stdin -z fails delete 
> with bad ref name' '
>   grep "fatal: invalid ref format: ~a" err
>  '
>  
> +test_expect_success 'branch rejects HEAD as a branch name' '
> + test_must_fail git branch HEAD HEAD^
> +'
> +
> +test_expect_success 'checkout -b rejects HEAD as a branch name' '
> + test_must_fail git checkout -B HEAD HEAD^
> +'

Should we test that:

  git update-ref refs/heads/HEAD HEAD^

continues to work?

-Peff


Re: [PATCH 1/2] color: downgrade "always" to "auto" only for on-disk configuration

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 12:37:57PM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > OK. For the record, I'm not against scrapping this whole thing and
> > trying to rollback to your "plumbing never looks at color.ui" proposal.
> > It's quite late in the -rc cycle to do that, but there's nothing that
> > says we can't bump the release date if that's what we need to do to get
> > it right.
> 
> I think that it is too late, regardless of our release cycle.
> 
> "Plumbing never looks at color.ui" implies that "plumbing must not
> get color.ui=auto from 4c7f1819", but given that 4c7f1819 is from
> 2013, I'd be surprised if we stopped coloring output from plumbing
> without getting any complaints from third-party script writers.

I agree that 4c7f1819 is the root of things. But there also weren't a
lot of people complaining about it. I only noticed it as part of other
work I was doing, and (perhaps foolishly) tried to clean it up.

All of the regressions people have actually _noticed_ stem from my
136c8c8b8f in v2.14.2. So I think it is a viable option to try to go
back to the pre-v2.14.2 state. I.e.:

  1. Revert my 20120618c1 (color: downgrade "always" to "auto" only for
 on-disk configuration, 2017-10-10) and the test changes that came
 with it.

  2. Teach for-each-ref and tag to use git_color_default_config(). These
 are the two I _know_ need this treatment as part of the series that
 contained 136c8c8b8f. As you've noted there may be others, but I'd
 be surprised if there are many. There hasn't been a lot of color
 work in the last few months besides what I've done.

  3. Revert 136c8c8b8f.

That takes us back to the pre-regression state. The ancient bug from
4c7f1819 still exists, but that would be OK for v2.15. We'd probably
want to bump the -rc cycle a bit to give more confidence that (2) caught
everything.

Post-release, we would either:

  a. Do nothing. As far as we know, nobody has cared deeply about
 4c7f1819 for the past 4 years.

  b. Teach git_default_config() to respect "never" but not "always", so
 that you can disable the auto-color in the plumbing (but not shoot
 yourself in the foot).

  c. Go all-out and remove the "auto" behavior from plumbing. This is
 much more likely to have fallouts since we've had 4 years of the
 wrong behavior. But we'd have a whole cycle to identify
 regressions.

I'd probably vote for (b), followed by (a). Option (c) seems like a lot
of risk for little benefit.

But we could punt on that part until after the release. The only thing
we'd need to decide on now is that first set of reversions. What I
really _don't_ want to do is ship v2.15 with "always works like auto"
and then flip that back in v2.16.

I know you're probably infuriated with me because I'm essentially
arguing the opposite of what I did earlier in the cycle. And I really am
OK with going either way (shipping what's in -rc1 plus the "let 'always'
work from the command line", or doing something like what I outlined
above). But you've convinced me that the road I was going down really is
piling up the hacks, and I want to make it clear that I'm not married to
following the path I outlined earlier, and that I think there _is_ a
viable alternative.

-Peff


Re: [ANNOUNCE] Git for Windows 2.14.2(3)

2017-10-13 Thread Steve Hoelzer
Johannes,

On Thu, Oct 12, 2017 at 5:53 PM, Johannes Schindelin
 wrote:
> Dear Git users,
>
> It is my pleasure to announce that Git for Windows 2.14.2(3) is available 
> from:
>
> https://git-for-windows.github.io/
>
> Changes since Git for Windows v2.14.2(2) (October 5th 2017)
>
> New Features
>
>   * Comes with Git LFS v2.3.3.

I just ran "git update" and afterward "git version" reported
2.14.2(3), but "git lfs version" still said 2.3.2.

I also uninstalled/reinstalled Git for Windows and LFS is still 2.3.2.

Steve


Re: git-clone causes out of memory

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 03:12:43PM +0300, Constantine wrote:

> On 13.10.2017 15:04, Junio C Hamano wrote:
> > Christian Couder  writes:
> > 
> > > Yeah, but perhaps Git could be smarter when rev-listing too and avoid
> > > processing files or directories it has already seen?
> > 
> > Aren't you suggesting to optimize for a wrong case?
> > 
> 
> Anything that is possible with a software should be considered as a possible
> usecase. It's in fact a DoS attack. Imagine there's a server that using git
> to process something, and now there's a way to knock down this server. It's
> also bad from a promotional stand point.

But the point is that you'd have the same problem with any repository
that had 10^7 files in it. Yes, it's convenient for the attacker that
there are only 9 objects, but fundamentally it's pretty easy for an
attacker to construct repositories that have large trees (or very deep
trees -- that's what causes stack exhaustion in some cases).

Note too that this attack almost always comes down to the diff code
(which is why it kicks in for pathspec limiting) which has to actually
expand the tree. Most "normal" server-side operations (like accepting
pushes or serving fetches) operate only on the object graph and _do_
avoid processing already-seen objects.

As soon as servers start trying to checkout or diff, though, the attack
surface gets quite large. And you really need to start thinking about
having resource limits and quotas for CPU and memory use of each process
(and group by requesting user, IP, repo, etc).

I think the main thing Git could be doing here is to limit the size of
the tree (both width and depth). But arbitrary limits like that have a
way of being annoying, and I think it just pushes resource-exhaustion
attacks off a little (e.g., can you construct a blob that behaves badly
with the "--patch"?).

-Peff


Re: git-clone causes out of memory

2017-10-13 Thread Jeff King
On Fri, Oct 13, 2017 at 07:06:03PM +0900, Mike Hommey wrote:

> On Fri, Oct 13, 2017 at 12:51:58PM +0300, Constantine wrote:
> > There's a gitbomb on github. It is undoubtedly creative and funny, but since
> > this is a bug in git, I thought it'd be nice to report. The command:
> > 
> > $ git clone https://github.com/x0rz/ShadowBrokersFiles
> 
> What fills memory is actually the checkout part of the command. git
> clone -n doesn't fail.
> 
> Credit should go where it's due: https://kate.io/blog/git-bomb/
> (with the bonus that it comes with explanations)

That is a nice explanation, and I think the dates point to the kate.io
post as the parent there.

I certainly have come across this type of "bomb" repo before, but I
couldn't come up with any public references (somebody uploaded such a
repository to GitHub in 2014). So I think this may be the first public
discussion of the idea.

-Peff


Re: git-clone causes out of memory

2017-10-13 Thread Constantine

On 13.10.2017 15:04, Junio C Hamano wrote:

Christian Couder  writes:


Yeah, but perhaps Git could be smarter when rev-listing too and avoid
processing files or directories it has already seen?


Aren't you suggesting to optimize for a wrong case?



Anything that is possible with a software should be considered as a 
possible usecase. It's in fact a DoS attack. Imagine there's a server 
that using git to process something, and now there's a way to knock down 
this server. It's also bad from a promotional stand point.


Re: git-clone causes out of memory

2017-10-13 Thread Junio C Hamano
Christian Couder  writes:

> Yeah, but perhaps Git could be smarter when rev-listing too and avoid
> processing files or directories it has already seen?

Aren't you suggesting to optimize for a wrong case?


Re: [PATCH v2] Documentation/git-config.txt: reword missleading sentence

2017-10-13 Thread Junio C Hamano
Moy Matthieu  writes:

>>>  --path::
>>> -   'git-config' will expand leading '{tilde}' to the value of
>>> +   'git config' will expand leading '{tilde}' to the value of
>>> '$HOME', and '{tilde}user' to the home directory for the
>
> Didn't notice yesterday, but you still have forward quotes here and
> backquotes right below. If you are to fix this paragraph, better fix all
> issues at once.

When we say ~user in this sentence, unlike $HOME, it is not
something the user would type literally; 'user' in that is a
placeholder to be replaced with a value appropriate in the real
life, e.g. ~moy.  So '{tilde}user' may actually be OK, even though I
agree that `$HOME` may be more correct.

>>> specified user.  This option has no effect when setting the
>>> -   value (but you can use 'git config bla {tilde}/' from the
>>> -   command line to let your shell do the expansion).
>>> +   value (but you can use `git config section.variable {tilde}/`
>>
>> Does this reference to {tilde} get expanded inside the `literal`
>> mark-up?  ...
>
> If I read correctly, the potential issue with ~ is that it's used for
> subscript text (i.e. foo~bar~ in asciidoc is LaTeX's $foo_{bar}$). But ~
> within a literal string should be safe, and at least we use it in many
> places in our doc.

My comment was not about "safety" but about correctness.  At least
for me, `{tilde}user` does not expand to ~user, but instead spell
out open-brace, tee, eye, ..., close-brace, followed by "user",
which is not what we want.


Congratulations!

2017-10-13 Thread Oxfam Team
You won a donation funds of one million united states dollar. get back for 
more details


Re: [PATCH/RFC] git-post: the opposite of git-cherry-pick

2017-10-13 Thread Ævar Arnfjörð Bjarmason

On Thu, Oct 05 2017, Johannes Sixt jotted:

> Am 05.10.2017 um 21:33 schrieb Stefan Beller:
>> On Thu, Oct 5, 2017 at 12:13 PM, Johannes Sixt  wrote:
>>> +git-post(1)
>>> +===
>>> +
>>> +NAME
>>> +
>>> +git-post - Apply a commit on top of a branch that is not checked out

It would be great to have this somehow, whatever the UI ends up being.

>> Contrasted to:
>>git-cherry-pick - Apply the changes introduced by some existing commits
>> Some questions on top of my head:
>> * Do we want to emphasize the cherry-pick to say checked out?
>
> Maybe. Maybe "cherry picking" is sufficiently clear that it is not needed.
>
>> * Is it a good design choice to have a different command, just because the
>>target branch is [not] checked out?
>
> I would not want to make it a sub-mode of cherry-pick, if that is what
> you mean, because "cherry picking" is about getting something, not
> giving something away.

It occurs to me that a better long-term UI design might be to make
git-{cherry-pick,pick) some sort of submodes for git-commit.

Right now git-commit picks the current index for committing, but "use a
patch as the source instead" could be a submode.

Right now it commits that change on top of your checked out commit, but
"other non-checked-out target commit" could be a mode instead,
i.e. exposing more of the power of the underlying git-commit-tree.

Just food for thought.

>> * Naming: I just searched for synonyms "opposite of cherry-picking" and
>>came up with cherry-put, cherry-post, target-commit
>
> With command line completion, we have to type 'git cherr-'
> currently. If we add another command that begins with 'cherry-',
> another  is needed. Therefore, I do not want to use a name
> starting with 'cherry-'.
>
>> * What was the rationale for naming it "post" (it sounds very generic to me)?
>
> Yes, it is generic, but I did not find a better word that means "give
> away" a commit. Perhaps "tack"?

[Not entirely serious]. Well if cherry-picking is taking a thing and
eating it here, maybe git-cherry-puke takes an already digested thing
and "throws" it elsewhere ?:)

It's a silly name but it's somewhat symmetric :)

>> * Does it play well with worktrees? (i.e. if a worktree has a branch checked
>>out, we cannot post there, or can we?)
>
> Good point. It should be forbidden, but there are no safety checks,
> currently.
>
>>> +EXAMPLES
>>> +
>>> +
>>> +Assume, while working on a topic, you find and fix an unrelated bug.
>>> +Now:
>>> +
>>> +
>>> +$ git commit   <1>
>>> +$ git post master  <2>
>>> +$ git show | git apply -R && git reset HEAD^   <3>
>>> +
>>> +
>>> +<1> create a commit with the fix on the current branch
>>> +<2> copy the fix onto the branch where it ought to be
>>> +<3> revert current topic branch to the unfixed state;
>>> +can also be done with `git reset --keep HEAD^` if there are no
>>> +unstaged changes in files that are modified by the fix
>>> +
>>> +Oftentimes, switching branches triggers a rebuild of a code base.
>>> +With the sequence above the branch switch can be avoided.
>>> +That said, it is good practice to test the bug fix on the
>>> +destination branch eventually.
>>
>> This is a cool and motivating example.
>
> Thanks. Another use case is when you receive a patch to be applied on
> a different branch while you are in the middle of some work. If it can
> be applied on the current branch, then you can post it to the
> destination, rewind, and continue with your work.
>
>>> +BUGS
>>> +
>>> +
>>> +The change can be applied on `dest-branch` only if there is
>>> +no textual conflict.
>>
>> This is not a bug per se, it could be signaled via a return code that
>> the posting was unsuccessful.
>
> Oh, it does so return an error. But there are some cases that one
> expects that work, but where git-apply is not capable enough.
>
> -- Hannes


Re: git-clone causes out of memory

2017-10-13 Thread Christian Couder
On Fri, Oct 13, 2017 at 12:37 PM, Mike Hommey  wrote:
> On Fri, Oct 13, 2017 at 12:26:46PM +0200, Christian Couder wrote:
>>
>> After cloning it with -n, there is the following "funny" situation:
>>
>> $ time git rev-list HEAD
>> 7af99c9e7d4768fa681f4fe4ff61259794cf719b
>> 18ed56cbc5012117e24a603e7c072cf65d36d469
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real0m0.004s
>> user0m0.000s
>> sys 0m0.004s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0/f0
>>
>> real0m0.004s
>> user0m0.000s
>> sys 0m0.000s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0
>>
>> real0m0.004s
>> user0m0.000s
>> sys 0m0.000s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real0m0.005s
>> user0m0.008s
>> sys 0m0.000s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real0m0.203s
>> user0m0.112s
>> sys 0m0.088s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real0m1.305s
>> user0m0.720s
>> sys 0m0.580s
>> $ time git rev-list HEAD -- d0/d0/d0/
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real0m12.135s
>> user0m6.700s
>> sys 0m5.412s
>>
>> So `git rev-list` becomes exponentially more expensive when you run it
>> on a shorter directory path, though it is fast if you run it without a
>> path.
>
> That's because there are 10^7 files under d0/d0/d0, 10^6 under
> d0/d0/d0/d0/, 10^5 under d0/d0/d0/d0/d0/ etc.
>
> So really, this is all about things being slower when there's a crazy
> number of files. Picture me surprised.
>
> What makes it kind of special is that the repository contains a lot of
> paths/files, but very few objects, because it's duplicating everything.
>
> All the 10^10 blobs have the same content, all the 10^9 trees that point
> to them have the same content, all the 10^8 trees that point to those
> trees have the same content, etc.
>
> If git wasn't effectively deduplicating identical content, the repository
> would be multiple gigabytes large.

Yeah, but perhaps Git could be smarter when rev-listing too and avoid
processing files or directories it has already seen?


Re: git-clone causes out of memory

2017-10-13 Thread Mike Hommey
On Fri, Oct 13, 2017 at 12:26:46PM +0200, Christian Couder wrote:
> On Fri, Oct 13, 2017 at 12:06 PM, Mike Hommey  wrote:
> > On Fri, Oct 13, 2017 at 12:51:58PM +0300, Constantine wrote:
> >> There's a gitbomb on github. It is undoubtedly creative and funny, but 
> >> since
> >> this is a bug in git, I thought it'd be nice to report. The command:
> >>
> >>   $ git clone https://github.com/x0rz/ShadowBrokersFiles
> >
> > What fills memory is actually the checkout part of the command. git
> > clone -n doesn't fail.
> >
> > Credit should go where it's due: https://kate.io/blog/git-bomb/
> > (with the bonus that it comes with explanations)
> 
> Yeah, there is a thread on Hacker News about this too:
> 
> https://news.ycombinator.com/item?id=15457076
> 
> The original repo on GitHub is:
> 
> https://github.com/Katee/git-bomb.git
> 
> After cloning it with -n, there is the following "funny" situation:
> 
> $ time git rev-list HEAD
> 7af99c9e7d4768fa681f4fe4ff61259794cf719b
> 18ed56cbc5012117e24a603e7c072cf65d36d469
> 45546f17e5801791d4bc5968b91253a2f4b0db72
> 
> real0m0.004s
> user0m0.000s
> sys 0m0.004s
> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0/f0
> 
> real0m0.004s
> user0m0.000s
> sys 0m0.000s
> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0
> 
> real0m0.004s
> user0m0.000s
> sys 0m0.000s
> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/
> 45546f17e5801791d4bc5968b91253a2f4b0db72
> 
> real0m0.005s
> user0m0.008s
> sys 0m0.000s
> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/
> 45546f17e5801791d4bc5968b91253a2f4b0db72
> 
> real0m0.203s
> user0m0.112s
> sys 0m0.088s
> $ time git rev-list HEAD -- d0/d0/d0/d0/
> 45546f17e5801791d4bc5968b91253a2f4b0db72
> 
> real0m1.305s
> user0m0.720s
> sys 0m0.580s
> $ time git rev-list HEAD -- d0/d0/d0/
> 45546f17e5801791d4bc5968b91253a2f4b0db72
> 
> real0m12.135s
> user0m6.700s
> sys 0m5.412s
> 
> So `git rev-list` becomes exponentially more expensive when you run it
> on a shorter directory path, though it is fast if you run it without a
> path.

That's because there are 10^7 files under d0/d0/d0, 10^6 under
d0/d0/d0/d0/, 10^5 under d0/d0/d0/d0/d0/ etc.

So really, this is all about things being slower when there's a crazy
number of files. Picture me surprised.

What makes it kind of special is that the repository contains a lot of
paths/files, but very few objects, because it's duplicating everything.

All the 10^10 blobs have the same content, all the 10^9 trees that point
to them have the same content, all the 10^8 trees that point to those
trees have the same content, etc.

If git wasn't effectively deduplicating identical content, the repository
would be multiple gigabytes large.

Mike


Re: git-clone causes out of memory

2017-10-13 Thread Christian Couder
On Fri, Oct 13, 2017 at 12:06 PM, Mike Hommey  wrote:
> On Fri, Oct 13, 2017 at 12:51:58PM +0300, Constantine wrote:
>> There's a gitbomb on github. It is undoubtedly creative and funny, but since
>> this is a bug in git, I thought it'd be nice to report. The command:
>>
>>   $ git clone https://github.com/x0rz/ShadowBrokersFiles
>
> What fills memory is actually the checkout part of the command. git
> clone -n doesn't fail.
>
> Credit should go where it's due: https://kate.io/blog/git-bomb/
> (with the bonus that it comes with explanations)

Yeah, there is a thread on Hacker News about this too:

https://news.ycombinator.com/item?id=15457076

The original repo on GitHub is:

https://github.com/Katee/git-bomb.git

After cloning it with -n, there is the following "funny" situation:

$ time git rev-list HEAD
7af99c9e7d4768fa681f4fe4ff61259794cf719b
18ed56cbc5012117e24a603e7c072cf65d36d469
45546f17e5801791d4bc5968b91253a2f4b0db72

real0m0.004s
user0m0.000s
sys 0m0.004s
$ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0/f0

real0m0.004s
user0m0.000s
sys 0m0.000s
$ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0

real0m0.004s
user0m0.000s
sys 0m0.000s
$ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/
45546f17e5801791d4bc5968b91253a2f4b0db72

real0m0.005s
user0m0.008s
sys 0m0.000s
$ time git rev-list HEAD -- d0/d0/d0/d0/d0/
45546f17e5801791d4bc5968b91253a2f4b0db72

real0m0.203s
user0m0.112s
sys 0m0.088s
$ time git rev-list HEAD -- d0/d0/d0/d0/
45546f17e5801791d4bc5968b91253a2f4b0db72

real0m1.305s
user0m0.720s
sys 0m0.580s
$ time git rev-list HEAD -- d0/d0/d0/
45546f17e5801791d4bc5968b91253a2f4b0db72

real0m12.135s
user0m6.700s
sys 0m5.412s

So `git rev-list` becomes exponentially more expensive when you run it
on a shorter directory path, though it is fast if you run it without a
path.


Re: git-clone causes out of memory

2017-10-13 Thread Mike Hommey
On Fri, Oct 13, 2017 at 12:51:58PM +0300, Constantine wrote:
> There's a gitbomb on github. It is undoubtedly creative and funny, but since
> this is a bug in git, I thought it'd be nice to report. The command:
> 
>   $ git clone https://github.com/x0rz/ShadowBrokersFiles

What fills memory is actually the checkout part of the command. git
clone -n doesn't fail.

Credit should go where it's due: https://kate.io/blog/git-bomb/
(with the bonus that it comes with explanations)

Mike


git-clone causes out of memory

2017-10-13 Thread Constantine
There's a gitbomb on github. It is undoubtedly creative and funny, but 
since this is a bug in git, I thought it'd be nice to report. The command:


$ git clone https://github.com/x0rz/ShadowBrokersFiles

quickly fills out the RAM (e.g. 4GB of free memory for me). To recover, 
call oom-killer through Alt+SysRq+f, then switch to a TTY and back.


Git version: 2.14.2

P.S.: I am not subscribed to the ML, acc. to 
https://git-scm.com/community I will be added to CC.


Re: Enhancement request: git-push: Allow (configurable) default push-option

2017-10-13 Thread Marius Paliga
Updated patch and added tests:
(feel free to modify Documentation)

---

Subject: [PATCH] Added support for new configuration parameter push.pushOption

builtin/push.c: add push.pushOption config

Currently push options need to be given explicitly, via
the command line as "git push --push-option".

The UX of Git would be enhanced if push options could be
configured instead of given each time on the command line.

Add the config option push.pushOption, which is a multi
string option, containing push options that are sent by default.

When push options are set in the system wide config
(/etc/gitconfig), they can be unset later in the more specific
repository config by setting the string to the empty string.

Add tests and documentation as well.

Signed-off-by: Marius Paliga 
---
 Documentation/git-push.txt |  3 ++
 builtin/push.c | 12 
 t/t5545-push-options.sh| 77 ++
 3 files changed, 92 insertions(+)

diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
index 3e76e99f3..da9b17624 100644
--- a/Documentation/git-push.txt
+++ b/Documentation/git-push.txt
@@ -161,6 +161,9 @@ already exists on the remote side.
 Transmit the given string to the server, which passes them to
 the pre-receive as well as the post-receive hook. The given string
 must not contain a NUL or LF character.
+Default push options can also be specified with configuration
+variable `push.pushOption`. String(s) specified here will always
+be passed to the server without need to specify it using `--push-option`

 --receive-pack=::
 --exec=::
diff --git a/builtin/push.c b/builtin/push.c
index 2ac810422..10e520c8f 100644
--- a/builtin/push.c
+++ b/builtin/push.c
@@ -32,6 +32,8 @@ static const char **refspec;
 static int refspec_nr;
 static int refspec_alloc;

+static struct string_list push_options_from_config = STRING_LIST_INIT_DUP;
+
 static void add_refspec(const char *ref)
 {
 refspec_nr++;
@@ -503,6 +505,14 @@ static int git_push_config(const char *k, const
char *v, void *cb)
 int val = git_config_bool(k, v) ?
 RECURSE_SUBMODULES_ON_DEMAND : RECURSE_SUBMODULES_OFF;
 recurse_submodules = val;
+} else if (!strcmp(k, "push.pushoption")) {
+if (v == NULL)
+return config_error_nonbool(k);
+else
+if (v[0] == '\0')
+string_list_clear(_options_from_config, 0);
+else
+string_list_append(_options_from_config, v);
 }

 return git_default_config(k, v, NULL);
@@ -562,6 +572,8 @@ int cmd_push(int argc, const char **argv, const
char *prefix)
 packet_trace_identity("push");
 git_config(git_push_config, );
 argc = parse_options(argc, argv, prefix, options, push_usage, 0);
+if (!push_options.nr)
+push_options = push_options_from_config;
 set_push_cert_flags(, push_cert);

 if (deleterefs && (tags || (flags & (TRANSPORT_PUSH_ALL |
TRANSPORT_PUSH_MIRROR
diff --git a/t/t5545-push-options.sh b/t/t5545-push-options.sh
index 90a4b0d2f..463783789 100755
--- a/t/t5545-push-options.sh
+++ b/t/t5545-push-options.sh
@@ -140,6 +140,83 @@ test_expect_success 'push options and submodules' '
 test_cmp expect parent_upstream/.git/hooks/post-receive.push_options
 '

+test_expect_success 'default push option' '
+mk_repo_pair &&
+git -C upstream config receive.advertisePushOptions true &&
+(
+cd workbench &&
+test_commit one &&
+git push --mirror up &&
+test_commit two &&
+git -c push.pushOption=default push up master
+) &&
+test_refs master master &&
+echo "default" >expect &&
+test_cmp expect upstream/.git/hooks/pre-receive.push_options &&
+test_cmp expect upstream/.git/hooks/post-receive.push_options
+'
+
+test_expect_success 'two default push options' '
+mk_repo_pair &&
+git -C upstream config receive.advertisePushOptions true &&
+(
+cd workbench &&
+test_commit one &&
+git push --mirror up &&
+test_commit two &&
+git -c push.pushOption=default1 -c push.pushOption=default2
push up master
+) &&
+test_refs master master &&
+printf "default1\ndefault2\n" >expect &&
+test_cmp expect upstream/.git/hooks/pre-receive.push_options &&
+test_cmp expect upstream/.git/hooks/post-receive.push_options
+'
+
+test_expect_success 'push option from command line overrides
from-config push option' '
+mk_repo_pair &&
+git -C upstream config receive.advertisePushOptions true &&
+(
+cd workbench &&
+test_commit one &&
+git push --mirror up &&
+test_commit two &&
+git -c push.pushOption=default push --push-option=manual up master
+) &&
+test_refs master master &&
+echo "manual" >expect &&
+test_cmp expect upstream/.git/hooks/pre-receive.push_options &&
+test_cmp expect 

Re: [PATCH v2] Documentation/git-config.txt: reword missleading sentence

2017-10-13 Thread Moy Matthieu
Junio C Hamano  writes:

> second.pa...@gmail.com writes:
>
>> From: PAYRE NATHAN p1508475 
>
> Should I assume that the name/address on the last Signed-off-by: we
> see below is what you want to be known as?  As a part of school
> work, I'd imagine that Matthieu wants your work to be associated
> with the univ-lyon1.fr address, so perhaps you want to go the other
> way around?

Yes, I'd rather have contributions made with the identity
@etu.univ-lyon1.fr, and use the same identity for Signed-off-by: and
From:.

>>  --path::
>> -'git-config' will expand leading '{tilde}' to the value of
>> +'git config' will expand leading '{tilde}' to the value of
>>  '$HOME', and '{tilde}user' to the home directory for the

Didn't notice yesterday, but you still have forward quotes here and
backquotes right below. If you are to fix this paragraph, better fix all
issues at once.

>>  specified user.  This option has no effect when setting the
>> -value (but you can use 'git config bla {tilde}/' from the
>> -command line to let your shell do the expansion).
>> +value (but you can use `git config section.variable {tilde}/`
>
> Does this reference to {tilde} get expanded inside the `literal`
> mark-up?  In the description for 'gitdir', we find this passage (in
> Documentation/config.txt):
>
>  * If the pattern starts with `~/`, `~` will be substituted with the
>content of the environment variable `HOME`.
>
> So I'd expect `~` to be a safe way to get what you want, not `{tilde}`.

If I read correctly, the potential issue with ~ is that it's used for
subscript text (i.e. foo~bar~ in asciidoc is LaTeX's $foo_{bar}$). But ~
within a literal string should be safe, and at least we use it in many
places in our doc.

Cheers,

-- 
Matthieu Moy
https://matthieu-moy.fr/


Re: [PATCH 1/3] branch: streamline "attr_only" handling in validate_new_branchname()

2017-10-13 Thread Eric Sunshine
On Fri, Oct 13, 2017 at 1:11 AM, Junio C Hamano  wrote:
> The function takes a parameter "attr_only" (which is a name that is
> hard to reason about, which will be corrected soon) and skips some
> checks when it is set.  Reorganize the conditionals to make it move

s/move/more/

> obvious that some checks are never made when this parameter is set.
>
> Signed-off-by: Junio C Hamano 


Re: is there a truly compelling rationale for .git/info/exclude?

2017-10-13 Thread Robert P. J. Day
On Thu, 12 Oct 2017, Jeff King wrote:

> On Fri, Oct 13, 2017 at 01:18:00AM +0200, Johannes Schindelin wrote:
>
> > [who I had to cull from the To:/Cc: headers, as my mailer consistently
> > told me that there is no valid DNS record to route mail to
> > rpj...@crashcourse.ca, which *is* weird.]
>
> You are not the only one to mention this, so I did 60 seconds of
> digging. Turns out that the MX of crashcourse.ca points to a CNAME
> (mail.crashcourse.ca), which is explicitly forbidden by RFC 2181
> (section 10.3). Some MTAs are picky about this and others are not
> (mine isn't, so I've added Robert back to the cc so he sees this).

  ok, i'll tell my admin about this, thanks.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday