date:20161011

On Tue, Oct 11, 2016 at 11:12 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> I think this patch is the most interesting patch, so I'll refrain from
>> resending the other 27 patches, though I have adressed the review comments
>> locally. I'll resend everything once we are in agreement for this one.
>
> What is the primary purpose of this patch?  Is it to prepare callers
> so that the way they interact with the attr subsystem will not have to
> change when they become threaded and the attr subsystem becomes
> thread ready?
>
> I am not sure if the updates to the callers fulfill that purpose.
> For example, look at this hunk.
>
>> @@ -111,6 +111,7 @@ static int write_archive_entry(const unsigned char 
>> *sha1, const char *base,
>>   struct archiver_args *args = c->args;
>>   write_archive_entry_fn_t write_entry = c->write_entry;
>>   static struct git_attr_check *check;
>> + static struct git_attr_result result;
>
> As we discussed, this caller, even when threaded, will always want
> to ask for a fixed two attributes, so "check" being static and
> shared across threads is perfectly fine.  But we do not want to see
> "result" shared, do we?

Well all of the hunks in the patch are not threaded, so they
don't follow a threading pattern, but the static pattern to not be
more expensive than needed.

>
>>   const char *path_without_prefix;
>>   int err;
>>
>> @@ -124,12 +125,15 @@ static int write_archive_entry(const unsigned char 
>> *sha1, const char *base,
>>   strbuf_addch(&path, '/');
>>   path_without_prefix = path.buf + args->baselen;
>>
>> - if (!check)
>> - check = git_attr_check_initl("export-ignore", "export-subst", 
>> NULL);
>> - if (!git_check_attr(path_without_prefix, check)) {
>> - if (ATTR_TRUE(check->check[0].value))
>> + if (!check) {
>> + git_attr_check_initl(&check, "export-ignore", "export-subst", 
>> NULL);
>> + git_attr_result_init(&result, check);
>> + }
>
> Are we assuming that storing and checking of a single pointer is
> atomic?  I would not expose that assumption to the callers.  On a
> platform where that assumption holds, "if check is not NULL,
> somebody must have done it already, so return without doing nothing"
> can be the first thing git_attr_check_initl()'s implementation does,
> though.  Or it may not hold anywhere without some barriers.  All
> that implementation details should be hidden inside _initl()'s
> implementation.  So this caller should instead just do an
> unconditional:
>
> git_attr_check_initl(&check, "export-ignore", "export-subst", NULL);
>
> Also, as "result" should be per running thread, hence non-static,
> and because we do not want repeated heap allocations and releases
> but luckily most callers _know_ not just how many but what exact
> attributes they are interested in (I think there are only two
> callers that do not know it; check-all-attrs one, and your pathspec
> magic one that does not exist at this point in the series), I would
> think it is much more preferrable to allow the caller to prepare an
> on-stack array and call it "initialized already".
>
> In other words, ideally, I think this part of the patch should
> rather read like this:
>
> static struct git_attr_check *check;
> struct git_attr_result result[2];
>
> ...
> git_attr_check_initl(&check, "export-ignore", "export-subst", NULL);
> if (!git_check_attr(path_without_prefix, check, result)) {
> ... use result[0] and result[1] ...
>
> For sanity checking, it is OK to add ARRAY_SIZE(result) as the final
> and extra parameter to git_check_attr() so that the function can
> make sure it matches (or exceeds) check->nr.

That seems tempting from a callers perspective; I'll look into that.

Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 6:52 PM, Jeff King  wrote:
> On Tue, Oct 11, 2016 at 09:34:28PM -0400, Jeff King wrote:
>
>> > Ok, time to present data... Let's assume a degenerate case first:
>> > "up-to-date with all remotes" because that is easy to reproduce.
>> >
>> > I have 14 remotes currently:
>> >
>> > $ time git fetch --all
>> > real 0m18.016s
>> > user 0m2.027s
>> > sys 0m1.235s
>> >
>> > $ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs
>> > -P 14 -I % git fetch %
>> > real 0m5.168s
>> > user 0m2.312s
>> > sys 0m1.167s
>>
>> So first, thank you (and Ævar) for providing real numbers. It's clear
>> that I was talking nonsense.
>>
>> Second, I wonder where all that time is going. Clearly there's an
>> end-to-end latency issue, but I'm not sure where it is. Is it startup
>> time for git-fetch? Is it in getting and processing the ref
>> advertisement from the other side? What I'm wondering is if there are
>> opportunities to speed up the serial case (but nobody really cared
>> before because it doesn't matter unless you're doing 14 of them back to
>> back).
>
> Hmm. I think it really might be just network latency. Here's my fetch
> time:
>
>   $ git config remote.origin.url
>   git://github.com/gitster/git.git
>
>   $ time git fetch origin
>   real0m0.183s
>   user0m0.072s
>   sys 0m0.008s
>
> 14 of those in a row shouldn't take more than about 2.5 seconds, which
> is still twice as fast as your parallel case. So what's going on?
>
> One is that I live about a hundred miles from GitHub's data center, and
> my ping time there is ~13ms. The other side of the country, let alone
> Europe, is going to be noticeably slower just for the TCP handshake.
>
> The second is that git:// is really cheap and simple. git-over-ssh is
> over twice as slow:
>
>   $ time git fetch g...@github.com:gitster/git
>   ...
>   real0m0.432s
>   user0m0.100s
>   sys 0m0.032s
>
> HTTP fares better than I would have thought, but is also slower:
>
>   $ time git fetch https://github.com/gitster/git
>   ...
>   real0m0.258s
>   user0m0.080s
>   sys 0m0.032s
>
> -Peff

Well 9/14 are https for me, the rest is git://
Also 9/14 (but a different set) is github, the rest is
either internal or kernel.org.

Fetching from github (https) is only 0.9s from here
(SF bay area, I'm not in Europe any more ;) )

I would have expected to have a speedup
of roughly 2 + latency gains. Factor 2 because
in the current state of affairs either the client or the
remote is working, i.e. the other sie is idle/waiting, so
factor 2 seemed reasonable (and ofc the latency), so I
was a bit surprised to see a higher yield.

Re: [PATCH 5/5] trailer: support values folded to multiple lines

Jonathan Tan  writes:

> Currently, interpret-trailers requires that a trailer be only on 1 line.
> For example:
>
>   a: first line
>  second line
>
> would be interpreted as one trailer line followed by one non-trailer line.
>
> Make interpret-trailers support RFC 822-style folding, treating those
> lines as one logical trailer.

Let's see how the code handles one minor detail when we see 822
folding, namely, "what happens to the leading whitespace that signals
the beginning of the second and subsequent lines?".

> diff --git a/trailer.c b/trailer.c
> index 97e96a9..907baa0 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -31,7 +31,7 @@ struct trailer_item {
>* (excluding the terminating newline) and token is NULL.
>*/
>   char *token;
> - char *value;
> + struct strbuf value;
>  };

Is the length of value very frequently used once the list of trailer
lines are fully parsed?  If not, I'd rather not to have "struct
strbuf" in a long-living structure like this one and instead prefer
keeping it a simple and stupid "char *value".

Yes, I know the existing code in trailers overuses strbuf when there
is no need, primarily because it uses the lazy "split into an array
of strbufs" function.  We shouldn't make it worse.

> @@ -767,16 +773,24 @@ static int process_input_file(FILE *outfile,
>  
>   /* Parse trailer lines */
>   for (i = trailer_start; i < trailer_end; i++) {
> + if (last && isspace(lines[i]->buf[0])) {

It is convenient if "value" is a strbuf to do this,

> + /* continuation line of the last trailer item */
> + strbuf_addch(&last->value, '\n');
> + strbuf_addbuf(&last->value, lines[i]);
> + strbuf_strip_suffix(&last->value, "\n");

but it is easy to introduce a temporary strbuf in this scope and use
it only to create the final value and detach it to last->value, i.e.

if (last && isspace(*lines[i]->buf)) {
struct strbuf buf = STRBUF_INIT;
strbuf_addf(&buf, "%s\n%s", last->value, lines[i]->buf);
strbuf_strip_suffix(&buf, "\n");
free(last->value);
last->value = strbuf_detach(&buf, NULL);

By the way, I now see that the code handles the "minor detail" to
keep the leading whitespace, which is good.

Thanks.

Re: [PATCHv2] attr: convert to new threadsafe API

Stefan Beller  writes:

> I think this patch is the most interesting patch, so I'll refrain from
> resending the other 27 patches, though I have adressed the review comments
> locally. I'll resend everything once we are in agreement for this one.

What is the primary purpose of this patch?  Is it to prepare callers
so that the way they interact with the attr subsystem will not have to
change when they become threaded and the attr subsystem becomes
thread ready?

I am not sure if the updates to the callers fulfill that purpose.
For example, look at this hunk.

> @@ -111,6 +111,7 @@ static int write_archive_entry(const unsigned char *sha1, 
> const char *base,
>   struct archiver_args *args = c->args;
>   write_archive_entry_fn_t write_entry = c->write_entry;
>   static struct git_attr_check *check;
> + static struct git_attr_result result;

As we discussed, this caller, even when threaded, will always want
to ask for a fixed two attributes, so "check" being static and
shared across threads is perfectly fine.  But we do not want to see
"result" shared, do we?

>   const char *path_without_prefix;
>   int err;
>  
> @@ -124,12 +125,15 @@ static int write_archive_entry(const unsigned char 
> *sha1, const char *base,
>   strbuf_addch(&path, '/');
>   path_without_prefix = path.buf + args->baselen;
>  
> - if (!check)
> - check = git_attr_check_initl("export-ignore", "export-subst", 
> NULL);
> - if (!git_check_attr(path_without_prefix, check)) {
> - if (ATTR_TRUE(check->check[0].value))
> + if (!check) {
> + git_attr_check_initl(&check, "export-ignore", "export-subst", 
> NULL);
> + git_attr_result_init(&result, check);
> + }

Are we assuming that storing and checking of a single pointer is
atomic?  I would not expose that assumption to the callers.  On a
platform where that assumption holds, "if check is not NULL,
somebody must have done it already, so return without doing nothing"
can be the first thing git_attr_check_initl()'s implementation does,
though.  Or it may not hold anywhere without some barriers.  All
that implementation details should be hidden inside _initl()'s
implementation.  So this caller should instead just do an
unconditional:

git_attr_check_initl(&check, "export-ignore", "export-subst", NULL);

Also, as "result" should be per running thread, hence non-static,
and because we do not want repeated heap allocations and releases
but luckily most callers _know_ not just how many but what exact
attributes they are interested in (I think there are only two
callers that do not know it; check-all-attrs one, and your pathspec
magic one that does not exist at this point in the series), I would
think it is much more preferrable to allow the caller to prepare an
on-stack array and call it "initialized already".  

In other words, ideally, I think this part of the patch should
rather read like this:

static struct git_attr_check *check;
struct git_attr_result result[2];

...
git_attr_check_initl(&check, "export-ignore", "export-subst", NULL);
if (!git_check_attr(path_without_prefix, check, result)) {
... use result[0] and result[1] ...

For sanity checking, it is OK to add ARRAY_SIZE(result) as the final
and extra parameter to git_check_attr() so that the function can
make sure it matches (or exceeds) check->nr.

Re: git 2.10.1 test regression in t3700-add.sh

Jeremy Huddleston Sequoia  writes:

>> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
>> index 924a266126..53c0cb6dea 100755
>> --- a/t/t3700-add.sh
>> +++ b/t/t3700-add.sh
>> @@ -350,6 +350,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add 
>> --chmod=+x with symlinks' '
>> '
>> 
>> test_expect_success 'git add --chmod=[+-]x changes index with already added 
>> file' '
>> +rm -f foo3 xfoo3 &&
>>  echo foo >foo3 &&
>>  git add foo3 &&
>>  git add --chmod=+x foo3 &&
>
>
> I actually tried that, but the problem is that xfoo3 was
> previously added as a symlink, so test_mode_in_index still reports
> it as a symlink.

Ah, of course.  That "rm -f" needs to remove from the index and also
from the working tree, so has to be "git rm -f --ignore-unmatch" or
something like that.

> It's fundamentally a question of who is responsible for cleanup.
> Is the individual test responsible for cleaning up after itself
> (such that later tests can rely on a clean state), or should
> individual tests assume that the initial state might be undefined
> and try to cleanup after earlier tests?

In modern tests, we strive to do the former with liberal use of
test_when_finished.  I think the one that creates xfoo[123] and
leaves them behind for a long time predates the modern practice.

A minimal fix with that approach may look like this:

 t/t3700-add.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 924a266126..80c7ee3e3b 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -64,6 +64,7 @@ test_expect_success 'git add: filemode=0 should not get 
confused by symlink' '
 test_expect_success \
'git update-index --add: Test that executable bit is not used...' \
'git config core.filemode 0 &&
+test_when_finished "git rm -f xfoo3" &&
 test_ln_s_add xfoo2 xfoo3 &&   # runs git update-index --add
 test_mode_in_index 12 xfoo3'

Re: Formatting problem send_mail in version 2.10.0

2016-10-11 Thread Larry Finger


On 10/11/2016 11:18 AM, Matthieu Moy wrote:

Larry Finger  writes:


That added information at the end is intended to be passed on to the
stable group. In this case, the patch needs to be applied to kernel
versions 4.8 and later.


OK, but where do people fetch this information from?


This format is used in a patch for the kernel. When the patch is merged into 
mainline, sta...@vger.kernel.org gets sent an E-mail with a copy of the original 
patch. Maintainers of the indicated systems then merge the patch with their 
stable version.


When you use git send-email, the content of the Cc: trailers ends up
both in the body of the message and in the Cc: field of the same
message.

If you need the mention to appear in the body of the message, then using
parenthesis is fine: git send-email won't remove it (more precisely,
"send-email" will call "format-patch" which won't remove it).

Not an objection to patching send-email anyway, but if there's a simple
and RFC-compliant way to do what you're looking for, we can as well use
it (possibly in addition to patching).


I do not want it in the body of the message. I just want to pass a hint to the 
stable maintainer(s).


As noted earlier, this has worked for a very long time, and I think the previous 
behavior should be restored.


Larry

Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 09:34:28PM -0400, Jeff King wrote:

> > Ok, time to present data... Let's assume a degenerate case first:
> > "up-to-date with all remotes" because that is easy to reproduce.
> > 
> > I have 14 remotes currently:
> > 
> > $ time git fetch --all
> > real 0m18.016s
> > user 0m2.027s
> > sys 0m1.235s
> > 
> > $ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs
> > -P 14 -I % git fetch %
> > real 0m5.168s
> > user 0m2.312s
> > sys 0m1.167s
> 
> So first, thank you (and Ævar) for providing real numbers. It's clear
> that I was talking nonsense.
> 
> Second, I wonder where all that time is going. Clearly there's an
> end-to-end latency issue, but I'm not sure where it is. Is it startup
> time for git-fetch? Is it in getting and processing the ref
> advertisement from the other side? What I'm wondering is if there are
> opportunities to speed up the serial case (but nobody really cared
> before because it doesn't matter unless you're doing 14 of them back to
> back).

Hmm. I think it really might be just network latency. Here's my fetch
time:

  $ git config remote.origin.url
  git://github.com/gitster/git.git

  $ time git fetch origin
  real0m0.183s
  user0m0.072s
  sys 0m0.008s

14 of those in a row shouldn't take more than about 2.5 seconds, which
is still twice as fast as your parallel case. So what's going on?

One is that I live about a hundred miles from GitHub's data center, and
my ping time there is ~13ms. The other side of the country, let alone
Europe, is going to be noticeably slower just for the TCP handshake.

The second is that git:// is really cheap and simple. git-over-ssh is
over twice as slow:

  $ time git fetch g...@github.com:gitster/git
  ...
  real0m0.432s
  user0m0.100s
  sys 0m0.032s

HTTP fares better than I would have thought, but is also slower:

  $ time git fetch https://github.com/gitster/git
  ...
  real0m0.258s
  user0m0.080s
  sys 0m0.032s

-Peff

[PATCH 3/5] trailer: make args have their own struct

Improve type safety by making arguments (whether from configuration or
from the command line) have their own "struct arg_item" type, separate
from the "struct trailer_item" type used to represent the trailers in
the buffer being manipulated.

Also take the opportunity to refine the "struct trailer_item" definition
by removing the conf (which is used only by arguments) and by removing
const on the string fields, since those fields are owned by the struct.

This change also prepares "struct trailer_item" to be further
differentiated from "struct arg_item" in future patches.
---
 trailer.c | 161 ++
 1 file changed, 99 insertions(+), 62 deletions(-)

diff --git a/trailer.c b/trailer.c
index e8b1bfb..167b2fd 100644
--- a/trailer.c
+++ b/trailer.c
@@ -26,12 +26,18 @@ static struct conf_info default_conf_info;
 
 struct trailer_item {
struct trailer_item *next;
-   const char *token;
-   const char *value;
+   char *token;
+   char *value;
+};
+
+struct arg_item {
+   struct arg_item *next;
+   char *token;
+   char *value;
struct conf_info conf;
 };
 
-static struct trailer_item *first_conf_item;
+static struct arg_item *first_conf_item;
 
 static char *separators = ":";
 
@@ -55,8 +61,7 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
return len;
 }
 
-static int same_token(const struct trailer_item *a,
- const struct trailer_item *b)
+static int same_token(const struct trailer_item *a, const struct arg_item *b)
 {
size_t a_len = token_len_without_separator(a->token, strlen(a->token));
size_t b_len = token_len_without_separator(b->token, strlen(b->token));
@@ -65,14 +70,12 @@ static int same_token(const struct trailer_item *a,
return !strncasecmp(a->token, b->token, min_len);
 }
 
-static int same_value(const struct trailer_item *a,
- const struct trailer_item *b)
+static int same_value(const struct trailer_item *a, const struct arg_item *b)
 {
return !strcasecmp(a->value, b->value);
 }
 
-static int same_trailer(const struct trailer_item *a,
-   const struct trailer_item *b)
+static int same_trailer(const struct trailer_item *a, const struct arg_item *b)
 {
return same_token(a, b) && same_value(a, b);
 }
@@ -94,11 +97,18 @@ static inline void strbuf_replace(struct strbuf *sb, const 
char *a, const char *
 
 static void free_trailer_item(struct trailer_item *item)
 {
+   free(item->token);
+   free(item->value);
+   free(item);
+}
+
+static void free_arg_item(struct arg_item *item)
+{
free(item->conf.name);
free(item->conf.key);
free(item->conf.command);
-   free((char *)item->token);
-   free((char *)item->value);
+   free(item->token);
+   free(item->value);
free(item);
 }
 
@@ -131,13 +141,13 @@ static void print_all(FILE *outfile, struct trailer_item 
*first, int trim_empty)
}
 }
 
-static const char *apply_command(const char *command, const char *arg)
+static char *apply_command(const char *command, const char *arg)
 {
struct strbuf cmd = STRBUF_INIT;
struct strbuf buf = STRBUF_INIT;
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {NULL, NULL};
-   const char *result;
+   char *result;
 
strbuf_addstr(&cmd, command);
if (arg)
@@ -162,7 +172,7 @@ static const char *apply_command(const char *command, const 
char *arg)
return result;
 }
 
-static void apply_item_command(struct trailer_item *in_tok, struct 
trailer_item *arg_tok)
+static void apply_item_command(struct trailer_item *in_tok, struct arg_item 
*arg_tok)
 {
if (arg_tok->conf.command) {
const char *arg;
@@ -179,60 +189,77 @@ static void apply_item_command(struct trailer_item 
*in_tok, struct trailer_item
}
 }
 
+static struct trailer_item *trailer_from_arg(struct arg_item *arg_tok)
+{
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = arg_tok->token;
+   new->value = arg_tok->value;
+   arg_tok->token = arg_tok->value = NULL;
+   free_arg_item(arg_tok);
+   return new;
+}
+
 static void apply_existing_arg(struct trailer_item **found_next,
-  struct trailer_item *arg_tok,
+  struct arg_item *arg_tok,
   struct trailer_item **position_to_add,
   const struct trailer_item *in_tok_head,
   const struct trailer_item *neighbor)
 {
-   if (arg_tok->conf.if_exists == EXISTS_DO_NOTHING) {
-   free_trailer_item(arg_tok);
+   enum action_if_exists if_exists = arg_tok->conf.if_exists;
+   struct trailer_item *to_add;
+
+   if (if_exists == EXISTS_DO_NOTHING) {
+   free_arg_item(arg_tok);
return;
}

Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 04:18:15PM -0700, Stefan Beller wrote:

> >> At the very least we would need a similar thing as Jeff recently sent for 
> >> the
> >> push case with objects quarantined and then made available in one go?
> >
> > I don't think so. The object database is perfectly happy with multiple
> > simultaneous writers, and nothing impacts the have/wants until actual
> > refs are written. Quarantining objects before the refs are written is an
> > orthogonal concept.
> 
> If a remote advertises its tips, we'd need to look these up (clientside) to
> decide if we have them, and I do not think we'd do that via a reachability
> check, but via direct lookup in the object data base? So I do not quite
> understand, what we gain from the atomic ref writes in e.g. remote/origin/.

It's been a while since I've dug into the fetch protocol. But I think we
cover the "do we have the objects already" check via quickfetch(), which
does do a reachability check, And then we advertise our "have" commits
by walking backwards from our ref tips, so everything there is
reachable.

Anything else would be questionable, especially under older versions of
git, as we promise only to have a complete graph for objects reachable
from the refs. Older versions of git would happily truncate unreachable
history based on the 2-week prune expiration period.

> > I'm not altogether convinced that parallel fetch would be that much
> > faster, though.
> 
> Ok, time to present data... Let's assume a degenerate case first:
> "up-to-date with all remotes" because that is easy to reproduce.
> 
> I have 14 remotes currently:
> 
> $ time git fetch --all
> real 0m18.016s
> user 0m2.027s
> sys 0m1.235s
> 
> $ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs
> -P 14 -I % git fetch %
> real 0m5.168s
> user 0m2.312s
> sys 0m1.167s

So first, thank you (and Ævar) for providing real numbers. It's clear
that I was talking nonsense.

Second, I wonder where all that time is going. Clearly there's an
end-to-end latency issue, but I'm not sure where it is. Is it startup
time for git-fetch? Is it in getting and processing the ref
advertisement from the other side? What I'm wondering is if there are
opportunities to speed up the serial case (but nobody really cared
before because it doesn't matter unless you're doing 14 of them back to
back).

> > I usually just do a one-off fetch of their URL in such a case, exactly
> > because I _don't_ want to end up with a bunch of remotes. You can also
> > mark them with skipDefaultUpdate if you only care about them
> > occasionally (so you can "git fetch sbeller" when you care about it, but
> > it doesn't slow down your daily "git fetch").
> 
> And I assume you don't want the remotes because it takes time to fetch and not
> because your disk space is expensive. ;)

That, and it clogs the ref namespace. You can mostly ignore the extra
refs, but they show up in the "git checkout ..." DWIM, for example.

-Peff

[PATCH 2/5] trailer: streamline trailer item create and add

Currently, creation and addition (to a list) of trailer items are spread
across multiple functions. Streamline this by only having 2 functions:
one to parse the user-supplied string, and one to add the parsed
information to a list.
---
 trailer.c | 135 +-
 1 file changed, 62 insertions(+), 73 deletions(-)

diff --git a/trailer.c b/trailer.c
index bf3d7d0..e8b1bfb 100644
--- a/trailer.c
+++ b/trailer.c
@@ -335,7 +335,7 @@ static int set_if_missing(struct conf_info *item, const 
char *value)
return 0;
 }
 
-static void duplicate_conf(struct conf_info *dst, struct conf_info *src)
+static void duplicate_conf(struct conf_info *dst, const struct conf_info *src)
 {
*dst = *src;
if (src->name)
@@ -467,10 +467,30 @@ static int git_trailer_config(const char *conf_key, const 
char *value, void *cb)
return 0;
 }
 
-static int parse_trailer(struct strbuf *tok, struct strbuf *val, const char 
*trailer)
+static const char *token_from_item(struct trailer_item *item, char *tok)
+{
+   if (item->conf.key)
+   return item->conf.key;
+   if (tok)
+   return tok;
+   return item->conf.name;
+}
+
+static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
+{
+   if (!strncasecmp(tok, item->conf.name, tok_len))
+   return 1;
+   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
+}
+
+static int parse_trailer(struct strbuf *tok, struct strbuf *val,
+const struct conf_info **conf, const char *trailer)
 {
size_t len;
struct strbuf seps = STRBUF_INIT;
+   struct trailer_item *item;
+   int tok_len;
+
strbuf_addstr(&seps, separators);
strbuf_addch(&seps, '=');
len = strcspn(trailer, seps.buf);
@@ -490,73 +510,31 @@ static int parse_trailer(struct strbuf *tok, struct 
strbuf *val, const char *tra
strbuf_addstr(tok, trailer);
strbuf_trim(tok);
}
-   return 0;
-}
-
-static const char *token_from_item(struct trailer_item *item, char *tok)
-{
-   if (item->conf.key)
-   return item->conf.key;
-   if (tok)
-   return tok;
-   return item->conf.name;
-}
-
-static struct trailer_item *new_trailer_item(struct trailer_item *conf_item,
-char *tok, char *val)
-{
-   struct trailer_item *new = xcalloc(sizeof(*new), 1);
-   new->value = val ? val : xstrdup("");
-
-   if (conf_item) {
-   duplicate_conf(&new->conf, &conf_item->conf);
-   new->token = xstrdup(token_from_item(conf_item, tok));
-   free(tok);
-   } else {
-   duplicate_conf(&new->conf, &default_conf_info);
-   new->token = tok;
-   }
-
-   return new;
-}
-
-static int token_matches_item(const char *tok, struct trailer_item *item, int 
tok_len)
-{
-   if (!strncasecmp(tok, item->conf.name, tok_len))
-   return 1;
-   return item->conf.key ? !strncasecmp(tok, item->conf.key, tok_len) : 0;
-}
-
-static struct trailer_item *create_trailer_item(const char *string)
-{
-   struct strbuf tok = STRBUF_INIT;
-   struct strbuf val = STRBUF_INIT;
-   struct trailer_item *item;
-   int tok_len;
-
-   if (parse_trailer(&tok, &val, string))
-   return NULL;
-
-   tok_len = token_len_without_separator(tok.buf, tok.len);
 
/* Lookup if the token matches something in the config */
+   tok_len = token_len_without_separator(tok->buf, tok->len);
+   *conf = &default_conf_info;
for (item = first_conf_item; item; item = item->next) {
-   if (token_matches_item(tok.buf, item, tok_len))
-   return new_trailer_item(item,
-   strbuf_detach(&tok, NULL),
-   strbuf_detach(&val, NULL));
+   if (token_matches_item(tok->buf, item, tok_len)) {
+   char *tok_buf = strbuf_detach(tok, NULL);
+   *conf = &item->conf;
+   strbuf_addstr(tok, token_from_item(item, tok_buf));
+   free(tok_buf);
+   break;
+   }
}
 
-   return new_trailer_item(NULL,
-   strbuf_detach(&tok, NULL),
-   strbuf_detach(&val, NULL));
+   return 0;
 }
 
-static void add_trailer_item(struct trailer_item ***tail,
-struct trailer_item *new)
+static void add_trailer_item(struct trailer_item ***tail, char *tok, char *val,
+const struct conf_info *conf)
 {
-   if (!new)
-   return;
+   struct trailer_item *new = xcalloc(sizeof(*new), 1);
+   new->token = tok;
+   new->value = val;
+   duplicate_conf(&new->conf,

[PATCH 4/5] trailer: allow non-trailers in trailer block

Currently, interpret-trailers requires all lines of a trailer block to
be trailers (or comments) - if not it would not identify that block as a
trailer block, and thus create its own trailer block, inserting a blank
line.  For example:

  echo -e "\na: b\nnot trailer" |
  git interpret-trailers --trailer "c: d"

would result in:

  a: b
  not trailer

  c: d

Relax the definition of a trailer block to only require 1 trailer, so
that trailers can be directly added to such blocks, resulting in:

  a: b
  not trailer
  c: d

This allows arbitrary lines to be included in trailer blocks, like those
in [1], and still allow interpret-trailers to be used.

This change also makes comments in the trailer block be treated as any
other non-trailer line, preserving them in the output of
interpret-trailers.

[1]
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/e7d316a02f683864a12389f8808570e37fb90aa3
---

There are some possible inaccuracies in the master branch's
find_trailer_start (in particular, handling of lines that *start* with a
separator, which should not be considered a trailer line) - this patch
preserves the existing behavior.

 Documentation/git-interpret-trailers.txt |  3 +-
 t/t7513-interpret-trailers.sh| 35 +++
 trailer.c| 77 ++--
 3 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 93d1db6..c480da6 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -48,7 +48,8 @@ with only spaces at the end of the commit message part, one 
blank line
 will be added before the new trailer.
 
 Existing trailers are extracted from the input message by looking for
-a group of one or more lines that contain a colon (by default), where
+a group of one or more lines in which at least one line contains a 
+colon (by default), where
 the group is preceded by one or more empty (or whitespace-only) lines.
 The group must either be at the end of the message or be the last
 non-whitespace lines before a line that starts with '---'. Such three
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index aee785c..7f5cd2a 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -126,6 +126,37 @@ test_expect_success 'with multiline title in the message' '
test_cmp expected actual
 '
 
+test_expect_success 'with non-trailer lines mixed with trailer lines' '
+   cat >patch <<-\EOF &&
+
+   this: is a trailer
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this: is a trailer
+   this is not a trailer
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'with non-trailer lines only' '
+   cat >patch <<-\EOF &&
+
+   this is not a trailer
+   EOF
+   cat >expected <<-\EOF &&
+
+   this is not a trailer
+
+   token: value
+   EOF
+   git interpret-trailers --trailer "token: value" patch >actual &&
+   test_cmp expected actual
+'
+
 test_expect_success 'with config setup' '
git config trailer.ack.key "Acked-by: " &&
cat >expected <<-\EOF &&
@@ -257,6 +288,8 @@ test_expect_success 'with message that has comments' '
cat >>expected <<-\EOF &&
# comment
 
+   # other comment
+   # yet another comment
Reviewed-by: Johan
Cc: Peff
# last comment
@@ -286,6 +319,8 @@ test_expect_success 'with message that has an old style 
conflict block' '
cat >>expected <<-\EOF &&
# comment
 
+   # other comment
+   # yet another comment
Reviewed-by: Johan
Cc: Peff
# last comment
diff --git a/trailer.c b/trailer.c
index 167b2fd..97e96a9 100644
--- a/trailer.c
+++ b/trailer.c
@@ -26,6 +26,10 @@ static struct conf_info default_conf_info;
 
 struct trailer_item {
struct trailer_item *next;
+   /*
+* If this is not a trailer line, the line is stored in value
+* (excluding the terminating newline) and token is NULL.
+*/
char *token;
char *value;
 };
@@ -63,9 +67,14 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
 
 static int same_token(const struct trailer_item *a, const struct arg_item *b)
 {
-   size_t a_len = token_len_without_separator(a->token, strlen(a->token));
-   size_t b_len = token_len_without_separator(b->token, strlen(b->token));
-   size_t min_len = (a_len > b_len) ? b_len : a_len;
+   size_t a_len, b_len, min_len;
+
+   if (!a->token)
+   return 0;
+
+   a_len = to

[PATCH 1/5] trailer: use singly-linked list, not doubly

Use singly-linked lists (instead of doubly-linked lists) in trailer to
keep track of arguments (whether implicit from configuration or explicit
from the command line) and trailer items.

This change significantly reduces the code length and simplifies the code.
There are now fewer pointers to be manipulated, but most trailer
manipulations now require seeking from beginning to end, so there might
be a slight net decrease in performance; however the number of trailers
is usually small (10 to 15 at the most) so this should not cause a big
impact.
---
 trailer.c | 357 ++
 1 file changed, 125 insertions(+), 232 deletions(-)

diff --git a/trailer.c b/trailer.c
index c6ea9ac..bf3d7d0 100644
--- a/trailer.c
+++ b/trailer.c
@@ -25,7 +25,6 @@ struct conf_info {
 static struct conf_info default_conf_info;
 
 struct trailer_item {
-   struct trailer_item *previous;
struct trailer_item *next;
const char *token;
const char *value;
@@ -56,7 +55,8 @@ static size_t token_len_without_separator(const char *token, 
size_t len)
return len;
 }
 
-static int same_token(struct trailer_item *a, struct trailer_item *b)
+static int same_token(const struct trailer_item *a,
+ const struct trailer_item *b)
 {
size_t a_len = token_len_without_separator(a->token, strlen(a->token));
size_t b_len = token_len_without_separator(b->token, strlen(b->token));
@@ -65,12 +65,14 @@ static int same_token(struct trailer_item *a, struct 
trailer_item *b)
return !strncasecmp(a->token, b->token, min_len);
 }
 
-static int same_value(struct trailer_item *a, struct trailer_item *b)
+static int same_value(const struct trailer_item *a,
+ const struct trailer_item *b)
 {
return !strcasecmp(a->value, b->value);
 }
 
-static int same_trailer(struct trailer_item *a, struct trailer_item *b)
+static int same_trailer(const struct trailer_item *a,
+   const struct trailer_item *b)
 {
return same_token(a, b) && same_value(a, b);
 }
@@ -129,92 +131,6 @@ static void print_all(FILE *outfile, struct trailer_item 
*first, int trim_empty)
}
 }
 
-static void update_last(struct trailer_item **last)
-{
-   if (*last)
-   while ((*last)->next != NULL)
-   *last = (*last)->next;
-}
-
-static void update_first(struct trailer_item **first)
-{
-   if (*first)
-   while ((*first)->previous != NULL)
-   *first = (*first)->previous;
-}
-
-static void add_arg_to_input_list(struct trailer_item *on_tok,
- struct trailer_item *arg_tok,
- struct trailer_item **first,
- struct trailer_item **last)
-{
-   if (after_or_end(arg_tok->conf.where)) {
-   arg_tok->next = on_tok->next;
-   on_tok->next = arg_tok;
-   arg_tok->previous = on_tok;
-   if (arg_tok->next)
-   arg_tok->next->previous = arg_tok;
-   update_last(last);
-   } else {
-   arg_tok->previous = on_tok->previous;
-   on_tok->previous = arg_tok;
-   arg_tok->next = on_tok;
-   if (arg_tok->previous)
-   arg_tok->previous->next = arg_tok;
-   update_first(first);
-   }
-}
-
-static int check_if_different(struct trailer_item *in_tok,
- struct trailer_item *arg_tok,
- int check_all)
-{
-   enum action_where where = arg_tok->conf.where;
-   do {
-   if (!in_tok)
-   return 1;
-   if (same_trailer(in_tok, arg_tok))
-   return 0;
-   /*
-* if we want to add a trailer after another one,
-* we have to check those before this one
-*/
-   in_tok = after_or_end(where) ? in_tok->previous : in_tok->next;
-   } while (check_all);
-   return 1;
-}
-
-static void remove_from_list(struct trailer_item *item,
-struct trailer_item **first,
-struct trailer_item **last)
-{
-   struct trailer_item *next = item->next;
-   struct trailer_item *previous = item->previous;
-
-   if (next) {
-   item->next->previous = previous;
-   item->next = NULL;
-   } else if (last)
-   *last = previous;
-
-   if (previous) {
-   item->previous->next = next;
-   item->previous = NULL;
-   } else if (first)
-   *first = next;
-}
-
-static struct trailer_item *remove_first(struct trailer_item **first)
-{
-   struct trailer_item *item = *first;
-   *first = item->next;
-   if (item->next) {
-   item->next->previous = NULL;
-   item->next = NULL;
-

[PATCH 0/5] allow non-trailers and multiple-line trailers

In [1], Junio explained a possible redesign of trailer support in
interpret-trailers and cherry-pick, something along these lines:

 1. Support non-trailers and multi-line trailers in interpret-trailers
 2. Create a helper function so that the new interpretation can be used
elsewhere (e.g. in sequencer)
 3. Modify "Signed-off-by" and "(cherry picked by" to use the helper
function

My current plans for step 1 and step 2 require relatively significant
refactoring in trailer.c, so I thought that I should send out a patch
set covering only step 1 first for discussion, before continuing with my
work on top of this patch set.

Support for steps 2 and 3, including my original use case of being
looser in the definition of a trailer when invoking "cherry-pick -x"
(and thus suppressing the insertion of a newline) will come in a
subsequent patch set.

[1] Message ID 

Jonathan Tan (5):
  trailer: use singly-linked list, not doubly
  trailer: streamline trailer item create and add
  trailer: make args have their own struct
  trailer: allow non-trailers in trailer block
  trailer: support values folded to multiple lines

 Documentation/git-interpret-trailers.txt |  10 +-
 t/t7513-interpret-trailers.sh| 174 +
 trailer.c| 638 +++
 3 files changed, 480 insertions(+), 342 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

[PATCH 5/5] trailer: support values folded to multiple lines

Currently, interpret-trailers requires that a trailer be only on 1 line.
For example:

  a: first line
 second line

would be interpreted as one trailer line followed by one non-trailer line.

Make interpret-trailers support RFC 822-style folding, treating those
lines as one logical trailer.
---
 Documentation/git-interpret-trailers.txt |   7 +-
 t/t7513-interpret-trailers.sh| 139 +++
 trailer.c|  40 ++---
 3 files changed, 170 insertions(+), 16 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index c480da6..cfec636 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -57,11 +57,12 @@ minus signs start the patch part of the message.
 
 When reading trailers, there can be whitespaces before and after the
 token, the separator and the value. There can also be whitespaces
-inside the token and the value.
+inside the token and the value. The value may be split over multiple lines with
+each subsequent line starting with whitespace, like the "folding" in RFC 822.
 
 Note that 'trailers' do not follow and are not intended to follow many
-rules for RFC 822 headers. For example they do not follow the line
-folding rules, the encoding rules and probably many other rules.
+rules for RFC 822 headers. For example they do not follow
+the encoding rules and probably many other rules.
 
 OPTIONS
 ---
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 7f5cd2a..195cdd3 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -157,6 +157,145 @@ test_expect_success 'with non-trailer lines only' '
test_cmp expected actual
 '
 
+test_expect_success 'multiline field treated as atomic for placement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   name: value
+   another: trailer
+   EOF
+   test_config trailer.name.where after &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for replacement' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: value on
+   Qmultiple lines
+   another: trailer
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   another: trailer
+   name: value
+   EOF
+   test_config trailer.name.ifexists replace &&
+   git interpret-trailers --trailer "name: value" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for difference check' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   test_config trailer.name.ifexists addIfDifferent &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)" patch >actual &&
+   test_cmp expected actual &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line
+   Qsecond line *DIFFERENT*
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   name: first line
+   Qsecond line *DIFFERENT*
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)" patch >actual &&
+   test_cmp expected actual &&
+
+   q_to_tab >trailer <<-\EOF &&
+   name: first line *DIFFERENT*
+   Qsecond line
+   EOF
+   q_to_tab >expected <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   name: first line *DIFFERENT*
+   Qsecond line
+   EOF
+   git interpret-trailers --trailer "$(cat trailer)" patch >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'multiline field treated as atomic for neighbor check' '
+   q_to_tab >patch <<-\EOF &&
+
+   another: trailer
+   name: first line
+   Qsecond line
+   another: trailer
+   EOF
+   test_config trailer.name.where after &&
+   test_config

[PATCH] sequencer: mark a file-local symbol static

2016-10-11 Thread Ramsay Jones


Signed-off-by: Ramsay Jones 
---

Hi Johannes,

If you need to re-roll your 'js/prepare-sequencer' branch, could you
please squash this into commit 53f8024e ("sequencer: completely revamp
the "todo" script parsing", 10-10-2016).

Thanks.

ATB,
Ramsay Jones

 sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index d662c6b..aa65628 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -880,7 +880,7 @@ static void todo_list_release(struct todo_list *todo_list)
todo_list->nr = todo_list->alloc = 0;
 }
 
-struct todo_item *append_new_todo(struct todo_list *todo_list)
+static struct todo_item *append_new_todo(struct todo_list *todo_list)
 {
ALLOC_GROW(todo_list->items, todo_list->nr + 1, todo_list->alloc);
return todo_list->items + todo_list->nr++;
-- 
2.10.0

[PATCHv2] attr: convert to new threadsafe API

This revamps the API of the attr subsystem to be thread safe.
Before we had the question and its results in one struct type.
The typical usage of the API was

static struct git_attr_check check;

if (!check)
check = git_attr_check_initl("text", NULL);

git_check_attr(path, check);
act_on(check->value[0]);

This has a couple of issues when it comes to thread safety:

* the initialization is racy in this implementation. To make it
  thread safe, we need to acquire a mutex, such that only one
  thread is executing the code in git_attr_check_initl.
  As we do not want to introduce a mutex at each call site,
  this is best done in the attr code. However to do so, we need
  to have access to the `check` variable, i.e. the API has to
  look like
git_attr_check_initl(struct git_attr_check*, ...);
  Then one of the threads calling git_attr_check_initl will
  acquire the mutex and init the `check`, while all other threads
  will wait on the mutex just to realize they're late to the
  party and they'll return with no work done.

* While the check for attributes to be questioned only need to
  be initalized once as that part will be read only after its
  initialisation, the answer may be different for each path.
  Because of that we need to decouple the check and the answer,
  such that each thread can obtain an answer for the path it
  is currently processing.

This commit changes the API and adds locking in
git_attr_check_initl that provides the thread safety.

Signed-off-by: Stefan Beller 
---

I think this patch is the most interesting patch, so I'll refrain from
resending the other 27 patches, though I have adressed the review comments
locally. I'll resend everything once we are in agreement for this one.

This version adds the actual thread safety,
that is promised in the documentation, however it doesn't add any optimization,
i.e. it's a single global lock. But as we do not expect contention, that is 
fine.

Also there is no example of how to use the thread safe API for asking questions
from multiple threads.

Thanks,
Stefan

 Documentation/technical/api-gitattributes.txt |  93 +--
 archive.c |  14 ++-
 attr.c| 128 +-
 attr.h|  80 ++--
 builtin/check-attr.c  |  30 +++---
 builtin/pack-objects.c|  12 ++-
 convert.c |  35 +++
 ll-merge.c|  27 --
 userdiff.c|  19 ++--
 ws.c  |  15 +--
 10 files changed, 292 insertions(+), 161 deletions(-)

diff --git a/Documentation/technical/api-gitattributes.txt 
b/Documentation/technical/api-gitattributes.txt
index 92fc32a..ac97244 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -8,6 +8,18 @@ attributes to set of paths.
 Data Structure
 --
 
+extern struct git_attr *git_attr(const char *);
+
+/*
+ * Return the name of the attribute represented by the argument.  The
+ * return value is a pointer to a null-delimited string that is part
+ * of the internal data structure; it should not be modified or freed.
+ */
+extern const char *git_attr_name(const struct git_attr *);
+
+extern int attr_name_valid(const char *name, size_t namelen);
+extern void invalid_attr_name_message(struct strbuf *, const char *, int);
+
 `struct git_attr`::
 
An attribute is an opaque object that is identified by its name.
@@ -16,15 +28,17 @@ Data Structure
of no interest to the calling programs.  The name of the
attribute can be retrieved by calling `git_attr_name()`.
 
-`struct git_attr_check_elem`::
-
-   This structure represents one attribute and its value.
-
 `struct git_attr_check`::
 
-   This structure represents a collection of `git_attr_check_elem`.
+   This structure represents a collection of `struct git_attrs`.
It is passed to `git_check_attr()` function, specifying the
-   attributes to check, and receives their values.
+   attributes to check, and receives their values into a corresponding
+   `struct git_attr_result`.
+
+`struct git_attr_result`::
+
+   This structure represents a collection of results to its
+   corresponding `struct git_attr_check`, that has the same order.
 
 
 Attribute Values
@@ -53,19 +67,30 @@ value of the attribute for the path.
 Querying Specific Attributes
 
 
-* Prepare `struct git_attr_check` using git_attr_check_initl()
+* Prepare a `struct git_attr_check` using `git_attr_check_initl()`
   function, enumerating the names of attributes whose values you are
   interested in, terminated with a NULL pointer.  Alternatively, an
-  empty `struct git_attr_check` can be prepared by calling
-  `git_attr_check_

[PATCH] convert: mark a file-local symbol static

2016-10-11 Thread Ramsay Jones


Signed-off-by: Ramsay Jones 
---

Hi Lars,

If you need to re-roll your 'ls/filter-process' branch, could you
please squash this into commit 85290197
("convert: add filter..process option", 08-10-2016).

Thanks.

ATB,
Ramsay Jones

 convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/convert.c b/convert.c
index fe68316..cf30380 100644
--- a/convert.c
+++ b/convert.c
@@ -568,7 +568,7 @@ static void kill_multi_file_filter(struct hashmap *hashmap, 
struct cmd2process *
free(entry);
 }
 
-void stop_multi_file_filter(struct child_process *process)
+static void stop_multi_file_filter(struct child_process *process)
 {
sigchain_push(SIGPIPE, SIG_IGN);
/* Closing the pipe signals the filter to initiate a shutdown. */
-- 
2.10.0

Re: git 2.10.1 test regression in t3700-add.sh

2016-10-11 Thread Jeremy Huddleston Sequoia


> On Oct 10, 2016, at 10:41, Junio C Hamano  wrote:
> 
> Jeremy Huddleston Sequoia  writes:
> 
>> Actually, looks like that as just a rabbit hole.  The real issue
>> looks to be because an earlier test drops down xfoo3 as a symlink,
>> which causes this test to fail due to the collision.  I'll get out
>> a patch in a bit.
> 
> [administrivia: please don't top-post, it is extremely hard to
> follow what is going on]
> 
> There indeed is a test that creates xfoo3 as a symbolic link and
> leaves it in the filesystem pointing at xfoo2 much earlier in the
> sequence.  It seems that later one of the "add ignore-errors" tests
> (there are two back-to-back) runs "git reset --hard" to make it (and
> other symbolic links that are similarly named) go away, namely this
> part of the code:
> 
>test_expect_success POSIXPERM,SANITY 'git add --ignore-errors' '
>git reset --hard &&
>date >foo1 &&
>date >foo2 &&
>chmod 0 foo2 &&
>test_must_fail git add --verbose --ignore-errors . &&
>git ls-files foo1 | grep foo1
>'
> 
>rm -f foo2
> 
>test_expect_success POSIXPERM,SANITY 'git add (add.ignore-errors)' '
>git config add.ignore-errors 1 &&
>git reset --hard &&
>date >foo1 &&
>date >foo2 &&
>chmod 0 foo2 &&
>test_must_fail git add --verbose . &&
>git ls-files foo1 | grep foo1
>'
>rm -f foo2
> 
> "git reset --hard" in the first one, because these symbolic links
> are not in the index at that point in the sequence, would leave them
> untracked and in the working tree.  Then "add" is told to be
> non-atomic with "--ignore-errors", adding them to the index while
> reporting a failure.  When the test after that runs "git reset --hard"
> again, because these symlinks are in the index (and not in HEAD),
> they are removed from the working tree.
> 
> And that is why normal people won't see xfoo3 in later tests, like
> the one you had trouble with.
> 
> Are you running without SANITY by any chance (I am not saying that
> you are doing a wrong thing--just trying to make sure I am guessing
> along the correct route)?

Ah, yep.  That's the ticket:

ok 23 # skip git add should fail atomically upon an unreadable file (missing 
SANITY of POSIXPERM,SANITY)
ok 24 # skip git add --ignore-errors (missing SANITY of POSIXPERM,SANITY)
ok 25 # skip git add (add.ignore-errors) (missing SANITY of POSIXPERM,SANITY)
ok 26 # skip git add (add.ignore-errors = false) (missing SANITY of 
POSIXPERM,SANITY)
ok 27 # skip --no-ignore-errors overrides config (missing SANITY of 
POSIXPERM,SANITY)

I dug into it a bit, and since I'm running the tests during a staging phase 
which runs as root, !SANITY gets set.  Should be solvable by essentially 
breaking test out into post-build instead of pre-install.  Thanks for the 
pointer there.


> If that is the issue, then I think the right correction would be to
> make sure that the files that an individual test expects to be
> missing from the file system is indeed missing by explicitly
> removing it, perhaps like this.
> 
> I also notice that the problematic test uses "chmod 755"; don't we
> need POSIXPERM prerequisite on this test, too, I wonder?
> 
> Thanks.
> 
> -- >8 --
> t3700: fix broken test under !SANITY
> 
> An "add --chmod=+x" test recently added by 610d55af0f ("add: modify
> already added files when --chmod is given", 2016-09-14) used "xfoo3"
> as a test file.  The paths xfoo[1-3] were used by earlier tests for
> symbolic links but they were expected to have been removed by the
> test script reached this new test.
> 
> The removal with "git reset --hard" however happened in tests that
> are protected by POSIXPERM,SANITY prerequisites.  Platforms and test
> environments that lacked these would have seen xfoo3 as a leftover
> symbolic link, pointing somewhere else, and chmod test would have
> given a wrong result.
> 
> 
> 
> t/t3700-add.sh | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
> index 924a266126..53c0cb6dea 100755
> --- a/t/t3700-add.sh
> +++ b/t/t3700-add.sh
> @@ -350,6 +350,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add 
> --chmod=+x with symlinks' '
> '
> 
> test_expect_success 'git add --chmod=[+-]x changes index with already added 
> file' '
> + rm -f foo3 xfoo3 &&
>   echo foo >foo3 &&
>   git add foo3 &&
>   git add --chmod=+x foo3 &&


I actually tried that, but the problem is that xfoo3 was previously added as a 
symlink, so test_mode_in_index still reports it as a symlink.

It's fundamentally a question of who is responsible for cleanup.  Is the 
individual test responsible for cleaning up after itself (such that later tests 
can rely on a clean state), or should individual tests assume that the initial 
state might be undefined and try to cleanup after earlier tests?  I'm in favor 
of either doing the former or ensuring that tests don't step on each-o

Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 3:59 PM, Jeff King  wrote:
> On Tue, Oct 11, 2016 at 03:50:36PM -0700, Stefan Beller wrote:
>
>> I agree. Though even for implementing the "dumb" case of fetching
>> objects twice we'd have to take care of some racing issues, I would assume.
>>
>> Why did you put a "sleep 2" below?
>> * a slow start to better spread load locally? (keep the workstation 
>> responsive?)
>> * a slow start to have different fetches in a different phase of the
>> fetch protocol?
>> * avoiding some subtle race?
>>
>> At the very least we would need a similar thing as Jeff recently sent for the
>> push case with objects quarantined and then made available in one go?
>
> I don't think so. The object database is perfectly happy with multiple
> simultaneous writers, and nothing impacts the have/wants until actual
> refs are written. Quarantining objects before the refs are written is an
> orthogonal concept.

If a remote advertises its tips, we'd need to look these up (clientside) to
decide if we have them, and I do not think we'd do that via a reachability
check, but via direct lookup in the object data base? So I do not quite
understand, what we gain from the atomic ref writes in e.g. remote/origin/.


> I'm not altogether convinced that parallel fetch would be that much
> faster, though.

Ok, time to present data... Let's assume a degenerate case first:
"up-to-date with all remotes" because that is easy to reproduce.

I have 14 remotes currently:

$ time git fetch --all
real 0m18.016s
user 0m2.027s
sys 0m1.235s

$ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs
-P 14 -I % git fetch %
real 0m5.168s
user 0m2.312s
sys 0m1.167s

A factor of >3, so I suspect there is improvement ;)

Well just as Ævar pointed out, there is some improvement.

>
> I usually just do a one-off fetch of their URL in such a case, exactly
> because I _don't_ want to end up with a bunch of remotes. You can also
> mark them with skipDefaultUpdate if you only care about them
> occasionally (so you can "git fetch sbeller" when you care about it, but
> it doesn't slow down your daily "git fetch").

And I assume you don't want the remotes because it takes time to fetch and not
because your disk space is expensive. ;)

>
> -Peff

Re: Make `git fetch --all` parallel?

2016-10-11 Thread Ævar Arnfjörð Bjarmason

On Wed, Oct 12, 2016 at 12:59 AM, Jeff King  wrote:

> I'm not altogether convinced that parallel fetch would be that much
> faster, though.

I have local aliases to use GNU parallel for stuff like this, on my
git.git which has accumulated 17 remotes:

$ time parallel -j1 'git fetch {}' ::: $(git remote)
real0m18.265s
$ time parallel -j8 'git fetch {}' ::: $(git remote)
real0m2.957s

In that case I didn't have any new objects to fetch, but just doing
the negotiation in parallel was a lot faster.

So there's big wins in some cases.

Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 03:50:36PM -0700, Stefan Beller wrote:

> I agree. Though even for implementing the "dumb" case of fetching
> objects twice we'd have to take care of some racing issues, I would assume.
> 
> Why did you put a "sleep 2" below?
> * a slow start to better spread load locally? (keep the workstation 
> responsive?)
> * a slow start to have different fetches in a different phase of the
> fetch protocol?
> * avoiding some subtle race?
> 
> At the very least we would need a similar thing as Jeff recently sent for the
> push case with objects quarantined and then made available in one go?

I don't think so. The object database is perfectly happy with multiple
simultaneous writers, and nothing impacts the have/wants until actual
refs are written. Quarantining objects before the refs are written is an
orthogonal concept.

I'm not altogether convinced that parallel fetch would be that much
faster, though. Your bottleneck for a fetch is generally the network for
most of the time, then a brief spike of CPU during delta resolution. You
might get some small benefit from overlapping the fetches so that you
spend CPU on one while you spend network on the other, but I doubt it
would be nearly as beneficial as the parallel submodule clones (which
generally have a bigger CPU segment, and also are generally considered
independent, so there's no real tradeoff of getting duplicate objects).

Sometimes the bottleneck is the server preparing the back, but if that
is the case, you should probably complain to your server admin to enable
bitmaps. :)

> I would love to see the implementation though, as over time I accumulate
> a lot or remotes. (Someone published patches on the mailing list and made
> them available somewhere hosted? Grabbing them from their hosting site
> is easier than applying patches for me, so I'd rather fetch them... so I have
> some remotes now)

I usually just do a one-off fetch of their URL in such a case, exactly
because I _don't_ want to end up with a bunch of remotes. You can also
mark them with skipDefaultUpdate if you only care about them
occasionally (so you can "git fetch sbeller" when you care about it, but
it doesn't slow down your daily "git fetch").

-Peff

Re: Make `git fetch --all` parallel?

> I dunno, if documented though.

http://stackoverflow.com/questions/26373995/how-to-control-the-order-of-fetching-when-fetching-all-remotes-by-git-fetch-al

We do not give promises about the order of --all (checked with our
documentation as well), however there seems to be a
grouping scheme for remotes that you can order via
setting remotes.default (which is not documented in man git config,
but only in the git remote man page)

Re: Make `git fetch --all` parallel?

Stefan Beller  writes:

> Why did you put a "sleep 2" below?

The assumption was to fetch from faster and near the center of the
project universe early, so by giving them head-start, fetches that
start in later rounds may have chance to see newly updated remote
tracking refs when telling the poorer other ends what we have.

> At the very least we would need a similar thing as Jeff recently sent for the
> push case with objects quarantined and then made available in one go?

There is no race; ref updates are done only after objects are fully
finalized.  You can do the quarantine but that would defeat the "let
ones from the center of universe finish early so later ones from
poor periphery have more .have's to work with" idea, I suspect.

Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 3:37 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> So I do think it would be much faster, but I also think patches for this 
>> would
>> require some thought and a lot of refactoring of the fetch code.
>> ...
>> During the negotiation phase a client would have to be able to change its
>> mind (add more "haves", or in case of the parallel fetching these become
>> "will-have-soons", although the remote figured out the client did not have it
>> earlier.)
>
> Even though a fancy optimization as you outlined might be ideal, I
> suspect that users would be happier if the network bandwidth is
> utilized to talk to multiple remotes at the same time even if they
> end up receiving the same recent objects from more than one place in
> the end.

I agree. Though even for implementing the "dumb" case of fetching
objects twice we'd have to take care of some racing issues, I would assume.

Why did you put a "sleep 2" below?
* a slow start to better spread load locally? (keep the workstation responsive?)
* a slow start to have different fetches in a different phase of the
fetch protocol?
* avoiding some subtle race?

At the very least we would need a similar thing as Jeff recently sent for the
push case with objects quarantined and then made available in one go?

>
> Is the order in which "git fetch --all" iterates over "all remotes"
> predictable and documented?

it is predictable, as it is just the same order as put by grep in
$ grep "\[remote " .git/config, i.e. in order of the file, which in my
case turns out to be sorted by importance/history quite naturally.
But reordering my config file would be not a big deal.

I dunno, if documented though.

> If so, listing the remotes from more
> powerful and well connected place to slower ones and then doing an
> equivalent of stupid
>
> for remote in $list_of_remotes_ordered_in_such_a_way

list_of_remotes_ordered_in_such_a_way is roughly:
$(git config --get-regexp remote.*.url | tr '.' ' ' |awk '{print $2}')

> do
> git fetch "$remote" &
> sleep 2
> done
>
> might be fairly easy thing to bring happiness.

I would love to see the implementation though, as over time I accumulate
a lot or remotes. (Someone published patches on the mailing list and made
them available somewhere hosted? Grabbing them from their hosting site
is easier than applying patches for me, so I'd rather fetch them... so I have
some remotes now)

Re: Make `git fetch --all` parallel?

Stefan Beller  writes:

> So I do think it would be much faster, but I also think patches for this would
> require some thought and a lot of refactoring of the fetch code.
> ...
> During the negotiation phase a client would have to be able to change its
> mind (add more "haves", or in case of the parallel fetching these become
> "will-have-soons", although the remote figured out the client did not have it
> earlier.)

Even though a fancy optimization as you outlined might be ideal, I
suspect that users would be happier if the network bandwidth is
utilized to talk to multiple remotes at the same time even if they
end up receiving the same recent objects from more than one place in
the end.

Is the order in which "git fetch --all" iterates over "all remotes"
predictable and documented?  If so, listing the remotes from more
powerful and well connected place to slower ones and then doing an
equivalent of stupid

for remote in $list_of_remotes_ordered_in_such_a_way
do
git fetch "$remote" &
sleep 2
done

might be fairly easy thing to bring happiness.

Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-11 Thread Lars Schneider


> On 09 Oct 2016, at 01:06, Jakub Narębski  wrote:
> 
> Part 1 of review, starting with the protocol v2 itself.
> 
> W dniu 08.10.2016 o 13:25, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider 
>> 
>> +upon checkin. By default these commands process only a single
>> +blob and terminate.  If a long running `process` filter is used
>> +in place of `clean` and/or `smudge` filters, then Git can process
>> +all blobs with a single filter command invocation for the entire
>> +life of a single Git command, for example `git add --all`.  See
>> +section below for the description of the protocol used to
>> +communicate with a `process` filter.
> 
> I don't remember how this part looked like in previous versions
> of this patch series, but "... is used in place of `clean` ..."
> does not tell explicitly about the precedence of those 
> configuration variables.  I think it should be stated explicitly
> that `process` takes precedence over any `clean` and/or `smudge`
> settings for the same `filter.` (regardless of whether
> the long running `process` filter support "clean" and/or "smudge"
> operations or not).

This is stated explicitly later on. I moved it up here:

"If a long running `process` filter is used
in place of `clean` and/or `smudge` filters, then Git can process
all blobs with a single filter command invocation for the entire
life of a single Git command, for example `git add --all`. If a 
long running `process` filter is configured then it always takes 
precedence over a configured single blob filter. "

OK?


>> +If the filter command (a string value) is defined via
>> +`filter..process` then Git can process all blobs with a
>> +single filter invocation for the entire life of a single Git
>> +command. This is achieved by using a packet format (pkt-line,
>> +see technical/protocol-common.txt) based protocol over standard
>> +input and standard output as follows. All packets, except for the
>> +"*CONTENT" packets and the "" flush packet, are considered
>> +text and therefore are terminated by a LF.
> 
> Maybe s/standard input and output/\& of filter process,/ (that is,
> add "... of filter process," to the third sentence in the above
> paragraph).

You mean "This is achieved by using a packet format (pkt-line,
see technical/protocol-common.txt) based protocol over standard
input and standard output of filter process as follows." ?

I think I like the original version better.


>> After the filter started
> Git sends a welcome message ("git-filter-client"), a list of
>> supported protocol version numbers, and a flush packet. Git expects
>> +to read a welcome response message ("git-filter-server") and exactly
>> +one protocol version number from the previously sent list. All further
>> +communication will be based on the selected version. The remaining
>> +protocol description below documents "version=2". Please note that
>> +"version=42" in the example below does not exist and is only there
>> +to illustrate how the protocol would look like with more than one
>> +version.
>> +
>> +After the version negotiation Git sends a list of all capabilities that
>> +it supports and a flush packet. Git expects to read a list of desired
>> +capabilities, which must be a subset of the supported capabilities list,
>> +and a flush packet as response:
>> +
>> +packet:  git> git-filter-client
>> +packet:  git> version=2
>> +packet:  git> version=42
>> +packet:  git> 
>> +packet:  git< git-filter-server
>> +packet:  git< version=2
>> +packet:  git> clean=true
>> +packet:  git> smudge=true
>> +packet:  git> not-yet-invented=true
>> +packet:  git> 
>> +packet:  git< clean=true
>> +packet:  git< smudge=true
>> +packet:  git< 
> 
> WARNING: This example is different from description!!!

Can you try to explain the difference more clearly? I read it multiple
times and I think this is sound.


> In example you have Git sending "git-filter-client" and list of supported
> protocol versions, terminated with flush packet,

Correct.


> then filter driver
> process sends "git-filter-server", exactly one version, *AND* list of
> supported capabilities in "=true" format, terminated with
> flush packet.

Correct. That's what I read in the text and in the example.

> 
> In description above the example you have 4-part handshake, not 3-part;
> the filter is described to send list of supported capabilities last
> (a subset of what Git command supports).

Part 1: Git sends a welcome message...
Part 2: Git expects to read a welcome response message...
Part 3: After the version negotiation Git sends a list of all capabilities...
Part 4: Git expects to read a list of desired capabilities...

I think example and text match, no?


> Moreover in the example in
> previous version at least as far as v8 of this series, the response
> from filter driver was fixed length list of two lines: magic string
>

Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-11 Thread Lars Schneider


> On 09 Oct 2016, at 07:32, Torsten Bögershausen  wrote:
> 
> On 09.10.16 01:06, Jakub Narębski wrote:
>>> +
 +packet:  git< status=abort
 +packet:  git< 
 +
 +
 +After the filter has processed a blob it is expected to wait for
 +the next "key=value" list containing a command. Git will close
 +the command pipe on exit. The filter is expected to detect EOF
 +and exit gracefully on its own.
>> Any "kill filter" solutions should probably be put here.  I guess
>> that filter exiting means EOF on its standard output when read
>> by Git command, isn't it?
>> 
> Isn't it that Git closes the command pipe, then filter sees EOF on it's stdin
> 
> and does a graceful exit.

Correct!

- Lars

Re: [PATCH 26/28] attr: make git_attr_counted static

On Tue, Oct 11, 2016 at 10:37 AM, Brandon Williams  wrote:
> On 10/10, Stefan Beller wrote:
>> It's not used outside the attr code, so let's keep it private.
>>
>> Change-Id: I0d15e0f2ea944b31d68b9cf1a4edecac0ca2170d
>
> Looks like you forgot to remove this :)

will be fixed in a reroll.
thanks!

>
> --
> Brandon Williams

Re: What's cooking in git.git (Oct 2016, #03; Tue, 11)

Stefan Beller  writes:

> On Tue, Oct 11, 2016 at 2:06 PM, Junio C Hamano  wrote:
>>
>> * sb/submodule-ignore-trailing-slash (2016-10-10) 2 commits
>>   (merged to 'next' on 2016-10-11 at e37425ed17)
>>  + submodule: ignore trailing slash in relative url
>>  + submodule: ignore trailing slash on superproject URL
>>
>>  A minor regression fix for "git submodule".
>>
>>  Will merge to 'master'.
>
> Going by the bug report, this *may* be more than
> minor and worth merging down to maint as well, eventually.

The topic was forked at a reasonably old commit so that it can be
merged as far down to maint-2.9 if we wanted to.  Which means the
regression was fairly old and fix is not all that urgent as well.

Thanks.

Re: [PATCH 0/2] Support `git reset --stdin`

Jeff King  writes:

> Anyway, the existence of this discussion is probably a good argument in
> favor of Dscho's patch. I was mostly curious how close our plumbing
> tools could come.

Sure, in 100% agreement.

Re: [PATCH 0/2] Support `git reset --stdin`

On Tue, Oct 11, 2016 at 02:36:31PM -0700, Junio C Hamano wrote:

> > True. I'd have done something more like:
> >
> >   git ls-tree -r $paths | git update-index --index-info
> >
> > but there are some corner cases around deleting paths from the index.
> 
> Ah, I would think read-tree has the exact same issue, even if we
> added pathspec support, around removal.
> 
> So it is more like
> 
>   (
>   printf "0 \t%s\n" $paths
>   git --literal-pathspecs ls-tree -r --ignore-missing $paths
>   ) | git update-index --index-info
> 
> which does not look too bad, even though this
> 
>   printf "%s\n" $paths | git reset --stdin
> 
> does look shorter.

Of course neither of ours solutions works when "$paths" is coming on
stdin, rather than in a variable, which I suspect was Dscho's original
motivation. :)

One reason not to do the unconditional $z40 in yours is that without it,
I would hope that update-index is smart enough not to discard the stat
information for entries which are unchanged.

I suspect the best answer is more like:

  git diff-index --cached HEAD | git update-index --index-info

except that you have to munge the data in between, because update-index
does not know how to pick the correct data out of the --raw diff output.
But that's probably closer to what git-reset does internally.

Anyway, the existence of this discussion is probably a good argument in
favor of Dscho's patch. I was mostly curious how close our plumbing
tools could come.

-Peff

Re: What's cooking in git.git (Oct 2016, #03; Tue, 11)

On Tue, Oct 11, 2016 at 2:39 PM, Stefan Beller  wrote:
> On Tue, Oct 11, 2016 at 2:06 PM, Junio C Hamano  wrote:
>>
>> * sb/submodule-ignore-trailing-slash (2016-10-10) 2 commits
>>   (merged to 'next' on 2016-10-11 at e37425ed17)
>>  + submodule: ignore trailing slash in relative url
>>  + submodule: ignore trailing slash on superproject URL
>>
>>  A minor regression fix for "git submodule".
>>
>>  Will merge to 'master'.
>>
>
> Going by the bug report, this *may* be more than
> minor and worth merging down to maint as well, eventually.

and here is the actual bug report:

https://public-inbox.org/git/CAL4SumgJbrirymt5+iyNbpo++xXfzJZRiHDm8=0+eCArpCX=d...@mail.gmail.com/

Re: What's cooking in git.git (Oct 2016, #03; Tue, 11)

On Tue, Oct 11, 2016 at 2:06 PM, Junio C Hamano  wrote:
>
> * sb/submodule-ignore-trailing-slash (2016-10-10) 2 commits
>   (merged to 'next' on 2016-10-11 at e37425ed17)
>  + submodule: ignore trailing slash in relative url
>  + submodule: ignore trailing slash on superproject URL
>
>  A minor regression fix for "git submodule".
>
>  Will merge to 'master'.
>

Going by the bug report, this *may* be more than
minor and worth merging down to maint as well, eventually.

Re: [PATCH 0/2] Support `git reset --stdin`

Jeff King  writes:

>> If read-tree had pathspec support (i.e. "read these and only these
>> paths given from the command line into the index from a given
>> tree-ish"), that would have been the most natural place to extend
>> with "oh, by the way, instead of the command line, you can feed the
>> paths on the standard input".
>> 
>> But it doesn't have one.
>
> True. I'd have done something more like:
>
>   git ls-tree -r $paths | git update-index --index-info
>
> but there are some corner cases around deleting paths from the index.

Ah, I would think read-tree has the exact same issue, even if we
added pathspec support, around removal.

So it is more like

(
printf "0 \t%s\n" $paths
git --literal-pathspecs ls-tree -r --ignore-missing $paths
) | git update-index --index-info

which does not look too bad, even though this

printf "%s\n" $paths | git reset --stdin

does look shorter.

Re: [PATCH 0/2] Support `git reset --stdin`

On Tue, Oct 11, 2016 at 12:27:21PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Is git-reset the right layer to add scripting features? I thought we
> > usually pushed people doing mass index manipulation to use update-index
> > or read-tree. Is there something that reset makes easy that is hard with
> > those tools (I could imagine "--hard", but I see it is not supported
> > with your patch).
> >
> > Not that I'm necessarily opposed to the patch, I was just surprised.
> 
> If read-tree had pathspec support (i.e. "read these and only these
> paths given from the command line into the index from a given
> tree-ish"), that would have been the most natural place to extend
> with "oh, by the way, instead of the command line, you can feed the
> paths on the standard input".
> 
> But it doesn't have one.

True. I'd have done something more like:

  git ls-tree -r $paths | git update-index --index-info

but there are some corner cases around deleting paths from the index.

-Peff

Re: [PATCH] upload-pack: use priority queue in reachable() check

Jeff King  writes:

> Like a lot of old commit-traversal code, this keeps a
> commit_list in commit-date order, and and inserts parents
> into the list. This means each insertion is potentially
> linear, and the whole thing is quadratic (though the exact
> runtime depends on the relationship between the commit dates
> and the parent topology).
>
> These days we have a priority queue, which can do the same
> thing with a much better worst-case time.
>
> Signed-off-by: Jeff King 
> ---
> This was inspired by a real-world case where several instances of
> upload-pack were chewing minutes of CPU, and backtraces in each instance
> pointed to this function.  Unfortunately, I only saw the server side of
> these requests. I pulled the contents of have_obj and want_obj out of
> the process via gdb, but I wasn't able to reproduce the slow fetch.
>
> So I'm not 100% sure that this fixes it, but the problem hasn't happened
> again. And it certainly seems like it couldn't _hurt_ to optimize this.

This does look good and looks like a quite straight-forward change.

Will queue; thanks.

Re: interactive rebase should better highlight the not-applying commit

On Tue, Oct 11, 2016 at 12:07 PM, Joshua N Pritikin  wrote:
> As of GIT 2.8.1, if you do an interactive rebase and get some conflict
> in the stack of patches then the commit with the conflict is buried in
> 4-5 lines of output. It is visually difficult to immediately pick out
> which commit did not apply cleanly. I suggest highlighting the 1 line
> commit summary in red or green or some color to help it stand out from
> all the other output.
>
> I decided to suggest this change after I realized that I probably
> skipped a commit during an interactive rebase instead of resolving the
> conflict. I knew I had to skip some commit so I assumed that I just need
> to skip without reading the commit summary carefully. Now it is 7-15
> days after I did the erroneous rebase. I had to spend a few hours today
> with GIT's archaeology tools to find the lost code.
>

Looking at the actual code, this is not as easy as one might assume,
because rebase is written in shell. (One of the last remaining large commands
in shell), and there is no color support in the die(..) function.

However IIUC currently rebase is completely rewritten/ported to C where it is
easier to add color support as we do have some color support in there already.

[PATCH] upload-pack: use priority queue in reachable() check

Like a lot of old commit-traversal code, this keeps a
commit_list in commit-date order, and and inserts parents
into the list. This means each insertion is potentially
linear, and the whole thing is quadratic (though the exact
runtime depends on the relationship between the commit dates
and the parent topology).

These days we have a priority queue, which can do the same
thing with a much better worst-case time.

Signed-off-by: Jeff King 
---
This was inspired by a real-world case where several instances of
upload-pack were chewing minutes of CPU, and backtraces in each instance
pointed to this function.  Unfortunately, I only saw the server side of
these requests. I pulled the contents of have_obj and want_obj out of
the process via gdb, but I wasn't able to reproduce the slow fetch.

So I'm not 100% sure that this fixes it, but the problem hasn't happened
again. And it certainly seems like it couldn't _hurt_ to optimize this.

 upload-pack.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 5ec21e6..d9e381f 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -16,6 +16,7 @@
 #include "string-list.h"
 #include "parse-options.h"
 #include "argv-array.h"
+#include "prio-queue.h"
 
 static const char * const upload_pack_usage[] = {
N_("git upload-pack [] "),
@@ -319,12 +320,12 @@ static int got_sha1(const char *hex, unsigned char *sha1)
 
 static int reachable(struct commit *want)
 {
-   struct commit_list *work = NULL;
+   struct prio_queue work = { compare_commits_by_commit_date };
 
-   commit_list_insert_by_date(want, &work);
-   while (work) {
+   prio_queue_put(&work, want);
+   while (work.nr) {
struct commit_list *list;
-   struct commit *commit = pop_commit(&work);
+   struct commit *commit = prio_queue_get(&work);
 
if (commit->object.flags & THEY_HAVE) {
want->object.flags |= COMMON_KNOWN;
@@ -340,12 +341,12 @@ static int reachable(struct commit *want)
for (list = commit->parents; list; list = list->next) {
struct commit *parent = list->item;
if (!(parent->object.flags & REACHABLE))
-   commit_list_insert_by_date(parent, &work);
+   prio_queue_put(&work, parent);
}
}
want->object.flags |= REACHABLE;
clear_commit_marks(want, REACHABLE);
-   free_commit_list(work);
+   clear_prio_queue(&work);
return (want->object.flags & COMMON_KNOWN);
 }
 
-- 
2.10.1.572.g142464c

What's cooking in git.git (Oct 2016, #03; Tue, 11)

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

A handful of topics have been merged to 'master' and the upcoming
release started to take shape.  Quite a few topics that no longer
merge cleanly to 'pu' and have been excluded from 'pu' for some time
have finally been moved to the Discarded section (many of them are
expected to be updated and return).

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ak/curl-imap-send-explicit-scheme (2016-08-17) 1 commit
  (merged to 'next' on 2016-09-22 at 4449584c26)
 + imap-send: Tell cURL to use imap:// or imaps://

 When we started cURL to talk to imap server when a new enough
 version of cURL library is available, we forgot to explicitly add
 imap(s):// before the destination.  To some folks, that didn't work
 and the library tried to make HTTP(s) requests instead.


* cp/completion-negative-refs (2016-08-24) 1 commit
  (merged to 'next' on 2016-09-22 at abd1585aa6)
 + completion: support excluding refs

 The command-line completion script (in contrib/) learned to
 complete "git cmd ^mas" to complete the negative end of
 reference to "git cmd ^master".


* dp/autoconf-curl-ssl (2016-06-28) 1 commit
  (merged to 'next' on 2016-09-22 at 9c5aeeced9)
 + ./configure.ac: detect SSL in libcurl using curl-config

 The ./configure script generated from configure.ac was taught how
 to detect support of SSL by libcurl better.


* jc/blame-reverse (2016-06-14) 2 commits
  (merged to 'next' on 2016-09-22 at d1a8e9ce99)
 + blame: dwim "blame --reverse OLD" as "blame --reverse OLD.."
 + blame: improve diagnosis for "--reverse NEW"

 It is a common mistake to say "git blame --reverse OLD path",
 expecting that the command line is dwimmed as if asking how lines
 in path in an old revision OLD have survived up to the current
 commit.


* jk/pack-objects-optim-mru (2016-08-11) 4 commits
  (merged to 'next' on 2016-09-21 at 97b919bdbd)
 + pack-objects: use mru list when iterating over packs
 + pack-objects: break delta cycles before delta-search phase
 + sha1_file: make packed_object_info public
 + provide an initializer for "struct object_info"

 Originally merged to 'next' on 2016-08-11

 "git pack-objects" in a repository with many packfiles used to
 spend a lot of time looking for/at objects in them; the accesses to
 the packfiles are now optimized by checking the most-recently-used
 packfile first.


* nd/shallow-deepen (2016-06-13) 27 commits
  (merged to 'next' on 2016-09-22 at f0cf3e3385)
 + fetch, upload-pack: --deepen=N extends shallow boundary by N commits
 + upload-pack: add get_reachable_list()
 + upload-pack: split check_unreachable() in two, prep for get_reachable_list()
 + t5500, t5539: tests for shallow depth excluding a ref
 + clone: define shallow clone boundary with --shallow-exclude
 + fetch: define shallow boundary with --shallow-exclude
 + upload-pack: support define shallow boundary by excluding revisions
 + refs: add expand_ref()
 + t5500, t5539: tests for shallow depth since a specific date
 + clone: define shallow clone boundary based on time with --shallow-since
 + fetch: define shallow boundary with --shallow-since
 + upload-pack: add deepen-since to cut shallow repos based on time
 + shallow.c: implement a generic shallow boundary finder based on rev-list
 + fetch-pack: use a separate flag for fetch in deepening mode
 + fetch-pack.c: mark strings for translating
 + fetch-pack: use a common function for verbose printing
 + fetch-pack: use skip_prefix() instead of starts_with()
 + upload-pack: move rev-list code out of check_non_tip()
 + upload-pack: make check_non_tip() clean things up on error
 + upload-pack: tighten number parsing at "deepen" lines
 + upload-pack: use skip_prefix() instead of starts_with()
 + upload-pack: move "unshallow" sending code out of deepen()
 + upload-pack: remove unused variable "backup"
 + upload-pack: move "shallow" sending code out of deepen()
 + upload-pack: move shallow deepen code out of receive_needs()
 + transport-helper.c: refactor set_helper_option()
 + remote-curl.c: convert fetch_git() to use argv_array

 The existing "git fetch --depth=" option was hard to use
 correctly when making the history of an existing shallow clone
 deeper.  A new option, "--deepen=", has been added to make this
 easier to use.  "git clone" also learned "--shallow-since="
 and "--shallow-exclude=" options to make it easier to specify
 "I am interested only in the recent N months worth of history" and
 "Give me only the history since that version".


* rs/qsort (2016-10-03) 6 commits
  (merged to 'next' on 2016-10-06 at 32a5bd3c88)
 + show-b

Re: [PATCH 2/2] reset: support the --stdin option

2016-10-11 Thread Jakub Narębski

W dniu 11.10.2016 o 18:09, Johannes Schindelin pisze:

>  SYNOPSIS
>  
>  [verse]
> -'git reset' [-q] [] [--] ...
> +'git reset' [-q] [--stdin [-z]] [] [--] ...

I think you meant here

  +'git reset' [-q] [--stdin [-z]] []

Because you say "*Instead*" below.

> +--stdin::
> + Instead of taking list of paths from the command line,
> + read list of paths from the standard input.  Paths are
> + separated by LF (i.e. one path per line) by default.

And die if  were supplied:

> + if (pathspec.nr)
> + die(_("--stdin is incompatible with path arguments"));

Of course you need to fix it in built-in synopsis as well:

> + N_("git reset [-q] [--stdin [-z]] [] [--] ..."),
>   N_("git reset --patch [] [--] [...]"),

-- 
Jakub Narębski

Re: interactive rebase should better highlight the not-applying commit

On Tue, Oct 11, 2016 at 12:07 PM, Joshua N Pritikin  wrote:
> I assume somebody familiar with GIT's code base could make this change
> in about 10 minutes.

Can you elaborate how you come to that estimate?

Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 1:12 PM, Ram Rachum  wrote:
> Hi everyone!
>
> I have a repo that has a bunch of different remotes, and I noticed
> slowness when doing `git fetch --all`. Is it currently made
> sequentially? Do you think that maybe it could be done in parallel so
> it could be much faster?
>
> Thanks,
> Ram.

If you were to run fetching from each remote in parallel
assuming the work load is unchanged, this would speed up the
execution by the number of remotes.

This translation sounds pretty easy at first, but when looking into
the details it is not as easy any more:

What if 2 remotes have the same object (e.g. the same commit)?
Currently this is easy: The first remote to fetch from will deliver that
object to you.

When fetching in parallel, we would want to download that object from
just one remote, preferably the remote with better network connectivity(?)

So I do think it would be much faster, but I also think patches for this would
require some thought and a lot of refactoring of the fetch code.

The current fetch protocol is roughly:

remote: I have these refs:
8a36cd87b7c85a651ab388d403629865ffa3ba0d HEAD
10d26b0d1ef1ebfd09418ec61bdadc299ac988e2 refs/heads/ab/gitweb-abbrev-links
77947bbe24e0306d1ce5605c962c4a25f5aca22f refs/heads/ab/gitweb-link-html-escape
...

client: I want 8a36cd87b7c85a651ab388d403629865ffa3ba0d,
and I have 231ce93d2a0b0b4210c810e865eb5db7ba3032b2
and I have 02d0927973782f4b8b7317b499979fada1105be6
and I have 1172e16af07d6e15bca6398f0ded18a0ae7b9249

remote: I don't know about 231ce93d2a0b0b4210c810e865eb5db7ba3032b2,
nor 02d0927973782f4b8b7317b499979fada1105be6, but
I know about 1172e16af07d6e15bca6398f0ded18a0ae7b9249

 conversation continues...

remote: Ok I figured out what you need, here is a packfile:



During the negotiation phase a client would have to be able to change its
mind (add more "haves", or in case of the parallel fetching these become
"will-have-soons", although the remote figured out the client did not have it
earlier.)

If you want to see more details, see Documentation/technical/pack-protocol.txt

Re: [PATCH] contrib: add credential helper for libsecret

On Tue, 2016-10-11 at 13:13 -0700, Junio C Hamano wrote:
> Dennis Kaarsemaker  writes:
> 
> > On Tue, 2016-10-11 at 22:48 +0300, Mantas Mikulėnas wrote:
> > > On 2016-10-11 22:36, Junio C Hamano wrote:
> > > > Thanks for a review.  I'll wait until one of (1) a squashable patch
> > > > to address the "we do not want unconditional overwrite" issue, (2) a
> > > > reroll from Mantas to do the same, or (3) a counter-argument from
> > > > somebody to explain why unconditional overwrite is a good idea here
> > > > (but not in the original) appears.
> > > 
> > > 
> > > 
> > > I overlooked that. I can write a patch, but it shouldn't make any
> > > difference in practice – if c->username *was* set, then it would also
> > > get added to the search attribute list, therefore the search couldn't
> > > possibly return any results with a different username anyway.
> > 
> > 
> > Makes sense, so a (3) it is.
> 
> 
> So... does it mean the gnome-keyring one needs a bugfix?

I'd say both behaviours are correct.

When you do a search without a username, both helpers fill in the
username returned by the actual credential storage. When you do a
search with a username, the gnome-keyring helper won't overwrite the
value passed in and the libsecret helper overwrites it with the same
value, as the search can only return matches that have the same value.

D.

Re: [PATCH] contrib: add credential helper for libsecret

Junio C Hamano  writes:

> Dennis Kaarsemaker  writes:
>
>> On Tue, 2016-10-11 at 22:48 +0300, Mantas Mikulėnas wrote:
>>> On 2016-10-11 22:36, Junio C Hamano wrote:
>>> > Thanks for a review.  I'll wait until one of (1) a squashable patch
>>> > to address the "we do not want unconditional overwrite" issue, (2) a
>>> > reroll from Mantas to do the same, or (3) a counter-argument from
>>> > somebody to explain why unconditional overwrite is a good idea here
>>> > (but not in the original) appears.
>>> 
>>> 
>>> I overlooked that. I can write a patch, but it shouldn't make any
>>> difference in practice – if c->username *was* set, then it would also
>>> get added to the search attribute list, therefore the search couldn't
>>> possibly return any results with a different username anyway.
>>
>> Makes sense, so a (3) it is.
>
> So... does it mean the gnome-keyring one needs a bugfix?

Just so there is no misunderstanding, updating (or not)
gnome-keyring code is an unrelated issue.  

I'll queue the patch under discussion in this thread, and if an
update to gnome-keyring appears, that will be queued separately.

Thanks again, both of you.

Re: [PATCH] contrib: add credential helper for libsecret

Dennis Kaarsemaker  writes:

> On Tue, 2016-10-11 at 22:48 +0300, Mantas Mikulėnas wrote:
>> On 2016-10-11 22:36, Junio C Hamano wrote:
>> > Thanks for a review.  I'll wait until one of (1) a squashable patch
>> > to address the "we do not want unconditional overwrite" issue, (2) a
>> > reroll from Mantas to do the same, or (3) a counter-argument from
>> > somebody to explain why unconditional overwrite is a good idea here
>> > (but not in the original) appears.
>> 
>> 
>> I overlooked that. I can write a patch, but it shouldn't make any
>> difference in practice – if c->username *was* set, then it would also
>> get added to the search attribute list, therefore the search couldn't
>> possibly return any results with a different username anyway.
>
> Makes sense, so a (3) it is.

So... does it mean the gnome-keyring one needs a bugfix?

Make `git fetch --all` parallel?

2016-10-11 Thread Ram Rachum

Hi everyone!

I have a repo that has a bunch of different remotes, and I noticed
slowness when doing `git fetch --all`. Is it currently made
sequentially? Do you think that maybe it could be done in parallel so
it could be much faster?

Thanks,
Ram.

Re: [PATCH] contrib: add credential helper for libsecret

On Tue, 2016-10-11 at 22:48 +0300, Mantas Mikulėnas wrote:
> On 2016-10-11 22:36, Junio C Hamano wrote:
> > Thanks for a review.  I'll wait until one of (1) a squashable patch
> > to address the "we do not want unconditional overwrite" issue, (2) a
> > reroll from Mantas to do the same, or (3) a counter-argument from
> > somebody to explain why unconditional overwrite is a good idea here
> > (but not in the original) appears.
> 
> 
> I overlooked that. I can write a patch, but it shouldn't make any
> difference in practice – if c->username *was* set, then it would also
> get added to the search attribute list, therefore the search couldn't
> possibly return any results with a different username anyway.

Makes sense, so a (3) it is.

D.

Re: [PATCH] t4014-format-patch: Adjust git_version regex to better handle distro changes to DEF_VER

2016-10-11 Thread Eric Wong

Junio C Hamano  wrote:
> Jeremy Huddleston Sequoia  writes:
> 
> > CC: Josh Triplett 
> > CC: Junio C Hamano 
> 
> Please don't do this in your log message.  These belong to your
> e-mail headers, not here.

Fwiw, I prefer having these trailers.  It makes it easier to
maintain the Cc: list through multiple iterations/authors and is
also common practice on LKML.

Re: [PATCH] contrib: add credential helper for libsecret

2016-10-11 Thread Mantas Mikulėnas

On 2016-10-11 22:48, Mantas Mikulėnas wrote:
> On 2016-10-11 22:36, Junio C Hamano wrote:
>> Thanks for a review.  I'll wait until one of (1) a squashable patch
>> to address the "we do not want unconditional overwrite" issue, (2) a
>> reroll from Mantas to do the same, or (3) a counter-argument from
>> somebody to explain why unconditional overwrite is a good idea here
>> (but not in the original) appears.
> 
> I overlooked that. I can write a patch, but it shouldn't make any
> difference in practice – if c->username *was* set, then it would also
> get added to the search attribute list, therefore the search couldn't
> possibly return any results with a different username anyway.

On a second thought, it doesn't actually make sense _not_ to override
the username. Let's say the search query *somehow* returned a different
account than requested. With the original (gnome-keyring helper's)
behavior, the final output would have the old account's username, but
the new account's password – which has very little chance of working.

-- 
Mantas Mikulėnas

Re: [PATCH] contrib: add credential helper for libsecret

2016-10-11 Thread Mantas Mikulėnas

On 2016-10-11 22:36, Junio C Hamano wrote:
> Thanks for a review.  I'll wait until one of (1) a squashable patch
> to address the "we do not want unconditional overwrite" issue, (2) a
> reroll from Mantas to do the same, or (3) a counter-argument from
> somebody to explain why unconditional overwrite is a good idea here
> (but not in the original) appears.

I overlooked that. I can write a patch, but it shouldn't make any
difference in practice – if c->username *was* set, then it would also
get added to the search attribute list, therefore the search couldn't
possibly return any results with a different username anyway.

-- 
Mantas Mikulėnas

Re: [PATCH 28/28] attr: convert to new threadsafe API

Stefan Beller  writes:

>> You can have many different callsites (think: hits "git grep" finds)
>> that call git_attr_check_initl() and they all may be asking for
>> different set of attributes.  As long as they are using different
>> "check" instance to receive these sets of attributes, they are OK.
>
> Right, but that requires a mutex per callsite; up to now I imagined
> a global mutex only, which is how I came to the conclusion.

Hmph, why?  I agree it would make it easier to allow a mutex keyed
by the address of "check" to enhance parallelism, but I do not think
it _requires_ it, in the sense that if you were anticipating low
contention, one mutex to cover any and all callers to initl() would
still give you a correct result.

I wonder if we can assume and rely on atomicity of store, e.g. if we
can cheaply notice "ah this check is already initialized" without
locking.  Then initl() might become something like

initl(char **variable, ...)
{
struct git_attr_check *check;

if (*variable)
return; /* somebody else did it already */
acquire lock;
if (*variable) {
/* somebody else was about to finish when we checked earlier */
release lock;
return;
}
check = alloc();
populate check;
*variable = check;
release lock;
}

and once the system starts running and starts asking the same
question for thousands of paths, the majority of initl() calls
can still be almost no-op, like the original single-threaded

static struct git_attr_check *check = NULL;
if (!check)
check = init();

can avoid heavyweight init() most of the time.

Re: [PATCH 0/2] Support `git reset --stdin`

Jeff King  writes:

> Is git-reset the right layer to add scripting features? I thought we
> usually pushed people doing mass index manipulation to use update-index
> or read-tree. Is there something that reset makes easy that is hard with
> those tools (I could imagine "--hard", but I see it is not supported
> with your patch).
>
> Not that I'm necessarily opposed to the patch, I was just surprised.

If read-tree had pathspec support (i.e. "read these and only these
paths given from the command line into the index from a given
tree-ish"), that would have been the most natural place to extend
with "oh, by the way, instead of the command line, you can feed the
paths on the standard input".  

But it doesn't have one.

Re: [PATCH] contrib: add credential helper for libsecret

Dennis Kaarsemaker  writes:

> On Sun, 2016-10-09 at 15:34 +0300, Mantas Mikulėnas wrote:
>
>> +s = g_hash_table_lookup(attributes, "user");
>> +if (s) {
>> +g_free(c->username);
>> +c->username = g_strdup(s);
>> +}
>
> This always overwrites c->username, the original gnome-keyring version
> only does that when the username isn't set. Other than that it looks
> good to me.
>
> Reviewed-by: Dennis Kaarsemaker 
> Tested-by: Dennis Kaarsemaker 

Thanks for a review.  I'll wait until one of (1) a squashable patch
to address the "we do not want unconditional overwrite" issue, (2) a
reroll from Mantas to do the same, or (3) a counter-argument from
somebody to explain why unconditional overwrite is a good idea here
(but not in the original) appears.

interactive rebase should better highlight the not-applying commit

2016-10-11 Thread Joshua N Pritikin

As of GIT 2.8.1, if you do an interactive rebase and get some conflict 
in the stack of patches then the commit with the conflict is buried in 
4-5 lines of output. It is visually difficult to immediately pick out 
which commit did not apply cleanly. I suggest highlighting the 1 line 
commit summary in red or green or some color to help it stand out from 
all the other output.

I decided to suggest this change after I realized that I probably 
skipped a commit during an interactive rebase instead of resolving the 
conflict. I knew I had to skip some commit so I assumed that I just need 
to skip without reading the commit summary carefully. Now it is 7-15 
days after I did the erroneous rebase. I had to spend a few hours today 
with GIT's archaeology tools to find the lost code.

I assume somebody familiar with GIT's code base could make this change 
in about 10 minutes.

-- 
Joshua N. Pritikin, Ph.D.
Virginia Institute for Psychiatric and Behavioral Genetics
Virginia Commonwealth University
PO Box 980126
800 E Leigh St, Biotech One, Suite 1-133
Richmond, VA 23219
http://people.virginia.edu/~jnp3bc

Re: [PATCH v3 12/25] sequencer: remember the onelines when parsing the todo file

Johannes Schindelin  writes:

> The `git-rebase-todo` file contains a list of commands. Most of those
> commands have the form
>
> 
>
> The  is displayed primarily for the user's convenience, as
> rebase -i really interprets only the   part. However, there
> are *some* places in interactive rebase where the  is used to
> display messages, e.g. for reporting at which commit we stopped.
>
> So let's just remember it when parsing the todo file; we keep a copy of
> the entire todo file anyway (to write out the new `done` and
> `git-rebase-todo` file just before processing each command), so all we
> need to do is remember the begin offsets and lengths.
>
> As we will have to parse and remember the command-line for `exec` commands
> later, we do not call the field "oneline" but rather "arg" (and will reuse
> that for exec's command-line).
>
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/sequencer.c b/sequencer.c
> index afc494e..7ba5e07 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -708,6 +708,8 @@ static int read_and_refresh_cache(struct replay_opts 
> *opts)
>  struct todo_item {
>   enum todo_command command;
>   struct commit *commit;
> + const char *arg;
> + int arg_len;
>   size_t offset_in_buf;

micronit: you can make it to size_t and lose the cast below, no?

>  };
>  
> @@ -759,6 +761,9 @@ static int parse_insn_line(struct todo_item *item, const 
> char *bol, char *eol)
>   status = get_sha1(bol, commit_sha1);
>   *end_of_object_name = saved;
>  
> + item->arg = end_of_object_name + strspn(end_of_object_name, " \t");
> + item->arg_len = (int)(eol - item->arg);
> +
>   if (status < 0)
>   return -1;
>  
> @@ -911,6 +916,8 @@ static int walk_revs_populate_todo(struct todo_list 
> *todo_list,
>  
>   item->command = command;
>   item->commit = commit;
> + item->arg = NULL;
> + item->arg_len = 0;
>   item->offset_in_buf = todo_list->buf.len;
>   subject_len = find_commit_subject(commit_buffer, &subject);
>   strbuf_addf(&todo_list->buf, "%s %s %.*s\n", command_string,

Re: [PATCH v3 13/25] sequencer: prepare for rebase -i's commit functionality

> @@ -370,19 +383,79 @@ static int is_index_unchanged(void)
>  }
>  
>  /*
> + * Read the author-script file into an environment block, ready for use in
> + * run_command(), that can be free()d afterwards.
> + */
> +static char **read_author_script(void)
> +{
> + struct strbuf script = STRBUF_INIT;
> + int i, count = 0;
> + char *p, *p2, **env;
> + size_t env_size;
> +
> + if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
> + return NULL;
> +
> + for (p = script.buf; *p; p++)
> + if (skip_prefix(p, "'''", (const char **)&p2))
> + strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
> + else if (*p == '\'')
> + strbuf_splice(&script, p-- - script.buf, 1, "", 0);
> + else if (*p == '\n') {
> + *p = '\0';
> + count++;
> + }

Hmph, didn't we recently add parse_key_value_squoted() to build
read_author_script() in builtin/am.c on top of it, so that this
piece of code can also take advantage of and share the parser?

> +/*

Offtopic: this line and the beginning of the new comment block that
begins with "Read the author-script" above show a suboptimal marking
of what is added and what is left.  I wonder "diff-indent-heuristic"
topic by Michael can help to make it look better.

>   * If we are cherry-pick, and if the merge did not result in
>   * hand-editing, we will hit this commit and inherit the original
>   * author date and name.
> + *
>   * If we are revert, or if our cherry-pick results in a hand merge,
>   * we had better say that the current user is responsible for that.
> + *
> + * An exception is when run_git_commit() is called during an
> + * interactive rebase: in that case, we will want to retain the
> + * author metadata.
>   */

Re: Allow "git shortlog" to group by committer information

On Tue, Oct 11, 2016 at 12:07:40PM -0700, Linus Torvalds wrote:

> On Tue, Oct 11, 2016 at 12:01 PM, Jeff King  wrote:
> >
> > My implementation is a little more complicated because it's also setting
> > things up for grouping by trailers (so you can group by "signed-off-by",
> > for example). I don't know if that's useful to your or not.
> 
> Hmm. Maybe in theory. But probably not in reality - it's just not
> unique enough (ie there are generally multiple, and if you choose the
> first/last, it should be the same as author/committer, so it doesn't
> actually add anything).

The implementation I did credited each commit multiple times if the
trailer appeared more than once. If you want to play with it, you can
fetch it from:

  git://github.com/peff jk/shortlog-ident

and then something like:

  git shortlog --ident=reviewed-by --format='...reviewed %an'

works. I haven't found it to really be useful for more than toy
statistic gathering, though.

> There are possibly other things that *could* be grouped by and might be 
> useful:
> 
>  - main subdirectory it touches (I've often wanted that)
> 
>  - rough size of diff or number of files it touches
> 
> but realistically both are painful enough that it probably doesn't
> make sense to do in some low-level helper.

Yeah, I think there's a lot of policy there in what counts as "main",
the rough sizes, etc. I've definitely done queries like that before, but
usually by piping "log --numstat" into perl. It's a minor pain to get
the data into perl data structures, but once you have it, you have a lot
more flexibility in what you can compute.

That might be aided by providing more structured machine-readable output
from git, like JSON (which I don't particularly like, but it's kind-of a
standard, and it sure as hell beats XML). But obviously that's another
topic entirely.

> > I'm fine with this less invasive version, but a few suggestions:
> >
> >  - do you want to call it --group-by=committer (with --group-by=author
> >as the default), which could later extend naturally to other forms of
> >grouping?
> 
> Honestly, it's probably the more generic one, but especially for
> one-off commands that aren't that common, it's a pain to write. When
> testing it, I literally just used "-c" for that reason.

It's not the end of the world to call it "-c" now, and later define "-c"
as a shorthand for "--group-by=committer", if and when the latter comes
into existence.

Keep in mind that shortlog takes arbitrary revision options, too, and
"-c" is defined there for combined diffs. I can't think of a good reason
to want to pass it to shortlog, though, so I don't think it's a big
loss.

-Peff

Re: Allow "git shortlog" to group by committer information

2016-10-11 Thread Linus Torvalds

On Tue, Oct 11, 2016 at 12:01 PM, Jeff King  wrote:
>
> My implementation is a little more complicated because it's also setting
> things up for grouping by trailers (so you can group by "signed-off-by",
> for example). I don't know if that's useful to your or not.

Hmm. Maybe in theory. But probably not in reality - it's just not
unique enough (ie there are generally multiple, and if you choose the
first/last, it should be the same as author/committer, so it doesn't
actually add anything).

There are possibly other things that *could* be grouped by and might be useful:

 - main subdirectory it touches (I've often wanted that)

 - rough size of diff or number of files it touches

but realistically both are painful enough that it probably doesn't
make sense to do in some low-level helper.

> I'm fine with this less invasive version, but a few suggestions:
>
>  - do you want to call it --group-by=committer (with --group-by=author
>as the default), which could later extend naturally to other forms of
>grouping?

Honestly, it's probably the more generic one, but especially for
one-off commands that aren't that common, it's a pain to write. When
testing it, I literally just used "-c" for that reason.

I wrote the patch because I've wanted this before, but it's a "once or
twice a merge window" thing for me, so ..

>  - you might want to steal the tests and documentation from my patch
>(though obviously they would need tweaked to match your interface)

Heh. Yes.

  Linus

Re: [PATCH 0/2] Support `git reset --stdin`

On Tue, Oct 11, 2016 at 06:08:56PM +0200, Johannes Schindelin wrote:

> This feature was missing, and made it cumbersome for third-party
> tools to reset a lot of paths in one go.
> 
> Support for --stdin has been added, following builtin/checkout-index.c's
> example.

Is git-reset the right layer to add scripting features? I thought we
usually pushed people doing mass index manipulation to use update-index
or read-tree. Is there something that reset makes easy that is hard with
those tools (I could imagine "--hard", but I see it is not supported
with your patch).

Not that I'm necessarily opposed to the patch, I was just surprised.

-Peff

Re: Allow "git shortlog" to group by committer information

On Tue, Oct 11, 2016 at 11:45:58AM -0700, Linus Torvalds wrote:

> In some situations you may want to group the commits not by author,
> but by committer instead.
> 
> For example, when I just wanted to look up what I'm still missing from
> linux-next in the current merge window, I don't care so much about who
> wrote a patch, as what git tree it came from, which generally boils
> down to "who committed it".
> 
> So make git shortlog take a "-c" or "--committer" option to switch
> grouping to that.

I made a very similar patch as part of a larger series:

  http://public-inbox.org/git/20151229073515.gk8...@sigill.intra.peff.net/

but never followed through with it because it wasn't clear that grouping
by anything besides author was actually useful to anybody.

My implementation is a little more complicated because it's also setting
things up for grouping by trailers (so you can group by "signed-off-by",
for example). I don't know if that's useful to your or not.

I'm fine with this less invasive version, but a few suggestions:

 - do you want to call it --group-by=committer (with --group-by=author
   as the default), which could later extend naturally to other forms of
   grouping?

 - you might want to steal the tests and documentation from my patch
   (though obviously they would need tweaked to match your interface)

-Peff

Re: [PATCH v3 08/25] sequencer: strip CR from the todo script

Johannes Schindelin  writes:

> It is not unheard of that editors on Windows write CR/LF even if the
> file originally had only LF. This is particularly awkward for exec lines
> of a rebase -i todo sheet. Take for example the insn "exec echo": The
> shell script parser splits at the LF and leaves the CR attached to
> "echo", which leads to the unknown command "echo\r".
>
> Work around that by stripping CR when reading the todo commands, as we
> already do for LF.
>
> This happens to fix t9903.14 and .15 in MSYS1 environments (with the
> rebase--helper patches based on this patch series): the todo script
> constructed in such a setup contains CR/LF thanks to MSYS1 runtime's
> cleverness.
>
> Based on a report and a patch by Johannes Sixt.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/sequencer.c b/sequencer.c
> index 678fdf3..cee7e50 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -774,6 +774,9 @@ static int parse_insn_buffer(char *buf, struct todo_list 
> *todo_list)
>  
>   next_p = *eol ? eol + 1 /* skip LF */ : eol;
>  
> + if (p != eol && eol[-1] == '\r')
> + eol--; /* skip Carriage Return */

micronit: s/skip/strip/ ;-)

> +
>   item = append_new_todo(todo_list);
>   item->offset_in_buf = p - todo_list->buf.buf;
>   if (parse_insn_line(item, p, eol)) {

Re: [PATCH 28/28] attr: convert to new threadsafe API

On Tue, Oct 11, 2016 at 11:23 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>> I find this description a bit confusing.  At least the way I
>>> envisioned was that when this piece of code is run by multiple
>>> people at the same time,
>>>
>>> static struct git_attr_check *check = NULL;
>>> git_attr_check_initl(&check, ...);
>>>
>>> we won't waste the "check" by allocated by the first one by
>>> overwriting it with another one allocated by the second one.  So
>>> "the same arguments" does not come into the picture.  A single
>>> variable is either
>>>
>>>  * already allocated and initialised by the an earlier call to
>>>initl() by somebody else, or
>>>
>>>  * originally NULL when you call initl(), and the implementation
>>>makes sure that other people wait while you allocate, initialise
>>>and assign it to the variable, or
>>>
>>>  * originally NULL when you call initl(), but the implementation
>>>notices that somebody else is doing the second bullet point
>>>above, and you wait until that somebody else finishes and then
>>>you return without doing anything (because by that time, "check"
>>>is filled by that other party doing the second bullet point
>>>above).
>>>
>>> There is no need to check for "the same arguments".
>>>
>>
>> I see. So we assume that there are no different arguments at the same time,
>> i.e. all threads run the same code when it comes to attrs.
>
> Sorry, but I fail to see how you can jump to that conclusion.
> Puzzled.
>
> You can have many different callsites (think: hits "git grep" finds)
> that call git_attr_check_initl() and they all may be asking for
> different set of attributes.  As long as they are using different
> "check" instance to receive these sets of attributes, they are OK.

Right, but that requires a mutex per callsite; up to now I imagined
a global mutex only, which is how I came to the conclusion.

>
> It is insane to use the same "check" variable to receive sets of
> attributes for different attributes,

I agree.

> be it from the same call or
> different one, it is insane to do this:
>
> func(char *anotherattr)
> {
> static struct git_attr_check *check = NULL;
> git_attr_check_initl(&check, "one", anotherattr, ...);
>
> ... use "check" to ask question ...
> }
>
> The whole point of having a static, initialize-once instance of
> "check" is so that initl() can do potentially heavy preparation just
> once and keep reusing it.  Allowing a later caller of func() to pass
> a value of anotherattr that is different from the one used in the
> first call that did cause initl() to allocate-initialise-assign to
> "check" is simply insane, even there is no threading issue.

I was imagining a file.c like that:

static struct git_attr_check *check = NULL;

void one_func()
{
git_attr_check_initl(&check, "one", ...);
...
}

void two_func()
{
git_attr_check_initl(&check, "two", ...);
...
}


int foo_cmd(const char *prefix int argc, char **argv)
{
foreach_path(get_paths(...))
one_func();
check = NULL;
foreach_path(get_paths(...))
two_func();
}

This is correct single threaded code, but as soon as you want to
put phase one,two into threads, as they can be parallelized, this
goes horribly wrong.


>
> And in a threaded environment it is even worse; the first thread may
> call initl() to get one set of attributes in "check" and it may be
> about to ask the question, while the second call may call initl()
> and by your definition it will notice they have different sets of
> attributes and returns different "check"?  Either the earlier one is
> leaked, or it gets destroyed even though the first thread hasn't
> finished with "check" it got.
>
> It is perfectly OK to drop "static" from the above example code.
> Then it no longer is insane--it is perfectly normal code whose
> inefficiency cannot be avoided because it wants to do dynamic
> queries.

I think we had a misunderstanding here, as I was just assuming a
single mutex later on.

Re: [PATCH 17/28] attr: expose validity check for attribute names

On 10/11, Stefan Beller wrote:
> On Tue, Oct 11, 2016 at 11:40 AM, Brandon Williams  wrote:
> 
> > Wouldn't a +1 indicate that the attr name is valid while the 0
> > indicates that it is invalid?
> 
> Right.
> 
> >  It looks to me like we are checking each
> > character and if we run into one that doesn't work then we have an error
> > and return 0 indicating that the attr name we are checking is invalid,
> > otherwise the name is valid and we would want to return a +1 indicating
> > a success.
> >
> > Am I understanding the intent of this function properly?
> >
> 
> I will change it to +1; I was just rambling about when you may
> expect -1 in this project. :)

Ah got it :D

-- 
Brandon Williams

Re: [PATCH 17/28] attr: expose validity check for attribute names

On 10/11, Stefan Beller wrote:
> On Tue, Oct 11, 2016 at 10:28 AM, Brandon Williams  wrote:
> > On 10/10, Stefan Beller wrote:
> >> From: Junio C Hamano 
> >> -static int invalid_attr_name(const char *name, int namelen)
> >> +int attr_name_valid(const char *name, size_t namelen)
> >>  {
> >>   /*
> >>* Attribute name cannot begin with '-' and must consist of
> >>* characters from [-A-Za-z0-9_.].
> >>*/
> >>   if (namelen <= 0 || *name == '-')
> >> - return -1;
> >> + return 0;
> >>   while (namelen--) {
> >>   char ch = *name++;
> >>   if (! (ch == '-' || ch == '.' || ch == '_' ||
> >>  ('0' <= ch && ch <= '9') ||
> >>  ('a' <= ch && ch <= 'z') ||
> >>  ('A' <= ch && ch <= 'Z')) )
> >> - return -1;
> >> + return 0;
> >>   }
> >> - return 0;
> >> + return -1;
> >> +}
> >
> > Whats the reason behind returning -1 for a valid attr name vs 1?
> >
> At first I thought the two different return path for -1 are different
> error classes (one being just a minor error compared to the other),
> but they are not, so having the same code there makes sense.
> 
> So I think we could change it to +1 in this instance, as a non zero
> value would just indicate the attr name is not valid, but not necessarily
> an error.

Wouldn't a +1 indicate that the attr name is valid while the 0
indicates that it is invalid? It looks to me like we are checking each
character and if we run into one that doesn't work then we have an error
and return 0 indicating that the attr name we are checking is invalid,
otherwise the name is valid and we would want to return a +1 indicating
a success.

Am I understanding the intent of this function properly?

-- 
Brandon Williams

Allow "git shortlog" to group by committer information

2016-10-11 Thread Linus Torvalds

In some situations you may want to group the commits not by author,
but by committer instead.

For example, when I just wanted to look up what I'm still missing from
linux-next in the current merge window, I don't care so much about who
wrote a patch, as what git tree it came from, which generally boils
down to "who committed it".

So make git shortlog take a "-c" or "--committer" option to switch
grouping to that.

Signed-off-by: Linus Torvalds 
 builtin/shortlog.c | 15 ---
 shortlog.h |  1 +
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index ba0e1154a..c9585d475 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -117,11 +117,15 @@ static void read_from_stdin(struct shortlog *log)
 {
struct strbuf author = STRBUF_INIT;
struct strbuf oneline = STRBUF_INIT;
+   static const char *author_match[2] = { "Author: ", "author " };
+   static const char *committer_match[2] = { "Commit: ", "committer " };
+   const char **match;
 
+   match = log->committer ? committer_match : author_match;
while (strbuf_getline_lf(&author, stdin) != EOF) {
const char *v;
-   if (!skip_prefix(author.buf, "Author: ", &v) &&
-   !skip_prefix(author.buf, "author ", &v))
+   if (!skip_prefix(author.buf, match[0], &v) &&
+   !skip_prefix(author.buf, match[1], &v))
continue;
while (strbuf_getline_lf(&oneline, stdin) != EOF &&
   oneline.len)
@@ -140,6 +144,7 @@ void shortlog_add_commit(struct shortlog *log, struct 
commit *commit)
struct strbuf author = STRBUF_INIT;
struct strbuf oneline = STRBUF_INIT;
struct pretty_print_context ctx = {0};
+   const char *fmt;
 
ctx.fmt = CMIT_FMT_USERFORMAT;
ctx.abbrev = log->abbrev;
@@ -148,7 +153,9 @@ void shortlog_add_commit(struct shortlog *log, struct 
commit *commit)
ctx.date_mode.type = DATE_NORMAL;
ctx.output_encoding = get_log_output_encoding();
 
-   format_commit_message(commit, "%an <%ae>", &author, &ctx);
+   fmt = log->committer ? "%cn <%ce>" : "%an <%ae>";
+
+   format_commit_message(commit, fmt, &author, &ctx);
if (!log->summary) {
if (log->user_format)
pretty_print_commit(&ctx, commit, &oneline);
@@ -238,6 +245,8 @@ int cmd_shortlog(int argc, const char **argv, const char 
*prefix)
int nongit = !startup_info->have_repository;
 
const struct option options[] = {
+   OPT_BOOL('c', "committer", &log.committer,
+N_("Group by committer rather than author")),
OPT_BOOL('n', "numbered", &log.sort_by_number,
 N_("sort output according to the number of commits per 
author")),
OPT_BOOL('s', "summary", &log.summary,
diff --git a/shortlog.h b/shortlog.h
index 5a326c686..5d64cfe92 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -13,6 +13,7 @@ struct shortlog {
int in2;
int user_format;
int abbrev;
+   int committer;
 
char *common_repo_prefix;
int email;

Re: [PATCH 17/28] attr: expose validity check for attribute names

On Tue, Oct 11, 2016 at 11:40 AM, Brandon Williams  wrote:

> Wouldn't a +1 indicate that the attr name is valid while the 0
> indicates that it is invalid?

Right.

>  It looks to me like we are checking each
> character and if we run into one that doesn't work then we have an error
> and return 0 indicating that the attr name we are checking is invalid,
> otherwise the name is valid and we would want to return a +1 indicating
> a success.
>
> Am I understanding the intent of this function properly?
>

I will change it to +1; I was just rambling about when you may
expect -1 in this project. :)

Re: [PATCH 17/28] attr: expose validity check for attribute names

On Tue, Oct 11, 2016 at 10:28 AM, Brandon Williams  wrote:
> On 10/10, Stefan Beller wrote:
>> From: Junio C Hamano 
>> -static int invalid_attr_name(const char *name, int namelen)
>> +int attr_name_valid(const char *name, size_t namelen)
>>  {
>>   /*
>>* Attribute name cannot begin with '-' and must consist of
>>* characters from [-A-Za-z0-9_.].
>>*/
>>   if (namelen <= 0 || *name == '-')
>> - return -1;
>> + return 0;
>>   while (namelen--) {
>>   char ch = *name++;
>>   if (! (ch == '-' || ch == '.' || ch == '_' ||
>>  ('0' <= ch && ch <= '9') ||
>>  ('a' <= ch && ch <= 'z') ||
>>  ('A' <= ch && ch <= 'Z')) )
>> - return -1;
>> + return 0;
>>   }
>> - return 0;
>> + return -1;
>> +}
>
> Whats the reason behind returning -1 for a valid attr name vs 1?
>

Usually we prefer negative values for errors, whereas 0,1,2...
is used for actual return values.

If you're not really interested in the value, but rather "does it work at all?"
you can use it via

if (function() < 0)
die_errno("function has error");

Otherwise you'd do a

int value = function();
if (value < 0)
die(...);

switch(value) {
default:
case 0:
doZeroThing(); break;
case 1:
doOtherThing();
}

At first I thought the two different return path for -1 are different
error classes (one being just a minor error compared to the other),
but they are not, so having the same code there makes sense.

So I think we could change it to +1 in this instance, as a non zero
value would just indicate the attr name is not valid, but not necessarily
an error.

Re: [PATCH 17/28] attr: expose validity check for attribute names

On 10/10, Stefan Beller wrote:
> From: Junio C Hamano 
> -static int invalid_attr_name(const char *name, int namelen)
> +int attr_name_valid(const char *name, size_t namelen)
>  {
>   /*
>* Attribute name cannot begin with '-' and must consist of
>* characters from [-A-Za-z0-9_.].
>*/
>   if (namelen <= 0 || *name == '-')
> - return -1;
> + return 0;
>   while (namelen--) {
>   char ch = *name++;
>   if (! (ch == '-' || ch == '.' || ch == '_' ||
>  ('0' <= ch && ch <= '9') ||
>  ('a' <= ch && ch <= 'z') ||
>  ('A' <= ch && ch <= 'Z')) )
> - return -1;
> + return 0;
>   }
> - return 0;
> + return -1;
> +}

Whats the reason behind returning -1 for a valid attr name vs 1?

-- 
Brandon Williams

Re: [PATCH 28/28] attr: convert to new threadsafe API

Stefan Beller  writes:

>> I find this description a bit confusing.  At least the way I
>> envisioned was that when this piece of code is run by multiple
>> people at the same time,
>>
>> static struct git_attr_check *check = NULL;
>> git_attr_check_initl(&check, ...);
>>
>> we won't waste the "check" by allocated by the first one by
>> overwriting it with another one allocated by the second one.  So
>> "the same arguments" does not come into the picture.  A single
>> variable is either
>>
>>  * already allocated and initialised by the an earlier call to
>>initl() by somebody else, or
>>
>>  * originally NULL when you call initl(), and the implementation
>>makes sure that other people wait while you allocate, initialise
>>and assign it to the variable, or
>>
>>  * originally NULL when you call initl(), but the implementation
>>notices that somebody else is doing the second bullet point
>>above, and you wait until that somebody else finishes and then
>>you return without doing anything (because by that time, "check"
>>is filled by that other party doing the second bullet point
>>above).
>>
>> There is no need to check for "the same arguments".
>>
>
> I see. So we assume that there are no different arguments at the same time,
> i.e. all threads run the same code when it comes to attrs.

Sorry, but I fail to see how you can jump to that conclusion.
Puzzled.

You can have many different callsites (think: hits "git grep" finds)
that call git_attr_check_initl() and they all may be asking for
different set of attributes.  As long as they are using different
"check" instance to receive these sets of attributes, they are OK.

It is insane to use the same "check" variable to receive sets of
attributes for different attributes, be it from the same call or
different one, it is insane to do this:

func(char *anotherattr)
{
static struct git_attr_check *check = NULL;
git_attr_check_initl(&check, "one", anotherattr, ...);

... use "check" to ask question ...
}

The whole point of having a static, initialize-once instance of
"check" is so that initl() can do potentially heavy preparation just
once and keep reusing it.  Allowing a later caller of func() to pass
a value of anotherattr that is different from the one used in the
first call that did cause initl() to allocate-initialise-assign to
"check" is simply insane, even there is no threading issue.

And in a threaded environment it is even worse; the first thread may
call initl() to get one set of attributes in "check" and it may be
about to ask the question, while the second call may call initl()
and by your definition it will notice they have different sets of
attributes and returns different "check"?  Either the earlier one is
leaked, or it gets destroyed even though the first thread hasn't
finished with "check" it got.

It is perfectly OK to drop "static" from the above example code.
Then it no longer is insane--it is perfectly normal code whose
inefficiency cannot be avoided because it wants to do dynamic
queries.

Re: [PATCH 28/28] attr: convert to new threadsafe API

Stefan Beller  writes:

> This revamps the API of the attr subsystem to be thread safe.
> Before we had the question and its results in one struct type.
> The typical usage of the API was
>
> static struct git_attr_check;

I think you meant "*check" at the end, perhaps?

>
> if (!check)
> check = git_attr_check_initl("text", NULL);
>
> git_check_attr(path, check);
> act_on(check->value[0]);
>
> * While the check for attributes to be questioned only need to
>   be initalized once as that part will be read only after its
>   initialisation, the answer may be different for each path.
>   Because of that we need to decouple the check and the answer,
>   such that each thread can obtain an answer for the path it
>   is currently processing.

Yes, it is good to separate questions and answers.  I think answers
should be owned by the caller, though.  I do not think of a good
reason why you want to make it impossible to do something like this:

struct git_attr_check_result *result_1 = ...allocate...;
struct git_attr_check_result *result_2 = ...allocate...;

loop {
struct strbuf path = next_path();
git_check_attr(result_1, path.buf, check_1);
if (strbuf_strip_suffix(&path, ".c")) {
strbuf_addstr(&path, ".h");
git_check_attr(result_2, path.buf, check_2);
do something using result_1[] and result_2[];
} else {
do something using result_1[];
}
}

Do we already have a design of the "result" thing that is concrete
enough to require us to declare that the result is owned by the
implementation and asking another question has to destroy the answer
to the previous question?  Otherwise, I'd rather not to see us make
the API unnecessarily hard to use.  While I do want us to avoid
overengineering for excessive flexibility, I somehow feel "you
cannot control the lifetime of the result, it is owned by the
subsystem" falls the other side of the line.

> diff --git a/Documentation/technical/api-gitattributes.txt 
> b/Documentation/technical/api-gitattributes.txt
> index 92fc32a..2059aab 100644
> --- a/Documentation/technical/api-gitattributes.txt
> +++ b/Documentation/technical/api-gitattributes.txt
> @@ -8,6 +8,18 @@ attributes to set of paths.
>  Data Structure
>  --
>  
> +extern struct git_attr *git_attr(const char *);
> +
> +/*
> + * Return the name of the attribute represented by the argument.  The
> + * return value is a pointer to a null-delimited string that is part
> + * of the internal data structure; it should not be modified or freed.
> + */
> +extern const char *git_attr_name(const struct git_attr *);
> +
> +extern int attr_name_valid(const char *name, size_t namelen);
> +extern void invalid_attr_name_message(struct strbuf *, const char *, int);
> +
>  `struct git_attr`::
>  
>   An attribute is an opaque object that is identified by its name.
> @@ -16,15 +28,17 @@ Data Structure
>   of no interest to the calling programs.  The name of the
>   attribute can be retrieved by calling `git_attr_name()`.
>  
> -`struct git_attr_check_elem`::
> -
> - This structure represents one attribute and its value.
> -
>  `struct git_attr_check`::
>  
> - This structure represents a collection of `git_attr_check_elem`.
> + This structure represents a collection of `struct git_attrs`.
>   It is passed to `git_check_attr()` function, specifying the
> - attributes to check, and receives their values.
> + attributes to check, and receives their values into a corresponding
> + `struct git_attr_result`.
> +
> +`struct git_attr_result`::
> +
> + This structure represents a collection of results to its
> + corresponding `struct git_attr_check`, that has the same order.
>  
>  
>  Attribute Values
> @@ -56,16 +70,22 @@ Querying Specific Attributes
>  * Prepare `struct git_attr_check` using git_attr_check_initl()
>function, enumerating the names of attributes whose values you are
>interested in, terminated with a NULL pointer.  Alternatively, an
> -  empty `struct git_attr_check` can be prepared by calling
> -  `git_attr_check_alloc()` function and then attributes you want to
> -  ask about can be added to it with `git_attr_check_append()`
> -  function.
> +  empty `struct git_attr_check` as alloced by git_attr_check_alloc()

"allocated", not "alloced".

> +  can be prepared by calling `git_attr_check_alloc()` function and
> +  then attributes you want to ask about can be added to it with
> +  `git_attr_check_append()` function.
> +  git_attr_check_initl is thread safe, i.e. you can call it

Spell it `git_attr_check_initl()` for consistency.

> +  from different threads at the same time; internally however only one
> +  call at a time is processed. If the calls from different threads have
> +  the same arguments, the returned `git_attr_check` may be t

Re: [PATCH 28/28] attr: convert to new threadsafe API

On Tue, Oct 11, 2016 at 10:40 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> This revamps the API of the attr subsystem to be thread safe.
>> Before we had the question and its results in one struct type.
>> The typical usage of the API was
>>
>> static struct git_attr_check;
>
> I think you meant "*check" at the end, perhaps?
>
>>
>> if (!check)
>> check = git_attr_check_initl("text", NULL);
>>
>> git_check_attr(path, check);
>> act_on(check->value[0]);
>>
>> * While the check for attributes to be questioned only need to
>>   be initalized once as that part will be read only after its
>>   initialisation, the answer may be different for each path.
>>   Because of that we need to decouple the check and the answer,
>>   such that each thread can obtain an answer for the path it
>>   is currently processing.
>
> Yes, it is good to separate questions and answers.  I think answers
> should be owned by the caller, though.  I do not think of a good
> reason why you want to make it impossible to do something like this:
>
> struct git_attr_check_result *result_1 = ...allocate...;
> struct git_attr_check_result *result_2 = ...allocate...;
>
> loop {
> struct strbuf path = next_path();
> git_check_attr(result_1, path.buf, check_1);
> if (strbuf_strip_suffix(&path, ".c")) {
> strbuf_addstr(&path, ".h");
> git_check_attr(result_2, path.buf, check_2);
> do something using result_1[] and result_2[];
> } else {
> do something using result_1[];
> }
> }
>
> Do we already have a design of the "result" thing that is concrete
> enough to require us to declare that the result is owned by the
> implementation and asking another question has to destroy the answer
> to the previous question?  Otherwise, I'd rather not to see us make
> the API unnecessarily hard to use.  While I do want us to avoid
> overengineering for excessive flexibility, I somehow feel "you
> cannot control the lifetime of the result, it is owned by the
> subsystem" falls the other side of the line.

True, we had that issue for other APIs (IIRC path related things,
with 4 static buffers that round robin). I did not like that design decision,
but I felt it was okay, as the above did not occur to me.

In the case above, we could just copy the result_1->values and
then re-use the result_1, but I agree that this may be somewhat error prone
if you're not familiar with the decisions in this series.

So in case of the caller owning the result, we could pull the static
trick for now
and only use a different approach when we use it in actual threaded code, i.e.
the code in convert.c could become:

static struct git_attr_check *check;
static struct git_attr_result *result;

if (!check) {
check = git_attr_check_initl("crlf", "ident",
"filter", "eol", "text", NULL);
result = git_attr_result_alloc(5);
user_convert_tail = &user_convert;
git_config(read_convert_config, NULL);
}

if (!git_check_attr(path, check, result)) {
...

>> +  empty `struct git_attr_check` as alloced by git_attr_check_alloc()
>
> "allocated", not "alloced".

ok.

>
>> +  can be prepared by calling `git_attr_check_alloc()` function and
>> +  then attributes you want to ask about can be added to it with
>> +  `git_attr_check_append()` function.
>> +  git_attr_check_initl is thread safe, i.e. you can call it
>
> Spell it `git_attr_check_initl()` for consistency.

ok.

>
>> +  from different threads at the same time; internally however only one
>> +  call at a time is processed. If the calls from different threads have
>> +  the same arguments, the returned `git_attr_check` may be the same.
>
> I find this description a bit confusing.  At least the way I
> envisioned was that when this piece of code is run by multiple
> people at the same time,
>
> static struct git_attr_check *check = NULL;
> git_attr_check_initl(&check, ...);
>
> we won't waste the "check" by allocated by the first one by
> overwriting it with another one allocated by the second one.  So
> "the same arguments" does not come into the picture.  A single
> variable is either
>
>  * already allocated and initialised by the an earlier call to
>initl() by somebody else, or
>
>  * originally NULL when you call initl(), and the implementation
>makes sure that other people wait while you allocate, initialise
>and assign it to the variable, or
>
>  * originally NULL when you call initl(), but the implementation
>notices that somebody else is doing the second bullet point
>above, and you wait until that somebody else finishes and then
>you return without doing anything (because by that time, "check"
>is filled by that other party doing the second bullet point
>above).
>
> There is no need to

Re: [PATCH 2/2] reset: support the --stdin option

Johannes Schindelin  writes:

> + if (read_from_stdin) {
> + strbuf_getline_fn getline_fn = nul_term_line ?
> + strbuf_getline_nul : strbuf_getline_lf;
> + int flags = PATHSPEC_PREFER_FULL |
> + PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP;
> + struct strbuf buf = STRBUF_INIT;
> + struct strbuf unquoted = STRBUF_INIT;
> +
> + if (patch_mode)
> + die(_("--stdin is incompatible with --patch"));
> +
> + if (pathspec.nr)
> + die(_("--stdin is incompatible with path arguments"));
> +
> + if (patch_mode)
> + flags |= PATHSPEC_PREFIX_ORIGIN;

Didn't we already die above under that mode?

> + while (getline_fn(&buf, stdin) != EOF) {
> + if (!nul_term_line && buf.buf[0] == '"') {
> + strbuf_reset(&unquoted);
> + if (unquote_c_style(&unquoted, buf.buf, NULL))
> + die(_("line is badly quoted"));
> + strbuf_swap(&buf, &unquoted);
> + }
> + ALLOC_GROW(stdin_paths, stdin_nr + 1, stdin_alloc);
> + stdin_paths[stdin_nr++] = xstrdup(buf.buf);
> + strbuf_reset(&buf);
> + }
> + strbuf_release(&unquoted);
> + strbuf_release(&buf);
> +
> + ALLOC_GROW(stdin_paths, stdin_nr + 1, stdin_alloc);
> + stdin_paths[stdin_nr++] = NULL;

It makes sense to collect, but...

> + parse_pathspec(&pathspec, 0, flags, prefix,
> +(const char **)stdin_paths);

...letting them be used as if they are pathspec is wrong when
stdin_paths[] contain wildcard, isn't it?  

I think flags |= PATHSPEC_LITERAL_PATH can help fixing it.  0/2 said
this mimicks checkout-index and I think it should by not treating
the input as wildcarded patterns (i.e. "echo '*.c' | reset --stdin"
shouldn't be the way to reset all .c files --- that's something we
would want to add to the test, I guess).

Thanks.

Re: [PATCH 28/28] attr: convert to new threadsafe API

On 10/10, Stefan Beller wrote:
>   be initalized once as that part will be read only after its
   initialized
>   initialisation, the answer may be different for each path.
should this be the US spelling 'initialization'?

-- 
Brandon Williams

Re: [PATCH v4 0/7] Add --format to tag verification

2016-10-11 Thread Santiago Torres

Hi,

I noticed there were no replies for this thread. I was curious if it got
buried because I sent it on the Friday evening before a long weekend.

I don't mean to pressure or anything.

Thanks!
-Santiago.


On Fri, Oct 07, 2016 at 05:07:14PM -0400, santi...@nyu.edu wrote:
> From: Santiago Torres 
> 
> This is the fourth iteration of the series in [1][2][3], which comes as a
> result of the discussion in [4]. The main goal of this patch series is to 
> bring
> --format to git tag verification so that upper-layer tools can inspect the
> content of a tag and make decisions based on those contents.
> 
> In this re-woll we:
> 
> * Fixed the author fields and signed off by's throughout the patch
>   series
> 
> * Added two more patches with unit tests to ensure the format specifier
>   behaves as expected
> 
> * Fixed a missing initialization for the format specifier in verify-tag which
>   caused a crash.
> 
> * Fixed an outdated git commit message that had the previous name of a
>   function definition.
> 
> Thanks,
> -Santiago
> 
> [1] http://public-inbox.org/git/20160930221806.3398-1-santi...@nyu.edu/
> [2] http://public-inbox.org/git/20160922185317.349-1-santi...@nyu.edu/
> [3] http://public-inbox.org/git/20160926224233.32702-1-santi...@nyu.edu/
> [4] http://public-inbox.org/git/20160607195608.16643-1-santi...@nyu.edu/
> 
> 
> Lukas Puehringer (4):
>   tag: add format specifier to gpg_verify_tag
>   gpg-interface, tag: add GPG_VERIFY_QUIET flag
>   ref-filter: add function to print single ref_array_item
>   builtin/tag: add --format argument for tag -v
> 
> Santiago Torres (3):
>   builtin/verify-tag: add --format to verify-tag
>   t/t7030-verify-tag: Add --format specifier tests
>   t/t7004-tag: Add --format specifier tests
> 
>  Documentation/git-tag.txt|  2 +-
>  Documentation/git-verify-tag.txt |  2 +-
>  builtin/tag.c| 34 +++---
>  builtin/verify-tag.c | 13 +++--
>  gpg-interface.h  |  1 +
>  ref-filter.c | 10 ++
>  ref-filter.h |  3 +++
>  t/t7004-tag.sh   | 16 
>  t/t7030-verify-tag.sh| 16 
>  tag.c| 22 +++---
>  tag.h|  4 ++--
>  11 files changed, 99 insertions(+), 24 deletions(-)
> 
> -- 
> 2.10.0
> 


signature.asc
Description: PGP signature

Re: [PATCH 11/28] attr: (re)introduce git_check_attr() and struct git_attr_check

Brandon Williams  writes:

> On 10/10, Stefan Beller wrote:
>> From: Junio C Hamano 
>> 
>> A common pattern to check N attributes for many paths is to
>> 
>>  (1) prepare an array A of N git_attr_check_elem items;
>>  (2) call git_attr() to intern the N attribute names and fill A;
>
> Does the word 'intern' here mean internalize?  It took me a few reads to
> stop picturing college students running around an office :)

The verb comes from Lisp world where you "intern" a string to make a
symbol.

Re: [PATCH 26/28] attr: make git_attr_counted static

On 10/10, Stefan Beller wrote:
> It's not used outside the attr code, so let's keep it private.
> 
> Change-Id: I0d15e0f2ea944b31d68b9cf1a4edecac0ca2170d

Looks like you forgot to remove this :)

-- 
Brandon Williams

Re: [PATCH v3 07/25] sequencer: completely revamp the "todo" script parsing

Johannes Schindelin  writes:

> In the end, I decided to actually *not* use strbuf_add_unique_abbrev()
> here because it really makes the code very much too ugly after the
> introduction of short_commit_name():

It's perfectly fine not to use that function when it does not make
sense; we shouldn't use it (or anything for that matter) just for
the sake of using it.

Re: [PATCH 11/28] attr: (re)introduce git_check_attr() and struct git_attr_check

On 10/10, Stefan Beller wrote:
> From: Junio C Hamano 
> 
> A common pattern to check N attributes for many paths is to
> 
>  (1) prepare an array A of N git_attr_check_elem items;
>  (2) call git_attr() to intern the N attribute names and fill A;

Does the word 'intern' here mean internalize?  It took me a few reads to
stop picturing college students running around an office :)

-- 
Brandon Williams

Re: [PATCH v4 4/4] mergetool: honor -O

David Aguilar  writes:

>> I only see 4/4 in v4; am I correct to assume that 1-3/4 of v4 are
>> the same as their counterparts in v3?
>
> Yes, 1-3 are unchanged since v3.
> Thanks for checking,

I'll queue these four with Reviewed-by's from j6t.

Thanks, both.

[PATCH] gitk: Fix Japanese translation for "marked commit"

2016-10-11 Thread Satoshi Yasushima

Signed-off-by: Satoshi Yasushima 
---
 po/ja.po | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/po/ja.po b/po/ja.po
index f143753..510306b 100644
--- a/po/ja.po
+++ b/po/ja.po
@@ -2,16 +2,17 @@
 # Copyright (C) 2005-2015 Paul Mackerras
 # This file is distributed under the same license as the gitk package.
 #
-# YOKOTA Hiroshi , 2015.
 # Mizar , 2009.
 # Junio C Hamano , 2009.
+# YOKOTA Hiroshi , 2015.
+# Satoshi Yasushima , 2016.
 msgid ""
 msgstr ""
 "Project-Id-Version: gitk\n"
 "Report-Msgid-Bugs-To: \n"
 "POT-Creation-Date: 2015-05-17 14:32+1000\n"
 "PO-Revision-Date: 2015-11-12 13:00+0900\n"
-"Last-Translator: YOKOTA Hiroshi \n"
+"Last-Translator: Satoshi Yasushima \n"
 "Language-Team: Japanese\n"
 "Language: ja\n"
 "MIME-Version: 1.0\n"
@@ -314,11 +315,11 @@ msgstr "マークを付けたコミットと比較する"
 
 #: gitk:2630 gitk:2641
 msgid "Diff this -> marked commit"
-msgstr "これと選択したコミットのdiffを見る"
+msgstr "これとマークを付けたコミットのdiffを見る"
 
 #: gitk:2631 gitk:2642
 msgid "Diff marked commit -> this"
-msgstr "選択したコミットとこれのdiffを見る"
+msgstr "マークを付けたコミットとこれのdiffを見る"
 
 #: gitk:2632
 msgid "Revert this commit"
-- 
2.10.0.windows.1

Re: [PATCH v3 07/25] sequencer: completely revamp the "todo" script parsing

Johannes Schindelin  writes:

>> I am personally fine with this line; two things come to mind:
>> 
>>  - This would work just fine as-is with Linus's change to turn
>>DEFAULT_ABBREV to -1.
>> 
>>  - It appears that it is more fashionable to use
>>strbuf_add_unique_abbrev() these days.
>
> Right, I actually looked at this place when I tried to decide where I
> could use that function. Somehow I thought I'd not break up the flow here.
>
> But since you asked so nicely,...

When I say "I am fine", I am not asking you to change anything.

Some of the places that have been updated recently to use
strbuf_add_unique_abbrev() in other topics did improve the
readability of the code, but I do not think it would universally
be true.

Re: [PATCH] clean up confusing suggestion for commit references

2016-10-11 Thread Heiko Voigt

On Mon, Oct 10, 2016 at 12:14:14PM -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
> > On Mon, Oct 10, 2016 at 11:24:01AM -0700, Junio C Hamano wrote:
> >
> >> I no longer have preference either way myself, even though I was in
> >> favor of no-quotes simply because I had an alias to produce that
> >> format and was used to it.
> >
> > I'll admit that I don't care _that_ much and am happy to leave it up to
> > individual authors, as long as nobody quotes SubmittingPatches at me as
> > some kind of gospel when I use the no-quotes form.
> 
> ;-).  
> 
> I just do not want to hear "gitk (or was it git-gui) produces quoted
> form, why are you recommending no-quoted form in SubmittingPatches?"
> 
> I'd say "use common sense; sometimes it is less confusing to read
> without quotes and it is perfectly OK to do so if that is the case".

I do not care about which format it should be either. I just wanted to
be clear about whatever should be used. Since it seems we will allow
both, I am also fine with leaving the description as it is ;-)

Cheers Heiko

Re: [PATCH 1/2] submodule add: extend force flag to add existing repos

2016-10-11 Thread Heiko Voigt

Hi,

On Fri, Oct 07, 2016 at 10:25:04AM -0700, Stefan Beller wrote:
> On Fri, Oct 7, 2016 at 5:52 AM, Heiko Voigt  wrote:
> > On Thu, Oct 06, 2016 at 01:11:20PM -0700, Junio C Hamano wrote:
> >> Stefan Beller  writes:
> >>
> >> > Currently the force flag in `git submodule add` takes care of possibly
> >> > ignored files or when a name collision occurs.
> >> >
> >> > However there is another situation where submodule add comes in handy:
> >> > When you already have a gitlink recorded, but no configuration was
> >> > done (i.e. no .gitmodules file nor any entry in .git/config) and you
> >> > want to generate these config entries. For this situation allow
> >> > `git submodule add` to proceed if there is already a submodule at the
> >> > given path in the index.
> >
> > Is it important that the submodule is in the index?
> 
> If it is not in the index, it already works.

Ah ok I was not aware of that, sorry.

> > How about worktree?
> > From the index entry alone we can not deduce the values anyway.
> 
> Right, but as of now this is the only show stopper, i.e.
> * you have an existing repo? -> fine, it works with --force
> * you even ignored that repo -> --force knows how to do it.
> * you already have a gitlink -> Sorry, you're out of luck.
> 
> So that is why I stressed index in this commit message, as it is only about 
> this
> case.

Forget what I wrote. As said above I was not aware that there is only an
error when it is already in the index.

> > [1] 
> > http://public-inbox.org/git/%3c20160916141143.ga47...@book.hvoigt.net%3E/
> 
> Current situation:
> 
> > clone the submodule into a directory
> > git submodule add --force
> > git commit everything
> 
> works fine, but:
> 
> > clone the submodule into a directory
> > git add 
> > git commit  -m "Add submodule"
> > # me: "Oh crap! I did forget the configuration."
> > git submodule add --force  
> > # Git: "It already exists in the index, I am not going to produce the 
> > config for you."
> 
> The last step is changed with this patch, as
> it will just work fine then.

Thanks.

Cheers Heiko

Re: [PATCH v3 05/25] sequencer: eventually release memory allocated for the option values

Junio C Hamano  writes:

> This certainly is good, but I wonder if a new variant of OPT_STRING
> and OPTION_STRING that does the strdup for you, something along the
> lines of ...
> ... may make it even more pleasant to use?  Only for two fields in
> this patch that may probably be an overkill, but we may eventually 
> benefit from such an approach when we audit and plug leaks in
> parse-options users.  I dunno.

After sleeping on it, I do think this is an overkill.  The pattern I
would expect for the most normal (boring) code to use is rather:

struct app_specific app;
const char *opt_x = NULL;
struct option options[] = {
...
OPT_STRING(0, "xopt", &opt_x, N_("x option"), ...),
...
OPT_END()
};

parse_options(ac, av, prefix, options, ...);
app.x_field = xstrdup_or_null(opt_x);
... other values set to app's field based on
... not just command line options but from
... other sources.

The only reason why the OPT_STRDUP appeared convenient was because
options[] element happened to use a field in the structure directly.
The patch under discussion does an equivalent of

app.x_field = xstrdup_or_null(opt_x);

but the "opt_x" happens to be the same "app.x_field" in this case,
so in that sense, it follows the normal and boring pattern.

The "struct app_specific" may not even exist in the same scope as
the caller of parse_options(), but may have to be initialized in a
function that is three-level deep in the callchain, with opt_x
variable passed through as a parameter.  So OPT_STRDUP may not be a
bad or horrible idea, but it is not such a great one, either.

Re: Formatting problem send_mail in version 2.10.0

2016-10-11 Thread Matthieu Moy

Larry Finger  writes:

> That added information at the end is intended to be passed on to the
> stable group. In this case, the patch needs to be applied to kernel
> versions 4.8 and later.

OK, but where do people fetch this information from?

When you use git send-email, the content of the Cc: trailers ends up
both in the body of the message and in the Cc: field of the same
message.

If you need the mention to appear in the body of the message, then using
parenthesis is fine: git send-email won't remove it (more precisely,
"send-email" will call "format-patch" which won't remove it).

Not an objection to patching send-email anyway, but if there's a simple
and RFC-compliant way to do what you're looking for, we can as well use
it (possibly in addition to patching).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

[PATCH 2/2] reset: support the --stdin option

Just like with other Git commands, this option makes it read the paths
from the standard input. It comes in handy when resetting many, many
paths at once and wildcards are not an option (e.g. when the paths are
generated by a tool).

Note: to keep things simple, we first parse the entire list and perform
the actual reset action only in a second phase.

Signed-off-by: Johannes Schindelin 
---
 Documentation/git-reset.txt | 10 +++-
 builtin/reset.c | 56 +++--
 t/t7107-reset-stdin.sh  | 33 ++
 3 files changed, 96 insertions(+), 3 deletions(-)
 create mode 100755 t/t7107-reset-stdin.sh

diff --git a/Documentation/git-reset.txt b/Documentation/git-reset.txt
index 25432d9..533ef69 100644
--- a/Documentation/git-reset.txt
+++ b/Documentation/git-reset.txt
@@ -8,7 +8,7 @@ git-reset - Reset current HEAD to the specified state
 SYNOPSIS
 
 [verse]
-'git reset' [-q] [] [--] ...
+'git reset' [-q] [--stdin [-z]] [] [--] ...
 'git reset' (--patch | -p) [] [--] [...]
 'git reset' [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] []
 
@@ -97,6 +97,14 @@ OPTIONS
 --quiet::
Be quiet, only report errors.
 
+--stdin::
+   Instead of taking list of paths from the command line,
+   read list of paths from the standard input.  Paths are
+   separated by LF (i.e. one path per line) by default.
+
+-z::
+   Only meaningful with `--stdin`; paths are separated with
+   NUL character instead of LF.
 
 EXAMPLES
 
diff --git a/builtin/reset.c b/builtin/reset.c
index c04ac07..018735f 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -21,10 +21,12 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "cache-tree.h"
+#include "strbuf.h"
+#include "quote.h"
 
 static const char * const git_reset_usage[] = {
N_("git reset [--mixed | --soft | --hard | --merge | --keep] [-q] 
[]"),
-   N_("git reset [-q] [] [--] ..."),
+   N_("git reset [-q] [--stdin [-z]] [] [--] ..."),
N_("git reset --patch [] [--] [...]"),
NULL
 };
@@ -267,7 +269,9 @@ static int reset_refs(const char *rev, const struct 
object_id *oid)
 int cmd_reset(int argc, const char **argv, const char *prefix)
 {
int reset_type = NONE, update_ref_status = 0, quiet = 0;
-   int patch_mode = 0, unborn;
+   int patch_mode = 0, nul_term_line = 0, read_from_stdin = 0, unborn;
+   char **stdin_paths = NULL;
+   int stdin_nr = 0, stdin_alloc = 0;
const char *rev;
struct object_id oid;
struct pathspec pathspec;
@@ -286,6 +290,10 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
OPT_BOOL('p', "patch", &patch_mode, N_("select hunks 
interactively")),
OPT_BOOL('N', "intent-to-add", &intent_to_add,
N_("record only the fact that removed paths 
will be added later")),
+   OPT_BOOL('z', NULL, &nul_term_line,
+   N_("paths are separated with NUL character")),
+   OPT_BOOL(0, "stdin", &read_from_stdin,
+   N_("read paths from ")),
OPT_END()
};
 
@@ -295,6 +303,44 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
PARSE_OPT_KEEP_DASHDASH);
parse_args(&pathspec, argv, prefix, patch_mode, &rev);
 
+   if (read_from_stdin) {
+   strbuf_getline_fn getline_fn = nul_term_line ?
+   strbuf_getline_nul : strbuf_getline_lf;
+   int flags = PATHSPEC_PREFER_FULL |
+   PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP;
+   struct strbuf buf = STRBUF_INIT;
+   struct strbuf unquoted = STRBUF_INIT;
+
+   if (patch_mode)
+   die(_("--stdin is incompatible with --patch"));
+
+   if (pathspec.nr)
+   die(_("--stdin is incompatible with path arguments"));
+
+   if (patch_mode)
+   flags |= PATHSPEC_PREFIX_ORIGIN;
+
+   while (getline_fn(&buf, stdin) != EOF) {
+   if (!nul_term_line && buf.buf[0] == '"') {
+   strbuf_reset(&unquoted);
+   if (unquote_c_style(&unquoted, buf.buf, NULL))
+   die(_("line is badly quoted"));
+   strbuf_swap(&buf, &unquoted);
+   }
+   ALLOC_GROW(stdin_paths, stdin_nr + 1, stdin_alloc);
+   stdin_paths[stdin_nr++] = xstrdup(buf.buf);
+   strbuf_reset(&buf);
+   }
+   strbuf_release(&unquoted);
+   strbuf_release(&buf);
+
+   ALLOC_GROW(stdin_paths, stdin_nr + 1, stdin_alloc);
+   stdin_paths[stdin_nr++] = NULL;
+   parse_pathspec(&pathspec, 0, flags, p

[PATCH 0/2] Support `git reset --stdin`

This feature was missing, and made it cumbersome for third-party
tools to reset a lot of paths in one go.

Support for --stdin has been added, following builtin/checkout-index.c's
example.


Johannes Schindelin (2):
  reset: fix usage
  reset: support the --stdin option

 Documentation/git-reset.txt | 10 +++-
 builtin/reset.c | 56 +++--
 t/t7107-reset-stdin.sh  | 33 ++
 3 files changed, 96 insertions(+), 3 deletions(-)
 create mode 100755 t/t7107-reset-stdin.sh


base-commit: 8a36cd87b7c85a651ab388d403629865ffa3ba0d
Published-As: https://github.com/dscho/git/releases/tag/reset-stdin-v1
Fetch-It-Via: git fetch https://github.com/dscho/git reset-stdin-v1

-- 
2.10.1.513.g00ef6dd

[PATCH 1/2] reset: fix usage

The  parameter is actually optional (see man page).

Signed-off-by: Johannes Schindelin 
---
 builtin/reset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 5aa8607..c04ac07 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -24,7 +24,7 @@
 
 static const char * const git_reset_usage[] = {
N_("git reset [--mixed | --soft | --hard | --merge | --keep] [-q] 
[]"),
-   N_("git reset [-q]  [--] ..."),
+   N_("git reset [-q] [] [--] ..."),
N_("git reset --patch [] [--] [...]"),
NULL
 };
-- 
2.10.1.513.g00ef6dd

Re: Formatting problem send_mail in version 2.10.0

On Tue, Oct 11, 2016 at 09:39:58AM +0200, Matthieu Moy wrote:

> >> I can't reproduce the problem with this simple setup:
> >> 
> >>git init
> >>echo content >file && git add file
> >>git commit -F- <<-\EOF
> >>the subject
> >> 
> >>the body
> >> 
> >>Cc: Stable  [4.8+]
> 
> Is this RFC2822 compliant (https://tools.ietf.org/html/rfc2822)? Not an
> expert of the norm, but my understanding is that you're allowed to use
> either "Name " (name-addr) or a...@domain.com
> (addr-spec), and that comments are allowed within parenthesis like
> "Stable  (4.8+)".

I'm not sure. I don't recall seeing anything like it in the wild before,
but I find it interesting that we behave differently than Mail::Address
(which I generally assume to adhere to a mix of spec and common
practice). I couldn't find anything relevant in rfc2822.

> What is this [4.8+] supposed to mean?
> 
> The guilty function is parse_mailboxes in perl/Git.pm. It should be
> rather easy to modify it but I need to understand the spec before I can
> try to implement anything.

It seems to be syntactically invalid as an rfc2822 address. If it's in
common use on trailer lines[1], I think the only sane things are to drop
it, or to stick it in the name. Between the two, I'd argue for the
latter, as that matches what Git historically has done.

I also found it interesting that:

  Cc: Stable [4.8+] 

ends up as:

  Cc: "Stable [ v4 . 8+ ]" 

I think this is also syntactically invalid as an rfc2822 address, but we
have a habit in Git of treating single-line "name " in a pretty
lax manner, and just letting any character except "<" exist in the name
field. I wonder if we should do something similar here.

-Peff

[1] Running `git log | grep -i '^ *cc:' | grep '\['` on linux.git
shows that it is a common pattern there, though there are other uses
of brackets that probably would not want to include their contents
in the name.

It also looks like:

  Cc: sta...@vger.kernel.org # 4.8+

is a common pattern.

So I suspect we really are in the realm of micro-formats here, and
it is less about what rfc2822 says, and what people would find it
useful for send-email to do with these bits after the address (and
possibly what other people's scripts do with them).

Re: git merge deletes my changes

2016-10-11 Thread Jakub Narębski

W dniu 10.10.2016 o 19:52, Paul Smith pisze:
> On Mon, 2016-10-10 at 10:19 +, Eduard Egorov wrote:
>> # ~/gitbuild/git-2.10.1/git merge -s subtree --squash ceph_ansible
>>
>> Can somebody confirm this please? Doesn't "merge -s subtree" really
>> merges branches?
> 
> I think possibly you're not fully understanding what the --squash flag
> does... that's what's causing your issue here, not the "-s" option.
> 
> A squash merge takes the commits that would be merged from the origin
> branch and squashes them into a single patch and applies them to the
> current branch as a new commit... but this new commit is not a merge
> commit (that is, when you look at it with "git show" etc. the commit
> will have only one parent, not two--or more--parents like a normal merge
> commit).
> 
> Basically, it's syntactic sugar for a diff plus patch operation plus
> some Git goodness wrapped around it to make it easier to use.

Actually this is full merge + commit surgery (as if you did merge with
--no-commit, then deleted MERGE_HEAD); the state of worktree is as if
it were after a merge.

> 
> But ultimately once you're done, Git has no idea that this new commit
> has any relationship whatsoever to the origin branch.  So the next time
> you merge, Git doesn't know that there was a previous merge and it will
> try to merge everything from scratch rather than starting at the
> previous common merge point.
> 
> So either you'll have to use a normal, non-squash merge, or else you'll
> have to tell Git by hand what the previous common merge point was (as
> Jeff King's excellent email suggests).  Or else, you'll have to live
> with this behavior.

The `git subtree` command (from contrib) allows yet another way: it
squashes *history* of merged subproject (as if with interactive rebase
'squash'), then merges this squash commit.

Now I know why this feature is here...
-- 
Jakub Narębski

Re: Formatting problem send_mail in version 2.10.0

2016-10-11 Thread Larry Finger


On 10/11/2016 02:39 AM, Matthieu Moy wrote:

Jeff King  writes:


[+cc authors of b1c8a11, which regressed this case; I'll quote liberally
 to give context]

On Mon, Oct 10, 2016 at 05:48:56PM -0400, Jeff King wrote:


I can't reproduce the problem with this simple setup:

git init
echo content >file && git add file
git commit -F- <<-\EOF
the subject

the body

Cc: Stable  [4.8+]


Is this RFC2822 compliant (https://tools.ietf.org/html/rfc2822)? Not an
expert of the norm, but my understanding is that you're allowed to use
either "Name " (name-addr) or a...@domain.com
(addr-spec), and that comments are allowed within parenthesis like
"Stable  (4.8+)".

What is this [4.8+] supposed to mean?

The guilty function is parse_mailboxes in perl/Git.pm. It should be
rather easy to modify it but I need to understand the spec before I can
try to implement anything.


That added information at the end is intended to be passed on to the stable 
group. In this case, the patch needs to be applied to kernel versions 4.8 and later.


Placing the info inside parentheses causes it to be dropped from the outgoing 
mail as follows:


(body) Adding cc: Stable  (4.8+) from line 'Cc: Stable 
 (4.8+)'

--snip--
Cc: Stable ,

Larry

Re: Bug? git worktree fails with master on bare repo

2016-10-11 Thread Kevin Daudt

On Mon, Oct 10, 2016 at 08:06:47AM -0500, Michael Tutty wrote:
> 
> > If source repo's HEAD is "master", I got the same behavior (worktree add 
> > fails)
> 
> So if it's possible for a bare repo to have HEAD pointing at master,
> is there a safe way for me to change this (e.g., as a cleanup step
> before doing my actual merge process)?

You can change where HEAD points to in a bare repositor with `git
symbolic-ref HEAD `, but note that this changes what branch gets
checked out when cloning a repository. 

Where users would normally check out master by default, after changing
what HEAD points to, it would be that branch (or detached when HEAD
doesn't even point at a branch).

Re: [PATCH] contrib: add credential helper for libsecret

On Mon, 2016-10-10 at 16:46 -0400, Jeff King wrote:
> On Mon, Oct 10, 2016 at 10:20:50PM +0200, Dennis Kaarsemaker wrote:
> 
> > On Sun, 2016-10-09 at 15:34 +0300, Mantas Mikulėnas wrote:
> > > This is based on the existing gnome-keyring helper, but instead of
> > > libgnome-keyring (which was specific to GNOME and is deprecated), it
> > > uses libsecret which can support other implementations of XDG Secret
> > > Service API.
> > > 
> > > Passes t0303-credential-external.sh.
> > 
> > When setting credential.helper to this helper, I get the following output:
> > 
> > $ git clone https://private-repo-url-removed private
> > Cloning into 'private'...
> > /home/dennis/code/git/contrib/credential/libsecret/ get: 1: 
> > /home/dennis/code/git/contrib/credential/libsecret/ get: 
> > /home/dennis/code/git/contrib/credential/libsecret/: Permission denied
> > 
> > Looks suboptimal. Am I holding it wrong?
> 
> That looks like a directory name in your error message. How did you set
> up credential.helper?

Doh.

I had done git config --global credential.helper 
~/code/git/contrib/credential/libsecret/

After doing it properly (actual path to the binary), it works just
fine. The error message above is slightly suboptimal, but there's no
solution for pebkac.

D.

Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd

On Tue, Oct 11, 2016 at 11:44:50AM +0200, Johannes Schindelin wrote:

> > Yeah, that's reasonable, too. So:
> > 
> >   [alias]
> > d2u = "!dos2unix"
> > 
> > acts exactly as if:
> > 
> >   [alias "d2u"]
> > command = dos2unix
> > type = shell
> > 
> > was specified at that point, which is easy to understand.
> 
> It is easy to understand, and even easier to get wrong or out of sync. I
> really liked the ease of *one* `git config` call to add new aliases. Not
> sure that I like the need for more such calls just to add *one* alias (one
> config call for "shell", one for "don't cd up", etc).

Could we simply support alias.d2u indefinitely, and you could use
whichever format you felt like (the shorter, more limited one if you
wanted, or the more verbose but flexible one)?

-Peff

Re: [PATCH] contrib: add credential helper for libsecret

On Sun, 2016-10-09 at 15:34 +0300, Mantas Mikulėnas wrote:

> + s = g_hash_table_lookup(attributes, "user");
> + if (s) {
> + g_free(c->username);
> + c->username = g_strdup(s);
> + }

This always overwrites c->username, the original gnome-keyring version
only does that when the username isn't set. Other than that it looks
good to me.

Reviewed-by: Dennis Kaarsemaker 
Tested-by: Dennis Kaarsemaker 

D.

Re: git filter-branch --subdirectory-filter not working as expected, history of other folders is preserved

2016-10-11 Thread Seaders Oloinsigh

On Mon, Oct 10, 2016 at 7:19 PM, Jeff King  wrote:
> On Mon, Oct 10, 2016 at 05:12:25PM +0100, Seaders Oloinsigh wrote:
>
>> Due to the structure of this repo, it looks like there are some
>> branches that never had anything to do with the android/ subdirectory,
>> so they're not getting wiped out.  My branch is in a better state to
>> how I want it, but still, if I run your suggestion,
>> [...]
>
> Hmm. Yeah, I think this is an artifact of the way that filter-branch
> works with pathspec limiting. It keeps a mapping of commits that it has
> rewritten (including ones that were rewritten only because their
> ancestors were), and realizes that a branch ref needs updated when the
> commit it points to was rewritten.
>
> But if we don't touch _any_ commits in the history reachable from a
> branch (because they didn't even show up in our pathspec-limited
> rev-list), then it doesn't realize we touched the branch's history at
> all.
>
> I agree that the right outcome is for it to delete those branches
> entirely. I suspect the fix would be pretty tricky, though.
>
> In the meantime, I think you can work around it by either:
>
>   1. Make a pass beforehand for refs that do not touch your desired
>  paths at all, like:
>
>path=android ;# or whatever
>git for-each-ref --format='%(refname)' |
>while read ref; do
>  if test "$(git rev-list --count "$ref" -- "$path")" = 0; then
>echo "delete $ref"
>  fi
>done |
>git update-ref --stdin
>
>  and then filter what's left:
>
>git filter-branch --subdirectory-filter $path -- --all

This is the perfect solution for me.  Going through the delete
branches runthrough also quickened the filter-branch command, and I'm
left with a much more complete version of where I want to be.

I would still contend that the filter-branch either doesn't work as
expected, or the docs need updating to provide extra steps like you've
done, because when dealing with a large repo like we have, running
multiple filter-branch commands, trying different combinations is
quite a time sync, when you're left with the same incorrect solution
again and again.

>
> or
>
>   2. Do the filter-branch, and because you know you specified --all and
>  that your filters would touch all histories, any ref which _wasn't_
>  touched can be deleted. That list is anything which didn't get a
>  backup entry in refs/original. So something like:
>
>git for-each-ref --format='%(refname)' |
>perl -lne 'print $1 if m{^refs/original/(.*)}' >backups
>
>git for-each-ref --format='%(refname)' |
>grep -v ^refs/original >refs
>
>comm -23 refs backups |
>sed "s/^/delete /" |
>git update-ref --stdin
>
> -Peff

Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd

2016-10-11 Thread Jakub Narębski

W dniu 11.10.2016 o 13:51, SZEDER Gábor pisze:
> Quoting Duy Nguyen :
>> On Fri, Oct 7, 2016 at 10:55 PM, Jakub Narębski  wrote:
>>> W dniu 07.10.2016 o 16:19, Johannes Schindelin pisze:
 On Fri, 7 Oct 2016, Jakub Narębski wrote:
>>>
> Note that we would have to teach git completion about new syntax;
> or new configuration variable if we go that route.

 Why would we? Git's completion does not expand aliases, it only completes
 the aliases' names, not their values.
>>>
>>> Because Git completion finds out which _options_ and _arguments_
>>> to complete for an alias based on its expansion.
>>>
>>> Yes, this is nice bit of trickery...
>>
>> It's c63437c (bash: improve aliased command recognition - 2010-02-23)
>> isn't it? This may favor j6t's approach [1] because retrieving alias
>> attributes is much easier.
>>
>> [1] 
>> https://public-inbox.org/git/20161006190720.4ecf3ptl6mszt...@sigill.intra.peff.net/T/#mb1d7b8f31d595b05105b8ea2137756761e13dbf4
>> -- 
>> Duy
> 
> The completion script is concerned in three ways:
> 
>   1. it has to get the names of all aliases, to offer them along with
>  git commands for 'git ' or 'git help ',
> 
>   2. it has to get the command executed in place of the alias, and
> 
>   3. strip everything that can't be a git command, so it can offer the
>  right options for the aliased command.

There is also a possibility to tell the completion script which git
command to use for option completion via some trickery, even if the
first git command of many used in script is not right (e.g. when
"$@" is passed somewhere in the middle).

I don't remember exact details, but let's look at source:

  # If you use complex aliases of form '!f() { ... }; f', you can use the null
  # command ':' as the first command in the function body to declare the desired
  # completion style.  For example '!f() { : git commit ; ... }; f' will
  # tell the completion to use commit completion.  This also works with aliases
  # of form "!sh -c '...'".  For example, "!sh -c ': git commit ; ... '".

Very nice.

> 
> The '!!' syntax is the easiest, I think it will just work even with
> the current completion code, no changes necessary.
> 
> The '(nocd)' form is still easy, we only have to add this '(nocd)' to
> that list of stripped words for (3), but no further changes necessary
> for (1) and (2).

Shouldn't the '!' vs '!!' / '(nocd)!' affect pathname completion?
Or do tab completion in subdir offer wrong completion of filenames
for aliases?

Best,
-- 
Jakub Narębski

Re: [PATCH v10 04/14] run-command: add clean_on_exit_handler

Hi Lars,

On Sat, 8 Oct 2016, larsxschnei...@gmail.com wrote:

> @@ -31,6 +32,15 @@ static void cleanup_children(int sig, int in_signal)
>   while (children_to_clean) {
>   struct child_to_clean *p = children_to_clean;
>   children_to_clean = p->next;
> +
> + if (p->process && !in_signal) {
> + struct child_process *process = p->process;
> + if (process->clean_on_exit_handler) {
> + trace_printf("trace: run_command: running exit 
> handler for pid %d", p->pid);

On Windows, pid_t translates to long long int, resulting in this build
error:

-- snip --
 In file included from cache.h:10:0,
  from run-command.c:1:
 run-command.c: In function 'cleanup_children':
 run-command.c:39:18: error: format '%d' expects argument of type 'int', but 
argument 5 has type 'pid_t {aka long long int}' [-Werror=format=]
  trace_printf("trace: run_command: running exit handler for pid %d", 
p->pid);
   ^
 trace.h:81:53: note: in definition of macro 'trace_printf'
   trace_printf_key_fl(TRACE_CONTEXT, __LINE__, NULL, __VA_ARGS__)
  ^~~
 cc1.exe: all warnings being treated as errors
 make: *** [Makefile:1987: run-command.o] Error 1
-- snap --

Maybe use PRIuMAX as we do elsewhere (see output of `git grep
printf.*pid`):

trace_printf("trace: run_command: running exit handler for pid %"
 PRIuMAX, (uintmax_t)p->pid);

Ciao,
Dscho

Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd

2016-10-11 Thread SZEDER Gábor

Quoting Duy Nguyen :

On Fri, Oct 7, 2016 at 10:55 PM, Jakub Narębski wrote:

W dniu 07.10.2016 o 16:19, Johannes Schindelin pisze:

On Fri, 7 Oct 2016, Jakub Narębski wrote:

Note that we would have to teach git completion about new syntax;
or new configuration variable if we go that route.

Why would we? Git's completion does not expand aliases, it only completes
the aliases' names, not their values.

Because Git completion finds out which _options_ and _arguments_
to complete for an alias based on its expansion.

Yes, this is nice bit of trickery...

It's c63437c (bash: improve aliased command recognition - 2010-02-23)
isn't it? This may favor j6t's approach [1] because retrieving alias
attributes is much easier.

[1]
https://public-inbox.org/git/20161006190720.4ecf3ptl6mszt...@sigill.intra.peff.net/T/#mb1d7b8f31d595b05105b8ea2137756761e13dbf4
--
Duy

The completion script is concerned in three ways:

1. it has to get the names of all aliases, to offer them along with
git commands for 'git ' or 'git help ',

2. it has to get the command executed in place of the alias, and

3. strip everything that can't be a git command, so it can offer the
right options for the aliased command.

The '!!' syntax is the easiest, I think it will just work even with
the current completion code, no changes necessary.

The '(nocd)' form is still easy, we only have to add this '(nocd)' to
that list of stripped words for (3), but no further changes necessary
for (1) and (2).

'alias.d2u.command' is tricky. Both (1) and (2) require adjustments,
preferably in a way that doesn't involve additional git processes, at
least for (1), as it is executed often, for every 'git ', for the
sake of users on platforms with not particularly stellar fork()+exec()
performance. I think it would be possible to have only one 'git
config --get-regexp' and a little bit of post processing in each case,
but I didn't think too hard about possible pitfalls, especially when
dealing with ambiguity when both 'alias.d2u' and 'alias.d2u.command'
are set. And I won't think any more about it until a conclusion is
reached that we'll go this route :)

Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd