[PATCH 1/2] mailinfo: Add support for keep_cr

2017-01-11 Thread Matthew Wilcox
From: Matthew Wilcox 

If you have a base-64 encoded patch with CRLF endings (as produced
by forwarding a patch from Outlook to a Linux machine, for example),
the keep_cr setting is not honoured because keep_cr is only passed
to mailsplit, which does not look through the encoding.  The keep_cr
logic needs to be applied after the base64 decode.  I copied that
logic to handle_filter(), and rather than add a new keep_cr parameter
to handle_filter, I opted to add keep_cr to struct mailinfo; it seemed
appropriate given use_scissors was already there.

Then I needed to initialise keep_cr in the struct mailinfo passed from
git-am, and rather than thread a 'keep_cr' parameter all the way through
to parse_mail(), I decided to add keep_cr to struct am_state, which let
it be removed as a parameter from five other functions.

Signed-off-by: Matthew Wilcox 
---
 builtin/am.c | 49 -
 mailinfo.c   |  6 ++
 mailinfo.h   |  1 +
 3 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 31fb60578..6cb6e6ca8 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -124,6 +124,7 @@ struct am_state {
int keep; /* enum keep_type */
int message_id;
int scissors; /* enum scissors_type */
+   int keep_cr;
struct argv_array git_apply_opts;
const char *resolvemsg;
int committer_date_is_author_date;
@@ -143,6 +144,7 @@ static void am_state_init(struct am_state *state, const 
char *dir)
 
memset(state, 0, sizeof(*state));
 
+   state->keep_cr = -1;
assert(dir);
state->dir = xstrdup(dir);
 
@@ -697,7 +699,7 @@ static int detect_patch_format(const char **paths)
  * a mbox file or a Maildir. Returns 0 on success, -1 on failure.
  */
 static int split_mail_mbox(struct am_state *state, const char **paths,
-   int keep_cr, int mboxrd)
+   int mboxrd)
 {
struct child_process cp = CHILD_PROCESS_INIT;
struct strbuf last = STRBUF_INIT;
@@ -707,7 +709,7 @@ static int split_mail_mbox(struct am_state *state, const 
char **paths,
argv_array_pushf(&cp.args, "-d%d", state->prec);
argv_array_pushf(&cp.args, "-o%s", state->dir);
argv_array_push(&cp.args, "-b");
-   if (keep_cr)
+   if (state->keep_cr)
argv_array_push(&cp.args, "--keep-cr");
if (mboxrd)
argv_array_push(&cp.args, "--mboxrd");
@@ -737,7 +739,7 @@ typedef int (*mail_conv_fn)(FILE *out, FILE *in, int 
keep_cr);
  * Returns 0 on success, -1 on failure.
  */
 static int split_mail_conv(mail_conv_fn fn, struct am_state *state,
-   const char **paths, int keep_cr)
+   const char **paths)
 {
static const char *stdin_only[] = {"-", NULL};
int i;
@@ -766,7 +768,7 @@ static int split_mail_conv(mail_conv_fn fn, struct am_state 
*state,
return error_errno(_("could not open '%s' for writing"),
   mail);
 
-   ret = fn(out, in, keep_cr);
+   ret = fn(out, in, state->keep_cr);
 
fclose(out);
fclose(in);
@@ -826,8 +828,7 @@ static int stgit_patch_to_mail(FILE *out, FILE *in, int 
keep_cr)
  *
  * Returns 0 on success, -1 on failure.
  */
-static int split_mail_stgit_series(struct am_state *state, const char **paths,
-   int keep_cr)
+static int split_mail_stgit_series(struct am_state *state, const char **paths)
 {
const char *series_dir;
char *series_dir_buf;
@@ -857,7 +858,7 @@ static int split_mail_stgit_series(struct am_state *state, 
const char **paths,
strbuf_release(&sb);
free(series_dir_buf);
 
-   ret = split_mail_conv(stgit_patch_to_mail, state, patches.argv, 
keep_cr);
+   ret = split_mail_conv(stgit_patch_to_mail, state, patches.argv);
 
argv_array_clear(&patches);
return ret;
@@ -937,30 +938,27 @@ static int hg_patch_to_mail(FILE *out, FILE *in, int 
keep_cr)
  * state->cur will be set to the index of the first mail, and state->last will
  * be set to the index of the last mail.
  *
- * Set keep_cr to 0 to convert all lines ending with \r\n to end with \n, 1
- * to disable this behavior, -1 to use the default configured setting.
- *
  * Returns 0 on success, -1 on failure.
  */
 static int split_mail(struct am_state *state, enum patch_format patch_format,
-   const char **paths, int keep_cr)
+   const char **paths)
 {
-   if (keep_cr < 0) {
-   keep_cr = 0;
-   git_config_get_bool("am.keepcr", &keep_cr);
+   if (state->keep_cr < 0) {
+   state->keep_cr = 0;
+   git_config_get_bool("am.keepcr", &state->keep_cr);
}
 
switch (patch_format) {
case PATCH_FORMAT_MBOX:
-   return split_mail_mbox(state, pa

[PATCH 2/2] mailinfo: Understand forwarded patches

2017-01-11 Thread Matthew Wilcox
From: Matthew Wilcox 

Extend the --scissors mechanism to strip off the preamble created by
forwarding a patch.  There are a couple of extra headers ("Sent" and
"To") added by forwarding, but other than that, the --scissors option
will now remove patches forwarded from Microsoft Outlook to a Linux
email account.

Signed-off-by: Matthew Wilcox 
---
 mailinfo.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mailinfo.c b/mailinfo.c
index 2059704a8..fc1275532 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -332,7 +332,7 @@ static void cleanup_subject(struct mailinfo *mi, struct 
strbuf *subject)
 
 #define MAX_HDR_PARSED 10
 static const char *header[MAX_HDR_PARSED] = {
-   "From","Subject","Date",
+   "From","Subject","Date","Sent","To",
 };
 
 static inline int cmp_header(const struct strbuf *line, const char *hdr)
@@ -685,6 +685,13 @@ static int is_scissors_line(const char *line)
c++;
continue;
}
+   if (!memcmp(c, "Original Message", 16)) {
+   in_perforation = 1;
+   perforation += 16;
+   scissors += 16;
+   c += 15;
+   continue;
+   }
in_perforation = 0;
}
 
-- 
2.11.0



Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path

2017-01-11 Thread Pat Pannuto
I'm not at all attached to changing all of them, just figured it made
sense while I was here.

Would a patch that changes only:

 git-add--interactive.perl | 2 +-
 git-archimport.perl   | 2 +-
 git-cvsexportcommit.perl  | 2 +-
 git-cvsimport.perl| 2 +-
 git-cvsserver.perl| 2 +-
 git-difftool.perl | 2 +-
 git-relink.perl   | 2 +-
 git-send-email.perl   | 2 +-
 git-svn.perl  | 2 +-

be more acceptable?

On Thu, Jan 12, 2017 at 1:27 AM, Johannes Sixt  wrote:
> Am 12.01.2017 um 06:51 schrieb Pat Pannuto:
>>
>> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
>> index cf6fc926a..6d7b6c35d 100755
>> --- a/git-add--interactive.perl
>> +++ b/git-add--interactive.perl
>> @@ -1,4 +1,4 @@
>> -#!/usr/bin/perl
>> +#!/usr/bin/env perl
>
>
> This change, and all others that affect installed external git programs, is
> a no-go. On Windows, our execve emulation is not complete. It would invoke
> only `env` (looked up in PATH), but not pass 'perl' as argument.
>
> Sorry for the bad news.
>
> I would have suggested to set PERL_PATH in your config.mak, but that does
> not change the generated perl scripts, I think. Perhaps you should implement
> that?
>
> I'm not thrilled about your changes to the test scripts, but I do not expect
> that they break on Windows.
>
> -- Hannes
>


Re: [PATCH 0/2] Use env for all perl invocations

2017-01-11 Thread Pat Pannuto
[plain text re-send, sorry]

Interesting, I'm guessing this came from when git was installed (
https://github.com/Homebrew/homebrew-core/blob/master/Formula/git.rb#L50
), when the perl path was likely still /usr/bin/perl

It feels weird to me that the perl path is fixed at
compile/install-time as opposed to run-time discovery -- this means
users can't change their perl install without breaking git?

On Thu, Jan 12, 2017 at 1:21 AM, Junio C Hamano  wrote:
> Pat Pannuto  writes:
>
>> I spent a little while debugging why git-format-patch refused to believe
>> that SSL support was installed (Can't locate Net/SMTP/SSL.pm in @INC...)
>> Turns out that it was installed for my system's preferred 
>> /usr/local/bin/perl,
>> but not for git-format-patch's hard-coded /usr/bin/perl; changing the shebang
>> allowed git format-patch to work as expected.
>
> Isn't that an indication that you are not building correctly?
> Perhaps
>
> $ git grep 'Define PERL_' Makefile
> $ make PERL_PATH=/usr/local/bin/perl
>
> would help?


Re: [RFC PATCH 3/2] vreportf: add prefix to each line

2017-01-11 Thread Jeff King
On Wed, Jan 11, 2017 at 02:11:42PM -0800, Junio C Hamano wrote:

> > It's actually kind of ugly.  For instance, a failing test in
> > t3600 now produces:
> >
> >error: the following files have staged content different from both the
> >error: file and the HEAD:
> >error: bar.txt
> >error: foo.txt
> >error: (use -f to force removal)
> >
> > which seems a bit aggressive.  
> 
> I agree that it is ugly, but one reason I was hoping to do this
> myself (or have somebody else do it by procrastinating) was that I
> thought it may help i18n.  That is, for an original
> 
>   error(_("we found an error"))
> 
> a .po file may translate the string to a multi-line string that has
> LFs in it and the output would look correct.  The translator already
> can do so by indenting the second and subsequent lines by the same
> column-width as "error: " (or its translation in their language, if
> we are going to i18n these headers), but that (1) is extra work for
> them, and (2) makes it impossible to have the same message appear in
> different contexts (i.e. "error:" vs "warning:" that have different
> column-widths).

Yes, I agree that would be a functional benefit. I'm just hoping we can
do it in a way that is visually pleasing.

> > It also screws up messages which indent with tabs (the prefix eats
> > up some of the tabstop, making the indentation smaller).
> 
> This is unavoidable and at the same time is a non-issue, isn't it?
> Messages that indent the second and subsequent lines with tabs are
> compensating the lack of the multi-line support of vreportf(), which
> this RFC patch fixes.  They may need to be adjusted to the new world
> order, but that is a good thing.  New multi-line messages no longer
> have to worry about the prefix that is added only to the first line
> when continuing the message to multiple lines.

I'm not so sure it is just about compensating. Look at the message
quoted above. The original looks like:

  error: the following files have staged content different from both the
  file and the HEAD:
  bar.txt
  foo.txt
  (use -f to force removal)

The leading whitespace is visually separating the list of files, not
just from the line with "error:", but from the other lines.

Though I think if we replaced tabs with spaces in this instance, then
they would still be bumped relative to the rest of the text.

> > It may be possible to address some of that by using some
> > other kind of continuation marker (instead of just repeating
> > the prefix), and expanding initial tabs.
> 
> Yes indeed.  The "some other kind of continuation marker" could just
> be a run of spaces that fill the same column as the "error: " or
> other prefix given to the first line.

I tried that, along with several other variants, but it gets
confusing/ugly when mixed with indentation that is significant to the
line. For example, the t3600 message becomes:

  error: the following files have staged content different from both the
 file and the HEAD:
 bar.txt
 foo.txt
 (use -f to force removal)

Which is arguably better than what we have now, but still looks pretty
bad to me.

I wonder if it would help for the marker to end in a non-whitespace
character. Like:

  error: the following files have staged content different from both the
   :  file and the HEAD:
   : bar.txt
   : foo.txt
   : (use -f to force removal)

or something. The ":" is a bit sparse looking. Maybe there are better
options. That does ruin line-by-line selection for cut-and-paste,
though.

So I dunno. I am open to ideas, but I didn't find one that I really
liked (this patch is actually a leftover from before Christmas, so I
don't even remember all the things I tried, just that I didn't like any
of them ;) ).

-Peff


Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path

2017-01-11 Thread Johannes Sixt

Am 12.01.2017 um 06:51 schrieb Pat Pannuto:

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index cf6fc926a..6d7b6c35d 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl
+#!/usr/bin/env perl


This change, and all others that affect installed external git programs, 
is a no-go. On Windows, our execve emulation is not complete. It would 
invoke only `env` (looked up in PATH), but not pass 'perl' as argument.


Sorry for the bad news.

I would have suggested to set PERL_PATH in your config.mak, but that 
does not change the generated perl scripts, I think. Perhaps you should 
implement that?


I'm not thrilled about your changes to the test scripts, but I do not 
expect that they break on Windows.


-- Hannes



Re: [PATCH 0/2] Use env for all perl invocations

2017-01-11 Thread Junio C Hamano
Pat Pannuto  writes:

> I spent a little while debugging why git-format-patch refused to believe
> that SSL support was installed (Can't locate Net/SMTP/SSL.pm in @INC...)
> Turns out that it was installed for my system's preferred /usr/local/bin/perl,
> but not for git-format-patch's hard-coded /usr/bin/perl; changing the shebang
> allowed git format-patch to work as expected.

Isn't that an indication that you are not building correctly?
Perhaps

$ git grep 'Define PERL_' Makefile
$ make PERL_PATH=/usr/local/bin/perl

would help?


[PATCH 2/2] Use 'env' to find perl instead of fixed path

2017-01-11 Thread Pat Pannuto
Depending on system configuration, /usr/bin/perl may not be the
preferred perl interpreter, use /usr/bin/env perl to invoke perl
instead to repsect user preferences.

In most cases this will not matter as any perl install will work,
but for applications such as git-format-patch, which may require
plugins such as Net/SMTP/SSL.pm to be installed, it is very confusing
when git fails to work after installation because it chose the
wrong perl installation.

This patch converts all shebangs to use env for consistency across
the project.

Signed-off-by: Pat Pannuto 
---
 Documentation/build-docdep.perl   | 2 +-
 Documentation/cat-texi.perl   | 2 +-
 Documentation/cmd-list.perl   | 2 +-
 Documentation/fix-texi.perl   | 2 +-
 Documentation/lint-gitlink.perl   | 2 +-
 compat/vcbuild/scripts/clink.pl   | 2 +-
 compat/vcbuild/scripts/lib.pl | 2 +-
 contrib/buildsystems/engine.pl| 2 +-
 contrib/buildsystems/generate | 2 +-
 contrib/buildsystems/parse.pl | 2 +-
 contrib/contacts/git-contacts | 2 +-
 contrib/credential/netrc/git-credential-netrc | 2 +-
 contrib/credential/netrc/test.pl  | 2 +-
 contrib/diff-highlight/diff-highlight | 2 +-
 contrib/examples/git-remote.perl  | 2 +-
 contrib/examples/git-rerere.perl  | 2 +-
 contrib/examples/git-svnimport.perl   | 2 +-
 contrib/fast-import/git-import.perl   | 2 +-
 contrib/fast-import/import-directories.perl   | 2 +-
 contrib/fast-import/import-tars.perl  | 2 +-
 contrib/hooks/setgitperms.perl| 2 +-
 contrib/hooks/update-paranoid | 2 +-
 contrib/long-running-filter/example.pl| 2 +-
 contrib/mw-to-git/git-mw.perl | 2 +-
 contrib/mw-to-git/git-remote-mediawiki.perl   | 2 +-
 contrib/mw-to-git/t/test-gitmw.pl | 2 +-
 contrib/stats/mailmap.pl  | 2 +-
 contrib/stats/packinfo.pl | 2 +-
 git-add--interactive.perl | 2 +-
 git-archimport.perl   | 2 +-
 git-cvsexportcommit.perl  | 2 +-
 git-cvsimport.perl| 2 +-
 git-cvsserver.perl| 2 +-
 git-difftool.perl | 2 +-
 git-relink.perl   | 2 +-
 git-send-email.perl   | 2 +-
 git-svn.perl  | 2 +-
 gitweb/gitweb.perl| 2 +-
 t/Git-SVN/Utils/can_compress.t| 2 +-
 t/Git-SVN/Utils/fatal.t   | 2 +-
 t/check-non-portable-shell.pl | 2 +-
 t/gitweb-lib.sh   | 2 +-
 t/perf/aggregate.perl | 2 +-
 t/perf/min_time.perl  | 2 +-
 t/t0202/test.pl   | 2 +-
 t/t4034/perl/post | 2 +-
 t/t4034/perl/pre  | 2 +-
 t/t9000/test.pl   | 2 +-
 t/t9500-gitweb-standalone-no-errors.sh| 2 +-
 t/t9700/test.pl   | 2 +-
 t/test-terminal.perl  | 2 +-
 51 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/Documentation/build-docdep.perl b/Documentation/build-docdep.perl
index ba4205e03..dc50f21f3 100755
--- a/Documentation/build-docdep.perl
+++ b/Documentation/build-docdep.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl
+#!/usr/bin/env perl
 
 my %include = ();
 my %included = ();
diff --git a/Documentation/cat-texi.perl b/Documentation/cat-texi.perl
index 1cd28b1b5..98dcc2c42 100755
--- a/Documentation/cat-texi.perl
+++ b/Documentation/cat-texi.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl
+#!/usr/bin/env perl
 
 use warnings;
 
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index ba640a441..f440eebe3 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl
+#!/usr/bin/env perl
 
 use warnings;
 
diff --git a/Documentation/fix-texi.perl b/Documentation/fix-texi.perl
index c247aece7..61287a0a9 100755
--- a/Documentation/fix-texi.perl
+++ b/Documentation/fix-texi.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl
+#!/usr/bin/env perl
 
 use warnings;
 
diff --git a/Documentation/lint-gitlink.perl b/Documentation/lint-gitlink.perl
index 476cc30b8..3e4b00eda 100755
--- a/Documentation/lint-gitlink.perl
+++ b/Documentation/lint-gitlink.perl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl
+#!/usr/bin/env perl
 
 use File::Find;
 use Getopt::Long;
diff --git a/compat/vcbuild/scripts/clink.pl b/compat/vcbuild/scripts/clink.pl
index 46eb61c5c..857847703 100755
--- a/compat/vcbuild/scripts/clink.pl
+++ b/compat/vcbuild/scripts/clink.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl
+#!/usr/bin/env perl
 ##
 # 

[PATCH 0/2] Use env for all perl invocations

2017-01-11 Thread Pat Pannuto
I spent a little while debugging why git-format-patch refused to believe
that SSL support was installed (Can't locate Net/SMTP/SSL.pm in @INC...)
Turns out that it was installed for my system's preferred /usr/local/bin/perl,
but not for git-format-patch's hard-coded /usr/bin/perl; changing the shebang
allowed git format-patch to work as expected.

This patch set converts all perl invocations in git to use env so that the
user-preferred perl interpreter is always used.

Pat Pannuto (2):
  Convert all 'perl -w' to 'perl' + 'use warnings;'
  Use 'env' to find perl instead of fixed path

 Documentation/build-docdep.perl   | 2 +-
 Documentation/cat-texi.perl   | 4 +++-
 Documentation/cmd-list.perl   | 4 +++-
 Documentation/fix-texi.perl   | 4 +++-
 Documentation/lint-gitlink.perl   | 2 +-
 compat/vcbuild/scripts/clink.pl   | 3 ++-
 compat/vcbuild/scripts/lib.pl | 3 ++-
 contrib/buildsystems/engine.pl| 3 ++-
 contrib/buildsystems/generate | 3 ++-
 contrib/buildsystems/parse.pl | 3 ++-
 contrib/contacts/git-contacts | 2 +-
 contrib/credential/netrc/git-credential-netrc | 2 +-
 contrib/credential/netrc/test.pl  | 2 +-
 contrib/diff-highlight/diff-highlight | 2 +-
 contrib/examples/git-remote.perl  | 3 ++-
 contrib/examples/git-rerere.perl  | 2 +-
 contrib/examples/git-svnimport.perl   | 2 +-
 contrib/fast-import/git-import.perl   | 2 +-
 contrib/fast-import/import-directories.perl   | 2 +-
 contrib/fast-import/import-tars.perl  | 2 +-
 contrib/hooks/setgitperms.perl| 2 +-
 contrib/hooks/update-paranoid | 2 +-
 contrib/long-running-filter/example.pl| 2 +-
 contrib/mw-to-git/git-mw.perl | 2 +-
 contrib/mw-to-git/git-remote-mediawiki.perl   | 2 +-
 contrib/mw-to-git/t/test-gitmw.pl | 5 -
 contrib/stats/mailmap.pl  | 2 +-
 contrib/stats/packinfo.pl | 2 +-
 git-add--interactive.perl | 2 +-
 git-archimport.perl   | 2 +-
 git-cvsexportcommit.perl  | 2 +-
 git-cvsimport.perl| 2 +-
 git-cvsserver.perl| 2 +-
 git-difftool.perl | 2 +-
 git-relink.perl   | 2 +-
 git-send-email.perl   | 2 +-
 git-svn.perl  | 2 +-
 gitweb/gitweb.perl| 2 +-
 t/Git-SVN/Utils/can_compress.t| 2 +-
 t/Git-SVN/Utils/fatal.t   | 2 +-
 t/check-non-portable-shell.pl | 2 +-
 t/gitweb-lib.sh   | 2 +-
 t/perf/aggregate.perl | 2 +-
 t/perf/min_time.perl  | 2 +-
 t/t0202/test.pl   | 2 +-
 t/t4034/perl/post | 2 +-
 t/t4034/perl/pre  | 2 +-
 t/t9000/test.pl   | 2 +-
 t/t9500-gitweb-standalone-no-errors.sh| 2 +-
 t/t9700/test.pl   | 2 +-
 t/test-terminal.perl  | 2 +-
 51 files changed, 66 insertions(+), 51 deletions(-)

-- 
2.11.0



[PATCH 1/2] Convert all 'perl -w' to 'perl' + 'use warnings;'

2017-01-11 Thread Pat Pannuto
This commit is in preparation for converting all shebangs to use 'env'
instead of a fixed perl path, which will not allow for arguments to 'perl'.

Signed-off-by: Pat Pannuto 
---
 Documentation/cat-texi.perl   | 4 +++-
 Documentation/cmd-list.perl   | 4 +++-
 Documentation/fix-texi.perl   | 4 +++-
 compat/vcbuild/scripts/clink.pl   | 3 ++-
 compat/vcbuild/scripts/lib.pl | 3 ++-
 contrib/buildsystems/engine.pl| 3 ++-
 contrib/buildsystems/generate | 3 ++-
 contrib/buildsystems/parse.pl | 3 ++-
 contrib/examples/git-remote.perl  | 3 ++-
 contrib/mw-to-git/t/test-gitmw.pl | 5 -
 10 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/Documentation/cat-texi.perl b/Documentation/cat-texi.perl
index 87437f8a9..1cd28b1b5 100755
--- a/Documentation/cat-texi.perl
+++ b/Documentation/cat-texi.perl
@@ -1,4 +1,6 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
+
+use warnings;
 
 my @menu = ();
 my $output = $ARGV[0];
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index 5aa73cfe4..ba640a441 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -1,4 +1,6 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
+
+use warnings;
 
 use File::Compare qw(compare);
 
diff --git a/Documentation/fix-texi.perl b/Documentation/fix-texi.perl
index ff7d78f62..c247aece7 100755
--- a/Documentation/fix-texi.perl
+++ b/Documentation/fix-texi.perl
@@ -1,4 +1,6 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
+
+use warnings;
 
 while (<>) {
if (/^\@setfilename/) {
diff --git a/compat/vcbuild/scripts/clink.pl b/compat/vcbuild/scripts/clink.pl
index a87d0da51..46eb61c5c 100755
--- a/compat/vcbuild/scripts/clink.pl
+++ b/compat/vcbuild/scripts/clink.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
 ##
 # Compiles or links files
 #
@@ -10,6 +10,7 @@
 # Copyright (C) 2009 Marius Storm-Olsen 
 ##
 use strict;
+use warnings;
 my @args = ();
 my @cflags = ();
 my $is_linking = 0;
diff --git a/compat/vcbuild/scripts/lib.pl b/compat/vcbuild/scripts/lib.pl
index d8054e469..e571b8470 100755
--- a/compat/vcbuild/scripts/lib.pl
+++ b/compat/vcbuild/scripts/lib.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
 ##
 # Libifies files on Windows
 #
@@ -10,6 +10,7 @@
 # Copyright (C) 2009 Marius Storm-Olsen 
 ##
 use strict;
+use warnings;
 my @args = ();
 while (@ARGV) {
my $arg = shift @ARGV;
diff --git a/contrib/buildsystems/engine.pl b/contrib/buildsystems/engine.pl
index 23da787dc..a173669ce 100755
--- a/contrib/buildsystems/engine.pl
+++ b/contrib/buildsystems/engine.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
 ##
 # Do not call this script directly!
 #
@@ -8,6 +8,7 @@
 # Copyright (C) 2009 Marius Storm-Olsen 
 ##
 use strict;
+use warnings;
 use File::Basename;
 use File::Spec;
 use Cwd;
diff --git a/contrib/buildsystems/generate b/contrib/buildsystems/generate
index bc10f25ff..9af89454a 100755
--- a/contrib/buildsystems/generate
+++ b/contrib/buildsystems/generate
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
 ##
 # Generate buildsystem files
 #
@@ -19,6 +19,7 @@
 # Copyright (C) 2009 Marius Storm-Olsen 
 ##
 use strict;
+use warnings;
 use File::Basename;
 use Cwd;
 
diff --git a/contrib/buildsystems/parse.pl b/contrib/buildsystems/parse.pl
index c9656ece9..33ca89eb0 100755
--- a/contrib/buildsystems/parse.pl
+++ b/contrib/buildsystems/parse.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
 ##
 # Do not call this script directly!
 #
@@ -8,6 +8,7 @@
 # Copyright (C) 2009 Marius Storm-Olsen 
 ##
 use strict;
+use warnings;
 use File::Basename;
 use Cwd;
 
diff --git a/contrib/examples/git-remote.perl b/contrib/examples/git-remote.perl
index d42df7b41..5bf3ffd4c 100755
--- a/contrib/examples/git-remote.perl
+++ b/contrib/examples/git-remote.perl
@@ -1,6 +1,7 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
 
 use strict;
+use warnings;
 use Git;
 my $git = Git->repository();
 
diff --git a/contrib/mw-to-git/t/test-gitmw.pl 
b/contrib/mw-to-git/t/test-gitmw.pl
index 0ff76259f..8d0e7c078 100755
--- a/contrib/mw-to-git/t/test-gitmw.pl
+++ b/contrib/mw-to-git/t/test-gitmw.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -w -s
+#!/usr/bin/perl
 # Copyright (C) 2012
 # Charles Roussel 
 # Simon Cathebras 
@@ -22,6 +22,9 @@
 # "edit_page"
 # "getallpagename"
 
+use strict;
+use warnin

Re: [ANNOUNCE] git-test: run automated tests against a range of Git commits

2017-01-11 Thread Michael Haggerty
On 01/10/2017 10:35 AM, Jeff King wrote:
> On Sun, Jan 08, 2017 at 10:52:25AM +0100, Michael Haggerty wrote:
>> [...]

Since my last email, I have implemented a bunch of what we discussed
[1]. Because of the new semantics, I also *renamed the main command* from

git test range [...]

to

git test run [...]

>> I'm thinking of maybe
>>
>> * If an argument matches `*..*`, pass it to `rev-list` (like now).

This was already implemented.

>> * Otherwise, treat each argument as a single commit/tree (i.e., pass it
>> to `rev-parse`).

This is now implemented, too. Plus, it is now allowed to specify
multiple commits and/or ranges in a single invocation.

>> * If no argument is specified, test `@{u}..` (assuming that an
>>   upstream is configured). Though actually, this won't be as
>>   convenient as it sounds, because (a) `git test` is often run
>>   in a separate worktree, and (2) on errors, it currently leaves the
>>   repository with a detached `HEAD`.

I decided that if no argument is specified, it makes more sense to test
HEAD if the working tree is clean, and the contents of the working tree
otherwise. In the latter case no results are stored to notes, but it is
still useful to be able to run a preconfigured test quickly while
working. These are both implemented.

>> * Support a `--stdin` option, to read a list of commits/trees to test
>>   from standard input. By this mechanism, users could use arbitrary
>>   `rev-list` commands to choose what to test.

This is now implemented, too.

> [...]
>> I think ideally `git notes add` would look for pre-existing notes, and:
>>
>> * If none are found, create an empty notes reference.
>>
>> * If pre-existing notes are found and there was no existing test with
>>   that name, probably just leave the old notes in place.
>>
>> * If pre-existing notes are found and there was already a test with
>>   that name but a different command, perhaps insist that the user
>>   decide explicitly whether to forget the old results or continue using
>>   them. This might help users avoid the mistake of re-using old results
>>   even if they change the manner of testing.
> 
> I'm not quite sure what you mean here. By "test" and "command", do you
> mean the test name that is used in the notes ref, and the command that
> it is defined as?

By "test", I mean a test as configured in `git-test` and referred to by
its name (e.g., in `--test=name`). Currently the only configuration for
a test is `test..command`, but I structured the namespace that way
to leave room to add more configuration in the future.

> In the notes-cache.c subsystem, the commit message stores a validity
> token which must match in order to use the cache. You could do something
> similar here (store the executed command in the commit message, and
> invalidate the cache the user has changed the command). The notes-cache
> stuff isn't available outside of the C code, though. You could either
> expose it, or just do something similar longhand.
> 
> Thinking about it, though, I did notice that the tree sha1 is not the
> only input to the cache. Things like config.mak (not to mention the
> system itself) contribute to the test results. So no system will ever be
> perfect, and it seems like just making an easy way to force a retest (or
> just invalidate the whole cache) would be sufficient.

It's true that the tree and the test command are not the only inputs to
the testing machinery. I wasn't hoping to build a watertight system to
prevent accidental use of old results when `config.mak` or something
else in the environment has changed. I was only trying to give the user
a reminder that if they change the command, it's a good time to consider
whether the old results have to be invalidated.

I suppose if we wanted to make this system more watertight, we could let
the test configuration specify the names of other files on which its
results depend; for example,

test.default.command = make -j16 test
test.default.auxiliaryInput = config.mak

and include some kind of hash of the auxiliary inputs in a validity
token. But that feels like overkill.

> [...]
>> Yeah, this is awkward, not only because many people don't know what to
>> make of detached HEAD, but also because it makes it awkward in general
>> to use `git test` in your main working directory. I didn't model this
>> behavior on `git rebase --interactive`'s `edit` command, because I
>> rarely use that. But I can see how they would fit together pretty well
>> for people who like that workflow.
> 
> Yeah, after sleeping on it, I think it's best if "git test" remains
> separate from that. It's primary function is to run the test, possibly
> serving up a cached answer. So it would be perfectly reasonable to do:
> 
>   git rebase -x 'git test range HEAD'
> 
> to accomplish the interactive testing (though perhaps just "git test"
> would be a nice synonym for that).

That invocation can now be written `git test run`, which is a bit
shorter. There could be a special 

[PATCH 1/5] doc: add documentation for OPT_STRING_LIST

2017-01-11 Thread Jacob Keller
From: Jacob Keller 

Commit c8ba16391655 ("parse-options: add OPT_STRING_LIST helper",
2011-06-09) added the OPT_STRING_LIST as a way to accumulate a repeated
list of strings. However, this was not documented in the
api-parse-options documentation. Add documentation now so that future
developers may learn of its existence.

Signed-off-by: Jacob Keller 
---
 Documentation/technical/api-parse-options.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/technical/api-parse-options.txt 
b/Documentation/technical/api-parse-options.txt
index 27bd701c0d68..15e876e4c804 100644
--- a/Documentation/technical/api-parse-options.txt
+++ b/Documentation/technical/api-parse-options.txt
@@ -168,6 +168,11 @@ There are some macros to easily define options:
Introduce an option with string argument.
The string argument is put into `str_var`.
 
+`OPT_STRING_LIST(short, long, &list, arg_str, description)`::
+   Introduce an option with a string argument. Repeated invocations
+   accumulate into a list of strings. Reset and clear the list with
+   `--no-option`.
+
 `OPT_INTEGER(short, long, &int_var, description)`::
Introduce an option with integer argument.
The integer is put into `int_var`.
-- 
2.11.0.403.g196674b8396b



[PATCH 4/5] describe: teach --match to accept multiple patterns

2017-01-11 Thread Jacob Keller
From: Jacob Keller 

Teach `--match` to be accepted multiple times, accumulating a list of
patterns to match into a string list. Each pattern is inclusive, such
that a tag need only match one of the provided patterns to be
considered for matching.

This extension is useful as it enables more flexibility in what tags
match, and may avoid the need to run the describe command multiple
times to get the same result.

Add tests and update the documentation for this change.

Signed-off-by: Jacob Keller 
---
 Documentation/git-describe.txt |  5 -
 builtin/describe.c | 30 +++---
 t/t6120-describe.sh| 19 +++
 3 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index e4ac448ff565..7ad41e2f6ade 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -83,7 +83,10 @@ OPTIONS
 --match ::
Only consider tags matching the given `glob(7)` pattern,
excluding the "refs/tags/" prefix.  This can be used to avoid
-   leaking private tags from the repository.
+   leaking private tags from the repository. If given multiple times, a
+   list of patterns will be accumulated, and tags matching any of the
+   patterns will be considered. Use `--no-match` to clear and reset the
+   list of patterns.
 
 --always::
Show uniquely abbreviated commit object as fallback.
diff --git a/builtin/describe.c b/builtin/describe.c
index 01490a157efc..5cc9e9abe798 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -28,7 +28,7 @@ static int abbrev = -1; /* unspecified */
 static int max_candidates = 10;
 static struct hashmap names;
 static int have_util;
-static const char *pattern;
+static struct string_list patterns = STRING_LIST_INIT_NODUP;
 static int always;
 static const char *dirty;
 
@@ -129,9 +129,24 @@ static int get_name(const char *path, const struct 
object_id *oid, int flag, voi
if (!all && !is_tag)
return 0;
 
-   /* Accept only tags that match the pattern, if given */
-   if (pattern && (!is_tag || wildmatch(pattern, path + 10, 0, NULL)))
-   return 0;
+   /*
+* If we're given patterns, accept only tags which match at least one
+* pattern.
+*/
+   if (patterns.nr) {
+   struct string_list_item *item;
+
+   if (!is_tag)
+   return 0;
+
+   for_each_string_list_item(item, &patterns) {
+   if (!wildmatch(item->string, path + 10, 0, NULL))
+   break;
+
+   /* If we get here, no pattern matched. */
+   return 0;
+   }
+   }
 
/* Is it annotated? */
if (!peel_ref(path, peeled.hash)) {
@@ -404,7 +419,7 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
N_("only output exact matches"), 0),
OPT_INTEGER(0, "candidates", &max_candidates,
N_("consider  most recent tags (default: 10)")),
-   OPT_STRING(0, "match",   &pattern, N_("pattern"),
+   OPT_STRING_LIST(0, "match", &patterns, N_("pattern"),
   N_("only consider tags matching ")),
OPT_BOOL(0, "always",&always,
N_("show abbreviated commit object as fallback")),
@@ -430,6 +445,7 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
die(_("--long is incompatible with --abbrev=0"));
 
if (contains) {
+   struct string_list_item *item;
struct argv_array args;
 
argv_array_init(&args);
@@ -440,8 +456,8 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
argv_array_push(&args, "--always");
if (!all) {
argv_array_push(&args, "--tags");
-   if (pattern)
-   argv_array_pushf(&args, "--refs=refs/tags/%s", 
pattern);
+   for_each_string_list_item(item, &patterns)
+   argv_array_pushf(&args, "--refs=refs/tags/%s", 
item->string);
}
if (argc)
argv_array_pushv(&args, argv);
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 85f269411cb3..9e5db9b87a1f 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -182,6 +182,10 @@ check_describe "test2-lightweight-*" --tags 
--match="test2-*"
 
 check_describe "test2-lightweight-*" --long --tags --match="test2-*" HEAD^
 
+check_describe "test1-lightweight-*" --long --tags --match="test1-*" 
--match="test2-*" HEAD^
+
+check_describe "test2-lightweight-*" --long --tags --match="test1-*" 
--no-match --match="test2-*" HEAD^
+
 test_expect_success 'name-rev with exact tags' '
ech

[PATCH 3/5] name-rev: add support to discard refs by pattern match

2017-01-11 Thread Jacob Keller
From: Jacob Keller 

Extend name-rev further to support matching refs by adding `--discard`
patterns. These patterns will limit the scope of refs by discarding any
ref that matches at least one discard pattern. Checking the discard refs
shall happen first, before checking the include --refs patterns. This
will allow more flexibility to matching certain kinds of references.

Add tests and update Documentation for this change.

Signed-off-by: Jacob Keller 
---
 Documentation/git-name-rev.txt   |  7 +++
 builtin/name-rev.c   | 14 +-
 t/t6007-rev-list-cherry-pick-file.sh |  7 +++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-name-rev.txt b/Documentation/git-name-rev.txt
index 7433627db12d..9b46e5ea9aae 100644
--- a/Documentation/git-name-rev.txt
+++ b/Documentation/git-name-rev.txt
@@ -30,6 +30,13 @@ OPTIONS
given multiple times, use refs whose names match any of the given shell
patterns. Use `--no-refs` to clear any previous ref patterns given.
 
+--discard=::
+   Do not use any ref whose name matches a given shell pattern. The
+   pattern can be one of branch name, tag name or fully qualified ref
+   name. If given multiple times, discard refs that match any of the given
+   shell patterns. Use `--no-discards` to clear the list of discard
+   patterns.
+
 --all::
List all commits reachable from all refs
 
diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 000a2a700ed3..86479c17a7c9 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -109,6 +109,7 @@ struct name_ref_data {
int tags_only;
int name_only;
struct string_list ref_filters;
+   struct string_list discard_filters;
 };
 
 static struct tip_table {
@@ -150,6 +151,15 @@ static int name_ref(const char *path, const struct 
object_id *oid, int flags, vo
if (data->tags_only && !starts_with(path, "refs/tags/"))
return 0;
 
+   if (data->discard_filters.nr) {
+   struct string_list_item *item;
+
+   for_each_string_list_item(item, &data->discard_filters) {
+   if (subpath_matches(path, item->string) >= 0)
+   return 0;
+   }
+   }
+
if (data->ref_filters.nr) {
struct string_list_item *item;
int matched = 0;
@@ -323,12 +333,14 @@ int cmd_name_rev(int argc, const char **argv, const char 
*prefix)
 {
struct object_array revs = OBJECT_ARRAY_INIT;
int all = 0, transform_stdin = 0, allow_undefined = 1, always = 0, 
peel_tag = 0;
-   struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP };
+   struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, 
STRING_LIST_INIT_NODUP };
struct option opts[] = {
OPT_BOOL(0, "name-only", &data.name_only, N_("print only names 
(no SHA-1)")),
OPT_BOOL(0, "tags", &data.tags_only, N_("only use tags to name 
the commits")),
OPT_STRING_LIST(0, "refs", &data.ref_filters, N_("pattern"),
   N_("only use refs matching ")),
+   OPT_STRING_LIST(0, "discard", &data.discard_filters, 
N_("pattern"),
+  N_("ignore refs matching ")),
OPT_GROUP(""),
OPT_BOOL(0, "all", &all, N_("list all commits reachable from 
all refs")),
OPT_BOOL(0, "stdin", &transform_stdin, N_("read from stdin")),
diff --git a/t/t6007-rev-list-cherry-pick-file.sh 
b/t/t6007-rev-list-cherry-pick-file.sh
index d072ec43b016..8a4c35f6ffee 100755
--- a/t/t6007-rev-list-cherry-pick-file.sh
+++ b/t/t6007-rev-list-cherry-pick-file.sh
@@ -118,6 +118,13 @@ test_expect_success 'name-rev --refs excludes non-matched 
patterns' '
test_cmp actual.named expect
 '
 
+test_expect_success 'name-rev --discard excludes matched patterns' '
+   git rev-list --left-right --cherry-pick F...E -- bar > actual &&
+   git name-rev --stdin --name-only --refs="*tags/*" --discard="*E" \
+   < actual > actual.named &&
+   test_cmp actual.named expect
+'
+
 cat >expect <

[PATCH 5/5] describe: teach describe negative pattern matches

2017-01-11 Thread Jacob Keller
From: Jacob Keller 

Teach git-describe the `--discard` option which will allow specifying
a glob pattern of tags to ignore. This can be combined with the
`--match` patterns to enable more flexibility in determining which tags
to consider.

For example, suppose you wish to find the first official release tag
that contains a certain commit. If we assume that official release tags
are of the form "v*" and pre-release candidates include "*rc*" in their
name, we can now find the first tag that introduces commit abcdef via:

  git describe --contains --match="v*" --discard="*rc*"

Add documentation and tests for this change.

Signed-off-by: Jacob Keller 
---
 Documentation/git-describe.txt |  8 
 builtin/describe.c | 21 +
 t/t6120-describe.sh|  8 
 3 files changed, 37 insertions(+)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index 7ad41e2f6ade..a89bbde207b2 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -88,6 +88,14 @@ OPTIONS
patterns will be considered. Use `--no-match` to clear and reset the
list of patterns.
 
+--discard ::
+   Do not consider tags matching the given `glob(7)` pattern, excluding
+   the "refs/tags/" prefix. This can be used to narrow the tag space and
+   find only tags matching some meaningful criteria. If given multiple
+   times, a list of patterns will be accumulated and tags matching any
+   of the patterns will be discarded. Use `--no-discard` to clear and
+   reset the list of patterns.
+
 --always::
Show uniquely abbreviated commit object as fallback.
 
diff --git a/builtin/describe.c b/builtin/describe.c
index 5cc9e9abe798..c09288ee6321 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -29,6 +29,7 @@ static int max_candidates = 10;
 static struct hashmap names;
 static int have_util;
 static struct string_list patterns = STRING_LIST_INIT_NODUP;
+static struct string_list discard_patterns = STRING_LIST_INIT_NODUP;
 static int always;
 static const char *dirty;
 
@@ -130,6 +131,22 @@ static int get_name(const char *path, const struct 
object_id *oid, int flag, voi
return 0;
 
/*
+* If we're given discard patterns, first discard any tag which match
+* any of the discard pattern.
+*/
+   if (discard_patterns.nr) {
+   struct string_list_item *item;
+
+   if (!is_tag)
+   return 0;
+
+   for_each_string_list_item(item, &discard_patterns) {
+   if (!wildmatch(item->string, path + 10, 0, NULL))
+   return 0;
+   }
+   }
+
+   /*
 * If we're given patterns, accept only tags which match at least one
 * pattern.
 */
@@ -421,6 +438,8 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
N_("consider  most recent tags (default: 10)")),
OPT_STRING_LIST(0, "match", &patterns, N_("pattern"),
   N_("only consider tags matching ")),
+   OPT_STRING_LIST(0, "discard", &discard_patterns, N_("pattern"),
+  N_("do not consider tags matching ")),
OPT_BOOL(0, "always",&always,
N_("show abbreviated commit object as fallback")),
{OPTION_STRING, 0, "dirty",  &dirty, N_("mark"),
@@ -458,6 +477,8 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
argv_array_push(&args, "--tags");
for_each_string_list_item(item, &patterns)
argv_array_pushf(&args, "--refs=refs/tags/%s", 
item->string);
+   for_each_string_list_item(item, &discard_patterns)
+   argv_array_pushf(&args, 
"--discard=refs/tags/%s", item->string);
}
if (argc)
argv_array_pushv(&args, argv);
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 9e5db9b87a1f..4e4a9f2e5305 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -218,6 +218,14 @@ test_expect_success 'describe --contains and --match' '
test_cmp expect actual
 '
 
+test_expect_success 'describe --discard' '
+   echo "c~1" >expect &&
+   tagged_commit=$(git rev-parse "refs/tags/A^0") &&
+   test_must_fail git describe --contains --match="B" $tagged_commit &&
+   git describe --contains --match="?" --discard="A" $tagged_commit 
>actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'describe --contains and --no-match' '
echo "A^0" >expect &&
tagged_commit=$(git rev-parse "refs/tags/A^0") &&
-- 
2.11.0.403.g196674b8396b



[PATCH 0/5] extend git-describe pattern matching

2017-01-11 Thread Jacob Keller
From: Jacob Keller 

Teach git describe and git name-rev the ability to match multiple
patterns inclusively. Additionally, teach these commands to also accept
negative patterns to discard any refs which match.

The pattern lists for positive and negative patterns are inclusive. This
means that for the positive patterns, a reference will be considered as
long as it matches at least one of the match patterns. It need not match
all given patterns. Additionally for negative patterns, we will not
consider any ref which matches any negative pattern, even if it matches
one of the positive patterns.

Together this allows the ability to express far more sets of tags than a
single match pattern alone. It does not provide quite the same depth as
would teaching full regexp but it is simpler and easy enough to
understand.

This is a re-send of a series from a month or so ago, I've since
re-based this on next since it appears that it was not picked up before.

Jacob Keller (5):
  doc: add documentation for OPT_STRING_LIST
  name-rev: extend --refs to accept multiple patterns
  name-rev: add support to discard refs by pattern match
  describe: teach --match to accept multiple patterns
  describe: teach describe negative pattern matches

 Documentation/git-describe.txt| 13 ++-
 Documentation/git-name-rev.txt| 11 +-
 Documentation/technical/api-parse-options.txt |  5 +++
 builtin/describe.c| 51 ++
 builtin/name-rev.c| 53 +--
 t/t6007-rev-list-cherry-pick-file.sh  | 37 +++
 t/t6120-describe.sh   | 27 ++
 7 files changed, 176 insertions(+), 21 deletions(-)

-- 
2.11.0.403.g196674b8396b



[PATCH 2/5] name-rev: extend --refs to accept multiple patterns

2017-01-11 Thread Jacob Keller
From: Jacob Keller 

Teach git name-rev to take a string list of patterns from --refs instead
of only a single pattern. The list of patterns will be matched
inclusively, such that a ref only needs to match one pattern to be
included. If a ref will only be excluded if it does not match any of the
patterns.

Add tests and documentation for this change.

Signed-off-by: Jacob Keller 
---
 Documentation/git-name-rev.txt   |  4 +++-
 builtin/name-rev.c   | 41 +---
 t/t6007-rev-list-cherry-pick-file.sh | 30 ++
 3 files changed, 62 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-name-rev.txt b/Documentation/git-name-rev.txt
index ca28fb8e2a07..7433627db12d 100644
--- a/Documentation/git-name-rev.txt
+++ b/Documentation/git-name-rev.txt
@@ -26,7 +26,9 @@ OPTIONS
 
 --refs=::
Only use refs whose names match a given shell pattern.  The pattern
-   can be one of branch name, tag name or fully qualified ref name.
+   can be one of branch name, tag name or fully qualified ref name. If
+   given multiple times, use refs whose names match any of the given shell
+   patterns. Use `--no-refs` to clear any previous ref patterns given.
 
 --all::
List all commits reachable from all refs
diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index cd89d48b65e8..000a2a700ed3 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -108,7 +108,7 @@ static const char *name_ref_abbrev(const char *refname, int 
shorten_unambiguous)
 struct name_ref_data {
int tags_only;
int name_only;
-   const char *ref_filter;
+   struct string_list ref_filters;
 };
 
 static struct tip_table {
@@ -150,16 +150,33 @@ static int name_ref(const char *path, const struct 
object_id *oid, int flags, vo
if (data->tags_only && !starts_with(path, "refs/tags/"))
return 0;
 
-   if (data->ref_filter) {
-   switch (subpath_matches(path, data->ref_filter)) {
-   case -1: /* did not match */
-   return 0;
-   case 0:  /* matched fully */
-   break;
-   default: /* matched subpath */
-   can_abbreviate_output = 1;
-   break;
+   if (data->ref_filters.nr) {
+   struct string_list_item *item;
+   int matched = 0;
+
+   /* See if any of the patterns match. */
+   for_each_string_list_item(item, &data->ref_filters) {
+   /*
+* We want to check every pattern even if we already
+* found a match, just in case one of the later
+* patterns could abbreviate the output.
+*/
+   switch (subpath_matches(path, item->string)) {
+   case -1: /* did not match */
+   break;
+   case 0: /* matched fully */
+   matched = 1;
+   break;
+   default: /* matched subpath */
+   matched = 1;
+   can_abbreviate_output = 1;
+   break;
+   }
}
+
+   /* If none of the patterns matched, stop now */
+   if (!matched)
+   return 0;
}
 
add_to_tip_table(oid->hash, path, can_abbreviate_output);
@@ -306,11 +323,11 @@ int cmd_name_rev(int argc, const char **argv, const char 
*prefix)
 {
struct object_array revs = OBJECT_ARRAY_INIT;
int all = 0, transform_stdin = 0, allow_undefined = 1, always = 0, 
peel_tag = 0;
-   struct name_ref_data data = { 0, 0, NULL };
+   struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP };
struct option opts[] = {
OPT_BOOL(0, "name-only", &data.name_only, N_("print only names 
(no SHA-1)")),
OPT_BOOL(0, "tags", &data.tags_only, N_("only use tags to name 
the commits")),
-   OPT_STRING(0, "refs", &data.ref_filter, N_("pattern"),
+   OPT_STRING_LIST(0, "refs", &data.ref_filters, N_("pattern"),
   N_("only use refs matching ")),
OPT_GROUP(""),
OPT_BOOL(0, "all", &all, N_("list all commits reachable from 
all refs")),
diff --git a/t/t6007-rev-list-cherry-pick-file.sh 
b/t/t6007-rev-list-cherry-pick-file.sh
index 1408b608eb03..d072ec43b016 100755
--- a/t/t6007-rev-list-cherry-pick-file.sh
+++ b/t/t6007-rev-list-cherry-pick-file.sh
@@ -99,6 +99,36 @@ test_expect_success '--cherry-pick bar does not come up 
empty (II)' '
test_cmp actual.named expect
 '
 
+test_expect_success 'name-rev multiple --refs combine inclusive' '
+   git rev-list --left-right --cherry-pick F...E -- bar > actual &&
+   git name-rev

[PATCHv2 4/4] unpack-trees: support super-prefix option

2017-01-11 Thread Stefan Beller
In the future we want to support working tree operations within submodules,
e.g. "git checkout --recurse-submodules", which will update the submodule
to the commit as recorded in its superproject. In the submodule the
unpack-tree operation is carried out as usual, but the reporting to the
user needs to prefix any path with the superproject. The mechanism for
this is the super-prefix. (see 74866d757, git: make super-prefix option)

Add support for the super-prefix option for commands that unpack trees
by wrapping any path output in unpacking trees in the newly introduced
super_prefixed function. This new function prefixes any path with the
super-prefix if there is one.  Assuming the submodule case doesn't happen
in the majority of the cases, we'd want to have a fast behavior for no
super prefix, i.e. no reallocation/copying, but just returning path.

Another aspect of introducing the `super_prefixed` function is to consider
who owns the memory and if this is the right place where the path gets
modified. As the super prefix ought to change the output behavior only and
not the actual unpack tree part, it is fine to be that late in the line.
As we get passed in 'const char *path', we cannot change the path itself,
which means in case of a super prefix we have to copy over the path.
We need two static buffers in that function as the error messages
contain at most two paths.

For testing purposes enable it in read-tree, which has no output
of paths other than an unpack-trees.c. These are all converted in
this patch.

Signed-off-by: Stefan Beller 
---

This is only patchv4 that is rerolled, patches 1-3 remain as is.

The test uses '\'' now, and the super_prefixed function is rewritten
using the flow Junio suggested.

The commit message got enhanced.

Thanks,
Stefan

 git.c   |  2 +-
 t/t1001-read-tree-m-2way.sh |  8 
 unpack-trees.c  | 38 +++---
 3 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/git.c b/git.c
index dce529fcbf..acbabd1298 100644
--- a/git.c
+++ b/git.c
@@ -471,7 +471,7 @@ static struct cmd_struct commands[] = {
{ "prune-packed", cmd_prune_packed, RUN_SETUP },
{ "pull", cmd_pull, RUN_SETUP | NEED_WORK_TREE },
{ "push", cmd_push, RUN_SETUP },
-   { "read-tree", cmd_read_tree, RUN_SETUP },
+   { "read-tree", cmd_read_tree, RUN_SETUP | SUPPORT_SUPER_PREFIX},
{ "receive-pack", cmd_receive_pack },
{ "reflog", cmd_reflog, RUN_SETUP },
{ "remote", cmd_remote, RUN_SETUP },
diff --git a/t/t1001-read-tree-m-2way.sh b/t/t1001-read-tree-m-2way.sh
index 7b70089705..5ededd8e40 100755
--- a/t/t1001-read-tree-m-2way.sh
+++ b/t/t1001-read-tree-m-2way.sh
@@ -363,6 +363,14 @@ test_expect_success 'a/b (untracked) vs a, plus c/d case 
test.' '
test -f a/b
 '
 
+test_expect_success 'read-tree supports the super-prefix' '
+   cat <<-EOF >expect &&
+   error: Updating '\''fictional/a'\'' would lose untracked files 
in it
+   EOF
+   test_must_fail git --super-prefix fictional/ read-tree -u -m "$treeH" 
"$treeM" 2>actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'a/b vs a, plus c/d case setup.' '
rm -f .git/index &&
rm -fr a &&
diff --git a/unpack-trees.c b/unpack-trees.c
index 7a6df99d10..0a24472359 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -52,6 +52,36 @@ static const char 
*unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
  ? ((o)->msgs[(type)])  \
  : (unpack_plumbing_errors[(type)]) )
 
+static const char *super_prefixed(const char *path)
+{
+   static struct strbuf buf[2] = {STRBUF_INIT, STRBUF_INIT};
+   static int super_prefix_len = -1;
+   static unsigned idx = 0;
+
+   if (super_prefix_len < 0) {
+   if (!get_super_prefix())
+   super_prefix_len = 0;
+   else {
+   int i;
+
+   super_prefix_len = strlen(get_super_prefix());
+   for (i = 0; i < ARRAY_SIZE(buf); i++)
+   strbuf_addstr(&buf[i], get_super_prefix());
+   }
+   }
+
+   if (!super_prefix_len)
+   return path;
+
+   if (++idx >= ARRAY_SIZE(buf))
+   idx = 0;
+
+   strbuf_setlen(&buf[idx], super_prefix_len);
+   strbuf_addstr(&buf[idx], path);
+
+   return buf[idx].buf;
+}
+
 void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
  const char *cmd)
 {
@@ -172,7 +202,7 @@ static int add_rejected_path(struct unpack_trees_options *o,
 const char *path)
 {
if (!o->show_all_errors)
-   return error(ERRORMSG(o, e), path);
+   return error(ERRORMSG(o, e), super_prefixed(path));
 
/*
 * Otherwise, insert in a list for future display by
@@ -196,7 +226,7 @@ static void display_error_msgs(struct unpack

Re: [PATCH 4/4] unpack-trees: support super-prefix option

2017-01-11 Thread Stefan Beller
On Wed, Jan 11, 2017 at 3:28 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>> Preparing the expected output "expect" outside test_expect_success
>>> block is also old-school.  Move it inside the new test?
>>
>> I looked into that. What is our current stance on using single/double quotes?
>
> Using dq around executable part, i.e.
>
> test_expect_success title "
> echo \"'foo'\"
> "
>
> is a million-times more grave sin even if the test initially does
> not refer to any variable in its body.  Somebody will eventually
> need to add a line that refers to a variable and either forgets to
> quote \$ in front or properly quotes it.

agreed.

>  The former makes the
> variable interpolated while the arguments to the test_expect_success
> is prepared (which is a bug) and the latter makes the result quite
> ugly, like so:
>
> test_expect_success title "
> sq=\' var=foo &&
> echo \"\${sq}\$value\${sq}\"
> "
>
> Enclosing the body always in sq-pair does mean something ugly like
> this from the beginning once you need to use sq inside:
>
> test_expect_success title '
> sq='\'' &&
> echo "${sq}foo${sq}"
> '

This one fails
error: bug in the test script: broken &&-chain:
sq=' &&
echo "${sq}foo${sq}"
both other occurrences of using ${sq} are defining sq outside
(one as sq=\' and the other as sq="'")

So I think we either have to keep sq out of the test,
sq="'"
test_expect_success title '
 echo "${sq}foo${sq}"
'

or do the echo/printf trick inside:

test_expect_success title '
 sq=$(echo -e "\x27") &&
 echo "${sq}foo${sq}"
'

Eh, I just realize the '\'' works inside here-doc, too.
Nevermind then, I'll resend it with just quoted sq in the here-doc then.

Thanks,
Stefan


Re: [PATCH v10 19/20] branch: use ref-filter printing APIs

2017-01-11 Thread Jacob Keller
On Tue, Jan 10, 2017 at 12:49 AM, Karthik Nayak  wrote:
> diff --git a/builtin/branch.c b/builtin/branch.c
> index 34cd61cd9..f293ee5b0 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -37,11 +37,11 @@ static unsigned char head_sha1[20];
>  static int branch_use_color = -1;
>  static char branch_colors[][COLOR_MAXLEN] = {
> GIT_COLOR_RESET,
> -   GIT_COLOR_NORMAL,   /* PLAIN */
> -   GIT_COLOR_RED,  /* REMOTE */
> -   GIT_COLOR_NORMAL,   /* LOCAL */
> -   GIT_COLOR_GREEN,/* CURRENT */
> -   GIT_COLOR_BLUE, /* UPSTREAM */
> +   GIT_COLOR_NORMAL,   /* PLAIN */
> +   GIT_COLOR_RED,  /* REMOTE */
> +   GIT_COLOR_NORMAL,   /* LOCAL */
> +   GIT_COLOR_GREEN,/* CURRENT */
> +   GIT_COLOR_BLUE, /* UPSTREAM */
>  };


What's... actually changing here? It looks like just white space? Is
there a strong reason for why this is changing?

Thanks,
Jake


Re: [PATCH 4/4] unpack-trees: support super-prefix option

2017-01-11 Thread Junio C Hamano
Stefan Beller  writes:

>> Preparing the expected output "expect" outside test_expect_success
>> block is also old-school.  Move it inside the new test?
>
> I looked into that. What is our current stance on using single/double quotes?

Using dq around executable part, i.e.

test_expect_success title "
echo \"'foo'\"
"

is a million-times more grave sin even if the test initially does
not refer to any variable in its body.  Somebody will eventually
need to add a line that refers to a variable and either forgets to
quote \$ in front or properly quotes it.  The former makes the
variable interpolated while the arguments to the test_expect_success
is prepared (which is a bug) and the latter makes the result quite
ugly, like so:

test_expect_success title "
sq=\' var=foo &&
echo \"\${sq}\$value\${sq}\"
"

Enclosing the body always in sq-pair does mean something ugly like
this from the beginning once you need to use sq inside:

test_expect_success title '
sq='\'' &&
echo "${sq}foo${sq}"
'

or

test_expect_success title '
echo "'\''foo'\''"
'

but the ugliness is localized to places where single quote is
absolutely needed, and '\'' is something eyes soon get used to as an
idiom [*1*].  It does not affect every reference of variables like
enclosing with dq does.


[Footnote]

*1* That "idiom"-ness is the reason why the last example above is
not written like this:

test_expect_success title '
echo "'\'foo\''"
'

even though the sq pair surrounding "foo" is not technically
necessary.  It lets us think of the string as almost always is
inside sq context, with the single exception that we step
outside sq context and append a sq by saying bs-sq (\').


Re: [PATCH 4/4] unpack-trees: support super-prefix option

2017-01-11 Thread Stefan Beller
On Wed, Jan 11, 2017 at 1:32 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> Add support for the super-prefix option for commands that unpack trees.
>> For testing purposes enable it in read-tree, which has no other path
>> related output.
>
> "path related output"?  I am not sure I understand this.

Well, s/path related output/output of paths/.

> When read-tree reads N trees, or unpack_trees() is asked to "merge"
> N trees, how does --super-prefix supposed to interact with the paths
> in these trees?  Does the prefix added to paths in all trees?

Internally the super-prefix is ignored. Only the (input and) output
is using that super prefix for messages.

It was introduced for grepping recursively into submodules, i.e.

invoke as

git grep --recurse-submodules \
  -e path/inside/submodule/and/further/down
# internally it invokes:
git -C .. --super-prefix .. grep ..
# which operates "just normal" except for the
# input parsing and output

The use case for this patch is working tree related things, i.e.
git checkout --recurse-submodules
# internally when recursing we call "git read-tree -u", but
# reporting could be:
Your local changes to the following files would be overwritten by checkout:
  path/inside/submodule/file.txt


>
> Did you mean that read-tree (and unpack_trees() in general) is a
> whole-tree operation and there is no path related input (i.e.
> pathspec limiting), but if another command that started in the
> superproject is running us in its submodule, it wants us to prefix
> the path to the submodule when we report errors?

Yes. I tried to explain it better above, but you got it here.

>
> If that is what is going on, I think it makes sense, but it may be
> asking too much from the readers to guess that from "Add support for
> the super-prefix option".

I need to enhance the commit message by a lot then.

>
>> --- a/t/t1001-read-tree-m-2way.sh
>> +++ b/t/t1001-read-tree-m-2way.sh
>> @@ -363,6 +363,15 @@ test_expect_success 'a/b (untracked) vs a, plus c/d 
>> case test.' '
>>   test -f a/b
>>  '
>>
>> +cat <<-EOF >expect &&
>> + error: Updating 'fictional/a' would lose untracked files in it
>> +EOF
>> +
>> +test_expect_success 'read-tree supports the super-prefix' '
>> + test_must_fail git --super-prefix fictional/ read-tree -u -m "$treeH" 
>> "$treeM" 2>actual &&
>> + test_cmp expect actual
>> +'
>> +
>
> Preparing the expected output "expect" outside test_expect_success
> block is also old-school.  Move it inside the new test?

I looked into that. What is our current stance on using single/double quotes?
Most tests use single quotes for the test title as well as the test
instructions,
such that double quotes can be used naturally in there. But single quotes
cannot be used even escaped, such that we need to do a thing like

sq="'"

test_expect_success 'test title' '
cat <<-EOF >expect &&
error for ${sq}path${sq}
EOF
test instructions go here
'

though that is not used as often as I would think as
grep \${sq} yields t1507 and t3005.

>
> Hmph, as a reader of the code, I do not even want to wonder how
> expensive get_super_prefix() is.  If the executable part of the
> above were written like this, it would have been easier to
> understand:
>
> if (super_prefix_len < 0) {
> if (!get_super_prefix())
> super_prefix_len = 0;
> else {
> int i;
> ... prepare buf[] and set super_prefix_len ...;
> }
> }
>
> if (!super_prefix_len)
> return path;
>
> ... use buf[] to do the prefixing and return it ...
>

good point. I'll fix that.


Re: [RFC PATCH 3/2] vreportf: add prefix to each line

2017-01-11 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Jan 11, 2017 at 09:01:38AM -0500, Jeff King wrote:
>
>>   [1/2]: Revert "vreportf: avoid intermediate buffer"
>>   [2/2]: vreport: sanitize ASCII control chars
>
> We've talked before about repeating the "error:" header for multi-line
> messages. The reversion in patch 1 makes this easy to play with, so I
> did. I kind of hate it. But here it is, for your viewing pleasure.

Thanks.

>
> -- >8 --
> Subject: vreportf: add prefix to each line
>
> Some error and warning messages are multi-line, but we put
> the prefix only on the first line. This means that either
> the subsequent lines don't have any indication that they're
> connected to the original error, or the caller has to make
> multiple calls to error() or warning(). And that's not even
> possible for die().
>
> Instead, let's put the prefix at the start of each line,
> just as advise() does.
>
> Note that this patch doesn't update the tests yet, so it
> causes tons of failures. This is just to see what it might
> look like.
>
> It's actually kind of ugly.  For instance, a failing test in
> t3600 now produces:
>
>error: the following files have staged content different from both the
>error: file and the HEAD:
>error: bar.txt
>error: foo.txt
>error: (use -f to force removal)
>
> which seems a bit aggressive.  

I agree that it is ugly, but one reason I was hoping to do this
myself (or have somebody else do it by procrastinating) was that I
thought it may help i18n.  That is, for an original

error(_("we found an error"))

a .po file may translate the string to a multi-line string that has
LFs in it and the output would look correct.  The translator already
can do so by indenting the second and subsequent lines by the same
column-width as "error: " (or its translation in their language, if
we are going to i18n these headers), but that (1) is extra work for
them, and (2) makes it impossible to have the same message appear in
different contexts (i.e. "error:" vs "warning:" that have different
column-widths).

> It also screws up messages which indent with tabs (the prefix eats
> up some of the tabstop, making the indentation smaller).

This is unavoidable and at the same time is a non-issue, isn't it?
Messages that indent the second and subsequent lines with tabs are
compensating the lack of the multi-line support of vreportf(), which
this RFC patch fixes.  They may need to be adjusted to the new world
order, but that is a good thing.  New multi-line messages no longer
have to worry about the prefix that is added only to the first line
when continuing the message to multiple lines.

> And the result is
> a little harder if you need to cut-and-paste the file lines
> (if your terminal lets you triple-click to select a whole
> line, now you have this "error:" cruft on the front).

This might be an issue, but I personally do not care (as I know how
to drive my "screen", namely, use 'c' while cutting) ;-).

> It may be possible to address some of that by using some
> other kind of continuation marker (instead of just repeating
> the prefix), and expanding initial tabs.

Yes indeed.  The "some other kind of continuation marker" could just
be a run of spaces that fill the same column as the "error: " or
other prefix given to the first line.

> Signed-off-by: Jeff King 
> ---
>  usage.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/usage.c b/usage.c
> index ad6d2910f..8a1f6ff4e 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -8,18 +8,30 @@
>  
>  static FILE *error_handle;
>  
> +static void show_line(FILE *fh, const char *prefix, const char *line, int 
> len)
> +{
> + fprintf(fh, "%s%.*s\n", prefix, len, line);
> +}
> +
>  void vreportf(const char *prefix, const char *err, va_list params)
>  {
>   char msg[4096];
>   FILE *fh = error_handle ? error_handle : stderr;
> + const char *base;
>   char *p;
>  
>   vsnprintf(msg, sizeof(msg), err, params);
> +
> + base = msg;
>   for (p = msg; *p; p++) {
> - if (iscntrl(*p) && *p != '\t' && *p != '\n')
> + if (*p == '\n') {
> + show_line(fh, prefix, base, p - base);
> + base = p + 1;
> + } else if (iscntrl(*p) && *p != '\t')
>   *p = '?';
>   }
> - fprintf(fh, "%s%s\n", prefix, msg);
> +
> + show_line(fh, prefix, base, p - base);
>  }
>  
>  static NORETURN void usage_builtin(const char *err, va_list params)


Re: [PATCHv2 1/2] builtin/commit.c: drop use snprintf via dynamic allocation

2017-01-11 Thread Junio C Hamano
Elia Pinto  writes:

> In general snprintf is bad because it may silently truncate results if we're
> wrong. In this patch where we use PATH_MAX, we'd want to handle larger
> paths anyway, so we switch to dynamic allocation.
>
> Helped-by: Jeff King 
> Signed-off-by: Elia Pinto 
> ---
> This is the second version of the patch.
>
> I have split the original commit in two, as discussed here
> http://public-inbox.org/git/20161213132717.42965-1-gitter.spi...@gmail.com/.

And both halves have the same title?

I think the the primary value of this one is that it removes the
PATH_MAX limitation from the environment setting that points to a
filename.  Reducing the number of snprintf() call is a side effect,
even though that may have been what motivated you originally.  The
original code used snprintf() correctly after all.

The primary value of 2/2 is reduction of the cognitive burden from
the programmers---switching to xstrfmt(), they no longer have to
count bytes needed for static allocation.  Reducing the number of
snprintf() call is a side effect, even though that may have been
what motivated you originally.  The original code used snprintf()
correctly after all.

The code looks correct in both patches (except that in 1/2 _clear()
immediately before exit() wouldn't have to be there, but it does not
hurt to have one either).  They need to be better explained.

Thanks.

>  builtin/commit.c | 22 +++---
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 0ed634b26..09bcc0f13 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -960,15 +960,16 @@ static int prepare_to_commit(const char *index_file, 
> const char *prefix,
>   return 0;
>  
>   if (use_editor) {
> - char index[PATH_MAX];
> - const char *env[2] = { NULL };
> - env[0] =  index;
> - snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", index_file);
> - if (launch_editor(git_path_commit_editmsg(), NULL, env)) {
> + struct argv_array env = ARGV_ARRAY_INIT;
> +
> + argv_array_pushf(&env, "GIT_INDEX_FILE=%s", index_file);
> + if (launch_editor(git_path_commit_editmsg(), NULL, env.argv)) {
>   fprintf(stderr,
>   _("Please supply the message using either -m or -F 
> option.\n"));
> + argv_array_clear(&env);
>   exit(1);
>   }
> + argv_array_clear(&env);
>   }
>  
>   if (!no_verify &&
> @@ -1557,23 +1558,22 @@ static int run_rewrite_hook(const unsigned char 
> *oldsha1,
>  
>  int run_commit_hook(int editor_is_used, const char *index_file, const char 
> *name, ...)
>  {
> - const char *hook_env[3] =  { NULL };
> - char index[PATH_MAX];
> + struct argv_array hook_env = ARGV_ARRAY_INIT;
>   va_list args;
>   int ret;
>  
> - snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", index_file);
> - hook_env[0] = index;
> + argv_array_pushf(&hook_env, "GIT_INDEX_FILE=%s", index_file);
>  
>   /*
>* Let the hook know that no editor will be launched.
>*/
>   if (!editor_is_used)
> - hook_env[1] = "GIT_EDITOR=:";
> + argv_array_push(&hook_env, "GIT_EDITOR=:");
>  
>   va_start(args, name);
> - ret = run_hook_ve(hook_env, name, args);
> + ret = run_hook_ve(hook_env.argv,name, args);
>   va_end(args);
> + argv_array_clear(&hook_env);
>  
>   return ret;
>  }


Re: [PATCH] index: improve constness for reading blob data

2017-01-11 Thread Junio C Hamano
Brandon Williams  writes:

> Improve constness of the index_state parameter to the
> 'read_blob_data_from_index' function.
>
> Signed-off-by: Brandon Williams 
> ---

Thanks.


Re: [PATCH 4/4] unpack-trees: support super-prefix option

2017-01-11 Thread Junio C Hamano
Stefan Beller  writes:

> Add support for the super-prefix option for commands that unpack trees.
> For testing purposes enable it in read-tree, which has no other path
> related output.

"path related output"?  I am not sure I understand this.

When read-tree reads N trees, or unpack_trees() is asked to "merge"
N trees, how does --super-prefix supposed to interact with the paths
in these trees?  Does the prefix added to paths in all trees?

Did you mean that read-tree (and unpack_trees() in general) is a
whole-tree operation and there is no path related input (i.e.
pathspec limiting), but if another command that started in the
superproject is running us in its submodule, it wants us to prefix
the path to the submodule when we report errors?

If that is what is going on, I think it makes sense, but it may be
asking too much from the readers to guess that from "Add support for
the super-prefix option".

> --- a/t/t1001-read-tree-m-2way.sh
> +++ b/t/t1001-read-tree-m-2way.sh
> @@ -363,6 +363,15 @@ test_expect_success 'a/b (untracked) vs a, plus c/d case 
> test.' '
>   test -f a/b
>  '
>  
> +cat <<-EOF >expect &&
> + error: Updating 'fictional/a' would lose untracked files in it
> +EOF
> +
> +test_expect_success 'read-tree supports the super-prefix' '
> + test_must_fail git --super-prefix fictional/ read-tree -u -m "$treeH" 
> "$treeM" 2>actual &&
> + test_cmp expect actual
> +'
> +

Preparing the expected output "expect" outside test_expect_success
block is also old-school.  Move it inside the new test?

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 7a6df99d10..bc56195e27 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -52,6 +52,37 @@ static const char 
> *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
> ? ((o)->msgs[(type)])  \
> : (unpack_plumbing_errors[(type)]) )
>  
> +static const char *super_prefixed(const char *path)
> +{
> + /*
> +  * This is used for the error messages above.
> +  * We need to have exactly two buffer spaces.
> +  */
> + static struct strbuf buf[2] = {STRBUF_INIT, STRBUF_INIT};
> + static int super_prefix_len = -1;
> + static unsigned idx = 0;
> +
> + if (!get_super_prefix())
> + return path;
> +
> + if (super_prefix_len < 0) {
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(buf); i++)
> + strbuf_addstr(&buf[i], get_super_prefix());
> +
> + super_prefix_len = strlen(get_super_prefix());
> + }
> +
> + if (++idx >= ARRAY_SIZE(buf))
> + idx = 0;
> +
> + strbuf_setlen(&buf[idx], super_prefix_len);
> + strbuf_addstr(&buf[idx], path);
> +
> + return buf[idx].buf;
> +}

Hmph, as a reader of the code, I do not even want to wonder how
expensive get_super_prefix() is.  If the executable part of the
above were written like this, it would have been easier to
understand:

if (super_prefix_len < 0) {
if (!get_super_prefix())
super_prefix_len = 0;
else {
int i;
... prepare buf[] and set super_prefix_len ...;
}
}

if (!super_prefix_len)
return path;

... use buf[] to do the prefixing and return it ...



Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile

2017-01-11 Thread Junio C Hamano
Richard Hansen  writes:

>> Back then we didn't even have wildmatch(), and used fnmatch()
>> instead, so forcing FNM_PATHNAME would have meant that people
>> wouldn't be able to say "foo*bar" to match "foo/other/bar"; with
>> wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having
>> WM_PATHNAME always in effect is less of an issue, but with
>> fnmatch(), having FNM_PATHNAME always in effect has a lot of
>> downside.
>
> Ah, that makes sense.
>>
>> I'd expect that orderfile people have today will be broken and
>> require tweaking if you switched WM_PATHNAME on.
>
> OK, so we don't want to turn on WM_PATHNAME unless we do it for a new
> major version.

I do agree with you that if we were starting Git from scratch, or at
least if we were adding diff.orderfile feature today, we would have
used wildmatch(WM_PATHNAME) for this matching.  We would also have
used the same parser as used to read the exclude files (and when we
see negative matching entries in the parsed result, either errored
out or ignored them with warning).  That kind of change unfortunately
would require a major version bump, I am afraid.


[PATCH] submodule absorbgitdirs: mention in docstring help

2017-01-11 Thread Stefan Beller
This part was missing in f6f85861 (submodule: add absorb-git-dir function,
2016-12-12).

Noticed-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 git-submodule.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 554bd1c494..b43af1742c 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -12,7 +12,8 @@ USAGE="[--quiet] add [-b ] [-f|--force] [--name 
] [--reference ] [--recursive] [--] [...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
[commit] [--] [...]
or: $dashless [--quiet] foreach [--recursive] 
-   or: $dashless [--quiet] sync [--recursive] [--] [...]"
+   or: $dashless [--quiet] sync [--recursive] [--] [...]
+   or: $dashless [--quiet] absorbgitdirs [--] [...]"
 OPTIONS_SPEC=
 SUBDIRECTORY_OK=Yes
 . git-sh-setup
-- 
2.11.0.259.g7b30ecf4f0



Re: [PATCH 1/2] diff: document behavior of relative diff.orderFile

2017-01-11 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Jan 10, 2017 at 02:15:11PM -0800, Junio C Hamano wrote:
>
>> Richard Hansen  writes:
>> ...
>> > I think so.  (For bare repositories anyway; non-bare should be
>> > relative to GIT_WORK_TREE.)  Perhaps git_config_pathname() itself
>> > should convert relative paths to absolute so that every pathname
>> > setting automatically works without changing any calling code.
>> 
>> Yes, that was what I was alluding to.  We might have to wait until
>> major version boundary to do so, but I think that it is the sensible
>> way forward in the longer term to convert relative to absolute in
>> git_config_pathname().
>
> Yeah, I'd agree.
>
> I'm undecided on whether it would need to happen at a major version
> bump. ...
>
> So I dunno. I do hate to break even corner cases, but I'm having trouble
> imagining the scenario where somebody is actually using the current
> behavior in a useful way.

Thanks for a detailed analysis (I probably should have spelt them
out when I said "we might have to" to save you the trouble).



Re: [musl] Re: Test failures when Git is built with libpcre and grep is built without it

2017-01-11 Thread Junio C Hamano
Jeff King  writes:

> Another difference I found is that "[\d]" matches a literal "\" or "d"
> in ERE, but behaves like "[0-9]" in PCRE. I'll work up a patch based on
> that.

Wow, clever.

Thanks.


Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Junio C Hamano
Lars Schneider  writes:

>> Hmm, I would have expected that the basic flow would become
>> 
>>  for each paths to be processed:
>>  convert-to-worktree to buf
>>  if not delayed:
>>  do the caller's thing to use buf
>>  else:
>>  remember path
>> 
>>  for each delayed paths:
>>  ensure filter process finished processing for path
>>  fetch the thing to buf from the process
>>  do the caller's thing to use buf
>> 
>> and that would make quite a lot of sense.  However, what is actually
>> implemented is a bit disappointing from that point of view.  While
>> its first part is the same as above, the latter part instead does:
>> 
>>  for each delayed paths:
>>  checkout the path
>> ...
>
> I am not sure I can follow you here.
> ...
> I implemented the "checkout_delayed_entries" function in v1 because
> it solved the problem with minimal changes in the existing code. Our previous 
> discussion made me think that this is the preferred way:
>
>  I do not think we want to see such a rewrite all over the
>  codepaths.  It might be OK to add such a "these entries are known
>  to be delayed" list in struct checkout so that the above becomes
>  more like this:
>
>for (i = 0; i < active_nr; i++)
>   checkout_entry(active_cache[i], state, NULL);
>  + checkout_entry_finish(state);
>
>  That is, addition of a single "some of the checkout_entry() calls
>  done so far might have been lazy, and I'll give them a chance to
>  clean up" might be palatable.  Anything more than that on the
>  caller side is not.

But that is apples-and-oranges comparision, no?  The old discussion
assumes there is no "caller's thing to use buf" other than "checkout
to the working tree", which is why the function its main loop calls
is "checkout_entry()" and the caller does not see the contents of
the filtered blob at all.  In that context, checkout_entry_finish()
that equally does not let the caller see the contents of the
filtered blob is quite appropriate.



Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Junio C Hamano
Jakub Narębski  writes:

>> Yes, this problem happens every day with filters that perform network
>> requests (e.g. GitLFS). 
>
> Do I understand it correctly that the expected performance improvement
> thanks to this feature is possible only if there is some amount of
> parallelism and concurrency in the filter?  That is, filter can be sending
> one blob to Git while processing other one, or filter can be fetching blobs
> in parallel.

The first-object latency may not be helped, but by allowing
"delayed", the latency to retrieve the second and subsequent objects
can be hidden, I would think.


Re: git cat-file on a submodule

2017-01-11 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Jan 10, 2017 at 07:11:40PM -0500, David Turner wrote:
>
>> Why does git cat-file -t $sha:foo, where foo is a submodule, not work?
> ...
> I'm not sure if I'm complaining or not. I can't immediately think of
> something that would be horribly broken. But it really feels like you
> are using the wrong tool, and patching the tool to handle this case will
> probably lead to weird cognitive dissonance down the road.

"git cat-file [any option] $sha" should fail and complain for any
$sha that does not name an object that exists in the object database
of the repository it is working on.  

So I'd complain if the first example command quoted above from
David's mail stopped failing when the commit bound at 'foo' in the
top-level treeish $sha (i.e. a commit in the submodule) does not
exist in the top-level repository's object database.

> Maybe it would help to describe your use case more fully. If what you
> care about is the presumed type based on the surrounding tree, then
> maybe:
>
>   git --literal-pathspecs ls-tree $sha -- foo
>
> would be a better match.

Yup.


[PATCH] lib-submodule-update.sh: drop unneeded shell

2017-01-11 Thread Stefan Beller
In modern Git we prefer "git -C )"
as it doesn't need an extra shell.

Signed-off-by: Stefan Beller 
---

And because it is in a setup function, we actually save the invocation
of 22 shells for a single run of the whole test suite.

Noticed while adding a lot more in near vincinity, though not as near
to cause merge conflicts, so sending it extra.

Thanks,
Stefan

 t/lib-submodule-update.sh | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 79cdd34a54..915eb4a7c6 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -69,10 +69,7 @@ create_lib_submodule_repo () {
 
git checkout -b "replace_sub1_with_directory" "add_sub1" &&
git submodule update &&
-   (
-   cd sub1 &&
-   git checkout modifications
-   ) &&
+   git -C sub1 checkout modifications &&
git rm --cached sub1 &&
rm sub1/.git* &&
git config -f .gitmodules --remove-section "submodule.sub1" &&
-- 
2.11.0.259.g7b30ecf4f0



Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile

2017-01-11 Thread Richard Hansen

On 2017-01-11 13:15, Junio C Hamano wrote:

Richard Hansen  writes:


On 2017-01-10 21:46, Junio C Hamano wrote:

Richard Hansen  writes:


I was looking at the code to see how the two file formats differed and
noticed that match_order() doesn't set the WM_PATHNAME flag when it
calls wildmatch().  That's unintentional (a bug), right?


It has been that way from day one IIRC even before we introduced
wildmatch()---IOW it may be intentional that the current code that
uses wildmatch() does not use WM_PATHNAME.


You are the original author (af5323e027 2005-05-30).  Do you remember
what your intention was?


Yes.

Back then we didn't even have wildmatch(), and used fnmatch()
instead, so forcing FNM_PATHNAME would have meant that people
wouldn't be able to say "foo*bar" to match "foo/other/bar"; with
wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having
WM_PATHNAME always in effect is less of an issue, but with
fnmatch(), having FNM_PATHNAME always in effect has a lot of
downside.


Ah, that makes sense.



I'd expect that orderfile people have today will be broken and
require tweaking if you switched WM_PATHNAME on.


OK, so we don't want to turn on WM_PATHNAME unless we do it for a new 
major version.


I'll do another re-roll and document the non-WM_PATHNAME behavior. 
Perhaps I'll encourage users to prefer ** over * if they want to match 
slash (even though they are equivalent) so that migration is easier if 
we ever do turn on WM_PATHNAME.


-Richard


Re: git cat-file on a submodule

2017-01-11 Thread David Turner
On Wed, 2017-01-11 at 07:53 -0500, Jeff King wrote:
> On Tue, Jan 10, 2017 at 07:11:40PM -0500, David Turner wrote:
> 
> > Why does git cat-file -t $sha:foo, where foo is a submodule, not work?
> 
> Because "cat-file" is about inspecting items in the object database, and
> typically the submodule commit is not present in the superproject's
> database. So we cannot know its type. You can infer what it _should_ be
> from the surrounding tree, but you cannot actually do the object lookup.
> 
> Likewise, "git cat-file -t $sha1:Makefile" is not just telling you that
> we found a 100644 entry in the tree, so we expect a blob. It's resolving
> to a sha1, and then checking the type of that sha1 in the database. It
> _should_ be a blob, but if it isn't, then cat-file is the tool that
> should tell you that it is not.
> 
> > git rev-parse $sha:foo works.
> 
> Right. Because that command is about resolving a name to a sha1, which
> we can do even without the object.
> 
> > By "why", I mean "would anyone complain if I fixed it?"  FWIW, I think
> > -p should just return the submodule's sha.
> 
> I'm not sure if I'm complaining or not. I can't immediately think of
> something that would be horribly broken. But it really feels like you
> are using the wrong tool, and patching the tool to handle this case will
> probably lead to weird cognitive dissonance down the road.

OK, this makes sense to me.  I tried cat-file because that is the tool I
was familiar with, but that doesn't mean that it was the right thing to
do.

> Maybe it would help to describe your use case more fully. If what you
> care about is the presumed type based on the surrounding tree, then
> maybe:
> 
>   git --literal-pathspecs ls-tree $sha -- foo

That (minus --literal-pathspecs, which does not exist in the version of
git I happen to be using) is fine for my purposes.  Thanks.



Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile

2017-01-11 Thread Junio C Hamano
Richard Hansen  writes:

> On 2017-01-10 21:46, Junio C Hamano wrote:
>> Richard Hansen  writes:
>>
>>> I was looking at the code to see how the two file formats differed and
>>> noticed that match_order() doesn't set the WM_PATHNAME flag when it
>>> calls wildmatch().  That's unintentional (a bug), right?
>>
>> It has been that way from day one IIRC even before we introduced
>> wildmatch()---IOW it may be intentional that the current code that
>> uses wildmatch() does not use WM_PATHNAME.
>
> You are the original author (af5323e027 2005-05-30).  Do you remember
> what your intention was?

Yes.  

Back then we didn't even have wildmatch(), and used fnmatch()
instead, so forcing FNM_PATHNAME would have meant that people
wouldn't be able to say "foo*bar" to match "foo/other/bar"; with
wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having
WM_PATHNAME always in effect is less of an issue, but with
fnmatch(), having FNM_PATHNAME always in effect has a lot of
downside.

I'd expect that orderfile people have today will be broken and
require tweaking if you switched WM_PATHNAME on.


Re: [RFC PATCH 0/5] Localise error headers

2017-01-11 Thread Junio C Hamano
Jeff King  writes:

> Yes, I would think die_errno() is a no-brainer for translation, since
> the strerror() will be translated.
>
>> apply.c:die(_("internal error"));
>> 
>> That is funny, too. I think we should substitute that with
>> 
>> die("BUG: untranslated, but what went wrong instead")
>
> Yep. We did not consistently use "BUG:" in the early days. I would say
> that "BUG" lines do not need to be translated. The point is that nobody
> should ever see them, so it seems like there is little point in giving
> extra work to translators.

In addition, "BUG: " is relatively recent introduction to our
codebase.  Perhaps having a separate BUG() function help the
distinction further?


Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Taylor Blau
On Wed, Jan 11, 2017 at 11:13:00AM +0100, Lars Schneider wrote:
>
> In v1 I implemented a) with the busy-loop problem in mind.
>
> My thinking was this:
>
> If the filter sees at least one filter request twice then the filter knows 
> that
> Git has already requested all files that require filtering. At that point the
> filter could just block the "delayed" answer to the latest filter request 
> until
> at least one of the previously delayed requests can be fulfilled. Then the 
> filter
> answers "delay" to Git until Git requests the blob that can be fulfilled. This
> process cycles until all requests can be fulfilled. Wouldn't that work?
>
> I think a "done" message by the filter is not easy. Right now the protocol 
> works
> in a mode were Git always asks and the filter always answers. I believe 
> changing
> the filter to be able to initiate a "done" message would complicated the 
> protocol.
>
> > Additionally, the protocol should specify a sentinel "no more entries" value
> > that could be sent from Git to the filter to signal that there are no more 
> > files
> > to checkout. Some filters may implement mechanisms for converting files that
> > require a signal to know when all files have been sent. Specifically, Git 
> > LFS
> > (https://git-lfs.github.com) batches files to be transferred together, and 
> > needs
> > to know when all files have been announced to truncate and send the last 
> > batch,
> > if it is not yet full. I'm sure other filter implementations use a similar
> > mechanism and would benefit from this as well.
>
> I agree. I think the filter already has this info implicitly as explained 
> above
> but an explicit message would be better!

This works, but forces some undesirable constraints:

- The filter has to keep track of all items in the checkout to determine how
  many times Git has asked about them. This is probably fine, though annoying.
  Cases where this is not fine is when there are _many_ items in the checkout
  forcing filter implementations to keep track of large sets of data.
- The protocol is now _must_ ask for the status of checkout items in a
  particular order. If the assumption is that "Git has sent me all items in the
  checkout by the point I see Git ask for the status of an item more than once",
  then Git _cannot_ ask for the status of a delayed item while there are still
  more items to checkout. This could be OK, so long as the protocol commits to
  it and documents this behavior.

What are your thoughts?

--
Thanks,
Taylor Blau

ttayl...@github.com


Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile

2017-01-11 Thread Richard Hansen

On 2017-01-10 21:46, Junio C Hamano wrote:

Richard Hansen  writes:


I was looking at the code to see how the two file formats differed and
noticed that match_order() doesn't set the WM_PATHNAME flag when it
calls wildmatch().  That's unintentional (a bug), right?


It has been that way from day one IIRC even before we introduced
wildmatch()---IOW it may be intentional that the current code that
uses wildmatch() does not use WM_PATHNAME.


You are the original author (af5323e027 2005-05-30).  Do you remember 
what your intention was?


I would like to change it to pass WM_PATHNAME, but I'm not sure if that 
would break anyone.  I'm guessing probably not, because users probably 
expect WM_PATHNAME and would be surprised (like I was) to learn otherwise.


If we want to keep it as-is, I'll have to adjust [PATCH v2 2/2].

-Richard


Re: [RFC PATCH 0/5] Localise error headers

2017-01-11 Thread Stefan Beller
On Wed, Jan 11, 2017 at 3:37 AM, Jeff King  wrote:
> On Tue, Jan 10, 2017 at 10:28:42AM -0800, Stefan Beller wrote:
>
>> > And then presumably that mix would gradually move to 100% consistency as
>> > more messages are translated. But the implicit question is: are there
>> > die() messages that should never be translated? I'm not sure.
>>
>> I would assume any plumbing command is not localizing?
>> Because in plumbing land, (easily scriptable) you may find
>> a grep on the output/stderr for a certain condition?
>
> That's the assumption I'm challenging. Certainly the behavior and
> certain aspects of the output of a plumbing command should remain the
> same over time. But error messages to stderr?

In an ideal world that assumption would hold true and any machine
readable information from the plumbing commands are on stdout and
nobody in their sane mind would ever try to parse stderr.

There is no easy way to check though except auditing all the code.
Having pointed out some string handling mistakes in the prior email,
my confidence is not high that plumbing commands are that strict about
separating useful machine output and errors for human consumption.

>
> It seems like they should be translated, because plumbing invoked on
> behalf of porcelain scripts is going to send its stderr directly to the
> user.

Well that could be solved, c.f. unpack-trees.c lines 15-55.

As another data point (coming from that area of strings):
If you grep for these plumbing strings in the project,
i.e.
$ git grep "would be overwritten by merge. Cannot merge"
$ git grep "not uptodate. Cannot merge."
...

you only find the occurrence in the  unpack-trees.c lines 15-55,
which means our test suite at least doesn't grep for error messages
but relies on the exit code of plumbing commands(?!)

>
>> To find a good example, "git grep die" giving me some food of though:
>>
>> die_errno(..) should always take a string marked up for translation,
>> because the errno string is translated?
>
> Yes, I would think die_errno() is a no-brainer for translation, since
> the strerror() will be translated.
>
>> apply.c:die(_("internal error"));
>>
>> That is funny, too. I think we should substitute that with
>>
>> die("BUG: untranslated, but what went wrong instead")
>
> Yep. We did not consistently use "BUG:" in the early days. I would say
> that "BUG" lines do not need to be translated. The point is that nobody
> should ever see them, so it seems like there is little point in giving
> extra work to translators.
>
> -Peff


Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Jakub Narębski
W dniu 11.01.2017 o 11:20, Lars Schneider pisze: 
> On 10 Jan 2017, at 23:11, Jakub Narębski  wrote:
>> W dniu 09.01.2017 o 00:42, Junio C Hamano pisze:
>>> larsxschnei...@gmail.com writes:
 From: Lars Schneider 

 Some `clean` / `smudge` filters might require a significant amount of
 time to process a single blob. During this process the Git checkout
 operation is blocked and Git needs to wait until the filter is done to
 continue with the checkout.
>>
>> Lars, what is expected use case for this feature; that is when do you
>> think this problem may happen?  Is it something that happened IRL?
> 
> Yes, this problem happens every day with filters that perform network
> requests (e.g. GitLFS). 

Do I understand it correctly that the expected performance improvement
thanks to this feature is possible only if there is some amount of
parallelism and concurrency in the filter?  That is, filter can be sending
one blob to Git while processing other one, or filter can be fetching blobs
in parallel.

This means that filter process works as a kind of (de)multiplexer for
fetching and/or processing blob contents, I think.

> [...] In GitLFS we even implemented Git wrapper
> commands to address the problem: https://github.com/git-lfs/git-lfs/pull/988
> The ultimate goal of this patch is to be able to get rid of the wrapper 
> commands.

I'm sorry, I don't see it how the wrapper helps here.

> 
 Teach the filter process protocol (introduced in edcc858) to accept the
 status "delayed" as response to a filter request. Upon this response Git
 continues with the checkout operation and asks the filter to process the
 blob again after all other blobs have been processed.
>>>
>>> Hmm, I would have expected that the basic flow would become
>>>
>>> for each paths to be processed:
>>> convert-to-worktree to buf
>>> if not delayed:
>>> do the caller's thing to use buf
>>> else:
>>> remember path
>>>
>>> for each delayed paths:
>>> ensure filter process finished processing for path
>>> fetch the thing to buf from the process
>>> do the caller's thing to use buf
>>
>> I would expect here to have a kind of event loop, namely
>>
>>while there are delayed paths:
>>get path that is ready from filter
>>fetch the thing to buf (supporting "delayed")
>>if path done
>>do the caller's thing to use buf 
>>(e.g. finish checkout path, eof convert, etc.)
>>
>> We can either trust filter process to tell us when it finished sending
>> delayed paths, or keep list of paths that are being delayed in Git.
> 
> I could implement "get path that is ready from filter" but wouldn't
> that complicate the filter protocol? I think we can use the protocol pretty 
> much as if with the strategy outlined here:
> http://public-inbox.org/git/f533857d-9b51-44c1-8889-aa0542ad8...@gmail.com/

You are talking about the "busy-loop" solution, isn't it?  In the
same notation, it would look like this:

  while there are delayed paths:
  for each delayed path:
  request path from filter [1]
  fetch the thing (supporting "delayed") [2]
  if path done
  do the caller's thing to use buf
  remove it from delayed paths list


Footnotes:
--
1) We don't send the Git-side contents of blob again, isn't it?
   So we need some protocol extension / new understanding anyway.
   for example that we don't send contents if we request path again.
2) If path is not ready at all, filter protocol would send status=delayed
   with empty contents.  This means that we would immediately go to the
   next path, if there is one.

There are some cases where busy loop is preferable, but I don't think
it is the case here.


The event loop solution would require additional protocol extension,
but I don't think those complicate protocol too much:

A. Git would need to signal filter process that it has sent all paths,
and that it should be sending delayed paths when they are ready.  This
could be done for example with "command=continue".

packet:  git> command=continue


B. Filter driver, in the event-loop phase, when (de)multiplexing fetching
or processing of data, it would need now to initialize transfer, instead
of waiting for Git to ask.  It could look like this:

packet:  git< status=resumed [3]
packet:  git< pathname=file/to/be/resumed [4]
packet:  git< 
packet:  git< SMUDGED_CONTENT_CONTINUED
packet:  git< 
packet:  git<   # empty list, means "status=success" [5]

Footnotes:
--
3.) It could be "status=success", "status=delayed", "command=resumed", etc.
4.) In the future we

Re: [PATCH 1/2] diff: document behavior of relative diff.orderFile

2017-01-11 Thread Jeff King
On Tue, Jan 10, 2017 at 02:15:11PM -0800, Junio C Hamano wrote:

> Richard Hansen  writes:
> 
> >> A related tangent.
> >>
> >> I wonder if anything that uses git_config_pathname() should be
> >> relative to GIT_DIR when it is not absolute.
> >
> > I think so.  (For bare repositories anyway; non-bare should be
> > relative to GIT_WORK_TREE.)  Perhaps git_config_pathname() itself
> > should convert relative paths to absolute so that every pathname
> > setting automatically works without changing any calling code.
> 
> Yes, that was what I was alluding to.  We might have to wait until
> major version boundary to do so, but I think that it is the sensible
> way forward in the longer term to convert relative to absolute in
> git_config_pathname().

Yeah, I'd agree.

I'm undecided on whether it would need to happen at a major version
bump. The existing semantics are fairly insane, and would cause a lot of
confusing breakages. We can imagine use of relative paths in a bare
repository falls into one of a few categories:

  1. The user generally runs by "cd /path/to/bare.git && git ...". This
 would be unaffected, as relative and $GIT_DIR are the same.

  2. The user runs via "cd /path/to/bare.git/some-subdir". This would be
 broken, but I have trouble imagining that they really wanted to
 read something like "objects/orderfile".

  3. The user runs via "GIT_DIR=/path/to/bare.git" from various
 directories. This case is probably horribly broken, as things like
 diff.orderFile will complain if they ever run from a directory that
 doesn't have the order file.

  4. They run GIT_DIR=/path/to/bare.git from a consistent origin
 directory. This _does_ work, and we'd be breaking it. Though I kind
 of question why the config in $GIT_DIR is meant to apply to a file
 in a totally unrelated directory.

 I suppose somebody could be relying on the behavior where setting
 GIT_DIR uses the current directory as the working tree (i.e., if
 core.bare is "true" in bare.git). But then, we'd consider their
 working directory as the working tree and read from that anyway. So
 the behavior would stay the same.

So I dunno. I do hate to break even corner cases, but I'm having trouble
imagining the scenario where somebody is actually using the current
behavior in a useful way.

-Peff


[RFC PATCH 3/2] vreportf: add prefix to each line

2017-01-11 Thread Jeff King
On Wed, Jan 11, 2017 at 09:01:38AM -0500, Jeff King wrote:

>   [1/2]: Revert "vreportf: avoid intermediate buffer"
>   [2/2]: vreport: sanitize ASCII control chars

We've talked before about repeating the "error:" header for multi-line
messages. The reversion in patch 1 makes this easy to play with, so I
did. I kind of hate it. But here it is, for your viewing pleasure.

-- >8 --
Subject: vreportf: add prefix to each line

Some error and warning messages are multi-line, but we put
the prefix only on the first line. This means that either
the subsequent lines don't have any indication that they're
connected to the original error, or the caller has to make
multiple calls to error() or warning(). And that's not even
possible for die().

Instead, let's put the prefix at the start of each line,
just as advise() does.

Note that this patch doesn't update the tests yet, so it
causes tons of failures. This is just to see what it might
look like.

It's actually kind of ugly.  For instance, a failing test in
t3600 now produces:

   error: the following files have staged content different from both the
   error: file and the HEAD:
   error: bar.txt
   error: foo.txt
   error: (use -f to force removal)

which seems a bit aggressive.  It also screws up messages
which indent with tabs (the prefix eats up some of the
tabstop, making the indentation smaller). And the result is
a little harder if you need to cut-and-paste the file lines
(if your terminal lets you triple-click to select a whole
line, now you have this "error:" cruft on the front).

It may be possible to address some of that by using some
other kind of continuation marker (instead of just repeating
the prefix), and expanding initial tabs.

Signed-off-by: Jeff King 
---
 usage.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/usage.c b/usage.c
index ad6d2910f..8a1f6ff4e 100644
--- a/usage.c
+++ b/usage.c
@@ -8,18 +8,30 @@
 
 static FILE *error_handle;
 
+static void show_line(FILE *fh, const char *prefix, const char *line, int len)
+{
+   fprintf(fh, "%s%.*s\n", prefix, len, line);
+}
+
 void vreportf(const char *prefix, const char *err, va_list params)
 {
char msg[4096];
FILE *fh = error_handle ? error_handle : stderr;
+   const char *base;
char *p;
 
vsnprintf(msg, sizeof(msg), err, params);
+
+   base = msg;
for (p = msg; *p; p++) {
-   if (iscntrl(*p) && *p != '\t' && *p != '\n')
+   if (*p == '\n') {
+   show_line(fh, prefix, base, p - base);
+   base = p + 1;
+   } else if (iscntrl(*p) && *p != '\t')
*p = '?';
}
-   fprintf(fh, "%s%s\n", prefix, msg);
+
+   show_line(fh, prefix, base, p - base);
 }
 
 static NORETURN void usage_builtin(const char *err, va_list params)
-- 
2.11.0.627.gfa6151259



[PATCH 1/2] Revert "vreportf: avoid intermediate buffer"

2017-01-11 Thread Jeff King
This reverts commit f4c3edc0b156362a92bf9de4f0ec794e90a757fc.

The purpose of that commit was to let us write errors of
arbitrary length to stderr by skipping the intermediate
buffer and sending our varargs straight to fprintf. That
works, but it comes with a downside: we do not get access to
the varargs before they are sent to stderr.

On balance, it's not a good tradeoff. Error messages larger
than our 4K buffer are quite uncommon, and we've lost the
ability to make any modifications to the output (e.g., to
remove non-printable characters).

The only way to have both would be one of:

  1. Write into a dynamic buffer. But this is a bad idea for
 a low-level function that may be called when malloc()
 has failed.

  2. Do our own printf-format varargs parsing. This is too
 complex to be worth the trouble.

Let's just revert that change and go back to a fixed buffer.

Signed-off-by: Jeff King 
---
 usage.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/usage.c b/usage.c
index 17f52c1b5..b1cbe6799 100644
--- a/usage.c
+++ b/usage.c
@@ -7,21 +7,13 @@
 #include "cache.h"
 
 static FILE *error_handle;
-static int tweaked_error_buffering;
 
 void vreportf(const char *prefix, const char *err, va_list params)
 {
+   char msg[4096];
FILE *fh = error_handle ? error_handle : stderr;
-
-   fflush(fh);
-   if (!tweaked_error_buffering) {
-   setvbuf(fh, NULL, _IOLBF, 0);
-   tweaked_error_buffering = 1;
-   }
-
-   fputs(prefix, fh);
-   vfprintf(fh, err, params);
-   fputc('\n', fh);
+   vsnprintf(msg, sizeof(msg), err, params);
+   fprintf(fh, "%s%s\n", prefix, msg);
 }
 
 static NORETURN void usage_builtin(const char *err, va_list params)
@@ -93,7 +85,6 @@ void set_die_is_recursing_routine(int (*routine)(void))
 void set_error_handle(FILE *fh)
 {
error_handle = fh;
-   tweaked_error_buffering = 0;
 }
 
 void NORETURN usagef(const char *err, ...)
-- 
2.11.0.627.gfa6151259



[PATCH 2/2] vreport: sanitize ASCII control chars

2017-01-11 Thread Jeff King
Our error() and die() calls may report messages with
arbitrary data (e.g., filenames or even data from a remote
server). Let's make it harder to cause confusion with
mischievous filenames. E.g., try:

  git rev-parse "$(printf "\rfatal: this argument is too sneaky")" --

or

  git rev-parse "$(printf "\x1b[5mblinky\x1b[0m")" --

Let's block all ASCII control characters, with the exception
of TAB and LF. We use both in our own messages (and we are
necessarily sanitizing the complete output of snprintf here,
as we do not have access to the individual varargs). And TAB
and LF are unlikely to cause confusion (you could put
"\nfatal: sneaky\n" in your filename, but it would at least
not _cover up_ the message leading to it, unlike "\r").

We'll replace the characters with a "?", which is similar to
how "ls" behaves. It might be nice to do something less
lossy, like converting them to "\x" hex codes. But replacing
with a single character makes it easy to do in-place and
without worrying about length limitations. This feature
should kick in rarely enough that the "?" marks are almost
never seen.

We'll leave high-bit characters as-is, as they are likely to
be UTF-8 (though there may be some Unicode mischief you
could cause, which may require further patches).

Signed-off-by: Jeff King 
---
 usage.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/usage.c b/usage.c
index b1cbe6799..ad6d2910f 100644
--- a/usage.c
+++ b/usage.c
@@ -12,7 +12,13 @@ void vreportf(const char *prefix, const char *err, va_list 
params)
 {
char msg[4096];
FILE *fh = error_handle ? error_handle : stderr;
+   char *p;
+
vsnprintf(msg, sizeof(msg), err, params);
+   for (p = msg; *p; p++) {
+   if (iscntrl(*p) && *p != '\t' && *p != '\n')
+   *p = '?';
+   }
fprintf(fh, "%s%s\n", prefix, msg);
 }
 
-- 
2.11.0.627.gfa6151259


[PATCH 0/2] sanitizing error message contents

2017-01-11 Thread Jeff King
When adding a warning() call in 50d341374 (http: make redirects more
obvious, 2016-12-06), somebody brought up that evil servers can redirect
you to something like:

  
https://evil.example.com/some/repo?unused=\rwarning:+rainbows+and_unicorns_ahead

(where "\r" is a literal CR), and instead of seeing:

  warning: redirecting to https://evil.example.com/...

you just get:

  warning: rainbows and unicorns ahead

or whatever innocuous looking line they prefer (probably just ANSI
"clear to beginning of line" would be even more effective).

Since it's hard to figure out which error messages could potentially
contain malicious contents, and since spewing control characters to the
terminal is generally bad anyway, this series sanitizes at the lowest
level.

Note that this doesn't cover "remote:" lines coming over the sideband.
Those are already covered for "\r", as we have to parse it to handle
printing "remote:" consistently. But you can play tricks like putting:

  printf '\0331K\033[0Efatal: this looks local\n'

into a pre-receive hook. I'm not sure if we would want to do more
sanitizing there. The goal of this series is not so much that a remote
can't send funny strings that may look local, but that they can't
prevent local strings from being displayed. OTOH, I suspect clever use
of ANSI codes (moving the cursor, clearing lines, etc) could get you
pretty far.

I'd be hesitant to disallow control codes entirely, though, as I suspect
some servers do send colors over the sideband. So I punted on that here,
but I think this is at least an incremental improvement.

  [1/2]: Revert "vreportf: avoid intermediate buffer"
  [2/2]: vreport: sanitize ASCII control chars

 usage.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

-Peff


Re: RFC: Enable delayed responses to Git clean/smudge filter requests

2017-01-11 Thread Lars Schneider

> On 09 Jan 2017, at 21:44, Stefan Beller  wrote:
> 
> On Mon, Nov 14, 2016 at 1:09 PM, Lars Schneider
>  wrote:
>> Hi,
>> 
>> Git always performs a clean/smudge filter on files in sequential order.
>> Sometimes a filter operation can take a noticeable amount of time.
>> This blocks the entire Git process.
>> 
>> I would like to give a filter process the possibility to answer Git with
>> "I got your request, I am processing it, ask me for the result later!".
>> 
>> I see the following way to realize this:
>> 
>> In unpack-trees.c:check_updates() [1] we loop through the cache
>> entries and "ask me later" could be an acceptable return value of the
>> checkout_entry() call. The loop could run until all entries returned
>> success or error.
> 
> Late to this thread, but here is an answer nevertheless.
> 
> I am currently working on getting submodules working
> for working tree modifying commands (prominently checkout, but
> also read-tree -u and any other caller that uses the code in
> unpack-trees.)
> 
> Once the submodules are supported and used, I anticipate that
> putting the files in the working tree on disk will become a bottle neck,
> i.e. the checkout taking way too long for an oversized project.
> 
> So in the future we have to do something to make checkout fast
> again, which IMHO is threading. My current vision is to have checkout
> automatically choose a number of threads based on expected workload,
> c.f. preload-index.c, line 18-25.

That sounds interesting! We are using "submodule.fetchjobs=0" to process
submodules in parallel already and it works great! Thanks a lot for
implementing this!


>> The filter machinery is triggered in various other places in Git and
>> all places that want to support "ask me later" would need to be patched
>> accordingly.
> 
> I think this makes sense, even in a threaded git-checkout.
> I assume this idea is implemented before threading hits checkout,
> so a question on the design:
> 
> Who determines the workload that is acceptable?
> From reading this email, it seems to be solely the filter that uses
> as many threads/processes as it thinks is ok.

Correct.


> Would it be possible to enhance the protocol further to have
> Git also mingle with the workload, i.e. tell the filter it is
> allowed to use up (N-M) threads, as it itself already uses
> M out of N configured threads?
> 
> (I do not want to discuss the details here, but only if such a thing
> is viable with this approach as well)

Yes, I think we could give a filter these kind of hints. However, it
would, of course, be up to the filter implementation to follow the hints.

In case you curious, here is the discussion on v1 of the delay implementation:
http://public-inbox.org/git/20170108191736.47359-1-larsxschnei...@gmail.com/


Cheers,
Lars

Re: git cat-file on a submodule

2017-01-11 Thread Jeff King
On Tue, Jan 10, 2017 at 07:11:40PM -0500, David Turner wrote:

> Why does git cat-file -t $sha:foo, where foo is a submodule, not work?

Because "cat-file" is about inspecting items in the object database, and
typically the submodule commit is not present in the superproject's
database. So we cannot know its type. You can infer what it _should_ be
from the surrounding tree, but you cannot actually do the object lookup.

Likewise, "git cat-file -t $sha1:Makefile" is not just telling you that
we found a 100644 entry in the tree, so we expect a blob. It's resolving
to a sha1, and then checking the type of that sha1 in the database. It
_should_ be a blob, but if it isn't, then cat-file is the tool that
should tell you that it is not.

> git rev-parse $sha:foo works.

Right. Because that command is about resolving a name to a sha1, which
we can do even without the object.

> By "why", I mean "would anyone complain if I fixed it?"  FWIW, I think
> -p should just return the submodule's sha.

I'm not sure if I'm complaining or not. I can't immediately think of
something that would be horribly broken. But it really feels like you
are using the wrong tool, and patching the tool to handle this case will
probably lead to weird cognitive dissonance down the road.

Maybe it would help to describe your use case more fully. If what you
care about is the presumed type based on the surrounding tree, then
maybe:

  git --literal-pathspecs ls-tree $sha -- foo

would be a better match.

-Peff


Re: [RFC PATCH 0/5] Localise error headers

2017-01-11 Thread Jeff King
On Tue, Jan 10, 2017 at 10:28:42AM -0800, Stefan Beller wrote:

> > And then presumably that mix would gradually move to 100% consistency as
> > more messages are translated. But the implicit question is: are there
> > die() messages that should never be translated? I'm not sure.
> 
> I would assume any plumbing command is not localizing?
> Because in plumbing land, (easily scriptable) you may find
> a grep on the output/stderr for a certain condition?

That's the assumption I'm challenging. Certainly the behavior and
certain aspects of the output of a plumbing command should remain the
same over time. But error messages to stderr?

It seems like they should be translated, because plumbing invoked on
behalf of porcelain scripts is going to send its stderr directly to the
user.

> To find a good example, "git grep die" giving me some food of though:
> 
> die_errno(..) should always take a string marked up for translation,
> because the errno string is translated?

Yes, I would think die_errno() is a no-brainer for translation, since
the strerror() will be translated.

> apply.c:die(_("internal error"));
> 
> That is funny, too. I think we should substitute that with
> 
> die("BUG: untranslated, but what went wrong instead")

Yep. We did not consistently use "BUG:" in the early days. I would say
that "BUG" lines do not need to be translated. The point is that nobody
should ever see them, so it seems like there is little point in giving
extra work to translators.

-Peff


[PATCH] t7810: avoid assumption about invalid regex syntax

2017-01-11 Thread Jeff King
A few of the tests want to check that "git grep -P -E" will
override -P with -E, and vice versa. To do so, we use a
regex with "\x{..}", which is valid in PCRE but not defined
by POSIX (for basic or extended regular expressions).

However, POSIX declares quite a lot of syntax, including
"\x", as "undefined". That leaves implementations free to
extend the standard if they choose. At least one, musl libc,
implements "\x" in the same way as PCRE.  Our tests check
that "-E" complains about "\x", which fails with musl.

We can fix this by finding some construct which behaves
reliably on both PCRE and POSIX, but differently in each
system.

One such construct is the use of backslash inside brackets.
In PCRE, "[\d]" interprets "\d" as it would outside the
brackets, matching a digit. Whereas in POSIX, the backslash
must be treated literally, and we match either it or a
literal "d".  Moreover, implementations are not free to
change this according to POSIX, so we should be able to rely
on it.

Signed-off-by: Jeff King 
---
I've tested this with glibc, but I wasn't able to do so with musl. The
two complications are:

  1. Recent versions of git won't build with musl's regex at all,
 because it doesn't support the non-standard REG_STARTEND that we
 rely on since b7d36ffca (regex: use regexec_buf(), 2016-09-21).

 So if applied on an older git, this patch should help, but newer
 versions need NO_REGEX (to use the fallback glibc regex code)
 either way, which would also make the problem go away.

 Still, I think it's the right thing to do, since we are relying on
 something that POSIX clearly leaves up to the implementation. It
 may also help on other systems, or if musl ends up supporting
 REG_STARTEND in the future.

  2. I tried to cherry-pick to v2.7.x and test it with musl. Debian
 ships with a "musl-gcc" wrapper, but it doesn't work out of the
 box. Both zlib and pcre are compiled against glibc, so I'd have to
 rebuild those, too. At which point I gave up and decided to just
 let you test it on your musl-based system. :)

 t/t7810-grep.sh | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index de2405ccb..19f0108f8 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -39,6 +39,10 @@ test_expect_success setup '
echo "a+bc"
echo "abc"
} >ab &&
+   {
+   echo d &&
+   echo 0
+   } >d0 &&
echo vvv >v &&
echo ww w >w &&
echo x x xx x >x &&
@@ -1105,36 +1109,36 @@ test_expect_success 'grep pattern with 
grep.patternType=fixed, =basic, =extended
 '
 
 test_expect_success 'grep -G -F -P -E pattern' '
-   >empty &&
-   test_must_fail git grep -G -F -P -E "a\x{2b}b\x{2a}c" ab >actual &&
-   test_cmp empty actual
+   echo "d0:d" >expected &&
+   git grep -G -F -P -E "[\d]" d0 >actual &&
+   test_cmp expected actual
 '
 
 test_expect_success 'grep pattern with grep.patternType=fixed, =basic, =perl, 
=extended' '
-   >empty &&
-   test_must_fail git \
+   echo "d0:d" >expected &&
+   git \
-c grep.patterntype=fixed \
-c grep.patterntype=basic \
-c grep.patterntype=perl \
-c grep.patterntype=extended \
-   grep "a\x{2b}b\x{2a}c" ab >actual &&
-   test_cmp empty actual
+   grep "[\d]" d0 >actual &&
+   test_cmp expected actual
 '
 
 test_expect_success LIBPCRE 'grep -G -F -E -P pattern' '
-   echo "ab:a+b*c" >expected &&
-   git grep -G -F -E -P "a\x{2b}b\x{2a}c" ab >actual &&
+   echo "d0:0" >expected &&
+   git grep -G -F -E -P "[\d]" d0 >actual &&
test_cmp expected actual
 '
 
 test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, 
=extended, =perl' '
-   echo "ab:a+b*c" >expected &&
+   echo "d0:0" >expected &&
git \
-c grep.patterntype=fixed \
-c grep.patterntype=basic \
-c grep.patterntype=extended \
-c grep.patterntype=perl \
-   grep "a\x{2b}b\x{2a}c" ab >actual &&
+   grep "[\d]" d0 >actual &&
test_cmp expected actual
 '
 
-- 
2.11.0.627.gfa6151259


Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Lars Schneider

> On 10 Jan 2017, at 23:11, Jakub Narębski  wrote:
> 
> W dniu 09.01.2017 o 00:42, Junio C Hamano pisze:
>> larsxschnei...@gmail.com writes:
>>> From: Lars Schneider 
>>> 
>>> Some `clean` / `smudge` filters might require a significant amount of
>>> time to process a single blob. During this process the Git checkout
>>> operation is blocked and Git needs to wait until the filter is done to
>>> continue with the checkout.
> 
> Lars, what is expected use case for this feature; that is when do you
> think this problem may happen?  Is it something that happened IRL?

Yes, this problem happens every day with filters that perform network
requests (e.g. GitLFS). In GitLFS we even implemented Git wrapper
commands to address the problem: https://github.com/git-lfs/git-lfs/pull/988
The ultimate goal of this patch is to be able to get rid of the wrapper 
commands.


>>> Teach the filter process protocol (introduced in edcc858) to accept the
>>> status "delayed" as response to a filter request. Upon this response Git
>>> continues with the checkout operation and asks the filter to process the
>>> blob again after all other blobs have been processed.
>> 
>> Hmm, I would have expected that the basic flow would become
>> 
>>  for each paths to be processed:
>>  convert-to-worktree to buf
>>  if not delayed:
>>  do the caller's thing to use buf
>>  else:
>>  remember path
>> 
>>  for each delayed paths:
>>  ensure filter process finished processing for path
>>  fetch the thing to buf from the process
>>  do the caller's thing to use buf
> 
> I would expect here to have a kind of event loop, namely
> 
>while there are delayed paths:
>get path that is ready from filter
>fetch the thing to buf (supporting "delayed")
>if path done
>do the caller's thing to use buf 
>(e.g. finish checkout path, eof convert, etc.)
> 
> We can either trust filter process to tell us when it finished sending
> delayed paths, or keep list of paths that are being delayed in Git.

I could implement "get path that is ready from filter" but wouldn't
that complicate the filter protocol? I think we can use the protocol pretty 
much as if with the strategy outlined here:
http://public-inbox.org/git/f533857d-9b51-44c1-8889-aa0542ad8...@gmail.com/


Thanks,
Lars

Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Lars Schneider

> On 10 Jan 2017, at 00:38, Taylor Blau  wrote:
> 
> I've been considering some alternative approaches in order to make the
> communication between Git and any extension that implements this protocol more
> intuitive.
> 
> In particular, I'm considering alternatives to:
> 
>>  for each delayed paths:
>>  ensure filter process finished processing for path
>>  fetch the thing to buf from the process
>>  do the caller's thing to use buf
> 
> As I understand it, the above sequence of steps would force Git to either:
> 
> a) loop over all delayed paths and ask the filter if it's done processing,
>   creating a busy-loop between the filter and Git, or...
> b) loop over all delayed paths sequentially, checking out each path in 
> sequence
> 
> I would like to avoid both of those situations, and instead opt for an
> asynchronous approach. In (a), the protocol is far too chatty. In (b), the
> protocol is much less chatty, but forces the checkout to be the very last 
> step,
> which has negative performance implications on checkouts with many large 
> files.
> 
> For instance, checking out several multi-gigabyte files one after the other
> means that a significant amount of time is lost while the filter has some of 
> the
> items ready. Instead of checking them out as they become available, Git waits
> until the very end when they are all available.
> 
> I think it would be preferable for the protocol to specify a sort of "done"
> signal against each path such that Git could check out delayed paths as they
> become available. If implemented this way, Git could checkout files
> asynchronously, while the filter continues to do work on the other end.

In v1 I implemented a) with the busy-loop problem in mind. 

My thinking was this:

If the filter sees at least one filter request twice then the filter knows that
Git has already requested all files that require filtering. At that point the
filter could just block the "delayed" answer to the latest filter request until
at least one of the previously delayed requests can be fulfilled. Then the 
filter
answers "delay" to Git until Git requests the blob that can be fulfilled. This
process cycles until all requests can be fulfilled. Wouldn't that work?

I think a "done" message by the filter is not easy. Right now the protocol 
works 
in a mode were Git always asks and the filter always answers. I believe changing
the filter to be able to initiate a "done" message would complicated the 
protocol.


> Additionally, the protocol should specify a sentinel "no more entries" value
> that could be sent from Git to the filter to signal that there are no more 
> files
> to checkout. Some filters may implement mechanisms for converting files that
> require a signal to know when all files have been sent. Specifically, Git LFS
> (https://git-lfs.github.com) batches files to be transferred together, and 
> needs
> to know when all files have been announced to truncate and send the last 
> batch,
> if it is not yet full. I'm sure other filter implementations use a similar
> mechanism and would benefit from this as well.

I agree. I think the filter already has this info implicitly as explained above
but an explicit message would be better!

Thanks,
Lars

Re: [musl] Re: Test failures when Git is built with libpcre and grep is built without it

2017-01-11 Thread Jeff King
On Tue, Jan 10, 2017 at 12:40:00PM +0100, Szabolcs Nagy wrote:

> > > I'm not sure if musl is wrong for failing to complain about a
> > > bogus regex. Generally making something that would break into
> > > something that works is an OK way to extend the standard. So our
> > > test is at fault for assuming that the regex will fail. I guess
> 
> \x is undefined in posix and musl is based on tre which
> supports \x{hexdigits} in ere.

Thanks for confirming; I figured it was something like that.

> > > we'd need to find some more exotic syntax that pcre supports, but
> > > that ERE doesn't. Maybe "(?:)" or something.
> 
> i think you would have to use something that's invalid
> in posix ere, ? after empty expression is undefined,
> not an error so "(?:)" is a valid ere extension.

Reading through POSIX[1], hardly anything is explicitly labeled as
"invalid". Most things are just "undefined", which leaves rooms for
implementations to do what they like.

That's a good thing for a standard to do, but a bad thing when you are
trying to find behavior that differs reliably between PCRE and ERE. :)
In most cases, PCRE constructs could be viable extensions to ERE.

> since most syntax is either defined or undefined in ere
> instead of being invalid, distinguishing pcre using
> syntax is not easy.
> 
> there are semantic differences in subexpression matching:
> leftmost match has higher priority in pcre, longest match
> has higher priority in ere.
> 
> $ echo ab | grep -o -E '(a|ab)'
> ab
> $ echo ab | grep -o -P '(a|ab)'
> a
> 
> unfortunately grep -o is not portable.

In this case we're testing whether Git has internally fed the regex to
pcre or to regcomp(), not a system grep. So we'd need something like
"-o" for "git grep", which I don't think exists.

Another difference I found is that "[\d]" matches a literal "\" or "d"
in ERE, but behaves like "[0-9]" in PCRE. I'll work up a patch based on
that.

Thanks for your answer. I'll drop the musl list from the cc when I
follow-up, as this is most definitely not a musl problem, but a git one.

-Peff


GREETINGS

2017-01-11 Thread THANDI ROBERT



Hello my name is Ms. Thandi Robert, from Ivory Coast. My parents were brutally 
mulled by the former president Laurent Gbagbo because of political crisis as 
the only survival of my family. I got your email while searching for a reliable 
personality in my private study on the internet. I am in need of your help and 
stand as my guardian in the management of my family inherited sum of $22.5M 
USD.  Please get back to me with your private telephone number with sincerity. 

Sincerely Yours, 
Ms. Thandi Robert 


Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Lars Schneider

> On 08 Jan 2017, at 21:45, Eric Wong  wrote:
> 
> larsxschnei...@gmail.com wrote:
>> +++ b/t/t0021/rot13-filter.pl
> 
>> +$DELAY{'test-delay1.r'} = 1;
>> +$DELAY{'test-delay3.r'} = 3;
>> 
>> open my $debug, ">>", "rot13-filter.log" or die "cannot open log file: $!";
>> 
>> @@ -166,6 +176,15 @@ while (1) {
>>  packet_txt_write("status=abort");
>>  packet_flush();
>>  }
>> +elsif ( $command eq "smudge" and
>> +exists $DELAY{$pathname} and
>> +$DELAY{$pathname} gt 0 ) {
> 
> Use '>' for numeric comparisons.  'gt' is for strings (man perlop)

Still learning Perl :-)


> Sidenote, staying <= 80 columns for the rest of the changes is
> strongly preferred, some of us need giant fonts.  I think what
> Torsten said about introducing a new *_internal function can
> also help with that.

OK! 

Thank you,
Lars


Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Lars Schneider

> On 08 Jan 2017, at 21:14, Torsten Bögershausen  wrote:
> 
> On Sun, Jan 08, 2017 at 08:17:36PM +0100, larsxschnei...@gmail.com wrote:
>> From: Lars Schneider 
>> 
>> Some `clean` / `smudge` filters might require a significant amount of
>> time to process a single blob. During this process the Git checkout
>> operation is blocked and Git needs to wait until the filter is done to
>> continue with the checkout.
>> 
>> Teach the filter process protocol (introduced in edcc858) to accept the
>> status "delayed" as response to a filter request. Upon this response Git
>> continues with the checkout operation and asks the filter to process the
>> blob again after all other blobs have been processed.
>> 
>> Git has a multiple code paths that checkout a blob. Support delayed
>> checkouts only in `clone` (in unpack-trees.c) and `checkout` operations.
>> 
>> Signed-off-by: Lars Schneider 
>> ---
>> 
> 
> Some feeling tells me that it may be better to leave 
> convert_to_working_tree() as it is.
> And change convert_to_working_tree_internal as suggested:
> 
> int convert_to_working_tree(const char *path, const char *src, size_t len, 
> struct strbuf *dst)
> {
> - return convert_to_working_tree_internal(path, src, len, dst, 0);
> + return convert_to_working_tree_internal(path, src, len, dst, NULL, 0);
> }

If I do this then I would have no way to communicate to the caller that the
processing is delayed. Consequently the caller would not know that an additional
call is necessary to fetch the result.

Thanks,
Lars

Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Lars Schneider

> On 09 Jan 2017, at 00:42, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider 
>> 
>> Some `clean` / `smudge` filters might require a significant amount of
>> time to process a single blob. During this process the Git checkout
>> operation is blocked and Git needs to wait until the filter is done to
>> continue with the checkout.
>> 
>> Teach the filter process protocol (introduced in edcc858) to accept the
>> status "delayed" as response to a filter request. Upon this response Git
>> continues with the checkout operation and asks the filter to process the
>> blob again after all other blobs have been processed.
> 
> Hmm, I would have expected that the basic flow would become
> 
>   for each paths to be processed:
>   convert-to-worktree to buf
>   if not delayed:
>   do the caller's thing to use buf
>   else:
>   remember path
> 
>   for each delayed paths:
>   ensure filter process finished processing for path
>   fetch the thing to buf from the process
>   do the caller's thing to use buf
> 
> and that would make quite a lot of sense.  However, what is actually
> implemented is a bit disappointing from that point of view.  While
> its first part is the same as above, the latter part instead does:
> 
>   for each delayed paths:
>   checkout the path
> 
> Presumably, checkout_entry() does the "ensure that the process is
> done converting" (otherwise the result is simply buggy), but what
> disappoints me is that this does not allow callers that call
> "convert-to-working-tree", whose interface is obtain the bytestream 
> in-core in the working tree representation, given an object in the
> object-db representation in an in-core buffer, to _use_ the result
> of the conversion.  The caller does not have a chance to even see
> the result as it is written straight to the filesystem, once it
> calls checkout_delayed_entries().

I am not sure I can follow you here. A caller of "convert_to_working_tree"
would indeed see filtered result. Consider the following example. The 
filter delays the conversion twice and responds with the filtered results
on the third call:

CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0)
RESPONSE: return == 1; *delayed == 1, *dst==''

CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0)
RESPONSE: return == 1; *delayed == 1, *dst==''

CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0)
RESPONSE: return == 1; *delayed == 0, *dst=='FILTERED_CONTENT'

I implemented the "checkout_delayed_entries" function in v1 because
it solved the problem with minimal changes in the existing code. Our previous 
discussion made me think that this is the preferred way:

 I do not think we want to see such a rewrite all over the
 codepaths.  It might be OK to add such a "these entries are known
 to be delayed" list in struct checkout so that the above becomes
 more like this:

   for (i = 0; i < active_nr; i++)
  checkout_entry(active_cache[i], state, NULL);
 + checkout_entry_finish(state);

 That is, addition of a single "some of the checkout_entry() calls
 done so far might have been lazy, and I'll give them a chance to
 clean up" might be palatable.  Anything more than that on the
 caller side is not.

c.f. http://public-inbox.org/git/xmqqvavotych@gitster.mtv.corp.google.com/

Thanks,
Lars