[PATCH 1/2] mailinfo: Add support for keep_cr
From: Matthew Wilcox If you have a base-64 encoded patch with CRLF endings (as produced by forwarding a patch from Outlook to a Linux machine, for example), the keep_cr setting is not honoured because keep_cr is only passed to mailsplit, which does not look through the encoding. The keep_cr logic needs to be applied after the base64 decode. I copied that logic to handle_filter(), and rather than add a new keep_cr parameter to handle_filter, I opted to add keep_cr to struct mailinfo; it seemed appropriate given use_scissors was already there. Then I needed to initialise keep_cr in the struct mailinfo passed from git-am, and rather than thread a 'keep_cr' parameter all the way through to parse_mail(), I decided to add keep_cr to struct am_state, which let it be removed as a parameter from five other functions. Signed-off-by: Matthew Wilcox --- builtin/am.c | 49 - mailinfo.c | 6 ++ mailinfo.h | 1 + 3 files changed, 31 insertions(+), 25 deletions(-) diff --git a/builtin/am.c b/builtin/am.c index 31fb60578..6cb6e6ca8 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -124,6 +124,7 @@ struct am_state { int keep; /* enum keep_type */ int message_id; int scissors; /* enum scissors_type */ + int keep_cr; struct argv_array git_apply_opts; const char *resolvemsg; int committer_date_is_author_date; @@ -143,6 +144,7 @@ static void am_state_init(struct am_state *state, const char *dir) memset(state, 0, sizeof(*state)); + state->keep_cr = -1; assert(dir); state->dir = xstrdup(dir); @@ -697,7 +699,7 @@ static int detect_patch_format(const char **paths) * a mbox file or a Maildir. Returns 0 on success, -1 on failure. */ static int split_mail_mbox(struct am_state *state, const char **paths, - int keep_cr, int mboxrd) + int mboxrd) { struct child_process cp = CHILD_PROCESS_INIT; struct strbuf last = STRBUF_INIT; @@ -707,7 +709,7 @@ static int split_mail_mbox(struct am_state *state, const char **paths, argv_array_pushf(&cp.args, "-d%d", state->prec); argv_array_pushf(&cp.args, "-o%s", state->dir); argv_array_push(&cp.args, "-b"); - if (keep_cr) + if (state->keep_cr) argv_array_push(&cp.args, "--keep-cr"); if (mboxrd) argv_array_push(&cp.args, "--mboxrd"); @@ -737,7 +739,7 @@ typedef int (*mail_conv_fn)(FILE *out, FILE *in, int keep_cr); * Returns 0 on success, -1 on failure. */ static int split_mail_conv(mail_conv_fn fn, struct am_state *state, - const char **paths, int keep_cr) + const char **paths) { static const char *stdin_only[] = {"-", NULL}; int i; @@ -766,7 +768,7 @@ static int split_mail_conv(mail_conv_fn fn, struct am_state *state, return error_errno(_("could not open '%s' for writing"), mail); - ret = fn(out, in, keep_cr); + ret = fn(out, in, state->keep_cr); fclose(out); fclose(in); @@ -826,8 +828,7 @@ static int stgit_patch_to_mail(FILE *out, FILE *in, int keep_cr) * * Returns 0 on success, -1 on failure. */ -static int split_mail_stgit_series(struct am_state *state, const char **paths, - int keep_cr) +static int split_mail_stgit_series(struct am_state *state, const char **paths) { const char *series_dir; char *series_dir_buf; @@ -857,7 +858,7 @@ static int split_mail_stgit_series(struct am_state *state, const char **paths, strbuf_release(&sb); free(series_dir_buf); - ret = split_mail_conv(stgit_patch_to_mail, state, patches.argv, keep_cr); + ret = split_mail_conv(stgit_patch_to_mail, state, patches.argv); argv_array_clear(&patches); return ret; @@ -937,30 +938,27 @@ static int hg_patch_to_mail(FILE *out, FILE *in, int keep_cr) * state->cur will be set to the index of the first mail, and state->last will * be set to the index of the last mail. * - * Set keep_cr to 0 to convert all lines ending with \r\n to end with \n, 1 - * to disable this behavior, -1 to use the default configured setting. - * * Returns 0 on success, -1 on failure. */ static int split_mail(struct am_state *state, enum patch_format patch_format, - const char **paths, int keep_cr) + const char **paths) { - if (keep_cr < 0) { - keep_cr = 0; - git_config_get_bool("am.keepcr", &keep_cr); + if (state->keep_cr < 0) { + state->keep_cr = 0; + git_config_get_bool("am.keepcr", &state->keep_cr); } switch (patch_format) { case PATCH_FORMAT_MBOX: - return split_mail_mbox(state, pa
[PATCH 2/2] mailinfo: Understand forwarded patches
From: Matthew Wilcox Extend the --scissors mechanism to strip off the preamble created by forwarding a patch. There are a couple of extra headers ("Sent" and "To") added by forwarding, but other than that, the --scissors option will now remove patches forwarded from Microsoft Outlook to a Linux email account. Signed-off-by: Matthew Wilcox --- mailinfo.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mailinfo.c b/mailinfo.c index 2059704a8..fc1275532 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -332,7 +332,7 @@ static void cleanup_subject(struct mailinfo *mi, struct strbuf *subject) #define MAX_HDR_PARSED 10 static const char *header[MAX_HDR_PARSED] = { - "From","Subject","Date", + "From","Subject","Date","Sent","To", }; static inline int cmp_header(const struct strbuf *line, const char *hdr) @@ -685,6 +685,13 @@ static int is_scissors_line(const char *line) c++; continue; } + if (!memcmp(c, "Original Message", 16)) { + in_perforation = 1; + perforation += 16; + scissors += 16; + c += 15; + continue; + } in_perforation = 0; } -- 2.11.0
Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path
I'm not at all attached to changing all of them, just figured it made sense while I was here. Would a patch that changes only: git-add--interactive.perl | 2 +- git-archimport.perl | 2 +- git-cvsexportcommit.perl | 2 +- git-cvsimport.perl| 2 +- git-cvsserver.perl| 2 +- git-difftool.perl | 2 +- git-relink.perl | 2 +- git-send-email.perl | 2 +- git-svn.perl | 2 +- be more acceptable? On Thu, Jan 12, 2017 at 1:27 AM, Johannes Sixt wrote: > Am 12.01.2017 um 06:51 schrieb Pat Pannuto: >> >> diff --git a/git-add--interactive.perl b/git-add--interactive.perl >> index cf6fc926a..6d7b6c35d 100755 >> --- a/git-add--interactive.perl >> +++ b/git-add--interactive.perl >> @@ -1,4 +1,4 @@ >> -#!/usr/bin/perl >> +#!/usr/bin/env perl > > > This change, and all others that affect installed external git programs, is > a no-go. On Windows, our execve emulation is not complete. It would invoke > only `env` (looked up in PATH), but not pass 'perl' as argument. > > Sorry for the bad news. > > I would have suggested to set PERL_PATH in your config.mak, but that does > not change the generated perl scripts, I think. Perhaps you should implement > that? > > I'm not thrilled about your changes to the test scripts, but I do not expect > that they break on Windows. > > -- Hannes >
Re: [PATCH 0/2] Use env for all perl invocations
[plain text re-send, sorry] Interesting, I'm guessing this came from when git was installed ( https://github.com/Homebrew/homebrew-core/blob/master/Formula/git.rb#L50 ), when the perl path was likely still /usr/bin/perl It feels weird to me that the perl path is fixed at compile/install-time as opposed to run-time discovery -- this means users can't change their perl install without breaking git? On Thu, Jan 12, 2017 at 1:21 AM, Junio C Hamano wrote: > Pat Pannuto writes: > >> I spent a little while debugging why git-format-patch refused to believe >> that SSL support was installed (Can't locate Net/SMTP/SSL.pm in @INC...) >> Turns out that it was installed for my system's preferred >> /usr/local/bin/perl, >> but not for git-format-patch's hard-coded /usr/bin/perl; changing the shebang >> allowed git format-patch to work as expected. > > Isn't that an indication that you are not building correctly? > Perhaps > > $ git grep 'Define PERL_' Makefile > $ make PERL_PATH=/usr/local/bin/perl > > would help?
Re: [RFC PATCH 3/2] vreportf: add prefix to each line
On Wed, Jan 11, 2017 at 02:11:42PM -0800, Junio C Hamano wrote: > > It's actually kind of ugly. For instance, a failing test in > > t3600 now produces: > > > >error: the following files have staged content different from both the > >error: file and the HEAD: > >error: bar.txt > >error: foo.txt > >error: (use -f to force removal) > > > > which seems a bit aggressive. > > I agree that it is ugly, but one reason I was hoping to do this > myself (or have somebody else do it by procrastinating) was that I > thought it may help i18n. That is, for an original > > error(_("we found an error")) > > a .po file may translate the string to a multi-line string that has > LFs in it and the output would look correct. The translator already > can do so by indenting the second and subsequent lines by the same > column-width as "error: " (or its translation in their language, if > we are going to i18n these headers), but that (1) is extra work for > them, and (2) makes it impossible to have the same message appear in > different contexts (i.e. "error:" vs "warning:" that have different > column-widths). Yes, I agree that would be a functional benefit. I'm just hoping we can do it in a way that is visually pleasing. > > It also screws up messages which indent with tabs (the prefix eats > > up some of the tabstop, making the indentation smaller). > > This is unavoidable and at the same time is a non-issue, isn't it? > Messages that indent the second and subsequent lines with tabs are > compensating the lack of the multi-line support of vreportf(), which > this RFC patch fixes. They may need to be adjusted to the new world > order, but that is a good thing. New multi-line messages no longer > have to worry about the prefix that is added only to the first line > when continuing the message to multiple lines. I'm not so sure it is just about compensating. Look at the message quoted above. The original looks like: error: the following files have staged content different from both the file and the HEAD: bar.txt foo.txt (use -f to force removal) The leading whitespace is visually separating the list of files, not just from the line with "error:", but from the other lines. Though I think if we replaced tabs with spaces in this instance, then they would still be bumped relative to the rest of the text. > > It may be possible to address some of that by using some > > other kind of continuation marker (instead of just repeating > > the prefix), and expanding initial tabs. > > Yes indeed. The "some other kind of continuation marker" could just > be a run of spaces that fill the same column as the "error: " or > other prefix given to the first line. I tried that, along with several other variants, but it gets confusing/ugly when mixed with indentation that is significant to the line. For example, the t3600 message becomes: error: the following files have staged content different from both the file and the HEAD: bar.txt foo.txt (use -f to force removal) Which is arguably better than what we have now, but still looks pretty bad to me. I wonder if it would help for the marker to end in a non-whitespace character. Like: error: the following files have staged content different from both the : file and the HEAD: : bar.txt : foo.txt : (use -f to force removal) or something. The ":" is a bit sparse looking. Maybe there are better options. That does ruin line-by-line selection for cut-and-paste, though. So I dunno. I am open to ideas, but I didn't find one that I really liked (this patch is actually a leftover from before Christmas, so I don't even remember all the things I tried, just that I didn't like any of them ;) ). -Peff
Re: [PATCH 2/2] Use 'env' to find perl instead of fixed path
Am 12.01.2017 um 06:51 schrieb Pat Pannuto: diff --git a/git-add--interactive.perl b/git-add--interactive.perl index cf6fc926a..6d7b6c35d 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -1,4 +1,4 @@ -#!/usr/bin/perl +#!/usr/bin/env perl This change, and all others that affect installed external git programs, is a no-go. On Windows, our execve emulation is not complete. It would invoke only `env` (looked up in PATH), but not pass 'perl' as argument. Sorry for the bad news. I would have suggested to set PERL_PATH in your config.mak, but that does not change the generated perl scripts, I think. Perhaps you should implement that? I'm not thrilled about your changes to the test scripts, but I do not expect that they break on Windows. -- Hannes
Re: [PATCH 0/2] Use env for all perl invocations
Pat Pannuto writes: > I spent a little while debugging why git-format-patch refused to believe > that SSL support was installed (Can't locate Net/SMTP/SSL.pm in @INC...) > Turns out that it was installed for my system's preferred /usr/local/bin/perl, > but not for git-format-patch's hard-coded /usr/bin/perl; changing the shebang > allowed git format-patch to work as expected. Isn't that an indication that you are not building correctly? Perhaps $ git grep 'Define PERL_' Makefile $ make PERL_PATH=/usr/local/bin/perl would help?
[PATCH 2/2] Use 'env' to find perl instead of fixed path
Depending on system configuration, /usr/bin/perl may not be the preferred perl interpreter, use /usr/bin/env perl to invoke perl instead to repsect user preferences. In most cases this will not matter as any perl install will work, but for applications such as git-format-patch, which may require plugins such as Net/SMTP/SSL.pm to be installed, it is very confusing when git fails to work after installation because it chose the wrong perl installation. This patch converts all shebangs to use env for consistency across the project. Signed-off-by: Pat Pannuto --- Documentation/build-docdep.perl | 2 +- Documentation/cat-texi.perl | 2 +- Documentation/cmd-list.perl | 2 +- Documentation/fix-texi.perl | 2 +- Documentation/lint-gitlink.perl | 2 +- compat/vcbuild/scripts/clink.pl | 2 +- compat/vcbuild/scripts/lib.pl | 2 +- contrib/buildsystems/engine.pl| 2 +- contrib/buildsystems/generate | 2 +- contrib/buildsystems/parse.pl | 2 +- contrib/contacts/git-contacts | 2 +- contrib/credential/netrc/git-credential-netrc | 2 +- contrib/credential/netrc/test.pl | 2 +- contrib/diff-highlight/diff-highlight | 2 +- contrib/examples/git-remote.perl | 2 +- contrib/examples/git-rerere.perl | 2 +- contrib/examples/git-svnimport.perl | 2 +- contrib/fast-import/git-import.perl | 2 +- contrib/fast-import/import-directories.perl | 2 +- contrib/fast-import/import-tars.perl | 2 +- contrib/hooks/setgitperms.perl| 2 +- contrib/hooks/update-paranoid | 2 +- contrib/long-running-filter/example.pl| 2 +- contrib/mw-to-git/git-mw.perl | 2 +- contrib/mw-to-git/git-remote-mediawiki.perl | 2 +- contrib/mw-to-git/t/test-gitmw.pl | 2 +- contrib/stats/mailmap.pl | 2 +- contrib/stats/packinfo.pl | 2 +- git-add--interactive.perl | 2 +- git-archimport.perl | 2 +- git-cvsexportcommit.perl | 2 +- git-cvsimport.perl| 2 +- git-cvsserver.perl| 2 +- git-difftool.perl | 2 +- git-relink.perl | 2 +- git-send-email.perl | 2 +- git-svn.perl | 2 +- gitweb/gitweb.perl| 2 +- t/Git-SVN/Utils/can_compress.t| 2 +- t/Git-SVN/Utils/fatal.t | 2 +- t/check-non-portable-shell.pl | 2 +- t/gitweb-lib.sh | 2 +- t/perf/aggregate.perl | 2 +- t/perf/min_time.perl | 2 +- t/t0202/test.pl | 2 +- t/t4034/perl/post | 2 +- t/t4034/perl/pre | 2 +- t/t9000/test.pl | 2 +- t/t9500-gitweb-standalone-no-errors.sh| 2 +- t/t9700/test.pl | 2 +- t/test-terminal.perl | 2 +- 51 files changed, 51 insertions(+), 51 deletions(-) diff --git a/Documentation/build-docdep.perl b/Documentation/build-docdep.perl index ba4205e03..dc50f21f3 100755 --- a/Documentation/build-docdep.perl +++ b/Documentation/build-docdep.perl @@ -1,4 +1,4 @@ -#!/usr/bin/perl +#!/usr/bin/env perl my %include = (); my %included = (); diff --git a/Documentation/cat-texi.perl b/Documentation/cat-texi.perl index 1cd28b1b5..98dcc2c42 100755 --- a/Documentation/cat-texi.perl +++ b/Documentation/cat-texi.perl @@ -1,4 +1,4 @@ -#!/usr/bin/perl +#!/usr/bin/env perl use warnings; diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl index ba640a441..f440eebe3 100755 --- a/Documentation/cmd-list.perl +++ b/Documentation/cmd-list.perl @@ -1,4 +1,4 @@ -#!/usr/bin/perl +#!/usr/bin/env perl use warnings; diff --git a/Documentation/fix-texi.perl b/Documentation/fix-texi.perl index c247aece7..61287a0a9 100755 --- a/Documentation/fix-texi.perl +++ b/Documentation/fix-texi.perl @@ -1,4 +1,4 @@ -#!/usr/bin/perl +#!/usr/bin/env perl use warnings; diff --git a/Documentation/lint-gitlink.perl b/Documentation/lint-gitlink.perl index 476cc30b8..3e4b00eda 100755 --- a/Documentation/lint-gitlink.perl +++ b/Documentation/lint-gitlink.perl @@ -1,4 +1,4 @@ -#!/usr/bin/perl +#!/usr/bin/env perl use File::Find; use Getopt::Long; diff --git a/compat/vcbuild/scripts/clink.pl b/compat/vcbuild/scripts/clink.pl index 46eb61c5c..857847703 100755 --- a/compat/vcbuild/scripts/clink.pl +++ b/compat/vcbuild/scripts/clink.pl @@ -1,4 +1,4 @@ -#!/usr/bin/perl +#!/usr/bin/env perl ## #
[PATCH 0/2] Use env for all perl invocations
I spent a little while debugging why git-format-patch refused to believe that SSL support was installed (Can't locate Net/SMTP/SSL.pm in @INC...) Turns out that it was installed for my system's preferred /usr/local/bin/perl, but not for git-format-patch's hard-coded /usr/bin/perl; changing the shebang allowed git format-patch to work as expected. This patch set converts all perl invocations in git to use env so that the user-preferred perl interpreter is always used. Pat Pannuto (2): Convert all 'perl -w' to 'perl' + 'use warnings;' Use 'env' to find perl instead of fixed path Documentation/build-docdep.perl | 2 +- Documentation/cat-texi.perl | 4 +++- Documentation/cmd-list.perl | 4 +++- Documentation/fix-texi.perl | 4 +++- Documentation/lint-gitlink.perl | 2 +- compat/vcbuild/scripts/clink.pl | 3 ++- compat/vcbuild/scripts/lib.pl | 3 ++- contrib/buildsystems/engine.pl| 3 ++- contrib/buildsystems/generate | 3 ++- contrib/buildsystems/parse.pl | 3 ++- contrib/contacts/git-contacts | 2 +- contrib/credential/netrc/git-credential-netrc | 2 +- contrib/credential/netrc/test.pl | 2 +- contrib/diff-highlight/diff-highlight | 2 +- contrib/examples/git-remote.perl | 3 ++- contrib/examples/git-rerere.perl | 2 +- contrib/examples/git-svnimport.perl | 2 +- contrib/fast-import/git-import.perl | 2 +- contrib/fast-import/import-directories.perl | 2 +- contrib/fast-import/import-tars.perl | 2 +- contrib/hooks/setgitperms.perl| 2 +- contrib/hooks/update-paranoid | 2 +- contrib/long-running-filter/example.pl| 2 +- contrib/mw-to-git/git-mw.perl | 2 +- contrib/mw-to-git/git-remote-mediawiki.perl | 2 +- contrib/mw-to-git/t/test-gitmw.pl | 5 - contrib/stats/mailmap.pl | 2 +- contrib/stats/packinfo.pl | 2 +- git-add--interactive.perl | 2 +- git-archimport.perl | 2 +- git-cvsexportcommit.perl | 2 +- git-cvsimport.perl| 2 +- git-cvsserver.perl| 2 +- git-difftool.perl | 2 +- git-relink.perl | 2 +- git-send-email.perl | 2 +- git-svn.perl | 2 +- gitweb/gitweb.perl| 2 +- t/Git-SVN/Utils/can_compress.t| 2 +- t/Git-SVN/Utils/fatal.t | 2 +- t/check-non-portable-shell.pl | 2 +- t/gitweb-lib.sh | 2 +- t/perf/aggregate.perl | 2 +- t/perf/min_time.perl | 2 +- t/t0202/test.pl | 2 +- t/t4034/perl/post | 2 +- t/t4034/perl/pre | 2 +- t/t9000/test.pl | 2 +- t/t9500-gitweb-standalone-no-errors.sh| 2 +- t/t9700/test.pl | 2 +- t/test-terminal.perl | 2 +- 51 files changed, 66 insertions(+), 51 deletions(-) -- 2.11.0
[PATCH 1/2] Convert all 'perl -w' to 'perl' + 'use warnings;'
This commit is in preparation for converting all shebangs to use 'env' instead of a fixed perl path, which will not allow for arguments to 'perl'. Signed-off-by: Pat Pannuto --- Documentation/cat-texi.perl | 4 +++- Documentation/cmd-list.perl | 4 +++- Documentation/fix-texi.perl | 4 +++- compat/vcbuild/scripts/clink.pl | 3 ++- compat/vcbuild/scripts/lib.pl | 3 ++- contrib/buildsystems/engine.pl| 3 ++- contrib/buildsystems/generate | 3 ++- contrib/buildsystems/parse.pl | 3 ++- contrib/examples/git-remote.perl | 3 ++- contrib/mw-to-git/t/test-gitmw.pl | 5 - 10 files changed, 25 insertions(+), 10 deletions(-) diff --git a/Documentation/cat-texi.perl b/Documentation/cat-texi.perl index 87437f8a9..1cd28b1b5 100755 --- a/Documentation/cat-texi.perl +++ b/Documentation/cat-texi.perl @@ -1,4 +1,6 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl + +use warnings; my @menu = (); my $output = $ARGV[0]; diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl index 5aa73cfe4..ba640a441 100755 --- a/Documentation/cmd-list.perl +++ b/Documentation/cmd-list.perl @@ -1,4 +1,6 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl + +use warnings; use File::Compare qw(compare); diff --git a/Documentation/fix-texi.perl b/Documentation/fix-texi.perl index ff7d78f62..c247aece7 100755 --- a/Documentation/fix-texi.perl +++ b/Documentation/fix-texi.perl @@ -1,4 +1,6 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl + +use warnings; while (<>) { if (/^\@setfilename/) { diff --git a/compat/vcbuild/scripts/clink.pl b/compat/vcbuild/scripts/clink.pl index a87d0da51..46eb61c5c 100755 --- a/compat/vcbuild/scripts/clink.pl +++ b/compat/vcbuild/scripts/clink.pl @@ -1,4 +1,4 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl ## # Compiles or links files # @@ -10,6 +10,7 @@ # Copyright (C) 2009 Marius Storm-Olsen ## use strict; +use warnings; my @args = (); my @cflags = (); my $is_linking = 0; diff --git a/compat/vcbuild/scripts/lib.pl b/compat/vcbuild/scripts/lib.pl index d8054e469..e571b8470 100755 --- a/compat/vcbuild/scripts/lib.pl +++ b/compat/vcbuild/scripts/lib.pl @@ -1,4 +1,4 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl ## # Libifies files on Windows # @@ -10,6 +10,7 @@ # Copyright (C) 2009 Marius Storm-Olsen ## use strict; +use warnings; my @args = (); while (@ARGV) { my $arg = shift @ARGV; diff --git a/contrib/buildsystems/engine.pl b/contrib/buildsystems/engine.pl index 23da787dc..a173669ce 100755 --- a/contrib/buildsystems/engine.pl +++ b/contrib/buildsystems/engine.pl @@ -1,4 +1,4 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl ## # Do not call this script directly! # @@ -8,6 +8,7 @@ # Copyright (C) 2009 Marius Storm-Olsen ## use strict; +use warnings; use File::Basename; use File::Spec; use Cwd; diff --git a/contrib/buildsystems/generate b/contrib/buildsystems/generate index bc10f25ff..9af89454a 100755 --- a/contrib/buildsystems/generate +++ b/contrib/buildsystems/generate @@ -1,4 +1,4 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl ## # Generate buildsystem files # @@ -19,6 +19,7 @@ # Copyright (C) 2009 Marius Storm-Olsen ## use strict; +use warnings; use File::Basename; use Cwd; diff --git a/contrib/buildsystems/parse.pl b/contrib/buildsystems/parse.pl index c9656ece9..33ca89eb0 100755 --- a/contrib/buildsystems/parse.pl +++ b/contrib/buildsystems/parse.pl @@ -1,4 +1,4 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl ## # Do not call this script directly! # @@ -8,6 +8,7 @@ # Copyright (C) 2009 Marius Storm-Olsen ## use strict; +use warnings; use File::Basename; use Cwd; diff --git a/contrib/examples/git-remote.perl b/contrib/examples/git-remote.perl index d42df7b41..5bf3ffd4c 100755 --- a/contrib/examples/git-remote.perl +++ b/contrib/examples/git-remote.perl @@ -1,6 +1,7 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl use strict; +use warnings; use Git; my $git = Git->repository(); diff --git a/contrib/mw-to-git/t/test-gitmw.pl b/contrib/mw-to-git/t/test-gitmw.pl index 0ff76259f..8d0e7c078 100755 --- a/contrib/mw-to-git/t/test-gitmw.pl +++ b/contrib/mw-to-git/t/test-gitmw.pl @@ -1,4 +1,4 @@ -#!/usr/bin/perl -w -s +#!/usr/bin/perl # Copyright (C) 2012 # Charles Roussel # Simon Cathebras @@ -22,6 +22,9 @@ # "edit_page" # "getallpagename" +use strict; +use warnin
Re: [ANNOUNCE] git-test: run automated tests against a range of Git commits
On 01/10/2017 10:35 AM, Jeff King wrote: > On Sun, Jan 08, 2017 at 10:52:25AM +0100, Michael Haggerty wrote: >> [...] Since my last email, I have implemented a bunch of what we discussed [1]. Because of the new semantics, I also *renamed the main command* from git test range [...] to git test run [...] >> I'm thinking of maybe >> >> * If an argument matches `*..*`, pass it to `rev-list` (like now). This was already implemented. >> * Otherwise, treat each argument as a single commit/tree (i.e., pass it >> to `rev-parse`). This is now implemented, too. Plus, it is now allowed to specify multiple commits and/or ranges in a single invocation. >> * If no argument is specified, test `@{u}..` (assuming that an >> upstream is configured). Though actually, this won't be as >> convenient as it sounds, because (a) `git test` is often run >> in a separate worktree, and (2) on errors, it currently leaves the >> repository with a detached `HEAD`. I decided that if no argument is specified, it makes more sense to test HEAD if the working tree is clean, and the contents of the working tree otherwise. In the latter case no results are stored to notes, but it is still useful to be able to run a preconfigured test quickly while working. These are both implemented. >> * Support a `--stdin` option, to read a list of commits/trees to test >> from standard input. By this mechanism, users could use arbitrary >> `rev-list` commands to choose what to test. This is now implemented, too. > [...] >> I think ideally `git notes add` would look for pre-existing notes, and: >> >> * If none are found, create an empty notes reference. >> >> * If pre-existing notes are found and there was no existing test with >> that name, probably just leave the old notes in place. >> >> * If pre-existing notes are found and there was already a test with >> that name but a different command, perhaps insist that the user >> decide explicitly whether to forget the old results or continue using >> them. This might help users avoid the mistake of re-using old results >> even if they change the manner of testing. > > I'm not quite sure what you mean here. By "test" and "command", do you > mean the test name that is used in the notes ref, and the command that > it is defined as? By "test", I mean a test as configured in `git-test` and referred to by its name (e.g., in `--test=name`). Currently the only configuration for a test is `test..command`, but I structured the namespace that way to leave room to add more configuration in the future. > In the notes-cache.c subsystem, the commit message stores a validity > token which must match in order to use the cache. You could do something > similar here (store the executed command in the commit message, and > invalidate the cache the user has changed the command). The notes-cache > stuff isn't available outside of the C code, though. You could either > expose it, or just do something similar longhand. > > Thinking about it, though, I did notice that the tree sha1 is not the > only input to the cache. Things like config.mak (not to mention the > system itself) contribute to the test results. So no system will ever be > perfect, and it seems like just making an easy way to force a retest (or > just invalidate the whole cache) would be sufficient. It's true that the tree and the test command are not the only inputs to the testing machinery. I wasn't hoping to build a watertight system to prevent accidental use of old results when `config.mak` or something else in the environment has changed. I was only trying to give the user a reminder that if they change the command, it's a good time to consider whether the old results have to be invalidated. I suppose if we wanted to make this system more watertight, we could let the test configuration specify the names of other files on which its results depend; for example, test.default.command = make -j16 test test.default.auxiliaryInput = config.mak and include some kind of hash of the auxiliary inputs in a validity token. But that feels like overkill. > [...] >> Yeah, this is awkward, not only because many people don't know what to >> make of detached HEAD, but also because it makes it awkward in general >> to use `git test` in your main working directory. I didn't model this >> behavior on `git rebase --interactive`'s `edit` command, because I >> rarely use that. But I can see how they would fit together pretty well >> for people who like that workflow. > > Yeah, after sleeping on it, I think it's best if "git test" remains > separate from that. It's primary function is to run the test, possibly > serving up a cached answer. So it would be perfectly reasonable to do: > > git rebase -x 'git test range HEAD' > > to accomplish the interactive testing (though perhaps just "git test" > would be a nice synonym for that). That invocation can now be written `git test run`, which is a bit shorter. There could be a special
[PATCH 1/5] doc: add documentation for OPT_STRING_LIST
From: Jacob Keller Commit c8ba16391655 ("parse-options: add OPT_STRING_LIST helper", 2011-06-09) added the OPT_STRING_LIST as a way to accumulate a repeated list of strings. However, this was not documented in the api-parse-options documentation. Add documentation now so that future developers may learn of its existence. Signed-off-by: Jacob Keller --- Documentation/technical/api-parse-options.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt index 27bd701c0d68..15e876e4c804 100644 --- a/Documentation/technical/api-parse-options.txt +++ b/Documentation/technical/api-parse-options.txt @@ -168,6 +168,11 @@ There are some macros to easily define options: Introduce an option with string argument. The string argument is put into `str_var`. +`OPT_STRING_LIST(short, long, &list, arg_str, description)`:: + Introduce an option with a string argument. Repeated invocations + accumulate into a list of strings. Reset and clear the list with + `--no-option`. + `OPT_INTEGER(short, long, &int_var, description)`:: Introduce an option with integer argument. The integer is put into `int_var`. -- 2.11.0.403.g196674b8396b
[PATCH 4/5] describe: teach --match to accept multiple patterns
From: Jacob Keller Teach `--match` to be accepted multiple times, accumulating a list of patterns to match into a string list. Each pattern is inclusive, such that a tag need only match one of the provided patterns to be considered for matching. This extension is useful as it enables more flexibility in what tags match, and may avoid the need to run the describe command multiple times to get the same result. Add tests and update the documentation for this change. Signed-off-by: Jacob Keller --- Documentation/git-describe.txt | 5 - builtin/describe.c | 30 +++--- t/t6120-describe.sh| 19 +++ 3 files changed, 46 insertions(+), 8 deletions(-) diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt index e4ac448ff565..7ad41e2f6ade 100644 --- a/Documentation/git-describe.txt +++ b/Documentation/git-describe.txt @@ -83,7 +83,10 @@ OPTIONS --match :: Only consider tags matching the given `glob(7)` pattern, excluding the "refs/tags/" prefix. This can be used to avoid - leaking private tags from the repository. + leaking private tags from the repository. If given multiple times, a + list of patterns will be accumulated, and tags matching any of the + patterns will be considered. Use `--no-match` to clear and reset the + list of patterns. --always:: Show uniquely abbreviated commit object as fallback. diff --git a/builtin/describe.c b/builtin/describe.c index 01490a157efc..5cc9e9abe798 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -28,7 +28,7 @@ static int abbrev = -1; /* unspecified */ static int max_candidates = 10; static struct hashmap names; static int have_util; -static const char *pattern; +static struct string_list patterns = STRING_LIST_INIT_NODUP; static int always; static const char *dirty; @@ -129,9 +129,24 @@ static int get_name(const char *path, const struct object_id *oid, int flag, voi if (!all && !is_tag) return 0; - /* Accept only tags that match the pattern, if given */ - if (pattern && (!is_tag || wildmatch(pattern, path + 10, 0, NULL))) - return 0; + /* +* If we're given patterns, accept only tags which match at least one +* pattern. +*/ + if (patterns.nr) { + struct string_list_item *item; + + if (!is_tag) + return 0; + + for_each_string_list_item(item, &patterns) { + if (!wildmatch(item->string, path + 10, 0, NULL)) + break; + + /* If we get here, no pattern matched. */ + return 0; + } + } /* Is it annotated? */ if (!peel_ref(path, peeled.hash)) { @@ -404,7 +419,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix) N_("only output exact matches"), 0), OPT_INTEGER(0, "candidates", &max_candidates, N_("consider most recent tags (default: 10)")), - OPT_STRING(0, "match", &pattern, N_("pattern"), + OPT_STRING_LIST(0, "match", &patterns, N_("pattern"), N_("only consider tags matching ")), OPT_BOOL(0, "always",&always, N_("show abbreviated commit object as fallback")), @@ -430,6 +445,7 @@ int cmd_describe(int argc, const char **argv, const char *prefix) die(_("--long is incompatible with --abbrev=0")); if (contains) { + struct string_list_item *item; struct argv_array args; argv_array_init(&args); @@ -440,8 +456,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix) argv_array_push(&args, "--always"); if (!all) { argv_array_push(&args, "--tags"); - if (pattern) - argv_array_pushf(&args, "--refs=refs/tags/%s", pattern); + for_each_string_list_item(item, &patterns) + argv_array_pushf(&args, "--refs=refs/tags/%s", item->string); } if (argc) argv_array_pushv(&args, argv); diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh index 85f269411cb3..9e5db9b87a1f 100755 --- a/t/t6120-describe.sh +++ b/t/t6120-describe.sh @@ -182,6 +182,10 @@ check_describe "test2-lightweight-*" --tags --match="test2-*" check_describe "test2-lightweight-*" --long --tags --match="test2-*" HEAD^ +check_describe "test1-lightweight-*" --long --tags --match="test1-*" --match="test2-*" HEAD^ + +check_describe "test2-lightweight-*" --long --tags --match="test1-*" --no-match --match="test2-*" HEAD^ + test_expect_success 'name-rev with exact tags' ' ech
[PATCH 3/5] name-rev: add support to discard refs by pattern match
From: Jacob Keller Extend name-rev further to support matching refs by adding `--discard` patterns. These patterns will limit the scope of refs by discarding any ref that matches at least one discard pattern. Checking the discard refs shall happen first, before checking the include --refs patterns. This will allow more flexibility to matching certain kinds of references. Add tests and update Documentation for this change. Signed-off-by: Jacob Keller --- Documentation/git-name-rev.txt | 7 +++ builtin/name-rev.c | 14 +- t/t6007-rev-list-cherry-pick-file.sh | 7 +++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/Documentation/git-name-rev.txt b/Documentation/git-name-rev.txt index 7433627db12d..9b46e5ea9aae 100644 --- a/Documentation/git-name-rev.txt +++ b/Documentation/git-name-rev.txt @@ -30,6 +30,13 @@ OPTIONS given multiple times, use refs whose names match any of the given shell patterns. Use `--no-refs` to clear any previous ref patterns given. +--discard=:: + Do not use any ref whose name matches a given shell pattern. The + pattern can be one of branch name, tag name or fully qualified ref + name. If given multiple times, discard refs that match any of the given + shell patterns. Use `--no-discards` to clear the list of discard + patterns. + --all:: List all commits reachable from all refs diff --git a/builtin/name-rev.c b/builtin/name-rev.c index 000a2a700ed3..86479c17a7c9 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -109,6 +109,7 @@ struct name_ref_data { int tags_only; int name_only; struct string_list ref_filters; + struct string_list discard_filters; }; static struct tip_table { @@ -150,6 +151,15 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo if (data->tags_only && !starts_with(path, "refs/tags/")) return 0; + if (data->discard_filters.nr) { + struct string_list_item *item; + + for_each_string_list_item(item, &data->discard_filters) { + if (subpath_matches(path, item->string) >= 0) + return 0; + } + } + if (data->ref_filters.nr) { struct string_list_item *item; int matched = 0; @@ -323,12 +333,14 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) { struct object_array revs = OBJECT_ARRAY_INIT; int all = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0; - struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP }; + struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP }; struct option opts[] = { OPT_BOOL(0, "name-only", &data.name_only, N_("print only names (no SHA-1)")), OPT_BOOL(0, "tags", &data.tags_only, N_("only use tags to name the commits")), OPT_STRING_LIST(0, "refs", &data.ref_filters, N_("pattern"), N_("only use refs matching ")), + OPT_STRING_LIST(0, "discard", &data.discard_filters, N_("pattern"), + N_("ignore refs matching ")), OPT_GROUP(""), OPT_BOOL(0, "all", &all, N_("list all commits reachable from all refs")), OPT_BOOL(0, "stdin", &transform_stdin, N_("read from stdin")), diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh index d072ec43b016..8a4c35f6ffee 100755 --- a/t/t6007-rev-list-cherry-pick-file.sh +++ b/t/t6007-rev-list-cherry-pick-file.sh @@ -118,6 +118,13 @@ test_expect_success 'name-rev --refs excludes non-matched patterns' ' test_cmp actual.named expect ' +test_expect_success 'name-rev --discard excludes matched patterns' ' + git rev-list --left-right --cherry-pick F...E -- bar > actual && + git name-rev --stdin --name-only --refs="*tags/*" --discard="*E" \ + < actual > actual.named && + test_cmp actual.named expect +' + cat >expect <
[PATCH 5/5] describe: teach describe negative pattern matches
From: Jacob Keller Teach git-describe the `--discard` option which will allow specifying a glob pattern of tags to ignore. This can be combined with the `--match` patterns to enable more flexibility in determining which tags to consider. For example, suppose you wish to find the first official release tag that contains a certain commit. If we assume that official release tags are of the form "v*" and pre-release candidates include "*rc*" in their name, we can now find the first tag that introduces commit abcdef via: git describe --contains --match="v*" --discard="*rc*" Add documentation and tests for this change. Signed-off-by: Jacob Keller --- Documentation/git-describe.txt | 8 builtin/describe.c | 21 + t/t6120-describe.sh| 8 3 files changed, 37 insertions(+) diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt index 7ad41e2f6ade..a89bbde207b2 100644 --- a/Documentation/git-describe.txt +++ b/Documentation/git-describe.txt @@ -88,6 +88,14 @@ OPTIONS patterns will be considered. Use `--no-match` to clear and reset the list of patterns. +--discard :: + Do not consider tags matching the given `glob(7)` pattern, excluding + the "refs/tags/" prefix. This can be used to narrow the tag space and + find only tags matching some meaningful criteria. If given multiple + times, a list of patterns will be accumulated and tags matching any + of the patterns will be discarded. Use `--no-discard` to clear and + reset the list of patterns. + --always:: Show uniquely abbreviated commit object as fallback. diff --git a/builtin/describe.c b/builtin/describe.c index 5cc9e9abe798..c09288ee6321 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -29,6 +29,7 @@ static int max_candidates = 10; static struct hashmap names; static int have_util; static struct string_list patterns = STRING_LIST_INIT_NODUP; +static struct string_list discard_patterns = STRING_LIST_INIT_NODUP; static int always; static const char *dirty; @@ -130,6 +131,22 @@ static int get_name(const char *path, const struct object_id *oid, int flag, voi return 0; /* +* If we're given discard patterns, first discard any tag which match +* any of the discard pattern. +*/ + if (discard_patterns.nr) { + struct string_list_item *item; + + if (!is_tag) + return 0; + + for_each_string_list_item(item, &discard_patterns) { + if (!wildmatch(item->string, path + 10, 0, NULL)) + return 0; + } + } + + /* * If we're given patterns, accept only tags which match at least one * pattern. */ @@ -421,6 +438,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix) N_("consider most recent tags (default: 10)")), OPT_STRING_LIST(0, "match", &patterns, N_("pattern"), N_("only consider tags matching ")), + OPT_STRING_LIST(0, "discard", &discard_patterns, N_("pattern"), + N_("do not consider tags matching ")), OPT_BOOL(0, "always",&always, N_("show abbreviated commit object as fallback")), {OPTION_STRING, 0, "dirty", &dirty, N_("mark"), @@ -458,6 +477,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix) argv_array_push(&args, "--tags"); for_each_string_list_item(item, &patterns) argv_array_pushf(&args, "--refs=refs/tags/%s", item->string); + for_each_string_list_item(item, &discard_patterns) + argv_array_pushf(&args, "--discard=refs/tags/%s", item->string); } if (argc) argv_array_pushv(&args, argv); diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh index 9e5db9b87a1f..4e4a9f2e5305 100755 --- a/t/t6120-describe.sh +++ b/t/t6120-describe.sh @@ -218,6 +218,14 @@ test_expect_success 'describe --contains and --match' ' test_cmp expect actual ' +test_expect_success 'describe --discard' ' + echo "c~1" >expect && + tagged_commit=$(git rev-parse "refs/tags/A^0") && + test_must_fail git describe --contains --match="B" $tagged_commit && + git describe --contains --match="?" --discard="A" $tagged_commit >actual && + test_cmp expect actual +' + test_expect_success 'describe --contains and --no-match' ' echo "A^0" >expect && tagged_commit=$(git rev-parse "refs/tags/A^0") && -- 2.11.0.403.g196674b8396b
[PATCH 0/5] extend git-describe pattern matching
From: Jacob Keller Teach git describe and git name-rev the ability to match multiple patterns inclusively. Additionally, teach these commands to also accept negative patterns to discard any refs which match. The pattern lists for positive and negative patterns are inclusive. This means that for the positive patterns, a reference will be considered as long as it matches at least one of the match patterns. It need not match all given patterns. Additionally for negative patterns, we will not consider any ref which matches any negative pattern, even if it matches one of the positive patterns. Together this allows the ability to express far more sets of tags than a single match pattern alone. It does not provide quite the same depth as would teaching full regexp but it is simpler and easy enough to understand. This is a re-send of a series from a month or so ago, I've since re-based this on next since it appears that it was not picked up before. Jacob Keller (5): doc: add documentation for OPT_STRING_LIST name-rev: extend --refs to accept multiple patterns name-rev: add support to discard refs by pattern match describe: teach --match to accept multiple patterns describe: teach describe negative pattern matches Documentation/git-describe.txt| 13 ++- Documentation/git-name-rev.txt| 11 +- Documentation/technical/api-parse-options.txt | 5 +++ builtin/describe.c| 51 ++ builtin/name-rev.c| 53 +-- t/t6007-rev-list-cherry-pick-file.sh | 37 +++ t/t6120-describe.sh | 27 ++ 7 files changed, 176 insertions(+), 21 deletions(-) -- 2.11.0.403.g196674b8396b
[PATCH 2/5] name-rev: extend --refs to accept multiple patterns
From: Jacob Keller Teach git name-rev to take a string list of patterns from --refs instead of only a single pattern. The list of patterns will be matched inclusively, such that a ref only needs to match one pattern to be included. If a ref will only be excluded if it does not match any of the patterns. Add tests and documentation for this change. Signed-off-by: Jacob Keller --- Documentation/git-name-rev.txt | 4 +++- builtin/name-rev.c | 41 +--- t/t6007-rev-list-cherry-pick-file.sh | 30 ++ 3 files changed, 62 insertions(+), 13 deletions(-) diff --git a/Documentation/git-name-rev.txt b/Documentation/git-name-rev.txt index ca28fb8e2a07..7433627db12d 100644 --- a/Documentation/git-name-rev.txt +++ b/Documentation/git-name-rev.txt @@ -26,7 +26,9 @@ OPTIONS --refs=:: Only use refs whose names match a given shell pattern. The pattern - can be one of branch name, tag name or fully qualified ref name. + can be one of branch name, tag name or fully qualified ref name. If + given multiple times, use refs whose names match any of the given shell + patterns. Use `--no-refs` to clear any previous ref patterns given. --all:: List all commits reachable from all refs diff --git a/builtin/name-rev.c b/builtin/name-rev.c index cd89d48b65e8..000a2a700ed3 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -108,7 +108,7 @@ static const char *name_ref_abbrev(const char *refname, int shorten_unambiguous) struct name_ref_data { int tags_only; int name_only; - const char *ref_filter; + struct string_list ref_filters; }; static struct tip_table { @@ -150,16 +150,33 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo if (data->tags_only && !starts_with(path, "refs/tags/")) return 0; - if (data->ref_filter) { - switch (subpath_matches(path, data->ref_filter)) { - case -1: /* did not match */ - return 0; - case 0: /* matched fully */ - break; - default: /* matched subpath */ - can_abbreviate_output = 1; - break; + if (data->ref_filters.nr) { + struct string_list_item *item; + int matched = 0; + + /* See if any of the patterns match. */ + for_each_string_list_item(item, &data->ref_filters) { + /* +* We want to check every pattern even if we already +* found a match, just in case one of the later +* patterns could abbreviate the output. +*/ + switch (subpath_matches(path, item->string)) { + case -1: /* did not match */ + break; + case 0: /* matched fully */ + matched = 1; + break; + default: /* matched subpath */ + matched = 1; + can_abbreviate_output = 1; + break; + } } + + /* If none of the patterns matched, stop now */ + if (!matched) + return 0; } add_to_tip_table(oid->hash, path, can_abbreviate_output); @@ -306,11 +323,11 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) { struct object_array revs = OBJECT_ARRAY_INIT; int all = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0; - struct name_ref_data data = { 0, 0, NULL }; + struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP }; struct option opts[] = { OPT_BOOL(0, "name-only", &data.name_only, N_("print only names (no SHA-1)")), OPT_BOOL(0, "tags", &data.tags_only, N_("only use tags to name the commits")), - OPT_STRING(0, "refs", &data.ref_filter, N_("pattern"), + OPT_STRING_LIST(0, "refs", &data.ref_filters, N_("pattern"), N_("only use refs matching ")), OPT_GROUP(""), OPT_BOOL(0, "all", &all, N_("list all commits reachable from all refs")), diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh index 1408b608eb03..d072ec43b016 100755 --- a/t/t6007-rev-list-cherry-pick-file.sh +++ b/t/t6007-rev-list-cherry-pick-file.sh @@ -99,6 +99,36 @@ test_expect_success '--cherry-pick bar does not come up empty (II)' ' test_cmp actual.named expect ' +test_expect_success 'name-rev multiple --refs combine inclusive' ' + git rev-list --left-right --cherry-pick F...E -- bar > actual && + git name-rev
[PATCHv2 4/4] unpack-trees: support super-prefix option
In the future we want to support working tree operations within submodules, e.g. "git checkout --recurse-submodules", which will update the submodule to the commit as recorded in its superproject. In the submodule the unpack-tree operation is carried out as usual, but the reporting to the user needs to prefix any path with the superproject. The mechanism for this is the super-prefix. (see 74866d757, git: make super-prefix option) Add support for the super-prefix option for commands that unpack trees by wrapping any path output in unpacking trees in the newly introduced super_prefixed function. This new function prefixes any path with the super-prefix if there is one. Assuming the submodule case doesn't happen in the majority of the cases, we'd want to have a fast behavior for no super prefix, i.e. no reallocation/copying, but just returning path. Another aspect of introducing the `super_prefixed` function is to consider who owns the memory and if this is the right place where the path gets modified. As the super prefix ought to change the output behavior only and not the actual unpack tree part, it is fine to be that late in the line. As we get passed in 'const char *path', we cannot change the path itself, which means in case of a super prefix we have to copy over the path. We need two static buffers in that function as the error messages contain at most two paths. For testing purposes enable it in read-tree, which has no output of paths other than an unpack-trees.c. These are all converted in this patch. Signed-off-by: Stefan Beller --- This is only patchv4 that is rerolled, patches 1-3 remain as is. The test uses '\'' now, and the super_prefixed function is rewritten using the flow Junio suggested. The commit message got enhanced. Thanks, Stefan git.c | 2 +- t/t1001-read-tree-m-2way.sh | 8 unpack-trees.c | 38 +++--- 3 files changed, 44 insertions(+), 4 deletions(-) diff --git a/git.c b/git.c index dce529fcbf..acbabd1298 100644 --- a/git.c +++ b/git.c @@ -471,7 +471,7 @@ static struct cmd_struct commands[] = { { "prune-packed", cmd_prune_packed, RUN_SETUP }, { "pull", cmd_pull, RUN_SETUP | NEED_WORK_TREE }, { "push", cmd_push, RUN_SETUP }, - { "read-tree", cmd_read_tree, RUN_SETUP }, + { "read-tree", cmd_read_tree, RUN_SETUP | SUPPORT_SUPER_PREFIX}, { "receive-pack", cmd_receive_pack }, { "reflog", cmd_reflog, RUN_SETUP }, { "remote", cmd_remote, RUN_SETUP }, diff --git a/t/t1001-read-tree-m-2way.sh b/t/t1001-read-tree-m-2way.sh index 7b70089705..5ededd8e40 100755 --- a/t/t1001-read-tree-m-2way.sh +++ b/t/t1001-read-tree-m-2way.sh @@ -363,6 +363,14 @@ test_expect_success 'a/b (untracked) vs a, plus c/d case test.' ' test -f a/b ' +test_expect_success 'read-tree supports the super-prefix' ' + cat <<-EOF >expect && + error: Updating '\''fictional/a'\'' would lose untracked files in it + EOF + test_must_fail git --super-prefix fictional/ read-tree -u -m "$treeH" "$treeM" 2>actual && + test_cmp expect actual +' + test_expect_success 'a/b vs a, plus c/d case setup.' ' rm -f .git/index && rm -fr a && diff --git a/unpack-trees.c b/unpack-trees.c index 7a6df99d10..0a24472359 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -52,6 +52,36 @@ static const char *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = { ? ((o)->msgs[(type)]) \ : (unpack_plumbing_errors[(type)]) ) +static const char *super_prefixed(const char *path) +{ + static struct strbuf buf[2] = {STRBUF_INIT, STRBUF_INIT}; + static int super_prefix_len = -1; + static unsigned idx = 0; + + if (super_prefix_len < 0) { + if (!get_super_prefix()) + super_prefix_len = 0; + else { + int i; + + super_prefix_len = strlen(get_super_prefix()); + for (i = 0; i < ARRAY_SIZE(buf); i++) + strbuf_addstr(&buf[i], get_super_prefix()); + } + } + + if (!super_prefix_len) + return path; + + if (++idx >= ARRAY_SIZE(buf)) + idx = 0; + + strbuf_setlen(&buf[idx], super_prefix_len); + strbuf_addstr(&buf[idx], path); + + return buf[idx].buf; +} + void setup_unpack_trees_porcelain(struct unpack_trees_options *opts, const char *cmd) { @@ -172,7 +202,7 @@ static int add_rejected_path(struct unpack_trees_options *o, const char *path) { if (!o->show_all_errors) - return error(ERRORMSG(o, e), path); + return error(ERRORMSG(o, e), super_prefixed(path)); /* * Otherwise, insert in a list for future display by @@ -196,7 +226,7 @@ static void display_error_msgs(struct unpack
Re: [PATCH 4/4] unpack-trees: support super-prefix option
On Wed, Jan 11, 2017 at 3:28 PM, Junio C Hamano wrote: > Stefan Beller writes: > >>> Preparing the expected output "expect" outside test_expect_success >>> block is also old-school. Move it inside the new test? >> >> I looked into that. What is our current stance on using single/double quotes? > > Using dq around executable part, i.e. > > test_expect_success title " > echo \"'foo'\" > " > > is a million-times more grave sin even if the test initially does > not refer to any variable in its body. Somebody will eventually > need to add a line that refers to a variable and either forgets to > quote \$ in front or properly quotes it. agreed. > The former makes the > variable interpolated while the arguments to the test_expect_success > is prepared (which is a bug) and the latter makes the result quite > ugly, like so: > > test_expect_success title " > sq=\' var=foo && > echo \"\${sq}\$value\${sq}\" > " > > Enclosing the body always in sq-pair does mean something ugly like > this from the beginning once you need to use sq inside: > > test_expect_success title ' > sq='\'' && > echo "${sq}foo${sq}" > ' This one fails error: bug in the test script: broken &&-chain: sq=' && echo "${sq}foo${sq}" both other occurrences of using ${sq} are defining sq outside (one as sq=\' and the other as sq="'") So I think we either have to keep sq out of the test, sq="'" test_expect_success title ' echo "${sq}foo${sq}" ' or do the echo/printf trick inside: test_expect_success title ' sq=$(echo -e "\x27") && echo "${sq}foo${sq}" ' Eh, I just realize the '\'' works inside here-doc, too. Nevermind then, I'll resend it with just quoted sq in the here-doc then. Thanks, Stefan
Re: [PATCH v10 19/20] branch: use ref-filter printing APIs
On Tue, Jan 10, 2017 at 12:49 AM, Karthik Nayak wrote: > diff --git a/builtin/branch.c b/builtin/branch.c > index 34cd61cd9..f293ee5b0 100644 > --- a/builtin/branch.c > +++ b/builtin/branch.c > @@ -37,11 +37,11 @@ static unsigned char head_sha1[20]; > static int branch_use_color = -1; > static char branch_colors[][COLOR_MAXLEN] = { > GIT_COLOR_RESET, > - GIT_COLOR_NORMAL, /* PLAIN */ > - GIT_COLOR_RED, /* REMOTE */ > - GIT_COLOR_NORMAL, /* LOCAL */ > - GIT_COLOR_GREEN,/* CURRENT */ > - GIT_COLOR_BLUE, /* UPSTREAM */ > + GIT_COLOR_NORMAL, /* PLAIN */ > + GIT_COLOR_RED, /* REMOTE */ > + GIT_COLOR_NORMAL, /* LOCAL */ > + GIT_COLOR_GREEN,/* CURRENT */ > + GIT_COLOR_BLUE, /* UPSTREAM */ > }; What's... actually changing here? It looks like just white space? Is there a strong reason for why this is changing? Thanks, Jake
Re: [PATCH 4/4] unpack-trees: support super-prefix option
Stefan Beller writes: >> Preparing the expected output "expect" outside test_expect_success >> block is also old-school. Move it inside the new test? > > I looked into that. What is our current stance on using single/double quotes? Using dq around executable part, i.e. test_expect_success title " echo \"'foo'\" " is a million-times more grave sin even if the test initially does not refer to any variable in its body. Somebody will eventually need to add a line that refers to a variable and either forgets to quote \$ in front or properly quotes it. The former makes the variable interpolated while the arguments to the test_expect_success is prepared (which is a bug) and the latter makes the result quite ugly, like so: test_expect_success title " sq=\' var=foo && echo \"\${sq}\$value\${sq}\" " Enclosing the body always in sq-pair does mean something ugly like this from the beginning once you need to use sq inside: test_expect_success title ' sq='\'' && echo "${sq}foo${sq}" ' or test_expect_success title ' echo "'\''foo'\''" ' but the ugliness is localized to places where single quote is absolutely needed, and '\'' is something eyes soon get used to as an idiom [*1*]. It does not affect every reference of variables like enclosing with dq does. [Footnote] *1* That "idiom"-ness is the reason why the last example above is not written like this: test_expect_success title ' echo "'\'foo\''" ' even though the sq pair surrounding "foo" is not technically necessary. It lets us think of the string as almost always is inside sq context, with the single exception that we step outside sq context and append a sq by saying bs-sq (\').
Re: [PATCH 4/4] unpack-trees: support super-prefix option
On Wed, Jan 11, 2017 at 1:32 PM, Junio C Hamano wrote: > Stefan Beller writes: > >> Add support for the super-prefix option for commands that unpack trees. >> For testing purposes enable it in read-tree, which has no other path >> related output. > > "path related output"? I am not sure I understand this. Well, s/path related output/output of paths/. > When read-tree reads N trees, or unpack_trees() is asked to "merge" > N trees, how does --super-prefix supposed to interact with the paths > in these trees? Does the prefix added to paths in all trees? Internally the super-prefix is ignored. Only the (input and) output is using that super prefix for messages. It was introduced for grepping recursively into submodules, i.e. invoke as git grep --recurse-submodules \ -e path/inside/submodule/and/further/down # internally it invokes: git -C .. --super-prefix .. grep .. # which operates "just normal" except for the # input parsing and output The use case for this patch is working tree related things, i.e. git checkout --recurse-submodules # internally when recursing we call "git read-tree -u", but # reporting could be: Your local changes to the following files would be overwritten by checkout: path/inside/submodule/file.txt > > Did you mean that read-tree (and unpack_trees() in general) is a > whole-tree operation and there is no path related input (i.e. > pathspec limiting), but if another command that started in the > superproject is running us in its submodule, it wants us to prefix > the path to the submodule when we report errors? Yes. I tried to explain it better above, but you got it here. > > If that is what is going on, I think it makes sense, but it may be > asking too much from the readers to guess that from "Add support for > the super-prefix option". I need to enhance the commit message by a lot then. > >> --- a/t/t1001-read-tree-m-2way.sh >> +++ b/t/t1001-read-tree-m-2way.sh >> @@ -363,6 +363,15 @@ test_expect_success 'a/b (untracked) vs a, plus c/d >> case test.' ' >> test -f a/b >> ' >> >> +cat <<-EOF >expect && >> + error: Updating 'fictional/a' would lose untracked files in it >> +EOF >> + >> +test_expect_success 'read-tree supports the super-prefix' ' >> + test_must_fail git --super-prefix fictional/ read-tree -u -m "$treeH" >> "$treeM" 2>actual && >> + test_cmp expect actual >> +' >> + > > Preparing the expected output "expect" outside test_expect_success > block is also old-school. Move it inside the new test? I looked into that. What is our current stance on using single/double quotes? Most tests use single quotes for the test title as well as the test instructions, such that double quotes can be used naturally in there. But single quotes cannot be used even escaped, such that we need to do a thing like sq="'" test_expect_success 'test title' ' cat <<-EOF >expect && error for ${sq}path${sq} EOF test instructions go here ' though that is not used as often as I would think as grep \${sq} yields t1507 and t3005. > > Hmph, as a reader of the code, I do not even want to wonder how > expensive get_super_prefix() is. If the executable part of the > above were written like this, it would have been easier to > understand: > > if (super_prefix_len < 0) { > if (!get_super_prefix()) > super_prefix_len = 0; > else { > int i; > ... prepare buf[] and set super_prefix_len ...; > } > } > > if (!super_prefix_len) > return path; > > ... use buf[] to do the prefixing and return it ... > good point. I'll fix that.
Re: [RFC PATCH 3/2] vreportf: add prefix to each line
Jeff King writes: > On Wed, Jan 11, 2017 at 09:01:38AM -0500, Jeff King wrote: > >> [1/2]: Revert "vreportf: avoid intermediate buffer" >> [2/2]: vreport: sanitize ASCII control chars > > We've talked before about repeating the "error:" header for multi-line > messages. The reversion in patch 1 makes this easy to play with, so I > did. I kind of hate it. But here it is, for your viewing pleasure. Thanks. > > -- >8 -- > Subject: vreportf: add prefix to each line > > Some error and warning messages are multi-line, but we put > the prefix only on the first line. This means that either > the subsequent lines don't have any indication that they're > connected to the original error, or the caller has to make > multiple calls to error() or warning(). And that's not even > possible for die(). > > Instead, let's put the prefix at the start of each line, > just as advise() does. > > Note that this patch doesn't update the tests yet, so it > causes tons of failures. This is just to see what it might > look like. > > It's actually kind of ugly. For instance, a failing test in > t3600 now produces: > >error: the following files have staged content different from both the >error: file and the HEAD: >error: bar.txt >error: foo.txt >error: (use -f to force removal) > > which seems a bit aggressive. I agree that it is ugly, but one reason I was hoping to do this myself (or have somebody else do it by procrastinating) was that I thought it may help i18n. That is, for an original error(_("we found an error")) a .po file may translate the string to a multi-line string that has LFs in it and the output would look correct. The translator already can do so by indenting the second and subsequent lines by the same column-width as "error: " (or its translation in their language, if we are going to i18n these headers), but that (1) is extra work for them, and (2) makes it impossible to have the same message appear in different contexts (i.e. "error:" vs "warning:" that have different column-widths). > It also screws up messages which indent with tabs (the prefix eats > up some of the tabstop, making the indentation smaller). This is unavoidable and at the same time is a non-issue, isn't it? Messages that indent the second and subsequent lines with tabs are compensating the lack of the multi-line support of vreportf(), which this RFC patch fixes. They may need to be adjusted to the new world order, but that is a good thing. New multi-line messages no longer have to worry about the prefix that is added only to the first line when continuing the message to multiple lines. > And the result is > a little harder if you need to cut-and-paste the file lines > (if your terminal lets you triple-click to select a whole > line, now you have this "error:" cruft on the front). This might be an issue, but I personally do not care (as I know how to drive my "screen", namely, use 'c' while cutting) ;-). > It may be possible to address some of that by using some > other kind of continuation marker (instead of just repeating > the prefix), and expanding initial tabs. Yes indeed. The "some other kind of continuation marker" could just be a run of spaces that fill the same column as the "error: " or other prefix given to the first line. > Signed-off-by: Jeff King > --- > usage.c | 16 ++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/usage.c b/usage.c > index ad6d2910f..8a1f6ff4e 100644 > --- a/usage.c > +++ b/usage.c > @@ -8,18 +8,30 @@ > > static FILE *error_handle; > > +static void show_line(FILE *fh, const char *prefix, const char *line, int > len) > +{ > + fprintf(fh, "%s%.*s\n", prefix, len, line); > +} > + > void vreportf(const char *prefix, const char *err, va_list params) > { > char msg[4096]; > FILE *fh = error_handle ? error_handle : stderr; > + const char *base; > char *p; > > vsnprintf(msg, sizeof(msg), err, params); > + > + base = msg; > for (p = msg; *p; p++) { > - if (iscntrl(*p) && *p != '\t' && *p != '\n') > + if (*p == '\n') { > + show_line(fh, prefix, base, p - base); > + base = p + 1; > + } else if (iscntrl(*p) && *p != '\t') > *p = '?'; > } > - fprintf(fh, "%s%s\n", prefix, msg); > + > + show_line(fh, prefix, base, p - base); > } > > static NORETURN void usage_builtin(const char *err, va_list params)
Re: [PATCHv2 1/2] builtin/commit.c: drop use snprintf via dynamic allocation
Elia Pinto writes: > In general snprintf is bad because it may silently truncate results if we're > wrong. In this patch where we use PATH_MAX, we'd want to handle larger > paths anyway, so we switch to dynamic allocation. > > Helped-by: Jeff King > Signed-off-by: Elia Pinto > --- > This is the second version of the patch. > > I have split the original commit in two, as discussed here > http://public-inbox.org/git/20161213132717.42965-1-gitter.spi...@gmail.com/. And both halves have the same title? I think the the primary value of this one is that it removes the PATH_MAX limitation from the environment setting that points to a filename. Reducing the number of snprintf() call is a side effect, even though that may have been what motivated you originally. The original code used snprintf() correctly after all. The primary value of 2/2 is reduction of the cognitive burden from the programmers---switching to xstrfmt(), they no longer have to count bytes needed for static allocation. Reducing the number of snprintf() call is a side effect, even though that may have been what motivated you originally. The original code used snprintf() correctly after all. The code looks correct in both patches (except that in 1/2 _clear() immediately before exit() wouldn't have to be there, but it does not hurt to have one either). They need to be better explained. Thanks. > builtin/commit.c | 22 +++--- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/builtin/commit.c b/builtin/commit.c > index 0ed634b26..09bcc0f13 100644 > --- a/builtin/commit.c > +++ b/builtin/commit.c > @@ -960,15 +960,16 @@ static int prepare_to_commit(const char *index_file, > const char *prefix, > return 0; > > if (use_editor) { > - char index[PATH_MAX]; > - const char *env[2] = { NULL }; > - env[0] = index; > - snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", index_file); > - if (launch_editor(git_path_commit_editmsg(), NULL, env)) { > + struct argv_array env = ARGV_ARRAY_INIT; > + > + argv_array_pushf(&env, "GIT_INDEX_FILE=%s", index_file); > + if (launch_editor(git_path_commit_editmsg(), NULL, env.argv)) { > fprintf(stderr, > _("Please supply the message using either -m or -F > option.\n")); > + argv_array_clear(&env); > exit(1); > } > + argv_array_clear(&env); > } > > if (!no_verify && > @@ -1557,23 +1558,22 @@ static int run_rewrite_hook(const unsigned char > *oldsha1, > > int run_commit_hook(int editor_is_used, const char *index_file, const char > *name, ...) > { > - const char *hook_env[3] = { NULL }; > - char index[PATH_MAX]; > + struct argv_array hook_env = ARGV_ARRAY_INIT; > va_list args; > int ret; > > - snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", index_file); > - hook_env[0] = index; > + argv_array_pushf(&hook_env, "GIT_INDEX_FILE=%s", index_file); > > /* >* Let the hook know that no editor will be launched. >*/ > if (!editor_is_used) > - hook_env[1] = "GIT_EDITOR=:"; > + argv_array_push(&hook_env, "GIT_EDITOR=:"); > > va_start(args, name); > - ret = run_hook_ve(hook_env, name, args); > + ret = run_hook_ve(hook_env.argv,name, args); > va_end(args); > + argv_array_clear(&hook_env); > > return ret; > }
Re: [PATCH] index: improve constness for reading blob data
Brandon Williams writes: > Improve constness of the index_state parameter to the > 'read_blob_data_from_index' function. > > Signed-off-by: Brandon Williams > --- Thanks.
Re: [PATCH 4/4] unpack-trees: support super-prefix option
Stefan Beller writes: > Add support for the super-prefix option for commands that unpack trees. > For testing purposes enable it in read-tree, which has no other path > related output. "path related output"? I am not sure I understand this. When read-tree reads N trees, or unpack_trees() is asked to "merge" N trees, how does --super-prefix supposed to interact with the paths in these trees? Does the prefix added to paths in all trees? Did you mean that read-tree (and unpack_trees() in general) is a whole-tree operation and there is no path related input (i.e. pathspec limiting), but if another command that started in the superproject is running us in its submodule, it wants us to prefix the path to the submodule when we report errors? If that is what is going on, I think it makes sense, but it may be asking too much from the readers to guess that from "Add support for the super-prefix option". > --- a/t/t1001-read-tree-m-2way.sh > +++ b/t/t1001-read-tree-m-2way.sh > @@ -363,6 +363,15 @@ test_expect_success 'a/b (untracked) vs a, plus c/d case > test.' ' > test -f a/b > ' > > +cat <<-EOF >expect && > + error: Updating 'fictional/a' would lose untracked files in it > +EOF > + > +test_expect_success 'read-tree supports the super-prefix' ' > + test_must_fail git --super-prefix fictional/ read-tree -u -m "$treeH" > "$treeM" 2>actual && > + test_cmp expect actual > +' > + Preparing the expected output "expect" outside test_expect_success block is also old-school. Move it inside the new test? > diff --git a/unpack-trees.c b/unpack-trees.c > index 7a6df99d10..bc56195e27 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -52,6 +52,37 @@ static const char > *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = { > ? ((o)->msgs[(type)]) \ > : (unpack_plumbing_errors[(type)]) ) > > +static const char *super_prefixed(const char *path) > +{ > + /* > + * This is used for the error messages above. > + * We need to have exactly two buffer spaces. > + */ > + static struct strbuf buf[2] = {STRBUF_INIT, STRBUF_INIT}; > + static int super_prefix_len = -1; > + static unsigned idx = 0; > + > + if (!get_super_prefix()) > + return path; > + > + if (super_prefix_len < 0) { > + int i; > + > + for (i = 0; i < ARRAY_SIZE(buf); i++) > + strbuf_addstr(&buf[i], get_super_prefix()); > + > + super_prefix_len = strlen(get_super_prefix()); > + } > + > + if (++idx >= ARRAY_SIZE(buf)) > + idx = 0; > + > + strbuf_setlen(&buf[idx], super_prefix_len); > + strbuf_addstr(&buf[idx], path); > + > + return buf[idx].buf; > +} Hmph, as a reader of the code, I do not even want to wonder how expensive get_super_prefix() is. If the executable part of the above were written like this, it would have been easier to understand: if (super_prefix_len < 0) { if (!get_super_prefix()) super_prefix_len = 0; else { int i; ... prepare buf[] and set super_prefix_len ...; } } if (!super_prefix_len) return path; ... use buf[] to do the prefixing and return it ...
Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile
Richard Hansen writes: >> Back then we didn't even have wildmatch(), and used fnmatch() >> instead, so forcing FNM_PATHNAME would have meant that people >> wouldn't be able to say "foo*bar" to match "foo/other/bar"; with >> wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having >> WM_PATHNAME always in effect is less of an issue, but with >> fnmatch(), having FNM_PATHNAME always in effect has a lot of >> downside. > > Ah, that makes sense. >> >> I'd expect that orderfile people have today will be broken and >> require tweaking if you switched WM_PATHNAME on. > > OK, so we don't want to turn on WM_PATHNAME unless we do it for a new > major version. I do agree with you that if we were starting Git from scratch, or at least if we were adding diff.orderfile feature today, we would have used wildmatch(WM_PATHNAME) for this matching. We would also have used the same parser as used to read the exclude files (and when we see negative matching entries in the parsed result, either errored out or ignored them with warning). That kind of change unfortunately would require a major version bump, I am afraid.
[PATCH] submodule absorbgitdirs: mention in docstring help
This part was missing in f6f85861 (submodule: add absorb-git-dir function, 2016-12-12). Noticed-by: Jonathan Nieder Signed-off-by: Stefan Beller --- git-submodule.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/git-submodule.sh b/git-submodule.sh index 554bd1c494..b43af1742c 100755 --- a/git-submodule.sh +++ b/git-submodule.sh @@ -12,7 +12,8 @@ USAGE="[--quiet] add [-b ] [-f|--force] [--name ] [--reference ] [--recursive] [--] [...] or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] [commit] [--] [...] or: $dashless [--quiet] foreach [--recursive] - or: $dashless [--quiet] sync [--recursive] [--] [...]" + or: $dashless [--quiet] sync [--recursive] [--] [...] + or: $dashless [--quiet] absorbgitdirs [--] [...]" OPTIONS_SPEC= SUBDIRECTORY_OK=Yes . git-sh-setup -- 2.11.0.259.g7b30ecf4f0
Re: [PATCH 1/2] diff: document behavior of relative diff.orderFile
Jeff King writes: > On Tue, Jan 10, 2017 at 02:15:11PM -0800, Junio C Hamano wrote: > >> Richard Hansen writes: >> ... >> > I think so. (For bare repositories anyway; non-bare should be >> > relative to GIT_WORK_TREE.) Perhaps git_config_pathname() itself >> > should convert relative paths to absolute so that every pathname >> > setting automatically works without changing any calling code. >> >> Yes, that was what I was alluding to. We might have to wait until >> major version boundary to do so, but I think that it is the sensible >> way forward in the longer term to convert relative to absolute in >> git_config_pathname(). > > Yeah, I'd agree. > > I'm undecided on whether it would need to happen at a major version > bump. ... > > So I dunno. I do hate to break even corner cases, but I'm having trouble > imagining the scenario where somebody is actually using the current > behavior in a useful way. Thanks for a detailed analysis (I probably should have spelt them out when I said "we might have to" to save you the trouble).
Re: [musl] Re: Test failures when Git is built with libpcre and grep is built without it
Jeff King writes: > Another difference I found is that "[\d]" matches a literal "\" or "d" > in ERE, but behaves like "[0-9]" in PCRE. I'll work up a patch based on > that. Wow, clever. Thanks.
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
Lars Schneider writes: >> Hmm, I would have expected that the basic flow would become >> >> for each paths to be processed: >> convert-to-worktree to buf >> if not delayed: >> do the caller's thing to use buf >> else: >> remember path >> >> for each delayed paths: >> ensure filter process finished processing for path >> fetch the thing to buf from the process >> do the caller's thing to use buf >> >> and that would make quite a lot of sense. However, what is actually >> implemented is a bit disappointing from that point of view. While >> its first part is the same as above, the latter part instead does: >> >> for each delayed paths: >> checkout the path >> ... > > I am not sure I can follow you here. > ... > I implemented the "checkout_delayed_entries" function in v1 because > it solved the problem with minimal changes in the existing code. Our previous > discussion made me think that this is the preferred way: > > I do not think we want to see such a rewrite all over the > codepaths. It might be OK to add such a "these entries are known > to be delayed" list in struct checkout so that the above becomes > more like this: > >for (i = 0; i < active_nr; i++) > checkout_entry(active_cache[i], state, NULL); > + checkout_entry_finish(state); > > That is, addition of a single "some of the checkout_entry() calls > done so far might have been lazy, and I'll give them a chance to > clean up" might be palatable. Anything more than that on the > caller side is not. But that is apples-and-oranges comparision, no? The old discussion assumes there is no "caller's thing to use buf" other than "checkout to the working tree", which is why the function its main loop calls is "checkout_entry()" and the caller does not see the contents of the filtered blob at all. In that context, checkout_entry_finish() that equally does not let the caller see the contents of the filtered blob is quite appropriate.
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
Jakub Narębski writes: >> Yes, this problem happens every day with filters that perform network >> requests (e.g. GitLFS). > > Do I understand it correctly that the expected performance improvement > thanks to this feature is possible only if there is some amount of > parallelism and concurrency in the filter? That is, filter can be sending > one blob to Git while processing other one, or filter can be fetching blobs > in parallel. The first-object latency may not be helped, but by allowing "delayed", the latency to retrieve the second and subsequent objects can be hidden, I would think.
Re: git cat-file on a submodule
Jeff King writes: > On Tue, Jan 10, 2017 at 07:11:40PM -0500, David Turner wrote: > >> Why does git cat-file -t $sha:foo, where foo is a submodule, not work? > ... > I'm not sure if I'm complaining or not. I can't immediately think of > something that would be horribly broken. But it really feels like you > are using the wrong tool, and patching the tool to handle this case will > probably lead to weird cognitive dissonance down the road. "git cat-file [any option] $sha" should fail and complain for any $sha that does not name an object that exists in the object database of the repository it is working on. So I'd complain if the first example command quoted above from David's mail stopped failing when the commit bound at 'foo' in the top-level treeish $sha (i.e. a commit in the submodule) does not exist in the top-level repository's object database. > Maybe it would help to describe your use case more fully. If what you > care about is the presumed type based on the surrounding tree, then > maybe: > > git --literal-pathspecs ls-tree $sha -- foo > > would be a better match. Yup.
[PATCH] lib-submodule-update.sh: drop unneeded shell
In modern Git we prefer "git -C )" as it doesn't need an extra shell. Signed-off-by: Stefan Beller --- And because it is in a setup function, we actually save the invocation of 22 shells for a single run of the whole test suite. Noticed while adding a lot more in near vincinity, though not as near to cause merge conflicts, so sending it extra. Thanks, Stefan t/lib-submodule-update.sh | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh index 79cdd34a54..915eb4a7c6 100755 --- a/t/lib-submodule-update.sh +++ b/t/lib-submodule-update.sh @@ -69,10 +69,7 @@ create_lib_submodule_repo () { git checkout -b "replace_sub1_with_directory" "add_sub1" && git submodule update && - ( - cd sub1 && - git checkout modifications - ) && + git -C sub1 checkout modifications && git rm --cached sub1 && rm sub1/.git* && git config -f .gitmodules --remove-section "submodule.sub1" && -- 2.11.0.259.g7b30ecf4f0
Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile
On 2017-01-11 13:15, Junio C Hamano wrote: Richard Hansen writes: On 2017-01-10 21:46, Junio C Hamano wrote: Richard Hansen writes: I was looking at the code to see how the two file formats differed and noticed that match_order() doesn't set the WM_PATHNAME flag when it calls wildmatch(). That's unintentional (a bug), right? It has been that way from day one IIRC even before we introduced wildmatch()---IOW it may be intentional that the current code that uses wildmatch() does not use WM_PATHNAME. You are the original author (af5323e027 2005-05-30). Do you remember what your intention was? Yes. Back then we didn't even have wildmatch(), and used fnmatch() instead, so forcing FNM_PATHNAME would have meant that people wouldn't be able to say "foo*bar" to match "foo/other/bar"; with wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having WM_PATHNAME always in effect is less of an issue, but with fnmatch(), having FNM_PATHNAME always in effect has a lot of downside. Ah, that makes sense. I'd expect that orderfile people have today will be broken and require tweaking if you switched WM_PATHNAME on. OK, so we don't want to turn on WM_PATHNAME unless we do it for a new major version. I'll do another re-roll and document the non-WM_PATHNAME behavior. Perhaps I'll encourage users to prefer ** over * if they want to match slash (even though they are equivalent) so that migration is easier if we ever do turn on WM_PATHNAME. -Richard
Re: git cat-file on a submodule
On Wed, 2017-01-11 at 07:53 -0500, Jeff King wrote: > On Tue, Jan 10, 2017 at 07:11:40PM -0500, David Turner wrote: > > > Why does git cat-file -t $sha:foo, where foo is a submodule, not work? > > Because "cat-file" is about inspecting items in the object database, and > typically the submodule commit is not present in the superproject's > database. So we cannot know its type. You can infer what it _should_ be > from the surrounding tree, but you cannot actually do the object lookup. > > Likewise, "git cat-file -t $sha1:Makefile" is not just telling you that > we found a 100644 entry in the tree, so we expect a blob. It's resolving > to a sha1, and then checking the type of that sha1 in the database. It > _should_ be a blob, but if it isn't, then cat-file is the tool that > should tell you that it is not. > > > git rev-parse $sha:foo works. > > Right. Because that command is about resolving a name to a sha1, which > we can do even without the object. > > > By "why", I mean "would anyone complain if I fixed it?" FWIW, I think > > -p should just return the submodule's sha. > > I'm not sure if I'm complaining or not. I can't immediately think of > something that would be horribly broken. But it really feels like you > are using the wrong tool, and patching the tool to handle this case will > probably lead to weird cognitive dissonance down the road. OK, this makes sense to me. I tried cat-file because that is the tool I was familiar with, but that doesn't mean that it was the right thing to do. > Maybe it would help to describe your use case more fully. If what you > care about is the presumed type based on the surrounding tree, then > maybe: > > git --literal-pathspecs ls-tree $sha -- foo That (minus --literal-pathspecs, which does not exist in the version of git I happen to be using) is fine for my purposes. Thanks.
Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile
Richard Hansen writes: > On 2017-01-10 21:46, Junio C Hamano wrote: >> Richard Hansen writes: >> >>> I was looking at the code to see how the two file formats differed and >>> noticed that match_order() doesn't set the WM_PATHNAME flag when it >>> calls wildmatch(). That's unintentional (a bug), right? >> >> It has been that way from day one IIRC even before we introduced >> wildmatch()---IOW it may be intentional that the current code that >> uses wildmatch() does not use WM_PATHNAME. > > You are the original author (af5323e027 2005-05-30). Do you remember > what your intention was? Yes. Back then we didn't even have wildmatch(), and used fnmatch() instead, so forcing FNM_PATHNAME would have meant that people wouldn't be able to say "foo*bar" to match "foo/other/bar"; with wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having WM_PATHNAME always in effect is less of an issue, but with fnmatch(), having FNM_PATHNAME always in effect has a lot of downside. I'd expect that orderfile people have today will be broken and require tweaking if you switched WM_PATHNAME on.
Re: [RFC PATCH 0/5] Localise error headers
Jeff King writes: > Yes, I would think die_errno() is a no-brainer for translation, since > the strerror() will be translated. > >> apply.c:die(_("internal error")); >> >> That is funny, too. I think we should substitute that with >> >> die("BUG: untranslated, but what went wrong instead") > > Yep. We did not consistently use "BUG:" in the early days. I would say > that "BUG" lines do not need to be translated. The point is that nobody > should ever see them, so it seems like there is little point in giving > extra work to translators. In addition, "BUG: " is relatively recent introduction to our codebase. Perhaps having a separate BUG() function help the distinction further?
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
On Wed, Jan 11, 2017 at 11:13:00AM +0100, Lars Schneider wrote: > > In v1 I implemented a) with the busy-loop problem in mind. > > My thinking was this: > > If the filter sees at least one filter request twice then the filter knows > that > Git has already requested all files that require filtering. At that point the > filter could just block the "delayed" answer to the latest filter request > until > at least one of the previously delayed requests can be fulfilled. Then the > filter > answers "delay" to Git until Git requests the blob that can be fulfilled. This > process cycles until all requests can be fulfilled. Wouldn't that work? > > I think a "done" message by the filter is not easy. Right now the protocol > works > in a mode were Git always asks and the filter always answers. I believe > changing > the filter to be able to initiate a "done" message would complicated the > protocol. > > > Additionally, the protocol should specify a sentinel "no more entries" value > > that could be sent from Git to the filter to signal that there are no more > > files > > to checkout. Some filters may implement mechanisms for converting files that > > require a signal to know when all files have been sent. Specifically, Git > > LFS > > (https://git-lfs.github.com) batches files to be transferred together, and > > needs > > to know when all files have been announced to truncate and send the last > > batch, > > if it is not yet full. I'm sure other filter implementations use a similar > > mechanism and would benefit from this as well. > > I agree. I think the filter already has this info implicitly as explained > above > but an explicit message would be better! This works, but forces some undesirable constraints: - The filter has to keep track of all items in the checkout to determine how many times Git has asked about them. This is probably fine, though annoying. Cases where this is not fine is when there are _many_ items in the checkout forcing filter implementations to keep track of large sets of data. - The protocol is now _must_ ask for the status of checkout items in a particular order. If the assumption is that "Git has sent me all items in the checkout by the point I see Git ask for the status of an item more than once", then Git _cannot_ ask for the status of a delayed item while there are still more items to checkout. This could be OK, so long as the protocol commits to it and documents this behavior. What are your thoughts? -- Thanks, Taylor Blau ttayl...@github.com
Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile
On 2017-01-10 21:46, Junio C Hamano wrote: Richard Hansen writes: I was looking at the code to see how the two file formats differed and noticed that match_order() doesn't set the WM_PATHNAME flag when it calls wildmatch(). That's unintentional (a bug), right? It has been that way from day one IIRC even before we introduced wildmatch()---IOW it may be intentional that the current code that uses wildmatch() does not use WM_PATHNAME. You are the original author (af5323e027 2005-05-30). Do you remember what your intention was? I would like to change it to pass WM_PATHNAME, but I'm not sure if that would break anyone. I'm guessing probably not, because users probably expect WM_PATHNAME and would be surprised (like I was) to learn otherwise. If we want to keep it as-is, I'll have to adjust [PATCH v2 2/2]. -Richard
Re: [RFC PATCH 0/5] Localise error headers
On Wed, Jan 11, 2017 at 3:37 AM, Jeff King wrote: > On Tue, Jan 10, 2017 at 10:28:42AM -0800, Stefan Beller wrote: > >> > And then presumably that mix would gradually move to 100% consistency as >> > more messages are translated. But the implicit question is: are there >> > die() messages that should never be translated? I'm not sure. >> >> I would assume any plumbing command is not localizing? >> Because in plumbing land, (easily scriptable) you may find >> a grep on the output/stderr for a certain condition? > > That's the assumption I'm challenging. Certainly the behavior and > certain aspects of the output of a plumbing command should remain the > same over time. But error messages to stderr? In an ideal world that assumption would hold true and any machine readable information from the plumbing commands are on stdout and nobody in their sane mind would ever try to parse stderr. There is no easy way to check though except auditing all the code. Having pointed out some string handling mistakes in the prior email, my confidence is not high that plumbing commands are that strict about separating useful machine output and errors for human consumption. > > It seems like they should be translated, because plumbing invoked on > behalf of porcelain scripts is going to send its stderr directly to the > user. Well that could be solved, c.f. unpack-trees.c lines 15-55. As another data point (coming from that area of strings): If you grep for these plumbing strings in the project, i.e. $ git grep "would be overwritten by merge. Cannot merge" $ git grep "not uptodate. Cannot merge." ... you only find the occurrence in the unpack-trees.c lines 15-55, which means our test suite at least doesn't grep for error messages but relies on the exit code of plumbing commands(?!) > >> To find a good example, "git grep die" giving me some food of though: >> >> die_errno(..) should always take a string marked up for translation, >> because the errno string is translated? > > Yes, I would think die_errno() is a no-brainer for translation, since > the strerror() will be translated. > >> apply.c:die(_("internal error")); >> >> That is funny, too. I think we should substitute that with >> >> die("BUG: untranslated, but what went wrong instead") > > Yep. We did not consistently use "BUG:" in the early days. I would say > that "BUG" lines do not need to be translated. The point is that nobody > should ever see them, so it seems like there is little point in giving > extra work to translators. > > -Peff
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
W dniu 11.01.2017 o 11:20, Lars Schneider pisze: > On 10 Jan 2017, at 23:11, Jakub Narębski wrote: >> W dniu 09.01.2017 o 00:42, Junio C Hamano pisze: >>> larsxschnei...@gmail.com writes: From: Lars Schneider Some `clean` / `smudge` filters might require a significant amount of time to process a single blob. During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout. >> >> Lars, what is expected use case for this feature; that is when do you >> think this problem may happen? Is it something that happened IRL? > > Yes, this problem happens every day with filters that perform network > requests (e.g. GitLFS). Do I understand it correctly that the expected performance improvement thanks to this feature is possible only if there is some amount of parallelism and concurrency in the filter? That is, filter can be sending one blob to Git while processing other one, or filter can be fetching blobs in parallel. This means that filter process works as a kind of (de)multiplexer for fetching and/or processing blob contents, I think. > [...] In GitLFS we even implemented Git wrapper > commands to address the problem: https://github.com/git-lfs/git-lfs/pull/988 > The ultimate goal of this patch is to be able to get rid of the wrapper > commands. I'm sorry, I don't see it how the wrapper helps here. > Teach the filter process protocol (introduced in edcc858) to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation and asks the filter to process the blob again after all other blobs have been processed. >>> >>> Hmm, I would have expected that the basic flow would become >>> >>> for each paths to be processed: >>> convert-to-worktree to buf >>> if not delayed: >>> do the caller's thing to use buf >>> else: >>> remember path >>> >>> for each delayed paths: >>> ensure filter process finished processing for path >>> fetch the thing to buf from the process >>> do the caller's thing to use buf >> >> I would expect here to have a kind of event loop, namely >> >>while there are delayed paths: >>get path that is ready from filter >>fetch the thing to buf (supporting "delayed") >>if path done >>do the caller's thing to use buf >>(e.g. finish checkout path, eof convert, etc.) >> >> We can either trust filter process to tell us when it finished sending >> delayed paths, or keep list of paths that are being delayed in Git. > > I could implement "get path that is ready from filter" but wouldn't > that complicate the filter protocol? I think we can use the protocol pretty > much as if with the strategy outlined here: > http://public-inbox.org/git/f533857d-9b51-44c1-8889-aa0542ad8...@gmail.com/ You are talking about the "busy-loop" solution, isn't it? In the same notation, it would look like this: while there are delayed paths: for each delayed path: request path from filter [1] fetch the thing (supporting "delayed") [2] if path done do the caller's thing to use buf remove it from delayed paths list Footnotes: -- 1) We don't send the Git-side contents of blob again, isn't it? So we need some protocol extension / new understanding anyway. for example that we don't send contents if we request path again. 2) If path is not ready at all, filter protocol would send status=delayed with empty contents. This means that we would immediately go to the next path, if there is one. There are some cases where busy loop is preferable, but I don't think it is the case here. The event loop solution would require additional protocol extension, but I don't think those complicate protocol too much: A. Git would need to signal filter process that it has sent all paths, and that it should be sending delayed paths when they are ready. This could be done for example with "command=continue". packet: git> command=continue B. Filter driver, in the event-loop phase, when (de)multiplexing fetching or processing of data, it would need now to initialize transfer, instead of waiting for Git to ask. It could look like this: packet: git< status=resumed [3] packet: git< pathname=file/to/be/resumed [4] packet: git< packet: git< SMUDGED_CONTENT_CONTINUED packet: git< packet: git< # empty list, means "status=success" [5] Footnotes: -- 3.) It could be "status=success", "status=delayed", "command=resumed", etc. 4.) In the future we
Re: [PATCH 1/2] diff: document behavior of relative diff.orderFile
On Tue, Jan 10, 2017 at 02:15:11PM -0800, Junio C Hamano wrote: > Richard Hansen writes: > > >> A related tangent. > >> > >> I wonder if anything that uses git_config_pathname() should be > >> relative to GIT_DIR when it is not absolute. > > > > I think so. (For bare repositories anyway; non-bare should be > > relative to GIT_WORK_TREE.) Perhaps git_config_pathname() itself > > should convert relative paths to absolute so that every pathname > > setting automatically works without changing any calling code. > > Yes, that was what I was alluding to. We might have to wait until > major version boundary to do so, but I think that it is the sensible > way forward in the longer term to convert relative to absolute in > git_config_pathname(). Yeah, I'd agree. I'm undecided on whether it would need to happen at a major version bump. The existing semantics are fairly insane, and would cause a lot of confusing breakages. We can imagine use of relative paths in a bare repository falls into one of a few categories: 1. The user generally runs by "cd /path/to/bare.git && git ...". This would be unaffected, as relative and $GIT_DIR are the same. 2. The user runs via "cd /path/to/bare.git/some-subdir". This would be broken, but I have trouble imagining that they really wanted to read something like "objects/orderfile". 3. The user runs via "GIT_DIR=/path/to/bare.git" from various directories. This case is probably horribly broken, as things like diff.orderFile will complain if they ever run from a directory that doesn't have the order file. 4. They run GIT_DIR=/path/to/bare.git from a consistent origin directory. This _does_ work, and we'd be breaking it. Though I kind of question why the config in $GIT_DIR is meant to apply to a file in a totally unrelated directory. I suppose somebody could be relying on the behavior where setting GIT_DIR uses the current directory as the working tree (i.e., if core.bare is "true" in bare.git). But then, we'd consider their working directory as the working tree and read from that anyway. So the behavior would stay the same. So I dunno. I do hate to break even corner cases, but I'm having trouble imagining the scenario where somebody is actually using the current behavior in a useful way. -Peff
[RFC PATCH 3/2] vreportf: add prefix to each line
On Wed, Jan 11, 2017 at 09:01:38AM -0500, Jeff King wrote: > [1/2]: Revert "vreportf: avoid intermediate buffer" > [2/2]: vreport: sanitize ASCII control chars We've talked before about repeating the "error:" header for multi-line messages. The reversion in patch 1 makes this easy to play with, so I did. I kind of hate it. But here it is, for your viewing pleasure. -- >8 -- Subject: vreportf: add prefix to each line Some error and warning messages are multi-line, but we put the prefix only on the first line. This means that either the subsequent lines don't have any indication that they're connected to the original error, or the caller has to make multiple calls to error() or warning(). And that's not even possible for die(). Instead, let's put the prefix at the start of each line, just as advise() does. Note that this patch doesn't update the tests yet, so it causes tons of failures. This is just to see what it might look like. It's actually kind of ugly. For instance, a failing test in t3600 now produces: error: the following files have staged content different from both the error: file and the HEAD: error: bar.txt error: foo.txt error: (use -f to force removal) which seems a bit aggressive. It also screws up messages which indent with tabs (the prefix eats up some of the tabstop, making the indentation smaller). And the result is a little harder if you need to cut-and-paste the file lines (if your terminal lets you triple-click to select a whole line, now you have this "error:" cruft on the front). It may be possible to address some of that by using some other kind of continuation marker (instead of just repeating the prefix), and expanding initial tabs. Signed-off-by: Jeff King --- usage.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/usage.c b/usage.c index ad6d2910f..8a1f6ff4e 100644 --- a/usage.c +++ b/usage.c @@ -8,18 +8,30 @@ static FILE *error_handle; +static void show_line(FILE *fh, const char *prefix, const char *line, int len) +{ + fprintf(fh, "%s%.*s\n", prefix, len, line); +} + void vreportf(const char *prefix, const char *err, va_list params) { char msg[4096]; FILE *fh = error_handle ? error_handle : stderr; + const char *base; char *p; vsnprintf(msg, sizeof(msg), err, params); + + base = msg; for (p = msg; *p; p++) { - if (iscntrl(*p) && *p != '\t' && *p != '\n') + if (*p == '\n') { + show_line(fh, prefix, base, p - base); + base = p + 1; + } else if (iscntrl(*p) && *p != '\t') *p = '?'; } - fprintf(fh, "%s%s\n", prefix, msg); + + show_line(fh, prefix, base, p - base); } static NORETURN void usage_builtin(const char *err, va_list params) -- 2.11.0.627.gfa6151259
[PATCH 1/2] Revert "vreportf: avoid intermediate buffer"
This reverts commit f4c3edc0b156362a92bf9de4f0ec794e90a757fc. The purpose of that commit was to let us write errors of arbitrary length to stderr by skipping the intermediate buffer and sending our varargs straight to fprintf. That works, but it comes with a downside: we do not get access to the varargs before they are sent to stderr. On balance, it's not a good tradeoff. Error messages larger than our 4K buffer are quite uncommon, and we've lost the ability to make any modifications to the output (e.g., to remove non-printable characters). The only way to have both would be one of: 1. Write into a dynamic buffer. But this is a bad idea for a low-level function that may be called when malloc() has failed. 2. Do our own printf-format varargs parsing. This is too complex to be worth the trouble. Let's just revert that change and go back to a fixed buffer. Signed-off-by: Jeff King --- usage.c | 15 +++ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/usage.c b/usage.c index 17f52c1b5..b1cbe6799 100644 --- a/usage.c +++ b/usage.c @@ -7,21 +7,13 @@ #include "cache.h" static FILE *error_handle; -static int tweaked_error_buffering; void vreportf(const char *prefix, const char *err, va_list params) { + char msg[4096]; FILE *fh = error_handle ? error_handle : stderr; - - fflush(fh); - if (!tweaked_error_buffering) { - setvbuf(fh, NULL, _IOLBF, 0); - tweaked_error_buffering = 1; - } - - fputs(prefix, fh); - vfprintf(fh, err, params); - fputc('\n', fh); + vsnprintf(msg, sizeof(msg), err, params); + fprintf(fh, "%s%s\n", prefix, msg); } static NORETURN void usage_builtin(const char *err, va_list params) @@ -93,7 +85,6 @@ void set_die_is_recursing_routine(int (*routine)(void)) void set_error_handle(FILE *fh) { error_handle = fh; - tweaked_error_buffering = 0; } void NORETURN usagef(const char *err, ...) -- 2.11.0.627.gfa6151259
[PATCH 2/2] vreport: sanitize ASCII control chars
Our error() and die() calls may report messages with arbitrary data (e.g., filenames or even data from a remote server). Let's make it harder to cause confusion with mischievous filenames. E.g., try: git rev-parse "$(printf "\rfatal: this argument is too sneaky")" -- or git rev-parse "$(printf "\x1b[5mblinky\x1b[0m")" -- Let's block all ASCII control characters, with the exception of TAB and LF. We use both in our own messages (and we are necessarily sanitizing the complete output of snprintf here, as we do not have access to the individual varargs). And TAB and LF are unlikely to cause confusion (you could put "\nfatal: sneaky\n" in your filename, but it would at least not _cover up_ the message leading to it, unlike "\r"). We'll replace the characters with a "?", which is similar to how "ls" behaves. It might be nice to do something less lossy, like converting them to "\x" hex codes. But replacing with a single character makes it easy to do in-place and without worrying about length limitations. This feature should kick in rarely enough that the "?" marks are almost never seen. We'll leave high-bit characters as-is, as they are likely to be UTF-8 (though there may be some Unicode mischief you could cause, which may require further patches). Signed-off-by: Jeff King --- usage.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/usage.c b/usage.c index b1cbe6799..ad6d2910f 100644 --- a/usage.c +++ b/usage.c @@ -12,7 +12,13 @@ void vreportf(const char *prefix, const char *err, va_list params) { char msg[4096]; FILE *fh = error_handle ? error_handle : stderr; + char *p; + vsnprintf(msg, sizeof(msg), err, params); + for (p = msg; *p; p++) { + if (iscntrl(*p) && *p != '\t' && *p != '\n') + *p = '?'; + } fprintf(fh, "%s%s\n", prefix, msg); } -- 2.11.0.627.gfa6151259
[PATCH 0/2] sanitizing error message contents
When adding a warning() call in 50d341374 (http: make redirects more obvious, 2016-12-06), somebody brought up that evil servers can redirect you to something like: https://evil.example.com/some/repo?unused=\rwarning:+rainbows+and_unicorns_ahead (where "\r" is a literal CR), and instead of seeing: warning: redirecting to https://evil.example.com/... you just get: warning: rainbows and unicorns ahead or whatever innocuous looking line they prefer (probably just ANSI "clear to beginning of line" would be even more effective). Since it's hard to figure out which error messages could potentially contain malicious contents, and since spewing control characters to the terminal is generally bad anyway, this series sanitizes at the lowest level. Note that this doesn't cover "remote:" lines coming over the sideband. Those are already covered for "\r", as we have to parse it to handle printing "remote:" consistently. But you can play tricks like putting: printf '\0331K\033[0Efatal: this looks local\n' into a pre-receive hook. I'm not sure if we would want to do more sanitizing there. The goal of this series is not so much that a remote can't send funny strings that may look local, but that they can't prevent local strings from being displayed. OTOH, I suspect clever use of ANSI codes (moving the cursor, clearing lines, etc) could get you pretty far. I'd be hesitant to disallow control codes entirely, though, as I suspect some servers do send colors over the sideband. So I punted on that here, but I think this is at least an incremental improvement. [1/2]: Revert "vreportf: avoid intermediate buffer" [2/2]: vreport: sanitize ASCII control chars usage.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) -Peff
Re: RFC: Enable delayed responses to Git clean/smudge filter requests
> On 09 Jan 2017, at 21:44, Stefan Beller wrote: > > On Mon, Nov 14, 2016 at 1:09 PM, Lars Schneider > wrote: >> Hi, >> >> Git always performs a clean/smudge filter on files in sequential order. >> Sometimes a filter operation can take a noticeable amount of time. >> This blocks the entire Git process. >> >> I would like to give a filter process the possibility to answer Git with >> "I got your request, I am processing it, ask me for the result later!". >> >> I see the following way to realize this: >> >> In unpack-trees.c:check_updates() [1] we loop through the cache >> entries and "ask me later" could be an acceptable return value of the >> checkout_entry() call. The loop could run until all entries returned >> success or error. > > Late to this thread, but here is an answer nevertheless. > > I am currently working on getting submodules working > for working tree modifying commands (prominently checkout, but > also read-tree -u and any other caller that uses the code in > unpack-trees.) > > Once the submodules are supported and used, I anticipate that > putting the files in the working tree on disk will become a bottle neck, > i.e. the checkout taking way too long for an oversized project. > > So in the future we have to do something to make checkout fast > again, which IMHO is threading. My current vision is to have checkout > automatically choose a number of threads based on expected workload, > c.f. preload-index.c, line 18-25. That sounds interesting! We are using "submodule.fetchjobs=0" to process submodules in parallel already and it works great! Thanks a lot for implementing this! >> The filter machinery is triggered in various other places in Git and >> all places that want to support "ask me later" would need to be patched >> accordingly. > > I think this makes sense, even in a threaded git-checkout. > I assume this idea is implemented before threading hits checkout, > so a question on the design: > > Who determines the workload that is acceptable? > From reading this email, it seems to be solely the filter that uses > as many threads/processes as it thinks is ok. Correct. > Would it be possible to enhance the protocol further to have > Git also mingle with the workload, i.e. tell the filter it is > allowed to use up (N-M) threads, as it itself already uses > M out of N configured threads? > > (I do not want to discuss the details here, but only if such a thing > is viable with this approach as well) Yes, I think we could give a filter these kind of hints. However, it would, of course, be up to the filter implementation to follow the hints. In case you curious, here is the discussion on v1 of the delay implementation: http://public-inbox.org/git/20170108191736.47359-1-larsxschnei...@gmail.com/ Cheers, Lars
Re: git cat-file on a submodule
On Tue, Jan 10, 2017 at 07:11:40PM -0500, David Turner wrote: > Why does git cat-file -t $sha:foo, where foo is a submodule, not work? Because "cat-file" is about inspecting items in the object database, and typically the submodule commit is not present in the superproject's database. So we cannot know its type. You can infer what it _should_ be from the surrounding tree, but you cannot actually do the object lookup. Likewise, "git cat-file -t $sha1:Makefile" is not just telling you that we found a 100644 entry in the tree, so we expect a blob. It's resolving to a sha1, and then checking the type of that sha1 in the database. It _should_ be a blob, but if it isn't, then cat-file is the tool that should tell you that it is not. > git rev-parse $sha:foo works. Right. Because that command is about resolving a name to a sha1, which we can do even without the object. > By "why", I mean "would anyone complain if I fixed it?" FWIW, I think > -p should just return the submodule's sha. I'm not sure if I'm complaining or not. I can't immediately think of something that would be horribly broken. But it really feels like you are using the wrong tool, and patching the tool to handle this case will probably lead to weird cognitive dissonance down the road. Maybe it would help to describe your use case more fully. If what you care about is the presumed type based on the surrounding tree, then maybe: git --literal-pathspecs ls-tree $sha -- foo would be a better match. -Peff
Re: [RFC PATCH 0/5] Localise error headers
On Tue, Jan 10, 2017 at 10:28:42AM -0800, Stefan Beller wrote: > > And then presumably that mix would gradually move to 100% consistency as > > more messages are translated. But the implicit question is: are there > > die() messages that should never be translated? I'm not sure. > > I would assume any plumbing command is not localizing? > Because in plumbing land, (easily scriptable) you may find > a grep on the output/stderr for a certain condition? That's the assumption I'm challenging. Certainly the behavior and certain aspects of the output of a plumbing command should remain the same over time. But error messages to stderr? It seems like they should be translated, because plumbing invoked on behalf of porcelain scripts is going to send its stderr directly to the user. > To find a good example, "git grep die" giving me some food of though: > > die_errno(..) should always take a string marked up for translation, > because the errno string is translated? Yes, I would think die_errno() is a no-brainer for translation, since the strerror() will be translated. > apply.c:die(_("internal error")); > > That is funny, too. I think we should substitute that with > > die("BUG: untranslated, but what went wrong instead") Yep. We did not consistently use "BUG:" in the early days. I would say that "BUG" lines do not need to be translated. The point is that nobody should ever see them, so it seems like there is little point in giving extra work to translators. -Peff
[PATCH] t7810: avoid assumption about invalid regex syntax
A few of the tests want to check that "git grep -P -E" will override -P with -E, and vice versa. To do so, we use a regex with "\x{..}", which is valid in PCRE but not defined by POSIX (for basic or extended regular expressions). However, POSIX declares quite a lot of syntax, including "\x", as "undefined". That leaves implementations free to extend the standard if they choose. At least one, musl libc, implements "\x" in the same way as PCRE. Our tests check that "-E" complains about "\x", which fails with musl. We can fix this by finding some construct which behaves reliably on both PCRE and POSIX, but differently in each system. One such construct is the use of backslash inside brackets. In PCRE, "[\d]" interprets "\d" as it would outside the brackets, matching a digit. Whereas in POSIX, the backslash must be treated literally, and we match either it or a literal "d". Moreover, implementations are not free to change this according to POSIX, so we should be able to rely on it. Signed-off-by: Jeff King --- I've tested this with glibc, but I wasn't able to do so with musl. The two complications are: 1. Recent versions of git won't build with musl's regex at all, because it doesn't support the non-standard REG_STARTEND that we rely on since b7d36ffca (regex: use regexec_buf(), 2016-09-21). So if applied on an older git, this patch should help, but newer versions need NO_REGEX (to use the fallback glibc regex code) either way, which would also make the problem go away. Still, I think it's the right thing to do, since we are relying on something that POSIX clearly leaves up to the implementation. It may also help on other systems, or if musl ends up supporting REG_STARTEND in the future. 2. I tried to cherry-pick to v2.7.x and test it with musl. Debian ships with a "musl-gcc" wrapper, but it doesn't work out of the box. Both zlib and pcre are compiled against glibc, so I'd have to rebuild those, too. At which point I gave up and decided to just let you test it on your musl-based system. :) t/t7810-grep.sh | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index de2405ccb..19f0108f8 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -39,6 +39,10 @@ test_expect_success setup ' echo "a+bc" echo "abc" } >ab && + { + echo d && + echo 0 + } >d0 && echo vvv >v && echo ww w >w && echo x x xx x >x && @@ -1105,36 +1109,36 @@ test_expect_success 'grep pattern with grep.patternType=fixed, =basic, =extended ' test_expect_success 'grep -G -F -P -E pattern' ' - >empty && - test_must_fail git grep -G -F -P -E "a\x{2b}b\x{2a}c" ab >actual && - test_cmp empty actual + echo "d0:d" >expected && + git grep -G -F -P -E "[\d]" d0 >actual && + test_cmp expected actual ' test_expect_success 'grep pattern with grep.patternType=fixed, =basic, =perl, =extended' ' - >empty && - test_must_fail git \ + echo "d0:d" >expected && + git \ -c grep.patterntype=fixed \ -c grep.patterntype=basic \ -c grep.patterntype=perl \ -c grep.patterntype=extended \ - grep "a\x{2b}b\x{2a}c" ab >actual && - test_cmp empty actual + grep "[\d]" d0 >actual && + test_cmp expected actual ' test_expect_success LIBPCRE 'grep -G -F -E -P pattern' ' - echo "ab:a+b*c" >expected && - git grep -G -F -E -P "a\x{2b}b\x{2a}c" ab >actual && + echo "d0:0" >expected && + git grep -G -F -E -P "[\d]" d0 >actual && test_cmp expected actual ' test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' ' - echo "ab:a+b*c" >expected && + echo "d0:0" >expected && git \ -c grep.patterntype=fixed \ -c grep.patterntype=basic \ -c grep.patterntype=extended \ -c grep.patterntype=perl \ - grep "a\x{2b}b\x{2a}c" ab >actual && + grep "[\d]" d0 >actual && test_cmp expected actual ' -- 2.11.0.627.gfa6151259
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
> On 10 Jan 2017, at 23:11, Jakub Narębski wrote: > > W dniu 09.01.2017 o 00:42, Junio C Hamano pisze: >> larsxschnei...@gmail.com writes: >>> From: Lars Schneider >>> >>> Some `clean` / `smudge` filters might require a significant amount of >>> time to process a single blob. During this process the Git checkout >>> operation is blocked and Git needs to wait until the filter is done to >>> continue with the checkout. > > Lars, what is expected use case for this feature; that is when do you > think this problem may happen? Is it something that happened IRL? Yes, this problem happens every day with filters that perform network requests (e.g. GitLFS). In GitLFS we even implemented Git wrapper commands to address the problem: https://github.com/git-lfs/git-lfs/pull/988 The ultimate goal of this patch is to be able to get rid of the wrapper commands. >>> Teach the filter process protocol (introduced in edcc858) to accept the >>> status "delayed" as response to a filter request. Upon this response Git >>> continues with the checkout operation and asks the filter to process the >>> blob again after all other blobs have been processed. >> >> Hmm, I would have expected that the basic flow would become >> >> for each paths to be processed: >> convert-to-worktree to buf >> if not delayed: >> do the caller's thing to use buf >> else: >> remember path >> >> for each delayed paths: >> ensure filter process finished processing for path >> fetch the thing to buf from the process >> do the caller's thing to use buf > > I would expect here to have a kind of event loop, namely > >while there are delayed paths: >get path that is ready from filter >fetch the thing to buf (supporting "delayed") >if path done >do the caller's thing to use buf >(e.g. finish checkout path, eof convert, etc.) > > We can either trust filter process to tell us when it finished sending > delayed paths, or keep list of paths that are being delayed in Git. I could implement "get path that is ready from filter" but wouldn't that complicate the filter protocol? I think we can use the protocol pretty much as if with the strategy outlined here: http://public-inbox.org/git/f533857d-9b51-44c1-8889-aa0542ad8...@gmail.com/ Thanks, Lars
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
> On 10 Jan 2017, at 00:38, Taylor Blau wrote: > > I've been considering some alternative approaches in order to make the > communication between Git and any extension that implements this protocol more > intuitive. > > In particular, I'm considering alternatives to: > >> for each delayed paths: >> ensure filter process finished processing for path >> fetch the thing to buf from the process >> do the caller's thing to use buf > > As I understand it, the above sequence of steps would force Git to either: > > a) loop over all delayed paths and ask the filter if it's done processing, > creating a busy-loop between the filter and Git, or... > b) loop over all delayed paths sequentially, checking out each path in > sequence > > I would like to avoid both of those situations, and instead opt for an > asynchronous approach. In (a), the protocol is far too chatty. In (b), the > protocol is much less chatty, but forces the checkout to be the very last > step, > which has negative performance implications on checkouts with many large > files. > > For instance, checking out several multi-gigabyte files one after the other > means that a significant amount of time is lost while the filter has some of > the > items ready. Instead of checking them out as they become available, Git waits > until the very end when they are all available. > > I think it would be preferable for the protocol to specify a sort of "done" > signal against each path such that Git could check out delayed paths as they > become available. If implemented this way, Git could checkout files > asynchronously, while the filter continues to do work on the other end. In v1 I implemented a) with the busy-loop problem in mind. My thinking was this: If the filter sees at least one filter request twice then the filter knows that Git has already requested all files that require filtering. At that point the filter could just block the "delayed" answer to the latest filter request until at least one of the previously delayed requests can be fulfilled. Then the filter answers "delay" to Git until Git requests the blob that can be fulfilled. This process cycles until all requests can be fulfilled. Wouldn't that work? I think a "done" message by the filter is not easy. Right now the protocol works in a mode were Git always asks and the filter always answers. I believe changing the filter to be able to initiate a "done" message would complicated the protocol. > Additionally, the protocol should specify a sentinel "no more entries" value > that could be sent from Git to the filter to signal that there are no more > files > to checkout. Some filters may implement mechanisms for converting files that > require a signal to know when all files have been sent. Specifically, Git LFS > (https://git-lfs.github.com) batches files to be transferred together, and > needs > to know when all files have been announced to truncate and send the last > batch, > if it is not yet full. I'm sure other filter implementations use a similar > mechanism and would benefit from this as well. I agree. I think the filter already has this info implicitly as explained above but an explicit message would be better! Thanks, Lars
Re: [musl] Re: Test failures when Git is built with libpcre and grep is built without it
On Tue, Jan 10, 2017 at 12:40:00PM +0100, Szabolcs Nagy wrote: > > > I'm not sure if musl is wrong for failing to complain about a > > > bogus regex. Generally making something that would break into > > > something that works is an OK way to extend the standard. So our > > > test is at fault for assuming that the regex will fail. I guess > > \x is undefined in posix and musl is based on tre which > supports \x{hexdigits} in ere. Thanks for confirming; I figured it was something like that. > > > we'd need to find some more exotic syntax that pcre supports, but > > > that ERE doesn't. Maybe "(?:)" or something. > > i think you would have to use something that's invalid > in posix ere, ? after empty expression is undefined, > not an error so "(?:)" is a valid ere extension. Reading through POSIX[1], hardly anything is explicitly labeled as "invalid". Most things are just "undefined", which leaves rooms for implementations to do what they like. That's a good thing for a standard to do, but a bad thing when you are trying to find behavior that differs reliably between PCRE and ERE. :) In most cases, PCRE constructs could be viable extensions to ERE. > since most syntax is either defined or undefined in ere > instead of being invalid, distinguishing pcre using > syntax is not easy. > > there are semantic differences in subexpression matching: > leftmost match has higher priority in pcre, longest match > has higher priority in ere. > > $ echo ab | grep -o -E '(a|ab)' > ab > $ echo ab | grep -o -P '(a|ab)' > a > > unfortunately grep -o is not portable. In this case we're testing whether Git has internally fed the regex to pcre or to regcomp(), not a system grep. So we'd need something like "-o" for "git grep", which I don't think exists. Another difference I found is that "[\d]" matches a literal "\" or "d" in ERE, but behaves like "[0-9]" in PCRE. I'll work up a patch based on that. Thanks for your answer. I'll drop the musl list from the cc when I follow-up, as this is most definitely not a musl problem, but a git one. -Peff
GREETINGS
Hello my name is Ms. Thandi Robert, from Ivory Coast. My parents were brutally mulled by the former president Laurent Gbagbo because of political crisis as the only survival of my family. I got your email while searching for a reliable personality in my private study on the internet. I am in need of your help and stand as my guardian in the management of my family inherited sum of $22.5M USD. Please get back to me with your private telephone number with sincerity. Sincerely Yours, Ms. Thandi Robert
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
> On 08 Jan 2017, at 21:45, Eric Wong wrote: > > larsxschnei...@gmail.com wrote: >> +++ b/t/t0021/rot13-filter.pl > >> +$DELAY{'test-delay1.r'} = 1; >> +$DELAY{'test-delay3.r'} = 3; >> >> open my $debug, ">>", "rot13-filter.log" or die "cannot open log file: $!"; >> >> @@ -166,6 +176,15 @@ while (1) { >> packet_txt_write("status=abort"); >> packet_flush(); >> } >> +elsif ( $command eq "smudge" and >> +exists $DELAY{$pathname} and >> +$DELAY{$pathname} gt 0 ) { > > Use '>' for numeric comparisons. 'gt' is for strings (man perlop) Still learning Perl :-) > Sidenote, staying <= 80 columns for the rest of the changes is > strongly preferred, some of us need giant fonts. I think what > Torsten said about introducing a new *_internal function can > also help with that. OK! Thank you, Lars
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
> On 08 Jan 2017, at 21:14, Torsten Bögershausen wrote: > > On Sun, Jan 08, 2017 at 08:17:36PM +0100, larsxschnei...@gmail.com wrote: >> From: Lars Schneider >> >> Some `clean` / `smudge` filters might require a significant amount of >> time to process a single blob. During this process the Git checkout >> operation is blocked and Git needs to wait until the filter is done to >> continue with the checkout. >> >> Teach the filter process protocol (introduced in edcc858) to accept the >> status "delayed" as response to a filter request. Upon this response Git >> continues with the checkout operation and asks the filter to process the >> blob again after all other blobs have been processed. >> >> Git has a multiple code paths that checkout a blob. Support delayed >> checkouts only in `clone` (in unpack-trees.c) and `checkout` operations. >> >> Signed-off-by: Lars Schneider >> --- >> > > Some feeling tells me that it may be better to leave > convert_to_working_tree() as it is. > And change convert_to_working_tree_internal as suggested: > > int convert_to_working_tree(const char *path, const char *src, size_t len, > struct strbuf *dst) > { > - return convert_to_working_tree_internal(path, src, len, dst, 0); > + return convert_to_working_tree_internal(path, src, len, dst, NULL, 0); > } If I do this then I would have no way to communicate to the caller that the processing is delayed. Consequently the caller would not know that an additional call is necessary to fetch the result. Thanks, Lars
Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
> On 09 Jan 2017, at 00:42, Junio C Hamano wrote: > > larsxschnei...@gmail.com writes: > >> From: Lars Schneider >> >> Some `clean` / `smudge` filters might require a significant amount of >> time to process a single blob. During this process the Git checkout >> operation is blocked and Git needs to wait until the filter is done to >> continue with the checkout. >> >> Teach the filter process protocol (introduced in edcc858) to accept the >> status "delayed" as response to a filter request. Upon this response Git >> continues with the checkout operation and asks the filter to process the >> blob again after all other blobs have been processed. > > Hmm, I would have expected that the basic flow would become > > for each paths to be processed: > convert-to-worktree to buf > if not delayed: > do the caller's thing to use buf > else: > remember path > > for each delayed paths: > ensure filter process finished processing for path > fetch the thing to buf from the process > do the caller's thing to use buf > > and that would make quite a lot of sense. However, what is actually > implemented is a bit disappointing from that point of view. While > its first part is the same as above, the latter part instead does: > > for each delayed paths: > checkout the path > > Presumably, checkout_entry() does the "ensure that the process is > done converting" (otherwise the result is simply buggy), but what > disappoints me is that this does not allow callers that call > "convert-to-working-tree", whose interface is obtain the bytestream > in-core in the working tree representation, given an object in the > object-db representation in an in-core buffer, to _use_ the result > of the conversion. The caller does not have a chance to even see > the result as it is written straight to the filesystem, once it > calls checkout_delayed_entries(). I am not sure I can follow you here. A caller of "convert_to_working_tree" would indeed see filtered result. Consider the following example. The filter delays the conversion twice and responds with the filtered results on the third call: CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0) RESPONSE: return == 1; *delayed == 1, *dst=='' CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0) RESPONSE: return == 1; *delayed == 1, *dst=='' CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0) RESPONSE: return == 1; *delayed == 0, *dst=='FILTERED_CONTENT' I implemented the "checkout_delayed_entries" function in v1 because it solved the problem with minimal changes in the existing code. Our previous discussion made me think that this is the preferred way: I do not think we want to see such a rewrite all over the codepaths. It might be OK to add such a "these entries are known to be delayed" list in struct checkout so that the above becomes more like this: for (i = 0; i < active_nr; i++) checkout_entry(active_cache[i], state, NULL); + checkout_entry_finish(state); That is, addition of a single "some of the checkout_entry() calls done so far might have been lazy, and I'll give them a chance to clean up" might be palatable. Anything more than that on the caller side is not. c.f. http://public-inbox.org/git/xmqqvavotych@gitster.mtv.corp.google.com/ Thanks, Lars