date:20170425

Re: [PATCH v4 05/19] grep: remove redundant `regflags &= ~REG_EXTENDED` assignments

2017-04-25 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason   writes:

> @@ -417,7 +415,6 @@ static void compile_fixed_regexp(struct grep_pat *p, 
> struct grep_opt *opt)
>   int regflags;
>  
>   basic_regex_quote_buf(&sb, p->pattern);
> - regflags = opt->regflags & ~REG_EXTENDED;
>   if (opt->ignore_case)
>   regflags |= REG_ICASE;
>   err = regcomp(&p->regexp, sb.buf, regflags);

This hunk is wrong.  Now the use of regflags we see in the post
context is mixing ICASE bit into an uninitialized garbage on the
stack.

What's cooking in git.git (Apr 2017, #05; Tue, 25)

2017-04-25 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* jh/verify-index-checksum-only-in-fsck (2017-04-15) 1 commit
  (merged to 'next' on 2017-04-19 at a089b97d34)
 + read-cache: force_verify_index_checksum

 The index file has a trailing SHA-1 checksum to detect file
 corruption, and historically we checked it every time the index
 file is used.  Omit the validation during normal use, and instead
 verify only in "git fsck".

 There is an update to the test script as a follow-up.


* bw/submodule-with-bs-path (2017-04-16) 1 commit
  (merged to 'next' on 2017-04-19 at 692775477b)
 + submodule: prevent backslash expantion in submodule names

 "git submodule" script does not work well with strange pathnames.
 Protect it from a path with slashes in them, at least.


* dt/http-postbuffer-can-be-large (2017-04-13) 1 commit
  (merged to 'next' on 2017-04-19 at dc9de26fc9)
 + http.postbuffer: allow full range of ssize_t values

 Allow the http.postbuffer configuration variable to be set to a
 size that can be expressed in size_t, which can be larger than
 ulong on some platforms.


* dt/xgethostname-nul-termination (2017-04-18) 2 commits
  (merged to 'next' on 2017-04-19 at 4035f442cc)
 + xgethostname: handle long hostnames
 + use HOST_NAME_MAX to size buffers for gethostname(2)

 gethostname(2) may not NUL terminate the buffer if hostname does
 not fit; unfortunately there is no easy way to see if our buffer
 was too small, but at least this will make sure we will not end up
 using garbage past the end of the buffer.


* jh/string-list-micro-optim (2017-04-15) 1 commit
  (merged to 'next' on 2017-04-19 at 2a5e339df9)
 + string-list: use ALLOC_GROW macro when reallocing string_list

 The string-list API used a custom reallocation strategy that was
 very inefficient, instead of using the usual ALLOC_GROW() macro,
 which has been fixed.


* jh/unpack-trees-micro-optim (2017-04-15) 1 commit
  (merged to 'next' on 2017-04-19 at 970c47a2f9)
 + unpack-trees: avoid duplicate ODB lookups during checkout

 In a 2- and 3-way merge of trees, more than one source trees often
 end up sharing an identical subtree; optimize by not reading the
 same tree multiple times in such a case.


* jk/loose-object-fsck (2017-04-16) 1 commit
  (merged to 'next' on 2017-04-19 at 0cc7810018)
 + sha1_file: remove an used fd variable

 Code cleanup.


* jk/ls-files-recurse-submodules-fix (2017-04-18) 2 commits
  (merged to 'next' on 2017-04-19 at d4682ceaae)
 + ls-files: fix path used when recursing into submodules
 + ls-files: fix recurse-submodules with nested submodules

 "ls-files --recurse-submodules" did not quite work well in a
 project with nested submodules.


* jk/quarantine-received-objects (2017-04-16) 3 commits
  (merged to 'next' on 2017-04-19 at 2c6fcf1d6d)
 + refs: reject ref updates while GIT_QUARANTINE_PATH is set
 + receive-pack: document user-visible quarantine effects
 + receive-pack: drop tmp_objdir_env from run_update_hook

 Add finishing touches to a recent topic.


* jk/snprintf-cleanups (2017-04-17) 1 commit
  (merged to 'next' on 2017-04-19 at 4a18ea9971)
 + replace: plug a memory leak

 Hotfix for a topic that is already in 'master'.


* jt/fetch-pack-error-reporting (2017-04-17) 1 commit
  (merged to 'next' on 2017-04-19 at 6da61f7236)
 + fetch-pack: show clearer error message upon ERR

 "git fetch-pack" was not prepared to accept ERR packet that the
 upload-pack can send with a human-readable error message.  It
 showed the packet contents with ERR prefix, so there was no data
 loss, but it was redundant to say "ERR" in an error message.


* km/t1400-modernization (2017-04-16) 1 commit
  (merged to 'next' on 2017-04-19 at 7490be39bf)
 + t1400: use consistent style for test_expect_success calls

 Code cleanup.


* nd/conditional-config-include (2017-04-14) 2 commits
  (merged to 'next' on 2017-04-19 at 2e94c40b46)
 + config: resolve symlinks in conditional include's patterns
 + path.c: and an option to call real_path() in expand_user_path()

 $GIT_DIR may in some cases be normalized with all symlinks resolved
 while "gitdir" path expansion in the pattern does not receive the
 same treatment, leading to incorrect mismatch.  This has been fixed.


* rs/misc-cppcheck-fixes (2017-04-17) 3 commits
  (merged to 'next' on 2017-04-19 at e8fca7f593)
 + server-info: avoid calling fclose(3) twice in update_info_file()
 + files_for_each_reflog_ent_reverse(): close stream and free strbuf on error
 + am: close stream on error, but not stdin

 Various small fixes.


* sb/checkout-recurse

Re: [PATCH v4 05/19] grep: remove redundant `regflags &= ~REG_EXTENDED` assignments

2017-04-25 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason   writes:

> Remove redundant assignments to the "regflags" variable. There are no
> code paths that have previously set the regflags to anything, and
> certainly not to `|= REG_EXTENDED`.
>
> This code gave the impression that it had to reset its environment,
> but it doesn't. This dates back to the initial introduction of
> git-grep in commit 5010cb5fcc ("built-in "git grep"", 2006-04-30).

As long as nobody calls grep_commit_pattern_type() more than once
while interpreting something like "grep -E -G -P -F -G" (i.e. "I
cannot quite decide which pattern type I want to give..."), this
change should be safe.

This made me wonder if we should be doing "opt->regflags =
REG_EXTENDED" for ERE from the same "this is only called once"
reasoning, but regflags is a collection of bits, and presumably bits
other than REG_EXTENDED can be set before the control reaches
grep_commit_pattern_type(), so that one must stay as-is, i.e.
"opt->regflags |= REG_EXTENDED".

Thanks.

Re: [PATCH v8] read-cache: call verify_hdr() in a background thread

2017-04-25 Thread Junio C Hamano

Junio C Hamano  writes:

> g...@jeffhostetler.com writes:
>
>> From: Jeff Hostetler 
>>
>> Version 8 of this patch converts the unit test to use
>> perl to corrupt the index checksum (rather than altering
>> a filename) and also verifies the fsck error message as
>> suggested in response to v7 on the mailing list.
>>
>> If there are no other suggestions, I think this version
>> should be considered final.
>
> Oops.  The earlier one has already been in 'master' for a few days.
> Let's make this an incremental update.  
>
> Is the description in the following something you are OK with (so
> that I can add your sign-off)?
>
> Thanks.

By the way, I said I am fine with the two-liner version in Dscho's
message, but I am also OK with this version that does not use a
separate dd and instead does everything in a single invocation of
Perl.  As long as you've tested this version, there is no point
replacing it with yet another one.

Thanks.

> -- >8 --
> From: Jeff Hostetler 
> Date: Tue, 25 Apr 2017 18:41:09 +
> Subject: t1450: avoid use of "sed" on the index, which is a binary file
>
> The previous step added a path  to the index, and then used
> "sed" to replace this string to  to create a test case where
> the checksum at the end of the file does not match the contents.
>
> Unfortunately, use of "sed" on a non-text file is not portable.
> Instead, use a Perl script that seeks to the end and modifies the
> last byte of the file (where we _know_ stores the trailing
> checksum).
>
>
> ---
>  t/t1450-fsck.sh | 33 ++---
>  1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 677e15a7a4..eff1cd68e9 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to 
> all heads' '
>   ! grep $blob out
>  '
>  
> +# Corrupt the checksum on the index.
> +# Add 1 to the last byte in the SHA.
> +corrupt_index_checksum () {
> +perl -w -e '
> + use Fcntl ":seek";
> + open my $fh, "+<", ".git/index" or die "open: $!";
> + binmode $fh;
> + seek $fh, -1, SEEK_END or die "seek: $!";
> + read $fh, my $in_byte, 1 or die "read: $!";
> +
> + $in_value = unpack("C", $in_byte);
> + $out_value = ($in_value + 1) & 255;
> +
> + $out_byte = pack("C", $out_value);
> +
> + seek $fh, -1, SEEK_END or die "seek: $!";
> + print $fh $out_byte;
> + close $fh or die "close: $!";
> +'
> +}
> +
> +# Corrupt the checksum on the index and then
> +# verify that only fsck notices.
>  test_expect_success 'detect corrupt index file in fsck' '
>   cp .git/index .git/index.backup &&
>   test_when_finished "mv .git/index.backup .git/index" &&
> - echo  > &&
> - git add  &&
> - sed -e "s///" .git/index >.git/index.yyy &&
> - mv .git/index.yyy .git/index &&
> - # Confirm that fsck detects invalid checksum
> - test_must_fail git fsck --cache &&
> - # Confirm that status no longer complains about invalid checksum
> + corrupt_index_checksum &&
> + test_must_fail git fsck --cache 2>expect &&
> + grep "bad index file" expect &&
>   git status
>  '
>

Re: t0025 flaky on OSX

2017-04-25 Thread Torsten Bögershausen





So all in all it seams as if there is a very old race condition here,
which we "never" have seen yet.
Moving the file to a different inode number fixes the test case,
Git doesn't treat it as unchanged any more.


Thanks a lot for investigating this! Would you mind posting the
fix as patch?


That's ongoing.
TC #3 and #4 are fixable, but #5 resists to be cured so far.
I think we need a touch and sleep or so, more the next days (or weeks)

Re: [PATCH v8] read-cache: call verify_hdr() in a background thread

2017-04-25 Thread Junio C Hamano

g...@jeffhostetler.com writes:

> From: Jeff Hostetler 
>
> Version 8 of this patch converts the unit test to use
> perl to corrupt the index checksum (rather than altering
> a filename) and also verifies the fsck error message as
> suggested in response to v7 on the mailing list.
>
> If there are no other suggestions, I think this version
> should be considered final.

Oops.  The earlier one has already been in 'master' for a few days.
Let's make this an incremental update.  

Is the description in the following something you are OK with (so
that I can add your sign-off)?

Thanks.

-- >8 --
From: Jeff Hostetler 
Date: Tue, 25 Apr 2017 18:41:09 +
Subject: t1450: avoid use of "sed" on the index, which is a binary file

The previous step added a path  to the index, and then used
"sed" to replace this string to  to create a test case where
the checksum at the end of the file does not match the contents.

Unfortunately, use of "sed" on a non-text file is not portable.
Instead, use a Perl script that seeks to the end and modifies the
last byte of the file (where we _know_ stores the trailing
checksum).


---
 t/t1450-fsck.sh | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 677e15a7a4..eff1cd68e9 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all 
heads' '
! grep $blob out
 '
 
+# Corrupt the checksum on the index.
+# Add 1 to the last byte in the SHA.
+corrupt_index_checksum () {
+perl -w -e '
+   use Fcntl ":seek";
+   open my $fh, "+<", ".git/index" or die "open: $!";
+   binmode $fh;
+   seek $fh, -1, SEEK_END or die "seek: $!";
+   read $fh, my $in_byte, 1 or die "read: $!";
+
+   $in_value = unpack("C", $in_byte);
+   $out_value = ($in_value + 1) & 255;
+
+   $out_byte = pack("C", $out_value);
+
+   seek $fh, -1, SEEK_END or die "seek: $!";
+   print $fh $out_byte;
+   close $fh or die "close: $!";
+'
+}
+
+# Corrupt the checksum on the index and then
+# verify that only fsck notices.
 test_expect_success 'detect corrupt index file in fsck' '
cp .git/index .git/index.backup &&
test_when_finished "mv .git/index.backup .git/index" &&
-   echo  > &&
-   git add  &&
-   sed -e "s///" .git/index >.git/index.yyy &&
-   mv .git/index.yyy .git/index &&
-   # Confirm that fsck detects invalid checksum
-   test_must_fail git fsck --cache &&
-   # Confirm that status no longer complains about invalid checksum
+   corrupt_index_checksum &&
+   test_must_fail git fsck --cache 2>expect &&
+   grep "bad index file" expect &&
git status
 '

Re: [PATCH] test: remove unused parameter from the wildmatch test

2017-04-25 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

> Right now this is all dead code, but I wonder if instead we should be
> partially reverting commit 70a8fc999d ("stop using fnmatch (either
> native or compat)", 2014-02-15) by Duy to the extent of being able to
> extend t/helper/test-wildmatch.c to test fnmatch() as well.
>
> We wouldn't be using fnmatch(), but I think it's a probably a good
> idea for the tests to support a mode where we have to declare
> explicitly whether something should also match under fnmatch or not,
> so we document the differences.

I am on the fence and can go either way.  What you suggest is
intellectually intriguing even though I am unsure of its practical
value.

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread Junio C Hamano

Jeff King  writes:

> I think the words "instruction list" may have come from my suggestion. I
> used them because that is the term used in the rebase.instructionFormat
> documentation directly above the option you are adding.
>
> It may be worth a follow-on patch to convert that one to "todo list" if
> that's the preferred name.

Running

$ git grep -i -e 'instruction [ls]' -e 'todo l'

lets us count how we call them, and we can see there is only one
instance of 'instruction list'.

Running the above in v1.7.3 tree shows that it was originally called
'todo list', and we can see that an enhancement of cherry-pick in
cd4093b6 ("Merge branch 'rr/revert-cherry-pick-continue'",
2011-10-05)) started calling this instruction sheet around v1.7.8.

A follow-on patch to unify all three would be nice, indeed.

Thanks.

Re: [PATCH v4 8/9] Use uintmax_t for timestamps

2017-04-25 Thread Junio C Hamano

Junio C Hamano  writes:

> Johannes Schindelin  writes:
>
>> In any case, it is a question unrelated to the work I performed in this
>> patch series: the raison d'être of these patches is to allow timestamps to
>> refer to dates that are currently insanely far in the future.
>
> Yes, but the job of the maintainer is to prevent narrow-focused
> individual contributors from throwing us into a hole we cannot dig
> out of by closing the door for plausible future enhancements.

Having said that, IIRC, this series does not tighten the existing
code to specifically check for integer wrap-around anyway, so in a
sense, users who use a timestamp that is in an insanely distant
future is already accepting the risk of getting broken in the
future, so my answer to the question I asked is "it would be extra
nice to future-proof people's data, but not doing anything is
probably OK---at least we is not making things worse."

Re: [PATCH v2 0/9] The final building block for a faster rebase -i

2017-04-25 Thread Junio C Hamano

Johannes Schindelin  writes:

> This patch series reimplements the expensive pre- and post-processing of
> the todo script in C.
>
> And it concludes the work I did to accelerate rebase -i.

I noticed (this is merely "I felt"; I didn't measure) that recent
"rebase -i" sessions I do on a Linux box is already plenty faster
than before, and it is good to finally see the end of the long road.

Thanks.

Re: [PATCH v7] read-cache: force_verify_index_checksum

2017-04-25 Thread Junio C Hamano

Johannes Schindelin  writes:

> Just in case that certain reviewers favor length over readability, let me
> offer this snippet:
>
>   size=$(perl -e "print -s \".git/index\"") &&
>   dd if=/dev/zero of=.git/index bs=1 seek=$(($size-20) count=20

Yup, this does sound like a good direction to go (I think you have
some unbalanced parens but it is sufficient to convey the idea).

> Since whatever hash will be used in the future is most likely larger than
> 20 bytes, this should still work fine (and even if somebody sane replaces
> the SHA-1 of the index with a CRC-32 for the same benefit we have now, the
> test will fail quickly and it is easy to replace the 20 by 4).

True that, too.

Re: Submodule/contents conflict

2017-04-25 Thread Junio C Hamano

"Philip Oakley"  writes:

> As I recall Christoph was using checkout to copy a file (e.g. a
> template file) from an older commit/revision into his worktree, and
> was suprised that this (git checkout  ) also _staged_ the
> file, rather than simply letting it be in a modified/untracked state.

This probably is taking it even further than the original topic, but
I raise this weather-balloon to see if anybody is interested.

In the modern day, it might be useful if the "--working-tree-only"
mode added a new file as an intent-to-add entry to the index, but
that is not what "git apply (no other options)" (which is the gold
standard for command that operates on the working tree and/or on the
index) does, so it is not done in this patch.  IOW, if you grab a
path that does not exist in your index out of , you will
write out an untracked file to the working tree.

-- >8 --
Subject: [PATCH] checkout: add --working-tree-only option

"git checkout  " has always copied the blob from
the tree-ish to the index before checking them out to the working tree.

Some users may want to grab a blob out of a tree-ish directly to the
working tree, without updating the index, so that "git diff" can be
used to assess the damage and adjust the file contents taken from a
different branch to be more appropriate for the current branch.

The new option "--working-tree-only" allows exactly that.

In the hindsight, when a command works on the working tree and/or
the index, the usual convention is:

 - with no other option, the command works only on the working tree;

 - with "--cached" option, the command works only on the index; and

 - with "--index" option, the command works on both the working tree
   and the index.

So we probably should have triggered the default behaviour under the
"--index" option, and triggered this "--working-tree-only" mode of
behaviour when "--index" option is not given.  From the same point
of view, "git checkout --cached  " would have
done the same as "git reset  " would do.  And
that may have made the command set a bit more consistent.

But that is merely a hindsight being 20/20, oh well.

Signed-off-by: Junio C Hamano 
---
 Documentation/git-checkout.txt | 22 +++---
 builtin/checkout.c | 10 +-
 t/t2022-checkout-paths.sh  | 21 +
 3 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index 8e2c0662dd..201677752e 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -14,6 +14,7 @@ SYNOPSIS
 'git checkout' [-q] [-f] [-m] [[-b|-B|--orphan] ] []
 'git checkout' [-f|--ours|--theirs|-m|--conflict=

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread Junio C Hamano

Jacob Keller  writes:

> On Mon, Apr 24, 2017 at 11:29 PM, Junio C Hamano  wrote:
>> Personally I am happy with the beginning of each instruction line
>> aligned, so from that point of view, this patch is a mild Meh to me,
>> even though I do a fair amount of "rebase -i" myself.  But obviously
>> I am not the only user of Git you need to please, so...
>
> I would instead justify this as making it easier to change the action,
> since you only need to rewrite a single letter, which at least in vim
> takes "r" to change the action, vs slightly more keystrokes
> such as "ct

Re: [PATCH v2] sequencer: require trailing NL in footers

2017-04-25 Thread Junio C Hamano

Jonathan Nieder  writes:

> Jonathan Tan wrote:
>
>> Reported-by: Brian Norris 
>> Signed-off-by: Jonathan Tan 
>> ---
> [...]
>>  sequencer.c  | 11 +++
>>  t/t3511-cherry-pick-x.sh | 14 ++
>>  2 files changed, 25 insertions(+)
>
> Reviewed-by: Jonathan Nieder 
>
> This is still pretty subtle (using the added newline that is added after
> a non-footer to turn the invalid footer into a valid one), but
>
>  * it is clear from the code that it will work correctly
>  * the new test ensures we'll continue to support this case
>  * it is understandable after a little head scratching
>  * I'm hoping the subtlety will go away once the code learns to deal
>with ordinary non-footer text that has a missing newline

Hmph, perhaps we need another test that documents a known failure as
well in the meantime?

> [...]
>> --- a/t/t3511-cherry-pick-x.sh
>> +++ b/t/t3511-cherry-pick-x.sh
>> @@ -208,6 +208,20 @@ test_expect_success 'cherry-pick -x -s adds sob even 
>> when trailing sob exists fo
>>  test_cmp expect actual
>>  '
>>  
>> +test_expect_success 'cherry-pick -x handles commits with no NL at end of 
>> message' '
>> +pristine_detach initial &&
>> +sha1=$(printf "title\n\nSigned-off-by: a" | git commit-tree -p initial 
>> mesg-with-footer^{tree}) &&
>
> nit: Should this use a more typical sign-off line with an email
> address, to avoid a false-positive success in case git becomes more
> strict about its signoffs in the future?
>
> Something like
>
>   printf "title\n\nSigned-off-by: S. I. Gner " >msg &&
>   sha1=$(git commit-tree -p initial mesg-with-footer^{tree}...

That is a good point and has an added benefit that the test script
becomes easier to follow.

 t/t3511-cherry-pick-x.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t3511-cherry-pick-x.sh b/t/t3511-cherry-pick-x.sh
index 6f518020b2..c2b143802d 100755
--- a/t/t3511-cherry-pick-x.sh
+++ b/t/t3511-cherry-pick-x.sh
@@ -210,12 +210,14 @@ test_expect_success 'cherry-pick -x -s adds sob even when 
trailing sob exists fo
 
 test_expect_success 'cherry-pick -x handles commits with no NL at end of 
message' '
pristine_detach initial &&
-   sha1=$(printf "title\n\nSigned-off-by: a" | git commit-tree -p initial 
mesg-with-footer^{tree}) &&
+   signer="S. I. Gner " &&
+   printf "title\n\nSigned-off-by: %s" "$signer" >msg &&
+   sha1=$(git commit-tree -p initial mesg-with-footer^{tree} expect &&
title
 
-   Signed-off-by: a
+   Signed-off-by: $signer
(cherry picked from commit $sha1)
EOF
git log -1 --pretty=format:%B >actual &&

Re: [PATCH] tests: fix tests broken under GETTEXT_POISON=YesPlease

2017-04-25 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

> Indeed, I've tried to be careful not to introduce bugs like that, but
> in this skipped case the tests look completely stand-alone to me.

Yes, the ones I commented on in the upthread looked like their side
effect were not felt in the later tests.

> In any case, I like my other patch to just remove this whole thing better.

Hmph, did I miss a patch message?

Re: [PATCH v5 0/8] Introduce timestamp_t for timestamps

2017-04-25 Thread Junio C Hamano

René Scharfe  writes:

> I can't think of many ways to get future time stamps (broken clock,
> broken CMOS battery, bit rot, time travel), so I wouldn't expect a
> change towards better error reporting to affect a lot of users.  (Not
> necessarily as part of this series, of course.)

Better error reporting is one thing, but we do not want to kill "git
log" in the middle by calling die().  

Dropping the patch v4 9/9 that caused us to call die() was a good
thing to do within the scope of this series, I would think.

Re: [PATCH v4 8/9] Use uintmax_t for timestamps

2017-04-25 Thread Junio C Hamano

Johannes Schindelin  writes:

> In any case, it is a question unrelated to the work I performed in this
> patch series: the raison d'être of these patches is to allow timestamps to
> refer to dates that are currently insanely far in the future.

Yes, but the job of the maintainer is to prevent narrow-focused
individual contributors from throwing us into a hole we cannot dig
out of by closing the door for plausible future enhancements.

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread Jeff King

On Tue, Apr 25, 2017 at 08:13:27PM -0400, liam Beguin wrote:

> > > +rebase.abbrevCmd::
> > > + If set to true, `git rebase -i` will abbreviate the command-names in the
> > > + instruction list. This means that instead of looking like this,
> > 
> > This is by no means your fault, but it is really horrible by how many
> > different names Git's documentation refers to the todo script, nothing
> > short of confusing. It is the todo script (which I called it initially,
> > maybe not a good name, but it has the merit of the longest tradition at
> > least), the todo list, the instruction sheet, the rebase script, the
> > instruction list... etc
> > 
> > However, the thing is called "todo list" elsewhere in the same file,
> > therefore lets try to avoid even more confusion and use that term instead
> > of "instruction list" here.
> 
> thanks for pointing this out, I was not quite sure what to call this list.

I think the words "instruction list" may have come from my suggestion. I
used them because that is the term used in the rebase.instructionFormat
documentation directly above the option you are adding.

It may be worth a follow-on patch to convert that one to "todo list" if
that's the preferred name.

-Peff

Re: [PATCH v8 2/2] run-command: restrict PATH search to executable files

2017-04-25 Thread Junio C Hamano

Brandon Williams  writes:

> In some situations run-command will incorrectly try (and fail) to
> execute a directory instead of an executable file.  This was observed by
> having a directory called "ssh" in $PATH before the real ssh and trying
> to use ssh protoccol, reslting in the following:
>
>   $ git ls-remote ssh://url
>   fatal: cannot exec 'ssh': Permission denied
>
> It ends up being worse and run-command will even try to execute a
> non-executable file if it preceeds the executable version of a file on
> the PATH.  For example, if PATH=~/bin1:~/bin2:~/bin3 and there exists a
> directory 'git-hello' in 'bin1', a non-executable file 'git-hello' in
> bin2 and an executable file 'git-hello' (which prints "Hello World!") in
> bin3 the following will occur:
>
>   $ git hello
>   fatal: cannot exec 'git-hello': Permission denied
>
> This is due to only checking 'access()' when locating an executable in
> PATH, which doesn't distinguish between files and directories.  Instead
> use 'is_executable()' which check that the path is to a regular,
> executable file.  Now run-command won't try to execute the directory or
> non-executable file 'git-hello':
>
>   $ git hello
>   Hello World!

Could you add a line after this example, that says something like
"which matches what execvp() would have done with a request to
execute git-hello with such a $PATH."

That is because it can be argued that bin1/git-hello should be found
and get complaint "not an executable file", or that bin1/git-hello
should be skipped but bin2/git-hello should be found and get
complaint "not an executable file", both to help the user diagnose
and fix the broken $PATH (or director contents).  It is the easiest
to justify why we chose this other definition to skip both git-hello
in bin1 and bin2 if that is an established existing practice---we
can say "sure, what you propose also may make sense, but we match
what execvp(3) does".

The patch text looks good.

Thanks.

Re: [PATCH v2] clone: add a --no-tags option to clone without tags

2017-04-25 Thread Junio C Hamano

Jonathan Nieder  writes:

> In other words, I think the commit message needs a bit more detail about
> the use case, to say why omitting those tags is useful.  The use case
> is probably sane but it is not explained.  A side effect (and my main
> motivation) is that this would make it crystal clear to people looking
> at the patch in history that it is talking about tags that are part of
> "master"'s history, not tags pointing elsewhere.

I agree that it is unclear "having no tags, not even the harmless
and usually useful ones that point at the history of the branch of
interest" is the point of this new feature from the documentation
and log message.  

Responding to your other message, I do not think this new feature
should be tied to --single-branch; I think having the tags to mark
commits in the branch's history (while not fetching other tags
irrelevant to the branch's history) is usually what users would
want.

>> Before this the only way of doing this was either by manually tweaking
>> the config in a fresh repository:
>
> Usually commit messages refer to the state of things without some
> patch using the present tense --- e.g. "Without this patch, this
> --no-tags option can be emulated by (1) manually tweaking the config
> in a fresh repository, or (2) by setting tagOpt=--no-tags after
> cloning and deleting any existing tags".

Thanks--I'll use this myself when responding to patches from other
people.  I recall getting irritated while reading some patches and
couldn't pinpoint why they were irritating, and now I realize that
it was because they said "Previously Git did X." and somesuch.

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread liam Beguin

Hi Johannes, 

On Tue, 2017-04-25 at 22:08 +0200, Johannes Schindelin wrote:
> Hi Liam,
> 
> On Tue, 25 Apr 2017, Liam Beguin wrote:
> 
> > Add the 'rebase.abbrevCmd' boolean config option to allow `git rebase -i`
> > to abbreviate the command-names in the instruction list.
> > 
> > This means that `git rebase -i` would print:
> > p deadbee The oneline of this commit
> > ...
> > 
> > instead of:
> > pick deadbee The oneline of this commit
> > ...
> > 
> > Using a single character command-name allows the lines to remain
> > aligned, making the whole set more readable.
> > 
> > Signed-off-by: Liam Beguin 
> 
> Apart from either abbreviating commands after --edit-todo, or documenting
> explicitly that the new config option only concerns the initial todo list,
> there is another problem that just occurred to me: --exec.
> 
> When you call `git rebase -x "make DEVELOPER=1 -j15"`, the idea is to
> append an "exec make DEVELOPER=1 -j15" line after every pick line. The
> code in question looks like this:
> 
> add_exec_commands () {
> {
> first=t
> while read -r insn rest
> do
> case $insn in
> pick)
> test -n "$first" ||
> printf "%s" "$cmd"
> ;;
> esac
> printf "%s %s\n" "$insn" "$rest"
> first=
> done
> printf "%s" "$cmd"
> } <"$1" >"$1.new" &&
> mv "$1.new" "$1"
> }
> 
> Obviously, the git-rebase--interactive script expects at this point that
> the command is spelled out, so your patch needs to change the `pick)` case
> to `p|pick)`, I think.
> 
> In addition, since the rationale for the new option is to align the lines
> better, the `exec` would need to be replaced by `x`, and as multiple `-x`
> options are allowed, you would need something like this at the beginning
> of `add_exec_commands`, too:
> 
>   # abbreviate `exec` if rebase.abbrevCmd is true
>   test p != "$rebasecmd" ||
>   cmd="$(echo "$cmd" | sed 's/^exec/x/')"
> 



> Also:
> 
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index 475e874d5155..8b1877f2df91 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -2614,6 +2614,25 @@ rebase.instructionFormat::
> >     the instruction list during an interactive rebase.  The format will 
> > automatically
> >     have the long commit hash prepended to the format.
> >  
> > +rebase.abbrevCmd::
> 
> It does not fail to amuse that the term "abbrevCmd" is abbreviated
> heavily itself. However, I would strongly suggest to avoid that. It would
> be much more pleasant to call the config option rebase.abbreviateCommands

I tried to use something similar to the rest of the options but I guess that
would be best.

> 
> > +rebase.abbrevCmd::
> > +   If set to true, `git rebase -i` will abbreviate the command-names in the
> > +   instruction list. This means that instead of looking like this,
> 
> This is by no means your fault, but it is really horrible by how many
> different names Git's documentation refers to the todo script, nothing
> short of confusing. It is the todo script (which I called it initially,
> maybe not a good name, but it has the merit of the longest tradition at
> least), the todo list, the instruction sheet, the rebase script, the
> instruction list... etc
> 
> However, the thing is called "todo list" elsewhere in the same file,
> therefore lets try to avoid even more confusion and use that term instead
> of "instruction list" here.

thanks for pointing this out, I was not quite sure what to call this list.

> 
> > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> > index 2c9c0165b5ab..9f3e82b79615 100644
> > --- a/git-rebase--interactive.sh
> > +++ b/git-rebase--interactive.sh
> > @@ -1210,6 +1210,10 @@ else
> >     revisions=$onto...$orig_head
> >     shortrevisions=$shorthead
> >  fi
> > +
> > +rebasecmd=pick
> > +test "$(git config --bool --get rebase.abbrevCmd)" = true && rebasecmd=p
> 
> A better name would be "pickcmd", as there are more rebase commands than
> just `pick` and what we want here is really only associated with one of
> those commands.

Wouldn't that make it confusing when the patch starts to handle other commands?
A common name across the script would limit further confusion.
I noticed that it is already called `action` in `rearrange_squash`.
would that do? (even though it has no reference to 'command')

> 
> Ciao,
> Johannes

Thanks for the detailed answer,
Liam

Re: [PATCH v2] sequencer: require trailing NL in footers

2017-04-25 Thread Jonathan Nieder

Jonathan Tan wrote:

> Reported-by: Brian Norris 
> Signed-off-by: Jonathan Tan 
> ---
[...]
>  sequencer.c  | 11 +++
>  t/t3511-cherry-pick-x.sh | 14 ++
>  2 files changed, 25 insertions(+)

Reviewed-by: Jonathan Nieder 

This is still pretty subtle (using the added newline that is added after
a non-footer to turn the invalid footer into a valid one), but

 * it is clear from the code that it will work correctly
 * the new test ensures we'll continue to support this case
 * it is understandable after a little head scratching
 * I'm hoping the subtlety will go away once the code learns to deal
   with ordinary non-footer text that has a missing newline

[...]
> --- a/t/t3511-cherry-pick-x.sh
> +++ b/t/t3511-cherry-pick-x.sh
> @@ -208,6 +208,20 @@ test_expect_success 'cherry-pick -x -s adds sob even 
> when trailing sob exists fo
>   test_cmp expect actual
>  '
>  
> +test_expect_success 'cherry-pick -x handles commits with no NL at end of 
> message' '
> + pristine_detach initial &&
> + sha1=$(printf "title\n\nSigned-off-by: a" | git commit-tree -p initial 
> mesg-with-footer^{tree}) &&

nit: Should this use a more typical sign-off line with an email
address, to avoid a false-positive success in case git becomes more
strict about its signoffs in the future?

Something like

printf "title\n\nSigned-off-by: S. I. Gner " >msg &&
sha1=$(git commit-tree -p initial mesg-with-footer^{tree}

[PATCH v2] sequencer: require trailing NL in footers

2017-04-25 Thread Jonathan Tan

In commit 967dfd4 ("sequencer: use trailer's trailer layout",
2016-11-29), sequencer was taught to use the same mechanism as
interpret-trailers to determine the nature of the trailer of a commit
message (referred to as the "footer" in sequencer.c). However, the
requirement that a footer end in a newline character was inadvertently
removed. Restore that requirement.

While writing this commit, I noticed that if the "ignore_footer"
parameter in "has_conforming_footer" is greater than the distance
between the trailer start and sb->len, "has_conforming_footer" will
return an unexpected result. This does not occur in practice, because
"ignore_footer" is either zero or the return value of an invocation to
"ignore_non_trailer", which only skips empty lines and comment lines.
This commit contains a comment explaining this in the function's
documentation.

Reported-by: Brian Norris 
Signed-off-by: Jonathan Tan 
---

jrnieder pointed out the existence of commit-tree to me, which I have
used to write the test below.

Changes from v1:
 - added test
 - used one of sbeller's documentation suggestions

 sequencer.c  | 11 +++
 t/t3511-cherry-pick-x.sh | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index 77afecaeb..500a76260 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -151,6 +151,12 @@ static const char *get_todo_path(const struct replay_opts 
*opts)
  * Returns 1 for conforming footer
  * Returns 2 when sob exists within conforming footer
  * Returns 3 when sob exists within conforming footer as last entry
+ *
+ * A footer that does not end in a newline is considered non-conforming.
+ *
+ * ignore_footer, if not zero, should be the return value of an invocation to
+ * ignore_non_trailer(). See the documentation of that function for more
+ * information.
  */
 static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
int ignore_footer)
@@ -159,6 +165,11 @@ static int has_conforming_footer(struct strbuf *sb, struct 
strbuf *sob,
int i;
int found_sob = 0, found_sob_last = 0;
 
+   if (sb->len <= ignore_footer)
+   return 0;
+   if (sb->buf[sb->len - ignore_footer - 1] != '\n')
+   return 0;
+
trailer_info_get(&info, sb->buf);
 
if (info.trailer_start == info.trailer_end)
diff --git a/t/t3511-cherry-pick-x.sh b/t/t3511-cherry-pick-x.sh
index bf0a5c988..6f518020b 100755
--- a/t/t3511-cherry-pick-x.sh
+++ b/t/t3511-cherry-pick-x.sh
@@ -208,6 +208,20 @@ test_expect_success 'cherry-pick -x -s adds sob even when 
trailing sob exists fo
test_cmp expect actual
 '
 
+test_expect_success 'cherry-pick -x handles commits with no NL at end of 
message' '
+   pristine_detach initial &&
+   sha1=$(printf "title\n\nSigned-off-by: a" | git commit-tree -p initial 
mesg-with-footer^{tree}) &&
+   git cherry-pick -x $sha1 &&
+   cat <<-EOF >expect &&
+   title
+
+   Signed-off-by: a
+   (cherry picked from commit $sha1)
+   EOF
+   git log -1 --pretty=format:%B >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'cherry-pick -x treats "(cherry picked from..." line as 
part of footer' '
pristine_detach initial &&
sha1=$(git rev-parse mesg-with-cherry-footer^0) &&
-- 
2.13.0.rc0.306.g87b477812d-goog

Re: [PATCH v8 2/2] run-command: restrict PATH search to executable files

2017-04-25 Thread Jonathan Nieder

Brandon Williams wrote:

> In some situations run-command will incorrectly try (and fail) to
> execute a directory instead of an executable file.  This was observed by
> having a directory called "ssh" in $PATH before the real ssh and trying
> to use ssh protoccol, reslting in the following:
>
>   $ git ls-remote ssh://url
>   fatal: cannot exec 'ssh': Permission denied
>
> It ends up being worse and run-command will even try to execute a
> non-executable file if it preceeds the executable version of a file on
> the PATH.  For example, if PATH=~/bin1:~/bin2:~/bin3 and there exists a
> directory 'git-hello' in 'bin1', a non-executable file 'git-hello' in
> bin2 and an executable file 'git-hello' (which prints "Hello World!") in
> bin3 the following will occur:
>
>   $ git hello
>   fatal: cannot exec 'git-hello': Permission denied
>
> This is due to only checking 'access()' when locating an executable in
> PATH, which doesn't distinguish between files and directories.  Instead
> use 'is_executable()' which check that the path is to a regular,
> executable file.  Now run-command won't try to execute the directory or
> non-executable file 'git-hello':
>
>   $ git hello
>   Hello World!
>
> Reported-by: Brian Hatfield 
> Signed-off-by: Brandon Williams 
> ---
>  run-command.c  | 19 ++-
>  t/t0061-run-command.sh | 30 ++
>  2 files changed, 48 insertions(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 

Thanks.  Patch left unsnipped for reference.

> diff --git a/run-command.c b/run-command.c
> index 2ffbd7e67..9e36151bf 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -159,6 +159,23 @@ int is_executable(const char *name)
>   return st.st_mode & S_IXUSR;
>  }
>  
> +/*
> + * Search $PATH for a command.  This emulates the path search that
> + * execvp would perform, without actually executing the command so it
> + * can be used before fork() to prepare to run a command using
> + * execve() or after execvp() to diagnose why it failed.
> + *
> + * The caller should ensure that file contains no directory
> + * separators.
> + *
> + * Returns the path to the command, as found in $PATH or NULL if the
> + * command could not be found.  The caller inherits ownership of the memory
> + * used to store the resultant path.
> + *
> + * This should not be used on Windows, where the $PATH search rules
> + * are more complicated (e.g., a search for "foo" should find
> + * "foo.exe").
> + */
>  static char *locate_in_PATH(const char *file)
>  {
>   const char *p = getenv("PATH");
> @@ -179,7 +196,7 @@ static char *locate_in_PATH(const char *file)
>   }
>   strbuf_addstr(&buf, file);
>  
> - if (!access(buf.buf, F_OK))
> + if (is_executable(buf.buf))
>   return strbuf_detach(&buf, NULL);
>  
>   if (!*end)
> diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
> index 98c09dd98..e4739170a 100755
> --- a/t/t0061-run-command.sh
> +++ b/t/t0061-run-command.sh
> @@ -37,6 +37,36 @@ test_expect_success !MINGW 'run_command can run a script 
> without a #! line' '
>   test_cmp empty err
>  '
>  
> +test_expect_success 'run_command does not try to execute a directory' '
> + test_when_finished "rm -rf bin1 bin2" &&
> + mkdir -p bin1/greet bin2 &&
> + write_script bin2/greet <<-\EOF &&
> + cat bin2/greet
> + EOF
> +
> + PATH=$PWD/bin1:$PWD/bin2:$PATH \
> + test-run-command run-command greet >actual 2>err &&
> + test_cmp bin2/greet actual &&
> + test_cmp empty err
> +'
> +
> +test_expect_success POSIXPERM 'run_command passes over non-executable file' '
> + test_when_finished "rm -rf bin1 bin2" &&
> + mkdir -p bin1 bin2 &&
> + write_script bin1/greet <<-\EOF &&
> + cat bin1/greet
> + EOF
> + chmod -x bin1/greet &&
> + write_script bin2/greet <<-\EOF &&
> + cat bin2/greet
> + EOF
> +
> + PATH=$PWD/bin1:$PWD/bin2:$PATH \
> + test-run-command run-command greet >actual 2>err &&
> + test_cmp bin2/greet actual &&
> + test_cmp empty err
> +'
> +
>  test_expect_success POSIXPERM 'run_command reports EACCES' '
>   cat hello-script >hello.sh &&
>   chmod -x hello.sh &&

Re: [PATCH v8 1/2] run-command: expose is_executable function

2017-04-25 Thread Jonathan Nieder

Brandon Williams wrote:

> Move the logic for 'is_executable()' from help.c to run_command.c and
> expose it so that callers from outside help.c can access the function.
> This is to enable run-command to be able to query if a file is
> executable in a future patch.
>
> Signed-off-by: Brandon Williams 
> ---
>  help.c| 43 +--
>  run-command.c | 42 ++
>  run-command.h |  1 +
>  3 files changed, 44 insertions(+), 42 deletions(-)

Reviewed-by: Jonathan Nieder

[PATCH v8 2/2] run-command: restrict PATH search to executable files

2017-04-25 Thread Brandon Williams

In some situations run-command will incorrectly try (and fail) to
execute a directory instead of an executable file.  This was observed by
having a directory called "ssh" in $PATH before the real ssh and trying
to use ssh protoccol, reslting in the following:

$ git ls-remote ssh://url
fatal: cannot exec 'ssh': Permission denied

It ends up being worse and run-command will even try to execute a
non-executable file if it preceeds the executable version of a file on
the PATH.  For example, if PATH=~/bin1:~/bin2:~/bin3 and there exists a
directory 'git-hello' in 'bin1', a non-executable file 'git-hello' in
bin2 and an executable file 'git-hello' (which prints "Hello World!") in
bin3 the following will occur:

$ git hello
fatal: cannot exec 'git-hello': Permission denied

This is due to only checking 'access()' when locating an executable in
PATH, which doesn't distinguish between files and directories.  Instead
use 'is_executable()' which check that the path is to a regular,
executable file.  Now run-command won't try to execute the directory or
non-executable file 'git-hello':

$ git hello
Hello World!

Reported-by: Brian Hatfield 
Signed-off-by: Brandon Williams 
---
 run-command.c  | 19 ++-
 t/t0061-run-command.sh | 30 ++
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index 2ffbd7e67..9e36151bf 100644
--- a/run-command.c
+++ b/run-command.c
@@ -159,6 +159,23 @@ int is_executable(const char *name)
return st.st_mode & S_IXUSR;
 }
 
+/*
+ * Search $PATH for a command.  This emulates the path search that
+ * execvp would perform, without actually executing the command so it
+ * can be used before fork() to prepare to run a command using
+ * execve() or after execvp() to diagnose why it failed.
+ *
+ * The caller should ensure that file contains no directory
+ * separators.
+ *
+ * Returns the path to the command, as found in $PATH or NULL if the
+ * command could not be found.  The caller inherits ownership of the memory
+ * used to store the resultant path.
+ *
+ * This should not be used on Windows, where the $PATH search rules
+ * are more complicated (e.g., a search for "foo" should find
+ * "foo.exe").
+ */
 static char *locate_in_PATH(const char *file)
 {
const char *p = getenv("PATH");
@@ -179,7 +196,7 @@ static char *locate_in_PATH(const char *file)
}
strbuf_addstr(&buf, file);
 
-   if (!access(buf.buf, F_OK))
+   if (is_executable(buf.buf))
return strbuf_detach(&buf, NULL);
 
if (!*end)
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 98c09dd98..e4739170a 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -37,6 +37,36 @@ test_expect_success !MINGW 'run_command can run a script 
without a #! line' '
test_cmp empty err
 '
 
+test_expect_success 'run_command does not try to execute a directory' '
+   test_when_finished "rm -rf bin1 bin2" &&
+   mkdir -p bin1/greet bin2 &&
+   write_script bin2/greet <<-\EOF &&
+   cat bin2/greet
+   EOF
+
+   PATH=$PWD/bin1:$PWD/bin2:$PATH \
+   test-run-command run-command greet >actual 2>err &&
+   test_cmp bin2/greet actual &&
+   test_cmp empty err
+'
+
+test_expect_success POSIXPERM 'run_command passes over non-executable file' '
+   test_when_finished "rm -rf bin1 bin2" &&
+   mkdir -p bin1 bin2 &&
+   write_script bin1/greet <<-\EOF &&
+   cat bin1/greet
+   EOF
+   chmod -x bin1/greet &&
+   write_script bin2/greet <<-\EOF &&
+   cat bin2/greet
+   EOF
+
+   PATH=$PWD/bin1:$PWD/bin2:$PATH \
+   test-run-command run-command greet >actual 2>err &&
+   test_cmp bin2/greet actual &&
+   test_cmp empty err
+'
+
 test_expect_success POSIXPERM 'run_command reports EACCES' '
cat hello-script >hello.sh &&
chmod -x hello.sh &&
-- 
2.13.0.rc0.306.g87b477812d-goog

[PATCH v8 1/2] run-command: expose is_executable function

2017-04-25 Thread Brandon Williams

Move the logic for 'is_executable()' from help.c to run_command.c and
expose it so that callers from outside help.c can access the function.
This is to enable run-command to be able to query if a file is
executable in a future patch.

Signed-off-by: Brandon Williams 
---
 help.c| 43 +--
 run-command.c | 42 ++
 run-command.h |  1 +
 3 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/help.c b/help.c
index bc6cd19cf..0c65a2d21 100644
--- a/help.c
+++ b/help.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "builtin.h"
 #include "exec_cmd.h"
+#include "run-command.h"
 #include "levenshtein.h"
 #include "help.h"
 #include "common-cmds.h"
@@ -96,48 +97,6 @@ static void pretty_print_cmdnames(struct cmdnames *cmds, 
unsigned int colopts)
string_list_clear(&list, 0);
 }
 
-static int is_executable(const char *name)
-{
-   struct stat st;
-
-   if (stat(name, &st) || /* stat, not lstat */
-   !S_ISREG(st.st_mode))
-   return 0;
-
-#if defined(GIT_WINDOWS_NATIVE)
-   /*
-* On Windows there is no executable bit. The file extension
-* indicates whether it can be run as an executable, and Git
-* has special-handling to detect scripts and launch them
-* through the indicated script interpreter. We test for the
-* file extension first because virus scanners may make
-* it quite expensive to open many files.
-*/
-   if (ends_with(name, ".exe"))
-   return S_IXUSR;
-
-{
-   /*
-* Now that we know it does not have an executable extension,
-* peek into the file instead.
-*/
-   char buf[3] = { 0 };
-   int n;
-   int fd = open(name, O_RDONLY);
-   st.st_mode &= ~S_IXUSR;
-   if (fd >= 0) {
-   n = read(fd, buf, 2);
-   if (n == 2)
-   /* look for a she-bang */
-   if (!strcmp(buf, "#!"))
-   st.st_mode |= S_IXUSR;
-   close(fd);
-   }
-}
-#endif
-   return st.st_mode & S_IXUSR;
-}
-
 static void list_commands_in_dir(struct cmdnames *cmds,
 const char *path,
 const char *prefix)
diff --git a/run-command.c b/run-command.c
index a97d7bf9f..2ffbd7e67 100644
--- a/run-command.c
+++ b/run-command.c
@@ -117,6 +117,48 @@ static inline void close_pair(int fd[2])
close(fd[1]);
 }
 
+int is_executable(const char *name)
+{
+   struct stat st;
+
+   if (stat(name, &st) || /* stat, not lstat */
+   !S_ISREG(st.st_mode))
+   return 0;
+
+#if defined(GIT_WINDOWS_NATIVE)
+   /*
+* On Windows there is no executable bit. The file extension
+* indicates whether it can be run as an executable, and Git
+* has special-handling to detect scripts and launch them
+* through the indicated script interpreter. We test for the
+* file extension first because virus scanners may make
+* it quite expensive to open many files.
+*/
+   if (ends_with(name, ".exe"))
+   return S_IXUSR;
+
+{
+   /*
+* Now that we know it does not have an executable extension,
+* peek into the file instead.
+*/
+   char buf[3] = { 0 };
+   int n;
+   int fd = open(name, O_RDONLY);
+   st.st_mode &= ~S_IXUSR;
+   if (fd >= 0) {
+   n = read(fd, buf, 2);
+   if (n == 2)
+   /* look for a she-bang */
+   if (!strcmp(buf, "#!"))
+   st.st_mode |= S_IXUSR;
+   close(fd);
+   }
+}
+#endif
+   return st.st_mode & S_IXUSR;
+}
+
 static char *locate_in_PATH(const char *file)
 {
const char *p = getenv("PATH");
diff --git a/run-command.h b/run-command.h
index 4fa8f65ad..3932420ec 100644
--- a/run-command.h
+++ b/run-command.h
@@ -51,6 +51,7 @@ struct child_process {
 #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT }
 void child_process_init(struct child_process *);
 void child_process_clear(struct child_process *);
+extern int is_executable(const char *name);
 
 int start_command(struct child_process *);
 int finish_command(struct child_process *);
-- 
2.13.0.rc0.306.g87b477812d-goog

Re: [PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Jonathan Nieder

Johannes Schindelin wrote:
>> On Tue, Apr 25, 2017 at 2:51 PM, Jonathan Tan  
>> wrote:
>>> On 04/25/2017 02:04 PM, Stefan Beller wrote:

 Do we want to test for this use case in the future?
[...]
>>> I'm not sure of the value of including a test for this specific use case,
>>> because Git normally does not create commit messages with no trailing
>>> newlines. (To test this, I suspect I would need to use hash-object with a
>>> specifically crafted commit object.)
[...]
> Just because Git usually does not create commit messages without trailing
> newlines does not mean that it is a rare thing. We got bug reports about
> this, so I think it is frequent enough that we could save time by adding
> that test and avoid future bug reports/bug hunts.

To put it another way: it wouldn't have been worth writing this patch
if it wasn't worth keeping the behavior.  So we do need a test.

Thanks,
Jonathan

Possible regression in git-http-backend

2017-04-25 Thread Daniel Wallace

I am not sure if this is a regression or not, but I wanted to get feedback.

It looks like this commit changed some behavior in git-http-backend

https://git.kernel.org/pub/scm/git/git.git/commit/?id=6bc0cb5176a5e42ca4a74e3558e8f0790ed09bb1

The change that it has made is that it no git-upload-pack hangs when
uwsgi doesn't close stdin.

http://lists.unbit.it/pipermail/uwsgi/2016-April/008440.html

This was the only place I could find this discussed online for uwsgi
(but I did find several blog posts comments mentioning the issue).  It
looks like it was worked around in uwsgi >= 2.0.13 by adding the
setting `cgi-close-stdin-on-eof = true`.  But there was some
discussion last year that this shouldn't be needed based on the RFC
for cgi?

http://lists.unbit.it/pipermail/uwsgi/2016-May/008462.html
http://web.archive.org/web/20100212094332/http://hoohoo.ncsa.illinois.edu/cgi/in.html

"""
The server will send CONTENT_LENGTH bytes on this file descriptor.
Remember that it will give the CONTENT_TYPE of the data as well. The
server is in no way obligated to send end-of-file after the script reads
CONTENT_LENGTH bytes.
"""

Here is a gist for providing a config setup, I used the latest nginx
from the nginx repo for centos 7 and uwsgi 2.0.14 from epel.  The
setup requires starting nginx.service and uwsgi.service and creating a
bare repo in /srv/git/ and running

https://gist.github.com/gtmanfred/8b2e26b07db3c75094d0607048b2c828

Thanks,
Daniel

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread liam Beguin

Hi Jake, 

On Tue, 2017-04-25 at 01:29 -0700, Jacob Keller wrote:
> On Mon, Apr 24, 2017 at 11:29 PM, Junio C Hamano  wrote:
> > Personally I am happy with the beginning of each instruction line
> > aligned, so from that point of view, this patch is a mild Meh to me,
> > even though I do a fair amount of "rebase -i" myself.  But obviously
> > I am not the only user of Git you need to please, so...
> 
> I would instead justify this as making it easier to change the action,
> since you only need to rewrite a single letter, which at least in vim
> takes "r" to change the action, vs slightly more keystrokes
> such as "ct  
> Also, if you change the default commit hash length, it becomes long
> enough to cover most commits and you see all commits at say 12 digits
> commit hash and everything is nicely aligned.
> 
> Thanks,
> Jake

Thanks,
Liam

Re: [PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Jonathan Nieder

Hi,

Jonathan Tan wrote:

> In commit 967dfd4 ("sequencer: use trailer's trailer layout",
> 2016-11-29), sequencer was taught to use the same mechanism as
> interpret-trailers to determine the nature of the trailer of a commit
> message (referred to as the "footer" in sequencer.c). However, the
> requirement that a footer end in a newline character was inadvertently
> removed. Restore that requirement.
>
> While writing this commit, I noticed that if the "ignore_footer"
> parameter in "has_conforming_footer" is greater than the distance
> between the trailer start and sb->len, "has_conforming_footer" will
> return an unexpected result. This does not occur in practice, because
> "ignore_footer" is either zero or the return value of an invocation to
> "ignore_non_trailer", which only skips empty lines and comment lines.
> This commit contains a comment explaining this in the function's
> documentation.
>
> Reported-by: Brian Norris 
> Signed-off-by: Jonathan Tan 
> ---
[...]
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -151,6 +151,12 @@ static const char *get_todo_path(const struct 
> replay_opts *opts)
>   * Returns 1 for conforming footer
>   * Returns 2 when sob exists within conforming footer
>   * Returns 3 when sob exists within conforming footer as last entry
> + *
> + * A footer that does not end in a newline is considered non-conforming.
> + *
> + * ignore_footer, if not zero, should be the return value of an invocation to
> + * ignore_non_trailer. See the documentation of that function for more
> + * information.
>   */
>  static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
>   int ignore_footer)
> @@ -159,6 +165,11 @@ static int has_conforming_footer(struct strbuf *sb, 
> struct strbuf *sob,
>   int i;
>   int found_sob = 0, found_sob_last = 0;
>  
> + if (sb->len <= ignore_footer)
> + return 0;
> + if (sb->buf[sb->len - ignore_footer - 1] != '\n')
> + return 0;
> +

This is super subtle, but it does the right thing.  The caller will
notice it's not a conforming footer, add a newline to separate the new
footer from it, and repair the footer as a side effect.

Reviewed-by: Jonathan Nieder 

Followup question: what should happen if there is a non-footer-shaped
thing with no trailing newline at the end of the commit message? Should
we add two newlines in that case when producing a new footer?

Thanks,
Jonathan

Re: [PATCH] rebase -i: add config to abbreviate command name

2017-04-25 Thread liam BEGUIN

Hi Johannes,

On Tue, 2017-04-25 at 21:45 +0200, Johannes Schindelin wrote:
> Hi Liam,
> 
> On Mon, 24 Apr 2017, liam BEGUIN wrote:
> 
> > On Mon, 2017-04-24 at 12:26 +0200, Johannes Schindelin wrote:
> > 
> > > On Sun, 23 Apr 2017, Liam Beguin wrote:
> > > 
> > > > Add the 'rebase.abbrevCmd' boolean config option to allow the user
> > > > to abbreviate the default command name while editing the
> > > > 'git-rebase-todo' file.
> > > 
> > > This patch does not handle the `git rebase --edit-todo` subcommand.
> > > Intentional?
> > 
> > After a little more investigation, I'm not sure what should be added for
> > the `git rebase --edit-todo` subcommand. It seems like it uses the same
> > text that was added the first time (with `git rebase -i`).
> 
> Well, it uses whatever the user may have edited. It may surprise users
> that their `pick` does not get converted to `p` like all the original
> commands.
> 

It makes more sens to me now, I'll add it in next patch

> Ciao,
> Johannes

Thanks, 
Liam

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread liam BEGUIN

Hi Johannes,


On Tue, 2017-04-25 at 23:23 +0200, Johannes Schindelin wrote:
> Hi Andreas,
> 
> On Tue, 25 Apr 2017, Andreas Schwab wrote:
> 
> > On Apr 25 2017, Liam Beguin  wrote:
> > 
> > > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > > index 475e874d5155..8b1877f2df91 100644
> > > --- a/Documentation/config.txt
> > > +++ b/Documentation/config.txt
> > > @@ -2614,6 +2614,25 @@ rebase.instructionFormat::
> > >   the instruction list during an interactive rebase.  The format will 
> > > automatically
> > >   have the long commit hash prepended to the format.
> > >  
> > > +rebase.abbrevCmd::
> > > + If set to true, `git rebase -i` will abbreviate the command-names in the
> > > + instruction list. This means that instead of looking like this,
> > > +
> > > +---
> > > + pick deadbee The oneline of this commit
> > > + pick fa1afe1 The oneline of the next commit
> > > + ...
> > > +---
> > > +
> > > + the list would use the short version of the command resulting in
> > > + something like this.
> > > +
> > > +---
> > > + p deadbee The oneline of this commit
> > > + p fa1afe1 The oneline of the next commit
> > > + ...
> > > +---
> > 
> > That doesn't explain the point of the option.
> 
> And what you forgot to say in order to make this a constructive criticism
> is: we probably want to add a sentence like this:
> 
> 
>   Using the one-letter abbreviations will align the lines better
>   in case that the non-abbreviated commands have different lengths.
> 
> Speaking of commands with different lengths, I just thought of fixup and
> squash. I do not think those are handled by the patch, but they should be
> (the `action` in the first loop of `rearrange_squash` should abbreviate
> via `test p != "$pickcmd" || action=${action%${action#?}}`).
> 

I just noticed this today, I'll make changes to handle this case. 

> Ciao,
> Johannes

Thanks,
Liam

Re: [PATCH v2] clone: add a --no-tags option to clone without tags

2017-04-25 Thread Jonathan Nieder

Hi,

Ævar Arnfjörð Bjarmason wrote:

> Add a --no-tags option to "git clone" to clone without tags. Currently
> there's no easy way to clone a repository and end up with just a
> "master" branch via --single-branch, or track all branches and no
> tags. Now --no-tags can be added to "git clone" with or without
> --single-branch to clone a repository without tags.

Now I've read the discussion from v1, so you can see my thoughts
evolving in real time. :)

The above feels a bit misleading when it says "there's no easy way to
clone a repository and end up with just a 'master' branch".
--single-branch does exactly that.  Some annotated tags *pointing to
its history* come along for the ride, but what harm are they doing?

In other words, I think the commit message needs a bit more detail about
the use case, to say why omitting those tags is useful.  The use case
is probably sane but it is not explained.  A side effect (and my main
motivation) is that this would make it crystal clear to people looking
at the patch in history that it is talking about tags that are part of
"master"'s history, not tags pointing elsewhere.

> Before this the only way of doing this was either by manually tweaking
> the config in a fresh repository:

Usually commit messages refer to the state of things without some
patch using the present tense --- e.g. "Without this patch, this
--no-tags option can be emulated by (1) manually tweaking the config
in a fresh repository, or (2) by setting tagOpt=--no-tags after
cloning and deleting any existing tags".

[...]
> Which of course was also subtly buggy if --branch was pointed at a
> tag, leaving the user in a detached head:
>
> git clone --single-branch --branch v2.12.0 g...@github.com:git/git.git &&
> cd git &&
> git config remote.origin.tagOpt --no-tags &&
> git tag -l | xargs git tag -d

At this point I lose the trail of thought.  I don't think it's
important to understanding the patch.

> Now all this complexity becomes the much simpler:
>
> git clone --single-branch --no-tags g...@github.com:git/git.git
>
> Or in the case of cloning a single tag "branch":
>
> git clone --single-branch --branch v2.12.0 --no-tags 
> g...@github.com:git/git.git

Nice.

[...]
>  Documentation/git-clone.txt | 14 -
>  builtin/clone.c | 13 ++--
>  t/t5612-clone-refspec.sh| 73 
> +++--
>  3 files changed, 95 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index 30052cce49..57b3f478ed 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -13,7 +13,7 @@ SYNOPSIS
> [-l] [-s] [--no-hardlinks] [-q] [-n] [--bare] [--mirror]
> [-o ] [-b ] [-u ] [--reference ]
> [--dissociate] [--separate-git-dir ]
> -   [--depth ] [--[no-]single-branch]
> +   [--depth ] [--[no-]single-branch] [--no-tags]

Can I pass --tags to negate a previous --no-tags?

[...]
> +--no-tags::
> + Don't clone any tags, and set
> + `remote..tagOpt=--no-tags` in the config, ensuring
> + that future `git pull` and `git fetch` operations won't follow
> + any tags. Subsequent explicit tag fetches will still work,
> + (see linkgit:git-fetch[1]).
> ++
> +Can be used in conjunction with `--single-branch` to clone & maintain

nit: s/&/and/

[...]
> +test_expect_success 'clone with --no-tags' '
> + (
> + cd dir_all_no_tags && git fetch &&
> + git for-each-ref refs/tags >../actual

nit: this would be easier to read with the 'cd' and 'git fetch' on
separate lines.

[...]
> +test_expect_success '--single-branch while HEAD pointing at master and 
> --no-tags' '
> + (
> + cd dir_master_no_tags && git fetch &&

Likewise.

> + git for-each-ref refs/remotes/origin |
> + sed -e "/HEAD$/d" \
> + -e "s|/remotes/origin/|/heads/|" >../actual

Can $/ be expanded by the shell?

The rest looks sensible.

Thanks and hope that helps,
Jonathan

Re: [PATCH] clone: add a --no-tags option to clone without tags

2017-04-25 Thread Jonathan Nieder

Hi,

Ævar Arnfjörð Bjarmason wrote:

> Add a --no-tags option to "git clone" to clone without tags. Currently
> there's no easy way to clone a repository and end up with just a
> "master" branch via --single-branch, or track all branches and no
> tags. Now --no-tags can be added to "git clone" with or without
> --single-branch to clone a repository without tags.

Could --single-branch be made to imply --no-tags, like --depth implies
--single-branch?  After all, all I wanted is that one branch, not some
tags.

Callers who really want the tags could still pass --tags to request
that.

Just thinking out loud,
Jonathan

Re: [PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Brian Norris

On Tue, Apr 25, 2017 at 02:56:53PM -0700, Stefan Beller wrote:
> In that case: Is it needed to hint at how this bug occurred in the wild?
> (A different Git implementation, which may be fixed now?)

I might contact the original author, but it's easy enough to imagine
automated 'filter-branch' scripts that could produce this. e.g., this
trivial example:

git filter-branch --msg-filter "perl -pe 'chomp if eof'" HEAD^..

Re: [PATCH v5 0/8] Introduce timestamp_t for timestamps

2017-04-25 Thread Johannes Schindelin

Hi René,

On Tue, 25 Apr 2017, René Scharfe wrote:

> Am 24.04.2017 um 15:57 schrieb Johannes Schindelin:
> > Git v2.9.2 was released in a hurry to accomodate for platforms like
> > Windows, where the `unsigned long` data type is 32-bit even for 64-bit
> > setups.
> > 
> > The quick fix was to simply disable all the testing with "absurd"
> > future dates.
> > 
> > However, we can do much better than that, as we already make use of
> > 64-bit data types internally. There is no good reason why we should
> > not use the same for timestamps. Hence, let's use uintmax_t for
> > timestamps.
> > 
> > Note: while the `time_t` data type exists and is meant to be used for
> > timestamps, on 32-bit Linux it is *still* 32-bit. An earlier iteration
> > used `time_t` for that reason, but it came with a few serious
> > downsides: as `time_t` can be signed (and indeed, on Windows it is an
> > int64_t), Git's expectation that 0 is the minimal value does no longer
> > hold true, introducing its own set of interesting challenges. Besides,
> > if we *can* handle far in the future timestamps (except for formatting
> > them using the system libraries), it is more consistent to do so.
> 
> time_t is signed on Linux and BSDs as well.

s/is/happens to be/.

The point is: we must not rely on time_t to be signed just because it
*happens* to be the case on the setups to which we have access. We want
Git to be portable, not only "portable to our own setups".

> Using an unsigned type gives us the ability to represent times beyond
> the 292 billion years in the future that int64_t would give us, but
> prevents recording events that occurred before the Epoch.  That doesn't
> sound like a good deal to me -- storing historical works (e.g. law
> texts) with real time stamps is probably more interesting than fixing
> the year 292277026596 problem within this decade.

It sounds like a good deal to me, if the alternative is that *I* have to
patch Git's source code to support signed timestamps, when *I* am probably
the only one in this entire thread who does not even want them.

So could y'all please just stop talking about signed timestamps to me? I
feel that they are really, really start to irritate the hell out of me.

> > The upside of using `uintmax_t` for timestamps is that we do a much
> > better job to support far in the future timestamps across all
> > platforms, including 32-bit ones. The downside is that those platforms
> > that use a 32-bit `time_t` will barf when parsing or formatting those
> > timestamps.
> 
> IIUC this series has two aims: solving the year 2038 problem on 32-bit
> Linux by replacing time_t (int32_t), and solving the year 2106 problem
> on Windows by replacing unsigned long (uint32_t), right?

No. The series has one aim: to stop using `unsigned long` (which is
ill-defined to begin with) for timestamps.

> The latter one sounds more interesting, because 32-bit platforms would
> still be unable to fully use bigger time values as you wrote above.
> 
> Can we leave time_t alone and just do the part where you replace
> unsigned long with timestamp_t defined as uint64_t?  That should already
> help on Windows, correct?  When/if timestamp_t is later changed to a
> signed type then we could easily convert the time_t cases to timestamp_t
> as well, or the other way around.

This patch series leaves time_t alone already, so your wish has been
fulfilled preemptively.

> > This iteration makes the date_overflows() check more stringent again.
> > 
> > It is arguably a bug to paper over too-large author/committer dates and
> > to replace them with Jan 1 1970 without even telling the user that we do
> > that, but this is the behavior that t4212 verifies, so I reinstated that
> > behavior. The change in behavior was missed because of the missing
> > unsigned_add_overflows() test.
> 
> I can't think of many ways to get future time stamps (broken clock,
> broken CMOS battery, bit rot, time travel), so I wouldn't expect a
> change towards better error reporting to affect a lot of users.  (Not
> necessarily as part of this series, of course.)

If you want to suggest that we should stop verifying overflows when a
complex reasoning can prove that the overflow is not happening in a
billion years, I disagree. Not only is it unnecessarily time-consuming to
ask readers to perform the complex reasoning, and not only is there enough
room for bugs to hide in plain sight (because of the complexity), it also
makes the same code harder to reuse in other software where a different
timestamp data type was chosen (or inherited from previous Git versions).

I'd much rather have easy-to-reason code that does not cause head
scratching (like the "why do we ignore a too large timestamp?" triggering
`if (date_overflows(date)) date = 0;`) than pretending to be smart and
clever and make everybody else feel stupid by forcing them through hoops
of thinking bubbles until they also reached the conclusion that this
actually won't happen. Unless there is a bug i

Re: [PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Johannes Schindelin

Hi Stefan,

On Tue, 25 Apr 2017, Stefan Beller wrote:

> On Tue, Apr 25, 2017 at 2:51 PM, Jonathan Tan  
> wrote:
> > On 04/25/2017 02:04 PM, Stefan Beller wrote:
> >>
> >> Thanks for the fix. :)
> >> Do we want to test for this use case in the future?
> >
> >
> > Thanks!
> >
> > I'm not sure of the value of including a test for this specific use case,
> > because Git normally does not create commit messages with no trailing
> > newlines. (To test this, I suspect I would need to use hash-object with a
> > specifically crafted commit object.)
> 
> Okay, makes sense to omit a test.
> In that case: Is it needed to hint at how this bug occurred in the wild?
> (A different Git implementation, which may be fixed now?)

Just because Git usually does not create commit messages without trailing
newlines does not mean that it is a rare thing. We got bug reports about
this, so I think it is frequent enough that we could save time by adding
that test and avoid future bug reports/bug hunts.

Ciao,
Johannes

Re: [PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Stefan Beller

On Tue, Apr 25, 2017 at 2:51 PM, Jonathan Tan  wrote:
> On 04/25/2017 02:04 PM, Stefan Beller wrote:
>>
>> Thanks for the fix. :)
>> Do we want to test for this use case in the future?
>
>
> Thanks!
>
> I'm not sure of the value of including a test for this specific use case,
> because Git normally does not create commit messages with no trailing
> newlines. (To test this, I suspect I would need to use hash-object with a
> specifically crafted commit object.)

Okay, makes sense to omit a test.
In that case: Is it needed to hint at how this bug occurred in the wild?
(A different Git implementation, which may be fixed now?)

Re: [PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Johannes Schindelin

Hi Jonathan,

On Tue, 25 Apr 2017, Jonathan Tan wrote:

> Thanks for the bug report. Here's a fix - I've verified this with the
> way to reproduce provided in the original e-mail, and it seems to work
> now.

If there is a straight-forward way to convert the manual test into an
automated one, it would be good to make it part of the patch so that we
can stave off future regressions of the fix.

Ciao,
Dscho

Re: [PATCH v5 0/8] Introduce timestamp_t for timestamps

2017-04-25 Thread René Scharfe


Am 24.04.2017 um 15:57 schrieb Johannes Schindelin:

Git v2.9.2 was released in a hurry to accomodate for platforms like
Windows, where the `unsigned long` data type is 32-bit even for 64-bit
setups.

The quick fix was to simply disable all the testing with "absurd" future
dates.

However, we can do much better than that, as we already make use of
64-bit data types internally. There is no good reason why we should not
use the same for timestamps. Hence, let's use uintmax_t for timestamps.

Note: while the `time_t` data type exists and is meant to be used for
timestamps, on 32-bit Linux it is *still* 32-bit. An earlier iteration
used `time_t` for that reason, but it came with a few serious downsides:
as `time_t` can be signed (and indeed, on Windows it is an int64_t),
Git's expectation that 0 is the minimal value does no longer hold true,
introducing its own set of interesting challenges. Besides, if we *can*
handle far in the future timestamps (except for formatting them using
the system libraries), it is more consistent to do so.


time_t is signed on Linux and BSDs as well.

Using an unsigned type gives us the ability to represent times beyond
the 292 billion years in the future that int64_t would give us, but
prevents recording events that occurred before the Epoch.  That doesn't
sound like a good deal to me -- storing historical works (e.g. law
texts) with real time stamps is probably more interesting than fixing
the year 292277026596 problem within this decade.


The upside of using `uintmax_t` for timestamps is that we do a much
better job to support far in the future timestamps across all platforms,
including 32-bit ones. The downside is that those platforms that use a
32-bit `time_t` will barf when parsing or formatting those timestamps.


IIUC this series has two aims: solving the year 2038 problem on 32-bit
Linux by replacing time_t (int32_t), and solving the year 2106 problem
on Windows by replacing unsigned long (uint32_t), right?  The latter one
sounds more interesting, because 32-bit platforms would still be unable
to fully use bigger time values as you wrote above.

Can we leave time_t alone and just do the part where you replace
unsigned long with timestamp_t defined as uint64_t?  That should already
help on Windows, correct?  When/if timestamp_t is later changed to a
signed type then we could easily convert the time_t cases to timestamp_t
as well, or the other way around.


This iteration makes the date_overflows() check more stringent again.

It is arguably a bug to paper over too-large author/committer dates and
to replace them with Jan 1 1970 without even telling the user that we do
that, but this is the behavior that t4212 verifies, so I reinstated that
behavior. The change in behavior was missed because of the missing
unsigned_add_overflows() test.


I can't think of many ways to get future time stamps (broken clock,
broken CMOS battery, bit rot, time travel), so I wouldn't expect a
change towards better error reporting to affect a lot of users.  (Not
necessarily as part of this series, of course.)


Changes since v4:

- in gm_time_t(), we now test specifically that the timezone adjustment
   neither underflows nor overflows.

- the patch introduced in v4 that tried to defer the date_overflows()
   check to gm_time_t() rather than replacing the ident timestamp by a 0
   without any warning was dropped again: it broke t4212.


Johannes Schindelin (8):
   ref-filter: avoid using `unsigned long` for catch-all data type
   t0006 & t5000: prepare for 64-bit timestamps
   t0006 & t5000: skip "far in the future" test when time_t is too
 limited
   Specify explicitly where we parse timestamps
   Introduce a new "printf format" for timestamps
   Introduce a new data type for timestamps
   Abort if the system time cannot handle one of our timestamps
   Use uintmax_t for timestamps

  Documentation/technical/api-parse-options.txt |   8 +-
  archive-tar.c |   5 +-
  archive-zip.c |  12 ++-
  archive.h |   2 +-
  builtin/am.c  |   4 +-
  builtin/blame.c   |  14 ++--
  builtin/fsck.c|   6 +-
  builtin/gc.c  |   2 +-
  builtin/log.c |   4 +-
  builtin/merge-base.c  |   2 +-
  builtin/name-rev.c|   6 +-
  builtin/pack-objects.c|   4 +-
  builtin/prune.c   |   4 +-
  builtin/receive-pack.c|  14 ++--
  builtin/reflog.c  |  24 +++---
  builtin/rev-list.c|   2 +-
  builtin/rev-parse.c   |   2 +-
  builtin/show-branch.c |   4 +-
  builtin/worktree.c|   4 +-
  bundle.c

Re: [PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Jonathan Tan


On 04/25/2017 02:04 PM, Stefan Beller wrote:

Thanks for the fix. :)
Do we want to test for this use case in the future?


Thanks!

I'm not sure of the value of including a test for this specific use 
case, because Git normally does not create commit messages with no 
trailing newlines. (To test this, I suspect I would need to use 
hash-object with a specifically crafted commit object.)





@@ -151,6 +151,12 @@ static const char *get_todo_path(const struct replay_opts 
*opts)
  * Returns 1 for conforming footer
  * Returns 2 when sob exists within conforming footer
  * Returns 3 when sob exists within conforming footer as last entry
+ *
+ * A footer that does not end in a newline is considered non-conforming.
+ *
+ * ignore_footer, if not zero, should be the return value of an invocation to
+ * ignore_non_trailer. See the documentation of that function for more
+ * information.
  */


Makes sense. Maybe s/ignore_non_trailer/ignore_non_trailer()/ which makes
it easier to recognize it as a function? I'd also drop the last
sentence as it is
implied in the previous sentence (sort of).


OK, I'll add the parentheses if I need to reroll. As for the last 
sentence, the documentation for ignore_non_trailer is a bit unusual in 
that it says specifically "to be fed as the second parameter to 
append_signoff()", so I want to call extra attention to that.

Re: Git For Windows SDK - cannot build

2017-04-25 Thread Johannes Schindelin

Hi Stepan,

as I commented on your post to the Git for Windows mailing list:

On Tue, 25 Apr 2017, Stepan Kasal wrote:

> I have installed git for windows sdk from the web and tried to build it.

Nobody in the project found the time to update the website yet, but we do
have a simpler way to install the SDK now:

git clone https://github.com/git-for-windows/git-sdk-64

> $ make
> LINK git-credential-store
> libgit.a(utf8.o): In function `reencode_string_iconv':
> /usr/src/git/utf8.c:463: undefined reference to `libiconv'
> /usr/src/git/utf8.c:463:(.text+0xf77): relocation truncated to fit: 
> R_X86_64_PC32 against undefined symbol `libiconv'

This most likely means that `pacman -Sy mingw-w64-x86_64-libiconv` should
be called before re-running `make`.

The reason why the `libiconv` package (without `mingw-w64-` prefix) is not
helping: that package is an *MSYS2* package, i.e. it implicitly links to
the MSYS2 runtime (the POSIX emulation layer derived from Cygwin that we
use in Git for Windows to run Shell and Perl scripts, as well as OpenSSH).

Ciao,
Johannes

Re: BUG: diff-{index,files,tree} (and git-gui) do not respect the diff.indentHeuristic config setting

2017-04-25 Thread Jeff King

On Tue, Apr 25, 2017 at 01:21:09PM -0400, Marc Branchaud wrote:

> So I have
> 
>   diff.indentHeuristic = true
> 
> and I noticed that git-gui was not using the heuristic.  This is because
> git-gui uses diff-index, and that does not respect the config setting, even
> though it supports the --indent-heuristic option.
> 
> And it looks like diff-files and diff-tree also have the same problem.
> 
> I tried a couple of quick-n-dirty things to fix it in diff-index, without
> success, and I've run out of git-hacking tame, so all I can do for now is
> throw out a bug report.

Right, this is intentional. Those commands are scriptable plumbing, and
we try to avoid surprising their callers with behavior-changing config.

I could see an argument that the behavior change for this particular
option is subtle enough that it any script caller would not need to
care. The resulting diff is syntactically identical, and it's not like
Git makes promises about the exact diff algorithm it uses. We generally
tend to err on the side of caution with plumbing, though (for instance,
if you piped the result through patch-id, you'd probably get different
results with and without the config option set).

> diff-index.c explicitly says "no 'diff' UI options" since 83ad63cfeb ("diff:
> do not use configuration magic at the core-level", 2006-07-08), so maybe
> this needs to be fixed in git-gui (and maybe elsewhere), but to me it feels
> like the diff-foo commands should respect the setting.

Yes, if git-gui wants to respect the option, it needs to read the config
and turn on the command-line option. That's what add--interactive does,
for example.

It's possible we could make this simpler with a catch-all "allow diff
ui config" command-line option. The effect is the same, but it makes
things simpler for the caller.

-Peff

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread Johannes Schindelin

Hi Andreas,

On Tue, 25 Apr 2017, Andreas Schwab wrote:

> On Apr 25 2017, Liam Beguin  wrote:
> 
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index 475e874d5155..8b1877f2df91 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -2614,6 +2614,25 @@ rebase.instructionFormat::
> > the instruction list during an interactive rebase.  The format will 
> > automatically
> > have the long commit hash prepended to the format.
> >  
> > +rebase.abbrevCmd::
> > +   If set to true, `git rebase -i` will abbreviate the command-names in the
> > +   instruction list. This means that instead of looking like this,
> > +
> > +---
> > +   pick deadbee The oneline of this commit
> > +   pick fa1afe1 The oneline of the next commit
> > +   ...
> > +---
> > +
> > +   the list would use the short version of the command resulting in
> > +   something like this.
> > +
> > +---
> > +   p deadbee The oneline of this commit
> > +   p fa1afe1 The oneline of the next commit
> > +   ...
> > +---
> 
> That doesn't explain the point of the option.

And what you forgot to say in order to make this a constructive criticism
is: we probably want to add a sentence like this:

Using the one-letter abbreviations will align the lines better
in case that the non-abbreviated commands have different lengths.

Speaking of commands with different lengths, I just thought of fixup and
squash. I do not think those are handled by the patch, but they should be
(the `action` in the first loop of `rearrange_squash` should abbreviate
via `test p != "$pickcmd" || action=${action%${action#?}}`).

Ciao,
Johannes

Re: Inquiries about the use of captured images etc.

2017-04-25 Thread Johannes Schindelin

Hi,

On Mon, 24 Apr 2017, suzuki-kenta wrote:

> Git image and capture screen,
> I'd like to get a license but is it possible?

If I understand correctly, you want to use screenshots of Git? As far as I
can tell, there is no need to license those. There are tons of examples
all over the internet where users blog about how they use Git...

Ciao,
Johannes

[PATCH v4 16/19] grep: add support for the PCRE v1 JIT API

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.

When JIT support is enabled the PCRE performance usually improves by
more than 50%. The pattern compilation times are relatively slower,
but those relative numbers are tiny, and are easily made back in all
but the most trivial cases of grep. Detailed benchhmarks are available
at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run
is, shown with git --word-diff:

7820.1: extended with how.to   
[-0.28(1.23+0.44)-]{+0.28(1.18+0.39)+}
7820.2: extended with ^how to  
[-0.26(1.15+0.38)-]{+0.27(1.13+0.40)+}
7820.3: extended with \w+our\w*
[-6.06(38.44+0.35)-]{+6.11(38.66+0.32)+}
7820.4: extended with -?-?-?-?-?-?-?-?-?-?-?---$   
[-0.37(1.57+0.38)-]{+0.37(1.56+0.42)+}
7820.5: pcre1 with how.to  
[-0.26(1.15+0.37)-]{+0.19(0.39+0.55)+}
7820.6: pcre1 with ^how to 
[-0.46(2.66+0.31)-]{+0.22(0.67+0.44)+}
7820.7: pcre1 with \w+our\w*   
[-16.42(99.42+0.48)-]{+0.51(3.05+0.24)+}
7820.8: pcre1 with -?-?-?-?-?-?-?-?-?-?-?---$  
[-81.52(275.37+0.41)-]{+5.16(19.31+0.33)+}

The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.

There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will abort. I don't think this is
worth handling, it'll only fail in cases where malloc() doesn't work,
in which case we're screwed anyway.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 27 ++-
 grep.h |  5 +
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index c61e69deaa..8e3ba164b5 100644
--- a/grep.c
+++ b/grep.c
@@ -329,6 +329,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
const char *error;
int erroffset;
int options = PCRE_MULTILINE;
+#ifdef PCRE_CONFIG_JIT
+   int canjit;
+#endif
 
if (opt->ignore_case) {
if (has_non_ascii(p->pattern))
@@ -343,9 +346,19 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
if (!p->pcre1_regexp)
compile_regexp_failed(p, error);
 
-   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 
PCRE_STUDY_JIT_COMPILE, &error);
if (!p->pcre1_extra_info && error)
die("%s", error);
+
+#ifdef PCRE_CONFIG_JIT
+   pcre_config(PCRE_CONFIG_JIT, &canjit);
+   if (canjit == 1) {
+   p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
+   if (!p->pcre1_jit_stack)
+   die("BUG: Couldn't allocate PCRE JIT stack");
+   pcre_assign_jit_stack(p->pcre1_extra_info, NULL, 
p->pcre1_jit_stack);
+   }
+#endif
 }
 
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
@@ -356,8 +369,15 @@ static int pcre1match(struct grep_pat *p, const char 
*line, const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
+#ifdef PCRE_CONFIG_JIT
+   ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - 
line,
+   0, flags, ovector, ARRAY_SIZE(ovector),
+   p->pcre1_jit_stack);
+#else
ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
0, flags, ovector, ARRAY_SIZE(ovector));
+#endif
+
if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
die("pcre_exec failed with error code %d", ret);
if (ret > 0) {
@@ -372,7 +392,12 @@ static int pcre1match(struct grep_pat *p, const char 
*line, const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {
pcre_free(p->pcre1_regexp);
+#ifdef PCRE_CONFIG_JIT
+   pcre_free_study(p->pcre1_extra_info);
+   pcre_jit_stack_free(p->pcre1_jit_stack);
+#else
pcre_free(p->pcre1_extra_info);
+#endif
pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
diff --git a/grep.h b/grep.h
index fa2ab9485f..29e20bf837 100644
--- a/grep.h
+++ b/grep.h
@@ -3,9 +3,13 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include 
+#ifndef PCRE_STUDY_JIT_COMPILE
+#define PCRE_STUDY_JIT_COMPILE 0
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
+typedef int pcre_jit_stack;
 #endif
 #include "kwset.h"
 #include "thread-utils.h"
@@ -48,6 +52,7 @@ struct grep_pat {
regex_t regexp;
pcre *pcre1_regexp;
pcre_extra *pcre1_extra_info;
+   pcre_jit_st

[PATCH v4 18/19] grep: remove support for concurrent use of both PCRE v1 & v2

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Remove the support for concurrently using PCRE v1 & v2 by compiling
Git with support for both at the same time.

Having access to both at runtime via grep.patternType=[pcre1|pcre2]
makes it easier for the developer hacking on the PCRE implementations
to test them concurrently, but adds confusion for everyone else,
particularly Git users who have no reason to concurrently use both
libraries.

Now either USE_LIBPCRE1=YesPlease (or its alias USE_LIBPCRE) or
USE_LIBPCRE2=YesPlease can be supplied when building git, but
providing both will yield an error, similarly providing both
--with-libpcre1 & --with-libpcre2 to the configure script will produce
an error.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt   |  7 
 Makefile   | 36 +--
 builtin/grep.c |  3 --
 configure.ac   | 72 +-
 grep.c | 19 +-
 grep.h |  4 +--
 t/README   | 12 ---
 t/perf/p7820-grep-engines.sh   |  2 +-
 t/t7810-grep.sh| 30 +---
 t/t7814-grep-recurse-submodules.sh | 14 +---
 t/test-lib.sh  |  5 ++-
 11 files changed, 65 insertions(+), 139 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a5fc482495..475e874d51 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1624,13 +1624,6 @@ grep.patternType::
'fixed', or 'perl' will enable the `--basic-regexp`, 
`--extended-regexp`,
`--fixed-strings`, or `--perl-regexp` option accordingly, while the
value 'default' will return to the default matching behavior.
-+
-The 'perl' and 'pcre' values are synonyms. Depending on which PCRE
-library Git was compiled with either or both of 'pcre1' and 'pcre2'
-might also be available.
-+
-If both are available Git currently defaults to 'pcre1', but this
-might change in future versions.
 
 grep.extendedRegexp::
If set to true, enable `--extended-regexp` option by default. This
diff --git a/Makefile b/Makefile
index afdde49cda..a792f206b9 100644
--- a/Makefile
+++ b/Makefile
@@ -29,16 +29,13 @@ all::
 # Perl-compatible regular expressions instead of standard or extended
 # POSIX regular expressions.
 #
-# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
-# /foo/bar/include and /foo/bar/lib directories.
+# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
+# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
+# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
+# default in future releases.
 #
-# Define USE_LIBPCRE2 if you have and want to use libpcre2. Various
-# commands such as log and grep offer runtime options to use
-# Perl-compatible regular expressions instead of standard or extended
-# POSIX regular expressions.
-#
-# Define LIBPCRE2DIR=/foo/bar if your libpcre2 header and library
-# files are in /foo/bar/include and /foo/bar/lib directories.
+# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
+# /foo/bar/include and /foo/bar/lib directories.
 #
 # Define HAVE_ALLOCA_H if you have working alloca(3) defined in that header.
 #
@@ -1093,24 +1090,27 @@ ifdef NO_LIBGEN_H
COMPAT_OBJS += compat/basename.o
 endif
 
-ifdef USE_LIBPCRE
-   BASIC_CFLAGS += -DUSE_LIBPCRE1
-   ifdef LIBPCREDIR
-   BASIC_CFLAGS += -I$(LIBPCREDIR)/include
-   EXTLIBS += -L$(LIBPCREDIR)/$(lib) 
$(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+USE_LIBPCRE1 ?= $(USE_LIBPCRE)
+
+ifneq (,$(USE_LIBPCRE1))
+   ifdef USE_LIBPCRE2
+$(error Only set USE_LIBPCRE1 (or its alias USE_LIBPCRE) or USE_LIBPCRE2, not 
both!)
endif
+
+   BASIC_CFLAGS += -DUSE_LIBPCRE1
EXTLIBS += -lpcre
 endif
 
 ifdef USE_LIBPCRE2
BASIC_CFLAGS += -DUSE_LIBPCRE2
-   ifdef LIBPCRE2DIR
-   BASIC_CFLAGS += -I$(LIBPCRE2DIR)/include
-   EXTLIBS += -L$(LIBPCRE2DIR)/$(lib) 
$(CC_LD_DYNPATH)$(LIBPCR2EDIR)/$(lib)
-   endif
EXTLIBS += -lpcre2-8
 endif
 
+ifdef LIBPCREDIR
+   BASIC_CFLAGS += -I$(LIBPCREDIR)/include
+   EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+endif
+
 ifdef HAVE_ALLOCA_H
BASIC_CFLAGS += -DHAVE_ALLOCA_H
 endif
diff --git a/builtin/grep.c b/builtin/grep.c
index 178b10aa6f..be3dbd6957 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -495,9 +495,6 @@ static void compile_submodule_options(const struct grep_opt 
*opt,
break;
case GREP_PATTERN_TYPE_UNSPECIFIED:
break;
-   case GREP_PATTERN_TYPE_PCRE1:
-   case GREP_PATTERN_TYPE_PCRE2:
-   die("BUG: Command-line option for pcre1 or pcre2 added without 
updating switch statement");
default:
die("BUG: Added a new grep pattern type without updating switch 
statement");

[PATCH v4 19/19] Makefile & configure: make PCRE v2 the default PCRE implementation

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Change the USE_LIBPCRE=YesPlease & --with-libpcre flags to the
Makefile & configure script, respectively, to mean use PCRE v2, not
PCRE v1.

The legacy library previously available via those options is still
available on request via USE_LIBPCRE1=YesPlease or
--with-libpcre1. The existing USE_LIBPCRE2=YesPlease & --with-libpcre2
still explicitly ask for v2.

The v2 PCRE is stable & end-user compatible, all this change does is
change the default. Someone building a new git is likely to also have
packaged PCRE v2 sometime in the last 2 years since it was released.
If not they can choose to use the legacy v2 library by making the
trivial s/USE_LIBPCRE/USE_LIBPCRE1/ change, or package up PCRE v2.

New releases of PCRE v2 are already faster than PCRE v1, and not only
is all significant development is happening on v2, but bugs in
reported against v1 have started getting WONTFIX'd asking users to
just upgrade to v2.

So it makes sense to give our downstream distributors a nudge to
switch over to it.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile | 21 ++---
 configure.ac | 18 +-
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/Makefile b/Makefile
index a792f206b9..42b09f9632 100644
--- a/Makefile
+++ b/Makefile
@@ -29,10 +29,10 @@ all::
 # Perl-compatible regular expressions instead of standard or extended
 # POSIX regular expressions.
 #
-# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
-# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
-# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
-# default in future releases.
+# The USE_LIBPCRE flag is a synonym for USE_LIBPCRE2, in previous
+# versions it meant the same thing USE_LIBPCRE1 does now. Define
+# USE_LIBPCRE1 instead if you'd like to use the legacy version 1 of
+# the PCRE library.
 #
 # Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
@@ -1090,18 +1090,17 @@ ifdef NO_LIBGEN_H
COMPAT_OBJS += compat/basename.o
 endif
 
-USE_LIBPCRE1 ?= $(USE_LIBPCRE)
-
-ifneq (,$(USE_LIBPCRE1))
-   ifdef USE_LIBPCRE2
-$(error Only set USE_LIBPCRE1 (or its alias USE_LIBPCRE) or USE_LIBPCRE2, not 
both!)
-   endif
+USE_LIBPCRE2 ?= $(USE_LIBPCRE)
 
+ifdef USE_LIBPCRE1
BASIC_CFLAGS += -DUSE_LIBPCRE1
EXTLIBS += -lpcre
 endif
 
-ifdef USE_LIBPCRE2
+ifneq (,$(USE_LIBPCRE2))
+   ifdef USE_LIBPCRE1
+$(error Only set USE_LIBPCRE2 (or its alias USE_LIBPCRE) or USE_LIBPCRE1, not 
both!)
+   endif
BASIC_CFLAGS += -DUSE_LIBPCRE2
EXTLIBS += -lpcre2-8
 endif
diff --git a/configure.ac b/configure.ac
index 11d083fbe0..f9659daeb7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -255,25 +255,25 @@ GIT_PARSE_WITH([openssl]))
 # Perl-compatible regular expressions instead of standard or extended
 # POSIX regular expressions.
 #
-# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
-# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
-# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
-# default in future releases.
+# The USE_LIBPCRE flag is a synonym for USE_LIBPCRE2, in previous
+# versions it meant the same thing USE_LIBPCRE1 does now. Define
+# USE_LIBPCRE1 instead if you'd like to use the legacy version 1 of
+# the PCRE library.
 #
 # Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
 #
 AC_ARG_WITH(libpcre,
-AS_HELP_STRING([--with-libpcre],[synonym for --with-libpcre1]),
+AS_HELP_STRING([--with-libpcre],[synonym for --with-libpcre2]),
 if test "$withval" = "no"; then
-   USE_LIBPCRE1=
+   USE_LIBPCRE2=
 elif test "$withval" = "yes"; then
-   USE_LIBPCRE1=YesPlease
+   USE_LIBPCRE2=YesPlease
 else
-   USE_LIBPCRE1=YesPlease
+   USE_LIBPCRE2=YesPlease
LIBPCREDIR=$withval
AC_MSG_NOTICE([Setting LIBPCREDIR to $LIBPCREDIR])
-dnl USE_LIBPCRE1 can still be modified below, so don't substitute
+dnl USE_LIBPCRE2 can still be modified below, so don't substitute
 dnl it yet.
GIT_CONF_SUBST([LIBPCREDIR])
 fi)
-- 
2.11.0

[PATCH v4 17/19] grep: add support for PCRE v2

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar-ish, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with any combination of
USE_LIBPCRE=[YesPlease|] & USE_LIBPCRE2=[YesPlease|]. If both are
provided the version of the PCRE library can be selected at runtime
with grep.PatternType, but the default (for now) is v1.

This table shows what the various combinations do, depending on what
libraries Git is compiled against:

|--+-+-+--|
| grep.PatternType | v1  | v2  | v1 && v2 |
|--+-+-+--|
| perl | v1  | v2  | v1   |
| pcre | v1  | v2  | v1   |
| pcre1| v1  | ERR | v1   |
| pcre2| ERR | v2  | v2   |
|--+-+-+--|

When Git is only compiled with v2 grep.PatternType=perl, --perl-regexp
& -P will use v2. All tests pass with this new PCRE version. When Git
is compiled with both v1 & v2 most of the tests will only test v1, but
there are some v2-specific tests that will be run.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries have almost exactly the same
performance, with PCRE v2 being around 1% slower.

However after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, he
subsequently optimized some of the JIT functionality in v2 of the
library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both them and git compiled as -O3, and
the test run against linux.git, gives the following results:

7820.1: extended with how.to   0.27(1.22+0.34)
7820.2: extended with ^how to  0.27(1.18+0.36)
7820.3: extended with \w+our\w*6.09(38.64+0.32)
7820.4: extended with -?-?-?-?-?-?-?-?-?-?-?---$   0.38(1.69+0.28)
7820.5: pcre1 with how.to  0.19(0.42+0.53)
7820.6: pcre1 with ^how to 0.23(0.58+0.50)
7820.7: pcre1 with \w+our\w*   0.50(2.93+0.34)
7820.8: pcre1 with -?-?-?-?-?-?-?-?-?-?-?---$  5.12(19.32+0.38)
7820.9: pcre2 with how.to  0.19(0.34+0.57)
7820.10: pcre2 with ^how to0.19(0.29+0.60)
7820.11: pcre2 with \w+our\w*  0.51(2.85+0.41)
7820.12: pcre2 with -?-?-?-?-?-?-?-?-?-?-?---$ 5.04(19.27+0.31)

I.e. the two are neck-to-neck, but PCRE v2 usually pulls ahead, when
it does it's up to 20% faster.

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt   |  14 ++--
 Makefile   |  18 +
 builtin/grep.c |   7 +-
 configure.ac   |  49 +++-
 grep.c | 154 -
 grep.h |  21 -
 revision.c |   2 +-
 t/README   |  12 +++
 t/perf/p7820-grep-engines.sh   |   2 +-
 t/t7810-grep.sh|  30 +++-
 t/t7813-grep-icase-iso.sh  |  11 ++-
 t/t7814-grep-recurse-submodules.sh |  12 ++-
 t/test-lib.sh  |   4 +-
 13 files changed, 309 insertions(+), 27 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5ef12d0694..a5fc482495 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1625,14 +1625,12 @@ grep.patternType::
`--fixed-strings`, or `--perl-regexp` option accordingly, while the
value 'default' will return to the default matching behavior.
 +
-The 'pcre

[PATCH v4 15/19] perf: add a performance comparison test of grep -E and -P

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Add a very basic performance comparison test comparing the POSIX
extended & pcre1 engines.

I'm skipping the "basic" POSIX engine because supporting its alternate
regex syntax is hard, although it would be interesting to test it, at
least under glibc it seems to be an entirely different engine, since
it can have very different performance even for patterns that mean the
same thing under extended and non-extended POSIX regular expression
syntax.

Running this on an i7 3.4GHz Linux 3.16.0-4 Debian testing against a
checkout of linux.git & latest upstream PCRE, both PCRE and git
compiled with -O3:

$ GIT_PERF_LARGE_REPO=~/g/linux ./run p7820-grep-engines.sh
[...]
Test   this tree

-
7820.1: extended with how.to   0.28(1.23+0.44)
7820.2: extended with ^how to  0.26(1.15+0.38)
7820.3: extended with \w+our\w*6.06(38.44+0.35)
7820.4: extended with -?-?-?-?-?-?-?-?-?-?-?---$   0.37(1.57+0.38)
7820.5: pcre1 with how.to  0.26(1.15+0.37)
7820.6: pcre1 with ^how to 0.46(2.66+0.31)
7820.7: pcre1 with \w+our\w*   16.42(99.42+0.48)
7820.8: pcre1 with -?-?-?-?-?-?-?-?-?-?-?---$  
81.52(275.37+0.41)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/p7820-grep-engines.sh | 25 +
 1 file changed, 25 insertions(+)
 create mode 100755 t/perf/p7820-grep-engines.sh

diff --git a/t/perf/p7820-grep-engines.sh b/t/perf/p7820-grep-engines.sh
new file mode 100755
index 00..5ae42ceccc
--- /dev/null
+++ b/t/perf/p7820-grep-engines.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+test_description="Comparison of git-grep's regex engines"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for engine in extended pcre1
+do
+   # Patterns stolen from http://sljit.sourceforge.net/pcre.html
+   for pattern in \
+   'how.to' \
+   '^how to' \
+   '\w+our\w*' \
+   '-?-?-?-?-?-?-?-?-?-?-?---$'
+   do
+   test_perf "$engine with $pattern" "
+   git -c grep.patternType=$engine grep -- '$pattern' || :
+   "
+   done
+done
+
+test_done
-- 
2.11.0

[PATCH v4 11/19] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl"

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Make the pattern types "pcre" & "pcre1" synonyms for the long-standing
"perl" grep.patternType.

This change is part of a longer patch series to add pcre2 support to
Git. It's nice to be able to performance test PCRE v1 v.s. v2 without
having to recompile git, and doing that via grep.patternType makes
sense.

However, just adding "pcre2" when we only have "perl" would be
confusing, so start by adding a "pcre" & "pcre1" synonym.

In the future "perl" and "pcre" might be changed to default to "pcre2"
instead of "pcre1", and depending on how Git is compiled the more
specific "pcre1" or "pcre2" pattern types might produce an error.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt   |  9 +
 grep.c |  4 +++-
 t/t7810-grep.sh| 10 ++
 t/t7814-grep-recurse-submodules.sh |  4 
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..5ef12d0694 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1624,6 +1624,15 @@ grep.patternType::
'fixed', or 'perl' will enable the `--basic-regexp`, 
`--extended-regexp`,
`--fixed-strings`, or `--perl-regexp` option accordingly, while the
value 'default' will return to the default matching behavior.
++
+The 'pcre' and 'pcre1' values are synonyms for 'perl'. The other
+values starting with 'pcre' are reserved for future use, e.g. if we'd
+like to use 'pcre2' for the PCRE v2 library.
++
+In the future 'perl' and 'pcre' might become synonyms for some other
+implementation or PCRE version, such as 'pcre2', while the more
+specific 'pcre1' & 'pcre2' might throw errors depending on whether git
+is compiled to include those libraries.
 
 grep.extendedRegexp::
If set to true, enable `--extended-regexp` option by default. This
diff --git a/grep.c b/grep.c
index 6995f0989a..62c3749d9f 100644
--- a/grep.c
+++ b/grep.c
@@ -60,7 +60,9 @@ static int parse_pattern_type_arg(const char *opt, const char 
*arg)
return GREP_PATTERN_TYPE_ERE;
else if (!strcmp(arg, "fixed"))
return GREP_PATTERN_TYPE_FIXED;
-   else if (!strcmp(arg, "perl"))
+   else if (!strcmp(arg, "perl") ||
+!strcmp(arg, "pcre") ||
+!strcmp(arg, "pcre1"))
return GREP_PATTERN_TYPE_PCRE;
die("bad %s argument: %s", opt, arg);
 }
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index ec8fe585a7..7199356d35 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1512,4 +1512,14 @@ test_expect_success 'grep does not report i-t-a and 
assume unchanged with -L' '
test_cmp expected actual
 '
 
+test_expect_success LIBPCRE "grep with grep.patternType synonyms 
perl/pcre/pcre1" '
+   echo "#include " >expected &&
+   git -c grep.patternType=perl  grep -h --no-line-number "st(?=dio)" 
>actual &&
+   test_cmp expected actual &&
+   git -c grep.patternType=pcre  grep -h --no-line-number "st(?=dio)" 
>actual &&
+   test_cmp expected actual &&
+   git -c grep.patternType=pcre1 grep -h --no-line-number "st(?=dio)" 
>actual &&
+   test_cmp expected actual
+'
+
 test_done
diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index e1822feaf1..325fde53ef 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -358,6 +358,10 @@ test_expect_success 'grep --recurse-submodules should pass 
the pattern type alon
EOF
test_cmp expect actual &&
git -c grep.patternType=perl grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual &&
+   git -c grep.patternType=pcre grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual &&
+   git -c grep.patternType=pcre1 grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
test_cmp expect actual
fi
 '
-- 
2.11.0

[PATCH v4 08/19] log: add exhaustive tests for pattern style options & config

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Add exhaustive tests for how the different grep.patternType options &
the corresponding command-line options affect git-log.

Before this change it was possible to patch revision.c so that the
--basic-regexp option was synonymous with --extended-regexp, and
--perl-regexp wasn't recognized at all, and still have 100% of the
test suite pass.

This was because the first test being modified here, added in commit
34a4ae55b2 ("log --grep: use the same helper to set -E/-F options as
"git grep"", 2012-10-03), didn't actually check whether we'd enabled
extended regular expressions as distinct from re-toggling non-fixed
string support.

Fix that by changing the pattern to a pattern that'll only match if
--extended-regexp option is provided, but won't match under the
default --basic-regexp option.

Other potential regressions were possible since there were no tests
for the rest of the combinations of grep.patternType configuration
toggles & corresponding git-log command-line options. Add exhaustive
tests for those.

The patterns being passed to fixed/basic/extended/PCRE are carefully
crafted to return the wrong thing if the grep engine were to pick any
other matching method than the one it's told to use.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t4202-log.sh | 77 +-
 1 file changed, 76 insertions(+), 1 deletion(-)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index f577990716..fc186f10ea 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -262,7 +262,23 @@ test_expect_success 'log --grep -i' '
 
 test_expect_success 'log -F -E --grep= uses ere' '
echo second >expect &&
-   git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual &&
+   git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success LIBPCRE 'log -F -E --perl-regexp --grep= uses PCRE' '
+   test_when_finished "rm -rf num_commits" &&
+   git init num_commits &&
+   (
+   cd num_commits &&
+   test_commit 1d &&
+   test_commit 2e
+   ) &&
+   echo 2e >expect &&
+   git -C num_commits log -1 --pretty="tformat:%s" -F -E --perl-regexp 
--grep="[\d]" >actual &&
+   test_cmp expect actual &&
+   echo 1d >expect &&
+   git -C num_commits log -1 --pretty="tformat:%s" -F -E --grep="[\d]" 
>actual &&
test_cmp expect actual
 '
 
@@ -280,6 +296,65 @@ test_expect_success 'log with grep.patternType 
configuration and command line' '
test_cmp expect actual
 '
 
+test_expect_success 'log with various grep.patternType configurations & 
command-lines' '
+   git init pattern-type &&
+   (
+   cd pattern-type &&
+   test_commit 1 file A &&
+   test_commit "(1|2)" file B &&
+
+   echo "(1|2)" >expect.fixed &&
+   cp expect.fixed expect.basic &&
+   cp expect.fixed expect.extended &&
+   cp expect.fixed expect.perl &&
+
+   git -c grep.patternType=fixed log --pretty=tformat:%s \
+   --grep="(1|2)" >actual.fixed &&
+   git -c grep.patternType=basic log --pretty=tformat:%s \
+   --grep="(.|.)" >actual.basic &&
+   git -c grep.patternType=extended log --pretty=tformat:%s \
+   --grep="\|2" >actual.extended &&
+   if test_have_prereq LIBPCRE
+   then
+   git -c grep.patternType=perl log --pretty=tformat:%s \
+   --grep="[\d]\|" >actual.perl
+   fi &&
+   test_cmp expect.fixed actual.fixed &&
+   test_cmp expect.basic actual.basic &&
+   test_cmp expect.extended actual.extended &&
+   if test_have_prereq LIBPCRE
+   then
+   test_cmp expect.perl actual.perl
+   fi &&
+
+   git log --pretty=tformat:%s -F \
+   --grep="(1|2)" >actual.fixed.short-arg &&
+   git log --pretty=tformat:%s -E \
+   --grep="\|2" >actual.extended.short-arg &&
+   test_cmp expect.fixed actual.fixed.short-arg &&
+   test_cmp expect.extended actual.extended.short-arg &&
+
+   git log --pretty=tformat:%s --fixed-strings \
+   --grep="(1|2)" >actual.fixed.long-arg &&
+   git log --pretty=tformat:%s --basic-regexp \
+   --grep="(.|.)" >actual.basic.long-arg &&
+   git log --pretty=tformat:%s --extended-regexp \
+   --grep="\|2" >actual.extended.long-arg &&
+   if test_have_prereq LIBPCRE
+   then
+   git log --pretty=tformat:%s --perl-regexp \
+   --grep="[\d]\|" >actual.perl.long-arg
+   fi &&
+   test_cmp expect.fixed actual.fixed.long-arg &&
+

[PATCH v4 12/19] test-lib: rename the LIBPCRE prerequisite to PCRE

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Rename the LIBPCRE prerequisite to PCRE. This is for preparation for
libpcre2 support, where having just "LIBPCRE" would be confusing as it
implies v1 of the library.

None of these tests are incompatible between versions 1 & 2 of
libpcre, it's less confusing to give them a more general name to make
it clear that they work on both library versions.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README   |  4 ++--
 t/t4202-log.sh | 10 +-
 t/t7810-grep.sh| 32 
 t/t7812-grep-icase-non-ascii.sh|  4 ++--
 t/t7813-grep-icase-iso.sh  |  2 +-
 t/t7814-grep-recurse-submodules.sh |  2 +-
 t/test-lib.sh  |  2 +-
 7 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/t/README b/t/README
index ab386c3681..a90cb62583 100644
--- a/t/README
+++ b/t/README
@@ -803,9 +803,9 @@ use these, and "test_set_prereq" for how to define your own.
Test is not run by root user, and an attempt to write to an
unwritable file is expected to fail correctly.
 
- - LIBPCRE
+ - PCRE
 
-   Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests
+   Git was compiled with support for PCRE. Wrap any tests
that use git-grep --perl-regexp or git-grep -P in these.
 
  - CASE_INSENSITIVE_FS
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 06acc848b8..0b5f808ca3 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -266,7 +266,7 @@ test_expect_success 'log -F -E --grep= uses ere' '
test_cmp expect actual
 '
 
-test_expect_success LIBPCRE 'log -F -E --perl-regexp --grep= uses PCRE' '
+test_expect_success PCRE 'log -F -E --perl-regexp --grep= uses PCRE' '
test_when_finished "rm -rf num_commits" &&
git init num_commits &&
(
@@ -314,7 +314,7 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
--grep="(.|.)" >actual.basic &&
git -c grep.patternType=extended log --pretty=tformat:%s \
--grep="\|2" >actual.extended &&
-   if test_have_prereq LIBPCRE
+   if test_have_prereq PCRE
then
git -c grep.patternType=perl log --pretty=tformat:%s \
--grep="[\d]\|" >actual.perl
@@ -322,7 +322,7 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
test_cmp expect.fixed actual.fixed &&
test_cmp expect.basic actual.basic &&
test_cmp expect.extended actual.extended &&
-   if test_have_prereq LIBPCRE
+   if test_have_prereq PCRE
then
test_cmp expect.perl actual.perl
fi &&
@@ -349,7 +349,7 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
--grep="(.|.)" >actual.basic.long-arg &&
git log --pretty=tformat:%s --extended-regexp \
--grep="\|2" >actual.extended.long-arg &&
-   if test_have_prereq LIBPCRE
+   if test_have_prereq PCRE
then
git log --pretty=tformat:%s --perl-regexp \
--grep="[\d]\|" >actual.perl.long-arg
@@ -357,7 +357,7 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
test_cmp expect.fixed actual.fixed.long-arg &&
test_cmp expect.basic actual.basic.long-arg &&
test_cmp expect.extended actual.extended.long-arg &&
-   if test_have_prereq LIBPCRE
+   if test_have_prereq PCRE
then
test_cmp expect.perl actual.perl.long-arg
fi
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 7199356d35..f929b447e9 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -275,7 +275,7 @@ do
test_cmp expected actual
'
 
-   test_expect_success LIBPCRE "grep $L with grep.patterntype=perl" '
+   test_expect_success PCRE "grep $L with grep.patterntype=perl" '
echo "${HC}ab:a+b*c" >expected &&
git -c grep.patterntype=perl grep "a\x{2b}b\x{2a}c" $H ab 
>actual &&
test_cmp expected actual
@@ -1053,12 +1053,12 @@ hello.c:int main(int argc, const char **argv)
 hello.c:   printf("Hello world.\n");
 EOF
 
-test_expect_success LIBPCRE 'grep --perl-regexp pattern' '
+test_expect_success PCRE 'grep --perl-regexp pattern' '
git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern' '
+test_expect_success PCRE 'grep -P pattern' '
git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
@@ -1070,13 +1070,13 @@ test_expect_success 'grep pattern with 
grep.extendedRegexp=true' '
test_cmp empty actual

[PATCH v4 10/19] grep & rev-list doc: stop promising libpcre for --perl-regexp

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Stop promising in our grep & rev-list options documentation that we're
always going to be using libpcre when given the --perl-regexp option.

Instead talk about using "Perl-compatible regular expressions" and
using these types of patterns using "a compile-time dependency".

Saying "libpcre" strongly suggests that we might be talking about
libpcre.so, which is always going to be v1. This change is part of a
series to add support for libpcre2 which comes with v2 of PCRE.

In the future we might use some completely unrelated library to
provide perl-compatible regular expression support. By wording the
documentation differently and not promising any specific version of
PCRE or ever PCRE at all we have more wiggle room to change the
implementation.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/git-grep.txt | 7 +--
 Documentation/rev-list-options.txt | 8 ++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 71f32f3508..5033483db4 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -161,8 +161,11 @@ OPTIONS
 
 -P::
 --perl-regexp::
-   Use Perl-compatible regexp for patterns. Requires libpcre to be
-   compiled in.
+   Use Perl-compatible regular expressions for patterns.
++
+Support for these types of regular expressions is an optional
+compile-time dependency. If Git wasn't compiled with support for them
+providing this option will cause it to die.
 
 -F::
 --fixed-strings::
diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 5b7fa49a7f..9c44eae55d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -93,8 +93,12 @@ endif::git-rev-list[]
 
 -P::
 --perl-regexp::
-   Consider the limiting patterns to be Perl-compatible regular 
expressions.
-   Requires libpcre to be compiled in.
+   Consider the limiting patterns to be Perl-compatible regular
+   expressions.
++
+Support for these types of regular expressions is an optional
+compile-time dependency. If Git wasn't compiled with support for them
+providing this option will cause it to die.
 
 --remove-empty::
Stop when a given path disappears from the tree.
-- 
2.11.0

[PATCH v4 14/19] grep: change the internal PCRE code & header names to be PCRE1

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Change the internal PCRE variable & function names to have a "1"
suffix. This is for preparation for libpcre2 support, where having
non-versioned names would be confusing.

The earlier "grep: change the internal PCRE macro names to be PCRE1"
change elaborates on the motivations behind this commit.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c |  4 ++--
 grep.c | 56 
 grep.h | 10 +-
 revision.c |  2 +-
 4 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index be3dbd6957..28e0dd3236 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -490,7 +490,7 @@ static void compile_submodule_options(const struct grep_opt 
*opt,
case GREP_PATTERN_TYPE_FIXED:
argv_array_push(&submodule_options, "-F");
break;
-   case GREP_PATTERN_TYPE_PCRE:
+   case GREP_PATTERN_TYPE_PCRE1:
argv_array_push(&submodule_options, "-P");
break;
case GREP_PATTERN_TYPE_UNSPECIFIED:
@@ -1022,7 +1022,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
GREP_PATTERN_TYPE_FIXED),
OPT_SET_INT('P', "perl-regexp", &pattern_type_arg,
N_("use Perl-compatible regular expressions"),
-   GREP_PATTERN_TYPE_PCRE),
+   GREP_PATTERN_TYPE_PCRE1),
OPT_GROUP(""),
OPT_BOOL('n', "line-number", &opt.linenum, N_("show line 
numbers")),
OPT_NEGBIT('h', NULL, &opt.pathname, N_("don't show 
filenames"), 1),
diff --git a/grep.c b/grep.c
index a378b7edaa..c61e69deaa 100644
--- a/grep.c
+++ b/grep.c
@@ -63,7 +63,7 @@ static int parse_pattern_type_arg(const char *opt, const char 
*arg)
else if (!strcmp(arg, "perl") ||
 !strcmp(arg, "pcre") ||
 !strcmp(arg, "pcre1"))
-   return GREP_PATTERN_TYPE_PCRE;
+   return GREP_PATTERN_TYPE_PCRE1;
die("bad %s argument: %s", opt, arg);
 }
 
@@ -180,23 +180,23 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
 
case GREP_PATTERN_TYPE_BRE:
opt->fixed = 0;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
break;
 
case GREP_PATTERN_TYPE_ERE:
opt->fixed = 0;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
opt->regflags |= REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_FIXED:
opt->fixed = 1;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
break;
 
-   case GREP_PATTERN_TYPE_PCRE:
+   case GREP_PATTERN_TYPE_PCRE1:
opt->fixed = 0;
-   opt->pcre = 1;
+   opt->pcre1 = 1;
break;
}
 }
@@ -324,7 +324,7 @@ static NORETURN void compile_regexp_failed(const struct 
grep_pat *p,
 }
 
 #ifdef USE_LIBPCRE1
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt 
*opt)
 {
const char *error;
int erroffset;
@@ -332,23 +332,23 @@ static void compile_pcre_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
 
if (opt->ignore_case) {
if (has_non_ascii(p->pattern))
-   p->pcre_tables = pcre_maketables();
+   p->pcre1_tables = pcre_maketables();
options |= PCRE_CASELESS;
}
if (is_utf8_locale() && has_non_ascii(p->pattern))
options |= PCRE_UTF8;
 
-   p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
- p->pcre_tables);
-   if (!p->pcre_regexp)
+   p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
+ p->pcre1_tables);
+   if (!p->pcre1_regexp)
compile_regexp_failed(p, error);
 
-   p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, &error);
-   if (!p->pcre_extra_info && error)
+   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+   if (!p->pcre1_extra_info && error)
die("%s", error);
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
regmatch_t *match, int eflags)
 {
int ovector[30], ret, flags = 0;
@@ -356,7 +356,7 @@ static int pcrematch(struct grep_pat *p, const char *line, 
const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
-   ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line,
+   ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
0, flags, ovecto

[PATCH v4 13/19] grep: change the internal PCRE macro names to be PCRE1

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Change the internal USE_LIBPCRE define, & build options flag to use a
naming convention ending in PCRE1, without changing the long-standing
USE_LIBPCRE Makefile flag which enables this code.

This is for preparation for libpcre2 support where having things like
USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
need to for backwards compatibility with old Makefile arguments would
be confusing.

In some ways it would be better to change everything that now uses
USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
build scripts.

However I'd like to leave the door open to making
USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
the default.

This code and the USE_LIBPCRE Makefile argument was added in commit
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was
no indication that the PCRE project would release an entirely new &
incompatible API around 3 years later.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  | 4 ++--
 grep.c| 6 +++---
 grep.h| 2 +-
 t/test-lib.sh | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 2e63b1cfcc..aecdfdcfe6 100644
--- a/Makefile
+++ b/Makefile
@@ -1086,7 +1086,7 @@ ifdef NO_LIBGEN_H
 endif
 
 ifdef USE_LIBPCRE
-   BASIC_CFLAGS += -DUSE_LIBPCRE
+   BASIC_CFLAGS += -DUSE_LIBPCRE1
ifdef LIBPCREDIR
BASIC_CFLAGS += -I$(LIBPCREDIR)/include
EXTLIBS += -L$(LIBPCREDIR)/$(lib) 
$(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
@@ -2238,7 +2238,7 @@ GIT-BUILD-OPTIONS: FORCE
@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+
@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
-   @echo USE_LIBPCRE=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' 
>>$@+
+   @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' 
>>$@+
@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst 
','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
diff --git a/grep.c b/grep.c
index 62c3749d9f..a378b7edaa 100644
--- a/grep.c
+++ b/grep.c
@@ -323,7 +323,7 @@ static NORETURN void compile_regexp_failed(const struct 
grep_pat *p,
die("%s'%s': %s", where, p->pattern, error);
 }
 
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
const char *error;
@@ -375,7 +375,7 @@ static void free_pcre_regexp(struct grep_pat *p)
pcre_free(p->pcre_extra_info);
pcre_free((void *)p->pcre_tables);
 }
-#else /* !USE_LIBPCRE */
+#else /* !USE_LIBPCRE1 */
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
die("cannot use Perl-compatible regexes when not compiled with 
USE_LIBPCRE");
@@ -390,7 +390,7 @@ static int pcrematch(struct grep_pat *p, const char *line, 
const char *eol,
 static void free_pcre_regexp(struct grep_pat *p)
 {
 }
-#endif /* !USE_LIBPCRE */
+#endif /* !USE_LIBPCRE1 */
 
 static int is_fixed(const char *s, size_t len)
 {
diff --git a/grep.h b/grep.h
index 267534ca24..073b0e4c92 100644
--- a/grep.h
+++ b/grep.h
@@ -1,7 +1,7 @@
 #ifndef GREP_H
 #define GREP_H
 #include "color.h"
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 #include 
 #else
 typedef int pcre;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 6abf1d1918..e5cfbcc36b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1010,7 +1010,7 @@ esac
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.11.0

[PATCH v4 05/19] grep: remove redundant `regflags &= ~REG_EXTENDED` assignments

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Remove redundant assignments to the "regflags" variable. There are no
code paths that have previously set the regflags to anything, and
certainly not to `|= REG_EXTENDED`.

This code gave the impression that it had to reset its environment,
but it doesn't. This dates back to the initial introduction of
git-grep in commit 5010cb5fcc ("built-in "git grep"", 2006-04-30).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/grep.c b/grep.c
index 59ae7809f2..6995f0989a 100644
--- a/grep.c
+++ b/grep.c
@@ -179,7 +179,6 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
case GREP_PATTERN_TYPE_BRE:
opt->fixed = 0;
opt->pcre = 0;
-   opt->regflags &= ~REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_ERE:
@@ -191,7 +190,6 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
case GREP_PATTERN_TYPE_FIXED:
opt->fixed = 1;
opt->pcre = 0;
-   opt->regflags &= ~REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_PCRE:
@@ -417,7 +415,6 @@ static void compile_fixed_regexp(struct grep_pat *p, struct 
grep_opt *opt)
int regflags;
 
basic_regex_quote_buf(&sb, p->pattern);
-   regflags = opt->regflags & ~REG_EXTENDED;
if (opt->ignore_case)
regflags |= REG_ICASE;
err = regcomp(&p->regexp, sb.buf, regflags);
-- 
2.11.0

[PATCH v4 06/19] Makefile & configure: reword outdated comment about PCRE

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Reword an outdated comment which suggests that only git-grep can use
PCRE.

This comment was added back when PCRE support was initially added in
commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09), and was true
at the time.

It hasn't been telling the full truth since git-log learned to use
PCRE with --grep in commit 727b6fc3ed ("log --grep: accept
--basic-regexp and --perl-regexp", 2012-10-03), and more importantly
is likely to get more inaccurate over time as more use is made of PCRE
in other areas.

Reword it to be more future-proof, and to more clearly explain that
this enables user-initiated runtime behavior.

Copy/pasting this so much in configure.ac is lame, these Makefile-like
flags aren't even used by autoconf, just the corresponding
--with[out]-* options. But copy/pasting the comments that make sense
for the Makefile to configure.ac where they make less sence is the
pattern everything else follows in that file. I'm not going to war
against that as part of this change, just following the existing
pattern.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile |  6 --
 configure.ac | 12 
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index eb1a1a7cff..2e63b1cfcc 100644
--- a/Makefile
+++ b/Makefile
@@ -24,8 +24,10 @@ all::
 # Define NO_OPENSSL environment variable if you do not have OpenSSL.
 # This also implies BLK_SHA1.
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
diff --git a/configure.ac b/configure.ac
index 128165529f..deeb968daa 100644
--- a/configure.ac
+++ b/configure.ac
@@ -250,8 +250,10 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library 
(default is YES)])
 AS_HELP_STRING([],  [ARG can be prefix for openssl library and 
headers]),
 GIT_PARSE_WITH([openssl]))
 
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
@@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
 GIT_CONF_SUBST([NO_OPENSSL])
 
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 
 if test -n "$USE_LIBPCRE"; then
-- 
2.11.0

[PATCH v4 09/19] log: add -P as a synonym for --perl-regexp

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Add a short -P option as a synonym for the longer --perl-regexp, for
consistency with the options the corresponding grep invocations
accept.

This was intentionally omitted in commit 727b6fc3ed ("log --grep:
accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
future use.

Since nothing has come along in over 4 1/2 years that's wanted to use
it, it's more valuable to make it consistent with "grep" than to keep
it open for future use, and to avoid the confusion of -P meaning
different things for grep & log, as is the case with the -G option.

As noted in the aforementioned commit the --basic-regexp option can't
have a corresponding -G argument, as the log command already uses that
for -G.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/rev-list-options.txt | 1 +
 revision.c | 2 +-
 t/t4202-log.sh | 9 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index a02f7324c0..5b7fa49a7f 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -91,6 +91,7 @@ endif::git-rev-list[]
Consider the limiting patterns to be fixed strings (don't interpret
pattern as a regular expression).
 
+-P::
 --perl-regexp::
Consider the limiting patterns to be Perl-compatible regular 
expressions.
Requires libpcre to be compiled in.
diff --git a/revision.c b/revision.c
index 7ff61ff5f7..03a3a012de 100644
--- a/revision.c
+++ b/revision.c
@@ -1995,7 +1995,7 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-   } else if (!strcmp(arg, "--perl-regexp")) {
+   } else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
} else if (!strcmp(arg, "--all-match")) {
revs->grep_filter.all_match = 1;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index fc186f10ea..06acc848b8 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -331,8 +331,17 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
--grep="(1|2)" >actual.fixed.short-arg &&
git log --pretty=tformat:%s -E \
--grep="\|2" >actual.extended.short-arg &&
+   if test_have_prereq PCRE
+   then
+   git log --pretty=tformat:%s -P \
+   --grep="[\d]\|" >actual.perl.short-arg
+   fi &&
test_cmp expect.fixed actual.fixed.short-arg &&
test_cmp expect.extended actual.extended.short-arg &&
+   if test_have_prereq PCRE
+   then
+   test_cmp expect.perl actual.perl.short-arg
+   fi &&
 
git log --pretty=tformat:%s --fixed-strings \
--grep="(1|2)" >actual.fixed.long-arg &&
-- 
2.11.0

[PATCH v4 07/19] grep: add a test for backreferences in PCRE patterns

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Add a test for backreferences such as (.)\1 in PCRE patterns. This
test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned
on. Before this change turning it on would break these sort of
patterns, but wouldn't break any tests.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7810-grep.sh | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index cee42097b0..ec8fe585a7 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1102,6 +1102,13 @@ test_expect_success LIBPCRE 'grep -P -w pattern' '
test_cmp expected actual
 '
 
+test_expect_success LIBPCRE 'grep -P backreferences work (the PCRE 
NO_AUTO_CAPTURE flag is not set)' '
+   git grep -P -h "(?P.)(?P=one)" hello_world >actual &&
+   test_cmp hello_world actual &&
+   git grep -P -h "(.)\1" hello_world >actual &&
+   test_cmp hello_world actual
+'
+
 test_expect_success 'grep -G invalidpattern properly dies ' '
test_must_fail git grep -G "a["
 '
-- 
2.11.0

[PATCH v4 04/19] grep: remove redundant regflags assignment under PCRE

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Remove a redundant assignment to the "regflags" variable. This
variable is only used for POSIX regular expression matching, not when
the PCRE library is used.

This redundant assignment was added as a result of copy/paste
programming in commit 84befcd0a4 ("grep: add a grep.patternType
configuration setting", 2012-08-03). That commit modified already
working code in commit cca2c172e0 ("git-grep: do not die upon -F/-P
when grep.extendedRegexp is set.", 2011-05-09) which didn't assign to
regflags when under PCRE.

Revert back to that behavior, more to reduce "wait this is used under
PCRE how?" confusion when reading the code, than to to save ourselves
trivial CPU cycles by removing one assignment.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/grep.c b/grep.c
index 47cee45067..59ae7809f2 100644
--- a/grep.c
+++ b/grep.c
@@ -197,7 +197,6 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
case GREP_PATTERN_TYPE_PCRE:
opt->fixed = 0;
opt->pcre = 1;
-   opt->regflags &= ~REG_EXTENDED;
break;
}
 }
-- 
2.11.0

[PATCH v4 00/19] PCRE v1 improvements & PCRE v2 support

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Trivial changes since last time. Just sending this because I'd like
the copy in pu updated. Changes noted below:

Ævar Arnfjörð Bjarmason (19):
  grep: amend submodule recursion test in preparation for rx engine
testing
  grep: add tests for grep pattern types being passed to submodules

A s/PCRE/LIBPCRE/ on the test_have_prereq, now makes sense with the
series in sequence (error added during rebasing).

  grep: submodule-related case statements should die if new fields are
added
  grep: remove redundant regflags assignment under PCRE
  grep: remove redundant `regflags &= ~REG_EXTENDED` assignments

NEW: Similarly to how we didn't need to set regflags under PCRE, we
were negating REG_EXTENDED under POSIX basic, without ever setting it
in the first place.

This was just as confusing as the PCRE oddity, so remove it.

  Makefile & configure: reword outdated comment about PCRE
  grep: add a test for backreferences in PCRE patterns
  log: add exhaustive tests for pattern style options & config
  log: add -P as a synonym for --perl-regexp
  grep & rev-list doc: stop promising libpcre for --perl-regexp
  grep: make grep.patternType=[pcre|pcre1] a synonym for "perl"
  test-lib: rename the LIBPCRE prerequisite to PCRE

One corresponding s/LIBPCRE/PCRE/ for the earlier change, see above.

  grep: change the internal PCRE macro names to be PCRE1
  grep: change the internal PCRE code & header names to be PCRE1
  perf: add a performance comparison test of grep -E and -P
  grep: add support for the PCRE v1 JIT API
  grep: add support for PCRE v2

We now give proper error messages via pcre2_get_error_message() when
pcre2_match() fails with errors other than "didn't match", the common
case for this is that the engine gave up on a pathological pattern /
input combination.

  grep: remove support for concurrent use of both PCRE v1 & v2
  Makefile & configure: make PCRE v2 the default PCRE implementation

Added more details to the commit message about why switching to PCRE
v2 by default is a good idea. I hadn't noticed before that deep bugs
in PCRE v1 are being WONTFIX'd on the bugtracker saying "nope, never
fixing thath in v1, switch to v2".

 Documentation/git-grep.txt |   7 +-
 Documentation/rev-list-options.txt |   9 +-
 Makefile   |  39 +--
 builtin/grep.c |   4 +
 configure.ac   |  81 --
 grep.c | 222 +++--
 grep.h |  32 +-
 revision.c |   2 +-
 t/README   |   4 +-
 t/perf/p7820-grep-engines.sh   |  25 +
 t/t4202-log.sh |  86 +-
 t/t7810-grep.sh|  41 ---
 t/t7812-grep-icase-non-ascii.sh|   4 +-
 t/t7813-grep-icase-iso.sh  |  11 +-
 t/t7814-grep-recurse-submodules.sh | 215 +--
 t/test-lib.sh  |   3 +-
 16 files changed, 613 insertions(+), 172 deletions(-)
 create mode 100755 t/perf/p7820-grep-engines.sh

-- 
2.11.0

[PATCH v4 03/19] grep: submodule-related case statements should die if new fields are added

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Change two case statements added in commit 0281e487fd ("grep:
optionally recurse into submodules", 2016-12-16) so that they die if
new GREP_PATTERN_* enum fields are added without updating them.

These case statements currently check for an exhaustive list of
fields, but if a new field is added it's easy to introduce a bug here
where the code will start subtly doing the wrong thing, e.g. if a new
pattern type is added we'll fall through to
GREP_PATTERN_TYPE_UNSPECIFIED, i.e. the "basic" POSIX regular
expressions.

This should arguably be done for the switch(opt->binary)
case-statement as well, but isn't trivial to add since that code isn't
currently working with an exhaustive list.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 3ffb5b4e81..be3dbd6957 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -495,6 +495,8 @@ static void compile_submodule_options(const struct grep_opt 
*opt,
break;
case GREP_PATTERN_TYPE_UNSPECIFIED:
break;
+   default:
+   die("BUG: Added a new grep pattern type without updating switch 
statement");
}
 
for (pattern = opt->pattern_list; pattern != NULL;
@@ -515,6 +517,8 @@ static void compile_submodule_options(const struct grep_opt 
*opt,
case GREP_PATTERN_BODY:
case GREP_PATTERN_HEAD:
break;
+   default:
+   die("BUG: Added a new grep token type without updating 
case statement");
}
}
 
-- 
2.11.0

[PATCH v4 01/19] grep: amend submodule recursion test in preparation for rx engine testing

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Amend the submodule recursion test added in commit 0281e487fd ("grep:
optionally recurse into submodules", 2016-12-16) to prepare it for
subsequent tests of whether it passes along the grep.patternType to
the submodule greps.

This is just the result of searching & replacing:

foobar -> (1|2)d(3|4)
foo-> (1|2)
bar-> (3|4)

Currently there's no tests for whether e.g. -P or -E is correctly
passed along, tests for that will be added in a follow-up change, but
first add content to the tests which will match differently under
different regex engines.

Reuse the pattern established in an earlier commit of mine in this
series ("log: add exhaustive tests for pattern style options &
config", 2017-04-07). The pattern "(.|.)[\d]" will match this content
differently under fixed/basic/extended & perl.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7814-grep-recurse-submodules.sh | 166 ++---
 1 file changed, 83 insertions(+), 83 deletions(-)

diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 5b6eb3a65e..3c580b38ba 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -9,13 +9,13 @@ submodules.
 . ./test-lib.sh
 
 test_expect_success 'setup directory structure and submodule' '
-   echo "foobar" >a &&
+   echo "(1|2)d(3|4)" >a &&
mkdir b &&
-   echo "bar" >b/b &&
+   echo "(3|4)" >b/b &&
git add a b &&
git commit -m "add a and b" &&
git init submodule &&
-   echo "foobar" >submodule/a &&
+   echo "(1|2)d(3|4)" >submodule/a &&
git -C submodule add a &&
git -C submodule commit -m "add a" &&
git submodule add ./submodule &&
@@ -24,18 +24,18 @@ test_expect_success 'setup directory structure and 
submodule' '
 
 test_expect_success 'grep correctly finds patterns in a submodule' '
cat >expect <<-\EOF &&
-   a:foobar
-   b/b:bar
-   submodule/a:foobar
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and basic pathspecs' '
cat >expect <<-\EOF &&
-   submodule/a:foobar
+   submodule/a:(1|2)d(3|4)
EOF
 
git grep -e. --recurse-submodules -- submodule >actual &&
@@ -44,7 +44,7 @@ test_expect_success 'grep and basic pathspecs' '
 
 test_expect_success 'grep and nested submodules' '
git init submodule/sub &&
-   echo "foobar" >submodule/sub/a &&
+   echo "(1|2)d(3|4)" >submodule/sub/a &&
git -C submodule/sub add a &&
git -C submodule/sub commit -m "add a" &&
git -C submodule submodule add ./sub &&
@@ -54,117 +54,117 @@ test_expect_success 'grep and nested submodules' '
git commit -m "updated submodule" &&
 
cat >expect <<-\EOF &&
-   a:foobar
-   b/b:bar
-   submodule/a:foobar
-   submodule/sub/a:foobar
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and multiple patterns' '
cat >expect <<-\EOF &&
-   a:foobar
-   submodule/a:foobar
-   submodule/sub/a:foobar
+   a:(1|2)d(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --and -e "(1|2)d" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and multiple patterns' '
cat >expect <<-\EOF &&
-   b/b:bar
+   b/b:(3|4)
EOF
 
-   git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --and --not -e "(1|2)d" --recurse-submodules 
>actual &&
test_cmp expect actual
 '
 
 test_expect_success 'basic grep tree' '
cat >expect <<-\EOF &&
-   HEAD:a:foobar
-   HEAD:b/b:bar
-   HEAD:submodule/a:foobar
-   HEAD:submodule/sub/a:foobar
+   HEAD:a:(1|2)d(3|4)
+   HEAD:b/b:(3|4)
+   HEAD:submodule/a:(1|2)d(3|4)
+   HEAD:submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules HEAD >actual &&
+   git grep -e "(3|4)" --recurse-submodules HEAD >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep tree HEAD^' '
cat >expect <<-\EOF &&
-   HEAD^:a:foobar
-   HEAD^:b/b:bar
-   HEAD^:submodule/a:foobar
+   HEAD^:a:(1|2)d(3|4)
+   HEAD^:b/b:(3|4)
+   HEAD^:submodule/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules HEAD^ >actual &&
+   git grep -e "(3|4)" --recurse-submodules HEAD^ >actua

[PATCH v4 02/19] grep: add tests for grep pattern types being passed to submodules

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Add testing for grep pattern types being correctly passed to
submodules. The pattern "(.|.)[\d]" matches differently under
fixed (not at all), and then matches different lines under
basic/extended & perl regular expressions, so this change asserts that
the pattern type is passed along correctly.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7814-grep-recurse-submodules.sh | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 3c580b38ba..e1822feaf1 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -313,4 +313,53 @@ test_incompatible_with_recurse_submodules ()
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
 
+test_expect_success 'grep --recurse-submodules should pass the pattern type 
along' '
+   # Fixed
+   test_must_fail git grep -F --recurse-submodules -e "(.|.)[\d]" &&
+   test_must_fail git -c grep.patternType=fixed grep --recurse-submodules 
-e "(.|.)[\d]" &&
+
+   # Basic
+   git grep -G --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   a:(1|2)d(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=basic grep --recurse-submodules -e "(.|.)[\d]" 
>actual &&
+   test_cmp expect actual &&
+
+   # Extended
+   git grep -E --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   .gitmodules:[submodule "submodule"]
+   .gitmodules:path = submodule
+   .gitmodules:url = ./submodule
+   a:(1|2)d(3|4)
+   submodule/.gitmodules:[submodule "sub"]
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=extended grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual &&
+   git -c grep.extendedRegexp=true grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual &&
+
+   # Perl
+   if test_have_prereq LIBPCRE
+   then
+   git grep -P --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=perl grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual
+   fi
+'
+
 test_done
-- 
2.11.0

Re: [PATCH v7] read-cache: force_verify_index_checksum

2017-04-25 Thread Johannes Schindelin

Hi,

On Mon, 24 Apr 2017, Junio C Hamano wrote:

> Jeff Hostetler  writes:
> 
> >>> +test_expect_success 'detect corrupt index file in fsck' '
> >>> +cp .git/index .git/index.backup &&
> >>> +test_when_finished "mv .git/index.backup .git/index" &&
> >>> +echo  > &&
> >>> +git add  &&
> >>> +sed -e "s///" .git/index >.git/index.yyy &&
> >>
> >> sed on a binary file? Sooner or later we are going to run into
> >> portability issues.
> >
> > In v5 of this patch series I used "perl" and it was suggested that I
> > use "sed" instead.  It doesn't matter to me which we use.  My testing
> > showed that it was safe, but that was only Linux.

I am sorry to hear that the Git mailing list's review gives you whiplash.

The problem with sed is that BSD sed behaves a bit differently than GNU
sed, and we quietly expect every contributor to be an expert in the
portability aspects of sed.

TBH I am quite surprised that anybody would have suggested to use sed
rather than Perl to edit binary files in the first place. In my opinion,
that was bad advice.

> > Does the mailing list have a preference for this ?
> 
> Instead of munging pathnames z* to y*, I'd prefer to see the actual
> checksum bytes at the end replaced in the index file.  After all
> that is what this test really cares about, and it ensures that the
> failure detected is due to checksum mismatch.

I see that v8 uses a Perl script again, and it is well written and
obvious.

Just in case that certain reviewers favor length over readability, let me
offer this snippet:

size=$(perl -e "print -s \".git/index\"") &&
dd if=/dev/zero of=.git/index bs=1 seek=$(($size-20) count=20

Since whatever hash will be used in the future is most likely larger than
20 bytes, this should still work fine (and even if somebody sane replaces
the SHA-1 of the index with a CRC-32 for the same benefit we have now, the
test will fail quickly and it is easy to replace the 20 by 4).

Ciao,
Dscho

Re: [PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Stefan Beller

On Tue, Apr 25, 2017 at 12:06 PM, Jonathan Tan  wrote:
> In commit 967dfd4 ("sequencer: use trailer's trailer layout",
> 2016-11-29), sequencer was taught to use the same mechanism as
> interpret-trailers to determine the nature of the trailer of a commit
> message (referred to as the "footer" in sequencer.c). However, the
> requirement that a footer end in a newline character was inadvertently
> removed. Restore that requirement.
>
> While writing this commit, I noticed that if the "ignore_footer"
> parameter in "has_conforming_footer" is greater than the distance
> between the trailer start and sb->len, "has_conforming_footer" will
> return an unexpected result. This does not occur in practice, because
> "ignore_footer" is either zero or the return value of an invocation to
> "ignore_non_trailer", which only skips empty lines and comment lines.
> This commit contains a comment explaining this in the function's
> documentation.
>
> Reported-by: Brian Norris 
> Signed-off-by: Jonathan Tan 
> ---
>
> Thanks for the bug report. Here's a fix - I've verified this with the
> way to reproduce provided in the original e-mail, and it seems to work
> now.
>
> The commit message of the referenced commit
> (7b309aef0463340d3ad5449d1f605d14e10a4225) does not end in a newline,
> which is probably why different behavior was observed with this commit
> (as compared to others).

Thanks for the fix. :)
Do we want to test for this use case in the future?

> @@ -151,6 +151,12 @@ static const char *get_todo_path(const struct 
> replay_opts *opts)
>   * Returns 1 for conforming footer
>   * Returns 2 when sob exists within conforming footer
>   * Returns 3 when sob exists within conforming footer as last entry
> + *
> + * A footer that does not end in a newline is considered non-conforming.
> + *
> + * ignore_footer, if not zero, should be the return value of an invocation to
> + * ignore_non_trailer. See the documentation of that function for more
> + * information.
>   */

Makes sense. Maybe s/ignore_non_trailer/ignore_non_trailer()/ which makes
it easier to recognize it as a function? I'd also drop the last
sentence as it is
implied in the previous sentence (sort of).

Thanks,
Stefan

Re: [PATCH v4 0/9] Introduce timestamp_t for timestamps

2017-04-25 Thread Johannes Schindelin

Hi Peff,

On Mon, 24 Apr 2017, Jeff King wrote:

> On Sun, Apr 23, 2017 at 08:29:11PM -0700, Junio C Hamano wrote:
> 
> > Johannes Schindelin  writes:
> > 
> > > Changes since v3:
> > >
> > > - fixed the fix in archive-zip.c that tried to report a too large
> > >   timestamp (and would have reported the uninitialized time_t instead)
> > >
> > > - adjusted the so-far forgotten each_reflog() function (that was
> > >   introduced after v1, in 80f2a6097c4 (t/helper: add test-ref-store to
> > >   test ref-store functions, 2017-03-26)) to use timestamp_t and PRItime,
> > >   too
> > >
> > > - removed the date_overflows() check from time_to_tm(), as it calls
> > >   gm_time_t() which already performs that check
> > >
> > > - the date_overflows() check in show_ident_date() was removed, as we do
> > >   not know at that point yet whether we use the system functions to
> > >   render the date or not (and there would not be a problem in the latter
> > >   case)
> > 
> > Assuming that the list consensus is to go with a separate
> > timestamp_t (for that added Cc for those whose comments I saw in an
> > earlier round), the patches looked mostly good (I didn't read with
> > fine toothed comb the largest one 6/8 to see if there were
> > inadvertent or missed conversions from ulong to timestamp_t,
> > though), modulo a few minor "huh?" comments I sent separately.
> > 
> > Will queue; thanks.
> 
> Sorry, I haven't read the series carefully yet (but from a skim I'm
> happy with the overall direction). It does seem to cause failures in
> t4212, though. For example:
> 
>   expecting success: 
>   commit=$(munge_author_date HEAD 18446744073709551617) &&
>   echo "Thu Jan 1 00:00:00 1970 +" >expect &&
>   git log -1 --format=%ad $commit >actual &&
>   test_cmp expect actual
>   
>   fatal: Timestamp too large for this system: 18446744073709551615
>   not ok 7 - date parser recognizes integer overflow
> 
> We used to convert overflows into a sentinel time, but now we die. I
> originally chose the sentinel approach because it lets you use the tools
> to examine and recover from the broken state. I could be convinced that
> dying is better, but clearly we'd need to at least update the tests.

Sorry, I dropped this patch in v5 because I did not want to fight this
battle.

Ciao,
Dscho

Re: [PATCH v4 8/9] Use uintmax_t for timestamps

2017-04-25 Thread Johannes Schindelin

Hi Junio,

On Mon, 24 Apr 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> Should we at least clamp in date_overflows() so that large values
> >> representable with timestamp_t that will become unrepresentable when
> >> we start allowing negative timestamps would be rejected?  That way we
> >> won't have to hear complaints from the people who used timestamps far
> >> in the future that we regressed the implementation for them by
> >> halving the possible timestamp range.
> >
> > Please note that the date_overflows() command only tests when we are
> > about to call system functions. I do not think that it does what you
> > think it does (namely, validate timestamps when they enter Git).
> 
> OK, then please read my question without date_overflows(), that is,
> "should we at least clamp the values with some new mechanism to leave
> the door open for supporting times before the epoch, even if (and
> especially if) we leave the use of signed integral type for timestamps
> out of the scope of this series?"

You are asking the wrong person because I do not care about timestamps
dating before the dawn of Unix.

In any case, it is a question unrelated to the work I performed in this
patch series: the raison d'être of these patches is to allow timestamps to
refer to dates that are currently insanely far in the future.

Ciao,
Dscho

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread Johannes Schindelin

Hi Liam,

On Tue, 25 Apr 2017, Liam Beguin wrote:

> Add the 'rebase.abbrevCmd' boolean config option to allow `git rebase -i`
> to abbreviate the command-names in the instruction list.
> 
> This means that `git rebase -i` would print:
> p deadbee The oneline of this commit
> ...
> 
> instead of:
> pick deadbee The oneline of this commit
> ...
> 
> Using a single character command-name allows the lines to remain
> aligned, making the whole set more readable.
> 
> Signed-off-by: Liam Beguin 

Apart from either abbreviating commands after --edit-todo, or documenting
explicitly that the new config option only concerns the initial todo list,
there is another problem that just occurred to me: --exec.

When you call `git rebase -x "make DEVELOPER=1 -j15"`, the idea is to
append an "exec make DEVELOPER=1 -j15" line after every pick line. The
code in question looks like this:

add_exec_commands () {
{
first=t
while read -r insn rest
do
case $insn in
pick)
test -n "$first" ||
printf "%s" "$cmd"
;;
esac
printf "%s %s\n" "$insn" "$rest"
first=
done
printf "%s" "$cmd"
} <"$1" >"$1.new" &&
mv "$1.new" "$1"
}

Obviously, the git-rebase--interactive script expects at this point that
the command is spelled out, so your patch needs to change the `pick)` case
to `p|pick)`, I think.

In addition, since the rationale for the new option is to align the lines
better, the `exec` would need to be replaced by `x`, and as multiple `-x`
options are allowed, you would need something like this at the beginning
of `add_exec_commands`, too:

# abbreviate `exec` if rebase.abbrevCmd is true
test p != "$rebasecmd" ||
cmd="$(echo "$cmd" | sed 's/^exec/x/')"

Also:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 475e874d5155..8b1877f2df91 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2614,6 +2614,25 @@ rebase.instructionFormat::
>   the instruction list during an interactive rebase.  The format will 
> automatically
>   have the long commit hash prepended to the format.
>  
> +rebase.abbrevCmd::

It does not fail to amuse that the term "abbrevCmd" is abbreviated
heavily itself. However, I would strongly suggest to avoid that. It would
be much more pleasant to call the config option rebase.abbreviateCommands

> +rebase.abbrevCmd::
> + If set to true, `git rebase -i` will abbreviate the command-names in the
> + instruction list. This means that instead of looking like this,

This is by no means your fault, but it is really horrible by how many
different names Git's documentation refers to the todo script, nothing
short of confusing. It is the todo script (which I called it initially,
maybe not a good name, but it has the merit of the longest tradition at
least), the todo list, the instruction sheet, the rebase script, the
instruction list... etc

However, the thing is called "todo list" elsewhere in the same file,
therefore lets try to avoid even more confusion and use that term instead
of "instruction list" here.

> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 2c9c0165b5ab..9f3e82b79615 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -1210,6 +1210,10 @@ else
>   revisions=$onto...$orig_head
>   shortrevisions=$shorthead
>  fi
> +
> +rebasecmd=pick
> +test "$(git config --bool --get rebase.abbrevCmd)" = true && rebasecmd=p

A better name would be "pickcmd", as there are more rebase commands than
just `pick` and what we want here is really only associated with one of
those commands.

Ciao,
Johannes

Re: [PATCH] rebase -i: add config to abbreviate command name

2017-04-25 Thread Johannes Schindelin

Hi Liam,

On Mon, 24 Apr 2017, liam BEGUIN wrote:

> On Mon, 2017-04-24 at 12:26 +0200, Johannes Schindelin wrote:
> 
> > On Sun, 23 Apr 2017, Liam Beguin wrote:
> > 
> > > Add the 'rebase.abbrevCmd' boolean config option to allow the user
> > > to abbreviate the default command name while editing the
> > > 'git-rebase-todo' file.
> > 
> > This patch does not handle the `git rebase --edit-todo` subcommand.
> > Intentional?
> 
> After a little more investigation, I'm not sure what should be added for
> the `git rebase --edit-todo` subcommand. It seems like it uses the same
> text that was added the first time (with `git rebase -i`).

Well, it uses whatever the user may have edited. It may surprise users
that their `pick` does not get converted to `p` like all the original
commands.

Ciao,
Johannes

Re: [PATCH v7 2/2] run-command: don't try to execute directories

2017-04-25 Thread Brandon Williams

On 04/25, Jonathan Nieder wrote:
> Brandon Williams wrote:
> 
> > Subject: run-command: don't try to execute directories
> 
> nit: this is also about non-executable files, now.  That would mean
> something like
> 
>  run-command: don't try to execute directories or non-executable files
> 
> or
> 
>  run-command: restrict PATH search to files we can execute
> 
> [...]
> > Reported-by: Brian Hatfield 
> > Signed-off-by: Brandon Williams 
> > ---
> >  run-command.c  |  2 +-
> >  t/t0061-run-command.sh | 23 +++
> >  2 files changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/run-command.c b/run-command.c
> > index a97d7bf9f..ec08e0951 100644
> > --- a/run-command.c
> > +++ b/run-command.c
> > @@ -137,7 +137,7 @@ static char *locate_in_PATH(const char *file)
> > }
> > strbuf_addstr(&buf, file);
> >  
> > -   if (!access(buf.buf, F_OK))
> > +   if (is_executable(buf.buf))
> > return strbuf_detach(&buf, NULL);
> 
> It's probably worth a docstring for this function to explain
> that this is not a complete emulation of execvp on Windows, since
> it doesn't look for .com and .exe files.
> 
> 
> >  
> > if (!*end)
> > diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
> > index 98c09dd98..fd5e43766 100755
> > --- a/t/t0061-run-command.sh
> > +++ b/t/t0061-run-command.sh
> > @@ -37,6 +37,29 @@ test_expect_success !MINGW 'run_command can run a script 
> > without a #! line' '
> > test_cmp empty err
> >  '
> >  
> > +test_expect_success 'run_command should not try to execute a directory' '
> > +   test_when_finished "rm -rf bin1 bin2 bin3" &&
> > +   mkdir -p bin1/greet bin2 bin3 &&
> > +   write_script bin2/greet <<-\EOF &&
> > +   cat bin2/greet
> > +   EOF
> > +   chmod -x bin2/greet &&
> 
> This probably implies that the test needs a POSIXPERM dependency.
> Should it be a separate test_expect_success case so that the other
> part can still run on Windows?
> 
> The rest looks good.  Thanks for your patient work.
> 
> With whatever subset of the changes described makes sense,
> Reviewed-by: Jonathan Nieder 
> 
> Thanks.
> 
> diff --git i/run-command.c w/run-command.c
> index ec08e09518..dbbaec932e 100644
> --- i/run-command.c
> +++ w/run-command.c
> @@ -117,6 +117,21 @@ static inline void close_pair(int fd[2])
>   close(fd[1]);
>  }
>  
> +/*
> + * Search $PATH for a command.  This emulates the path search that
> + * execvp would perform, without actually executing the command so it
> + * can be used before fork() to prepare to run a command using
> + * execve() or after execvp() to diagnose why it failed.
> + *
> + * The caller should ensure that file contains no directory
> + * separators.
> + *
> + * Returns NULL if the command could not be found.
> + *
> + * This should not be used on Windows, where the $PATH search rules
> + * are more complicated (e.g., a search for "foo" should find
> + * "foo.exe").
> + */
>  static char *locate_in_PATH(const char *file)
>  {
>   const char *p = getenv("PATH");
> diff --git i/t/t0061-run-command.sh w/t/t0061-run-command.sh
> index fd5e43766a..e48a207fae 100755
> --- i/t/t0061-run-command.sh
> +++ w/t/t0061-run-command.sh
> @@ -37,26 +37,33 @@ test_expect_success !MINGW 'run_command can run a script 
> without a #! line' '
>   test_cmp empty err
>  '
>  
> -test_expect_success 'run_command should not try to execute a directory' '
> +test_expect_success 'run_command does not try to execute a directory' '
>   test_when_finished "rm -rf bin1 bin2" &&
> - mkdir -p bin1/greet bin2 bin3 &&
> + mkdir -p bin1/greet bin2 &&
>   write_script bin2/greet <<-\EOF &&
>   cat bin2/greet
>   EOF
> - chmod -x bin2/greet &&
> - write_script bin3/greet <<-\EOF &&
> - cat bin3/greet
> +
> + PATH=$PWD/bin1:$PWD/bin2:$PATH \
> + test-run-command run-command greet >actual 2>err &&
> + test_cmp bin2/greet actual &&
> + test_cmp empty err
> +'
> +
> +test_expect_success POSIXPERM 'run_command passes over non-executable file' '
> + test_when_finished "rm -rf bin1 bin2" &&
> + mkdir -p bin1 bin2 &&
> + write_script bin1/greet <<-\EOF &&
> + cat bin1/greet
> + EOF
> + chmod -x bin1/greet &&
> + write_script bin2/greet <<-\EOF &&
> + cat bin2/greet
>   EOF
>  
> - # Test that run-command does not try to execute the "greet" directory in
> - # "bin1", or the non-executable file "greet" in "bin2", but rather
> - # correcty executes the "greet" script located in bin3.
> - (
> - PATH=$PWD/bin1:$PWD/bin2:$PWD/bin3:$PATH &&
> - export PATH &&
> - test-run-command run-command greet >actual 2>err
> - ) &&
> - test_cmp bin3/greet actual &&
> + PATH=$PWD/bin1:$PWD/bin2:$PATH \
> + test-run-command run-command greet >actual 2>err &&
> + test_cmp bin2/greet actual &&
>   test_cmp empty err
>  '
>  

Yeah the

BUG: wildmatches like foo///bar don't match properly due to internal optimizations

2017-04-25 Thread Ævar Arnfjörð Bjarmason

Thought I'd just start another thread for this rather than tack it
onto the pathalogical case thread.

In commit 4c251e5cb5 ("wildmatch: make /**/ match zero or more
directories", 2012-10-15) Duy added support for ** in globs.

One test-case for this is:

match 1 0 'foo/baz/bar' 'foo/**/**/bar'

I.e. foo/**/**/bar matches foo/baz/bar. However due to some
pre-pruning we do in pathspec/ls-tree we can't ever match it, because
the first thing we do is peel the first part of the path/pattern off,
i.e. foo/, and then match baz/bar against **/**/bar.

The monkeypatch at the end of this mail makes this case work, but
breaks some others, and more importantly now e.g. on git.git we have
to call wildmatch() ~3K times for all the files we track, instead of
just for stuff in my foo/ folder.

So I don't think this needs to be made to work, being able to prune
patterns like this is very useful, but:

a) the wildmatch tests are lacking because they just call the
low-level function without a helper, but that's not actually how it
gets invoked by anything that matters

b) If we're going to be using wildmatch but implicitly not supporting
some of its very expensive optimization-beating syntax we should
probably make that an error explicitly.

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index a6c70dbe9e..4285a09ccc 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -235,3 +235,3 @@ static void show_ce_entry(const char *tag, const
struct cache_entry *ce)
struct strbuf name = STRBUF_INIT;
-   int len = max_prefix_len;
+   int len = 0;
if (super_prefix)
@@ -640,2 +640,3 @@ int cmd_ls_files(int argc, const char **argv,
const char *cmd_prefix)
max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
+   max_prefix_len = 0;

diff --git a/dir.c b/dir.c
index f451bfa48c..b4399f042c 100644
--- a/dir.c
+++ b/dir.c
@@ -86,3 +86,3 @@ int git_fnmatch(const struct pathspec_item *item,
return wildmatch(pattern, string,
-item->magic & PATHSPEC_ICASE ? WM_CASEFOLD : 0,
+item->magic & PATHSPEC_ICASE ?
WM_CASEFOLD : WM_PATHNAME,
 NULL);
@@ -316,3 +316,3 @@ static int match_pathspec_item(const struct
pathspec_item *item, int prefix,
!git_fnmatch(item, match, name,
-item->nowildcard_len - prefix))
+prefix))
return MATCHED_FNMATCH;

[PATCH] sequencer: require trailing NL in footers

2017-04-25 Thread Jonathan Tan

In commit 967dfd4 ("sequencer: use trailer's trailer layout",
2016-11-29), sequencer was taught to use the same mechanism as
interpret-trailers to determine the nature of the trailer of a commit
message (referred to as the "footer" in sequencer.c). However, the
requirement that a footer end in a newline character was inadvertently
removed. Restore that requirement.

While writing this commit, I noticed that if the "ignore_footer"
parameter in "has_conforming_footer" is greater than the distance
between the trailer start and sb->len, "has_conforming_footer" will
return an unexpected result. This does not occur in practice, because
"ignore_footer" is either zero or the return value of an invocation to
"ignore_non_trailer", which only skips empty lines and comment lines.
This commit contains a comment explaining this in the function's
documentation.

Reported-by: Brian Norris 
Signed-off-by: Jonathan Tan 
---

Thanks for the bug report. Here's a fix - I've verified this with the
way to reproduce provided in the original e-mail, and it seems to work
now.

The commit message of the referenced commit
(7b309aef0463340d3ad5449d1f605d14e10a4225) does not end in a newline,
which is probably why different behavior was observed with this commit
(as compared to others).

 sequencer.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index 77afecaeb..4cbe64b7f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -151,6 +151,12 @@ static const char *get_todo_path(const struct replay_opts 
*opts)
  * Returns 1 for conforming footer
  * Returns 2 when sob exists within conforming footer
  * Returns 3 when sob exists within conforming footer as last entry
+ *
+ * A footer that does not end in a newline is considered non-conforming.
+ *
+ * ignore_footer, if not zero, should be the return value of an invocation to
+ * ignore_non_trailer. See the documentation of that function for more
+ * information.
  */
 static int has_conforming_footer(struct strbuf *sb, struct strbuf *sob,
int ignore_footer)
@@ -159,6 +165,11 @@ static int has_conforming_footer(struct strbuf *sb, struct 
strbuf *sob,
int i;
int found_sob = 0, found_sob_last = 0;
 
+   if (sb->len <= ignore_footer)
+   return 0;
+   if (sb->buf[sb->len - ignore_footer - 1] != '\n')
+   return 0;
+
trailer_info_get(&info, sb->buf);
 
if (info.trailer_start == info.trailer_end)
-- 
2.13.0.rc0.306.g87b477812d-goog

Re: [PATCH v7 2/2] run-command: don't try to execute directories

2017-04-25 Thread Jonathan Nieder

Brandon Williams wrote:

> Subject: run-command: don't try to execute directories

nit: this is also about non-executable files, now.  That would mean
something like

 run-command: don't try to execute directories or non-executable files

or

 run-command: restrict PATH search to files we can execute

[...]
> Reported-by: Brian Hatfield 
> Signed-off-by: Brandon Williams 
> ---
>  run-command.c  |  2 +-
>  t/t0061-run-command.sh | 23 +++
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/run-command.c b/run-command.c
> index a97d7bf9f..ec08e0951 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -137,7 +137,7 @@ static char *locate_in_PATH(const char *file)
>   }
>   strbuf_addstr(&buf, file);
>  
> - if (!access(buf.buf, F_OK))
> + if (is_executable(buf.buf))
>   return strbuf_detach(&buf, NULL);

It's probably worth a docstring for this function to explain
that this is not a complete emulation of execvp on Windows, since
it doesn't look for .com and .exe files.


>  
>   if (!*end)
> diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
> index 98c09dd98..fd5e43766 100755
> --- a/t/t0061-run-command.sh
> +++ b/t/t0061-run-command.sh
> @@ -37,6 +37,29 @@ test_expect_success !MINGW 'run_command can run a script 
> without a #! line' '
>   test_cmp empty err
>  '
>  
> +test_expect_success 'run_command should not try to execute a directory' '
> + test_when_finished "rm -rf bin1 bin2 bin3" &&
> + mkdir -p bin1/greet bin2 bin3 &&
> + write_script bin2/greet <<-\EOF &&
> + cat bin2/greet
> + EOF
> + chmod -x bin2/greet &&

This probably implies that the test needs a POSIXPERM dependency.
Should it be a separate test_expect_success case so that the other
part can still run on Windows?

The rest looks good.  Thanks for your patient work.

With whatever subset of the changes described makes sense,
Reviewed-by: Jonathan Nieder 

Thanks.

diff --git i/run-command.c w/run-command.c
index ec08e09518..dbbaec932e 100644
--- i/run-command.c
+++ w/run-command.c
@@ -117,6 +117,21 @@ static inline void close_pair(int fd[2])
close(fd[1]);
 }
 
+/*
+ * Search $PATH for a command.  This emulates the path search that
+ * execvp would perform, without actually executing the command so it
+ * can be used before fork() to prepare to run a command using
+ * execve() or after execvp() to diagnose why it failed.
+ *
+ * The caller should ensure that file contains no directory
+ * separators.
+ *
+ * Returns NULL if the command could not be found.
+ *
+ * This should not be used on Windows, where the $PATH search rules
+ * are more complicated (e.g., a search for "foo" should find
+ * "foo.exe").
+ */
 static char *locate_in_PATH(const char *file)
 {
const char *p = getenv("PATH");
diff --git i/t/t0061-run-command.sh w/t/t0061-run-command.sh
index fd5e43766a..e48a207fae 100755
--- i/t/t0061-run-command.sh
+++ w/t/t0061-run-command.sh
@@ -37,26 +37,33 @@ test_expect_success !MINGW 'run_command can run a script 
without a #! line' '
test_cmp empty err
 '
 
-test_expect_success 'run_command should not try to execute a directory' '
+test_expect_success 'run_command does not try to execute a directory' '
test_when_finished "rm -rf bin1 bin2" &&
-   mkdir -p bin1/greet bin2 bin3 &&
+   mkdir -p bin1/greet bin2 &&
write_script bin2/greet <<-\EOF &&
cat bin2/greet
EOF
-   chmod -x bin2/greet &&
-   write_script bin3/greet <<-\EOF &&
-   cat bin3/greet
+
+   PATH=$PWD/bin1:$PWD/bin2:$PATH \
+   test-run-command run-command greet >actual 2>err &&
+   test_cmp bin2/greet actual &&
+   test_cmp empty err
+'
+
+test_expect_success POSIXPERM 'run_command passes over non-executable file' '
+   test_when_finished "rm -rf bin1 bin2" &&
+   mkdir -p bin1 bin2 &&
+   write_script bin1/greet <<-\EOF &&
+   cat bin1/greet
+   EOF
+   chmod -x bin1/greet &&
+   write_script bin2/greet <<-\EOF &&
+   cat bin2/greet
EOF
 
-   # Test that run-command does not try to execute the "greet" directory in
-   # "bin1", or the non-executable file "greet" in "bin2", but rather
-   # correcty executes the "greet" script located in bin3.
-   (
-   PATH=$PWD/bin1:$PWD/bin2:$PWD/bin3:$PATH &&
-   export PATH &&
-   test-run-command run-command greet >actual 2>err
-   ) &&
-   test_cmp bin3/greet actual &&
+   PATH=$PWD/bin1:$PWD/bin2:$PATH \
+   test-run-command run-command greet >actual 2>err &&
+   test_cmp bin2/greet actual &&
test_cmp empty err
 '

[PATCH v8] read-cache: call verify_hdr() in a background thread

2017-04-25 Thread git

From: Jeff Hostetler 

Version 8 of this patch converts the unit test to use
perl to corrupt the index checksum (rather than altering
a filename) and also verifies the fsck error message as
suggested in response to v7 on the mailing list.

If there are no other suggestions, I think this version
should be considered final.


Jeff Hostetler (1):
  read-cache: force_verify_index_checksum

 builtin/fsck.c  |  1 +
 cache.h |  2 ++
 read-cache.c|  7 +++
 t/t1450-fsck.sh | 32 
 4 files changed, 42 insertions(+)

-- 
2.9.3

[PATCH v8] read-cache: force_verify_index_checksum

2017-04-25 Thread git

From: Jeff Hostetler 

Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.

Teach fsck to force this verification.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

These effect can be seen using t/perf/p0002-read-cache.sh:

Test  HEAD~1HEAD
--
0002.1: read_cache/discard_cache 1000 times   0.66(0.44+0.20)   0.30(0.27+0.02) 
-54.5%

Signed-off-by: Jeff Hostetler 
---
 builtin/fsck.c  |  1 +
 cache.h |  2 ++
 read-cache.c|  7 +++
 t/t1450-fsck.sh | 32 
 4 files changed, 42 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1a5cacc..5512d06 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -771,6 +771,7 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
 
if (keep_cache_objects) {
+   verify_index_checksum = 1;
read_cache();
for (i = 0; i < active_nr; i++) {
unsigned int mode;
diff --git a/cache.h b/cache.h
index 80b6372..87f13bf 100644
--- a/cache.h
+++ b/cache.h
@@ -685,6 +685,8 @@ extern void update_index_if_able(struct index_state *, 
struct lock_file *);
 extern int hold_locked_index(struct lock_file *, int);
 extern void set_alternate_index_output(const char *);
 
+extern int verify_index_checksum;
+
 /* Environment bits from configuration mechanism */
 extern int trust_executable_bit;
 extern int trust_ctime;
diff --git a/read-cache.c b/read-cache.c
index 9054369..c4205aa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1371,6 +1371,9 @@ struct ondisk_cache_entry_extended {
ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
ondisk_cache_entry_size(ce_namelen(ce)))
 
+/* Allow fsck to force verification of the index checksum. */
+int verify_index_checksum;
+
 static int verify_hdr(struct cache_header *hdr, unsigned long size)
 {
git_SHA_CTX c;
@@ -1382,6 +1385,10 @@ static int verify_hdr(struct cache_header *hdr, unsigned 
long size)
hdr_version = ntohl(hdr->hdr_version);
if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
return error("bad index version %d", hdr_version);
+
+   if (!verify_index_checksum)
+   return 0;
+
git_SHA1_Init(&c);
git_SHA1_Update(&c, hdr, size - 20);
git_SHA1_Final(sha1, &c);
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 33a51c9..eff1cd6 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,4 +689,36 @@ test_expect_success 'bogus head does not fallback to all 
heads' '
! grep $blob out
 '
 
+# Corrupt the checksum on the index.
+# Add 1 to the last byte in the SHA.
+corrupt_index_checksum () {
+perl -w -e '
+   use Fcntl ":seek";
+   open my $fh, "+<", ".git/index" or die "open: $!";
+   binmode $fh;
+   seek $fh, -1, SEEK_END or die "seek: $!";
+   read $fh, my $in_byte, 1 or die "read: $!";
+
+   $in_value = unpack("C", $in_byte);
+   $out_value = ($in_value + 1) & 255;
+
+   $out_byte = pack("C", $out_value);
+
+   seek $fh, -1, SEEK_END or die "seek: $!";
+   print $fh $out_byte;
+   close $fh or die "close: $!";
+'
+}
+
+# Corrupt the checksum on the index and then
+# verify that only fsck notices.
+test_expect_success 'detect corrupt index file in fsck' '
+   cp .git/index .git/index.backup &&
+   test_when_finished "mv .git/index.backup .git/index" &&
+   corrupt_index_checksum &&
+   test_must_fail git fsck --cache 2>expect &&
+   grep "bad index file" expect &&
+   git status
+'
+
 test_done
-- 
2.9.3

Re: [PATCH v7 1/2] exec_cmd: expose is_executable function

2017-04-25 Thread Brandon Williams

On 04/25, Johannes Sixt wrote:
> Am 25.04.2017 um 19:54 schrieb Brandon Williams:
> >Move the logic for 'is_executable()' from help.c to exec_cmd.c and
> >expose it so that callers from outside help.c can access the function.
> 
> The function is quite low-level. IMO, run-command.[ch] would be a
> better home for it. Additionally, that would reduce the number of
> files that contain #ifdef GIT_WINDOWS_NATIVE.

Fair enough, Jonathan suggested the same so I'll move it to
run-command in a re-roll.

-- 
Brandon Williams

Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-25 Thread Johannes Sixt


Am 25.04.2017 um 08:52 schrieb Junio C Hamano:

Johannes Sixt  writes:


The idea of marking git-gui and gitk histories that none of their
commits is checked out: it erases all Git source code from the working
directory, and a later bisection step places all code back and it
requires a full build. Not a big deal with Git, but there are much
larger code bases.

The current bisect behavior makes this idea unworkable. For me, it was
a big step backwards when it was implemented. :-(


I think it is sort-of unfair to blame bisect for that.  It talks
more about how "coolest merge evar!" cross project merges are
unworkable in practice.


No, it has nothing to do with the "coolest merge ever". The same problem 
occurs when I'm working on a topic branch:


 A----B   <- topic
/
 --g----G <- integration branch ("good by default")

While fiddling with B, I find a bug and suspect it somewhere on the 
topic from A to B. When A has a change that causes a substantial or 
complete rebuild, testing g causes burden for no good reason.


But, as I said, this is a tangent. I have no mission to change git bisect.

-- Hannes

Re: [PATCH v7 1/2] exec_cmd: expose is_executable function

2017-04-25 Thread Johannes Sixt


Am 25.04.2017 um 19:54 schrieb Brandon Williams:

Move the logic for 'is_executable()' from help.c to exec_cmd.c and
expose it so that callers from outside help.c can access the function.


The function is quite low-level. IMO, run-command.[ch] would be a better 
home for it. Additionally, that would reduce the number of files that 
contain #ifdef GIT_WINDOWS_NATIVE.


-- Hannes

Re: [PATCH v7 1/2] exec_cmd: expose is_executable function

2017-04-25 Thread Jonathan Nieder

Hi,

Brandon Williams wrote:

> Move the logic for 'is_executable()' from help.c to exec_cmd.c and
> expose it so that callers from outside help.c can access the function.
> This is to enable run-command to be able to query if a file is
> executable in a future patch.
>
> Signed-off-by: Brandon Williams 
> ---
>  exec_cmd.c | 42 ++
>  exec_cmd.h |  1 +
>  help.c | 42 --
>  3 files changed, 43 insertions(+), 42 deletions(-)

Makes sense.  Seems like as fine place for it.  (exec_cmd.c is mostly
an implementation detail of run_command, to hold the logic for
executing a git command.  This is another run_command helper, and it
doesn't feel illogical to find it there.  Another alternative place to
put t would be in run-command.c.)

> diff --git a/exec_cmd.c b/exec_cmd.c
> index fb94aeba9..6d9481e26 100644
> --- a/exec_cmd.c
> +++ b/exec_cmd.c
> @@ -149,3 +149,45 @@ int execl_git_cmd(const char *cmd,...)
>   argv[argc] = NULL;
>   return execv_git_cmd(argv);
>  }
> +
> +int is_executable(const char *name)

nit: it's a good practice to find a logical place for a new function in
the existing file, instead of defaulting to the end.  That way, the file
is easier to read sequentially, and if two different patches want to
add a function to the same file then they are less likely to conflict.

This could go before prepare_git_cmd, for example.

With or without such a tweak,
Reviewed-by: Jonathan Nieder 

diff --git i/exec_cmd.c w/exec_cmd.c
index 6d9481e26d..601fbc43bc 100644
--- i/exec_cmd.c
+++ w/exec_cmd.c
@@ -104,6 +104,48 @@ void setup_path(void)
strbuf_release(&new_path);
 }
 
+int is_executable(const char *name)
+{
+   struct stat st;
+
+   if (stat(name, &st) || /* stat, not lstat */
+   !S_ISREG(st.st_mode))
+   return 0;
+
+#if defined(GIT_WINDOWS_NATIVE)
+   /*
+* On Windows there is no executable bit. The file extension
+* indicates whether it can be run as an executable, and Git
+* has special-handling to detect scripts and launch them
+* through the indicated script interpreter. We test for the
+* file extension first because virus scanners may make
+* it quite expensive to open many files.
+*/
+   if (ends_with(name, ".exe"))
+   return S_IXUSR;
+
+{
+   /*
+* Now that we know it does not have an executable extension,
+* peek into the file instead.
+*/
+   char buf[3] = { 0 };
+   int n;
+   int fd = open(name, O_RDONLY);
+   st.st_mode &= ~S_IXUSR;
+   if (fd >= 0) {
+   n = read(fd, buf, 2);
+   if (n == 2)
+   /* look for a she-bang */
+   if (!strcmp(buf, "#!"))
+   st.st_mode |= S_IXUSR;
+   close(fd);
+   }
+}
+#endif
+   return st.st_mode & S_IXUSR;
+}
+
 const char **prepare_git_cmd(struct argv_array *out, const char **argv)
 {
argv_array_push(out, "git");
@@ -149,45 +191,3 @@ int execl_git_cmd(const char *cmd,...)
argv[argc] = NULL;
return execv_git_cmd(argv);
 }
-
-int is_executable(const char *name)
-{
-   struct stat st;
-
-   if (stat(name, &st) || /* stat, not lstat */
-   !S_ISREG(st.st_mode))
-   return 0;
-
-#if defined(GIT_WINDOWS_NATIVE)
-   /*
-* On Windows there is no executable bit. The file extension
-* indicates whether it can be run as an executable, and Git
-* has special-handling to detect scripts and launch them
-* through the indicated script interpreter. We test for the
-* file extension first because virus scanners may make
-* it quite expensive to open many files.
-*/
-   if (ends_with(name, ".exe"))
-   return S_IXUSR;
-
-{
-   /*
-* Now that we know it does not have an executable extension,
-* peek into the file instead.
-*/
-   char buf[3] = { 0 };
-   int n;
-   int fd = open(name, O_RDONLY);
-   st.st_mode &= ~S_IXUSR;
-   if (fd >= 0) {
-   n = read(fd, buf, 2);
-   if (n == 2)
-   /* look for a she-bang */
-   if (!strcmp(buf, "#!"))
-   st.st_mode |= S_IXUSR;
-   close(fd);
-   }
-}
-#endif
-   return st.st_mode & S_IXUSR;
-}
diff --git i/exec_cmd.h w/exec_cmd.h
index 48dd18a0d4..5e8200b952 100644
--- i/exec_cmd.h
+++ w/exec_cmd.h
@@ -7,11 +7,11 @@ extern void git_set_argv_exec_path(const char *exec_path);
 extern void git_extract_argv0_path(const char *path);
 extern const char *git_exec_path(void);
 extern void setup_path(void);
+extern int is_executable(const char *name);
 extern const char **prepare_git_cmd(struct argv_array *out, const char **argv);
 extern int execv_git_cmd(const char **argv); /* NULL terminated */
 LAST

[PATCH v7 2/2] run-command: don't try to execute directories

2017-04-25 Thread Brandon Williams

In some situations run-command will incorrectly try (and fail) to
execute a directory instead of an executable file.  This was observed by
having a directory called "ssh" in $PATH before the real ssh and trying
to use ssh protoccol, reslting in the following:

$ git ls-remote ssh://url
fatal: cannot exec 'ssh': Permission denied

It ends up being worse and run-command will even try to execute a
non-executable file if it preceeds the executable version of a file on
the PATH.  For example, if PATH=~/bin1:~/bin2:~/bin3 and there exists a
directory 'git-hello' in 'bin1', a non-executable file 'git-hello' in
bin2 and an executable file 'git-hello' (which prints "Hello World!") in
bin3 the following will occur:

$ git hello
fatal: cannot exec 'git-hello': Permission denied

This is due to only checking 'access()' when locating an executable in
PATH, which doesn't distinguish between files and directories.  Instead
use 'is_executable()' which check that the path is to a regular,
executable file.  Now run-command won't try to execute the directory or
non-executable file 'git-hello':

$ git hello
Hello World!

Reported-by: Brian Hatfield 
Signed-off-by: Brandon Williams 
---
 run-command.c  |  2 +-
 t/t0061-run-command.sh | 23 +++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index a97d7bf9f..ec08e0951 100644
--- a/run-command.c
+++ b/run-command.c
@@ -137,7 +137,7 @@ static char *locate_in_PATH(const char *file)
}
strbuf_addstr(&buf, file);
 
-   if (!access(buf.buf, F_OK))
+   if (is_executable(buf.buf))
return strbuf_detach(&buf, NULL);
 
if (!*end)
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 98c09dd98..fd5e43766 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -37,6 +37,29 @@ test_expect_success !MINGW 'run_command can run a script 
without a #! line' '
test_cmp empty err
 '
 
+test_expect_success 'run_command should not try to execute a directory' '
+   test_when_finished "rm -rf bin1 bin2 bin3" &&
+   mkdir -p bin1/greet bin2 bin3 &&
+   write_script bin2/greet <<-\EOF &&
+   cat bin2/greet
+   EOF
+   chmod -x bin2/greet &&
+   write_script bin3/greet <<-\EOF &&
+   cat bin3/greet
+   EOF
+
+   # Test that run-command does not try to execute the "greet" directory in
+   # "bin1", or the non-executable file "greet" in "bin2", but rather
+   # correcty executes the "greet" script located in bin3.
+   (
+   PATH=$PWD/bin1:$PWD/bin2:$PWD/bin3:$PATH &&
+   export PATH &&
+   test-run-command run-command greet >actual 2>err
+   ) &&
+   test_cmp bin3/greet actual &&
+   test_cmp empty err
+'
+
 test_expect_success POSIXPERM 'run_command reports EACCES' '
cat hello-script >hello.sh &&
chmod -x hello.sh &&
-- 
2.13.0.rc0.306.g87b477812d-goog

[PATCH v7 1/2] exec_cmd: expose is_executable function

2017-04-25 Thread Brandon Williams

Move the logic for 'is_executable()' from help.c to exec_cmd.c and
expose it so that callers from outside help.c can access the function.
This is to enable run-command to be able to query if a file is
executable in a future patch.

Signed-off-by: Brandon Williams 
---
 exec_cmd.c | 42 ++
 exec_cmd.h |  1 +
 help.c | 42 --
 3 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/exec_cmd.c b/exec_cmd.c
index fb94aeba9..6d9481e26 100644
--- a/exec_cmd.c
+++ b/exec_cmd.c
@@ -149,3 +149,45 @@ int execl_git_cmd(const char *cmd,...)
argv[argc] = NULL;
return execv_git_cmd(argv);
 }
+
+int is_executable(const char *name)
+{
+   struct stat st;
+
+   if (stat(name, &st) || /* stat, not lstat */
+   !S_ISREG(st.st_mode))
+   return 0;
+
+#if defined(GIT_WINDOWS_NATIVE)
+   /*
+* On Windows there is no executable bit. The file extension
+* indicates whether it can be run as an executable, and Git
+* has special-handling to detect scripts and launch them
+* through the indicated script interpreter. We test for the
+* file extension first because virus scanners may make
+* it quite expensive to open many files.
+*/
+   if (ends_with(name, ".exe"))
+   return S_IXUSR;
+
+{
+   /*
+* Now that we know it does not have an executable extension,
+* peek into the file instead.
+*/
+   char buf[3] = { 0 };
+   int n;
+   int fd = open(name, O_RDONLY);
+   st.st_mode &= ~S_IXUSR;
+   if (fd >= 0) {
+   n = read(fd, buf, 2);
+   if (n == 2)
+   /* look for a she-bang */
+   if (!strcmp(buf, "#!"))
+   st.st_mode |= S_IXUSR;
+   close(fd);
+   }
+}
+#endif
+   return st.st_mode & S_IXUSR;
+}
diff --git a/exec_cmd.h b/exec_cmd.h
index ff0b48048..48dd18a0d 100644
--- a/exec_cmd.h
+++ b/exec_cmd.h
@@ -12,5 +12,6 @@ extern int execv_git_cmd(const char **argv); /* NULL 
terminated */
 LAST_ARG_MUST_BE_NULL
 extern int execl_git_cmd(const char *cmd, ...);
 extern char *system_path(const char *path);
+extern int is_executable(const char *name);
 
 #endif /* GIT_EXEC_CMD_H */
diff --git a/help.c b/help.c
index bc6cd19cf..50f84b430 100644
--- a/help.c
+++ b/help.c
@@ -96,48 +96,6 @@ static void pretty_print_cmdnames(struct cmdnames *cmds, 
unsigned int colopts)
string_list_clear(&list, 0);
 }
 
-static int is_executable(const char *name)
-{
-   struct stat st;
-
-   if (stat(name, &st) || /* stat, not lstat */
-   !S_ISREG(st.st_mode))
-   return 0;
-
-#if defined(GIT_WINDOWS_NATIVE)
-   /*
-* On Windows there is no executable bit. The file extension
-* indicates whether it can be run as an executable, and Git
-* has special-handling to detect scripts and launch them
-* through the indicated script interpreter. We test for the
-* file extension first because virus scanners may make
-* it quite expensive to open many files.
-*/
-   if (ends_with(name, ".exe"))
-   return S_IXUSR;
-
-{
-   /*
-* Now that we know it does not have an executable extension,
-* peek into the file instead.
-*/
-   char buf[3] = { 0 };
-   int n;
-   int fd = open(name, O_RDONLY);
-   st.st_mode &= ~S_IXUSR;
-   if (fd >= 0) {
-   n = read(fd, buf, 2);
-   if (n == 2)
-   /* look for a she-bang */
-   if (!strcmp(buf, "#!"))
-   st.st_mode |= S_IXUSR;
-   close(fd);
-   }
-}
-#endif
-   return st.st_mode & S_IXUSR;
-}
-
 static void list_commands_in_dir(struct cmdnames *cmds,
 const char *path,
 const char *prefix)
-- 
2.13.0.rc0.306.g87b477812d-goog

Re: t0025 flaky on OSX

2017-04-25 Thread Lars Schneider


> On 24 Apr 2017, at 21:48, Torsten Bögershausen  wrote:
> 
> On 2017-04-24 19:00, Torsten Bögershausen wrote:
>> On 2017-04-24 18:45, Lars Schneider wrote:
>>> Hi,
>>> 
>>> "t0025.3 - crlf=true causes a CRLF file to be normalized" failed 
>>> sporadically on next and master recently: 
>>> https://travis-ci.org/git/git/jobs/225084459#L2382
>>> https://travis-ci.org/git/git/jobs/223830505#L2342
>>> 
>>> Are you aware of a race condition in the code
>>> or in the test?
>> Not yet - I'll have a look
>> 
> 
> So,
> The test failed under Linux & pu of the day, using Peff's
> stress script.
> 
> not ok 3 - crlf=true causes a CRLF file to be normalized
> 
> The good case (simplified):
> $ git status
>   modified:   CRLFonly
> 
> Untracked files:
>.gitattributes
> 
> 
> $ git diff | tr "\015" Q
> warning: CRLF will be replaced by LF in CRLFonly.
> The file will have its original line endings in your working directory.
> diff --git a/CRLFonly b/CRLFonly
> index 44fc21c..666dbf4 100644
> --- a/CRLFonly
> +++ b/CRLFonly
> @@ -1,7 +1,7 @@
> -IQ
> -amQ
> -veryQ
> -veryQ
> -fineQ
> -thankQ
> -youQ
> +I
> +am
> +very
> +very
> +fine
> +thank
> +you
> 
> 
> The failed case:
> $ git status
> Untracked files:
>.gitattributes
> 
> ---
> $ ls -al -i
> total 28
> 3430195 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 .
> 3427617 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 ..
> 3429958 -rw-r--r-- 1 tb tb   37 Apr 24 21:19 CRLFonly
> 3432574 drwxr-xr-x 8 tb tb 4096 Apr 24 21:27 .git
> 3425599 -rw-r--r-- 1 tb tb   14 Apr 24 21:19 .gitattributes
> 3430089 -rw-r--r-- 1 tb tb   24 Apr 24 21:19 LFonly
> 3430174 -rw-r--r-- 1 tb tb   36 Apr 24 21:19 LFwithNUL
> 
> -
> #After
> $ mv CRLFonly tmp
> $ cp tmp CRLFonly
> $ ls -al -i
> 3430195 drwxr-xr-x 3 tb tb 4096 Apr 24 21:36 .
> 3427617 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 ..
> 3401599 -rw-r--r-- 1 tb tb   37 Apr 24 21:36 CRLFonly
> 3432574 drwxr-xr-x 8 tb tb 4096 Apr 24 21:36 .git
> 3425599 -rw-r--r-- 1 tb tb   14 Apr 24 21:19 .gitattributes
> 3430089 -rw-r--r-- 1 tb tb   24 Apr 24 21:19 LFonly
> 3430174 -rw-r--r-- 1 tb tb   36 Apr 24 21:19 LFwithNUL
> 3429958 -rw-r--r-- 1 tb tb   37 Apr 24 21:19 tmp
> 
> $ git status
>   modified:   CRLFonly
> Untracked files:
>.gitattributes
>tmp
> 
> 
> So all in all it seams as if there is a very old race condition here,
> which we "never" have seen yet.
> Moving the file to a different inode number fixes the test case,
> Git doesn't treat it as unchanged any more.

Thanks a lot for investigating this! Would you mind posting the
fix as patch?

Thanks,
Lars

BUG: diff-{index,files,tree} (and git-gui) do not respect the diff.indentHeuristic config setting

2017-04-25 Thread Marc Branchaud


So I have

diff.indentHeuristic = true

and I noticed that git-gui was not using the heuristic.  This is because 
git-gui uses diff-index, and that does not respect the config setting, 
even though it supports the --indent-heuristic option.


And it looks like diff-files and diff-tree also have the same problem.

I tried a couple of quick-n-dirty things to fix it in diff-index, 
without success, and I've run out of git-hacking tame, so all I can do 
for now is throw out a bug report.


diff-index.c explicitly says "no 'diff' UI options" since 83ad63cfeb 
("diff: do not use configuration magic at the core-level", 2006-07-08), 
so maybe this needs to be fixed in git-gui (and maybe elsewhere), but to 
me it feels like the diff-foo commands should respect the setting.


(CC'ing Michael Haggerty, who added the heuristic.)

(This is git v2.12.2.)

M.

Re: [PATCH v3 4/5] archive-zip: support archives bigger than 4GB

2017-04-25 Thread René Scharfe


Am 25.04.2017 um 09:55 schrieb Peter Krefting:

René Scharfe:

This needs to be >=. The spec says that if the value is 0x, 
there should be a zip64 record with the actual size (even if it is 
0x).

Could you please cite the relevant part?


4.4.8 compressed size: (4 bytes)
4.4.9 uncompressed size: (4 bytes)

"If an archive is in ZIP64 format and the value in this field is 
0x, the size will be in the corresponding 8 byte ZIP64 extended 
information extra field."



Of course, there is no definition of how they define that "an archive is 
in ZIP64 format", but I would say that is whenever it has any ZIP64 
structures.


I struggled with that sentence as well.  There is no explicit "format"
field AFAICS.  The closest at the archive level are zip64 end of central
directory record and locator.  But what really matters is the presence
of a zip64 extended information extra field to hold the 64-bit size
value.

There's also this general note a bit higher up:

  "4.4.1.4  If one of the fields in the end of central directory
  record is too small to hold required data, the field should be
  set to -1 (0x or 0x) and the ZIP64 format record
  should be created."

My interpretation: An archiver that can only emit 32-bit ZIP files
(either because it doesn't support ZIP64 or due to a compatibility
option set by the user) writes 32-bit size fields and has no defined way
to deal with overflows.  An archiver that is allowed to use ZIP64 can
emit zip64 extras as needed.

Or in other words: A legacy ZIP archive and a ZIP64 archive can be
bit-wise the same if all values for all entries fit into the legacy
fields, but the difference in terms of the spec is what the archiver was
allowed to do when it created them.

# 4-byte sizes, not ZIP64
arch --format=zip ...

# ZIP64, can use 8-byte sizes as needed
arch --format=zip64 ...

Makes sense?

René

Re: Submodule/contents conflict

2017-04-25 Thread Stefan Beller

On Mon, Apr 24, 2017 at 7:12 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> A similar bug report
>> https://public-inbox.org/git/CAG0BQX=wvpkJ=pqwv-nbmhupv8yzvd_kykzjmsfwq9xstz2...@mail.gmail.com/
>>
>> "checkout --recurse-submodules" (as mentioned in that report)
>> made it into Git by now, but this bug goes unfixed, still.
>
> I do not mind reverting that merge out of 'master'.  Please holler
> if you feel these "go recursive" topics are premature.

The bugs reported here are not for the recursive checkout,
I was merely noting it as a side effect of time having passed. :)

I did not imply the recursive topics being premature.

Thanks,
Stefan

Re: [PATCH] submodule_init: die cleanly on submodules without url defined

2017-04-25 Thread Stefan Beller

On Mon, Apr 24, 2017 at 5:57 PM, Jeff King  wrote:
> When we init a submodule, we try to die when it has no URL
> defined:
>
>   url = xstrdup(sub->url);
>   if (!url)
>   die(...);
>
> But that's clearly nonsense. xstrdup() will never return
> NULL, and if sub->url is NULL, we'll segfault.
>
> These two bits of code need to be flipped, so we check
> sub->url before looking at it.
>
> Signed-off-by: Jeff King 
> ---

Makes sense. At the time I assumed xstrdup had a _or_null
behavior (i.e. if NULL goes in, it is not duplicated, but rather
return NULL)

Thanks,
Stefan

Re: [PATCH v6 1/8] pkt-line: add packet_read_line_gently()

2017-04-25 Thread Ben Peart

Sorry if you get this twice, somehow Thunderbird converted my response 
to HTML



On 4/24/2017 10:19 PM, Junio C Hamano wrote:

Ben Peart  writes:


On 4/24/2017 12:21 AM, Junio C Hamano wrote:

Ben Peart  writes:

+{ 

+   int len = packet_read(fd, NULL, NULL,
+ packet_buffer, sizeof(packet_buffer),
+ 
PACKET_READ_CHOMP_NEWLINE|PACKET_READ_GENTLE_ON_EOF);
+   if (dst_len)
+   *dst_len = len;
+   if (dst_line)
+   *dst_line = (len > 0) ? packet_buffer : NULL;

I have the same doubt as above for len == 0 case.

packet_read() returns -1 when PACKET_READ_GENTLE_ON_EOF is passed and
it hits truncated output from the remote process.

I know, but that is irrelevant to my question, which is about
CHOMP_NEWLINE.  I didn't even ask "why a negative len treated
specially?"  My question is about the case where len == 0.  Your
patch treats len==0 just like len==-1, i.e. an error, but I do not
know if that is correct, hence my question.  We both know len < 0
is an error and you do not need to waste time elaborating on it.




The packet_read_line() function returns NULL when len == 0 and 
PACKET_READ_CHOMP_NEWLINE is passed so I wrote the similar 
packet_read_line_gently() function to behave the same.


How about I update the comment in the function header to make this more 
clear:


diff --git a/pkt-line.h b/pkt-line.h
index 7c278a158b..b2965869ad 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -78,9 +78,10 @@ char *packet_read_line(int fd, int *size);
  * Convenience wrapper for packet_read that sets the 
PACKET_READ_GENTLE_ON_EOF
  * and CHOMP_NEWLINE options. The return value specifies the number of 
bytes
  * read into the buffer or -1 on truncated input. If the *dst_line 
parameter
- * is not NULL it will return NULL for a flush packet and otherwise 
points to
- * a static buffer (that may be overwritten by subsequent calls). If 
the size

- * parameter is not NULL, the length of the packet is written to it.
+ * is not NULL it will return NULL for a flush packet or when the number of
+ * bytes copied is zero and otherwise points to a static buffer (that 
may be

+ * overwritten by subsequent calls). If the size parameter is not NULL, the
+ * length of the packet is written to it.
  */
 int packet_read_line_gently(int fd, int *size, char **dst_line);

Re: [PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-25 Thread Mike Rappazzo

On Tue, Apr 25, 2017 at 5:57 AM, Andreas Schwab  wrote:
> On Apr 25 2017, Liam Beguin  wrote:
>
>> Add the 'rebase.abbrevCmd' boolean config option to allow `git rebase -i`
>> to abbreviate the command-names in the instruction list.
>>
>> This means that `git rebase -i` would print:
>> p deadbee The oneline of this commit
>> ...
>>
>> instead of:
>> pick deadbee The oneline of this commit
>> ...
>>
>> Using a single character command-name allows the lines to remain
>> aligned, making the whole set more readable.
>
> Perhaps there should rather be an option to tell rebase to align the
> columns?
>

You _can_ set a custom instruction format using the config variable:
`rebase.instructionFormat`.  With this, you can align columns using
the normal git log format.

For example, I personally use this as my instruction format:

[%an%<|(64)%x5d %s

While, this won't always align perfectly, it may help scratch your itch.

[PATCH v2 9/9] rebase -i: rearrange fixup/squash lines using the rebase--helper

2017-04-25 Thread Johannes Schindelin

This operation has quadratic complexity, which is especially painful
on Windows, where shell scripts are *already* slow (mainly due to the
overhead of the POSIX emulation layer).

Let's reimplement this with linear complexity (using a hash map to
match the commits' subject lines) for the common case; Sadly, the
fixup/squash feature's design neglected performance considerations,
allowing arbitrary prefixes (read: `fixup! hell` will match the
commit subject `hello world`), which means that we are stuck with
quadratic performance in the worst case.

The reimplemented logic also happens to fix a bug where commented-out
lines (representing empty patches) were dropped by the previous code.

While at it, clarify how the fixup/squash feature works in `git rebase
-i`'s man page.

Signed-off-by: Johannes Schindelin 
---
 Documentation/git-rebase.txt |  16 ++--
 builtin/rebase--helper.c |   6 +-
 git-rebase--interactive.sh   |  90 +---
 sequencer.c  | 197 +++
 sequencer.h  |   1 +
 t/t3415-rebase-autosquash.sh |   2 +-
 6 files changed, 214 insertions(+), 98 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 67d48e68831..da79fbda5b3 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -425,13 +425,15 @@ without an explicit `--interactive`.
 --autosquash::
 --no-autosquash::
When the commit log message begins with "squash! ..." (or
-   "fixup! ..."), and there is a commit whose title begins with
-   the same ..., automatically modify the todo list of rebase -i
-   so that the commit marked for squashing comes right after the
-   commit to be modified, and change the action of the moved
-   commit from `pick` to `squash` (or `fixup`).  Ignores subsequent
-   "fixup! " or "squash! " after the first, in case you referred to an
-   earlier fixup/squash with `git commit --fixup/--squash`.
+   "fixup! ..."), and there is already a commit in the todo list that
+   matches the same `...`, automatically modify the todo list of rebase
+   -i so that the commit marked for squashing comes right after the
+   commit to be modified, and change the action of the moved commit
+   from `pick` to `squash` (or `fixup`).  A commit matches the `...` if
+   the commit subject matches, or if the `...` refers to the commit's
+   hash. As a fall-back, partial matches of the commit subject work,
+   too.  The recommended way to create fixup/squash commits is by using
+   the `--fixup`/`--squash` options of linkgit:git-commit[1].
 +
 This option is only valid when the `--interactive` option is used.
 +
diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index de3ccd9bfbc..e6591f01112 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -14,7 +14,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
int keep_empty = 0;
enum {
CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S,
-   CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS
+   CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH
} command = 0;
struct option options[] = {
OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
@@ -33,6 +33,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
N_("check the todo list"), CHECK_TODO_LIST),
OPT_CMDMODE(0, "skip-unnecessary-picks", &command,
N_("skip unnecessary picks"), SKIP_UNNECESSARY_PICKS),
+   OPT_CMDMODE(0, "rearrange-squash", &command,
+   N_("rearrange fixup/squash lines"), REARRANGE_SQUASH),
OPT_END()
};
 
@@ -59,5 +61,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
return !!check_todo_list();
if (command == SKIP_UNNECESSARY_PICKS && argc == 1)
return !!skip_unnecessary_picks();
+   if (command == REARRANGE_SQUASH && argc == 1)
+   return !!rearrange_squash();
usage_with_options(builtin_rebase_helper_usage, options);
 }
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 931bc09e0cf..d39fe4f5fb7 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -744,94 +744,6 @@ collapse_todo_ids() {
git rebase--helper --shorten-sha1s
 }
 
-# Rearrange the todo list that has both "pick sha1 msg" and
-# "pick sha1 fixup!/squash! msg" appears in it so that the latter
-# comes immediately after the former, and change "pick" to
-# "fixup"/"squash".
-#
-# Note that if the config has specified a custom instruction format
-# each log message will be re-retrieved in order to normalize the
-# autosquash arrangement
-rearrange_squash () {
-   format=$(git config --get rebase.instructionF

[PATCH v2 6/9] rebase -i: check for missing commits in the rebase--helper

2017-04-25 Thread Johannes Schindelin

In particular on Windows, where shell scripts are even more expensive
than on MacOSX or Linux, it makes sense to move a loop that forks
Git at least once for every line in the todo list into a builtin.

Signed-off-by: Johannes Schindelin 
---
 builtin/rebase--helper.c   |   7 +-
 git-rebase--interactive.sh | 164 ++---
 sequencer.c| 125 ++
 sequencer.h|   1 +
 4 files changed, 137 insertions(+), 160 deletions(-)

diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index 9444c8d6c60..e706eac710d 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -13,7 +13,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
struct replay_opts opts = REPLAY_OPTS_INIT;
int keep_empty = 0;
enum {
-   CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S
+   CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S,
+   CHECK_TODO_LIST
} command = 0;
struct option options[] = {
OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
@@ -28,6 +29,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
N_("shorten SHA-1s in the todo list"), SHORTEN_SHA1S),
OPT_CMDMODE(0, "expand-sha1s", &command,
N_("expand SHA-1s in the todo list"), EXPAND_SHA1S),
+   OPT_CMDMODE(0, "check-todo-list", &command,
+   N_("check the todo list"), CHECK_TODO_LIST),
OPT_END()
};
 
@@ -50,5 +53,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
return !!transform_todo_ids(1);
if (command == EXPAND_SHA1S && argc == 1)
return !!transform_todo_ids(0);
+   if (command == CHECK_TODO_LIST && argc == 1)
+   return !!check_todo_list();
usage_with_options(builtin_rebase_helper_usage, options);
 }
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 52a19e0bdb3..1649506e1e4 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -890,96 +890,6 @@ add_exec_commands () {
mv "$1.new" "$1"
 }
 
-# Check if the SHA-1 passed as an argument is a
-# correct one, if not then print $2 in "$todo".badsha
-# $1: the SHA-1 to test
-# $2: the line number of the input
-# $3: the input filename
-check_commit_sha () {
-   badsha=0
-   if test -z "$1"
-   then
-   badsha=1
-   else
-   sha1_verif="$(git rev-parse --verify --quiet $1^{commit})"
-   if test -z "$sha1_verif"
-   then
-   badsha=1
-   fi
-   fi
-
-   if test $badsha -ne 0
-   then
-   line="$(sed -n -e "${2}p" "$3")"
-   warn "$(eval_gettext "\
-Warning: the SHA-1 is missing or isn't a commit in the following line:
- - \$line")"
-   warn
-   fi
-
-   return $badsha
-}
-
-# prints the bad commits and bad commands
-# from the todolist in stdin
-check_bad_cmd_and_sha () {
-   retval=0
-   lineno=0
-   while read -r command rest
-   do
-   lineno=$(( $lineno + 1 ))
-   case $command in
-   "$comment_char"*|''|noop|x|exec)
-   # Doesn't expect a SHA-1
-   ;;
-   "$cr")
-   # Work around CR left by "read" (e.g. with Git for
-   # Windows' Bash).
-   ;;
-   pick|p|drop|d|reword|r|edit|e|squash|s|fixup|f)
-   if ! check_commit_sha "${rest%%[]*}" "$lineno" 
"$1"
-   then
-   retval=1
-   fi
-   ;;
-   *)
-   line="$(sed -n -e "${lineno}p" "$1")"
-   warn "$(eval_gettext "\
-Warning: the command isn't recognized in the following line:
- - \$line")"
-   warn
-   retval=1
-   ;;
-   esac
-   done <"$1"
-   return $retval
-}
-
-# Print the list of the SHA-1 of the commits
-# from stdin to stdout
-todo_list_to_sha_list () {
-   git stripspace --strip-comments |
-   while read -r command sha1 rest
-   do
-   case $command in
-   "$comment_char"*|''|noop|x|"exec")
-   ;;
-   *)
-   long_sha=$(git rev-list --no-walk "$sha1" 2>/dev/null)
-   printf "%s\n" "$long_sha"
-   ;;
-   esac
-   done
-}
-
-# Use warn for each line in stdin
-warn_lines () {
-   while read -r line
-   do
-   warn " - $line"
-   done
-}
-
 # Switch to the branch in $into and notify it in the

[PATCH v2 7/9] rebase -i: skip unnecessary picks using the rebase--helper

2017-04-25 Thread Johannes Schindelin

In particular on Windows, where shell scripts are even more expensive
than on MacOSX or Linux, it makes sense to move a loop that forks
Git at least once for every line in the todo list into a builtin.

Note: The original code did not try to skip unnecessary picks of root
commits but punts instead (probably --root was not considered common
enough of a use case to bother optimizing). We do the same, for now.

Signed-off-by: Johannes Schindelin 
---
 builtin/rebase--helper.c   |  6 +++-
 git-rebase--interactive.sh | 41 ++---
 sequencer.c| 90 ++
 sequencer.h|  1 +
 4 files changed, 99 insertions(+), 39 deletions(-)

diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index e706eac710d..de3ccd9bfbc 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -14,7 +14,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
int keep_empty = 0;
enum {
CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S,
-   CHECK_TODO_LIST
+   CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS
} command = 0;
struct option options[] = {
OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
@@ -31,6 +31,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
N_("expand SHA-1s in the todo list"), EXPAND_SHA1S),
OPT_CMDMODE(0, "check-todo-list", &command,
N_("check the todo list"), CHECK_TODO_LIST),
+   OPT_CMDMODE(0, "skip-unnecessary-picks", &command,
+   N_("skip unnecessary picks"), SKIP_UNNECESSARY_PICKS),
OPT_END()
};
 
@@ -55,5 +57,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
return !!transform_todo_ids(0);
if (command == CHECK_TODO_LIST && argc == 1)
return !!check_todo_list();
+   if (command == SKIP_UNNECESSARY_PICKS && argc == 1)
+   return !!skip_unnecessary_picks();
usage_with_options(builtin_rebase_helper_usage, options);
 }
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 1649506e1e4..931bc09e0cf 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -713,43 +713,6 @@ do_rest () {
done
 }
 
-# skip picking commits whose parents are unchanged
-skip_unnecessary_picks () {
-   fd=3
-   while read -r command rest
-   do
-   # fd=3 means we skip the command
-   case "$fd,$command" in
-   3,pick|3,p)
-   # pick a commit whose parent is current $onto -> skip
-   sha1=${rest%% *}
-   case "$(git rev-parse --verify --quiet "$sha1"^)" in
-   "$onto"*)
-   onto=$sha1
-   ;;
-   *)
-   fd=1
-   ;;
-   esac
-   ;;
-   3,"$comment_char"*|3,)
-   # copy comments
-   ;;
-   *)
-   fd=1
-   ;;
-   esac
-   printf '%s\n' "$command${rest:+ }$rest" >&$fd
-   done <"$todo" >"$todo.new" 3>>"$done" &&
-   mv -f "$todo".new "$todo" &&
-   case "$(peek_next_command)" in
-   squash|s|fixup|f)
-   record_in_rewritten "$onto"
-   ;;
-   esac ||
-   die "$(gettext "Could not skip unnecessary pick commands")"
-}
-
 transform_todo_ids () {
while read -r command rest
do
@@ -1172,7 +1135,9 @@ git rebase--helper --check-todo-list || {
 
 expand_todo_ids
 
-test -d "$rewritten" || test -n "$force_rebase" || skip_unnecessary_picks
+test -d "$rewritten" || test -n "$force_rebase" ||
+onto="$(git rebase--helper --skip-unnecessary-picks)" ||
+die "Could not skip unnecessary pick commands"
 
 checkout_onto
 if test -z "$rebase_root" && test ! -d "$rewritten"
diff --git a/sequencer.c b/sequencer.c
index 3a935fa4cbc..bbbc98c9116 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2616,3 +2616,93 @@ int check_todo_list(void)
 
return res;
 }
+
+/* skip picking commits whose parents are unchanged */
+int skip_unnecessary_picks(void)
+{
+   const char *todo_file = rebase_path_todo();
+   struct strbuf buf = STRBUF_INIT;
+   struct todo_list todo_list = TODO_LIST_INIT;
+   struct object_id onto_oid, *oid = &onto_oid, *parent_oid;
+   int fd, i;
+
+   if (!read_oneliner(&buf, rebase_path_onto(), 0))
+   return error(_("could not read 'onto'"));
+   if (get_sha1(buf.buf, onto_oid.hash)) {
+   strbuf_release(&buf);
+   return error(_("need a HEAD to fixup"));
+   }
+   strbuf_release(&b

[PATCH v2 5/9] t3404: relax rebase.missingCommitsCheck tests

2017-04-25 Thread Johannes Schindelin

These tests were a bit anal about the *exact* warning/error message
printed by git rebase. But those messages are intended for the *end
user*, therefore it does not make sense to test so rigidly for the
*exact* wording.

In the following, we will reimplement the missing commits check in
the sequencer, with slightly different words.

So let's just test for the parts in the warning/error message that
we *really* care about, nothing more, nothing less.

Signed-off-by: Johannes Schindelin 
---
 t/t3404-rebase-interactive.sh | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 33d392ba112..61113be08a4 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1242,20 +1242,13 @@ test_expect_success 'rebase -i respects 
rebase.missingCommitsCheck = error' '
test B = $(git cat-file commit HEAD^ | sed -ne \$p)
 '
 
-cat >expect expect

[PATCH v2 2/9] rebase -i: remove useless indentation

2017-04-25 Thread Johannes Schindelin

The commands used to be indented, and it is nice to look at, but when we
transform the SHA-1s, the indentation is removed. So let's do away with it.

For the moment, at least: when we will use the upcoming rebase--helper
to transform the SHA-1s, we *will* keep the indentation and can
reintroduce it. Yet, to be able to validate the rebase--helper against
the output of the current shell script version, we need to remove the
extra indentation.

Signed-off-by: Johannes Schindelin 
---
 git-rebase--interactive.sh | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 609e150d38f..c40b1fd1d2e 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -155,13 +155,13 @@ reschedule_last_action () {
 append_todo_help () {
gettext "
 Commands:
- p, pick = use commit
- r, reword = use commit, but edit the commit message
- e, edit = use commit, but stop for amending
- s, squash = use commit, but meld into previous commit
- f, fixup = like \"squash\", but discard this commit's log message
- x, exec = run command (the rest of the line) using shell
- d, drop = remove commit
+p, pick = use commit
+r, reword = use commit, but edit the commit message
+e, edit = use commit, but stop for amending
+s, squash = use commit, but meld into previous commit
+f, fixup = like \"squash\", but discard this commit's log message
+x, exec = run command (the rest of the line) using shell
+d, drop = remove commit
 
 These lines can be re-ordered; they are executed from top to bottom.
 " | git stripspace --comment-lines >>"$todo"
-- 
2.12.2.windows.2.406.gd14a8f8640f

[PATCH v2 4/9] rebase -i: also expand/collapse the SHA-1s via the rebase--helper

2017-04-25 Thread Johannes Schindelin

This is crucial to improve performance on Windows, as the speed is now
mostly dominated by the SHA-1 transformation (because it spawns a new
rev-parse process for *every* line, and spawning processes is pretty
slow from Git for Windows' MSYS2 Bash).

Signed-off-by: Johannes Schindelin 
---
 builtin/rebase--helper.c   | 10 +++-
 git-rebase--interactive.sh |  4 ++--
 sequencer.c| 59 ++
 sequencer.h|  2 ++
 4 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index 821058d452d..9444c8d6c60 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -13,7 +13,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
struct replay_opts opts = REPLAY_OPTS_INIT;
int keep_empty = 0;
enum {
-   CONTINUE = 1, ABORT, MAKE_SCRIPT
+   CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S
} command = 0;
struct option options[] = {
OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
@@ -24,6 +24,10 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
ABORT),
OPT_CMDMODE(0, "make-script", &command,
N_("make rebase script"), MAKE_SCRIPT),
+   OPT_CMDMODE(0, "shorten-sha1s", &command,
+   N_("shorten SHA-1s in the todo list"), SHORTEN_SHA1S),
+   OPT_CMDMODE(0, "expand-sha1s", &command,
+   N_("expand SHA-1s in the todo list"), EXPAND_SHA1S),
OPT_END()
};
 
@@ -42,5 +46,9 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
return !!sequencer_remove_state(&opts);
if (command == MAKE_SCRIPT && argc > 1)
return !!sequencer_make_script(keep_empty, stdout, argc, argv);
+   if (command == SHORTEN_SHA1S && argc == 1)
+   return !!transform_todo_ids(1);
+   if (command == EXPAND_SHA1S && argc == 1)
+   return !!transform_todo_ids(0);
usage_with_options(builtin_rebase_helper_usage, options);
 }
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 214af0372ba..52a19e0bdb3 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -774,11 +774,11 @@ transform_todo_ids () {
 }
 
 expand_todo_ids() {
-   transform_todo_ids
+   git rebase--helper --expand-sha1s
 }
 
 collapse_todo_ids() {
-   transform_todo_ids --short
+   git rebase--helper --shorten-sha1s
 }
 
 # Rearrange the todo list that has both "pick sha1 msg" and
diff --git a/sequencer.c b/sequencer.c
index d2d5bcd9b43..d6ae6407546 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2432,3 +2432,62 @@ int sequencer_make_script(int keep_empty, FILE *out,
strbuf_release(&buf);
return 0;
 }
+
+
+int transform_todo_ids(int shorten_sha1s)
+{
+   const char *todo_file = rebase_path_todo();
+   struct todo_list todo_list = TODO_LIST_INIT;
+   int fd, res, i;
+   FILE *out;
+
+   strbuf_reset(&todo_list.buf);
+   fd = open(todo_file, O_RDONLY);
+   if (fd < 0)
+   return error_errno(_("could not open '%s'"), todo_file);
+   if (strbuf_read(&todo_list.buf, fd, 0) < 0) {
+   close(fd);
+   return error(_("could not read '%s'."), todo_file);
+   }
+   close(fd);
+
+   res = parse_insn_buffer(todo_list.buf.buf, &todo_list);
+   if (res) {
+   todo_list_release(&todo_list);
+   return error(_("unusable instruction sheet: '%s'"), todo_file);
+   }
+
+   out = fopen(todo_file, "w");
+   if (!out) {
+   todo_list_release(&todo_list);
+   return error(_("unable to open '%s' for writing"), todo_file);
+   }
+   for (i = 0; i < todo_list.nr; i++) {
+   struct todo_item *item = todo_list.items + i;
+   int bol = item->offset_in_buf;
+   const char *p = todo_list.buf.buf + bol;
+   int eol = i + 1 < todo_list.nr ?
+   todo_list.items[i + 1].offset_in_buf :
+   todo_list.buf.len;
+
+   if (item->command >= TODO_EXEC && item->command != TODO_DROP)
+   fwrite(p, eol - bol, 1, out);
+   else {
+   int eoc = strcspn(p, " \t");
+   const char *sha1 = shorten_sha1s ?
+   short_commit_name(item->commit) :
+   oid_to_hex(&item->commit->object.oid);
+
+   if (!eoc) {
+   p += strspn(p, " \t");
+   eoc = strcspn(p, " \t");
+   }
+
+   fprintf(out, "%.*s %s %.*s\n",
+   eoc, p, sha1,

[PATCH v2 8/9] t3415: test fixup with wrapped oneline

2017-04-25 Thread Johannes Schindelin

The `git commit --fixup` command unwraps wrapped onelines when
constructing the commit message, without wrapping the result.

We need to make sure that `git rebase --autosquash` keeps handling such
cases correctly, in particular since we are about to move the autosquash
handling into the rebase--helper.

Signed-off-by: Johannes Schindelin 
---
 t/t3415-rebase-autosquash.sh | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh
index 48346f1cc0c..9fd629a6e21 100755
--- a/t/t3415-rebase-autosquash.sh
+++ b/t/t3415-rebase-autosquash.sh
@@ -304,4 +304,18 @@ test_expect_success 'extra spaces after fixup!' '
test $base = $parent
 '
 
+test_expect_success 'wrapped original subject' '
+   if test -d .git/rebase-merge; then git rebase --abort; fi &&
+   base=$(git rev-parse HEAD) &&
+   echo "wrapped subject" >wrapped &&
+   git add wrapped &&
+   test_tick &&
+   git commit --allow-empty -m "$(printf "To\nfixup")" &&
+   test_tick &&
+   git commit --allow-empty -m "fixup! To fixup" &&
+   git rebase -i --autosquash --keep-empty HEAD~2 &&
+   parent=$(git rev-parse HEAD^) &&
+   test $base = $parent
+'
+
 test_done
-- 
2.12.2.windows.2.406.gd14a8f8640f

[PATCH v2 1/9] rebase -i: generate the script via rebase--helper

2017-04-25 Thread Johannes Schindelin

The first step of an interactive rebase is to generate the so-called "todo
script", to be stored in the state directory as "git-rebase-todo" and to
be edited by the user.

Originally, we adjusted the output of `git log ` using a simple
sed script. Over the course of the years, the code became more
complicated. We now use shell scripting to edit the output of `git log`
conditionally, depending whether to keep "empty" commits (i.e. commits
that do not change any files).

On platforms where shell scripting is not native, this can be a serious
drag. And it opens the door for incompatibilities between platforms when
it comes to shell scripting or to Unix-y commands.

Let's just re-implement the todo script generation in plain C, using the
revision machinery directly.

This is substantially faster, improving the speed relative to the
shell script version of the interactive rebase from 2x to 3x on Windows.

Note that the rearrange_squash() function in git-rebase--interactive
relied on the fact that we set the "format" variable to the config setting
rebase.instructionFormat. Relying on a side effect like this is no good,
hence we explicitly perform that assignment (possibly again) in
rearrange_squash().

Signed-off-by: Johannes Schindelin 
---
 builtin/rebase--helper.c   |  8 +++-
 git-rebase--interactive.sh | 44 +++-
 sequencer.c| 44 
 sequencer.h|  3 +++
 4 files changed, 77 insertions(+), 22 deletions(-)

diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
index ca1ebb2fa18..821058d452d 100644
--- a/builtin/rebase--helper.c
+++ b/builtin/rebase--helper.c
@@ -11,15 +11,19 @@ static const char * const builtin_rebase_helper_usage[] = {
 int cmd_rebase__helper(int argc, const char **argv, const char *prefix)
 {
struct replay_opts opts = REPLAY_OPTS_INIT;
+   int keep_empty = 0;
enum {
-   CONTINUE = 1, ABORT
+   CONTINUE = 1, ABORT, MAKE_SCRIPT
} command = 0;
struct option options[] = {
OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
+   OPT_BOOL(0, "keep-empty", &keep_empty, N_("keep empty 
commits")),
OPT_CMDMODE(0, "continue", &command, N_("continue rebase"),
CONTINUE),
OPT_CMDMODE(0, "abort", &command, N_("abort rebase"),
ABORT),
+   OPT_CMDMODE(0, "make-script", &command,
+   N_("make rebase script"), MAKE_SCRIPT),
OPT_END()
};
 
@@ -36,5 +40,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
char *prefix)
return !!sequencer_continue(&opts);
if (command == ABORT && argc == 1)
return !!sequencer_remove_state(&opts);
+   if (command == MAKE_SCRIPT && argc > 1)
+   return !!sequencer_make_script(keep_empty, stdout, argc, argv);
usage_with_options(builtin_rebase_helper_usage, options);
 }
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 2c9c0165b5a..609e150d38f 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -785,6 +785,7 @@ collapse_todo_ids() {
 # each log message will be re-retrieved in order to normalize the
 # autosquash arrangement
 rearrange_squash () {
+   format=$(git config --get rebase.instructionFormat)
# extract fixup!/squash! lines and resolve any referenced sha1's
while read -r pick sha1 message
do
@@ -1210,26 +1211,27 @@ else
revisions=$onto...$orig_head
shortrevisions=$shorthead
 fi
-format=$(git config --get rebase.instructionFormat)
-# the 'rev-list .. | sed' requires %m to parse; the instruction requires %H to 
parse
-git rev-list $merges_option --format="%m%H ${format:-%s}" \
-   --reverse --left-right --topo-order \
-   $revisions ${restrict_revision+^$restrict_revision} | \
-   sed -n "s/^>//p" |
-while read -r sha1 rest
-do
-
-   if test -z "$keep_empty" && is_empty_commit $sha1 && ! is_merge_commit 
$sha1
-   then
-   comment_out="$comment_char "
-   else
-   comment_out=
-   fi
+if test t != "$preserve_merges"
+then
+   git rebase--helper --make-script ${keep_empty:+--keep-empty} \
+   $revisions ${restrict_revision+^$restrict_revision} >"$todo"
+else
+   format=$(git config --get rebase.instructionFormat)
+   # the 'rev-list .. | sed' requires %m to parse; the instruction 
requires %H to parse
+   git rev-list $merges_option --format="%m%H ${format:-%s}" \
+   --reverse --left-right --topo-order \
+   $revisions ${restrict_revision+^$restrict_revision} | \
+   sed -n "s/^>//p" |
+   while read -r sha1 rest
+   do
+
+   if test -z "$keep_empty" && is_empty_commit $sha1 && ! 
is_merge_commit $sha1
+

[PATCH v2 3/9] rebase -i: do not invent onelines when expanding/collapsing SHA-1s

2017-04-25 Thread Johannes Schindelin

To avoid problems with short SHA-1s that become non-unique during the
rebase, we rewrite the todo script with short/long SHA-1s before and
after letting the user edit the script. Since SHA-1s are not intuitive
for humans, rebase -i also provides the onelines (commit message
subjects) in the script, purely for the user's convenience.

It is very possible to generate a todo script via different means than
rebase -i and then to let rebase -i run with it; In this case, these
onelines are not required.

And this is where the expand/collapse machinery has a bug: it *expects*
that oneline, and failing to find one reuses the previous SHA-1 as
"oneline".

It was most likely an oversight, and made implementation in the (quite
limiting) shell script language less convoluted. However, we are about
to reimplement performance-critical parts in C (and due to spawning a
git.exe process for every single line of the todo script, the
expansion/collapsing of the SHA-1s *is* performance-hampering on
Windows), therefore let's fix this bug to make cross-validation with the
C version of that functionality possible.

Signed-off-by: Johannes Schindelin 
---
 git-rebase--interactive.sh | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index c40b1fd1d2e..214af0372ba 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -760,7 +760,12 @@ transform_todo_ids () {
;;
*)
sha1=$(git rev-parse --verify --quiet "$@" ${rest%%[
 ]*}) &&
-   rest="$sha1 ${rest#*[]}"
+   if test "a$rest" = "a${rest#*[   ]}"
+   then
+   rest=$sha1
+   else
+   rest="$sha1 ${rest#*[]}"
+   fi
;;
esac
printf '%s\n' "$command${rest:+ }$rest"
-- 
2.12.2.windows.2.406.gd14a8f8640f

1 2 >

1 - 100 of 119 matches

Mail list logo