Re: GDPR compliance best practices?

2018-06-06 Thread Peter Backes
Hi David,

thanks for your input on the issue.

> LEGAL GDPR NOTICE:
> According to the European data protection laws (GDPR), we would like to make 
> you
> aware that contributing to rsyslog via git will permanently store the
> name and email address you provide as well as the actual commit and the
> time and date you made it inside git's version history. This is inevitable,
> because it is a main feature git.

As we can, see, rsyslog tries to solve the issue by the already 
discussed legal "technology" of disclaimers (which is certainly not 
accepted as state of the art technology by the GDPR). In essence, they 
are giving excuses for why they are not honoring the right to be 
forgotten.

Disclaimers do not work. They have no legal effect, they are placebos.

The GDPR does not accept such excuses. If it would, companies could 
arbitrarily design their data storage such as to make it "the main 
feature" to not honor the right to be forgotten and/or other GDPR 
rights. It is obvious that this cannot work, as it would completely 
undermine those rights.

The GDPR honors technology as a means to protect the individual's 
rights, not as a means to subvert them.

> If you are concerned about your
> privacy, we strongly recommend to use
> 
> --author "anonymous "
> 
> together with your commit.

This can only be a solution if the project rejects any commits which 
are not anonymous.

> However, we have valid reasons why we cannot remove that information
> later on. The reasons are:
> 
> * this would break git history and make future merges unworkable

This is not a valid excuse (see above). The technology has to be 
designed or applied in such a way that the individuals rights are 
honored, not the other way around.

In absence of other means, the project has to rewrite history if it 
gets a valid request by someone exercising his right to be forgotten, 
even if that causes a lot of hazzle for everyone.

> * the rsyslog projects has legitimate interest to keep a permanent record of 
> the
>   contributor identity, once given, for
>   - copyright verification
>   - being able to provide proof should a malicious commit be made

True, but that doesn't justify publishing that information and keeping 
it published even when someone exercises his right to be forgotten.

In that case, "legitimate interest" is not enough. There need to be 
"overriding legitimate grounds". I don't see them here.

> Please also note that your commit is public and as such will potentially be
> processed by many third-parties. Git's distributed nature makes it impossible
> to track where exactly your commit, and thus your personal data, will be 
> stored
> and be processed. If you would not like to accept this risk, please do either
> commit anonymously or refrain from contributing to the rsyslog project.

This is one of those statements that ultimately say "we do not honor 
the GDPR; either accept that or don't submit". That's the old, arguably 
ignorant mentality, and won't stand.

The project has to have a legal basis for publishing the personal 
metadata contained in the repository. In doubt, it needs to be consent 
based, as that is practically the only basis that allows putting the 
data on the internet for everyone to download. And consent can be 
withdrawn at any time.

The GDPR's transitional period started over two years ago. There was 
enough time to get everything GDPR compliant.

It might be possible to implement my solution without changing git, 
btw. Simply use the anonymous hash as author name, and store the random 
number and the author as a git-notes. git-notes can be rewritten or 
deleted at any time without changing the commit ID. I am currently 
looking into this solution. One just needs to add something that can 
verify and resolve those anonymous hashes.

Best wishes
Peter


-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: [PATCH v2] completion: reduce overhead of clearing cached --options

2018-06-06 Thread Jonathan Nieder
Hi,

SZEDER Gábor wrote:

> Other Bash versions, notably 4.4.19 on macOS via homebrew (i.e. a
> newer version on the same platform) and 3.2.25 on CentOS (i.e. a
> slightly earlier version, though on a different platform) are not
> affected.  ZSH in macOS (the versions shipped by default or installed
> via homebrew) or on other platforms isn't affected either.
[...]
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -282,7 +282,11 @@ __gitcomp ()
>  
>  # Clear the variables caching builtins' options when (re-)sourcing
>  # the completion script.
> -unset $(set |sed -ne 
> 's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null
> +if [[ -n ${ZSH_VERSION-} ]]; then
> + unset $(set |sed -ne 
> 's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null
> +else
> + unset $(compgen -v __gitcomp_builtin_)
> +fi

As discussed at [1], this fails catastrophically when sourced by
git-completion.zsh, which explicitly clears ZSH_VERSION.  By
catastrophically, I mean that Rick van Hattem and Dave Borowitz report
that it segfaults.

I can't reproduce it since I am having trouble following the
instructions at the top of git-completion.zsh to get the recommended
zsh completion script loading mechanism working at all.  (I've
succeeded in using git's bash completion on zsh before, but that was
many years ago.)  First attempt:

 1. rm -fr ~/.zsh ~/.zshrc
# you can move them out of the way instead for a less destructive
# reproduction recipe
 2. zsh
 3. hit "q"
 4. zstyle ':completion:*:*:git:*' script \
  $(pwd)/contrib/completion/git-completion.zsh
 5. git checkout 

Expected result: segfault

Actual result: succeeds, using zsh's standard completion script (not
git's).

I also tried

 1. rm -fr ~/.zsh ~/.zshrc
# you can move them out of the way instead for a less destructive
# reproduction recipe
 2. zsh
 3. hit "2"
 4. mkdir ~/.zsh; cp contrib/completion/git-completion.zsh ~/.zsh/_git
 5. echo 'fpath=(~/.zsh $fpath)' >>~/.zshrc
 6. ^D; zsh
 7. git checkout 

and a similar process with fpath earlier in the zshrc file.  Whatever
I try, I end up with one of two results: either it uses zsh's standard
completion, or it spews a stream of errors about missing functions
like compgen.  What am I doing wrong?

Related question: what would it take to add a zsh completion sanity
check to t/t9902-completion.sh?

When running with "set -x", here is what Dave Borowitz gets before the
segfault.  I don't have a stacktrace because, as mentioned above, I
haven't been able to reproduce it.

str=git
[[ -n git ]]
service=git
i=
_compskip=default
eval ''
ret=0
[[ default = *patterns* ]]
[[ default = all ]]
str=/usr/bin/git
[[ -n /usr/bin/git ]]
service=_dispatch:70: bad math expression: operand expected at 
`/usr/bin/g...'

zsh: segmentation fault  zsh

Here's a minimal fix, untested.  As a followup, as Gábor suggests at [2],
it would presumably make sense to stop overriding ZSH_VERSION, using
this GIT_ scoped variable everywhere instead.

Thoughts?

Reported-by: Rick van Hattem 
Reported-by: Dave Borowitz 
Signed-off-by: Jonathan Nieder 

[1] 
https://public-inbox.org/git/01020163c683e753-04629405-15f8-4a30-9dc3-e4e3f2a5aa26-000...@eu-west-1.amazonses.com/
[2] https://public-inbox.org/git/20180606114147.7753-1-szeder@gmail.com/

diff --git i/contrib/completion/git-completion.bash 
w/contrib/completion/git-completion.bash
index 12814e9bbf..e4bcc435ea 100644
--- i/contrib/completion/git-completion.bash
+++ w/contrib/completion/git-completion.bash
@@ -348,7 +348,7 @@ __gitcomp ()
 
 # Clear the variables caching builtins' options when (re-)sourcing
 # the completion script.
-if [[ -n ${ZSH_VERSION-} ]]; then
+if [[ -n ${ZSH_VERSION-} || -n ${GIT_SOURCING_ZSH_COMPLETION-} ]]; then
unset $(set |sed -ne 
's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null
 else
unset $(compgen -v __gitcomp_builtin_)
diff --git i/contrib/completion/git-completion.zsh 
w/contrib/completion/git-completion.zsh
index 53cb0f934f..c7be9fd4dc 100644
--- i/contrib/completion/git-completion.zsh
+++ w/contrib/completion/git-completion.zsh
@@ -39,7 +39,7 @@ if [ -z "$script" ]; then
test -f $e && script="$e" && break
done
 fi
-ZSH_VERSION='' . "$script"
+GIT_SOURCING_ZSH_COMPLETION=1 ZSH_VERSION='' . "$script"
 
 __gitcomp ()
 {


Re: [RFC PATCH 7/7] merge: fix misleading pre-merge check documentation

2018-06-06 Thread Elijah Newren
On Sat, Jun 2, 2018 at 11:58 PM, Elijah Newren  wrote:
> builtin/merge.c contains this important requirement for merge strategies:
>
> ...the index must be in sync with the head commit.  The strategies are
> responsible to ensure this.
>
> However, Documentation/git-merge.txt says:
>
> ...[merge will] abort if there are any changes registered in the index
> relative to the `HEAD` commit.  (One exception is when the changed
> index entries are in the state that would result from the merge
> already.)
>
> Interestingly, prior to commit c0be8aa06b85 ("Documentation/git-merge.txt:
> Partial rewrite of How Merge Works", 2008-07-19),
> Documentation/git-merge.txt said much more:
>
> ...the index file must match the tree of `HEAD` commit...
> [NOTE]
> This is a bit of a lite.  In certain special cases [explained
> in detail]...
> Otherwise, merge will refuse to do any harm to your repository
> (that is...your working tree...and index are left intact).
>
> So, this suggests that the exceptions existed because there were special
> cases where it would case no harm, and potentially be slightly more
> convenient for the user.  While the current text in git-merge.txt does
> list a condition under which it would be safe to proceed despite the index
> not matching HEAD, it does not match what is actually implemented, in
> three different ways:
>
> * The exception is written to describe what unpack-trees allows.  Not
>   all merge strategies allow such an exception, though, making this
>   description misleading.  'ours' and 'octopus' merges have strictly
>   enforced index==HEAD for a while, and the commit previous to this
>   one made 'recursive' do so as well.
>
> * If someone did a three-way content merge on a specific file using
>   versions from the relevant commits and staged it prior to running
>   merge, then that path would technically satisfy the exception listed
>   in git-merge.txt.  unpack-trees.c would still error out on the path,
>   though, because it defers the three-way content merge logic to other
>   parts of the code (resolve, octopus, or recursive) and has no way of
>   checking whether the index entry from before the merge will match
>   the end result of the merge.
>
> * The exception as implemented in unpack-trees actually only checked
>   that the index matched the MERGE_HEAD version of the file and that
>   HEAD matched the merge base.  Assuming no renames, that would indeed
>   provide cases where the index matches the end result we'd get from a
>   merge.  But renames means unpack-trees is checking that it instead
>   matches something other than what the final result will be, risking
>   either erroring out when we shouldn't need to, or not erroring out
>   when we should and overwriting the user's staged changes.
>
> In addition to the wording behind this exception being misleading, it is
> also somewhat surprising to see how many times the code for the special
> cases were wrong or the check to make sure the index matched head was
> forgotten altogether:
>
> * Prior to commit ee6566e8d70d ("[PATCH] Rewrite read-tree", 2005-09-05),
>   there were many cases where an unclean index entry was allowed (look for
>   merged_entry_allow_dirty()); it appears that in those cases, the merge
>   would have simply overwritten staged changes with the result of the
>   merge.  Thus, the merge result would have been correct, but the user's
>   uncommitted changes could be thrown away without warning.
>
> * Prior to commit 160252f81626 ("git-merge-ours: make sure our index
>   matches HEAD", 2005-11-03), the 'ours' merge strategy did not check
>   whether the index matched HEAD.  If it didn't, the resulting merge
>   would include all the staged changes, and thus wasn't really an 'ours'
>   strategy.
>
> * Prior to commit 3ec62ad9ffba ("merge-octopus: abort if index does not
>   match HEAD", 2016-04-09), 'octopus' merges did not check whether the
>   index matched HEAD, also resulting in any staged changes from before
>   the commit silently being folded into the resulting merge.  commit
>   a6ee883b8eb5 ("t6044: new merge testcases for when index doesn't match
>   HEAD", 2016-04-09) was also added at the same time to try to test to
>   make sure all strategies did the necessary checking for the requirement
>   that the index match HEAD.  Sadly, it didn't catch all the cases, as
>   evidenced by the remainder of this list...
>
> * Prior to commit 65170c07d466 ("merge-recursive: avoid incorporating
>   uncommitted changes in a merge", 2017-12-21), merge-recursive simply
>   relied on unpack_trees() to do the necessary check, but in one special
>   case it avoided calling unpack_trees() entirely and accidentally ended
>   up silently including any staged changes from before the merge in the
>   resulting merge commit.
>
> * The commit immediately before this one in this series 

Re: [PATCH] git-rebase--merge: modernize "git-$cmd" to "git $cmd"

2018-06-06 Thread Elijah Newren
On Wed, Jun 6, 2018 at 10:06 PM, Elijah Newren  wrote:
> 
> I tend to think git-rebase--merge is less popular than the other rebase
> types, leading to it being more overlooked and less well tested than the
> other ones.  This git-$cmd usage seems to support that argument.
>
> Anyway, this patch may be irrelevant if others agree with my goal to
> delete git-rebase--merge and implement --merge on top of the --interactive
> machinery, but sending it along in case others don't agree with that goal.
> 

I would forget to pull that out of the commit message area and put
that next to the diffstat in the email.  Whoops.

Oh, well.


Re: [PATCH 1/1] merge-recursive: give notice when submodule commit gets fast-forwarded

2018-06-06 Thread Elijah Newren
Hi Leif,

On Mon, Jun 4, 2018 at 11:48 AM, Leif Middelschulte
 wrote:
> From: Leif Middelschulte 
>
> Since submodules are treated similarly to ordinary files (i.e. not as 'dumb'
> pointers), an automatic merge should be mentioned if the user asks for it.
> Just as it is mentioned for oridnary files.

Thanks for following up; sorry it took me a few days to respond.
However, it looks like Junio merged the
sb/submodule-merge-in-merge-recursive topic, including your patch, to
master back on May 30.  As such, instead of re-rolling your patch,
we'd need a patch on top of the other existing change.

Also, take a look at the preliminary release announcement -- you show
up as a new contributor to git!  See it at
  https://public-inbox.org/git/xmqqwove4pzo@gitster-ct.c.googlers.com/


> +   output(o, 2, _("Auto-merging %s"), path);
...
> +   output(o, 2, _("Auto-merging %s"), path);

I preferred your old initial wording here, "Fast-forwarding submodule
%s" (I just wanted the "to %s" part at the end removed).  I'm afraid
that users who saw "Auto-merging $submodule" would assume that we
descended into the submodule and ran a full merge there.

Could you submit a patch that just removed that "to %s" part?


[RFC PATCH 0/2] contrib/credential/netrc Makefile & test improvements

2018-06-06 Thread Todd Zullinger
Hi,

I noticed failures from the contrib/credential/netrc tests
while building 2.18.0 release candidates.  I was surprised
to see the tests being run when called with a simple 'make'
command.

The first patch in the series adds an empty 'all::' make
target to match most of our other Makefiles and avoid the
surprise of running tests by default.  (When the netrc
helper was added to the fedora builds, it copied the same
'make -C contrib/credential/...' pattern from other
credential helpers -- despite the lack of anything to
build.)

The actual test failures were initially due to my build
environment lacking the perl autodie module, which was added
in 786ef50a23 ("git-credential-netrc: accept gpg option",
2018-05-12).

After installing the autodie module, the failures were due
to the build environment lacking a git install (specifically
the perl Git module).  The tests needing a pre-installed
perl Git seemed odd and worth fixing.

The second patch in the series aims to fix this.  I'm not
sure if there's a better or more preferable way to fix this,
which is one of the reasons for the RFC tag. (It's also why
I added you to the Cc Ævar, as you're one of the
knowledgeable perl folks here.)

The other reason for the RFC tag is that I'm unsure of how
to fix the last issue I found.  The tests exit cleanly even
when there are failures, which seems undesirable.  I'm not
familiar with the perl test_external framework to suggest a
fix in patch form.  It might be a matter of adding something
like this, from t/t9700/test.pl:

my $is_passing = eval { Test::More->is_passing };
exit($is_passing ? 0 : 1) unless $@ =~ /Can't locate object method/;

?  But that's a wild guess which I haven't tested.

Here's the output from 'make test' showing that most tests
fail and we still get a clean exit status:

$ make -C contrib/credential/netrc test ; echo "netrc test exit status: $?"
make: Entering directory 
'/builddir/build/BUILD/git-2.18.0.rc1/contrib/credential/netrc'
./t-git-credential-netrc.sh
ok 1 - set up test repository
# run 1: git-credential-netrc (perl 
/builddir/build/BUILD/git-2.18.0.rc1/t/../contrib/credential/netrc/test.pl)
ok 2 - Got 0 keys from insecure file
ok 3 - Got 0 keys from missing file
not ok 4 - Got first found keys with bad data
ok 5 - Got no corovamilkbar keys
not ok 6 - Got 2 Github keys
not ok 7 - Got correct Github password
not ok 8 - Got correct Github username
not ok 9 - Got 2 username-specific keys
not ok 10 - Got correct user-specific password
not ok 11 - Got correct user-specific protocol
not ok 12 - Got 2 host:port-specific keys
not ok 13 - Got correct host:port-specific password
not ok 14 - Got correct host:port-specific username
not ok 15 - Got 2 'host:port kills host' keys
not ok 16 - Got correct 'host:port kills host' password
not ok 17 - Got correct 'host:port kills host' username
not ok 18 - Got keys decrypted by git config option
not ok 19 - Got keys decrypted by command option
# test_external test git-credential-netrc was ok
make: Leaving directory 
'/builddir/build/BUILD/git-2.18.0.rc1/contrib/credential/netrc'
netrc test exit status: 0

Todd Zullinger (2):
  git-credential-netrc: make "all" default target of Makefile
  git-credential-netrc: use in-tree Git.pm for tests

 contrib/credential/netrc/Makefile | 3 +++
 contrib/credential/netrc/test.pl  | 3 +++
 2 files changed, 6 insertions(+)

Thanks all.

-- 
2.18.0.rc1


[RFC PATCH 1/2] git-credential-netrc: make "all" default target of Makefile

2018-06-06 Thread Todd Zullinger
Running "make" in contrib/credential/netrc should run the "all" target
rather than the "test" target.  Add an empty "all::" target like most of
our other Makefiles.

Signed-off-by: Todd Zullinger 
---
 contrib/credential/netrc/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/credential/netrc/Makefile 
b/contrib/credential/netrc/Makefile
index 0ffa407191..6174e3bb83 100644
--- a/contrib/credential/netrc/Makefile
+++ b/contrib/credential/netrc/Makefile
@@ -1,3 +1,6 @@
+# The default target of this Makefile is...
+all::
+
 test:
./t-git-credential-netrc.sh
 
-- 
2.18.0.rc1



[RFC PATCH 2/2] git-credential-netrc: use in-tree Git.pm for tests

2018-06-06 Thread Todd Zullinger
The netrc test.pl script calls git-credential-netrc which imports the
Git module.  Pass GITPERLLIB to git-credential-netrc via PERL5LIB to
ensure the in-tree Git module is used for testing.

Signed-off-by: Todd Zullinger 
---
 contrib/credential/netrc/test.pl | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/credential/netrc/test.pl b/contrib/credential/netrc/test.pl
index 1e1001030e..5e26b4d190 100755
--- a/contrib/credential/netrc/test.pl
+++ b/contrib/credential/netrc/test.pl
@@ -27,6 +27,9 @@ BEGIN
   ? $ENV{PATH}
   : ();
 
+# use in-tree Git.pm
+local $ENV{PERL5LIB} = $ENV{GITPERLLIB};
+
 diag "Testing insecure file, nothing should be found\n";
 chmod 0644, $netrc;
 my $cred = run_credential(['-f', $netrc, 'get'],
-- 
2.18.0.rc1



Re: [PATCH] Use ZSH_NAME instead of ZSH_VERSION because it's redefined by git-completion.zsh

2018-06-06 Thread Jonathan Nieder
Hi,

Rick van Hattem wrote:
> On 4 June 2018 at 05:40, Junio C Hamano  wrote:

>> Overlong line, lack of sign-off.
>
> Apologies for the long lines, I wrote the message on Github where this
> message is properly formatted, apparently the submitgit script can be
> considered broken as it truncates the message when converting to email.
>
> The original message can be found here: https://github.com/git/git/pull/500
>
> Below is the original message with proper formatting:
>
>> A recent change (94408dc) broke zsh for me (actually segfault zsh when
>> trying to use git completion)
>>
>> The reason is that the `git-completion.zsh` sets the `ZSH_VERSION`
>> variable to an empty string:
>> ...
>> ZSH_VERSION='' . "$script"
>> ...
>>
>> I'm not sure if @szeder or @gitster used a different zsh version for
>> testing commit 94408dc but it segfaults zsh 5.5.1
>> (x86_64-apple-darwin15.6.0) on my OS X 10.11.6 machine.
>>
>> The proposed fix is quite simple and shouldn't break any backwards
>> compatibility.
>
> Hopefully that clears a little bit of the confusion.

Thanks.  That helps a little, but it's missing a crucial piece: may we
forge your sign-off?  See Documentation/SubmittingPatches section
[[sign-off]] ("Certify your work") for more details about what this
means.

Thanks,
Jonathan


[PATCH 1/2] t3418: add testcase showing problems with rebase -i and strategy options

2018-06-06 Thread Elijah Newren
We are not passing the same args to merge strategies when we are doing an
--interactive rebase as we do with a --merge rebase.  The merge strategy
should not need to be aware of which type of rebase is in effect.  Add a
testcase which checks for the appropriate args.

Signed-off-by: Elijah Newren 
---
 t/t3418-rebase-continue.sh | 32 
 1 file changed, 32 insertions(+)

diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
index 03bf1b8a3b..872022106f 100755
--- a/t/t3418-rebase-continue.sh
+++ b/t/t3418-rebase-continue.sh
@@ -74,6 +74,38 @@ test_expect_success 'rebase --continue remembers merge 
strategy and options' '
test -f funny.was.run
 '
 
+test_expect_failure 'rebase -i --continue handles merge strategy and options' '
+   rm -fr .git/rebase-* &&
+   git reset --hard commit-new-file-F2-on-topic-branch &&
+   test_commit "commit-new-file-F3-on-topic-branch-for-dash-i" F3 32 &&
+   test_when_finished "rm -fr test-bin funny.was.run funny.args" &&
+   mkdir test-bin &&
+   cat >test-bin/git-merge-funny <<-EOF &&
+   #!$SHELL_PATH
+   echo "\$@" >>funny.args
+   case "\$1" in --opt) ;; *) exit 2 ;; esac
+   case "\$2" in --foo) ;; *) exit 2 ;; esac
+   case "\$4" in --) ;; *) exit 2 ;; esac
+   shift 2 &&
+   >funny.was.run &&
+   exec git merge-recursive "\$@"
+   EOF
+   chmod +x test-bin/git-merge-funny &&
+   (
+   PATH=./test-bin:$PATH
+   test_must_fail git rebase -i -s funny -Xopt -Xfoo master topic
+   ) &&
+   test -f funny.was.run &&
+   rm funny.was.run &&
+   echo "Resolved" >F2 &&
+   git add F2 &&
+   (
+   PATH=./test-bin:$PATH
+   git rebase --continue
+   ) &&
+   test -f funny.was.run
+'
+
 test_expect_success 'rebase passes merge strategy options correctly' '
rm -fr .git/rebase-* &&
git reset --hard commit-new-file-F3-on-topic-branch &&
-- 
2.18.0.rc0.46.g9cee8fce43



[PATCH 2/2] Fix use of strategy options with interactive rebases

2018-06-06 Thread Elijah Newren
git-rebase.sh wrote stuff like
  '--ours'  '--renormalize'
to .git/rebase-merge/strategy_opts.  Note the double spaces.

git-rebase--interactive uses sequencer.c to parse that file, and
sequencer.c used split_cmdline() to get the individual strategy options.
After splitting, sequencer.c prefixed each "option" with a double dash,
so, concatenating all its options would result in:
  -- --ours -- --renormalize

So, when it ended up calling try_merge_strategy(), that in turn would run
  git merge-$strategy -- --ours -- --renormalize $merge_base -- $head $remote

instead of the expected
  git merge-$strategy --ours --renormalize $merge_base -- $head $remote

Remove the extra spaces so that split_cmdline() will work as expected.

Signed-off-by: Elijah Newren 
---
 git-rebase.sh  | 2 +-
 sequencer.c| 7 ++-
 t/t3418-rebase-continue.sh | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index 40be59ecc4..224d5ebc29 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -316,7 +316,7 @@ do
do_merge=t
;;
--strategy-option=*)
-   strategy_opts="$strategy_opts $(git rev-parse --sq-quote 
"--${1#--strategy-option=}")"
+   strategy_opts="$strategy_opts $(git rev-parse --sq-quote 
"--${1#--strategy-option=}" | sed -e s/^.//)"
do_merge=t
test -z "$strategy" && strategy=recursive
;;
diff --git a/sequencer.c b/sequencer.c
index cca968043e..a2843e3906 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2203,6 +2203,7 @@ static int populate_opts_cb(const char *key, const char 
*value, void *data)
 static void read_strategy_opts(struct replay_opts *opts, struct strbuf *buf)
 {
int i;
+   char *strategy_opts_string;
 
strbuf_reset(buf);
if (!read_oneliner(buf, rebase_path_strategy(), 0))
@@ -2211,7 +2212,11 @@ static void read_strategy_opts(struct replay_opts *opts, 
struct strbuf *buf)
if (!read_oneliner(buf, rebase_path_strategy_opts(), 0))
return;
 
-   opts->xopts_nr = split_cmdline(buf->buf, (const char ***)&opts->xopts);
+   strategy_opts_string = buf->buf;
+   if (*strategy_opts_string == ' ')
+   strategy_opts_string++;
+   opts->xopts_nr = split_cmdline(strategy_opts_string,
+  (const char ***)&opts->xopts);
for (i = 0; i < opts->xopts_nr; i++) {
const char *arg = opts->xopts[i];
 
diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
index 872022106f..7ca6cbc415 100755
--- a/t/t3418-rebase-continue.sh
+++ b/t/t3418-rebase-continue.sh
@@ -74,7 +74,7 @@ test_expect_success 'rebase --continue remembers merge 
strategy and options' '
test -f funny.was.run
 '
 
-test_expect_failure 'rebase -i --continue handles merge strategy and options' '
+test_expect_success 'rebase -i --continue handles merge strategy and options' '
rm -fr .git/rebase-* &&
git reset --hard commit-new-file-F2-on-topic-branch &&
test_commit "commit-new-file-F3-on-topic-branch-for-dash-i" F3 32 &&
-- 
2.18.0.rc0.46.g9cee8fce43



[PATCH 2/2] git-rebase: error out when incompatible options passed

2018-06-06 Thread Elijah Newren
git rebase has three different types: am, merge, and interactive, all of
which are implemented in terms of separate scripts.  am builds on git-am,
merge builds on git-merge-recursive, and interactive builds on
git-cherry-pick.  We make use of features in those lower-level commands in
the different rebase types, but those features don't exist in all of the
lower level commands so we have a range of incompatibilities.  Previously,
we just accepted nearly any argument and silently ignored whichever ones
weren't implemented for the type of rebase specified.  Change this so the
incompatibilities are documented, included in the testsuite, and tested
for at runtime with an appropriate error message shown.

Some exceptions I left out:

  * --merge and --interactive are technically incompatible since they are
supposed to run different underlying scripts, but with a few small
changes, --interactive can do everything that --merge can.  In fact,
I'll shortly be sending another patch to remove git-rebase--merge and
reimplement it on top of git-rebase--interactive.

  * One could argue that --interactive and --quiet are incompatible since
--interactive doesn't implement a --quiet mode (perhaps since
cherry-pick itself does not implement one).  However, the interactive
mode is more quiet than the other modes in general with progress
messages, so one could argue that it's already quiet.

Signed-off-by: Elijah Newren 
---
 Documentation/git-rebase.txt   | 15 +--
 git-rebase.sh  | 17 +
 t/t3422-rebase-incompatible-options.sh | 10 +-
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 0e20a66e73..451252c173 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -243,6 +243,10 @@ leave out at most one of A and B, in which case it 
defaults to HEAD.
 --keep-empty::
Keep the commits that do not change anything from its
parents in the result.
++
+This uses the `--interactive` machinery internally, and as such,
+anything that is incompatible with --interactive is incompatible
+with this option.
 
 --allow-empty-message::
By default, rebasing commits with an empty message will fail.
@@ -324,6 +328,8 @@ which makes little sense.
and after each change.  When fewer lines of surrounding
context exist they all must match.  By default no context is
ever ignored.
+   Incompatible with the --merge and --interactive options, or
+   anything that implies those options or their machinery.
 
 -f::
 --force-rebase::
@@ -355,13 +361,15 @@ default is `--no-fork-point`, otherwise the default is 
`--fork-point`.
 --whitespace=::
These flag are passed to the 'git apply' program
(see linkgit:git-apply[1]) that applies the patch.
-   Incompatible with the --interactive option.
+   Incompatible with the --merge and --interactive options, or
+   anything that implies those options or their machinery.
 
 --committer-date-is-author-date::
 --ignore-date::
These flags are passed to 'git am' to easily change the dates
of the rebased commits (see linkgit:git-am[1]).
-   Incompatible with the --interactive option.
+   Incompatible with the --merge and --interactive options, or
+   anything that implies those options or their machinery.
 
 --signoff::
Add a Signed-off-by: trailer to all the rebased commits. Note
@@ -400,6 +408,9 @@ The `--rebase-merges` mode is similar in spirit to 
`--preserve-merges`, but
 in contrast to that option works well in interactive rebases: commits can be
 reordered, inserted and dropped at will.
 +
+This uses the `--interactive` machinery internally, but it can be run
+without an explicit `--interactive`.
++
 It is currently only possible to recreate the merge commits using the
 `recursive` merge strategy; Different merge strategies can be used only via
 explicit `exec git merge -s  [...]` commands.
diff --git a/git-rebase.sh b/git-rebase.sh
index 40be59ecc4..f1dbecba18 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -503,6 +503,23 @@ then
git_format_patch_opt="$git_format_patch_opt --progress"
 fi
 
+if test -n "$git_am_opt"; then
+   incompatible_opts=`echo "$git_am_opt" | sed -e 's/ -q//'`
+   if test -n "$interactive_rebase"
+   then
+   if test -n "$incompatible_opts"
+   then
+   die "$(gettext "error: cannot combine interactive 
options (--interactive, --exec, --rebase-merges, --preserve-merges, 
--keep-empty, --root + --onto) with am options ($incompatible_opts)")"
+   fi
+   fi
+   if test -n "$do_merge"; then
+   if test -n "$incompatible_opts"
+   then
+   die "$(gettext "error: cannot combine merge options 
(--merge, --strategy, --strategy-option) with am options ($incompa

[PATCH] git-rebase.sh: handle keep-empty like all other options

2018-06-06 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 git-rebase.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index 40be59ecc4..a56b286372 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -276,6 +276,7 @@ do
;;
--keep-empty)
keep_empty=yes
+   test -z "$interactive_rebase" && interactive_rebase=implied
;;
--allow-empty-message)
allow_empty_message=--allow-empty-message
@@ -480,11 +481,6 @@ then
test -z "$interactive_rebase" && interactive_rebase=implied
 fi
 
-if test -n "$keep_empty"
-then
-   test -z "$interactive_rebase" && interactive_rebase=implied
-fi
-
 if test -n "$interactive_rebase"
 then
type=interactive
-- 
2.18.0.rc0.46.g9cee8fce43



[PATCH 1/2] t3422: new testcases for checking when incompatible options passed

2018-06-06 Thread Elijah Newren
git rebase is split into three types: am, merge, and interactive.  Various
options imply different types, and which mode we are using determine which
sub-script (git-rebase--$type) is executed to finish the work.  Not all
options work with all types, so add tests for combinations where we expect
to receive an error rather than having options be silently ignored.

Signed-off-by: Elijah Newren 
---
 t/t3422-rebase-incompatible-options.sh | 69 ++
 1 file changed, 69 insertions(+)
 create mode 100755 t/t3422-rebase-incompatible-options.sh

diff --git a/t/t3422-rebase-incompatible-options.sh 
b/t/t3422-rebase-incompatible-options.sh
new file mode 100755
index 00..04cdae921b
--- /dev/null
+++ b/t/t3422-rebase-incompatible-options.sh
@@ -0,0 +1,69 @@
+#!/bin/sh
+
+test_description='test if rebase detects and aborts on incompatible options'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_seq 2 9 >foo &&
+   git add foo &&
+   git commit -m orig &&
+
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   test_seq 1 9 >foo &&
+   git add foo &&
+   git commit -m A &&
+
+   git checkout B &&
+   # This is indented with HT SP HT.
+   echo "  foo();" >>foo &&
+   git add foo &&
+   git commit -m B
+'
+
+#
+# Rebase has lots of useful options like --whitepsace=fix, which are
+# actually all built in terms of flags to git-am.  Since neither
+# --merge nor --interactive (nor any options that imply those two) use
+# git-am, using them together will result in flags like --whitespace=fix
+# being ignored.  Make sure rebase warns the user and aborts instead.
+#
+
+test_run_rebase () {
+   opt=$1
+   shift
+   test_expect_failure "$opt incompatible with --merge" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --merge A
+   "
+
+   test_expect_failure "$opt incompatible with --strategy=ours" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --strategy=ours A
+   "
+
+   test_expect_failure "$opt incompatible with --strategy-option=ours" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --strategy=ours A
+   "
+
+   test_expect_failure "$opt incompatible with --interactive" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --interactive A
+   "
+
+   test_expect_failure "$opt incompatible with --exec" "
+   git checkout B^0 &&
+   test_must_fail git rebase $opt --exec 'true' A
+   "
+
+}
+
+test_run_rebase --whitespace=fix
+test_run_rebase --ignore-whitespace
+test_run_rebase --committer-date-is-author-date
+test_run_rebase -C4
+
+test_done
-- 
2.18.0.rc0.46.g9cee8fce43



[PATCH] git-rebase--merge: modernize "git-$cmd" to "git $cmd"

2018-06-06 Thread Elijah Newren

I tend to think git-rebase--merge is less popular than the other rebase
types, leading to it being more overlooked and less well tested than the
other ones.  This git-$cmd usage seems to support that argument.

Anyway, this patch may be irrelevant if others agree with my goal to
delete git-rebase--merge and implement --merge on top of the --interactive
machinery, but sending it along in case others don't agree with that goal.


Signed-off-by: Elijah Newren 
---
 git-rebase--merge.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-rebase--merge.sh b/git-rebase--merge.sh
index cf4c042214..aa2f2f0872 100644
--- a/git-rebase--merge.sh
+++ b/git-rebase--merge.sh
@@ -71,7 +71,7 @@ call_merge () {
test -z "$strategy" && strategy=recursive
# If cmt doesn't have a parent, don't include it as a base
base=$(git rev-parse --verify --quiet $cmt^)
-   eval 'git-merge-$strategy' $strategy_opts $base ' -- "$hd" "$cmt"'
+   eval 'git merge-$strategy' $strategy_opts $base ' -- "$hd" "$cmt"'
rv=$?
case "$rv" in
0)
@@ -88,7 +88,7 @@ call_merge () {
;;
*)
die "Unknown exit code ($rv) from command:" \
-   "git-merge-$strategy $cmt^ -- HEAD $cmt"
+   "git merge-$strategy $cmt^ -- HEAD $cmt"
;;
esac
 }
-- 
2.18.0.rc0.46.g9cee8fce43



[PATCH] t5407: fix test to cover intended arguments

2018-06-06 Thread Elijah Newren
Test 8 in t5407 appears to be an accidental exact duplicate of of test 5;
the testcode is identical and has identical repo state, but the test
description is different and suggests that rebase -m followed by rebase
--skip was what was actually supposed to be tested.  Modify the test to
include the -m option.

Signed-off-by: Elijah Newren 
---
 t/t5407-post-rewrite-hook.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5407-post-rewrite-hook.sh b/t/t5407-post-rewrite-hook.sh
index 7a48236e87..9b2a274c71 100755
--- a/t/t5407-post-rewrite-hook.sh
+++ b/t/t5407-post-rewrite-hook.sh
@@ -113,7 +113,7 @@ test_expect_success 'git rebase -m' '
 test_expect_success 'git rebase -m --skip' '
git reset --hard D &&
clear_hook_input &&
-   test_must_fail git rebase --onto A B &&
+   test_must_fail git rebase -m --onto A B &&
test_must_fail git rebase --skip &&
echo D > foo &&
git add foo &&
-- 
2.18.0.rc0.46.g9cee8fce43



[PATCH] t3401: add directory rename testcases for rebase and am

2018-06-06 Thread Elijah Newren
Add a simple directory rename testcase, in conjunction with each of the
types of rebases:
  git-rebase--interactive
  git-rebase--am
  git-rebase--merge
and also use the same testcase for
  git am --3way

This demonstrates an inconsistency between the different rebase backends
and which can detect the directory rename.

Signed-off-by: Elijah Newren 
---
 t/t3401-rebase-and-am-rename.sh | 105 
 1 file changed, 105 insertions(+)
 create mode 100755 t/t3401-rebase-and-am-rename.sh

diff --git a/t/t3401-rebase-and-am-rename.sh b/t/t3401-rebase-and-am-rename.sh
new file mode 100755
index 00..8f832957fc
--- /dev/null
+++ b/t/t3401-rebase-and-am-rename.sh
@@ -0,0 +1,105 @@
+#!/bin/sh
+
+test_description='git rebase + directory rename tests'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-rebase.sh
+
+test_expect_success 'setup testcase' '
+   test_create_repo dir-rename &&
+   (
+   cd dir-rename &&
+
+   mkdir x &&
+   test_seq  1 10 >x/a &&
+   test_seq 11 20 >x/b &&
+   test_seq 21 30 >x/c &&
+   test_write_lines a b c d e f g h i >l &&
+   git add x l &&
+   git commit -m "Initial" &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   git mv x y &&
+   git mv l letters &&
+   git commit -m "Rename x to y, l to letters" &&
+
+   git checkout B &&
+   echo j >>l &&
+   test_seq 31 40 >x/d &&
+   git add l x/d &&
+   git commit -m "Modify l, add x/d"
+   )
+'
+
+test_expect_success 'rebase --interactive: directory rename detected' '
+   (
+   cd dir-rename &&
+
+   git checkout B^0 &&
+
+   set_fake_editor &&
+   FAKE_LINES="1" git rebase --interactive A &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+
+   test_path_is_file y/d &&
+   test_path_is_missing x/d
+   )
+'
+
+test_expect_failure 'rebase (am): directory rename detected' '
+   (
+   cd dir-rename &&
+
+   git checkout B^0 &&
+
+   git rebase A &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+
+   test_path_is_file y/d &&
+   test_path_is_missing x/d
+   )
+'
+
+test_expect_success 'rebase --merge: directory rename detected' '
+   (
+   cd dir-rename &&
+
+   git checkout B^0 &&
+
+   git rebase --merge A &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+
+   test_path_is_file y/d &&
+   test_path_is_missing x/d
+   )
+'
+
+test_expect_failure 'am: directory rename detected' '
+   (
+   cd dir-rename &&
+
+   git checkout A^0 &&
+
+   git format-patch -1 B &&
+
+   git am --3way 0001*.patch &&
+
+   git ls-files -s >out &&
+   test_line_count = 5 out &&
+
+   test_path_is_file y/d &&
+   test_path_is_missing x/d
+   )
+'
+
+test_done
-- 
2.18.0.rc0.46.g9cee8fce43



[PATCH] apply: fix grammar error in comment

2018-06-06 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 apply.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index d79e61591b..85f2c92740 100644
--- a/apply.c
+++ b/apply.c
@@ -4053,7 +4053,7 @@ static int preimage_oid_in_gitlink_patch(struct patch *p, 
struct object_id *oid)
return get_oid_hex(p->old_sha1_prefix, oid);
 }
 
-/* Build an index that contains the just the files needed for a 3way merge */
+/* Build an index that contains just the files needed for a 3way merge */
 static int build_fake_ancestor(struct apply_state *state, struct patch *list)
 {
struct patch *patch;
-- 
2.18.0.rc0.46.g9cee8fce43



RFC: rebase inconsistency in 2.18 -- course of action?

2018-06-06 Thread Elijah Newren
Hi everyone,

We've got a small rebase problem; I'm curious for thoughts about the
best course of action.  I'll provide several options below.

In short, while interactive-based and merge-based rebases handle
directory rename detection fine, am-based rebases do not.  This
inconsistency may frustrate or surprise users.

=== Demonstration Example ===

Let's say we have a repo with three commits -- an original commit
('O'), and two branches based off of it, A and B with an extra commit
each:

  O: has files l, x/a, x/b, x/c
  A: renames l -> lt, x/ -> y/
  B: modifies l, adds x/d

If I checkout B and run

  git rebase --interactive A

and don't bother editing the list of commits at all but just save,
then I end up with lt, y/{a, b, c, d} as expected.  However, if I run

  git rebase A

then I end up with lt, y/{a, b, c} and x/d; d is in the wrong place.

=== Problem explanation ===

am-based rebases suffer from a reduced ability to detect directory
renames upstream, which is fundamental to the fact that it throws away
information about the history: in particular, it dispenses with the
original commits involved by turning them into patches, and without
the knowledge of the original commits we cannot determine a proper
merge base.

The am-based rebase will proceed after generating patches by
commit-wise first trying git-apply, and if that fails it will attempt
a 3-way merge.  Since am has no knowledge of original commits, it
instead reconstructs a provisional merge base using
build_fake_ancestor() which will only contain the original versions of
the files modified in the patch.  Without the full list of files in
the real merge base, renames of any missing files cannot be detected.
Directory rename detection works by looking at individual file renames
and deducing when a full directory has been renamed.

Trying to modify build_fake_ancestor() to instead create a merge base
that includes common file information by looking for a commit that
contained all the same blobs as the pre-image of the patch would
require a very expensive search through history, and may choose a
wrong commit from among the several it finds.  (Also, regular am
outside of rebase may not find any commit that matches, forcing it to
somehow choose a near-fit).  Further, this would only work when the
fallback to a 3-way merge is triggered (i.e. when git-apply fails),
which may not happen for some patches. (For example, if the l->lt
change were not in the example, then git-apply would succeed and
incorrectly place d in x/, so we'd never even hit the 3-way fallback.)

In short, the am machinery simply doesn't have the necessary
information to properly detect directory renames, except when other
files involved in the renames upstream were also modified on the side
with the new file additions.

=== Possible solutions ===

1. Just accept it as a known shortcoming.

2. Revert directory rename detection from master...again.

   (Please, please, let's not pick this one.)

3. Claim it's not a bug at all.  In particular, the git-rebase(1) manpage
   states for the `--merge` option:

   Use merging strategies to rebase.  When the recursive (default)
   merge strategy is used, this allows rebase to be aware of
   renames on the upstream side.

   While "renames" is precisely why rebase --merge was added in commit
   58634dbff822 back on 2006-06-21, am-based rebases gained the
   ability to handle normal (non-directory) renames back in commit
   18cdf802ca6e from 2008-11-10.  So folks might be used to and expect
   their simple rebases to handle renames the same as --merge or
   --interactive rebases do.

4. Edit the 2.18 release notes.  Point out that directory renames can
   be detected by cherry-pick, merge, and rebases with either the -i
   or -m options (or any rebase option that implies either -i or -m);
   but not always by other rebases.  Folks may want to set pull.rebase
   to 'interactive' and just get used to immediately saving the
   popped-up list of commits and exiting to avoid this problem.

   Also, there's no good way to avoid this problem when using git-am
   directly.  At best, we can tell users to carefully select which
   older commit base to apply the patches on top of and then
   immediately run rebase --merge afterward.

5. Add an --am option to git-rebase, and make it not be the default
   (though make it be triggered by am-specific options like
   --whitespace or -C).  For the default, use either:

 5a. do_merge=t

 5b. interactive_rebase=implied

   As a slight aside, since there are only a few small changes
   necessary to make git-rebase--interactive handle ALL functionality
   from git-rebase--merge, we could take 5b one step further and
   delete git-rebase--merge and have one less rebase type.

   Either 5a or 5b may have a minor performance hit here, though
   personally I don't see this as a major factor.  But I'm happy to
   discuss at a high level if folks are curious (exact figures would
   depend heavi

Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Jeff King
On Wed, Jun 06, 2018 at 10:23:53PM -0400, Jeff King wrote:

> Though maybe I am wrong that the remote-svn stuff requires python. I
> thought it did, but poking around, it looks like it's all C, and just
> the "svnrdump_sim" helper is python.

I think I was getting this mixed up with the git_remote_helpers python
work, which was removed long ago in ae34ac126f (git_remote_helpers:
remove little used Python library, 2013-09-07).

Note that it really changes much, but I was curious enough to dig down.

-Peff


Re: [PATCH 01/10] t: add tool to translate hash-related values

2018-06-06 Thread Jeff King
On Thu, Jun 07, 2018 at 12:57:04AM +, brian m. carlson wrote:

> > > Unless I'm wrong, we don't use the "local" keyword ?
> > 
> > We've got a test balloon out; see 01d3a526ad (t: check whether the
> > shell supports the "local" keyword, 2017-10-26). I think it's reasonable
> > to consider starting its use.
> 
> I used it because it's already in use earlier in the file in some of the
> mingw_* functions.  Perhaps we happen to know that our mingw systems
> will always have a suitable /bin/sh, but I suppose some less capable
> shells would still have choked on it by now.

We do know in that case; it's always bash under mingw.

That said...

> I can remove it if necessary, but it didn't seem necessary.

I feel OK about starting to use it, with the knowledge that we may get a
late-comer who hasn't even tested v2.16.0 yet and says "no, wait! My
shell doesn't support local!". And then we'd have to deal with it then.
But I suspect that won't happen, or it will turn out that the shell in
question is unusable for some other reason anyway.

-Peff


Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Jeff King
On Wed, Jun 06, 2018 at 09:49:09PM -0400, Todd Zullinger wrote:

> Ramsay Jones wrote:
> [...]
> > I don't run the p4 or svn tests, so ... :-D
> 
> Heh, lucky you. :)
> 
> I try to run them all as part of the fedora builds since
> they cover much more than I'd ever use.  That's the main
> reason I noticed the bare python.  That would trip me up
> when it came time to build on a near-future fedora where
> python isn't installed by default and I only wanted to
> require python3 for the build/runtime scripts.

I'm not actually sure those svn bits involving python are usable by real
users. The main tool that people us for svn interop is git-svn, which is
written in perl.

There was an effort to have a real "git-remote-svn" helper (so that you
could just seamlessly "git clone svn://..." instead of using the "git
svn" wrapper.

So we have the work in vcs-svn, which gets built as part of a regular
compile. But AFAICT it is only used for git-remote-testsvn,
t/helper/test-svn-fe, and contrib/svn-fe. None of which have seen any
substantive work since 2012[1].

I suppose it's possible somebody could be using "git clone testsvn://"
in the wild, but the name would hopefully warn them off. I have no idea
how usable that work is in practice.

-Peff

[1] Browsing "git log", most of the commits are just tree-wide
cleanups, or fixing some compilation or dependency error. The last
"real" commit seems to be around 8e43a1d010 (remote-svn: add
incremental import, 2012-09-19).

I wonder if it is worth dropping this experiment as incomplete and
unmaintained. People have obviously spent time dealing with the code
for various fixups, but I think the whole thing may essentially just
be dead code. Or maybe people really are using contrib/svn-fe. Like
I said, I have no clue if this stuff is even usable, but we
certainly haven't packaged it to be seen by most users.


Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Jeff King
On Thu, Jun 07, 2018 at 01:16:14AM +0100, Ramsay Jones wrote:

> > Probably. We may want to go the same route as we did for perl in 
> > a0e0ec9f7d (t: provide a perl() function which uses $PERL_PATH,
> > 2013-10-28) so that test writers don't have to remember this.
> > 
> > That said, I wonder if it would be hard to simply do the python bits
> > here in perl. This is the first use of python in our test scripts (and
> 
> Hmm, not quite the _first_ use:
> 
> $ git grep PYTHON_PATH -- t
> t/lib-git-p4.sh:(cd / && "$PYTHON_PATH" -c 'import time; 
> print(int(time.time()))')
> t/lib-git-p4.sh:"$PYTHON_PATH" "$TRASH_DIRECTORY/marshal-dump.py"
> t/t9020-remote-svn.sh:export PATH PYTHON_PATH GIT_BUILD_DIR
> t/t9020-remote-svn.sh:exec "$PYTHON_PATH" 
> "$GIT_BUILD_DIR/contrib/svn-fe/svnrdump_sim.py" "$@"
> t/t9802-git-p4-filetype.sh: "$PYTHON_PATH" 
> "$TRASH_DIRECTORY/k_smush.py" <"$cli/k-text-k" >cli-k-text-k-smush &&
> t/t9802-git-p4-filetype.sh: "$PYTHON_PATH" 
> "$TRASH_DIRECTORY/ko_smush.py" <"$cli/k-text-ko" >cli-k-text-ko-smush &&
> t/t9802-git-p4-filetype.sh: "$PYTHON_PATH" 
> "$TRASH_DIRECTORY/gendouble.py" >%double.png &&
> t/t9810-git-p4-rcs.sh:  "$PYTHON_PATH" "$TRASH_DIRECTORY/scrub_k.py" 
> <"$git/$file" >"$scrub" &&
> t/t9810-git-p4-rcs.sh:  "$PYTHON_PATH" "$TRASH_DIRECTORY/scrub_ko.py" 
> <"$git/$file" >"$scrub" &&
> $ 

OK, the first for a feature that is not already written in python
(leading into my second claim that python is used only for fringe
commands ;) ).

Though maybe I am wrong that the remote-svn stuff requires python. I
thought it did, but poking around, it looks like it's all C, and just
the "svnrdump_sim" helper is python.

At any rate, I think the point still stands that perl is our main
scripting language. I'd rather keep us to that unless there's a good
reason not to.

-Peff


Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Todd Zullinger
Ramsay Jones wrote:
[...]
> I don't run the p4 or svn tests, so ... :-D

Heh, lucky you. :)

I try to run them all as part of the fedora builds since
they cover much more than I'd ever use.  That's the main
reason I noticed the bare python.  That would trip me up
when it came time to build on a near-future fedora where
python isn't installed by default and I only wanted to
require python3 for the build/runtime scripts.

> On 06/06/18 22:03, Jeff King wrote:
>> really the only user in the whole code base outside of a few fringe
>> commands). Leaving aside any perl vs python flame-war, I think there's
>> value in keeping the number of languages limited when there's not a
>> compelling reason to do otherwise.
> 
> I agree that fewer languages is (generally) a good idea.

Yep, that's certainly even better and if Jeff H. can use
perl relatively easily (or one of the many perl folks here
can help with that part of the test), that's great.  The
best way to solve many problems is avoid having them. :)

Thanks,

-- 
Todd
~~
Chaos, panic, and disorder - my job is done here.



Re: GDPR compliance best practices?

2018-06-06 Thread David Lang
I'm going to take the risk of inserting actual real-world data into the mix 
rather than just speculation :-)


Here is an example of that the Rsyslog project is doing (main developers based 
in Germany). I'll say as someone who's day job has been very involved with GDPR 
stuff recently, this looks like a very reasonable statement to me. But I am not 
a lawyer. I will also say that I think it would be very reasonable for projects 
to not accept code from someone who doesn't give them any way to contact them 
later in case there is a question about authorship or licensing.


David Lang


https://github.com/rsyslog/rsyslog/pull/2746/files

LEGAL GDPR NOTICE:
According to the European data protection laws (GDPR), we would like to make you
aware that contributing to rsyslog via git will permanently store the
name and email address you provide as well as the actual commit and the
time and date you made it inside git's version history. This is inevitable,
because it is a main feature git. If you are concerned about your
privacy, we strongly recommend to use

--author "anonymous "

together with your commit. Also please do NOT sign your commit in this case,
as that potentially could lead back to you. Please note that if you use your
real identity, the GDPR grants you the right to have this information removed
later. However, we have valid reasons why we cannot remove that information
later on. The reasons are:

* this would break git history and make future merges unworkable
* the rsyslog projects has legitimate interest to keep a permanent record of the
  contributor identity, once given, for
  - copyright verification
  - being able to provide proof should a malicious commit be made

Please also note that your commit is public and as such will potentially be
processed by many third-parties. Git's distributed nature makes it impossible
to track where exactly your commit, and thus your personal data, will be stored
and be processed. If you would not like to accept this risk, please do either
commit anonymously or refrain from contributing to the rsyslog project.


Hi

2018-06-06 Thread Robert Lee



Am a United State Army here in Afghanistan, am seeking your help to evacuate 
the sum of $ 7,000,000 to you as long as I am assured it will be safe That in 
your care Until I complete my service here in Afghanistan. This is not stolen 
money and there are no dangers involved. I count on your understanding. Please 
get back for more information :. rob322...@gmail.com


Re: [PATCH 04/10] t0027: use $ZERO_OID

2018-06-06 Thread brian m. carlson
On Wed, Jun 06, 2018 at 09:02:23AM +0200, Torsten Bögershausen wrote:
> Nothing wrong with the patch.
> There is, however, a trick in t0027 to transform the different IDs back to a 
> bunch of '0'.
> The content of the file use only uppercase letters, and all lowercase ad 
> digits
> are converted like this:
> 
> compare_ws_file () {
>   pfx=$1
>   exp=$2.expect
>   act=$pfx.actual.$3
>   tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" &&
>   tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" &&
>   test_cmp "$exp" "$act" &&
>   rm "$exp" "$act"
> }
> 
> In the long term the 'tr' may need an additional 'sed' expression.

I'll take a look.  That may end up being a more robust solution.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 01/10] t: add tool to translate hash-related values

2018-06-06 Thread brian m. carlson
On Wed, Jun 06, 2018 at 04:58:46PM -0400, Jeff King wrote:
> On Wed, Jun 06, 2018 at 08:19:27AM +0200, Torsten Bögershausen wrote:
> 
> > > +test_translate_f_ () {
> > > + local file="$TEST_DIRECTORY/translate/$2" &&
> > 
> > Unless I'm wrong, we don't use the "local" keyword ?
> 
> We've got a test balloon out; see 01d3a526ad (t: check whether the
> shell supports the "local" keyword, 2017-10-26). I think it's reasonable
> to consider starting its use.

I used it because it's already in use earlier in the file in some of the
mingw_* functions.  Perhaps we happen to know that our mingw systems
will always have a suitable /bin/sh, but I suppose some less capable
shells would still have choked on it by now.

I can remove it if necessary, but it didn't seem necessary.

> > > + perl -e '
> > 
> > The bare "perl" is better spelled as "$PERL_PATH"
> 
> This use is OK. Since a0e0ec9f7d (t: provide a perl() function which
> uses $PERL_PATH, 2013-10-28), most common uses handle this automatically
> (there are some exceptions, covered in t/README).

This was exactly my reasoning.

> > > + if [ "$1" = "-f" ]
> > 
> > Style nit, please avoid [] and use test:
> > if test "$1" = "-f"
> 
> This I agree with. :)

Yeah, I forgot that that's our style.  I'll fix that.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Ramsay Jones



On 06/06/18 22:03, Jeff King wrote:
> On Wed, Jun 06, 2018 at 01:10:52PM -0400, Todd Zullinger wrote:
> 
>> g...@jeffhostetler.com wrote:
>>> +# As a sanity check, ask Python to parse our generated JSON.  Let Python
>>> +# recursively dump the resulting dictionary in sorted order.  Confirm that
>>> +# that matches our expectations.
>>> +test_expect_success PYTHON 'parse JSON using Python' '
>> [...]
>>> +   python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual &&
>>
>> Would this be better using $PYTHON_PATH rather than
>> hard-coding python as the command?
> 
> Probably. We may want to go the same route as we did for perl in 
> a0e0ec9f7d (t: provide a perl() function which uses $PERL_PATH,
> 2013-10-28) so that test writers don't have to remember this.
> 
> That said, I wonder if it would be hard to simply do the python bits
> here in perl. This is the first use of python in our test scripts (and

Hmm, not quite the _first_ use:

$ git grep PYTHON_PATH -- t
t/lib-git-p4.sh:(cd / && "$PYTHON_PATH" -c 'import time; 
print(int(time.time()))')
t/lib-git-p4.sh:"$PYTHON_PATH" "$TRASH_DIRECTORY/marshal-dump.py"
t/t9020-remote-svn.sh:export PATH PYTHON_PATH GIT_BUILD_DIR
t/t9020-remote-svn.sh:exec "$PYTHON_PATH" 
"$GIT_BUILD_DIR/contrib/svn-fe/svnrdump_sim.py" "$@"
t/t9802-git-p4-filetype.sh: "$PYTHON_PATH" 
"$TRASH_DIRECTORY/k_smush.py" <"$cli/k-text-k" >cli-k-text-k-smush &&
t/t9802-git-p4-filetype.sh: "$PYTHON_PATH" 
"$TRASH_DIRECTORY/ko_smush.py" <"$cli/k-text-ko" >cli-k-text-ko-smush &&
t/t9802-git-p4-filetype.sh: "$PYTHON_PATH" 
"$TRASH_DIRECTORY/gendouble.py" >%double.png &&
t/t9810-git-p4-rcs.sh:  "$PYTHON_PATH" "$TRASH_DIRECTORY/scrub_k.py" 
<"$git/$file" >"$scrub" &&
t/t9810-git-p4-rcs.sh:  "$PYTHON_PATH" "$TRASH_DIRECTORY/scrub_ko.py" 
<"$git/$file" >"$scrub" &&
$ 

I don't run the p4 or svn tests, so ... :-D

> really the only user in the whole code base outside of a few fringe
> commands). Leaving aside any perl vs python flame-war, I think there's
> value in keeping the number of languages limited when there's not a
> compelling reason to do otherwise.

I agree that fewer languages is (generally) a good idea.

ATB,
Ramsay Jones





Re: git rm bug

2018-06-06 Thread Ævar Arnfjörð Bjarmason


On Wed, Jun 06 2018, Timothy Rice wrote:

>> It does seem like something which could be noted in the git
>> rm docs.  Perhaps you'd care to take a stab at a patch to
>> add a note to Documentation/git-rm.txt Thomas?  Maybe a note
>> at the end of the DISCUSSION section?
>
> That same documentation could mention a common workaround for when someone
> does really want to keep the empty directories:
>
> $ touch ./path/to/empty/dir/.keep
> $ git add ./path/to/empty/dir/.keep
> $ git commit -m "Keep that empty directory because it is needed for 
> "

nit: The .gitkeep convention seems to be much more common in the wild
than .keep, and it's probably time we documented that somewhere, even
though it's not a magic .git* file like .gitattributes et al.


Re: git rm bug

2018-06-06 Thread Ævar Arnfjörð Bjarmason


On Wed, Jun 06 2018, Thomas Fischer wrote:

> I agree that the entire chain of empty directories should not be
> tracked, as git tracks content, not files.
>
> However, when I run 'rm path/to/some/file', I expect path/to/some/ to
> still exist.
>
> Similarly, when I run 'git rm path/to/some/file', I expect
> path/to/some/ to exist, *albeit untracked*.
>
> I do NOT expect git to *track* empty directories. But I also do NOT
> expect it to remove untracked directories.

Others have said why, but here's an edge case you probably haven't
thought of:

(
rm -rf /tmp/repo &&
git init /tmp/repo &&
cd /tmp/repo &&
mkdir -p foo/bar/baz &&
git status
)

If you just have empty directories "git status" will report nothing,
although "git clean -dxfn" will show what would be cleaned up.

So if this worked as you're suggesting then someone could 'git rm" some
file, then everything would report that they're on commit XYZ, but if
they re-cloned at that commit they'd get a tree that would look
different.

No git command should behave in such a way as to leave the tree in a
state when moving from commit X->Y that you wouldn't get the same Y if
you re-cloned.


Re: [PATCH 2/8] upload-pack: implement ref-in-want

2018-06-06 Thread Brandon Williams
On 06/07, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Jun 06 2018, Brandon Williams wrote:
> 
> > On 06/05, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Tue, Jun 05 2018, Brandon Williams wrote:
> >>
> >> > +uploadpack.allowRefInWant::
> >> > +If this option is set, `upload-pack` will support the 
> >> > `ref-in-want`
> >> > +feature of the protocol version 2 `fetch` command.
> >> > +
> >>
> >> I think it makes sense to elaborate a bit on what this is for. Having
> >> read this series through, and to make sure I understood this, maybe
> >> something like this:
> >>
> >>This feature is intended for the benefit of load-balanced servers
> >>which may not have the same view of what SHA-1s their refs point to,
> >>but are guaranteed to never advertise a reference that another server
> >>serving the request doesn't know about.
> >>
> >> I.e. from what I can tell this gives no benefits for someone using a
> >> monolithic git server, except insofar as there would be a slight
> >> decrease in network traffic if the average length of refs is less than
> >> the length of a SHA-1.
> >
> > Yeah I agree that the motivation should probably be spelled out more,
> > thanks for the suggestion.
> >
> >>
> >> That's fair enough, just something we should prominently say.
> >>
> >> It does have the "disadvantage", if you can call it that, that it's
> >> introducing a race condition between when we read the ref advertisement
> >> and are promised XYZ refs, but may actually get ABC, but I can't think
> >> of a reason anyone would care about this in practice.
> >>
> >> The reason I'm saying "another server [...] doesn't know about" above is
> >> that 2/8 has this:
> >>
> >>if (read_ref(arg, &oid))
> >>die("unknown ref %s", arg);
> >>
> >> Doesn't that mean that if server A in your pool advertises master, next
> >> & pu, and you then go and fetch from server B advertising master & next,
> >> but not "pu" that the clone will die?
> >>
> >> Presumably at Google you either have something to ensure a consistent
> >> view, e.g. only advertise refs by name older than N seconds, or globally
> >> update ref name but not their contents, and don't allow deleting refs
> >> (or give them the same treatment).
> >>
> >> But that, and again, I may have misunderstood this whole thing,
> >> significantly reduces the utility of this feature for anyone "in the
> >> wild" since nothing shipped with "git" gives you that feature.
> >>
> >> The naïve way to do slave mirroring with stock git is to have a
> >> post-receive hook that pushes to your mirrors in a for-loop, or has them
> >> fetch from the master in a loop, and then round-robin LB those
> >> servers. Due to the "die on nonexisting" semantics in this extension
> >> that'll result in failed clones.
> >>
> >> So I think we should either be really vocal about that caveat, or
> >> perhaps think of how we could make that configurable, e.g. what happens
> >> if the server says "sorry, don't know about that one", and carries on
> >> with the rest it does know about?
> >
> > Jonathan actually pointed this out to me earlier and I think the best
> > way to deal with this is to just ignore the refs that the server doesn't
> > know about instead of dying here. I mean its no worse than what we
> > already have and we shouldn't hit this case too often.  And that way the
> > fetch can still proceed.
> >
> >>
> >> Is there a way for client & server to gracefully recover from that?
> >> E.g. send "master" & "next" now, and when I pull again in a few seconds
> >> I get the new "pu"?
> >
> > I think in this case the client would just need to wait for some amount
> > of replication delay and attempt fetching at a later point.
> >
> >>
> >> Also, as a digression isn't that a problem shared with protocol v2 in
> >> general? I.e. without this extension isn't it going to make another
> >> connection to the naïve LB'd mirroring setup described above and find
> >> that SHA-1s as well as refs don't match?
> >
> > This is actually an issue with fetch using either v2 or v0.  Unless I'm
> > misunderstanding what you're asking here.
> 
> Isn't the whole dialog in v1 guaranteed to be with one server from
> intial ref advertisement to the client saying have/want, or is that just
> with ssh?

That's only guaranteed with statefull connections (git:// and ssh://),
http:// has this issue because its stateless.

> 
> In any case the reason the above is an issue here is because you're
> getting the advertisement from a different server than you're
> negotiating the pack with, right?

Yes correct, or even a different server on each negotiation round-trip.

> 
> >>
> >> BREAK.
> >>
> >> Also is if this E-Mail wasn't long enough, on a completely different
> >> topic, in an earlier discussion in
> >> https://public-inbox.org/git/87inaje1uv@evledraar.gmail.com/ I noted
> >> that it would be neat-o to have optional wildmatch/pcre etc. matching
> >> for the use case you're not caring about 

Re: [PATCH 2/8] upload-pack: implement ref-in-want

2018-06-06 Thread Ævar Arnfjörð Bjarmason


On Wed, Jun 06 2018, Brandon Williams wrote:

> On 06/05, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, Jun 05 2018, Brandon Williams wrote:
>>
>> > +uploadpack.allowRefInWant::
>> > +  If this option is set, `upload-pack` will support the `ref-in-want`
>> > +  feature of the protocol version 2 `fetch` command.
>> > +
>>
>> I think it makes sense to elaborate a bit on what this is for. Having
>> read this series through, and to make sure I understood this, maybe
>> something like this:
>>
>>This feature is intended for the benefit of load-balanced servers
>>which may not have the same view of what SHA-1s their refs point to,
>>but are guaranteed to never advertise a reference that another server
>>serving the request doesn't know about.
>>
>> I.e. from what I can tell this gives no benefits for someone using a
>> monolithic git server, except insofar as there would be a slight
>> decrease in network traffic if the average length of refs is less than
>> the length of a SHA-1.
>
> Yeah I agree that the motivation should probably be spelled out more,
> thanks for the suggestion.
>
>>
>> That's fair enough, just something we should prominently say.
>>
>> It does have the "disadvantage", if you can call it that, that it's
>> introducing a race condition between when we read the ref advertisement
>> and are promised XYZ refs, but may actually get ABC, but I can't think
>> of a reason anyone would care about this in practice.
>>
>> The reason I'm saying "another server [...] doesn't know about" above is
>> that 2/8 has this:
>>
>>  if (read_ref(arg, &oid))
>>  die("unknown ref %s", arg);
>>
>> Doesn't that mean that if server A in your pool advertises master, next
>> & pu, and you then go and fetch from server B advertising master & next,
>> but not "pu" that the clone will die?
>>
>> Presumably at Google you either have something to ensure a consistent
>> view, e.g. only advertise refs by name older than N seconds, or globally
>> update ref name but not their contents, and don't allow deleting refs
>> (or give them the same treatment).
>>
>> But that, and again, I may have misunderstood this whole thing,
>> significantly reduces the utility of this feature for anyone "in the
>> wild" since nothing shipped with "git" gives you that feature.
>>
>> The naïve way to do slave mirroring with stock git is to have a
>> post-receive hook that pushes to your mirrors in a for-loop, or has them
>> fetch from the master in a loop, and then round-robin LB those
>> servers. Due to the "die on nonexisting" semantics in this extension
>> that'll result in failed clones.
>>
>> So I think we should either be really vocal about that caveat, or
>> perhaps think of how we could make that configurable, e.g. what happens
>> if the server says "sorry, don't know about that one", and carries on
>> with the rest it does know about?
>
> Jonathan actually pointed this out to me earlier and I think the best
> way to deal with this is to just ignore the refs that the server doesn't
> know about instead of dying here. I mean its no worse than what we
> already have and we shouldn't hit this case too often.  And that way the
> fetch can still proceed.
>
>>
>> Is there a way for client & server to gracefully recover from that?
>> E.g. send "master" & "next" now, and when I pull again in a few seconds
>> I get the new "pu"?
>
> I think in this case the client would just need to wait for some amount
> of replication delay and attempt fetching at a later point.
>
>>
>> Also, as a digression isn't that a problem shared with protocol v2 in
>> general? I.e. without this extension isn't it going to make another
>> connection to the naïve LB'd mirroring setup described above and find
>> that SHA-1s as well as refs don't match?
>
> This is actually an issue with fetch using either v2 or v0.  Unless I'm
> misunderstanding what you're asking here.

Isn't the whole dialog in v1 guaranteed to be with one server from
intial ref advertisement to the client saying have/want, or is that just
with ssh?

In any case the reason the above is an issue here is because you're
getting the advertisement from a different server than you're
negotiating the pack with, right?

>>
>> BREAK.
>>
>> Also is if this E-Mail wasn't long enough, on a completely different
>> topic, in an earlier discussion in
>> https://public-inbox.org/git/87inaje1uv@evledraar.gmail.com/ I noted
>> that it would be neat-o to have optional wildmatch/pcre etc. matching
>> for the use case you're not caring about here (and I don't expect you
>> to, you're solving a different problem).
>>
>> But let's say I want to add that after this, and being unfamiliar with
>> the protocol v2 conventions. Would that be a whole new
>> ref-in-want-wildmatch-prefix capability with a new
>> want-ref-wildmatch-prefix verb, or is there some less verbose way we can
>> anticipate that use-case and internally version / advertise
>> sub-capabilities?
>>
>> I don't know if that makes any se

Re: [PATCH 2/8] upload-pack: implement ref-in-want

2018-06-06 Thread Brandon Williams
On 06/05, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Jun 05 2018, Brandon Williams wrote:
> 
> > +uploadpack.allowRefInWant::
> > +   If this option is set, `upload-pack` will support the `ref-in-want`
> > +   feature of the protocol version 2 `fetch` command.
> > +
> 
> I think it makes sense to elaborate a bit on what this is for. Having
> read this series through, and to make sure I understood this, maybe
> something like this:
> 
>This feature is intended for the benefit of load-balanced servers
>which may not have the same view of what SHA-1s their refs point to,
>but are guaranteed to never advertise a reference that another server
>serving the request doesn't know about.
> 
> I.e. from what I can tell this gives no benefits for someone using a
> monolithic git server, except insofar as there would be a slight
> decrease in network traffic if the average length of refs is less than
> the length of a SHA-1.

Yeah I agree that the motivation should probably be spelled out more,
thanks for the suggestion.

> 
> That's fair enough, just something we should prominently say.
> 
> It does have the "disadvantage", if you can call it that, that it's
> introducing a race condition between when we read the ref advertisement
> and are promised XYZ refs, but may actually get ABC, but I can't think
> of a reason anyone would care about this in practice.
> 
> The reason I'm saying "another server [...] doesn't know about" above is
> that 2/8 has this:
> 
>   if (read_ref(arg, &oid))
>   die("unknown ref %s", arg);
> 
> Doesn't that mean that if server A in your pool advertises master, next
> & pu, and you then go and fetch from server B advertising master & next,
> but not "pu" that the clone will die?
> 
> Presumably at Google you either have something to ensure a consistent
> view, e.g. only advertise refs by name older than N seconds, or globally
> update ref name but not their contents, and don't allow deleting refs
> (or give them the same treatment).
> 
> But that, and again, I may have misunderstood this whole thing,
> significantly reduces the utility of this feature for anyone "in the
> wild" since nothing shipped with "git" gives you that feature.
> 
> The naïve way to do slave mirroring with stock git is to have a
> post-receive hook that pushes to your mirrors in a for-loop, or has them
> fetch from the master in a loop, and then round-robin LB those
> servers. Due to the "die on nonexisting" semantics in this extension
> that'll result in failed clones.
> 
> So I think we should either be really vocal about that caveat, or
> perhaps think of how we could make that configurable, e.g. what happens
> if the server says "sorry, don't know about that one", and carries on
> with the rest it does know about?

Jonathan actually pointed this out to me earlier and I think the best
way to deal with this is to just ignore the refs that the server doesn't
know about instead of dying here. I mean its no worse than what we
already have and we shouldn't hit this case too often.  And that way the
fetch can still proceed.

> 
> Is there a way for client & server to gracefully recover from that?
> E.g. send "master" & "next" now, and when I pull again in a few seconds
> I get the new "pu"?

I think in this case the client would just need to wait for some amount
of replication delay and attempt fetching at a later point.

> 
> Also, as a digression isn't that a problem shared with protocol v2 in
> general? I.e. without this extension isn't it going to make another
> connection to the naïve LB'd mirroring setup described above and find
> that SHA-1s as well as refs don't match?

This is actually an issue with fetch using either v2 or v0.  Unless I'm
misunderstanding what you're asking here.

> 
> BREAK.
> 
> Also is if this E-Mail wasn't long enough, on a completely different
> topic, in an earlier discussion in
> https://public-inbox.org/git/87inaje1uv@evledraar.gmail.com/ I noted
> that it would be neat-o to have optional wildmatch/pcre etc. matching
> for the use case you're not caring about here (and I don't expect you
> to, you're solving a different problem).
> 
> But let's say I want to add that after this, and being unfamiliar with
> the protocol v2 conventions. Would that be a whole new
> ref-in-want-wildmatch-prefix capability with a new
> want-ref-wildmatch-prefix verb, or is there some less verbose way we can
> anticipate that use-case and internally version / advertise
> sub-capabilities?
> 
> I don't know if that makes any sense, and would be fine with just a
> ref-in-want-wildmatch-prefix if that's the way to do it. I just think
> it's inevitable that we'll have such a thing eventually, so it's worth
> thinking about how such a future extension fits in.

Yes back when introducing the server-side ref filtering in ls-refs we
originally talked about included wildmatch or other forms of pattern
matching.  We opted to not over complicate things and favored prefix
matching beca

Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Jeff Hostetler




On 6/6/2018 5:03 PM, Jeff King wrote:

On Wed, Jun 06, 2018 at 01:10:52PM -0400, Todd Zullinger wrote:


g...@jeffhostetler.com wrote:

+# As a sanity check, ask Python to parse our generated JSON.  Let Python
+# recursively dump the resulting dictionary in sorted order.  Confirm that
+# that matches our expectations.
+test_expect_success PYTHON 'parse JSON using Python' '

[...]

+   python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual &&


Would this be better using $PYTHON_PATH rather than
hard-coding python as the command?


Probably. We may want to go the same route as we did for perl in
a0e0ec9f7d (t: provide a perl() function which uses $PERL_PATH,
2013-10-28) so that test writers don't have to remember this.

That said, I wonder if it would be hard to simply do the python bits
here in perl. This is the first use of python in our test scripts (and
really the only user in the whole code base outside of a few fringe
commands). Leaving aside any perl vs python flame-war, I think there's
value in keeping the number of languages limited when there's not a
compelling reason to do otherwise.



Perl works too.  (I don't know perl, but I'm told it does work for
stuff like this. :-)

I'll take a stab at it.

Jeff





Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Jeff Hostetler




On 6/6/2018 1:10 PM, Todd Zullinger wrote:

g...@jeffhostetler.com wrote:

+# As a sanity check, ask Python to parse our generated JSON.  Let Python
+# recursively dump the resulting dictionary in sorted order.  Confirm that
+# that matches our expectations.
+test_expect_success PYTHON 'parse JSON using Python' '

[...]

+   python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual &&


Would this be better using $PYTHON_PATH rather than
hard-coding python as the command?



good catch!
thanks
Jeff


Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Jeff King
On Wed, Jun 06, 2018 at 01:10:52PM -0400, Todd Zullinger wrote:

> g...@jeffhostetler.com wrote:
> > +# As a sanity check, ask Python to parse our generated JSON.  Let Python
> > +# recursively dump the resulting dictionary in sorted order.  Confirm that
> > +# that matches our expectations.
> > +test_expect_success PYTHON 'parse JSON using Python' '
> [...]
> > +   python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual &&
> 
> Would this be better using $PYTHON_PATH rather than
> hard-coding python as the command?

Probably. We may want to go the same route as we did for perl in 
a0e0ec9f7d (t: provide a perl() function which uses $PERL_PATH,
2013-10-28) so that test writers don't have to remember this.

That said, I wonder if it would be hard to simply do the python bits
here in perl. This is the first use of python in our test scripts (and
really the only user in the whole code base outside of a few fringe
commands). Leaving aside any perl vs python flame-war, I think there's
value in keeping the number of languages limited when there's not a
compelling reason to do otherwise.

-Peff


Re: [PATCH 01/10] t: add tool to translate hash-related values

2018-06-06 Thread Jeff King
On Wed, Jun 06, 2018 at 08:19:27AM +0200, Torsten Bögershausen wrote:

> > +test_translate_f_ () {
> > +   local file="$TEST_DIRECTORY/translate/$2" &&
> 
> Unless I'm wrong, we don't use the "local" keyword ?

We've got a test balloon out; see 01d3a526ad (t: check whether the
shell supports the "local" keyword, 2017-10-26). I think it's reasonable
to consider starting its use.

> > +   perl -e '
> 
> The bare "perl" is better spelled as "$PERL_PATH"

This use is OK. Since a0e0ec9f7d (t: provide a perl() function which
uses $PERL_PATH, 2013-10-28), most common uses handle this automatically
(there are some exceptions, covered in t/README).

> > +   if [ "$1" = "-f" ]
> 
> Style nit, please avoid [] and use test:
>   if test "$1" = "-f"

This I agree with. :)

-Peff


[PATCH v2 5/8] fetch-pack: make negotiation-related vars local

2018-06-06 Thread Jonathan Tan
Reduce the number of global variables by making the priority queue and
the count of non-common commits in it local, passing them as a struct to
various functions where necessary.

This also helps in the case that fetch_pack() is invoked twice in the
same process (when tag following is required when using a transport that
does not support tag following), in that different priority queues will
now be used in each invocation, instead of reusing the possibly
non-empty one.

The struct containing these variables is named "data" to ease review of
a subsequent patch in this series - in that patch, this struct
definition and several functions will be moved to a negotiation-specific
file, and this allows the move to be verbatim.

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 104 +--
 1 file changed, 59 insertions(+), 45 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 114207b8e..c31644bb9 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -50,8 +50,12 @@ static int marked;
  */
 #define MAX_IN_VAIN 256
 
-static struct prio_queue rev_list = { compare_commits_by_commit_date };
-static int non_common_revs, multi_ack, use_sideband;
+struct data {
+   struct prio_queue rev_list;
+   int non_common_revs;
+};
+
+static int multi_ack, use_sideband;
 /* Allow specifying sha1 if it is a ref tip. */
 #define ALLOW_TIP_SHA1 01
 /* Allow request of a sha1 if it is reachable from a ref (possibly hidden 
ref). */
@@ -93,7 +97,8 @@ static void cache_one_alternate(const char *refname,
cache->items[cache->nr++] = obj;
 }
 
-static void for_each_cached_alternate(void (*cb)(struct object *))
+static void for_each_cached_alternate(struct data *data,
+ void (*cb)(struct data *, struct object 
*))
 {
static int initialized;
static struct alternate_object_cache cache;
@@ -105,10 +110,10 @@ static void for_each_cached_alternate(void (*cb)(struct 
object *))
}
 
for (i = 0; i < cache.nr; i++)
-   cb(cache.items[i]);
+   cb(data, cache.items[i]);
 }
 
-static void rev_list_push(struct commit *commit, int mark)
+static void rev_list_push(struct data *data, struct commit *commit, int mark)
 {
if (!(commit->object.flags & mark)) {
commit->object.flags |= mark;
@@ -116,19 +121,20 @@ static void rev_list_push(struct commit *commit, int mark)
if (parse_commit(commit))
return;
 
-   prio_queue_put(&rev_list, commit);
+   prio_queue_put(&data->rev_list, commit);
 
if (!(commit->object.flags & COMMON))
-   non_common_revs++;
+   data->non_common_revs++;
}
 }
 
-static int rev_list_insert_ref(const char *refname, const struct object_id 
*oid)
+static int rev_list_insert_ref(struct data *data, const char *refname,
+  const struct object_id *oid)
 {
struct object *o = deref_tag(parse_object(oid), refname, 0);
 
if (o && o->type == OBJ_COMMIT)
-   rev_list_push((struct commit *)o, SEEN);
+   rev_list_push(data, (struct commit *)o, SEEN);
 
return 0;
 }
@@ -136,7 +142,7 @@ static int rev_list_insert_ref(const char *refname, const 
struct object_id *oid)
 static int rev_list_insert_ref_oid(const char *refname, const struct object_id 
*oid,
   int flag, void *cb_data)
 {
-   return rev_list_insert_ref(refname, oid);
+   return rev_list_insert_ref(cb_data, refname, oid);
 }
 
 static int clear_marks(const char *refname, const struct object_id *oid,
@@ -156,7 +162,7 @@ static int clear_marks(const char *refname, const struct 
object_id *oid,
when only the server does not yet know that they are common).
 */
 
-static void mark_common(struct commit *commit,
+static void mark_common(struct data *data, struct commit *commit,
int ancestors_only, int dont_parse)
 {
if (commit != NULL && !(commit->object.flags & COMMON)) {
@@ -166,12 +172,12 @@ static void mark_common(struct commit *commit,
o->flags |= COMMON;
 
if (!(o->flags & SEEN))
-   rev_list_push(commit, SEEN);
+   rev_list_push(data, commit, SEEN);
else {
struct commit_list *parents;
 
if (!ancestors_only && !(o->flags & POPPED))
-   non_common_revs--;
+   data->non_common_revs--;
if (!o->parsed && !dont_parse)
if (parse_commit(commit))
return;
@@ -179,7 +185,7 @@ static void mark_common(struct commit *commit,
for (parents = commit->parents;
parents;
parents =

[PATCH v2 7/8] fetch-pack: introduce negotiator API

2018-06-06 Thread Jonathan Tan
Introduce the new files fetch-negotiator.{h,c}, which contains an API
behind which the details of negotiation are abstracted. Currently, only
one algorithm is available: the existing one.

This patch is written to be easily reviewed: static functions are
moved verbatim from fetch-pack.c to negotiator/default.c, and it can be
seen that the lines replaced by negotiator->X() calls are present in the
X() functions respectively.

Signed-off-by: Jonathan Tan 
---
 Makefile |   2 +
 fetch-negotiator.c   |   8 ++
 fetch-negotiator.h   |  57 
 fetch-pack.c | 206 +--
 negotiator/default.c | 176 
 negotiator/default.h |   8 ++
 object.h |   3 +-
 7 files changed, 297 insertions(+), 163 deletions(-)
 create mode 100644 fetch-negotiator.c
 create mode 100644 fetch-negotiator.h
 create mode 100644 negotiator/default.c
 create mode 100644 negotiator/default.h

diff --git a/Makefile b/Makefile
index 1d27f3636..96f84d1dc 100644
--- a/Makefile
+++ b/Makefile
@@ -859,6 +859,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec-cmd.o
+LIB_OBJS += fetch-negotiator.o
 LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
@@ -891,6 +892,7 @@ LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
 LIB_OBJS += name-hash.o
+LIB_OBJS += negotiator/default.o
 LIB_OBJS += notes.o
 LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
diff --git a/fetch-negotiator.c b/fetch-negotiator.c
new file mode 100644
index 0..2675d120f
--- /dev/null
+++ b/fetch-negotiator.c
@@ -0,0 +1,8 @@
+#include "git-compat-util.h"
+#include "fetch-negotiator.h"
+#include "negotiator/default.h"
+
+void fetch_negotiator_init(struct fetch_negotiator *negotiator)
+{
+   default_negotiator_init(negotiator);
+}
diff --git a/fetch-negotiator.h b/fetch-negotiator.h
new file mode 100644
index 0..b1290aa9c
--- /dev/null
+++ b/fetch-negotiator.h
@@ -0,0 +1,57 @@
+#ifndef FETCH_NEGOTIATOR
+#define FETCH_NEGOTIATOR
+
+struct commit;
+
+/*
+ * An object that supplies the information needed to negotiate the contents of
+ * the to-be-sent packfile during a fetch.
+ *
+ * To set up the negotiator, call fetch_negotiator_init(), then known_common()
+ * (0 or more times), then add_tip() (0 or more times).
+ *
+ * Then, when "have" lines are required, call next(). Call ack() to report what
+ * the server tells us.
+ *
+ * Once negotiation is done, call release(). The negotiator then cannot be used
+ * (unless reinitialized with fetch_negotiator_init()).
+ */
+struct fetch_negotiator {
+   /*
+* Before negotiation starts, indicate that the server is known to have
+* this commit.
+*/
+   void (*known_common)(struct fetch_negotiator *, struct commit *);
+
+   /*
+* Once this function is invoked, known_common() cannot be invoked any
+* more.
+*
+* Indicate that this commit and all its ancestors are to be checked
+* for commonality with the server.
+*/
+   void (*add_tip)(struct fetch_negotiator *, struct commit *);
+
+   /*
+* Once this function is invoked, known_common() and add_tip() cannot
+* be invoked any more.
+*
+* Return the next commit that the client should send as a "have" line.
+*/
+   const struct object_id *(*next)(struct fetch_negotiator *);
+
+   /*
+* Inform the negotiator that the server has the given commit. This
+* method must only be called on commits returned by next().
+*/
+   int (*ack)(struct fetch_negotiator *, struct commit *);
+
+   void (*release)(struct fetch_negotiator *);
+
+   /* internal use */
+   void *data;
+};
+
+void fetch_negotiator_init(struct fetch_negotiator *negotiator);
+
+#endif
diff --git a/fetch-pack.c b/fetch-pack.c
index 4a4ec4da3..ad6f7ac32 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,10 +15,10 @@
 #include "connect.h"
 #include "transport.h"
 #include "version.h"
-#include "prio-queue.h"
 #include "sha1-array.h"
 #include "oidset.h"
 #include "packfile.h"
+#include "fetch-negotiator.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -36,13 +36,7 @@ static const char *alternate_shallow_file;
 
 /* Remember to update object flag allocation in object.h */
 #define COMPLETE   (1U << 0)
-#define COMMON (1U << 1)
-#define COMMON_REF (1U << 2)
-#define SEEN   (1U << 3)
-#define POPPED (1U << 4)
-#define ALTERNATE  (1U << 5)
-
-static int marked;
+#define ALTERNATE  (1U << 1)
 
 /*
  * After sending this many "have"s if we do not get any new ACK , we
@@ -50,11 +44,6 @@ static int marked;
  */
 #define MAX_IN_VAIN 256
 
-struct data {
-   struct prio_queue rev_list;
-   int non_common_revs;
-};
-
 static int multi_ack, use_sideband;
 /* A

[PATCH v2 8/8] negotiator/default: use better style in comments

2018-06-06 Thread Jonathan Tan
Signed-off-by: Jonathan Tan 
---
 negotiator/default.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/negotiator/default.c b/negotiator/default.c
index b8f45cf78..a9e52c943 100644
--- a/negotiator/default.c
+++ b/negotiator/default.c
@@ -46,11 +46,10 @@ static int clear_marks(const char *refname, const struct 
object_id *oid,
 }
 
 /*
-   This function marks a rev and its ancestors as common.
-   In some cases, it is desirable to mark only the ancestors (for example
-   when only the server does not yet know that they are common).
-*/
-
+ * This function marks a rev and its ancestors as common.
+ * In some cases, it is desirable to mark only the ancestors (for example
+ * when only the server does not yet know that they are common).
+ */
 static void mark_common(struct data *data, struct commit *commit,
int ancestors_only, int dont_parse)
 {
@@ -80,9 +79,8 @@ static void mark_common(struct data *data, struct commit 
*commit,
 }
 
 /*
-  Get the next rev to send, ignoring the common.
-*/
-
+ * Get the next rev to send, ignoring the common.
+ */
 static const struct object_id *get_rev(struct data *data)
 {
struct commit *commit = NULL;
-- 
2.17.0.768.g1526ddbba1.dirty



[PATCH v2 0/8] Refactor fetch negotiation into its own API

2018-06-06 Thread Jonathan Tan
Thanks, Jonathan Nieder, for your comments.

This patch set now includes a patch at the beginning to split up
everything_local(), making it clear which functions have side effects
and which don't, and a patch at the end to reformat some comments.

While I was implementing the suggestions, there were some that I were
unsure about. Here they are:

> > nit: this adds the new test as last in the script.  Is there some
> > logical earlier place in the file it can go instead?  That way, the
> > file stays organized and concurrent patches that modify the same test
> > script are less likely to conflict.
> 
> Good point. I'll find a place.

I couldn't find a logical place (I didn't see any existing tests that
test negotiation). I did move it up a bit earlier in the file, before
filtering by blob size, because that seemed more distinct from the rest
of the tests.

> >> -static struct prio_queue rev_list = { compare_commits_by_commit_date };
> >> -static int non_common_revs, multi_ack, use_sideband;
> >> +struct data {
> >> + struct prio_queue rev_list;
> >> + int non_common_revs;
> >> +};
> >
> > How does this struct get used?  What does it represent?  A comment
> > might help.
> 
> I'll add a comment.

I thought of adding a comment, but felt that I would end up removing it
anyway upon its move to negotiator/default.c, so I didn't end up adding
it.

> >> +/*
> >> +   This function marks a rev and its ancestors as common.
> >> +   In some cases, it is desirable to mark only the ancestors (for example
> >> +   when only the server does not yet know that they are common).
> >> +*/
> >
> > Not about this change: comments should have ' *' at the start of each
> > line (could do in a preparatory patch or a followup).
> 
> I'll add a followup.

I'm now not sure of the value of making a change just to update
formatting, but I added the followup commit anyway - it can be easily
dropped if we decide to do so.

Jonathan Tan (8):
  fetch-pack: split up everything_local()
  fetch-pack: clear marks before re-marking
  fetch-pack: directly end negotiation if ACK ready
  fetch-pack: use ref adv. to prune "have" sent
  fetch-pack: make negotiation-related vars local
  fetch-pack: move common check and marking together
  fetch-pack: introduce negotiator API
  negotiator/default: use better style in comments

 Makefile  |   2 +
 fetch-negotiator.c|   8 ++
 fetch-negotiator.h|  57 ++
 fetch-pack.c  | 254 ++
 negotiator/default.c  | 174 +
 negotiator/default.h  |   8 ++
 object.h  |   3 +-
 t/t5500-fetch-pack.sh |  39 +++
 8 files changed, 376 insertions(+), 169 deletions(-)
 create mode 100644 fetch-negotiator.c
 create mode 100644 fetch-negotiator.h
 create mode 100644 negotiator/default.c
 create mode 100644 negotiator/default.h

-- 
2.17.0.768.g1526ddbba1.dirty



[PATCH v2 3/8] fetch-pack: directly end negotiation if ACK ready

2018-06-06 Thread Jonathan Tan
When "ACK %s ready" is received, find_common() clears rev_list in an
attempt to stop further "have" lines from being sent [1]. It is much
more readable to explicitly break from the loop instead, so do this.

This means that the memory in priority queue will be reclaimed only upon
program exit, similar to the cases in which "ACK %s ready" is not
received. (A related problem occurs when do_fetch_pack() is invoked a
second time in the same process with a possibly non-empty priority
queue, but this will be solved in a subsequent patch in this patch set.)

[1] The rationale is further described in the originating commit
f2cba9299b ("fetch-pack: Finish negotation if remote replies "ACK %s
ready"", 2011-03-14).

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 2812499a5..09f5c83c4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -517,10 +517,8 @@ static int find_common(struct fetch_pack_args *args,
mark_common(commit, 0, 1);
retval = 0;
got_continue = 1;
-   if (ack == ACK_ready) {
-   clear_prio_queue(&rev_list);
+   if (ack == ACK_ready)
got_ready = 1;
-   }
break;
}
}
@@ -530,6 +528,8 @@ static int find_common(struct fetch_pack_args *args,
print_verbose(args, _("giving up"));
break; /* give up */
}
+   if (got_ready)
+   break;
}
}
 done:
@@ -1300,7 +1300,6 @@ static int process_acks(struct packet_reader *reader, 
struct oidset *common)
}
 
if (!strcmp(reader->line, "ready")) {
-   clear_prio_queue(&rev_list);
received_ready = 1;
continue;
}
-- 
2.17.0.768.g1526ddbba1.dirty



[PATCH v2 1/8] fetch-pack: split up everything_local()

2018-06-06 Thread Jonathan Tan
The function everything_local(), despite its name, also (1) marks
commits as COMPLETE and COMMON_REF and (2) invokes filter_refs() as
important side effects. Extract (1) into its own function
(mark_complete_and_common_ref()) and remove
(2).

The restoring of save_commit_buffer, which was introduced in a1c6d7c1a7
("fetch-pack: restore save_commit_buffer after use", 2017-12-08), is a
concern of the parse_object() call in mark_complete_and_common_ref(), so
it has been moved from the end of everything_local() to the end of
mark_complete_and_common_ref().

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 39 ++-
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index a320ce987..5c87bb8bb 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -734,12 +734,20 @@ static int add_loose_objects_to_set(const struct 
object_id *oid,
return 0;
 }
 
-static int everything_local(struct fetch_pack_args *args,
-   struct ref **refs,
-   struct ref **sought, int nr_sought)
+/*
+ * Mark recent commits available locally and reachable from a local ref as
+ * COMPLETE. If args->no_dependents is false, also mark COMPLETE remote refs as
+ * COMMON_REF (otherwise, we are not planning to participate in negotiation, 
and
+ * thus do not need COMMON_REF marks).
+ *
+ * The cutoff time for recency is determined by this heuristic: it is the
+ * earliest commit time of the objects in refs that are commits and that we 
know
+ * the commit time of.
+ */
+static void mark_complete_and_common_ref(struct fetch_pack_args *args,
+struct ref **refs)
 {
struct ref *ref;
-   int retval;
int old_save_commit_buffer = save_commit_buffer;
timestamp_t cutoff = 0;
struct oidset loose_oid_set = OIDSET_INIT;
@@ -812,7 +820,18 @@ static int everything_local(struct fetch_pack_args *args,
}
}
 
-   filter_refs(args, refs, sought, nr_sought);
+   save_commit_buffer = old_save_commit_buffer;
+}
+
+/*
+ * Returns 1 if every object pointed to by the given remote refs is available
+ * locally and reachable from a local ref, and 0 otherwise.
+ */
+static int everything_local(struct fetch_pack_args *args,
+   struct ref **refs)
+{
+   struct ref *ref;
+   int retval;
 
for (retval = 1, ref = *refs; ref ; ref = ref->next) {
const struct object_id *remote = &ref->old_oid;
@@ -829,8 +848,6 @@ static int everything_local(struct fetch_pack_args *args,
  ref->name);
}
 
-   save_commit_buffer = old_save_commit_buffer;
-
return retval;
 }
 
@@ -1053,7 +1070,9 @@ static struct ref *do_fetch_pack(struct fetch_pack_args 
*args,
if (!server_supports("deepen-relative") && args->deepen_relative)
die(_("Server does not support --deepen"));
 
-   if (everything_local(args, &ref, sought, nr_sought)) {
+   mark_complete_and_common_ref(args, &ref);
+   filter_refs(args, &ref, sought, nr_sought);
+   if (everything_local(args, &ref)) {
packet_flush(fd[1]);
goto all_done;
}
@@ -1377,7 +1396,9 @@ static struct ref *do_fetch_pack_v2(struct 
fetch_pack_args *args,
for_each_cached_alternate(insert_one_alternate_object);
 
/* Filter 'ref' by 'sought' and those that aren't local 
*/
-   if (everything_local(args, &ref, sought, nr_sought))
+   mark_complete_and_common_ref(args, &ref);
+   filter_refs(args, &ref, sought, nr_sought);
+   if (everything_local(args, &ref))
state = FETCH_DONE;
else
state = FETCH_SEND_REQUEST;
-- 
2.17.0.768.g1526ddbba1.dirty



[PATCH v2 4/8] fetch-pack: use ref adv. to prune "have" sent

2018-06-06 Thread Jonathan Tan
In negotiation using protocol v2, fetch-pack sometimes does not make
full use of the information obtained in the ref advertisement:
specifically, that if the server advertises a commit that the client
also has, the client never needs to inform the server that it has the
commit's parents, since it can just tell the server that it has the
advertised commit and it knows that the server can and will infer the
rest.

This is because, in do_fetch_pack_v2(), rev_list_insert_ref_oid() is
invoked before everything_local(). This means that if we have a commit
that is both our ref and their ref, it would be enqueued by
rev_list_insert_ref_oid() as SEEN, and since it is thus already SEEN,
everything_local() would not enqueue it.

If everything_local() were invoked first, as it is in do_fetch_pack()
for protocol v0, then everything_local() would enqueue it with
COMMON_REF | SEEN. The addition of COMMON_REF ensures that its parents
are not sent as "have" lines.

Change the order in do_fetch_pack_v2() to be consistent with
do_fetch_pack(), and to avoid sending unnecessary "have" lines.

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c  |  6 +++---
 t/t5500-fetch-pack.sh | 39 +++
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 09f5c83c4..114207b8e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1391,9 +1391,6 @@ static struct ref *do_fetch_pack_v2(struct 
fetch_pack_args *args,
for_each_ref(clear_marks, NULL);
marked = 1;
 
-   for_each_ref(rev_list_insert_ref_oid, NULL);
-   for_each_cached_alternate(insert_one_alternate_object);
-
/* Filter 'ref' by 'sought' and those that aren't local 
*/
mark_complete_and_common_ref(args, &ref);
filter_refs(args, &ref, sought, nr_sought);
@@ -1401,6 +1398,9 @@ static struct ref *do_fetch_pack_v2(struct 
fetch_pack_args *args,
state = FETCH_DONE;
else
state = FETCH_SEND_REQUEST;
+
+   for_each_ref(rev_list_insert_ref_oid, NULL);
+   for_each_cached_alternate(insert_one_alternate_object);
break;
case FETCH_SEND_REQUEST:
if (send_fetch_request(fd[1], args, ref, &common,
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index d4f435155..026ba9c9e 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -755,6 +755,45 @@ test_expect_success 'fetching deepen' '
)
 '
 
+test_expect_success 'use ref advertisement to prune "have" lines sent' '
+   rm -rf server client &&
+   git init server &&
+   test_commit -C server both_have_1 &&
+   git -C server tag -d both_have_1 &&
+   test_commit -C server both_have_2 &&
+
+   # In this test, the ref name that only the server has is a prefix of all
+   # other refs. This is because in protocol v2, the client sends
+   # "ref-prefix" to limit the ref advertisement. Naming the ref "bo" means
+   # that "ref-prefix refs/tags/bo*" is sent, resulting in the client also
+   # knowing about refs/tags/both_have_2, just as it would when it uses
+   # protocol v0.
+   git clone server client &&
+   test_commit -C server bo &&
+   test_commit -C client client_has &&
+
+   # In both protocol v0 and v2, ensure that the parent of both_have_2 is
+   # not sent as a "have" line. The client should know that the server has
+   # both_have_2, so it only needs to inform the server that it has
+   # both_have_2, and the server can infer the rest.
+
+   rm -f trace &&
+   cp -r client clientv0 &&
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C clientv0 \
+   fetch origin bo &&
+   grep "have $(git -C client rev-parse client_has)" trace &&
+   grep "have $(git -C client rev-parse both_have_2)" trace &&
+   ! grep "have $(git -C client rev-parse both_have_2^)" trace &&
+
+   rm -f trace &&
+   cp -r client clientv2 &&
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C clientv2 -c protocol.version=2 \
+   fetch origin bo &&
+   grep "have $(git -C client rev-parse client_has)" trace &&
+   grep "have $(git -C client rev-parse both_have_2)" trace &&
+   ! grep "have $(git -C client rev-parse both_have_2^)" trace
+'
+
 test_expect_success 'filtering by size' '
rm -rf server client &&
test_create_repo server &&
-- 
2.17.0.768.g1526ddbba1.dirty



[PATCH v2 2/8] fetch-pack: clear marks before re-marking

2018-06-06 Thread Jonathan Tan
If tag following is required when using a transport that does not
support tag following, fetch_pack() will be invoked twice in the same
process, necessitating a clearing of the object flags used by
fetch_pack() sometime during the second invocation. This is currently
done in find_common(), which means that the invocation of
mark_complete_and_common_ref() in do_fetch_pack() is useless.

(This cannot be reproduced with Git alone, because all transports that
come with Git support tag following.)

Therefore, move this clearing from find_common() to its
parent function do_fetch_pack(), right before it calls
mark_complete_and_common_ref().

This has been occurring since the commit that introduced the clearing of
marks, 420e9af498 ("Fix tag following", 2008-03-19).

The corresponding code for protocol v2 in do_fetch_pack_v2() does not
have this problem, as the clearing of flags is done before any marking
(whether by rev_list_insert_ref_oid() or
mark_complete_and_common_ref()).

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 5c87bb8bb..2812499a5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -336,9 +336,6 @@ static int find_common(struct fetch_pack_args *args,
 
if (args->stateless_rpc && multi_ack == 1)
die(_("--stateless-rpc requires multi_ack_detailed"));
-   if (marked)
-   for_each_ref(clear_marks, NULL);
-   marked = 1;
 
for_each_ref(rev_list_insert_ref_oid, NULL);
for_each_cached_alternate(insert_one_alternate_object);
@@ -1070,6 +1067,9 @@ static struct ref *do_fetch_pack(struct fetch_pack_args 
*args,
if (!server_supports("deepen-relative") && args->deepen_relative)
die(_("Server does not support --deepen"));
 
+   if (marked)
+   for_each_ref(clear_marks, NULL);
+   marked = 1;
mark_complete_and_common_ref(args, &ref);
filter_refs(args, &ref, sought, nr_sought);
if (everything_local(args, &ref)) {
-- 
2.17.0.768.g1526ddbba1.dirty



[PATCH v2 6/8] fetch-pack: move common check and marking together

2018-06-06 Thread Jonathan Tan
When receiving 'ACK  continue' for a common commit, check if
the commit was already known to be common and mark it as such if not up
front. This should make future refactoring of how the information about
common commits is stored more straightforward.

No visible change intended.

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index c31644bb9..4a4ec4da3 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -499,11 +499,14 @@ static int find_common(struct data *data, struct 
fetch_pack_args *args,
case ACK_continue: {
struct commit *commit =
lookup_commit(result_oid);
+   int was_common;
if (!commit)
die(_("invalid commit %s"), 
oid_to_hex(result_oid));
+   was_common = commit->object.flags & 
COMMON;
+   mark_common(data, commit, 0, 1);
if (args->stateless_rpc
 && ack == ACK_common
-&& !(commit->object.flags & COMMON)) {
+&& !was_common) {
/* We need to replay the have 
for this object
 * on the next RPC request so 
the peer knows
 * it is in common with us.
@@ -520,7 +523,6 @@ static int find_common(struct data *data, struct 
fetch_pack_args *args,
} else if (!args->stateless_rpc
   || ack != ACK_common)
in_vain = 0;
-   mark_common(data, commit, 0, 1);
retval = 0;
got_continue = 1;
if (ack == ACK_ready)
-- 
2.17.0.768.g1526ddbba1.dirty



Re: git rm bug

2018-06-06 Thread Robert P. J. Day
On Thu, 7 Jun 2018, Timothy Rice wrote:

> > It does seem like something which could be noted in the git
> > rm docs.  Perhaps you'd care to take a stab at a patch to
> > add a note to Documentation/git-rm.txt Thomas?  Maybe a note
> > at the end of the DISCUSSION section?
>
> That same documentation could mention a common workaround for when
> someone does really want to keep the empty directories:
>
> $ touch ./path/to/empty/dir/.keep
> $ git add ./path/to/empty/dir/.keep
> $ git commit -m "Keep that empty directory because it is needed for 
> "
>
> This would obviate the need for a new flag to switch behaviours.

  i'm going to be contrarian and obstinate and suggest that the
current behaviour is fine, since there is no compelling rationale for
any *other* behaviour.

  invariably, every defense for hanging on to empty directories boils
down to, "i might do something in the future that expects those
directories to exist." well, if that's the case, then create them when
you need them -- nothing you do should ever simply *assume* the
existence of essential directories.

  in addition, by "untracking" those directories, you're suggesting
that git quietly do what should normally be done by "git rm --cached".
if i want that behaviour, i would prefer to have to type it myself.

  i'm fine with just adding a note to "git rm" making it clear what
happens in this case. i see no value in supporting extra options that
simply encourage bad behaviour.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: git rm bug

2018-06-06 Thread Timothy Rice
> It does seem like something which could be noted in the git
> rm docs.  Perhaps you'd care to take a stab at a patch to
> add a note to Documentation/git-rm.txt Thomas?  Maybe a note
> at the end of the DISCUSSION section?

That same documentation could mention a common workaround for when someone
does really want to keep the empty directories:

$ touch ./path/to/empty/dir/.keep
$ git add ./path/to/empty/dir/.keep
$ git commit -m "Keep that empty directory because it is needed for "

This would obviate the need for a new flag to switch behaviours.

~ Tim



Re: git rm bug

2018-06-06 Thread Jeff King
On Wed, Jun 06, 2018 at 04:01:38PM -0400, Todd Zullinger wrote:

> Thomas Fischer wrote:
> > I agree that the entire chain of empty directories should not be tracked, 
> > as git tracks content, not files.
> > 
> > However, when I run 'rm path/to/some/file', I expect path/to/some/ to still 
> > exist.
> > 
> > Similarly, when I run 'git rm path/to/some/file', I expect path/to/some/ to 
> > exist, *albeit untracked*.
> > 
> > I do NOT expect git to *track* empty directories. But I also do NOT expect 
> > it to remove untracked directories.
> 
> It looks like this behavior has been in place for many
> years, since d9b814cc97 ("Add builtin "git rm" command",
> 2006-05-19).  Interestingly, Linus noted in the commit
> message that the removal of leading directories was
> different than when git-rm was a shell script.  And he
> wondered if it might be worth having an option to control
> that behavior.
> 
> I imagine that most users either want the current behavior
> or they rarely run across this and are surprised, given how
> long git rm has worked this way.

It's also consistent with other parts of Git that remove files. E.g.,
"git checkout" to a state that does not have the file will remove the
leading directories (if they're empty, of course).

> It does seem like something which could be noted in the git
> rm docs.  Perhaps you'd care to take a stab at a patch to
> add a note to Documentation/git-rm.txt Thomas?  Maybe a note
> at the end of the DISCUSSION section?

Yeah, agreed.

-Peff


Re: git rm bug

2018-06-06 Thread Todd Zullinger
Thomas Fischer wrote:
> I agree that the entire chain of empty directories should not be tracked, as 
> git tracks content, not files.
> 
> However, when I run 'rm path/to/some/file', I expect path/to/some/ to still 
> exist.
> 
> Similarly, when I run 'git rm path/to/some/file', I expect path/to/some/ to 
> exist, *albeit untracked*.
> 
> I do NOT expect git to *track* empty directories. But I also do NOT expect it 
> to remove untracked directories.

It looks like this behavior has been in place for many
years, since d9b814cc97 ("Add builtin "git rm" command",
2006-05-19).  Interestingly, Linus noted in the commit
message that the removal of leading directories was
different than when git-rm was a shell script.  And he
wondered if it might be worth having an option to control
that behavior.

I imagine that most users either want the current behavior
or they rarely run across this and are surprised, given how
long git rm has worked this way.

It does seem like something which could be noted in the git
rm docs.  Perhaps you'd care to take a stab at a patch to
add a note to Documentation/git-rm.txt Thomas?  Maybe a note
at the end of the DISCUSSION section?

-- 
Todd
~~
If everything seems to be going well, you have obviously overlooked
something.



Re: git rm bug

2018-06-06 Thread Duy Nguyen
On Wed, Jun 6, 2018 at 9:32 PM, Thomas Fischer
 wrote:
> OVERVIEW
>
> "git rm" will remove more files than specified. This is either a bug or 
> undocumented behavior (not in the man pages).

The behavior is intended, with a question mark. This change is
introduced in d9b814cc97 (Add builtin "git rm" command - 2006-05-19).
I quote the relevant paragraph from that commit

The other question is what to do with leading directories. The old "git
rm" script didn't do anything, which is somewhat inconsistent. This one
will actually clean up directories that have become empty as a result of
removing the last file, but maybe we want to have a flag to decide the
behaviour?

To me we definitely should document this (patches welcome!) then maybe
revisit this "have a flag to decide the behavior" question from 12
years ago.

> SETUP
>
> 1. In a git repository, create an empty directory OR a chain of empty 
> directories
>
> $ mkdir -p path/to/some/
>
> 2. Create a file in the deepest directory and add it to tracking
>
> $ touch path/to/some/file
> $ git add path/to/some/file
> $ git commit -m 'add path/to/some/file'
>
> THE BUG
>
> Run 'git rm' on the tracked file.
>
> EXPECTED BEHAVIOR
>
> $ git rm path/to/some/file
> rm 'path/to/some/file'
> $ ls path
> to/
> $ ls path/to
> some/
>
> Note that path/, path/to/, and path/to/some/ still exist.
>
> ACTUAL BEHAVIOR
>
> $ git rm path/to/some/file
> rm 'path/to/some/file'
> $ ls path
> ls: cannot access 'path': No such file or directory
>
> The entire chain of empty directories is removed, despite the fact the git 
> outputs only "rm 'path/to/some/file'".
>
> This ONLY occurs when all the directories in the chain are empty after the 
> tracked file has been removed.
>
> This behavior is NOT documented in the man pages.
>
> I propose that 'rmdir' statements are added to 'git rm' output, or that the 
> man pages be updated to reflect this behavior.
>
> Best,
> Thomas Fischer



-- 
Duy


Re: git rm bug

2018-06-06 Thread Robert P. J. Day
On Wed, 6 Jun 2018, Thomas Fischer wrote:

> I agree that the entire chain of empty directories should not be
> tracked, as git tracks content, not files.
>
> However, when I run 'rm path/to/some/file', I expect path/to/some/
> to still exist.

  why?

> Similarly, when I run 'git rm path/to/some/file', I expect
> path/to/some/ to exist, *albeit untracked*.

  again, why?

> I do NOT expect git to *track* empty directories. But I also do NOT
> expect it to remove untracked directories.

  why not?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: git rm bug

2018-06-06 Thread Thomas Fischer
I agree that the entire chain of empty directories should not be tracked, as 
git tracks content, not files.

However, when I run 'rm path/to/some/file', I expect path/to/some/ to still 
exist.

Similarly, when I run 'git rm path/to/some/file', I expect path/to/some/ to 
exist, *albeit untracked*.

I do NOT expect git to *track* empty directories. But I also do NOT expect it 
to remove untracked directories.

-- 
  Thomas Fischer
  thomasfisc...@fastmail.com

On Wed, Jun 6, 2018, at 2:33 PM, Robert P. J. Day wrote:
> On Wed, 6 Jun 2018, Thomas Fischer wrote:
> 
> > OVERVIEW
> >
> > "git rm" will remove more files than specified. This is either a bug or 
> > undocumented behavior (not in the man pages).
> >
> > SETUP
> >
> > 1. In a git repository, create an empty directory OR a chain of empty 
> > directories
> >
> > $ mkdir -p path/to/some/
> >
> > 2. Create a file in the deepest directory and add it to tracking
> >
> > $ touch path/to/some/file
> > $ git add path/to/some/file
> > $ git commit -m 'add path/to/some/file'
> >
> > THE BUG
> >
> > Run 'git rm' on the tracked file.
> >
> > EXPECTED BEHAVIOR
> >
> > $ git rm path/to/some/file
> > rm 'path/to/some/file'
> > $ ls path
> > to/
> > $ ls path/to
> > some/
> >
> > Note that path/, path/to/, and path/to/some/ still exist.
> >
> > ACTUAL BEHAVIOR
> >
> > $ git rm path/to/some/file
> > rm 'path/to/some/file'
> > $ ls path
> > ls: cannot access 'path': No such file or directory
> >
> > The entire chain of empty directories is removed, despite the fact
> > the git outputs only "rm 'path/to/some/file'".
> 
>   git cannot track empty directories. as that was the *only* content
> in that whole hierarchy, the entire hierarchy had to be deleted.
> 
> rday


Re: [PATCH 03/35] object: add repository argument to lookup_unknown_object

2018-06-06 Thread Duy Nguyen
On Wed, May 30, 2018 at 2:47 AM, Stefan Beller  wrote:
> diff --git a/object.c b/object.c
> index 4de4fa58d59..def3c71cac2 100644
> --- a/object.c
> +++ b/object.c
> @@ -177,7 +177,7 @@ void *object_as_type(struct object *obj, enum object_type 
> type, int quiet)
> }
>  }
>
> -struct object *lookup_unknown_object(const unsigned char *sha1)
> +struct object *lookup_unknown_object_the_repository(const unsigned char 
> *sha1)

I'm looking at your branch and this function (with the _the_repository
suffix) is still there. Did you forget to send a patch to convert this
function?

>  {
> struct object *obj = lookup_object(the_repository, sha1);
> if (!obj)
-- 
Duy


Re: git rm bug

2018-06-06 Thread Robert P. J. Day
On Wed, 6 Jun 2018, Thomas Fischer wrote:

> OVERVIEW
>
> "git rm" will remove more files than specified. This is either a bug or 
> undocumented behavior (not in the man pages).
>
> SETUP
>
> 1. In a git repository, create an empty directory OR a chain of empty 
> directories
>
> $ mkdir -p path/to/some/
>
> 2. Create a file in the deepest directory and add it to tracking
>
> $ touch path/to/some/file
> $ git add path/to/some/file
> $ git commit -m 'add path/to/some/file'
>
> THE BUG
>
> Run 'git rm' on the tracked file.
>
> EXPECTED BEHAVIOR
>
> $ git rm path/to/some/file
> rm 'path/to/some/file'
> $ ls path
> to/
> $ ls path/to
> some/
>
> Note that path/, path/to/, and path/to/some/ still exist.
>
> ACTUAL BEHAVIOR
>
> $ git rm path/to/some/file
> rm 'path/to/some/file'
> $ ls path
> ls: cannot access 'path': No such file or directory
>
> The entire chain of empty directories is removed, despite the fact
> the git outputs only "rm 'path/to/some/file'".

  git cannot track empty directories. as that was the *only* content
in that whole hierarchy, the entire hierarchy had to be deleted.

rday


Re: [PATCH 28/35] commit.c: migrate the commit buffer to the parsed object store

2018-06-06 Thread Duy Nguyen
On Wed, May 30, 2018 at 2:48 AM, Stefan Beller  wrote:
> Signed-off-by: Stefan Beller 
> ---
>  commit.c | 29 +++--
>  commit.h |  2 ++
>  object.c |  5 +
>  object.h |  2 ++
>  4 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index fd31c453fdc..45dbf6f1453 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -248,18 +248,32 @@ struct commit_buffer {
> unsigned long size;
>  };
>  define_commit_slab(buffer_slab, struct commit_buffer);

struct buffer_slab is defined locally here...

> diff --git a/commit.h b/commit.h
> index 536fa03955f..df199498bfb 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -81,6 +81,8 @@ static inline int parse_commit_the_repository(struct commit 
> *item)
>  }
>  void parse_commit_or_die(struct commit *item);
>
> +struct buffer_slab *allocate_commit_buffer_slab(void);

So you would need a forward declaration of struct buffer_slab in
commit.h before it's referenced here?

> diff --git a/object.h b/object.h
> index 6adc8323ca4..45e22282101 100644
> --- a/object.h
> +++ b/object.h
> @@ -22,6 +22,8 @@ struct parsed_object_pool {
> char *alternate_shallow_file;
>
> int commit_graft_prepared;
> +
> +   struct buffer_slab *buffer_slab;

and maybe here as well

>  };
>
>  struct parsed_object_pool *parsed_object_pool_new(void);
> --
> 2.17.0.582.gccdcbd54c44.dirty
>



-- 
Duy


git rm bug

2018-06-06 Thread Thomas Fischer
OVERVIEW

"git rm" will remove more files than specified. This is either a bug or 
undocumented behavior (not in the man pages).

SETUP

1. In a git repository, create an empty directory OR a chain of empty 
directories

$ mkdir -p path/to/some/

2. Create a file in the deepest directory and add it to tracking

$ touch path/to/some/file
$ git add path/to/some/file
$ git commit -m 'add path/to/some/file'

THE BUG

Run 'git rm' on the tracked file. 

EXPECTED BEHAVIOR

$ git rm path/to/some/file
rm 'path/to/some/file'
$ ls path
to/
$ ls path/to
some/

Note that path/, path/to/, and path/to/some/ still exist.

ACTUAL BEHAVIOR

$ git rm path/to/some/file
rm 'path/to/some/file'
$ ls path
ls: cannot access 'path': No such file or directory

The entire chain of empty directories is removed, despite the fact the git 
outputs only "rm 'path/to/some/file'".

This ONLY occurs when all the directories in the chain are empty after the 
tracked file has been removed.

This behavior is NOT documented in the man pages.

I propose that 'rmdir' statements are added to 'git rm' output, or that the man 
pages be updated to reflect this behavior.

Best,
Thomas Fischer


Re: [PATCH 07/35] tree: add repository argument to lookup_tree

2018-06-06 Thread Duy Nguyen
On Wed, May 30, 2018 at 2:47 AM, Stefan Beller  wrote:
> Add a repository argument to allow the callers of lookup_tree
> to be more specific about which repository to act on. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.
>
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.
>
> Add the cocci patch that converted the callers.

I don't see the cocci patch in diffstat. I don't need to see it, but
this sentence probably should be dropped.

>
> Signed-off-by: Jonathan Nieder 
> Signed-off-by: Stefan Beller 
> ---
>  builtin/am.c| 6 --
>  builtin/diff-tree.c | 2 +-
>  builtin/diff.c  | 3 ++-
>  builtin/reflog.c| 2 +-
>  cache-tree.c| 3 ++-
>  commit-graph.c  | 2 +-
>  commit.c| 2 +-
>  fsck.c  | 2 +-
>  http-push.c | 3 ++-
>  list-objects.c  | 2 +-
>  merge-recursive.c   | 6 +++---
>  object.c| 2 +-
>  reachable.c | 2 +-
>  revision.c  | 4 ++--
>  sequencer.c | 2 +-
>  tag.c   | 2 +-
>  tree.c  | 4 ++--
>  tree.h  | 3 ++-
>  walker.c| 3 ++-
>  19 files changed, 31 insertions(+), 24 deletions(-)
-- 
Duy


Re: [RFC PATCH 0/5] format-patch: automate cover letter range-diff

2018-06-06 Thread Duy Nguyen
On Wed, May 30, 2018 at 10:03 AM, Eric Sunshine  wrote:
> Dscho recently implemented a 'tbdiff' replacement as a Git builtin named
> git-branch-diff[1] which computes differences between two versions of a
> patch series. Such a diff can be a useful aid for reviewers when
> inserted into a cover letter. However, doing so requires manual
> generation (invoking git-branch-diff) and copy/pasting the result into
> the cover letter.

Another option which I wanted to go is delegate part of cover letter
generation to a hook (or just a config key that contains a shell
command). This way it's easier to customize cover letters. We could
still have a good fallback that does shortlog, diffstat and tbdiff.
-- 
Duy


Re: BUG: submodule code prints '(null)'

2018-06-06 Thread Kaartic Sivaraam
On Tuesday 05 June 2018 09:01 PM, Duy Nguyen wrote:
> I'll leave it to submodule people to fix this :)
> 

I'm Ccing the only one I know to gain attention.


-- 
Sivaraam

QUOTE:

“The three principal virtues of a programmer are Laziness, Impatience,
and Hubris.”

- Camel book

Sivaraam?

You possibly might have noticed that my signature recently changed from
'Kaartic' to 'Sivaraam' both of which are parts of my name. I find the
new signature to be better for several reasons one of which is that the
former signature has a lot of ambiguities in the place I live as it is a
common name (NOTE: it's not a common spelling, just a common name). So,
I switched signatures before it's too late.

That said, I won't mind you calling me 'Kaartic' if you like it [of
course ;-)]. You can always call me using either of the names.


KIND NOTE TO THE NATIVE ENGLISH SPEAKER:

As I'm not a native English speaker myself, there might be mistaeks in
my usage of English. I apologise for any mistakes that I make.

It would be "helpful" if you take the time to point out the mistakes.

It would be "super helpful" if you could provide suggestions about how
to correct those mistakes.

Thanks in advance!



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] config.c: fix regression for core.safecrlf false

2018-06-06 Thread Eric Sunshine
On Wed, Jun 6, 2018 at 1:18 PM, Anthony Sottile  wrote:
> On Wed, Jun 6, 2018 at 10:17 AM, Eric Sunshine  
> wrote:
>> On Wed, Jun 6, 2018 at 1:15 PM, Eric Sunshine  
>> wrote:
>>> On Mon, Jun 4, 2018 at 4:17 PM, Anthony Sottile  wrote:
 +   for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&
>>>
>>> Simpler: printf "%s\n" I am all CRLF | append_cr >allcrlf &&
>>
>> Or even simpler:
>>
>> printf "%s\r\n" I am all CRLF >allcrlf &&
>
> Yeah, I just copied the line in my test from another test in this file
> which was doing a ~similar thing. [...]

Thanks for pointing that out. In that case, it's following existing
practice, thus certainly not worth a re-roll.


Re: [PATCH] config.c: fix regression for core.safecrlf false

2018-06-06 Thread Anthony Sottile
On Wed, Jun 6, 2018 at 10:17 AM, Eric Sunshine  wrote:
> On Wed, Jun 6, 2018 at 1:15 PM, Eric Sunshine  wrote:
>> On Mon, Jun 4, 2018 at 4:17 PM, Anthony Sottile  wrote:
>>> +   for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&
>>
>> Simpler: printf "%s\n" I am all CRLF | append_cr >allcrlf &&
>
> Or even simpler:
>
> printf "%s\r\n" I am all CRLF >allcrlf &&

Yeah, I just copied the line in my test from another test in this file
which was doing a ~similar thing.  My original bug report actually
uses `echo -en ...` to accomplish the same thing.


Re: [PATCH] config.c: fix regression for core.safecrlf false

2018-06-06 Thread Eric Sunshine
On Wed, Jun 6, 2018 at 1:15 PM, Eric Sunshine  wrote:
> On Mon, Jun 4, 2018 at 4:17 PM, Anthony Sottile  wrote:
>> +   for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&
>
> Simpler: printf "%s\n" I am all CRLF | append_cr >allcrlf &&

Or even simpler:

printf "%s\r\n" I am all CRLF >allcrlf &&


Re: [PATCH] config.c: fix regression for core.safecrlf false

2018-06-06 Thread Eric Sunshine
On Mon, Jun 4, 2018 at 4:17 PM, Anthony Sottile  wrote:
> A regression introduced in 8462ff43e42ab67cecd16fdfb59451a53cc8a945 caused
> autocrlf rewrites to produce a warning message despite setting safecrlf=false.
>
> Signed-off-by: Anthony Sottile 
> ---
> diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
> @@ -98,6 +98,16 @@ test_expect_success 'safecrlf: git diff demotes 
> safecrlf=true to warn' '
> +test_expect_success 'safecrlf: no warning with safecrlf=false' '
> +   git config core.autocrlf input &&
> +   git config core.safecrlf false &&

I was going to suggest test_config() for these rather than bare
git-config, but I see other tests in this file already use the bare
form, so this is following existing practice.

> +   for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&

Simpler: printf "%s\n" I am all CRLF | append_cr >allcrlf &&

(probably not worth a re-roll)

> +   git add allcrlf 2>err &&
> +   test_must_be_empty err
> +'


Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Todd Zullinger
g...@jeffhostetler.com wrote:
> +# As a sanity check, ask Python to parse our generated JSON.  Let Python
> +# recursively dump the resulting dictionary in sorted order.  Confirm that
> +# that matches our expectations.
> +test_expect_success PYTHON 'parse JSON using Python' '
[...]
> + python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual &&

Would this be better using $PYTHON_PATH rather than
hard-coding python as the command?

-- 
Todd
~~
Sometimes I think I understand everything, then I regain
consciousness.



Re: [PATCH v3 17/20] cache.c: remove an implicit dependency on the_index

2018-06-06 Thread Brandon Williams
On 06/06, Nguyễn Thái Ngọc Duy wrote:
> Make some index API take an index_state instead of assuming the_index
> in read-cache.c. All external call sites are converted blindly to keep
> the patch simple and retain current behavior.  Individual call sites
> may receive further updates to use the right index instead of the_index.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  apply.c|  2 +-
>  builtin/checkout.c |  2 +-
>  builtin/difftool.c |  4 ++--
>  builtin/reset.c|  2 +-
>  cache.h|  5 +++--
>  merge-recursive.c  |  2 +-
>  read-cache.c   | 19 +++
>  resolve-undo.c |  2 +-
>  8 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/apply.c b/apply.c
> index 9720855590..811ff2ad5e 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -4090,7 +4090,7 @@ static int build_fake_ancestor(struct apply_state 
> *state, struct patch *list)
>   return error(_("sha1 information is lacking or useless "
>  "(%s)."), name);
>  
> - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0);
> + ce = make_cache_entry(&the_index, patch->old_mode, oid.hash, 
> name, 0, 0);
>   if (!ce)
>   return error(_("make_cache_entry failed for path '%s'"),
>name);
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index f8c208cea1..3c8218304e 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -230,7 +230,7 @@ static int checkout_merged(int pos, const struct checkout 
> *state)
>   if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
>   die(_("Unable to add merge result for '%s'"), path);
>   free(result_buf.ptr);
> - ce = make_cache_entry(mode, oid.hash, path, 2, 0);
> + ce = make_cache_entry(&the_index, mode, oid.hash, path, 2, 0);
>   if (!ce)
>   die(_("make_cache_entry failed for path '%s'"), path);
>   status = checkout_entry(ce, state, NULL);
> diff --git a/builtin/difftool.c b/builtin/difftool.c
> index bc97d4aef2..e34e75a42d 100644
> --- a/builtin/difftool.c
> +++ b/builtin/difftool.c
> @@ -321,7 +321,7 @@ static int checkout_path(unsigned mode, struct object_id 
> *oid,
>   struct cache_entry *ce;
>   int ret;
>  
> - ce = make_cache_entry(mode, oid->hash, path, 0, 0);
> + ce = make_cache_entry(&the_index, mode, oid->hash, path, 0, 0);
>   ret = checkout_entry(ce, state, NULL);
>  
>   free(ce);
> @@ -488,7 +488,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, 
> const char *prefix,
>* index.
>*/
>   struct cache_entry *ce2 =
> - make_cache_entry(rmode, roid.hash,
> + make_cache_entry(&the_index, rmode, 
> roid.hash,
>dst_path, 0, 0);
>  
>   add_index_entry(&wtindex, ce2,
> diff --git a/builtin/reset.c b/builtin/reset.c
> index a862c70fab..0ea0a19d5e 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -134,7 +134,7 @@ static void update_index_from_diff(struct 
> diff_queue_struct *q,
>   continue;
>   }
>  
> - ce = make_cache_entry(one->mode, one->oid.hash, one->path,
> + ce = make_cache_entry(&the_index, one->mode, one->oid.hash, 
> one->path,
> 0, 0);
>   if (!ce)
>   die(_("make_cache_entry failed for path '%s'"),
> diff --git a/cache.h b/cache.h
> index 89a107a7f7..5939233eb7 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -355,6 +355,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define unmerged_cache() unmerged_index(&the_index)
>  #define cache_name_pos(name, namelen) 
> index_name_pos(&the_index,(name),(namelen))
>  #define add_cache_entry(ce, option) add_index_entry(&the_index, (ce), 
> (option))
> +#define refresh_cache_entry(ce, flags) refresh_index_entry(&the_index, ce, 
> flags)
>  #define rename_cache_entry_at(pos, new_name) 
> rename_index_entry_at(&the_index, (pos), (new_name))
>  #define remove_cache_entry_at(pos) remove_index_entry_at(&the_index, (pos))
>  #define remove_file_from_cache(path) remove_file_from_index(&the_index, 
> (path))
> @@ -698,7 +699,7 @@ extern int remove_file_from_index(struct index_state *, 
> const char *path);
>  extern int add_to_index(struct index_state *, const char *path, struct stat 
> *, int flags);
>  extern int add_file_to_index(struct index_state *, const char *path, int 
> flags);
>  
> -extern struct cache_entry *make_cache_entry(unsigned int mode, const 
> unsigned char *sha1, const char *path, int stage, unsigned int 
> refresh_options);
> +extern struct cache_entry *make_cache_entry(struct index_state 
> *istate,unsigned int mode, const unsigned c

[PATCH v4 07/23] attr: remove an implicit dependency on the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.

There is one ugly temporary workaround added in attr.c that needs some
more explanation.

Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.

The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 archive.c  |  2 +-
 attr.c | 57 --
 attr.h | 10 +---
 builtin/check-attr.c   |  4 +--
 builtin/pack-objects.c |  2 +-
 convert.c  |  2 +-
 dir.c  |  2 +-
 ll-merge.c |  4 +--
 userdiff.c |  2 +-
 ws.c   |  2 +-
 10 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/archive.c b/archive.c
index 4fe7bec60c..3b4db8956a 100644
--- a/archive.c
+++ b/archive.c
@@ -108,7 +108,7 @@ static const struct attr_check *get_archive_attrs(const 
char *path)
static struct attr_check *check;
if (!check)
check = attr_check_initl("export-ignore", "export-subst", NULL);
-   return git_check_attr(path, check) ? NULL : check;
+   return git_check_attr(&the_index, path, check) ? NULL : check;
 }
 
 static int check_attr_export_ignore(const struct attr_check *check)
diff --git a/attr.c b/attr.c
index 067fb9e0c0..863fad3bd1 100644
--- a/attr.c
+++ b/attr.c
@@ -708,10 +708,10 @@ static struct attr_stack *read_attr_from_array(const char 
**list)
  * another thread could potentially be calling into the attribute system.
  */
 static enum git_attr_direction direction;
-static struct index_state *use_index;
+static const struct index_state *use_index;
 
 void git_attr_set_direction(enum git_attr_direction new_direction,
-   struct index_state *istate)
+   const struct index_state *istate)
 {
if (is_bare_repository() && new_direction != GIT_ATTR_INDEX)
BUG("non-INDEX attr direction in a bare repo");
@@ -743,13 +743,24 @@ static struct attr_stack *read_attr_from_file(const char 
*path, int macro_ok)
return res;
 }
 
-static struct attr_stack *read_attr_from_index(const char *path, int macro_ok)
+static struct attr_stack *read_attr_from_index(const struct index_state 
*istate,
+  const char *path,
+  int macro_ok)
 {
struct attr_stack *res;
char *buf, *sp;
int lineno = 0;
+   const struct index_state *to_read_from;
 
-   buf = read_blob_data_from_index(use_index ? use_index : &the_index, 
path, NULL);
+   /*
+* Temporary workaround for c24f3abace (apply: file commited
+* with CRLF should roundtrip diff and apply - 2017-08-19)
+*/
+   to_read_from = use_index ? use_index : istate;
+   if (!to_read_from)
+   return NULL;
+
+   buf = read_blob_data_from_index(to_read_from, path, NULL);
if (!buf)
return NULL;
 
@@ -768,15 +779,16 @@ static struct attr_stack *read_attr_from_index(const char 
*path, int macro_ok)
return res;
 }
 
-static struct attr_stack *read_attr(const char *path, int macro_ok)
+static struct attr_stack *read_attr(const struct index_state *istate,
+   const char *path, int macro_ok)
 {
struct attr_stack *res = NULL;
 
if (direction == GIT_ATTR_INDEX) {
-   res = read_attr_from_index(path, macro_ok);
+   res = read_attr_from_index(istate, path, macro_ok);
} else if (!is_bare_repository()) {
if (direction == GIT_ATTR_CHECKOUT) {
-   res = read_attr_from_index(path, macro_ok);
+   res = read_attr_from_index(istate, path, macro_ok);
if (!res)
res = read_attr_from_file(path, macro_ok);
} else if (direction == GIT_ATTR_CHECKIN) {
@@ -788,7 +800,7 @@ static struct attr_stack *read_attr(const char *path, int 
macro_ok)
 * We allow operation in a sparsely checked out

[PATCH v4 18/23] apply.c: use the right index instead of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 apply.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index fc42a0eadf..82f681972f 100644
--- a/apply.c
+++ b/apply.c
@@ -4090,7 +4090,7 @@ static int build_fake_ancestor(struct apply_state *state, 
struct patch *list)
return error(_("sha1 information is lacking or useless "
   "(%s)."), name);
 
-   ce = make_index_entry(&the_index, patch->old_mode, oid.hash, 
name, 0, 0);
+   ce = make_index_entry(&result, patch->old_mode, oid.hash, name, 
0, 0);
if (!ce)
return error(_("make_index_entry failed for path '%s'"),
 name);
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 15/23] attr: remove index from git_attr_set_direction()

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Since attr checking API now take the index, there's no need to set an
index in advance with this call. Most call sites are straightforward
because they either pass the_index or NULL (which defaults back to
the_index previously). There's only one suspicious call site in
unpack-trees.c where it sets a different index.

This code in unpack-trees is about to checking out entries from the
new/temporary index after merging is done in it. The attributes will
be used by entry.c code to do crlf conversion if needed. entry.c now
respects struct checkout's istate field, and this field is correctly
set in unpack-trees.c, there should be no regression from this change.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 archive.c|  2 +-
 attr.c   | 15 +++
 attr.h   |  3 +--
 builtin/check-attr.c |  2 +-
 unpack-trees.c   |  4 ++--
 5 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/archive.c b/archive.c
index 1b44503ebb..d1d0a0d6b3 100644
--- a/archive.c
+++ b/archive.c
@@ -273,7 +273,7 @@ int write_archive_entries(struct archiver_args *args,
init_tree_desc(&t, args->tree->buffer, args->tree->size);
if (unpack_trees(1, &t, &opts))
return -1;
-   git_attr_set_direction(GIT_ATTR_INDEX, &the_index);
+   git_attr_set_direction(GIT_ATTR_INDEX);
}
 
err = read_tree_recursive(args->tree, "", 0, 0, &args->pathspec,
diff --git a/attr.c b/attr.c
index 863fad3bd1..98e4953f6e 100644
--- a/attr.c
+++ b/attr.c
@@ -708,10 +708,8 @@ static struct attr_stack *read_attr_from_array(const char 
**list)
  * another thread could potentially be calling into the attribute system.
  */
 static enum git_attr_direction direction;
-static const struct index_state *use_index;
 
-void git_attr_set_direction(enum git_attr_direction new_direction,
-   const struct index_state *istate)
+void git_attr_set_direction(enum git_attr_direction new_direction)
 {
if (is_bare_repository() && new_direction != GIT_ATTR_INDEX)
BUG("non-INDEX attr direction in a bare repo");
@@ -720,7 +718,6 @@ void git_attr_set_direction(enum git_attr_direction 
new_direction,
drop_all_attr_stacks();
 
direction = new_direction;
-   use_index = istate;
 }
 
 static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
@@ -750,17 +747,11 @@ static struct attr_stack *read_attr_from_index(const 
struct index_state *istate,
struct attr_stack *res;
char *buf, *sp;
int lineno = 0;
-   const struct index_state *to_read_from;
 
-   /*
-* Temporary workaround for c24f3abace (apply: file commited
-* with CRLF should roundtrip diff and apply - 2017-08-19)
-*/
-   to_read_from = use_index ? use_index : istate;
-   if (!to_read_from)
+   if (!istate)
return NULL;
 
-   buf = read_blob_data_from_index(to_read_from, path, NULL);
+   buf = read_blob_data_from_index(istate, path, NULL);
if (!buf)
return NULL;
 
diff --git a/attr.h b/attr.h
index 3daca3c0cb..01dab4a126 100644
--- a/attr.h
+++ b/attr.h
@@ -77,8 +77,7 @@ enum git_attr_direction {
GIT_ATTR_CHECKOUT,
GIT_ATTR_INDEX
 };
-void git_attr_set_direction(enum git_attr_direction new_direction,
-   const struct index_state *istate);
+void git_attr_set_direction(enum git_attr_direction new_direction);
 
 void attr_start(void);
 
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index f7b59993d3..c05573ff9c 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -120,7 +120,7 @@ int cmd_check_attr(int argc, const char **argv, const char 
*prefix)
}
 
if (cached_attrs)
-   git_attr_set_direction(GIT_ATTR_INDEX, NULL);
+   git_attr_set_direction(GIT_ATTR_INDEX);
 
doubledash = -1;
for (i = 0; doubledash < 0 && i < argc; i++) {
diff --git a/unpack-trees.c b/unpack-trees.c
index 3ace82ca27..8cb407173e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -361,7 +361,7 @@ static int check_updates(struct unpack_trees_options *o)
progress = get_progress(o);
 
if (o->update)
-   git_attr_set_direction(GIT_ATTR_CHECKOUT, index);
+   git_attr_set_direction(GIT_ATTR_CHECKOUT);
 
if (should_update_submodules() && o->update && !o->dry_run)
load_gitmodules_file(index, NULL);
@@ -421,7 +421,7 @@ static int check_updates(struct unpack_trees_options *o)
stop_progress(&progress);
errs |= finish_delayed_checkout(&state);
if (o->update)
-   git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+   git_attr_set_direction(GIT_ATTR_CHECKIN);
return errs != 0;
 }
 
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 17/23] read-cache.c: remove an implicit dependency on the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Make some index API take an index_state instead of assuming the_index
in read-cache.c. All external call sites are converted blindly to keep
the patch simple and retain current behavior.  Individual call sites
may receive further updates to use the right index instead of the_index.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 apply.c|  4 ++--
 builtin/checkout.c |  4 ++--
 builtin/difftool.c |  4 ++--
 builtin/reset.c|  4 ++--
 cache.h|  5 +++--
 merge-recursive.c  |  2 +-
 read-cache.c   | 19 +++
 resolve-undo.c |  2 +-
 8 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/apply.c b/apply.c
index 9720855590..fc42a0eadf 100644
--- a/apply.c
+++ b/apply.c
@@ -4090,9 +4090,9 @@ static int build_fake_ancestor(struct apply_state *state, 
struct patch *list)
return error(_("sha1 information is lacking or useless "
   "(%s)."), name);
 
-   ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0);
+   ce = make_index_entry(&the_index, patch->old_mode, oid.hash, 
name, 0, 0);
if (!ce)
-   return error(_("make_cache_entry failed for path '%s'"),
+   return error(_("make_index_entry failed for path '%s'"),
 name);
if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) {
free(ce);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f8c208cea1..d2257e0d82 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -230,9 +230,9 @@ static int checkout_merged(int pos, const struct checkout 
*state)
if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
die(_("Unable to add merge result for '%s'"), path);
free(result_buf.ptr);
-   ce = make_cache_entry(mode, oid.hash, path, 2, 0);
+   ce = make_index_entry(&the_index, mode, oid.hash, path, 2, 0);
if (!ce)
-   die(_("make_cache_entry failed for path '%s'"), path);
+   die(_("make_index_entry failed for path '%s'"), path);
status = checkout_entry(ce, state, NULL);
free(ce);
return status;
diff --git a/builtin/difftool.c b/builtin/difftool.c
index bc97d4aef2..fb2ccfe6f0 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -321,7 +321,7 @@ static int checkout_path(unsigned mode, struct object_id 
*oid,
struct cache_entry *ce;
int ret;
 
-   ce = make_cache_entry(mode, oid->hash, path, 0, 0);
+   ce = make_index_entry(&the_index, mode, oid->hash, path, 0, 0);
ret = checkout_entry(ce, state, NULL);
 
free(ce);
@@ -488,7 +488,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, 
const char *prefix,
 * index.
 */
struct cache_entry *ce2 =
-   make_cache_entry(rmode, roid.hash,
+   make_index_entry(&the_index, rmode, 
roid.hash,
 dst_path, 0, 0);
 
add_index_entry(&wtindex, ce2,
diff --git a/builtin/reset.c b/builtin/reset.c
index a862c70fab..067f535031 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -134,10 +134,10 @@ static void update_index_from_diff(struct 
diff_queue_struct *q,
continue;
}
 
-   ce = make_cache_entry(one->mode, one->oid.hash, one->path,
+   ce = make_index_entry(&the_index, one->mode, one->oid.hash, 
one->path,
  0, 0);
if (!ce)
-   die(_("make_cache_entry failed for path '%s'"),
+   die(_("make_index_entry failed for path '%s'"),
one->path);
if (is_missing) {
ce->ce_flags |= CE_INTENT_TO_ADD;
diff --git a/cache.h b/cache.h
index 89a107a7f7..64abd3e55c 100644
--- a/cache.h
+++ b/cache.h
@@ -355,6 +355,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerged_cache() unmerged_index(&the_index)
 #define cache_name_pos(name, namelen) 
index_name_pos(&the_index,(name),(namelen))
 #define add_cache_entry(ce, option) add_index_entry(&the_index, (ce), (option))
+#define refresh_cache_entry(ce, flags) refresh_index_entry(&the_index, ce, 
flags)
 #define rename_cache_entry_at(pos, new_name) rename_index_entry_at(&the_index, 
(pos), (new_name))
 #define remove_cache_entry_at(pos) remove_index_entry_at(&the_index, (pos))
 #define remove_file_from_cache(path) remove_file_from_index(&the_index, (path))
@@ -698,7 +699,7 @@ extern int remove_file_from_index(struct index_state *, 
const char *path);
 extern int add_to_index(struct index_state *, const char *path, struct stat *, 
int flags);
 extern int add_file_to_index(stru

[PATCH v4 21/23] resolve-undo.c: use the right index instead of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 resolve-undo.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/resolve-undo.c b/resolve-undo.c
index 383231b011..2377995d6d 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -1,3 +1,4 @@
+#define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
 #include "dir.h"
 #include "resolve-undo.h"
@@ -146,7 +147,7 @@ int unmerge_index_entry_at(struct index_state *istate, int 
pos)
struct cache_entry *nce;
if (!ru->mode[i])
continue;
-   nce = make_index_entry(&the_index, ru->mode[i], ru->oid[i].hash,
+   nce = make_index_entry(istate, ru->mode[i], ru->oid[i].hash,
   name, i + 1, 0);
if (matched)
nce->ce_flags |= CE_MATCHED;
@@ -186,7 +187,7 @@ void unmerge_index(struct index_state *istate, const struct 
pathspec *pathspec)
 
for (i = 0; i < istate->cache_nr; i++) {
const struct cache_entry *ce = istate->cache[i];
-   if (!ce_path_match(&the_index, ce, pathspec, NULL))
+   if (!ce_path_match(istate, ce, pathspec, NULL))
continue;
i = unmerge_index_entry_at(istate, i);
}
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 12/23] pathspec.c: use the right index instead of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 pathspec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pathspec.c b/pathspec.c
index 897cb9cbbe..6f005996fd 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -37,7 +37,7 @@ void add_pathspec_matches_against_index(const struct pathspec 
*pathspec,
return;
for (i = 0; i < istate->cache_nr; i++) {
const struct cache_entry *ce = istate->cache[i];
-   ce_path_match(&the_index, ce, pathspec, seen);
+   ce_path_match(istate, ce, pathspec, seen);
}
 }
 
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 19/23] difftool: use the right index instead of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/difftool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index fb2ccfe6f0..c7d6296762 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -321,7 +321,7 @@ static int checkout_path(unsigned mode, struct object_id 
*oid,
struct cache_entry *ce;
int ret;
 
-   ce = make_index_entry(&the_index, mode, oid->hash, path, 0, 0);
+   ce = make_index_entry(state->istate, mode, oid->hash, path, 0, 0);
ret = checkout_entry(ce, state, NULL);
 
free(ce);
@@ -488,7 +488,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, 
const char *prefix,
 * index.
 */
struct cache_entry *ce2 =
-   make_index_entry(&the_index, rmode, 
roid.hash,
+   make_index_entry(&wtindex, rmode, 
roid.hash,
 dst_path, 0, 0);
 
add_index_entry(&wtindex, ce2,
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 22/23] grep: use the right index instead of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/grep.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2eae397e92..a8cef2b159 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -497,7 +497,7 @@ static int grep_cache(struct grep_opt *opt, struct 
repository *repo,
strbuf_addstr(&name, ce->name);
 
if (S_ISREG(ce->ce_mode) &&
-   match_pathspec(&the_index, pathspec, name.buf, name.len, 0, 
NULL,
+   match_pathspec(repo->index, pathspec, name.buf, name.len, 
0, NULL,
   S_ISDIR(ce->ce_mode) ||
   S_ISGITLINK(ce->ce_mode))) {
/*
@@ -515,7 +515,7 @@ static int grep_cache(struct grep_opt *opt, struct 
repository *repo,
hit |= grep_file(opt, name.buf);
}
} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
-  submodule_path_match(&the_index, pathspec, name.buf, 
NULL)) {
+  submodule_path_match(repo->index, pathspec, 
name.buf, NULL)) {
hit |= grep_submodule(opt, repo, pathspec, NULL, 
ce->name, ce->name);
} else {
continue;
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 09/23] convert.c: remove an implicit dependency on the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Make the convert API take an index_state instead of assuming the_index
in convert.c. All external call sites are converted blindly to keep
the patch simple and retain current behavior. Individual call sites
may receive further updates to use the right index instead of
the_index.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 apply.c|  2 +-
 archive.c  |  2 +-
 builtin/cat-file.c |  2 +-
 builtin/ls-files.c |  2 +-
 convert.c  | 41 -
 convert.h  | 15 ++-
 diff.c |  2 +-
 entry.c|  6 +++---
 merge-recursive.c  |  2 +-
 sha1-file.c|  4 ++--
 10 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/apply.c b/apply.c
index d79e61591b..9720855590 100644
--- a/apply.c
+++ b/apply.c
@@ -4333,7 +4333,7 @@ static int try_create_file(const char *path, unsigned int 
mode, const char *buf,
if (fd < 0)
return 1;
 
-   if (convert_to_working_tree(path, buf, size, &nbuf)) {
+   if (convert_to_working_tree(&the_index, path, buf, size, &nbuf)) {
size = nbuf.len;
buf  = nbuf.buf;
}
diff --git a/archive.c b/archive.c
index 3b4db8956a..f3631a4fb5 100644
--- a/archive.c
+++ b/archive.c
@@ -78,7 +78,7 @@ void *object_file_to_archive(const struct archiver_args *args,
size_t size = 0;
 
strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
-   convert_to_working_tree(path, buf.buf, buf.len, &buf);
+   convert_to_working_tree(&the_index, path, buf.buf, buf.len, 
&buf);
if (commit)
format_subst(commit, buf.buf, buf.len, &buf);
buffer = strbuf_detach(&buf, &size);
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 665b581949..a615a0048b 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -38,7 +38,7 @@ static int filter_object(const char *path, unsigned mode,
 oid_to_hex(oid), path);
if ((type == OBJ_BLOB) && S_ISREG(mode)) {
struct strbuf strbuf = STRBUF_INIT;
-   if (convert_to_working_tree(path, *buf, *size, &strbuf)) {
+   if (convert_to_working_tree(&the_index, path, *buf, *size, 
&strbuf)) {
free(*buf);
*size = strbuf.len;
*buf = strbuf_detach(&strbuf, NULL);
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 88bb2019ad..d996734b45 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -63,7 +63,7 @@ static void write_eolinfo(const struct index_state *istate,
struct stat st;
const char *i_txt = "";
const char *w_txt = "";
-   const char *a_txt = get_convert_attr_ascii(path);
+   const char *a_txt = get_convert_attr_ascii(&the_index, path);
if (ce && S_ISREG(ce->ce_mode))
i_txt = get_cached_convert_stats_ascii(istate,
   ce->name);
diff --git a/convert.c b/convert.c
index 9d5dc32564..2b1a87a104 100644
--- a/convert.c
+++ b/convert.c
@@ -1290,7 +1290,8 @@ struct conv_attrs {
const char *working_tree_encoding; /* Supported encoding or default 
encoding if NULL */
 };
 
-static void convert_attrs(struct conv_attrs *ca, const char *path)
+static void convert_attrs(const struct index_state *istate,
+ struct conv_attrs *ca, const char *path)
 {
static struct attr_check *check;
 
@@ -1302,7 +1303,7 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
git_config(read_convert_config, NULL);
}
 
-   if (!git_check_attr(&the_index, path, check)) {
+   if (!git_check_attr(istate, path, check)) {
struct attr_check_item *ccheck = check->items;
ca->crlf_action = git_path_check_crlf(ccheck + 4);
if (ca->crlf_action == CRLF_UNDEFINED)
@@ -1339,11 +1340,11 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
ca->crlf_action = CRLF_AUTO_INPUT;
 }
 
-int would_convert_to_git_filter_fd(const char *path)
+int would_convert_to_git_filter_fd(const struct index_state *istate, const 
char *path)
 {
struct conv_attrs ca;
 
-   convert_attrs(&ca, path);
+   convert_attrs(istate, &ca, path);
if (!ca.drv)
return 0;
 
@@ -1358,11 +1359,11 @@ int would_convert_to_git_filter_fd(const char *path)
return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL);
 }
 
-const char *get_convert_attr_ascii(const char *path)
+const char *get_convert_attr_ascii(const struct index_state *istate, const 
char *path)
 {
struct conv_attrs ca;
 
-   convert_attrs(&ca, path);
+   convert_attrs(istate, &ca, path);
switch (ca.attr_action) {
case CRLF_UNDEFINED:
return "

[PATCH v4 11/23] ls-files: correct index argument to get_convert_attr_ascii()

2018-06-06 Thread Nguyễn Thái Ngọc Duy
write_eolinfo() does take an istate as function argument and it should
be used instead of the_index.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/ls-files.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7233b92794..7f9919a362 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -63,7 +63,7 @@ static void write_eolinfo(const struct index_state *istate,
struct stat st;
const char *i_txt = "";
const char *w_txt = "";
-   const char *a_txt = get_convert_attr_ascii(&the_index, path);
+   const char *a_txt = get_convert_attr_ascii(istate, path);
if (ce && S_ISREG(ce->ce_mode))
i_txt = get_cached_convert_stats_ascii(istate,
   ce->name);
@@ -121,18 +121,19 @@ static void print_debug(const struct cache_entry *ce)
}
 }
 
-static void show_dir_entry(const char *tag, struct dir_entry *ent)
+static void show_dir_entry(const struct index_state *istate,
+  const char *tag, struct dir_entry *ent)
 {
int len = max_prefix_len;
 
if (len > ent->len)
die("git ls-files: internal error - directory entry not 
superset of prefix");
 
-   if (!dir_path_match(&the_index, ent, &pathspec, len, ps_matched))
+   if (!dir_path_match(istate, ent, &pathspec, len, ps_matched))
return;
 
fputs(tag, stdout);
-   write_eolinfo(NULL, NULL, ent->name);
+   write_eolinfo(istate, NULL, ent->name);
write_name(ent->name);
 }
 
@@ -145,7 +146,7 @@ static void show_other_files(const struct index_state 
*istate,
struct dir_entry *ent = dir->entries[i];
if (!index_name_is_other(istate, ent->name, ent->len))
continue;
-   show_dir_entry(tag_other, ent);
+   show_dir_entry(istate, tag_other, ent);
}
 }
 
@@ -196,7 +197,7 @@ static void show_killed_files(const struct index_state 
*istate,
}
}
if (killed)
-   show_dir_entry(tag_killed, dir->entries[i]);
+   show_dir_entry(istate, tag_killed, dir->entries[i]);
}
 }
 
@@ -228,7 +229,7 @@ static void show_ce(struct repository *repo, struct 
dir_struct *dir,
if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
is_submodule_active(repo, ce->name)) {
show_submodule(repo, dir, ce->name);
-   } else if (match_pathspec(&the_index, &pathspec, fullname, 
strlen(fullname),
+   } else if (match_pathspec(repo->index, &pathspec, fullname, 
strlen(fullname),
  max_prefix_len, ps_matched,
  S_ISDIR(ce->ce_mode) ||
  S_ISGITLINK(ce->ce_mode))) {
@@ -264,7 +265,7 @@ static void show_ru_info(const struct index_state *istate)
len = strlen(path);
if (len < max_prefix_len)
continue; /* outside of the prefix */
-   if (!match_pathspec(&the_index, &pathspec, path, len,
+   if (!match_pathspec(istate, &pathspec, path, len,
max_prefix_len, ps_matched, 0))
continue; /* uninterested */
for (i = 0; i < 3; i++) {
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 14/23] entry.c: use the right index instead of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
checkout-index.c needs update because if checkout->istate is NULL,
ie_match_stat() will crash. Previously this is ie_match_stat(&the_index, ..)
so it will not crash, but it is not technically correct either.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/checkout-index.c |  1 +
 entry.c  | 10 ++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index a730f6a1aa..d92db62fbd 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -190,6 +190,7 @@ int cmd_checkout_index(int argc, const char **argv, const 
char *prefix)
 
argc = parse_options(argc, argv, prefix, builtin_checkout_index_options,
builtin_checkout_index_usage, 0);
+   state.istate = &the_index;
state.force = force;
state.quiet = quiet;
state.not_new = not_new;
diff --git a/entry.c b/entry.c
index 12d9191051..02752f7b53 100644
--- a/entry.c
+++ b/entry.c
@@ -1,3 +1,4 @@
+#define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
 #include "blob.h"
 #include "dir.h"
@@ -265,7 +266,7 @@ static int write_entry(struct cache_entry *ce,
const struct submodule *sub;
 
if (ce_mode_s_ifmt == S_IFREG) {
-   struct stream_filter *filter = get_stream_filter(&the_index, 
ce->name,
+   struct stream_filter *filter = get_stream_filter(state->istate, 
ce->name,
 &ce->oid);
if (filter &&
!streaming_write_entry(ce, path, filter,
@@ -313,14 +314,14 @@ static int write_entry(struct cache_entry *ce,
 * Convert from git internal format to working tree format
 */
if (dco && dco->state != CE_NO_DELAY) {
-   ret = async_convert_to_working_tree(&the_index, 
ce->name, new_blob,
+   ret = async_convert_to_working_tree(state->istate, 
ce->name, new_blob,
size, &buf, dco);
if (ret && string_list_has_string(&dco->paths, 
ce->name)) {
free(new_blob);
goto delayed;
}
} else
-   ret = convert_to_working_tree(&the_index, ce->name, 
new_blob, size, &buf);
+   ret = convert_to_working_tree(state->istate, ce->name, 
new_blob, size, &buf);
 
if (ret) {
free(new_blob);
@@ -421,7 +422,8 @@ int checkout_entry(struct cache_entry *ce,
 
if (!check_path(path.buf, path.len, &st, state->base_dir_len)) {
const struct submodule *sub;
-   unsigned changed = ce_match_stat(ce, &st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
+   unsigned changed = ie_match_stat(state->istate, ce, &st,
+
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
/*
 * Needs to be checked before !changed returns early,
 * as the possibly empty directory was not changed
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 23/23] cache.h: make the_index part of "compatibility macros"

2018-06-06 Thread Nguyễn Thái Ngọc Duy
While the_index is not actually a macro, its use throughout the code
base is dangerous because developers sometimes may not see that some
function is using the_index (instead of some other index that the devs
are interested in).

By keeping the_index part of this NO_ macro, we try to reduce its use
more and more until it's completely gone.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 cache.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 64abd3e55c..eb8c79b8a1 100644
--- a/cache.h
+++ b/cache.h
@@ -330,8 +330,6 @@ struct index_state {
struct ewah_bitmap *fsmonitor_dirty;
 };
 
-extern struct index_state the_index;
-
 /* Name hashing */
 extern int test_lazy_init_name_hash(struct index_state *istate, int 
try_threaded);
 extern void add_name_hash(struct index_state *istate, struct cache_entry *ce);
@@ -340,6 +338,8 @@ extern void free_name_hash(struct index_state *istate);
 
 
 #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS
+extern struct index_state the_index;
+
 #define active_cache (the_index.cache)
 #define active_nr (the_index.cache_nr)
 #define active_alloc (the_index.cache_alloc)
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 20/23] checkout: avoid the_index when possible

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/checkout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index d2257e0d82..4dbcab3727 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -230,7 +230,7 @@ static int checkout_merged(int pos, const struct checkout 
*state)
if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
die(_("Unable to add merge result for '%s'"), path);
free(result_buf.ptr);
-   ce = make_index_entry(&the_index, mode, oid.hash, path, 2, 0);
+   ce = make_index_entry(state->istate, mode, oid.hash, path, 2, 0);
if (!ce)
die(_("make_index_entry failed for path '%s'"), path);
status = checkout_entry(ce, state, NULL);
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 16/23] preload-index.c: use the right index instead of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 preload-index.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/preload-index.c b/preload-index.c
index d61d7662d5..cc2b579791 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (C) 2008 Linus Torvalds
  */
+#define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
 #include "pathspec.h"
 #include "dir.h"
@@ -58,7 +59,7 @@ static void *preload_thread(void *_data)
continue;
if (ce->ce_flags & CE_FSMONITOR_VALID)
continue;
-   if (!ce_path_match(&the_index, ce, &p->pathspec, NULL))
+   if (!ce_path_match(index, ce, &p->pathspec, NULL))
continue;
if (threaded_has_symlink_leading_path(&cache, ce->name, 
ce_namelen(ce)))
continue;
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 13/23] submodule.c: use the right index instead of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 submodule.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/submodule.c b/submodule.c
index 955560bdbb..3aed76e3ee 100644
--- a/submodule.c
+++ b/submodule.c
@@ -93,7 +93,7 @@ int update_path_in_gitmodules(const char *oldpath, const char 
*newpath)
if (!file_exists(GITMODULES_FILE)) /* Do nothing without .gitmodules */
return -1;
 
-   if (is_gitmodules_unmerged(&the_index))
+   if (is_gitmodules_unmerged(the_repository->index))
die(_("Cannot change unmerged .gitmodules, resolve merge 
conflicts first"));
 
submodule = submodule_from_path(the_repository, &null_oid, oldpath);
@@ -127,7 +127,7 @@ int remove_path_from_gitmodules(const char *path)
if (!file_exists(GITMODULES_FILE)) /* Do nothing without .gitmodules */
return -1;
 
-   if (is_gitmodules_unmerged(&the_index))
+   if (is_gitmodules_unmerged(the_repository->index))
die(_("Cannot change unmerged .gitmodules, resolve merge 
conflicts first"));
 
submodule = submodule_from_path(the_repository, &null_oid, path);
@@ -188,7 +188,7 @@ void set_diffopt_flags_from_submodule_config(struct 
diff_options *diffopt,
 
if (ignore)
handle_ignore_submodules_arg(diffopt, ignore);
-   else if (is_gitmodules_unmerged(&the_index))
+   else if (is_gitmodules_unmerged(the_repository->index))
diffopt->flags.ignore_submodules = 1;
}
 }
@@ -258,7 +258,7 @@ int is_submodule_active(struct repository *repo, const char 
*path)
}
 
parse_pathspec(&ps, 0, 0, NULL, args.argv);
-   ret = match_pathspec(&the_index, &ps, path, strlen(path), 0, 
NULL, 1);
+   ret = match_pathspec(repo->index, &ps, path, strlen(path), 0, 
NULL, 1);
 
argv_array_clear(&args);
clear_pathspec(&ps);
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 10/23] dir.c: remove an implicit dependency on the_index in pathspec code

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Make the match_patchspec API and friends take an index_state instead
of assuming the_index in dir.c. All external call sites are converted
blindly to keep the patch simple and retain current behavior.
Individual call sites may receive further updates to use the right
index instead of the_index.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 archive.c   |  2 +-
 builtin/add.c   |  6 +++---
 builtin/checkout.c  |  2 +-
 builtin/clean.c |  2 +-
 builtin/commit.c|  2 +-
 builtin/grep.c  |  6 +++---
 builtin/ls-files.c  |  6 +++---
 builtin/rm.c|  2 +-
 builtin/submodule--helper.c |  2 +-
 builtin/update-index.c  |  2 +-
 diff-lib.c  |  4 ++--
 dir.c   | 27 ---
 dir.h   | 16 ++--
 pathspec.c  |  2 +-
 preload-index.c |  2 +-
 read-cache.c|  2 +-
 rerere.c|  2 +-
 resolve-undo.c  |  2 +-
 revision.c  |  2 +-
 submodule.c |  2 +-
 wt-status.c |  6 +++---
 21 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/archive.c b/archive.c
index f3631a4fb5..1b44503ebb 100644
--- a/archive.c
+++ b/archive.c
@@ -312,7 +312,7 @@ static int reject_entry(const struct object_id *oid, struct 
strbuf *base,
struct strbuf sb = STRBUF_INIT;
strbuf_addbuf(&sb, base);
strbuf_addstr(&sb, filename);
-   if (!match_pathspec(context, sb.buf, sb.len, 0, NULL, 1))
+   if (!match_pathspec(&the_index, context, sb.buf, sb.len, 0, 
NULL, 1))
ret = READ_TREE_RECURSIVE;
strbuf_release(&sb);
}
diff --git a/builtin/add.c b/builtin/add.c
index 8a155dd41e..066623a195 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -40,7 +40,7 @@ static void chmod_pathspec(struct pathspec *pathspec, char 
flip)
for (i = 0; i < active_nr; i++) {
struct cache_entry *ce = active_cache[i];
 
-   if (pathspec && !ce_path_match(ce, pathspec, NULL))
+   if (pathspec && !ce_path_match(&the_index, ce, pathspec, NULL))
continue;
 
if (chmod_cache_entry(ce, flip) < 0)
@@ -135,7 +135,7 @@ static int renormalize_tracked_files(const struct pathspec 
*pathspec, int flags)
continue; /* do not touch unmerged paths */
if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
continue; /* do not touch non blobs */
-   if (pathspec && !ce_path_match(ce, pathspec, NULL))
+   if (pathspec && !ce_path_match(&the_index, ce, pathspec, NULL))
continue;
retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);
}
@@ -155,7 +155,7 @@ static char *prune_directory(struct dir_struct *dir, struct 
pathspec *pathspec,
i = dir->nr;
while (--i >= 0) {
struct dir_entry *entry = *src++;
-   if (dir_path_match(entry, pathspec, prefix, seen))
+   if (dir_path_match(&the_index, entry, pathspec, prefix, seen))
*dst++ = entry;
}
dir->nr = dst - dir->entries;
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2e1d2376d2..f8c208cea1 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -316,7 +316,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 * match_pathspec() for _all_ entries when
 * opts->source_tree != NULL.
 */
-   if (ce_path_match(ce, &opts->pathspec, ps_matched))
+   if (ce_path_match(&the_index, ce, &opts->pathspec, ps_matched))
ce->ce_flags |= CE_MATCHED;
}
 
diff --git a/builtin/clean.c b/builtin/clean.c
index fad533a0a7..d4ca64904f 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -981,7 +981,7 @@ int cmd_clean(int argc, const char **argv, const char 
*prefix)
continue;
 
if (pathspec.nr)
-   matches = dir_path_match(ent, &pathspec, 0, NULL);
+   matches = dir_path_match(&the_index, ent, &pathspec, 0, 
NULL);
 
if (pathspec.nr && !matches)
continue;
diff --git a/builtin/commit.c b/builtin/commit.c
index a842fea666..f593ec1bbc 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -238,7 +238,7 @@ static int list_paths(struct string_list *list, const char 
*with_tree,
 
if (ce->ce_flags & CE_UPDATE)
continue;
-   if (!ce_path_match(ce, pattern, m))
+   if (!ce_path_match(&the_index, ce, pattern, m))
continue;
item = string_list_insert(list, ce->name);
if (ce_skip_worktree(

[PATCH v4 08/23] convert.h: drop 'extern' from function declaration

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 convert.h | 56 ---
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/convert.h b/convert.h
index 01385d9288..0a0fa15b58 100644
--- a/convert.h
+++ b/convert.h
@@ -57,35 +57,36 @@ struct delayed_checkout {
 
 extern enum eol core_eol;
 extern char *check_roundtrip_encoding;
-extern const char *get_cached_convert_stats_ascii(const struct index_state 
*istate,
- const char *path);
-extern const char *get_wt_convert_stats_ascii(const char *path);
-extern const char *get_convert_attr_ascii(const char *path);
+const char *get_cached_convert_stats_ascii(const struct index_state *istate,
+  const char *path);
+const char *get_wt_convert_stats_ascii(const char *path);
+const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const struct index_state *istate,
- const char *path, const char *src, size_t len,
- struct strbuf *dst, int conv_flags);
-extern int convert_to_working_tree(const char *path, const char *src,
-  size_t len, struct strbuf *dst);
-extern int async_convert_to_working_tree(const char *path, const char *src,
-size_t len, struct strbuf *dst,
-void *dco);
-extern int async_query_available_blobs(const char *cmd, struct string_list 
*available_paths);
-extern int renormalize_buffer(const struct index_state *istate,
- const char *path, const char *src, size_t len,
- struct strbuf *dst);
+int convert_to_git(const struct index_state *istate,
+  const char *path, const char *src, size_t len,
+  struct strbuf *dst, int conv_flags);
+int convert_to_working_tree(const char *path, const char *src,
+   size_t len, struct strbuf *dst);
+int async_convert_to_working_tree(const char *path, const char *src,
+ size_t len, struct strbuf *dst,
+ void *dco);
+int async_query_available_blobs(const char *cmd,
+   struct string_list *available_paths);
+int renormalize_buffer(const struct index_state *istate,
+  const char *path, const char *src, size_t len,
+  struct strbuf *dst);
 static inline int would_convert_to_git(const struct index_state *istate,
   const char *path)
 {
return convert_to_git(istate, path, NULL, 0, NULL, 0);
 }
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const struct index_state *istate,
-const char *path, int fd,
-struct strbuf *dst,
-int conv_flags);
-extern int would_convert_to_git_filter_fd(const char *path);
+void convert_to_git_filter_fd(const struct index_state *istate,
+ const char *path, int fd,
+ struct strbuf *dst,
+ int conv_flags);
+int would_convert_to_git_filter_fd(const char *path);
 
 /*
  *
@@ -95,9 +96,10 @@ extern int would_convert_to_git_filter_fd(const char *path);
 
 struct stream_filter; /* opaque */
 
-extern struct stream_filter *get_stream_filter(const char *path, const struct 
object_id *);
-extern void free_stream_filter(struct stream_filter *);
-extern int is_null_stream_filter(struct stream_filter *);
+struct stream_filter *get_stream_filter(const char *path,
+   const struct object_id *);
+void free_stream_filter(struct stream_filter *);
+int is_null_stream_filter(struct stream_filter *);
 
 /*
  * Use as much input up to *isize_p and fill output up to *osize_p;
@@ -111,8 +113,8 @@ extern int is_null_stream_filter(struct stream_filter *);
  * such filters know there is no more input coming and it is time for
  * them to produce the remaining output based on the buffered input.
  */
-extern int stream_filter(struct stream_filter *,
-const char *input, size_t *isize_p,
-char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);
 
 #endif /* CONVERT_H */
-- 
2.18.0.rc0.333.g22e6ee6cdf



Re: [PATCH v3 07/20] attr: remove an implicit dependency on the_index

2018-06-06 Thread Brandon Williams
On 06/06, Duy Nguyen wrote:
> On Wed, Jun 6, 2018 at 6:50 PM, Brandon Williams  wrote:
> > On 06/06, Nguyễn Thái Ngọc Duy wrote:
> >> Make the attr API take an index_state instead of assuming the_index in
> >> attr code. All call sites are converted blindly to keep the patch
> >> simple and retain current behavior. Individual call sites may receive
> >> further updates to use the right index instead of the_index.
> >>
> >> There is one ugly temporary workaround added in attr.c that needs some
> >> more explanation.
> >>
> >> Commit c24f3abace (apply: file commited * with CRLF should roundtrip
> >> diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
> >> read the index at all. But what do you know, we read it anyway by
> >> falling back to the_index. When "istate" from convert_to_git is now
> >> propagated down to read_attr_from_array() we will hit segfault
> >> somewhere inside read_blob_data_from_index.
> >>
> >> The right way of dealing with this is to kill "use_index" variable and
> >> only follow "istate" but at this stage we are not ready for that:
> >> while most git_attr_set_direction() calls just passes the_index to be
> >> assigned to use_index, unpack-trees passes a different one which is
> >> used by entry.c code, which has no way to know what index to use if we
> >> delete use_index. So this has to be done later.
> >
> > Ugh I remember this when I was doing some refactoring to the attr
> > subsystem a year or so ago.  I really wanted to get rid of the whole
> > "direction" thing because if I remember correctly its one of the only
> > remaining globals that affects the outcome of an attr check (everything
> > else was converted to use the attr check struct.  I always got way to
> > far into the weeds when trying to do that too.  I'm not expecting that
> > from this series (since its way out of scope) but maybe it'll be easier
> > afterwards.
> 
> It's not that much easier. This direction thing is still global by
> design. It would be great if we have something like Scheme's parameter
> (aka. dynamic scoping iirc) then we can still scope this nicely. Git
> does not live in such worlds.

Yeah i realized this after sending.  Either way thanks for simplifying
the global state by getting ride of the index global.

> -- 
> Duy

-- 
Brandon Williams


[PATCH v4 06/23] attr.h: drop extern from function declaration

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 attr.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/attr.h b/attr.h
index 442d464db6..46340010bb 100644
--- a/attr.h
+++ b/attr.h
@@ -42,31 +42,31 @@ struct attr_check {
struct attr_stack *stack;
 };
 
-extern struct attr_check *attr_check_alloc(void);
-extern struct attr_check *attr_check_initl(const char *, ...);
-extern struct attr_check *attr_check_dup(const struct attr_check *check);
+struct attr_check *attr_check_alloc(void);
+struct attr_check *attr_check_initl(const char *, ...);
+struct attr_check *attr_check_dup(const struct attr_check *check);
 
-extern struct attr_check_item *attr_check_append(struct attr_check *check,
-const struct git_attr *attr);
+struct attr_check_item *attr_check_append(struct attr_check *check,
+ const struct git_attr *attr);
 
-extern void attr_check_reset(struct attr_check *check);
-extern void attr_check_clear(struct attr_check *check);
-extern void attr_check_free(struct attr_check *check);
+void attr_check_reset(struct attr_check *check);
+void attr_check_clear(struct attr_check *check);
+void attr_check_free(struct attr_check *check);
 
 /*
  * Return the name of the attribute represented by the argument.  The
  * return value is a pointer to a null-delimited string that is part
  * of the internal data structure; it should not be modified or freed.
  */
-extern const char *git_attr_name(const struct git_attr *);
+const char *git_attr_name(const struct git_attr *);
 
-extern int git_check_attr(const char *path, struct attr_check *check);
+int git_check_attr(const char *path, struct attr_check *check);
 
 /*
  * Retrieve all attributes that apply to the specified path.
  * check holds the attributes and their values.
  */
-extern void git_all_attrs(const char *path, struct attr_check *check);
+void git_all_attrs(const char *path, struct attr_check *check);
 
 enum git_attr_direction {
GIT_ATTR_CHECKIN,
@@ -76,6 +76,6 @@ enum git_attr_direction {
 void git_attr_set_direction(enum git_attr_direction new_direction,
struct index_state *istate);
 
-extern void attr_start(void);
+void attr_start(void);
 
 #endif /* ATTR_H */
-- 
2.18.0.rc0.333.g22e6ee6cdf



Re: [PATCH v3 07/20] attr: remove an implicit dependency on the_index

2018-06-06 Thread Duy Nguyen
On Wed, Jun 6, 2018 at 6:50 PM, Brandon Williams  wrote:
> On 06/06, Nguyễn Thái Ngọc Duy wrote:
>> Make the attr API take an index_state instead of assuming the_index in
>> attr code. All call sites are converted blindly to keep the patch
>> simple and retain current behavior. Individual call sites may receive
>> further updates to use the right index instead of the_index.
>>
>> There is one ugly temporary workaround added in attr.c that needs some
>> more explanation.
>>
>> Commit c24f3abace (apply: file commited * with CRLF should roundtrip
>> diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
>> read the index at all. But what do you know, we read it anyway by
>> falling back to the_index. When "istate" from convert_to_git is now
>> propagated down to read_attr_from_array() we will hit segfault
>> somewhere inside read_blob_data_from_index.
>>
>> The right way of dealing with this is to kill "use_index" variable and
>> only follow "istate" but at this stage we are not ready for that:
>> while most git_attr_set_direction() calls just passes the_index to be
>> assigned to use_index, unpack-trees passes a different one which is
>> used by entry.c code, which has no way to know what index to use if we
>> delete use_index. So this has to be done later.
>
> Ugh I remember this when I was doing some refactoring to the attr
> subsystem a year or so ago.  I really wanted to get rid of the whole
> "direction" thing because if I remember correctly its one of the only
> remaining globals that affects the outcome of an attr check (everything
> else was converted to use the attr check struct.  I always got way to
> far into the weeds when trying to do that too.  I'm not expecting that
> from this series (since its way out of scope) but maybe it'll be easier
> afterwards.

It's not that much easier. This direction thing is still global by
design. It would be great if we have something like Scheme's parameter
(aka. dynamic scoping iirc) then we can still scope this nicely. Git
does not live in such worlds.
-- 
Duy


Re: [PATCH v3 15/20] attr: remove index from git_attr_set_direction()

2018-06-06 Thread Brandon Williams
On 06/06, Nguyễn Thái Ngọc Duy wrote:
> Since attr checking API now take the index, there's no need to set an
> index in advance with this call. Most call sites are straightforward
> because they either pass the_index or NULL (which defaults back to
> the_index previously). There's only one suspicious call site in
> unpack-trees.c where it sets a different index.
> 
> This code in unpack-trees is about to checking out entries from the
> new/temporary index after merging is done in it. The attributes will
> be used by entry.c code to do crlf conversion if needed. entry.c now
> respects struct checkout's istate field, and this field is correctly
> set in unpack-trees.c, there should be no regression from this change.

Nice! Much cleaner.

> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  archive.c|  2 +-
>  attr.c   | 15 +++
>  attr.h   |  3 +--
>  builtin/check-attr.c |  2 +-
>  unpack-trees.c   |  4 ++--
>  5 files changed, 8 insertions(+), 18 deletions(-)
> 
> diff --git a/archive.c b/archive.c
> index 1b44503ebb..d1d0a0d6b3 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -273,7 +273,7 @@ int write_archive_entries(struct archiver_args *args,
>   init_tree_desc(&t, args->tree->buffer, args->tree->size);
>   if (unpack_trees(1, &t, &opts))
>   return -1;
> - git_attr_set_direction(GIT_ATTR_INDEX, &the_index);
> + git_attr_set_direction(GIT_ATTR_INDEX);
>   }
>  
>   err = read_tree_recursive(args->tree, "", 0, 0, &args->pathspec,
> diff --git a/attr.c b/attr.c
> index 863fad3bd1..98e4953f6e 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -708,10 +708,8 @@ static struct attr_stack *read_attr_from_array(const 
> char **list)
>   * another thread could potentially be calling into the attribute system.
>   */
>  static enum git_attr_direction direction;
> -static const struct index_state *use_index;
>  
> -void git_attr_set_direction(enum git_attr_direction new_direction,
> - const struct index_state *istate)
> +void git_attr_set_direction(enum git_attr_direction new_direction)
>  {
>   if (is_bare_repository() && new_direction != GIT_ATTR_INDEX)
>   BUG("non-INDEX attr direction in a bare repo");
> @@ -720,7 +718,6 @@ void git_attr_set_direction(enum git_attr_direction 
> new_direction,
>   drop_all_attr_stacks();
>  
>   direction = new_direction;
> - use_index = istate;
>  }
>  
>  static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
> @@ -750,17 +747,11 @@ static struct attr_stack *read_attr_from_index(const 
> struct index_state *istate,
>   struct attr_stack *res;
>   char *buf, *sp;
>   int lineno = 0;
> - const struct index_state *to_read_from;
>  
> - /*
> -  * Temporary workaround for c24f3abace (apply: file commited
> -  * with CRLF should roundtrip diff and apply - 2017-08-19)
> -  */
> - to_read_from = use_index ? use_index : istate;
> - if (!to_read_from)
> + if (!istate)
>   return NULL;
>  
> - buf = read_blob_data_from_index(to_read_from, path, NULL);
> + buf = read_blob_data_from_index(istate, path, NULL);
>   if (!buf)
>   return NULL;
>  
> diff --git a/attr.h b/attr.h
> index 3daca3c0cb..01dab4a126 100644
> --- a/attr.h
> +++ b/attr.h
> @@ -77,8 +77,7 @@ enum git_attr_direction {
>   GIT_ATTR_CHECKOUT,
>   GIT_ATTR_INDEX
>  };
> -void git_attr_set_direction(enum git_attr_direction new_direction,
> - const struct index_state *istate);
> +void git_attr_set_direction(enum git_attr_direction new_direction);
>  
>  void attr_start(void);
>  
> diff --git a/builtin/check-attr.c b/builtin/check-attr.c
> index f7b59993d3..c05573ff9c 100644
> --- a/builtin/check-attr.c
> +++ b/builtin/check-attr.c
> @@ -120,7 +120,7 @@ int cmd_check_attr(int argc, const char **argv, const 
> char *prefix)
>   }
>  
>   if (cached_attrs)
> - git_attr_set_direction(GIT_ATTR_INDEX, NULL);
> + git_attr_set_direction(GIT_ATTR_INDEX);
>  
>   doubledash = -1;
>   for (i = 0; doubledash < 0 && i < argc; i++) {
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3ace82ca27..8cb407173e 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -361,7 +361,7 @@ static int check_updates(struct unpack_trees_options *o)
>   progress = get_progress(o);
>  
>   if (o->update)
> - git_attr_set_direction(GIT_ATTR_CHECKOUT, index);
> + git_attr_set_direction(GIT_ATTR_CHECKOUT);
>  
>   if (should_update_submodules() && o->update && !o->dry_run)
>   load_gitmodules_file(index, NULL);
> @@ -421,7 +421,7 @@ static int check_updates(struct unpack_trees_options *o)
>   stop_progress(&progress);
>   errs |= finish_delayed_checkout(&state);
>   if (o->update)
> - git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
> + 

[PATCH v4 05/23] unpack-trees: avoid the_index in verify_absent()

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Both functions that are updated in this commit are called by
verify_absent(), which is part of the "unpack-trees" operation that is
supposed to work on any index file specified by the caller. Thanks to
Brandon [1] [2], an implicit dependency on the_index is exposed. This
commit fixes it.

In both functions, it makes sense to use src_index to check for
exclusion because it's almost unchanged and should give us the same
outcome as if running the exclude check before the unpack.

It's "almost unchanged" because we do invalidate cache-tree and
untracked cache in the source index. But this should not affect how
exclude machinery uses the index: to see if a file is tracked, and to
read a blob from the index instead of worktree if it's marked
skip-worktree (i.e. it's not available in worktree)

[1] a0bba65b10 (dir: convert is_excluded to take an index - 2017-05-05
[2] 2c1eb10454 (dir: convert read_directory to take an index - 2017-05-05)

Helped-by: Elijah Newren 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 5268de7af5..3ace82ca27 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1651,7 +1651,7 @@ static int verify_clean_subdirectory(const struct 
cache_entry *ce,
memset(&d, 0, sizeof(d));
if (o->dir)
d.exclude_per_dir = o->dir->exclude_per_dir;
-   i = read_directory(&d, &the_index, pathbuf, namelen+1, NULL);
+   i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
if (i)
return o->gently ? -1 :
add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
@@ -1693,7 +1693,7 @@ static int check_ok_to_remove(const char *name, int len, 
int dtype,
return 0;
 
if (o->dir &&
-   is_excluded(o->dir, &the_index, name, &dtype))
+   is_excluded(o->dir, o->src_index, name, &dtype))
/*
 * ce->name is explicitly excluded, so it is Ok to
 * overwrite it.
-- 
2.18.0.rc0.333.g22e6ee6cdf



Re: [PATCH v3 07/20] attr: remove an implicit dependency on the_index

2018-06-06 Thread Brandon Williams
On 06/06, Nguyễn Thái Ngọc Duy wrote:
> Make the attr API take an index_state instead of assuming the_index in
> attr code. All call sites are converted blindly to keep the patch
> simple and retain current behavior. Individual call sites may receive
> further updates to use the right index instead of the_index.
> 
> There is one ugly temporary workaround added in attr.c that needs some
> more explanation.
> 
> Commit c24f3abace (apply: file commited * with CRLF should roundtrip
> diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
> read the index at all. But what do you know, we read it anyway by
> falling back to the_index. When "istate" from convert_to_git is now
> propagated down to read_attr_from_array() we will hit segfault
> somewhere inside read_blob_data_from_index.
> 
> The right way of dealing with this is to kill "use_index" variable and
> only follow "istate" but at this stage we are not ready for that:
> while most git_attr_set_direction() calls just passes the_index to be
> assigned to use_index, unpack-trees passes a different one which is
> used by entry.c code, which has no way to know what index to use if we
> delete use_index. So this has to be done later.

Ugh I remember this when I was doing some refactoring to the attr
subsystem a year or so ago.  I really wanted to get rid of the whole
"direction" thing because if I remember correctly its one of the only
remaining globals that affects the outcome of an attr check (everything
else was converted to use the attr check struct.  I always got way to
far into the weeds when trying to do that too.  I'm not expecting that
from this series (since its way out of scope) but maybe it'll be easier
afterwards.

> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  archive.c  |  2 +-
>  attr.c | 57 --
>  attr.h | 10 +---
>  builtin/check-attr.c   |  4 +--
>  builtin/pack-objects.c |  2 +-
>  convert.c  |  2 +-
>  dir.c  |  2 +-
>  ll-merge.c |  4 +--
>  userdiff.c |  2 +-
>  ws.c   |  2 +-
>  10 files changed, 55 insertions(+), 32 deletions(-)
> 
> diff --git a/archive.c b/archive.c
> index 4fe7bec60c..3b4db8956a 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -108,7 +108,7 @@ static const struct attr_check *get_archive_attrs(const 
> char *path)
>   static struct attr_check *check;
>   if (!check)
>   check = attr_check_initl("export-ignore", "export-subst", NULL);
> - return git_check_attr(path, check) ? NULL : check;
> + return git_check_attr(&the_index, path, check) ? NULL : check;
>  }
>  
>  static int check_attr_export_ignore(const struct attr_check *check)
> diff --git a/attr.c b/attr.c
> index 067fb9e0c0..863fad3bd1 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -708,10 +708,10 @@ static struct attr_stack *read_attr_from_array(const 
> char **list)
>   * another thread could potentially be calling into the attribute system.
>   */
>  static enum git_attr_direction direction;
> -static struct index_state *use_index;
> +static const struct index_state *use_index;
>  
>  void git_attr_set_direction(enum git_attr_direction new_direction,
> - struct index_state *istate)
> + const struct index_state *istate)
>  {
>   if (is_bare_repository() && new_direction != GIT_ATTR_INDEX)
>   BUG("non-INDEX attr direction in a bare repo");
> @@ -743,13 +743,24 @@ static struct attr_stack *read_attr_from_file(const 
> char *path, int macro_ok)
>   return res;
>  }
>  
> -static struct attr_stack *read_attr_from_index(const char *path, int 
> macro_ok)
> +static struct attr_stack *read_attr_from_index(const struct index_state 
> *istate,
> +const char *path,
> +int macro_ok)
>  {
>   struct attr_stack *res;
>   char *buf, *sp;
>   int lineno = 0;
> + const struct index_state *to_read_from;
>  
> - buf = read_blob_data_from_index(use_index ? use_index : &the_index, 
> path, NULL);
> + /*
> +  * Temporary workaround for c24f3abace (apply: file commited
> +  * with CRLF should roundtrip diff and apply - 2017-08-19)
> +  */
> + to_read_from = use_index ? use_index : istate;
> + if (!to_read_from)
> + return NULL;
> +
> + buf = read_blob_data_from_index(to_read_from, path, NULL);
>   if (!buf)
>   return NULL;
>  
> @@ -768,15 +779,16 @@ static struct attr_stack *read_attr_from_index(const 
> char *path, int macro_ok)
>   return res;
>  }
>  
> -static struct attr_stack *read_attr(const char *path, int macro_ok)
> +static struct attr_stack *read_attr(const struct index_state *istate,
> + const char *path, int macro_ok)
>  {
>   struct attr_stack *res = NULL;
>  
>   if (direction == GIT_ATTR_INDEX) {
> 

[PATCH v4 03/23] unpack-trees: don't shadow global var the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
This function mark_new_skip_worktree() has an argument named the_index
which is also the name of a global variable. While they have different
types (the global the_index is not a pointer) mistakes can easily
happen and it's also confusing for readers. Rename the function
argument to something other than the_index.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 5d06aa9c98..45fcda3169 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1231,7 +1231,7 @@ static int clear_ce_flags(struct cache_entry **cache, int 
nr,
  * Set/Clear CE_NEW_SKIP_WORKTREE according to $GIT_DIR/info/sparse-checkout
  */
 static void mark_new_skip_worktree(struct exclude_list *el,
-  struct index_state *the_index,
+  struct index_state *istate,
   int select_flag, int skip_wt_flag)
 {
int i;
@@ -1240,8 +1240,8 @@ static void mark_new_skip_worktree(struct exclude_list 
*el,
 * 1. Pretend the narrowest worktree: only unmerged entries
 * are checked out
 */
-   for (i = 0; i < the_index->cache_nr; i++) {
-   struct cache_entry *ce = the_index->cache[i];
+   for (i = 0; i < istate->cache_nr; i++) {
+   struct cache_entry *ce = istate->cache[i];
 
if (select_flag && !(ce->ce_flags & select_flag))
continue;
@@ -1256,8 +1256,7 @@ static void mark_new_skip_worktree(struct exclude_list 
*el,
 * 2. Widen worktree according to sparse-checkout file.
 * Matched entries will have skip_wt_flag cleared (i.e. "in")
 */
-   clear_ce_flags(the_index->cache, the_index->cache_nr,
-  select_flag, skip_wt_flag, el);
+   clear_ce_flags(istate->cache, istate->cache_nr, select_flag, 
skip_wt_flag, el);
 }
 
 static int verify_absent(const struct cache_entry *,
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 02/23] unpack-trees: add a note about path invalidation

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a85a02a77..5d06aa9c98 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1545,6 +1545,17 @@ static int verify_uptodate_sparse(const struct 
cache_entry *ce,
return verify_uptodate_1(ce, o, ERROR_SPARSE_NOT_UPTODATE_FILE);
 }
 
+/*
+ * TODO: We should actually invalidate o->result, not src_index [1].
+ * But since cache tree and untracked cache both are not copied to
+ * o->result until unpacking is complete, we invalidate them on
+ * src_index instead with the assumption that they will be copied to
+ * dst_index at the end.
+ *
+ * [1] src_index->cache_tree is also used in unpack_callback() so if
+ * we invalidate o->result, we need to update it to use
+ * o->result.cache_tree as well.
+ */
 static void invalidate_ce_path(const struct cache_entry *ce,
   struct unpack_trees_options *o)
 {
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v4 04/23] unpack-tress: convert clear_ce_flags* to avoid the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Prior to fba92be8f7, this code implicitly (and incorrectly) assumes
the_index when running the exclude machinery. fba92be8f7 helps show
this problem clearer because unpack-trees operation is supposed to
work on whatever index the caller specifies... not specifically
the_index.

Update the code to use "istate" argument that's originally from
mark_new_skip_worktree(). From the call sites, both in unpack_trees(),
you can see that this function works on two separate indexes:
o->src_index and o->result. The second mark_new_skip_worktree() so far
has incorecctly applied exclude rules on o->src_index instead of
o->result. It's unclear what is the consequences of this, but it's
definitely wrong.

[1] fba92be8f7 (dir: convert is_excluded_from_list to take an index -
2017-05-05)

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 45fcda3169..5268de7af5 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1085,13 +1085,15 @@ static int unpack_callback(int n, unsigned long mask, 
unsigned long dirmask, str
return mask;
 }
 
-static int clear_ce_flags_1(struct cache_entry **cache, int nr,
+static int clear_ce_flags_1(struct index_state *istate,
+   struct cache_entry **cache, int nr,
struct strbuf *prefix,
int select_mask, int clear_mask,
struct exclude_list *el, int defval);
 
 /* Whole directory matching */
-static int clear_ce_flags_dir(struct cache_entry **cache, int nr,
+static int clear_ce_flags_dir(struct index_state *istate,
+ struct cache_entry **cache, int nr,
  struct strbuf *prefix,
  char *basename,
  int select_mask, int clear_mask,
@@ -1100,7 +1102,7 @@ static int clear_ce_flags_dir(struct cache_entry **cache, 
int nr,
struct cache_entry **cache_end;
int dtype = DT_DIR;
int ret = is_excluded_from_list(prefix->buf, prefix->len,
-   basename, &dtype, el, &the_index);
+   basename, &dtype, el, istate);
int rc;
 
strbuf_addch(prefix, '/');
@@ -1122,7 +1124,7 @@ static int clear_ce_flags_dir(struct cache_entry **cache, 
int nr,
 * calling clear_ce_flags_1(). That function will call
 * the expensive is_excluded_from_list() on every entry.
 */
-   rc = clear_ce_flags_1(cache, cache_end - cache,
+   rc = clear_ce_flags_1(istate, cache, cache_end - cache,
  prefix,
  select_mask, clear_mask,
  el, ret);
@@ -1145,7 +1147,8 @@ static int clear_ce_flags_dir(struct cache_entry **cache, 
int nr,
  *   cache[0]->name[0..(prefix_len-1)]
  * Top level path has prefix_len zero.
  */
-static int clear_ce_flags_1(struct cache_entry **cache, int nr,
+static int clear_ce_flags_1(struct index_state *istate,
+   struct cache_entry **cache, int nr,
struct strbuf *prefix,
int select_mask, int clear_mask,
struct exclude_list *el, int defval)
@@ -1179,7 +1182,7 @@ static int clear_ce_flags_1(struct cache_entry **cache, 
int nr,
len = slash - name;
strbuf_add(prefix, name, len);
 
-   processed = clear_ce_flags_dir(cache, cache_end - cache,
+   processed = clear_ce_flags_dir(istate, cache, cache_end 
- cache,
   prefix,
   prefix->buf + 
prefix->len - len,
   select_mask, clear_mask,
@@ -1193,7 +1196,7 @@ static int clear_ce_flags_1(struct cache_entry **cache, 
int nr,
}
 
strbuf_addch(prefix, '/');
-   cache += clear_ce_flags_1(cache, cache_end - cache,
+   cache += clear_ce_flags_1(istate, cache, cache_end - 
cache,
  prefix,
  select_mask, clear_mask, el, 
defval);
strbuf_setlen(prefix, prefix->len - len - 1);
@@ -1203,7 +1206,7 @@ static int clear_ce_flags_1(struct cache_entry **cache, 
int nr,
/* Non-directory */
dtype = ce_to_dtype(ce);
ret = is_excluded_from_list(ce->name, ce_namelen(ce),
-   name, &dtype, el, &the_index);
+   name, &dtype, el, istate);
if (ret < 0)
ret = defval;
if (ret > 0)
@@ -1213,

[PATCH v4 00/23] Fix incorrect use of the_index

2018-06-06 Thread Nguyễn Thái Ngọc Duy
v4 fixes some commit messages and killed a couple more the_index
references after I tried to merge this with 'pu'

diff --git a/apply.c b/apply.c
index 811ff2ad5e..82f681972f 100644
--- a/apply.c
+++ b/apply.c
@@ -4090,9 +4090,9 @@ static int build_fake_ancestor(struct apply_state *state, 
struct patch *list)
return error(_("sha1 information is lacking or useless "
   "(%s)."), name);
 
-   ce = make_cache_entry(&the_index, patch->old_mode, oid.hash, 
name, 0, 0);
+   ce = make_index_entry(&result, patch->old_mode, oid.hash, name, 
0, 0);
if (!ce)
-   return error(_("make_cache_entry failed for path '%s'"),
+   return error(_("make_index_entry failed for path '%s'"),
 name);
if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) {
free(ce);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 3c8218304e..4dbcab3727 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -230,9 +230,9 @@ static int checkout_merged(int pos, const struct checkout 
*state)
if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid))
die(_("Unable to add merge result for '%s'"), path);
free(result_buf.ptr);
-   ce = make_cache_entry(&the_index, mode, oid.hash, path, 2, 0);
+   ce = make_index_entry(state->istate, mode, oid.hash, path, 2, 0);
if (!ce)
-   die(_("make_cache_entry failed for path '%s'"), path);
+   die(_("make_index_entry failed for path '%s'"), path);
status = checkout_entry(ce, state, NULL);
free(ce);
return status;
diff --git a/builtin/difftool.c b/builtin/difftool.c
index e34e75a42d..c7d6296762 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -321,7 +321,7 @@ static int checkout_path(unsigned mode, struct object_id 
*oid,
struct cache_entry *ce;
int ret;
 
-   ce = make_cache_entry(&the_index, mode, oid->hash, path, 0, 0);
+   ce = make_index_entry(state->istate, mode, oid->hash, path, 0, 0);
ret = checkout_entry(ce, state, NULL);
 
free(ce);
@@ -488,7 +488,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, 
const char *prefix,
 * index.
 */
struct cache_entry *ce2 =
-   make_cache_entry(&the_index, rmode, 
roid.hash,
+   make_index_entry(&wtindex, rmode, 
roid.hash,
 dst_path, 0, 0);
 
add_index_entry(&wtindex, ce2,
diff --git a/builtin/reset.c b/builtin/reset.c
index 0ea0a19d5e..067f535031 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -134,10 +134,10 @@ static void update_index_from_diff(struct 
diff_queue_struct *q,
continue;
}
 
-   ce = make_cache_entry(&the_index, one->mode, one->oid.hash, 
one->path,
+   ce = make_index_entry(&the_index, one->mode, one->oid.hash, 
one->path,
  0, 0);
if (!ce)
-   die(_("make_cache_entry failed for path '%s'"),
+   die(_("make_index_entry failed for path '%s'"),
one->path);
if (is_missing) {
ce->ce_flags |= CE_INTENT_TO_ADD;
diff --git a/cache.h b/cache.h
index 242aaa5498..eb8c79b8a1 100644
--- a/cache.h
+++ b/cache.h
@@ -699,7 +699,7 @@ extern int remove_file_from_index(struct index_state *, 
const char *path);
 extern int add_to_index(struct index_state *, const char *path, struct stat *, 
int flags);
 extern int add_file_to_index(struct index_state *, const char *path, int 
flags);
 
-extern struct cache_entry *make_cache_entry(struct index_state 
*istate,unsigned int mode, const unsigned char *sha1, const char *path, int 
stage, unsigned int refresh_options);
+extern struct cache_entry *make_index_entry(struct index_state *istate, 
unsigned int mode, const unsigned char *sha1, const char *path, int stage, 
unsigned int refresh_options);
 extern int chmod_index_entry(struct index_state *, struct cache_entry *ce, 
char flip);
 extern int ce_same_name(const struct cache_entry *a, const struct cache_entry 
*b);
 extern void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
diff --git a/merge-recursive.c b/merge-recursive.c
index 9280deb6a1..4f054d6dbb 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -315,7 +315,7 @@ static int add_cacheinfo(struct merge_options *o,
struct cache_entry *ce;
int ret;
 
-   ce = make_cache_entry(&the_index, mode, oid ? oid->hash : null_sha1, 
path, stage, 0);
+   ce = make_index_entry(&the_index, mode, oid ? oid->hash : null_s

[PATCH v4 01/23] unpack-trees: remove 'extern' on function declaration

2018-06-06 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..534358fcc5 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -82,8 +82,8 @@ struct unpack_trees_options {
struct exclude_list *el; /* for internal use */
 };
 
-extern int unpack_trees(unsigned n, struct tree_desc *t,
-   struct unpack_trees_options *options);
+int unpack_trees(unsigned n, struct tree_desc *t,
+struct unpack_trees_options *options);
 
 int verify_uptodate(const struct cache_entry *ce,
struct unpack_trees_options *o);
-- 
2.18.0.rc0.333.g22e6ee6cdf



Re: [PATCH] t3200-branch.sh: use "--set-upstream-to" in test

2018-06-06 Thread Kaartic Sivaraam
On Tuesday 05 June 2018 04:54 PM, SZEDER Gábor wrote:
> 
> Though arguably the test name could be more descriptive and tell why
> it should fail.
> 

That's arguable, indeed. I was about to send a patch that gives a better
description for the test. I didn't do it as I started wondering, Is it
even worth testing whether a removed option fails? Is this done for
other options that have been removed in the past? Should we just remove
the test completely?

-- 
Sivaraam

QUOTE:

“The three principal virtues of a programmer are Laziness, Impatience,
and Hubris.”

- Camel book

Sivaraam?

You possibly might have noticed that my signature recently changed from
'Kaartic' to 'Sivaraam' both of which are parts of my name. I find the
new signature to be better for several reasons one of which is that the
former signature has a lot of ambiguities in the place I live as it is a
common name (NOTE: it's not a common spelling, just a common name). So,
I switched signatures before it's too late.

That said, I won't mind you calling me 'Kaartic' if you like it [of
course ;-)]. You can always call me using either of the names.


KIND NOTE TO THE NATIVE ENGLISH SPEAKER:

As I'm not a native English speaker myself, there might be mistaeks in
my usage of English. I apologise for any mistakes that I make.

It would be "helpful" if you take the time to point out the mistakes.

It would be "super helpful" if you could provide suggestions about how
to correct those mistakes.

Thanks in advance!



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] config.c: fix regression for core.safecrlf false

2018-06-06 Thread Torsten Bögershausen
On Mon, Jun 04, 2018 at 01:17:42PM -0700, Anthony Sottile wrote:
> A regression introduced in 8462ff43e42ab67cecd16fdfb59451a53cc8a945 caused
> autocrlf rewrites to produce a warning message despite setting safecrlf=false.
> 
> Signed-off-by: Anthony Sottile 
> ---
>  config.c|  2 +-
>  t/t0020-crlf.sh | 10 ++
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/config.c b/config.c
> index fbbf0f8..de24e90 100644
> --- a/config.c
> +++ b/config.c
> @@ -1233,7 +1233,7 @@ static int git_default_core_config(const char *var, 
> const char *value)
>   }
>   eol_rndtrp_die = git_config_bool(var, value);
>   global_conv_flags_eol = eol_rndtrp_die ?
> - CONV_EOL_RNDTRP_DIE : CONV_EOL_RNDTRP_WARN;
> + CONV_EOL_RNDTRP_DIE : 0;
>   return 0;
>   }
>  
> diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
> index 71350e0..5f05698 100755
> --- a/t/t0020-crlf.sh
> +++ b/t/t0020-crlf.sh
> @@ -98,6 +98,16 @@ test_expect_success 'safecrlf: git diff demotes 
> safecrlf=true to warn' '
>  '
>  
>  
> +test_expect_success 'safecrlf: no warning with safecrlf=false' '
> + git config core.autocrlf input &&
> + git config core.safecrlf false &&
> +
> + for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&
> + git add allcrlf 2>err &&
> + test_must_be_empty err
> +'
> +
> +
>  test_expect_success 'switch off autocrlf, safecrlf, reset HEAD' '
>   git config core.autocrlf false &&
>   git config core.safecrlf false &&
> -- 
> 2.7.4
> 

Looks good to me, thanks for cleaning my mess.

Acked-By: Torsten Bögershausen 


Re: [PATCH v3 07/20] attr: remove an implicit dependency on the_index

2018-06-06 Thread Ramsay Jones



On 06/06/18 08:39, Nguyễn Thái Ngọc Duy wrote:
> Make the attr API take an index_state instead of assuming the_index in
> attr code. All call sites are converted blindly to keep the patch
> simple and retain current behavior. Individual call sites may receive
> further updates to use the right index instead of the_index.
> 
> There is one ugly temporary workaround added in attr.c that needs some
> more explanation.
> 
> Commit c24f3abace (apply: file commited * with CRLF should roundtrip

s/commited * with/commited with/

ATB,
Ramsay Jones



Re: [PATCH v5 18/21] commit-graph: use string-list API for input

2018-06-06 Thread Derrick Stolee

On 6/6/2018 8:26 AM, Ævar Arnfjörð Bjarmason wrote:

On Wed, Jun 06 2018, Derrick Stolee wrote:


On 6/6/2018 8:11 AM, Ævar Arnfjörð Bjarmason wrote:

On Wed, Jun 06 2018, Derrick Stolee wrote:


Signed-off-by: Derrick Stolee 
---
   builtin/commit-graph.c | 39 +--
   commit-graph.c | 15 +++
   commit-graph.h |  7 +++
   3 files changed, 23 insertions(+), 38 deletions(-)
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 3079cde6f9..d8eb8278b3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -118,13 +118,9 @@ static int graph_read(int argc, const char **argv)

   static int graph_write(int argc, const char **argv)
   {
-   const char **pack_indexes = NULL;
-   int packs_nr = 0;
-   const char **commit_hex = NULL;
-   int commits_nr = 0;
-   const char **lines = NULL;
-   int lines_nr = 0;
-   int lines_alloc = 0;
+   struct string_list *pack_indexes = NULL;
+   struct string_list *commit_hex = NULL;
+   struct string_list lines;

static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -150,32 +146,23 @@ static int graph_write(int argc, const char **argv)

if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
-   lines_nr = 0;
-   lines_alloc = 128;
-   ALLOC_ARRAY(lines, lines_alloc);
-
-   while (strbuf_getline(&buf, stdin) != EOF) {
-   ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-   lines[lines_nr++] = strbuf_detach(&buf, NULL);
-   }
-
-   if (opts.stdin_packs) {
-   pack_indexes = lines;
-   packs_nr = lines_nr;
-   }
-   if (opts.stdin_commits) {
-   commit_hex = lines;
-   commits_nr = lines_nr;
-   }
+   string_list_init(&lines, 0);
+
+   while (strbuf_getline(&buf, stdin) != EOF)
+   string_list_append(&lines, strbuf_detach(&buf, NULL));
+
+   if (opts.stdin_packs)
+   pack_indexes = &lines;
+   if (opts.stdin_commits)
+   commit_hex = &lines;
}

write_commit_graph(opts.obj_dir,
   pack_indexes,
-  packs_nr,
   commit_hex,
-  commits_nr,
   opts.append);

+   string_list_clear(&lines, 0);
return 0;
   }

This results in an invalid free() & segfault because you're freeing
&lines which may not have been allocated by string_list_init().

Good point. Did my tests not catch this? (seems it requires calling
`git commit-graph write` with no `--stdin-packs` or
`--stdin-commits`).

Most of your tests (t5318-commit-graph.sh) segfaulted, but presumably
you're on a more forgiving compiler/platform/options. I compiled with
-O0 -g on clang 4.0.1-8 + Debian testing.


I appreciate the extra platform testing. I'm using GCC on Ubuntu (gcc 
(Ubuntu 7.3.0-16ubuntu3) 7.3.0).





Monkeypatch on top which I used to fix it:

  diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
  index 76423b3fa5..c7eb68aa3a 100644
  --- a/builtin/commit-graph.c
  +++ b/builtin/commit-graph.c
  @@ -122,6 +122,7 @@ static int graph_write(int argc, const char **argv)
  struct string_list *pack_indexes = NULL;
  struct string_list *commit_hex = NULL;
  struct string_list lines;
  +   int free_lines = 0;

  static struct option builtin_commit_graph_write_options[] = {
  OPT_STRING(0, "object-dir", &opts.obj_dir,
  @@ -155,6 +156,7 @@ static int graph_write(int argc, const char **argv)
  if (opts.stdin_packs || opts.stdin_commits) {
  struct strbuf buf = STRBUF_INIT;
  string_list_init(&lines, 0);
  +   free_lines = 1;

  while (strbuf_getline(&buf, stdin) != EOF)
  string_list_append(&lines, strbuf_detach(&buf, 
NULL));
  @@ -170,7 +172,8 @@ static int graph_write(int argc, const char **argv)
 commit_hex,
 opts.append);

  -   string_list_clear(&lines, 0);
  +   if (free_lines)
  +   string_list_clear(&lines, 0);
  return 0;
   }

But probably having a pointer to the struct which is NULL etc. is
better.

Wouldn't the easiest fix be to call `string_list_init(&lines, 0)`
outside of any conditional?

Sure that works too. We'd be doing the init when we don't need it, but
it's not like this part is performance critical or anything...




Re: [PATCH v5 18/21] commit-graph: use string-list API for input

2018-06-06 Thread Ævar Arnfjörð Bjarmason


On Wed, Jun 06 2018, Derrick Stolee wrote:

> On 6/6/2018 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Jun 06 2018, Derrick Stolee wrote:
>>
>>> Signed-off-by: Derrick Stolee 
>>> ---
>>>   builtin/commit-graph.c | 39 +--
>>>   commit-graph.c | 15 +++
>>>   commit-graph.h |  7 +++
>>>   3 files changed, 23 insertions(+), 38 deletions(-)
>>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>>> index 3079cde6f9..d8eb8278b3 100644
>>> --- a/builtin/commit-graph.c
>>> +++ b/builtin/commit-graph.c
>>> @@ -118,13 +118,9 @@ static int graph_read(int argc, const char **argv)
>>>
>>>   static int graph_write(int argc, const char **argv)
>>>   {
>>> -   const char **pack_indexes = NULL;
>>> -   int packs_nr = 0;
>>> -   const char **commit_hex = NULL;
>>> -   int commits_nr = 0;
>>> -   const char **lines = NULL;
>>> -   int lines_nr = 0;
>>> -   int lines_alloc = 0;
>>> +   struct string_list *pack_indexes = NULL;
>>> +   struct string_list *commit_hex = NULL;
>>> +   struct string_list lines;
>>>
>>> static struct option builtin_commit_graph_write_options[] = {
>>> OPT_STRING(0, "object-dir", &opts.obj_dir,
>>> @@ -150,32 +146,23 @@ static int graph_write(int argc, const char **argv)
>>>
>>> if (opts.stdin_packs || opts.stdin_commits) {
>>> struct strbuf buf = STRBUF_INIT;
>>> -   lines_nr = 0;
>>> -   lines_alloc = 128;
>>> -   ALLOC_ARRAY(lines, lines_alloc);
>>> -
>>> -   while (strbuf_getline(&buf, stdin) != EOF) {
>>> -   ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
>>> -   lines[lines_nr++] = strbuf_detach(&buf, NULL);
>>> -   }
>>> -
>>> -   if (opts.stdin_packs) {
>>> -   pack_indexes = lines;
>>> -   packs_nr = lines_nr;
>>> -   }
>>> -   if (opts.stdin_commits) {
>>> -   commit_hex = lines;
>>> -   commits_nr = lines_nr;
>>> -   }
>>> +   string_list_init(&lines, 0);
>>> +
>>> +   while (strbuf_getline(&buf, stdin) != EOF)
>>> +   string_list_append(&lines, strbuf_detach(&buf, NULL));
>>> +
>>> +   if (opts.stdin_packs)
>>> +   pack_indexes = &lines;
>>> +   if (opts.stdin_commits)
>>> +   commit_hex = &lines;
>>> }
>>>
>>> write_commit_graph(opts.obj_dir,
>>>pack_indexes,
>>> -  packs_nr,
>>>commit_hex,
>>> -  commits_nr,
>>>opts.append);
>>>
>>> +   string_list_clear(&lines, 0);
>>> return 0;
>>>   }
>> This results in an invalid free() & segfault because you're freeing
>> &lines which may not have been allocated by string_list_init().
>
> Good point. Did my tests not catch this? (seems it requires calling
> `git commit-graph write` with no `--stdin-packs` or
> `--stdin-commits`).

Most of your tests (t5318-commit-graph.sh) segfaulted, but presumably
you're on a more forgiving compiler/platform/options. I compiled with
-O0 -g on clang 4.0.1-8 + Debian testing.

>>
>> Monkeypatch on top which I used to fix it:
>>
>>  diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>>  index 76423b3fa5..c7eb68aa3a 100644
>>  --- a/builtin/commit-graph.c
>>  +++ b/builtin/commit-graph.c
>>  @@ -122,6 +122,7 @@ static int graph_write(int argc, const char **argv)
>>  struct string_list *pack_indexes = NULL;
>>  struct string_list *commit_hex = NULL;
>>  struct string_list lines;
>>  +   int free_lines = 0;
>>
>>  static struct option builtin_commit_graph_write_options[] = {
>>  OPT_STRING(0, "object-dir", &opts.obj_dir,
>>  @@ -155,6 +156,7 @@ static int graph_write(int argc, const char **argv)
>>  if (opts.stdin_packs || opts.stdin_commits) {
>>  struct strbuf buf = STRBUF_INIT;
>>  string_list_init(&lines, 0);
>>  +   free_lines = 1;
>>
>>  while (strbuf_getline(&buf, stdin) != EOF)
>>  string_list_append(&lines, strbuf_detach(&buf, 
>> NULL));
>>  @@ -170,7 +172,8 @@ static int graph_write(int argc, const char **argv)
>> commit_hex,
>> opts.append);
>>
>>  -   string_list_clear(&lines, 0);
>>  +   if (free_lines)
>>  +   string_list_clear(&lines, 0);
>>  return 0;
>>   }
>>
>> But probably having a pointer to the struct which is NULL etc. is
>> better.
>
> Wouldn't the easiest fix be to call `string_list_init(&lines, 0)`
> outside of any conditional?

Sure that works too. We'd be doing the init when we don't need it, but
it's not like this part is performance critical or anything...


  1   2   >