Re: insteadOf and git-request-pull output

2018-11-16 Thread Junio C Hamano
"brian m. carlson"  writes:

>> $ git request-pull HEAD^ git://foo.example.com/example | grep example
>>   ssh://bar.example.com/example
>> 
>> I think that if we use the "principle of least surprise," insteadOf
>> rules shouldn't be applied for git-request-pull URLs.
>
> I'd like to point out a different use that may change your view.  I have
> an insteadOf alias, gh:, that points to GitHub.  Performing the rewrite
> is definitely the right thing to do, since other users may not have my
> alias available.
>
> I agree that in your case, a rewrite seems less appropriate, but I think
> we should only skip the rewrite if the value already matches a valid
> URL.

It would be tricky to define what a valid URL is, though.  Do we
need some way to say "this is a private URL that should not be
given preference when composing a request-pull message"?  E.g.

[url "git://git.dev/"]
insteadOf = https://git.dev/

[url "https://github.com/;]
insteadOf = gh:
private

The former does not mark https://git.dev/ a private one, so a
"request-pull https://git.dev/$thing; would show the original
"https://git.dev/$thing; without rewriting.  The latter marks gh: a
private one so "request-pull gh:$thing" would be rewritten before
exposed to the public as "https://github.com/$thing;

Or something like that?


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-16 Thread Junio C Hamano
sxe...@google.com writes:

> +Detailed design
> +===
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit 
> is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p 
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos  1540841596 -0700
> +committer Stefan Xenos  1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.
> +
> +The tree for meta-commits is always the empty tree whose hash matches
> +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git 
> may
> +attach other trees here. For forward-compatibility fsck should ignore such 
> trees
> +if found on future repository versions. Similarly, current versions of git
> +should always fill in an empty commit comment and tools like fsck should 
> ignore
> +the content of the commit comment if present in a future repository version.
> +This will allow future versions of git to add metadata to the meta-commit
> +comments or tree without breaking forwards compatibility.

Am I correct to understand that the reason why a commit object is
(ab|re)used to represent a meta-commit is because by doing so we
would get connectivity (i.e. fetching & pushing would transfer all
the associated objects along) for free, and by not representing it
as a new and different object type, existing implementations can
just pass them along without understanding what they are, and as
long as these are not mixed as parts of the main history of the
project (e.g. when enumerating commits that has aa7ce5 as its
parents, because somebody else obsoleted aa7ce5 and you want to
evolve anything that built on it, you do not want to mistake the
above "meta" commit as a commit that is part of the ordinary history
and rebuild on top of the new version of aa7ce5, which would lead to
a disaster), everything would work just fine?

Perhaps you'd use something like "presence of parent-type header
marks that a commit is a meta-commit and not part of the main
history".

How are these meta commits anchored so that it won't be reclaimed by
repack?  I do not see any "parent" field used to chain them
together, but I do not think we can afford to spend one ref per meta
commit, as refs are not designed to point into each and every object
in the repository.

I have a moderately strong opposition against "origin" thing.  If
aa7ce555 replaces d664309ee, in order for the tool to be able to
"evolve" other histories that build on top of d664309ee, it only
needs the history between aa7ce555 and d664309ee and it would not
matter how aa7ce555 was built relative to its parent.  The user may
have typed/developed it from scratch, the user may have borrowed 70%
of its change from 7e1bbcd while remaining 30% was done from
scratch, or it was a concatenation of the change made in 7e1bbcd and
another commit.  

One half of my point being that we can do _without_ it, and in all
cases, aa7ce555, if leaving the fact that it was derived from
7e1bbcd is so important, can mention that in its log message how it
relates to the "origin" thing.

And the other half is that while I consider the "origin" thing is
unnecessary for the above reasons, having it means we need to not
just transfer the history reading to aa7ce555 and d664309ee (which
are necessary anyway while we have histories to transplant from
d664309ee to aa7ce555) but also have to pull in the history leading
to 7e1bbcd and we cannot discard it.



Re: [PATCH] bundle: dup() output descriptor closer to point-of-use

2018-11-16 Thread Junio C Hamano
Jeff King  writes:

> Actually, I realized there's an even simpler way to do this. Here it is.
>
> -- >8 --
> Subject: [PATCH] bundle: dup() output descriptor closer to point-of-use
>
> When writing a bundle to a file, the bundle code actually creates
> "your.bundle.lock" using our lockfile interface. We feed that output
> descriptor to a child git-pack-objects via run-command, which has the
> quirk that it closes the output descriptor in the parent.
>
> To avoid confusing the lockfile code (which still thinks the descriptor
> is valid), we dup() it, and operate on the duplicate.

Yes...

> We can solve this by moving the dup() much closer to start_command(),
> shrinking the window in which we have the second descriptor open. It's
> easy to place this in such a way that no die() is possible. We could
> still die due to a signal in the exact wrong moment, but we already
> tolerate races there (e.g., a signal could come before we manage to put
> the file on the cleanup list in the first place).
>
> As a bonus, this shields create_bundle() itself from the duplicate-fd
> trick, and we can simplify its error handling (note that the lock
> rollback now happens unconditionally, but that's OK; it's a noop if we
> didn't open the lock in the first place).

... I found this way too clever for me, but by following the
codeflow, it was easy to convince myself that this does the right
thing.  Almost magical ;-)

Will queue, with a Tested-by: with Dscho's name on it.

TAhanks, both.


Re: [RFC/PATCH] read-cache: write all indexes with the same permissions

2018-11-16 Thread Christian Couder
On Fri, Nov 16, 2018 at 8:20 PM Duy Nguyen  wrote:
>
> On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor  wrote:
>
> > With the default 20% threshold a new shared index is written rather
> > frequently with our usual small test-repos:
>
> Side note. Split index is definitely not meant for small repos.

I very much agree with that. It makes sense to use them only for big
repos and big repos usually don't pass a 20% threshold very often.

> But
> maybe we should have a lower limit (in terms of absolute number of
> entries) that prevent splitting. This splitting seems excessive.

I would agree if split index was the default mode or if our goal was
to eventually make it the default mode.

Or it could be a new "mixed" mode for core.splitIndex (which might
eventually become the default mode) to have no split-index as long as
the repo stays under a lower limit and to automatically use
split-index when the repo gets over the limit.


Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"

2018-11-16 Thread Jeff King
On Fri, Nov 16, 2018 at 08:22:11PM +0100, Ævar Arnfjörð Bjarmason wrote:

> So maybe we should just document this interface better. It seems one
> implicit dependency is that we expect a manpage for the tool to exist
> for --help.

Yeah, I think this really the only problematic assumption. The rest of
"-c", "--git-dir", etc, are just manipulating the environment, and that
gets passed through to sub-invocations of Git (so if I have a script
which calls git-config, it will pick up the "-c" config).

It would be nice if there was a way to ask "is there a manpage?", and
fallback to running "git-cmd --help". But:

  1. I don't think there is a portable way to query that via man. And
 anyway, depending platform and config, it may be opening a browser
 to show HTML documentation (or I think even texinfo?).

  2. We can just ask whether "man git-sizer" (or whatever help display
 command) returned a non-zero exit code, and fall back to "git-sizer
 --help". But there's an infinite-loop possibility here: running
 "git-sizer --help" does what we want. But if "man git-log" failed,
 we'd run "git-log --help", which in turn runs "git help log", which
 runs "man git-log", and so on.

 You can break that loop with an environment variable for "I already
 tried to show the manpage", which would I guess convert "--help" to
 "-h".

So it's maybe do-able, but not quite as trivial as one might hope.

> But I don't remember the details, and can't reproduce it now with:
> 
> $ cat ~/bin/git-sigint 
> #!/usr/bin/env perl
> $SIG{INT} = sub { warn localtime . " " . $$ };
> sleep 1 while 1;
> $ git sigint # or git-sigint
> [... behaves the same either way ...]
> 
> So maybe it was something else, or I'm misremembering...

I think that generally works because hitting ^C is going to send SIGINT
to the whole process group. A more interesting case is:

  git sigint &
  kill -INT $!

There $! is a parent "git" process that is just waiting on git-sigint to
die. But that works, too, because the parent relays common signals due
to 10c6cddd92 (dashed externals: kill children on exit, 2012-01-08). So
you might have been remembering issues prior to that commit (or uncommon
signals; these come from sigchain_push_common).

-Peff


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-16 Thread Duy Nguyen
On Thu, Nov 15, 2018 at 2:00 AM  wrote:
> +Goals
> +-
> +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> +attempted unless they interfere with goals marked with Pn-1.
> +
> +P0. All commands that modify commits (such as the normal commit --amend or
> +rebase command) should mark the old commit as being obsolete and 
> replaced by
> +the new one. No additional commands should be required to keep the
> +obsolescence graph up-to-date.

I sometimes "modify" a commit by "git reset @^", pick up the changes
then "git commit -c @{1}". I don't think this counts as a typical
modification and is probably hard to detect automatically. But I hope
there's some way for me to tell git "yes this is a modified commit of
that one, record that!".

> +Example usage
> +-
> +# First create three dependent changes
> +$ echo foo>bar.txt && git add .
> +$ git commit -m "This is a test"
> +created change metas/this_is_a_test

I guess as an example, how the name metas/this_is_a_test is
constructed does not matter much. But it's probably better to stick
with some sort of id because subject line will change over time and
the original one may become irrelevant. Perhaps we could use the
original commit id as name.

> +$ echo foo2>bar2.txt && git add .
> +$ git commit -m "This is also a test"
> +created change metas/this_is_also_a_test
> +$ echo foo3>bar3.txt && git add .
> +$ git commit -m "More testing"
> +created change metas/more_testing
> +
> +# List all our changes in progress
> +$ git change -l
> +metas/this_is_a_test
> +metas/this_is_also_a_test
> +* metas/more_testing
> +metas/some_change_already_merged_upstream
> +
> +# Now modify the earliest change, using its stable name
> +$ git reset --hard metas/this_is_a_test
> +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
> +
> +# Use git-evolve to fix up any dependent changes
> +$ git evolve
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +Done
> +
> +# Use git-obslog to view the history of the this_is_a_test change
> +$ git obslog
> +93f110 metas/this_is_a_test@{0} commit (amend): This is a test
> +930219 metas/this_is_a_test@{1} commit: This is a test
> +
> +# Now create an unrelated change
> +$ git reset --hard origin/master
> +$ echo newchange>unrelated.txt && git add .
> +$ git commit -m "Unrelated change"
> +created change metas/unrelated_change
> +
> +# Fetch the latest code from origin/master and use git-evolve
> +# to rebase all dependent changes.
> +$ git fetch origin master
> +$ git evolve origin/master
> +deleting metas/some_change_already_merged_upstream
> +rebasing metas/this_is_a_test onto origin/master
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +rebasing metas/unrelated_change onto origin/master
> +Conflict detected! Resolve it and then use git evolve --continue to resume.
> +
> +# Sort out the conflict
> +$ git mergetool
> +$ git evolve --continue
> +Done
> +
> +# Share the full history of edits for the this_is_a_test change
> +# with a review server
> +$ git push origin metas/this_is_a_test:refs/for/master
> +# Share the lastest commit for “Unrelated change”, without history
> +$ git push origin HEAD:refs/for/master

How do we group changes of a topic together? I think branch-diff could
take advantage of that.

> +Detailed design
> +===
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit 
> is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p 
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos  1540841596 -0700
> +committer Stefan Xenos  1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.

This feels a bit forced. Could we just organize it like a normal
history? Something like

*
|\
| * last version of the commit
*
|\
| * second last version of the commit
*
|\

Basically all commits will be linked in a new merge history. Real
commits are on the second parent, first parent is to link changes
together. This makes it possible to just use "git log --first-parent
--patch" (or "git log --oneline --graph") to examine the change. More
details (e.g. parent-type) could be stored as normal trailers in the
commit message of these merges.
-- 
Duy


[PATCH-2] Change writing style

2018-11-16 Thread Jacques Bodin-Hullin
---
 parse-options.c  | 4 ++--
 t/t0040-parse-options.sh | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/parse-options.c b/parse-options.c
index 3b874a83a0c89..f5ef24155950b 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -352,7 +352,7 @@ static void check_typos(const char *arg, const struct 
option *options)
return;
 
if (starts_with(arg, "no-")) {
-   error ("did you mean `--%s` (with two dashes ?)", arg);
+   error("did you mean `--%s` (with two dashes)?", arg);
exit(129);
}
 
@@ -360,7 +360,7 @@ static void check_typos(const char *arg, const struct 
option *options)
if (!options->long_name)
continue;
if (starts_with(options->long_name, arg)) {
-   error ("did you mean `--%s` (with two dashes ?)", arg);
+   error("did you mean `--%s` (with two dashes)?", arg);
exit(129);
}
}
diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index 5b0560fa20e34..c442f9ae15ff8 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -222,7 +222,7 @@ test_expect_success 'non ambiguous option (after two 
options it abbreviates)' '
 '
 
 cat >typo.err <<\EOF
-error: did you mean `--boolean` (with two dashes ?)
+error: did you mean `--boolean` (with two dashes)?
 EOF
 
 test_expect_success 'detect possible typos' '
@@ -232,7 +232,7 @@ test_expect_success 'detect possible typos' '
 '
 
 cat >typo.err <<\EOF
-error: did you mean `--ambiguous` (with two dashes ?)
+error: did you mean `--ambiguous` (with two dashes)?
 EOF
 
 test_expect_success 'detect possible typos' '

--
https://github.com/git/git/pull/540


[PATCH v5 1/1] protocol: advertise multiple supported versions

2018-11-16 Thread Josh Steadmon
Currently the client advertises that it supports the wire protocol
version set in the protocol.version config. However, not all services
support the same set of protocol versions. For example, git-receive-pack
supports v1 and v0, but not v2. If a client connects to git-receive-pack
and requests v2, it will instead be downgraded to v0. Other services,
such as git-upload-archive, do not do any version negotiation checks.

This patch creates a protocol version registry. Individual client and
server programs register all the protocol versions they support prior to
communicating with a remote instance. Versions should be listed in
preference order; the version specified in protocol.version will
automatically be moved to the front of the registry.

The protocol version registry is passed to remote helpers via the
GIT_PROTOCOL environment variable.

Clients now advertise the full list of registered versions. Servers
select the first allowed version from this advertisement.

Additionally, remove special cases around advertising version=0.
Previously we avoided adding version negotiation fields in server
responses if it looked like the client wanted v0. However, including
these fields does not change behavior, so it's better to have simpler
code.

While we're at it, remove unnecessary externs from function declarations
in protocol.h.

Signed-off-by: Josh Steadmon 
---
 builtin/archive.c   |   3 +
 builtin/clone.c |   4 ++
 builtin/fetch-pack.c|   4 ++
 builtin/fetch.c |   5 ++
 builtin/ls-remote.c |   5 ++
 builtin/pull.c  |   5 ++
 builtin/push.c  |   4 ++
 builtin/receive-pack.c  |   3 +
 builtin/send-pack.c |   3 +
 builtin/upload-archive.c|   3 +
 builtin/upload-pack.c   |   4 ++
 connect.c   |  52 +++
 protocol.c  | 122 +---
 protocol.h  |  23 ++-
 remote-curl.c   |  27 +---
 t/t5551-http-fetch-smart.sh |   1 +
 t/t5570-git-daemon.sh   |   2 +-
 t/t5601-clone.sh|  38 +--
 t/t5700-protocol-v1.sh  |   8 +--
 t/t5702-protocol-v2.sh  |  16 +++--
 transport-helper.c  |   6 ++
 21 files changed, 256 insertions(+), 82 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e74f675390..8adb9f381b 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -8,6 +8,7 @@
 #include "transport.h"
 #include "parse-options.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 
 static void create_output_file(const char *output_file)
@@ -94,6 +95,8 @@ int cmd_archive(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   register_allowed_protocol_version(protocol_v0);
+
argc = parse_options(argc, argv, prefix, local_opts, NULL,
 PARSE_OPT_KEEP_ALL);
 
diff --git a/builtin/clone.c b/builtin/clone.c
index fd2c3ef090..1651a950b6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -900,6 +900,10 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec rs = REFSPEC_INIT_FETCH;
struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
fetch_if_missing = 0;
 
packet_trace_identity("clone");
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 1a1bc63566..cba935e4d3 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -57,6 +57,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
 
fetch_if_missing = 0;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
packet_trace_identity("fetch-pack");
 
memset(, 0, sizeof(args));
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..2a20cf8bfd 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -21,6 +21,7 @@
 #include "argv-array.h"
 #include "utf8.h"
 #include "packfile.h"
+#include "protocol.h"
 #include "list-objects-filter-options.h"
 
 static const char * const builtin_fetch_usage[] = {
@@ -1476,6 +1477,10 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
int prune_tags_ok = 1;
struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
packet_trace_identity("fetch");
 
fetch_if_missing = 0;
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 1a25df7ee1..ea685e8bb9 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "protocol.h"
 

[PATCH v5 0/1] Advertise multiple supported proto versions

2018-11-16 Thread Josh Steadmon
Changes from V4: remove special cases around advertising version=0.

Add a registry of supported wire protocol versions that individual
commands can use to declare supported versions before contacting a
server. The client will then advertise all supported versions, while the
server will choose the first allowed version from the advertised
list.

Every command that acts as a client or server must now register its
supported protocol versions.


Josh Steadmon (1):
  protocol: advertise multiple supported versions

 builtin/archive.c   |   3 +
 builtin/clone.c |   4 ++
 builtin/fetch-pack.c|   4 ++
 builtin/fetch.c |   5 ++
 builtin/ls-remote.c |   5 ++
 builtin/pull.c  |   5 ++
 builtin/push.c  |   4 ++
 builtin/receive-pack.c  |   3 +
 builtin/send-pack.c |   3 +
 builtin/upload-archive.c|   3 +
 builtin/upload-pack.c   |   4 ++
 connect.c   |  52 +++
 protocol.c  | 122 +---
 protocol.h  |  23 ++-
 remote-curl.c   |  27 +---
 t/t5551-http-fetch-smart.sh |   1 +
 t/t5570-git-daemon.sh   |   2 +-
 t/t5601-clone.sh|  38 +--
 t/t5700-protocol-v1.sh  |   8 +--
 t/t5702-protocol-v2.sh  |  16 +++--
 transport-helper.c  |   6 ++
 21 files changed, 256 insertions(+), 82 deletions(-)

Range-diff against v4:
1:  3c023991c0 ! 1:  60f6f2fbd8 protocol: advertise multiple supported versions
@@ -21,6 +21,12 @@
 Clients now advertise the full list of registered versions. Servers
 select the first allowed version from this advertisement.
 
+Additionally, remove special cases around advertising version=0.
+Previously we avoided adding version negotiation fields in server
+responses if it looked like the client wanted v0. However, including
+these fields does not change behavior, so it's better to have simpler
+code.
+
 While we're at it, remove unnecessary externs from function 
declarations
 in protocol.h.
 
@@ -245,18 +251,21 @@
  {
struct child_process *conn;
 @@
+   prog, path, 0,
target_host, 0);
  
-   /* If using a new version put that stuff here after a second null byte 
*/
+-  /* If using a new version put that stuff here after a second null byte 
*/
 -  if (version > 0) {
-+  if (strcmp(version_advert->buf, "version=0")) {
-   strbuf_addch(, '\0');
+-  strbuf_addch(, '\0');
 -  strbuf_addf(, "version=%d%c",
 -  version, '\0');
-+  strbuf_addf(, "%s%c", version_advert->buf, '\0');
-   }
+-  }
++  /* Add version fields after a second null byte */
++  strbuf_addch(, '\0');
++  strbuf_addf(, "%s%c", version_advert->buf, '\0');
  
packet_write(fd[1], request.buf, request.len);
+ 
 @@
   */
  static void push_ssh_options(struct argv_array *args, struct argv_array 
*env,
@@ -264,9 +273,9 @@
 -   enum protocol_version version, int flags)
 +   const struct strbuf *version_advert, int flags)
  {
-   if (variant == VARIANT_SSH &&
+-  if (variant == VARIANT_SSH &&
 -  version > 0) {
-+  strcmp(version_advert->buf, "version=0")) {
++  if (variant == VARIANT_SSH) {
argv_array_push(args, "-o");
argv_array_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT);
 -  argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d",
@@ -346,13 +355,13 @@
 -  if (version > 0) {
 -  argv_array_pushf(>env_array, 
GIT_PROTOCOL_ENVIRONMENT "=version=%d",
 -   version);
-+  if (strcmp(version_advert.buf, "version=0")) {
-+  argv_array_pushf(>env_array,
-+   GIT_PROTOCOL_ENVIRONMENT "=%s",
-+   version_advert.buf);
-   }
+-  }
++  argv_array_pushf(>env_array,
++   GIT_PROTOCOL_ENVIRONMENT "=%s",
++   version_advert.buf);
}
argv_array_push(>args, cmd.buf);
+ 
 
  diff --git a/protocol.c b/protocol.c
  --- a/protocol.c
@@ -573,8 +582,7 @@
 +static int get_client_protocol_http_header(const struct strbuf 
*version_advert,
 + struct strbuf *header)
 +{
-+  if (version_advert->len > 0 &&
-+  strcmp(version_advert->buf, "version=0")) {
++  if (version_advert->len > 0) {
 +  strbuf_addf(header, GIT_PROTOCOL_HEADER ": %s",
 +   

Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-16 Thread Stefan Beller
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> Currently diff --color-moved-ws=allow-indentation-change does not
> support indentation that contains a mix of tabs and spaces. For
> example in commit 546f70f377 ("convert.h: drop 'extern' from function
> declaration", 2018-06-30) the function parameters in the following
> lines are not colored as moved [1].
>
> -extern int stream_filter(struct stream_filter *,
> -const char *input, size_t *isize_p,
> -char *output, size_t *osize_p);
> +int stream_filter(struct stream_filter *,
> + const char *input, size_t *isize_p,
> + char *output, size_t *osize_p);
>
> This commit changes the way the indentation is handled to track the
> visual size of the indentation rather than the characters in the
> indentation. This has they benefit that any whitespace errors do not

s/they/the/

> interfer with the move detection (the whitespace errors will still be
> highlighted according to --ws-error-highlight). During the discussion
> of this feature there were concerns about the correct detection of
> indentation for python. However those concerns apply whether or not
> we're detecting moved lines so no attempt is made to determine if the
> indentation is 'pythonic'.
>
> [1] Note that before the commit to fix the erroneous coloring of moved
> lines each line was colored as a different block, since that commit
> they are uncolored.
>
> Signed-off-by: Phillip Wood 
> ---
>
> Notes:
> Changes since rfc:
>  - It now replaces the existing implementation rather than adding a new
>mode.
>  - The indentation deltas are now calculated once for each line and
>cached.
>  - Optimized the whitespace delta comparison to compare string lengths
>before comparing the actual strings.
>  - Modified the calculation of tabs as suggested by Stefan.
>  - Split out the blank line handling into a separate commit as suggest
>by Stefan.
>  - Fixed some comments pointed out by Stefan.
>
>  diff.c | 130 +
>  t/t4015-diff-whitespace.sh |  56 
>  2 files changed, 129 insertions(+), 57 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index c378ce3daf..89559293e7 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -750,6 +750,8 @@ struct emitted_diff_symbol {
> const char *line;
> int len;
> int flags;
> +   int indent_off;
> +   int indent_width;

So this is the trick how we compute the ws related
data only once per line. :-)

On the other hand, we do not save memory by disabling
the ws detection, but I guess that is not a problem for now.

Would it make sense to have the new variables be
unsigned? (Also a comment on what they are, I
needed to read the code to understand off to be
offset into the line, where the content starts, and
width to be the visual width, as I did not recall
the RFC.)

> +static void fill_es_indent_data(struct emitted_diff_symbol *es)
> [...]

> +   if (o->color_moved_ws_handling &
> +   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +   fill_es_indent_data(>emitted_symbols->buf[n]);

Nice.

By reducing the information kept around to ints, we also do not need
to alloc/free
memory for each line.

> +++ b/t/t4015-diff-whitespace.sh
> @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta 
> incompatible with other space opti
> test_i18ngrep allow-indentation-change err
>  '
>
> +test_expect_success 'compare mixed whitespace delta across moved blocks' '

Looks good,

Thanks!
Stefan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-16 Thread Derrick Stolee

On 11/14/2018 7:55 PM, sxe...@google.com wrote:

From: Stefan Xenos 

This document describes what an obsolescence graph for
git would look like, the behavior of the evolve command,
and the changes planned for other commands.


Thanks for putting this together!


diff --git a/Documentation/technical/evolve.txt 
b/Documentation/technical/evolve.txt

...

+Git Obsolescence Graph
+==
+
+Objective
+-
+Track the edits to a commit over time in an obsolescence graph.


The file name and the title are in a mismatch.

I'd prefer if the title was "Git Evolve Design Document" and this 
opening paragraph

was about the reasons we want a 'git evolve' command. Here is my attempt:

  The proposed 'git evolve' command will help users craft a 
high-quality commit
  history in their topic branches. By working to improve commits one at 
a time,
  then running 'git evolve', users can rewrite recent history with more 
options
  than interactive rebase. The core benefit is that users can pause 
their progress
  and move to other branches before returning to where they left off. 
Users can
  also share progress with others using standard 'push', 'fetch', and 
'format-patch'

  commands.


+Background
+--


Perhaps you can call this "Example"?


+Imagine you have three dependent changes up for review and you receive feedback
+that requires editing all three changes. While you're editing one, more 
feedback
+arrives on one of the others. What do you do?


"three dependent changes" sounds a bit vague enough to me to possibly 
confuse readers. Perhaps

"three sequential patches"?


+- Users can view the history of a commit directly (the sequence of amends and
+  rebases it has undergone, orthogonal to the history of the branch it is on).


"the history of a commit" doesn't semantically work, as a commit is an 
immutable Git object.


Instead, I would try to use the term "patch" to describe a change to the 
codebase, and that
takes the form as a list of commits that are improving on each other 
(but don't actually
have each other in their commit history). This means that the lifetime 
of a patch is described

by the commits that are amended or rebased.


+- By pushing and pulling the obsolescence graph, users can collaborate more
+  easily on changes-in-progress. This is better than pushing and pulling the
+  changes themselves since the obsolescence graph can be used to locate a more
+  specific merge base, allowing for better merges between different versions of
+  the same change.


(Making a note so I come back to this. I hope to learn what you mean by 
this "more specific

merge base".)


+
+Similar technologies
+
... It can't handle the case where you have
+multiple changes sharing the same parent when that parent needs to be rebased


Perhaps this could be made more concrete by describing commit history 
and a specific workflow

change using 'git evolve'.

Suppose we have two topic branches, topic1 and topic2, that point to 
commits A and B,
respectively.Suppose further that A and B have a common parent C with 
parent D. If we rebase
topic1 relativeto D, then we create new commits C' and A' that are newer 
versions of commits
C and A. It would benice to easily update topic2 to be on a new commit 
B' with parent C'.
Currently, a user needs to knowthat C updated to C', and use 'git rebase 
--onto C' C topic2'.
Instead, if we have a marker showing thatC' is an updated version of C, 
'git log topic2'
would show that topic2 can be updated, and the 'gitevolve' command would 
perform the correct

action to make B' with parent C'.

(This paragraph above is an example of "what can happen now is 
complicated and demands that
the user keep some information in their memory" and "the new workflow is 
simpler and helps
users make the right decision". I think we could use more of these at 
the start to sell the

idea.)



+and won't let you collaborate with others on resolving a complicated 
interactive
+rebase.


In the same sentence, we have an even more complicated workflow 
mentioned as an aside. This
could be fleshed out more concretely. It could help describing that the 
current model is for
usersto share "!fixup" commits and then one performs an interactive 
rebase to apply those
fixups inthe correct order. If a user instead shares an amended commit, 
then we are in a
difficult state toapply those changes. The new workflow would be to 
share amended commits

and 'git evolve'inserts the correct amended commits in the right order.

I'm a big proponent of the teaching philosophy of "examples first". It's 
easier to talk

abstractlyafter going through some concrete examples.


  You can think of rebase -i as a top-down approach and the evolve command
+as the bottom-up approach to the same problem.


This comparison is important. Perhaps it is more specific to say 
"interactive rebase splits
a plan torewrite history into independent units of work, while evolve 
collects independent


Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-16 Thread Stefan Beller
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> When running
>
>   git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
>
> cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
> return after comparing a and b. By comparing the lengths first we can
> return early in all but 0.03% of those cases without dereferencing the
> string pointers. The comparison between a and c fails in 6.8% of
> calls, by comparing the lengths first we reject all the failing calls
> without dereferencing the string pointers.
>
> This reduces the time to run the command above by by 42% from 14.6s to
> 8.5s. This is still much slower than the normal --color-moved which
> takes ~0.6-0.7s to run but is a significant improvement.
>
> The next commits will replace the current implementation with one that
> works with mixed tabs and spaces in the indentation. I think it is
> worth optimizing the current implementation first to enable a fair
> comparison between the two implementations.

Up to here the series looks good and I think we could take it
as a preparatory self-standing series.

I'll read on.
Thanks,
Stefan


Re: [PATCH 2/3] remote-curl: tighten "version 2" check for smart-http

2018-11-16 Thread Josh Steadmon
On 2018.11.16 03:48, Jeff King wrote:
> In a v2 smart-http conversation, the server should reply to our initial
> request with a pkt-line saying "version 2" (this is the start of the
> "capabilities advertisement"). We check for the string using
> starts_with(), but that's overly permissive (it would match "version
> 20", for example).
> 
> Let's tighten this check to use strcmp(). Note that we don't need to
> worry about a trailing newline here, because the ptk-line code will have
> chomped it for us already.
> 
> Signed-off-by: Jeff King 
> ---
>  remote-curl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index dd9bc41aa1..3c9c4a07c3 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const 
> char *service,
>   d->len = src_len;
>   d->proto_git = 1;
>  
> - } else if (starts_with(line, "version 2")) {
> + } else if (!strcmp(line, "version 2")) {
>   /*
>* v2 smart http; do not consume version packet, which will
>* be handled elsewhere.
> -- 
> 2.19.1.1636.gc7a073d580
> 

Looks good to me.

Reviewed-by: Josh Steadmon 


Re: [PATCH 1/3] remote-curl: refactor smart-http discovery

2018-11-16 Thread Josh Steadmon
On 2018.11.16 03:47, Jeff King wrote:
> After making initial contact with an http server, we have to decide if
> the server supports smart-http, and if so, which version. Our rules are
> a bit inconsistent:
> 
>   1. For v0, we require that the content-type indicates a smart-http
>  response. We also require the response to look vaguely like a
>  pkt-line starting with "#". If one of those does not match, we fall
>  back to dumb-http.
> 
>  But according to our http protocol spec[1]:
> 
>Dumb servers MUST NOT return a return type starting with
>`application/x-git-`.
> 
>  If we see the expected content-type, we should consider it
>  smart-http. At that point we can parse the pkt-line for real, and
>  complain if it is not syntactically valid.
> 
>   2. For v2, we do not actually check the content-type. Our v2 protocol
>  spec says[2]:
> 
>When using the http:// or https:// transport a client makes a
>"smart" info/refs request as described in `http-protocol.txt`[...]
> 
>  and the http spec is clear that for a smart-http[3]:
> 
>The Content-Type MUST be `application/x-$servicename-advertisement`.
> 
>  So it is required according to the spec.
> 
> These inconsistencies were easy to miss because of the way the original
> code was written as an inline conditional. Let's pull it out into its
> own function for readability, and improve a few things:
> 
>  - we now predicate the smart/dumb decision entirely on the presence of
>the correct content-type
> 
>  - we do a real pkt-line parse before deciding how to proceed (and die
>if it isn't valid)
> 
>  - use skip_prefix() for comparing service strings, instead of
>constructing expected output in a strbuf; this avoids dealing with
>memory cleanup
> 
> Note that this _is_ tightening what the client will allow. It's all
> according to the spec, but it's possible that other implementations
> might violate these. However, violating these particular rules seems
> like an odd choice for a server to make.
> 
> [1] Documentation/technical/http-protocol.txt, l. 166-167
> [2] Documentation/technical/protocol-v2.txt, l. 63-64
> [3] Documentation/technical/http-protocol.txt, l. 247
> 
> Helped-by: Josh Steadmon 
> Signed-off-by: Jeff King 
> ---
>  remote-curl.c | 93 ---
>  1 file changed, 59 insertions(+), 34 deletions(-)
> 
> diff --git a/remote-curl.c b/remote-curl.c
> index 762a55a75f..dd9bc41aa1 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -330,9 +330,65 @@ static int get_protocol_http_header(enum 
> protocol_version version,
>   return 0;
>  }
>  
> +static void check_smart_http(struct discovery *d, const char *service,
> +  struct strbuf *type)
> +{
> + char *src_buf;
> + size_t src_len;
> + char *line;
> + const char *p;
> +
> + /*
> +  * If we don't see x-$service-advertisement, then it's not smart-http.
> +  * But once we do, we commit to it and assume any other protocol
> +  * violations are hard errors.
> +  */
> + if (!skip_prefix(type->buf, "application/x-", ) ||
> + !skip_prefix(p, service, ) ||
> + strcmp(p, "-advertisement"))
> + return;
> +
> + /*
> +  * "Peek" at the first packet by using a separate buf/len pair; some
> +  * cases below require us leaving the originals intact.
> +  */
> + src_buf = d->buf;
> + src_len = d->len;
> + line = packet_read_line_buf(_buf, _len, NULL);
> + if (!line)
> + die("invalid server response; expected service, got flush 
> packet");
> +
> + if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) {
> + /*
> +  * The header can include additional metadata lines, up
> +  * until a packet flush marker.  Ignore these now, but
> +  * in the future we might start to scan them.
> +  */
> + while (packet_read_line_buf(_buf, _len, NULL))
> + ;
> +
> + /*
> +  * v0 smart http; callers expect us to soak up the
> +  * service and header packets
> +  */
> + d->buf = src_buf;
> + d->len = src_len;
> + d->proto_git = 1;
> +
> + } else if (starts_with(line, "version 2")) {
> + /*
> +  * v2 smart http; do not consume version packet, which will
> +  * be handled elsewhere.
> +  */
> + d->proto_git = 1;
> +
> + } else {
> + die("invalid server response; got '%s'", line);
> + }
> +}
> +
>  static struct discovery *discover_refs(const char *service, int for_push)
>  {
> - struct strbuf exp = STRBUF_INIT;
>   struct strbuf type = STRBUF_INIT;
>   struct strbuf charset = STRBUF_INIT;
>   struct strbuf buffer = STRBUF_INIT;
> @@ -405,38 +461,8 @@ static struct discovery 

Re: [PATCH 0/3] remote-curl smart-http discovery cleanup

2018-11-16 Thread Josh Steadmon
On 2018.11.16 03:44, Jeff King wrote:
[...]
> Amusingly, this does break the test you just added, because it tries to
> issue an ERR after claiming "text/html" (and after my patch, we
> correctly fall back to dumb-http).

Heh yeah, I copied the script from a dumb-http test without reading the
spec.


Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-16 Thread Josh Steadmon
On 2018.11.16 11:45, Junio C Hamano wrote:
> Josh Steadmon  writes:
> 
> >> What I was alludding to was a lot simpler, though.  An advert string
> >> "version=0:version=1" from a client that prefers version 0 won't be
> >> !strcmp("version=0", advert) but seeing many strcmp() in the patch
> >> made me wonder.
> >
> > Ah I see. The strcmp()s against "version=0" are special cases for where
> > it looks like the client does not understand any sort of version
> > negotiation. If we see multiple versions listed in the advert, then the
> > rest of the selection logic should do the right thing.
> 
> OK, let me try to see if I understand correctly by rephrasing.
> 
> If the client does not say anything about which version it prefers
> (because it only knows about version 0 without even realizing that
> there is a version negotiation exchange), we substitute the lack of
> proposed versions with string "version=0", and the strcmp()s I saw
> and was puzzled by are all about special casing such a client.  But
> we would end up behaving the same way (at least when judged only by
> externally visible effects) to a client that is aware of version
> negotiation and tells us it prefers version 0 (and nothing else)
> with the selection logic anyway.
> 
> Did I get it right?  If so, yeah, I think it makes sense to avoid
> two codepaths that ends up doing the same thing by removing the
> special case.

Yes, that is correct. The next version will remove the special cases
here.

> > However, I think that it might work to remove the special cases. In the
> > event that the client is so old that it doesn't understand any form of
> > version negotiation, it should just ignore the version fields /
> > environment vars. If you think it's cleaner to remove the special cases,
> > let me know.


Re: [PATCH v2] read-cache: write all indexes with the same permissions

2018-11-16 Thread Christian Couder
On Fri, Nov 16, 2018 at 7:29 PM SZEDER Gábor  wrote:
>
> On Fri, Nov 16, 2018 at 06:31:05PM +0100, Christian Couder wrote:
> > diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> > index 2ac47aa0e4..fa1d3d468b 100755
> > --- a/t/t1700-split-index.sh
> > +++ b/t/t1700-split-index.sh
> > @@ -381,6 +381,26 @@ test_expect_success 'check 
> > splitIndex.sharedIndexExpire set to "never" and "now"
> >   test $(ls .git/sharedindex.* | wc -l) -le 2
> >  '
> >
> > +test_expect_success POSIXPERM 'same mode for index & split index' '
> > + git init same-mode &&
> > + (
> > + cd same-mode &&
> > + test_commit A &&
> > + test_modebits .git/index >index_mode &&
> > + test_must_fail git config core.sharedRepository &&
> > + git -c core.splitIndex=true status &&
> > + shared=$(ls .git/sharedindex.*) &&
>
> I think the command substitution and 'ls' are unnecessary, and
>
>   shared=.git/sharedindex.*
>
> would work as well.

If there is no shared index file with the above we would get:

shared=.git/sharedindex.*
$ echo $shared
.git/sharedindex.*

which seems bug prone to me.


Re: [PATCH v2] read-cache: write all indexes with the same permissions

2018-11-16 Thread Ævar Arnfjörð Bjarmason


On Fri, Nov 16 2018, SZEDER Gábor wrote:

> On Fri, Nov 16, 2018 at 06:31:05PM +0100, Christian Couder wrote:
>> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
>> index 2ac47aa0e4..fa1d3d468b 100755
>> --- a/t/t1700-split-index.sh
>> +++ b/t/t1700-split-index.sh
>> @@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire 
>> set to "never" and "now"
>>  test $(ls .git/sharedindex.* | wc -l) -le 2
>>  '
>>
>> +test_expect_success POSIXPERM 'same mode for index & split index' '
>> +git init same-mode &&
>> +(
>> +cd same-mode &&
>> +test_commit A &&
>> +test_modebits .git/index >index_mode &&
>> +test_must_fail git config core.sharedRepository &&
>> +git -c core.splitIndex=true status &&
>> +shared=$(ls .git/sharedindex.*) &&
>
> I think the command substitution and 'ls' are unnecessary, and
>
>   shared=.git/sharedindex.*
>
> would work as well.

Looks like it. FWIW I just copy/pasted what an adjacent test was doing
for consistency, which was added in one of Christian's earlier changes
to this behavior.

But yeah, if the test can be made simpler in a portable way it would
make sense to make this a two-parter test cleanup & bug fix series.

>> +case "$shared" in
>> +*" "*)
>> +# we have more than one???
>> +false ;;
>> +*)
>> +test_modebits "$shared" >split_index_mode &&
>> +test_cmp index_mode split_index_mode ;;
>> +esac
>> +)
>> +'
>> +
>>  while read -r mode modebits
>>  do
>>  test_expect_success POSIXPERM "split index respects 
>> core.sharedrepository $mode" '
>> --
>> 2.19.1.1053.g063ed687ac
>>


Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"

2018-11-16 Thread Ævar Arnfjörð Bjarmason


On Fri, Nov 16 2018, Michael Haggerty wrote:

> On Fri, Nov 16, 2018 at 11:38 AM Ævar Arnfjörð Bjarmason
>  wrote:
>> [...]
>> A follow-up on this: We should really fix this for other
>> reasons. I.e. compile in some "this is stuff we ourselves think is in
>> git".
>>
>> There's other manifestations of this, e.g.:
>>
>> git-sizer --help # => shows you help
>> git sizer --help # => says it doesn't have a manpage
>>
>> Because we aren't aware that git-sizer is some external tool, and that
>> we should route --help to it.
>
> That would be nice. This has been an annoying for several tools named
> `git-foo` that I have worked on (e.g., git-sizer, git-imerge,
> git-when-merged, plus many internal tools).
>
>> Non-withstanding the arguable bug that things like git-sizer shouldn't
>> be allowing themselves to be invoked by "git" like that without
>> guaranteeing that it can consume all the options 'git' expects. When I
>> had to deal with a similar problem in an external git-* command I was
>> maintaining I simply made it an error to invoke it as "git mything"
>> instead of "git-mything".
>
> Hmmm, I always thought that it was intended and supported that an
> external tool can name itself `git-foo` so that it can be invoked as
> `git foo`.
>
> Which git options do you think that such a tool should be expected to
> support? Many useful ones, like `-C `, `--git-dir=`,
> `--work-tree=`, `-c =`, and `--no-replace-objects`,
> work pretty much for free if the external tool uses `git` to interact
> with the repository. I use such options regularly with external tools.
> IMO it would be a regression for these tools to refuse to run when
> invoked as, say, `git -C path/to/repo foo`.

I don't mean we should forbid it, just that if you have an external
git-foo tool meant to be invoked like "git-foo" that and not "git foo"
it might be worth to save yourself the trouble and not support the
latter. I thought git-sizer was one such tool, since it docs just
mention "git-sizer".

But yeah, "-c" etc. are useful, and we definitely want to support this
in the general case. E.g. for "git-annex" and others that are meant to
be used like that.

So maybe we should just document this interface better. It seems one
implicit dependency is that we expect a manpage for the tool to exist
for --help.

Or, keep some list of internal git tools and treat them specially. But I
see now that would break "git annex --help" in the other direction, but
maybe that would be better. I don't know.

As I recall I stopped supporting "git" invocation for the tool I was
fixing a long time ago because of some funny interaction with terminals
& signals. I.e. git itself would eat some of them, but the tool wanted
to handle it instead.

But I don't remember the details, and can't reproduce it now with:

$ cat ~/bin/git-sigint 
#!/usr/bin/env perl
$SIG{INT} = sub { warn localtime . " " . $$ };
sleep 1 while 1;
$ git sigint # or git-sigint
[... behaves the same either way ...]

So maybe it was something else, or I'm misremembering...


Re: [RFC/PATCH] read-cache: write all indexes with the same permissions

2018-11-16 Thread Duy Nguyen
On Fri, Nov 16, 2018 at 8:07 PM SZEDER Gábor  wrote:
>
> On Fri, Nov 16, 2018 at 06:41:43PM +0100, Christian Couder wrote:
> > On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason
> >  wrote:
>
> > > I'm asking whether the bug in this patch isn't revealing an existing
> > > issue with us not having any tests for N number of sharedindex.*
> > > files. I.e. we have >1 of them, merge them and prune them, don't we?
>
> We don't merge shared index files, but write a new one.

True. They are immutable like git objects.

> With the default 20% threshold a new shared index is written rather
> frequently with our usual small test-repos:

Side note. Split index is definitely not meant for small repos. But
maybe we should have a lower limit (in terms of absolute number of
entries) that prevent splitting. This splitting seems excessive.

>   $ git init
>   $ git update-index --split-index
>   $ ls -1 .git/*index*
>   .git/index
>   .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113
>   $ echo 1 >file
>   $ git add file
>   $ git commit -q -m 1
>   $ echo 2 >file
>   $ git commit -q -m 2 file
>   $ echo 3 >file
>   $ git commit -q -m 3 file
>   $ ls -1 .git/*index*
>   .git/index
>   .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113
>   .git/sharedindex.6aedbf71b1a6bdc0018078ec7571e1b21ba4b954
>   .git/sharedindex.b9106e9b82a818a0e4e9148224fc44ea98f488a1
>   .git/sharedindex.bad0b75d270a431b9e961cfc15df6ec935a67be8
-- 
Duy


Re: [PATCH v2] read-cache: write all indexes with the same permissions

2018-11-16 Thread Duy Nguyen
On Fri, Nov 16, 2018 at 8:10 PM Christian Couder
 wrote:
>
> On Fri, Nov 16, 2018 at 7:03 PM Duy Nguyen  wrote:
> >
> > On Fri, Nov 16, 2018 at 6:31 PM Christian Couder
> >  wrote:
> > > diff --git a/read-cache.c b/read-cache.c
> > > index 8c924506dd..ea80600bff 100644
> > > --- a/read-cache.c
> > > +++ b/read-cache.c
> > > @@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, 
> > > struct lock_file *lock,
> > > struct tempfile *temp;
> > > int saved_errno;
> > >
> > > -   temp = mks_tempfile(git_path("sharedindex_XX"));
> > > +   /* Same permissions as the main .git/index file */
> >
> > If the permission is already correct from the beginning (of this temp
> > file), should df801f3f9f be reverted since we don't need to adjust
> > permission anymore?
>
> df801f3f9f (read-cache: use shared perms when writing shared index,
> 2017-06-25) was fixing the bug that permissions of the shared index
> file did not take into account the shared permissions (which are about
> core.sharedRepository; "shared" has a different meaning in "shared
> index file" and in "shared permissions").
>
> This fix only changes permissions before the shared permissions are
> taken into account (so before adjust_shared_perm() is called).
>
> > Or does $GIT_DIR/index go through the same adjust_shared_perm() anyway
> > in the end, which means df801f3f9f must stay?
>
> Yeah, $GIT_DIR/index goes through adjust_shared_perm() too because
> create_tempfile() calls adjust_shared_perm(). So indeed df801f3f9f
> must stay.

Ah thanks. By the time I got to this part

> Let's instead make the two consistent by using mks_tempfile_sm() and
> passing 0666 in its `mode` argument.

went look at that function and back, I forgot about the paragraph above it.
-- 
Duy


Re: [PATCH v2] read-cache: write all indexes with the same permissions

2018-11-16 Thread Christian Couder
On Fri, Nov 16, 2018 at 7:03 PM Duy Nguyen  wrote:
>
> On Fri, Nov 16, 2018 at 6:31 PM Christian Couder
>  wrote:
> > diff --git a/read-cache.c b/read-cache.c
> > index 8c924506dd..ea80600bff 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, 
> > struct lock_file *lock,
> > struct tempfile *temp;
> > int saved_errno;
> >
> > -   temp = mks_tempfile(git_path("sharedindex_XX"));
> > +   /* Same permissions as the main .git/index file */
>
> If the permission is already correct from the beginning (of this temp
> file), should df801f3f9f be reverted since we don't need to adjust
> permission anymore?

df801f3f9f (read-cache: use shared perms when writing shared index,
2017-06-25) was fixing the bug that permissions of the shared index
file did not take into account the shared permissions (which are about
core.sharedRepository; "shared" has a different meaning in "shared
index file" and in "shared permissions").

This fix only changes permissions before the shared permissions are
taken into account (so before adjust_shared_perm() is called).

> Or does $GIT_DIR/index go through the same adjust_shared_perm() anyway
> in the end, which means df801f3f9f must stay?

Yeah, $GIT_DIR/index goes through adjust_shared_perm() too because
create_tempfile() calls adjust_shared_perm(). So indeed df801f3f9f
must stay.

> If so, perhaps clarify
> that in the commit message.

There is already the following about df801f3f9f:

---
A bug related to this was spotted, fixed and tested for in df801f3f9f
("read-cache: use shared perms when writing shared index", 2017-06-25)
and 3ee83f48e5 ("t1700: make sure split-index respects
core.sharedrepository", 2017-06-25).

However, as noted in those commits we'd still create the file as 0600,
and would just re-chmod it depending on the setting of
core.sharedRepository.
---

So I would think that df801f3f9f should perhaps have explained that
create_tempfile() calls adjust_shared_perm(), but I don't think that
it is very relevant in this commit message. We are already talking
about df801f3f9f which should be enough to explain the issue
df801f3f9f fixed, and I think we should not need to explain in more
details why df801f3f9f did a good job. It's not as if we are reverting
it. We are just complementing it with another fix related to what
happens before adjust_shared_perm() is called.

I think rewording the comment a bit might help though, for example maybe:

/* Same initial permissions as the main .git/index file */

instead of:

/* Same permissions as the main .git/index file */

Thanks,
Christian.


Re: [RFC/PATCH] read-cache: write all indexes with the same permissions

2018-11-16 Thread SZEDER Gábor
On Fri, Nov 16, 2018 at 06:41:43PM +0100, Christian Couder wrote:
> On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason
>  wrote:

> > I'm asking whether the bug in this patch isn't revealing an existing
> > issue with us not having any tests for N number of sharedindex.*
> > files. I.e. we have >1 of them, merge them and prune them, don't we?

We don't merge shared index files, but write a new one.

> I think we shouldn't have many of them. Usually we should have just
> one, though it's possible that switching the shared index files
> feature on and off several times or using temporary index files could
> create more than one.
>
> And there is clean_shared_index_files() which calls
> should_delete_shared_index() to make sure they are regularly cleaned
> up.

I would think that it's more common to have more than one shared index
files, because a new shared index is written when the number of
entries in the split index reaches the threshold given in
'splitIndex.maxPercentChange'.  The old ones are kept until they
expire, and 'splitIndex.sharedIndexExpire' defaults to 2 weeks (and
can even be be set to "never").

With the default 20% threshold a new shared index is written rather
frequently with our usual small test-repos:

  $ git init
  $ git update-index --split-index
  $ ls -1 .git/*index*
  .git/index
  .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113
  $ echo 1 >file
  $ git add file
  $ git commit -q -m 1
  $ echo 2 >file
  $ git commit -q -m 2 file
  $ echo 3 >file
  $ git commit -q -m 3 file
  $ ls -1 .git/*index*
  .git/index
  .git/sharedindex.4370042739b31cd17a5c5cd6043a77c9a00df113
  .git/sharedindex.6aedbf71b1a6bdc0018078ec7571e1b21ba4b954
  .git/sharedindex.b9106e9b82a818a0e4e9148224fc44ea98f488a1
  .git/sharedindex.bad0b75d270a431b9e961cfc15df6ec935a67be8

> Anyway it's a different topic and according to what we privately
> discussed I just sent
> https://public-inbox.org/git/20181116173105.21784-1-chrisc...@tuxfamily.org/
> to fix the current issue.


Re: [PATCH v1 2/9] diff: use whitespace consistently

2018-11-16 Thread Stefan Beller
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> Most of the documentation uses 'whitespace' rather than 'white space'
> or 'white spaces' convert to latter two to the former for consistency.

Makes sense; this doesn't touch docs, but also code.
$ git grep "white space" yields some other places
as well (Documentation/git-cat-file.txt and lots in t/)
But I guess we keep it to this feature for now instead
of a tree wide cleanup.

Stefan


Re: [PATCH v2] read-cache: write all indexes with the same permissions

2018-11-16 Thread SZEDER Gábor
On Fri, Nov 16, 2018 at 06:31:05PM +0100, Christian Couder wrote:
> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index 2ac47aa0e4..fa1d3d468b 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire 
> set to "never" and "now"
>   test $(ls .git/sharedindex.* | wc -l) -le 2
>  '
>  
> +test_expect_success POSIXPERM 'same mode for index & split index' '
> + git init same-mode &&
> + (
> + cd same-mode &&
> + test_commit A &&
> + test_modebits .git/index >index_mode &&
> + test_must_fail git config core.sharedRepository &&
> + git -c core.splitIndex=true status &&
> + shared=$(ls .git/sharedindex.*) &&

I think the command substitution and 'ls' are unnecessary, and

  shared=.git/sharedindex.*

would work as well.

> + case "$shared" in
> + *" "*)
> + # we have more than one???
> + false ;;
> + *)
> + test_modebits "$shared" >split_index_mode &&
> + test_cmp index_mode split_index_mode ;;
> + esac
> + )
> +'
> +
>  while read -r mode modebits
>  do
>   test_expect_success POSIXPERM "split index respects 
> core.sharedrepository $mode" '
> -- 
> 2.19.1.1053.g063ed687ac
> 


Re: [PATCH v2] read-cache: write all indexes with the same permissions

2018-11-16 Thread Duy Nguyen
On Fri, Nov 16, 2018 at 6:31 PM Christian Couder
 wrote:
> diff --git a/read-cache.c b/read-cache.c
> index 8c924506dd..ea80600bff 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, 
> struct lock_file *lock,
> struct tempfile *temp;
> int saved_errno;
>
> -   temp = mks_tempfile(git_path("sharedindex_XX"));
> +   /* Same permissions as the main .git/index file */

If the permission is already correct from the beginning (of this temp
file), should df801f3f9f be reverted since we don't need to adjust
permission anymore?

Or does $GIT_DIR/index go through the same adjust_shared_perm() anyway
in the end, which means df801f3f9f must stay? If so, perhaps clarify
that in the commit message.

> +   temp = mks_tempfile_sm(git_path("sharedindex_XX"), 0, 
> 0666);
> if (!temp) {
> oidclr(>base_oid);
> ret = do_write_locked_index(istate, lock, flags);
-- 
Duy


Re: [RFC/PATCH] read-cache: write all indexes with the same permissions

2018-11-16 Thread Christian Couder
On Tue, Nov 13, 2018 at 6:34 PM Ævar Arnfjörð Bjarmason
 wrote:
>
> On Tue, Nov 13 2018, Duy Nguyen wrote:
>
> > On Tue, Nov 13, 2018 at 4:32 PM Ævar Arnfjörð Bjarmason
> >  wrote:
> > I don't have any bright idea how to catch the literal _X file.
> > It's a temporary file and will not last long enough for us to verify
> > unless we intercept open() calls with LD_PRELOAD.
>
> Sorry for being unclear. I don't mean how can we catch this specific
> bug, that would be uninteresting and hard to test for.
>
> I'm asking whether the bug in this patch isn't revealing an existing
> issue with us not having any tests for N number of sharedindex.*
> files. I.e. we have >1 of them, merge them and prune them, don't we?

I think we shouldn't have many of them. Usually we should have just
one, though it's possible that switching the shared index files
feature on and off several times or using temporary index files could
create more than one.

And there is clean_shared_index_files() which calls
should_delete_shared_index() to make sure they are regularly cleaned
up.

Anyway it's a different topic and according to what we privately
discussed I just sent
https://public-inbox.org/git/20181116173105.21784-1-chrisc...@tuxfamily.org/
to fix the current issue.


[PATCH v2] read-cache: write all indexes with the same permissions

2018-11-16 Thread Christian Couder
From: Ævar Arnfjörð Bjarmason 

Change the code that writes out the shared index to use
mks_tempfile_sm() instead of mks_tempfile().

The create_tempfile() function is used to write out the main
".git/index" (via ".git/index.lock") using lock_file(). The
create_tempfile() function respects the umask, as it uses open() with
0666, whereas the mks_tempfile() function uses open() with 0600.

So mks_tempfile() which is used to create the shared index file is
likely to create such a file with restricted permissions compared to
the main ".git/index" file.

A bug related to this was spotted, fixed and tested for in df801f3f9f
("read-cache: use shared perms when writing shared index", 2017-06-25)
and 3ee83f48e5 ("t1700: make sure split-index respects
core.sharedrepository", 2017-06-25).

However, as noted in those commits we'd still create the file as 0600,
and would just re-chmod it depending on the setting of
core.sharedRepository. So without core.splitIndex a system with
e.g. the umask set to group writeability would work for the members of
the group, but not with core.splitIndex set, as members of the group
would not be able to access the shared index file.

Let's instead make the two consistent by using mks_tempfile_sm() and
passing 0666 in its `mode` argument.

Note that we cannot use the create_tempfile() function itself that is
used to write the main ".git/index" file because we want the XX
part of the "sharedindex_XX" argument to be replaced by a pseudo
random value and create_tempfile() doesn't do that.

Ideally we'd split up the adjust_shared_perm() function to one that
can give us the mode we want so we could just call open() instead of
open() followed by chmod(), but that's an unrelated cleanup. We
already have that minor issue with the "index" file #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Christian Couder 
---

This is a simpler fix iterating from Ævar's RFC patch and the
following discussions:

https://public-inbox.org/git/20181113153235.25402-1-ava...@gmail.com/

 read-cache.c   |  3 ++-
 t/t1700-split-index.sh | 20 
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 8c924506dd..ea80600bff 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, struct 
lock_file *lock,
struct tempfile *temp;
int saved_errno;
 
-   temp = mks_tempfile(git_path("sharedindex_XX"));
+   /* Same permissions as the main .git/index file */
+   temp = mks_tempfile_sm(git_path("sharedindex_XX"), 0, 0666);
if (!temp) {
oidclr(>base_oid);
ret = do_write_locked_index(istate, lock, flags);
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..fa1d3d468b 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire 
set to "never" and "now"
test $(ls .git/sharedindex.* | wc -l) -le 2
 '
 
+test_expect_success POSIXPERM 'same mode for index & split index' '
+   git init same-mode &&
+   (
+   cd same-mode &&
+   test_commit A &&
+   test_modebits .git/index >index_mode &&
+   test_must_fail git config core.sharedRepository &&
+   git -c core.splitIndex=true status &&
+   shared=$(ls .git/sharedindex.*) &&
+   case "$shared" in
+   *" "*)
+   # we have more than one???
+   false ;;
+   *)
+   test_modebits "$shared" >split_index_mode &&
+   test_cmp index_mode split_index_mode ;;
+   esac
+   )
+'
+
 while read -r mode modebits
 do
test_expect_success POSIXPERM "split index respects 
core.sharedrepository $mode" '
-- 
2.19.1.1053.g063ed687ac



Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"

2018-11-16 Thread Michael Haggerty
On Fri, Nov 16, 2018 at 11:38 AM Ævar Arnfjörð Bjarmason
 wrote:
> [...]
> A follow-up on this: We should really fix this for other
> reasons. I.e. compile in some "this is stuff we ourselves think is in
> git".
>
> There's other manifestations of this, e.g.:
>
> git-sizer --help # => shows you help
> git sizer --help # => says it doesn't have a manpage
>
> Because we aren't aware that git-sizer is some external tool, and that
> we should route --help to it.

That would be nice. This has been an annoying for several tools named
`git-foo` that I have worked on (e.g., git-sizer, git-imerge,
git-when-merged, plus many internal tools).

> Non-withstanding the arguable bug that things like git-sizer shouldn't
> be allowing themselves to be invoked by "git" like that without
> guaranteeing that it can consume all the options 'git' expects. When I
> had to deal with a similar problem in an external git-* command I was
> maintaining I simply made it an error to invoke it as "git mything"
> instead of "git-mything".

Hmmm, I always thought that it was intended and supported that an
external tool can name itself `git-foo` so that it can be invoked as
`git foo`.

Which git options do you think that such a tool should be expected to
support? Many useful ones, like `-C `, `--git-dir=`,
`--work-tree=`, `-c =`, and `--no-replace-objects`,
work pretty much for free if the external tool uses `git` to interact
with the repository. I use such options regularly with external tools.
IMO it would be a regression for these tools to refuse to run when
invoked as, say, `git -C path/to/repo foo`.

Michael


[PATCH v2 2/2] merge: add scissors line on merge conflict

2018-11-16 Thread Denton Liu
This fixes a bug where the scissors line is placed after the Conflicts:
section, in the case where a merge conflict occurs and
merge.cleanup = scissors.

In addition, we give pull the passthrough option of --cleanup so that it
can also take advantage of this change.

Signed-off-by: Denton Liu 
---
 Documentation/merge-options.txt |  6 +
 builtin/merge.c | 16 +++
 builtin/pull.c  |  6 +
 t/t7600-merge.sh| 48 +
 4 files changed, 76 insertions(+)

diff --git a/Documentation/merge-options.txt b/Documentation/merge-options.txt
index 63a3fc0954..115e0ca6f0 100644
--- a/Documentation/merge-options.txt
+++ b/Documentation/merge-options.txt
@@ -27,6 +27,12 @@ they run `git merge`. To make it easier to adjust such 
scripts to the
 updated behaviour, the environment variable `GIT_MERGE_AUTOEDIT` can be
 set to `no` at the beginning of them.
 
+--cleanup=::
+   This option determines how the merge message will be cleaned up
+   before being passed on. Specifically, if the '' is given a
+   value of `scissors`, scissors will be prepended to the message in
+   the case of a merge conflict. See also linkgit:git-commit[1].
+
 --ff::
When the merge resolves as a fast-forward, only update the branch
pointer, without creating a merge commit.  This is the default
diff --git a/builtin/merge.c b/builtin/merge.c
index 8f4a5065c2..23a6e6bb93 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -36,6 +36,7 @@
 #include "packfile.h"
 #include "tag.h"
 #include "alias.h"
+#include "wt-status.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
@@ -96,6 +97,9 @@ enum ff_type {
 
 static enum ff_type fast_forward = FF_ALLOW;
 
+static const char *cleanup_arg;
+static int put_scissors;
+
 static int option_parse_message(const struct option *opt,
const char *arg, int unset)
 {
@@ -243,6 +247,7 @@ static struct option builtin_merge_options[] = {
N_("perform a commit if the merge succeeds (default)")),
OPT_BOOL('e', "edit", _edit,
N_("edit message before committing")),
+   OPT_STRING(0, "cleanup", _arg, N_("default"), N_("how to strip 
spaces and #comments from message")),
OPT_SET_INT(0, "ff", _forward, N_("allow fast-forward (default)"), 
FF_ALLOW),
OPT_SET_INT_F(0, "ff-only", _forward,
  N_("abort if fast-forward is not possible"),
@@ -606,6 +611,8 @@ static int git_merge_config(const char *k, const char *v, 
void *cb)
return git_config_string(_twohead, k, v);
else if (!strcmp(k, "pull.octopus"))
return git_config_string(_octopus, k, v);
+   else if (!strcmp(k, "commit.cleanup"))
+   return git_config_string(_arg, k, v);
else if (!strcmp(k, "merge.renormalize"))
option_renormalize = git_config_bool(k, v);
else if (!strcmp(k, "merge.ff")) {
@@ -894,6 +901,13 @@ static int suggest_conflicts(void)
filename = git_path_merge_msg(the_repository);
fp = xfopen(filename, "a");
 
+   if (put_scissors) {
+   fputc('\n', fp);
+   wt_status_add_cut_line(fp);
+   /* comments out the newline from append_conflicts_hint */
+   fputc(comment_line_char, fp);
+   }
+
append_conflicts_hint();
fputs(msgbuf.buf, fp);
strbuf_release();
@@ -1402,6 +1416,8 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
if (option_edit < 0)
option_edit = default_edit_option();
 
+   put_scissors = cleanup_arg && !strcmp(cleanup_arg, "scissors");
+
if (!use_strategies) {
if (!remoteheads)
; /* already up-to-date */
diff --git a/builtin/pull.c b/builtin/pull.c
index 681c127a07..88245bce0e 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -95,6 +95,7 @@ static char *opt_signoff;
 static char *opt_squash;
 static char *opt_commit;
 static char *opt_edit;
+static char *opt_cleanup;
 static char *opt_ff;
 static char *opt_verify_signatures;
 static int opt_autostash = -1;
@@ -162,6 +163,9 @@ static struct option pull_options[] = {
OPT_PASSTHRU(0, "edit", _edit, NULL,
N_("edit message before committing"),
PARSE_OPT_NOARG),
+   OPT_PASSTHRU(0, "cleanup", _cleanup, NULL,
+   N_("how to strip spaces and #comments from message"),
+   PARSE_OPT_NOARG),
OPT_PASSTHRU(0, "ff", _ff, NULL,
N_("allow fast-forward"),
PARSE_OPT_NOARG),
@@ -625,6 +629,8 @@ static int run_merge(void)
argv_array_push(, opt_commit);
if (opt_edit)
argv_array_push(, opt_edit);
+   if (opt_cleanup)
+   argv_array_push(, opt_cleanup);
if (opt_ff)
argv_array_push(, opt_ff);
if (opt_verify_signatures)
diff --git 

[PATCH v2 1/2] commit: don't add scissors line if one exists in MERGE_MSG

2018-11-16 Thread Denton Liu
If commit.cleanup = scissors is specified, don't produce a scissors line
if one already exists in the MERGE_MSG file.

Signed-off-by: Denton Liu 
---
 builtin/commit.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 0d9828e29e..a74a324b7a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -659,6 +659,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
const char *hook_arg2 = NULL;
int clean_message_contents = (cleanup_mode != COMMIT_MSG_CLEANUP_NONE);
int old_display_comment_prefix;
+   int merge_contains_scissors = 0;
 
/* This checks and barfs if author is badly specified */
determine_author_info(author_ident);
@@ -719,6 +720,8 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
strbuf_addbuf(, );
hook_arg1 = "message";
} else if (!stat(git_path_merge_msg(the_repository), )) {
+   size_t merge_msg_start;
+
/*
 * prepend SQUASH_MSG here if it exists and a
 * "merge --squash" was originally performed
@@ -729,8 +732,14 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
hook_arg1 = "squash";
} else
hook_arg1 = "merge";
+
+   merge_msg_start = sb.len;
if (strbuf_read_file(, git_path_merge_msg(the_repository), 
0) < 0)
die_errno(_("could not read MERGE_MSG"));
+
+   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
+   wt_status_locate_end(sb.buf + merge_msg_start, sb.len - 
merge_msg_start) < sb.len - merge_msg_start)
+   merge_contains_scissors = 1;
} else if (!stat(git_path_squash_msg(the_repository), )) {
if (strbuf_read_file(, git_path_squash_msg(the_repository), 
0) < 0)
die_errno(_("could not read SQUASH_MSG"));
@@ -798,7 +807,8 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
struct ident_split ci, ai;
 
if (whence != FROM_COMMIT) {
-   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
+   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
+   !merge_contains_scissors)
wt_status_add_cut_line(s->fp);
status_printf_ln(s, GIT_COLOR_NORMAL,
whence == FROM_MERGE
@@ -824,7 +834,8 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
  " Lines starting\nwith '%c' will be ignored, 
and an empty"
  " message aborts the commit.\n"), 
comment_line_char);
else if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
-whence == FROM_COMMIT)
+whence == FROM_COMMIT &&
+!merge_contains_scissors)
wt_status_add_cut_line(s->fp);
else /* COMMIT_MSG_CLEANUP_SPACE, that is. */
status_printf(s, GIT_COLOR_NORMAL,
-- 
2.19.1



[PATCH v2 0/2] Fix scissors bug during merge conflict

2018-11-16 Thread Denton Liu
Thanks for your feedback, Junio.

I tried to reroll the patch by adding another option into the MERGE_MODE
file but unfortunately, it didn't work completely because doing `merge
--squash` doesn't produce a MERGE_MODE. In addition to this, because of
the way merge and commit were structured, I needed to reorder a lot of
calls because some variables were only being set after I needed them.
Unless we want to produce a MERGE_MODE during --squash (which I don't
think is worth it) I don't think that this is the way to go.

Instead, I just refined my first approach and only checked the contents
of MERGE_MSG for a scissors line. The MERGE_MSG is going to be
machine-generated anyway so we should be safe from accidentally ignoring
a human-placed scissors line.

Changes since V1:
-
* Only check MERGE_MSG for a scissors line instead of all prepended files
* Make a variable static in merge where appropriate
* Add passthrough options in pull
* Add documentation for the new option
* Add tests to ensure desired behaviour

Denton Liu (2):
  commit: don't add scissors line if one exists
  merge: add scissors line on merge conflict

 Documentation/merge-options.txt |  6 +
 builtin/commit.c| 15 +--
 builtin/merge.c | 16 +++
 builtin/pull.c  |  6 +
 t/t7600-merge.sh| 48 +
 5 files changed, 89 insertions(+), 2 deletions(-)

-- 
2.19.1



Re: [PATCH] bundle: dup() output descriptor closer to point-of-use

2018-11-16 Thread Johannes Schindelin
Hi Peff,

On Fri, 16 Nov 2018, Jeff King wrote:

> On Thu, Nov 15, 2018 at 09:01:07PM +0100, Johannes Schindelin wrote:
> 
> > > It seems like we should be checking that the stale lockfile isn't left,
> > > which is the real problem (the warning is annoying, but is a symptom of
> > > the same thing). I.e., something like:
> > > 
> > >   test_must_fail git bundle create foobar.bundle master..master &&
> > >   test_path_is_missing foobar.bundle.lock
> > > 
> > > That would already pass on non-Windows platforms, but that's OK. It will
> > > never give a false failure.
> > > 
> > > If you don't mind, can you confirm that the test above fails without
> > > either of the two patches under discussion?
> > 
> > This test succeeds with your patch as well as with Gaël's, and fails when
> > neither patch is applied. So you're right, it is the much better test.
> 
> Thanks for checking!
> 
> > > > Do you want to integrate this test into your patch and run with it, or
> > > > do you want me to shepherd your patch?
> > > 
> > > I'll wrap it up with a commit message and a test.
> 
> Actually, I realized there's an even simpler way to do this. Here it is.
> 
> -- >8 --
> Subject: [PATCH] bundle: dup() output descriptor closer to point-of-use
> 
> When writing a bundle to a file, the bundle code actually creates
> "your.bundle.lock" using our lockfile interface. We feed that output
> descriptor to a child git-pack-objects via run-command, which has the
> quirk that it closes the output descriptor in the parent.
> 
> To avoid confusing the lockfile code (which still thinks the descriptor
> is valid), we dup() it, and operate on the duplicate.
> 
> However, this has a confusing side effect: after the dup() but before we
> call pack-objects, we have _two_ descriptors open to the lockfile. If we
> call die() during that time, the lockfile code will try to clean up the
> partially-written file. It knows to close() the file before unlinking,
> since on some platforms (i.e., Windows) the open file would block the
> deletion. But it doesn't know about the duplicate descriptor. On
> Windows, triggering an error at the right part of the code will result
> in the cleanup failing and the lockfile being left in the filesystem.
> 
> We can solve this by moving the dup() much closer to start_command(),
> shrinking the window in which we have the second descriptor open. It's
> easy to place this in such a way that no die() is possible. We could
> still die due to a signal in the exact wrong moment, but we already
> tolerate races there (e.g., a signal could come before we manage to put
> the file on the cleanup list in the first place).
> 
> As a bonus, this shields create_bundle() itself from the duplicate-fd
> trick, and we can simplify its error handling (note that the lock
> rollback now happens unconditionally, but that's OK; it's a noop if we
> didn't open the lock in the first place).
> 
> The included test uses an empty bundle to cause a failure at the right
> spot in the code, because that's easy to trigger (the other likely
> errors are write() problems like ENOSPC).  Note that it would already
> pass on non-Windows systems (because they are happy to unlink an
> already-open file).

Thanks, this is a very nice explanation (and now that I do not feel so
stressed as I did yesterday, I can easily wrap my head around it).

I can confirm that the test fails without the changes to bundle.c, and
succeeds with the changes.

Thank you so much!
Dscho

> Based-on-a-patch-by: Gaël Lhez 
> Signed-off-by: Jeff King 
> ---
>  bundle.c| 39 ++-
>  t/t5607-clone-bundle.sh |  6 ++
>  2 files changed, 24 insertions(+), 21 deletions(-)
> 
> diff --git a/bundle.c b/bundle.c
> index 1ef584b93b..6b0f6d8f10 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -243,7 +243,7 @@ static int is_tag_in_date_range(struct object *tag, 
> struct rev_info *revs)
>  }
>  
>  
> -/* Write the pack data to bundle_fd, then close it if it is > 1. */
> +/* Write the pack data to bundle_fd */
>  static int write_pack_data(int bundle_fd, struct rev_info *revs)
>  {
>   struct child_process pack_objects = CHILD_PROCESS_INIT;
> @@ -256,6 +256,20 @@ static int write_pack_data(int bundle_fd, struct 
> rev_info *revs)
>   pack_objects.in = -1;
>   pack_objects.out = bundle_fd;
>   pack_objects.git_cmd = 1;
> +
> + /*
> +  * start_command() will close our descriptor if it's >1. Duplicate it
> +  * to avoid surprising the caller.
> +  */
> + if (pack_objects.out > 1) {
> + pack_objects.out = dup(pack_objects.out);
> + if (pack_objects.out < 0) {
> + error_errno(_("unable to dup bundle descriptor"));
> + child_process_clear(_objects);
> + return -1;
> + }
> + }
> +
>   if (start_command(_objects))
>   return error(_("Could not spawn pack-objects"));
>  
> @@ -421,21 +435,10 @@ int 

Re: [PATCH] ref-filter: don't look for objects when outside of a repository

2018-11-16 Thread SZEDER Gábor
On Fri, Nov 16, 2018 at 02:09:07PM +0900, Junio C Hamano wrote:
> Jeff King  writes:
> 
> > On Thu, Nov 15, 2018 at 04:38:44AM -0500, Jeff King wrote:
> >
> >> Is SOURCE_NONE a complete match for what we want?
> >> 
> >> I see problems in both directions:
> >> 
> >>  - sorting by "objectname" works now, but it's marked with SOURCE_OBJ,
> >>and would be forbidden with your patch.  I'm actually not sure if
> >>SOURCE_OBJ is accurate; we shouldn't need to access the object to
> >>show it (and we are probably wasting effort loading the full contents
> >>for tools like for-each-ref).
> >> 
> >>However, that's not the full story. For objectname:short, it _does_ call
> >>find_unique_abbrev(). So we expect to have an object directory.
> >
> > Oops, I'm apparently bad at reading. It is in fact SOURCE_OTHER, which
> > makes sense (outside of this whole "--sort outside a repo thing").
> >
> > But we'd ideally distinguish between "objectname" (which should be OK
> > outside a repo) and "objectname:short" (which currently segfaults).
> 
> Arguably, use of ref-filter machinery in ls-remote, whether it is
> given from inside or outside a repo, was a mistake in 1fb20dfd
> ("ls-remote: create '--sort' option", 2018-04-09), as the whole
> point of "ls-remote" is to peek the list of refs and it is perfectly
> normal that the objects listed are not available.

I hope that one day 'git ls-remote' will learn to '--format=...' its
output, and I think that (re)using the ref-filter machinery would be
the right way to go to achive that.  Sure, ref-filter supports a lot
of format specifiers that don't at all make sense in the context of
'ls-remote' (perhaps we should have a dedicated set of valid_atoms for
that), but I think it's perfectly reasonable to do something like:

  git ls-remote --format=%(refname:strip=2) remote

A concrete use case for that could be to eliminate the last remaining
shell loops from refs completion.



Re: [PATCHv3 00/23] Bring more repository handles into our code base

2018-11-16 Thread Derrick Stolee

On 11/13/2018 7:12 PM, Stefan Beller wrote:

Please have a look at the last 4 patches specifically as they were new in
the last iteration (but did not receive any comment), as they demonstrate
and fix a problem that is only exposed when using GIT_TEST_COMMIT_GRAPH=1
for the test suite.


I took a look at these last four and didn't find anything wrong. Rest of 
the series looks good.


While all of the TODOs in the last patch are an imperfect solution, I 
think it's probably the best we can do for now.


Thanks,
-Stolee


Re: git pull --rebase=preserve is always rebasing something, even on up-to-date branch

2018-11-16 Thread Johannes Schindelin
Hi Mikhail,

On Mon, 17 Sep 2018, Mikhail Matrosov wrote:

> Please try the following:
> 
> mmatrosov@Mikhail-PC:~/test$ git init --bare server
> Initialized empty Git repository in /home/mmatrosov/test/server/
> mmatrosov@Mikhail-PC:~/test$ git clone server local
> Cloning into 'local'...
> warning: You appear to have cloned an empty repository.
> done.
> mmatrosov@Mikhail-PC:~/test$ cd local
> mmatrosov@Mikhail-PC:~/test/local$ echo a > a && git add . && git commit -m A
> [master (root-commit) a34c21f] A
>  1 file changed, 1 insertion(+)
>  create mode 100644 a
> mmatrosov@Mikhail-PC:~/test/local$ git push
> Counting objects: 3, done.
> Writing objects: 100% (3/3), 205 bytes | 0 bytes/s, done.
> Total 3 (delta 0), reused 0 (delta 0)
> To /home/mmatrosov/test/server
>  * [new branch]  master -> master
> mmatrosov@Mikhail-PC:~/test/local$ git pull
> Already up-to-date.
> mmatrosov@Mikhail-PC:~/test/local$ git pull --rebase=preserve
> Rebasing (1/1)
> Successfully rebased and updated refs/heads/master.
> 
> As you can see, running bare "git pull" just tells me everything is up
> to date. However, running "git pull --rebase=preserve" triggers
> rebasing of something.

This is most likely a `noop` command.

> It wont be a problem if it didn't take significant time (especially on
> Windows). Why this rebase happens? It is completely redundant and slows
> down the pull operation. Looks like a bug to me.

It is an implementation detail, and if you want to work on fixing it,
please let me know about your availability and C skill level.

> 
> Note that it is important to me, because I want to set "git config
> --global pull.rebase preserve".

Please note that `preserve` is on its way to deprecation, superseded by
`pull.rebase = merges`. Which may, or may not, share the performance
penalty you reported. Probably not, though.

Ciao,
Johannes

> But because of this issue, pulling on an up-to-date repository takes a
> lot of time. Which is very frustrating.
> 
> Tested with:
> * git version 2.19.0.windows.1 in Windows 10 Version 1803
> * git version 2.7.4 in Ubuntu 16.04.3 LTS (inside WSL)
> 
> -
> Best regards, Mikhail Matrosov
> 


Hello Dear I Need An Investment Partner

2018-11-16 Thread Aisha Gaddafi




--
Hello Dear ,
I came across your contact during my private search
Mrs Aisha Al-Qaddafi is my name, the only daughter of late Libyan
president, I have funds the sum
of $27.5 million USD for investment, I am interested in you for
investment project assistance in your country,
i shall compensate you 30% of the total sum after the funds are
transfer into your account,
Reply me urgent for more details
Mrs Aisha Al-Qaddafi
--



Re: [PATCH v3 11/11] fast-export: add a --show-original-ids option to show original names

2018-11-16 Thread SZEDER Gábor
On Thu, Nov 15, 2018 at 11:59:56PM -0800, Elijah Newren wrote:

> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
> index d7d73061d0..5690fe2810 100755
> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -77,6 +77,23 @@ test_expect_success 'fast-export 
> --reference-excluded-parents master~2..master'
>test $MASTER = $(git rev-parse --verify refs/heads/rewrite))
>  '
>  
> +test_expect_success 'fast-export --show-original-ids' '
> +
> + git fast-export --show-original-ids master >output &&
> + grep ^original-oid output| sed -e s/^original-oid.// | sort >actual &&

Nit: 'sed' can do what this 'grep' does:

  sed -n -e s/^original-oid.//p output | sort >actual &&

thus sparing a process.

> + git rev-list --objects master muss >objects-and-names &&
> + awk "{print \$1}" objects-and-names | sort >commits-trees-blobs &&
> + comm -23 actual commits-trees-blobs >unfound &&
> + test_must_be_empty unfound
> +'
> +
> +test_expect_success 'fast-export --show-original-ids | git fast-import' '
> +
> + git fast-export --show-original-ids master muss | git fast-import 
> --quiet &&
> + test $MASTER = $(git rev-parse --verify refs/heads/master) &&
> + test $MUSS = $(git rev-parse --verify refs/tags/muss)
> +'
> +
>  test_expect_success 'iso-8859-1' '
>  
>   git config i18n.commitencoding ISO8859-1 &&
> -- 
> 2.19.1.1063.g1796373474.dirty
> 


Re: insteadOf and git-request-pull output

2018-11-16 Thread brian m. carlson
On Thu, Nov 15, 2018 at 01:28:26PM -0500, Konstantin Ryabitsev wrote:
> Hi, all:
> 
> Looks like setting url.insteadOf rules alters the output of
> git-request-pull. I'm not sure that's the intended use of insteadOf,
> which is supposed to replace URLs for local use, not to expose them
> publicly (but I may be wrong). E.g.:
> 
> $ git request-pull HEAD^ git://foo.example.com/example | grep example
>   git://foo.example.com/example
> 
> $ git config url.ssh://bar.insteadOf git://foo
> 
> $ git request-pull HEAD^ git://foo.example.com/example | grep example
>   ssh://bar.example.com/example
> 
> I think that if we use the "principle of least surprise," insteadOf
> rules shouldn't be applied for git-request-pull URLs.

I'd like to point out a different use that may change your view.  I have
an insteadOf alias, gh:, that points to GitHub.  Performing the rewrite
is definitely the right thing to do, since other users may not have my
alias available.

I agree that in your case, a rewrite seems less appropriate, but I think
we should only skip the rewrite if the value already matches a valid
URL.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v1 1/9] diff: document --no-color-moved

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Add documentation for --no-color-moved.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 0378cd574e..151690f814 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,6 +293,10 @@ dimmed-zebra::
`dimmed_zebra` is a deprecated synonym.
 --
 
+--no-color-moved::
+   Turn off move detection. This can be used to override configuration
+   settings. It is the same as `--color-moved=no`.
+
 --color-moved-ws=::
This configures how white spaces are ignored when performing the
move detection for `--color-moved`.
-- 
2.19.1



[PATCH v1 4/9] diff --color-moved-ws: demonstrate false positives

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

'diff --color-moved-ws=allow-indentation-change' can highlight lines
that have internal whitespace changes rather than indentation
changes. For example in commit 1a07e59c3e ("Update messages in
preparation for i18n", 2018-07-21) the lines

-   die (_("must end with a color"));
+   die(_("must end with a color"));

are highlighted as moved when they should not be. Modify an existing
test to show the problem that will be fixed in the next commit.

Signed-off-by: Phillip Wood 
---
 t/t4015-diff-whitespace.sh | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index a9fb226c5a..eee81a1987 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1809,7 +1809,7 @@ test_expect_success 'only move detection ignores white 
spaces' '
test_cmp expected actual
 '
 
-test_expect_success 'compare whitespace delta across moved blocks' '
+test_expect_failure 'compare whitespace delta across moved blocks' '
 
git reset --hard &&
q_to_tab <<-\EOF >text.txt &&
@@ -1827,6 +1827,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
QQQthat has similar lines
QQQto previous blocks, but with different indent
QQQYetQAnotherQoutlierQ
+   QLine with internal w h i t e s p a c e change
EOF
 
git add text.txt &&
@@ -1847,6 +1848,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
QQthat has similar lines
QQto previous blocks, but with different indent
QQYetQAnotherQoutlier
+   QLine with internal whitespace change
EOF
 
git diff --color --color-moved 
--color-moved-ws=allow-indentation-change >actual.raw &&
@@ -1856,7 +1858,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
diff --git a/text.txt b/text.txt
--- a/text.txt
+++ b/text.txt
-   @@ -1,14 +1,14 @@
+   @@ -1,15 +1,15 @@
-QIndented
-QText across
-Qsome lines
@@ -1871,6 +1873,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
-QQQthat has similar lines
-QQQto previous blocks, but with different 
indent
-QQQYetQAnotherQoutlierQ
+   -QLine with internal w h i t e s p a c e change
+QQIndented
+QQText across
+QQsome lines
@@ -1885,6 +1888,7 @@ test_expect_success 'compare whitespace delta across 
moved blocks' '
+QQthat has similar lines
+QQto previous blocks, but with 
different indent
+QQYetQAnotherQoutlier
+   +QLine with internal whitespace 
change
EOF
 
test_cmp expected actual
-- 
2.19.1



[PATCH v1 3/9] diff: allow --no-color-moved-ws

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Allow --no-color-moved-ws and --color-moved-ws=no to cancel any previous
--color-moved-ws option.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 7 +++
 diff.c | 6 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 57a2f4cb7a..e1744fa80d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -306,6 +306,8 @@ endif::git-diff[]
These modes can be given as a comma separated list:
 +
 --
+no::
+   Do not ignore whitespace when performing move detection.
 ignore-space-at-eol::
Ignore changes in whitespace at EOL.
 ignore-space-change::
@@ -322,6 +324,11 @@ allow-indentation-change::
other modes.
 --
 
+--no-color-moved-ws::
+   Do not ignore whitespace when performing move detection. This can be
+   used to override configuration settings. It is the same as
+   `--color-moved-ws=no`.
+
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 78cd3958f4..9b9811988b 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,9 @@ static int parse_color_moved_ws(const char *arg)
strbuf_addstr(, i->string);
strbuf_trim();
 
-   if (!strcmp(sb.buf, "ignore-space-change"))
+   if (!strcmp(sb.buf, "no"))
+   ret = 0;
+   else if (!strcmp(sb.buf, "ignore-space-change"))
ret |= XDF_IGNORE_WHITESPACE_CHANGE;
else if (!strcmp(sb.buf, "ignore-space-at-eol"))
ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
@@ -5008,6 +5010,8 @@ int diff_opt_parse(struct diff_options *options,
if (cm < 0)
die("bad --color-moved argument: %s", arg);
options->color_moved = cm;
+   } else if (!strcmp(arg, "--no-color-moved-ws")) {
+   options->color_moved_ws_handling = 0;
} else if (skip_prefix(arg, "--color-moved-ws=", )) {
options->color_moved_ws_handling = parse_color_moved_ws(arg);
} else if (skip_to_optional_arg_default(arg, "--color-words", 
>word_regex, NULL)) {
-- 
2.19.1



[PATCH v1 2/9] diff: use whitespace consistently

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Most of the documentation uses 'whitespace' rather than 'white space'
or 'white spaces' convert to latter two to the former for consistency.

Signed-off-by: Phillip Wood 
---
 Documentation/diff-options.txt | 4 ++--
 diff.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 151690f814..57a2f4cb7a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -298,7 +298,7 @@ dimmed-zebra::
settings. It is the same as `--color-moved=no`.
 
 --color-moved-ws=::
-   This configures how white spaces are ignored when performing the
+   This configures how whitespace is ignored when performing the
move detection for `--color-moved`.
 ifdef::git-diff[]
It can be set by the `diff.colorMovedWS` configuration setting.
@@ -316,7 +316,7 @@ ignore-all-space::
Ignore whitespace when comparing lines. This ignores differences
even if one line has whitespace where the other line has none.
 allow-indentation-change::
-   Initially ignore any white spaces in the move detection, then
+   Initially ignore any whitespace in the move detection, then
group the moved code blocks only into a block if the change in
whitespace is the same per line. This is incompatible with the
other modes.
diff --git a/diff.c b/diff.c
index c29b1cce14..78cd3958f4 100644
--- a/diff.c
+++ b/diff.c
@@ -320,7 +320,7 @@ static int parse_color_moved_ws(const char *arg)
 
if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
(ret & XDF_WHITESPACE_FLAGS))
-   die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other whitespace modes"));
 
string_list_clear(, 0);
 
-- 
2.19.1



[PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].

-extern int stream_filter(struct stream_filter *,
-const char *input, size_t *isize_p,
-char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+ const char *input, size_t *isize_p,
+ char *output, size_t *osize_p);

This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has they benefit that any whitespace errors do not
interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.

[1] Note that before the commit to fix the erroneous coloring of moved
lines each line was colored as a different block, since that commit
they are uncolored.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since rfc:
 - It now replaces the existing implementation rather than adding a new
   mode.
 - The indentation deltas are now calculated once for each line and
   cached.
 - Optimized the whitespace delta comparison to compare string lengths
   before comparing the actual strings.
 - Modified the calculation of tabs as suggested by Stefan.
 - Split out the blank line handling into a separate commit as suggest
   by Stefan.
 - Fixed some comments pointed out by Stefan.

 diff.c | 130 +
 t/t4015-diff-whitespace.sh |  56 
 2 files changed, 129 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index c378ce3daf..89559293e7 100644
--- a/diff.c
+++ b/diff.c
@@ -750,6 +750,8 @@ struct emitted_diff_symbol {
const char *line;
int len;
int flags;
+   int indent_off;
+   int indent_width;
enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -780,44 +782,68 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-/**
- * The struct ws_delta holds white space differences between moved lines, i.e.
- * between '+' and '-' lines that have been detected to be a move.
- * The string contains the difference in leading white spaces, before the
- * rest of the line is compared using the white space config for move
- * coloring. The current_longer indicates if the first string in the
- * comparision is longer than the second.
- */
-struct ws_delta {
-   char *string;
-   unsigned int current_longer : 1;
-};
-#define WS_DELTA_INIT { NULL, 0 }
-
 struct moved_block {
struct moved_entry *match;
-   struct ws_delta wsd;
+   int wsd; /* The whitespace delta of this block */
 };
 
 static void moved_block_clear(struct moved_block *b)
 {
-   FREE_AND_NULL(b->wsd.string);
-   b->match = NULL;
+   memset(b, 0, sizeof(*b));
+}
+
+static void fill_es_indent_data(struct emitted_diff_symbol *es)
+{
+   unsigned int off = 0;
+   int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
+   const char *s = es->line;
+   const int len = es->len;
+
+   /* skip any \v \f \r at start of indentation */
+   while (s[off] == '\f' || s[off] == '\v' ||
+  (s[off] == '\r' && off < len - 1))
+   off++;
+
+   /* calculate the visual width of indentation */
+   while(1) {
+   if (s[off] == ' ') {
+   width++;
+   off++;
+   } else if (s[off] == '\t') {
+   width += tab_width - (width % tab_width);
+   while (s[++off] == '\t')
+   width += tab_width;
+   } else {
+   break;
+   }
+   }
+
+   es->indent_off = off;
+   es->indent_width = width;
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-const struct emitted_diff_symbol *b,
-struct ws_delta *out)
+   const struct emitted_diff_symbol *b,
+   int *out)
 {
-   const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
-   const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
-   int d = longer->len - shorter->len;
+   int a_len = a->len,
+   b_len = b->len,
+   a_off = a->indent_off,
+   a_width = a->indent_width,
+   b_off = 

[PATCH v1 0/9] diff --color-moved-ws fixes and enhancment

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

When trying out the new --color-moved-ws=allow-indentation-change I
was disappointed to discover it did not work if the indentation
contains a mix of spaces and tabs. This series reworks it so that it
does.

Since the rfc this series has grown a few fixes at the beginning. The
implementation has been reworked, the last two patches correspond to a
heavily reworked version the last patch of the rfc version, all the
other patches are new.

Phillip Wood (9):
  diff: document --no-color-moved
  diff: use whitespace consistently
  diff: allow --no-color-moved-ws
  diff --color-moved-ws: demonstrate false positives
  diff --color-moved-ws: fix false positives
  diff --color-moved=zebra: be stricter with color alternation
  diff --color-moved-ws: optimize allow-indentation-change
  diff --color-moved-ws: modify allow-indentation-change
  diff --color-moved-ws: handle blank lines

 Documentation/diff-options.txt |  15 ++-
 diff.c | 219 +
 t/t4015-diff-whitespace.sh |  99 ++-
 3 files changed, 251 insertions(+), 82 deletions(-)

-- 
2.19.1



[PATCH v1 5/9] diff --color-moved-ws: fix false positives

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

'diff --color-moved-ws=allow-indentation-change' can color lines as
moved when they are in fact different. For example in commit
1a07e59c3e ("Update messages in preparation for i18n", 2018-07-21) the
lines

-   die (_("must end with a color"));
+   die(_("must end with a color"));

are colored as moved even though they are different.

This is because if there is a fuzzy match for the first line of
a potential moved block the line is marked as moved before the
potential match is checked to see if it actually matches. The fix is
to delay marking the line as moved until after we have checked that
there really is at least one matching potential moved block.

Note that the test modified in the last commit still fails because
adding an unmoved line between two moved blocks that are already
separated by unmoved lines changes the color of the block following the
addition. This should not be the case and will be fixed in the next
commit.

Signed-off-by: Phillip Wood 
---
 diff.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 9b9811988b..53a7ab5aca 100644
--- a/diff.c
+++ b/diff.c
@@ -1104,10 +1104,10 @@ static void mark_color_as_moved(struct diff_options *o,
continue;
}
 
-   l->flags |= DIFF_SYMBOL_MOVED_LINE;
-
-   if (o->color_moved == COLOR_MOVED_PLAIN)
+   if (o->color_moved == COLOR_MOVED_PLAIN) {
+   l->flags |= DIFF_SYMBOL_MOVED_LINE;
continue;
+   }
 
if (o->color_moved_ws_handling &
COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1141,10 +1141,13 @@ static void mark_color_as_moved(struct diff_options *o,
block_length = 0;
}
 
-   block_length++;
+   if (pmb_nr) {
+   block_length++;
 
-   if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
-   l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+   l->flags |= DIFF_SYMBOL_MOVED_LINE;
+   if (flipped_block && o->color_moved != 
COLOR_MOVED_BLOCKS)
+   l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+   }
}
adjust_last_block(o, n, block_length);
 
-- 
2.19.1



[PATCH v1 9/9] diff --color-moved-ws: handle blank lines

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

When using --color-moved-ws=allow-indentation-change allow lines with
the same indentation change to be grouped across blank lines. For now
this only works if the blank lines have been moved as well, not for
blocks that have just had their indentation changed.

This completes the changes to the implementation of
--color-moved=allow-indentation-change. Running

  git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0

now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
version of the previous implementation and 66% from the original which
took 14.6s.

Signed-off-by: Phillip Wood 
---

Notes:
Changes since rfc:
 - Split these changes into a separate commit.
 - Detect blank lines when processing the indentation rather than
   parsing each line twice.
 - Tweaked the test to make it harder as suggested by Stefan.
 - Added timing data to the commit message.

 diff.c | 34 ---
 t/t4015-diff-whitespace.sh | 41 ++
 2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 89559293e7..072b5bced6 100644
--- a/diff.c
+++ b/diff.c
@@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
memset(b, 0, sizeof(*b));
 }
 
+#define INDENT_BLANKLINE INT_MIN
+
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
 {
-   unsigned int off = 0;
+   unsigned int off = 0, i;
int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
const char *s = es->line;
const int len = es->len;
@@ -818,8 +820,18 @@ static void fill_es_indent_data(struct emitted_diff_symbol 
*es)
}
}
 
-   es->indent_off = off;
-   es->indent_width = width;
+   /* check if this line is blank */
+   for (i = off; i < len; i++)
+   if (!isspace(s[i]))
+   break;
+
+   if (i == len) {
+   es->indent_width = INDENT_BLANKLINE;
+   es->indent_off = len;
+   } else {
+   es->indent_off = off;
+   es->indent_width = width;
+   }
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
@@ -834,6 +846,11 @@ static int compute_ws_delta(const struct 
emitted_diff_symbol *a,
b_width = b->indent_width;
int delta;
 
+   if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
+   *out = INDENT_BLANKLINE;
+   return 1;
+   }
+
if (a->s == DIFF_SYMBOL_PLUS)
delta = a_width - b_width;
else
@@ -877,6 +894,10 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
if (al != bl)
return 1;
 
+   /* If 'l' and 'cur' are both blank then they match. */
+   if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+   return 0;
+
/*
 * The indent changes of the block are known and stored in pmb->wsd;
 * however we need to check if the indent changes of the current line
@@ -888,6 +909,13 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
else
delta = c_width - a_width;
 
+   /*
+* If the previous lines of this block were all blank then set its
+* whitespace delta.
+*/
+   if (pmb->wsd == INDENT_BLANKLINE)
+   pmb->wsd = delta;
+
return !(delta == pmb->wsd && al - a_off == cl - c_off &&
 !memcmp(a, b, al) && !
 memcmp(a + a_off, c + c_off, al - a_off));
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index e023839ba6..9d6f88b07f 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1901,10 +1901,20 @@ test_expect_success 'compare whitespace delta 
incompatible with other space opti
test_i18ngrep allow-indentation-change err
 '
 
+EMPTY=''
 test_expect_success 'compare mixed whitespace delta across moved blocks' '
 
git reset --hard &&
tr Q_ "\t " <<-EOF >text.txt &&
+   ${EMPTY}
+   too short without
+   ${EMPTY}
+   ___being grouped across blank line
+   ${EMPTY}
+   context
+   lines
+   to
+   anchor
Indented text to
_Qbe further indented by four spaces across
Qseveral lines
@@ -1918,9 +1928,18 @@ test_expect_success 'compare mixed whitespace delta 
across moved blocks' '
git commit -m "add text.txt" &&
 
tr Q_ "\t " <<-EOF >text.txt &&
+   context
+   lines
+   to
+   anchor
QIndented text to
QQbe further indented by four spaces across
Qseveral lines
+   ${EMPTY}
+   QQtoo short without
+   ${EMPTY}
+   Q___being grouped across blank line
+   ${EMPTY}
Q_QThese two lines have had their
indentation reduced by four spaces
QQdifferent indentation change
@@ 

[PATCH v1 6/9] diff --color-moved=zebra: be stricter with color alternation

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

Currently when using --color-moved=zebra the color of moved blocks
depends on the number of lines separating them. This means that adding
an odd number of unmoved lines between blocks that are already separated
by one or more unmoved lines will change the color of subsequent moved
blocks. This does not make much sense as the blocks were already
separated by unmoved lines and causes problems when adding lines to test
cases.

Fix this by only using the alternate colors for adjacent moved blocks.

Signed-off-by: Phillip Wood 
---

Notes:
An alternative would be to always alternate the color of blocks whether
are not they are adjacent to each other.

 diff.c | 27 +++
 t/t4015-diff-whitespace.sh |  6 +++---
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 53a7ab5aca..8c08dd68df 100644
--- a/diff.c
+++ b/diff.c
@@ -1038,26 +1038,30 @@ static int shrink_potential_moved_blocks(struct 
moved_block *pmb,
  * The last block consists of the (n - block_length)'th line up to but not
  * including the nth line.
  *
+ * Returns 0 if the last block is empty or is unset by this function, non zero
+ * otherwise.
+ *
  * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
  * Think of a way to unify them.
  */
-static void adjust_last_block(struct diff_options *o, int n, int block_length)
+static int adjust_last_block(struct diff_options *o, int n, int block_length)
 {
int i, alnum_count = 0;
if (o->color_moved == COLOR_MOVED_PLAIN)
-   return;
+   return block_length;
for (i = 1; i < block_length + 1; i++) {
const char *c = o->emitted_symbols->buf[n - i].line;
for (; *c; c++) {
if (!isalnum(*c))
continue;
alnum_count++;
if (alnum_count >= COLOR_MOVED_MIN_ALNUM_COUNT)
-   return;
+   return 1;
}
}
for (i = 1; i < block_length + 1; i++)
o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+   return 0;
 }
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
@@ -1067,14 +1071,15 @@ static void mark_color_as_moved(struct diff_options *o,
 {
struct moved_block *pmb = NULL; /* potentially moved blocks */
int pmb_nr = 0, pmb_alloc = 0;
-   int n, flipped_block = 1, block_length = 0;
+   int n, flipped_block = 0, block_length = 0;
 
 
for (n = 0; n < o->emitted_symbols->nr; n++) {
struct hashmap *hm = NULL;
struct moved_entry *key;
struct moved_entry *match = NULL;
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
+   enum diff_symbol last_symbol = 0;
 
switch (l->s) {
case DIFF_SYMBOL_PLUS:
@@ -1090,7 +1095,7 @@ static void mark_color_as_moved(struct diff_options *o,
free(key);
break;
default:
-   flipped_block = 1;
+   flipped_block = 0;
}
 
if (!match) {
@@ -1101,10 +1106,13 @@ static void mark_color_as_moved(struct diff_options *o,
moved_block_clear([i]);
pmb_nr = 0;
block_length = 0;
+   flipped_block = 0;
+   last_symbol = l->s;
continue;
}
 
if (o->color_moved == COLOR_MOVED_PLAIN) {
+   last_symbol = l->s;
l->flags |= DIFF_SYMBOL_MOVED_LINE;
continue;
}
@@ -1135,19 +1143,22 @@ static void mark_color_as_moved(struct diff_options *o,
}
}
 
-   flipped_block = (flipped_block + 1) % 2;
+   if (adjust_last_block(o, n, block_length) &&
+   pmb_nr && last_symbol != l->s)
+   flipped_block = (flipped_block + 1) % 2;
+   else
+   flipped_block = 0;
 
-   adjust_last_block(o, n, block_length);
block_length = 0;
}
 
if (pmb_nr) {
block_length++;
-
l->flags |= DIFF_SYMBOL_MOVED_LINE;
if (flipped_block && o->color_moved != 
COLOR_MOVED_BLOCKS)
l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
}
+   last_symbol = l->s;
}
adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 

[PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

When running

  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.

Signed-off-by: Phillip Wood 
---
 diff.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 8c08dd68df..c378ce3daf 100644
--- a/diff.c
+++ b/diff.c
@@ -829,20 +829,23 @@ static int cmp_in_block_with_wsd(const struct 
diff_options *o,
 int n)
 {
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
-   int al = cur->es->len, cl = l->len;
+   int al = cur->es->len, bl = match->es->len, cl = l->len;
const char *a = cur->es->line,
   *b = match->es->line,
   *c = l->line;
-
+   const char *orig_a = a;
int wslen;
 
/*
-* We need to check if 'cur' is equal to 'match'.
-* As those are from the same (+/-) side, we do not need to adjust for
-* indent changes. However these were found using fuzzy matching
-* so we do have to check if they are equal.
+* We need to check if 'cur' is equal to 'match'.  As those
+* are from the same (+/-) side, we do not need to adjust for
+* indent changes. However these were found using fuzzy
+* matching so we do have to check if they are equal. Here we
+* just check the lengths. We delay calling memcmp() to check
+* the contents until later as if the length comparison for a
+* and c fails we can avoid the call all together.
 */
-   if (strcmp(a, b))
+   if (al != bl)
return 1;
 
if (!pmb->wsd.string)
@@ -870,7 +873,7 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
al -= wslen;
}
 
-   if (al != cl || memcmp(a, c, al))
+   if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
return 1;
 
return 0;
-- 
2.19.1



Re: [PATCH v4 0/1] Advertise multiple supported proto versions

2018-11-16 Thread Junio C Hamano
Josh Steadmon  writes:

>> This one unfortunately seems to interact rather badly with your
>> js/remote-archive-v2 topic which is already in 'next', when this
>> topic is merged to 'pu', and my attempt to mechanically resolve
>> conflicts breaks t5000.
>
> Hmm, should we drop js/remote-archive-v2 then? Any solution that
> unblocks js/remote-archive-v2 will almost certainly depend on this
> patch.

That one is already in 'next' for quite some time.  If you are
retrating it, that's OK.

Let me revert the topic out of 'next' and discard it, and then queue
this one in 'pu'.

Thanks.






Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"

2018-11-16 Thread Ævar Arnfjörð Bjarmason


On Fri, Nov 02 2018, Ævar Arnfjörð Bjarmason wrote:

> I think up to patch 4 here should be near a state that's ready for
> inclusion.
>
> Although I'm on the fence with the approach in 1/5. Should this be a
> giant getopt switch statement like that in a helper script? An
> alternative would be to write out a shell file similar to
> GIT-BUILD-OPTIONS and source that from this thing. I don't know, what
> do you all think?
>
> The idea with 4/5 was to make this symlink mode the default in
> config.mak.uname and have a blacklist of systems like Windows that
> couldn't deal with it.
>
> Since my ad874608d8 ("Makefile: optionally symlink libexec/git-core
> binaries to bin/git", 2018-03-13) I see that e.g. Debian and GitLab
> have started shipping with the INSTALL_SYMLINKS flag, so making that
> unconditional is the next logical step.
>
> The 5th one is more radical. See
> https://public-inbox.org/git/87woyfdkoi@evledraar.gmail.com/ from
> back in March for context.
>
> I'd like to say it's ready, but I've spotted some fallout:
>
>  * Help like "git ninit" suggesting "git init" doesn't work, this is
>because load_command_list() in help.c doesn't look out our
>in-memory idea of builtins, it reads the libexecdir, so if we don't
>have the programs there it doesn't know about it.

A follow-up on this: We should really fix this for other
reasons. I.e. compile in some "this is stuff we ourselves think is in
git".

There's other manifestations of this, e.g.:

git-sizer --help # => shows you help
git sizer --help # => says it doesn't have a manpage

Because we aren't aware that git-sizer is some external tool, and that
we should route --help to it.

Non-withstanding the arguable bug that things like git-sizer shouldn't
be allowing themselves to be invoked by "git" like that without
guaranteeing that it can consume all the options 'git' expects. When I
had to deal with a similar problem in an external git-* command I was
maintaining I simply made it an error to invoke it as "git mything"
instead of "git-mything".

>  * GIT_TEST_INSTALLED breaks entirely under this, as early as the
>heuristic for "are we built?" being "do we have git-init in
>libexecdir?". I tried a bit to make this work, but there's a lot of
>dependencies there.
>
>  * We still (and this is also true of my ad874608d8) hardlink
>everything in the build dir via a different part of the Makefile,
>ideally we should do exactly the same thing there so also normal
>tests and not just GIT_TEST_INSTALLED (if that worked) would test
>in the same mode.
>
>I gave making that work a bit of a try and gave up in the Makefile
>jungle.
>
> Ævar Arnfjörð Bjarmason (5):
>   Makefile: move long inline shell loops in "install" into helper
>   Makefile: conform some of the code to our coding standards
>   Makefile: stop hiding failures during "install"
>   Makefile: add NO_INSTALL_SYMLINKS_FALLBACK switch
>   Makefile: Add a NO_INSTALL_BUILTIN_EXECDIR_ALIASES flag
>
>  Makefile |  65 +++--
>  install_programs | 124 +++
>  2 files changed, 151 insertions(+), 38 deletions(-)
>  create mode 100755 install_programs


Re: [RFC/PATCH 1/5] Makefile: move long inline shell loops in "install" into helper

2018-11-16 Thread Ævar Arnfjörð Bjarmason


On Mon, Nov 12 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Mon, 12 Nov 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Nov 12 2018, Johannes Schindelin wrote:
>>
>> > On Fri, 2 Nov 2018, Ævar Arnfjörð Bjarmason wrote:
>> >
>> >> Move a 37 line for-loop mess out of "install" and into a helper
>> >> script. This started out fairly innocent but over the years has grown
>> >> into a hard-to-maintain monster, and my recent ad874608d8 ("Makefile:
>> >> optionally symlink libexec/git-core binaries to bin/git", 2018-03-13)
>> >> certainly didn't help.
>> >>
>> >> The shell code is ported pretty much as-is (with getopts added), it'll
>> >> be fixed & prettified in subsequent commits.
>> >>
>> >> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> >> ---
>> >>  Makefile | 52 
>> >>  install_programs | 89 
>> >>  2 files changed, 103 insertions(+), 38 deletions(-)
>> >>  create mode 100755 install_programs
>> >>
>> >> diff --git a/Makefile b/Makefile
>> >> index bbfbb4292d..aa6ca1fa68 100644
>> >> --- a/Makefile
>> >> +++ b/Makefile
>> >> @@ -2808,44 +2808,20 @@ endif
>> >>   bindir=$$(cd '$(DESTDIR_SQ)$(bindir_SQ)' && pwd) && \
>> >>   execdir=$$(cd '$(DESTDIR_SQ)$(gitexec_instdir_SQ)' && pwd) && \
>> >>   destdir_from_execdir_SQ=$$(echo '$(gitexecdir_relative_SQ)' | sed -e 
>> >> 's|[^/][^/]*|..|g') && \
>> >> - { test "$$bindir/" = "$$execdir/" || \
>> >> -   for p in git$X $(filter $(install_bindir_programs),$(ALL_PROGRAMS)); 
>> >> do \
>> >> - $(RM) "$$execdir/$$p" && \
>> >> - test -n "$(INSTALL_SYMLINKS)" && \
>> >> - ln -s "$$destdir_from_execdir_SQ/$(bindir_relative_SQ)/$$p" 
>> >> "$$execdir/$$p" || \
>> >> - { test -z 
>> >> "$(NO_INSTALL_HARDLINKS)$(NO_CROSS_DIRECTORY_HARDLINKS)" && \
>> >> -   ln "$$bindir/$$p" "$$execdir/$$p" 2>/dev/null || \
>> >> -   cp "$$bindir/$$p" "$$execdir/$$p" || exit; } \
>> >> -   done; \
>> >> - } && \
>> >> - for p in $(filter $(install_bindir_programs),$(BUILT_INS)); do \
>> >> - $(RM) "$$bindir/$$p" && \
>> >> - test -n "$(INSTALL_SYMLINKS)" && \
>> >> - ln -s "git$X" "$$bindir/$$p" || \
>> >> - { test -z "$(NO_INSTALL_HARDLINKS)" && \
>> >> -   ln "$$bindir/git$X" "$$bindir/$$p" 2>/dev/null || \
>> >> -   ln -s "git$X" "$$bindir/$$p" 2>/dev/null || \
>> >> -   cp "$$bindir/git$X" "$$bindir/$$p" || exit; } \
>> >> - done && \
>> >> - for p in $(BUILT_INS); do \
>> >> - $(RM) "$$execdir/$$p" && \
>> >> - test -n "$(INSTALL_SYMLINKS)" && \
>> >> - ln -s "$$destdir_from_execdir_SQ/$(bindir_relative_SQ)/git$X" 
>> >> "$$execdir/$$p" || \
>> >> - { test -z "$(NO_INSTALL_HARDLINKS)" && \
>> >> -   ln "$$execdir/git$X" "$$execdir/$$p" 2>/dev/null || \
>> >> -   ln -s "git$X" "$$execdir/$$p" 2>/dev/null || \
>> >> -   cp "$$execdir/git$X" "$$execdir/$$p" || exit; } \
>> >> - done && \
>> >> - remote_curl_aliases="$(REMOTE_CURL_ALIASES)" && \
>> >> - for p in $$remote_curl_aliases; do \
>> >> - $(RM) "$$execdir/$$p" && \
>> >> - test -n "$(INSTALL_SYMLINKS)" && \
>> >> - ln -s "git-remote-http$X" "$$execdir/$$p" || \
>> >> - { test -z "$(NO_INSTALL_HARDLINKS)" && \
>> >> -   ln "$$execdir/git-remote-http$X" "$$execdir/$$p" 2>/dev/null 
>> >> || \
>> >> -   ln -s "git-remote-http$X" "$$execdir/$$p" 2>/dev/null || \
>> >> -   cp "$$execdir/git-remote-http$X" "$$execdir/$$p" || exit; } \
>> >> - done && \
>> >
>> > This indeed looks like a mess...
>> >
>> >> + ./install_programs \
>> >> + --X="$$X" \
>> >> + --RM="$(RM)" \
>> >> + --bindir="$$bindir" \
>> >> + --bindir-relative="$(bindir_relative_SQ)" \
>> >> + --execdir="$$execdir" \
>> >> + --destdir-from-execdir="$$destdir_from_execdir_SQ" \
>> >> + --flag-install-symlinks="$(INSTALL_SYMLINKS)" \
>> >> + --flag-no-install-hardlinks="$(NO_INSTALL_HARDLINKS)" \
>> >> + 
>> >> --flag-no-cross-directory-hardlinks="$(NO_CROSS_DIRECTORY_HARDLINKS)" \
>> >> + --list-bindir-standalone="git$X $(filter 
>> >> $(install_bindir_programs),$(ALL_PROGRAMS))" \
>> >> + --list-bindir-git-dashed="$(filter 
>> >> $(install_bindir_programs),$(BUILT_INS))" \
>> >> + --list-execdir-git-dashed="$(BUILT_INS)" \
>> >> + --list-execdir-curl-aliases="$(REMOTE_CURL_ALIASES)" && \
>> >>   ./check_bindir "z$$bindir" "z$$execdir" "$$bindir/git-add$X"
>> >>
>> >>  .PHONY: install-gitweb install-doc install-man install-man-perl 
>> >> install-html install-info install-pdf
>> >> diff --git a/install_programs b/install_programs
>> >> new file mode 100755
>> >> index 00..e287108112
>> >> --- /dev/null
>> >> +++ b/install_programs
>> >> @@ -0,0 +1,89 @@
>> >> +#!/bin/sh
>> >> +
>> >> +while test $# != 0
>> >> +do
>> >> + case "$1" in
>> >> + --X=*)
>> >> +   

Re: [PATCH v2 1/2] [Outreachy] t3903-stash: test without configured user.name and user.email

2018-11-16 Thread Junio C Hamano
Slavica Djukic  writes:

>>> +   git var GIT_COMMITTER_IDENT >actual &&
>>> +   test_cmp expected actual &&
>> I am not sure what you are testing with this step.  There is nothing
>> that changed environment variables or configuration since we ran
>> "git var" above.  Why does this test suspect that somebody in the
>> future may break the expectation that after running 'git add' and/or
>> 'git stash', our committer identity may have been changed, and how
>> would such a breakage happen?
> In previous re-roll, you suggested that test should be improved so
> that when
> reasonable identity is present, git stash executes under that
> identity, and not
> under the fallback one. 

Yes, but for that you'd need to be checking the resulting commit
object that represents the stash entry.  It should be created under
the substitute identity.

> Here I'm just making sure that after calling
> git stash,
> we still have same reasonable identity present.

I do not think such a test would detect it, even when "git stash"
incorrectly used the fallback identity to create the stash entry.


Re: [PATCH] ref-filter: don't look for objects when outside of a repository

2018-11-16 Thread Junio C Hamano
Jeff King  writes:

> If you are arguing that even in a repo we should reject "authorname"
> early (just as we would outside of a repo), I could buy that.

Yup, that (and replace 'authorname' with anything that won't work
with missing objects) for consistency was what I meant.



Re: [PATCH v2 1/2] rebase doc: document rebase.useBuiltin

2018-11-16 Thread Junio C Hamano
Johannes Schindelin  writes:

> Meaning: essentially, `rebase.useBuiltin` was defaulting to `false`, and
> if a user installed Git for Windows with the experimental built-in rebase,
> it was set to `true` in the system config.

Oh, that changes the picture entirely.  If that was what was shipped
to Windows users for 2.19.X, then I'd say we should be able to trust
the switch well enough.  I just somehow thought that everybody in
the Windows land has been using the -in-C version.

>> Having said that, assuming that the switching back to scripted
>> version works correctly and assuming that we want to expose this to
>> end users, I think the patch text makes sense.
>
> Indeed.
>
> I would still love to see the built-in rebase to be used by default in
> v2.20.0, and I am reasonably sure that the escape hatch works (because, as
> I told you above, it worked in the reverse, making the built-in rebase an
> opt-in in Git for Windows v2.19.1).

Good.  That makes things a lot simpler.

Thanks.


[PATCH] bundle: dup() output descriptor closer to point-of-use

2018-11-16 Thread Jeff King
On Thu, Nov 15, 2018 at 09:01:07PM +0100, Johannes Schindelin wrote:

> > It seems like we should be checking that the stale lockfile isn't left,
> > which is the real problem (the warning is annoying, but is a symptom of
> > the same thing). I.e., something like:
> > 
> >   test_must_fail git bundle create foobar.bundle master..master &&
> >   test_path_is_missing foobar.bundle.lock
> > 
> > That would already pass on non-Windows platforms, but that's OK. It will
> > never give a false failure.
> > 
> > If you don't mind, can you confirm that the test above fails without
> > either of the two patches under discussion?
> 
> This test succeeds with your patch as well as with Gaël's, and fails when
> neither patch is applied. So you're right, it is the much better test.

Thanks for checking!

> > > Do you want to integrate this test into your patch and run with it, or
> > > do you want me to shepherd your patch?
> > 
> > I'll wrap it up with a commit message and a test.

Actually, I realized there's an even simpler way to do this. Here it is.

-- >8 --
Subject: [PATCH] bundle: dup() output descriptor closer to point-of-use

When writing a bundle to a file, the bundle code actually creates
"your.bundle.lock" using our lockfile interface. We feed that output
descriptor to a child git-pack-objects via run-command, which has the
quirk that it closes the output descriptor in the parent.

To avoid confusing the lockfile code (which still thinks the descriptor
is valid), we dup() it, and operate on the duplicate.

However, this has a confusing side effect: after the dup() but before we
call pack-objects, we have _two_ descriptors open to the lockfile. If we
call die() during that time, the lockfile code will try to clean up the
partially-written file. It knows to close() the file before unlinking,
since on some platforms (i.e., Windows) the open file would block the
deletion. But it doesn't know about the duplicate descriptor. On
Windows, triggering an error at the right part of the code will result
in the cleanup failing and the lockfile being left in the filesystem.

We can solve this by moving the dup() much closer to start_command(),
shrinking the window in which we have the second descriptor open. It's
easy to place this in such a way that no die() is possible. We could
still die due to a signal in the exact wrong moment, but we already
tolerate races there (e.g., a signal could come before we manage to put
the file on the cleanup list in the first place).

As a bonus, this shields create_bundle() itself from the duplicate-fd
trick, and we can simplify its error handling (note that the lock
rollback now happens unconditionally, but that's OK; it's a noop if we
didn't open the lock in the first place).

The included test uses an empty bundle to cause a failure at the right
spot in the code, because that's easy to trigger (the other likely
errors are write() problems like ENOSPC).  Note that it would already
pass on non-Windows systems (because they are happy to unlink an
already-open file).

Based-on-a-patch-by: Gaël Lhez 
Signed-off-by: Jeff King 
---
 bundle.c| 39 ++-
 t/t5607-clone-bundle.sh |  6 ++
 2 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/bundle.c b/bundle.c
index 1ef584b93b..6b0f6d8f10 100644
--- a/bundle.c
+++ b/bundle.c
@@ -243,7 +243,7 @@ static int is_tag_in_date_range(struct object *tag, struct 
rev_info *revs)
 }
 
 
-/* Write the pack data to bundle_fd, then close it if it is > 1. */
+/* Write the pack data to bundle_fd */
 static int write_pack_data(int bundle_fd, struct rev_info *revs)
 {
struct child_process pack_objects = CHILD_PROCESS_INIT;
@@ -256,6 +256,20 @@ static int write_pack_data(int bundle_fd, struct rev_info 
*revs)
pack_objects.in = -1;
pack_objects.out = bundle_fd;
pack_objects.git_cmd = 1;
+
+   /*
+* start_command() will close our descriptor if it's >1. Duplicate it
+* to avoid surprising the caller.
+*/
+   if (pack_objects.out > 1) {
+   pack_objects.out = dup(pack_objects.out);
+   if (pack_objects.out < 0) {
+   error_errno(_("unable to dup bundle descriptor"));
+   child_process_clear(_objects);
+   return -1;
+   }
+   }
+
if (start_command(_objects))
return error(_("Could not spawn pack-objects"));
 
@@ -421,21 +435,10 @@ int create_bundle(struct bundle_header *header, const 
char *path,
bundle_to_stdout = !strcmp(path, "-");
if (bundle_to_stdout)
bundle_fd = 1;
-   else {
+   else
bundle_fd = hold_lock_file_for_update(, path,
  LOCK_DIE_ON_ERROR);
 
-   /*
-* write_pack_data() will close the fd passed to it,
-* but commit_lock_file() will also try to close 

Re: [PATCH v2 1/2] rebase doc: document rebase.useBuiltin

2018-11-16 Thread Johannes Schindelin
Hi Junio,

On Fri, 16 Nov 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
> 
> > The rebase.useBuiltin variable introduced in 55071ea248 ("rebase:
> > start implementing it as a builtin", 2018-08-07) was turned on by
> > default in 5541bd5b8f ("rebase: default to using the builtin rebase",
> > 2018-08-08), but had no documentation.
> 
> I actually thought that everybody understood that this was merely an
> aid to be used during the development of the feature and never meant
> to be given to the end users.

It may have been from git.git's point of view, but from Git for Windows'
point of view it was always meant to be a real feature flag.

> With my devil's advocate hat on, how much do we trust it as an
> escape hatch?

As you know, only a fraction of the bug reports about the built-in rebase
came in from Git for Windows: the autostash-with-submodules bug and the
perf-regression one. By my counting that is 2 out of 5 bugs coming in via
that route.

One of the reasons for that was that the built-in rebase that was shipped
in Git for Windows v2.19.1 was marked as experimental.

And the way I could mark it experimental was by flipping the default to
executing the scripted version:
https://github.com/git-for-windows/git/commit/cff1a96cfe (you will note
that I added the same escape hatch for `git stash` by adding back
`git-stash.sh` as `git-legacy-stash.sh` and imitating the same dance as
for built-in `rebase`, and I also added back the scripted
`git-rebase--interactive.sh` for use by `git-legacy-rebase.sh`).

Meaning: essentially, `rebase.useBuiltin` was defaulting to `false`, and
if a user installed Git for Windows with the experimental built-in rebase,
it was set to `true` in the system config.

There was not a single complaint about the scripted `git rebase` being
broken in any way.

So we *do* have some real-world testing of that feature. (Obviously I have
no numbers about Git for Windows' usage, apart from download numbers, and
they do not say how many users opted in and how many did not, but Git for
Windows v2.19.1 was downloaded more than 2.7 million times so far and I
think it is safe to assume that some percentage tested that feature.)

> After all, the codepath to hide the "rebase in C" implementation and use
> the scripted version were never in 'master' (or 'next' for that matter)
> with this variable turned off, so I am reasonably sure it had no serious
> exposure to real world usage.

See above for a counter-argument.

> Having said that, assuming that the switching back to scripted
> version works correctly and assuming that we want to expose this to
> end users, I think the patch text makes sense.

Indeed.

I would still love to see the built-in rebase to be used by default in
v2.20.0, and I am reasonably sure that the escape hatch works (because, as
I told you above, it worked in the reverse, making the built-in rebase an
opt-in in Git for Windows v2.19.1).

Ciao,
Dscho

Re: [PATCH] ref-filter: don't look for objects when outside of a repository

2018-11-16 Thread Jeff King
On Fri, Nov 16, 2018 at 02:09:07PM +0900, Junio C Hamano wrote:

> >> I see problems in both directions:
> >> 
> >>  - sorting by "objectname" works now, but it's marked with SOURCE_OBJ,
> >>and would be forbidden with your patch.  I'm actually not sure if
> >>SOURCE_OBJ is accurate; we shouldn't need to access the object to
> >>show it (and we are probably wasting effort loading the full contents
> >>for tools like for-each-ref).
> >> 
> >>However, that's not the full story. For objectname:short, it _does_ call
> >>find_unique_abbrev(). So we expect to have an object directory.
> >
> > Oops, I'm apparently bad at reading. It is in fact SOURCE_OTHER, which
> > makes sense (outside of this whole "--sort outside a repo thing").
> >
> > But we'd ideally distinguish between "objectname" (which should be OK
> > outside a repo) and "objectname:short" (which currently segfaults).
> 
> Arguably, use of ref-filter machinery in ls-remote, whether it is
> given from inside or outside a repo, was a mistake in 1fb20dfd
> ("ls-remote: create '--sort' option", 2018-04-09), as the whole
> point of "ls-remote" is to peek the list of refs and it is perfectly
> normal that the objects listed are not available.

I think it's conceptually reasonable to use the ref-filter machinery.
It's just that it was underprepared to handle this out-of-repo case. I
think we're not too far off, though.

> "ls-remote --sort=authorname" that is run in a repository may not
> segfault on a ref that points at a yet-to-be-fetched commit, but it
> cannot be doing anything sensible.  Is it still better to silently
> produce a nonsense result than refusing to --sort no matter what the
> sort keys are, whether we are inside or outside a repository?

I don't think we produce silent nonsense in the current code (or after
any of the discussed solutions), either in a repo or out. We say "fatal:
missing object ..." inside a repo if the request cannot be fulfilled.
That's not incredibly illuminating, perhaps, but it means we fulfill
whatever we _can_ on behalf of the user's request, and bail otherwise.

If you are arguing that even in a repo we should reject "authorname"
early (just as we would outside of a repo), I could buy that.
Technically we can make it work sometimes (if we happen to have fetched
everything the other side has), but behaving consistently (and with a
decent error message) may trump that.

-Peff


[PATCH 3/3] remote-curl: die on server-side errors

2018-11-16 Thread Jeff King
From: Josh Steadmon 

When a smart HTTP server sends an error message via pkt-line,
remote-curl will fail to detect the error (which usually results in
incorrectly falling back to dumb-HTTP mode).

This patch adds a check in check_smart_http() for server-side error
messages, as well as a test case for this issue.

Signed-off-by: Josh Steadmon 
Signed-off-by: Jeff King 
---
 remote-curl.c   | 3 +++
 t/lib-httpd.sh  | 1 +
 t/lib-httpd/apache.conf | 4 
 t/lib-httpd/error-smart-http.sh | 3 +++
 t/t5551-http-fetch-smart.sh | 5 +
 5 files changed, 16 insertions(+)
 create mode 100644 t/lib-httpd/error-smart-http.sh

diff --git a/remote-curl.c b/remote-curl.c
index 3c9c4a07c3..b1309f2bdc 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -382,6 +382,9 @@ static void check_smart_http(struct discovery *d, const 
char *service,
 */
d->proto_git = 1;
 
+   } else if (skip_prefix(line, "ERR ", )) {
+   die(_("remote error: %s"), p);
+
} else {
die("invalid server response; got '%s'", line);
}
diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index a8729f8232..4e946623bb 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -131,6 +131,7 @@ prepare_httpd() {
mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH"
cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
install_script broken-smart-http.sh
+   install_script error-smart-http.sh
install_script error.sh
install_script apply-one-time-sed.sh
 
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 581c010d8f..6de2bc775c 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -117,6 +117,7 @@ Alias /auth/dumb/ www/auth/dumb/
 
 ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1
 ScriptAlias /broken_smart/ broken-smart-http.sh/
+ScriptAlias /error_smart/ error-smart-http.sh/
 ScriptAlias /error/ error.sh/
 ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 
@@ -125,6 +126,9 @@ ScriptAliasMatch /one_time_sed/(.*) apply-one-time-sed.sh/$1
 
Options ExecCGI
 
+
+   Options ExecCGI
+
 
   Options ExecCGI
 
diff --git a/t/lib-httpd/error-smart-http.sh b/t/lib-httpd/error-smart-http.sh
new file mode 100644
index 00..e65d447fc4
--- /dev/null
+++ b/t/lib-httpd/error-smart-http.sh
@@ -0,0 +1,3 @@
+echo "Content-Type: application/x-git-upload-pack-advertisement"
+echo
+printf "%s" "0019ERR server-side error"
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 8630b0cc39..ba83e567e5 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -429,5 +429,10 @@ test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data 
from being traced' '
! grep "=> Send data" err
 '
 
+test_expect_success 'server-side error detected' '
+   test_must_fail git clone $HTTPD_URL/error_smart/repo.git 2>actual &&
+   grep "server-side error" actual
+'
+
 stop_httpd
 test_done
-- 
2.19.1.1636.gc7a073d580


Re: [PATCH v3 00/11] fast export and import fixes and features

2018-11-16 Thread Jeff King
On Thu, Nov 15, 2018 at 11:59:45PM -0800, Elijah Newren wrote:

> This is a series of small fixes and features for fast-export and
> fast-import, mostly on the fast-export side.
> 
> Changes since v2 (full range-diff below):
>   * Dropped the final patch; going to try to use Peff's suggestion of
> rev-list and diff-tree to get what I need instead
>   * Inserted a new patch at the beginning to convert pre-existing sha1
> stuff to oid (rename sha1_to_hex() -> oid_to_hex(), rename
> anonymize_sha1() to anonymize_oid(), etc.)
>   * Modified other patches in the series to add calls to oid_to_hex() rather
> than sha1_to_hex()

Thanks, these changes all look good to me. I have no more nits to pick.
:)

-Peff


[PATCH 2/3] remote-curl: tighten "version 2" check for smart-http

2018-11-16 Thread Jeff King
In a v2 smart-http conversation, the server should reply to our initial
request with a pkt-line saying "version 2" (this is the start of the
"capabilities advertisement"). We check for the string using
starts_with(), but that's overly permissive (it would match "version
20", for example).

Let's tighten this check to use strcmp(). Note that we don't need to
worry about a trailing newline here, because the ptk-line code will have
chomped it for us already.

Signed-off-by: Jeff King 
---
 remote-curl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index dd9bc41aa1..3c9c4a07c3 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -375,7 +375,7 @@ static void check_smart_http(struct discovery *d, const 
char *service,
d->len = src_len;
d->proto_git = 1;
 
-   } else if (starts_with(line, "version 2")) {
+   } else if (!strcmp(line, "version 2")) {
/*
 * v2 smart http; do not consume version packet, which will
 * be handled elsewhere.
-- 
2.19.1.1636.gc7a073d580



[PATCH 1/3] remote-curl: refactor smart-http discovery

2018-11-16 Thread Jeff King
After making initial contact with an http server, we have to decide if
the server supports smart-http, and if so, which version. Our rules are
a bit inconsistent:

  1. For v0, we require that the content-type indicates a smart-http
 response. We also require the response to look vaguely like a
 pkt-line starting with "#". If one of those does not match, we fall
 back to dumb-http.

 But according to our http protocol spec[1]:

   Dumb servers MUST NOT return a return type starting with
   `application/x-git-`.

 If we see the expected content-type, we should consider it
 smart-http. At that point we can parse the pkt-line for real, and
 complain if it is not syntactically valid.

  2. For v2, we do not actually check the content-type. Our v2 protocol
 spec says[2]:

   When using the http:// or https:// transport a client makes a
   "smart" info/refs request as described in `http-protocol.txt`[...]

 and the http spec is clear that for a smart-http[3]:

   The Content-Type MUST be `application/x-$servicename-advertisement`.

 So it is required according to the spec.

These inconsistencies were easy to miss because of the way the original
code was written as an inline conditional. Let's pull it out into its
own function for readability, and improve a few things:

 - we now predicate the smart/dumb decision entirely on the presence of
   the correct content-type

 - we do a real pkt-line parse before deciding how to proceed (and die
   if it isn't valid)

 - use skip_prefix() for comparing service strings, instead of
   constructing expected output in a strbuf; this avoids dealing with
   memory cleanup

Note that this _is_ tightening what the client will allow. It's all
according to the spec, but it's possible that other implementations
might violate these. However, violating these particular rules seems
like an odd choice for a server to make.

[1] Documentation/technical/http-protocol.txt, l. 166-167
[2] Documentation/technical/protocol-v2.txt, l. 63-64
[3] Documentation/technical/http-protocol.txt, l. 247

Helped-by: Josh Steadmon 
Signed-off-by: Jeff King 
---
 remote-curl.c | 93 ---
 1 file changed, 59 insertions(+), 34 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..dd9bc41aa1 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,65 @@ static int get_protocol_http_header(enum protocol_version 
version,
return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+struct strbuf *type)
+{
+   char *src_buf;
+   size_t src_len;
+   char *line;
+   const char *p;
+
+   /*
+* If we don't see x-$service-advertisement, then it's not smart-http.
+* But once we do, we commit to it and assume any other protocol
+* violations are hard errors.
+*/
+   if (!skip_prefix(type->buf, "application/x-", ) ||
+   !skip_prefix(p, service, ) ||
+   strcmp(p, "-advertisement"))
+   return;
+
+   /*
+* "Peek" at the first packet by using a separate buf/len pair; some
+* cases below require us leaving the originals intact.
+*/
+   src_buf = d->buf;
+   src_len = d->len;
+   line = packet_read_line_buf(_buf, _len, NULL);
+   if (!line)
+   die("invalid server response; expected service, got flush 
packet");
+
+   if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) {
+   /*
+* The header can include additional metadata lines, up
+* until a packet flush marker.  Ignore these now, but
+* in the future we might start to scan them.
+*/
+   while (packet_read_line_buf(_buf, _len, NULL))
+   ;
+
+   /*
+* v0 smart http; callers expect us to soak up the
+* service and header packets
+*/
+   d->buf = src_buf;
+   d->len = src_len;
+   d->proto_git = 1;
+
+   } else if (starts_with(line, "version 2")) {
+   /*
+* v2 smart http; do not consume version packet, which will
+* be handled elsewhere.
+*/
+   d->proto_git = 1;
+
+   } else {
+   die("invalid server response; got '%s'", line);
+   }
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-   struct strbuf exp = STRBUF_INIT;
struct strbuf type = STRBUF_INIT;
struct strbuf charset = STRBUF_INIT;
struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +461,8 @@ static struct discovery *discover_refs(const char 
*service, int for_push)
last->buf_alloc = strbuf_detach(, >len);
last->buf = last->buf_alloc;
 
-   strbuf_addf(, "application/x-%s-advertisement", service);
-  

[PATCH 0/3] remote-curl smart-http discovery cleanup

2018-11-16 Thread Jeff King
On Thu, Nov 15, 2018 at 01:51:52PM -0800, Josh Steadmon wrote:

> > This patch tightens both of those (I also made a few stylistic tweaks,
> > and added the ERR condition to show where it would go). I dunno. Part of
> > me sees this as a nice cleanup, but maybe it is better to just leave it
> > alone. A lot of these behaviors are just how it happens to work now, and
> > not part of the spec, but we don't know what might be relying on them.
> 
> At least according to the protocol-v2 and http-protocol docs, the
> stricter behavior seems correct:
> 
> For the first point above, dumb servers should never use an
> "application/x-git-*" content type (http-protocol.txt line 163-167).
> 
> For the second point, the docs require v2 servers to use
> "application/x-git-*" content types. protocol-v2.txt lines 63-65 state
> that v2 clients should make a smart http request, while
> http-protocol.txt lines 247-252 state that a smart server's response
> type must be "application/x-git-*".

Thanks for digging into the spec. I agree that it's pretty clear that
the appropriate content-type is expected.

> Of course we don't know if other implementations follow the spec, but
> ISTM that this patch at least doesn't contradict how we've promised the
> protocols should work.

These seem like pretty unlikely ways for a buggy server to behave, so I
think it's a reasonable risk. I also checked GitHub's implementation
(which recently learned to speak v2) and made sure that it works. :)
I didn't check JGit, but given the provenance, I assume it's fine.

Amusingly, this does break the test you just added, because it tries to
issue an ERR after claiming "text/html" (and after my patch, we
correctly fall back to dumb-http).

> If no one has any objections, I'll include the diff below in v2. Thanks
> for the help Jeff!

I think it makes sense to do the refactoring first as a separate step.
And of course it needs a commit message. So how about this series (your
original is rebased on top)?

  [1/3]: remote-curl: refactor smart-http discovery
  [2/3]: remote-curl: tighten "version 2" check for smart-http
  [3/3]: remote-curl: die on server-side errors

 remote-curl.c   | 96 +
 t/lib-httpd.sh  |  1 +
 t/lib-httpd/apache.conf |  4 ++
 t/lib-httpd/error-smart-http.sh |  3 ++
 t/t5551-http-fetch-smart.sh |  5 ++
 5 files changed, 75 insertions(+), 34 deletions(-)
 create mode 100644 t/lib-httpd/error-smart-http.sh

-Peff


From:Miss:Fatima Yusuf.

2018-11-16 Thread Miss.Fatima Yusuf



From:Miss:Fatima Yusuf.

For sure this mail would definitely come to you as a surprise, but do take your 
good time to go through it, My name is Ms.Fatima Yusuf,i am from Ivory Coast.

I lost my parents a year and couple of months ago. My father was a serving 
director of the Agro-exporting board until his death. He was assassinated by 
his business partners.Before his death, he made a deposit of US$9.7 Million 
Dollars here in Cote d'ivoire which was for the purchase of cocoa processing 
machine and development of another factory before his untimely death.

Being that this part of the world experiences political and crises time without 
number, there is no guarantee of lives and properties. I cannot invest this 
money here any long, despite the fact it had been my late father's industrial 
plans.

I want you to do me a favor to receive this funds into your country or any 
safer place as the beneficiary, I have plans to invest this money in 
continuation with the investment vision of my late father, but not in this 
place again rather in your country. I have the vision of going into real estate 
and industrial production or any profitable business venture.

I will be ready to compensate you with 20% of the total Amount, now all my hope 
is banked on you and i really wants to invest this money in your country, where 
there is stability of Government, political and economic welfare.

My greatest worry now is how to move out of this country because my uncle is 
threatening to kill me as he killed my father,Please do not let anybody hear 
about this, it is between me and you alone because of my security reason.

I am waiting to hear from you.
Yours Sincerely,
Miss.Fatima Yusuf.


Re: [PATCH v2 1/2] [Outreachy] t3903-stash: test without configured user.name and user.email

2018-11-16 Thread Slavica Djukic

Hi Junio,

On 16-Nov-18 6:55 AM, Junio C Hamano wrote:

Slavica Djukic  writes:


+test_expect_failure 'stash works when user.name and user.email are not set' '
+   git reset &&
+   git var GIT_COMMITTER_IDENT >expected &&

All the other existing test pieces in this file calls the expected
result "expect"; is there a reason why this patch needs to be
different (e.g. 'expect' file left by the earlier step needs to be
kept unmodified for later use, or something like that)?  If not,
please avoid making a difference in irrelevant details, as that
would waste time of readers by forcing them to guess if there is
such a reason that readers cannot immediately see.


There is no specific reason for file to be "expected", I'll update that.



Anyway, we grab the committer ident we use by default during the
test with this command.  OK.


+   >1 &&
+   git add 1 &&
+   git stash &&

And we make sure we can create stash.


+   git var GIT_COMMITTER_IDENT >actual &&
+   test_cmp expected actual &&

I am not sure what you are testing with this step.  There is nothing
that changed environment variables or configuration since we ran
"git var" above.  Why does this test suspect that somebody in the
future may break the expectation that after running 'git add' and/or
'git stash', our committer identity may have been changed, and how
would such a breakage happen?
In previous re-roll, you suggested that test should be improved so that 
when
reasonable identity is present, git stash executes under that identity, 
and not
under the fallback one. Here I'm just making sure that after calling git 
stash,

we still have same reasonable identity present.



+   >2 &&
+   git add 2 &&
+   test_config user.useconfigonly true &&
+   test_config stash.usebuiltin true &&

Now we start using use-config-only, so unsetting environment
variables will cause trouble when Git insists on having an
explicitly configured identities.  Makes sense.


+   (
+   sane_unset GIT_AUTHOR_NAME &&
+   sane_unset GIT_AUTHOR_EMAIL &&
+   sane_unset GIT_COMMITTER_NAME &&
+   sane_unset GIT_COMMITTER_EMAIL &&
+   test_unconfig user.email &&
+   test_unconfig user.name &&

And then we try the same test, but without environment or config.
Since we are unsetting the environment, in order to be nice for
future test writers, we do this in a subshell, so that we do not
have to restore the original values of environment variables.

Don't we need to be nice the same way for configuration variables,
though?  We _know_ that nobody sets user.{email,name} config up to
this point in the test sequence, so that is why we do not do a "save
before test and then restore to the original" dance on them.  Even
though we are relying on the fact that these two variables are left
unset in the configuration file, we unconfig them here anyway, and I
do think it is a good idea for documentation purposes (i.e. we are
not documenting what we assume the config before running this test
would be; we are documenting what state we want these two variables
are in when running this "git stash"---that is, they are both unset).

So these later part of this test piece makes sense.  I still do not
know what you wanted to check in the earlier part of the test,
though.


+   git stash
+   )
+'
+
  test_done



Thank you,
Slavica


[PATCH v3 00/11] fast export and import fixes and features

2018-11-16 Thread Elijah Newren
This is a series of small fixes and features for fast-export and
fast-import, mostly on the fast-export side.

Changes since v2 (full range-diff below):
  * Dropped the final patch; going to try to use Peff's suggestion of
rev-list and diff-tree to get what I need instead
  * Inserted a new patch at the beginning to convert pre-existing sha1
stuff to oid (rename sha1_to_hex() -> oid_to_hex(), rename
anonymize_sha1() to anonymize_oid(), etc.)
  * Modified other patches in the series to add calls to oid_to_hex() rather
than sha1_to_hex()

Elijah Newren (11):
  fast-export: convert sha1 to oid
  git-fast-import.txt: fix documentation for --quiet option
  git-fast-export.txt: clarify misleading documentation about rev-list
args
  fast-export: use value from correct enum
  fast-export: avoid dying when filtering by paths and old tags exist
  fast-export: move commit rewriting logic into a function for reuse
  fast-export: when using paths, avoid corrupt stream with non-existent
mark
  fast-export: ensure we export requested refs
  fast-export: add --reference-excluded-parents option
  fast-import: remove unmaintained duplicate documentation
  fast-export: add a --show-original-ids option to show original names

 Documentation/git-fast-export.txt |  23 +++-
 Documentation/git-fast-import.txt |  23 +++-
 builtin/fast-export.c | 190 +-
 fast-import.c | 166 ++
 t/t9350-fast-export.sh|  80 -
 5 files changed, 268 insertions(+), 214 deletions(-)

 -:  -- >  1:  4c3370c85f fast-export: convert sha1 to oid
 1:  8870fb1340 =  2:  6ffa30e3c7 git-fast-import.txt: fix documentation for 
--quiet option
 2:  16d1c3e22d =  3:  1e278f009a git-fast-export.txt: clarify misleading 
documentation about rev-list args
 3:  e19f6b36f9 =  4:  9d7b2aef49 fast-export: use value from correct enum
 4:  2b305561d5 !  5:  b65a591d4d fast-export: avoid dying when filtering by 
paths and old tags exist
@@ -29,7 +29,7 @@
 -   oid_to_hex(>object.oid));
 +  if (!p->parents) {
 +  printf("reset %s\nfrom %s\n\n",
-+ name, sha1_to_hex(null_sha1));
++ name, oid_to_hex(_oid));
 +  free(buf);
 +  return;
 +  }
 5:  607b1dc2b2 !  6:  dde52c9cb6 fast-export: move commit rewriting logic into 
a function for reuse
@@ -47,7 +47,7 @@
 -  break;
 -  if (!p->parents) {
 -  printf("reset %s\nfrom %s\n\n",
-- name, sha1_to_hex(null_sha1));
+- name, oid_to_hex(_oid));
 -  free(buf);
 -  return;
 -  }
@@ -55,7 +55,7 @@
 +  p = rewrite_commit((struct commit *)tagged);
 +  if (!p) {
 +  printf("reset %s\nfrom %s\n\n",
-+ name, sha1_to_hex(null_sha1));
++ name, oid_to_hex(_oid));
 +  free(buf);
 +  return;
}
 6:  ec1862e858 !  7:  d9b2e326f0 fast-export: when using paths, avoid corrupt 
stream with non-existent mark
@@ -35,7 +35,7 @@
 +   * it.
 +   */
 +  printf("reset %s\nfrom %s\n\n",
-+ name, sha1_to_hex(null_sha1));
++ name, oid_to_hex(_oid));
 +  continue;
 +  }
printf("reset %s\nfrom :%d\n\n", name,
 7:  9da26e3ccb !  8:  9ddb155a70 fast-export: ensure we export requested refs
@@ -97,7 +97,7 @@
case OBJ_TAG:
handle_tag(name, (struct tag *)object);
 @@
-  name, sha1_to_hex(null_sha1));
+  name, oid_to_hex(_oid));
continue;
}
 -  printf("reset %s\nfrom :%d\n\n", name,
@@ -114,7 +114,7 @@
 +   * like a ref deletion to me.
 +   */
 +  printf("reset %s\nfrom %s\n\n",
-+ name, sha1_to_hex(null_sha1));
++ name, oid_to_hex(_oid));
 +  continue;
 +  }
 +
 8:  7e5fe2f02e !  9:  595d2e5d30 fast-export: add 

[PATCH v3 10/11] fast-import: remove unmaintained duplicate documentation

2018-11-16 Thread Elijah Newren
fast-import.c has started with a comment for nine and a half years
re-directing the reader to Documentation/git-fast-import.txt for
maintained documentation.  Instead of leaving the unmaintained
documentation in place, just excise it.

Signed-off-by: Elijah Newren 
---
 fast-import.c | 154 --
 1 file changed, 154 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 95600c78e0..555d49ad23 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1,157 +1,3 @@
-/*
-(See Documentation/git-fast-import.txt for maintained documentation.)
-Format of STDIN stream:
-
-  stream ::= cmd*;
-
-  cmd ::= new_blob
-| new_commit
-| new_tag
-| reset_branch
-| checkpoint
-| progress
-;
-
-  new_blob ::= 'blob' lf
-mark?
-file_content;
-  file_content ::= data;
-
-  new_commit ::= 'commit' sp ref_str lf
-mark?
-('author' (sp name)? sp '<' email '>' sp when lf)?
-'committer' (sp name)? sp '<' email '>' sp when lf
-commit_msg
-('from' sp commit-ish lf)?
-('merge' sp commit-ish lf)*
-(file_change | ls)*
-lf?;
-  commit_msg ::= data;
-
-  ls ::= 'ls' sp '"' quoted(path) '"' lf;
-
-  file_change ::= file_clr
-| file_del
-| file_rnm
-| file_cpy
-| file_obm
-| file_inm;
-  file_clr ::= 'deleteall' lf;
-  file_del ::= 'D' sp path_str lf;
-  file_rnm ::= 'R' sp path_str sp path_str lf;
-  file_cpy ::= 'C' sp path_str sp path_str lf;
-  file_obm ::= 'M' sp mode sp (hexsha1 | idnum) sp path_str lf;
-  file_inm ::= 'M' sp mode sp 'inline' sp path_str lf
-data;
-  note_obm ::= 'N' sp (hexsha1 | idnum) sp commit-ish lf;
-  note_inm ::= 'N' sp 'inline' sp commit-ish lf
-data;
-
-  new_tag ::= 'tag' sp tag_str lf
-'from' sp commit-ish lf
-('tagger' (sp name)? sp '<' email '>' sp when lf)?
-tag_msg;
-  tag_msg ::= data;
-
-  reset_branch ::= 'reset' sp ref_str lf
-('from' sp commit-ish lf)?
-lf?;
-
-  checkpoint ::= 'checkpoint' lf
-lf?;
-
-  progress ::= 'progress' sp not_lf* lf
-lf?;
-
- # note: the first idnum in a stream should be 1 and subsequent
- # idnums should not have gaps between values as this will cause
- # the stream parser to reserve space for the gapped values.  An
- # idnum can be updated in the future to a new object by issuing
- # a new mark directive with the old idnum.
- #
-  mark ::= 'mark' sp idnum lf;
-  data ::= (delimited_data | exact_data)
-lf?;
-
-# note: delim may be any string but must not contain lf.
-# data_line may contain any data but must not be exactly
-# delim.
-  delimited_data ::= 'data' sp '<<' delim lf
-(data_line lf)*
-delim lf;
-
- # note: declen indicates the length of binary_data in bytes.
- # declen does not include the lf preceding the binary data.
- #
-  exact_data ::= 'data' sp declen lf
-binary_data;
-
- # note: quoted strings are C-style quoting supporting \c for
- # common escapes of 'c' (e..g \n, \t, \\, \") or \nnn where nnn
- # is the signed byte value in octal.  Note that the only
- # characters which must actually be escaped to protect the
- # stream formatting is: \, " and LF.  Otherwise these values
- # are UTF8.
- #
-  commit-ish  ::= (ref_str | hexsha1 | sha1exp_str | idnum);
-  ref_str ::= ref;
-  sha1exp_str ::= sha1exp;
-  tag_str ::= tag;
-  path_str::= path| '"' quoted(path)'"' ;
-  mode::= '100644' | '644'
-| '100755' | '755'
-| '12'
-;
-
-  declen ::= # unsigned 32 bit value, ascii base10 notation;
-  bigint ::= # unsigned integer value, ascii base10 notation;
-  binary_data ::= # file content, not interpreted;
-
-  when ::= raw_when | rfc2822_when;
-  raw_when ::= ts sp tz;
-  rfc2822_when ::= # Valid RFC 2822 date and time;
-
-  sp ::= # ASCII space character;
-  lf ::= # ASCII newline (LF) character;
-
- # note: a colon (':') must precede the numerical value assigned to
- # an idnum.  This is to distinguish it from a ref or tag name as
- # GIT does not permit ':' in ref or tag strings.
- #
-  idnum   ::= ':' bigint;
-  path::= # GIT style file path, e.g. "a/b/c";
-  ref ::= # GIT ref name, e.g. "refs/heads/MOZ_GECKO_EXPERIMENT";
-  tag ::= # GIT tag name, e.g. "FIREFOX_1_5";
-  sha1exp ::= # Any valid GIT SHA1 expression;
-  hexsha1 ::= # SHA1 in hexadecimal format;
-
- # note: name and email are UTF8 strings, however name must not
- # contain '<' or lf and email must not contain any of the
- # following: '<', '>', lf.
- #
-  name  ::= # valid GIT author/committer name;
-  email ::= # valid GIT author/committer email;
-  ts::= # time since the epoch in seconds, ascii base10 notation;
-  tz::= # GIT style timezone;
-
- # note: comments, get-mark, ls-tree, and cat-blob requests may
- # appear anywhere in the input, except within a data 

[PATCH v3 06/11] fast-export: move commit rewriting logic into a function for reuse

2018-11-16 Thread Elijah Newren
Logic to replace a filtered commit with an unfiltered ancestor is useful
elsewhere; put it into a function we can call.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 7d50f5414e..43e98a38a8 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -187,6 +187,22 @@ static int get_object_mark(struct object *object)
return ptr_to_mark(decoration);
 }
 
+static struct commit *rewrite_commit(struct commit *p)
+{
+   for (;;) {
+   if (p->parents && p->parents->next)
+   break;
+   if (p->object.flags & UNINTERESTING)
+   break;
+   if (!(p->object.flags & TREESAME))
+   break;
+   if (!p->parents)
+   return NULL;
+   p = p->parents->item;
+   }
+   return p;
+}
+
 static void show_progress(void)
 {
static int counter = 0;
@@ -767,21 +783,12 @@ static void handle_tag(const char *name, struct tag *tag)
oid_to_hex(>object.oid),
type_name(tagged->type));
}
-   p = (struct commit *)tagged;
-   for (;;) {
-   if (p->parents && p->parents->next)
-   break;
-   if (p->object.flags & UNINTERESTING)
-   break;
-   if (!(p->object.flags & TREESAME))
-   break;
-   if (!p->parents) {
-   printf("reset %s\nfrom %s\n\n",
-  name, oid_to_hex(_oid));
-   free(buf);
-   return;
-   }
-   p = p->parents->item;
+   p = rewrite_commit((struct commit *)tagged);
+   if (!p) {
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(_oid));
+   free(buf);
+   return;
}
tagged_mark = get_object_mark(>object);
}
-- 
2.19.1.1063.g1796373474.dirty



[PATCH v3 09/11] fast-export: add --reference-excluded-parents option

2018-11-16 Thread Elijah Newren
git filter-branch has a nifty feature allowing you to rewrite, e.g. just
the last 8 commits of a linear history
  git filter-branch $OPTIONS HEAD~8..HEAD

If you try the same with git fast-export, you instead get a history of
only 8 commits, with HEAD~7 being rewritten into a root commit.  There
are two alternatives:

  1) Don't use the negative revision specification, and when you're
 filtering the output to make modifications to the last 8 commits,
 just be careful to not modify any earlier commits somehow.

  2) First run 'git fast-export --export-marks=somefile HEAD~8', then
 run 'git fast-export --import-marks=somefile HEAD~8..HEAD'.

Both are more error prone than I'd like (the first for obvious reasons;
with the second option I have sometimes accidentally included too many
revisions in the first command and then found that the corresponding
extra revisions were not exported by the second command and thus were
not modified as I expected).  Also, both are poor from a performance
perspective.

Add a new --reference-excluded-parents option which will cause
fast-export to refer to commits outside the specified rev-list-args
range by their sha1sum.  Such a stream will only be useful in a
repository which already contains the necessary commits (much like the
restriction imposed when using --no-data).

Note from Peff:
  I think we might be able to do a little more optimization here. If
  we're exporting HEAD^..HEAD and there's an object in HEAD^ which is
  unchanged in HEAD, I think we'd still print it (because it would not
  be marked SHOWN), but we could omit it (by walking the tree of the
  boundary commits and marking them shown).  I don't think it's a
  blocker for what you're doing here, but just a possible future
  optimization.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 17 +++--
 builtin/fast-export.c | 42 +++
 t/t9350-fast-export.sh| 11 
 3 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index fda55b3284..f65026662a 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -110,6 +110,18 @@ marks the same across runs.
the shape of the history and stored tree.  See the section on
`ANONYMIZING` below.
 
+--reference-excluded-parents::
+   By default, running a command such as `git fast-export
+   master~5..master` will not include the commit master{tilde}5
+   and will make master{tilde}4 no longer have master{tilde}5 as
+   a parent (though both the old master{tilde}4 and new
+   master{tilde}4 will have all the same files).  Use
+   --reference-excluded-parents to instead have the the stream
+   refer to commits in the excluded range of history by their
+   sha1sum.  Note that the resulting stream can only be used by a
+   repository which already contains the necessary parent
+   commits.
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
@@ -119,8 +131,9 @@ marks the same across runs.
'git rev-list', that specifies the specific objects and references
to export.  For example, `master~10..master` causes the
current master reference to be exported along with all objects
-   added since its 10th ancestor commit and all files common to
-   master{tilde}9 and master{tilde}10.
+   added since its 10th ancestor commit and (unless the
+   --reference-excluded-parents option is specified) all files
+   common to master{tilde}9 and master{tilde}10.
 
 EXAMPLES
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index d71e0333d4..78fc67b03a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -37,6 +37,7 @@ static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
 static int full_tree;
+static int reference_excluded_commits;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -597,7 +598,8 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
message += 2;
 
if (commit->parents &&
-   get_object_mark(>parents->item->object) != 0 &&
+   (get_object_mark(>parents->item->object) != 0 ||
+reference_excluded_commits) &&
!full_tree) {
parse_commit_or_die(commit->parents->item);
diff_tree_oid(get_commit_tree_oid(commit->parents->item),
@@ -645,13 +647,21 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
unuse_commit_buffer(commit, commit_buffer);
 
for (i = 0, p = commit->parents; p; p = p->next) {
-   int mark = get_object_mark(>item->object);
-   if (!mark)
+

[PATCH v3 08/11] fast-export: ensure we export requested refs

2018-11-16 Thread Elijah Newren
If file paths are specified to fast-export and a ref points to a commit
that does not touch any of the relevant paths, then that ref would
sometimes fail to be exported.  (This depends on whether any ancestors
of the commit which do touch the relevant paths would be exported with
that same ref name or a different ref name.)  To avoid this problem,
put *all* specified refs into extra_refs to start, and then as we export
each commit, remove the refname used in the 'commit $REFNAME' directive
from extra_refs.  Then, in handle_tags_and_duplicates() we know which
refs actually do need a manual reset directive in order to be included.

This means that we do need some special handling for excluded refs; e.g.
if someone runs
   git fast-export ^master master
then they've asked for master to be exported, but they have also asked
for the commit which master points to and all of its history to be
excluded.  That logically means ref deletion.  Previously, such refs
were just silently omitted from being exported despite having been
explicitly requested for export.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 54 --
 t/t9350-fast-export.sh | 16 ++---
 2 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 227488ae84..d71e0333d4 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
+static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
 static int anonymize;
 static struct revision_sources revision_sources;
@@ -612,6 +613,13 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
export_blob(_queued_diff.queue[i]->two->oid);
 
refname = *revision_sources_at(_sources, commit);
+   /*
+* FIXME: string_list_remove() below for each ref is overall
+* O(N^2).  Compared to a history walk and diffing trees, this is
+* just lost in the noise in practice.  However, theoretically a
+* repo may have enough refs for this to become slow.
+*/
+   string_list_remove(_refs, refname, 0);
if (anonymize) {
refname = anonymize_refname(refname);
anonymize_ident_line(, _end);
@@ -815,7 +823,7 @@ static struct commit *get_commit(struct rev_cmdline_entry 
*e, char *full_name)
/* handle nested tags */
while (tag && tag->object.type == OBJ_TAG) {
parse_object(the_repository, >object.oid);
-   string_list_append(_refs, full_name)->util = tag;
+   string_list_append(_refs, full_name)->util = tag;
tag = (struct tag *)tag->tagged;
}
if (!tag)
@@ -874,25 +882,30 @@ static void get_tags_and_duplicates(struct 
rev_cmdline_info *info)
}
 
/*
-* This ref will not be updated through a commit, lets make
-* sure it gets properly updated eventually.
+* Make sure this ref gets properly updated eventually, whether
+* through a commit or manually at the end.
 */
-   if (*revision_sources_at(_sources, commit) ||
-   commit->object.flags & SHOWN)
+   if (e->item->type != OBJ_TAG)
string_list_append(_refs, full_name)->util = 
commit;
+
if (!*revision_sources_at(_sources, commit))
*revision_sources_at(_sources, commit) = 
full_name;
}
+
+   string_list_sort(_refs);
+   string_list_remove_duplicates(_refs, 0);
 }
 
-static void handle_tags_and_duplicates(void)
+static void handle_tags_and_duplicates(struct string_list *extras)
 {
struct commit *commit;
int i;
 
-   for (i = extra_refs.nr - 1; i >= 0; i--) {
-   const char *name = extra_refs.items[i].string;
-   struct object *object = extra_refs.items[i].util;
+   for (i = extras->nr - 1; i >= 0; i--) {
+   const char *name = extras->items[i].string;
+   struct object *object = extras->items[i].util;
+   int mark;
+
switch (object->type) {
case OBJ_TAG:
handle_tag(name, (struct tag *)object);
@@ -913,8 +926,24 @@ static void handle_tags_and_duplicates(void)
   name, oid_to_hex(_oid));
continue;
}
-   printf("reset %s\nfrom :%d\n\n", name,
-  get_object_mark(>object));
+
+   mark = get_object_mark(>object);
+   if (!mark) {
+

[PATCH v3 03/11] git-fast-export.txt: clarify misleading documentation about rev-list args

2018-11-16 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index ce954be532..fda55b3284 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -119,7 +119,8 @@ marks the same across runs.
'git rev-list', that specifies the specific objects and references
to export.  For example, `master~10..master` causes the
current master reference to be exported along with all objects
-   added since its 10th ancestor commit.
+   added since its 10th ancestor commit and all files common to
+   master{tilde}9 and master{tilde}10.
 
 EXAMPLES
 
-- 
2.19.1.1063.g1796373474.dirty



[PATCH v3 11/11] fast-export: add a --show-original-ids option to show original names

2018-11-16 Thread Elijah Newren
Knowing the original names (hashes) of commits can sometimes enable
post-filtering that would otherwise be difficult or impossible.  In
particular, the desire to rewrite commit messages which refer to other
prior commits (on top of whatever other filtering is being done) is
very difficult without knowing the original names of each commit.

In addition, knowing the original names (hashes) of blobs can allow
filtering by blob-id without requiring re-hashing the content of the
blob, and is thus useful as a small optimization.

Once we add original ids for both commits and blobs, we may as well
add them for tags too for completeness.  Perhaps someone will have a
use for them.

This commit teaches a new --show-original-ids option to fast-export
which will make it add a 'original-oid ' line to blob, commits,
and tags.  It also teaches fast-import to parse (and ignore) such
lines.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt |  7 +++
 Documentation/git-fast-import.txt | 16 
 builtin/fast-export.c | 20 +++-
 fast-import.c | 12 
 t/t9350-fast-export.sh| 17 +
 5 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index f65026662a..64c01ba918 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -122,6 +122,13 @@ marks the same across runs.
repository which already contains the necessary parent
commits.
 
+--show-original-ids::
+   Add an extra directive to the output for commits and blobs,
+   `original-oid `.  While such directives will likely be
+   ignored by importers such as git-fast-import, it may be useful
+   for intermediary filters (e.g. for rewriting commit messages
+   which refer to older commits, or for stripping blobs by id).
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
diff --git a/Documentation/git-fast-import.txt 
b/Documentation/git-fast-import.txt
index 7ab97745a6..43ab3b1637 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -385,6 +385,7 @@ change to the project.
 
'commit' SP  LF
mark?
+   original-oid?
('author' (SP )? SP LT  GT SP  LF)?
'committer' (SP )? SP LT  GT SP  LF
data
@@ -741,6 +742,19 @@ New marks are created automatically.  Existing marks can 
be moved
 to another object simply by reusing the same `` in another
 `mark` command.
 
+`original-oid`
+~~
+Provides the name of the object in the original source control system.
+fast-import will simply ignore this directive, but filter processes
+which operate on and modify the stream before feeding to fast-import
+may have uses for this information
+
+
+   'original-oid' SP  LF
+
+
+where `` is any string not containing LF.
+
 `tag`
 ~
 Creates an annotated tag referring to a specific commit.  To create
@@ -749,6 +763,7 @@ lightweight (non-annotated) tags see the `reset` command 
below.
 
'tag' SP  LF
'from' SP  LF
+   original-oid?
'tagger' (SP )? SP LT  GT SP  LF
data
 
@@ -823,6 +838,7 @@ assigned mark.
 
'blob' LF
mark?
+   original-oid?
data
 
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 78fc67b03a..36c2575de5 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
+static int show_original_ids;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -271,7 +272,10 @@ static void export_blob(const struct object_id *oid)
 
mark_next_object(object);
 
-   printf("blob\nmark :%"PRIu32"\ndata %lu\n", last_idnum, size);
+   printf("blob\nmark :%"PRIu32"\n", last_idnum);
+   if (show_original_ids)
+   printf("original-oid %s\n", oid_to_hex(oid));
+   printf("data %lu\n", size);
if (size && fwrite(buf, size, 1, stdout) != 1)
die_errno("could not write blob '%s'", oid_to_hex(oid));
printf("\n");
@@ -635,8 +639,10 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
reencoded = reencode_string(message, "UTF-8", encoding);
if (!commit->parents)
printf("reset %s\n", refname);
-   printf("commit %s\nmark :%"PRIu32"\n%.*s\n%.*s\ndata %u\n%s",
-  refname, last_idnum,
+   printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
+   if (show_original_ids)
+   printf("original-oid %s\n", oid_to_hex(>object.oid));
+   printf("%.*s\n%.*s\ndata %u\n%s",
 

[PATCH v3 02/11] git-fast-import.txt: fix documentation for --quiet option

2018-11-16 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-fast-import.txt 
b/Documentation/git-fast-import.txt
index e81117d27f..7ab97745a6 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -40,9 +40,10 @@ OPTIONS
not contain the old commit).
 
 --quiet::
-   Disable all non-fatal output, making fast-import silent when it
-   is successful.  This option disables the output shown by
-   --stats.
+   Disable the output shown by --stats, making fast-import usually
+   be silent when it is successful.  However, if the import stream
+   has directives intended to show user output (e.g. `progress`
+   directives), the corresponding messages will still be shown.
 
 --stats::
Display some basic statistics about the objects fast-import has
-- 
2.19.1.1063.g1796373474.dirty



[PATCH v3 04/11] fast-export: use value from correct enum

2018-11-16 Thread Elijah Newren
ABORT and ERROR happen to have the same value, but come from differnt
enums.  Use the one from the correct enum, and while at it, rename the
values to avoid such problems.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index f5166ac71e..e2be35f41e 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -31,8 +31,8 @@ static const char *fast_export_usage[] = {
 };
 
 static int progress;
-static enum { ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = 
ABORT;
-static enum { ERROR, DROP, REWRITE } tag_of_filtered_mode = ERROR;
+static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } 
signed_tag_mode = SIGNED_TAG_ABORT;
+static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = 
TAG_FILTERING_ABORT;
 static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
@@ -46,7 +46,7 @@ static int parse_opt_signed_tag_mode(const struct option *opt,
 const char *arg, int unset)
 {
if (unset || !strcmp(arg, "abort"))
-   signed_tag_mode = ABORT;
+   signed_tag_mode = SIGNED_TAG_ABORT;
else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore"))
signed_tag_mode = VERBATIM;
else if (!strcmp(arg, "warn"))
@@ -64,7 +64,7 @@ static int parse_opt_tag_of_filtered_mode(const struct option 
*opt,
  const char *arg, int unset)
 {
if (unset || !strcmp(arg, "abort"))
-   tag_of_filtered_mode = ERROR;
+   tag_of_filtered_mode = TAG_FILTERING_ABORT;
else if (!strcmp(arg, "drop"))
tag_of_filtered_mode = DROP;
else if (!strcmp(arg, "rewrite"))
@@ -728,7 +728,7 @@ static void handle_tag(const char *name, struct tag *tag)
   "\n-BEGIN PGP 
SIGNATURE-\n");
if (signature)
switch(signed_tag_mode) {
-   case ABORT:
+   case SIGNED_TAG_ABORT:
die("encountered signed tag %s; use "
"--signed-tags= to handle it",
oid_to_hex(>object.oid));
@@ -753,7 +753,7 @@ static void handle_tag(const char *name, struct tag *tag)
tagged_mark = get_object_mark(tagged);
if (!tagged_mark) {
switch(tag_of_filtered_mode) {
-   case ABORT:
+   case TAG_FILTERING_ABORT:
die("tag %s tags unexported object; use "
"--tag-of-filtered-object= to handle it",
oid_to_hex(>object.oid));
-- 
2.19.1.1063.g1796373474.dirty



[PATCH v3 01/11] fast-export: convert sha1 to oid

2018-11-16 Thread Elijah Newren
Rename anonymize_sha1() to anonymize_oid(() and change its signature,
and switch from sha1_to_hex() to oid_to_hex() and from GIT_SHA1_RAWSZ to
the_hash_algo->rawsz.  Also change a comment and a die string to mention
oid instead of sha1.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 456797c12a..f5166ac71e 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -243,7 +243,7 @@ static void export_blob(const struct object_id *oid)
if (!buf)
die("could not read blob %s", oid_to_hex(oid));
if (check_object_signature(oid, buf, size, type_name(type)) < 0)
-   die("sha1 mismatch in blob %s", oid_to_hex(oid));
+   die("oid mismatch in blob %s", oid_to_hex(oid));
object = parse_object_buffer(the_repository, oid, type,
 size, buf, );
}
@@ -330,17 +330,18 @@ static void print_path(const char *path)
 
 static void *generate_fake_oid(const void *old, size_t *len)
 {
-   static uint32_t counter = 1; /* avoid null sha1 */
-   unsigned char *out = xcalloc(GIT_SHA1_RAWSZ, 1);
-   put_be32(out + GIT_SHA1_RAWSZ - 4, counter++);
+   static uint32_t counter = 1; /* avoid null oid */
+   const unsigned hashsz = the_hash_algo->rawsz;
+   unsigned char *out = xcalloc(hashsz, 1);
+   put_be32(out + hashsz - 4, counter++);
return out;
 }
 
-static const unsigned char *anonymize_sha1(const struct object_id *oid)
+static const struct object_id *anonymize_oid(const struct object_id *oid)
 {
-   static struct hashmap sha1s;
-   size_t len = GIT_SHA1_RAWSZ;
-   return anonymize_mem(, generate_fake_oid, oid, );
+   static struct hashmap objs;
+   size_t len = the_hash_algo->rawsz;
+   return anonymize_mem(, generate_fake_oid, oid, );
 }
 
 static void show_filemodify(struct diff_queue_struct *q,
@@ -399,9 +400,9 @@ static void show_filemodify(struct diff_queue_struct *q,
 */
if (no_data || S_ISGITLINK(spec->mode))
printf("M %06o %s ", spec->mode,
-  sha1_to_hex(anonymize ?
-  anonymize_sha1(>oid) :
-  spec->oid.hash));
+  oid_to_hex(anonymize ?
+ anonymize_oid(>oid) :
+ >oid));
else {
struct object *object = 
lookup_object(the_repository,
  
spec->oid.hash);
@@ -988,7 +989,7 @@ static void handle_deletes(void)
continue;
 
printf("reset %s\nfrom %s\n\n",
-   refspec->dst, sha1_to_hex(null_sha1));
+   refspec->dst, oid_to_hex(_oid));
}
 }
 
-- 
2.19.1.1063.g1796373474.dirty



[PATCH v3 05/11] fast-export: avoid dying when filtering by paths and old tags exist

2018-11-16 Thread Elijah Newren
If --tag-of-filtered-object=rewrite is specified along with a set of
paths to limit what is exported, then any tags pointing to old commits
that do not contain any of those specified paths cause problems.  Since
the old tagged commit is not exported, fast-export attempts to rewrite
such tags to an ancestor commit which was exported.  If no such commit
exists, then fast-export currently die()s.  Five years after the tag
rewriting logic was added to fast-export (see commit 2d8ad4691921,
"fast-export: Add a --tag-of-filtered-object  option for newly dangling
tags", 2009-06-25), fast-import gained the ability to delete refs (see
commit 4ee1b225b99f, "fast-import: add support to delete refs",
2014-04-20), so now we do have a valid option to rewrite the tag to.
Delete these tags instead of dying.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  |  9 ++---
 t/t9350-fast-export.sh | 16 
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index e2be35f41e..7d50f5414e 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -775,9 +775,12 @@ static void handle_tag(const char *name, struct tag *tag)
break;
if (!(p->object.flags & TREESAME))
break;
-   if (!p->parents)
-   die("can't find replacement commit for 
tag %s",
-oid_to_hex(>object.oid));
+   if (!p->parents) {
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(_oid));
+   free(buf);
+   return;
+   }
p = p->parents->item;
}
tagged_mark = get_object_mark(>object);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 6a392e87bc..3400ebeb51 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -325,6 +325,22 @@ test_expect_success 'rewriting tag of filtered out object' 
'
 )
 '
 
+test_expect_success 'rewrite tag predating pathspecs to nothing' '
+   test_create_repo rewrite_tag_predating_pathspecs &&
+   (
+   cd rewrite_tag_predating_pathspecs &&
+
+   test_commit initial &&
+
+   git tag -a -m "Some old tag" v0.0.0.0.0.0.1 &&
+
+   test_commit bar &&
+
+   git fast-export --tag-of-filtered-object=rewrite --all -- bar.t 
>output &&
+   grep from.$ZERO_OID output
+   )
+'
+
 cat > limit-by-paths/expected << EOF
 blob
 mark :1
-- 
2.19.1.1063.g1796373474.dirty



[PATCH v3 07/11] fast-export: when using paths, avoid corrupt stream with non-existent mark

2018-11-16 Thread Elijah Newren
If file paths are specified to fast-export and multiple refs point to a
commit that does not touch any of the relevant file paths, then
fast-export can hit problems.  fast-export has a list of additional refs
that it needs to explicitly set after exporting all blobs and commits,
and when it tries to get_object_mark() on the relevant commit, it can
get a mark of 0, i.e. "not found", because the commit in question did
not touch the relevant paths and thus was not exported.  Trying to
import a stream with a mark corresponding to an unexported object will
cause fast-import to crash.

Avoid this problem by taking the commit the ref points to and finding an
ancestor of it that was exported, and make the ref point to that commit
instead.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 13 -
 t/t9350-fast-export.sh | 20 
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 43e98a38a8..227488ae84 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -901,7 +901,18 @@ static void handle_tags_and_duplicates(void)
if (anonymize)
name = anonymize_refname(name);
/* create refs pointing to already seen commits */
-   commit = (struct commit *)object;
+   commit = rewrite_commit((struct commit *)object);
+   if (!commit) {
+   /*
+* Neither this object nor any of its
+* ancestors touch any relevant paths, so
+* it has been filtered to nothing.  Delete
+* it.
+*/
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(_oid));
+   continue;
+   }
printf("reset %s\nfrom :%d\n\n", name,
   get_object_mark(>object));
show_progress();
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 3400ebeb51..299120ba70 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -382,6 +382,26 @@ test_expect_success 'path limiting with import-marks does 
not lose unmodified fi
grep file0 actual
 '
 
+test_expect_success 'avoid corrupt stream with non-existent mark' '
+   test_create_repo avoid_non_existent_mark &&
+   (
+   cd avoid_non_existent_mark &&
+
+   test_commit important-path &&
+
+   test_commit ignored &&
+
+   git branch A &&
+   git branch B &&
+
+   echo foo >>important-path.t &&
+   git add important-path.t &&
+   test_commit more changes &&
+
+   git fast-export --all -- important-path.t | git fast-import 
--force
+   )
+'
+
 test_expect_success 'full-tree re-shows unmodified files''
git checkout -f simple &&
git fast-export --full-tree simple >actual &&
-- 
2.19.1.1063.g1796373474.dirty