date:20180503

Re: [PATCH v2] git: add -P as a short option for --no-pager

2018-05-03 Thread Junio C Hamano

Eric Sunshine  writes:

>> available. Provide a short option, -P, to make the option easier
>> accessible.
>
> s/easier accessible/easier to access/
> --- or ---
> s/easier accessible/more easily accessible/
> --- or ---
> s/easier accessible/more accessible/
>
> The patch itself looks fine.

Thanks.  More easily accessible, it is.

Re: [PATCH 00/18] Add `branch-diff`, a `tbdiff` lookalike

2018-05-03 Thread Junio C Hamano

Johannes Schindelin  writes:

> Johannes Schindelin (17):
>   Add a function to solve least-cost assignment problems
>   Add a new builtin: branch-diff
>   branch-diff: first rudimentary implementation
>   branch-diff: improve the order of the shown commits
>   branch-diff: also show the diff between patches
>   branch-diff: right-trim commit messages
>   branch-diff: indent the diffs just like tbdiff
>   branch-diff: suppress the diff headers
>   branch-diff: adjust the output of the commit pairs
>   branch-diff: do not show "function names" in hunk headers
>   branch-diff: use color for the commit pairs
>   color: provide inverted colors, too
>   diff: add an internal option to dual-color diffs of diffs
>   branch-diff: offer to dual-color the diffs
>   branch-diff --dual-color: work around bogus white-space warning
>   branch-diff: add a man page
>   completion: support branch-diff
>
> Thomas Rast (1):
>   branch-diff: add tests

Lovely.  

I often have to criticize a series whose later half consists of many
follow-up patches with "don't do 'oops, the previous was wrong'",
but the follow-up patches in this series are not such corrections.
The organization of the series to outline the basic and core idea
first in the minimum form and then to build on it to improve an
aspect of the command one step at a time is very helpful to guide
the readers where the author of the series wants them to go.

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Duy Nguyen

On Thu, May 3, 2018 at 10:32 PM, Johannes Schindelin
 wrote:
> Hi Duy,
>
> On Thu, 3 May 2018, Johannes Schindelin wrote:
>
>> On Thu, 3 May 2018, Duy Nguyen wrote:
>>
>> > On Thu, May 3, 2018 at 5:30 PM, Johannes Schindelin
>> >  wrote:
>> > > diff --git a/command-list.txt b/command-list.txt
>> > > index a1fad28fd82..c89ac8f417f 100644
>> > > --- a/command-list.txt
>> > > +++ b/command-list.txt
>> > > @@ -19,6 +19,7 @@ git-archive mainporcelain
>> > >  git-bisect  mainporcelain   info
>> > >  git-blame   ancillaryinterrogators
>> > >  git-branch  mainporcelain   history
>> > > +git-branch-diff mainporcelain   info
>> >
>> > Making it part of "git help" with the info keywords at this stage may
>> > be premature. "git help" is about _common_ commands and we don't know
>> > (yet) how popular this will be.
>>
>> Makes sense. I removed the `mainporcelain` keyword locally.
>
> On second thought, I *think* you meant to imply that I should remove that
> line altogether. Will do that now.

Actually I only suggested to remove the last word "info". That was
what made this command "common". Classifying all commands in this file
is definitely a good thing, and I think mainporcelain is the right
choice.
-- 
Duy

Re: [PATCH 03/18] branch-diff: first rudimentary implementation

2018-05-03 Thread Junio C Hamano

Johannes Schindelin  writes:

> Note: due to differences in the diff algorithm (`tbdiff` uses the
> Pythong module `difflib`, Git uses its xdiff fork), the cost matrix

Pythong???

> calculated by `branch-diff` is different (but very similar) to the one
> calculated by `tbdiff`. Therefore, it is possible that they find
> different matching commits in corner cases (e.g. when a patch was split
> into two patches of roughly equal length).

Re: [PATCH 17/18] branch-diff: add a man page

2018-05-03 Thread Eric Sunshine

On Thu, May 3, 2018 at 11:31 AM, Johannes Schindelin
 wrote:
> This is a heavily butchered version of the README written by Thomas
> Rast and Thomas Gummerer, lifted from https://github.com/trast/tbdiff.
>
> Signed-off-by: Johannes Schindelin 
> ---
> diff --git a/Documentation/git-branch-diff.txt 
> b/Documentation/git-branch-diff.txt
> @@ -0,0 +1,239 @@
> +Algorithm
> +-
> +
> +The general idea is this: we generate a cost matrix between the commits
> +in both commit ranges, then solve the least-cost assignment.
> +
> +To avoid false positives (e.g. when a patch has been removed, and an
> +unrelated patch has been added between two iterations of the same patch
> +series), the cost matrix is extended to allow for that, by adding
> +fixed-cost entries for wholesale deletes/adds.
> +
> +Example: let commits `1--2` be the first iteration of a patch series and

s/let/Let/

> +`A--C` the second iteration. Let's assume that `A` is a cherry-pick of
> +`2,` and `C` is a cherry-pick of `1` but with a small modification (say,
> +a fixed typo). Visualize the commits as a bipartite graph:

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Junio C Hamano

Johannes Schindelin  writes:

> So please: as soon as you have a concrete suggestion where to use cyan
> (and preferably even a DIFF_* constant to feed to diff_get_color_opt()), I
> will be more than interested.

I do not think Stefan's comment was that he was keen to use 'cyan'.
It was a color I suggested in a review of his change where he added
new colors to the color.[ch] palette, and I found that reusing an
existing color would have achieved the same distinction between
lines of output from his code, and it would be beneficial to make
the outcome consistent to consider why these existing colors are
used in existing places and trying to align the rationale for new
uses. "cyan" was cited as an example to illustrate that last point,
i.e. we use it to dim out relatively uninteresting part.

Re: [PATCH 05/18] branch-diff: also show the diff between patches

2018-05-03 Thread Eric Sunshine

On Thu, May 3, 2018 at 10:51 PM, Eric Sunshine  wrote:
> On Thu, May 3, 2018 at 11:30 AM, Johannes Schindelin
>  wrote:
>> Note: while we now parse diff options such as --color, the effect is not
>> yet the same as in tbdiff, where also the commit pairs would be colored.
>
> "... tbdiff, in which the commit pairs would also be colored."
>
> However, I don't see the --color option being parsed by this patch, so
> perhaps this "Note" can be dropped?

Ignore this latter comment; I missed the newly added call to diff_opt_parse().

Re: [PATCH v4 2/3] merge: Add merge.renames config setting

2018-05-03 Thread Junio C Hamano

Ben Peart  writes:

I'd downcase the verb on the subject.

> Add the ability to control rename detection for merge via a config setting.
> This setting behaves the same and defaults to the value of diff.renames but 
> only
> applies to merge.
>
> Reviewed-by: Johannes Schindelin 
> Helped-by: Elijah Newren 
> Signed-off-by: Ben Peart 
> ...
> diff --git a/merge-recursive.h b/merge-recursive.h
> index d863cf8867..c1d9b5b3d9 100644
> --- a/merge-recursive.h
> +++ b/merge-recursive.h
> @@ -55,6 +56,11 @@ struct collision_entry {
>   struct string_list source_files;
>   unsigned reported_already:1;
>  };
> +inline int merge_detect_rename(struct merge_options *o)
> +{
> + return o->merge_detect_rename >= 0 ? o->merge_detect_rename :
> + o->diff_detect_rename >= 0 ? o->diff_detect_rename : 1;
> +}

I'll tweak the above to leave a blank before the function, and make
it "static inline", to ensure that the output from

$ git grep -e '\' --and --not -e 'static inline' -- \*.h

is empty.

Ihre Antwort benötigt von Sgt.Monica Lin Brown

2018-05-03 Thread Hpnunn

Ich bin Sgt.Monica Lin Brown, ursprünglich aus Lake Jackson Texas USA. Ich habe 
persönlich eine spezielle Recherche auf der Internetseite durchgeführt und bin 
auf Ihre Informationen gestoßen. Ich schreibe Ihnen diese Mail von U.S. 
Military Base Kabul Afghanistan. Ich habe einen gesicherten Geschäftsvorschlag 
für Sie. Wenn Sie Interesse an meiner privaten E-Mail 
sgtmonica-biya...@outlook.com haben, wenden Sie sich bitte an mich

Re: [PATCH 05/18] branch-diff: also show the diff between patches

2018-05-03 Thread Eric Sunshine

On Thu, May 3, 2018 at 11:30 AM, Johannes Schindelin
 wrote:
> Just like tbdiff, we now show the diff between matching patches. This is
> a "diff of two diffs", so it can be a bit daunting to read for the
> beginnger.

s/beginnger/beginner/

> This brings branch-diff closer to be feature-complete with regard to

s/be feature-complete/feature parity/

> tbdiff.
>
> An alternative would be to display an interdiff, i.e. the hypothetical
> diff which is the result of first reverting the old diff and then
> applying the new diff.
>
> Especially when rebasing often, an interdiff is often not feasible,
> though: if the old diff cannot be applied in reverse (due to a moving
> upstream), an interdiff can simply not be inferred.
>
> Note: while we now parse diff options such as --color, the effect is not
> yet the same as in tbdiff, where also the commit pairs would be colored.

"... tbdiff, in which the commit pairs would also be colored."

However, I don't see the --color option being parsed by this patch, so
perhaps this "Note" can be dropped?

> This is left for a later commit.
>
> Signed-off-by: Johannes Schindelin 
> ---
> diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
> @@ -319,24 +348,37 @@ static void output(struct string_list *a, struct 
> string_list *b)
>  int cmd_branch_diff(int argc, const char **argv, const char *prefix)
>  {
> -   int no_patches = 0;
> +   struct diff_options diffopt = { 0 };
> double creation_weight = 0.6;
> struct option options[] = {
> -   OPT_BOOL(0, "no-patches", _patches,
> -N_("short format (no diffs)")),

This was added in 2/18 but never used...

> +   OPT_SET_INT(0, "no-patches", _format,
> +   N_("short format (no diffs)"),
> +   DIFF_FORMAT_NO_OUTPUT),

... and is then replaced in its entirety by this. Perhaps just drop
the original --no-patches from 2/18 and let it be introduced for the
first time here?

> { OPTION_CALLBACK,
> 0, "creation-weight", _weight, N_("factor"),
> N_("Fudge factor by which creation is weighted 
> [0.6]"),
> 0, parse_creation_weight },
> OPT_END()
> };

Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-03 Thread Junio C Hamano

Jakub Narebski  writes:

> I think the problem is not with aligning, otherwise we would simply get
> bad aling, and not visible corruption.  The ACTUAL PROBLEM is most
> probably because of concatenating strings marked as UTF-8 and strings
> not marked as UTF-8.  Strange things happen then in Perl, unfortunetaly.

Sounds quite right---the patch needs to be retitled, if the bug is
not about "measuring offset", which I think is what fooled me when I
sent my response.

Re: [PATCH 03/18] branch-diff: first rudimentary implementation

2018-05-03 Thread Eric Sunshine

On Thu, May 3, 2018 at 11:30 AM, Johannes Schindelin
 wrote:
> At this stage, `git branch-diff` can determine corresponding commits of
> two related commit ranges. This makes use of the recently introduced
> implementation of the Hungarian algorithm.
>
> The core of this patch is a straight port of the ideas of tbdiff, the
> seemingly dormant project at https://github.com/trast/tbdiff.
>
> The output does not at all match `tbdiff`'s output yet, as this patch
> really concentrates on getting the patch matching part right.
>
> Note: due to differences in the diff algorithm (`tbdiff` uses the
> Pythong module `difflib`, Git uses its xdiff fork), the cost matrix

s/Pythong/Python/

> calculated by `branch-diff` is different (but very similar) to the one
> calculated by `tbdiff`. Therefore, it is possible that they find
> different matching commits in corner cases (e.g. when a patch was split
> into two patches of roughly equal length).
>
> Signed-off-by: Johannes Schindelin 
> ---
> diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
> @@ -19,6 +23,279 @@ static int parse_creation_weight(const struct option 
> *opt, const char *arg,
> +static int read_patches(const char *range, struct string_list *list)
> +{
> +   [...]
> +   struct strbuf buf = STRBUF_INIT, line = STRBUF_INIT;
> +   [...]
> +   } else if (starts_with(line.buf, "")) {
> +   strbuf_addbuf(, );
> +   strbuf_addch(, '\n');
> +   }
> +
> +   continue;

Unnecessary blank line above 'continue'?

> +   } else if (starts_with(line.buf, "@@ "))
> +   strbuf_addstr(, "@@");
> +   [...]
> +   }
> +   fclose(in);
> +
> +   if (util)
> +   string_list_append(list, buf.buf)->util = util;
> +   strbuf_release();

strbuf_release();

> +   if (finish_command())
> +   return -1;
> +
> +   return 0;
> +}
> @@ -32,9 +309,63 @@ int cmd_branch_diff(int argc, const char **argv, const 
> char *prefix)
> +   if (argc == 2) {
> +   if (!strstr(argv[0], ".."))
> +   warning(_("no .. in range: '%s'"), argv[0]);
> +   strbuf_addstr(, argv[0]);
> +
> +   if (!strstr(argv[1], ".."))
> +   warning(_("no .. in range: '%s'"), argv[1]);
> +   strbuf_addstr(, argv[1]);
> +   } else if (argc == 1) {
> +   if (!b)
> +   die(_("single arg format requires a symmetric 
> range"));
> +   } else {
> +   error("Need two commit ranges");

Other warning/error messages emitted by this function are not
capitalized: s/Need/need/

> +   usage_with_options(builtin_branch_diff_usage, options);
> +   }
> +
> +   if (read_patches(range1.buf, ))
> +   res = error(_("could not parse log for '%s'"), range1.buf);
> +   if (!res && read_patches(range2.buf, ))
> +   res = error(_("could not parse log for '%s'"), range2.buf);

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Eric Sunshine

On Thu, May 3, 2018 at 11:30 AM, Johannes Schindelin
 wrote:
> This builtin does not do a whole lot so far, apart from showing a usage
> that is oddly similar to that of `git tbdiff`. And for a good reason:
> the next commits will turn `branch-diff` into a full-blown replacement
> for `tbdiff`.
>
> At this point, we ignore tbdiff's color options, as they will all be
> implemented later and require some patches to the diff machinery.
>
> Signed-off-by: Johannes Schindelin 
> ---
> diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
> @@ -0,0 +1,40 @@
> +static const char * const builtin_branch_diff_usage[] = {
> +   N_("git rebase--helper [] ( A..B C..D | A...B | base A B )"),
> +   NULL
> +};

The formatting of "" vs. "base" confused me into thinking
that the latter was a literal keyword, but I see from reading patch
3/18 that it is not a literal at all, thus probably ought to be
specified as "".

Re: [PATCH] revisions.txt: expand tabs to spaces in diagram

2018-05-03 Thread Junio C Hamano

"brian m. carlson"  writes:

> ...  We can help both AsciiDoc and Asciidoctor do the right thing
> here (and keep our 8-space tabs) by enclosing the diagram in a block
> like so:
>
> [literal]
> --
>Args   Expanded argumentsSelected commits
>DG H D
>D F  G H I J D F
>...
> = B ^D ^E ^F  B
>F^! D  = F ^I ^J D   G H D F
> --
>
> and using the tabsize=8 attribute when invoking Asciidoctor.  I can send
> a patch tomorrow which does this, which I think may be a bit nicer than
> having to give up tabs.

Yeah, trying to forbid indent-with-tab has been shown to be
unworkable so even if we tried to give up tabs, it is likely we
would break it by mistake.  Allowing indent-with-tab and ask the
tool to do the right thing is obviously the right direction.

Thanks.

Re: [PATCH v2 1/3] upload-pack: fix error message typo

2018-05-03 Thread Junio C Hamano

Jonathan Tan  writes:

> On Thu, 3 May 2018 11:58:59 -0700
> Stefan Beller  wrote:
>
>> > +   test_must_fail git -C server serve --stateless-rpc /dev/null 
>> > 2>err &&
>> 
>> Minor nit:
>> Why do we pipe stdout to /dev/null ?
>
> Usually there's a binary packfile produced, but not in this case. I'll
> remove it.

Hmm, when somebody breaks "git server serve", we probably would not
want to see the binary spewed to the output while debugging it.  So
I'd probably keep the redirection---it may be an improvement to use
">out" and then checking it is empty after the expected failure.

Re: [PATCH v3 09/12] get_short_oid / peel_onion: ^{tree} should be tree, not treeish

2018-05-03 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

> The reason I'm doing this is because I found it confusing that I can't
> do:
>
> for t in tag commit tree blob; do ./git --exec-path=$PWD rev-parse 
> 7452^{$t}; done
>
> And get, respectively, only the SHAs that match the respective type, but
> currently (with released git) you can do:
>
> for t in tag commit committish treeish tree blob; do git -c 
> core.disambiguate=$t rev-parse 7452; done

Exactly.  The former asks "I (think I) know 7452 can be used to name
an object of type $t, with peeling if necessary--give me the underlying
object of type $t".  In short, the fact that you can write "$X^{$t}"
says that $X is a $t-ish (otherwise $X cannot be used as a stand-in
for an object of type $t) and that you fully expect that $X can merely
be of type $t-ish and not exactly $t (otherwise you wouldn't be
making sure to coerce $X into $t with ^{$t} notation).

In *THAT* context, disambiguation help that lists objects whose name
begins with "7452" you gave, hoping that it is a unique enough
prefix when it wasn't in reality, *MUST* give $t-ish; restricting it
to $t makes the help mostly useless.

> 1) Am I missing some subtlety or am I correct that there was no way to
> get git to return more than one SHA-1 for ^{commit} or ^{tree} before
> this disambiguation feature was added?

There is no such feature either before or after the disambiguation
help.  I am not saying there shouldn't exist such a feature.  I am
saying that breaking the existing feature and making it useless is
not the way to add such a feature.

Re: [PATCH] revisions.txt: expand tabs to spaces in diagram

2018-05-03 Thread brian m. carlson

On Wed, May 02, 2018 at 08:20:54AM +0200, Martin Ågren wrote:
> On 2 May 2018 at 06:50, Junio C Hamano  wrote:
> > Martin Ågren  writes:
> >
> >> The diagram renders fine in AsciiDoc before and after this patch.
> >> Asciidoctor, on the other hand, ignores the tabs entirely, which results
> >> in different indentation for different lines. The graph illustration
> >> earlier in the document already uses spaces instead of a tab.
> >
> > Ouch.  We might want to teach Documentation/.gitattributes that
> > indent-with-tab is unwelcome in that directory, after making this
> > fix (and possibly similar ones to other files).
> 
> I actually grepped around a little for a leading tab, to see if I could
> immediately spot any similar issues. But there are tons of tabs here
> (about 13000). Most of them work out just fine, e.g., in the OPTIONS,
> where we tab-indent the descriptions.
> 
> So while we could try to move away from leading tabs in Documentation/,
> it would be a huge undertaking. Any kind of "do it while we're nearby"
> approach would take a long time to complete. And a one-off conversion
> would probably be a horrible idea. ;-)
> 
> I just noticed another difference in how the tabs are handled. In
> git-add.txt, which I just picked at random, the three continuation lines
> in the synopsis indent differently in AsciiDoc (which indents them more
> than in the .txt) and Asciidoctor (which indents them less than in the
> .txt). To me, this is more of a "if I didn't sit down and compare the
> two outputs, I would never think about these indentations -- they're
> both fine".
> 
> So it might be that the only places where leading tabs really matter is
> this kind of diagrams. Maybe we have a handful such bugs lingering among
> the 13000 tab-indented lines...

I took a look at this.  Asciidoctor does seem to have some weird ideas
about tabs, but I think how it's handling this is converting them to
spaces, which makes sense if you're working with HTML or XML (which do
some bizarre things with tabs).

However, it's not converting them to 8 spaces, which is really what we
want here.  We can help both AsciiDoc and Asciidoctor do the right thing
here (and keep our 8-space tabs) by enclosing the diagram in a block
like so:

[literal]
--
   Args   Expanded argumentsSelected commits
   DG H D
   D F  G H I J D F
   ^G D H D
   ^D B E I J F B
   ^D B C   E I J F B C
   CI J F C
   B..C   = ^B CC
   B...C  = B ^F C  G H D E B C
   B^-= B^..B
  = ^B^1 B  E I J F B
   C^@= C^1
  = F   I J F
   B^@= B^1 B^2 B^3
  = D E F   D G H E F I J
   C^!= C ^C^@
  = C ^C^1
  = C ^FC
   B^!= B ^B^@
  = B ^B^1 ^B^2 ^B^3
  = B ^D ^E ^F  B
   F^! D  = F ^I ^J D   G H D F
--

and using the tabsize=8 attribute when invoking Asciidoctor.  I can send
a patch tomorrow which does this, which I think may be a bit nicer than
having to give up tabs.

Thoughts?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [PATCH 39/41] Update shell scripts to compute empty tree object ID

2018-05-03 Thread brian m. carlson

On Tue, May 01, 2018 at 12:42:57PM +0200, Duy Nguyen wrote:
> On Mon, Apr 23, 2018 at 11:39:49PM +, brian m. carlson wrote:
> > Several of our shell scripts hard-code the object ID of the empty tree.
> > To avoid any problems when changing hashes, compute this value on
> > startup of the script.  For performance, store the value in a variable
> > and reuse it throughout the life of the script.
> > 
> > Signed-off-by: brian m. carlson 
> > ---
> >  git-filter-branch.sh   | 4 +++-
> >  git-rebase--interactive.sh | 4 +++-
> >  templates/hooks--pre-commit.sample | 2 +-
> >  3 files changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> > index 64f21547c1..ccceaf19a7 100755
> > --- a/git-filter-branch.sh
> > +++ b/git-filter-branch.sh
> > @@ -11,6 +11,8 @@
> >  # The following functions will also be available in the commit filter:
> >  
> >  functions=$(cat << \EOF
> > +EMPTY_TREE=$(git hash-object -t tree /dev/null)
> 
> All scripts (except those example hooks) must source
> git-sh-setup. Should we define this in there instead?

I think at this point, I'm okay with special-casing these two uses, but
I would generally say that if we gain any more we should move it there.

There's a trade-off between the benefits of reuse here and the fact that
we're forking a process, which incurs a cost, especially on Windows.

I'm open to hearing other opinions, of course.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

[PATCH v3 2/3] upload-pack: read config when serving protocol v2

2018-05-03 Thread Jonathan Tan

The upload-pack code paths never call git_config() with
upload_pack_config() when protocol v2 is used, causing options like
uploadpack.packobjectshook to not take effect. Ensure that this function
is called.

Signed-off-by: Jonathan Tan 
---
 t/t5702-protocol-v2.sh | 14 ++
 upload-pack.c  |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 56f7c3c32..abb15cd6d 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -201,6 +201,20 @@ test_expect_success 'ref advertisment is filtered during 
fetch using protocol v2
! grep "refs/tags/three" log
 '
 
+test_expect_success 'upload-pack respects config using protocol v2' '
+   git init server &&
+   write_script server/.git/hook <<-\EOF &&
+   touch hookout
+   "$@"
+   EOF
+   test_commit -C server one &&
+
+   test_config_global uploadpack.packobjectshook ./hook &&
+   test_path_is_missing server/.git/hookout &&
+   git -c protocol.version=2 clone "file://$(pwd)/server" client &&
+   test_path_is_file server/.git/hookout
+'
+
 # Test protocol v2 with 'http://' transport
 #
 . "$TEST_DIRECTORY"/lib-httpd.sh
diff --git a/upload-pack.c b/upload-pack.c
index c4456bb88..113edd32d 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1376,6 +1376,8 @@ int upload_pack_v2(struct repository *r, struct 
argv_array *keys,
enum fetch_state state = FETCH_PROCESS_ARGS;
struct upload_pack_data data;
 
+   git_config(upload_pack_config, NULL);
+
upload_pack_data_init();
use_sideband = LARGE_PACKET_MAX;
 
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH v3 3/3] {fetch,upload}-pack: support filter in protocol v2

2018-05-03 Thread Jonathan Tan

The fetch-pack/upload-pack protocol v2 was developed independently of
the filter parameter (used in partial fetches), thus it did not include
support for it. Add support for the filter parameter.

Like in the legacy protocol, the server advertises and supports "filter"
only if uploadpack.allowfilter is configured.

Like in the legacy protocol, the client continues with a warning if
"--filter" is specified, but the server does not advertise it.

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/protocol-v2.txt |  9 +++
 fetch-pack.c| 23 +-
 t/t5702-protocol-v2.sh  | 98 +
 upload-pack.c   | 15 +++-
 4 files changed, 140 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 136179d7d..38d24fd2b 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -290,6 +290,15 @@ included in the clients request as well as the potential 
addition of the
Cannot be used with "deepen", but can be used with
"deepen-since".
 
+If the 'filter' feature is advertised, the following argument can be
+included in the client's request:
+
+filter 
+   Request that various objects from the packfile be omitted
+   using one of several filtering techniques. These are intended
+   for use with partial clone and partial fetch operations. See
+   `rev-list` for possible "filter-spec" values.
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
 header.
diff --git a/fetch-pack.c b/fetch-pack.c
index f93723fec..3ed40aa46 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1191,14 +1191,29 @@ static int send_fetch_request(int fd_out, const struct 
fetch_pack_args *args,
else if (is_repository_shallow() || args->deepen)
die(_("Server does not support shallow requests"));
 
+   /* Add filter */
+   if (server_supports_feature("fetch", "filter", 0) &&
+   args->filter_options.choice) {
+   print_verbose(args, _("Server supports filter"));
+   packet_buf_write(_buf, "filter %s",
+args->filter_options.filter_spec);
+   } else if (args->filter_options.choice) {
+   warning("filtering not recognized by server, ignoring");
+   }
+
/* add wants */
add_wants(wants, _buf);
 
-   /* Add all of the common commits we've found in previous rounds */
-   add_common(_buf, common);
+   if (args->no_dependents) {
+   packet_buf_write(_buf, "done");
+   ret = 1;
+   } else {
+   /* Add all of the common commits we've found in previous rounds 
*/
+   add_common(_buf, common);
 
-   /* Add initial haves */
-   ret = add_haves(_buf, haves_to_send, in_vain);
+   /* Add initial haves */
+   ret = add_haves(_buf, haves_to_send, in_vain);
+   }
 
/* Send request */
packet_buf_flush(_buf);
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index abb15cd6d..25bf046b3 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -215,6 +215,104 @@ test_expect_success 'upload-pack respects config using 
protocol v2' '
test_path_is_file server/.git/hookout
 '
 
+test_expect_success 'setup filter tests' '
+   rm -rf server client &&
+   git init server &&
+
+   # 1 commit to create a file, and 1 commit to modify it
+   test_commit -C server message1 a.txt &&
+   test_commit -C server message2 a.txt &&
+   git -C server config protocol.version 2 &&
+   git -C server config uploadpack.allowfilter 1 &&
+   git -C server config uploadpack.allowanysha1inwant 1 &&
+   git -C server config protocol.version 2
+'
+
+test_expect_success 'partial clone' '
+   GIT_TRACE_PACKET="$(pwd)/trace" git -c protocol.version=2 \
+   clone --filter=blob:none "file://$(pwd)/server" client &&
+   grep "version 2" trace &&
+
+   # Ensure that the old version of the file is missing
+   git -C client rev-list master --quiet --objects --missing=print \
+   >observed.oids &&
+   grep "$(git -C server rev-parse message1:a.txt)" observed.oids &&
+
+   # Ensure that client passes fsck
+   git -C client fsck
+'
+
+test_expect_success 'dynamically fetch missing object' '
+   rm "$(pwd)/trace" &&
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C client -c protocol.version=2 \
+   cat-file -p $(git -C server rev-parse message1:a.txt) &&
+   grep "version 2" trace
+'
+
+test_expect_success 'partial fetch' '
+   rm -rf client "$(pwd)/trace" &&
+   git init client &&
+   SERVER="file://$(pwd)/server" &&
+   test_config -C client extensions.partialClone

[PATCH v3 1/3] upload-pack: fix error message typo

2018-05-03 Thread Jonathan Tan

Fix a typo in an error message.

Also, this line was introduced in 3145ea957d2c ("upload-pack: introduce
fetch server command", 2018-03-15), which did not contain a test for the
case which causes this error to be printed, so introduce a test.

Signed-off-by: Jonathan Tan 
---
 t/t5701-git-serve.sh | 14 ++
 upload-pack.c|  2 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 72d7bc562..6439f7792 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -173,4 +173,18 @@ test_expect_success 'symrefs parameter' '
test_cmp actual expect
 '
 
+test_expect_success 'unexpected lines are not allowed in fetch request' '
+   git init server &&
+
+   test-pkt-line pack >in <<-EOF &&
+   command=fetch
+   0001
+   this-is-not-a-command
+   
+   EOF
+
+   test_must_fail git -C server serve --stateless-rpc err &&
+   grep "unexpected line: .this-is-not-a-command." err
+'
+
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index 87b4d32a6..c4456bb88 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1252,7 +1252,7 @@ static void process_args(struct packet_reader *request,
}
 
/* ignore unknown lines maybe? */
-   die("unexpect line: '%s'", arg);
+   die("unexpected line: '%s'", arg);
}
 }
 
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH v3 0/3] Supporting partial clones in protocol v2

2018-05-03 Thread Jonathan Tan

Changes from v2: followed all Stefan's comments.

Jonathan Tan (3):
  upload-pack: fix error message typo
  upload-pack: read config when serving protocol v2
  {fetch,upload}-pack: support filter in protocol v2

 Documentation/technical/protocol-v2.txt |   9 ++
 fetch-pack.c|  23 -
 t/t5701-git-serve.sh|  14 +++
 t/t5702-protocol-v2.sh  | 112 
 upload-pack.c   |  19 +++-
 5 files changed, 171 insertions(+), 6 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog

Re: [PATCH v2 2/3] upload-pack: read config when serving protocol v2

2018-05-03 Thread Jonathan Tan

On Thu, 3 May 2018 12:08:16 -0700
Stefan Beller  wrote:

> test_path_is_missing ?

Thanks for the pointer. Done.

> > +   GIT_TRACE=/tmp/y git -c protocol.version=2 clone 
> > "file://$(pwd)/server" client &&
> 
> Why do we redirect GIT_TRACE outside the test suite? do we read that
> back or want to read it out of the hook?
> 
> Is it possible to redirect to  /$(pwd)/trace or such?

Good catch - that was a leftover from debugging. I've just removed it.

> > +   test -f server/.git/hookout
> 
> test_path_is_file ?

Done.

Re: [PATCH v2 1/3] upload-pack: fix error message typo

2018-05-03 Thread Jonathan Tan

On Thu, 3 May 2018 11:58:59 -0700
Stefan Beller  wrote:

> > +   test_must_fail git -C server serve --stateless-rpc /dev/null 
> > 2>err &&
> 
> Minor nit:
> Why do we pipe stdout to /dev/null ?

Usually there's a binary packfile produced, but not in this case. I'll
remove it.

Re: [PATCH 11/18] branch-diff: add tests

2018-05-03 Thread Philip Oakley

From: "Johannes Schindelin" 

From: Thomas Rast 

These are essentially lifted from https://github.com/trast/tbdiff, with
light touch-ups to account for the new command name.

Apart from renaming `tbdiff` to `branch-diff`, only one test case needed
to be adjusted: 11 - 'changed message'.

The underlying reason it had to be adjusted is that diff generation is
sometimes ambiguous. In this case, a comment line and an empty line are
added, but it is ambiguous whether they were added after the existing
empty line, or whether an empty line and the comment line are added
*before* the existing emtpy line. And apparently xdiff picks a different

s/emtpy/empty/

option here than Python's difflib.

Signed-off-by: Johannes Schindelin 

[...]
Philip

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Ramsay Jones



On 03/05/18 21:25, Johannes Schindelin wrote:

> On Thu, 3 May 2018, Ramsay Jones wrote:

>> On 03/05/18 16:30, Johannes Schindelin wrote:
[snip]

>>> diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
>>> new file mode 100644
>>> index 000..97266cd326d
>>> --- /dev/null
>>> +++ b/builtin/branch-diff.c
>>> @@ -0,0 +1,40 @@
>>> +#include "cache.h"
>>> +#include "parse-options.h"
>>> +
>>> +static const char * const builtin_branch_diff_usage[] = {
>>> +   N_("git rebase--helper [] ( A..B C..D | A...B | base A B )"),
>>
>> s/rebase--helper/branch-diff/
> 
> Whoops!
> 
> BTW funny side note: when I saw that you replied, I instinctively thought
> "oh no, I forgot to mark a function as `static`!" ;-)

Heh, but I hadn't got around to applying the patches and building
git yet! ;-)

Sparse has two complaints:

  > SP builtin/branch-diff.c
  > builtin/branch-diff.c:433:41: warning: Using plain integer as NULL pointer
  > builtin/branch-diff.c:431:5: warning: symbol 'cmd_branch_diff' was not 
declared. Should it be static?

I suppressed those warnings with the following patch (on top
of these patches):

  $ git diff
  diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
  index edf80ecb7..1373c22f4 100644
  --- a/builtin/branch-diff.c
  +++ b/builtin/branch-diff.c
  @@ -1,4 +1,5 @@
   #include "cache.h"
  +#include "builtin.h"
   #include "parse-options.h"
   #include "string-list.h"
   #include "run-command.h"
  @@ -430,7 +431,7 @@ static void output(struct string_list *a, struct 
string_list *b,
 
   int cmd_branch_diff(int argc, const char **argv, const char *prefix)
   {
  -   struct diff_options diffopt = { 0 };
  +   struct diff_options diffopt = { NULL };
  struct strbuf four_spaces = STRBUF_INIT;
  int dual_color = 0;
  double creation_weight = 0.6;
  $ 

The first hunk applies to patch 02/18 (ie this very patch) and
the second hunk should be applied to patch 05/18 (ie, "branch-diff:
also show the diff between patches").

ATB,
Ramsay Jones

[PATCH v2 5/6] rebase --rebase-merges: a "merge" into a new root is a fast-forward

2018-05-03 Thread Johannes Schindelin

When a user provides a todo list containing something like

reset [new root]
merge my-branch

let's do the same as if pulling into an orphan branch: simply
fast-forward.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c  | 12 
 t/t3430-rebase-merges.sh | 13 +
 2 files changed, 25 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index a7832399b1f..65a8c493781 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2870,6 +2870,18 @@ static int do_merge(struct commit *commit, const char 
*arg, int arg_len,
goto leave_merge;
}
 
+   if (opts->have_squash_onto &&
+   !oidcmp(_commit->object.oid, >squash_onto)) {
+   /*
+* When the user tells us to "merge" something into a
+* "[new root]", let's simply fast-forward to the merge head.
+*/
+   rollback_lock_file();
+   ret = fast_forward_to(_commit->object.oid,
+  _commit->object.oid, 0, opts);
+   goto leave_merge;
+   }
+
if (commit) {
const char *message = get_commit_buffer(commit, NULL);
const char *body;
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 35260862fcb..5543f1d5a34 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -275,4 +275,17 @@ test_expect_success 'root commits' '
test_cmp_rev HEAD $before
 '
 
+test_expect_success 'a "merge" into a root commit is a fast-forward' '
+   head=$(git rev-parse HEAD) &&
+   cat >script-from-scratch <<-EOF &&
+   reset [new root]
+   merge $head
+   EOF
+   test_config sequence.editor \""$PWD"/replace-editor.sh\" &&
+   test_tick &&
+   git rebase -i -r HEAD^ &&
+   test_cmp_rev HEAD $head
+'
+
+
 test_done
-- 
2.17.0.windows.1.38.g05ca542f78d

[PATCH v2 1/6] sequencer: extract helper to update active_cache_tree

2018-05-03 Thread Johannes Schindelin

This patch extracts the code from is_index_unchanged() to initialize or
update the index' cache tree (i.e. a tree object reflecting the current
index' top-level tree).

The new helper will be used in the upcoming code to support `git rebase
-i --root` via the sequencer.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index e2f83942843..90c8218aa9a 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -562,9 +562,23 @@ static int do_recursive_merge(struct commit *base, struct 
commit *next,
return !clean;
 }
 
+static struct object_id *get_cache_tree_oid(void)
+{
+   if (!active_cache_tree)
+   active_cache_tree = cache_tree();
+
+   if (!cache_tree_fully_valid(active_cache_tree))
+   if (cache_tree_update(_index, 0)) {
+   error(_("unable to update cache tree"));
+   return NULL;
+   }
+
+   return _cache_tree->oid;
+}
+
 static int is_index_unchanged(void)
 {
-   struct object_id head_oid;
+   struct object_id head_oid, *cache_tree_oid;
struct commit *head_commit;
 
if (!resolve_ref_unsafe("HEAD", RESOLVE_REF_READING, _oid, NULL))
@@ -583,15 +597,10 @@ static int is_index_unchanged(void)
if (parse_commit(head_commit))
return -1;
 
-   if (!active_cache_tree)
-   active_cache_tree = cache_tree();
-
-   if (!cache_tree_fully_valid(active_cache_tree))
-   if (cache_tree_update(_index, 0))
-   return error(_("unable to update cache tree"));
+   if (!(cache_tree_oid = get_cache_tree_oid()))
+   return -1;
 
-   return !oidcmp(_cache_tree->oid,
-  _commit->tree->object.oid);
+   return !oidcmp(cache_tree_oid, _commit->tree->object.oid);
 }
 
 static int write_author_script(const char *message)
-- 
2.17.0.windows.1.38.g05ca542f78d

[PATCH v2 6/6] rebase --rebase-merges: root commits can be cousins, too

2018-05-03 Thread Johannes Schindelin

Reported by Wink Saville: when rebasing with no-rebase-cousins, we
will want to refrain from rebasing all of them, even when they are
root commits.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c  |  3 ++-
 t/t3430-rebase-merges.sh | 25 +
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 65a8c493781..01e561bc20e 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3903,7 +3903,8 @@ static int make_script_with_merges(struct 
pretty_print_context *pp,
}
 
if (!commit)
-   fprintf(out, "%s onto\n", cmd_reset);
+   fprintf(out, "%s %s\n", cmd_reset,
+   rebase_cousins ? "onto" : "[new root]");
else {
const char *to = NULL;
 
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 5543f1d5a34..ce6de6f491e 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -287,5 +287,30 @@ test_expect_success 'a "merge" into a root commit is a 
fast-forward' '
test_cmp_rev HEAD $head
 '
 
+test_expect_success 'A root commit can be a cousin, treat it that way' '
+   git checkout --orphan khnum &&
+   test_commit yama &&
+   git checkout -b asherah master &&
+   test_commit shamkat &&
+   git merge --allow-unrelated-histories khnum &&
+   test_tick &&
+   git rebase -f -r HEAD^ &&
+   ! test_cmp_rev HEAD^2 khnum &&
+   test_cmp_graph HEAD^.. <<-\EOF &&
+   *   Merge branch '\''khnum'\'' into asherah
+   |\
+   | * yama
+   o shamkat
+   EOF
+   test_tick &&
+   git rebase --rebase-merges=rebase-cousins HEAD^ &&
+   test_cmp_graph HEAD^.. <<-\EOF
+   *   Merge branch '\''khnum'\'' into asherah
+   |\
+   | * yama
+   |/
+   o shamkat
+   EOF
+'
 
 test_done
-- 
2.17.0.windows.1.38.g05ca542f78d

[PATCH v2 2/6] sequencer: learn about the special "fake root commit" handling

2018-05-03 Thread Johannes Schindelin

When an interactive rebase wants to recreate a root commit, it
- first creates a new, empty root commit,
- checks it out,
- converts the next `pick` command so that it amends the empty root
  commit

Introduce support in the sequencer to handle such an empty root commit,
by looking for the file /rebase-merge/squash-onto; if it exists
and contains a commit name, the sequencer will compare the HEAD to said
root commit, and if identical, a new root commit will be created.

While converting scripted code into proper, portable C, we also do away
with the old "amend with an empty commit message, then cherry-pick
without committing, then amend again" dance and replace it with code
that uses the internal API properly to do exactly what we want: create a
new root commit.

To keep the implementation simple, we always spawn `git commit` to create
new root commits.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c | 124 ++--
 sequencer.h |   4 ++
 2 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 90c8218aa9a..8eddda681d1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -125,6 +125,12 @@ static GIT_PATH_FUNC(rebase_path_rewritten_list, 
"rebase-merge/rewritten-list")
 static GIT_PATH_FUNC(rebase_path_rewritten_pending,
"rebase-merge/rewritten-pending")
 
+/*
+ * The path of the file containig the OID of the "squash onto" commit, i.e.
+ * the dummy commit used for `reset [new root]`.
+ */
+static GIT_PATH_FUNC(rebase_path_squash_onto, "rebase-merge/squash-onto")
+
 /*
  * The path of the file listing refs that need to be deleted after the rebase
  * finishes. This is used by the `label` command to record the need for 
cleanup.
@@ -470,7 +476,8 @@ static int fast_forward_to(const struct object_id *to, 
const struct object_id *f
transaction = ref_transaction_begin();
if (!transaction ||
ref_transaction_update(transaction, "HEAD",
-  to, unborn ? _oid : from,
+  to, unborn && !is_rebase_i(opts) ?
+  _oid : from,
   0, sb.buf, ) ||
ref_transaction_commit(transaction, )) {
ref_transaction_free(transaction);
@@ -692,6 +699,52 @@ static char *get_author(const char *message)
return NULL;
 }
 
+/* Read author-script and return an ident line (author  timestamp) */
+static const char *read_author_ident(struct strbuf *buf)
+{
+   const char *keys[] = {
+   "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
+   };
+   char *in, *out, *eol;
+   int i = 0, len;
+
+   if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+   return NULL;
+
+   /* dequote values and construct ident line in-place */
+   for (in = out = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
+   if (!skip_prefix(in, keys[i], (const char **))) {
+   warning("could not parse '%s' (looking for '%s'",
+   rebase_path_author_script(), keys[i]);
+   return NULL;
+   }
+
+   eol = strchrnul(in, '\n');
+   *eol = '\0';
+   sq_dequote(in);
+   len = strlen(in);
+
+   if (i > 0) /* separate values by spaces */
+   *(out++) = ' ';
+   if (i == 1) /* email needs to be surrounded by <...> */
+   *(out++) = '<';
+   memmove(out, in, len);
+   out += len;
+   if (i == 1) /* email needs to be surrounded by <...> */
+   *(out++) = '>';
+   in = eol + 1;
+   }
+
+   if (i < 3) {
+   warning("could not parse '%s' (looking for '%s')",
+   rebase_path_author_script(), keys[i]);
+   return NULL;
+   }
+
+   buf->len = out - buf->buf;
+   return buf->buf;
+}
+
 static const char staged_changes_advice[] =
 N_("you have staged changes in your working tree\n"
 "If these changes are meant to be squashed into the previous commit, run:\n"
@@ -711,6 +764,7 @@ N_("you have staged changes in your working tree\n"
 #define AMEND_MSG   (1<<2)
 #define CLEANUP_MSG (1<<3)
 #define VERIFY_MSG  (1<<4)
+#define CREATE_ROOT_COMMIT (1<<5)
 
 /*
  * If we are cherry-pick, and if the merge did not result in
@@ -730,6 +784,40 @@ static int run_git_commit(const char *defmsg, struct 
replay_opts *opts,
struct child_process cmd = CHILD_PROCESS_INIT;
const char *value;
 
+   if (flags & CREATE_ROOT_COMMIT) {
+   struct strbuf msg = STRBUF_INIT, script = STRBUF_INIT;
+   const char *author = is_rebase_i(opts) ?
+   read_author_ident() : NULL;
+   struct object_id root_commit, *cache_tree_oid;
+   int

[PATCH v2 0/6] Let the sequencer handle `git rebase -i --root`

2018-05-03 Thread Johannes Schindelin

When I reimplemented the most performance-critical bits of the
interactive rebase in the sequencer, to speed up `git rebase -i`
particularly on Windows (even if the benefits are still quite notable on
Linux or macOS), I punted on the --root part.

I had always hoped that some other contributor (or I myself) would come
back later to address the --root part in the sequencer, too, with the
idea that this would move the last remaining complicated code from
git-rebase--interactive.sh into sequencer.c, to facilitate converting
the rest of git-rebase--interactive.sh.

When I say "the last remaining complicated code", of course I neglect
the --preserve-merges code, but as I worked hard on the --rebase-merges
patch series with the intention to eventually deprecate and maybe even
remove the --preserve-merges mode, I always implicitly assume that the
--preserve-merges code will be moved into its own shell script
(git-rebase--preserve-merges.sh, maybe?) and never be converted.

So here goes: the patches to move the handling of --root into the
sequencer. After two preparatory patches, the real conversion takes
place in the third patch. After that, we take care of the --root related
concerns that arise in conjunction with the --rebase-merges mode.

As the --rebase-merges/--root patches overlap quite a bit (not so much
in the code itself as in philosophical considerations such as "what
should happen if you try to merge a branch into a new root", or the
fact that the label/reset/merge commands make it desirable to be able to
create a new root commit in the middle of a todo list), I had to
consider in which order to contribute them. In the end, I decided to go
with --rebase-merges first, so the --root patches are based on the
--rebase-merges patch series.

I consider this patch series a critical prerequisite for Alban's Google
Summer of Code project to convert rebase -i into a builtin.

Changes since v1:

- Expanded is_pick_or_similar() to use a clear and verbose switch()
  statement, replacing the magic "command <= TODO_SQUASH".

- Completely revamped read_author_ident(), using sq_dequote(); Sadly,
  trying to reuse builtin/am.c's read_author_script() would result in
  substantialy more lines of code, and I also failed to find an easy way
  to allow for arbitrary order of the environment variables in
  author-script.


Johannes Schindelin (6):
  sequencer: extract helper to update active_cache_tree
  sequencer: learn about the special "fake root commit" handling
  rebase -i --root: let the sequencer handle even the initial part
  sequencer: allow introducing new root commits
  rebase --rebase-merges: a "merge" into a new root is a fast-forward
  rebase --rebase-merges: root commits can be cousins, too

 git-rebase--interactive.sh|   4 +-
 sequencer.c   | 206 ++
 sequencer.h   |   4 +
 t/t3404-rebase-interactive.sh |  19 ++-
 t/t3421-rebase-topology-linear.sh |   6 +-
 t/t3430-rebase-merges.sh  |  72 +++
 6 files changed, 276 insertions(+), 35 deletions(-)


base-commit: 673fb9cb8b5c7d57cb560b6ade45e419c8dd09fc
Based-On: recreate-merges at https://github.com/dscho/git
Fetch-Base-Via: git fetch https://github.com/dscho/git recreate-merges
Published-As: 
https://github.com/dscho/git/releases/tag/sequencer-and-root-commits-v2
Fetch-It-Via: git fetch https://github.com/dscho/git 
sequencer-and-root-commits-v2

Branch-diff vs v1:
 1: 42db734a980 ! 1: 73398da7119 sequencer: learn about the special "fake root 
commit" handling
 @@ -54,40 +54,50 @@
return NULL;
   }
   
 ++/* Read author-script and return an ident line (author  
timestamp) */
  +static const char *read_author_ident(struct strbuf *buf)
  +{
 -+ char *p, *p2;
 ++ const char *keys[] = {
 ++ "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
 ++ };
 ++ char *in, *out, *eol;
 ++ int i = 0, len;
  +
  + if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
  + return NULL;
  +
 -+ for (p = buf->buf; *p; p++)
 -+ if (skip_prefix(p, "'''", (const char **)))
 -+ strbuf_splice(buf, p - buf->buf, p2 - p, "'", 1);
 -+ else if (*p == '\'')
 -+ strbuf_splice(buf, p-- - buf->buf, 1, "", 0);
 -+
 -+ if (skip_prefix(buf->buf, "GIT_AUTHOR_NAME=", (const char **))) {
 -+ strbuf_splice(buf, 0, p - buf->buf, "", 0);
 -+ p = strchr(buf->buf, '\n');
 -+ if (skip_prefix(p, "\nGIT_AUTHOR_EMAIL=", (const char **))) {
 -+ strbuf_splice(buf, p - buf->buf, p2 - p, " <", 2);
 -+ p = strchr(p, '\n');
 -+ if (skip_prefix(p, "\nGIT_AUTHOR_DATE=@",
 -+ (const char **))) {
 -+ strbuf_splice(buf, p - buf->buf, p2 - p,
 -+

[PATCH v2 4/6] sequencer: allow introducing new root commits

2018-05-03 Thread Johannes Schindelin

In the context of the new --rebase-merges mode, which was designed
specifically to allow for changing the existing branch topology
liberally, a user may want to extract commits into a completely fresh
branch that starts with a newly-created root commit.

This is now possible by inserting the command `reset [new root]` before
`pick`ing the commit that wants to become a root commit. Example:

reset [new root]
pick 012345 a commit that is about to become a root commit
pick 234567 this commit will have the previous one as parent

This does not conflict with other uses of the `reset` command because
`[new root]` is not (part of) a valid ref name: both the opening bracket
as well as the space are illegal in ref names.

Signed-off-by: Johannes Schindelin 
---
 sequencer.c  | 40 
 t/t3430-rebase-merges.sh | 34 ++
 2 files changed, 62 insertions(+), 12 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 8eddda681d1..a7832399b1f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2747,18 +2747,34 @@ static int do_reset(const char *name, int len, struct 
replay_opts *opts)
if (hold_locked_index(, LOCK_REPORT_ON_ERROR) < 0)
return -1;
 
-   /* Determine the length of the label */
-   for (i = 0; i < len; i++)
-   if (isspace(name[i]))
-   len = i;
-
-   strbuf_addf(_name, "refs/rewritten/%.*s", len, name);
-   if (get_oid(ref_name.buf, ) &&
-   get_oid(ref_name.buf + strlen("refs/rewritten/"), )) {
-   error(_("could not read '%s'"), ref_name.buf);
-   rollback_lock_file();
-   strbuf_release(_name);
-   return -1;
+   if (len == 10 && !strncmp("[new root]", name, len)) {
+   if (!opts->have_squash_onto) {
+   const char *hex;
+   if (commit_tree("", 0, the_hash_algo->empty_tree,
+   NULL, >squash_onto,
+   NULL, NULL))
+   return error(_("writing fake root commit"));
+   opts->have_squash_onto = 1;
+   hex = oid_to_hex(>squash_onto);
+   if (write_message(hex, strlen(hex),
+ rebase_path_squash_onto(), 0))
+   return error(_("writing squash-onto"));
+   }
+   oidcpy(, >squash_onto);
+   } else {
+   /* Determine the length of the label */
+   for (i = 0; i < len; i++)
+   if (isspace(name[i]))
+   len = i;
+
+   strbuf_addf(_name, "refs/rewritten/%.*s", len, name);
+   if (get_oid(ref_name.buf, ) &&
+   get_oid(ref_name.buf + strlen("refs/rewritten/"), )) {
+   error(_("could not read '%s'"), ref_name.buf);
+   rollback_lock_file();
+   strbuf_release(_name);
+   return -1;
+   }
}
 
memset(_tree_opts, 0, sizeof(unpack_tree_opts));
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 3d4dfdf7bec..35260862fcb 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -241,4 +241,38 @@ test_expect_success 'refuse to merge ancestors of HEAD' '
test_cmp_rev HEAD $before
 '
 
+test_expect_success 'root commits' '
+   git checkout --orphan unrelated &&
+   (GIT_AUTHOR_NAME="Parsnip" GIT_AUTHOR_EMAIL="r...@example.com" \
+test_commit second-root) &&
+   test_commit third-root &&
+   cat >script-from-scratch <<-\EOF &&
+   pick third-root
+   label first-branch
+   reset [new root]
+   pick second-root
+   merge first-branch # Merge the 3rd root
+   EOF
+   test_config sequence.editor \""$PWD"/replace-editor.sh\" &&
+   test_tick &&
+   git rebase -i --force --root -r &&
+   test "Parsnip" = "$(git show -s --format=%an HEAD^)" &&
+   test $(git rev-parse second-root^0) != $(git rev-parse HEAD^) &&
+   test $(git rev-parse second-root:second-root.t) = \
+   $(git rev-parse HEAD^:second-root.t) &&
+   test_cmp_graph HEAD <<-\EOF &&
+   *   Merge the 3rd root
+   |\
+   | * third-root
+   * second-root
+   EOF
+
+   : fast forward if possible &&
+   before="$(git rev-parse --verify HEAD)" &&
+   test_might_fail git config --unset sequence.editor &&
+   test_tick &&
+   git rebase -i --root -r &&
+   test_cmp_rev HEAD $before
+'
+
 test_done
-- 
2.17.0.windows.1.38.g05ca542f78d

[PATCH v2 3/6] rebase -i --root: let the sequencer handle even the initial part

2018-05-03 Thread Johannes Schindelin

In this developer's earlier attempt to accelerate interactive rebases by
converting large parts from Unix shell script into portable, performant
C, the --root handling was specifically excluded (to simplify the task a
little bit; it still took over a year to get that reduced set of patches
into Git proper).

This patch ties up that loose end: now only --preserve-merges uses the
slow Unix shell script implementation to perform the interactive rebase.

As the rebase--helper reports progress to stderr (unlike the scripted
interactive rebase, which reports it to stdout, of all places), we have
to adjust a couple of tests that did not expect that for `git rebase -i
--root`.

This patch fixes -- at long last! -- the really old bug reported in
6a6bc5bdc4d (add tests for rebasing root, 2013-06-06) that rebasing with
--root *always* rewrote the root commit, even if there were no changes.

The bug still persists in --preserve-merges mode, of course, but that
mode will be deprecated as soon as the new --rebase-merges mode
stabilizes, anyway.

Signed-off-by: Johannes Schindelin 
---
 git-rebase--interactive.sh|  4 +++-
 t/t3404-rebase-interactive.sh | 19 +--
 t/t3421-rebase-topology-linear.sh |  6 +++---
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index cbf44f86482..2f4941d0fc9 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -894,6 +894,8 @@ init_revisions_and_shortrevisions () {
else
revisions=$onto...$orig_head
shortrevisions=$shorthead
+   test -z "$squash_onto" ||
+   echo "$squash_onto" >"$state_dir"/squash-onto
fi
 }
 
@@ -948,7 +950,7 @@ EOF
die "Could not skip unnecessary pick commands"
 
checkout_onto
-   if test -z "$rebase_root" && test ! -d "$rewritten"
+   if test ! -d "$rewritten"
then
require_clean_work_tree "rebase"
exec git rebase--helper ${force_rebase:+--no-ff} 
$allow_empty_message \
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 59c766540e5..c65826ddace 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1204,10 +1204,6 @@ test_expect_success 'drop' '
test A = $(git cat-file commit HEAD^^ | sed -ne \$p)
 '
 
-cat >expect expect actual.2 &&
+   cr_to_nl actual &&
test_i18ncmp expect actual &&
test D = $(git cat-file commit HEAD | sed -ne \$p)
 '
diff --git a/t/t3421-rebase-topology-linear.sh 
b/t/t3421-rebase-topology-linear.sh
index e7438ad06ac..99b2aac9219 100755
--- a/t/t3421-rebase-topology-linear.sh
+++ b/t/t3421-rebase-topology-linear.sh
@@ -328,9 +328,9 @@ test_run_rebase () {
test_cmp_rev c HEAD
"
 }
-test_run_rebase failure ''
-test_run_rebase failure -m
-test_run_rebase failure -i
+test_run_rebase success ''
+test_run_rebase success -m
+test_run_rebase success -i
 test_run_rebase failure -p
 
 test_run_rebase () {
-- 
2.17.0.windows.1.38.g05ca542f78d

Re: [PATCH 00/13] object store: alloc

2018-05-03 Thread Stefan Beller

On Wed, May 2, 2018 at 11:22 AM, Duy Nguyen  wrote:

> I think the two have quite different characteristics. alloc.c code is
> driven by overhead. struct blob is only 24 bytes each and about 1/3
> the repo is blobs, and each malloc has 16 bytes overhead or so if I
> remember correctly. struct cache_entry at minimum in 88 bytes so
> relative overhead is not that a big deal (but sure reducing it is
> still very nice).
>
> mem-pool is about allocation speed,

I don't think so, given that we do a linear search in each block allocation.

> but I think that's not a concern
> for alloc.c because when we do full rev walk, I think I/O is always
> the bottleneck (maybe object lookup as well). I don't see a good way
> to have the one memory allocator that satisfyies both to be honest.

By changing the allocation size of a block to be larger than 1024 entries
in alloc. we should lessen the impact of management overhead, and then
the mem pool can be more than feasible.

[PATCH] alloc.c: replace alloc by mempool

2018-05-03 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---

>> The reason for my doubt is the potential quadratic behavior for new 
>> allocations,
>> in mem_pool_alloc() we walk all mp_blocks to see if we can fit the requested
>> allocation in one of the later blocks.
>> So if we call mem_pool_alloc a million times, we get a O(n) mp_blocks which
>> we'd have to walk in each call.
>
> With the current design, when a new mp_block is allocated, it is
> placed at the head of the linked list. This means that the most
> recently allocated mp_block is the 1st block that is
> searched. The *vast* majority of allocations should be fulfilled
> from this 1st block. It is only when the block is full that we
> search other mp_blocks in the list.

I just measured on git.git and linux.git (both of which are not *huge* by
any standard, but should give a good indication. linux has  6M objects,
and when allocating 1024 at a time, we run into the new block allocation
~6000 times).

I could not measure any meaningful difference.

linux.git $ git count-objects -v
count: 0
size: 0
in-pack: 6036543
packs: 2
size-pack: 2492985
prune-packable: 0
garbage: 0
size-garbage: 0

(with this patch)
 Performance counter stats for '/u/git/git count-objects -v' (30 runs):

  2.123683  task-clock:u (msec)   #0.831 CPUs utilized  
  ( +-  2.32% )
 0  context-switches:u#0.000 K/sec  

 0  cpu-migrations:u  #0.000 K/sec  

   126  page-faults:u #0.059 M/sec  
  ( +-  0.22% )
   895,900  cycles:u  #0.422 GHz
  ( +-  1.40% )
   976,596  instructions:u#1.09  insn per cycle 
  ( +-  0.01% )
   218,256  branches:u#  102.772 M/sec  
  ( +-  0.01% )
 8,331  branch-misses:u   #3.82% of all branches
  ( +-  0.61% )

   0.002556203 seconds time elapsed 
 ( +-  2.20% )

  Performance counter stats for 'git count-objects -v' (30 runs):

  2.410352  task-clock:u (msec)   #0.801 CPUs utilized  
  ( +-  2.79% )
 0  context-switches:u#0.000 K/sec  

 0  cpu-migrations:u  #0.000 K/sec  

   131  page-faults:u #0.054 M/sec  
  ( +-  0.16% )
   993,301  cycles:u  #0.412 GHz
  ( +-  1.99% )
 1,087,428  instructions:u#1.09  insn per cycle 
  ( +-  0.02% )
   244,292  branches:u#  101.351 M/sec  
  ( +-  0.02% )
 9,264  branch-misses:u   #3.79% of all branches
  ( +-  0.57% )

   0.003010854 seconds time elapsed 
 ( +-  2.54% )

So I think we could just replace it for now and optimize again later, if it
turns out to be a problem. I think the easiest optimisation is to increase
the allocation size of having a lot more objects per mp_block.

> If this is a concern, I think
> we have a couple low cost options to mitigate it (maybe a flag to
> control whether we search past the 1st mp_block for space, or
> logic to move blocks out of the search queue when they are
> full or fall below a threshold for available space).

Instead of a flag I thought of its own struct with its own functions,
which would not just have a different searching behavior, but also
store the size in the struct such that you can just call
fixed_size_mem_pool_alloc(void) to get another pointer.
A flag might be more elegant.

>
> If this is of interest, I could contribute a patch to enable one
> of these behaviors?

I am tempted to just do away with alloc.c for now and use the mem-pool.

Thanks,
Stefan



 alloc.c | 60 +++--
 1 file changed, 11 insertions(+), 49 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacdd..bf003e161be 100644
--- a/alloc.c
+++ b/alloc.c
@@ -15,6 +15,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "mem-pool.h"
 
 #define BLOCKING 1024
 
@@ -26,61 +27,39 @@ union any_object {
struct tag tag;
 };
 
-struct alloc_state {
-   int count; /* total number of nodes allocated */
-   int nr;/* number of nodes left in current allocation */
-   void *p;   /* first free node in current allocation */
-};
-
-static inline void *alloc_node(struct alloc_state *s, size_t node_size)
-{
-   void *ret;
-
-   if (!s->nr) {
-   s->nr = BLOCKING;
-   s->p = xmalloc(BLOCKING * node_size);
-   }
-   s->nr--;
-   s->count++;
-   ret = s->p;
-   s->p = (char *)s->p + node_size;
-

Re: [GSoC] A blog about 'git stash' project

2018-05-03 Thread Johannes Schindelin

Hi Paul,

On Fri, 4 May 2018, Paul-Sebastian Ungureanu wrote:

> The community bonding period started. It is well known that for a
> greater rate of success, it is recommended to send weekly reports
> regarding project status.  But, what would be a good form for such a
> report? I, for one, think that starting a blog is one of the best
> options because all the content will be stored in one place. Without
> further ado, I would like you to present my blog [1].
> 
> Any feedback is greatly appreciated! Thank you!
> 
> [1]
> https://ungps.github.io/

Looks great!

Maybe also mention that you hang out on IRC occasionally, in case anybody
wants to tell you just how awesome a contributor you are?

Ciao,
Dscho

Re: [PATCH 03/18] branch-diff: first rudimentary implementation

2018-05-03 Thread Johannes Schindelin

Hi Stefan,

On Thu, 3 May 2018, Stefan Beller wrote:

> >> In addition to that patch, we'd have to buffer commit messages and
> >> buffer multiple commits, as that only buffers a diff of a single
> >> commit.
> >
> > ... and make sure that the moved-code logic (which is currently the
> > only user of emitted_symbols, correct?) would never be called at the
> > same time as we generate the diff.
> 
> The moved detection is all part of the flags of an emitted symbol.
> 
> By design the emitted symbol has easy access to the raw line of the output,
> which made it easy for the move detection to work on the lines. AFAICT this
> is also desired here as lines are put into a hashmap for comparisons.
> (and having it colored differently would make finding the same line
> complex using hashmaps)
> 
> I just entertain the thought of having move detection active in a
> branch-diff. That would be really cool actually.

There are two separate times when we generate a diff in branch-diff: the
first time in that `git log -p ` call for both ranges, and later,
when displaying the changes of old/new commits.

(There is actually a third time, when the cost is calculated, but those
diffs are not shown, only their line count is used.)

It would be relatively easy to use move detection in the diff between
old/new commits. But it would be harder to do that with the `git log -p`
diffs, as we only color-code them later, not at the time they are
generated.

In fact, Ævar mentioned that he was pretty happy about the fact that `git
branch-diff` accepts all kinds of diff options, when tbdiff emulated only
two of them. Ævar mentioned specifically the use of `--color-words`...

> >> The benefit would be no invocation of new processes, letting us do
> >> more in core. This would allow for tweaking revision walking
> >> internally, e.g. passing of options to this command such as rename
> >> detection factors, can be passed through easily without the need of
> >> translating it back to the command line.
> >
> > On the other hand, we can simply copy those options to the
> > command-line for `log`. Which might even be better, as e.g. `--format`
> > changes global state :-(
> 
> ok.

I really appreciate the sanity check. It would benefit me on Windows if I
could avoid spawning... But I think in this case, it would save me 2*50ms,
which is not really worth doing the work for, so far.

> Thanks for your patience,

And thank you for yours! It *is* important to challenge beliefs during
code review, so that the choices are made for the right reasons.

Thanks,
Dscho

Re: [PATCH 00/18] Add `branch-diff`, a `tbdiff` lookalike

2018-05-03 Thread Jacob Keller

On Thu, May 3, 2018 at 11:05 AM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Thu, May 03 2018, Johannes Schindelin wrote:
>
>> The incredibly useful `git-tbdiff` tool to compare patch series (say, to see
>> what changed between two iterations sent to the Git mailing list) is slightly
>> less useful for this developer due to the fact that it requires the 
>> `hungarian`
>> and `numpy` Python packages which are for some reason really hard to build in
>> MSYS2. So hard that I even had to give up, because it was simply easier to
>> reimplement the whole shebang as a builtin command.
>>
>> The project at https://github.com/trast/tbdiff seems to be dormant, anyway.
>> Funny (and true) story: I looked at the open Pull Requests to see how active
>> that project is, only to find to my surprise that I had submitted one in 
>> August
>> 2015, and that it was still unanswered let alone merged.
>
> I've been using branch-diff and haven't found issues with it yet, it
> works like tbdiff but better. Faster, uses the same diff as git
> (better), and spews to the pager by default.

I'm hoping to take a look at this as well, I remember looking into
tbdiff in the past, but also had trouble getting it to work. I've
tried a variety of similar things, including 4-way parent diffs, but
nothing quite gave the results I expected.

Thanks!

Regards,
Jake

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Johannes Schindelin

Hi Stefan,

On Thu, 3 May 2018, Stefan Beller wrote:

> On Thu, May 3, 2018 at 1:42 PM, Johannes Schindelin
>  wrote:
> 
> >> Speaking of colors, for origin/sb/blame-color Junio hinted at re-using
> >> cyan for "uninteresting" parts to deliver a consistent color scheme for
> >> Git. Eventually he dreams of having 2 layers of indirection IIUC, with
> >> "uninteresting" -> cyan
> >> "repeated lines in blame" -> uninteresting
> >>
> >> Maybe we can fit the coloring of this tool in this scheme, too?
> >
> > Sure. So you mean I should use cyan for... what part of the colored
> > output? ;-)
> 
> It is just a FYI heads up, not an actionable bikeshed painting plan. ;)

Oh, I did not understand it as bike-shedding at all. I had hoped that you
ran `git branch-diff --dual-color` on something interesting and found e.g.
the yellow color inherited from DIFF_COMMIT to be the wrong color for
unchanged commit pairs.

So please: as soon as you have a concrete suggestion where to use cyan
(and preferably even a DIFF_* constant to feed to diff_get_color_opt()), I
will be more than interested.

> >> Do we need to dynamic of a floating point, or would a rather small range
> >> suffice here? (Also see rename detection settings, that take percents as
> >> integers)
> >
> > I guess you are right, and we do not need floats. It was just very, very
> > convenient to do that instead of using integers because
> >
> > - I already had the Jonker-Volgenant implementation "lying around" from my
> >   previous life as an image processing expert, using doubles (but it was
> >   in Java, not in C, so I quickly converted it for branch-diff).
> >
> > - I was actually not paying attention whether divisions are a thing in the
> >   algorithm. From a cursory glance, it would appear that we are never
> >   dividing in hungarian.c, so theoretically integers should be fine.
> >
> > - using doubles neatly side-steps the overflow problem. If I use integers
> >   instead, I always will have to worry what to do if, say, adding
> >   `INT_MAX` to `INT_MAX`.
> >
> > I am particularly worried about that last thing: it could easily lead to
> > incorrect results if we blindly, say, pretend that `INT_MAX + INT_MAX ==
> > INT_MAX` for the purpose of avoiding overflows.
> >
> > If, however, I misunderstood and you are only concerned about using
> > *double-precision* floating point numbers, and would suggest using `float`
> > typed variables instead, that would be totally cool with me.
> 
> So by being worried about INT_MAX occurring, you are implying that
> we have to worry about a large range of values, so maybe floating points
> are the best choice here.

I am not really worried about a large range of values, I am worried about
a use case where we use the maximal value as an "impossible, must avoid at
all cost" value. See this line in hungarian.c:

u2 = DBL_MAX;

It does not seem as if any arithmetic is done on u2 after that
(theoretically, it should not survive the loop that comes after it and
tries to replace u2 with any smaller value it finds, but what if that loop
does not even run because column_count == 1? Sure, it is a pathological
case, but even those should have correct results).

But actually, yes, there *is* arithmetic performed on u2:

if (u1 < u2)
v[j1] -= u2 - u1;

So in the pathological case where we try to find the best assignment of a
single column to an arbitrary number of rows (where simply the row with
a smallest cost should be picked), we try to subtract from v[j1] an
insanely large value. As a consequence, v[j1] will be close to INT_MIN if
we were to switch to integers, and who is to say that the next time we get
to this part, j1 will be different? If it is the same, and we hit the same
u2, then we might end up subtracting something close to INT_MAX from
INT_MIN, which will definitely overflow and the computation will be
incorrect.

*That* is what I am worried about: overflowing integer arithmetic. IIRC if
you subtract DBL_MAX from -DBL_MAX, you still end up with -DBL_MAX. So in
that respect, using floating point numbers here is safer.

> Looking through that algorithm the costs seem to be integers only
> measuring number of lines, so I would not be too worried about running
> into INT_MAX problems except for the costs that are assigned INT_MAX
> explicitly.
> 
> I was more asking, if floating point is the right tool for the job.

I think I would have to spend some real quality time with the code in
hungarian.c to turn it into using integer costs instead of floating point
numbers, to ensure that the arithmetic is done in a way that is consistent
with the algorithm, even if we cannot represent the arithmetic faithfully
with limited-range integers.

I'll think about the best way forward.

Ciao,
Dscho

[GSoC] A blog about 'git stash' project

2018-05-03 Thread Paul-Sebastian Ungureanu


Hello everybody,

The community bonding period started. It is well known that for a 
greater rate of success, it is recommended to send weekly reports 
regarding project status. But, what would be a good form for such a 
report? I, for one, think that starting a blog is one of the best 
options because all the content will be stored in one place. Without 
further ado, I would like you to present my blog [1].


Any feedback is greatly appreciated! Thank you!

[1]
https://ungps.github.io/

Best regards,
Paul Ungureanu

Re: [PATCH 03/18] branch-diff: first rudimentary implementation

2018-05-03 Thread Stefan Beller

> To be honest, the main reason I spawn here is that I did not want to be
> bothered with resetting the commit flags after traversing the first commit
> range. But I guess I was just too cheap and should really do it?

Oh right, you'd have to do multiple revision walks.

> OTOH spawning here is a lot easier than not spawning, so maybe it would be
> premature optimization?

Most likely.

>
>> In addition to that patch, we'd have to buffer commit messages
>> and buffer multiple commits, as that only buffers a diff of a single
>> commit.
>
> ... and make sure that the moved-code logic (which is currently the only
> user of emitted_symbols, correct?) would never be called at the same time
> as we generate the diff.

The moved detection is all part of the flags of an emitted symbol.

By design the emitted symbol has easy access to the raw line of the output,
which made it easy for the move detection to work on the lines. AFAICT this
is also desired here as lines are put into a hashmap for comparisons.
(and having it colored differently would make finding the same line
complex using hashmaps)

I just entertain the thought of having move detection active in a
branch-diff. That would be really cool actually.

>
>> The benefit would be no invocation of new processes, letting us
>> do more in core. This would allow for tweaking revision walking
>> internally, e.g. passing of options to this command such as rename
>> detection factors, can be passed through easily without the need
>> of translating it back to the command line.
>
> On the other hand, we can simply copy those options to the command-line
> for `log`. Which might even be better, as e.g. `--format` changes global
> state :-(

ok.

Thanks for your patience,
Stefan

RE: [PATCH v2 0/5] Allocate cache entries from memory pool

2018-05-03 Thread Jameson Miller



> -Original Message-
> From: git-ow...@vger.kernel.org  On Behalf Of
> Stefan Beller
> Sent: Thursday, May 3, 2018 4:59 PM
> To: Duy Nguyen 
> Cc: Jameson Miller ; git@vger.kernel.org;
> gits...@pobox.com; jonathanta...@google.com
> Subject: Re: [PATCH v2 0/5] Allocate cache entries from memory pool
> 
> On Thu, May 3, 2018 at 12:17 PM, Duy Nguyen  wrote:
> >
> >> To me it is also a clear yes when it comes to combining these two
> >> memory pools.
> >
> > I also did not notice that jm/mem-pool already landed in master.
> 
> Oh, thanks for telling! Now that I look at it, I am doubting it;
> 
> The reason for my doubt is the potential quadratic behavior for new 
> allocations,
> in mem_pool_alloc() we walk all mp_blocks to see if we can fit the requested
> allocation in one of the later blocks.
> So if we call mem_pool_alloc a million times, we get a O(n) mp_blocks which
> we'd have to walk in each call.

With the current design, when a new mp_block is allocated, it is
placed at the head of the linked list. This means that the most
recently allocated mp_block is the 1st block that is
searched. The *vast* majority of allocations should be fulfilled
from this 1st block. It is only when the block is full that we
search other mp_blocks in the list. If this is a concern, I think
we have a couple low cost options to mitigate it (maybe a flag to
control whether we search past the 1st mp_block for space, or
logic to move blocks out of the search queue when they are
full or fall below a threshold for available space).

If this is of interest, I could contribute a patch to enable one
of these behaviors?

> 
> However in alloc.c we do know that a slab is full as soon as we look take the
> next slab. That is the beauty of knowing 'len' at construction time of the
> allocator.
> 
> So I guess I'll just re-use the mp_block and introduce another struct
> fixed_sized_mem_pool, which will not look into other mp_blocks but the
> current.
> 
> 
> > Have
> > you tried measure (both memory usage and allocation speed) of it and
> > alloc.c?
> 
> No, I was about to, but then started reading the code in an attempt to replace
> alloc.c by a mempool and saw the quadratic behavior.
> 
> > Just take some big repo as an example and do count-objects -v to see
> > how many blobs/trees/commits it has, then allocate the same amount
> > with both alloc.c and mem-pool.c and measure both speed/mem.
> > I'm pretty sure you're right that mem-pool.c is a clear yes. I was
> > just being more conservative because we do (slightly) change
> > allocator's behavior when we make the switch. But it's also very
> > likely that any performance difference will be insignificant.
> >
> > I'm asking this because if mem-pool.c is a clear winner, you can start
> > to update you series to use it now and kill alloc.c in the process.
> 
> I'll implement the fixed_sized_mem_pool and take some measurements.
> 
> >
> > PS. Is Jeff back yet?
> 
> His last email on the public list is Apr 10th, stating that he'll be offline 
> for "a few
> weeks", in <20180406175349.gb32...@sigill.intra.peff.net> he said the
> vacation part is 3 weeks. So I think he is done with vacation and is just 
> hiding to
> figure out a nice comeback. ;-)
> 
> > I'm sure Junio is listening and all but I'm afraid he's too busy being
> > a maintainer so Jeff's opinion in this area is really valuable. He has
> > all the fun and weird use cases to play with at github.
> 
> ok. I'll cc him for these patches.
> 
> Thanks,
> Stefan

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Stefan Beller

On Thu, May 3, 2018 at 1:42 PM, Johannes Schindelin
 wrote:

>> Speaking of colors, for origin/sb/blame-color Junio hinted at re-using
>> cyan for "uninteresting" parts to deliver a consistent color scheme for
>> Git. Eventually he dreams of having 2 layers of indirection IIUC, with
>> "uninteresting" -> cyan
>> "repeated lines in blame" -> uninteresting
>>
>> Maybe we can fit the coloring of this tool in this scheme, too?
>
> Sure. So you mean I should use cyan for... what part of the colored
> output? ;-)
>

It is just a FYI heads up, not an actionable bikeshed painting plan. ;)

>> Do we need to dynamic of a floating point, or would a rather small range
>> suffice here? (Also see rename detection settings, that take percents as
>> integers)
>
> I guess you are right, and we do not need floats. It was just very, very
> convenient to do that instead of using integers because
>
> - I already had the Jonker-Volgenant implementation "lying around" from my
>   previous life as an image processing expert, using doubles (but it was
>   in Java, not in C, so I quickly converted it for branch-diff).
>
> - I was actually not paying attention whether divisions are a thing in the
>   algorithm. From a cursory glance, it would appear that we are never
>   dividing in hungarian.c, so theoretically integers should be fine.
>
> - using doubles neatly side-steps the overflow problem. If I use integers
>   instead, I always will have to worry what to do if, say, adding
>   `INT_MAX` to `INT_MAX`.
>
> I am particularly worried about that last thing: it could easily lead to
> incorrect results if we blindly, say, pretend that `INT_MAX + INT_MAX ==
> INT_MAX` for the purpose of avoiding overflows.
>
> If, however, I misunderstood and you are only concerned about using
> *double-precision* floating point numbers, and would suggest using `float`
> typed variables instead, that would be totally cool with me.

So by being worried about INT_MAX occurring, you are implying that
we have to worry about a large range of values, so maybe floating points
are the best choice here.

Looking through that algorithm the costs seem to be integers only
measuring number of lines, so I would not be too worried about running
into INT_MAX problems except for the costs that are assigned INT_MAX
explicitly.

I was more asking, if floating point is the right tool for the job.

Stefan

Re: [PATCH 00/18] Add `branch-diff`, a `tbdiff` lookalike

2018-05-03 Thread Johannes Schindelin

Hi Ævar,

On Thu, 3 May 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, May 03 2018, Johannes Schindelin wrote:
> 
> > The incredibly useful `git-tbdiff` tool to compare patch series (say,
> > to see what changed between two iterations sent to the Git mailing
> > list) is slightly less useful for this developer due to the fact that
> > it requires the `hungarian` and `numpy` Python packages which are for
> > some reason really hard to build in MSYS2. So hard that I even had to
> > give up, because it was simply easier to reimplement the whole shebang
> > as a builtin command.
> >
> > The project at https://github.com/trast/tbdiff seems to be dormant,
> > anyway.  Funny (and true) story: I looked at the open Pull Requests to
> > see how active that project is, only to find to my surprise that I had
> > submitted one in August 2015, and that it was still unanswered let
> > alone merged.
> 
> I've been using branch-diff and haven't found issues with it yet, it
> works like tbdiff but better. Faster, uses the same diff as git
> (better), and spews to the pager by default.

Thanks for your enthusiasm!
Dscho

Re: [PATCH 11/18] branch-diff: add tests

2018-05-03 Thread Johannes Schindelin

Hi Stefan,

On Thu, 3 May 2018, Stefan Beller wrote:

> On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin
>  wrote:
> > From: Thomas Rast 
> >
> > These are essentially lifted from https://github.com/trast/tbdiff, with
> > light touch-ups to account for the new command name.
> >
> > Apart from renaming `tbdiff` to `branch-diff`, only one test case needed
> > to be adjusted: 11 - 'changed message'.
> >
> > The underlying reason it had to be adjusted is that diff generation is
> > sometimes ambiguous. In this case, a comment line and an empty line are
> > added, but it is ambiguous whether they were added after the existing
> > empty line, or whether an empty line and the comment line are added
> > *before* the existing emtpy line. And apparently xdiff picks a different
> > option here than Python's difflib.
> 
> I think that is the fallout of the diff heuristics. If you are keen on
> a 1:1 port, you can disable the diff sliding heuristics and it should
> produce the same diff with trailing new lines.

I am not keen on a 1:1 port. I am fine with having xdiff generate
different diffs than Python's difflib. That's par for the course when
relying on a not-quite-well-defined metric.

Ciao,
Dscho

Re: [PATCH 11/18] branch-diff: add tests

2018-05-03 Thread Johannes Schindelin

Hi Ævar,

On Thu, 3 May 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, May 03 2018, Johannes Schindelin wrote:
> 
> > *before* the existing emtpy line. And apparently xdiff picks a different
> 
> s/emtpy/empty/

Thanks for the spell check!
Dscho

Re: [PATCH 03/18] branch-diff: first rudimentary implementation

2018-05-03 Thread Johannes Schindelin

Hi Stefan,

On Thu, 3 May 2018, Stefan Beller wrote:

> On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin
>  wrote:
> 
> > Note: due to differences in the diff algorithm (`tbdiff` uses the
> > Pythong module `difflib`, Git uses its xdiff fork), the cost matrix
> > calculated by `branch-diff` is different (but very similar) to the one
> > calculated by `tbdiff`. Therefore, it is possible that they find
> > different matching commits in corner cases (e.g. when a patch was
> > split into two patches of roughly equal length).
> 
> Does that mean, we may want to tweak the underlying diff parameters for
> this special use case eventually?

I don't think that will be necessary. Generating diffs is an ambiguous
business, after all, and we just have to live with the consequence that it
might even be possible that the cost is non-symmetric, i.e. that the
length (i.e. line count) of the diff is different depending whether we
compare old patch to new patch vs new patch to old patch.

If the result changes due to these vagaries, it means that there is no
single one good answer to the question which old/new commits form a pair.
I would expect that only to happen if a commit with a lengthy diff is cut
into two commits whose diffs have roughly equal lengths (so that the
difference of the commit message won't matter that much).

> > -#define COLOR_DUAL_MODE 2
> > -
> 
> Leave this out in the first patch?

Yep, as Ramsay said.

> > @@ -19,6 +23,279 @@ static int parse_creation_weight(const struct option 
> > *opt, const char *arg,
> > return 0;
> >  }
> >
> > +struct patch_util {
> > +   /* For the search for an exact match */
> > +   struct hashmap_entry e;
> > +   const char *diff, *patch;
> > +
> > +   int i;
> > +   int diffsize;
> > +   size_t diff_offset;
> > +   /* the index of the matching item in the other branch, or -1 */
> > +   int matching;
> > +   struct object_id oid;
> > +};
> > +
> > +/*
> > + * Reads the patches into a string list, with the `util` field being 
> > populated
> > + * as struct object_id (will need to be free()d).
> > + */
> > +static int read_patches(const char *range, struct string_list *list)
> > +{
> > +   struct child_process cp = CHILD_PROCESS_INIT;
> > +   FILE *in;
> > +   struct strbuf buf = STRBUF_INIT, line = STRBUF_INIT;
> > +   struct patch_util *util = NULL;
> > +   int in_header = 1;
> > +
> > +   argv_array_pushl(, "log", "--no-color", "-p", "--no-merges",
> > +   "--reverse", "--date-order", "--decorate=no",
> > +   "--no-abbrev-commit", range,
> > +   NULL);
> > +   cp.out = -1;
> > +   cp.no_stdin = 1;
> > +   cp.git_cmd = 1;
> > +
> > +   if (start_command())
> > +   return error_errno(_("could not start `log`"));
> > +   in = fdopen(cp.out, "r");
> > +   if (!in) {
> > +   error_errno(_("could not read `log` output"));
> > +   finish_command();
> > +   return -1;
> > +   }
> 
> With the implementation of --color-moved, there is an option
> to buffer all diff output in memory e6e045f8031 (diff.c: buffer
> all output if asked to, 2017-06-29), so I posit that running this
> diff in-core may be "not too hard". Famous last words.

True. I *did* stumble over emitted_symbols and thought that there might be
a way to leverage it, but I did not find any existing knob, and did not
want to interfere with any other user case of that feature (which we may
want to allow combining with branch-diff one day, afer all).

To be honest, the main reason I spawn here is that I did not want to be
bothered with resetting the commit flags after traversing the first commit
range. But I guess I was just too cheap and should really do it?

OTOH spawning here is a lot easier than not spawning, so maybe it would be
premature optimization?

> In addition to that patch, we'd have to buffer commit messages
> and buffer multiple commits, as that only buffers a diff of a single
> commit.

... and make sure that the moved-code logic (which is currently the only
user of emitted_symbols, correct?) would never be called at the same time
as we generate the diff.

> The benefit would be no invocation of new processes, letting us
> do more in core. This would allow for tweaking revision walking
> internally, e.g. passing of options to this command such as rename
> detection factors, can be passed through easily without the need
> of translating it back to the command line.

On the other hand, we can simply copy those options to the command-line
for `log`. Which might even be better, as e.g. `--format` changes global
state :-(

> > +
> > +   if (starts_with(line.buf, "diff --git")) {
> 
> When using the internal buffers, you would not need to
> string compare, but could just check for the
> DIFF_SYMBOL_HEADER.

True. Not sure whether the current way is *that* terrible,

Re: [PATCH v2 0/5] Allocate cache entries from memory pool

2018-05-03 Thread Stefan Beller

On Thu, May 3, 2018 at 12:17 PM, Duy Nguyen  wrote:
>
>> To me it is also a clear yes when it comes to combining these
>> two memory pools.
>
> I also did not notice that jm/mem-pool already landed in master.

Oh, thanks for telling! Now that I look at it, I am doubting it;

The reason for my doubt is the potential quadratic behavior for
new allocations, in mem_pool_alloc() we walk all mp_blocks to
see if we can fit the requested allocation in one of the later blocks.
So if we call mem_pool_alloc a million times, we get a O(n)
mp_blocks which we'd have to walk in each call.

However in alloc.c we do know that a slab is full as soon as we
look take the next slab. That is the beauty of knowing 'len' at
construction time of the allocator.

So I guess I'll just re-use the mp_block and introduce another
struct fixed_sized_mem_pool, which will not look into other mp_blocks
but the current.

> Have
> you tried measure (both memory usage and allocation speed) of it and
> alloc.c?

No, I was about to, but then started reading the code in an attempt to replace
alloc.c by a mempool and saw the quadratic behavior.

> Just take some big repo as an example and do count-objects -v
> to see how many blobs/trees/commits it has, then allocate the same
> amount with both alloc.c and mem-pool.c and measure both speed/mem.
> I'm pretty sure you're right that mem-pool.c is a clear yes. I was
> just being more conservative because we do (slightly) change
> allocator's behavior when we make the switch. But it's also very
> likely that any performance difference will be insignificant.
>
> I'm asking this because if mem-pool.c is a clear winner, you can start
> to update you series to use it now and kill alloc.c in the process.

I'll implement the fixed_sized_mem_pool and take some measurements.

>
> PS. Is Jeff back yet?

His last email on the public list is Apr 10th, stating that he'll be offline for
"a few weeks", in <20180406175349.gb32...@sigill.intra.peff.net> he
said the vacation part is 3 weeks. So I think he is done with vacation and
is just hiding to figure out a nice comeback. ;-)

> I'm sure Junio is listening and all but I'm
> afraid he's too busy being a maintainer so Jeff's opinion in this area
> is really valuable. He has all the fun and weird use cases to play
> with at github.

ok. I'll cc him for these patches.

Thanks,
Stefan

Re: [PATCH 03/18] branch-diff: first rudimentary implementation

2018-05-03 Thread Johannes Schindelin

Hi Ramsay,

On Thu, 3 May 2018, Ramsay Jones wrote:

> On 03/05/18 16:30, Johannes Schindelin wrote:
> > At this stage, `git branch-diff` can determine corresponding commits of
> > two related commit ranges. This makes use of the recently introduced
> > implementation of the Hungarian algorithm.
> > 
> > The core of this patch is a straight port of the ideas of tbdiff, the
> > seemingly dormant project at https://github.com/trast/tbdiff.
> > 
> > The output does not at all match `tbdiff`'s output yet, as this patch
> > really concentrates on getting the patch matching part right.
> > 
> > Note: due to differences in the diff algorithm (`tbdiff` uses the
> > Pythong module `difflib`, Git uses its xdiff fork), the cost matrix
> > calculated by `branch-diff` is different (but very similar) to the one
> > calculated by `tbdiff`. Therefore, it is possible that they find
> > different matching commits in corner cases (e.g. when a patch was split
> > into two patches of roughly equal length).
> > 
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  builtin/branch-diff.c | 337 +-
> >  1 file changed, 334 insertions(+), 3 deletions(-)
> > 
> > diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
> > index 97266cd326d..02dc06a57ca 100644
> > --- a/builtin/branch-diff.c
> > +++ b/builtin/branch-diff.c
> > @@ -1,13 +1,17 @@
> >  #include "cache.h"
> >  #include "parse-options.h"
> > +#include "string-list.h"
> > +#include "run-command.h"
> > +#include "argv-array.h"
> > +#include "hashmap.h"
> > +#include "xdiff-interface.h"
> > +#include "hungarian.h"
> >  
> >  static const char * const builtin_branch_diff_usage[] = {
> > N_("git rebase--helper [] ( A..B C..D | A...B | base A B )"),
> > NULL
> >  };
> >  
> > -#define COLOR_DUAL_MODE 2
> > -
> 
> This #define was introduced in the previous patch, without being
> used in that patch, and is now deleted here.

Darn, darn, darn. You know, this macro simply won't die. I tried to kill
it off but it keeps showing its ugly head everywhere.

I *think* I finally killed it for good.

Thanks!
Dscho

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Johannes Schindelin

Hi Stefan,

On Thu, 3 May 2018, Stefan Beller wrote:

> On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin
>  wrote:
> > This builtin does not do a whole lot so far, apart from showing a
> > usage that is oddly similar to that of `git tbdiff`. And for a good
> > reason: the next commits will turn `branch-diff` into a full-blown
> > replacement for `tbdiff`.
> 
> While I appreciate the 1:1 re-implementation, I'll comment as if this
> was a newly invented tool, questioning design choices. They are probably
> chosen pretty well, and fudge facotrs as below are at tweaked to a
> reasonable factor, but I'll try to look with fresh eyes.

Absolutely. While tbdiff got some testing over time, it has definitely not
gotten as much exposure as branch-diff hopefully will.

BTW I chose a different command name on purpose, so that we are free to
change the design and not harm existing tbdiff users.

> > At this point, we ignore tbdiff's color options, as they will all be
> > implemented later and require some patches to the diff machinery.
> 
> Speaking of colors, for origin/sb/blame-color Junio hinted at re-using
> cyan for "uninteresting" parts to deliver a consistent color scheme for
> Git. Eventually he dreams of having 2 layers of indirection IIUC, with
> "uninteresting" -> cyan
> "repeated lines in blame" -> uninteresting
> 
> Maybe we can fit the coloring of this tool in this scheme, too?

Sure. So you mean I should use cyan for... what part of the colored
output? ;-)

> > +   double creation_weight = 0.6;
> 
> I wondered if we use doubles in Gits code base at all,
> and I found
> 
> khash.h:59:static const double __ac_HASH_UPPER = 0.77;
> pack-bitmap-write.c:248:static const double
> REUSE_BITMAP_THRESHOLD = 0.2;
> pack-bitmap.c:751:  static const double REUSE_PERCENT = 0.9;
> 
> all other occurrences of `git grep double` are mentioning it in other
> contexts (e.g. "double linked list" or comments).
> 
> When implementing diff heuristics in 433860f3d0b (diff: improve
> positioning of add/delete blocks in diffs, 2016-09-05), Michael broke
> it down to fixed integers instead of floating point.
> 
> Do we need to dynamic of a floating point, or would a rather small range
> suffice here? (Also see rename detection settings, that take percents as
> integers)

I guess you are right, and we do not need floats. It was just very, very
convenient to do that instead of using integers because

- I already had the Jonker-Volgenant implementation "lying around" from my
  previous life as an image processing expert, using doubles (but it was
  in Java, not in C, so I quickly converted it for branch-diff).

- I was actually not paying attention whether divisions are a thing in the
  algorithm. From a cursory glance, it would appear that we are never
  dividing in hungarian.c, so theoretically integers should be fine.

- using doubles neatly side-steps the overflow problem. If I use integers
  instead, I always will have to worry what to do if, say, adding
  `INT_MAX` to `INT_MAX`.

I am particularly worried about that last thing: it could easily lead to
incorrect results if we blindly, say, pretend that `INT_MAX + INT_MAX ==
INT_MAX` for the purpose of avoiding overflows.

If, however, I misunderstood and you are only concerned about using
*double-precision* floating point numbers, and would suggest using `float`
typed variables instead, that would be totally cool with me.

Ciao,
Dscho

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Johannes Schindelin

Hi Duy,

On Thu, 3 May 2018, Johannes Schindelin wrote:

> On Thu, 3 May 2018, Duy Nguyen wrote:
> 
> > On Thu, May 3, 2018 at 5:30 PM, Johannes Schindelin
> >  wrote:
> > > diff --git a/command-list.txt b/command-list.txt
> > > index a1fad28fd82..c89ac8f417f 100644
> > > --- a/command-list.txt
> > > +++ b/command-list.txt
> > > @@ -19,6 +19,7 @@ git-archive mainporcelain
> > >  git-bisect  mainporcelain   info
> > >  git-blame   ancillaryinterrogators
> > >  git-branch  mainporcelain   history
> > > +git-branch-diff mainporcelain   info
> > 
> > Making it part of "git help" with the info keywords at this stage may
> > be premature. "git help" is about _common_ commands and we don't know
> > (yet) how popular this will be.
> 
> Makes sense. I removed the `mainporcelain` keyword locally.

On second thought, I *think* you meant to imply that I should remove that
line altogether. Will do that now.

Ciao,
Dscho

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Johannes Schindelin

Hi Duy,

On Thu, 3 May 2018, Duy Nguyen wrote:

> On Thu, May 3, 2018 at 5:30 PM, Johannes Schindelin
>  wrote:
> > diff --git a/command-list.txt b/command-list.txt
> > index a1fad28fd82..c89ac8f417f 100644
> > --- a/command-list.txt
> > +++ b/command-list.txt
> > @@ -19,6 +19,7 @@ git-archive mainporcelain
> >  git-bisect  mainporcelain   info
> >  git-blame   ancillaryinterrogators
> >  git-branch  mainporcelain   history
> > +git-branch-diff mainporcelain   info
> 
> Making it part of "git help" with the info keywords at this stage may
> be premature. "git help" is about _common_ commands and we don't know
> (yet) how popular this will be.

Makes sense. I removed the `mainporcelain` keyword locally.

> (I'm not complaining though, I almost wrote "what witchcraft is this
> and where can I get it" when I see branch-diff output mentioned in
> AEvar's mail)

Yeah, it is pretty neat.

Ciao,
Dscho

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Johannes Schindelin

Hi Ramsay,

On Thu, 3 May 2018, Ramsay Jones wrote:

> On 03/05/18 16:30, Johannes Schindelin wrote:
> > This builtin does not do a whole lot so far, apart from showing a usage
> > that is oddly similar to that of `git tbdiff`. And for a good reason:
> > the next commits will turn `branch-diff` into a full-blown replacement
> > for `tbdiff`.
> > 
> > At this point, we ignore tbdiff's color options, as they will all be
> > implemented later and require some patches to the diff machinery.
> > 
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  .gitignore|  1 +
> >  Makefile  |  1 +
> >  builtin.h |  1 +
> >  builtin/branch-diff.c | 40 
> >  command-list.txt  |  1 +
> >  git.c |  1 +
> >  6 files changed, 45 insertions(+)
> >  create mode 100644 builtin/branch-diff.c
> > 
> > diff --git a/.gitignore b/.gitignore
> > index 833ef3b0b78..1346a64492f 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -20,6 +20,7 @@
> >  /git-bisect--helper
> >  /git-blame
> >  /git-branch
> > +/git-branch-diff
> >  /git-bundle
> >  /git-cat-file
> >  /git-check-attr
> > diff --git a/Makefile b/Makefile
> > index 96f2e76a904..9b1984776d8 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -953,6 +953,7 @@ BUILTIN_OBJS += builtin/archive.o
> >  BUILTIN_OBJS += builtin/bisect--helper.o
> >  BUILTIN_OBJS += builtin/blame.o
> >  BUILTIN_OBJS += builtin/branch.o
> > +BUILTIN_OBJS += builtin/branch-diff.o
> >  BUILTIN_OBJS += builtin/bundle.o
> >  BUILTIN_OBJS += builtin/cat-file.o
> >  BUILTIN_OBJS += builtin/check-attr.o
> > diff --git a/builtin.h b/builtin.h
> > index 42378f3aa47..e1c4d2a529a 100644
> > --- a/builtin.h
> > +++ b/builtin.h
> > @@ -135,6 +135,7 @@ extern int cmd_archive(int argc, const char **argv, 
> > const char *prefix);
> >  extern int cmd_bisect__helper(int argc, const char **argv, const char 
> > *prefix);
> >  extern int cmd_blame(int argc, const char **argv, const char *prefix);
> >  extern int cmd_branch(int argc, const char **argv, const char *prefix);
> > +extern int cmd_branch_diff(int argc, const char **argv, const char 
> > *prefix);
> >  extern int cmd_bundle(int argc, const char **argv, const char *prefix);
> >  extern int cmd_cat_file(int argc, const char **argv, const char *prefix);
> >  extern int cmd_checkout(int argc, const char **argv, const char *prefix);
> > diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
> > new file mode 100644
> > index 000..97266cd326d
> > --- /dev/null
> > +++ b/builtin/branch-diff.c
> > @@ -0,0 +1,40 @@
> > +#include "cache.h"
> > +#include "parse-options.h"
> > +
> > +static const char * const builtin_branch_diff_usage[] = {
> > +   N_("git rebase--helper [] ( A..B C..D | A...B | base A B )"),
> 
> s/rebase--helper/branch-diff/

Whoops!

BTW funny side note: when I saw that you replied, I instinctively thought
"oh no, I forgot to mark a function as `static`!" ;-)

Thank you for helping me improve the patches,
Dscho

Re: [PATCH v2 3/3] {fetch,upload}-pack: support filter in protocol v2

2018-05-03 Thread Stefan Beller

Hi Jonathan,

On Tue, May 1, 2018 at 5:31 PM, Jonathan Tan  wrote:
> The fetch-pack/upload-pack protocol v2 was developed independently of
> the filter parameter (used in partial fetches), thus it did not include
> support for it. Add support for the filter parameter.
>
> Like in the legacy protocol, the server advertises and supports "filter"
> only if uploadpack.allowfilter is configured.
>
> Like in the legacy protocol, the client continues with a warning if
> "--filter" is specified, but the server does not advertise it.
>
> Signed-off-by: Jonathan Tan 

The whole series looks good,

Thanks,
Stefan

Re: [PATCH v2] git: add -P as a short option for --no-pager

2018-05-03 Thread Eric Sunshine

On Thu, May 3, 2018 at 1:15 PM, Johannes Sixt  wrote:
> It is possible to configure 'less', the pager, to use an alternate
> screen to show the content, for example, by setting LESS=RS in the
> environment. When it is closed in this configuration, it switches
> back to the original screen, and all content is gone.
>
> It is not uncommon to request that the output remains visible in
> the terminal. For this, the option --no-pager can be used. But
> it is a bit cumbersome to type, even when command completion is
> available. Provide a short option, -P, to make the option easier
> accessible.

s/easier accessible/easier to access/
--- or ---
s/easier accessible/more easily accessible/
--- or ---
s/easier accessible/more accessible/

The patch itself looks fine.

> Signed-off-by: Johannes Sixt

Re: [PATCH v2 0/5] Allocate cache entries from memory pool

2018-05-03 Thread Duy Nguyen

On Thu, May 3, 2018 at 7:21 PM, Stefan Beller  wrote:
> On Thu, May 3, 2018 at 9:35 AM, Duy Nguyen  wrote:
>> On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller  wrote:
>>> This patch series improves the performance of loading indexes by
>>> reducing the number of malloc() calls. Loading the index from disk is
>>> partly dominated by the time in malloc(), which is called for each
>>> index entry. This patch series reduces the number of times malloc() is
>>> called as part of loading the index, and instead allocates a block of
>>> memory upfront that is large enough to hold all of the cache entries,
>>> and chunks this memory itself. This change builds on [1].
>>
>> I have only looked at the mem-pool related patches to see if
>> mem-pool.c is good enough to replace alloc.c. To me, it's a "yes"
>> after we optimize mem_pool_alloc() a bit (not that performance really
>> matters in alloc.c case, but that may be because it's already
>> blazingly fast that we never noticed about it).
>
> alloc.c was not just about speed, but mostly about dense packing?
> 855419f764a (Add specialized object allocator, 2006-06-19)

I know. I vaguely remembered Linus made that change but did not really
look it up :) That reference should be included when/if you switch
from alloc.c to mem-pool.c.

> To me it is also a clear yes when it comes to combining these
> two memory pools.

I also did not notice that jm/mem-pool already landed in master. Have
you tried measure (both memory usage and allocation speed) of it and
alloc.c? Just take some big repo as an example and do count-objects -v
to see how many blobs/trees/commits it has, then allocate the same
amount with both alloc.c and mem-pool.c and measure both speed/mem.
I'm pretty sure you're right that mem-pool.c is a clear yes. I was
just being more conservative because we do (slightly) change
allocator's behavior when we make the switch. But it's also very
likely that any performance difference will be insignificant.

I'm asking this because if mem-pool.c is a clear winner, you can start
to update you series to use it now and kill alloc.c in the process.

PS. Is Jeff back yet? I'm sure Junio is listening and all but I'm
afraid he's too busy being a maintainer so Jeff's opinion in this area
is really valuable. He has all the fun and weird use cases to play
with at github.
-- 
Duy

Re: [PATCH v2 2/3] upload-pack: read config when serving protocol v2

2018-05-03 Thread Stefan Beller

On Tue, May 1, 2018 at 5:31 PM, Jonathan Tan  wrote:
> The upload-pack code paths never call git_config() with
> upload_pack_config() when protocol v2 is used, causing options like
> uploadpack.packobjectshook to not take effect. Ensure that this function
> is called.
>
> Signed-off-by: Jonathan Tan 
> ---
>  t/t5702-protocol-v2.sh | 14 ++
>  upload-pack.c  |  2 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index 56f7c3c32..0ead3 100755
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -201,6 +201,20 @@ test_expect_success 'ref advertisment is filtered during 
> fetch using protocol v2
> ! grep "refs/tags/three" log
>  '
>
> +test_expect_success 'upload-pack respects config using protocol v2' '
> +   git init server &&
> +   write_script server/.git/hook <<-\EOF &&
> +   touch hookout
> +   "$@"
> +   EOF
> +   test_commit -C server one &&
> +
> +   test_config_global uploadpack.packobjectshook ./hook &&
> +   test ! -f server/.git/hookout &&

test_path_is_missing ?


> +   GIT_TRACE=/tmp/y git -c protocol.version=2 clone 
> "file://$(pwd)/server" client &&

Why do we redirect GIT_TRACE outside the test suite? do we read that
back or want to read it out of the hook?

Is it possible to redirect to  /$(pwd)/trace or such?

> +   test -f server/.git/hookout

test_path_is_file ?


> +'
> +
>  # Test protocol v2 with 'http://' transport
>  #
>  . "$TEST_DIRECTORY"/lib-httpd.sh
> diff --git a/upload-pack.c b/upload-pack.c
> index c4456bb88..113edd32d 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -1376,6 +1376,8 @@ int upload_pack_v2(struct repository *r, struct 
> argv_array *keys,
> enum fetch_state state = FETCH_PROCESS_ARGS;
> struct upload_pack_data data;
>
> +   git_config(upload_pack_config, NULL);
> +
> upload_pack_data_init();
> use_sideband = LARGE_PACKET_MAX;
>
> --
> 2.17.0.441.gb46fe60e1d-goog
>

Re: [PATCH v2 1/3] upload-pack: fix error message typo

2018-05-03 Thread Stefan Beller

On Tue, May 1, 2018 at 5:31 PM, Jonathan Tan  wrote:
> Fix a typo in an error message.
>
> Also, this line was introduced in 3145ea957d2c ("upload-pack: introduce
> fetch server command", 2018-03-15), which did not contain a test for the
> case which causes this error to be printed, so introduce a test.
>
> Signed-off-by: Jonathan Tan 

This is Reviewed-by: Stefan Beller 

> ---
>  t/t5701-git-serve.sh | 14 ++
>  upload-pack.c|  2 +-
>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> index 72d7bc562..1b4b13cc2 100755
> --- a/t/t5701-git-serve.sh
> +++ b/t/t5701-git-serve.sh
> @@ -173,4 +173,18 @@ test_expect_success 'symrefs parameter' '
> test_cmp actual expect
>  '
>
> +test_expect_success 'unexpected lines are not allowed in fetch request' '
> +   git init server &&
> +
> +   test-pkt-line pack >in <<-EOF &&
> +   command=fetch
> +   0001
> +   this-is-not-a-command
> +   
> +   EOF
> +
> +   test_must_fail git -C server serve --stateless-rpc /dev/null 
> 2>err &&

Minor nit:
Why do we pipe stdout to /dev/null ?

[PATCH v3 5/7] git-svn: remove ''--add-author-from' for 'commit-diff'

2018-05-03 Thread Andreas Heiduk

The subcommand 'commit-diff' does not support the option
'--add-author-from'.

Signed-off-by: Andreas Heiduk 
Signed-off-by: Eric Wong 
---
 Documentation/git-svn.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index d59379ee23..e9615951d2 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -707,7 +707,7 @@ creating the branch or tag.
 config key: svn.useLogAuthor
 
 --add-author-from::
-   When committing to svn from Git (as part of 'commit-diff', 'set-tree' 
or 'dcommit'
+   When committing to svn from Git (as part of 'set-tree' or 'dcommit'
operations), if the existing log message doesn't already have a
`From:` or `Signed-off-by:` line, append a `From:` line based on the
Git commit's author string.  If you use this, then `--use-log-author`
-- 
2.16.2

[PATCH v3 3/7] doc: clarify ignore rules for git ls-files

2018-05-03 Thread Andreas Heiduk

Explain that `git ls-files --ignored` requires at least one
of the `--exclude*` options to do its job.

Signed-off-by: Andreas Heiduk 
---
 Documentation/git-ls-files.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 3ac3e3a77d..f3474b2ede 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -53,7 +53,8 @@ OPTIONS
Show only ignored files in the output. When showing files in the
index, print only those matched by an exclude pattern. When
showing "other" files, show only those matched by an exclude
-   pattern.
+   pattern. Standard ignore rules are not automatically activated,
+   therefore at least one of the `--exclude*` options is required.
 
 -s::
 --stage::
-- 
2.16.2

[PATCH v3 6/7] doc: add note about shell quoting to revision.txt

2018-05-03 Thread Andreas Heiduk

Signed-off-by: Andreas Heiduk 
Reviewed-by: Junio C Hamano 
---
 Documentation/revisions.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
index dfcc49c72c..e760416d07 100644
--- a/Documentation/revisions.txt
+++ b/Documentation/revisions.txt
@@ -7,6 +7,10 @@ syntax.  Here are various ways to spell object names.  The
 ones listed near the end of this list name trees and
 blobs contained in a commit.
 
+NOTE: This document shows the "raw" syntax as seen by git. The shell
+and other UIs might require additional quoting to protect special
+characters and to avoid word splitting.
+
 '', e.g. 'dae86e1950b1277e545cee180551750029cfe735', 'dae86e'::
   The full SHA-1 object name (40-byte hexadecimal string), or
   a leading substring that is unique within the repository.
@@ -186,6 +190,8 @@ existing tag object.
   is matched. ':/!-foo' performs a negative match, while ':/!!foo' matches a
   literal '!' character, followed by 'foo'. Any other sequence beginning with
   ':/!' is reserved for now.
+  Depending on the given text, the shell's word splitting rules might
+  require additional quoting.
 
 ':', e.g. 'HEAD:README', ':README', 'master:./README'::
   A suffix ':' followed by a path names the blob or tree
-- 
2.16.2

[PATCH v3 4/7] doc: add '-d' and '-o' for 'git push'

2018-05-03 Thread Andreas Heiduk

Add the missing `-o` shortcut for `--push-option` to the synopsis.
Add the missing `-d` shortcut for `--delete` in the main section.

Signed-off-by: Andreas Heiduk 
Reviewed-by: Martin Ågren 
---
 Documentation/git-push.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
index 5b08302fc2..f2bbda6e32 100644
--- a/Documentation/git-push.txt
+++ b/Documentation/git-push.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git push' [--all | --mirror | --tags] [--follow-tags] [--atomic] [-n | 
--dry-run] [--receive-pack=]
   [--repo=] [-f | --force] [-d | --delete] [--prune] [-v | 
--verbose]
-  [-u | --set-upstream] [--push-option=]
+  [-u | --set-upstream] [-o  | --push-option=]
   [--[no-]signed|--signed=(true|false|if-asked)]
   [--force-with-lease[=[:]]]
   [--no-verify] [ [...]]
@@ -123,6 +123,7 @@ already exists on the remote side.
will be tab-separated and sent to stdout instead of stderr.  The full
symbolic names of the refs will be given.
 
+-d::
 --delete::
All listed refs are deleted from the remote repository. This is
the same as prefixing all refs with a colon.
-- 
2.16.2

[PATCH v3 7/7] doc: normalize [--options] to [options] in git-diff

2018-05-03 Thread Andreas Heiduk

SYNOPSIS and other manuals use [options] but DESCRIPTION
used [--options].

Signed-off-by: Andreas Heiduk 
---
 Documentation/git-diff.txt | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 6593b58299..7c2c442700 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -21,7 +21,7 @@ Show changes between the working tree and the index or a 
tree, changes
 between the index and a tree, changes between two trees, changes between
 two blob objects, or changes between two files on disk.
 
-'git diff' [--options] [--] [...]::
+'git diff' [options] [--] [...]::
 
This form is to view the changes you made relative to
the index (staging area for the next commit).  In other
@@ -29,7 +29,7 @@ two blob objects, or changes between two files on disk.
further add to the index but you still haven't.  You can
stage these changes by using linkgit:git-add[1].
 
-'git diff' [--options] --no-index [--]  ::
+'git diff' [options] --no-index [--]  ::
 
This form is to compare the given two paths on the
filesystem.  You can omit the `--no-index` option when
@@ -38,7 +38,7 @@ two blob objects, or changes between two files on disk.
or when running the command outside a working tree
controlled by Git.
 
-'git diff' [--options] --cached [] [--] [...]::
+'git diff' [options] --cached [] [--] [...]::
 
This form is to view the changes you staged for the next
commit relative to the named .  Typically you
@@ -48,7 +48,7 @@ two blob objects, or changes between two files on disk.
 is not given, it shows all staged changes.
--staged is a synonym of --cached.
 
-'git diff' [--options]  [--] [...]::
+'git diff' [options]  [--] [...]::
 
This form is to view the changes you have in your
working tree relative to the named .  You can
@@ -56,18 +56,18 @@ two blob objects, or changes between two files on disk.
branch name to compare with the tip of a different
branch.
 
-'git diff' [--options]   [--] [...]::
+'git diff' [options]   [--] [...]::
 
This is to view the changes between two arbitrary
.
 
-'git diff' [--options] .. [--] [...]::
+'git diff' [options] .. [--] [...]::
 
This is synonymous to the previous form.  If  on
one side is omitted, it will have the same effect as
using HEAD instead.
 
-'git diff' [--options] \... [--] [...]::
+'git diff' [options] \... [--] [...]::
 
This form is to view the changes on the branch containing
and up to the second , starting at a common ancestor
-- 
2.16.2

[PATCH v3 2/7] doc: align 'diff --no-index' in text and synopsis

2018-05-03 Thread Andreas Heiduk

Make the two '' parameters in DESCRIPTION mandatory and
move the `--options` part to the same place where the other
variants show them. And finally make `--no-index` in SYNOPSIS
as mandatory as in DESCRIPTION.

Signed-off-by: Andreas Heiduk 
Reviewed-by: Martin Ågren 
---
 Documentation/git-diff.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index b0c1bb95c8..6593b58299 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -13,7 +13,7 @@ SYNOPSIS
 'git diff' [options] --cached [] [--] [...]
 'git diff' [options]   [--] [...]
 'git diff' [options]  
-'git diff' [options] [--no-index] [--]  
+'git diff' [options] --no-index [--]  
 
 DESCRIPTION
 ---
@@ -29,7 +29,7 @@ two blob objects, or changes between two files on disk.
further add to the index but you still haven't.  You can
stage these changes by using linkgit:git-add[1].
 
-'git diff' --no-index [--options] [--] [...]::
+'git diff' [--options] --no-index [--]  ::
 
This form is to compare the given two paths on the
filesystem.  You can omit the `--no-index` option when
-- 
2.16.2

[PATCH v3 1/7] doc: improve formatting in githooks.txt

2018-05-03 Thread Andreas Heiduk

Typeset commands and similar things with as `git foo` instead of
'git foo' or 'git-foo' and add linkgit to the commands which run
the hooks.

Signed-off-by: Andreas Heiduk 
Reviewed-by: Martin Ågren 
---
 Documentation/githooks.txt | 115 +++--
 1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index f877f7b7cd..e3c283a174 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -31,7 +31,7 @@ Hooks can get their arguments via the environment, 
command-line
 arguments, and stdin. See the documentation for each hook below for
 details.
 
-'git init' may copy hooks to the new repository, depending on its
+`git init` may copy hooks to the new repository, depending on its
 configuration. See the "TEMPLATE DIRECTORY" section in
 linkgit:git-init[1] for details. When the rest of this document refers
 to "default hooks" it's talking about the default template shipped
@@ -45,9 +45,9 @@ HOOKS
 applypatch-msg
 ~~
 
-This hook is invoked by 'git am'.  It takes a single
+This hook is invoked by linkgit:git-am[1].  It takes a single
 parameter, the name of the file that holds the proposed commit
-log message.  Exiting with a non-zero status causes 'git am' to abort
+log message.  Exiting with a non-zero status causes `git am` to abort
 before applying the patch.
 
 The hook is allowed to edit the message file in place, and can
@@ -61,7 +61,7 @@ The default 'applypatch-msg' hook, when enabled, runs the
 pre-applypatch
 ~~
 
-This hook is invoked by 'git am'.  It takes no parameter, and is
+This hook is invoked by linkgit:git-am[1].  It takes no parameter, and is
 invoked after the patch is applied, but before a commit is made.
 
 If it exits with non-zero status, then the working tree will not be
@@ -76,33 +76,33 @@ The default 'pre-applypatch' hook, when enabled, runs the
 post-applypatch
 ~~~
 
-This hook is invoked by 'git am'.  It takes no parameter,
+This hook is invoked by linkgit:git-am[1].  It takes no parameter,
 and is invoked after the patch is applied and a commit is made.
 
 This hook is meant primarily for notification, and cannot affect
-the outcome of 'git am'.
+the outcome of `git am`.
 
 pre-commit
 ~~
 
-This hook is invoked by 'git commit', and can be bypassed
+This hook is invoked by linkgit:git-commit[1], and can be bypassed
 with the `--no-verify` option.  It takes no parameters, and is
 invoked before obtaining the proposed commit log message and
 making a commit.  Exiting with a non-zero status from this script
-causes the 'git commit' command to abort before creating a commit.
+causes the `git commit` command to abort before creating a commit.
 
 The default 'pre-commit' hook, when enabled, catches introduction
 of lines with trailing whitespaces and aborts the commit when
 such a line is found.
 
-All the 'git commit' hooks are invoked with the environment
+All the `git commit` hooks are invoked with the environment
 variable `GIT_EDITOR=:` if the command will not bring up an editor
 to modify the commit message.
 
 prepare-commit-msg
 ~~
 
-This hook is invoked by 'git commit' right after preparing the
+This hook is invoked by linkgit:git-commit[1] right after preparing the
 default log message, and before the editor is started.
 
 It takes one to three parameters.  The first is the name of the file
@@ -114,7 +114,7 @@ commit is a merge or a `.git/MERGE_MSG` file exists); 
`squash`
 (if a `.git/SQUASH_MSG` file exists); or `commit`, followed by
 a commit SHA-1 (if a `-c`, `-C` or `--amend` option was given).
 
-If the exit status is non-zero, 'git commit' will abort.
+If the exit status is non-zero, `git commit` will abort.
 
 The purpose of the hook is to edit the message file in place, and
 it is not suppressed by the `--no-verify` option.  A non-zero exit
@@ -127,7 +127,7 @@ help message found in the commented portion of the commit 
template.
 commit-msg
 ~~
 
-This hook is invoked by 'git commit' and 'git merge', and can be
+This hook is invoked by linkgit:git-commit[1] and linkgit:git-merge[1], and 
can be
 bypassed with the `--no-verify` option.  It takes a single parameter,
 the name of the file that holds the proposed commit log message.
 Exiting with a non-zero status causes the command to abort.
@@ -143,16 +143,16 @@ The default 'commit-msg' hook, when enabled, detects 
duplicate
 post-commit
 ~~~
 
-This hook is invoked by 'git commit'. It takes no parameters, and is
+This hook is invoked by linkgit:git-commit[1]. It takes no parameters, and is
 invoked after a commit is made.
 
 This hook is meant primarily for notification, and cannot affect
-the outcome of 'git commit'.
+the outcome of `git commit`.
 
 pre-rebase
 ~~
 
-This hook is called by 'git rebase' and can be used to prevent a
+This hook is called by

[PATCH v3 0/7] Some doc-fixes

2018-05-03 Thread Andreas Heiduk

Changes since the last reroll:

- Better commit comment for "doc: align 'diff --no-index' in text and synopsis"
  This includes Martin's `s/with/and/` comment.
- Eric's typo fix in "doc: add note about shell quoting to revision.txt"
- Added new patch for git-diff.txt with s/--options/options/.
  This addresses Eric's and Martin's comments.
  

Andreas Heiduk (7):
  doc: improve formatting in githooks.txt
  doc: align 'diff --no-index' in text and synopsis
  doc: clarify ignore rules for git ls-files
  doc: add '-d' and '-o' for 'git push'
  git-svn: remove ''--add-author-from' for 'commit-diff'
  doc: add note about shell quoting to revision.txt
  doc: normalize [--options] to [options] in git-diff

 Documentation/git-diff.txt |  16 +++---
 Documentation/git-ls-files.txt |   3 +-
 Documentation/git-push.txt |   3 +-
 Documentation/git-svn.txt  |   2 +-
 Documentation/githooks.txt | 115 +
 Documentation/revisions.txt|   6 +++
 6 files changed, 77 insertions(+), 68 deletions(-)

-- 
2.16.2

Re: [PATCH v5 09/11] commit: use generation number in remove_redundant()

2018-05-03 Thread Jakub Narebski

Derrick Stolee  writes:

> The static remove_redundant() method is used to filter a list
> of commits by removing those that are reachable from another
> commit in the list. This is used to remove all possible merge-
> bases except a maximal, mutually independent set.
>
> To determine these commits are independent, we use a number of
> paint_down_to_common() walks and use the PARENT1, PARENT2 flags
> to determine reachability. Since we only care about reachability
> and not the full set of merge-bases between 'one' and 'twos', we
> can use the 'min_generation' parameter to short-circuit the walk.
>
> When no commit-graph exists, there is no change in behavior.
>
> For a copy of the Linux repository, we measured the following
> performance improvements:
>
> git merge-base v3.3 v4.5
>
> Before: 234 ms
>  After: 208 ms
>  Rel %: -11%
>
> git merge-base v4.3 v4.5
>
> Before: 102 ms
>  After:  83 ms
>  Rel %: -19%
>
> The experiments above were chosen to demonstrate that we are
> improving the filtering of the merge-base set. In the first
> example, more time is spent walking the history to find the
> set of merge bases before the remove_redundant() call. The
> starting commits are closer together in the second example,
> therefore more time is spent in remove_redundant(). The relative
> change in performance differs as expected.
>
> Reported-by: Jakub Narebski 
> Signed-off-by: Derrick Stolee 

Good description.

> ---
>  commit.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>

Let me extend context a bit to make it easier to review.

> diff --git a/commit.c b/commit.c
> index 9875feec01..5064db4e61 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -949,6 +949,7 @@ static int remove_redundant(struct commit **array, int 
> cnt)
>   parse_commit(array[i]);
>   for (i = 0; i < cnt; i++) {
>   struct commit_list *common;
> + uint32_t min_generation = GENERATION_NUMBER_INFINITY;

As you have noticed, and how it is already fixed in 'pu' it should be

  + uint32_t min_generation = array[i]->generation;

>  
>   if (redundant[i])
>   continue;
> @@ -957,8 +958,12 @@ static int remove_redundant(struct commit **array, int 
> cnt)
>   continue;
>   filled_index[filled] = j;
>   work[filled++] = array[j];
> +
> + if (array[j]->generation < min_generation)
> + min_generation = array[j]->generation;

remove_redundant() checks if i-th commit is reachable from commits
i+1..cnt, and vice versa - via checking PARENT1 and PARENT2 flag,
respectively.

As you have noticed this means that the min_generation cutoff should be
minimum of array[i]->generation, and all of array[j]->generation for
j=i+1..cnt.  There is no reason going further down if we are interested
only in reachability, and not actually in merge bases.

>   }
> - common = paint_down_to_common(array[i], filled, work, 0);
> + common = paint_down_to_common(array[i], filled, work,
> +   min_generation);
>   if (array[i]->object.flags & PARENT2)
>   redundant[i] = 1;
>   for (j = 0; j < filled; j++)
if (work[j]->object.flags & PARENT1)
redundant[filled_index[j]] = 1;

Beside this issue, nice and simple speedup.  Good.

Best,
-- 
Jakub Narębski

Re: [PATCH 1/5] submodule foreach: correct '$path' in nested submodules from a subdirectory

2018-05-03 Thread Stefan Beller

On Thu, May 3, 2018 at 10:47 AM, Jonathan Tan  wrote:
> On Wed,  2 May 2018 17:53:54 -0700
> Stefan Beller  wrote:
>
>> From: Prathamesh Chavan 
>>
>> When running 'git submodule foreach --recursive' from a subdirectory of
>> your repository, nested submodules get a bogus value for $path:
>
> I know I said in a previous e-mail [1] that we should use $path instead
> of $sm_path, but now I got confused because the test shows a difference
> in $sm_path, not $path. Maybe add "(and $sm_path, which is an alias of
> $path)".
>
> [1] 
> https://public-inbox.org/git/20180206145406.b759164cead02cd3bb3fd...@google.com/
>
>> There are three different possible solutions that have more value:
>> (a) The path value is documented as the path from the toplevel of the
>> superproject to the mount point of the submodule. If 'the' refers to
>> the superproject holding this submodule ('sub' holding 'nested'),
>> the path would be expected to be path='nested'.
>
> What is "the", and why is it quoted?

because it is unclear what is emphasizes.
It could be the intermediate (direct) superproject, or it
could be the topmost superproject (where the command was run from).

Just having "the", makes it unclear which of both it refers to.

> Also, in (c), you use the same indicative present tense as "The path
> value is documented" to describe the current situation, whereas this
> seems like a situation you're proposing. It would be clearer to use the
> imperative here ("Document $path to be the path from the toplevel...").
> Do the same for the others.
>
> Also, whenever you mention "superproject", make it clear which
> superproject you're referring to ("outermost superproject" and
> "immediate superproject" seem like good terms to me).

ok, I'll rewrite the commit message with clearer superproject
annotations.

>> (b) In case 'the' superproject is referring to the toplevel, which
>> is the superproject in which the original command was invoked,
>> then path is expected to be path='sub/nested'.
>
> Same comment about "the", and I think s/toplevel, which is the
> superproject in which the original command was invoked/outermost
> superproject/.

The outermost superproject may not be the one you invoke the
command in.

We have
* the direct superproject. You can access it currently via $toplevel,
  which is misleading
* the superproject, where the command was invoked from,
  Currently only the undocumented $displaypath gives hints
  at this
* the outermost superproject, which may be even further
  out than the previous superproject in this list; Consider
  a layout with 4 git repositories nested as follows:

outmost/invoked/direct/submodule

Submodule is part of the superproject "direct", but the command
could have been invoked from outmost/invoked/dir which has "direct"
as a submodule at '../direct' and itself is a submodule of outmost.

IMHO 'outmost' is of no relevance to the discussion. If you care about
it, discover it yourself via repeated calls to
'git rev-parse --show-superproject-working-tree'

'invoked' is only interesting for $displaypath, but apart from that
there is no benefit in knowing that at the time of processing
'submodule'. (It doesn't contain information for submodule, as
all those configs are in 'direct')

'direct' is a better name than 'toplevel', which is confusing as it
could be understood as any of 'direct', 'invoked' or 'outmost'.

>> (c) The documentation explains $path as [...] "relative to the
>> superproject", following 091a6eb0fe (submodule: drop the
>> top-level requirement, 2013-06-16), such that the nested submodule
>> would be expected as path='../sub/nested', when "the" superproject
>> is the superproject, where the command was run from
>
> How does "relative to the superproject" result in "../" appearing in the
> path? I would expect "../" to appear only if a path is relative to $pwd.

Gah. I messed that up. I wanted to emphasize *relative* and not the
superproject that is merely mentioned to form a sentence there.

>
>> (d) or the value of path='nested' is expected if we take the
>> intermediate superproject into account. [This is the same as
>> (a); it highlights that the documentation is not clear, but
>> technically correct if we were to revert 091a6eb0fe.]
>
> How exactly are we taking the intermediate superproject into account?

'nested' is the full in-tree path from the intermediate (direct) superproject
to that submodule.

>> The behavior for (c) was introduced in 091a6eb0fe (submodule: drop the
>> top-level requirement, 2013-06-16) the intent for $path seemed to be
>> relative to $cwd to the submodule worktree, but that did not work for
>> nested submodules, as the intermittent submodules were not included in
>> the path.
>
> I think "pwd" is more used in the Git project than "cwd", so maybe use
> $pwd here.

ok.

>
>> If we were to fix the meaning of the $path using (a), (d)

Re: [PATCH 00/18] Add `branch-diff`, a `tbdiff` lookalike

2018-05-03 Thread Ævar Arnfjörð Bjarmason


On Thu, May 03 2018, Johannes Schindelin wrote:

> The incredibly useful `git-tbdiff` tool to compare patch series (say, to see
> what changed between two iterations sent to the Git mailing list) is slightly
> less useful for this developer due to the fact that it requires the 
> `hungarian`
> and `numpy` Python packages which are for some reason really hard to build in
> MSYS2. So hard that I even had to give up, because it was simply easier to
> reimplement the whole shebang as a builtin command.
>
> The project at https://github.com/trast/tbdiff seems to be dormant, anyway.
> Funny (and true) story: I looked at the open Pull Requests to see how active
> that project is, only to find to my surprise that I had submitted one in 
> August
> 2015, and that it was still unanswered let alone merged.

I've been using branch-diff and haven't found issues with it yet, it
works like tbdiff but better. Faster, uses the same diff as git
(better), and spews to the pager by default.

Re: [PATCH 5/5] submodule: port submodule subcommand 'foreach' from shell to C

2018-05-03 Thread Jonathan Tan

On Wed,  2 May 2018 17:53:58 -0700
Stefan Beller  wrote:

> + argv_array_pushf(, "path=%s; %s",
> + path, info->argv[0]);

Do we need to quote the path here? (For example, what if path has a
quotation mark?)

Also, would it be useful to have a test testing a submodule with a
"weird" character (e.g. greater-than or single quote) in the name and
the path?

Re: [PATCH 3/5] submodule foreach: clarify the '$toplevel' variable documentation

2018-05-03 Thread Jonathan Tan

On Wed,  2 May 2018 17:53:56 -0700
Stefan Beller  wrote:

>   $name is the name of the relevant submodule section in `.gitmodules`,
>   $sm_path is the path of the submodule as recorded in the superproject,
>   $sha1 is the commit as recorded in the superproject, and
> - $toplevel is the absolute path to the top-level of the superproject.
> + $toplevel is the absolute path to its direct superproject, such that
> + $toplevel/$sm_path is the absolute path of the submodule.
>   Note that to avoid conflicts with '$PATH' on Windows, the '$path'
>   variable is now a deprecated synonym of '$sm_path' variable.
>   Any submodules defined in the superproject but not checked out are

Ah, I commented on patch 2 before reading patch 3 properly. I think that
if you follow my suggestions on patch 2, it will be obvious that
$toplevel/$sm_path is the absolute path of the submodule, so that this
patch is no longer needed.

Re: [PATCH 2/5] submodule foreach: document '$sm_path' instead of '$path'

2018-05-03 Thread Jonathan Tan

On Wed,  2 May 2018 17:53:55 -0700
Stefan Beller  wrote:

>  foreach [--recursive] ::
>   Evaluates an arbitrary shell command in each checked out submodule.
> - The command has access to the variables $name, $path, $sha1 and
> + The command has access to the variables $name, $sm_path, $sha1 and
>   $toplevel:
>   $name is the name of the relevant submodule section in `.gitmodules`,
> - $path is the name of the submodule directory relative to the
> - superproject, $sha1 is the commit as recorded in the superproject,
> - and $toplevel is the absolute path to the top-level of the superproject.
> + $sm_path is the path of the submodule as recorded in the superproject,
> + $sha1 is the commit as recorded in the superproject, and
> + $toplevel is the absolute path to the top-level of the superproject.
> + Note that to avoid conflicts with '$PATH' on Windows, the '$path'
> + variable is now a deprecated synonym of '$sm_path' variable.
>   Any submodules defined in the superproject but not checked out are
>   ignored by this command. Unless given `--quiet`, foreach prints the name
>   of each submodule before evaluating the command.

This patch is fine as-is. I would go further and replace all mentions of
"the superproject" to "its immediate superproject", but that is
optional.

Re: [PATCH 1/5] submodule foreach: correct '$path' in nested submodules from a subdirectory

2018-05-03 Thread Jonathan Tan

On Wed,  2 May 2018 17:53:54 -0700
Stefan Beller  wrote:

> From: Prathamesh Chavan 
> 
> When running 'git submodule foreach --recursive' from a subdirectory of
> your repository, nested submodules get a bogus value for $path:

I know I said in a previous e-mail [1] that we should use $path instead
of $sm_path, but now I got confused because the test shows a difference
in $sm_path, not $path. Maybe add "(and $sm_path, which is an alias of
$path)".

[1] 
https://public-inbox.org/git/20180206145406.b759164cead02cd3bb3fd...@google.com/

> There are three different possible solutions that have more value:
> (a) The path value is documented as the path from the toplevel of the
> superproject to the mount point of the submodule. If 'the' refers to
> the superproject holding this submodule ('sub' holding 'nested'),
> the path would be expected to be path='nested'.

What is "the", and why is it quoted?

Also, in (c), you use the same indicative present tense as "The path
value is documented" to describe the current situation, whereas this
seems like a situation you're proposing. It would be clearer to use the
imperative here ("Document $path to be the path from the toplevel...").
Do the same for the others.

Also, whenever you mention "superproject", make it clear which
superproject you're referring to ("outermost superproject" and
"immediate superproject" seem like good terms to me).

> (b) In case 'the' superproject is referring to the toplevel, which
> is the superproject in which the original command was invoked,
> then path is expected to be path='sub/nested'.

Same comment about "the", and I think s/toplevel, which is the
superproject in which the original command was invoked/outermost
superproject/.

> (c) The documentation explains $path as [...] "relative to the
> superproject", following 091a6eb0fe (submodule: drop the
> top-level requirement, 2013-06-16), such that the nested submodule
> would be expected as path='../sub/nested', when "the" superproject
> is the superproject, where the command was run from

How does "relative to the superproject" result in "../" appearing in the
path? I would expect "../" to appear only if a path is relative to $pwd.

> (d) or the value of path='nested' is expected if we take the
> intermediate superproject into account. [This is the same as
> (a); it highlights that the documentation is not clear, but
> technically correct if we were to revert 091a6eb0fe.]

How exactly are we taking the intermediate superproject into account?

> The behavior for (c) was introduced in 091a6eb0fe (submodule: drop the
> top-level requirement, 2013-06-16) the intent for $path seemed to be
> relative to $cwd to the submodule worktree, but that did not work for
> nested submodules, as the intermittent submodules were not included in
> the path.

I think "pwd" is more used in the Git project than "cwd", so maybe use
$pwd here.

> If we were to fix the meaning of the $path using (a), (d) such that "path"
> is "the path from the intermediate superproject to the mount point of the
> submodule", we would break any existing submodule user that runs foreach
> from non-root of the superproject as the non-nested submodule
> '../sub' would change its path to 'sub'.
> 
> If we were to fix the meaning of $path using (b) such that "path"
> is "the path from the topmost superproject to the mount point of the
> submodule", then we would break any user that uses nested submodules
> (even from the root directory) as the 'nested' would become 'sub/nested'.
> 
> If we were to fix the meaning of $path using (c) such that "path"
> is "the display path from where the original command was invoked to the
> submodule", then we would break users that rely on the assumption that
> "$toplevel / $path" is the absolute path of the submodule.
> 
> All groups can be found in the wild.  The author has no data if one group
> outweighs the other by large margin, and offending each one seems equally
> bad at first.  However in the authors imagination it is better to go with
> (a) as running from a sub directory sounds like it is carried out by a
> human rather than by some automation task.  With a human on the keyboard
> the feedback loop is short and the changed behavior can be adapted to
> quickly unlike some automation that can break silently.

As I said in my previous e-mail, this is a good analysis.

> Another argument in favor of (a) is the consistency of the variables
> provided, "$toplevel / $path" gives the absolute path of the submodule,
> with 'toplevel' (both the variable as well as the documentation) referring
> to the immediate superproject of the submodule.

It's confusing to me that $toplevel is not the path to the outermost
superproject, but the path to the immediate superproject, so I'm not
sure that the goodness of "$toplevel/$path" exists. I would omit this
paragraph.

> Documentation of the variable is adjusted in a

Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-03 Thread Duy Nguyen

On Thu, May 3, 2018 at 7:24 PM, Stefan Beller  wrote:
>>> +void clear_alloc_state(struct alloc_state *s)
>>> +{
>>> +   while (s->slab_nr > 0) {
>>> +   s->slab_nr--;
>>> +   free(s->slabs[s->slab_nr]);
>>
>> I think you're leaking memory here. Commit and tree objects may have
>> more allocations in them (especially trees, but I think we have
>> commit_list in struct commit too). Those need to be freed as well.
>
> I would think that tree and commit memory will be free'd in the
> parsed_objects release function? (TODO: I need to add it over there)

What release function? I know tree->buffer is often released since you
don't want to keep much in memory. But the user should not be required
to release all objects before calling object_parser_clear(). For
struct commit, I'm pretty we make the commit_list (for commit parents)
and never free. Now we need to do it, somewhere.

Oh you mean object_parser_clear() as release function? Yes. I've been
wondering if it makes more sense to go through obj_hash[] and release
objects before free(obj_hash) than doing it here. It's probably better
to do it there since obj_hash[] would contain the very last pointers
to these objects and look like the right place to release resources.

> Thanks,
> Stefan
-- 
Duy

Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-03 Thread Stefan Beller

On Wed, May 2, 2018 at 1:50 PM, Jonathan Tan  wrote:
> On Tue,  1 May 2018 14:34:03 -0700
> Stefan Beller  wrote:
>
>> +void *allocate_alloc_state(void)
>> +{
>> + return xcalloc(1, sizeof(struct alloc_state));
>> +}
>> +
>> +void clear_alloc_state(struct alloc_state *s)
>> +{
>> + while (s->slab_nr > 0) {
>> + s->slab_nr--;
>> + free(s->slabs[s->slab_nr]);
>> + }
>> +}
>
> These functions are asymmetrical. I understand why it is this way
> (because we use pointers, and we want to use FREE_AND_NULL), but would
> still prefer _INIT and _release().
>
>>  static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>>  {
>>   void *ret;
>> @@ -45,54 +63,56 @@ static inline void *alloc_node(struct alloc_state *s, 
>> size_t node_size)
>>   ret = s->p;
>>   s->p = (char *)s->p + node_size;
>>   memset(ret, 0, node_size);
>> +
>> + ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
>> + s->slabs[s->slab_nr++] = ret;
>> +
>>   return ret;
>>  }
>
> This unconditionally grows the slabs for each node allocation. Shouldn't
> more than one node fit in each slab?

Yes. I'll go with Duy's idea and make it a linked list by using the first
object as a pointer to the next slab.

>
>> +extern struct alloc_state the_repository_blob_state;
>> +extern struct alloc_state the_repository_tree_state;
>> +extern struct alloc_state the_repository_commit_state;
>> +extern struct alloc_state the_repository_tag_state;
>> +extern struct alloc_state the_repository_object_state;
>
> (Context: these were in alloc.h)
>
> These seem to be used only in object.c, and only in one function - could
> we declare them "static" inside that function instead?

ok

>
>> -struct object_parser *object_parser_new(void)
>> +struct object_parser *object_parser_new(int is_the_repo)
>>  {
>>   struct object_parser *o = xmalloc(sizeof(*o));
>>   memset(o, 0, sizeof(*o));
>> +
>> + if (is_the_repo) {
>> + o->blob_state = _repository_blob_state;
>> + o->tree_state = _repository_tree_state;
>> + o->commit_state = _repository_commit_state;
>> + o->tag_state = _repository_tag_state;
>> + o->object_state = _repository_object_state;
>> + } else {
>> + o->blob_state = allocate_alloc_state();
>> + o->tree_state = allocate_alloc_state();
>> + o->commit_state = allocate_alloc_state();
>> + o->tag_state = allocate_alloc_state();
>> + o->object_state = allocate_alloc_state();
>> + }
>>   return o;
>>  }
>
> I don't think saving 5 allocations is worth the code complexity (of the
> additional parameter).

Ok, I'll remove this overhead.

Thanks,
Stefan

Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions

2018-05-03 Thread Stefan Beller

On Wed, May 2, 2018 at 10:44 AM, Duy Nguyen  wrote:
> On Tue, May 1, 2018 at 11:34 PM, Stefan Beller  wrote:
>>  #include "cache.h"
>>  #include "object.h"
>> @@ -30,8 +31,25 @@ struct alloc_state {
>> int count; /* total number of nodes allocated */
>> int nr;/* number of nodes left in current allocation */
>> void *p;   /* first free node in current allocation */
>> +
>> +   /* bookkeeping of allocations */
>> +   void **slabs;
>
> Another way to manage this is linked list: you could reserve one
> "object" in each slab to store the "next" (or "prev") pointer to
> another slab, then you can just walk through all slabs and free. It's
> a bit cheaper than reallocating slabs[], but I guess we reallocate so
> few times that readability matters more (whichever way is chosen).

This is a good idea. I'll do so in a resend.

>> +void clear_alloc_state(struct alloc_state *s)
>> +{
>> +   while (s->slab_nr > 0) {
>> +   s->slab_nr--;
>> +   free(s->slabs[s->slab_nr]);
>
> I think you're leaking memory here. Commit and tree objects may have
> more allocations in them (especially trees, but I think we have
> commit_list in struct commit too). Those need to be freed as well.

I would think that tree and commit memory will be free'd in the
parsed_objects release function? (TODO: I need to add it over there)

Thanks,
Stefan

Re: [PATCH v2 0/5] Allocate cache entries from memory pool

2018-05-03 Thread Stefan Beller

On Thu, May 3, 2018 at 9:35 AM, Duy Nguyen  wrote:
> On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller  wrote:
>> This patch series improves the performance of loading indexes by
>> reducing the number of malloc() calls. Loading the index from disk is
>> partly dominated by the time in malloc(), which is called for each
>> index entry. This patch series reduces the number of times malloc() is
>> called as part of loading the index, and instead allocates a block of
>> memory upfront that is large enough to hold all of the cache entries,
>> and chunks this memory itself. This change builds on [1].
>
> I have only looked at the mem-pool related patches to see if
> mem-pool.c is good enough to replace alloc.c. To me, it's a "yes"
> after we optimize mem_pool_alloc() a bit (not that performance really
> matters in alloc.c case, but that may be because it's already
> blazingly fast that we never noticed about it).

alloc.c was not just about speed, but mostly about dense packing?
855419f764a (Add specialized object allocator, 2006-06-19)

To me it is also a clear yes when it comes to combining these
two memory pools.

[PATCH v2] git: add -P as a short option for --no-pager

2018-05-03 Thread Johannes Sixt

It is possible to configure 'less', the pager, to use an alternate
screen to show the content, for example, by setting LESS=RS in the
environment. When it is closed in this configuration, it switches
back to the original screen, and all content is gone.

It is not uncommon to request that the output remains visible in
the terminal. For this, the option --no-pager can be used. But
it is a bit cumbersome to type, even when command completion is
available. Provide a short option, -P, to make the option easier
accessible.

Signed-off-by: Johannes Sixt 
---
 Given the positive feedback, I resurrect the patch.

 Changes since v1:
 - Use -P instead of -N
 - Commit message changed as proposed by Kaartic

 Documentation/git.txt | 3 ++-
 git.c | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 4767860e72..c662f41c1d 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git' [--version] [--help] [-C ] [-c =]
 [--exec-path[=]] [--html-path] [--man-path] [--info-path]
-[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
+[-p|--paginate|-P|--no-pager] [--no-replace-objects] [--bare]
 [--git-dir=] [--work-tree=] [--namespace=]
 [--super-prefix=]
  []
@@ -103,6 +103,7 @@ foo.bar= ...`) sets `foo.bar` to the empty string which 
`git config
configuration options (see the "Configuration Mechanism" section
below).
 
+-P::
 --no-pager::
Do not pipe Git output into a pager.
 
diff --git a/git.c b/git.c
index ceaa58ef40..71d013424e 100644
--- a/git.c
+++ b/git.c
@@ -7,7 +7,7 @@
 const char git_usage_string[] =
N_("git [--version] [--help] [-C ] [-c =]\n"
   "   [--exec-path[=]] [--html-path] [--man-path] 
[--info-path]\n"
-  "   [-p | --paginate | --no-pager] [--no-replace-objects] 
[--bare]\n"
+  "   [-p | --paginate | -P | --no-pager] 
[--no-replace-objects] [--bare]\n"
   "   [--git-dir=] [--work-tree=] 
[--namespace=]\n"
   "[]");
 
@@ -81,7 +81,7 @@ static int handle_options(const char ***argv, int *argc, int 
*envchanged)
exit(0);
} else if (!strcmp(cmd, "-p") || !strcmp(cmd, "--paginate")) {
use_pager = 1;
-   } else if (!strcmp(cmd, "--no-pager")) {
+   } else if (!strcmp(cmd, "-P") || !strcmp(cmd, "--no-pager")) {
use_pager = 0;
if (envchanged)
*envchanged = 1;
-- 
2.17.0.69.g0c1d01d9b6

Re: [PATCH 11/18] branch-diff: add tests

2018-05-03 Thread Stefan Beller

On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin
 wrote:
> From: Thomas Rast 
>
> These are essentially lifted from https://github.com/trast/tbdiff, with
> light touch-ups to account for the new command name.
>
> Apart from renaming `tbdiff` to `branch-diff`, only one test case needed
> to be adjusted: 11 - 'changed message'.
>
> The underlying reason it had to be adjusted is that diff generation is
> sometimes ambiguous. In this case, a comment line and an empty line are
> added, but it is ambiguous whether they were added after the existing
> empty line, or whether an empty line and the comment line are added
> *before* the existing emtpy line. And apparently xdiff picks a different
> option here than Python's difflib.

I think that is the fallout of the diff heuristics. If you are keen on
a 1:1 port, you can disable the diff sliding heuristics and it should produce
the same diff with trailing new lines.

Re: [PATCH 03/18] branch-diff: first rudimentary implementation

2018-05-03 Thread Stefan Beller

On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin
 wrote:

> Note: due to differences in the diff algorithm (`tbdiff` uses the
> Pythong module `difflib`, Git uses its xdiff fork), the cost matrix
> calculated by `branch-diff` is different (but very similar) to the one
> calculated by `tbdiff`. Therefore, it is possible that they find
> different matching commits in corner cases (e.g. when a patch was split
> into two patches of roughly equal length).

Does that mean, we may want to tweak the underlying diff parameters for
this special use case eventually?

>
> -#define COLOR_DUAL_MODE 2
> -

Leave this out in the first patch?

> @@ -19,6 +23,279 @@ static int parse_creation_weight(const struct option 
> *opt, const char *arg,
> return 0;
>  }
>
> +struct patch_util {
> +   /* For the search for an exact match */
> +   struct hashmap_entry e;
> +   const char *diff, *patch;
> +
> +   int i;
> +   int diffsize;
> +   size_t diff_offset;
> +   /* the index of the matching item in the other branch, or -1 */
> +   int matching;
> +   struct object_id oid;
> +};
> +
> +/*
> + * Reads the patches into a string list, with the `util` field being 
> populated
> + * as struct object_id (will need to be free()d).
> + */
> +static int read_patches(const char *range, struct string_list *list)
> +{
> +   struct child_process cp = CHILD_PROCESS_INIT;
> +   FILE *in;
> +   struct strbuf buf = STRBUF_INIT, line = STRBUF_INIT;
> +   struct patch_util *util = NULL;
> +   int in_header = 1;
> +
> +   argv_array_pushl(, "log", "--no-color", "-p", "--no-merges",
> +   "--reverse", "--date-order", "--decorate=no",
> +   "--no-abbrev-commit", range,
> +   NULL);
> +   cp.out = -1;
> +   cp.no_stdin = 1;
> +   cp.git_cmd = 1;
> +
> +   if (start_command())
> +   return error_errno(_("could not start `log`"));
> +   in = fdopen(cp.out, "r");
> +   if (!in) {
> +   error_errno(_("could not read `log` output"));
> +   finish_command();
> +   return -1;
> +   }

With the implementation of --color-moved, there is an option
to buffer all diff output in memory e6e045f8031 (diff.c: buffer
all output if asked to, 2017-06-29), so I posit that running this
diff in-core may be "not too hard". Famous last words.

In addition to that patch, we'd have to buffer commit messages
and buffer multiple commits, as that only buffers a diff of a single
commit.

The benefit would be no invocation of new processes, letting us
do more in core. This would allow for tweaking revision walking
internally, e.g. passing of options to this command such as rename
detection factors, can be passed through easily without the need
of translating it back to the command line.
Let's read on.

> +
> +   if (starts_with(line.buf, "diff --git")) {

When using the internal buffers, you would not need to
string compare, but could just check for the
DIFF_SYMBOL_HEADER.

> +   } else if (starts_with(line.buf, "@@ "))
> +   strbuf_addstr(, "@@");

So we omit line information for hunks. Makes sense,
though then we could also skip the "index ..." lines?

Stefan

Re: [PATCH 11/18] branch-diff: add tests

2018-05-03 Thread Ævar Arnfjörð Bjarmason


On Thu, May 03 2018, Johannes Schindelin wrote:

> *before* the existing emtpy line. And apparently xdiff picks a different

s/emtpy/empty/

Re: git merge banch w/ different submodule revision

2018-05-03 Thread Heiko Voigt

Hi,

On Wed, May 02, 2018 at 07:30:25AM +, Middelschulte, Leif wrote:
> Am Montag, den 30.04.2018, 19:02 +0200 schrieb Heiko Voigt:
> > On Thu, Apr 26, 2018 at 03:19:36PM -0700, Stefan Beller wrote:
> > > Stefan wrote:
> > > > See 
> > > > https://github.com/git/git/commit/68d03e4a6e448aa557f52adef92595ac4d6cd4bd
> > > > (68d03e4a6e (Implement automatic fast-forward merge for submodules, 
> > > > 2010-07-07)
> > > > to explain the situation you encounter. (specifically merge_submodule
> > > > at the end of the diff)
> > > 
> > > +cc Heiko, author of that commit.
> > 
> > In that commit we tried to be very careful about. I do not understand
> > the situation in which the current strategy would be wrong by default.
> > 
> > We only merge if the following applies:
> > 
> >  * The changes in the superproject on both sides point forward in the
> >submodule.
> > 
> >  * One side is contained in the other. Contained from the submodule
> >perspective. Sides from the superproject merge perspective.
> > 
> > So in case of the mentioned rewind of a submodule: Only one side of the
> > 3-way merge would point forward and the merge would fail.
> > 
> > I can imagine, that in case of a temporary revert of a commit in the
> > submodule that you would not want that merged into some other branch.
> > But that would be the same without submodules. If you merge a temporary
> > revert from another branch you will not get any conflict.
> > 
> > So maybe someone can explain the use case in which one would get the
> > results that seem wrong?
> In an ideal world, where there are no regressions between revisions, a
> fast-forward is appropriate. However, we might have regressions within
> submodules.
> 
> So the usecase is the following:
> 
> Environment:
> - We have a base library L that is developed by some team (Team B).
> - Another team (Team A) developes a product P based on those libraries using 
> git-flow.
> 
> Case:
> The problem occurs, when a developer (D) of Team A tries to have a feature
> that he developed on a branch accepted by a core developer of P:
> If a core developer of P advanced the reference of L within P (linear 
> history), he might
> deem the work D insufficient. Not because of the actual work by D, but 
> regressions
> that snuck into L. The core developer will not be informed about the 
> missmatching
> revisions of L.
> 
> So it would be nice if there was some kind of switch or at least some trigger.

I still do not understand how the current behaviour is mismatching with
users expectations. Let's assume that you directly tracked the files of
L in your product repository P, without any submodule boundary. How
would the behavior be different? Would it be? If D started on an older
revision and gets merged into a newer revision, there can always be
regressions even without submodules.

Why would the core developer need to be informed about mismatching
revisions if he himself advanced the submodule?

It seems to me that you do not want to mix integration testing and
testing of the feature itself. How about just testing/reviewing on the
branch then? You would still get the submodule revision D was working on
and then in a later stage check if integration with everything else
works.

Cheers Heiko

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Stefan Beller

Hi Johannes,

On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin
 wrote:
> This builtin does not do a whole lot so far, apart from showing a usage
> that is oddly similar to that of `git tbdiff`. And for a good reason:
> the next commits will turn `branch-diff` into a full-blown replacement
> for `tbdiff`.

While I appreciate the 1:1 re-implementation, I'll comment as if this
was a newly invented tool, questioning design choices. They are probably
chosen pretty well, and fudge facotrs as below are at tweaked to a reasonable
factor, but I'll try to look with fresh eyes.

>
> At this point, we ignore tbdiff's color options, as they will all be
> implemented later and require some patches to the diff machinery.

Speaking of colors, for origin/sb/blame-color Junio hinted at re-using
cyan for "uninteresting" parts to deliver a consistent color scheme for
Git. Eventually he dreams of having 2 layers of indirection IIUC, with
"uninteresting" -> cyan
"repeated lines in blame" -> uninteresting

Maybe we can fit the coloring of this tool in this scheme, too?

> +   double creation_weight = 0.6;

I wondered if we use doubles in Gits code base at all,
and I found

khash.h:59:static const double __ac_HASH_UPPER = 0.77;
pack-bitmap-write.c:248:static const double
REUSE_BITMAP_THRESHOLD = 0.2;
pack-bitmap.c:751:  static const double REUSE_PERCENT = 0.9;

all other occurrences of `git grep double` are mentioning it in other
contexts (e.g. "double linked list" or comments).

When implementing diff heuristics in 433860f3d0b (diff: improve
positioning of add/delete blocks in diffs, 2016-09-05), Michael broke
it down to fixed integers instead of floating point.

Do we need to dynamic of a floating point, or would a rather small range
suffice here? (Also see rename detection settings, that take percents as
integers)

Thanks,
Stefan

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Duy Nguyen

On Thu, May 3, 2018 at 5:30 PM, Johannes Schindelin
 wrote:
> diff --git a/command-list.txt b/command-list.txt
> index a1fad28fd82..c89ac8f417f 100644
> --- a/command-list.txt
> +++ b/command-list.txt
> @@ -19,6 +19,7 @@ git-archive mainporcelain
>  git-bisect  mainporcelain   info
>  git-blame   ancillaryinterrogators
>  git-branch  mainporcelain   history
> +git-branch-diff mainporcelain   info

Making it part of "git help" with the info keywords at this stage may
be premature. "git help" is about _common_ commands and we don't know
(yet) how popular this will be.

(I'm not complaining though, I almost wrote "what witchcraft is this
and where can I get it" when I see branch-diff output mentioned in
AEvar's mail)
-- 
Duy

Re: [PATCH v2 0/5] Allocate cache entries from memory pool

2018-05-03 Thread Duy Nguyen

On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller  wrote:
> This patch series improves the performance of loading indexes by
> reducing the number of malloc() calls. Loading the index from disk is
> partly dominated by the time in malloc(), which is called for each
> index entry. This patch series reduces the number of times malloc() is
> called as part of loading the index, and instead allocates a block of
> memory upfront that is large enough to hold all of the cache entries,
> and chunks this memory itself. This change builds on [1].

I have only looked at the mem-pool related patches to see if
mem-pool.c is good enough to replace alloc.c. To me, it's a "yes"
after we optimize mem_pool_alloc() a bit (not that performance really
matters in alloc.c case, but that may be because it's already
blazingly fast that we never noticed about it).

I probably should look at read-cache.c changes too. Maybe later.
Although after the change to use xmalloc() per entry a few years(?)
ago, it should be straight forward to use a different memory
allocator.
-- 
Duy

Re: [PATCH 03/18] branch-diff: first rudimentary implementation

2018-05-03 Thread Ramsay Jones



On 03/05/18 16:30, Johannes Schindelin wrote:
> At this stage, `git branch-diff` can determine corresponding commits of
> two related commit ranges. This makes use of the recently introduced
> implementation of the Hungarian algorithm.
> 
> The core of this patch is a straight port of the ideas of tbdiff, the
> seemingly dormant project at https://github.com/trast/tbdiff.
> 
> The output does not at all match `tbdiff`'s output yet, as this patch
> really concentrates on getting the patch matching part right.
> 
> Note: due to differences in the diff algorithm (`tbdiff` uses the
> Pythong module `difflib`, Git uses its xdiff fork), the cost matrix
> calculated by `branch-diff` is different (but very similar) to the one
> calculated by `tbdiff`. Therefore, it is possible that they find
> different matching commits in corner cases (e.g. when a patch was split
> into two patches of roughly equal length).
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/branch-diff.c | 337 +-
>  1 file changed, 334 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
> index 97266cd326d..02dc06a57ca 100644
> --- a/builtin/branch-diff.c
> +++ b/builtin/branch-diff.c
> @@ -1,13 +1,17 @@
>  #include "cache.h"
>  #include "parse-options.h"
> +#include "string-list.h"
> +#include "run-command.h"
> +#include "argv-array.h"
> +#include "hashmap.h"
> +#include "xdiff-interface.h"
> +#include "hungarian.h"
>  
>  static const char * const builtin_branch_diff_usage[] = {
>   N_("git rebase--helper [] ( A..B C..D | A...B | base A B )"),
>   NULL
>  };
>  
> -#define COLOR_DUAL_MODE 2
> -

This #define was introduced in the previous patch, without being
used in that patch, and is now deleted here.

ATB,
Ramsay Jones

Re: [PATCH v2 5/5] block alloc: add validations around cache_entry lifecyle

2018-05-03 Thread Duy Nguyen

On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller  wrote:
> Add an option (controlled by an environment variable) perform extra
> validations on mem_pool allocated cache entries. When set:
>
>   1) Invalidate cache_entry memory when discarding cache_entry.
>
>   2) When discarding index_state struct, verify that all cache_entries
>  were allocated from expected mem_pool.
>
>   3) When discarding mem_pools, invalidate mem_pool memory.

On linux step 3 could be better achieved by allocating blocks with
mmap() and freeing them with munmap(). Access to already munmap()'d
blocks will result in segmentation fault regardless of values.  I
guess Windows also has something similar (I vaguely remember something
about "locking memory" and stuff, but my win32 knowledge is decade old
at this point)

(Actually with glibc on linux, i'm pretty sure mmap is already used
for large allocation so step 3 is achieved without doing anything; not
sure about other libc implementations)

> This should provide extra checks that mem_pools and their allocated
> cache_entries are being used as expected.
>
> Signed-off-by: Jameson Miller 
> ---
>  cache.h  |  7 +++
>  git.c|  3 +++
>  mem-pool.c   | 24 +++-
>  mem-pool.h   |  2 ++
>  read-cache.c | 47 +++
>  5 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/cache.h b/cache.h
> index 7ed68f28e0..8f10f0649b 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -369,6 +369,13 @@ void discard_index_cache_entry(struct cache_entry *ce);
>   */
>  void discard_transient_cache_entry(struct cache_entry *ce);
>
> +/*
> + * Validate the cache entries in the index.  This is an internal
> + * consistency check that the cache_entry structs are allocated from
> + * the expected memory pool.
> + */
> +void validate_cache_entries(const struct index_state *istate);
> +
>  #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS
>  #define active_cache (the_index.cache)
>  #define active_nr (the_index.cache_nr)
> diff --git a/git.c b/git.c
> index f598fae7b7..28ec7a6c4f 100644
> --- a/git.c
> +++ b/git.c
> @@ -347,7 +347,10 @@ static int run_builtin(struct cmd_struct *p, int argc, 
> const char **argv)
>
> trace_argv_printf(argv, "trace: built-in: git");
>
> +   validate_cache_entries(_index);
> status = p->fn(argc, argv, prefix);
> +   validate_cache_entries(_index);
> +
> if (status)
> return status;
>
> diff --git a/mem-pool.c b/mem-pool.c
> index a495885c4b..77adb5d5b9 100644
> --- a/mem-pool.c
> +++ b/mem-pool.c
> @@ -60,21 +60,43 @@ void mem_pool_discard(struct mem_pool *mem_pool)
>  {
> int i;
> struct mp_block *block, *block_to_free;
> +   int invalidate_memory = should_validate_cache_entries();

Heh.. cache-entries logic should not enter mem-pool.c.

> +
> for (block = mem_pool->mp_block; block;)
> {
> block_to_free = block;
> block = block->next_block;
> +
> +   if (invalidate_memory)
> +   memset(block_to_free->space, 0xDD, ((char 
> *)block_to_free->end) - ((char *)block_to_free->space));
> +
> free(block_to_free);
> }
>
> -   for (i = 0; i < mem_pool->nr; i++)
> +   for (i = 0; i < mem_pool->nr; i++) {
> +   memset(mem_pool->custom[i], 0xDD, ((char 
> *)mem_pool->custom_end[i]) - ((char *)mem_pool->custom[i]));

"if (invalidate_memory)" missing

> free(mem_pool->custom[i]);
> +   }
>
> free(mem_pool->custom);
> free(mem_pool->custom_end);
> free(mem_pool);
>  }
>
> +int should_validate_cache_entries(void)
> +{
> +   static int validate_index_cache_entries = -1;
> +
> +   if (validate_index_cache_entries < 0) {
> +   if (getenv("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES"))

There's a safer version that you can use, get_env_bool()

> +   validate_index_cache_entries = 1;
> +   else
> +   validate_index_cache_entries = 0;
> +   }
> +
> +   return validate_index_cache_entries;
> +}
> +
>  void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
>  {
> struct mp_block *p;
> diff --git a/mem-pool.h b/mem-pool.h
> index 34df4fa709..b1f9a920ba 100644
> --- a/mem-pool.h
> +++ b/mem-pool.h
> @@ -63,4 +63,6 @@ void mem_pool_combine(struct mem_pool *dst, struct mem_pool 
> *src);
>   */
>  int mem_pool_contains(struct mem_pool *mem_pool, void *mem);
>
> +int should_validate_cache_entries(void);
> +
>  #endif
> diff --git a/read-cache.c b/read-cache.c
> index 01cd7fea41..e1dc9f7f33 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1270,6 +1270,8 @@ int add_index_entry(struct index_state *istate, struct 
> cache_entry *ce, int opti
>  {
> int pos;
>
> +   validate_cache_entries(istate);

Validating _all_ entries every time an entry is added sounds way too

Re: [PATCH v2 3/5] mem-pool: fill out functionality

2018-05-03 Thread Duy Nguyen

Another I noticed in the jm/mem-pool series is this loop in mem_pool_alloc()

for (p = mem_pool->mp_block; p; p = p->next_block)
if (p->end - p->next_free >= len)
break;

You always go from the start (mp_block) but at some point those first
blocks are filled up and we don't really need to walk from the start
anymore. If we allow the mem-pool user to set a "minimum alloc" limit,
then we can determine if the remaining space in a block is not useful
for any future allocation, we can just skip it and start looking for
an available from a new pointer, avail_block or something.

I'm writing this with alloc.c in mind because we have a lot more
blocks to allocate there. Unlike read-cache, you can't really estimate
how many mp_blocks you're going to need. This linked list could become
long. And because alloc.c does fixed size allocation, we use up one
block after another and will never find free space in previous blocks.

On Mon, Apr 30, 2018 at 5:31 PM, Jameson Miller  wrote:
> diff --git a/mem-pool.c b/mem-pool.c
> index 389d7af447..a495885c4b 100644
> --- a/mem-pool.c
> +++ b/mem-pool.c
> @@ -5,6 +5,8 @@
>  #include "cache.h"
>  #include "mem-pool.h"
>
> +#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block);

#define BLOCK_GROWTH_SIZE (1024*1024 - sizeof(struct mp_block))

(wrapped in brackets and no trailing semicolon)

> +
>  static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, 
> size_t block_alloc)
>  {
> struct mp_block *p;
> @@ -19,6 +21,60 @@ static struct mp_block *mem_pool_alloc_block(struct 
> mem_pool *mem_pool, size_t b
> return p;
>  }
>
> +static void *mem_pool_alloc_custom(struct mem_pool *mem_pool, size_t 
> block_alloc)
> +{
> +   char *p;

An empty line between variable declaration and function body would be nice.

> +   ALLOC_GROW(mem_pool->custom, mem_pool->nr + 1, mem_pool->alloc);
> +   ALLOC_GROW(mem_pool->custom_end, mem_pool->nr_end + 1, 
> mem_pool->alloc_end);
> +

If you put both custom and custom_end in a struct, then you can grow
just one array (of the new struct) and have fewer support variables
like nr_end and alloc_end

The only downside that I can see is the potential padding between
struct increasing memory consumption here. but since you don't care
about reallocation cost (i.e. large memory allocations should be
rare), it probably  does not matter either.

But wait, can we just reuse struct mp_block for this? You allocate a
new mp_block (for just one allocation) as usual, then you can can
maintain a linked list of custom alloc in "struct mp_block
*mp_custom_block" or something. This way we can walk both bulk and
custom allocation the same way.

> +   p = xmalloc(block_alloc);
> +   mem_pool->custom[mem_pool->nr++] = p;
> +   mem_pool->custom_end[mem_pool->nr_end++] = p + block_alloc;
> +
> +   mem_pool->pool_alloc += block_alloc;
> +
> +   return mem_pool->custom[mem_pool->nr];
> +}
> +
> +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
> +{
> +   if (!(*mem_pool))

I think (!*mem_pool) should be enough, although you could avoid the
operator precedence headache by doing

if (*mem_pool)
return;

> +   {
> +   *mem_pool = xmalloc(sizeof(struct mem_pool));

I think we tend to do *mem_pool = xmalloc(sizeof(**mem_pool));

> +   (*mem_pool)->pool_alloc = 0;

Eghh.. perhaps just declare

struct mem_pool *pool;

then allocate a new memory to this, initialize everything and only do

*mem_pool = pool;

at the end? It's much less of this (*mem_pool)->

> +   (*mem_pool)->mp_block = NULL;

Just memset() it once (or use xcallo) and only initialize
non-zero/null fields afterwards.

> +   (*mem_pool)->block_alloc = BLOCK_GROWTH_SIZE;
> +   (*mem_pool)->custom = NULL;
> +   (*mem_pool)->nr = 0;
> +   (*mem_pool)->alloc = 0;
> +   (*mem_pool)->custom_end = NULL;
> +   (*mem_pool)->nr_end = 0;
> +   (*mem_pool)->alloc_end = 0;
> +
> +   if (initial_size > 0)
> +   mem_pool_alloc_block(*mem_pool, initial_size);
> +   }
> +}
> +
> +void mem_pool_discard(struct mem_pool *mem_pool)
> +{
> +   int i;
> +   struct mp_block *block, *block_to_free;
> +   for (block = mem_pool->mp_block; block;)
> +   {
> +   block_to_free = block;
> +   block = block->next_block;
> +   free(block_to_free);
> +   }
> +
> +   for (i = 0; i < mem_pool->nr; i++)
> +   free(mem_pool->custom[i]);
> +
> +   free(mem_pool->custom);
> +   free(mem_pool->custom_end);
> +   free(mem_pool);
> +}
> +
>  void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
>  {
> struct mp_block *p;
> @@ -33,10 +89,8 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
> break;
>
> if (!p) {
> -

Re: [PATCH 02/18] Add a new builtin: branch-diff

2018-05-03 Thread Ramsay Jones



On 03/05/18 16:30, Johannes Schindelin wrote:
> This builtin does not do a whole lot so far, apart from showing a usage
> that is oddly similar to that of `git tbdiff`. And for a good reason:
> the next commits will turn `branch-diff` into a full-blown replacement
> for `tbdiff`.
> 
> At this point, we ignore tbdiff's color options, as they will all be
> implemented later and require some patches to the diff machinery.
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  .gitignore|  1 +
>  Makefile  |  1 +
>  builtin.h |  1 +
>  builtin/branch-diff.c | 40 
>  command-list.txt  |  1 +
>  git.c |  1 +
>  6 files changed, 45 insertions(+)
>  create mode 100644 builtin/branch-diff.c
> 
> diff --git a/.gitignore b/.gitignore
> index 833ef3b0b78..1346a64492f 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -20,6 +20,7 @@
>  /git-bisect--helper
>  /git-blame
>  /git-branch
> +/git-branch-diff
>  /git-bundle
>  /git-cat-file
>  /git-check-attr
> diff --git a/Makefile b/Makefile
> index 96f2e76a904..9b1984776d8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -953,6 +953,7 @@ BUILTIN_OBJS += builtin/archive.o
>  BUILTIN_OBJS += builtin/bisect--helper.o
>  BUILTIN_OBJS += builtin/blame.o
>  BUILTIN_OBJS += builtin/branch.o
> +BUILTIN_OBJS += builtin/branch-diff.o
>  BUILTIN_OBJS += builtin/bundle.o
>  BUILTIN_OBJS += builtin/cat-file.o
>  BUILTIN_OBJS += builtin/check-attr.o
> diff --git a/builtin.h b/builtin.h
> index 42378f3aa47..e1c4d2a529a 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -135,6 +135,7 @@ extern int cmd_archive(int argc, const char **argv, const 
> char *prefix);
>  extern int cmd_bisect__helper(int argc, const char **argv, const char 
> *prefix);
>  extern int cmd_blame(int argc, const char **argv, const char *prefix);
>  extern int cmd_branch(int argc, const char **argv, const char *prefix);
> +extern int cmd_branch_diff(int argc, const char **argv, const char *prefix);
>  extern int cmd_bundle(int argc, const char **argv, const char *prefix);
>  extern int cmd_cat_file(int argc, const char **argv, const char *prefix);
>  extern int cmd_checkout(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
> new file mode 100644
> index 000..97266cd326d
> --- /dev/null
> +++ b/builtin/branch-diff.c
> @@ -0,0 +1,40 @@
> +#include "cache.h"
> +#include "parse-options.h"
> +
> +static const char * const builtin_branch_diff_usage[] = {
> + N_("git rebase--helper [] ( A..B C..D | A...B | base A B )"),

s/rebase--helper/branch-diff/

ATB,
Ramsay Jones

Re: [PATCH 8/8] gpg-interface: handle alternative signature types

2018-05-03 Thread Ben Toews

On Tue, Apr 17, 2018 at 12:33 PM, Taylor Blau  wrote:
>
> On Tue, Apr 17, 2018 at 12:08:20PM -0600, Ben Toews wrote:
> > On Mon, Apr 16, 2018 at 7:54 PM, Junio C Hamano  wrote:
> > > "brian m. carlson"  writes:
> > >
> > >> If we just want to add gpgsm support, that's fine, but we should be
> > >> transparent about that fact and try to avoid making an interface which
> > >> is at once too generic and not generic enough.
> >
> > [...]
> >
> > My motivation with this series is not just to "add gpgsm support"
> > though. I've been working on some other CMS tooling that will be open
> > source eventually. My goal was to distribute this tool with a wrapper
> > that emulates the GnuPG interface.
> >
> > To me, this series feels like a good stepping stone towards more
> > generalized support for other tooling.
>
> I agree with Ben's assessment. A stand-in tool for gpgsm support will
> not be useful without a way to configure it with Git. I think that
> treating this series as ``add support for _gpgsm-like tools_'' is
> sensible, and a reasonable compromise between:
>
>   - More generalized support.
>   - No support at all.
>
> Thanks,
> Taylor


Any more thoughts as to whether adding support for CMS tooling is
worthwhile as a stepping stone towards supporting more general
tooling?

[PATCH 18/18] completion: support branch-diff

2018-05-03 Thread Johannes Schindelin

Tab completion of `branch-diff` is very convenient, especially given
that the revision arguments that need to be passed to `git branch-diff`
are typically more complex than, say, your grandfather's `git log`
arguments.

Without this patch, we would only complete the `branch-diff` part but
not the options and other arguments.

This of itself may already be slightly disruptive for well-trained
fingers that assume that `git braorimas` would expand to
`git branch origin/master`, as we now no longer automatically append a
space after completing `git branch`: this is now ambiguous.

Signed-off-by: Johannes Schindelin 
---
 contrib/completion/git-completion.bash | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 01dd9ff07a2..45addd525ac 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1496,6 +1496,24 @@ _git_format_patch ()
__git_complete_revlist
 }
 
+__git_branch_diff_options="
+   --no-patches --creation-weight= --dual-color
+"
+
+_git_branch_diff ()
+{
+   case "$cur" in
+   --*)
+   __gitcomp "
+   $__git_branch_diff_options
+   $__git_diff_common_options
+   "
+   return
+   ;;
+   esac
+   __git_complete_revlist
+}
+
 _git_fsck ()
 {
case "$cur" in
-- 
2.17.0.395.g6a618d6010f.dirty

[PATCH 17/18] branch-diff: add a man page

2018-05-03 Thread Johannes Schindelin

This is a heavily butchered version of the README written by Thomas
Rast and Thomas Gummerer, lifted from https://github.com/trast/tbdiff.

Signed-off-by: Johannes Schindelin 
---
 Documentation/git-branch-diff.txt | 239 ++
 1 file changed, 239 insertions(+)
 create mode 100644 Documentation/git-branch-diff.txt

diff --git a/Documentation/git-branch-diff.txt 
b/Documentation/git-branch-diff.txt
new file mode 100644
index 000..b841586735c
--- /dev/null
+++ b/Documentation/git-branch-diff.txt
@@ -0,0 +1,239 @@
+git-branch-diff(1)
+==
+
+NAME
+
+git-branch-diff - Compare two versions of a branch
+
+SYNOPSIS
+
+[verse]
+'git branch-diff' [--color=[]] [--no-color] []
+   [--dual-color] [--no-patches] [--creation-weight=]
+   (   | ... |)
+
+DESCRIPTION
+---
+
+This command shows the differences between two versions of a patch
+series, or more generally, two commit ranges (ignoring merges).
+
+To that end, it first finds pairs of commits from both commit ranges
+that correspond with each other. Two commits are said to correspond when
+the diff between their patches (i.e. the author information, the commit
+message and the commit diff) is reasonably small compared to the
+patches' size. See ``Algorithm` below for details.
+
+Finally, the list of matching commits is shown in the order of the
+second commit range, with unmatched commits being inserted just after
+all of their ancestors have been shown.
+
+
+OPTIONS
+---
+--no-patches::
+   Suppress the diffs between commit pairs that were deemed to
+   correspond; only show the pairings.
+
+--dual-color::
+   When the commit diffs differ, recreate the original diffs'
+   coloring, and add outer -/+ diff markers with the *background*
+   being red/green to make it easier to see e.g. when there was a
+   change in what exact lines were added.
+
+--creation-weight=::
+   Set the creation/deletion cost fudge factor to ``.
+   Defaults to 0.6. Try a larger value if `git branch-diff`
+   erroneously considers a large change a total rewrite (deletion
+   of one commit and addition of another), and a smaller one in
+   the reverse case. See the ``Algorithm`` section below for an
+   explanation why this is needed.
+
+ ::
+   Compare the commits specified by the two ranges, where
+   `` is considered an older version of ``.
+
+...::
+   Equivalent to passing `..` and `..`.
+
+  ::
+   Equivalent to passing `..` and `..`.
+   Note that `` does not need to be the exact branch point
+   of the branches. Example: after rebasing a branch `my-topic`,
+   `git branch-diff my-topic@{u} my-topic@{1} my-topic` would
+   show the differences introduced by the rebase.
+
+`git branch-diff` also accepts the regular diff options (see
+linkgit:git-diff[1]), most notably the `--color=[]` and
+`--no-color` options. These options are used when generating the "diff
+between patches", i.e. to compare the author, commit message and diff of
+corresponding old/new commits. There is currently no means to tweak the
+diff options passed to `git log` when generating those patches.
+
+
+CONFIGURATION
+-
+This command uses the `diff.color.*` and `pager.branch-diff` settings
+(the latter is on by default).
+See linkgit:git-config[1].
+
+
+Examples
+
+
+When a rebase required merge conflicts to be resolved, compare the changes
+introduced by the rebase directly afterwards using:
+
+
+$ git branch-diff @{u} @{1} @
+
+
+
+A typical output of `git branch-diff` would look like this:
+
+
+-:  --- > 1:  0ddba11 Prepare for the inevitable!
+1:  c0debee = 2:  cab005e Add a helpful message at the start
+2:  f00dbal ! 3:  decafe1 Describe a bug
+@@ -1,3 +1,3 @@
+ Author: A U Thor 
+
+-TODO: Describe a bug
++Describe a bug
+@@ -324,5 +324,6
+  This is expected.
+
+-+What is unexpected is that it will also crash.
+++Unexpectedly, it also crashes. This is a bug, and the jury is
+++still out there how to fix it best. See ticket #314 for details.
+
+  Contact
+3:  bedead < -:  --- TO-UNDO
+
+
+In this example, there are 3 old and 3 new commits, where the developer
+removed the 3rd, added a new one before the first two, and modified the
+commit message of the 2nd commit as well its diff.
+
+When the output goes to a terminal, it is color-coded by default, just
+like regular `git diff`'s output. In addition, the first line (adding a
+commit) is green, the last line (deleting a commit) is red, the second
+line (with a perfect match) is yellow like the commit header of `git
+show`'s output, and the third line colors the old commit red, the new
+one green and the rest like `git show`'s commit header.
+
+The color-coded diff is actually a bit hard to read, though, as it
+colors the entire lines red

[PATCH 14/18] diff: add an internal option to dual-color diffs of diffs

2018-05-03 Thread Johannes Schindelin

When diffing diffs, it can be quite daunting to figure out what the heck
is going on, as there are nested +/- signs.

Let's make this easier by adding a flag in diff_options that allows
color-coding the outer diff sign with inverted colors, so that the
preimage and postimage is colored like the diff it is.

Of course, this really only makes sense when the preimage and postimage
*are* diffs. So let's not expose this flag via a command-line option for
now.

This is a feature that was invented by git-tbdiff, and it will be used
in `branch-diff` in the next commit.

Signed-off-by: Johannes Schindelin 
---
 diff.c | 65 +-
 diff.h |  5 -
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index f1bda0db3f5..98a41e88620 100644
--- a/diff.c
+++ b/diff.c
@@ -67,6 +67,8 @@ static char diff_colors[][COLOR_MAXLEN] = {
GIT_COLOR_BOLD_YELLOW,  /* NEW_MOVED ALTERNATIVE */
GIT_COLOR_FAINT,/* NEW_MOVED_DIM */
GIT_COLOR_FAINT_ITALIC, /* NEW_MOVED_ALTERNATIVE_DIM */
+   GIT_COLOR_INV_RED,  /* OLD_INV */
+   GIT_COLOR_INV_GREEN,/* NEW_INV */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -108,6 +110,10 @@ static int parse_diff_color_slot(const char *var)
return DIFF_FILE_NEW_MOVED_DIM;
if (!strcasecmp(var, "newmovedalternativedimmed"))
return DIFF_FILE_NEW_MOVED_ALT_DIM;
+   if (!strcasecmp(var, "oldinv"))
+   return DIFF_FILE_OLD_INV;
+   if (!strcasecmp(var, "newinv"))
+   return DIFF_FILE_NEW_INV;
return -1;
 }
 
@@ -577,7 +583,10 @@ static void emit_line_0(struct diff_options *o, const char 
*set, const char *res
int nofirst;
FILE *file = o->file;
 
-   fputs(diff_line_prefix(o), file);
+   if (first)
+   fputs(diff_line_prefix(o), file);
+   else if (!len)
+   return;
 
if (len == 0) {
has_trailing_newline = (first == '\n');
@@ -596,7 +605,7 @@ static void emit_line_0(struct diff_options *o, const char 
*set, const char *res
 
if (len || !nofirst) {
fputs(set, file);
-   if (!nofirst)
+   if (first && !nofirst)
fputc(first, file);
fwrite(line, len, 1, file);
fputs(reset, file);
@@ -970,7 +979,8 @@ static void dim_moved_lines(struct diff_options *o)
 
 static void emit_line_ws_markup(struct diff_options *o,
const char *set, const char *reset,
-   const char *line, int len, char sign,
+   const char *line, int len,
+   const char *set_sign, char sign,
unsigned ws_rule, int blank_at_eof)
 {
const char *ws = NULL;
@@ -981,14 +991,18 @@ static void emit_line_ws_markup(struct diff_options *o,
ws = NULL;
}
 
-   if (!ws)
+   if (!ws && set_sign == set)
emit_line_0(o, set, reset, sign, line, len);
-   else if (blank_at_eof)
+   else if (!ws) {
+   /* Emit just the prefix, then the rest. */
+   emit_line_0(o, set_sign, reset, sign, "", 0);
+   emit_line_0(o, set, reset, 0, line, len);
+   } else if (blank_at_eof)
/* Blank line at EOF - paint '+' as well */
emit_line_0(o, ws, reset, sign, line, len);
else {
/* Emit just the prefix, then the rest. */
-   emit_line_0(o, set, reset, sign, "", 0);
+   emit_line_0(o, set_sign, reset, sign, "", 0);
ws_check_emit(line, len, ws_rule,
  o->file, set, reset, ws);
}
@@ -998,7 +1012,7 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
 struct emitted_diff_symbol *eds)
 {
static const char *nneof = " No newline at end of file\n";
-   const char *context, *reset, *set, *meta, *fraginfo;
+   const char *context, *reset, *set, *set_sign, *meta, *fraginfo;
struct strbuf sb = STRBUF_INIT;
 
enum diff_symbol s = eds->s;
@@ -1038,7 +1052,16 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
case DIFF_SYMBOL_CONTEXT:
set = diff_get_color_opt(o, DIFF_CONTEXT);
reset = diff_get_color_opt(o, DIFF_RESET);
-   emit_line_ws_markup(o, set, reset, line, len, ' ',
+   set_sign = set;
+   if (o->flags.dual_color_diffed_diffs) {
+   char c = !len ? 0 : line[0];
+
+   if (c == '+')
+   set = diff_get_color_opt(o, DIFF_FILE_NEW);
+   else if (c == '-')
+   set = diff_get_color_opt(o,

[PATCH 16/18] branch-diff --dual-color: work around bogus white-space warning

2018-05-03 Thread Johannes Schindelin

When displaying a diff of diffs, it is possible that there is an outer
`+` before a context line. That happens when the context changed between
old and new commit. When that context line starts with a tab (after the
space that marks it as context line), our diff machinery spits out a
white-space error (space before tab), but in this case, that is
incorrect.

Work around this by detecting that situation and simply *not* printing
the space in that case.

This is slightly improper a fix because it is conceivable that an
output_prefix might be configured with *just* the right length to let
that tab jump to a different tab stop depending whether we emit that
space or not.

However, the proper fix would be relatively ugly and intrusive because
it would have to *weaken* the WS_SPACE_BEFORE_TAB option in ws.c.
Besides, we do not expose the --dual-color option in cases other than
the `branch-diff` command, which only uses a hard-coded output_prefix of
four spaces (which misses the problem by one column ;-)).

Signed-off-by: Johannes Schindelin 
---
 diff.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/diff.c b/diff.c
index 98a41e88620..b98a18fe014 100644
--- a/diff.c
+++ b/diff.c
@@ -1098,6 +1098,12 @@ static void emit_diff_symbol_from_struct(struct 
diff_options *o,
set = diff_get_color_opt(o, DIFF_FILE_OLD);
else if (c != '+')
set = diff_get_color_opt(o, DIFF_CONTEXT);
+   /* Avoid space-before-tab warning */
+   if (c == ' ' && (len < 2 || line[1] == '\t' ||
+line[1] == '\r' || line[1] == '\n')) {
+   line++;
+   len--;
+   }
}
emit_line_ws_markup(o, set, reset, line, len, set_sign, '+',
flags & DIFF_SYMBOL_CONTENT_WS_MASK,
-- 
2.17.0.395.g6a618d6010f.dirty

[PATCH 10/18] branch-diff: do not show "function names" in hunk headers

2018-05-03 Thread Johannes Schindelin

We are comparing complete, formatted commit messages with patches. There
are no function names here, so stop looking for them.

Signed-off-by: Johannes Schindelin 
---
 builtin/branch-diff.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
index 0a0564b8ec2..7625da09e6f 100644
--- a/builtin/branch-diff.c
+++ b/builtin/branch-diff.c
@@ -10,6 +10,7 @@
 #include "diffcore.h"
 #include "commit.h"
 #include "pretty.h"
+#include "userdiff.h"
 
 static const char * const builtin_branch_diff_usage[] = {
N_("git rebase--helper [] ( A..B C..D | A...B | base A B )"),
@@ -327,6 +328,10 @@ static struct strbuf *output_prefix_cb(struct diff_options 
*opt, void *data)
return data;
 }
 
+static struct userdiff_driver no_func_name = {
+   .funcname = { "$^", 0 }
+};
+
 static struct diff_filespec *get_filespec(const char *name, const char *p)
 {
struct diff_filespec *spec = alloc_filespec(name);
@@ -336,6 +341,7 @@ static struct diff_filespec *get_filespec(const char *name, 
const char *p)
spec->size = strlen(p);
spec->should_munmap = 0;
spec->is_stdin = 1;
+   spec->driver = _func_name;
 
return spec;
 }
-- 
2.17.0.395.g6a618d6010f.dirty

[PATCH 15/18] branch-diff: offer to dual-color the diffs

2018-05-03 Thread Johannes Schindelin

When showing what changed between old and new commits, we show a diff of
the patches. This diff is a diff between diffs, therefore there are
nested +/- signs, and it can be relatively hard to understand what is
going on.

With the --dual-color option, the preimage and the postimage are colored
like the diffs they are, and the *outer* +/- sign is inverted for
clarity.

Signed-off-by: Johannes Schindelin 
---
 builtin/branch-diff.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
index e505b696d11..edf80ecb736 100644
--- a/builtin/branch-diff.c
+++ b/builtin/branch-diff.c
@@ -432,8 +432,11 @@ int cmd_branch_diff(int argc, const char **argv, const 
char *prefix)
 {
struct diff_options diffopt = { 0 };
struct strbuf four_spaces = STRBUF_INIT;
+   int dual_color = 0;
double creation_weight = 0.6;
struct option options[] = {
+   OPT_BOOL(0, "dual-color", _color,
+   N_("color both diff and diff-between-diffs")),
OPT_SET_INT(0, "no-patches", _format,
N_("short format (no diffs)"),
DIFF_FORMAT_NO_OUTPUT),
@@ -469,6 +472,11 @@ int cmd_branch_diff(int argc, const char **argv, const 
char *prefix)
argc = j;
diff_setup_done();
 
+   if (dual_color) {
+   diffopt.use_color = 1;
+   diffopt.flags.dual_color_diffed_diffs = 1;
+   }
+
if (argc == 2) {
if (!strstr(argv[0], ".."))
warning(_("no .. in range: '%s'"), argv[0]);
-- 
2.17.0.395.g6a618d6010f.dirty

[PATCH 13/18] color: provide inverted colors, too

2018-05-03 Thread Johannes Schindelin

For every regular color, there exists the inverted equivalent where
background and foreground colors are exchanged.

We will use this in the next commit to allow inverting *just* the +/-
signs in a diff.

Signed-off-by: Johannes Schindelin 
---
 color.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/color.h b/color.h
index cd0bcedd084..f0984b09583 100644
--- a/color.h
+++ b/color.h
@@ -36,6 +36,12 @@ struct strbuf;
 #define GIT_COLOR_BOLD_BLUE"\033[1;34m"
 #define GIT_COLOR_BOLD_MAGENTA "\033[1;35m"
 #define GIT_COLOR_BOLD_CYAN"\033[1;36m"
+#define GIT_COLOR_INV_RED  "\033[7;31m"
+#define GIT_COLOR_INV_GREEN"\033[7;32m"
+#define GIT_COLOR_INV_YELLOW   "\033[7;33m"
+#define GIT_COLOR_INV_BLUE "\033[7;34m"
+#define GIT_COLOR_INV_MAGENTA  "\033[7;35m"
+#define GIT_COLOR_INV_CYAN "\033[7;36m"
 #define GIT_COLOR_BG_RED   "\033[41m"
 #define GIT_COLOR_BG_GREEN "\033[42m"
 #define GIT_COLOR_BG_YELLOW"\033[43m"
-- 
2.17.0.395.g6a618d6010f.dirty

[PATCH 11/18] branch-diff: add tests

2018-05-03 Thread Johannes Schindelin

From: Thomas Rast 

These are essentially lifted from https://github.com/trast/tbdiff, with
light touch-ups to account for the new command name.

Apart from renaming `tbdiff` to `branch-diff`, only one test case needed
to be adjusted: 11 - 'changed message'.

The underlying reason it had to be adjusted is that diff generation is
sometimes ambiguous. In this case, a comment line and an empty line are
added, but it is ambiguous whether they were added after the existing
empty line, or whether an empty line and the comment line are added
*before* the existing emtpy line. And apparently xdiff picks a different
option here than Python's difflib.

Signed-off-by: Johannes Schindelin 
---
 t/.gitattributes   |   1 +
 t/t7910-branch-diff.sh | 144 ++
 t/t7910/history.export | 604 +
 3 files changed, 749 insertions(+)
 create mode 100755 t/t7910-branch-diff.sh
 create mode 100644 t/t7910/history.export

diff --git a/t/.gitattributes b/t/.gitattributes
index 3bd959ae523..af15d5aeedd 100644
--- a/t/.gitattributes
+++ b/t/.gitattributes
@@ -18,5 +18,6 @@ t[0-9][0-9][0-9][0-9]/* -whitespace
 /t5515/* eol=lf
 /t556x_common eol=lf
 /t7500/* eol=lf
+/t7910/* eol=lf
 /t8005/*.txt eol=lf
 /t9*/*.dump eol=lf
diff --git a/t/t7910-branch-diff.sh b/t/t7910-branch-diff.sh
new file mode 100755
index 000..a7fece88045
--- /dev/null
+++ b/t/t7910-branch-diff.sh
@@ -0,0 +1,144 @@
+#!/bin/sh
+
+test_description='branch-diff tests'
+
+. ./test-lib.sh
+
+# Note that because of git-branch-diff's heuristics, test_commit does more
+# harm than good.  We need some real history.
+
+test_expect_success 'setup' '
+   git fast-import < "$TEST_DIRECTORY"/t7910/history.export
+'
+
+test_expect_success 'simple A..B A..C (unmodified)' '
+   git branch-diff --no-color master..topic master..unmodified >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  35b9b25 s/5/A/
+   2:  fccce22 = 2:  de345ab s/4/A/
+   3:  147e64e = 3:  9af6654 s/11/B/
+   4:  a63e992 = 4:  2901f77 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'simple B...C (unmodified)' '
+   git branch-diff --no-color topic...unmodified >actual &&
+   # same "expected" as above
+   test_cmp expected actual
+'
+
+test_expect_success 'simple A B C (unmodified)' '
+   git branch-diff --no-color master topic unmodified >actual &&
+   # same "expected" as above
+   test_cmp expected actual
+'
+
+test_expect_success 'trivial reordering' '
+   git branch-diff --no-color master topic reordered >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  aca177a s/5/A/
+   3:  147e64e = 2:  14ad629 s/11/B/
+   4:  a63e992 = 3:  ee58208 s/12/B/
+   2:  fccce22 = 4:  307b27a s/4/A/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'removed a commit' '
+   git branch-diff --no-color master topic removed >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  7657159 s/5/A/
+   2:  fccce22 < -:  --- s/4/A/
+   3:  147e64e = 2:  43d84d3 s/11/B/
+   4:  a63e992 = 3:  a740396 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'added a commit' '
+   git branch-diff --no-color master topic added >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  2716022 s/5/A/
+   2:  fccce22 = 2:  b62accd s/4/A/
+   -:  --- > 3:  df46cfa s/6/A/
+   3:  147e64e = 4:  3e64548 s/11/B/
+   4:  a63e992 = 5:  12b4063 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'new base, A B C' '
+   git branch-diff --no-color master topic rebased >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  cc9c443 s/5/A/
+   2:  fccce22 = 2:  c5d9641 s/4/A/
+   3:  147e64e = 3:  28cc2b6 s/11/B/
+   4:  a63e992 = 4:  5628ab7 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'new base, B...C' '
+   # this syntax includes the commits from master!
+   git branch-diff --no-color topic...rebased >actual &&
+   cat >expected <<-EOF &&
+   -:  --- > 1:  a31b12e unrelated
+   1:  4de457d = 2:  cc9c443 s/5/A/
+   2:  fccce22 = 3:  c5d9641 s/4/A/
+   3:  147e64e = 4:  28cc2b6 s/11/B/
+   4:  a63e992 = 5:  5628ab7 s/12/B/
+   EOF
+   test_cmp expected actual
+'
+
+test_expect_success 'changed commit' '
+   git branch-diff --no-color topic...changed >actual &&
+   cat >expected <<-EOF &&
+   1:  4de457d = 1:  a4b s/5/A/
+   2:  fccce22 = 2:  f51d370 s/4/A/
+   3:  147e64e ! 3:  0559556 s/11/B/
+   @@ -10,7 +10,7 @@
+ 9
+ 10
+-11
+   -+B
+   ++BB
+ 12
+ 13
+ 14
+   4:  a63e992 ! 4:  d966c5c s/12/B/
+   @@ -8,7 +8,7 @@
+@@
+ 9
+

[PATCH 07/18] branch-diff: indent the diffs just like tbdiff

2018-05-03 Thread Johannes Schindelin

The main information in the branch-diff view comes from the list of
matching and non-matching commits, the diffs are additional information.
Indenting them helps with the reading flow.

Signed-off-by: Johannes Schindelin 
---
 builtin/branch-diff.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
index 9dc581087bb..a4e602deb5d 100644
--- a/builtin/branch-diff.c
+++ b/builtin/branch-diff.c
@@ -272,6 +272,11 @@ static const char *short_oid(struct patch_util *util)
return find_unique_abbrev(>oid, DEFAULT_ABBREV);
 }
 
+static struct strbuf *output_prefix_cb(struct diff_options *opt, void *data)
+{
+   return data;
+}
+
 static struct diff_filespec *get_filespec(const char *name, const char *p)
 {
struct diff_filespec *spec = alloc_filespec(name);
@@ -350,6 +355,7 @@ static void output(struct string_list *a, struct 
string_list *b,
 int cmd_branch_diff(int argc, const char **argv, const char *prefix)
 {
struct diff_options diffopt = { 0 };
+   struct strbuf four_spaces = STRBUF_INIT;
double creation_weight = 0.6;
struct option options[] = {
OPT_SET_INT(0, "no-patches", _format,
@@ -368,6 +374,9 @@ int cmd_branch_diff(int argc, const char **argv, const char 
*prefix)
 
diff_setup();
diffopt.output_format = DIFF_FORMAT_PATCH;
+   diffopt.output_prefix = output_prefix_cb;
+   strbuf_addstr(_spaces, "");
+   diffopt.output_prefix_data = _spaces;
 
argc = parse_options(argc, argv, NULL, options,
builtin_branch_diff_usage, PARSE_OPT_KEEP_UNKNOWN);
-- 
2.17.0.395.g6a618d6010f.dirty

[PATCH 12/18] branch-diff: use color for the commit pairs

2018-05-03 Thread Johannes Schindelin

Arguably the most important part of branch-diff's output is the list of
commits in the two branches, together with their relationships.

For that reason, tbdiff introduced color-coding that is pretty
intuitive, especially for unchanged patches (all dim yellow, like the
first line in `git show`'s output) vs modified patches (old commit is
red, new commit is green). Let's imitate that color scheme.

While at it, also copy tbdiff's change of the fragment color to magenta.

Signed-off-by: Johannes Schindelin 
---
 builtin/branch-diff.c | 49 +++
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/builtin/branch-diff.c b/builtin/branch-diff.c
index 7625da09e6f..e505b696d11 100644
--- a/builtin/branch-diff.c
+++ b/builtin/branch-diff.c
@@ -270,13 +270,19 @@ static int get_correspondences(struct string_list *a, 
struct string_list *b,
return res;
 }
 
-static void output_pair_header(struct strbuf *buf,
+static void output_pair_header(struct diff_options *diffopt, struct strbuf 
*buf,
   int i, struct patch_util *a_util,
   int j, struct patch_util *b_util)
 {
static char *dashes;
struct object_id *oid = a_util ? _util->oid : _util->oid;
struct commit *commit;
+   char status;
+   const char *color_reset = diff_get_color_opt(diffopt, DIFF_RESET);
+   const char *color_old = diff_get_color_opt(diffopt, DIFF_FILE_OLD);
+   const char *color_new = diff_get_color_opt(diffopt, DIFF_FILE_NEW);
+   const char *color_commit = diff_get_color_opt(diffopt, DIFF_COMMIT);
+   const char *color;
 
if (!dashes) {
char *p;
@@ -286,21 +292,33 @@ static void output_pair_header(struct strbuf *buf,
*p = '-';
}
 
+   if (j < 0) {
+   color = color_old;
+   status = '<';
+   } else if (i < 0) {
+   color = color_new;
+   status = '>';
+   } else if (strcmp(a_util->patch, b_util->patch)) {
+   color = color_commit;
+   status = '!';
+   } else {
+   color = color_commit;
+   status = '=';
+   }
+
strbuf_reset(buf);
+   strbuf_addstr(buf, status == '!' ? color_old : color);
if (i < 0)
strbuf_addf(buf, "-:  %s ", dashes);
else
strbuf_addf(buf, "%d:  %s ", i + 1,
find_unique_abbrev(_util->oid, DEFAULT_ABBREV));
 
-   if (i < 0)
-   strbuf_addch(buf, '>');
-   else if (j < 0)
-   strbuf_addch(buf, '<');
-   else if (strcmp(a_util->patch, b_util->patch))
-   strbuf_addch(buf, '!');
-   else
-   strbuf_addch(buf, '=');
+   if (status == '!')
+   strbuf_addf(buf, "%s%s", color_reset, color);
+   strbuf_addch(buf, status);
+   if (status == '!')
+   strbuf_addf(buf, "%s%s", color_reset, color_new);
 
if (j < 0)
strbuf_addf(buf, " -:  %s", dashes);
@@ -313,12 +331,15 @@ static void output_pair_header(struct strbuf *buf,
const char *commit_buffer = get_commit_buffer(commit, NULL);
const char *subject;
 
+   if (status == '!')
+   strbuf_addf(buf, "%s%s", color_reset, color);
+
find_commit_subject(commit_buffer, );
strbuf_addch(buf, ' ');
format_subject(buf, subject, " ");
unuse_commit_buffer(commit, commit_buffer);
}
-   strbuf_addch(buf, '\n');
+   strbuf_addf(buf, "%s\n", color_reset);
 
fwrite(buf->buf, buf->len, 1, stdout);
 }
@@ -381,21 +402,21 @@ static void output(struct string_list *a, struct 
string_list *b,
 
/* Show unmatched LHS commit whose predecessors were shown. */
if (i < a->nr && a_util->matching < 0) {
-   output_pair_header(, i, a_util, -1, NULL);
+   output_pair_header(diffopt, , i, a_util, -1, NULL);
i++;
continue;
}
 
/* Show unmatched RHS commits. */
while (j < b->nr && b_util->matching < 0) {
-   output_pair_header(, -1, NULL, j, b_util);
+   output_pair_header(diffopt, , -1, NULL, j, b_util);
b_util = ++j < b->nr ? b->items[j].util : NULL;
}
 
/* Show matching LHS/RHS pair. */
if (j < b->nr) {
a_util = a->items[b_util->matching].util;
-   output_pair_header(,
+   output_pair_header(diffopt, ,
   b_util->matching, a_util, j, b_util);
if (!(diffopt->output_format & DIFF_FORMAT_NO_OUTPUT))

1 2 >

1 - 100 of 126 matches

Mail list logo