Re: [PATCHv5 00/17] Diff machine: highlight moved lines.

2017-05-24 Thread Junio C Hamano
I was trying to see how this is useful in code moving change, with
this command line:

$ git -c color.moved diff js/blame-lib~6 js/blame-lib blame.c blame.h 
builtin/blame.c

Some random observations:

 * You do not seem to have any command line option yet.  I guess
   that is OK while the series is still in RFC state, but when we
   are ready to seriously start considering this for 'next', we'd
   need something like --color-moved that can be used across "diff",
   "log -p" and "show".

 * As configuration variable names go, "color.moved" is probably in
   a wrong section.  Perhaps "diff.colorMoved" or something?

 * The fact that I am using

 [diff "color"] 
old = red reverse
whitespace = blue reverse

   on a "black ink on white paper" terminal might have an effect on
   this, but lines printed in either bold green and on green
   background (i.e. not new ones but are the ones moved from
   elsewhere) stood out a lot more prominently than lines printed in
   green (i.e. truly new additions), and I felt that this is totally
   backwards from what I needed for this exercise.  That is, while
   reviewing a code moving change, I want to be able to concentrate
   primarily of three things:

   - What are the new lines that are *not* moved from elsewhere?
 Did the submitter try to sneak in unrelated changes?

   - What are the changes that are truly lost, not moved to
 elsewhere?

   - Do the lines moved from elsewhere form a coherent whole?  Where
 is the boundary between blocks of text that are copied?  Do
 adjacent two blocks of moved text come from the same old place
 next to each other?

   Using colors that stand out more prominently than for the regular
   new/old lines is counter-productive for all of these, especially
   for the first two purposes.  I may suggest using cyan (or any
   color that is very close to the background) so that the presense
   of moved lines are merely felt without being distracting.  IOW,
   while reviewing code moving patch, moved parts want to be dimmed,
   not highlighted.



Re: [PATCH 00/29] Add blame to libgit

2017-05-24 Thread Junio C Hamano
Jeff Smith  writes:

> Rather than duplicate large portions of builtin/blame.c in cgit, it
> would be better to shift its core functionality into libgit.a.  The
> functionality left in builtin/blame.c mostly relates to terminal
> presentation.

This was a lot more pleasant to review compared to the last round.
I made a few small comments here and there, but I do not think there
was anything major that needs to be corrected by another round.

Thanks.


Re: [PATCH 28/29] blame: move scoreboard setup to libgit

2017-05-24 Thread Junio C Hamano
Jeff Smith  writes:

> Signed-off-by: Jeff Smith 
> ---
>  blame.c | 279 
> +++-
>  blame.h |  10 +-
>  builtin/blame.c | 276 ---
>  3 files changed, 281 insertions(+), 284 deletions(-)
>
> ...
> +static struct commit *find_single_initial(struct rev_info *revs,
> +   const char **name_p)
> +{
> + int i;
> + struct commit *found = NULL;
> + const char *name = NULL;
> +
> + /*
> +  * There must be one and only one negative commit, and it must be
> +  * the boundary.
> +  */
> + for (i = 0; i < revs->pending.nr; i++) {
> + struct object *obj = revs->pending.objects[i].item;
> + if (!(obj->flags & UNINTERESTING))
> + continue;
> + obj = deref_tag(obj, NULL, 0);
> + if (obj->type != OBJ_COMMIT)
> + die("Non commit %s?", revs->pending.objects[i].name);
> + if (found)
> + die("More than one commit to dig up from, %s and %s?",
> + revs->pending.objects[i].name, name);
> + found = (struct commit *) obj;
> + name = revs->pending.objects[i].name;
> + }
> +
> + if (!name)
> + found = dwim_reverse_initial(revs, &name);
> + if (!name)
> + die("No commit to dig up from?");
> +
> + if (name_p)
> + *name_p = name;
> + return found;
> +}
> +...
> -static struct commit *find_single_initial(struct rev_info *revs,
> -   const char **name_p)
> -{
> - int i;
> - const char *final_commit_name = NULL;
> - struct commit *found = NULL;
> -
> -...
> -
> - if (!final_commit_name)
> - found = dwim_reverse_initial(revs, &final_commit_name);
> - if (!final_commit_name)
> - die("No commit to dig up from?");
> -
> - if (name_p)
> - *name_p = final_commit_name;
> - return found;
> -}


In a patch whose primary purpose is to move code between files,
making what used to be public to static and vice versa is an
integral part of moving code.  That is why we want to see a patch
organized in such a way that comparing the lines that are lost from
builtin/blame.c and the lines that are added to blame.[ch] is made
easy.

And from that point of view, it was somewhat irritating to find this
kind of meaningless change.  If you didn't like the name of the
variable "final-commit-name", that shold have been renamed while the
code was still in builtin/blame.c

The end result looks OK anyway (I've checked 29/29 as well).

Thanks.




Re: What's cooking in git.git (May 2017, #07; Tue, 23)

2017-05-24 Thread Stefan Beller
On Wed, May 24, 2017 at 8:42 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason  writes:
>
>>> The tests added by grep rely on the old content of
>>> test 2 'grep correctly finds patterns in a submodule'.
>>
>> Sorry about the fallout.
>>
>>> The (whitespace broken) diff below fixes it.
>
> Ah, then, this was an example of maintainer not doing a good job.
> When I see a topic that pass its own test that fails when merged to
> 'pu', I usually try to see where it goes wrong myself and come up
> with a fix in an evil merge, but this time I didn't have enough time
> to do so before sending out the "What's cooking" report.
>
> Here is what I taught my merge-fix machinery to apply after
> mechanical merge of the two topics.

Please evict (or stop paying attention to)
sb/submodule-blanket-recursive as it is fundamentally broken.

I hoped to resend a fixed version today, but it took me longer than expected
to figure out the config machinery playing with submodules.

The diff below looks correct to me.

Thanks,
Stefan
>
>  t/t7814-grep-recurse-submodules.sh | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/t/t7814-grep-recurse-submodules.sh 
> b/t/t7814-grep-recurse-submodules.sh
> index 14eeb54b4b..7184113b9b 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -36,18 +36,18 @@ test_expect_success 'grep correctly finds patterns in a 
> submodule' '
>  test_expect_success 'grep finds patterns in a submodule via config' '
> test_config submodule.recurse true &&
> # expect from previous test
> -   git grep -e "bar" >actual &&
> +   git grep -e "(3|4)" >actual &&
> test_cmp expect actual
>  '
>
>  test_expect_success 'grep --no-recurse-submodules overrides config' '
> test_config submodule.recurse true &&
> cat >expect <<-\EOF &&
> -   a:foobar
> -   b/b:bar
> +   a:(1|2)d(3|4)
> +   b/b:(3|4)
> EOF
>
> -   git grep -e "bar" --no-recurse-submodules >actual &&
> +   git grep -e "(3|4)" --no-recurse-submodules >actual &&
> test_cmp expect actual
>  '
>
> --
> 2.13.0-491-g71cfeddc25
>


Re: [PATCHv5 17/17] diff.c: color moved lines differently

2017-05-24 Thread Stefan Beller
On Wed, May 24, 2017 at 7:27 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> When a patch consists mostly of moving blocks of code around, it can
>> be quite tedious to ensure that the blocks are moved verbatim, and not
>> ...
>> cases. This leads to another thought: We could pass on '--color-moved' to
>> submodules such that they color up moved lines for themselves. If we'd do
>> so only line moves within a repository boundary are marked up.
>>
>> Helped-by: Jonathan Tan 
>> Signed-off-by: Stefan Beller 
>> Signed-off-by: Junio C Hamano 
>>
>> # Conflicts:
>> # diff.c
>> Signed-off-by: Junio C Hamano 
>> Signed-off-by: Stefan Beller 
>> ---
>
> Hmph, what are these final lines about?

See the explanation in the patch 16/17.
My guess is that one of us (me) was careless again.

The commented lines are easily produced by git-gui, that
I use. I think it just takes the commit message from the underlying
git-core. But unlike git-core, it doesn't strip off commented lines
as there is no extra information in comments presented.
I wonder how you are the first signoff after the conflict markers
though, as that would hint that you signed off a commit message
with the commented conflict lines first, before I had them.

Puzzeled, will fix in a reroll.

Thanks,
Stefan


Re: [PATCHv5 16/17] diff: buffer all output if asked to

2017-05-24 Thread Stefan Beller
On Wed, May 24, 2017 at 7:26 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> Introduce a new option 'use_buffer' in the struct diff_options which
>> controls whether all output is buffered up until all output is available.
>> ...
>> Unconditionally enable output via buffer in this patch as it yields
>> a great opportunity for testing, i.e. all the diff tests from the
>> test suite pass without having reordering issues (i.e. only parts
>> of the output got buffered, and we forgot to buffer other parts).
>> The test suite passes, which gives confidence that we converted all
>> functions to use emit_line for output.
>>
>> Signed-off-by: Stefan Beller 
>> Signed-off-by: Junio C Hamano 
>
> Oh, did I?

Yes, this is essentially the v4 with small indentation fixes, so I
assumed your signoff is still valid.

Which leads to explaining my workflow (which I think we
discussed that half a year ago with Dscho in a longer thread).

As soon as you apply a series I take that series and work off
that series, because you explained you would do smaller
fixups occasionally.

Patches that change drastically, I strip off your signoff and
pretend like I created them from scratch, but for those which
I barely touch (if at all), I do not remove your signoff, as
I'd be just passing them along again.

Maybe I have to rethink that strategy.

>
>> ---
>>  diff.c | 155 
>> ++---
>>  diff.h |  41 +
>>  2 files changed, 161 insertions(+), 35 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 514c5facd7..8e06206881 100644
>> --- a/diff.c
>> +++ b/diff.c
>> ...
>> @@ -2579,6 +2628,13 @@ static void builtin_diff(const char *name_a,
>>   xecfg.ctxlen = strtoul(v, NULL, 10);
>>   if (o->word_diff)
>>   init_diff_words_data(&ecbdata, o, one, two);
>> + if (o->use_buffer) {
>> + struct diff_line e = diff_line_INIT;
>
> This ...
>
>> + e.state = DIFF_LINE_RELOAD_WS_RULE;
>> ...
>> +#define diff_line_INIT {NULL, NULL, NULL, 0, 0, 0}
>
> ... and this should be in all caps.   We do not say
>
> struct strbuf buf = strbuf_INIT;
>
> and we should do the same for this new thing.

Yes. Totally agree. That is fallout from a mechanical
replace all s/buffered_patch_line/diff_line/ and the case
sensitivity got lost.

Will fix again.

Stefan


Re: [PATCH 24/29] blame: move core structures to header

2017-05-24 Thread Junio C Hamano
Jeff Smith  writes:

> The origin, entry, and scoreboard structures are core to the blame
> interface and need to be exposed for blame functionality.
>
> Signed-off-by: Jeff Smith 
> ---

Looks good.  

Thanks to "prepare everything in place before bulk movement"
approach this round takes, unlike the previous one, reviewing this
is just the matter of running

$ git blame -C -b HEAD^..HEAD -- blame.h

after applying this patch.  Anything that is shown as "introduced"
by HEAD needs careful examination; everything else we already know
are good because we know it came from existing parts.

Thanks.


Re: [PATCH 23/29] blame: create entry prepend function

2017-05-24 Thread Junio C Hamano
Jeff Smith  writes:

> Create function that populates a blame_entry and prepends it to a list.
>
> Signed-off-by: Jeff Smith 
> ---
>  builtin/blame.c | 25 +++--
>  1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index fd41551..29771b7 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -2643,6 +2643,20 @@ void setup_scoreboard(struct blame_scoreboard *sb, 
> const char *path, struct blam
>   *orig = o;
>  }
>  
> +struct blame_entry *blame_entry_prepend(struct blame_entry *head,
> + long start, long end,
> + struct blame_origin *o)
> +{
> + struct blame_entry *new_head = xcalloc(1, sizeof(struct blame_entry));

We have a slight tendency to favour sizeof(*pointer_to_thing) over
sizeof(type_of_thing), which is why the original is written that
way.  The tendency is so slight that if this were a new code, I do
not think we mind it written either way, but a patch whose goal is
to move existing code around does not have a justification to change
between the two.

> + new_head->lno = start;
> + new_head->num_lines = end - start;
> + new_head->suspect = o;
> + new_head->s_lno = start;
> + new_head->next = head;

On the other hand, naming the variables that receive start/end
anything but start/end was a stupidity in the original code (I can
badmouth the original because it is all my code ;-).  Thanks for
fixing the sloppy naming.

> + blame_origin_incref(o);
> + return new_head;
> +}
> +
>  int cmd_blame(int argc, const char **argv, const char *prefix)
>  {
>   struct rev_info revs;
> @@ -2885,16 +2899,7 @@ int cmd_blame(int argc, const char **argv, const char 
> *prefix)
>  
>   for (range_i = ranges.nr; range_i > 0; --range_i) {
>   const struct range *r = &ranges.ranges[range_i - 1];
> - long bottom = r->start;
> - long top = r->end;
> - struct blame_entry *next = ent;
> - ent = xcalloc(1, sizeof(*ent));
> - ent->lno = bottom;
> - ent->num_lines = top - bottom;
> - ent->suspect = o;
> - ent->s_lno = bottom;
> - ent->next = next;
> - blame_origin_incref(o);
> + ent = blame_entry_prepend(ent, r->start, r->end, o);
>   }
>  
>   o->suspects = ent;


Re: [PATCH 22/29] blame: create scoreboard setup function

2017-05-24 Thread Junio C Hamano
Jeff Smith  writes:

> Create function that completes setting up blame_scoreboard structure.
>
> Signed-off-by: Jeff Smith 
> ---
>  builtin/blame.c | 190 
> ++--
>  1 file changed, 101 insertions(+), 89 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index f839571..fd41551 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> ...
> @@ -2759,92 +2855,8 @@ int cmd_blame(int argc, const char **argv, const char 
> *prefix)
> ...
> - sb.num_read_blob++;
> - lno = prepare_lines(&sb);
> + setup_scoreboard(&sb, path, &o);
> + lno = sb.num_lines;
>  
>   if (lno && !range_list.nr)
>   string_list_append(&range_list, "1");

After this change, nobody uses the value returned from
prepare_ilnes(), but the function is still returning an "int".
Perhaps change it to return nothing?


Re: [PATCH 20/29] blame: rework methods that determine 'final' commit

2017-05-24 Thread Junio C Hamano
Jeff Smith  writes:

> Either prepare_initial or prepare_final is used to determine which
> commit is marked as 'final'. Call the underlying methods directly to
> make this more clear.
>
> Signed-off-by: Jeff Smith 
> ---
>  builtin/blame.c | 49 +++--
>  1 file changed, 23 insertions(+), 26 deletions(-)

I do not necessarily agree with the log message in that this change
makes the contrast between prepare_{initial,final} more clear, but I
like it nevertheless because it makes the interface into the helpers
that prepare "initial" and "final" very similar.

The lifetime rule of final_commit_name used to be that these helpers
make a copy to be owned by the caller, and it seems that this change
makes all codepaths borrow it from entries in the revs.pending list.
I think that conversion is also done correctly in this patch.

So far, nicely done.


Re: [PATCH 18/29] blame: move progess updates to a scoreboard callback

2017-05-24 Thread Junio C Hamano
Jeff Smith  writes:

> Subject: Re: [PATCH 18/29] blame: move progess updates to a scoreboard 
> callback

s/progess/progress/ (I can do this on my end).

> Allow the interface user to decide how to handle a progress update.
>
> Signed-off-by: Jeff Smith 
> ---
>  builtin/blame.c | 27 +--
>  1 file changed, 17 insertions(+), 10 deletions(-)
>
> -static void found_guilty_entry(struct blame_entry *ent,
> -struct progress_info *pi)
> +static void found_guilty_entry(struct blame_entry *ent, void *data)
>  {
> + struct progress_info *pi = (struct progress_info *)data;
> +
>   if (incremental) {
>   struct blame_origin *suspect = ent->suspect;
>  

This hunk is interesting.  The function not only does the "progress"
eye candy, but it actually handles the "--incremental" option.
Anybody who wants to do something similar using the libified blame
needs to implement it in their found_guilty callback like this one
does, which is probably a good division of labor between the blame
lib and the front-end (which builtin/blame.c is one instance of).

> @@ -1754,11 +1758,6 @@ static void assign_blame(struct blame_scoreboard *sb, 
> int opt)
>  {
>   struct rev_info *revs = sb->revs;
>   struct commit *commit = prio_queue_get(&sb->commits);
> - struct progress_info pi = { NULL, 0 };
> -
> - if (show_progress)
> - pi.progress = start_progress_delay(_("Blaming lines"),
> -sb->num_lines, 50, 1);

And this piece (and matching "stop progress" at the end of the
function) is now the responsibility of the caller, which makes
sense.

>  
> + sb.found_guilty_entry = &found_guilty_entry;
> + sb.found_guilty_entry_data = π
> + if (show_progress)
> + pi.progress = start_progress_delay(_("Blaming lines"),
> +sb.num_lines, 50, 1);
> +
>   assign_blame(&sb, opt);
>  
> + stop_progress(&pi.progress);
> +
>   if (!incremental)
>   setup_pager();


Very nicely done so far.


Re: What's cooking in git.git (May 2017, #07; Tue, 23)

2017-05-24 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>> The tests added by grep rely on the old content of
>> test 2 'grep correctly finds patterns in a submodule'.
>
> Sorry about the fallout.
>
>> The (whitespace broken) diff below fixes it.

Ah, then, this was an example of maintainer not doing a good job.
When I see a topic that pass its own test that fails when merged to
'pu', I usually try to see where it goes wrong myself and come up
with a fix in an evil merge, but this time I didn't have enough time
to do so before sending out the "What's cooking" report.

Here is what I taught my merge-fix machinery to apply after
mechanical merge of the two topics.

 t/t7814-grep-recurse-submodules.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 14eeb54b4b..7184113b9b 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -36,18 +36,18 @@ test_expect_success 'grep correctly finds patterns in a 
submodule' '
 test_expect_success 'grep finds patterns in a submodule via config' '
test_config submodule.recurse true &&
# expect from previous test
-   git grep -e "bar" >actual &&
+   git grep -e "(3|4)" >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep --no-recurse-submodules overrides config' '
test_config submodule.recurse true &&
cat >expect <<-\EOF &&
-   a:foobar
-   b/b:bar
+   a:(1|2)d(3|4)
+   b/b:(3|4)
EOF
 
-   git grep -e "bar" --no-recurse-submodules >actual &&
+   git grep -e "(3|4)" --no-recurse-submodules >actual &&
test_cmp expect actual
 '
 
-- 
2.13.0-491-g71cfeddc25



Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch

2017-05-24 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>> Unfortunately, it can't, because the ref doesn't exist:
>>
>>   $ git ls-remote git://github.com/passcod/UPPERCASE-NPM.git
>>   efc7dbfd6ca155d5d19ce67eb98603896062f35a   refs/heads/MASTER
>>   e60ea8e6ec45ec45ff44ac8939cb4105b16477da   refs/pull/1/head
>>   f35a73dcb151d336dc3d30c9a2c7423ecdb7bd1c   refs/pull/2/head
>>   0d9b3a1268ff39350e04a7183af0add912b686e6   refs/tags/V1.0.0
>>   efc7dbfd6ca155d5d19ce67eb98603896062f35a   refs/tags/V1.0.1
>>
>> There is no HEAD line at all, so we have no information about it on the
>> client side.  Likewise, if you run with GIT_TRACE_PACKET=1, you'll see
>> that the capabilities line does not include a symref marker either.
>>
>> So if we wanted to improve this, I think the first step would be for the
>> server to start sending symref lines for HEAD, even when it does not
>> resolve to anything.
>
> Yup, noticed the same and I agree with your conclusion.

We probably should make head_ref_namespaced() to take the
resolve_flags that is passed down thru refs_read_ref_full() down to
refs_resolve_ref_unsafe().  For the purpose of the first call to it
in upload-pack.c to call back find_symref(), we do not need and want
to say RESOLVE_REF_READING (which requires a symref to be pointing
at an existing ref).  I suspect all the other calls (there are 2
other in unload-pack.c and yet another in http-backend.c) to the
function should keep passing RESOLVE_REF_READING, as they do want to
omit a symbolic ref that points at an unborn branch.



Re: Passing revs to git-bundle-create via stdin

2017-05-24 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>> I think what's happening is that git-bundle actually runs _two_
>> traversals using the command-line arguments. ...
>> ... It was just a way of confirming my
>> guess about the double-read.
>>
>> The real solutions I can think of are:
>>
>>   1. Teach git-bundle not to require the double-read. I'm not sure why
>>  it's written the way it is, but presumably it would be tricky to
>>  undo it (or we would have just written it the other way in the
>>  first place!)
>
> If I remember correctly, the reason why it does the double-read is
> because it wants to cope with things like "--since".  There is no
> explicit bottom of the DAG specified on the command line, and the
> first one (without "--objects") is done to find "prerequisites" that
> are written in the header.
>
> Then the packdata is generated, which does another traversal (this
> time with "--objects" option).
>
> So perhaps the right way to fix it is to keep the first traversal
> as-is, but update the second one (I think write-bundle-refs is the
> culprit) so that it does not use the user-supplied command line
> as-is; instead it should use the positive end of the history from
> the command line with the negative end set to these "prerequisites"
> commits.
>
> I said "command line" in the above, but read that as "end user
> input"; the list of rev-list command line arguments given from the
> standard input is exactly the same deal.

Actually, after thinking a bit more about this, I think the bundle
we currently generate may be a bit less efficient than ideal when
options like --since or --max-count are used.

Imagine a history of this shape (child grows on the right hand side):

 A---D-E-G---H
  \ /
   B---C---F

The labels on commits also denote their timestamps in relative
terms, i.e. A has the oldest timestamp, D, even though it is a
parent of B, has newer timestamp than B has, etc.

Now, imagine running "git log --since=$time H" with time set to the
timestamp commit D has.  We traversal from H, following parent chain,
and stop when we see a commit with timestamp older than $time.  So,
we'd enumerate H G F E D; C and A are "boundaries"---we looked at,
but we decided not to include in the result.

A bundle file format records "By using this bundle, you can advance
your history up to this commit", which can be seen by running "git
ls-remote" on the bundle file.  It also records "However, this
bundle does not record the entire history; you need to have the
complete history behind these commits".  These are called
"prerequisites", and can be checked with "git bundle verify".
And then of course it has an actual packfile (which is thin).

So putting all together,

git bundle create mybundle --since=$time H

would record H as its head, and also C and A as prerequistes.

The "double reading of --stdin" we have been discussing is there
because we run two traversals; the first one is to find the
prerequisites (i.e. C and A in the above example).  The second one
uses the same rev-list arguments (i.e. "--since=$time H") to
generate pack, so it will include D.

As the recipient of a bundle is required to have complete history
behind both A and C, however, the packfile generated with the
current proceess is inefficient--it includes D but it does not need
to.

If we change the argument to rev-list used in the actual packfile
generation, and instead use the boundary we learned during the first
traversal (i.e. A and C in the above example) and the tip of the
history being recorded in the resulting bundle (i.e. H), then we'd
run "git log ^A ^C H", which would only walk "H G F E".

Which would be smaller (it no longer includes D), and the recipient
who has A and C can still apply.


Re: [PATCHv5 17/17] diff.c: color moved lines differently

2017-05-24 Thread Junio C Hamano
Stefan Beller  writes:

> When a patch consists mostly of moving blocks of code around, it can
> be quite tedious to ensure that the blocks are moved verbatim, and not
> ...
> cases. This leads to another thought: We could pass on '--color-moved' to
> submodules such that they color up moved lines for themselves. If we'd do
> so only line moves within a repository boundary are marked up.
>
> Helped-by: Jonathan Tan 
> Signed-off-by: Stefan Beller 
> Signed-off-by: Junio C Hamano 
>
> # Conflicts:
> # diff.c
> Signed-off-by: Junio C Hamano 
> Signed-off-by: Stefan Beller 
> ---

Hmph, what are these final lines about?


Re: [PATCHv5 16/17] diff: buffer all output if asked to

2017-05-24 Thread Junio C Hamano
Stefan Beller  writes:

> Introduce a new option 'use_buffer' in the struct diff_options which
> controls whether all output is buffered up until all output is available.
> ...
> Unconditionally enable output via buffer in this patch as it yields
> a great opportunity for testing, i.e. all the diff tests from the
> test suite pass without having reordering issues (i.e. only parts
> of the output got buffered, and we forgot to buffer other parts).
> The test suite passes, which gives confidence that we converted all
> functions to use emit_line for output.
>
> Signed-off-by: Stefan Beller 
> Signed-off-by: Junio C Hamano 

Oh, did I?

> ---
>  diff.c | 155 
> ++---
>  diff.h |  41 +
>  2 files changed, 161 insertions(+), 35 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 514c5facd7..8e06206881 100644
> --- a/diff.c
> +++ b/diff.c
> ...
> @@ -2579,6 +2628,13 @@ static void builtin_diff(const char *name_a,
>   xecfg.ctxlen = strtoul(v, NULL, 10);
>   if (o->word_diff)
>   init_diff_words_data(&ecbdata, o, one, two);
> + if (o->use_buffer) {
> + struct diff_line e = diff_line_INIT;

This ...

> + e.state = DIFF_LINE_RELOAD_WS_RULE;
> ...
> +#define diff_line_INIT {NULL, NULL, NULL, 0, 0, 0}

... and this should be in all caps.   We do not say

struct strbuf buf = strbuf_INIT;

and we should do the same for this new thing.



Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch

2017-05-24 Thread Junio C Hamano
Jeff King  writes:

> Unfortunately, it can't, because the ref doesn't exist:
>
>   $ git ls-remote git://github.com/passcod/UPPERCASE-NPM.git
>   efc7dbfd6ca155d5d19ce67eb98603896062f35arefs/heads/MASTER
>   e60ea8e6ec45ec45ff44ac8939cb4105b16477darefs/pull/1/head
>   f35a73dcb151d336dc3d30c9a2c7423ecdb7bd1crefs/pull/2/head
>   0d9b3a1268ff39350e04a7183af0add912b686e6refs/tags/V1.0.0
>   efc7dbfd6ca155d5d19ce67eb98603896062f35arefs/tags/V1.0.1
>
> There is no HEAD line at all, so we have no information about it on the
> client side.  Likewise, if you run with GIT_TRACE_PACKET=1, you'll see
> that the capabilities line does not include a symref marker either.
>
> So if we wanted to improve this, I think the first step would be for the
> server to start sending symref lines for HEAD, even when it does not
> resolve to anything.

Yup, noticed the same and I agree with your conclusion.


Re: Sama/Winbind AD Computer Accounts Moved

2017-05-24 Thread Stefan Beller
On Wed, May 24, 2017 at 2:46 PM, Thompson, Matt  wrote:
>  I realize this is a Samba support list but I'm curious to know if someone 
> may be familiar enough to render a guess.

I think the primary purpose of git@ is different than samba support. ;)
Wrong mailing list?


Sama/Winbind AD Computer Accounts Moved

2017-05-24 Thread Thompson, Matt
Hello, 

We've encountered an issue not previously seen in our environment.  We join our 
Linux machines (most are Oracle Enterprise Linux 6.x or 7.x) to an Active 
Directory domain.  We do this by using Samba/Winbind.  When joining a Linux 
host, we create the computer account in AD ahead of joining the computer to the 
domain.  This ensures the computer account is created in the sub-OU we need it 
in.  Historically, this has worked without issue. 

We recently noticed that this behavior has changed.  Now, when we join a Linux 
host to the domain after creating its computer account, it is moved to the 
default computers OU in the domain.  This is not where we want it to be 
located.  This does not happen when Windows hosts are joined. 

When we run 'net ads join' with debug output, the following line is seen:

"The machine account was moved into the specified OU."

A Google search indicated this is coming from Samba code. 

The version of Samba we are using is 4.4.4-12.  The samba-winbind version is 
the same. 

Was functionality to move the account to the default computers OU added or has 
it historically been in Samba?  If it has, is anyone aware of what 
functionality in AD could have changed to produce this behavior?  I realize 
this is a Samba support list but I'm curious to know if someone may be familiar 
enough to render a guess.  I am not an AD administrator and am at a loss. 

Thank you,

Matt Thompson
Assistant Managing Director
TOSM Enterprise Systems
Texas Tech University System
(806) 834-3646



[PATCHv5 16/17] diff: buffer all output if asked to

2017-05-24 Thread Stefan Beller
Introduce a new option 'use_buffer' in the struct diff_options which
controls whether all output is buffered up until all output is available.

We'll have a new struct 'diff_line' in diff.h which will be used to buffer
each line.  The diff_line will duplicate the memory of the line to buffer
as that is easiest to reason about for now. In a future patch we may want
to decrease the memory usage by not duplicating all output for buffering
but rather we may want to store offsets into the file or in case of hunk
descriptions such as the similarity score, we could just store the
relevant number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.

emit_line factors out the emission part into emit_line_emission,
and depending on the diff_options->use_buffer the emission
will be performed directly when calling emit_line or after the
whole process is done, i.e. by buffering we have add the possibility
for a second pass over the whole output before doing the actual
output.

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing, i.e. all the diff tests from the
test suite pass without having reordering issues (i.e. only parts
of the output got buffered, and we forgot to buffer other parts).
The test suite passes, which gives confidence that we converted all
functions to use emit_line for output.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 155 ++---
 diff.h |  41 +
 2 files changed, 161 insertions(+), 35 deletions(-)

diff --git a/diff.c b/diff.c
index 514c5facd7..8e06206881 100644
--- a/diff.c
+++ b/diff.c
@@ -516,54 +516,85 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
*mf2,
ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-
-static void emit_line(struct diff_options *o,
- const char *set, const char *reset,
- int add_line_prefix, int markup_ws,
- int sign, const char *line, int len)
+static void emit_diff_line(struct diff_options *o,
+  struct diff_line *e)
 {
const char *ws;
int has_trailing_newline, has_trailing_carriage_return;
+   int len = e->len;
FILE *file = o->file;
 
-   if (add_line_prefix)
+   if (e->add_line_prefix)
fputs(diff_line_prefix(o), file);
 
-   if (markup_ws) {
+   switch (e->state) {
+   case DIFF_LINE_WS:
ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+   if (e->set)
+   fputs(e->set, file);
+   if (e->sign)
+   fputc(e->sign, file);
+   if (e->reset)
+   fputs(e->reset, file);
+   ws_check_emit(e->line, e->len, o->ws_rule,
+ file, e->set, e->reset, ws);
+   return;
+   case DIFF_LINE_ASIS:
+   has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
+   if (has_trailing_newline)
+   len--;
+   has_trailing_carriage_return = (len > 0 && e->line[len-1] == 
'\r');
+   if (has_trailing_carriage_return)
+   len--;
 
-   if (set)
-   fputs(set, file);
-   if (sign)
-   fputc(sign, file);
-   if (reset)
-   fputs(reset, file);
-   ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
-   ws_check_emit(line, len, o->ws_rule,
- file, set, reset, ws);
+   if (len || e->sign) {
+   if (e->set)
+   fputs(e->set, file);
+   if (e->sign)
+   fputc(e->sign, file);
+   fwrite(e->line, len, 1, file);
+   if (e->reset)
+   fputs(e->reset, file);
+   }
+   if (has_trailing_carriage_return)
+   fputc('\r', file);
+   if (has_trailing_newline)
+   fputc('\n', file);
+   return;
+   case DIFF_LINE_RELOAD_WS_RULE:
+   o->ws_rule = whitespace_rule(e->line);
return;
+   default:
+   die("BUG: malformatted buffered patch line: '%d'", e->state);
}
+}
 
-   has_trailing_newline = (len > 0 && line[len-1] == '\n');
-   if (has_trailing_

[PATCHv5 13/17] diff.c: convert diff_flush to use emit_line_*

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers diff_flush.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 8317824963..8ebe673331 100644
--- a/diff.c
+++ b/diff.c
@@ -4872,7 +4872,9 @@ void diff_flush(struct diff_options *options)
emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
if (options->stat_sep) {
/* attach patch instead of inline */
-   fputs(options->stat_sep, options->file);
+   emit_line(options, NULL, NULL, 0, 0,
+ options->stat_sep,
+ strlen(options->stat_sep));
}
}
 
-- 
2.13.0.18.g7d86cc8ba0



[PATCHv5 06/17] diff.c: convert builtin_diff to use emit_line_*

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers builtin_diff.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 8186289734..4fa976d43c 100644
--- a/diff.c
+++ b/diff.c
@@ -1289,8 +1289,9 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
o->found_changes = 1;
 
if (ecbdata->header) {
-   fprintf(o->file, "%s", ecbdata->header->buf);
-   strbuf_reset(ecbdata->header);
+   emit_line(o, NULL, NULL, 0, 0,
+ ecbdata->header->buf, ecbdata->header->len);
+   strbuf_release(ecbdata->header);
ecbdata->header = NULL;
}
 
@@ -2435,7 +2436,7 @@ static void builtin_diff(const char *name_a,
if (complete_rewrite &&
(textconv_one || !diff_filespec_is_binary(one)) &&
(textconv_two || !diff_filespec_is_binary(two))) {
-   fprintf(o->file, "%s", header.buf);
+   emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
strbuf_reset(&header);
emit_rewrite_diff(name_a, name_b, one, two,
textconv_one, textconv_two, o);
@@ -2445,7 +2446,7 @@ static void builtin_diff(const char *name_a,
}
 
if (o->irreversible_delete && lbl[1][0] == '/') {
-   fprintf(o->file, "%s", header.buf);
+   emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
strbuf_reset(&header);
goto free_ab_and_return;
} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2456,12 +2457,15 @@ static void builtin_diff(const char *name_a,
!DIFF_OPT_TST(o, BINARY)) {
if (!oidcmp(&one->oid, &two->oid)) {
if (must_show_header)
-   fprintf(o->file, "%s", header.buf);
+   emit_line(o, NULL, NULL, 0, 0,
+ header.buf, header.len);
goto free_ab_and_return;
}
-   fprintf(o->file, "%s", header.buf);
-   fprintf(o->file, "%sBinary files %s and %s differ\n",
-   line_prefix, lbl[0], lbl[1]);
+   emit_line(o, NULL, NULL, 0, 0,
+ header.buf, header.len);
+   emit_line_fmt(o, NULL, NULL, 1,
+ "Binary files %s and %s differ\n",
+ lbl[0], lbl[1]);
goto free_ab_and_return;
}
if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
@@ -2470,16 +2474,19 @@ static void builtin_diff(const char *name_a,
if (mf1.size == mf2.size &&
!memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
if (must_show_header)
-   fprintf(o->file, "%s", header.buf);
+   emit_line(o, NULL, NULL, 0, 0,
+ header.buf, header.len);
goto free_ab_and_return;
}
-   fprintf(o->file, "%s", header.buf);
+   emit_line(o, NULL, NULL, 0, 0,
+ header.buf, header.len);
strbuf_reset(&header);
if (DIFF_OPT_TST(o, BINARY))
emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
else
-   fprintf(o->file, "%sBinary files %s and %s differ\n",
-   line_prefix, lbl[0], lbl[1]);
+   emit_line_fmt(o, NULL, NULL, 1,
+ "Binary files %s and %s differ\n",
+ lbl[0], lbl[1]);
o->found_changes = 1;
} else {
/* Crazy xdl interfaces.. */
@@ -2491,7 +2498,7 @@ static void builtin_diff(const char *name_a,
const struct userdiff_funcname *pe;
 
if (must_show_header) {
-   fprintf(o->file, "%s", header.buf);
+   emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
strbuf_reset(&header);
}

[PATCHv5 08/17] diff.c: convert emit_rewrite_lines to use emit_line_*

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers emit_rewrite_lines.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 3dda9f3c8e..ca6b48cf49 100644
--- a/diff.c
+++ b/diff.c
@@ -722,15 +722,23 @@ static void add_line_count(struct strbuf *out, int count)
 static void emit_rewrite_lines(struct emit_callback *ecb,
   int prefix, const char *data, int size)
 {
-   const char *endp = NULL;
-   static const char *nneof = " No newline at end of file\n";
const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
+   struct strbuf sb = STRBUF_INIT;
 
while (0 < size) {
int len;
 
-   endp = memchr(data, '\n', size);
-   len = endp ? (endp - data + 1) : size;
+   const char *endp = memchr(data, '\n', size);
+   if (endp)
+   len = endp - data + 1;
+   else {
+   strbuf_add(&sb, data, size);
+   strbuf_addch(&sb, '\n');
+   size = 0; /* to exit the loop. */
+
+   data = sb.buf;
+   len = sb.len;
+   }
if (prefix != '+') {
ecb->lno_in_preimage++;
emit_del_line(reset, ecb, data, len);
@@ -741,12 +749,13 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
size -= len;
data += len;
}
-   if (!endp) {
+   if (sb.len) {
+   static const char *nneof = "\\ No newline at end of file\n";
const char *context = diff_get_color(ecb->color_diff,
 DIFF_CONTEXT);
-   putc('\n', ecb->opt->file);
-   emit_line(ecb->opt, context, reset, 1, '\\',
- nneof, strlen(nneof));
+   emit_line(ecb->opt, context, reset, 1, 0,
+   nneof, strlen(nneof));
+   strbuf_release(&sb);
}
 }
 
-- 
2.13.0.18.g7d86cc8ba0



[PATCHv5 14/17] diff.c: convert diff_summary to use emit_line_*

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers diff_summary.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 64 ++--
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/diff.c b/diff.c
index 8ebe673331..76cafde4be 100644
--- a/diff.c
+++ b/diff.c
@@ -4504,67 +4504,71 @@ static void flush_one_pair(struct diff_filepair *p, 
struct diff_options *opt)
}
 }
 
-static void show_file_mode_name(FILE *file, const char *newdelete, struct 
diff_filespec *fs)
+static void show_file_mode_name(struct diff_options *opt, const char 
*newdelete, struct diff_filespec *fs)
 {
+   struct strbuf sb = STRBUF_INIT;
if (fs->mode)
-   fprintf(file, " %s mode %06o ", newdelete, fs->mode);
+   strbuf_addf(&sb, " %s mode %06o ", newdelete, fs->mode);
else
-   fprintf(file, " %s ", newdelete);
-   write_name_quoted(fs->path, file, '\n');
-}
+   strbuf_addf(&sb, " %s ", newdelete);
 
+   quote_c_style(fs->path, &sb, NULL, 0);
+   strbuf_addch(&sb, '\n');
+   emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+   strbuf_release(&sb);
+}
 
-static void show_mode_change(FILE *file, struct diff_filepair *p, int 
show_name,
-   const char *line_prefix)
+static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
+   int show_name)
 {
if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-   fprintf(file, "%s mode change %06o => %06o%c", line_prefix, 
p->one->mode,
-   p->two->mode, show_name ? ' ' : '\n');
+   struct strbuf sb = STRBUF_INIT;
if (show_name) {
-   write_name_quoted(p->two->path, file, '\n');
+   strbuf_addch(&sb, ' ');
+   quote_c_style(p->two->path, &sb, NULL, 0);
}
+   emit_line_fmt(opt, NULL, NULL, 1,
+ " mode change %06o => %06o%s\n",
+ p->one->mode, p->two->mode,
+ show_name ? sb.buf : "");
+   strbuf_release(&sb);
}
 }
 
-static void show_rename_copy(FILE *file, const char *renamecopy, struct 
diff_filepair *p,
-   const char *line_prefix)
+static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
+   struct diff_filepair *p)
 {
char *names = pprint_rename(p->one->path, p->two->path);
-
-   fprintf(file, " %s %s (%d%%)\n", renamecopy, names, 
similarity_index(p));
+   emit_line_fmt(opt, NULL, NULL, 1, " %s %s (%d%%)\n",
+ renamecopy, names, similarity_index(p));
free(names);
-   show_mode_change(file, p, 0, line_prefix);
+   show_mode_change(opt, p, 0);
 }
 
 static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 {
-   FILE *file = opt->file;
-   const char *line_prefix = diff_line_prefix(opt);
-
switch(p->status) {
case DIFF_STATUS_DELETED:
-   fputs(line_prefix, file);
-   show_file_mode_name(file, "delete", p->one);
+   show_file_mode_name(opt, "delete", p->one);
break;
case DIFF_STATUS_ADDED:
-   fputs(line_prefix, file);
-   show_file_mode_name(file, "create", p->two);
+   show_file_mode_name(opt, "create", p->two);
break;
case DIFF_STATUS_COPIED:
-   fputs(line_prefix, file);
-   show_rename_copy(file, "copy", p, line_prefix);
+   show_rename_copy(opt, "copy", p);
break;
case DIFF_STATUS_RENAMED:
-   fputs(line_prefix, file);
-   show_rename_copy(file, "rename", p, line_prefix);
+   show_rename_copy(opt, "rename", p);
break;
default:
if (p->score) {
-   fprintf(file, "%s rewrite ", line_prefix);
-   write_name_quoted(p->two->path, file, ' ');
-   fprintf(file, "(%d%%)\n", similarity_index(p));
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addstr(&sb, " rewrite ");
+   quote_c_style(p->two->path, &sb, NULL, 0);
+   strbuf_addf(&sb, " (%d%%)\n", similarity_index(p));
+   emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);

[PATCHv5 12/17] diff.c: convert word diffing to use emit_line_*

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers all code related to diffing words.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 73 +-
 1 file changed, 32 insertions(+), 41 deletions(-)

diff --git a/diff.c b/diff.c
index a76abf5f69..8317824963 100644
--- a/diff.c
+++ b/diff.c
@@ -897,37 +897,42 @@ struct diff_words_data {
struct diff_words_style *style;
 };
 
-static int fn_out_diff_words_write_helper(FILE *fp,
+static int fn_out_diff_words_write_helper(struct diff_options *o,
  struct diff_words_style_elem *st_el,
  const char *newline,
- size_t count, const char *buf,
- const char *line_prefix)
+ size_t count, const char *buf)
 {
int print = 0;
+   struct strbuf sb = STRBUF_INIT;
 
while (count) {
char *p = memchr(buf, '\n', count);
if (print)
-   fputs(line_prefix, fp);
+   emit_line(o, NULL, NULL, 1, 0, "", 0);
+
if (p != buf) {
-   if (st_el->color && fputs(st_el->color, fp) < 0)
-   return -1;
-   if (fputs(st_el->prefix, fp) < 0 ||
-   fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
-   fputs(st_el->suffix, fp) < 0)
-   return -1;
-   if (st_el->color && *st_el->color
-   && fputs(GIT_COLOR_RESET, fp) < 0)
-   return -1;
+   const char *reset = st_el->color && *st_el->color ?
+   GIT_COLOR_RESET : NULL;
+   strbuf_addstr(&sb, st_el->prefix);
+   strbuf_add(&sb, buf, p ? p - buf : count);
+   strbuf_addstr(&sb, st_el->suffix);
+   emit_line(o, st_el->color, reset,
+ 0, 0, sb.buf, sb.len);
+   strbuf_reset(&sb);
}
if (!p)
-   return 0;
-   if (fputs(newline, fp) < 0)
-   return -1;
+   goto out;
+
+   strbuf_addstr(&sb, newline);
+   emit_line(o, NULL, NULL, 0, 0, sb.buf, sb.len);
+   strbuf_reset(&sb);
count -= p + 1 - buf;
buf = p + 1;
print = 1;
}
+
+out:
+   strbuf_release(&sb);
return 0;
 }
 
@@ -981,14 +986,12 @@ static void fn_out_diff_words_aux(void *priv, char *line, 
unsigned long len)
int minus_first, minus_len, plus_first, plus_len;
const char *minus_begin, *minus_end, *plus_begin, *plus_end;
struct diff_options *opt = diff_words->opt;
-   const char *line_prefix;
 
if (line[0] != '@' || parse_hunk_header(line, len,
&minus_first, &minus_len, &plus_first, &plus_len))
return;
 
assert(opt);
-   line_prefix = diff_line_prefix(opt);
 
/* POSIX requires that first be decremented by one if len == 0... */
if (minus_len) {
@@ -1005,28 +1008,21 @@ static void fn_out_diff_words_aux(void *priv, char 
*line, unsigned long len)
} else
plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
 
-   if (color_words_output_graph_prefix(diff_words)) {
-   fputs(line_prefix, diff_words->opt->file);
-   }
if (diff_words->current_plus != plus_begin) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
&style->ctx, style->newline,
plus_begin - diff_words->current_plus,
-   diff_words->current_plus, line_prefix);
-   if (*(plus_begin - 1) == '\n')
-   fputs(line_prefix, diff_words->opt->file);
+   diff_words->current_plus);
}
if (minus_begin != minus_end) {
-   fn_out_diff_words_write_helper(diff_words->opt->file,
+   fn_out_diff_words_write_helper(diff_words->opt,
&style->old, style->newline,
-

[PATCHv5 10/17] diff.c: convert emit_binary_diff_body to use emit_line_*

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers emit_binary_diff_body.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index 3357c0fca3..b5a5261a4e 100644
--- a/diff.c
+++ b/diff.c
@@ -2244,8 +2244,8 @@ static unsigned char *deflate_it(char *data,
return deflated;
 }
 
-static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
- const char *prefix)
+static void emit_binary_diff_body(struct diff_options *o,
+ mmfile_t *one, mmfile_t *two)
 {
void *cp;
void *delta;
@@ -2274,13 +2274,12 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
}
 
if (delta && delta_size < deflate_size) {
-   fprintf(file, "%sdelta %lu\n", prefix, orig_size);
+   emit_line_fmt(o, NULL, NULL, 1, "delta %lu\n", orig_size);
free(deflated);
data = delta;
data_size = delta_size;
-   }
-   else {
-   fprintf(file, "%sliteral %lu\n", prefix, two->size);
+   } else {
+   emit_line_fmt(o, NULL, NULL, 1, "literal %lu\n", two->size);
free(delta);
data = deflated;
data_size = deflate_size;
@@ -2289,8 +2288,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
/* emit data encoded in base85 */
cp = data;
while (data_size) {
+   int len;
int bytes = (52 < data_size) ? 52 : data_size;
-   char line[70];
+   char line[71];
data_size -= bytes;
if (bytes <= 26)
line[0] = bytes + 'A' - 1;
@@ -2298,20 +2298,25 @@ static void emit_binary_diff_body(FILE *file, mmfile_t 
*one, mmfile_t *two,
line[0] = bytes - 26 + 'a' - 1;
encode_85(line + 1, cp, bytes);
cp = (char *) cp + bytes;
-   fprintf(file, "%s", prefix);
-   fputs(line, file);
-   fputc('\n', file);
+
+   len = strlen(line);
+   line[len++] = '\n';
+   line[len] = '\0';
+
+   emit_line(o, NULL, NULL, 1, 0, line, len);
}
-   fprintf(file, "%s\n", prefix);
+   emit_line(o, NULL, NULL, 1, 0, "\n", 1);
free(data);
 }
 
-static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two,
-const char *prefix)
+static void emit_binary_diff(struct diff_options *o,
+mmfile_t *one, mmfile_t *two)
 {
-   fprintf(file, "%sGIT binary patch\n", prefix);
-   emit_binary_diff_body(file, one, two, prefix);
-   emit_binary_diff_body(file, two, one, prefix);
+   const char *s = "GIT binary patch\n";
+   const int len = strlen(s);
+   emit_line(o, NULL, NULL, 1, 0, s, len);
+   emit_binary_diff_body(o, one, two);
+   emit_binary_diff_body(o, two, one);
 }
 
 int diff_filespec_is_binary(struct diff_filespec *one)
@@ -2498,7 +2503,7 @@ static void builtin_diff(const char *name_a,
  header.buf, header.len);
strbuf_reset(&header);
if (DIFF_OPT_TST(o, BINARY))
-   emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
+   emit_binary_diff(o, &mf1, &mf2);
else
emit_line_fmt(o, NULL, NULL, 1,
  "Binary files %s and %s differ\n",
-- 
2.13.0.18.g7d86cc8ba0



[PATCHv5 17/17] diff.c: color moved lines differently

2017-05-24 Thread Stefan Beller
When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OM]  -if (!is_authorized_user())
[OM]  -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OM]  -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NM]  +sensitive_stuff(spanning,
[NM]  +multiple,
[NM]  +lines);
[NM]  +}

Adjacent blocks are colored differently. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:

[OM]  -void sensitive_stuff(void)
[OM]  -{
[OMA] -if (!is_authorized_user())
[OMA] -die("unauthorized");
[OM]  -sensitive_stuff(spanning,
[OM]  -multiple,
[OM]  -lines);
[OMA] -}

   void another_function()
   {
[del] -printf("foo");
[add] +printf("bar");
   }

[NM]  +void sensitive_stuff(void)
[NM]  +{
[NMA] +sensitive_stuff(spanning,
[NMA] +multiple,
[NMA] +lines);
[NM]  +if (!is_authorized_user())
[NM]  +die("unauthorized");
[NMA] +}

If the moved code is larger, it is easier to hide some permutation in the
code, which is why the alternative coloring is really needed.

As the reviewers attention should be brought to the places, where the
difference is introduced to the moved code, we cannot just have one new
color for all of moved code.

First I implemented an alternative design, which would show a moved hunk
in one color, and its boundaries in another color. This idea was error
prone as it inspected each line and its neighboring lines to determine
if the line was (a) moved and (b) if was deep inside a hunk by having
matching neighboring lines. This is unreliable as the we can construct
hunks which have equal neighbors that just exceed the number of lines
inspected. (Think of 'AXYZBXYZCXYZD..' with each letter as a line, that
is permutated to AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then switches color to the alternative color
for the next hunk. By doing this any permutation is recognized and
displayed. That implies that there is no dedicated boundary or
inside-hunk color, but instead we'll have just two colors alternating
for hunks.

It would be a bit more UX friendly if the two corresponding hunks
(of added and deleted lines) for one move would get the same color id.
(Both get "regular moved" or "alternative moved"). This problem is
deferred to a later patch for now.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.

Helped-by: Jonathan Tan 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 

# Conflicts:
#   diff.c
Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
---
 Documentation/config.txt   |  14 ++-
 diff.c | 275 +++--
 diff.h |   9 +-
 t/t4015-diff-whitespace.sh | 267 +++
 4 files changed, 552 insertions(+), 13 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..902d017c3b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,24 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=]` option.
 
+color.moved::
+   A boolean value, whether a diff should color moved lines
+   differently. The moved lines are searched for in the diff only.
+   Duplicated lines from somewhere in the project that are not
+   part of the diff are not colored as moved.
+   Defaults to false

[PATCHv5 07/17] diff.c: convert emit_rewrite_diff to use emit_line_*

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This covers emit_rewrite_diff.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index 4fa976d43c..3dda9f3c8e 100644
--- a/diff.c
+++ b/diff.c
@@ -704,17 +704,17 @@ static void remove_tempfile(void)
}
 }
 
-static void print_line_count(FILE *file, int count)
+static void add_line_count(struct strbuf *out, int count)
 {
switch (count) {
case 0:
-   fprintf(file, "0,0");
+   strbuf_addstr(out, "0,0");
break;
case 1:
-   fprintf(file, "1");
+   strbuf_addstr(out, "1");
break;
default:
-   fprintf(file, "1,%d", count);
+   strbuf_addf(out, "1,%d", count);
break;
}
 }
@@ -768,7 +768,7 @@ static void emit_rewrite_diff(const char *name_a,
char *data_one, *data_two;
size_t size_one, size_two;
struct emit_callback ecbdata;
-   const char *line_prefix = diff_line_prefix(o);
+   struct strbuf out = STRBUF_INIT;
 
if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
a_prefix = o->b_prefix;
@@ -806,20 +806,23 @@ static void emit_rewrite_diff(const char *name_a,
ecbdata.lno_in_preimage = 1;
ecbdata.lno_in_postimage = 1;
 
+   emit_line_fmt(o, metainfo, reset, 1, "--- %s%s\n", a_name.buf, 
name_a_tab);
+   emit_line_fmt(o, metainfo, reset, 1, "+++ %s%s\n", b_name.buf, 
name_b_tab);
+
lc_a = count_lines(data_one, size_one);
lc_b = count_lines(data_two, size_two);
-   fprintf(o->file,
-   "%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -",
-   line_prefix, metainfo, a_name.buf, name_a_tab, reset,
-   line_prefix, metainfo, b_name.buf, name_b_tab, reset,
-   line_prefix, fraginfo);
+
+   strbuf_addstr(&out, "@@ -");
if (!o->irreversible_delete)
-   print_line_count(o->file, lc_a);
+   add_line_count(&out, lc_a);
else
-   fprintf(o->file, "?,?");
-   fprintf(o->file, " +");
-   print_line_count(o->file, lc_b);
-   fprintf(o->file, " @@%s\n", reset);
+   strbuf_addstr(&out, "?,?");
+   strbuf_addstr(&out, " +");
+   add_line_count(&out, lc_b);
+   strbuf_addstr(&out, " @@\n");
+   emit_line(o, fraginfo, reset, 1, 0, out.buf, out.len);
+   strbuf_release(&out);
+
if (lc_a && !o->irreversible_delete)
emit_rewrite_lines(&ecbdata, '-', data_one, size_one);
if (lc_b)
-- 
2.13.0.18.g7d86cc8ba0



[PATCHv5 15/17] diff.c: emit_line includes whitespace highlighting

2017-05-24 Thread Stefan Beller
Currently any whitespace highlighting happens outside the emit_line
function. Teach the highlighting to emit_line, triggered by a new
parameter.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 107 ++---
 diff.h |   2 ++
 2 files changed, 65 insertions(+), 44 deletions(-)

diff --git a/diff.c b/diff.c
index 76cafde4be..514c5facd7 100644
--- a/diff.c
+++ b/diff.c
@@ -516,15 +516,34 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
*mf2,
ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char 
*reset,
- int add_line_prefix, int sign, const char *line, int len)
+
+static void emit_line(struct diff_options *o,
+ const char *set, const char *reset,
+ int add_line_prefix, int markup_ws,
+ int sign, const char *line, int len)
 {
+   const char *ws;
int has_trailing_newline, has_trailing_carriage_return;
FILE *file = o->file;
 
if (add_line_prefix)
fputs(diff_line_prefix(o), file);
 
+   if (markup_ws) {
+   ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+
+   if (set)
+   fputs(set, file);
+   if (sign)
+   fputc(sign, file);
+   if (reset)
+   fputs(reset, file);
+   ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+   ws_check_emit(line, len, o->ws_rule,
+ file, set, reset, ws);
+   return;
+   }
+
has_trailing_newline = (len > 0 && line[len-1] == '\n');
if (has_trailing_newline)
len--;
@@ -558,14 +577,14 @@ static void emit_line_fmt(struct diff_options *o,
strbuf_vaddf(&sb, fmt, ap);
va_end(ap);
 
-   emit_line(o, set, reset, add_line_prefix, 0, sb.buf, sb.len);
+   emit_line(o, set, reset, add_line_prefix, 0, 0, sb.buf, sb.len);
strbuf_release(&sb);
 }
 
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
const char *line, int len)
 {
-   emit_line(o, set, reset, 1, 0, line, len);
+   emit_line(o, set, reset, 1, 0, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
@@ -596,16 +615,15 @@ static void emit_line_checked(const char *reset,
}
 
if (!ws)
-   emit_line(ecbdata->opt, set, reset, 1, sign, line, len);
+   emit_line(ecbdata->opt, set, reset, 1, 0, sign, line, len);
else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
/* Blank line at EOF - paint '+' as well */
-   emit_line(ecbdata->opt, ws, reset, 1, sign, line, len);
+   emit_line(ecbdata->opt, ws, reset, 1, 1, sign, line, len);
else {
/* Emit just the prefix, then the rest. */
-   emit_line(ecbdata->opt, set, reset, 1, sign, "", 0);
-   ws_check_emit(line, len, ecbdata->ws_rule,
- ecbdata->opt->file, set, reset, ws);
+   emit_line(ecbdata->opt, set, reset, 1, 1, sign, line, len);
}
+
 }
 
 static void emit_add_line(const char *reset,
@@ -652,7 +670,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
if (len < 10 ||
memcmp(line, atat, 2) ||
!(ep = memmem(line + 2, len - 2, atat, 2))) {
-   emit_line(ecbdata->opt, context, reset, 1, 0, line, len);
+   emit_line(ecbdata->opt, context, reset, 1, 0, 0, line, len);
return;
}
ep += 2; /* skip over @@ */
@@ -688,7 +706,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
strbuf_add(&msgbuf, line + len, org_len - len);
strbuf_complete_line(&msgbuf);
 
-   emit_line(ecbdata->opt, "", "", 1, 0, msgbuf.buf, msgbuf.len);
+   emit_line(ecbdata->opt, "", "", 1, 0, 0, msgbuf.buf, msgbuf.len);
strbuf_release(&msgbuf);
 }
 
@@ -759,7 +777,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
static const char *nneof = "\\ No newline at end of file\n";
const char *context = diff_get_color(ecb->color_diff,
 DIFF_CONTEXT);
-   emit_line(ecb->opt, context, reset, 1, 0,
+   emit_line(ecb->opt, context, reset, 1, 0, 0,
nneof, strlen(nneof));
strbuf_release(&sb);
}
@@ -835,7 +853,7 @@ static void emit_rewrite_diff(const char *name_a,
strbuf_addstr(&out, " +");
add_line_count(&out, lc_b);
strbuf_addstr(&out, " @@\n");
-   emit_line(o, fraginfo, reset, 1, 0, out.buf, out.len);
+   emit_line(o, fraginfo, reset, 1, 0, 0, out.buf, out.

[PATCHv5 11/17] diff.c: convert show_stats to use emit_line_*

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

We call print_stat_summary from builtin/apply, so we still
need the version with a file pointer, so introduce
print_stat_summary_0 that uses emit_line_* machinery and
keep print_stat_summary with the same arguments around.

The responsibility to print the line prefix moves from the callers
of print_stat_summary_0 into the function itself.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 89 ++
 diff.h |  4 +--
 2 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/diff.c b/diff.c
index b5a5261a4e..a76abf5f69 100644
--- a/diff.c
+++ b/diff.c
@@ -1540,20 +1540,19 @@ static int scale_linear(int it, int width, int 
max_change)
return 1 + (it * (width - 1) / max_change);
 }
 
-static void show_name(FILE *file,
+static void show_name(struct strbuf *out,
  const char *prefix, const char *name, int len)
 {
-   fprintf(file, " %s%-*s |", prefix, len, name);
+   strbuf_addf(out, " %s%-*s |", prefix, len, name);
 }
 
-static void show_graph(FILE *file, char ch, int cnt, const char *set, const 
char *reset)
+static void show_graph(struct strbuf *out, char ch, int cnt, const char *set, 
const char *reset)
 {
if (cnt <= 0)
return;
-   fprintf(file, "%s", set);
-   while (cnt--)
-   putc(ch, file);
-   fprintf(file, "%s", reset);
+   strbuf_addstr(out, set);
+   strbuf_addchars(out, ch, cnt);
+   strbuf_addstr(out, reset);
 }
 
 static void fill_print_name(struct diffstat_file *file)
@@ -1577,14 +1576,16 @@ static void fill_print_name(struct diffstat_file *file)
file->print_name = pname;
 }
 
-int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
+static void print_stat_summary_0(struct diff_options *options, int files,
+int insertions, int deletions)
 {
struct strbuf sb = STRBUF_INIT;
-   int ret;
 
if (!files) {
assert(insertions == 0 && deletions == 0);
-   return fprintf(fp, "%s\n", " 0 files changed");
+   strbuf_addstr(&sb, " 0 files changed");
+   emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
+   return;
}
 
strbuf_addf(&sb,
@@ -1611,9 +1612,17 @@ int print_stat_summary(FILE *fp, int files, int 
insertions, int deletions)
deletions);
}
strbuf_addch(&sb, '\n');
-   ret = fputs(sb.buf, fp);
+   emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
strbuf_release(&sb);
-   return ret;
+}
+
+void print_stat_summary(FILE *fp, int files,
+   int insertions, int deletions)
+{
+   struct diff_options o;
+   memset(&o, 0, sizeof(o));
+   o.file = fp;
+   print_stat_summary_0(&o, files, insertions, deletions);
 }
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
@@ -1623,13 +1632,13 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
int total_files = data->nr, count;
int width, name_width, graph_width, number_width = 0, bin_width = 0;
const char *reset, *add_c, *del_c;
-   const char *line_prefix = "";
int extra_shown = 0;
+   const char *line_prefix = diff_line_prefix(options);
+   struct strbuf out = STRBUF_INIT;
 
if (data->nr == 0)
return;
 
-   line_prefix = diff_line_prefix(options);
count = options->stat_count ? options->stat_count : data->nr;
 
reset = diff_get_color_opt(options, DIFF_RESET);
@@ -1783,26 +1792,29 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
}
 
if (file->is_binary) {
-   fprintf(options->file, "%s", line_prefix);
-   show_name(options->file, prefix, name, len);
-   fprintf(options->file, " %*s", number_width, "Bin");
+   show_name(&out, prefix, name, len);
+   strbuf_addf(&out, " %*s", number_width, "Bin");
if (!added && !deleted) {
-   putc('\n', options->file);
+   strbuf_addch(&out, '\n');
+   emit_line(options, NULL, NULL, 1, 0, out.buf, 
out.len);
+   strbuf_reset(&out);
continue;
 

[PATCHv5 02/17] diff: move line ending check into emit_hunk_header

2017-05-24 Thread Stefan Beller
The emit_hunk_header() function is responsible for assembling a
hunk header and calling emit_line() to send the hunk header
to the output file.  Its only caller fn_out_consume() needs
to prepare for a case where the function emits an incomplete
line and add the terminating LF.

Instead make sure emit_hunk_header() to always send a
completed line to emit_line().

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3f5bf8b5a4..c2ed605cd0 100644
--- a/diff.c
+++ b/diff.c
@@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
}
 
strbuf_add(&msgbuf, line + len, org_len - len);
+   strbuf_complete_line(&msgbuf);
+
emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
strbuf_release(&msgbuf);
 }
@@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
len = sane_truncate_line(ecbdata, line, len);
find_lno(line, ecbdata);
emit_hunk_header(ecbdata, line, len);
-   if (line[len-1] != '\n')
-   putc('\n', o->file);
return;
}
 
-- 
2.13.0.18.g7d86cc8ba0



[PATCHv5 05/17] diff.c: convert fn_out_consume to use emit_line

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line.

This covers the parts of fn_out_consume.  In the next
patches we'll convert more functions that want to emit
formatted output, so we'd want to have a formatted emit
function. Add it here.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 3569857818..8186289734 100644
--- a/diff.c
+++ b/diff.c
@@ -547,6 +547,21 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
fputc('\n', file);
 }
 
+static void emit_line_fmt(struct diff_options *o,
+ const char *set, const char *reset,
+ int add_line_prefix,
+ const char *fmt, ...)
+{
+   struct strbuf sb = STRBUF_INIT;
+   va_list ap;
+   va_start(ap, fmt);
+   strbuf_vaddf(&sb, fmt, ap);
+   va_end(ap);
+
+   emit_line(o, set, reset, add_line_prefix, 0, sb.buf, sb.len);
+   strbuf_release(&sb);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -1270,7 +1285,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
struct diff_options *o = ecbdata->opt;
-   const char *line_prefix = diff_line_prefix(o);
 
o->found_changes = 1;
 
@@ -1282,14 +1296,12 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
 
if (ecbdata->label_path[0]) {
const char *name_a_tab, *name_b_tab;
-
name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-
-   fprintf(o->file, "%s%s--- %s%s%s\n",
-   line_prefix, meta, ecbdata->label_path[0], reset, 
name_a_tab);
-   fprintf(o->file, "%s%s+++ %s%s%s\n",
-   line_prefix, meta, ecbdata->label_path[1], reset, 
name_b_tab);
+   emit_line_fmt(o, meta, reset, 1, "--- %s%s\n",
+ ecbdata->label_path[0], name_a_tab);
+   emit_line_fmt(o, meta, reset, 1, "+++ %s%s\n",
+ ecbdata->label_path[1], name_b_tab);
ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
}
 
@@ -1330,7 +1342,7 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
diff_words_flush(ecbdata);
if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
emit_line(o, context, reset, 1, 0, line, len);
-   fputs("~\n", o->file);
+   emit_line(o, NULL, NULL, 0, 0, "~\n", 2);
} else {
/*
 * Skip the prefix character, if any.  With
-- 
2.13.0.18.g7d86cc8ba0



[PATCHv5 03/17] diff.c: factor out diff_flush_patch_all_file_pairs

2017-05-24 Thread Stefan Beller
In a later patch we want to do more things before and after all filepairs
are flushed. So factor flushing out all file pairs into its own function
that the new code can be plugged in easily.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index c2ed605cd0..2f9722b382 100644
--- a/diff.c
+++ b/diff.c
@@ -4737,6 +4737,17 @@ void diff_warn_rename_limit(const char *varname, int 
needed, int degraded_cc)
warning(_(rename_limit_advice), varname, needed);
 }
 
+static void diff_flush_patch_all_file_pairs(struct diff_options *o)
+{
+   int i;
+   struct diff_queue_struct *q = &diff_queued_diff;
+   for (i = 0; i < q->nr; i++) {
+   struct diff_filepair *p = q->queue[i];
+   if (check_pair_status(p))
+   diff_flush_patch(p, o);
+   }
+}
+
 void diff_flush(struct diff_options *options)
 {
struct diff_queue_struct *q = &diff_queued_diff;
@@ -4831,11 +4842,7 @@ void diff_flush(struct diff_options *options)
}
}
 
-   for (i = 0; i < q->nr; i++) {
-   struct diff_filepair *p = q->queue[i];
-   if (check_pair_status(p))
-   diff_flush_patch(p, options);
-   }
+   diff_flush_patch_all_file_pairs(options);
}
 
if (output_format & DIFF_FORMAT_CALLBACK)
-- 
2.13.0.18.g7d86cc8ba0



[PATCHv5 04/17] diff: introduce more flexible emit function

2017-05-24 Thread Stefan Beller
Currently, diff output is written either through the emit_line_0
function or through the FILE * in struct diff_options directly. To
make it easier to teach diff to buffer its output (which will be done
in a subsequent commit), introduce a more flexible emit_line() function.
In this commit, direct usages of emit_line_0() are replaced with
emit_line(); subsequent commits will also replace usages of the
FILE * with emit().

Instead of having a 'first' parameter containing the first character
of a line, have a dedicated 'sign' parameter that is just set when
the first character of the line is part of the actual content, i.e.
' ', '+', '-'.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 76 +-
 1 file changed, 33 insertions(+), 43 deletions(-)

diff --git a/diff.c b/diff.c
index 2f9722b382..3569857818 100644
--- a/diff.c
+++ b/diff.c
@@ -516,36 +516,30 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
*mf2,
ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line_0(struct diff_options *o, const char *set, const char 
*reset,
-   int first, const char *line, int len)
+static void emit_line(struct diff_options *o, const char *set, const char 
*reset,
+ int add_line_prefix, int sign, const char *line, int len)
 {
int has_trailing_newline, has_trailing_carriage_return;
-   int nofirst;
FILE *file = o->file;
 
-   fputs(diff_line_prefix(o), file);
+   if (add_line_prefix)
+   fputs(diff_line_prefix(o), file);
 
-   if (len == 0) {
-   has_trailing_newline = (first == '\n');
-   has_trailing_carriage_return = (!has_trailing_newline &&
-   (first == '\r'));
-   nofirst = has_trailing_newline || has_trailing_carriage_return;
-   } else {
-   has_trailing_newline = (len > 0 && line[len-1] == '\n');
-   if (has_trailing_newline)
-   len--;
-   has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-   if (has_trailing_carriage_return)
-   len--;
-   nofirst = 0;
-   }
+   has_trailing_newline = (len > 0 && line[len-1] == '\n');
+   if (has_trailing_newline)
+   len--;
+   has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
+   if (has_trailing_carriage_return)
+   len--;
 
-   if (len || !nofirst) {
-   fputs(set, file);
-   if (!nofirst)
-   fputc(first, file);
+   if (len || sign) {
+   if (set)
+   fputs(set, file);
+   if (sign)
+   fputc(sign, file);
fwrite(line, len, 1, file);
-   fputs(reset, file);
+   if (reset)
+   fputs(reset, file);
}
if (has_trailing_carriage_return)
fputc('\r', file);
@@ -553,12 +547,6 @@ static void emit_line_0(struct diff_options *o, const char 
*set, const char *res
fputc('\n', file);
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char 
*reset,
- const char *line, int len)
-{
-   emit_line_0(o, set, reset, line[0], line+1, len-1);
-}
-
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -587,13 +575,13 @@ static void emit_line_checked(const char *reset,
}
 
if (!ws)
-   emit_line_0(ecbdata->opt, set, reset, sign, line, len);
+   emit_line(ecbdata->opt, set, reset, 1, sign, line, len);
else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
/* Blank line at EOF - paint '+' as well */
-   emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
+   emit_line(ecbdata->opt, ws, reset, 1, sign, line, len);
else {
/* Emit just the prefix, then the rest. */
-   emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
+   emit_line(ecbdata->opt, set, reset, 1, sign, "", 0);
ws_check_emit(line, len, ecbdata->ws_rule,
  ecbdata->opt->file, set, reset, ws);
}
@@ -643,7 +631,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
if (len < 10 ||
memcmp(line, atat, 2) ||
!(ep = memmem(line + 2, len - 2, atat, 2))) {
-   emit_line(ecbdata->opt, context, reset, line, len);
+   emit_line(ecbdata->opt, context, reset, 1, 0, line, len);
return;
}
ep += 2; /* skip over @@ */
@@ -679,7 +667,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
strbuf_add(&msgbuf, line + len, or

[PATCHv5 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-24 Thread Stefan Beller
In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This prepares the code for submodules to go through the emit_line
function.

As the submodule process is no longer attached to the same stdout as
the superprojects process, one might imagine that we would need to
pass on the usage of colors explicitly as the subprocess can no longer
determine where the output is going to land eventually. But this is not
the case. Apparently coloring submodule diffs never worked, so defer
the submodule diff coloring to a future patch series.

Signed-off-by: Stefan Beller 
---
 diff.c  | 14 ++
 diff.h  |  3 +++
 submodule.c | 87 -
 submodule.h |  9 +++
 4 files changed, 61 insertions(+), 52 deletions(-)

diff --git a/diff.c b/diff.c
index ca6b48cf49..3357c0fca3 100644
--- a/diff.c
+++ b/diff.c
@@ -562,6 +562,12 @@ static void emit_line_fmt(struct diff_options *o,
strbuf_release(&sb);
 }
 
+void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
+   const char *line, int len)
+{
+   emit_line(o, set, reset, 1, 0, line, len);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -2384,8 +2390,7 @@ static void builtin_diff(const char *name_a,
(!two->mode || S_ISGITLINK(two->mode))) {
const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-   show_submodule_summary(o->file, one->path ? one->path : 
two->path,
-   line_prefix,
+   show_submodule_summary(o, one->path ? one->path : two->path,
&one->oid, &two->oid,
two->dirty_submodule,
meta, del, add, reset);
@@ -2395,11 +2400,10 @@ static void builtin_diff(const char *name_a,
   (!two->mode || S_ISGITLINK(two->mode))) {
const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-   show_submodule_inline_diff(o->file, one->path ? one->path : 
two->path,
-   line_prefix,
+   show_submodule_inline_diff(o, one->path ? one->path : two->path,
&one->oid, &two->oid,
two->dirty_submodule,
-   meta, del, add, reset, o);
+   meta, del, add, reset);
return;
}
 
diff --git a/diff.h b/diff.h
index 5be1ee77a7..9ad546361a 100644
--- a/diff.h
+++ b/diff.h
@@ -188,6 +188,9 @@ struct diff_options {
int diff_path_counter;
 };
 
+void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
+   const char *line, int len);
+
 enum color_diff {
DIFF_RESET = 0,
DIFF_CONTEXT = 1,
diff --git a/submodule.c b/submodule.c
index d3299e29c0..19c63197fb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, FILE *f,
-   const char *line_prefix,
+static void print_submodule_summary(struct rev_info *rev,
+   struct diff_options *o,
const char *del, const char *add, const char *reset)
 {
static const char format[] = "  %m %s";
@@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
ctx.date_mode = rev->date_mode;
ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(&sb, 0);
-   strbuf_addstr(&sb, line_prefix);
-   if (commit->object.flags & SYMMETRIC_LEFT) {
-   if (del)
-   strbuf_addstr(&sb, del);
-   }
-   else if (add)
-   strbuf_addstr(&sb, add);
format_commit_message(commit, format, &sb, &ctx);
-   if (reset)
-   strbuf_addstr(&sb, reset);
strbuf_addch(&sb, '\n');
-   fprintf(f, "%s", sb.buf);
+   if (commit->object.flags & SYMMETRIC_LEFT)
+   diff_emit_line(o, del, reset, sb.buf, sb.len);
+   else if (add)

[PATCHv5 01/17] diff: readability fix

2017-05-24 Thread Stefan Beller
We already have dereferenced 'p->two' into a local variable 'two'. Use
that.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..3f5bf8b5a4 100644
--- a/diff.c
+++ b/diff.c
@@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct 
diff_options *o)
const char *other;
const char *attr_path;
 
-   name  = p->one->path;
-   other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+   name  = one->path;
+   other = (strcmp(name, two->path) ? two->path : NULL);
attr_path = name;
if (o->prefix_length)
strip_prefix(o->prefix_length, &name, &other);
-- 
2.13.0.18.g7d86cc8ba0



[PATCHv5 00/17] Diff machine: highlight moved lines.

2017-05-24 Thread Stefan Beller
v5:
* removed the color passing to the submodule to make the tests pass again.
* fixed an indentation issue that was introduced from v3 -> v4.
* I merged it with origin/next and tests pass here.

Thanks,
Stefan

diff to v4:
diff --git a/diff.c b/diff.c
index 23e70d348e..1292d3c4ad 100644
--- a/diff.c
+++ b/diff.c
@@ -751,7 +751,7 @@ static void mark_color_as_moved(struct diff_options *o,
 }
 
 static void emit_diff_line(struct diff_options *o,
-struct diff_line *e)
+  struct diff_line *e)
 {
const char *ws;
int has_trailing_newline, has_trailing_carriage_return;
@@ -804,7 +804,7 @@ static void emit_diff_line(struct diff_options *o,
 }
 
 static void append_diff_line(struct diff_options *o,
-  struct diff_line *e)
+struct diff_line *e)
 {
struct diff_line *f;
ALLOC_GROW(o->line_buffer,
diff --git a/submodule.c b/submodule.c
index 428c996c97..19c63197fb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -550,8 +550,6 @@ void show_submodule_inline_diff(struct diff_options *o, 
const char *path,
 
/* TODO: other options may need to be passed here. */
argv_array_push(&cp.args, "diff");
-   if (o->use_color)
-   argv_array_push(&cp.args, "--color=always");
argv_array_pushf(&cp.args, "--line-prefix=%s", diff_line_prefix(o));
if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
argv_array_pushf(&cp.args, "--src-prefix=%s%s/",

v4:
* interdiff to v3 (what is currently origin/sb/diff-color-move) below.
* renamed the "buffered_patch_line" to "diff_line". Originally I planned
  to not carry the "line" part as it can be a piece of a line as well.
  But for the intended functionality it is best to keep the name.
  If we'd want to add more functionality to say have a move detection
  for words as well, we'd rename the struct to have a better name then.
  For now diff_line is the best. (Thanks Jonathan Nieder!)
* tests to demonstrate it doesn't mess with --color-words as well as
  submodules. (Thanks Jonathan Tan!)
* added in the statics (Thanks Ramsay!)
* smaller scope for the hashmaps (Thanks Jonathan Tan!)
* some commit messages were updated, prior patch 4-7 is squashed into one
  (Thanks Jonathan Tan!)
* the tests added revealed an actual fault: now that the submodule process
  is not attached to a dupe of our stdout, it would stop coloring the
  output. We need to pass on use-color explicitly.
* updated the NEEDSWORK comment in the second last patch.

Thanks for bearing,
Stefan

v3:
* see interdiff below.
* fixing one invalid computation (Thanks Junio!)
* I reasoned more about submodule and word diffing, see the commit message
  of the last patch:
  
A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.

* better name for emit_line outside of diff.[ch]

v2:
* emit_line now takes an argument that indicates if we want it
  to emit the line prefix as well. This should allow for a more faithful
  refactoring in the beginning. (Thanks Jonathan!)
* fixed memleaks (Thanks Brandon!)
* "git -c color.moved=true log -p" works now! (Thanks Jeff)
* interdiff below, though it is large.
* less intrusive than v1 (Thanks Jonathan!)

v1:

For details on *why* see the commit message of the last commit.

The first five patches are slight refactorings to get into good
shape, the next patches are funneling all output through emit_line_*.

The second last patch introduces an option to buffer up all output
before printing, and then the last patch can color up moved lines
of code.

Any feedback welcome.

Thanks,
Stefan

Stefan Beller (17):
  diff: readability fix
  diff: move line ending check into emit_hunk_header
  diff.c: factor out diff_flush_patch_all_file_pairs
  diff: introduce more flexible emit function
  diff.c: convert fn_out_consume to use emit_line
  diff.c: convert builtin_diff to use emit_line_*
  diff.c: convert emit_rewrite_diff to use emit_line_*
  diff.c: convert emit_rewrite_lines to use emit_line_*
  submodule.c: convert show_submodule_summary to use emit_line_fmt
  diff.c: convert emit_binary_diff_body to use emit_line_*
  diff.c: convert show_stats to use emit_line_*
  diff.c: convert word diffing to use emit_line_*
  diff.c: convert diff_flush to use emit_line_*
  diff.c: convert diff_summary to use emit_line_*
  diff.c: emit_line includes whitespace highlighting
  diff: buffer all ou

Re: Hide decorations in git log

2017-05-24 Thread Jeff King
On Wed, May 24, 2017 at 09:22:32AM -0500, Robert Dailey wrote:

> Is it possible to hide decorated refs in `git log` even if they are
> reachable from the refs I'm actually interested in seeing the logs of?

Sadly, no, there's no way to do this right now.

There was some discussion in this thread:

  http://public-inbox.org/git/20170119122630.27645-1-pclo...@gmail.com/T/#u

but no patches that do what you want.

Using the existing ref-selectors as discussed in that thread, it would
probably look something like:

  git log --oneline \
--branches 'feature/*' --decorate-refs \
--branches 'feature/*'

It sucks that you'd have to specify your refs twice (once to mark them
for decoration and once to use them as tips). But the flexibility would
let you do things like:

  git log --oneline \
--branches 'feature/*' --decorate-refs \
--tags --decorate-refs \
HEAD

which decorates feature branches and tags, but only traverses from HEAD
(just as an example).

Anyway, none of that exists yet, so patches welcome.

-Peff


Re: Hide decorations in git log

2017-05-24 Thread Robert Dailey
On Wed, May 24, 2017 at 9:22 AM, Robert Dailey  wrote:
> Hello,
>
> Is it possible to hide decorated refs in `git log` even if they are
> reachable from the refs I'm actually interested in seeing the logs of?
>
> For example, if I do `git log --graph --decorate --oneline --branches
> 'feature/*'`, I'd like to *only* see refnames that match the glob
> pattern. However, you'll see tags and other branches that do not match
> the glob if they are reachable from the result set.
>
> This is purely a visual thing, and shouldn't impact the search results
> I'd think.
>
> This is especially useful in cases where I do --simplify-by-decoration
> to get a better understanding of the topology of just a couple of
> select branches. Without some sort of "decoration exclusion", I am
> getting ton of results including tags, etc which obfuscates the
> information I'm really interested in.
>
> Thanks in advance.

Here is an even more concrete example:

$ git log --oneline --abbrev-commit --graph --simplify-by-decoration
--branches='hotfix'

I get the following results:

https://i.imgur.com/arHJss8.png

These results are incredibly confusing... I expected to see the first
line of the log results to be a branch matching 'hotfix' at the very
least. Maybe my question is more of a symptom of confusion about a
different problem. In any case, sorry for the confusion and hopefully
someone can clarify for me. I must be missing something basic.


Hide decorations in git log

2017-05-24 Thread Robert Dailey
Hello,

Is it possible to hide decorated refs in `git log` even if they are
reachable from the refs I'm actually interested in seeing the logs of?

For example, if I do `git log --graph --decorate --oneline --branches
'feature/*'`, I'd like to *only* see refnames that match the glob
pattern. However, you'll see tags and other branches that do not match
the glob if they are reachable from the result set.

This is purely a visual thing, and shouldn't impact the search results
I'd think.

This is especially useful in cases where I do --simplify-by-decoration
to get a better understanding of the topology of just a couple of
select branches. Without some sort of "decoration exclusion", I am
getting ton of results including tags, etc which obfuscates the
information I'm really interested in.

Thanks in advance.


Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch

2017-05-24 Thread Jeff King
On Wed, May 24, 2017 at 12:24:52AM +0100, Philip Oakley wrote:

> > > $ git clone g...@github.com:passcod/UPPERCASE-NPM.git
> > > Cloning into 'UPPERCASE-NPM'...
> > > remote: Counting objects: 14, done.
> > > remote: Compressing objects: 100% (11/11), done.
> > > remote: Total 14 (delta 3), reused 14 (delta 3), pack-reused 0
> > > Receiving objects: 100% (14/14), done.
> > > Resolving deltas: 100% (3/3), done.
> > > warning: remote HEAD refers to nonexistent ref, unable to checkout.
> 
> Perhaps here the warning message could include the value of the ref provided
> by the remote's HEAD which would then more clearly indicate to the user what
> was expected.
> 
> I haven't had chance to look at how easy that maybe in the code - perhaps a
> bit of low hanging fruit for someone?

Unfortunately, it can't, because the ref doesn't exist:

  $ git ls-remote git://github.com/passcod/UPPERCASE-NPM.git
  efc7dbfd6ca155d5d19ce67eb98603896062f35a  refs/heads/MASTER
  e60ea8e6ec45ec45ff44ac8939cb4105b16477da  refs/pull/1/head
  f35a73dcb151d336dc3d30c9a2c7423ecdb7bd1c  refs/pull/2/head
  0d9b3a1268ff39350e04a7183af0add912b686e6  refs/tags/V1.0.0
  efc7dbfd6ca155d5d19ce67eb98603896062f35a  refs/tags/V1.0.1

There is no HEAD line at all, so we have no information about it on the
client side.  Likewise, if you run with GIT_TRACE_PACKET=1, you'll see
that the capabilities line does not include a symref marker either.

So if we wanted to improve this, I think the first step would be for the
server to start sending symref lines for HEAD, even when it does not
resolve to anything.

That would also make a case like this:

  git -C dst.git symbolic-ref refs/heads/does-not-exist
  git clone dst.git local

use "does-not-exist" as the default branch name in our local clone
(rather than just falling back to "master", which presumably the other
side never plans to use).

-Peff


Re: [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman

2017-05-24 Thread Christian Couder
On Thu, May 18, 2017 at 10:13 PM, Ben Peart  wrote:
> This hook script integrates the new fsmonitor capabilities of git with
> the cross platform Watchman file watching service. To use the script:
>
> Download and install Watchman from https://facebook.github.io/watchman/
> and instruct Watchman to watch your working directory for changes
> ('watchman watch-project /usr/src/git').
>
> Rename the sample integration hook from query-fsmonitor.sample to
> query-fsmonitor.
>
> Configure git to use the extension ('git config core.fsmonitor true')
> and optionally turn on the untracked cache for optimal performance
> ('git config core.untrackedcache true').

Ok, it looks like the untracked cache can be used, but it could work without it.

> Signed-off-by: Ben Peart 
> Signed-off-by: Johannes Schindelin 
> ---
>  templates/hooks--query-fsmonitor.sample | 27 +++
>  1 file changed, 27 insertions(+)
>  create mode 100644 templates/hooks--query-fsmonitor.sample
>
> diff --git a/templates/hooks--query-fsmonitor.sample 
> b/templates/hooks--query-fsmonitor.sample
> new file mode 100644
> index 00..4bd22f21d8
> --- /dev/null
> +++ b/templates/hooks--query-fsmonitor.sample
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# An example hook script to integrate Watchman
> +# (https://facebook.github.io/watchman/) with git to provide fast
> +# git status.
> +#
> +# The hook is passed a time_t formatted as a string and outputs to
> +# stdout all files that have been modified since the given time.
> +# Paths must be relative to the root of the working tree and
> +# separated by a single NUL.
> +#
> +# To enable this hook, rename this file to "query-fsmonitor"
> +
> +# Convert unix style paths to escaped Windows style paths
> +case "$(uname -s)" in
> +MINGW*|MSYS_NT*)
> +  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,,g')"
> +  ;;
> +*)
> +  GIT_WORK_TREE="$PWD"
> +  ;;
> +esac
> +
> +# Query Watchman for all the changes since the requested time
> +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, 
> \"fields\":[\"name\"]}]" | \
> +watchman -j | \
> +perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); 
> die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if 
> defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'

Maybe put the above small perl script on multiple lines for improved
readability.

Is JSON::PP always available in Perl >= 5.8?
What happens if watchman is not installed or not in the PATH?
It seems to me that the git process will not die, and will just work
as if the hook does not exist, except that it will call the hook which
will probably output error messages.


Re: [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-24 Thread Christian Couder
On Thu, May 18, 2017 at 10:13 PM, Ben Peart  wrote:
> When the index is read from disk, the query-fsmonitor index extension is
> used to flag the last known potentially dirty index and untracked cach

s/cach/cache/

> entries.

[...]

> diff --git a/cache.h b/cache.h
> index 188811920c..36423c77cc 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -201,6 +201,7 @@ struct cache_entry {
>  #define CE_ADDED (1 << 19)
>
>  #define CE_HASHED(1 << 20)
> +#define CE_FSMONITOR_DIRTY   (1 << 21)

It looks like the (1 << 21) value was used before (as CE_UNHASHED) and
was removed in:

419a597f64 (name-hash.c: remove cache entries instead of marking them
CE_UNHASHED, 2013-11-14)

I wondered if using this value again could make old and new versions
of git interact badly, but it looks like these are in memory only
flags, so it should be ok.

>  #define CE_WT_REMOVE (1 << 22) /* remove in work directory */
>  #define CE_CONFLICTED(1 << 23)
>
>  struct split_index;
>  struct untracked_cache;
> @@ -342,6 +344,8 @@ struct index_state {
> struct hashmap dir_hash;
> unsigned char sha1[20];
> struct untracked_cache *untracked;
> +   time_t fsmonitor_last_update;
> +   struct ewah_bitmap *fsmonitor_dirty_bitmap;

Maybe you could remove "_bitmap" at the end of the name.

>  };


> diff --git a/dir.c b/dir.c
> index 1b5558fdf9..da428489e2 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1652,6 +1652,18 @@ static int valid_cached_dir(struct dir_struct *dir,
> if (!untracked)
> return 0;
>
> +   refresh_by_fsmonitor(&the_index);
> +   if (dir->untracked->use_fsmonitor) {
> +   /*
> +* With fsmonitor, we can trust the untracked cache's
> +* valid field.
> +*/
> +   if (untracked->valid)
> +   goto skip_stat;

Maybe you could avoid this goto using a valid_cached_dir_after_stat()
function that would do what is after the "skip_stat:" label below?

> +   else
> +   invalidate_directory(dir->untracked, untracked);
> +   }
> +
> if (stat(path->len ? path->buf : ".", &st)) {
> invalidate_directory(dir->untracked, untracked);
> memset(&untracked->stat_data, 0, 
> sizeof(untracked->stat_data));
> @@ -1665,6 +1677,7 @@ static int valid_cached_dir(struct dir_struct *dir,
> return 0;
> }
>
> +skip_stat:
> if (untracked->check_only != !!check_only) {
> invalidate_directory(dir->untracked, untracked);
> return 0;

[...]

> +void refresh_by_fsmonitor(struct index_state *istate)
> +{
> +   static has_run_once = FALSE;
> +   struct strbuf query_result = STRBUF_INIT;
> +   int query_success = 0;
> +   size_t bol = 0; /* beginning of line */
> +   time_t last_update;
> +   char *buf, *entry;
> +   int i;
> +
> +   if (!core_fsmonitor || has_run_once)
> +   return;
> +   has_run_once = TRUE;
> +
> +   /*
> +* This could be racy so save the date/time now and the hook
> +* should be inclusive to ensure we don't miss potential changes.
> +*/
> +   last_update = time(NULL);
> +
> +   /* If we have a last update time, call query-monitor for the set of 
> changes since that time */
> +   if (istate->fsmonitor_last_update) {
> +   query_success = 
> !query_fsmonitor(istate->fsmonitor_last_update, &query_result);
> +   }

Braces can be removed.

> +   if (query_success) {
> +   /* Mark all entries returned by the monitor as dirty */
> +   buf = entry = query_result.buf;
> +   for (i = 0; i < query_result.len; i++) {
> +   if (buf[i] != '\0')
> +   continue;
> +   mark_file_dirty(istate, buf + bol);
> +   bol = i + 1;
> +   }
> +   if (bol < query_result.len)
> +   mark_file_dirty(istate, buf + bol);
> +
> +   /* Mark all clean entries up-to-date */
> +   for (i = 0; i < istate->cache_nr; i++) {
> +   struct cache_entry *ce = istate->cache[i];
> +   if (ce_stage(ce) || (ce->ce_flags & 
> CE_FSMONITOR_DIRTY))
> +   continue;
> +   ce_mark_uptodate(ce);
> +   }
> +
> +   /*
> +* Now that we've marked the invalid entries in the
> +* untracked-cache itself, we can mark the untracked cache for
> +* fsmonitor usage.
> +*/
> +   if (istate->untracked) {
> +   istate->untracked->use_fsmonitor = 1;
> +   }

Braces can be removed.

> +   }
> +   else {
> +   /* if we can't update the cache, fall back to checking them 
> all */

Re: [PATCH v2 0/6] Fast git status via a file system watcher

2017-05-24 Thread Christian Couder
> Design
> ~~
>
> A new git hook (query-fsmonitor) must exist and be enabled
> (core.fsmonitor=true) that takes a time_t formatted as a string and
> outputs to stdout all files that have been modified since the requested
> time.

Is there a reason why there is a new hook, instead of a
"core.fsmonitorquery" config option to which you could pass whatever
command line with options?

> A new 'fsmonitor' index extension has been added to store the time the
> fsmonitor hook was last queried and a ewah bitmap of the current
> 'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked
> entries are 'fsmonitor-dirty.'
>
> As needed, git will call the query-fsmonitor hook proc for the set of
> changes since the index was last updated. Git then uses this set of
> files along with the list saved in the fsmonitor index extension to flag
> the potentially dirty index and untracked cache entries.

So this can work only if "core.untrackedCache" is set to true?

Thanks for working on this,
Christian.


Re: [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar

2017-05-24 Thread Jeff King
On Wed, May 24, 2017 at 11:44:46AM +0900, Junio C Hamano wrote:

> Patches around [PATCH 06-08/15] made some unexpected (at least to
> me) turns but the series told a coherent story, building on top of
> what has been achieved in the previous steps.

Yeah, those patches are not technically required for the rest of it. The
"strbuf" conversion for oc->path is just something I've wanted to fix
for a long time. It made sense to me to do it before teaching
handle_dotdot() to look at oc->path (because the interface to do so
changed a bit).

-Peff


Re: [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer

2017-05-24 Thread Jeff King
On Wed, May 24, 2017 at 11:45:39AM +0900, Junio C Hamano wrote:

> > It may make sense to pull each of these into its own helper. I didn't
> > really look because they're so small, and because the return semantics
> > seemed confusing to me. Some of them return, and some of them keep
> > parsing. Some of them restore the NUL they overwrite, and some do not.
> >
> > I didn't dig in to see if there are weird corner cases where they
> > misbehave.
> 
> I do not quite know what corner cases you meant, but I agree that
> many places in the codepath we forget to restore "^" we temporarily
> overwrite.  I suspect that none of them is deliberately leaving "^"
> unrestored and they are just being careless (or they truly do not
> care because they assume nobody will look at arg later).
> 
> And I think not restoring cannot be a correct thing to do.  After
> all of these parsing, add_rev_cmdline() wants to see arg_ intact.
> 
> But let's keep reading; perhaps they are addressed in a later patch,
> or they are left as-is, and either is OK.

I don't really know what corner cases I meant, either. :) I just saw
that the code looked funny, but nobody noticed for the common cases, so
I presumed any misbehavior would be from uncommon ones.

As far as "maybe not restoring is intentional", I wondered if there are
cases where we might allow multiple marks. E.g., if we wanted to allow
"foo^@^!", then we might need to progressively pull items off the end,
shortening the string. But I don't think that can be correct:

  - these marks generally don't make sense to combine in the first place

  - even if we allowed combinations, since we make only a single pass
through the function, we'd require the combinations to come in a
particular order

  - even if it were intentional to do so, we'd still be adding weird
stuff to add_rev_cmdline(), as you noted

But I had some other questions, too, about what's supposed to be in the
"name" field of "pending" in some of these cases. For instance, try:

  git tag foo 6a0bc7cf0efbefa5a949d958947e68e29534f04d
  git log --oneline --source foo^-

All of the commits are marked as coming from "foo". That's because of
two things:

  1. When we parse "^-", we turn the first character to NUL. So our call
 to add_parents_only() sees just "foo", with no tail.

  2. We never restore the "^". So later when we add "foo" itself, the
 arg name is still "foo".

So arguably that's correct (these all came from "foo", which is a
resolvable name, and I think that's what the name field of
add_pending_object is going for). But that means add_rev_cmdline() sees
the same munged string, which is probably wrong. We can't possibly get
that case right with munging since the add_rev_cmdline() and
add_pending_object() calls come in pairs. We'd have to actually copy the
pending name into a separate string instead.

So like I said, I was sufficiently confused about what was supposed to
happen that I didn't try fixing it.

-Peff


Re: [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API

2017-05-24 Thread Ævar Arnfjörð Bjarmason
On Wed, May 24, 2017 at 7:17 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> diff --git a/grep.c b/grep.c
>> index 1157529115..49e9aed457 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -351,6 +351,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, 
>> const struct grep_opt *opt)
>>   const char *error;
>>   int erroffset;
>>   int options = PCRE_MULTILINE;
>> +#ifdef PCRE_CONFIG_JIT
>> + int canjit;
>> +#endif
>
> Is "canjit" a property purely of the library (e.g. version and
> compilation option), or of combination of that and nature of the
> pattern, or something else like the memory pressure?
>
> I am wondering if it is worth doing something like this:
>
> static int canjit = -1;
> if (canjit < 0) {
> pcre_config(PCRE_CONFIG_JIT, &canjit);
> }
>
> if it depends purely on the library linked to the process.

It purely depends on how the the library, was compiled. I just wrote
it like that because compiling the pattern is not a hot codepath (i.e.
we call this max 8 or so times or so, whereas exec will be called
thousands/millions/billions of times), so trying to avoid calling this
trivial function seemed pointless.

But looking at this again it would be simpler to combine what you're
suggesting with just passing a pointer to *.pcre[12]_jit_on directly,
skipping the canjit variables.

>> @@ -365,9 +368,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, 
>> const struct grep_opt *opt)
>>   if (!p->pcre1_regexp)
>>   compile_regexp_failed(p, error);
>>
>> - p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
>> + p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 
>> PCRE_STUDY_JIT_COMPILE, &error);
>>   if (!p->pcre1_extra_info && error)
>>   die("%s", error);
>> +
>> +#ifdef PCRE_CONFIG_JIT
>> + pcre_config(PCRE_CONFIG_JIT, &canjit);
>> + if (canjit == 1) {
>> + p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
>> + if (!p->pcre1_jit_stack)
>> + die("BUG: Couldn't allocate PCRE JIT stack");
>
> I agree that dying is OK, but as far as I can tell, this is not a
> BUG (there is no error a programmer can correct by a follow-up
> patch); please do not mark it as such (it is likely that we'll later
> do a tree-wide s/die("BUG:/BUG("/ and this will interfere).

Makes sense. Looks like the convention for this sort of thing is to
just do s/BUG: //, e.g. the code in wrapper.c does that.

>> + pcre_assign_jit_stack(p->pcre1_extra_info, NULL, 
>> p->pcre1_jit_stack);
>> + p->pcre1_jit_on = 1;
>
> Contrary to what I wondered about "canjit" above, I think it makes
> tons of sense to contain the "is JIT in use?" information in "struct
> grep_pat" and not rely on any global state.  Not that we are likely
> to want to be able to JIT some patterns while not doing others.  So
> I agree with the design choice of adding pcre1_jit_on field to the
> structure.
>
> But then, wouldn't it make more sense to do all of the above without
> the canjit variable at all?  i.e. something like...
>
> #ifdef PCRE_CONFIG_JIT
> pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on);
> if (p->pcre1_jit_on)
> ... stack thing ...
> #else
> p->pcre1_jit_on = 0;
> #endif

*Nod*

>> +#ifdef PCRE_CONFIG_JIT
>> + if (p->pcre1_jit_on) {
>> + pcre_free_study(p->pcre1_extra_info);
>> + pcre_jit_stack_free(p->pcre1_jit_stack);
>> + } else
>> +#endif
>> + /* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
>>   pcre_free(p->pcre1_extra_info);
>> +
>>   pcre_free((void *)p->pcre1_tables);
>
> It is very thoughtful to add a blank line here (and you did the same
> in another similar hunk), but I have a feeling that it is still a
> bit too subtle a hint to signal to the readers that these two
> pcre_free()s fire differently, i.e. the former does not fire if jit
> is on but the latter always fires.
>
> Would this be a bit safer while being not too ugly to live, I wonder?
>
> #ifdef PCRE_CONFIG_JIT
> if (p->pcre1_jit_on) {
> pcre_free_study(p->pcre1_extra_info);
> pcre_jit_stack_free(p->pcre1_jit_stack);
> } else
> #endif
> {
> /* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
> pcre_free(p->pcre1_extra_info);
> }
> pcre_free((void *)p->pcre1_tables);
>

Makes sense. I'll change it.


Re: [PATCH v2 25/25] cache_ref_iterator_begin(): avoid priming unneeded directories

2017-05-24 Thread Junio C Hamano
Michael Haggerty  writes:

> Junio, if a reroll is not needed for other reasons, would you mind
> squashing this into the last patch of my series?

Will do, but won't happen until morning in Tokyo.  I've just
finished today's integration cycle.

Thanks.


Re: [PATCH v2 25/25] cache_ref_iterator_begin(): avoid priming unneeded directories

2017-05-24 Thread Michael Haggerty
On 05/23/2017 09:45 PM, Jeff King wrote:
> On Mon, May 22, 2017 at 04:17:55PM +0200, Michael Haggerty wrote:
> 
>> So:
>>
>> * Move the responsibility for doing the prefix check directly to
>>   `cache_ref_iterator`. This means that `cache_ref_iterator_begin()`
>>   never has to wrap its return value in a `prefix_ref_iterator`.
>>
>> * Teach `cache_ref_iterator_begin()` (and `prime_ref_dir()`) to be
>>   stricter about what they iterate over and what directories they
>>   prime.
>>
>> * Teach `cache_ref_iterator` to keep track of whether the current
>>   `cache_ref_iterator_level` is fully within the prefix. If so, skip
>>   the prefix checks entirely.
> 
> As promised, I came back to this one with a more careful eye. These
> changes all make sense to me, and the implementation matches.
> 
> My only question is:
> 
>> @@ -511,9 +582,12 @@ struct ref_iterator *cache_ref_iterator_begin(struct 
>> ref_cache *cache,
>>  level->index = -1;
>>  level->dir = dir;
>>  
>> -if (prefix && *prefix)
>> -ref_iterator = prefix_ref_iterator_begin(ref_iterator,
>> - prefix, 0);
>> +if (prefix && *prefix) {
>> +iter->prefix = xstrdup(prefix);
>> +level->prefix_state = PREFIX_WITHIN_DIR;
>> +} else {
>> +level->prefix_state = PREFIX_CONTAINS_DIR;
>> +}
> 
> Who frees the prefix? Does this need:
> 
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index fda3942db..a3efc5c51 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -542,6 +542,7 @@ static int cache_ref_iterator_abort(struct ref_iterator 
> *ref_iterator)
>   struct cache_ref_iterator *iter =
>   (struct cache_ref_iterator *)ref_iterator;
>  
> + free(iter->prefix);
>   free(iter->levels);
>   base_ref_iterator_free(ref_iterator);
>   return ITER_DONE;

Yes, you are right. Thanks for catching this.

Note: it has to be

free((char *)iter->prefix);

because `prefix` is const.

Junio, if a reroll is not needed for other reasons, would you mind
squashing this into the last patch of my series?

Michael



Re: [PATCH 00/29] Add blame to libgit

2017-05-24 Thread Junio C Hamano
I haven't had a chance to take a deeper look, but what I saw in
merge conflicts with the rest of the system made all sense to me ;-)
Will take a deeper look tomorrow.

Thanks.