difflame improvements

2017-03-05 Thread Edmundo Carmona Antoranz
Hi!

Since my last post the biggest improvement is the ability to detect
that the user has requested a "reverse" analysis.

Under "normal" circumstances a user would ask difflame to get the diff
from an ancestor (call "difflame treeish1 treeish2" so that merge-base
of treeish1 treeish2 equals treeish1). In this case the blame result
is done using straight blame output for added lines and additional
analysis to detect where a line was deleted (analysis has improved a
lot in this regard I haven't heard anything from Peff, though).
But if the user requests the opposite (call "difflame treeish1
treeish2" so that merge-base of treeish1 treeish2 is treeish2) then
the analysis has to be driven "in reverse".

Here's one example taken from difflame itself:

normal "forward" call (hope output doesn't get butchered):

$ ./difflame.py HEAD~3 HEAD~2
diff --git a/difflame.py b/difflame.py
index e70154a..04c7577 100755
--- a/difflame.py
+++ b/difflame.py
@@ -365,7 +365,7 @@ def get_full_revision_id(revision):
 e5b218e4 (Edmundo 2017-02-01 365) # we already had the revision
 50528377 (Edmundo 2017-03-04 366) return REVISIONS_ID_CACHE[revision]
 d1d11d8a (Edmundo 2017-02-02 367) # fallback to get it from git
   b1a6693 use rev-list to get revision IDs
-b1a6693 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["show", "--pretty=%H", revision]).split("\n")[0]
+b1a66932 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["rev-list", "--max-count=1",
revision]).split("\n")[0]
 50528377 (Edmundo 2017-03-04 369) REVISIONS_ID_CACHE[revision] =
full_revision
 e5b218e4 (Edmundo 2017-02-01 370) return full_revision
 91b7d3f5 (Edmundo 2017-01-31 371)

"reverse" call:
$ ./difflame.py HEAD~2 HEAD~3
diff --git a/difflame.py b/difflame.py
index 04c7577..e70154a 100755
--- a/difflame.py
+++ b/difflame.py
@@ -365,7 +365,7 @@ def get_full_revision_id(revision):
 e5b218e4 (Edmundo 2017-02-01 365) # we already had the revision
 50528377 (Edmundo 2017-03-04 366) return REVISIONS_ID_CACHE[revision]
 d1d11d8a (Edmundo 2017-02-02 367) # fallback to get it from git
   b1a6693 use rev-list to get revision IDs
-b1a66932 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["rev-list", "--max-count=1",
revision]).split("\n")[0]
+b1a6693 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["show", "--pretty=%H", revision]).split("\n")[0]
 50528377 (Edmundo 2017-03-04 369) REVISIONS_ID_CACHE[revision] =
full_revision
 e5b218e4 (Edmundo 2017-02-01 370) return full_revision
 91b7d3f5 (Edmundo 2017-01-31 371)

Notice how the revision reported in both difflame calls is the same:

$ git show b1a66932
commit b1a66932704245fd653f8d48c0a718f168f334a7
Author: Edmundo Carmona Antoranz 
Date:   Sat Mar 4 13:59:50 2017 -0600

   use rev-list to get revision IDs

diff --git a/difflame.py b/difflame.py
index e70154a..04c7577 100755
--- a/difflame.py
+++ b/difflame.py
@@ -365,7 +365,7 @@ def get_full_revision_id(revision):
# we already had the revision
return REVISIONS_ID_CACHE[revision]
# fallback to get it from git
-full_revision = run_git_command(["show", "--pretty=%H",
revision]).split("\n")[0]
+full_revision = run_git_command(["rev-list", "--max-count=1",
revision]).split("\n")[0]
REVISIONS_ID_CACHE[revision] = full_revision
return full_revision


If this "detection" to perform reverse analysis hadn't been done, then
there wouldn't be a lot of useful information because there are no
revisions in HEAD~2..HEAD~3 and so the output would have been
something like:

diff --git a/difflame.py b/difflame.py
index 04c7577..e70154a 100755
--- a/difflame.py
+++ b/difflame.py
@@ -365,7 +365,7 @@ def get_full_revision_id(revision):
 e5b218e4 (Edmundo 2017-02-01 365) # we already had the revision
 50528377 (Edmundo 2017-03-04 366) return REVISIONS_ID_CACHE[revision]
 d1d11d8a (Edmundo 2017-02-02 367) # fallback to get it from git
   b1a6693 use rev-list to get revision IDs
%b1a6693 (Edmundo 2017-03-04 368) full_revision =
run_git_command(["rev-list", "--max-count=1",
revision]).split("\n")[0]
   e5b218e printing hints for deleted lines
+e5b218e4 (Edmundo 2017-02-01 368) full_revision =
run_git_command(["show", "--pretty=%H", revision]).split("\n")[0]
 50528377 (Edmundo 2017-03-04 369) REVISIONS_ID_CACHE[revision] =
full_revision
 e5b218e4 (Edmundo 2017-02-01 370) return full_revision
 91b7d3f5 (Edmundo 2017-01-31 371)

Notice how both the added line and the deleted line are reporting the
_wrong_ revision. It should be b1a66932 in all cases.


One quest

Re: difflame improvements

2017-02-18 Thread Edmundo Carmona Antoranz
On Fri, Feb 17, 2017 at 1:01 AM, Edmundo Carmona Antoranz
 wrote:
> On Thu, Feb 16, 2017 at 11:17 PM, Jeff King  wrote:
>
>> This isn't difflame's fault; that's what "git blame" tells you about
>> that line. But since I already told difflame "v2.6.5..HEAD", it would
>> probably make sense to similarly limit the blame to that range. That
>> turns up a boundary commit for the line. Which is _also_ not helpful,
>> but at least the tool is telling me that the line came from before
>> v2.6.5, and I don't really need to care much about it.
>
>
> I'm running my own tests on difflame and I have a theory about "when"
> it breaks at least one of the cases when it breaks:
>
> Analysis for deleted lines is being driven by git blame --reverse.
> What I have noticed is that it "breaks" when blame --reverse drives
> the analysis into revisions where "treeish1" is not part of their
> history (like, bringing analysis "to the sides" of treeish1 instead of
> keeping analysis in revisions in the history of treeish2 that have
> treeish1 as one of their ancestors which is definitely a valid
> case for analysis, anyway). In this case, blame --reverse stops being
> helpful.
>

At the cost of being slower, I just pushed to master the best results yet.

The workaround I developed for the case I described on the previous
mail ended up providing much better results overall so I ended up
replacing the whole merge-analysis logic with it.

Thanks for your kind help and comments, Peff. Let me know how it goes.


Re: difflame improvements

2017-02-16 Thread Edmundo Carmona Antoranz
On Thu, Feb 16, 2017 at 11:17 PM, Jeff King  wrote:

> This isn't difflame's fault; that's what "git blame" tells you about
> that line. But since I already told difflame "v2.6.5..HEAD", it would
> probably make sense to similarly limit the blame to that range. That
> turns up a boundary commit for the line. Which is _also_ not helpful,
> but at least the tool is telling me that the line came from before
> v2.6.5, and I don't really need to care much about it.


I'm running my own tests on difflame and I have a theory about "when"
it breaks at least one of the cases when it breaks:

Analysis for deleted lines is being driven by git blame --reverse.
What I have noticed is that it "breaks" when blame --reverse drives
the analysis into revisions where "treeish1" is not part of their
history (like, bringing analysis "to the sides" of treeish1 instead of
keeping analysis in revisions in the history of treeish2 that have
treeish1 as one of their ancestors which is definitely a valid
case for analysis, anyway). In this case, blame --reverse stops being
helpful.

Take this example (I just pushed a debug-deletion branch into gh...
probably more debugging messages will be needed):

$ difflame.py HEAD~100 HEAD -- Documentation/git-commit.txt
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index f2ab0ee2e..4f8f20a36 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -265,7 +265,8 @@ FROM UPSTREAM REBASE" section in linkgit:git-rebase[1].)
bcf9626a71 (Matthieu Moy  2016-06-28 13:40:11 +0200 265)   If this
option is specified together with `--amend`, then
04c8ce9c1c (Markus Heidelberg 2008-12-19 13:14:18 +0100 266)   no
paths need to be specified, which can be used to amend
d4ba07cac5 (Johannes Sixt 2008-04-10 13:33:09 +0200 267)   the
last commit without committing changes that have
   Range of revisions: 02db2d..066fb04
   Treeish1 02db2d04: 02db2d042 Merge branch 'ah/grammos'
   Treeish2 066fb0494: 066fb0494 blame: draft of line format
   Blamed Revision afe0e2a39: afe0e2a39 Merge branch
'da/difftool-dir-diff-fix'
   Original Filename a/Documentation/git-commit.txt Deleted Line 268
   Children revisions:
   3aead1cad7a: 3aead1cad Merge branch 'ak/commit-only-allow-empty'
   There's only one child revision on that revision the line
we are tracking is gone
   Parents of this child revision:
   afe0e2a39166: afe0e2a39 Merge branch 'da/difftool-dir-diff-fix'
   beb635ca9ce: beb635ca9 commit: remove 'Clever' message
for --only --amend
   Finding parent where the line has been deleted:
   beb635ca9: beb635ca9 commit: remove 'Clever' message
for --only --amend
   Range of revisions: 02db2d042..beb635c
   Treeish1 02db2d0: 02db2d042 Merge branch 'ah/grammos'
   Treeish2 beb635c: beb635ca9 commit: remove 'Clever'
message for --only --amend
   Blamed Revision 02db2d0: 02db2d042 Merge branch 'ah/grammos'
   Original Filename a/Documentation/git-commit.txt Deleted Line 268
   Children revisions:
   Found no children... will return the original blamed revision
(02db2d0) saying that the deleting revision could not be found
   beb635ca9 commit: remove 'Clever' message for --only --amend
-beb635ca9 (Andreas Krey 2016-12-09 05:10:21 +0100 268)
already been staged.
   319d83524 commit: make --only --allow-empty work without paths
+319d835240 (Andreas Krey  2016-12-02 23:15:13 +0100 268)
already been staged. ...
+319d835240 (Andreas Krey  2016-12-02 23:15:13 +0100 269)   paths
are also not requi...
d4ba07cac5 (Johannes Sixt 2008-04-10 13:33:09 +0200 270)
1947bdbc31 (Junio C Hamano2008-06-22 14:32:27 -0700 271) -u[]::
1947bdbc31 (Junio C Hamano2008-06-22 14:32:27 -0700 272)
--untracked-files[=]::



I know that line 268 was deleted on 319d835240.

So on the first round of merge analysis it says "let's go into
beb635ca9". That's fine. That's exactly the path that is required to
reach 319d835240. But then when using this new "range of revisions"
for git blame --reverse, we get that line 268 is not telling us
anything useful:

$ git blame --reverse -L268,268 02db2d042..beb635c --
Documentation/git-commit.txt
^02db2d042 (Junio C Hamano 2016-12-19 14:45:30 -0800 268)
already been staged.

So, instead of pointing to 319d835240 (the parent of beb635c), it's
basically saying something like "I give up". My hunch (haven't sat
down to digest all the details about the output of git blame
--reverse... YET) is that, given that 02db2d042 is _not_ part of the
history of beb635c, git blame reverse is trying to tell me just
that... and that means I'll have to "script around this scenario".

$ git merge-base 02db2d042 beb635c
0202c411edc25940cc381bf317badcdf67670be4


Thanks in advance.


Re: difflame improvements

2017-02-16 Thread Jeff King
On Tue, Feb 14, 2017 at 11:19:05PM -0600, Edmundo Carmona Antoranz wrote:

> I've been working on detecting revisions where a "real" deletion was
> made and I think I advanced a lot in that front. I still have to work
> on many scenarios (renamed files, for example... also performance) but
> at least I'm using a few runs against git-scm history and the results
> are "promising":

I played with this a bit more, and it did turn up the correct results
for some deletions in my experiments.

One thing I noticed is that it also turned up nonsense for lines that
blame in weird ways. For instance, I have a diff like this (these are
real examples, but don't pay attention to the sha1s; it's in a fork of
git, not upstream):

  $ git diff v2.6.5 builtin/prune-packed.c
  diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
  index 7cf900ea07..5e3727e841 100644
  --- a/builtin/prune-packed.c
  +++ b/builtin/prune-packed.c
  @@ -2,6 +2,7 @@
   #include "cache.h"
   #include "progress.h"
   #include "parse-options.h"
  +#include "gh-log.h"
   
   static const char * const prune_packed_usage[] = {
N_("git prune-packed [-n | --dry-run] [-q | --quiet]"),
  @@ -29,8 +30,11 @@ static int prune_object(const unsigned char *sha1, const 
char *path,
   
if (*opts & PRUNE_PACKED_DRY_RUN)
printf("rm -f %s\n", path);
  - else
  + else {
  + gh_logf("prune", "%s Duplicate loose object pruned\n",
  + sha1_to_hex(sha1));
unlink_or_warn(path);
  + }
return 0;
   }
   

Running difflame on it says this:

  $ python /path/to/difflame.py v2.6.5..HEAD -- builtin/prune-packed.c
  [...]
  -2c0b29e662 (Jeff King 2016-01-26 15:27:55 -0500 32)  else
  +d60032f8640 builtin/prune-packed.c (Jeff King2015-02-02 23:15:33 
-0500 33)   else {
  +d60032f8640 builtin/prune-packed.c (Jeff King2015-02-02 23:15:33 
-0500 34)   gh_logf("prune", "%s Duplicate loose object pruned\n",
  +d60032f8640 builtin/prune-packed.c (Jeff King2015-02-02 23:15:33 
-0500 35)   sha1_to_hex(sha1));
   0d3b729680e builtin/prune-packed.c (Jeff King2014-10-15 18:40:53 
-0400 36)   unlink_or_warn(path);
  +2396ec85bd1 prune-packed.c (Linus Torvalds   2005-07-03 14:27:34 
-0700 37)   }

There are two weird things. One is that the old "else" is attributed to
my 2c0b29e662. That's quite weird, because that is a merge commit which
did not touch the file at all. I haven't tracked it down, but presumably
that is weirdness with the --reverse blame.

But there's another one, that I think is easy to fix. The closing brace
is attributed to some ancient commit from Linus. Which yes, I'm sure had
a closing brace, but not _my_ closing brace that was added by
d60032f8640, that the rest of the lines got attributed to.

This isn't difflame's fault; that's what "git blame" tells you about
that line. But since I already told difflame "v2.6.5..HEAD", it would
probably make sense to similarly limit the blame to that range. That
turns up a boundary commit for the line. Which is _also_ not helpful,
but at least the tool is telling me that the line came from before
v2.6.5, and I don't really need to care much about it.

Part of this is that my use case may be a bit different than yours. I
don't actually want to look at the blame results directly. I just want
to see the set of commits that I'd need to look at and possibly
cherry-pick in order to re-create the diff.

-Peff


difflame improvements

2017-02-14 Thread Edmundo Carmona Antoranz
Hi!

I've been working on detecting revisions where a "real" deletion was
made and I think I advanced a lot in that front. I still have to work
on many scenarios (renamed files, for example... also performance) but
at least I'm using a few runs against git-scm history and the results
are "promising":

23:05 $ git blame -s --reverse -L 25,40 HEAD~20..HEAD -- versioncmp.c
066fb0494e 25) static int initialized;
066fb0494e 26)
066fb0494e 27) /*
8ec68d1ae2 28)  * p1 and p2 point to the first different character in
two strings. If
8ec68d1ae2 29)  * either p1 or p2 starts with a prerelease suffix, it
will be forced
8ec68d1ae2 30)  * to be on top.
8ec68d1ae2 31)  *
8ec68d1ae2 32)  * If both p1 and p2 start with (different) suffix, the order is
8ec68d1ae2 33)  * determined by config file.
066fb0494e 34)  *
8ec68d1ae2 35)  * Note that we don't have to deal with the situation
when both p1 and
8ec68d1ae2 36)  * p2 start with the same suffix because the common
part is already
8ec68d1ae2 37)  * consumed by the caller.
066fb0494e 38)  *
066fb0494e 39)  * Return non-zero if *diff contains the return value
for versioncmp()
066fb0494e 40)  */

Lines 28-33:

23:05 $ git show --summary 8ec68d1ae2
commit 8ec68d1ae2863823b74d67c5e92297e38bbf97bc
Merge: e801be066 c48886779
Author: Junio C Hamano <>
Date:   Mon Jan 23 15:59:21 2017 -0800

Merge branch 'vn/diff-ihc-config'

"git diff" learned diff.interHunkContext configuration variable
that gives the default value for its --inter-hunk-context option.

* vn/diff-ihc-config:
  diff: add interhunk context config option



And this is not telling me the _real_ revision where the lines were
_deleted_ so it's not very helpful, as Peff has already mentioned.

Running difflame:

23:06 $ time ~/proyectos/git/difflame/difflame.py -bp=-s -w HEAD~20
HEAD -- versioncmp.c
diff --git a/versioncmp.c b/versioncmp.c
index 80bfd109f..9f81dc106 100644
--- a/versioncmp.c
+++ b/versioncmp.c
@@ -24,42 +24,83 @@
.
.
.
+b17846432d  33) static void find_better_matching_suffix(const char
*tagname, const char *suffix,
+b17846432d  34)int
suffix_len, int start, int conf_pos,
+b17846432d  35)struct
suffix_match *match)
+b17846432d  36) {
b17846432d  37)/*
   c026557a3 versioncmp: generalize version sort suffix reordering
-c026557a3 (SZEDER 28)  * p1 and p2 point to the first different
character in two strings. If
-c026557a3 (SZEDER 29)  * either p1 or p2 starts with a prerelease
suffix, it will be forced
-c026557a3 (SZEDER 30)  * to be on top.
-c026557a3 (SZEDER 31)  *
-c026557a3 (SZEDER 32)  * If both p1 and p2 start with (different)
suffix, the order is
-c026557a3 (SZEDER 33)  * determined by config file.
   b17846432 versioncmp: factor out helper for suffix matching
+b17846432d  38) * A better match either starts earlier or
starts at the same offset
+b17846432d  39) * but is longer.
+b17846432d  40) */
+b17846432d  41)int end = match->len < suffix_len ?
match->start : match->start-1;
.
.
.

Same range of (deleted) lines:

23:10 $ git --show --name-status c026557a3
commit c026557a37361b7019acca28f240a19f546739e9
Author: SZEDER Gábor <>
Date:   Thu Dec 8 15:24:01 2016 +0100

   versioncmp: generalize version sort suffix reordering

   The 'versionsort.prereleaseSuffix' configuration variable, as its name
   suggests, is supposed to only deal with tagnames with prerelease
.
.
.


   Signed-off-by: SZEDER Gábor <>
   Signed-off-by: Junio C Hamano <>

M   Documentation/config.txt
M   Documentation/git-tag.txt
M   t/t7004-tag.sh
M   versioncmp.c


This is the revision where the deletion happened.

That's it for the time being.


Re: difflame

2017-02-02 Thread Edmundo Carmona Antoranz
On Thu, Feb 2, 2017 at 10:46 PM, Edmundo Carmona Antoranz
 wrote:
> I have been "scripting around git blame --reverse" for some days now.
> Mind taking a look? I've been working on blame-deletions for this.

blame-deletions branch, that is

Sorry for the previous top-posting.


Re: difflame

2017-02-02 Thread Edmundo Carmona Antoranz
I have been "scripting around git blame --reverse" for some days now.
Mind taking a look? I've been working on blame-deletions for this.

22:41 $ ../difflame/difflame.py HEAD~5 HEAD
diff --git a/file b/file
index b414108..051c298 100644
--- a/file
+++ b/file
@@ -1,6 +1,9 @@
^1513353 (Edmundo 2017-02-02 22:41:45 -0600 1) 1
f20952eb (Edmundo 2017-02-02 22:41:45 -0600 2) 2
bb04dc7c (Edmundo 2017-02-02 22:41:45 -0600 3) 3
-de3c07b (Edmundo 2017-02-02 22:41:47 -060 4) 4
058ea125 (Edmundo 2017-02-02 22:41:45 -0600 4) 5
85fc6b81 (Edmundo 2017-02-02 22:41:45 -0600 5) 6
+2cd990a6 (Edmundo 2017-02-02 22:41:45 -0600 6) 7
+ab0be970 (Edmundo 2017-02-02 22:41:45 -0600 7) 8
+944452c0 (Edmundo 2017-02-02 22:41:45 -0600 8) 9
+6641edb0 (Edmundo 2017-02-02 22:41:45 -0600 9) 10


$ git show de3c07b
commit de3c07bc21a83472d5c5ddf172dcb742665924dd (HEAD -> master)
Author: Edmundo 
Date:   Thu Feb 2 22:41:47 2017 -0600

   drop 4

diff --git a/file b/file
index f00c965..051c298 100644
--- a/file
+++ b/file
@@ -1,7 +1,6 @@
1
2
3
-4
5
6
7


Next step: solve the
find-real-deletion-revision-when-there-are-multiple-child-nodes
problem and let me read the discussion around git blame --reverse.

Thanks in advance.

On Mon, Jan 30, 2017 at 8:37 PM, Edmundo Carmona Antoranz
 wrote:
> Maybe a little work on blame to get the actual revision where some
> lines were "deleted"?
>
> Something like git blame --blame-deletion that could be based on --reverse?
>
> On Mon, Jan 30, 2017 at 7:35 PM, Edmundo Carmona Antoranz
>  wrote:
>> I'm thinking of something like this:
>>
>> Say I just discovered a problem in a file I want to see who worked
>> on it since some revision that I know is working fine (or even
>> something as generic as HEAD~100..). It could be a number of people
>> with different revisions. I would diff to see what changed and
>> blame the added lines (blame reverse to try to get as close as
>> possible with a single command in case I want to see what happened
>> with something that was deleted). If I could get blame information of
>> added/deleted lines in a single run, that would help a lot.
>>
>> Lo and behold: difflame.
>>
>>
>>
>> On Mon, Jan 30, 2017 at 5:26 PM, Jeff King  wrote:
>>> On Mon, Jan 30, 2017 at 01:08:41PM -0800, Junio C Hamano wrote:
>>>
>>>> Jeff King  writes:
>>>>
>>>> > On Tue, Jan 17, 2017 at 11:24:02PM -0600, Edmundo Carmona Antoranz wrote:
>>>> >
>>>> >> For a very long time I had wanted to get the output of diff to include
>>>> >> blame information as well (to see when something was added/removed).
>>>> >
>>>> > This is something I've wanted, too. The trickiest part, though, is
>>>> > blaming deletions, because git-blame only tracks the origin of content,
>>>> > not the origin of a change.
>>>>
>>>> Hmph, this is a comment without looking at what difflame does
>>>> internally, so you can ignore me if I am misunderstood what problem
>>>> you are pointing out, but I am not sure how "tracks the origin of
>>>> content" could be a problem.
>>>>
>>>> If output from "git show" says this:
>>>>
>>>>   --- a/file
>>>>   +++ b/file
>>>>   @@ -1,5 +1,6 @@
>>>>a
>>>>b
>>>>   -c
>>>>   +C
>>>>   +D
>>>>d
>>>>e
>>>>
>>>> in order to annotate lines 'a', 'b', 'd', and 'e' for their origin,
>>>> you would run 'blame' on the commit the above output was taken from
>>>> (or its parent---they are in the context so either would be OK).
>>>>
>>>> You know where 'C' and 'D' came from already.  It's the commit you
>>>> are feeding "git show".
>>>
>>> I think the point (or at least as I understand it) is that the diff is
>>> not "git show" for a particular commit. It could be part of a much
>>> larger diff that covers many commits.
>>>
>>> As a non-hypothetical instance, I have a fork of git.git that has
>>> various enhancements. I want to feed those enhancements upstream. I need
>>> to know which commits should be cherry-picked to get those various
>>> enhancements.
>>>
>>> Looking at "log origin..fork" tells me too many commits, because it
>>> includes ones which aren't useful anymore. Either because they already
>

Re: difflame

2017-01-30 Thread Edmundo Carmona Antoranz
Maybe a little work on blame to get the actual revision where some
lines were "deleted"?

Something like git blame --blame-deletion that could be based on --reverse?

On Mon, Jan 30, 2017 at 7:35 PM, Edmundo Carmona Antoranz
 wrote:
> I'm thinking of something like this:
>
> Say I just discovered a problem in a file I want to see who worked
> on it since some revision that I know is working fine (or even
> something as generic as HEAD~100..). It could be a number of people
> with different revisions. I would diff to see what changed and
> blame the added lines (blame reverse to try to get as close as
> possible with a single command in case I want to see what happened
> with something that was deleted). If I could get blame information of
> added/deleted lines in a single run, that would help a lot.
>
> Lo and behold: difflame.
>
>
>
> On Mon, Jan 30, 2017 at 5:26 PM, Jeff King  wrote:
>> On Mon, Jan 30, 2017 at 01:08:41PM -0800, Junio C Hamano wrote:
>>
>>> Jeff King  writes:
>>>
>>> > On Tue, Jan 17, 2017 at 11:24:02PM -0600, Edmundo Carmona Antoranz wrote:
>>> >
>>> >> For a very long time I had wanted to get the output of diff to include
>>> >> blame information as well (to see when something was added/removed).
>>> >
>>> > This is something I've wanted, too. The trickiest part, though, is
>>> > blaming deletions, because git-blame only tracks the origin of content,
>>> > not the origin of a change.
>>>
>>> Hmph, this is a comment without looking at what difflame does
>>> internally, so you can ignore me if I am misunderstood what problem
>>> you are pointing out, but I am not sure how "tracks the origin of
>>> content" could be a problem.
>>>
>>> If output from "git show" says this:
>>>
>>>   --- a/file
>>>   +++ b/file
>>>   @@ -1,5 +1,6 @@
>>>a
>>>b
>>>   -c
>>>   +C
>>>   +D
>>>d
>>>e
>>>
>>> in order to annotate lines 'a', 'b', 'd', and 'e' for their origin,
>>> you would run 'blame' on the commit the above output was taken from
>>> (or its parent---they are in the context so either would be OK).
>>>
>>> You know where 'C' and 'D' came from already.  It's the commit you
>>> are feeding "git show".
>>
>> I think the point (or at least as I understand it) is that the diff is
>> not "git show" for a particular commit. It could be part of a much
>> larger diff that covers many commits.
>>
>> As a non-hypothetical instance, I have a fork of git.git that has
>> various enhancements. I want to feed those enhancements upstream. I need
>> to know which commits should be cherry-picked to get those various
>> enhancements.
>>
>> Looking at "log origin..fork" tells me too many commits, because it
>> includes ones which aren't useful anymore. Either because they already
>> went upstream, or because they were cherry-picked to the fork and their
>> upstream counterparts merged (or even equivalent commits made upstream
>> that obsoleted the features).
>>
>> Looking at "git diff origin fork" tells me what the actual differences
>> are, but it doesn't show me which commits are responsible for them.
>>
>> I can "git blame" each individual line of the diff (starting with "fork"
>> as the tip), but that doesn't work for lines that no longer exist (i.e.,
>> when the interesting change is a deletion).
>>
>>> In order to run blame to find where 'c' came from, you need to start
>>> at the _parent_ of the commit the above output came from, and the
>>> hunk header shows which line range to find the final 'c'.
>>
>> So perhaps that explains my comment more. "blame" is not good for
>> finding which commit took away a line. I've tried using "blame
>> --reverse", but it shows you the parent of the commit you are looking
>> for, which is slightly annoying. :)
>>
>> "git log -S" is probably a better tool for finding that.
>>
>> -Peff


Re: difflame

2017-01-30 Thread Edmundo Carmona Antoranz
I'm thinking of something like this:

Say I just discovered a problem in a file I want to see who worked
on it since some revision that I know is working fine (or even
something as generic as HEAD~100..). It could be a number of people
with different revisions. I would diff to see what changed and
blame the added lines (blame reverse to try to get as close as
possible with a single command in case I want to see what happened
with something that was deleted). If I could get blame information of
added/deleted lines in a single run, that would help a lot.

Lo and behold: difflame.



On Mon, Jan 30, 2017 at 5:26 PM, Jeff King  wrote:
> On Mon, Jan 30, 2017 at 01:08:41PM -0800, Junio C Hamano wrote:
>
>> Jeff King  writes:
>>
>> > On Tue, Jan 17, 2017 at 11:24:02PM -0600, Edmundo Carmona Antoranz wrote:
>> >
>> >> For a very long time I had wanted to get the output of diff to include
>> >> blame information as well (to see when something was added/removed).
>> >
>> > This is something I've wanted, too. The trickiest part, though, is
>> > blaming deletions, because git-blame only tracks the origin of content,
>> > not the origin of a change.
>>
>> Hmph, this is a comment without looking at what difflame does
>> internally, so you can ignore me if I am misunderstood what problem
>> you are pointing out, but I am not sure how "tracks the origin of
>> content" could be a problem.
>>
>> If output from "git show" says this:
>>
>>   --- a/file
>>   +++ b/file
>>   @@ -1,5 +1,6 @@
>>a
>>b
>>   -c
>>   +C
>>   +D
>>d
>>e
>>
>> in order to annotate lines 'a', 'b', 'd', and 'e' for their origin,
>> you would run 'blame' on the commit the above output was taken from
>> (or its parent---they are in the context so either would be OK).
>>
>> You know where 'C' and 'D' came from already.  It's the commit you
>> are feeding "git show".
>
> I think the point (or at least as I understand it) is that the diff is
> not "git show" for a particular commit. It could be part of a much
> larger diff that covers many commits.
>
> As a non-hypothetical instance, I have a fork of git.git that has
> various enhancements. I want to feed those enhancements upstream. I need
> to know which commits should be cherry-picked to get those various
> enhancements.
>
> Looking at "log origin..fork" tells me too many commits, because it
> includes ones which aren't useful anymore. Either because they already
> went upstream, or because they were cherry-picked to the fork and their
> upstream counterparts merged (or even equivalent commits made upstream
> that obsoleted the features).
>
> Looking at "git diff origin fork" tells me what the actual differences
> are, but it doesn't show me which commits are responsible for them.
>
> I can "git blame" each individual line of the diff (starting with "fork"
> as the tip), but that doesn't work for lines that no longer exist (i.e.,
> when the interesting change is a deletion).
>
>> In order to run blame to find where 'c' came from, you need to start
>> at the _parent_ of the commit the above output came from, and the
>> hunk header shows which line range to find the final 'c'.
>
> So perhaps that explains my comment more. "blame" is not good for
> finding which commit took away a line. I've tried using "blame
> --reverse", but it shows you the parent of the commit you are looking
> for, which is slightly annoying. :)
>
> "git log -S" is probably a better tool for finding that.
>
> -Peff


Re: difflame

2017-01-30 Thread Jeff King
On Mon, Jan 30, 2017 at 01:08:41PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Tue, Jan 17, 2017 at 11:24:02PM -0600, Edmundo Carmona Antoranz wrote:
> >
> >> For a very long time I had wanted to get the output of diff to include
> >> blame information as well (to see when something was added/removed).
> >
> > This is something I've wanted, too. The trickiest part, though, is
> > blaming deletions, because git-blame only tracks the origin of content,
> > not the origin of a change.
> 
> Hmph, this is a comment without looking at what difflame does
> internally, so you can ignore me if I am misunderstood what problem
> you are pointing out, but I am not sure how "tracks the origin of
> content" could be a problem.
> 
> If output from "git show" says this:
> 
>   --- a/file
>   +++ b/file
>   @@ -1,5 +1,6 @@
>a
>b
>   -c
>   +C
>   +D
>d
>e
> 
> in order to annotate lines 'a', 'b', 'd', and 'e' for their origin,
> you would run 'blame' on the commit the above output was taken from
> (or its parent---they are in the context so either would be OK).
> 
> You know where 'C' and 'D' came from already.  It's the commit you
> are feeding "git show".

I think the point (or at least as I understand it) is that the diff is
not "git show" for a particular commit. It could be part of a much
larger diff that covers many commits.

As a non-hypothetical instance, I have a fork of git.git that has
various enhancements. I want to feed those enhancements upstream. I need
to know which commits should be cherry-picked to get those various
enhancements.

Looking at "log origin..fork" tells me too many commits, because it
includes ones which aren't useful anymore. Either because they already
went upstream, or because they were cherry-picked to the fork and their
upstream counterparts merged (or even equivalent commits made upstream
that obsoleted the features).

Looking at "git diff origin fork" tells me what the actual differences
are, but it doesn't show me which commits are responsible for them.

I can "git blame" each individual line of the diff (starting with "fork"
as the tip), but that doesn't work for lines that no longer exist (i.e.,
when the interesting change is a deletion).

> In order to run blame to find where 'c' came from, you need to start
> at the _parent_ of the commit the above output came from, and the
> hunk header shows which line range to find the final 'c'.

So perhaps that explains my comment more. "blame" is not good for
finding which commit took away a line. I've tried using "blame
--reverse", but it shows you the parent of the commit you are looking
for, which is slightly annoying. :)

"git log -S" is probably a better tool for finding that.

-Peff


Re: difflame

2017-01-30 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Jan 17, 2017 at 11:24:02PM -0600, Edmundo Carmona Antoranz wrote:
>
>> For a very long time I had wanted to get the output of diff to include
>> blame information as well (to see when something was added/removed).
>
> This is something I've wanted, too. The trickiest part, though, is
> blaming deletions, because git-blame only tracks the origin of content,
> not the origin of a change.

Hmph, this is a comment without looking at what difflame does
internally, so you can ignore me if I am misunderstood what problem
you are pointing out, but I am not sure how "tracks the origin of
content" could be a problem.

If output from "git show" says this:

--- a/file
+++ b/file
@@ -1,5 +1,6 @@
 a
 b
-c
+C
+D
 d
 e

in order to annotate lines 'a', 'b', 'd', and 'e' for their origin,
you would run 'blame' on the commit the above output was taken from
(or its parent---they are in the context so either would be OK).

You know where 'C' and 'D' came from already.  It's the commit you
are feeding "git show".

In order to run blame to find where 'c' came from, you need to start
at the _parent_ of the commit the above output came from, and the
hunk header shows which line range to find the final 'c'.

So...



Re: difflame

2017-01-27 Thread Jeff King
On Tue, Jan 17, 2017 at 11:24:02PM -0600, Edmundo Carmona Antoranz wrote:

> For a very long time I had wanted to get the output of diff to include
> blame information as well (to see when something was added/removed).

This is something I've wanted, too. The trickiest part, though, is
blaming deletions, because git-blame only tracks the origin of content,
not the origin of a change.

For example, try this case:

  git init
  for i in $(seq 1 10); do
echo $i >>file
git add file
git commit -m "add $i"
  done
  sed 4d tmp && mv tmp file
  git commit -am "drop 4"

Running "difflame HEAD~5 HEAD" produces this output:

  diff --git a/file b/file
  index b414108..051c298 100644
  --- a/file
  +++ b/file
  @@ -1,6 +1,9 @@
   ^2b028ff (Jeff King 2017-01-27 22:44:10 -0500 1) 1
   ed056366 (Jeff King 2017-01-27 22:44:10 -0500 2) 2
   771030d8 (Jeff King 2017-01-27 22:44:10 -0500 3) 3
  -89c09c82 (Jeff King 2017-01-27 22:44:10 -0500 4) 4
   b619039c (Jeff King 2017-01-27 22:44:10 -0500 4) 5
   6a7aa0e5 (Jeff King 2017-01-27 22:44:10 -0500 5) 6
  +39bc9dc4 (Jeff King 2017-01-27 22:44:10 -0500 6) 7
  +f253cc8f (Jeff King 2017-01-27 22:44:10 -0500 7) 8
  +85c10f46 (Jeff King 2017-01-27 22:44:10 -0500 8) 9
  +89c09c82 (Jeff King 2017-01-27 22:44:10 -0500 9) 10

The last 4 lines are right; they correspond to the addition commits. But
the line taking away 4 is wrong. You can see even without looking at its
patch, because it is blamed to the same commit that added "10", which
is wrong.

Sorry I don't have a solution. I think it's an open problem with
git-blame, though you could probably script something around "git blame
--reverse". See the commit message of 85af7929e (git-blame --reverse,
2008-04-02) for some discussion.

-Peff


difflame

2017-01-17 Thread Edmundo Carmona Antoranz
Hi!

For a very long time I had wanted to get the output of diff to include
blame information as well (to see when something was added/removed).

I just created a very small (and rough) tool for that purpose. It's
written in python and if it gets to something better than a small
tool, I think it could be worth to be added into git main (perhaps
into contrib?).

If you want to give ir a try:
https://github.com/eantoranz/difflame

Just provide the two treeishs you would like to diff (no more
parameters are accepted at the time) and you will get the diff along
with blame. Running it right now on the project itself:

✔ ~/difflame [master L|⚑ 1]
23:21 $ ./difflame.py HEAD~3 HEAD~
diff --git a/README.txt b/README.txt
new file mode 100644
index 000..a82aa27
--- /dev/null
+++ b/README.txt
@@ -0,0 +1,11 @@
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  1) difflame
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  2)
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  3)
Copyright 2017 Edmundo Carmona Antoranz
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  4)
Released under the terms of GPLv2
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  5)
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  6) Show
the output of diff with the additional information of blame.
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  7)
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  8)
Lines that remain the same or that were added will indicate when those
lines were 'added' to the file
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600  9)
Lines that were removed will display the last revision where those
lines were _present_ on the file (as provided by blame --re
verse)
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600 10)
+3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600 11) At
the moment, only two parameters need to be provided: two treeishs to
get the diff from
diff --git a/difflame.py b/difflame.py
index f6e879b..06bfc03 100755
--- a/difflame.py
+++ b/difflame.py
@@ -112,16 +112,20 @@ def process_file_from_diff_output(output_lines,
starting_line):
c661286f (Edmundo Carmona Antoranz 2017-01-17 20:10:07 -0600 112)
diff_line = output_lines[i].split()
c661286f (Edmundo Carmona Antoranz 2017-01-17 20:10:07 -0600 113)
if diff_line[0] != "diff":
c661286f (Edmundo Carmona Antoranz 2017-01-17 20:10:07 -0600 114)
   raise Exception("Doesn't seem to exist a 'diff' line at line " +
str(i + 1) + ": " + output_lines[i])
-3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600 115)
original_name = diff_line[1]
-3d426842 (Edmundo Carmona Antoranz 2017-01-17 22:26:18 -0600 116)
final_name = diff_line[2]
+f135bf04 (Edmundo Carmona Antoranz 2017-01-17 22:50:50 -0600 115)
original_name = diff_line[2]
+f135bf04 (Edmundo Carmona Antoranz 2017-01-17 22:50:50 -0600 116)
final_name = diff_line[3]
c661286f (Edmundo Carmona Antoranz 2017-01-17 20:10:07 -0600 117)
print output_lines[i]; i+=1
.
.
.


Hope you find it useful

Best regards!