Re: difflame improvements

2017-02-18 Thread Edmundo Carmona Antoranz
On Fri, Feb 17, 2017 at 1:01 AM, Edmundo Carmona Antoranz
 wrote:
> On Thu, Feb 16, 2017 at 11:17 PM, Jeff King  wrote:
>
>> This isn't difflame's fault; that's what "git blame" tells you about
>> that line. But since I already told difflame "v2.6.5..HEAD", it would
>> probably make sense to similarly limit the blame to that range. That
>> turns up a boundary commit for the line. Which is _also_ not helpful,
>> but at least the tool is telling me that the line came from before
>> v2.6.5, and I don't really need to care much about it.
>
>
> I'm running my own tests on difflame and I have a theory about "when"
> it breaks at least one of the cases when it breaks:
>
> Analysis for deleted lines is being driven by git blame --reverse.
> What I have noticed is that it "breaks" when blame --reverse drives
> the analysis into revisions where "treeish1" is not part of their
> history (like, bringing analysis "to the sides" of treeish1 instead of
> keeping analysis in revisions in the history of treeish2 that have
> treeish1 as one of their ancestors which is definitely a valid
> case for analysis, anyway). In this case, blame --reverse stops being
> helpful.
>

At the cost of being slower, I just pushed to master the best results yet.

The workaround I developed for the case I described on the previous
mail ended up providing much better results overall so I ended up
replacing the whole merge-analysis logic with it.

Thanks for your kind help and comments, Peff. Let me know how it goes.


Re: difflame improvements

2017-02-16 Thread Edmundo Carmona Antoranz
On Thu, Feb 16, 2017 at 11:17 PM, Jeff King  wrote:

> This isn't difflame's fault; that's what "git blame" tells you about
> that line. But since I already told difflame "v2.6.5..HEAD", it would
> probably make sense to similarly limit the blame to that range. That
> turns up a boundary commit for the line. Which is _also_ not helpful,
> but at least the tool is telling me that the line came from before
> v2.6.5, and I don't really need to care much about it.


I'm running my own tests on difflame and I have a theory about "when"
it breaks at least one of the cases when it breaks:

Analysis for deleted lines is being driven by git blame --reverse.
What I have noticed is that it "breaks" when blame --reverse drives
the analysis into revisions where "treeish1" is not part of their
history (like, bringing analysis "to the sides" of treeish1 instead of
keeping analysis in revisions in the history of treeish2 that have
treeish1 as one of their ancestors which is definitely a valid
case for analysis, anyway). In this case, blame --reverse stops being
helpful.

Take this example (I just pushed a debug-deletion branch into gh...
probably more debugging messages will be needed):

$ difflame.py HEAD~100 HEAD -- Documentation/git-commit.txt
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index f2ab0ee2e..4f8f20a36 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -265,7 +265,8 @@ FROM UPSTREAM REBASE" section in linkgit:git-rebase[1].)
bcf9626a71 (Matthieu Moy  2016-06-28 13:40:11 +0200 265)   If this
option is specified together with `--amend`, then
04c8ce9c1c (Markus Heidelberg 2008-12-19 13:14:18 +0100 266)   no
paths need to be specified, which can be used to amend
d4ba07cac5 (Johannes Sixt 2008-04-10 13:33:09 +0200 267)   the
last commit without committing changes that have
   Range of revisions: 02db2d..066fb04
   Treeish1 02db2d04: 02db2d042 Merge branch 'ah/grammos'
   Treeish2 066fb0494: 066fb0494 blame: draft of line format
   Blamed Revision afe0e2a39: afe0e2a39 Merge branch
'da/difftool-dir-diff-fix'
   Original Filename a/Documentation/git-commit.txt Deleted Line 268
   Children revisions:
   3aead1cad7a: 3aead1cad Merge branch 'ak/commit-only-allow-empty'
   There's only one child revision on that revision the line
we are tracking is gone
   Parents of this child revision:
   afe0e2a39166: afe0e2a39 Merge branch 'da/difftool-dir-diff-fix'
   beb635ca9ce: beb635ca9 commit: remove 'Clever' message
for --only --amend
   Finding parent where the line has been deleted:
   beb635ca9: beb635ca9 commit: remove 'Clever' message
for --only --amend
   Range of revisions: 02db2d042..beb635c
   Treeish1 02db2d0: 02db2d042 Merge branch 'ah/grammos'
   Treeish2 beb635c: beb635ca9 commit: remove 'Clever'
message for --only --amend
   Blamed Revision 02db2d0: 02db2d042 Merge branch 'ah/grammos'
   Original Filename a/Documentation/git-commit.txt Deleted Line 268
   Children revisions:
   Found no children... will return the original blamed revision
(02db2d0) saying that the deleting revision could not be found
   beb635ca9 commit: remove 'Clever' message for --only --amend
-beb635ca9 (Andreas Krey 2016-12-09 05:10:21 +0100 268)
already been staged.
   319d83524 commit: make --only --allow-empty work without paths
+319d835240 (Andreas Krey  2016-12-02 23:15:13 +0100 268)
already been staged. ...
+319d835240 (Andreas Krey  2016-12-02 23:15:13 +0100 269)   paths
are also not requi...
d4ba07cac5 (Johannes Sixt 2008-04-10 13:33:09 +0200 270)
1947bdbc31 (Junio C Hamano2008-06-22 14:32:27 -0700 271) -u[]::
1947bdbc31 (Junio C Hamano2008-06-22 14:32:27 -0700 272)
--untracked-files[=]::



I know that line 268 was deleted on 319d835240.

So on the first round of merge analysis it says "let's go into
beb635ca9". That's fine. That's exactly the path that is required to
reach 319d835240. But then when using this new "range of revisions"
for git blame --reverse, we get that line 268 is not telling us
anything useful:

$ git blame --reverse -L268,268 02db2d042..beb635c --
Documentation/git-commit.txt
^02db2d042 (Junio C Hamano 2016-12-19 14:45:30 -0800 268)
already been staged.

So, instead of pointing to 319d835240 (the parent of beb635c), it's
basically saying something like "I give up". My hunch (haven't sat
down to digest all the details about the output of git blame
--reverse... YET) is that, given that 02db2d042 is _not_ part of the
history of beb635c, git blame reverse is trying to tell me just
that... and that means I'll have to "script around this scenario".

$ git merge-base 02db2d042 beb635c
0202c411edc25940cc381bf317badcdf67670be4


Thanks in advance.


Re: difflame improvements

2017-02-16 Thread Jeff King
On Tue, Feb 14, 2017 at 11:19:05PM -0600, Edmundo Carmona Antoranz wrote:

> I've been working on detecting revisions where a "real" deletion was
> made and I think I advanced a lot in that front. I still have to work
> on many scenarios (renamed files, for example... also performance) but
> at least I'm using a few runs against git-scm history and the results
> are "promising":

I played with this a bit more, and it did turn up the correct results
for some deletions in my experiments.

One thing I noticed is that it also turned up nonsense for lines that
blame in weird ways. For instance, I have a diff like this (these are
real examples, but don't pay attention to the sha1s; it's in a fork of
git, not upstream):

  $ git diff v2.6.5 builtin/prune-packed.c
  diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
  index 7cf900ea07..5e3727e841 100644
  --- a/builtin/prune-packed.c
  +++ b/builtin/prune-packed.c
  @@ -2,6 +2,7 @@
   #include "cache.h"
   #include "progress.h"
   #include "parse-options.h"
  +#include "gh-log.h"
   
   static const char * const prune_packed_usage[] = {
N_("git prune-packed [-n | --dry-run] [-q | --quiet]"),
  @@ -29,8 +30,11 @@ static int prune_object(const unsigned char *sha1, const 
char *path,
   
if (*opts & PRUNE_PACKED_DRY_RUN)
printf("rm -f %s\n", path);
  - else
  + else {
  + gh_logf("prune", "%s Duplicate loose object pruned\n",
  + sha1_to_hex(sha1));
unlink_or_warn(path);
  + }
return 0;
   }
   

Running difflame on it says this:

  $ python /path/to/difflame.py v2.6.5..HEAD -- builtin/prune-packed.c
  [...]
  -2c0b29e662 (Jeff King 2016-01-26 15:27:55 -0500 32)  else
  +d60032f8640 builtin/prune-packed.c (Jeff King2015-02-02 23:15:33 
-0500 33)   else {
  +d60032f8640 builtin/prune-packed.c (Jeff King2015-02-02 23:15:33 
-0500 34)   gh_logf("prune", "%s Duplicate loose object pruned\n",
  +d60032f8640 builtin/prune-packed.c (Jeff King2015-02-02 23:15:33 
-0500 35)   sha1_to_hex(sha1));
   0d3b729680e builtin/prune-packed.c (Jeff King2014-10-15 18:40:53 
-0400 36)   unlink_or_warn(path);
  +2396ec85bd1 prune-packed.c (Linus Torvalds   2005-07-03 14:27:34 
-0700 37)   }

There are two weird things. One is that the old "else" is attributed to
my 2c0b29e662. That's quite weird, because that is a merge commit which
did not touch the file at all. I haven't tracked it down, but presumably
that is weirdness with the --reverse blame.

But there's another one, that I think is easy to fix. The closing brace
is attributed to some ancient commit from Linus. Which yes, I'm sure had
a closing brace, but not _my_ closing brace that was added by
d60032f8640, that the rest of the lines got attributed to.

This isn't difflame's fault; that's what "git blame" tells you about
that line. But since I already told difflame "v2.6.5..HEAD", it would
probably make sense to similarly limit the blame to that range. That
turns up a boundary commit for the line. Which is _also_ not helpful,
but at least the tool is telling me that the line came from before
v2.6.5, and I don't really need to care much about it.

Part of this is that my use case may be a bit different than yours. I
don't actually want to look at the blame results directly. I just want
to see the set of commits that I'd need to look at and possibly
cherry-pick in order to re-create the diff.

-Peff