Re: [RFC] blame: new option to better handle merged cherry-picks

2014-01-02 Thread Junio C Hamano
Bernhard R. Link brl+...@mail.brlink.eu writes:

 Allows to disable the git blame optimization of assuming that if there is a
 parent of a merge commit that has the exactly same file content, then
 only this parent is to be looked at.

I think this is what we usually call --full-history in git log
family, but more importantly, I do not think this is solving a valid
problem.

 This optimization, while being faster in the usual case, means that in
 the case of cherry-picks the blamed commit depends on which other commits
 touched a file.

 If for example one commit A modified both files b and c. And there are
 commits B and C, B only modifies file b and C only modifies file c
 (so that no conflicts happen), and assume A is cherry-picked as A'
 and the two branches then merged:

 --o-B---A
\ \
 ---C---A'--M---

So the contents of b at M is as the same as in A, so following 'b'
will see A and B changed that path, which is correct.

The contents of c at M is?  It is different from A because at A c
lacks the change made to it at C.  The merged result at M would
match C in A', no?  So following 'c' will see A' and C changed that
path, no?

So what is wrong about it?  If the original history were like this
instead, and A' were a cherry-pick of A, then what should happen?

 --o-B---A'
\ \
 ---C---A---M---

Don't we want to see c blamed the same way?

Also, when handling a merge, we have to handle parents sequencially,
checking the difference between M with its first parent first, and
then passing blame for the remaining common lines to the remaining
parents.  If you flip the order of parents of M when you merge A and
A' in your original history, and with your patch, what would you
see when you blame c?  Wouldn't it notice that M:c is identical to c
in its first parent (now A') and pass the whole blame to A' anyway
with or without your change?



 Then without this new option git blame blames the A|A' changes of
 file b to A while blaming the changes of c to A'.
 With the new option --no-parent-shortcut it blames both changes to the
 same commit.

 Signed-off-by: Bernhard R. Link brl...@debian.org
 ---
  Documentation/blame-options.txt | 6 ++
  builtin/blame.c | 5 -
  2 files changed, 10 insertions(+), 1 deletion(-)

 diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt
 index 0cebc4f..55dd12b 100644
 --- a/Documentation/blame-options.txt
 +++ b/Documentation/blame-options.txt
 @@ -48,6 +48,12 @@ include::line-range-format.txt[]
   Show the result incrementally in a format designed for
   machine consumption.
  
 +--no-parent-shortcut::
 + Always look at all parents of a merge and do not shortcut
 + to the first parent with no changes to the file looked at.
 + This takes more time but produces more reliable results
 + if branches with cherry-picked commits were merged.
 +
  --encoding=encoding::
   Specifies the encoding used to output author names
   and commit summaries. Setting it to `none` makes blame
 diff --git a/builtin/blame.c b/builtin/blame.c
 index 4916eb2..dab2c36 100644
 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -45,6 +45,7 @@ static int incremental;
  static int xdl_opts;
  static int abbrev = -1;
  static int no_whole_file_rename;
 +static int no_parent_shortcut;
  
  static enum date_mode blame_date_mode = DATE_ISO8601;
  static size_t blame_date_width;
 @@ -1248,7 +1249,8 @@ static void pass_blame(struct scoreboard *sb, struct 
 origin *origin, int opt)
   porigin = find(sb, p, origin);
   if (!porigin)
   continue;
 - if (!hashcmp(porigin-blob_sha1, origin-blob_sha1)) {
 + if (!no_parent_shortcut 
 + !hashcmp(porigin-blob_sha1, origin-blob_sha1)) {
   pass_whole_blame(sb, origin, porigin);
   origin_decref(porigin);
   goto finish;
 @@ -2247,6 +2249,7 @@ int cmd_blame(int argc, const char **argv, const char 
 *prefix)
   static const char *contents_from = NULL;
   static const struct option options[] = {
   OPT_BOOL(0, incremental, incremental, N_(Show blame entries 
 as we find them, incrementally)),
 + OPT_BOOL(0, no-parent-shortcut, no_parent_shortcut, 
 N_(Don't take shortcuts in some merges but handle cherry-picks better)),
   OPT_BOOL('b', NULL, blank_boundary, N_(Show blank SHA-1 for 
 boundary commits (Default: off))),
   OPT_BOOL(0, root, show_root, N_(Do not treat root commits 
 as boundaries (Default: off))),
   OPT_BOOL(0, show-stats, show_stats, N_(Show work cost 
 statistics)),
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] blame: new option to better handle merged cherry-picks

2014-01-02 Thread Bernhard R. Link
* Junio C Hamano gits...@pobox.com [140102 21:29]:
  This optimization, while being faster in the usual case, means that in
  the case of cherry-picks the blamed commit depends on which other commits
  touched a file.
 
  If for example one commit A modified both files b and c. And there are
  commits B and C, B only modifies file b and C only modifies file c
  (so that no conflicts happen), and assume A is cherry-picked as A'
  and the two branches then merged:
 
  --o-B---A
 \ \
  ---C---A'--M---

 So the contents of b at M is as the same as in A, so following 'b'
 will see A and B changed that path, which is correct.

 The contents of c at M is?  It is different from A because at A c
 lacks the change made to it at C.  The merged result at M would
 match C in A', no?  So following 'c' will see A' and C changed that
 path, no?

 So what is wrong about it?

It's not wrong (that's why I do not suggest to change the default
behaviour), but it's inconsistent and can be a bit confusing to
have either the one or the other commit blamed depending on whether
some file was touched or not.
The history I'm a bit more concerned is something like (with ...
being unrelated commits not touching B or C):

 --o-...---A--...---B---...--
\\
 ---...---A'--...---C---...---M---


Here having B or C touching b or c determines which of A or A' is
blamed for which part of the patch.

It's even enough to have:

   --...---A'--...---B---...--
  /\
 ---o---...---A-----M---

To have the A/A' changes of c to be attributed to A while the b changes
are attributed to A'. I.e. you have a master branch that has commit A,
which is also cherry-picked to some previously forked side-branch.
Once that side-branch is merged back, parts of the change are attributed
to A' if they are in a file that is not touched otherwise in the main
branch.


 Also, when handling a merge, we have to handle parents sequencially,
 checking the difference between M with its first parent first, and
 then passing blame for the remaining common lines to the remaining
 parents.  If you flip the order of parents of M when you merge A and
 A' in your original history, and with your patch, what would you
 see when you blame c?  Wouldn't it notice that M:c is identical to c
 in its first parent (now A') and pass the whole blame to A' anyway
 with or without your change?

When giving git-blame the new option introduced with my patch, only
the order of parents determines which commit is blamed. Without
the option (i.e. the currently only possible behaviour) which commit
is blamed depends what else touches other parts of the file.
If both branches make modifications to the file (or if there is
any merge conflict resolution in the merge) then the bahaviour with
or without the option are the same.

But in the example with one commit B touching also b and one commit C
touching also c, there is (without the new option) always one part
of the cherry-picked commit is blamed on the original and one on the
cherry-picked, no matter how you order the parents.
(While by having your mainline always the most leftward parent, with
the the new option you always get those commit blamed that is the
first one this was introduced to mainline.)

Bernhard R. Link
-- 
F8AC 04D5 0B9B 064B 3383  C3DA AFFC 96D1 151D FFDC
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] blame: new option to better handle merged cherry-picks

2014-01-02 Thread Junio C Hamano
Bernhard R. Link brl+...@mail.brlink.eu writes:

 When giving git-blame the new option introduced with my patch, only
 the order of parents determines which commit is blamed. Without
 the option (i.e. the currently only possible behaviour) which commit
 is blamed depends what else touches other parts of the file.

I am trying to figure out why that difference matters, in other
words, when using the new option is actually useful.  You give the
command a scenario that can be solved in two equally valid ways
(blaming to either A or A' is equally valid), and sometimes the
command gives the identical result with or without the new option,
and some other times the user gets a different but an equally valid
result (but after traversing more history spending more cycles).  I
am not sure what problem the new option solves.  I am trying to come
up with an easy-to-understand explanation to the end users: If you
want to see blame's result with the property X, use this option---it
may have to spend extra cycles, but the property X is so desirable
that it may be worth it.  And I am having a hard time understanding
what that X is.

 But in the example with one commit B touching also b and one commit C
 touching also c, there is (without the new option) always one part
 of the cherry-picked commit is blamed on the original and one on the
 cherry-picked, no matter how you order the parents.

Yeah, the cherry-picked one will introduce the same change as the
one that was cherry-picked, so if you look at the end result and ask
where did _this_ line come from?, there are two equally plausible
candidates, as blame output can give only one answer to each line.
I still do not see why the one that is picked with the new option is
better.  At best, it looks to me that it is saying running with
this option may (or may not) give a different answer, so run the
command with and without it and see which one you like, which does
not sound too useful to the end users.  That is where my confusion
comes from.

 (While by having your mainline always the most leftward parent, with
 the the new option you always get those commit blamed that is the
 first one this was introduced to mainline.)

Yes, I vaguely recall we talked about adding --first-parent option
to the command in the past.  I do not remember what came out of that
discussion.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html