Re: [RFC] blame: new option to better handle merged cherry-picks
Bernhard R. Link brl+...@mail.brlink.eu writes: Allows to disable the git blame optimization of assuming that if there is a parent of a merge commit that has the exactly same file content, then only this parent is to be looked at. I think this is what we usually call --full-history in git log family, but more importantly, I do not think this is solving a valid problem. This optimization, while being faster in the usual case, means that in the case of cherry-picks the blamed commit depends on which other commits touched a file. If for example one commit A modified both files b and c. And there are commits B and C, B only modifies file b and C only modifies file c (so that no conflicts happen), and assume A is cherry-picked as A' and the two branches then merged: --o-B---A \ \ ---C---A'--M--- So the contents of b at M is as the same as in A, so following 'b' will see A and B changed that path, which is correct. The contents of c at M is? It is different from A because at A c lacks the change made to it at C. The merged result at M would match C in A', no? So following 'c' will see A' and C changed that path, no? So what is wrong about it? If the original history were like this instead, and A' were a cherry-pick of A, then what should happen? --o-B---A' \ \ ---C---A---M--- Don't we want to see c blamed the same way? Also, when handling a merge, we have to handle parents sequencially, checking the difference between M with its first parent first, and then passing blame for the remaining common lines to the remaining parents. If you flip the order of parents of M when you merge A and A' in your original history, and with your patch, what would you see when you blame c? Wouldn't it notice that M:c is identical to c in its first parent (now A') and pass the whole blame to A' anyway with or without your change? Then without this new option git blame blames the A|A' changes of file b to A while blaming the changes of c to A'. With the new option --no-parent-shortcut it blames both changes to the same commit. Signed-off-by: Bernhard R. Link brl...@debian.org --- Documentation/blame-options.txt | 6 ++ builtin/blame.c | 5 - 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt index 0cebc4f..55dd12b 100644 --- a/Documentation/blame-options.txt +++ b/Documentation/blame-options.txt @@ -48,6 +48,12 @@ include::line-range-format.txt[] Show the result incrementally in a format designed for machine consumption. +--no-parent-shortcut:: + Always look at all parents of a merge and do not shortcut + to the first parent with no changes to the file looked at. + This takes more time but produces more reliable results + if branches with cherry-picked commits were merged. + --encoding=encoding:: Specifies the encoding used to output author names and commit summaries. Setting it to `none` makes blame diff --git a/builtin/blame.c b/builtin/blame.c index 4916eb2..dab2c36 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -45,6 +45,7 @@ static int incremental; static int xdl_opts; static int abbrev = -1; static int no_whole_file_rename; +static int no_parent_shortcut; static enum date_mode blame_date_mode = DATE_ISO8601; static size_t blame_date_width; @@ -1248,7 +1249,8 @@ static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt) porigin = find(sb, p, origin); if (!porigin) continue; - if (!hashcmp(porigin-blob_sha1, origin-blob_sha1)) { + if (!no_parent_shortcut + !hashcmp(porigin-blob_sha1, origin-blob_sha1)) { pass_whole_blame(sb, origin, porigin); origin_decref(porigin); goto finish; @@ -2247,6 +2249,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix) static const char *contents_from = NULL; static const struct option options[] = { OPT_BOOL(0, incremental, incremental, N_(Show blame entries as we find them, incrementally)), + OPT_BOOL(0, no-parent-shortcut, no_parent_shortcut, N_(Don't take shortcuts in some merges but handle cherry-picks better)), OPT_BOOL('b', NULL, blank_boundary, N_(Show blank SHA-1 for boundary commits (Default: off))), OPT_BOOL(0, root, show_root, N_(Do not treat root commits as boundaries (Default: off))), OPT_BOOL(0, show-stats, show_stats, N_(Show work cost statistics)), -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] blame: new option to better handle merged cherry-picks
* Junio C Hamano gits...@pobox.com [140102 21:29]: This optimization, while being faster in the usual case, means that in the case of cherry-picks the blamed commit depends on which other commits touched a file. If for example one commit A modified both files b and c. And there are commits B and C, B only modifies file b and C only modifies file c (so that no conflicts happen), and assume A is cherry-picked as A' and the two branches then merged: --o-B---A \ \ ---C---A'--M--- So the contents of b at M is as the same as in A, so following 'b' will see A and B changed that path, which is correct. The contents of c at M is? It is different from A because at A c lacks the change made to it at C. The merged result at M would match C in A', no? So following 'c' will see A' and C changed that path, no? So what is wrong about it? It's not wrong (that's why I do not suggest to change the default behaviour), but it's inconsistent and can be a bit confusing to have either the one or the other commit blamed depending on whether some file was touched or not. The history I'm a bit more concerned is something like (with ... being unrelated commits not touching B or C): --o-...---A--...---B---...-- \\ ---...---A'--...---C---...---M--- Here having B or C touching b or c determines which of A or A' is blamed for which part of the patch. It's even enough to have: --...---A'--...---B---...-- /\ ---o---...---A-----M--- To have the A/A' changes of c to be attributed to A while the b changes are attributed to A'. I.e. you have a master branch that has commit A, which is also cherry-picked to some previously forked side-branch. Once that side-branch is merged back, parts of the change are attributed to A' if they are in a file that is not touched otherwise in the main branch. Also, when handling a merge, we have to handle parents sequencially, checking the difference between M with its first parent first, and then passing blame for the remaining common lines to the remaining parents. If you flip the order of parents of M when you merge A and A' in your original history, and with your patch, what would you see when you blame c? Wouldn't it notice that M:c is identical to c in its first parent (now A') and pass the whole blame to A' anyway with or without your change? When giving git-blame the new option introduced with my patch, only the order of parents determines which commit is blamed. Without the option (i.e. the currently only possible behaviour) which commit is blamed depends what else touches other parts of the file. If both branches make modifications to the file (or if there is any merge conflict resolution in the merge) then the bahaviour with or without the option are the same. But in the example with one commit B touching also b and one commit C touching also c, there is (without the new option) always one part of the cherry-picked commit is blamed on the original and one on the cherry-picked, no matter how you order the parents. (While by having your mainline always the most leftward parent, with the the new option you always get those commit blamed that is the first one this was introduced to mainline.) Bernhard R. Link -- F8AC 04D5 0B9B 064B 3383 C3DA AFFC 96D1 151D FFDC -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] blame: new option to better handle merged cherry-picks
Bernhard R. Link brl+...@mail.brlink.eu writes: When giving git-blame the new option introduced with my patch, only the order of parents determines which commit is blamed. Without the option (i.e. the currently only possible behaviour) which commit is blamed depends what else touches other parts of the file. I am trying to figure out why that difference matters, in other words, when using the new option is actually useful. You give the command a scenario that can be solved in two equally valid ways (blaming to either A or A' is equally valid), and sometimes the command gives the identical result with or without the new option, and some other times the user gets a different but an equally valid result (but after traversing more history spending more cycles). I am not sure what problem the new option solves. I am trying to come up with an easy-to-understand explanation to the end users: If you want to see blame's result with the property X, use this option---it may have to spend extra cycles, but the property X is so desirable that it may be worth it. And I am having a hard time understanding what that X is. But in the example with one commit B touching also b and one commit C touching also c, there is (without the new option) always one part of the cherry-picked commit is blamed on the original and one on the cherry-picked, no matter how you order the parents. Yeah, the cherry-picked one will introduce the same change as the one that was cherry-picked, so if you look at the end result and ask where did _this_ line come from?, there are two equally plausible candidates, as blame output can give only one answer to each line. I still do not see why the one that is picked with the new option is better. At best, it looks to me that it is saying running with this option may (or may not) give a different answer, so run the command with and without it and see which one you like, which does not sound too useful to the end users. That is where my confusion comes from. (While by having your mainline always the most leftward parent, with the the new option you always get those commit blamed that is the first one this was introduced to mainline.) Yes, I vaguely recall we talked about adding --first-parent option to the command in the past. I do not remember what came out of that discussion. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html