Re: [PATCH v6] Add new git-related helper to contrib
Felipe Contreras felipe.contre...@gmail.com writes: On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras felipe.contre...@gmail.com wrote: On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: IIRC, git-gui runs two blames, one without any -C and one with (I do not offhand recall how many -C it uses) to show both. 'git blame' is a very expensive operation, perhaps we should add another option so users don't need to run two blames to find this. Yes, you could add a struct origin *suspect_in_the_same_file next to the current struct origin *suspect to the blame_entry in order to represent the origin you would get if you were to stop at a move across files event, and keep digging, when you are operating under -C -C or higher mode. No, this is what I meant: But that would not print the right line numbers, so perhaps: Users of full-output may want to be able to see the same thing. I have a suspicion that you misread what assign_blame() does. The first inner loop to find one suspect is really what it says. It loops over blame entries in the scoreboard but it is not about finding one entry, but is to find one suspect. The ent found in the loop that you store in tmp_ent is no more special than any other ent that shares the same suspect as its origin. Imagine that your scoreboard originally has three blocks of text (i.e. blame_entry) A, B, and C, and the current suspect for A and C are the same, while we already know what the final guilty party for B is (which may be some descendant of the suspect). Once we find one suspect in the first inner loop, the call to pass_blame() does everything it can do to exonerate that suspect. It runs a single diff between the suspect and each of its parents to find lines in both A and C that can be blamed to one of these parents, and blame entries A and C are split into pieces, some of which still have the same suspect as their origin, which are where their lines originate from, while others are attributable to these parents or their ancestors. If you keep the original entry for A to do something special, like printing what the original range of A was before it was split by pass_blame(), wouldn't you need to do the same for C? We will not be visiting C later in the outer while(1) loop, as a single call to pass_blame() for suspect we did when we found it in A has already taken care of it. I am not sure what you are trying to achieve with that found-guilty logic, even if the save away in tmp_ent logic is changed to keep all the blame entries that have suspect we are looking at as their origin. When the current suspect is found to have passed all lines intact from its parents, we will see found-guilty left to be false. How would it make the original blame_entry (perhaps halfway) guilty in that situation? --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1500,25 +1500,16 @@ static int emit_one_suspect_detail(struct origin *suspect, int repeat) return 1; } -/* - * The blame_entry is found to be guilty for the range. Mark it - * as such, and show it in incremental output. - */ -static void found_guilty_entry(struct blame_entry *ent) +static void print_guilty_entry(struct blame_entry *ent) { - if (ent-guilty) - return; - ent-guilty = 1; - if (incremental) { - struct origin *suspect = ent-suspect; - - printf(%s %d %d %d\n, -sha1_to_hex(suspect-commit-object.sha1), -ent-s_lno + 1, ent-lno + 1, ent-num_lines); - emit_one_suspect_detail(suspect, 0); - write_filename_info(suspect-path); - maybe_flush_or_die(stdout, stdout); - } + struct origin *suspect = ent-suspect; + + printf(%s %d %d %d\n, + sha1_to_hex(suspect-commit-object.sha1), + ent-s_lno + 1, ent-lno + 1, ent-num_lines); + emit_one_suspect_detail(suspect, 0); + write_filename_info(suspect-path); + maybe_flush_or_die(stdout, stdout); } /* @@ -1531,14 +1522,19 @@ static void assign_blame(struct scoreboard *sb, int opt) struct rev_info *revs = sb-revs; while (1) { - struct blame_entry *ent; + struct blame_entry *ent, tmp_ent; struct commit *commit; struct origin *suspect = NULL; + int found_guilty = 0; /* find one suspect to break down */ - for (ent = sb-ent; !suspect ent; ent = ent-next) - if (!ent-guilty) + for (ent = sb-ent; ent; ent = ent-next) + if (!ent-guilty) { + tmp_ent = *ent; suspect = ent-suspect; + break; + } + if (!suspect) return; /* all done */ @@ -1564,9 +1560,20 @@ static
Re: [PATCH v6] Add new git-related helper to contrib
Felipe Contreras felipe.contre...@gmail.com writes: On Wed, May 22, 2013 at 10:23 PM, Felipe Contreras felipe.contre...@gmail.com wrote: This doesn't make any sense: Ah, never mind, it's COPYING the one being modified, not EXTRACTING. Yes. The different levels of -C happens to correspond to the software engineering practice from best to sloppy: - When you refactor to have a block of lines in a file , the original of that block would have come from another file (or the same file) you touched to remove duplication, so a single -C looks for an origin only from modified files (best). - When you start a new file, you may have to start from some boilerplate material that already exists in another file that is not (and does not have to be) changed in the commit you add that new file, so double -C -C looks for an origin of lines from other files, even unmodified ones, when looking at the lines in a new file (unfortunate but acceptable). - When you copy and paste without making any effort to refactor, you modify a file by adding new lines that match identically from another existing file, the latter of which does not change in the commit you do that copy and paste. To find this, you need to look for an origin for any file from all the other files, and triple -C -C -C lets you do so (sloppy). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Junio C Hamano gits...@pobox.com writes: Users of full-output may want to be able to see the same thing. ... but I agree we may be able to cheat if we only want to support interactive, and we do want to find a way to cheat when we are running interactive without having to store much more information in each blame_entry. Imagine that your scoreboard originally has three blocks of text ... in that situation? (I snipped everything I said in the previous reply only for brevity but they are still relevant). I _think_ one way to cheat without storing too much information in each blame_entry is to first make a copy of all the original blame entries that share the suspect, see if any line in the line range represented by each of the blame entries ended up being blamed to an origin with a path different from that of the suspect. And print those original blame entries that fits the criterion as additional these are not the final blame as you are digging with the option to detect copy across files, but at this commit this line appeared anew as far as the origin-path is concerned output. So I think you were going in the right direction (in other words, the patch inserted new code to the right places), even though you may have got the details in what the new code should do wrong. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Thu, May 23, 2013 at 11:54 AM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras felipe.contre...@gmail.com wrote: On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: IIRC, git-gui runs two blames, one without any -C and one with (I do not offhand recall how many -C it uses) to show both. 'git blame' is a very expensive operation, perhaps we should add another option so users don't need to run two blames to find this. Yes, you could add a struct origin *suspect_in_the_same_file next to the current struct origin *suspect to the blame_entry in order to represent the origin you would get if you were to stop at a move across files event, and keep digging, when you are operating under -C -C or higher mode. No, this is what I meant: But that would not print the right line numbers, so perhaps: Users of full-output may want to be able to see the same thing. I have a suspicion that you misread what assign_blame() does. The first inner loop to find one suspect is really what it says. It loops over blame entries in the scoreboard but it is not about finding one entry, but is to find one suspect. The ent found in the loop that you store in tmp_ent is no more special than any other ent that shares the same suspect as its origin. It is stored because it's the only structure that has the line numbers. If the blame is passed to another 'ent', the original line numbers are gone. We could manually copy the line numbers to three variables, or we could copy the whole thing to one variable. The rest of the 'ent' doesn't matter. Imagine that your scoreboard originally has three blocks of text (i.e. blame_entry) A, B, and C, and the current suspect for A and C are the same, while we already know what the final guilty party for B is (which may be some descendant of the suspect). I don't see how that's possible, but whatever. Once we find one suspect in the first inner loop, the call to pass_blame() does everything it can do to exonerate that suspect. It runs a single diff between the suspect and each of its parents to find lines in both A and C that can be blamed to one of these parents, and blame entries A and C are split into pieces, some of which still have the same suspect as their origin, which are where their lines originate from, while others are attributable to these parents or their ancestors. If you keep the original entry for A to do something special, like printing what the original range of A was before it was split by pass_blame(), wouldn't you need to do the same for C? tmp_ent is only used when there's no entry taken the guilt away from A. If the blame entry of A is split, and the result of the split has a different filename, found_guilty() would not be called, and thus we won't show the blame entry, and we want to show it. And yes, we would need to do the same for C, and we would once the turn of C comes along. If it's possible that A descendant from A takes the blame for C, then sure, we would need to do ensure C is printed before it's gone, but pass_blame(A) would not destroy C. We will not be visiting C later in the outer while(1) loop, as a single call to pass_blame() for suspect we did when we found it in A has already taken care of it. It took care of A's suspect, but not C. Isn't it possible that C is split further and creates another blame entry? Either way, in --incremental we want to print C as well, whether it's the guilty one or not. I am not sure what you are trying to achieve with that found-guilty logic, even if the save away in tmp_ent logic is changed to keep all the blame entries that have suspect we are looking at as their origin. When the current suspect is found to have passed all lines intact from its parents, we will see found-guilty left to be false. What? When a 'blame entry' passes the whole blame to a parent, found_guilty is false, so we print the entry, which is exactly what we want to do. How would it make the original blame_entry (perhaps halfway) guilty in that situation? I thought 'guilty' meant that it was guilty of adding those lines to the 'final' text, so it ends up in the non-incremental blame. In that sense those blame entries are not guilty. But they are still blame entries, and we want to print all blame entries in incremental, guilty or not. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Felipe Contreras felipe.contre...@gmail.com writes: Imagine that your scoreboard originally has three blocks of text (i.e. blame_entry) A, B, and C, and the current suspect for A and C are the same, while we already know what the final guilty party for B is (which may be some descendant of the suspect). I don't see how that's possible, but whatever. The tree in your latest commit HEAD has a file with block of text A followed by block of text B followed by block of text C. The latest commit was the one that added B (perhaps new lines were inserted, or perhaps existing contiguous block of text was replaced, there is no difference between these two cases). You start digging the history of this file from HEAD. Your scoreboard begins with a single blame-entry that covers all three segments, with its suspect set to HEAD. Then pass_blame() discovers that top and bottom segments are older than this latest commit, and splits the originally-one blame entry into three blame entries. The first and the third blame entries cover the block A and the block C respectively, and their suspect fields are both set to HEAD^. The second blame entry covers the block B and its suspect does not change (it still is HEAD). Then it returns to the caller. The caller of pass_blame() looks at the scoreboard and finds these three blame entries. The second one's supect is still the original suspect the caller asked pass_blame() to exonerate blames for, and the suspect failed to do so for block B. The final blame for the middle block is now known to be HEAD. After all of the above, the next iteration of while(1) loop begins. That is how you arrive to the whatever situation. You have three blame entries, A, B and C, and suspect of A and C are the same, while B has a different suspect. Then in that next iteration, we pick blame entry for A and decide to see if HEAD^, which is the suspect for A, can be exonerated of blames for _any_ (not just A) blame entry it currently is suspected for. We call pass_blame() and it will find and process both A and C with a single git diff-tree, attempting to pass blame to HEAD^^ and its ancestors. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Thu, May 23, 2013 at 4:52 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: Imagine that your scoreboard originally has three blocks of text (i.e. blame_entry) A, B, and C, and the current suspect for A and C are the same, while we already know what the final guilty party for B is (which may be some descendant of the suspect). I don't see how that's possible, but whatever. The tree in your latest commit HEAD has a file with block of text A followed by block of text B followed by block of text C. The latest commit was the one that added B (perhaps new lines were inserted, or perhaps existing contiguous block of text was replaced, there is no difference between these two cases). You start digging the history of this file from HEAD. Your scoreboard begins with a single blame-entry that covers all three segments, with its suspect set to HEAD. Then pass_blame() discovers that top and bottom segments are older than this latest commit, and splits the originally-one blame entry into three blame entries. The first and the third blame entries cover the block A and the block C respectively, and their suspect fields are both set to HEAD^. The second blame entry covers the block B and its suspect does not change (it still is HEAD). Then it returns to the caller. The caller of pass_blame() looks at the scoreboard and finds these three blame entries. The second one's supect is still the original suspect the caller asked pass_blame() to exonerate blames for, and the suspect failed to do so for block B. The final blame for the middle block is now known to be HEAD. After all of the above, the next iteration of while(1) loop begins. That is how you arrive to the whatever situation. You have three blame entries, A, B and C, and suspect of A and C are the same, while B has a different suspect. Then in that next iteration, we pick blame entry for A and decide to see if HEAD^, which is the suspect for A, can be exonerated of blames for _any_ (not just A) blame entry it currently is suspected for. We call pass_blame() and it will find and process both A and C with a single git diff-tree, attempting to pass blame to HEAD^^ and its ancestors. All right, my code still works in that situation. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Felipe Contreras felipe.contre...@gmail.com writes: Then in that next iteration, we pick blame entry for A and decide to see if HEAD^, which is the suspect for A, can be exonerated of blames for _any_ (not just A) blame entry it currently is suspected for. We call pass_blame() and it will find and process both A and C with a single git diff-tree, attempting to pass blame to HEAD^^ and its ancestors. All right, my code still works in that situation. When HEAD^ was processed, pass_blame() finds that the first line in A is attributable to HEAD^ (our current suspect) while the rest were copied from a different file in HEAD^^ . All lines in C are found to be copy from a different file in HEAD^^. Then your scoreboard would have: 1. a blame entry that failed to pass blame to parents of HEAD^ (the first line in A), which still has suspect == HEAD^ 2. a blame entry that covers the remainder of A, blaming HEAD^^. 3. a blame entry that covers all of B, whose final guilty party is known already. 4. a blame entry that covers all of C, blaming a different file in HEAD^^. Your Take responsibility loop goes over these blame entries, sets found_guilty to true because of the first blame entry (the first line of A), and calls print_guilty_entry() for blame entry 1, showing that HEAD^ is guilty for the first line of A. After the loop, your if we did not find anybody guilty logic does not kick in, and the original line range for block A you saved in tmp_ent is not used. You lost the fact that the second and remainder of A were in this file at the commit HEAD^ but not in HEAD^^ (i.e. these lines were moved by HEAD^ from elsewhere). The fact that HEAD^ touched _something_ is not lost, so if _all_ you care about is list all the commits, but I do not care about what line range was modified how, you can claim it working, but that is a very limited definition of working and is not very reusable or extensible in order to help those like gitk that currently have to run two separate blames. They need an output that lets them tell between - this is the earliest we saw these lines in the path (it may have been copied from another path, but this entry in the incremental output stream is not about that final blame); and - this is the final blame where these lines came from, possibly from different path and coalesce both kind of origin.. Also the fact that the entire C was copied from elsewhere by HEAD^ is lost but that is the same issue. The next round will not find any blame entry that is !ent-guity because the call to pass_blame() for HEAD^ already handled the final blame not just for blame entries 1 and 2, but also for 4 during this round. If the change in HEAD^ in the above example were to copy the whole A and C from a different file, then your !found_guilty logic would kick in and say all lines of A where copied from elsewhere in HEAD^, but again we would not learn the same information for C. I do not think I care only about a single bit per commit, i.e. if the commit touched the path; screw everybody else who wants to actually know where each line came from can be called working. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Thu, May 23, 2013 at 5:44 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: Then in that next iteration, we pick blame entry for A and decide to see if HEAD^, which is the suspect for A, can be exonerated of blames for _any_ (not just A) blame entry it currently is suspected for. We call pass_blame() and it will find and process both A and C with a single git diff-tree, attempting to pass blame to HEAD^^ and its ancestors. All right, my code still works in that situation. When HEAD^ was processed, pass_blame() finds that the first line in A is attributable to HEAD^ (our current suspect) while the rest were copied from a different file in HEAD^^ . All lines in C are found to be copy from a different file in HEAD^^. Then your scoreboard would have: 1. a blame entry that failed to pass blame to parents of HEAD^ (the first line in A), which still has suspect == HEAD^ 2. a blame entry that covers the remainder of A, blaming HEAD^^. 3. a blame entry that covers all of B, whose final guilty party is known already. 4. a blame entry that covers all of C, blaming a different file in HEAD^^. Your Take responsibility loop goes over these blame entries, sets found_guilty to true because of the first blame entry (the first line of A), and calls print_guilty_entry() for blame entry 1, showing that HEAD^ is guilty for the first line of A. After the loop, your if we did not find anybody guilty logic does not kick in, and the original line range for block A you saved in tmp_ent is not used. You lost the fact that the second and remainder of A were in this file at the commit HEAD^ but not in HEAD^^ (i.e. these lines were moved by HEAD^ from elsewhere). The fact that HEAD^ touched _something_ is not lost, so if _all_ you care about is list all the commits, but I do not care about what line range was modified how, you can claim it working, The line range printed is correct. but that is a very limited definition of working and is not very reusable or extensible in order to help those like gitk that currently have to run two separate blames. They need an output that lets them tell between - this is the earliest we saw these lines in the path (it may have been copied from another path, but this entry in the incremental output stream is not about that final blame); and - this is the final blame where these lines came from, possibly from different path and coalesce both kind of origin.. They can already do that by looking at the lines; if they get replaced by another entry; they are not guilty. Or we can add a 'guilty' line to the incremental output, that's trivial. Also the fact that the entire C was copied from elsewhere by HEAD^ is lost but that is the same issue. The next round will not find any blame entry that is !ent-guity because the call to pass_blame() for HEAD^ already handled the final blame not just for blame entries 1 and 2, but also for 4 during this round. No, that's not true. If it's a different file the blame entry is not found guilty. If the change in HEAD^ in the above example were to copy the whole A and C from a different file, then your !found_guilty logic would kick in and say all lines of A where copied from elsewhere in HEAD^, but again we would not learn the same information for C. We would, when it's the turn for C, which is not guilty at this point. I do not think I care only about a single bit per commit, i.e. if the commit touched the path; screw everybody else who wants to actually know where each line came from can be called working. They can know where each line came from; the lines of each entry are printed properly; that's the whole point of 'tmp_ent'. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Thu, May 23, 2013 at 6:47 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: If the change in HEAD^ in the above example were to copy the whole A and C from a different file, then your !found_guilty logic would kick in and say all lines of A where copied from elsewhere in HEAD^, but again we would not learn the same information for C. We would, when it's the turn for C, which is not guilty at this point. In _this_ round of the while(1) loop, pass_blame_to_parent() gets the scoreboard and two origins (HEAD^ that we are looking at and HEAD^^ that is its parent); it does not even know what blame entry this request came from. It runs a single diff using diff_hunks(), and asks blame_chunk() to split all the blame entries in the scoreboard that suspect it is looking at may be guilty for. Blame entry for A and C are both processed exactly the same way when HEAD^ is given to pass_blame() for the first time, which is when assign_blame() decided to call it with HEAD^ because it happened to have seen A before seeing C. At that point, both A and C are processed, and the post-processing loop Take responsibility for the remaining will clean up remnants from both A and C. After this round ends, the suspect for A and C are both set to HEAD^^. In the next round of the while(1) loop, C already forgot that its line movement happened in HEAD^. Its suspect is now HEAD^^. When it's the turn for C [*1*], you can say These lines originate in that different path in HEAD^^, but it is too late to say But the first time they appeared in the original file was HEAD^ (which is when they were moved from the different path in HEAD^^), isn't it? If that's the case then we'll need another list of blame entries where each discarded blame entry goes. Given the luck of my previous obvious patches, I'm not interested in implementing this non-obvious one in the least. The point is 'git related' should do -C -C -C, if 'git blame' doesn't throw the right output, that's a bug in 'git blame' not 'git related. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Felipe Contreras felipe.contre...@gmail.com writes: diff --git a/contrib/related/git-related b/contrib/related/git-related new file mode 100755 index 000..b96dcdd --- /dev/null +++ b/contrib/related/git-related @@ -0,0 +1,124 @@ +#!/usr/bin/env ruby + +# This script finds people that might be interested in a patch +# usage: git related file + +$since = '5-years-ago' +$min_percent = 10 + +def fmt_person(name, email) + '%s %s' % [name, email] +end Micronit. I suspect you do not need this helper, unless later patches start using it. + def import +return if @items.empty? +File.popen(%w[git cat-file --batch], 'r+') do |p| + p.write(@items.keys.join(\n)) + p.close_write + p.each do |line| +if line =~ /^(\h{40}) commit (\d+)/ + id, len = $1, $2 + data = p.read($2.to_i) + @items[id].parse(data) +end + end +end + end + + def get_blame(source, start, len, from) +return if len == 0 +len ||= 1 +File.popen(['git', 'blame', '--incremental', '-CCC', I am torn on the hardcoded use of -CCC here. Depending on the nature of the change in question, it may match well or worse to what you are trying to find out. When you are trying to say What were you smoking when you implemented this broken logic?, using -C may be good, but when your question is Even though all the callers of this function live in that other file, somebody moved this function that used to be file static in that file to here and made it public. Why?, you do not want to use -C. I am reasonably sure that in the finished code later in the series it will become configurable, but a fallback default is better to be not so expensive one. + '-L', '%u,+%u' % [start, len], + '--since', $since, from + '^', Is from unconditionally set? Perhaps that nil + '^' magically disappear and this code is relying on that, but it smells like a too much magic to me. + '--', source]) do |p| + p.each do |line| +if line =~ /^(\h{40})/ + id = $ + @items[id] = Commit.new(id) +end + end +end + end + + def from_patch(file) +from = source = nil +File.open(file) do |f| + f.each do |line| +case line +when /^From (\h+) (.+)$/ + from = $1 +when /^---\s+(\S+)/ + source = $1 != '/dev/null' ? $1[2..-1] : nil +when /^@@ -(\d+)(?:,(\d+))?/ + get_blame(source, $1, $2, from) +end Makes sense to start from the preimage so that you can find out who wrote the original block of lines your patch is removing. But then if source is /dev/null, wouldn't you be able to stop without running blame at all? You know the patch is creating a new file at that point and there is nobody to point a finger at. + end +end + end + +end + +exit 1 if ARGV.size != 1 + +commits = Commits.new +commits.from_patch(ARGV[0]) +commits.import + +count_per_person = Hash.new(0) + +commits.each do |id, commit| + commit.persons.each do |person| +count_per_person[person] += 1 + end +end + +count_per_person.each do |person, count| + percent = count.to_f * 100 / commits.size + next if percent $min_percent + puts person +end -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Junio C Hamano gits...@pobox.com writes: Felipe Contreras felipe.contre...@gmail.com writes: diff --git a/contrib/related/git-related b/contrib/related/git-related new file mode 100755 index 000..b96dcdd --- /dev/null +++ b/contrib/related/git-related @@ -0,0 +1,124 @@ +#!/usr/bin/env ruby + +# This script finds people that might be interested in a patch +# usage: git related file + +$since = '5-years-ago' +$min_percent = 10 + +def fmt_person(name, email) + '%s %s' % [name, email] +end Micronit. I suspect you do not need this helper, unless later patches start using it. Not that matters terribly, but later patches start using it in a different way may be needed as a clarification. The current two callers capture both %s %s as two separate groups but they do not need to; they can do (%s %s) and pass $1 to this function, I think. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Wed, May 22, 2013 at 2:23 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: diff --git a/contrib/related/git-related b/contrib/related/git-related new file mode 100755 index 000..b96dcdd --- /dev/null +++ b/contrib/related/git-related @@ -0,0 +1,124 @@ +#!/usr/bin/env ruby + +# This script finds people that might be interested in a patch +# usage: git related file + +$since = '5-years-ago' +$min_percent = 10 + +def fmt_person(name, email) + '%s %s' % [name, email] +end Micronit. I suspect you do not need this helper, unless later patches start using it. It's not *needed*, but it makes if fulfills the role of a function: to avoid typing that code in multiple places. + def import +return if @items.empty? +File.popen(%w[git cat-file --batch], 'r+') do |p| + p.write(@items.keys.join(\n)) + p.close_write + p.each do |line| +if line =~ /^(\h{40}) commit (\d+)/ + id, len = $1, $2 + data = p.read($2.to_i) + @items[id].parse(data) +end + end +end + end + + def get_blame(source, start, len, from) +return if len == 0 +len ||= 1 +File.popen(['git', 'blame', '--incremental', '-CCC', I am torn on the hardcoded use of -CCC here. Depending on the nature of the change in question, it may match well or worse to what you are trying to find out. When you are trying to say What were you smoking when you implemented this broken logic?, using -C may be good, but when your question is Even though all the callers of this function live in that other file, somebody moved this function that used to be file static in that file to here and made it public. Why?, you do not want to use -C. I am reasonably sure that in the finished code later in the series it will become configurable, but a fallback default is better to be not so expensive one. The script's purpose is to find related commits, -CCC does that, does it not? + '-L', '%u,+%u' % [start, len], + '--since', $since, from + '^', Is from unconditionally set? Perhaps that nil + '^' magically disappear and this code is relying on that, but it smells like a too much magic to me. I personally don't care. You decide what's the behavior when no 'From ' line is available in the patch. I don't see the point in worrying about non-git patches. + '--', source]) do |p| + p.each do |line| +if line =~ /^(\h{40})/ + id = $ + @items[id] = Commit.new(id) +end + end +end + end + + def from_patch(file) +from = source = nil +File.open(file) do |f| + f.each do |line| +case line +when /^From (\h+) (.+)$/ + from = $1 +when /^---\s+(\S+)/ + source = $1 != '/dev/null' ? $1[2..-1] : nil +when /^@@ -(\d+)(?:,(\d+))?/ + get_blame(source, $1, $2, from) +end Makes sense to start from the preimage so that you can find out who wrote the original block of lines your patch is removing. But then if source is /dev/null, wouldn't you be able to stop without running blame at all? You know the patch is creating a new file at that point and there is nobody to point a finger at. A patch can touch multiple files. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Felipe Contreras felipe.contre...@gmail.com writes: Depending on the nature of the change in question, it may match well or worse to what you are trying to find out. When you are trying to say What were you smoking when you implemented this broken logic?, using -C may be good, but when your question is Even though all the callers of this function live in that other file, somebody moved this function that used to be file static in that file to here and made it public. Why?, you do not want to use -C. I am reasonably sure that in the finished code later in the series it will become configurable, but a fallback default is better to be not so expensive one. The script's purpose is to find related commits, -CCC does that, does it not? As I already said in the above, the answer is no, when you are trying to find who moved the code from the original place. Makes sense to start from the preimage so that you can find out who wrote the original block of lines your patch is removing. But then if source is /dev/null, wouldn't you be able to stop without running blame at all? You know the patch is creating a new file at that point and there is nobody to point a finger at. A patch can touch multiple files. So? What line range would you be feeding blame with, for such a file creation event? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: Depending on the nature of the change in question, it may match well or worse to what you are trying to find out. When you are trying to say What were you smoking when you implemented this broken logic?, using -C may be good, but when your question is Even though all the callers of this function live in that other file, somebody moved this function that used to be file static in that file to here and made it public. Why?, you do not want to use -C. I am reasonably sure that in the finished code later in the series it will become configurable, but a fallback default is better to be not so expensive one. The script's purpose is to find related commits, -CCC does that, does it not? As I already said in the above, the answer is no, when you are trying to find who moved the code from the original place. But I'm not trying to find who moved the code, I'm trying to find related commits; hence the name 'git related'. The person who moved the code will be on the list regardless, so 'git related' does find the person who moved the code indirectly; by finding the persons that touched the code. Makes sense to start from the preimage so that you can find out who wrote the original block of lines your patch is removing. But then if source is /dev/null, wouldn't you be able to stop without running blame at all? You know the patch is creating a new file at that point and there is nobody to point a finger at. A patch can touch multiple files. So? What line range would you be feeding blame with, for such a file creation event? I thought it was skipping git blame in such cases, perhaps it got dropped in one of the other countless versions of this script. Good catch. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Felipe Contreras felipe.contre...@gmail.com writes: On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote: ... As I already said in the above, the answer is no, when you are trying to find who moved the code from the original place. But I'm not trying to find who moved the code, I'm trying to find related commits; hence the name 'git related'. The person who moved the code will be on the list regardless, That is exactly the point I have been trying to raise. Does the person appear in the list when you run blame with -CCC? You ask for the body of the function, and the -C mode of blame sees through the block-of-line movement across file boundaries, and goes straight to the last commit that touched the body of the function in its original file, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Junio C Hamano gits...@pobox.com writes: Felipe Contreras felipe.contre...@gmail.com writes: On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote: ... As I already said in the above, the answer is no, when you are trying to find who moved the code from the original place. But I'm not trying to find who moved the code, I'm trying to find related commits; hence the name 'git related'. The person who moved the code will be on the list regardless, That is exactly the point I have been trying to raise. Does the person appear in the list when you run blame with -CCC? You ask for s/person/commit/; the body of the function, and the -C mode of blame sees through the block-of-line movement across file boundaries, and goes straight to the last commit that touched the body of the function in its original file, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Wed, May 22, 2013 at 5:53 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote: ... As I already said in the above, the answer is no, when you are trying to find who moved the code from the original place. But I'm not trying to find who moved the code, I'm trying to find related commits; hence the name 'git related'. The person who moved the code will be on the list regardless, That is exactly the point I have been trying to raise. Does the person appear in the list when you run blame with -CCC? You ask for the body of the function, and the -C mode of blame sees through the block-of-line movement across file boundaries, and goes straight to the last commit that touched the body of the function in its original file, no? I'm not familiar how the different -C options work, but I'm testing right now and I see the commit that moved a file with both -C and -CCC, but strangely enough, not if it's the previous commit (with both). -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Junio C Hamano gits...@pobox.com writes: The person who moved the code will be on the list regardless, That is exactly the point I have been trying to raise. Does the person appear in the list when you run blame with -CCC? You ask for s/person/commit/; the body of the function, and the -C mode of blame sees through the block-of-line movement across file boundaries, and goes straight to the last commit that touched the body of the function in its original file, no? -- 8 -- cd /var/tmp/ git init blame cd blame cp /src/git/COPYING COPYING git add COPYING git commit -m initial sed -ne 120,140p COPYING EXTRACTING git add EXTRACTING git commit -m second git blame -C -C -C EXTRACTING -- 8 -- will show that all lines from EXTRACTING came from the original revision of COPYING, and we will miss the line-move event. blame -C with a single -C does not look at the files that did not change in the commit that added these lines to EXTRACTING (i.e. COPYING), so the digging stops there. After this, if you do this: -- 8 -- echo COPYING git commit --amend -a --no-edit git blame -C EXTRACTING -- 8 -- then the commit that did add these lines to EXTRACTING touched COPYING, and the origin of these lines are traced to it (this is designed to be useful in a typical refactor by moving code; cut and paste without changing the original people need heavier copy detection with more -C). IIRC, git-gui runs two blames, one without any -C and one with (I do not offhand recall how many -C it uses) to show both. HTH. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
Felipe Contreras felipe.contre...@gmail.com writes: IIRC, git-gui runs two blames, one without any -C and one with (I do not offhand recall how many -C it uses) to show both. 'git blame' is a very expensive operation, perhaps we should add another option so users don't need to run two blames to find this. Yes, you could add a struct origin *suspect_in_the_same_file next to the current struct origin *suspect to the blame_entry in order to represent the origin you would get if you were to stop at a move across files event, and keep digging, when you are operating under -C -C or higher mode. That would make the result just as expensive as a single run of -C -C (or higher mode) [*1*] without having to run an extra git blame without any -C (even though that mode is much cheaper). Representing the result for a terminal with reasonable width might become unwieldy, but for --porcelain output format that should not be a problem. [Footnote] *1* It would make it slightly more expensive than the current code, as it is very likely that you have to split a single blame_entry into many separate pieces. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
This doesn't make any sense: --- cd /tmp rm -rf blame git init blame cd blame cp ~/dev/git/COPYING COPYING git add COPYING git commit -m initial sed -ne 120,140p COPYING EXTRACTING git add EXTRACTING git commit -m second git log --oneline git blame -C EXTRACTING echo COPYING git commit --amend -a --no-edit git log --oneline git blame -C EXTRACTING --- Why would the first instance find the blame in commit #2, and the second one at commit #1? If anything, the second instance should have more trouble finding the original commit. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Wed, May 22, 2013 at 10:23 PM, Felipe Contreras felipe.contre...@gmail.com wrote: This doesn't make any sense: Ah, never mind, it's COPYING the one being modified, not EXTRACTING. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: IIRC, git-gui runs two blames, one without any -C and one with (I do not offhand recall how many -C it uses) to show both. 'git blame' is a very expensive operation, perhaps we should add another option so users don't need to run two blames to find this. Yes, you could add a struct origin *suspect_in_the_same_file next to the current struct origin *suspect to the blame_entry in order to represent the origin you would get if you were to stop at a move across files event, and keep digging, when you are operating under -C -C or higher mode. No, this is what I meant: --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1509,16 +1509,6 @@ static void found_guilty_entry(struct blame_entry *ent) if (ent-guilty) return; ent-guilty = 1; - if (incremental) { - struct origin *suspect = ent-suspect; - - printf(%s %d %d %d\n, - sha1_to_hex(suspect-commit-object.sha1), - ent-s_lno + 1, ent-lno + 1, ent-num_lines); - emit_one_suspect_detail(suspect, 0); - write_filename_info(suspect-path); - maybe_flush_or_die(stdout, stdout); - } } /* @@ -1536,12 +1526,24 @@ static void assign_blame(struct scoreboard *sb, int opt) struct origin *suspect = NULL; /* find one suspect to break down */ - for (ent = sb-ent; !suspect ent; ent = ent-next) - if (!ent-guilty) + for (ent = sb-ent; ent; ent = ent-next) + if (!ent-guilty) { suspect = ent-suspect; + break; + } + if (!suspect) return; /* all done */ + if (incremental !is_null_sha1(suspect-commit-object.sha1)) { + printf(%s %d %d %d\n, + sha1_to_hex(suspect-commit-object.sha1), + ent-s_lno + 1, ent-lno + 1, ent-num_lines); + emit_one_suspect_detail(suspect, 0); + write_filename_info(suspect-path); + maybe_flush_or_die(stdout, stdout); + } + /* * We will use this suspect later in the loop, * so hold onto it in the meantime. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] Add new git-related helper to contrib
On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras felipe.contre...@gmail.com wrote: On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano gits...@pobox.com wrote: Felipe Contreras felipe.contre...@gmail.com writes: IIRC, git-gui runs two blames, one without any -C and one with (I do not offhand recall how many -C it uses) to show both. 'git blame' is a very expensive operation, perhaps we should add another option so users don't need to run two blames to find this. Yes, you could add a struct origin *suspect_in_the_same_file next to the current struct origin *suspect to the blame_entry in order to represent the origin you would get if you were to stop at a move across files event, and keep digging, when you are operating under -C -C or higher mode. No, this is what I meant: But that would not print the right line numbers, so perhaps: --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1500,25 +1500,16 @@ static int emit_one_suspect_detail(struct origin *suspect, int repeat) return 1; } -/* - * The blame_entry is found to be guilty for the range. Mark it - * as such, and show it in incremental output. - */ -static void found_guilty_entry(struct blame_entry *ent) +static void print_guilty_entry(struct blame_entry *ent) { - if (ent-guilty) - return; - ent-guilty = 1; - if (incremental) { - struct origin *suspect = ent-suspect; - - printf(%s %d %d %d\n, - sha1_to_hex(suspect-commit-object.sha1), - ent-s_lno + 1, ent-lno + 1, ent-num_lines); - emit_one_suspect_detail(suspect, 0); - write_filename_info(suspect-path); - maybe_flush_or_die(stdout, stdout); - } + struct origin *suspect = ent-suspect; + + printf(%s %d %d %d\n, + sha1_to_hex(suspect-commit-object.sha1), + ent-s_lno + 1, ent-lno + 1, ent-num_lines); + emit_one_suspect_detail(suspect, 0); + write_filename_info(suspect-path); + maybe_flush_or_die(stdout, stdout); } /* @@ -1531,14 +1522,19 @@ static void assign_blame(struct scoreboard *sb, int opt) struct rev_info *revs = sb-revs; while (1) { - struct blame_entry *ent; + struct blame_entry *ent, tmp_ent; struct commit *commit; struct origin *suspect = NULL; + int found_guilty = 0; /* find one suspect to break down */ - for (ent = sb-ent; !suspect ent; ent = ent-next) - if (!ent-guilty) + for (ent = sb-ent; ent; ent = ent-next) + if (!ent-guilty) { + tmp_ent = *ent; suspect = ent-suspect; + break; + } + if (!suspect) return; /* all done */ @@ -1564,9 +1560,20 @@ static void assign_blame(struct scoreboard *sb, int opt) commit-object.flags |= UNINTERESTING; /* Take responsibility for the remaining entries */ - for (ent = sb-ent; ent; ent = ent-next) - if (same_suspect(ent-suspect, suspect)) - found_guilty_entry(ent); + for (ent = sb-ent; ent; ent = ent-next) { + if (same_suspect(ent-suspect, suspect)) { + if (ent-guilty) + continue; + found_guilty = ent-guilty = 1; + if (incremental) + print_guilty_entry(ent); + } + } + + if (incremental !found_guilty + !is_null_sha1(suspect-commit-object.sha1)) + print_guilty_entry(tmp_ent); + origin_decref(suspect); if (DEBUG) /* sanity */ -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6] Add new git-related helper to contrib
This script find people that might be interested in a patch, by going back through the history for each single hunk modified, and finding people that reviewed, acknowledge, signed, or authored the code the patch is modifying. It does this by running 'git blame' incrementally on each hunk, and then parsing the commit message. After gathering all the relevant people, it groups them to show what exactly was their role when the participated in the development of the relevant commit, and on how many relevant commits they participated. They are only displayed if they pass a minimum threshold of participation. For example: % git related 0001-remote-hg-trivial-cleanups.patch Felipe Contreras felipe.contre...@gmail.com Jeff King p...@peff.net Max Horn m...@quendi.de Junio C Hamano gits...@pobox.com Thus it can be used for 'git send-email' as a cc-cmd. There might be some other related functions to this script, not just to be used as a cc-cmd. Comments-by: Ramkumar Ramachandra artag...@gmail.com Signed-off-by: Felipe Contreras felipe.contre...@gmail.com --- Sames as v5, with a few tiny modifications. I'm tired of sending the whole series multiple times over the years, only to get stuck at the first patch, so I'll send only the first one. contrib/related/git-related | 124 1 file changed, 124 insertions(+) create mode 100755 contrib/related/git-related diff --git a/contrib/related/git-related b/contrib/related/git-related new file mode 100755 index 000..b96dcdd --- /dev/null +++ b/contrib/related/git-related @@ -0,0 +1,124 @@ +#!/usr/bin/env ruby + +# This script finds people that might be interested in a patch +# usage: git related file + +$since = '5-years-ago' +$min_percent = 10 + +def fmt_person(name, email) + '%s %s' % [name, email] +end + +class Commit + + attr_reader :persons + + def initialize(id) +@id = id +@persons = [] + end + + def parse(data) +msg = nil +data.each_line do |line| + if not msg +case line +when /^author ([^]+) (\S+) (.+)$/ + @persons fmt_person($1, $2) +when /^$/ + msg = true +end + else +if line =~ /^(Signed-off|Reviewed|Acked)-by: ([^]+) (\S+?)$/ + @persons fmt_person($2, $3) +end + end +end +@persons.uniq! + end + +end + +class Commits + + def initialize +@items = {} + end + + def size +@items.size + end + + def each(block) +@items.each(block) + end + + def import +return if @items.empty? +File.popen(%w[git cat-file --batch], 'r+') do |p| + p.write(@items.keys.join(\n)) + p.close_write + p.each do |line| +if line =~ /^(\h{40}) commit (\d+)/ + id, len = $1, $2 + data = p.read($2.to_i) + @items[id].parse(data) +end + end +end + end + + def get_blame(source, start, len, from) +return if len == 0 +len ||= 1 +File.popen(['git', 'blame', '--incremental', '-CCC', + '-L', '%u,+%u' % [start, len], + '--since', $since, from + '^', + '--', source]) do |p| + p.each do |line| +if line =~ /^(\h{40})/ + id = $ + @items[id] = Commit.new(id) +end + end +end + end + + def from_patch(file) +from = source = nil +File.open(file) do |f| + f.each do |line| +case line +when /^From (\h+) (.+)$/ + from = $1 +when /^---\s+(\S+)/ + source = $1 != '/dev/null' ? $1[2..-1] : nil +when /^@@ -(\d+)(?:,(\d+))?/ + get_blame(source, $1, $2, from) +end + end +end + end + +end + +exit 1 if ARGV.size != 1 + +commits = Commits.new +commits.from_patch(ARGV[0]) +commits.import + +count_per_person = Hash.new(0) + +commits.each do |id, commit| + commit.persons.each do |person| +count_per_person[person] += 1 + end +end + +count_per_person.each do |person, count| + percent = count.to_f * 100 / commits.size + next if percent $min_percent + puts person +end -- 1.8.3.rc3.286.g3d43083 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html