"Philip Oakley" <philipoak...@iee.org> writes:

>> To get ground truth of authorship for each line, I start with
>> git-blame.
>> But later I find this is not sufficient because the last commit may
>> only
>> add comments or may only change a small part of the line, so that I
>> shouldn't attribute the line of code to the last author.
>
> I would suggest there is:
> - White space adjustment
> - Comment or documentation (assumes you can parse the 'code' to decide
> that it isn't executable code)
> - word changes within expressions
> - complete replacement of line (whole statement?)

You are being generous by listing easier cases ;-) I'd add a couple
more that are more problematic if your approach does not consider
semantics.

 - A function gained a new parameter, to which pretty much everybody
   passes the same default value.

        -void fn(int a, int b, int c)
        +void fn(int a, int b, int c, int d)
         {
        +       if (d) {
        +               ...
        +               return;
        +       }
                ...
         }

         void frotz(void)
         {
                ...
        -       fn(a, b, c);
        +       fn(a, b, c, 0);
                ...
        -       fn(a, b, d);
        +       fn(a, b, d, 1);
                ...

   The same commit that changed the above call sites must have
   changed the definition of function "fn" and defined what the new
   fourth parameter means.  It is likely that, when the default
   value most everybody passes (perhaps "0") is given, "fn" does
   what it used to do, and a different value may trigger a new
   behaviour of "fn".  It could be argued that the former call site
   should not be blamed on this commit, while the latter call site
   should.

 - A variable was renamed, and the meaning of a line suddenly
   changed, even though the text of that line did not change at all.

         static int foo;
         ...
        -int xyzzy(int foo)
        +int xyzzy(int bar)
         {
                ... some complex computation that
                ... involves foo and bar, resulting in
                ... updating of foo comes here ...
                return foo * 2;
         }

   Whom should we blame for the behaviour of (i.e. the value
   returned from) the function?  The "return foo * 2" never changed
   with this patch, but the patch _is_ responsible for changing the
   behaviour.

   As the OP is interested in tracking the origin of the _binary_,
   this case is even more interesting, as the generated machine code
   to compute the foo * 2 would likely be very different before
   and after the patch.
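
To make the two cases above concrete, here is a minimal sketch in C.
Everything in it is made up for illustration: the bodies of
fn_before()/fn_after(), the file-scope "bar", and the "foo += bar"
line standing in for the elided "complex computation" are all
assumptions, not code from any real project.

```c
/* Case 1: hypothetical bodies for fn() before and after the patch. */
int fn_before(int a, int b, int c)
{
	return a + b + c;
}

int fn_after(int a, int b, int c, int d)
{
	if (d)
		return 0;	/* new behaviour, only when d is non-zero */
	return a + b + c;	/* d == 0: same result as fn_before() */
}

/*
 * Case 2: the two lines in the body are textually identical in both
 * versions; only the parameter name differs.  The file-scope "bar"
 * is an assumption added so that both versions compile.
 */
static int foo = 10;
static int bar = 1;

int xyzzy_before(int foo)
{
	foo += bar;		/* parameter foo, file-scope bar */
	return foo * 2;
}

int xyzzy_after(int bar)
{
	foo += bar;		/* file-scope foo, parameter bar */
	return foo * 2;
}
```

With these made-up bodies, a caller passing 0 as the fourth argument
gets the same result from fn_after() as from fn_before(), while a
caller passing 1 does not.  And xyzzy_before(5) and xyzzy_after(5)
return different values even though "return foo * 2" is the same
text in both, because "foo" names the parameter in one and the
file-scope variable in the other.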
