Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Morten Welinder
>> Is there a reason why picking among the choices in a sliding window
>> must be contents neutral?
>
> Sorry, you might be getting at something interesting but I do not
> understand the question.  I have no idea what you mean by "contents
> neutral".

I was merely asking if an algorithm to pick between the
2+ choices was allowed to look at the contents of the
lines.

I.e., an algorithm would look at the C comment
example and determine that the choice starting containing
a full inserted comment is preferable over the one that
appears to close one comment and open a new.

And the in inserted-function case it would prefer the one
where the matching { and } are in correct order.

Morten
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Michael Haggerty
On 12/12/2012 10:53 PM, Junio C Hamano wrote:
> Morten Welinder  writes:
> 
>> Is there a reason why picking among the choices in a sliding window
>> must be contents neutral?
> 
> Sorry, you might be getting at something interesting but I do not
> understand the question.  I have no idea what you mean by "contents
> neutral".
> 
> Picking between these two choices
> 
>  /** +/** 
> + * Default parent   + * Default parent   
> + *  + *  
> + * @var int + * @var int 
> + * @access protected+ * @access protected
> + * @index   + * @index   
> + */ + */ 
> +protected $defaultParent;   +protected $defaultParent;   
> ++
> +/**  /** 
> 
> would not affect the correctness of the patch.  You may pick
> whatever you deem the most desirable, but your answer must be a
> correct patch (the definition of "correct" here is "applying that
> patch to the preimage produces the intended postimage").
> 
> And I think if you inserted a block of text B after a context C
> where the tail of B matches the tail of C like the above, you can
> shift what you treat as "inserted" up and still come up with a
> correct patch.

I have the feeling that a few crude heuristics would go a long way
towards improving diffs like this.  For example:

* Prefer to have an add/remove block that has balanced begin/end pairs
(where begin/end pairs might be opening and closing parentheses,
brackets, braces, and angle brackets, "/*" and "*/", and perhaps a
couple of other things.  For SGML-like text begin and end tags could be
matched up.

It would be possible to read these begin/end pairs from a
filetype-specific table or configuration setting, though this would add
complication and would also make it possible that diffs generated by two
different people are not identical if their configurations differ.

* Prefer to have a block where the first non-blank line of the block and
the first non-blank line after the block are indented by the same amount.

* Prefer to have a block with trailing (as opposed to leading or
embedded) blank lines--the more the better.

The beautiful thing is that even if the heuristics sometimes fail, the
correctness of the patch (in the sense that you have defined) is not
compromised.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Junio C Hamano
Javier Domingo  writes:

> I must say it is _quite_ helpfull having the diffs well done (natural
> diffs as here named), just because when you want to review a patch on
> the fly, this sort of things are annoying.

I do not think anybody is arguing that it would not help the human
users to shift the hunk boundaries in the case Kevin's original
message demonstrated.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Javier Domingo
I must say it is _quite_ helpfull having the diffs well done (natural
diffs as here named), just because when you want to review a patch on
the fly, this sort of things are annoying.

I just wanted to say my opinion. No idea on how to fix that, nor why
does it happen.

Javier Domingo


2012/12/12 Andrew Ardill :
> On 13 December 2012 08:53, Junio C Hamano  wrote:
>> The output being "a correct patch" is not the only thing we need to
>> consider, though, as I mentioned in another response to Kevin
>> regarding the "consequences".
>
> The main benefit of picking a more 'natural' diff is a usability one.
> I know that when a chunk begins and ends one line after the logical
> break point (typically with braces in my experience) mentally parsing
> the diff becomes significantly harder. If there was a way to teach git
> where it should try and break out a chunk (potentially per filetype?)
> this is a good thing for readability, and I think would outweigh any
> temporary pain with regards to cached rerere and diff data.
>
> Regards,
>
> Andrew Ardill
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Junio C Hamano
Morten Welinder  writes:

> Is there a reason why picking among the choices in a sliding window
> must be contents neutral?

Sorry, you might be getting at something interesting but I do not
understand the question.  I have no idea what you mean by "contents
neutral".

Picking between these two choices

 /** +/** 
+ * Default parent   + * Default parent   
+ *  + *  
+ * @var int + * @var int 
+ * @access protected+ * @access protected
+ * @index   + * @index   
+ */ + */ 
+protected $defaultParent;   +protected $defaultParent;   
++
+/**  /** 

would not affect the correctness of the patch.  You may pick
whatever you deem the most desirable, but your answer must be a
correct patch (the definition of "correct" here is "applying that
patch to the preimage produces the intended postimage").

And I think if you inserted a block of text B after a context C
where the tail of B matches the tail of C like the above, you can
shift what you treat as "inserted" up and still come up with a
correct patch.

The output being "a correct patch" is not the only thing we need to
consider, though, as I mentioned in another response to Kevin
regarding the "consequences".

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Morten Welinder
> So I think with s/Regularly/About half the time/, your observation
> above is correct.
>
> I think the reason you perceived this as "Regularly" is that you do
> not notice nor appreciate it when things go right (half the time),
> but you tend to notice and remember only when a wrong side happened
> to have been picked (the other half).

Is there a reason why picking among the choices in a sliding window
must be contents neutral?

I see these "illogical" (== stylistically not matching user intent) diffs
quite often.  C comments (as in the example given) and #ifdef blocks
are typical cases.

Purely anecdotically, I have seen more trouble applying "illogical"
diffs than I would have expected from the corresponding "logical" diffs.

Morten
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Junio C Hamano
Junio C Hamano  writes:

> Kevin  writes:
>
>> Regularly I notice that the diffs that are provided (through diff, or
>> add -p) tend to disconnect changes that belong to each other and
>> report lines being changed that are not changed.
>>
>> An example for this is:
>>
>>  /**
>> + * Default parent
>> + *
>> + * @var int
>> + * @access protected
>> + * @index
>> + */
>> +protected $defaultParent;
>> +
>> +/**
>>
>> I understand this is a valid view of what is changed, but not a very
>> logical view from the point of the user.
>>
>> I wondered if there is a way to improve this, or would that have other
>> consequences.

I forgot to mention consequences.  Changing it obviously changes the
shape of the diff, hence changes the patch id.  Anything that caches
output from "git cherry" to match up "identical patches" will need
to discard and repopulate its cache.  Your "rerere" database will go
stale.

Also "kup" tool used at k.org allows an uploader to pretend to
upload an incremental diff between two known commits by only sending
the GPG signature of the diff the uploader generates.  The actual
diff is generated on the k.org machine locally and deposited next to
the GPG signature file, with the expectation that the signture
matches the diff.  Changing the output from diff between two
versions will break the optimization and force the uploader to
upload the diff over the wire.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Kevin
Yeah, I didn't mention it, but I didn't think it was doing this wrong
in a systematic way. I only wondered if there was some kind of
heuristic that could improve the cases where it goes wrong, without
affecting the cases where it would do it right.

I know this is not an easy problem, lest it would already been fixed.

On Wed, Dec 12, 2012 at 7:29 PM, Junio C Hamano  wrote:
> Kevin  writes:
>
>> Regularly I notice that the diffs that are provided (through diff, or
>> add -p) tend to disconnect changes that belong to each other and
>> report lines being changed that are not changed.
>>
>> An example for this is:
>>
>>  /**
>> + * Default parent
>> + *
>> + * @var int
>> + * @access protected
>> + * @index
>> + */
>> +protected $defaultParent;
>> +
>> +/**
>>
>> I understand this is a valid view of what is changed, but not a very
>> logical view from the point of the user.
>>
>> I wondered if there is a way to improve this, or would that have other
>> consequences.
>
> I think your example shows a case where the end of the pre-context
> matches the end of the added text in the hunk, and it appears it may
> produce a better result if you shift the hunk up.  But I think that
> works only half the time.  Imagine:
>
>@@ -K,L +M,N @@
> }
>
>+void new_function(void)
>+{
>+  printf("hello, world.\n");
>+}
>+
> void existing_one(void)
> {
>   printf("goodbye, world.\n");
>
> Here the end of the pre-context matches the end of the added lines,
> but it will produce worse result if you blindly apply the "shift the
> hunk up" trick:
>
>  ... what was before the } we saw in the precontext ...
>+}
>+
>+void new_function(void)
>+{
>+  printf("hello, world.\n");
> }
>
> void existing_one(void)
>
> So I think with s/Regularly/About half the time/, your observation
> above is correct.
>
> I think the reason you perceived this as "Regularly" is that you do
> not notice nor appreciate it when things go right (half the time),
> but you tend to notice and remember only when a wrong side happened
> to have been picked (the other half).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Brian J. Murrell
On 12-12-12 01:29 PM, Junio C Hamano wrote:
> 
> Here the end of the pre-context matches the end of the added lines,
> but it will produce worse result if you blindly apply the "shift the
> hunk up" trick:

Yeah.  I would not think a blind shift would be appropriate.  But I
wonder if diff can take whitespace hints about when to shift and when
not to.

It would still never be perfect but might be more accurate than it is
now.  Can't imagine it would be any worse.

b.




signature.asc
Description: OpenPGP digital signature


Re: Fwd: possible Improving diff algoritm

2012-12-12 Thread Junio C Hamano
Kevin  writes:

> Regularly I notice that the diffs that are provided (through diff, or
> add -p) tend to disconnect changes that belong to each other and
> report lines being changed that are not changed.
>
> An example for this is:
>
>  /**
> + * Default parent
> + *
> + * @var int
> + * @access protected
> + * @index
> + */
> +protected $defaultParent;
> +
> +/**
>
> I understand this is a valid view of what is changed, but not a very
> logical view from the point of the user.
>
> I wondered if there is a way to improve this, or would that have other
> consequences.

I think your example shows a case where the end of the pre-context
matches the end of the added text in the hunk, and it appears it may
produce a better result if you shift the hunk up.  But I think that
works only half the time.  Imagine:

   @@ -K,L +M,N @@
}
   
   +void new_function(void)
   +{
   +  printf("hello, world.\n");
   +}
   +
void existing_one(void)
{
  printf("goodbye, world.\n");

Here the end of the pre-context matches the end of the added lines,
but it will produce worse result if you blindly apply the "shift the
hunk up" trick:

 ... what was before the } we saw in the precontext ...
   +}
   +
   +void new_function(void)
   +{
   +  printf("hello, world.\n");
}

void existing_one(void)

So I think with s/Regularly/About half the time/, your observation
above is correct.

I think the reason you perceived this as "Regularly" is that you do
not notice nor appreciate it when things go right (half the time),
but you tend to notice and remember only when a wrong side happened
to have been picked (the other half).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html