Re: Fwd: possible Improving diff algoritm
>> Is there a reason why picking among the choices in a sliding window >> must be contents neutral? > > Sorry, you might be getting at something interesting but I do not > understand the question. I have no idea what you mean by "contents > neutral". I was merely asking if an algorithm to pick between the 2+ choices was allowed to look at the contents of the lines. I.e., an algorithm would look at the C comment example and determine that the choice starting containing a full inserted comment is preferable over the one that appears to close one comment and open a new. And the in inserted-function case it would prefer the one where the matching { and } are in correct order. Morten -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: possible Improving diff algoritm
On 12/12/2012 10:53 PM, Junio C Hamano wrote: > Morten Welinder writes: > >> Is there a reason why picking among the choices in a sliding window >> must be contents neutral? > > Sorry, you might be getting at something interesting but I do not > understand the question. I have no idea what you mean by "contents > neutral". > > Picking between these two choices > > /** +/** > + * Default parent + * Default parent > + * + * > + * @var int + * @var int > + * @access protected+ * @access protected > + * @index + * @index > + */ + */ > +protected $defaultParent; +protected $defaultParent; > ++ > +/** /** > > would not affect the correctness of the patch. You may pick > whatever you deem the most desirable, but your answer must be a > correct patch (the definition of "correct" here is "applying that > patch to the preimage produces the intended postimage"). > > And I think if you inserted a block of text B after a context C > where the tail of B matches the tail of C like the above, you can > shift what you treat as "inserted" up and still come up with a > correct patch. I have the feeling that a few crude heuristics would go a long way towards improving diffs like this. For example: * Prefer to have an add/remove block that has balanced begin/end pairs (where begin/end pairs might be opening and closing parentheses, brackets, braces, and angle brackets, "/*" and "*/", and perhaps a couple of other things. For SGML-like text begin and end tags could be matched up. It would be possible to read these begin/end pairs from a filetype-specific table or configuration setting, though this would add complication and would also make it possible that diffs generated by two different people are not identical if their configurations differ. * Prefer to have a block where the first non-blank line of the block and the first non-blank line after the block are indented by the same amount. * Prefer to have a block with trailing (as opposed to leading or embedded) blank lines--the more the better. The beautiful thing is that even if the heuristics sometimes fail, the correctness of the patch (in the sense that you have defined) is not compromised. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: possible Improving diff algoritm
Javier Domingo writes: > I must say it is _quite_ helpfull having the diffs well done (natural > diffs as here named), just because when you want to review a patch on > the fly, this sort of things are annoying. I do not think anybody is arguing that it would not help the human users to shift the hunk boundaries in the case Kevin's original message demonstrated. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: possible Improving diff algoritm
I must say it is _quite_ helpfull having the diffs well done (natural diffs as here named), just because when you want to review a patch on the fly, this sort of things are annoying. I just wanted to say my opinion. No idea on how to fix that, nor why does it happen. Javier Domingo 2012/12/12 Andrew Ardill : > On 13 December 2012 08:53, Junio C Hamano wrote: >> The output being "a correct patch" is not the only thing we need to >> consider, though, as I mentioned in another response to Kevin >> regarding the "consequences". > > The main benefit of picking a more 'natural' diff is a usability one. > I know that when a chunk begins and ends one line after the logical > break point (typically with braces in my experience) mentally parsing > the diff becomes significantly harder. If there was a way to teach git > where it should try and break out a chunk (potentially per filetype?) > this is a good thing for readability, and I think would outweigh any > temporary pain with regards to cached rerere and diff data. > > Regards, > > Andrew Ardill > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: possible Improving diff algoritm
Morten Welinder writes: > Is there a reason why picking among the choices in a sliding window > must be contents neutral? Sorry, you might be getting at something interesting but I do not understand the question. I have no idea what you mean by "contents neutral". Picking between these two choices /** +/** + * Default parent + * Default parent + * + * + * @var int + * @var int + * @access protected+ * @access protected + * @index + * @index + */ + */ +protected $defaultParent; +protected $defaultParent; ++ +/** /** would not affect the correctness of the patch. You may pick whatever you deem the most desirable, but your answer must be a correct patch (the definition of "correct" here is "applying that patch to the preimage produces the intended postimage"). And I think if you inserted a block of text B after a context C where the tail of B matches the tail of C like the above, you can shift what you treat as "inserted" up and still come up with a correct patch. The output being "a correct patch" is not the only thing we need to consider, though, as I mentioned in another response to Kevin regarding the "consequences". -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: possible Improving diff algoritm
> So I think with s/Regularly/About half the time/, your observation > above is correct. > > I think the reason you perceived this as "Regularly" is that you do > not notice nor appreciate it when things go right (half the time), > but you tend to notice and remember only when a wrong side happened > to have been picked (the other half). Is there a reason why picking among the choices in a sliding window must be contents neutral? I see these "illogical" (== stylistically not matching user intent) diffs quite often. C comments (as in the example given) and #ifdef blocks are typical cases. Purely anecdotically, I have seen more trouble applying "illogical" diffs than I would have expected from the corresponding "logical" diffs. Morten -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: possible Improving diff algoritm
Junio C Hamano writes: > Kevin writes: > >> Regularly I notice that the diffs that are provided (through diff, or >> add -p) tend to disconnect changes that belong to each other and >> report lines being changed that are not changed. >> >> An example for this is: >> >> /** >> + * Default parent >> + * >> + * @var int >> + * @access protected >> + * @index >> + */ >> +protected $defaultParent; >> + >> +/** >> >> I understand this is a valid view of what is changed, but not a very >> logical view from the point of the user. >> >> I wondered if there is a way to improve this, or would that have other >> consequences. I forgot to mention consequences. Changing it obviously changes the shape of the diff, hence changes the patch id. Anything that caches output from "git cherry" to match up "identical patches" will need to discard and repopulate its cache. Your "rerere" database will go stale. Also "kup" tool used at k.org allows an uploader to pretend to upload an incremental diff between two known commits by only sending the GPG signature of the diff the uploader generates. The actual diff is generated on the k.org machine locally and deposited next to the GPG signature file, with the expectation that the signture matches the diff. Changing the output from diff between two versions will break the optimization and force the uploader to upload the diff over the wire. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: possible Improving diff algoritm
Yeah, I didn't mention it, but I didn't think it was doing this wrong in a systematic way. I only wondered if there was some kind of heuristic that could improve the cases where it goes wrong, without affecting the cases where it would do it right. I know this is not an easy problem, lest it would already been fixed. On Wed, Dec 12, 2012 at 7:29 PM, Junio C Hamano wrote: > Kevin writes: > >> Regularly I notice that the diffs that are provided (through diff, or >> add -p) tend to disconnect changes that belong to each other and >> report lines being changed that are not changed. >> >> An example for this is: >> >> /** >> + * Default parent >> + * >> + * @var int >> + * @access protected >> + * @index >> + */ >> +protected $defaultParent; >> + >> +/** >> >> I understand this is a valid view of what is changed, but not a very >> logical view from the point of the user. >> >> I wondered if there is a way to improve this, or would that have other >> consequences. > > I think your example shows a case where the end of the pre-context > matches the end of the added text in the hunk, and it appears it may > produce a better result if you shift the hunk up. But I think that > works only half the time. Imagine: > >@@ -K,L +M,N @@ > } > >+void new_function(void) >+{ >+ printf("hello, world.\n"); >+} >+ > void existing_one(void) > { > printf("goodbye, world.\n"); > > Here the end of the pre-context matches the end of the added lines, > but it will produce worse result if you blindly apply the "shift the > hunk up" trick: > > ... what was before the } we saw in the precontext ... >+} >+ >+void new_function(void) >+{ >+ printf("hello, world.\n"); > } > > void existing_one(void) > > So I think with s/Regularly/About half the time/, your observation > above is correct. > > I think the reason you perceived this as "Regularly" is that you do > not notice nor appreciate it when things go right (half the time), > but you tend to notice and remember only when a wrong side happened > to have been picked (the other half). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: possible Improving diff algoritm
On 12-12-12 01:29 PM, Junio C Hamano wrote: > > Here the end of the pre-context matches the end of the added lines, > but it will produce worse result if you blindly apply the "shift the > hunk up" trick: Yeah. I would not think a blind shift would be appropriate. But I wonder if diff can take whitespace hints about when to shift and when not to. It would still never be perfect but might be more accurate than it is now. Can't imagine it would be any worse. b. signature.asc Description: OpenPGP digital signature
Re: Fwd: possible Improving diff algoritm
Kevin writes: > Regularly I notice that the diffs that are provided (through diff, or > add -p) tend to disconnect changes that belong to each other and > report lines being changed that are not changed. > > An example for this is: > > /** > + * Default parent > + * > + * @var int > + * @access protected > + * @index > + */ > +protected $defaultParent; > + > +/** > > I understand this is a valid view of what is changed, but not a very > logical view from the point of the user. > > I wondered if there is a way to improve this, or would that have other > consequences. I think your example shows a case where the end of the pre-context matches the end of the added text in the hunk, and it appears it may produce a better result if you shift the hunk up. But I think that works only half the time. Imagine: @@ -K,L +M,N @@ } +void new_function(void) +{ + printf("hello, world.\n"); +} + void existing_one(void) { printf("goodbye, world.\n"); Here the end of the pre-context matches the end of the added lines, but it will produce worse result if you blindly apply the "shift the hunk up" trick: ... what was before the } we saw in the precontext ... +} + +void new_function(void) +{ + printf("hello, world.\n"); } void existing_one(void) So I think with s/Regularly/About half the time/, your observation above is correct. I think the reason you perceived this as "Regularly" is that you do not notice nor appreciate it when things go right (half the time), but you tend to notice and remember only when a wrong side happened to have been picked (the other half). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html