On Wed, Jun 1, 2011 at 10:28 AM, Johan Corveleyn <jcor...@gmail.com> wrote: > On Wed, Jun 1, 2011 at 10:04 AM, Philip Martin > <philip.mar...@wandisco.com> wrote: >> Stefan Fuhrmann <eq...@web.de> writes: >> >>>> Updated log message: >>>> [[[ >>>> Simpler/faster LCS algorithm in libsvn_diff by elimination of idx. >>>> >>>> * subversion/libsvn_diff/lcs.c >>>> (svn_diff__snake): Removed idx parameter (always 0) >>>> (svn_diff__lcs): Removed idx variable (always 0) , let d have either >>>> sign, and moved the origo of fp by d. New version no longer chooses >>>> the shorter file as the "original", and can thus give different LCS's >>>> depending on the order of the input files even when the files have >>>> different lengths. Speed is ~15% faster for core lcs algorithm. >>>> ]]] >>> Committed as r1128857. I took the liberty to >>> weaken the speed-up claim. >> >> This commit fixed the spurious conflict on update in issue 3449, I >> suppose the chosen LCS is "better" in this particular case. Are there >> likely to be cases where the chosen LCS is "worse" so that previously >> clean merges are now conflicts? >> >> See r1130036 for a 3449 regression test. > > I was wondering about the exact same thing yesterday :). I suspected > that the sudden resolution to issue 3449 was related to this. The > chosen LCS is indeed better now, because it follows the same direction > for both comparisons involved in the update/merge (i.e. (Base, Mine) > and (Base, Theirs), whereas previously the LCS could be reversed based > on which side was shorter). See Morten's explanation (and my reply) > about this a couple of mails ago in this thread. > > But I'm also unsure about your second question: are the cases where > there was previously a clean merge, and now a "spurious" conflict. I > suppose one could come up with an example, but I guess it would be > more up to interpretation whether or not there should be (or should > have been) a conflict. I have to think some more about this. Maybe > Morten has an idea? > > Cheers, > -- > Johan >
For merges, the idx patch (r1128857) should be almost exclusively beneficial. As mentioned before, it's not a question of whether the individual LCS's are better or worse; the important thing is that the two primary LCS's for the merge (Base vs. Theirs and Base vs. Mine) are done in a consistent manner. If a section of a file is identical in Mine and Theirs, and it is surrounded by correctly matched unchanged code, then it is guaranteed to survive the merge without conflict now. If, however, the changes are slightly different in Mine and Theirs (but without any obvious conflict), then it is possible that it would merge without conflict before the patch but not after. For (I believe) most of these cases, it is necessary that the lengths are such that the old version switches the order for BOTH diffs, i.e. Base is longer than both Mine and Theirs. Thus the old version will also (by "chance") perform the diffs in a consistent manner, but (again by "chance") a manner that happens to avoid conflict for those particular changes. Here's a carefully crafted example where this is not required: abcde (Base) acd (Theirs) acbd (Mine) Here, there is an ambiguity of whether to match the 'b's or the 'c's between Base and Mine, which determines whether there's a conflict with Theirs. Note that if you remove 'e' from Base, then the old version will also give the same conflict. As Johan said, it's not clear whether this example should conflict or not. For blame, there is less, if any, benefit from the patch. If a revision fully reverts the previous revision, then the old version would tend to give the maximal preserved lines (of course, one can also argue whether or not this is "correct"), whereas the patched version has a chance of matching in a different manner when going back (since it doesn't know it's going back). Of course, the old version only guarantees this benefit if you revert the whole file; if you revert only a part, the lengths might be such that you don't get the same match. One possibility that I think would maximize the number of preserved lines (total age of lines) would be to alternate the order, i.e. each revision is diff'ed the same way with both its neighbors. The difference would probably be minimal, though, and as stated above, it's not clear whether increasing the total age of lines would truly be better. All around, I think the patched version is about as good as we're likely to get on these issues without major changes. :-) Morten