[ Taking a privately-started discussion with danielsh to the list, in
case others have inspiration/insight about this. Question at hand: I'm
having trouble making diff3 work with prefix/suffix scanning of the
diff-optimizations-bytes branch. Any feedback is highly appreciated
:-). ]
On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf <[email protected]> wrote:
[...snip...]
> In diff3 with prefix enabled, I was seeing failures like this:
>
> EXPECTED:
> line1
> <<<
> line2
> ===
> line2 modified
>>>>
>
> ACTUAL:
> <<<
> line1
> line2
> ===
> line1
> line2 modified
>>>>
>
> so I assumed that when the prefix code gobbled the shared prefix. How
> to fix it from there was purely guesswork on my part (and I haven't
> implemented it).
>
> These failures would go away if I prepended a "line0 left" and "line0
> right" to the file.
Yes, I know. That's indeed because the prefix-scanning code gobbled up
the identical prefix. That problem is also there in diff2.
In diff2, I fixed this by simply pre-pending a piece of "common" diff
to the diff linked list, in the function diff.c#svn_diff__diff.
>From r1037371:
[[[
--- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
(original)
+++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
Sun Nov 21 02:36:07 2010
@@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
svn_diff_t *diff;
svn_diff_t **diff_ref = &diff;
+ if (want_common && (original_start > 1))
+ {
+ /* we have a prefix to skip */
+ (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
+
+ (*diff_ref)->type = svn_diff__type_common;
+ (*diff_ref)->original_start = 0;
+ (*diff_ref)->original_length = original_start - 1;
+ (*diff_ref)->modified_start = 0;
+ (*diff_ref)->modified_length = modified_start - 1;
+ (*diff_ref)->latest_start = 0;
+ (*diff_ref)->latest_length = 0;
+
+ diff_ref = &(*diff_ref)->next;
+ }
+
while (1)
{
if (original_start < lcs->position[0]->offset
]]]
Naively, I wanted to do the same for diff3 (along with some other
tricks, to make the algorithm aware of the prefix_lines), but it
doesn't work. See the patch in attachment.
I mean: it works partly: diff-diff3-test.exe passes with the attached
patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
merge_tests.py (61) don't. I haven't studied the test failures in
detail (I probably should, but I currently can't wrap my head around
those tests).
I'm now pondering another approach, to pre-pend an "lcs" piece to the
lcs linked list, and let the rest of the algorithm do its work as
normal. Inspiration for this came when I was reading the code of
diff3.c#svn_diff__resolve_conflict, where a similar trick is used
around line 121:
[[[
/* If there were matching symbols at the start of
* both sequences, record that fact.
*/
if (common_length > 0)
{
lcs = apr_palloc(subpool, sizeof(*lcs));
lcs->next = NULL;
lcs->position[0] = start_position[0];
lcs->position[1] = start_position[1];
lcs->length = common_length;
lcs_ref = &lcs->next;
}
]]]
Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
I'm already passing the prefix_lines as a new parameter. If that would
work out, I could probably remove the special common-diff-injecting
code in diff.c#svn_diff__diff.
Thoughts?
It will take me some time/experimentation to work out the details, but
I think that should work ...
Cheers,
--
Johan
Index: subversion/libsvn_diff/diff3.c
===================================================================
--- subversion/libsvn_diff/diff3.c (revision 1041582)
+++ subversion/libsvn_diff/diff3.c (working copy)
@@ -251,10 +251,14 @@ svn_diff_diff3(svn_diff_t **diff,
{
svn_diff__tree_t *tree;
svn_diff__position_t *position_list[3];
+ svn_diff_datasource_e datasource[] = {svn_diff_datasource_original,
+ svn_diff_datasource_modified,
+ svn_diff_datasource_latest};
svn_diff__lcs_t *lcs_om;
svn_diff__lcs_t *lcs_ol;
apr_pool_t *subpool;
apr_pool_t *treepool;
+ apr_off_t prefix_lines = 0;
*diff = NULL;
@@ -263,28 +267,30 @@ svn_diff_diff3(svn_diff_t **diff,
svn_diff__tree_create(&tree, treepool);
+ SVN_ERR(vtable->datasources_open(diff_baton, &prefix_lines, datasource, 3));
+
SVN_ERR(svn_diff__get_tokens(&position_list[0],
tree,
diff_baton, vtable,
svn_diff_datasource_original,
- FALSE,
- 0,
+ TRUE,
+ prefix_lines,
subpool));
SVN_ERR(svn_diff__get_tokens(&position_list[1],
tree,
diff_baton, vtable,
svn_diff_datasource_modified,
- FALSE,
- 0,
+ TRUE,
+ prefix_lines,
subpool));
SVN_ERR(svn_diff__get_tokens(&position_list[2],
tree,
diff_baton, vtable,
svn_diff_datasource_latest,
- FALSE,
- 0,
+ TRUE,
+ prefix_lines,
subpool));
/* Get rid of the tokens, we don't need them to calc the diff */
@@ -295,18 +301,18 @@ svn_diff_diff3(svn_diff_t **diff,
svn_pool_destroy(treepool);
/* Get the lcs for original-modified and original-latest */
- lcs_om = svn_diff__lcs(position_list[0], position_list[1], 0,
+ lcs_om = svn_diff__lcs(position_list[0], position_list[1], prefix_lines,
subpool);
- lcs_ol = svn_diff__lcs(position_list[0], position_list[2], 0,
+ lcs_ol = svn_diff__lcs(position_list[0], position_list[2], prefix_lines,
subpool);
/* Produce a merged diff */
{
svn_diff_t **diff_ref = diff;
- apr_off_t original_start = 1;
- apr_off_t modified_start = 1;
- apr_off_t latest_start = 1;
+ apr_off_t original_start = prefix_lines + 1;
+ apr_off_t modified_start = prefix_lines + 1;
+ apr_off_t latest_start = prefix_lines + 1;
apr_off_t original_sync;
apr_off_t modified_sync;
apr_off_t latest_sync;
@@ -317,6 +323,23 @@ svn_diff_diff3(svn_diff_t **diff,
svn_boolean_t is_latest;
svn_diff__position_t sentinel_position[2];
+ if (prefix_lines > 0)
+ {
+ /* Insert a diff_ref for the common prefix. */
+ (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
+
+ (*diff_ref)->type = svn_diff__type_common;
+ (*diff_ref)->original_start = 0;
+ (*diff_ref)->original_length = prefix_lines;
+ (*diff_ref)->modified_start = 0;
+ (*diff_ref)->modified_length = prefix_lines;
+ (*diff_ref)->latest_start = 0;
+ (*diff_ref)->latest_length = prefix_lines;
+ (*diff_ref)->resolved_diff = NULL;
+
+ diff_ref = &(*diff_ref)->next;
+ }
+
/* Point the position lists to the start of the list
* so that common_diff/conflict detection actually is
* able to work.
@@ -330,7 +353,7 @@ svn_diff_diff3(svn_diff_t **diff,
}
else
{
- sentinel_position[0].offset = 1;
+ sentinel_position[0].offset = prefix_lines + 1;
sentinel_position[0].next = NULL;
position_list[1] = &sentinel_position[0];
}
@@ -344,7 +367,7 @@ svn_diff_diff3(svn_diff_t **diff,
}
else
{
- sentinel_position[1].offset = 1;
+ sentinel_position[1].offset = prefix_lines + 1;
sentinel_position[1].next = NULL;
position_list[2] = &sentinel_position[1];
}