[ Taking a privately-started discussion with danielsh to the list, in
case others have inspiration/insight about this. Question at hand: I'm
having trouble making diff3 work with prefix/suffix scanning of the
diff-optimizations-bytes branch. Any feedback is highly appreciated
:-). ]

On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
[...snip...]
> In diff3 with prefix enabled, I was seeing failures like this:
>
> EXPECTED:
> line1
> <<<
> line2
> ===
> line2 modified
>>>>
>
> ACTUAL:
> <<<
> line1
> line2
> ===
> line1
> line2 modified
>>>>
>
> so I assumed that when the prefix code gobbled the shared prefix.  How
> to fix it from there was purely guesswork on my part (and I haven't
> implemented it).
>
> These failures would go away if I prepended a "line0 left" and "line0
> right" to the file.

Yes, I know. That's indeed because the prefix-scanning code gobbled up
the identical prefix. That problem is also there in diff2.

In diff2, I fixed this by simply pre-pending a piece of "common" diff
to the diff linked list, in the function diff.c#svn_diff__diff.

>From r1037371:
[[[
--- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
(original)
+++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
Sun Nov 21 02:36:07 2010
@@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
  svn_diff_t *diff;
  svn_diff_t **diff_ref = &diff;

+  if (want_common && (original_start > 1))
+    {
+      /* we have a prefix to skip */
+      (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
+
+      (*diff_ref)->type = svn_diff__type_common;
+      (*diff_ref)->original_start = 0;
+      (*diff_ref)->original_length = original_start - 1;
+      (*diff_ref)->modified_start = 0;
+      (*diff_ref)->modified_length = modified_start - 1;
+      (*diff_ref)->latest_start = 0;
+      (*diff_ref)->latest_length = 0;
+
+      diff_ref = &(*diff_ref)->next;
+    }
+
  while (1)
    {
      if (original_start < lcs->position[0]->offset
]]]

Naively, I wanted to do the same for diff3 (along with some other
tricks, to make the algorithm aware of the prefix_lines), but it
doesn't work. See the patch in attachment.

I mean: it works partly: diff-diff3-test.exe passes with the attached
patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
merge_tests.py (61) don't. I haven't studied the test failures in
detail (I probably should, but I currently can't wrap my head around
those tests).

I'm now pondering another approach, to pre-pend an "lcs" piece to the
lcs linked list, and let the rest of the algorithm do its work as
normal. Inspiration for this came when I was reading the code of
diff3.c#svn_diff__resolve_conflict, where a similar trick is used
around line 121:
[[[
    /* If there were matching symbols at the start of
     * both sequences, record that fact.
     */
    if (common_length > 0)
      {
        lcs = apr_palloc(subpool, sizeof(*lcs));
        lcs->next = NULL;
        lcs->position[0] = start_position[0];
        lcs->position[1] = start_position[1];
        lcs->length = common_length;

        lcs_ref = &lcs->next;
      }
]]]

Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
I'm already passing the prefix_lines as a new parameter. If that would
work out, I could probably remove the special common-diff-injecting
code in diff.c#svn_diff__diff.

Thoughts?

It will take me some time/experimentation to work out the details, but
I think that should work ...

Cheers,
-- 
Johan
Index: subversion/libsvn_diff/diff3.c
===================================================================
--- subversion/libsvn_diff/diff3.c      (revision 1041582)
+++ subversion/libsvn_diff/diff3.c      (working copy)
@@ -251,10 +251,14 @@ svn_diff_diff3(svn_diff_t **diff,
 {
   svn_diff__tree_t *tree;
   svn_diff__position_t *position_list[3];
+  svn_diff_datasource_e datasource[] = {svn_diff_datasource_original,
+                                        svn_diff_datasource_modified,
+                                        svn_diff_datasource_latest};
   svn_diff__lcs_t *lcs_om;
   svn_diff__lcs_t *lcs_ol;
   apr_pool_t *subpool;
   apr_pool_t *treepool;
+  apr_off_t prefix_lines = 0;
 
   *diff = NULL;
 
@@ -263,28 +267,30 @@ svn_diff_diff3(svn_diff_t **diff,
 
   svn_diff__tree_create(&tree, treepool);
 
+  SVN_ERR(vtable->datasources_open(diff_baton, &prefix_lines, datasource, 3));
+
   SVN_ERR(svn_diff__get_tokens(&position_list[0],
                                tree,
                                diff_baton, vtable,
                                svn_diff_datasource_original,
-                               FALSE,
-                               0,
+                               TRUE,
+                               prefix_lines,
                                subpool));
 
   SVN_ERR(svn_diff__get_tokens(&position_list[1],
                                tree,
                                diff_baton, vtable,
                                svn_diff_datasource_modified,
-                               FALSE,
-                               0,
+                               TRUE,
+                               prefix_lines,
                                subpool));
 
   SVN_ERR(svn_diff__get_tokens(&position_list[2],
                                tree,
                                diff_baton, vtable,
                                svn_diff_datasource_latest,
-                               FALSE,
-                               0,
+                               TRUE,
+                               prefix_lines,
                                subpool));
 
   /* Get rid of the tokens, we don't need them to calc the diff */
@@ -295,18 +301,18 @@ svn_diff_diff3(svn_diff_t **diff,
   svn_pool_destroy(treepool);
 
   /* Get the lcs for original-modified and original-latest */
-  lcs_om = svn_diff__lcs(position_list[0], position_list[1], 0,
+  lcs_om = svn_diff__lcs(position_list[0], position_list[1], prefix_lines,
                          subpool);
-  lcs_ol = svn_diff__lcs(position_list[0], position_list[2], 0,
+  lcs_ol = svn_diff__lcs(position_list[0], position_list[2], prefix_lines,
                          subpool);
 
   /* Produce a merged diff */
   {
     svn_diff_t **diff_ref = diff;
 
-    apr_off_t original_start = 1;
-    apr_off_t modified_start = 1;
-    apr_off_t latest_start = 1;
+    apr_off_t original_start = prefix_lines + 1;
+    apr_off_t modified_start = prefix_lines + 1;
+    apr_off_t latest_start = prefix_lines + 1;
     apr_off_t original_sync;
     apr_off_t modified_sync;
     apr_off_t latest_sync;
@@ -317,6 +323,23 @@ svn_diff_diff3(svn_diff_t **diff,
     svn_boolean_t is_latest;
     svn_diff__position_t sentinel_position[2];
 
+    if (prefix_lines > 0)
+      {
+        /* Insert a diff_ref for the common prefix. */
+        (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
+
+        (*diff_ref)->type = svn_diff__type_common;
+        (*diff_ref)->original_start = 0;
+        (*diff_ref)->original_length = prefix_lines;
+        (*diff_ref)->modified_start = 0;
+        (*diff_ref)->modified_length = prefix_lines;
+        (*diff_ref)->latest_start = 0;
+        (*diff_ref)->latest_length = prefix_lines;
+        (*diff_ref)->resolved_diff = NULL;
+
+        diff_ref = &(*diff_ref)->next;        
+      }
+
     /* Point the position lists to the start of the list
      * so that common_diff/conflict detection actually is
      * able to work.
@@ -330,7 +353,7 @@ svn_diff_diff3(svn_diff_t **diff,
       }
     else
       {
-        sentinel_position[0].offset = 1;
+        sentinel_position[0].offset = prefix_lines + 1;
         sentinel_position[0].next = NULL;
         position_list[1] = &sentinel_position[0];
       }
@@ -344,7 +367,7 @@ svn_diff_diff3(svn_diff_t **diff,
       }
     else
       {
-        sentinel_position[1].offset = 1;
+        sentinel_position[1].offset = prefix_lines + 1;
         sentinel_position[1].next = NULL;
         position_list[2] = &sentinel_position[1];
       }

Reply via email to