Re: diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning

2011-01-05 Thread Johan Corveleyn
On Wed, Jan 5, 2011 at 12:11 PM, Julian Foad  wrote:
> On Sat, 2011-01-01, Johan Corveleyn wrote:
>> [ Taking a privately-started discussion with danielsh to the list, in
>> case others have inspiration/insight about this. Question at hand: I'm
>> having trouble making diff3 work with prefix/suffix scanning of the
>> diff-optimizations-bytes branch. Any feedback is highly appreciated
>> :-). ]
>>
>> On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf  
>> wrote:
>> [...snip...]
>> > In diff3 with prefix enabled, I was seeing failures like this:
>> >
>> > EXPECTED:
>> > line1
>> > <<<
>> > line2
>> > ===
>> > line2 modified
>> 
>> >
>> > ACTUAL:
>> > <<<
>> > line1
>> > line2
>> > ===
>> > line1
>> > line2 modified
>> 
>> >
>> > so I assumed that when the prefix code gobbled the shared prefix.  How
>> > to fix it from there was purely guesswork on my part (and I haven't
>> > implemented it).
>> >
>> > These failures would go away if I prepended a "line0 left" and "line0
>> > right" to the file.
>>
>> Yes, I know. That's indeed because the prefix-scanning code gobbled up
>> the identical prefix. That problem is also there in diff2.
>>
>> In diff2, I fixed this by simply pre-pending a piece of "common" diff
>> to the diff linked list, in the function diff.c#svn_diff__diff.
>>
>> >From r1037371:
>> [[[
>> --- 
>> subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
>> (original)
>> +++ 
>> subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
>> Sun Nov 21 02:36:07 2010
>> @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
>>   svn_diff_t *diff;
>>   svn_diff_t **diff_ref = &diff;
>>
>> +  if (want_common && (original_start > 1))
>> +    {
>> +      /* we have a prefix to skip */
>> +      (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
>> +
>> +      (*diff_ref)->type = svn_diff__type_common;
>> +      (*diff_ref)->original_start = 0;
>> +      (*diff_ref)->original_length = original_start - 1;
>> +      (*diff_ref)->modified_start = 0;
>> +      (*diff_ref)->modified_length = modified_start - 1;
>> +      (*diff_ref)->latest_start = 0;
>> +      (*diff_ref)->latest_length = 0;
>> +
>> +      diff_ref = &(*diff_ref)->next;
>> +    }
>
> Hi Johan.
>
> I know this is work in progress but that whole "if" block of code is a
> good example of where a comment above the block would be *really* useful
> to tell the reader what it's for.  :-)

Ok, you're absolutely right. If I can't throw this block away (as
suggested below), I'll add some more comments.

> (Also, all that dereferencing is unnecessary in that particular block,
> but I looked at the rest of the function and saw that that style is used
> in two other blocks where it is not unnecessary.  Personally I would
> find the following style more readable throughout the function:
>
>       {
>         svn_diff_t *new_diff = apr_palloc(pool, sizeof(*new_diff));
>         new_diff->type = ...
>         ...
>
>         *diff_ref = new_diff;
>         diff_ref = &new_diff->next;
>       }
> )

Yep, that's definitely clearer. I just copy-pasted from the other
similar blocks in that function. But you're right.

Now, this style is also used in other similar places in diff3.c and
diff4.c. So if it's decided that it's best to change this to make it
more readable, I guess it should be changed in all those places
(preferably on trunk, it's not specific to the
diff-optimizations-bytes branch).

>>   while (1)
>>     {
>>       if (original_start < lcs->position[0]->offset
>> ]]]
>>
>> Naively, I wanted to do the same for diff3 (along with some other
>> tricks, to make the algorithm aware of the prefix_lines), but it
>> doesn't work. See the patch in attachment.
>>
>> I mean: it works partly: diff-diff3-test.exe passes with the attached
>> patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
>> merge_tests.py (61) don't. I haven't studied the test failures in
>> detail (I probably should, but I currently can't wrap my head around
>> those tests).
>>
>> I'm now pondering another approach, to pre-pend an "lcs" piece to the
>> lcs linked list, and let the rest of the algorithm do its work as
>> normal. Inspiration for this came when I was reading the code of
>> diff3.c#svn_diff__resolve_conflict, where a similar trick is used
>> around line 121:
>> [[[
>>     /* If there were matching symbols at the start of
>>      * both sequences, record that fact.
>>      */
>>     if (common_length > 0)
>>       {
>>         lcs = apr_palloc(subpool, sizeof(*lcs));
>>         lcs->next = NULL;
>>         lcs->position[0] = start_position[0];
>>         lcs->position[1] = start_position[1];
>>         lcs->length = common_length;
>>
>>         lcs_ref = &lcs->next;
>>       }
>> ]]]
>>
>> Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
>> I'm already passing the prefix_lines as a new parameter. If that would
>> work out, I could probably remove the special common-diff-injecting
>> code in diff.c#svn_diff_

Re: diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning

2011-01-05 Thread Julian Foad
On Sat, 2011-01-01, Johan Corveleyn wrote:
> [ Taking a privately-started discussion with danielsh to the list, in
> case others have inspiration/insight about this. Question at hand: I'm
> having trouble making diff3 work with prefix/suffix scanning of the
> diff-optimizations-bytes branch. Any feedback is highly appreciated
> :-). ]
> 
> On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf  
> wrote:
> [...snip...]
> > In diff3 with prefix enabled, I was seeing failures like this:
> >
> > EXPECTED:
> > line1
> > <<<
> > line2
> > ===
> > line2 modified
> 
> >
> > ACTUAL:
> > <<<
> > line1
> > line2
> > ===
> > line1
> > line2 modified
> 
> >
> > so I assumed that when the prefix code gobbled the shared prefix.  How
> > to fix it from there was purely guesswork on my part (and I haven't
> > implemented it).
> >
> > These failures would go away if I prepended a "line0 left" and "line0
> > right" to the file.
> 
> Yes, I know. That's indeed because the prefix-scanning code gobbled up
> the identical prefix. That problem is also there in diff2.
> 
> In diff2, I fixed this by simply pre-pending a piece of "common" diff
> to the diff linked list, in the function diff.c#svn_diff__diff.
> 
> >From r1037371:
> [[[
> --- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
> (original)
> +++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
> Sun Nov 21 02:36:07 2010
> @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
>   svn_diff_t *diff;
>   svn_diff_t **diff_ref = &diff;
> 
> +  if (want_common && (original_start > 1))
> +{
> +  /* we have a prefix to skip */
> +  (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
> +
> +  (*diff_ref)->type = svn_diff__type_common;
> +  (*diff_ref)->original_start = 0;
> +  (*diff_ref)->original_length = original_start - 1;
> +  (*diff_ref)->modified_start = 0;
> +  (*diff_ref)->modified_length = modified_start - 1;
> +  (*diff_ref)->latest_start = 0;
> +  (*diff_ref)->latest_length = 0;
> +
> +  diff_ref = &(*diff_ref)->next;
> +}

Hi Johan.

I know this is work in progress but that whole "if" block of code is a
good example of where a comment above the block would be *really* useful
to tell the reader what it's for.  :-)

(Also, all that dereferencing is unnecessary in that particular block,
but I looked at the rest of the function and saw that that style is used
in two other blocks where it is not unnecessary.  Personally I would
find the following style more readable throughout the function:

   {
 svn_diff_t *new_diff = apr_palloc(pool, sizeof(*new_diff));
 new_diff->type = ...
 ...

 *diff_ref = new_diff;
 diff_ref = &new_diff->next;
   }
)

>   while (1)
> {
>   if (original_start < lcs->position[0]->offset
> ]]]
> 
> Naively, I wanted to do the same for diff3 (along with some other
> tricks, to make the algorithm aware of the prefix_lines), but it
> doesn't work. See the patch in attachment.
> 
> I mean: it works partly: diff-diff3-test.exe passes with the attached
> patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
> merge_tests.py (61) don't. I haven't studied the test failures in
> detail (I probably should, but I currently can't wrap my head around
> those tests).
> 
> I'm now pondering another approach, to pre-pend an "lcs" piece to the
> lcs linked list, and let the rest of the algorithm do its work as
> normal. Inspiration for this came when I was reading the code of
> diff3.c#svn_diff__resolve_conflict, where a similar trick is used
> around line 121:
> [[[
> /* If there were matching symbols at the start of
>  * both sequences, record that fact.
>  */
> if (common_length > 0)
>   {
> lcs = apr_palloc(subpool, sizeof(*lcs));
> lcs->next = NULL;
> lcs->position[0] = start_position[0];
> lcs->position[1] = start_position[1];
> lcs->length = common_length;
> 
> lcs_ref = &lcs->next;
>   }
> ]]]
> 
> Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
> I'm already passing the prefix_lines as a new parameter. If that would
> work out, I could probably remove the special common-diff-injecting
> code in diff.c#svn_diff__diff.
> 
> Thoughts?

Just from reading what you say here, that last solution sounds much
better.

- Julian


> It will take me some time/experimentation to work out the details, but
> I think that should work ...
> 
> Cheers,




diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning

2011-01-01 Thread Johan Corveleyn
[ Taking a privately-started discussion with danielsh to the list, in
case others have inspiration/insight about this. Question at hand: I'm
having trouble making diff3 work with prefix/suffix scanning of the
diff-optimizations-bytes branch. Any feedback is highly appreciated
:-). ]

On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf  wrote:
[...snip...]
> In diff3 with prefix enabled, I was seeing failures like this:
>
> EXPECTED:
> line1
> <<<
> line2
> ===
> line2 modified

>
> ACTUAL:
> <<<
> line1
> line2
> ===
> line1
> line2 modified

>
> so I assumed that when the prefix code gobbled the shared prefix.  How
> to fix it from there was purely guesswork on my part (and I haven't
> implemented it).
>
> These failures would go away if I prepended a "line0 left" and "line0
> right" to the file.

Yes, I know. That's indeed because the prefix-scanning code gobbled up
the identical prefix. That problem is also there in diff2.

In diff2, I fixed this by simply pre-pending a piece of "common" diff
to the diff linked list, in the function diff.c#svn_diff__diff.

>From r1037371:
[[[
--- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
(original)
+++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
Sun Nov 21 02:36:07 2010
@@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
  svn_diff_t *diff;
  svn_diff_t **diff_ref = &diff;

+  if (want_common && (original_start > 1))
+{
+  /* we have a prefix to skip */
+  (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
+
+  (*diff_ref)->type = svn_diff__type_common;
+  (*diff_ref)->original_start = 0;
+  (*diff_ref)->original_length = original_start - 1;
+  (*diff_ref)->modified_start = 0;
+  (*diff_ref)->modified_length = modified_start - 1;
+  (*diff_ref)->latest_start = 0;
+  (*diff_ref)->latest_length = 0;
+
+  diff_ref = &(*diff_ref)->next;
+}
+
  while (1)
{
  if (original_start < lcs->position[0]->offset
]]]

Naively, I wanted to do the same for diff3 (along with some other
tricks, to make the algorithm aware of the prefix_lines), but it
doesn't work. See the patch in attachment.

I mean: it works partly: diff-diff3-test.exe passes with the attached
patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
merge_tests.py (61) don't. I haven't studied the test failures in
detail (I probably should, but I currently can't wrap my head around
those tests).

I'm now pondering another approach, to pre-pend an "lcs" piece to the
lcs linked list, and let the rest of the algorithm do its work as
normal. Inspiration for this came when I was reading the code of
diff3.c#svn_diff__resolve_conflict, where a similar trick is used
around line 121:
[[[
/* If there were matching symbols at the start of
 * both sequences, record that fact.
 */
if (common_length > 0)
  {
lcs = apr_palloc(subpool, sizeof(*lcs));
lcs->next = NULL;
lcs->position[0] = start_position[0];
lcs->position[1] = start_position[1];
lcs->length = common_length;

lcs_ref = &lcs->next;
  }
]]]

Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
I'm already passing the prefix_lines as a new parameter. If that would
work out, I could probably remove the special common-diff-injecting
code in diff.c#svn_diff__diff.

Thoughts?

It will take me some time/experimentation to work out the details, but
I think that should work ...

Cheers,
-- 
Johan
Index: subversion/libsvn_diff/diff3.c
===
--- subversion/libsvn_diff/diff3.c  (revision 1041582)
+++ subversion/libsvn_diff/diff3.c  (working copy)
@@ -251,10 +251,14 @@ svn_diff_diff3(svn_diff_t **diff,
 {
   svn_diff__tree_t *tree;
   svn_diff__position_t *position_list[3];
+  svn_diff_datasource_e datasource[] = {svn_diff_datasource_original,
+svn_diff_datasource_modified,
+svn_diff_datasource_latest};
   svn_diff__lcs_t *lcs_om;
   svn_diff__lcs_t *lcs_ol;
   apr_pool_t *subpool;
   apr_pool_t *treepool;
+  apr_off_t prefix_lines = 0;
 
   *diff = NULL;
 
@@ -263,28 +267,30 @@ svn_diff_diff3(svn_diff_t **diff,
 
   svn_diff__tree_create(&tree, treepool);
 
+  SVN_ERR(vtable->datasources_open(diff_baton, &prefix_lines, datasource, 3));
+
   SVN_ERR(svn_diff__get_tokens(&position_list[0],
tree,
diff_baton, vtable,
svn_diff_datasource_original,
-   FALSE,
-   0,
+   TRUE,
+   prefix_lines,
subpool));
 
   SVN_ERR(svn_diff__get_tokens(&position_list[1],
tree,
diff_baton, vtable,
svn_diff_data