Re: diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning
On Wed, Jan 5, 2011 at 12:11 PM, Julian Foad wrote: > On Sat, 2011-01-01, Johan Corveleyn wrote: >> [ Taking a privately-started discussion with danielsh to the list, in >> case others have inspiration/insight about this. Question at hand: I'm >> having trouble making diff3 work with prefix/suffix scanning of the >> diff-optimizations-bytes branch. Any feedback is highly appreciated >> :-). ] >> >> On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf >> wrote: >> [...snip...] >> > In diff3 with prefix enabled, I was seeing failures like this: >> > >> > EXPECTED: >> > line1 >> > <<< >> > line2 >> > === >> > line2 modified >> >> > >> > ACTUAL: >> > <<< >> > line1 >> > line2 >> > === >> > line1 >> > line2 modified >> >> > >> > so I assumed that when the prefix code gobbled the shared prefix. How >> > to fix it from there was purely guesswork on my part (and I haven't >> > implemented it). >> > >> > These failures would go away if I prepended a "line0 left" and "line0 >> > right" to the file. >> >> Yes, I know. That's indeed because the prefix-scanning code gobbled up >> the identical prefix. That problem is also there in diff2. >> >> In diff2, I fixed this by simply pre-pending a piece of "common" diff >> to the diff linked list, in the function diff.c#svn_diff__diff. >> >> >From r1037371: >> [[[ >> --- >> subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c >> (original) >> +++ >> subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c >> Sun Nov 21 02:36:07 2010 >> @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs, >> svn_diff_t *diff; >> svn_diff_t **diff_ref = &diff; >> >> + if (want_common && (original_start > 1)) >> + { >> + /* we have a prefix to skip */ >> + (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref)); >> + >> + (*diff_ref)->type = svn_diff__type_common; >> + (*diff_ref)->original_start = 0; >> + (*diff_ref)->original_length = original_start - 1; >> + (*diff_ref)->modified_start = 0; >> + (*diff_ref)->modified_length = modified_start - 1; >> + (*diff_ref)->latest_start = 0; >> + (*diff_ref)->latest_length = 0; >> + >> + diff_ref = &(*diff_ref)->next; >> + } > > Hi Johan. > > I know this is work in progress but that whole "if" block of code is a > good example of where a comment above the block would be *really* useful > to tell the reader what it's for. :-) Ok, you're absolutely right. If I can't throw this block away (as suggested below), I'll add some more comments. > (Also, all that dereferencing is unnecessary in that particular block, > but I looked at the rest of the function and saw that that style is used > in two other blocks where it is not unnecessary. Personally I would > find the following style more readable throughout the function: > > { > svn_diff_t *new_diff = apr_palloc(pool, sizeof(*new_diff)); > new_diff->type = ... > ... > > *diff_ref = new_diff; > diff_ref = &new_diff->next; > } > ) Yep, that's definitely clearer. I just copy-pasted from the other similar blocks in that function. But you're right. Now, this style is also used in other similar places in diff3.c and diff4.c. So if it's decided that it's best to change this to make it more readable, I guess it should be changed in all those places (preferably on trunk, it's not specific to the diff-optimizations-bytes branch). >> while (1) >> { >> if (original_start < lcs->position[0]->offset >> ]]] >> >> Naively, I wanted to do the same for diff3 (along with some other >> tricks, to make the algorithm aware of the prefix_lines), but it >> doesn't work. See the patch in attachment. >> >> I mean: it works partly: diff-diff3-test.exe passes with the attached >> patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and >> merge_tests.py (61) don't. I haven't studied the test failures in >> detail (I probably should, but I currently can't wrap my head around >> those tests). >> >> I'm now pondering another approach, to pre-pend an "lcs" piece to the >> lcs linked list, and let the rest of the algorithm do its work as >> normal. Inspiration for this came when I was reading the code of >> diff3.c#svn_diff__resolve_conflict, where a similar trick is used >> around line 121: >> [[[ >> /* If there were matching symbols at the start of >> * both sequences, record that fact. >> */ >> if (common_length > 0) >> { >> lcs = apr_palloc(subpool, sizeof(*lcs)); >> lcs->next = NULL; >> lcs->position[0] = start_position[0]; >> lcs->position[1] = start_position[1]; >> lcs->length = common_length; >> >> lcs_ref = &lcs->next; >> } >> ]]] >> >> Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where >> I'm already passing the prefix_lines as a new parameter. If that would >> work out, I could probably remove the special common-diff-injecting >> code in diff.c#svn_diff_
Re: diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning
On Sat, 2011-01-01, Johan Corveleyn wrote: > [ Taking a privately-started discussion with danielsh to the list, in > case others have inspiration/insight about this. Question at hand: I'm > having trouble making diff3 work with prefix/suffix scanning of the > diff-optimizations-bytes branch. Any feedback is highly appreciated > :-). ] > > On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf > wrote: > [...snip...] > > In diff3 with prefix enabled, I was seeing failures like this: > > > > EXPECTED: > > line1 > > <<< > > line2 > > === > > line2 modified > > > > > ACTUAL: > > <<< > > line1 > > line2 > > === > > line1 > > line2 modified > > > > > so I assumed that when the prefix code gobbled the shared prefix. How > > to fix it from there was purely guesswork on my part (and I haven't > > implemented it). > > > > These failures would go away if I prepended a "line0 left" and "line0 > > right" to the file. > > Yes, I know. That's indeed because the prefix-scanning code gobbled up > the identical prefix. That problem is also there in diff2. > > In diff2, I fixed this by simply pre-pending a piece of "common" diff > to the diff linked list, in the function diff.c#svn_diff__diff. > > >From r1037371: > [[[ > --- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c > (original) > +++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c > Sun Nov 21 02:36:07 2010 > @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs, > svn_diff_t *diff; > svn_diff_t **diff_ref = &diff; > > + if (want_common && (original_start > 1)) > +{ > + /* we have a prefix to skip */ > + (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref)); > + > + (*diff_ref)->type = svn_diff__type_common; > + (*diff_ref)->original_start = 0; > + (*diff_ref)->original_length = original_start - 1; > + (*diff_ref)->modified_start = 0; > + (*diff_ref)->modified_length = modified_start - 1; > + (*diff_ref)->latest_start = 0; > + (*diff_ref)->latest_length = 0; > + > + diff_ref = &(*diff_ref)->next; > +} Hi Johan. I know this is work in progress but that whole "if" block of code is a good example of where a comment above the block would be *really* useful to tell the reader what it's for. :-) (Also, all that dereferencing is unnecessary in that particular block, but I looked at the rest of the function and saw that that style is used in two other blocks where it is not unnecessary. Personally I would find the following style more readable throughout the function: { svn_diff_t *new_diff = apr_palloc(pool, sizeof(*new_diff)); new_diff->type = ... ... *diff_ref = new_diff; diff_ref = &new_diff->next; } ) > while (1) > { > if (original_start < lcs->position[0]->offset > ]]] > > Naively, I wanted to do the same for diff3 (along with some other > tricks, to make the algorithm aware of the prefix_lines), but it > doesn't work. See the patch in attachment. > > I mean: it works partly: diff-diff3-test.exe passes with the attached > patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and > merge_tests.py (61) don't. I haven't studied the test failures in > detail (I probably should, but I currently can't wrap my head around > those tests). > > I'm now pondering another approach, to pre-pend an "lcs" piece to the > lcs linked list, and let the rest of the algorithm do its work as > normal. Inspiration for this came when I was reading the code of > diff3.c#svn_diff__resolve_conflict, where a similar trick is used > around line 121: > [[[ > /* If there were matching symbols at the start of > * both sequences, record that fact. > */ > if (common_length > 0) > { > lcs = apr_palloc(subpool, sizeof(*lcs)); > lcs->next = NULL; > lcs->position[0] = start_position[0]; > lcs->position[1] = start_position[1]; > lcs->length = common_length; > > lcs_ref = &lcs->next; > } > ]]] > > Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where > I'm already passing the prefix_lines as a new parameter. If that would > work out, I could probably remove the special common-diff-injecting > code in diff.c#svn_diff__diff. > > Thoughts? Just from reading what you say here, that last solution sounds much better. - Julian > It will take me some time/experimentation to work out the details, but > I think that should work ... > > Cheers,
diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning
[ Taking a privately-started discussion with danielsh to the list, in case others have inspiration/insight about this. Question at hand: I'm having trouble making diff3 work with prefix/suffix scanning of the diff-optimizations-bytes branch. Any feedback is highly appreciated :-). ] On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf wrote: [...snip...] > In diff3 with prefix enabled, I was seeing failures like this: > > EXPECTED: > line1 > <<< > line2 > === > line2 modified > > ACTUAL: > <<< > line1 > line2 > === > line1 > line2 modified > > so I assumed that when the prefix code gobbled the shared prefix. How > to fix it from there was purely guesswork on my part (and I haven't > implemented it). > > These failures would go away if I prepended a "line0 left" and "line0 > right" to the file. Yes, I know. That's indeed because the prefix-scanning code gobbled up the identical prefix. That problem is also there in diff2. In diff2, I fixed this by simply pre-pending a piece of "common" diff to the diff linked list, in the function diff.c#svn_diff__diff. >From r1037371: [[[ --- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c (original) +++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c Sun Nov 21 02:36:07 2010 @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs, svn_diff_t *diff; svn_diff_t **diff_ref = &diff; + if (want_common && (original_start > 1)) +{ + /* we have a prefix to skip */ + (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref)); + + (*diff_ref)->type = svn_diff__type_common; + (*diff_ref)->original_start = 0; + (*diff_ref)->original_length = original_start - 1; + (*diff_ref)->modified_start = 0; + (*diff_ref)->modified_length = modified_start - 1; + (*diff_ref)->latest_start = 0; + (*diff_ref)->latest_length = 0; + + diff_ref = &(*diff_ref)->next; +} + while (1) { if (original_start < lcs->position[0]->offset ]]] Naively, I wanted to do the same for diff3 (along with some other tricks, to make the algorithm aware of the prefix_lines), but it doesn't work. See the patch in attachment. I mean: it works partly: diff-diff3-test.exe passes with the attached patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and merge_tests.py (61) don't. I haven't studied the test failures in detail (I probably should, but I currently can't wrap my head around those tests). I'm now pondering another approach, to pre-pend an "lcs" piece to the lcs linked list, and let the rest of the algorithm do its work as normal. Inspiration for this came when I was reading the code of diff3.c#svn_diff__resolve_conflict, where a similar trick is used around line 121: [[[ /* If there were matching symbols at the start of * both sequences, record that fact. */ if (common_length > 0) { lcs = apr_palloc(subpool, sizeof(*lcs)); lcs->next = NULL; lcs->position[0] = start_position[0]; lcs->position[1] = start_position[1]; lcs->length = common_length; lcs_ref = &lcs->next; } ]]] Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where I'm already passing the prefix_lines as a new parameter. If that would work out, I could probably remove the special common-diff-injecting code in diff.c#svn_diff__diff. Thoughts? It will take me some time/experimentation to work out the details, but I think that should work ... Cheers, -- Johan Index: subversion/libsvn_diff/diff3.c === --- subversion/libsvn_diff/diff3.c (revision 1041582) +++ subversion/libsvn_diff/diff3.c (working copy) @@ -251,10 +251,14 @@ svn_diff_diff3(svn_diff_t **diff, { svn_diff__tree_t *tree; svn_diff__position_t *position_list[3]; + svn_diff_datasource_e datasource[] = {svn_diff_datasource_original, +svn_diff_datasource_modified, +svn_diff_datasource_latest}; svn_diff__lcs_t *lcs_om; svn_diff__lcs_t *lcs_ol; apr_pool_t *subpool; apr_pool_t *treepool; + apr_off_t prefix_lines = 0; *diff = NULL; @@ -263,28 +267,30 @@ svn_diff_diff3(svn_diff_t **diff, svn_diff__tree_create(&tree, treepool); + SVN_ERR(vtable->datasources_open(diff_baton, &prefix_lines, datasource, 3)); + SVN_ERR(svn_diff__get_tokens(&position_list[0], tree, diff_baton, vtable, svn_diff_datasource_original, - FALSE, - 0, + TRUE, + prefix_lines, subpool)); SVN_ERR(svn_diff__get_tokens(&position_list[1], tree, diff_baton, vtable, svn_diff_data