Re: diff -e: mishandling of bare dots

2016-02-27 Thread Martin Natano
Last chance to voice objections. I plan to commit this in the next days.

natano


> Index: diffreg.c
> ===
> RCS file: /cvs/src/usr.bin/diff/diffreg.c,v
> retrieving revision 1.90
> diff -u -p -r1.90 diffreg.c
> --- diffreg.c 26 Oct 2015 12:52:27 -  1.90
> +++ diffreg.c 21 Feb 2016 16:01:43 -
> @@ -1075,8 +1075,9 @@ proceed:
>* back and restart where we left off.
>*/
>   diff_output(".\n");
> - diff_output("%ds/.//\n", a);
> - a += i;
> + diff_output("%ds/.//\n", a + i - 1);
> + b = a + i - 1;
> + a = b + 1;
>   c += i;
>   goto restart;
>   }
> 



Re: diff -e: mishandling of bare dots

2016-02-21 Thread Tobias Stoeckmann
On Sun, Feb 21, 2016 at 05:15:34PM +0100, Martin Natano wrote:
> While testing your suggestion with the example I run into a bug with
> ed-style diff handling in patch(1), resulting in incorrect output
> (fix in the works). Applying the generated diff with ed(1) works fine,
> though.

Yeah, patch has its own ed handling now. And its substitution only
supports s/.// the way it was before (cheating in handling "a"): 

> - diff_output("%ds/.//\n", a);

With this change, the handling of "cline->subst" in ed.c must be adjusted,
which won't be difficult.

Sounds like Martin is already on his way, so I'm lurking to review his
upcoming diff. ;)


Tobias



Re: diff -e: mishandling of bare dots

2016-02-21 Thread Martin Natano
> Hmm, something still does not seem right. For example, try
> diff -e on the contents below (some other tests attached):
> 
> A
> B
> C
> 
> vs.
> 
> W
> X
> .
> Y
> Z
> 
> [...]
> 
> About the changes below, is something like this the idea behind it?
> The first change command has already removed lines [a,b] and replaced them
> with lines [c, c + i]. So for the rest of the lines in [c + i + 1, d], we
> need to make sure that they are appended after the line that follows the
> inserted '.'
> 
> Can't we set b = a + i - 1 and a = b + 1 to make sure that the rest
> of the 'to' file is appended after the proper line?

Yes, that's a good solution and it fixes the issue with the example
above. Updated diff below.

While testing your suggestion with the example I run into a bug with
ed-style diff handling in patch(1), resulting in incorrect output
(fix in the works). Applying the generated diff with ed(1) works fine,
though.

natano


Index: diffreg.c
===
RCS file: /cvs/src/usr.bin/diff/diffreg.c,v
retrieving revision 1.90
diff -u -p -r1.90 diffreg.c
--- diffreg.c   26 Oct 2015 12:52:27 -  1.90
+++ diffreg.c   21 Feb 2016 16:01:43 -
@@ -1075,8 +1075,9 @@ proceed:
 * back and restart where we left off.
 */
diff_output(".\n");
-   diff_output("%ds/.//\n", a);
-   a += i;
+   diff_output("%ds/.//\n", a + i - 1);
+   b = a + i - 1;
+   a = b + 1;
c += i;
goto restart;
}



Re: diff -e: mishandling of bare dots

2016-02-21 Thread Stefan Kempf
Martin Natano wrote:
> When diff encounters a line that consists of a single dot, it emits two
> dots instead, stops the current command and emits a substitute command
> to replace the double dot with a single one. Then it restarts the
> (original) command if necessary and inserts further lines. This is done
> because a single dot on a line does have special meaning in ed. (It
> stops text insertion.)
> 
> However, there are multiple issues with the current implementation,
> resulting in mangled output:
> 
> - The line number for the substitute command should be the number of the
> most recently inserted line. diff instead uses the number of the first
> inserted line of the current hunk. The first character of that line is
> removed when applying the diff, while the superfluous dot is not.
> 
> - The line number of the restarted command is not adjusted for the
> number of lines already inserted, resulting in the reordering of lines..
> 
> - When there is a bare dot in the replacement text of a change command,
> too many lines are deleted, because a second change command is emitted.
> An append command should be emitted instead, because the target lines
> have already been removed by the first change command.
> 
> The following patch fixes all those issues.

Hmm, something still does not seem right. For example, try
diff -e on the contents below (some other tests attached):

A
B
C

vs.

W
X
.
Y
Z

> Index: diffreg.c
> ===
> RCS file: /cvs/src/usr.bin/diff/diffreg.c,v
> retrieving revision 1.90
> diff -u -p -r1.90 diffreg.c
> --- diffreg.c 26 Oct 2015 12:52:27 -  1.90
> +++ diffreg.c 13 Feb 2016 16:35:08 -
> @@ -1075,8 +1075,12 @@ proceed:
>* back and restart where we left off.
>*/
>   diff_output(".\n");
> - diff_output("%ds/.//\n", a);
> + diff_output("%ds/.//\n", a + i - 1);

This I understand. We really need to substitute the line that contains
the ..

About the changes below, is something like this the idea behind it?
The first change command has already removed lines [a,b] and replaced them
with lines [c, c + i]. So for the rest of the lines in [c + i + 1, d], we
need to make sure that they are appended after the line that follows the
inserted '.'

Can't we set b = a + i - 1 and a = b + 1 to make sure that the rest
of the 'to' file is appended after the proper line?

>   a += i;
> + if (a > b)
> + b += i;
> + else
> + b = a - 1;
>   c += i;
>   goto restart;
>   }

W
X
.
Y
Z
A
W
X
.
Y
Z
A
B
C
W
X
.
Y
Z
A
B
C
W
X
.
Y
Z
.


diff -e: mishandling of bare dots

2016-02-13 Thread Martin Natano
When diff encounters a line that consists of a single dot, it emits two
dots instead, stops the current command and emits a substitute command
to replace the double dot with a single one. Then it restarts the
(original) command if necessary and inserts further lines. This is done
because a single dot on a line does have special meaning in ed. (It
stops text insertion.)

However, there are multiple issues with the current implementation,
resulting in mangled output:

- The line number for the substitute command should be the number of the
most recently inserted line. diff instead uses the number of the first
inserted line of the current hunk. The first character of that line is
removed when applying the diff, while the superfluous dot is not.

- The line number of the restarted command is not adjusted for the
number of lines already inserted, resulting in the reordering of lines..

- When there is a bare dot in the replacement text of a change command,
too many lines are deleted, because a second change command is emitted.
An append command should be emitted instead, because the target lines
have already been removed by the first change command.

The following patch fixes all those issues.

Index: diffreg.c
===
RCS file: /cvs/src/usr.bin/diff/diffreg.c,v
retrieving revision 1.90
diff -u -p -r1.90 diffreg.c
--- diffreg.c   26 Oct 2015 12:52:27 -  1.90
+++ diffreg.c   13 Feb 2016 16:35:08 -
@@ -1075,8 +1075,12 @@ proceed:
 * back and restart where we left off.
 */
diff_output(".\n");
-   diff_output("%ds/.//\n", a);
+   diff_output("%ds/.//\n", a + i - 1);
a += i;
+   if (a > b)
+   b += i;
+   else
+   b = a - 1;
c += i;
goto restart;
}

natano