>Synopsis:      sed newline bugs
>Category:      user
>Environment:
        System      : OpenBSD 5.3
        Details     : OpenBSD 5.3 (GENERIC.MP) #58: Tue Mar 12 18:43:53 MDT 2013
                         
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP

        Architecture: OpenBSD.i386
        Machine     : i386
>Description:
        There are several bugs in the way that the sed utility handles
        newline characters.  In particular, the sed utility contains the
        following bugs.
        
        (BUG 1) The POSIX standard at
        
         http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
        
        states that "Whenever the pattern space is written to standard
        output or a named file, sed shall immediately follow it with a
        <newline>." But OpenBSD's sed utility currently violates this
        requirement.  For example, the output of the following command
        is not terminated with a newline.
        
         printf line1 | sed -n -e p
        
        (BUG 2) The POSIX standard states that the sed H command will
        "Append to the hold space a <newline> followed by the contents
        of the pattern space."  But when the hold space is empty,
        OpenBSD's sed utility fails to append a newline character.  For
        example, the output of the following command
        
         echo line1 | sed -n -e H -e g -e P
        
        should be a single empty line, but OpenBSD outputs the line
        
         line1
        
        More generally, this bug occurs whenever the hold space is not
        terminated with a newline character.  For example, the output of
        the following command
        
         printf line1 | sed -e h -e H -e g
        
        should be
        
         line1
         line1
        
        but OpenBSD outputs
        
         line1line1
        
        (BUG 3) Similarly, the POSIX standard states that the sed G
        command will "Append to the pattern space a <newline> followed
        by the contents of the hold space."  But when the pattern space
        is empty, OpenBSD's sed utility fails to append a newline
        character.  For example, the output of the following command
        
         echo line1 | sed -e g -e G | wc -l
        
        should be 2, but OpenBSD outputs 1.  More generally, this bug
        occurs whenever the pattern space is not terminated with a
        newline character.  For example, the output of the following
        command
        
         printf 'line1\nline2' | sed -n -e 1h -e 2G -e 2p
        
        should be the pair of lines
        
         line2
         line1
        
        but OpenBSD outputs the single line
        
         line2line1

>How-To-Repeat:
        See the bug descriptions, above. 

>Fix:
        The ultimate problem is that the code for the sed utility is
        schizophrenic.  Some portions of the code appear to have been
        written with the following assumption.
        
          (ASSUMPTION 1) The pattern and hold spaces must always be
          terminated with newline characters.
        
        But other portions of the code appear to have been written with
        the following assumption.
        
         (ASSUMPTION 2) The pattern and hold spaces are not required to
         terminate with newline characters.
        
        When these two assumptions conflict, a bug occurs.
        
        Now, there are two obvious solutions.  First, the code could be
        rewritten under the assumption that the pattern and hold spaces
        are NEVER terminated with newline characters.  In fact, this is
        the solution that was chosen by the FreeBSD project.  See
        
         http://svnweb.freebsd.org/base?view=revision&revision=98601
        
        Unfortunately, FreeBSD's first attempt to implement this
        solution led to the introduction of new bugs, such as the
        following.
        
         http://svnweb.freebsd.org/base?view=revision&revision=98743
         http://svnweb.freebsd.org/base?view=revision&revision=101668
        
        An alternative solution is to rewrite sed's code so that
        ASSUMPTION 1 is strictly followed.  The patches below implement
        this solution for OpenBSD 5.3.

--- usr.bin/sed/main.c.orig     Thu Oct 29 04:34:07 2009
+++ usr.bin/sed/main.c  Fri Jul 26 21:57:56 2013
@@ -302,6 +302,13 @@
                err(FATAL, "%s: %s", fname, strerror(errno ? errno : EIO));
        cspace(sp, p, len, spflag);
 
+       /*
+        * The pattern space must always be terminated with a newline
+        * character.
+        */
+       if (p[len - 1] != '\n')
+               cspace(sp, "\n", 1, APPEND);
+
        linenum++;
        /* Advance to next non-empty file */
        while ((c = getc(f)) == EOF) {
--- usr.bin/sed/process.c.orig  Tue Nov 15 04:16:31 2011
+++ usr.bin/sed/process.c       Fri Jul 26 21:58:03 2013
@@ -86,6 +86,12 @@
        size_t len, oldpsl;
        char *p;
 
+       /*
+        * Initialize the hold space.  The hold space must always be terminated
+        * with a newline character.
+        */
+       cspace(&HS, "\n", 1, REPLACE);
+
        for (linenum = 0; mf_fgets(&PS, REPLACE);) {
                pd = 0;
 top:
@@ -117,7 +123,7 @@
                                goto redirect;
                        case 'c':
                                pd = 1;
-                               psl = 0;
+                               cspace(&PS, "\n", 1, REPLACE);
                                if (cp->a2 == NULL || lastaddr)
                                        (void)printf("%s", cp->t);
                                break;
@@ -127,8 +133,7 @@
                        case 'D':
                                if (pd)
                                        goto new;
-                               if (psl == 0 ||
-                                   (p = memchr(ps, '\n', psl - 1)) == NULL) {
+                               if ((p = memchr(ps, '\n', psl - 1)) == NULL) {
                                        pd = 1;
                                        goto new;
                                } else {
@@ -140,15 +145,13 @@
                                cspace(&PS, hs, hsl, REPLACE);
                                break;
                        case 'G':
-                               if (hs == NULL)
-                                       cspace(&HS, "\n", 1, REPLACE);
-                               cspace(&PS, hs, hsl, 0);
+                               cspace(&PS, hs, hsl, APPEND);
                                break;
                        case 'h':
                                cspace(&HS, ps, psl, REPLACE);
                                break;
                        case 'H':
-                               cspace(&HS, ps, psl, 0);
+                               cspace(&HS, ps, psl, APPEND);
                                break;
                        case 'i':
                                (void)printf("%s", cp->t);
@@ -180,8 +183,7 @@
                        case 'P':
                                if (pd)
                                        break;
-                               if (psl != 0 &&
-                                   (p = memchr(ps, '\n', psl - 1)) != NULL) {
+                               if ((p = memchr(ps, '\n', psl - 1)) != NULL) {
                                        oldpsl = psl;
                                        psl = (p + 1) - ps;
                                }
@@ -227,8 +229,6 @@
                                            cp->t, strerror(errno));
                                break;
                        case 'x':
-                               if (hs == NULL)
-                                       cspace(&HS, "\n", 1, REPLACE);
                                tspace = PS;
                                PS = HS;
                                HS = tspace;

Reply via email to