>Synopsis: sed newline bugs >Category: user >Environment: System : OpenBSD 5.3 Details : OpenBSD 5.3 (GENERIC.MP) #58: Tue Mar 12 18:43:53 MDT 2013 dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
Architecture: OpenBSD.i386 Machine : i386 >Description: There are several bugs in the way that the sed utility handles newline characters. In particular, the sed utility contains the following bugs. (BUG 1) The POSIX standard at http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html states that "Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>." But OpenBSD's sed utility currently violates this requirement. For example, the output of the following command is not terminated with a newline. printf line1 | sed -n -e p (BUG 2) The POSIX standard states that the sed H command will "Append to the hold space a <newline> followed by the contents of the pattern space." But when the hold space is empty, OpenBSD's sed utility fails to append a newline character. For example, the output of the following command echo line1 | sed -n -e H -e g -e P should be a single empty line, but OpenBSD outputs the line line1 More generally, this bug occurs whenever the hold space is not terminated with a newline character. For example, the output of the following command printf line1 | sed -e h -e H -e g should be line1 line1 but OpenBSD outputs line1line1 (BUG 3) Similarly, the POSIX standard states that the sed G command will "Append to the pattern space a <newline> followed by the contents of the hold space." But when the pattern space is empty, OpenBSD's sed utility fails to append a newline character. For example, the output of the following command echo line1 | sed -e g -e G | wc -l should be 2, but OpenBSD outputs 1. More generally, this bug occurs whenever the pattern space is not terminated with a newline character. For example, the output of the following command printf 'line1\nline2' | sed -n -e 1h -e 2G -e 2p should be the pair of lines line2 line1 but OpenBSD outputs the single line line2line1 >How-To-Repeat: See the bug descriptions, above. >Fix: The ultimate problem is that the code for the sed utility is schizophrenic. Some portions of the code appear to have been written with the following assumption. (ASSUMPTION 1) The pattern and hold spaces must always be terminated with newline characters. But other portions of the code appear to have been written with the following assumption. (ASSUMPTION 2) The pattern and hold spaces are not required to terminate with newline characters. When these two assumptions conflict, a bug occurs. Now, there are two obvious solutions. First, the code could be rewritten under the assumption that the pattern and hold spaces are NEVER terminated with newline characters. In fact, this is the solution that was chosen by the FreeBSD project. See http://svnweb.freebsd.org/base?view=revision&revision=98601 Unfortunately, FreeBSD's first attempt to implement this solution led to the introduction of new bugs, such as the following. http://svnweb.freebsd.org/base?view=revision&revision=98743 http://svnweb.freebsd.org/base?view=revision&revision=101668 An alternative solution is to rewrite sed's code so that ASSUMPTION 1 is strictly followed. The patches below implement this solution for OpenBSD 5.3. --- usr.bin/sed/main.c.orig Thu Oct 29 04:34:07 2009 +++ usr.bin/sed/main.c Fri Jul 26 21:57:56 2013 @@ -302,6 +302,13 @@ err(FATAL, "%s: %s", fname, strerror(errno ? errno : EIO)); cspace(sp, p, len, spflag); + /* + * The pattern space must always be terminated with a newline + * character. + */ + if (p[len - 1] != '\n') + cspace(sp, "\n", 1, APPEND); + linenum++; /* Advance to next non-empty file */ while ((c = getc(f)) == EOF) { --- usr.bin/sed/process.c.orig Tue Nov 15 04:16:31 2011 +++ usr.bin/sed/process.c Fri Jul 26 21:58:03 2013 @@ -86,6 +86,12 @@ size_t len, oldpsl; char *p; + /* + * Initialize the hold space. The hold space must always be terminated + * with a newline character. + */ + cspace(&HS, "\n", 1, REPLACE); + for (linenum = 0; mf_fgets(&PS, REPLACE);) { pd = 0; top: @@ -117,7 +123,7 @@ goto redirect; case 'c': pd = 1; - psl = 0; + cspace(&PS, "\n", 1, REPLACE); if (cp->a2 == NULL || lastaddr) (void)printf("%s", cp->t); break; @@ -127,8 +133,7 @@ case 'D': if (pd) goto new; - if (psl == 0 || - (p = memchr(ps, '\n', psl - 1)) == NULL) { + if ((p = memchr(ps, '\n', psl - 1)) == NULL) { pd = 1; goto new; } else { @@ -140,15 +145,13 @@ cspace(&PS, hs, hsl, REPLACE); break; case 'G': - if (hs == NULL) - cspace(&HS, "\n", 1, REPLACE); - cspace(&PS, hs, hsl, 0); + cspace(&PS, hs, hsl, APPEND); break; case 'h': cspace(&HS, ps, psl, REPLACE); break; case 'H': - cspace(&HS, ps, psl, 0); + cspace(&HS, ps, psl, APPEND); break; case 'i': (void)printf("%s", cp->t); @@ -180,8 +183,7 @@ case 'P': if (pd) break; - if (psl != 0 && - (p = memchr(ps, '\n', psl - 1)) != NULL) { + if ((p = memchr(ps, '\n', psl - 1)) != NULL) { oldpsl = psl; psl = (p + 1) - ps; } @@ -227,8 +229,6 @@ cp->t, strerror(errno)); break; case 'x': - if (hs == NULL) - cspace(&HS, "\n", 1, REPLACE); tspace = PS; PS = HS; HS = tspace;