Hi Martijn,

Martijn van Duren wrote on Thu, Dec 06, 2018 at 07:07:14AM +0100:
> On 12/5/18 7:24 PM, Ingo Schwarze wrote:

>> putting the minimal useful example in the place of longer quotations:
>> 
>>    $ printf "A\nB\n" | gsed '1b;='
>>   A
>>   2
>>   B
>>    $ printf "A\nB\n" | sed '1b;='  
>>   sed: 1: "1b;=": undefined label ''

>> Martijn van Duren wrote on Wed, Dec 05, 2018 at 09:24:05AM +0100:

>>> Note that the label should consist of "portable filename
>>> character set" characters, so adding the semicolon support doesn't break
>>> compatibility too bad. Although it is a violation, not an extension on
>>> unspecified behaviour (only unspecified behaviour is for
>>> s/../../w).

>> Why do you think it is a violation?

> Because POSIX goes out of its way to make it not obvious:
> Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be 
> followed by a <semicolon>, optional <blank> characters, and another 
> editing command. However, when an s editing command is used with the w  
> flag, following it with another command in this manner produces 
> undefined results.
> 
> They begin by a negation which can use a semicolon and then they follow
> by explicitly stating where undefined behaviour lies. So assuming that
> not including in "can" equals "may still", and assuming that the
> undefined results section is a non-exhaustive list, or an exclusive for
> the inverse group mentioned at the start, may result in undefined
> behaviour. But combine the obscure language with the fact that there's a
> profound reason to not use a semicolon in 6 out of the 10 exclude group
> makes me wonder if it's not a violation why they went out of their way
> to place them in the same list as a, c, i, r, w, #.

Ah.  I think when reading a standard, one must look carefully at what it
actually says, not jump to conclusions from how that is said.  Even when
logically unambiguous, the wording may sometimes sound confusing.
And sometimes, what is prescribed is unambiguous, but something else
would seem to make more sense.

No doubt it says what a semicolon is supposed to do after the
commands not listed.  No doubt it says that "s///w filename;something"
results in undefined behaviour - by the way, "undefined" is stronger
than "unspecified".  But i don't see that it says anywhere
what "b label;something" is supposed to do - so that is left
unspecified, and operating systems are free to implement and
document an extension.
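
For scripts that have to run on more than one sed, the question can
be sidestepped.  A minimal sketch, relying only on the POSIX rule
that each -e fragment is followed by a newline, so that "b" without
a label is never followed by a semicolon (the variable name "out"
is just for illustration):

```shell
# Portable spelling of '1b;=': two -e fragments, joined by a newline.
# Line 1 branches to the end of the script; line 2 gets its number printed.
out=$(printf 'A\nB\n' | sed -e '1b' -e '=')
printf '%s\n' "$out"
```

This should print A, 2, B, matching the gsed output quoted above.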

By the way, we do have a case here of the specification looking
slightly ill-designed: "s///w filename;something" is explicitly
marked as undefined, whereas the even simpler "w filename;something"
is merely left unspecified.  But fortunately, we are not planning
to change the behaviour of "[s///]w filename;something", so we don't
need to worry about that right now.


All that said, i see a few problems with the manual page, so here is
a patch to fix it.

The information in the CAVEATS section is misplaced.  The purpose
of that section is to warn about typical programming mistakes, not
to explain what our implementation does nor to explain what the
standard requires.  Besides, it is wrong: semicolons *can* be used
after "b" and "t" with our implementation.  Finally, the current
wording can mislead people into thinking this might be forbidden:

  $ echo "A\nB" | sed '=;r suffix.txt'
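
The order matters, as the following sketch tries to show.  The
scratch directory and the variable names are made up for the
example; the second command relies on the POSIX rule that a
nonexistent "r" file is treated as empty:

```shell
# Hypothetical scratch directory holding a one-line suffix.txt.
dir=$(mktemp -d) || exit 1
printf 'S\n' > "$dir/suffix.txt"

# "=" followed by ";" and "r": line number, the line, then the file.
ok=$(printf 'A\n' | sed "=;r $dir/suffix.txt")

# "r" followed by ";=": the file name becomes "suffix.txt;=", which
# does not exist, so nothing is appended and "=" never runs.
swallowed=$(printf 'A\n' | sed "r $dir/suffix.txt;=")

printf '%s\n---\n%s\n' "$ok" "$swallowed"
rm -rf "$dir"
```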


So move the information about "a", "c", "i", "r", and "w" to the
DESCRIPTION.  I don't think it belongs in the second paragraph
from the top; even though that is where ";" is introduced, that
place would be way too prominent.  Below "SED FUNCTIONS", where
other special properties of groups of functions are also explained,
seems about right.

Move the information about "b", "t", and ":" to STANDARDS where it
belongs.  That commands in general can be separated with ";" was
already said at the very top of the page.

I don't think anything more needs to be said about "#".
We already have:

    The '#' and the remainder of the line are ignored (treated as a
    comment), with the single exception that if the first two
    characters in the file are '#n', the default output is
    [...]

It's kind of obvious the remainder of the line may contain ';'
and it will be ignored.

While here, avoid "permitted" - we aren't planning to send anybody
to jail for sed(1) abuse.

OK?
  Ingo


Index: sed.1
===================================================================
RCS file: /cvs/src/usr.bin/sed/sed.1,v
retrieving revision 1.57
diff -u -r1.57 sed.1
--- sed.1       14 Nov 2018 10:59:33 -0000      1.57
+++ sed.1       7 Dec 2018 16:48:14 -0000
@@ -277,6 +277,20 @@
 The synopses below indicate which arguments have to be separated from
 the function letters by whitespace characters.
 .Pp
+The
+.Ic a ,
+.Ic c ,
+.Ic i ,
+.Ic r ,
+and
+.Ic w
+functions cannot be followed by another command separated with a semicolon.
+The
+.Ar text
+and
+.Ar file
+arguments may contain semicolon characters.
+.Pp
 Functions can be combined to form a
 .Em function list ,
 a list of
@@ -561,6 +575,14 @@
 .Op Fl aEiru
 are extensions to that specification.
 .Pp
+Following the
+.Ic b ,
+.Ic t ,
+or
+.Ic \&:
+commands with a semicolon and another command is an extension to the
+specification.
+.Pp
 The use of newlines to separate multiple commands on the command line
 is non-portable;
 the use of newlines to separate multiple commands within a command file
@@ -571,11 +593,3 @@
 .Nm
 command appeared in
 .At v7 .
-.Sh CAVEATS
-The use of semicolons to separate multiple commands
-is not permitted for the following commands:
-.Ic a , b , c ,
-.Ic i , r , t ,
-.Ic w , \&: ,
-and
-.Ic # .
