Hi Martijn,

Martijn van Duren wrote on Thu, Dec 06, 2018 at 07:07:14AM +0100:
> On 12/5/18 7:24 PM, Ingo Schwarze wrote:
>> putting the minimal useful example in the place of longer quotations:
>>
>>    $ printf "A\nB\n" | gsed '1b;='
>>    A
>>    2
>>    B
>>    $ printf "A\nB\n" | sed '1b;='
>>    sed: 1: "1b;=": undefined label ''

>> Martijn van Duren wrote on Wed, Dec 05, 2018 at 09:24:05AM +0100:

>>> Note that the label should consist of "portable filename
>>> character set" characters, so adding the semicolon support doesn't break
>>> compatibility too bad. Although it is a violation, not an extension on
>>> unspecified behaviour (only unspecified behaviour is for is for
>>> s/../../w).

>> Why do you think it is a violation?

> Because POSIX goes out of its way to make it not obvious:
> Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be
> followed by a <semicolon>, optional <blank> characters, and another
> editing command. However, when an s editing command is used with the w
> flag, following it with another command in this manner produces
> undefined results.
>
> They begin by a negation which can use a semicolon and then they follow
> by explicitly stating where undefined behaviour lies. So assuming that
> not including in "can" equals "may still", and assuming that the
> undefined results section is a non-exhaustive list, or an exclusive for
> the inverse group mentioned at the star, may result in undefined
> behaviour. But combine the obscure language with the fact that there's a
> profound reason to not use a semicolon in 6 out of the 10 exclude group
> makes me wonder if it's not a violation why they went out of their way
> to place them in the same list as a, c, i, r, w, #.

Ah.

I think when reading a standard, one must look carefully at what it
actually says, not jump to conclusions from how that is said.
Even when logically unambiguous, the wording may sometimes sound
confusing.  And sometimes, what is prescribed is unambiguous, but
something else would seem to make more sense.

No doubt it says what a semicolon is supposed to do after the
commands not listed.
No doubt it says that "s///w filename;something" results in undefined
behaviour - by the way, "undefined" is stronger than "unspecified".
But i don't see that it says anywhere what "b label;something"
is supposed to do - so that is left unspecified, and operating
systems are free to implement and document an extension.

By the way, we do have a case here of the specification looking
slightly ill-designed: "s///w filename;something" is explicitly
marked as undefined, whereas the even simpler "w filename;something"
is merely left unspecified.  But fortunately, we are not planning
to change the behaviour of "[s///]w filename;something", so we don't
need to worry about that right now.

All that said, i see a few problems with the manual page,
so here is a patch to fix it.

The information in the CAVEATS section is misplaced.  The purpose
of that section is to warn about typical programming mistakes, not
to explain what our implementation does nor to explain what the
standard requires.  Besides, it is wrong: semicolons *can* be used
after "b" and "t" with our implementation.  Finally, the current
wording can mislead people into thinking this might be forbidden:

   $ echo "A\nB" | sed '=;r suffix.txt'

So move the information about "a", "c", "i", "r", and "w" to the
DESCRIPTION.  I don't think it belongs in the second paragraph
from the top; even though that is where ";" is introduced, that
place would be way too prominent.  Below "SED FUNCTIONS", where
other special properties of groups of functions are also explained,
seems about right.

Move the information about "b", "t", and ":" to STANDARDS where it
belongs.  That commands in general can be separated with ";" was
already said at the very top of the page.

I don't think anything more needs to be said about "#".
We already have:

   The '#' and the remainder of the line are ignored (treated as a
   comment), with the single exception that if the first two characters
   in the file are '#n', the default output is [...]

It's kind of obvious the remainder of the line may contain ';'
and it will be ignored.

While here, avoid "permitted" - we aren't planning to send anybody
to jail for sed(1) abuse.

OK?
  Ingo


Index: sed.1
===================================================================
RCS file: /cvs/src/usr.bin/sed/sed.1,v
retrieving revision 1.57
diff -u -r1.57 sed.1
--- sed.1	14 Nov 2018 10:59:33 -0000	1.57
+++ sed.1	7 Dec 2018 16:48:14 -0000
@@ -277,6 +277,20 @@
 The synopses below indicate which arguments have to be separated from the
 function letters by whitespace characters.
 .Pp
+The
+.Ic a ,
+.Ic c ,
+.Ic i ,
+.Ic r ,
+and
+.Ic w
+functions cannot be followed by another command separated with a semicolon.
+The
+.Ar text
+and
+.Ar file
+arguments may contain semicolon characters.
+.Pp
 Functions can be combined to form a
 .Em function list ,
 a list of
@@ -561,6 +575,14 @@
 .Op Fl aEiru
 are extensions to that specification.
 .Pp
+Following the
+.Ic b ,
+.Ic t ,
+or
+.Ic \&:
+commands with a semicolon and another command is an extension to the
+specification.
+.Pp
 The use of newlines to separate multiple commands on the command line
 is non-portable;
 the use of newlines to separate multiple commands within a command file
@@ -571,11 +593,3 @@
 .Nm
 command appeared in
 .At v7 .
-.Sh CAVEATS
-The use of semicolons to separate multiple commands
-is not permitted for the following commands:
-.Ic a , b , c ,
-.Ic i , r , t ,
-.Ic w , \&: ,
-and
-.Ic # .
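
P.S.  For scripts that have to run on more than one implementation,
here is a minimal sketch of a spelling of the "1b;=" example that
does not rely on the semicolon extension.  It assumes only that
multiple -e scripts are joined with newlines, as POSIX specifies;
the variable name portable_out is made up for illustration.

```shell
# Portable spelling of '1b;=': end the "b" command with a newline
# (here by using a separate -e option) instead of a semicolon.
# portable_out is a hypothetical name, used only for this sketch.
portable_out=$(printf 'A\nB\n' | sed -e '1b' -e '=')

# Show the result: line 1 is printed without its line number because
# "b" with no label branches to the end of the script, and line 2 is
# preceded by its number.
printf '%s\n' "$portable_out"
```

This should print A, 2, B, the same as the gsed output quoted at
the top, because the newline terminates the (empty) label.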