Re: [Issue 8 drafts 0001556]: clarify meaning of \n used in a bracket expression in a sed context address or s-command

Christoph Anton Mitterer via austin-group-l at The Open Group Sun, 24 Apr 2022 17:51:28 -0700

Hey.

Geoff, I haven't had time yet to look at your updated proposal for
#1550, and I'm not sure whether I'll manage to do it tonight or in the
next few days.
But I'll definitely reply, so please be patient a bit longer. :-)



However, one thing came to my mind again, which I think needs further
discussion...



The current "solution" to a number of previous problems is:

Inside a bracket expression there cannot be any escape sequences.
Therefore, there cannot be any \n (in the sense of <newline>) nor any
\c (in the sense of "un-delimitering" the delimiter character c).


While this is per se perfectly valid (and solves numerous issues), it
has one problem:

(at least) GNU sed breaks it already!



As you noted yourself in
https://www.austingroupbugs.net/view.php?id=1556#c5621

it requires POSIXLY_CORRECT=1 to work as it should.

$ printf 'a\\b\n' | sed 's/a[\n]b/X/'
a\b
$ printf 'a\nb\n' | sed 's/a[\n]b/X/'
a
b
$ printf 'a\nb\n' | sed -z 's/a[\n]b/X/'
X
$ printf 'anb\n' | sed 's/a[\n]b/X/'
anb
$ export POSIXLY_CORRECT=1
$ printf 'a\\b\n' | sed 's/a[\n]b/X/'
X
$ printf 'a\nb\n' | sed 's/a[\n]b/X/'
a
b
$ printf 'a\nb\n' | sed -z 's/a[\n]b/X/'
a
b
$ printf 'anb\n' | sed 's/a[\n]b/X/'
X
$ 


NOT so for GNU's extension of '\s':
'\s'
     Matches whitespace characters (spaces and tabs).  Newlines
     embedded in the pattern/hold spaces will also match...
(nor, I assume, for any similar extensions):

$ printf 'asb\n' | sed 's/a[\s]b/X/'
X
$ printf 'a\\b\n' | sed 's/a[\s]b/X/'
X
$ printf 'a b\n' | sed 's/a[\s]b/X/'
a b
$ export POSIXLY_CORRECT=1
$ printf 'asb\n' | sed 's/a[\s]b/X/'
X
$ printf 'a\\b\n' | sed 's/a[\s]b/X/'
X
$ printf 'a b\n' | sed 's/a[\s]b/X/'
a b
$


It also works as expected for escaped delimiter characters:
$ printf 'aDb\n' | sed 'sDa[\D]bDXD'
X
$ printf 'a\\b\n' | sed 'sDa[\D]bDXD'
X

even when the delimiter char has also special meaning when escaped (as
with '\s'):
$ printf 'asb\n' | sed 'ssa[\s]bsXs'
X
$ printf 'a\\b\n' | sed 'ssa[\s]bsXs'
X
$ printf 'a b\n' | sed 'ssa[\s]bsXs'
a b


(all the above with GNU sed 4.8).


So the only problematic case seems to be '\n'.



I don't want to step on anyone's toes... but GNU sed is probably one of
the (if not the) major implementations of sed, isn't it?


And regardless of POSIXLY_CORRECT, the standard now describes a
behaviour (namely that the bracket expression [\n] matches the literal
characters '\' or 'n' and *not* <newline>)... which is not shared by a
major implementation, at least not with its default settings.

Anyone who reads the standard would assume that [\n] is not a
<newline>. 
And of course we could just say "well, your implementation is not
compliant" or "look at its documentation, where it tells you about
POSIXLY_CORRECT" ... but that doesn't seem so good to me.

Usually, implementations extend POSIX rather gracefully, but this is a
more serious deviation.


I mean should we just leave it at that?

Or should we add some hint, e.g. indicating that portable applications
should not use '\n' but rather 'n\' ... or perhaps even generally place
'\' last in the bracket expression?
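
A minimal sketch of what such a hint could recommend, assuming the
reading above (no escape sequences inside a bracket expression). The
helper name subst_backslash_or_n is made up for illustration only; the
point is just that the '\' comes last, so it is never followed by a
character that an implementation might combine with it:

```shell
# Hypothetical helper (name invented for this sketch): replace "a",
# followed by either a backslash or "n", followed by "b", with X.
# The backslash is written last in the bracket expression, so it is
# followed directly by the closing "]" and cannot be read as the
# start of "\n".
subst_backslash_or_n() {
    printf '%s\n' "$1" | sed 's/a[n\]b/X/'
}

subst_backslash_or_n 'a\b'
subst_backslash_or_n 'anb'
```

Under that reading, both calls should print X, independent of whether
POSIXLY_CORRECT is set. I have not checked this against implementations
other than the ones discussed here.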


The best would of course be to get GNU to change its behaviour, though
I have no idea how likely that is ;-)

I had tried to reach out to the GNU and BusyBox sed maintainers before,
and while I got replies from BusyBox's, I couldn't get in touch with
GNU's.

Is there anyone who's in contact with these people?



Cheers,
Chris.
