Hey.

Geoff, I haven't had time yet to look at your updated proposal of #1550, and I'm not sure whether I'll manage to do it tonight or in the next few days. But I'll definitely reply, so please be a bit more patient. :-)
However, one thing came to my mind again, which I think needs further discussion...

The current "solution" to a number of previous problems is:
Inside a bracket expression there cannot be any escape sequences.
Therefore, there cannot be any \n (in the sense of <newline>) nor any \c (in the sense of "un-delimitering" the delimiter character c).

While this is per se perfectly valid (and solves numerous issues), it has one problem: (at least) GNU sed breaks it already!

As you noted yourself in https://www.austingroupbugs.net/view.php?id=1556#c5621 it requires POSIXLY_CORRECT=1 to work as it should:

$ printf 'a\\b\n' | sed 's/a[\n]b/X/'
a\b
$ printf 'a\nb\n' | sed 's/a[\n]b/X/'
a
b
$ printf 'a\nb\n' | sed -z 's/a[\n]b/X/'
X
$ printf 'anb\n' | sed 's/a[\n]b/X/'
anb
$ export POSIXLY_CORRECT=1
$ printf 'a\\b\n' | sed 's/a[\n]b/X/'
X
$ printf 'a\nb\n' | sed 's/a[\n]b/X/'
a
b
$ printf 'a\nb\n' | sed -z 's/a[\n]b/X/'
a
b
$ printf 'anb\n' | sed 's/a[\n]b/X/'
X

NOT so for GNU's extension of '\s':
'\s'
     Matches whitespace characters (spaces and tabs). Newlines embedded in the pattern/hold spaces will also match...
(and I assume neither for any similar such extensions):

$ printf 'asb\n' | sed 's/a[\s]b/X/'
X
$ printf 'a\\b\n' | sed 's/a[\s]b/X/'
X
$ printf 'a b\n' | sed 's/a[\s]b/X/'
a b
$ export POSIXLY_CORRECT=1
$ printf 'asb\n' | sed 's/a[\s]b/X/'
X
$ printf 'a\\b\n' | sed 's/a[\s]b/X/'
X
$ printf 'a b\n' | sed 's/a[\s]b/X/'
a b

It also works as expected for escaped delimiter characters:

$ printf 'aDb\n' | sed 'sDa[\D]bDXD'
X
$ printf 'a\\b\n' | sed 'sDa[\D]bDXD'
X

even when the delimiter char also has a special meaning when escaped (as with '\s'):

$ printf 'asb\n' | sed 'ssa[\s]bsXs'
X
$ printf 'a\\b\n' | sed 'ssa[\s]bsXs'
X
$ printf 'a b\n' | sed 'ssa[\s]bsXs'
a b

(all the above with GNU sed 4.8).

So the only problematic case seems to be '\n'.

I don't want to step on anyone's toes... but GNU sed is probably one of the major implementations of sed (if not the major one), isn't it?

And regardless of POSIXLY_CORRECT, the standard now describes a behaviour (namely that the bracket expression [\n] matches the literal characters '\' or 'n' and *not* <newline>)... which is not shared by a major implementation, at least not with its default settings.

Anyone who reads the standard would assume that [\n] is not a <newline>.
And of course we could just say "well, your implementation is not compliant" or "look at its documentation, where it describes POSIXLY_CORRECT"... but that doesn't seem so good to me.
Usually, implementations extend POSIX rather gracefully, but this is a more serious deviation.

I mean, should we just leave it at that? Or should we add some hint, e.g. indicating that portable applications should not use '\n' but rather 'n\'... or perhaps even generally place '\' last in the bracket expression?

The best would of course be to get GNU to change its behaviour, though I have no idea how likely that is ;-)

I had tried to reach out to the GNU and BusyBox sed maintainers before, and while I got replies from BusyBox's, I couldn't get in touch with GNU's. Is there anyone who's in contact with these people?

Cheers,
Chris.
