Re: [Issue 8 drafts 0001556]: clarify meaning of \n used in a bracket expression in a sed context address or s-command

Christoph Anton Mitterer via austin-group-l at The Open Group Mon, 25 Apr 2022 16:02:09 -0700

On Mon, 2022-04-25 at 10:21 +0100, Geoff Clare via austin-group-l at
The Open Group wrote:
> This was discussed during our work on bug 1233 and resulted in
> additions to the sed APPLICATION USAGE (line 106286 in draft 2.1)
> and FUTURE DIRECTIONS.


Hmm... AFAICS, that was only done for sed, right?

Isn't it something that would apply to regular expressions in general
(and thus also e.g. grep)?
And via them, it would also affect bracket expressions in the pattern
matching notation (which refer to the RE bracket expressions).

So shouldn't that be mentioned there, too?


Is it considered likely that these future directions will ever
actually be implemented?


When you take something like '\+' in BREs (and the future directions
we've added for that recently), there's a big difference:

POSIX clearly said that '\+' produces undefined results (9.3.2),... so
anyone who wanted to be sure to stay portable had the chance to do so
by simply not using it.
Should POSIX ever actually change '\+' to have *only* the special
meaning of + and not the literal one,... no one could really complain
if they had used it in the sense of the literal plus and their stuff
breaks - because it was never defined that way.

But any '[\x]' was (AFAIU) always meant to match the literal character
'\' or the literal character for which x stands.
If someone faithfully relied on that, any actual future change would
break that assumption.

If someone were to say that it's unlikely that people ever used '[\x]'
and wanted the literal '\' and the literal character for which x
stands... then what about '[\^]', which people might have used when
they meant '\' or '^' but couldn't write '[^\]'? Or what about a range
like '[\-_]'?
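
To make those two cases concrete, here is a small sketch of how they
behave as the standard defines bracket expressions today. It assumes a
POSIX grep and the C locale (so that the range is in plain byte order);
match_lines is just a hypothetical helper name I made up for this mail:

```shell
# match_lines is a hypothetical helper, not part of any standard tool.
match_lines() {
    # $1 = BRE to test; candidate lines are read from stdin.
    # -x requires the whole line to match; LC_ALL=C keeps ranges in
    # plain byte order.
    LC_ALL=C grep -x -e "$1"
}

# '[\^]' matches a literal backslash or a literal caret, nothing else.
printf '%s\n' '\' '^' 'x' | match_lines '[\^]'

# '[\-_]' is a range from backslash (0x5C) to underscore (0x5F), so in
# the C locale it matches \ ] ^ _ but not '-'.
printf '%s\n' '\' ']' '^' '_' '-' | match_lines '[\-_]'
```

If backslash ever became an escape character inside bracket
expressions, both of these would presumably change meaning.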



And in practice it would seem even more complicated:
As my examples showed before, e.g. GNU sed only seems to do this for
'\n' while e.g. '\s' in a bracket expression *is* taken as the literal
character '\' or the literal character 's'.
(btw: and so does GNU grep)
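
A quick way to check the '\s' case on a given sed is sketched below.
bracket_subst is a hypothetical helper name of mine; as far as I can
tell the results should be the same on any sed that follows the
current wording of the standard, but I have only the GNU behaviour
described above to go by:

```shell
# bracket_subst is a hypothetical helper: it runs one s-command whose
# RE contains the bracket expression under test and prints the result.
bracket_subst() {
    # $1 = input line, $2 = bracket expression body, e.g. '\s'
    printf '%s\n' "$1" | sed "s/a[$2]b/MATCH/"
}

bracket_subst 'asb' '\s'    # 's' is in the set, so this is replaced
bracket_subst 'a\b' '\s'    # '\' is in the set, so this is replaced
bracket_subst 'a b' '\s'    # a space is not in the set, line unchanged
```

The first two calls print MATCH and the third prints 'a b' unchanged,
i.e. '\s' is not treated as a whitespace class here.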

So should the standard ever allow them to be escape sequences, there
would be even more uncertainty about what means what.



Allowing escape sequences inside bracket expressions would also open up
quite a few of the questions we've tried to deal with in #1550, #1551
and #1552.

E.g. what does '[\.]' match in 's.a[\.]b.xxx.'? The literal '.'? The
special '.'? Or the literal '\' or literal '.'?

And I thought one conclusion from these issues was that when the
delimiter is a character that (by itself) is not special BUT gets a
special meaning when escaped (e.g. '+'),... then the escape sequence
'\+' MUST always be the literal '+' (when used while the delimiter is
'+').
So e.g.
'sna[\n]bnxxxn' <--- Is it '\' or 'n'? Is it 'n'? Is it <newline>?
's/a[\n]b/xxx/' <--/ 

Sure, one could simply impose the same requirement when that is used
inside a bracket expression... but it probably doesn't make life much
easier for the end user.


Oh and btw, GNU grep seems to handle '[\n]' as the literal '\' and the
literal 'n':
$ printf 'a\nb' | grep -z '^a[\n]b$' ; echo

$ printf 'a\\b' | grep -z '^a[\n]b$' ; echo
a\b
$ printf 'anb' | grep -z '^a[\n]b$' ; echo
anb


So changing this inside bracket expressions seems pretty messy to me...
and I guess I'd generally doubt the usefulness of greatly extending
ERE/BRE (as was suggested in #1233), at least when it breaks
compatibility.



Cheers,
Chris.
