On Mon, 2022-04-25 at 10:21 +0100, Geoff Clare via austin-group-l at The Open Group wrote:
> This was discussed during our work on bug 1233 and resulted in additions
> to the sed APPLICATION USAGE (line 106286 in draft 2.1) and FUTURE
> DIRECTIONS.
Hmm... AFAICS, that was only done for sed, right?
Isn't this something that applies to regular expressions in general (and thus also to e.g. grep)? And via them, it would also affect bracket expressions in the pattern matching notation (which refer to the RE bracket expressions). So shouldn't it be mentioned there, too?

Is it considered likely that these future directions will ever actually be implemented?

Take something like '\+' in BREs (and the future directions we've added for that recently): there's a big difference. POSIX clearly said that '\+' produces undefined results (9.3.2), so anyone who wanted to be sure to stay portable had the chance to do so by simply not using it. Should POSIX ever change '\+' to have *only* the special meaning of + and not the literal one, no one could really complain if they had used it in the sense of a literal plus and their stuff broke, because it was never defined that way.

But any '[\x]' was (AFAIU) always meant to be the literal character '\' or the literal character for which x stands. If someone faithfully relied on that, any actual future change would break that assumption.

If someone says it's unlikely that people ever used '[\x]' to match the literal '\' or the literal character for which x stands... then what about '[\^]', which people might have used when they meant '\' or '^' but couldn't write '[^\]' (since a leading '^' negates the list)? Or what about a range like '[\-_]'?

And in practice it seems even more complicated: as my earlier examples showed, GNU sed, for instance, seems to do this only for '\n', while '\s' in a bracket expression *is* taken as the literal character '\' or the literal character 's' (and btw, so does GNU grep). So should the standard ever allow escape sequences there, there would be even more uncertainty about what means what.

Allowing escape sequences inside bracket expressions would also reopen quite a few of the questions we've tried to deal with in #1550, #1551 and #1552. E.g.
what is 's.a[\.]b.xxx.'? The literal '.'? The special '.'? Or the literal '\' or the literal '.'?

And I thought one conclusion from these issues was that when the delimiter is a character that is not special by itself BUT gets a special meaning when escaped (e.g. '+'), then the escape sequence '\+' MUST always be the literal '+' (when the delimiter is '+'). So, e.g.:

    'sna[\n]bnxxxn'  <---  Is it '\' or 'n'? Is it 'n'? Is it <newline>?
    's/a[\n]b/xxx/'  <--/

Sure, one could simply give the same requirement for use inside a bracket expression... but that probably doesn't make life much easier for the end user.

Oh, and btw, GNU grep seems to handle '[\n]' as the literal '\' or the literal 'n':

    $ printf 'a\nb' | grep -z '^a[\n]b$' ; echo

    $ printf 'a\\b' | grep -z '^a[\n]b$' ; echo
    a\b
    $ printf 'anb' | grep -z '^a[\n]b$' ; echo
    anb

So changing this inside bracket expressions seems pretty messy to me... and I guess I'd generally doubt the usefulness of greatly extending ERE/BRE (as was suggested in #1233), at least when it breaks compatibility.

Cheers,
Chris.
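For reference, a small sketch of how the bracket expressions mentioned above read under the current rules, where a backslash loses its special meaning inside a bracket expression. It assumes a POSIX-conforming grep and the C locale (so the range in '[\-_]' follows byte order); the input lines are made up purely for illustration:

```shell
# Current rules: '\' is an ordinary character inside a bracket expression,
# so each pattern below simply lists '\' as one of its members.
# Assumption: LC_ALL=C so that the range '[\-_]' is '\' ']' '^' '_'.
LC_ALL=C
export LC_ALL

# '[\^]' matches a literal '\' or a literal '^'
# (expected lines: a\b and a^b)
printf '%s\n' 'a\b' 'a^b' 'axb' | grep 'a[\^]b'

# '[\-_]' is the range from '\' to '_'; '-' itself is outside it
# (expected lines: a\b, a]b, a^b and a_b)
printf '%s\n' 'a\b' 'a]b' 'a^b' 'a_b' 'a-b' | grep 'a[\-_]b'

# '[\s]' matches a literal '\' or a literal 's', not whitespace
# (expected lines: a\b and asb)
printf '%s\n' 'a\b' 'asb' 'a b' | grep 'a[\s]b'
```

Any future change that turned '\' into an escape inside brackets would change the result of each of these three commands.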
