A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=1556
======================================================================
Reported By:                calestyo
Assigned To:
======================================================================
Project:                    Issue 8 drafts
Issue ID:                   1556
Category:                   Shell and Utilities
Type:                       Clarification Requested
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       Christoph Anton Mitterer
Organization:
User Reference:
Section:                    Utilities, sed / 9.3.5 RE Bracket Expression
Page Number:                -
Line Number:                -
Final Accepted Text:
======================================================================
Date Submitted:             2022-01-18 01:07 UTC
Last Modified:              2022-01-18 21:17 UTC
======================================================================
Summary:                    clarify meaning of \n used in a bracket expression in a sed context address or s-command
======================================================================
----------------------------------------------------------------------
 (0005626) calestyo (reporter) - 2022-01-18 21:17
 https://austingroupbugs.net/view.php?id=1556#c5626
----------------------------------------------------------------------
That:
»Any <backslash> used to alter the default meaning of a subsequent character shall be discarded from the RE or the replacement before evaluating the RE or using the replacement.«
(page 3137, line 10622, Draft 2.1)

Hmm, that's quite "hidden" between paragraphs that deal only with the replacement.
But IMO it merely says that the escaping \ is "removed" (after it has done its job)... which, for a change, I think was in fact already clear.

One could, however, indeed infer from it that [\n] is in fact a bracket expression containing a newline, because the \ needs to be discarded BEFORE evaluating the RE/replacement.
But even then... things are scattered over many different places... and written quite ambiguously...
And that would, e.g., mean that GNU sed does it just right withOUT POSIXLY_CORRECT, and just wrong WITH it.
And it still wouldn't explain whether \n in a sed command is a newline or the delimiter n, if the delimiter was n.

What I would have wanted is a clear algorithm like the following (just a hypothetical one):
»
When REs and/or replacements are used in context addresses or in the s-command, respectively, the following applies:
The string is parsed from left to right, with the rules for REs and the specific rules for their use within delimiters being applied at the same time, according to the following precedence:
1) A '\' (that is itself not escaped with '\') followed by a delimiter character causes (the '\' to be removed and) the delimiter character not to be interpreted as a delimiter but as a normal part of the RE or the replacement, respectively.
   This also means that if the delimiter is an RE/replacement special character, it will have its special meaning with respect to the RE or the replacement, respectively, and it won't be possible to get its literal meaning (with that delimiter).
   For example, 's.\..x.' is the same as 's/./x/', and 's/\./x/' cannot be obtained.
   It further means that if the delimiter is a character that gets its RE/replacement special meaning only when preceded by a '\' (that is itself not escaped with '\'), this character always retains its literal meaning with respect to the RE/replacement (and its special meaning cannot be gained with that delimiter).
   For example, 's(\((x(' is the same as 's/(/x/', and 's/\(/x/' cannot be obtained.
   [Depending on how it should work:] This is also the case when inside an RE bracket expression.
2) In the RE (but not the replacement), when the character 'n' is preceded by '\' (that is itself not escaped with '\') AND rule (1) didn't apply (that is: the delimiter is not 'n'), it shall be interpreted as a newline character.
   [So in this example, \n being a delimiter would win over \n being a newline.]
   [Depending on how it should work:] This is also the case when inside an RE bracket expression.
3) If neither (1) nor (2) applied, the rules for REs (see chapter...) or for the replacement, respectively, shall apply.
«

- Placing (3) third would also already make clear (in that example) that s/[\n]/x/ would in fact be a bracket expression with a newline in it, because the newline rule (2) comes before (3) (which "contains" the rule that everything in a BE is literal).
- Whereas placing (2) after (1) would make clear that sn\nnxn is effectively s/n/x/ and not s/\n/x/.
- And (1) would make clear what "literal" means when a delimiter is escaped by \... here, that it retains its special meaning when it would have one, or wouldn't gain a special meaning when it wouldn't have one.
  E.g. s.\..x. would be s/./x/ (and not s/\./x/)...
  and s(\(x\)(x( would be s/(x\)/x/ and not s/\(x\)/x/.

Of course, one could also define all that differently (e.g. that [\n] would NOT be a newline in a BE)... but the above is IMO what a proper definition would look like, without having multiple pieces of text that could be part of the definition scattered over n places, where one needs to guess the intended context (as with "escape sequence", whose context implies that it cannot itself be escaped).
And even if some of it couldn't be clearly specified (because major implementations are already incompatible), it should clearly say which behaviour is undefined.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2022-01-18 01:07 calestyo       New Issue
2022-01-18 01:07 calestyo       Name                     => Christoph Anton Mitterer
2022-01-18 01:07 calestyo       Section                  => Utilities, sed / 9.3.5 RE Bracket Expression
2022-01-18 01:07 calestyo       Page Number              => -
2022-01-18 01:07 calestyo       Line Number              => -
2022-01-18 09:41 geoffclare     Note Added: 0005621
2022-01-18 13:26 calestyo       Note Added: 0005622
2022-01-18 16:30 kre            Note Added: 0005623
2022-01-18 17:12 calestyo       Note Added: 0005624
2022-01-18 18:41 shware_systems Note Added: 0005625
2022-01-18 21:17 calestyo       Note Added: 0005626
======================================================================
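To make the hypothetical precedence rules (1) to (3) from note 0005626 concrete, here is a rough Python sketch of how an implementation might split an s-command under them. It is only an illustration under stated assumptions: the names parse_s_command, _split_field and SCommand are made up for this sketch; it picks the variant in which rules (1) and (2) also apply inside bracket expressions (so it never tracks brackets); and it only splits the command into RE, replacement and flags, without evaluating any RE or replacement.

```python
from typing import NamedTuple


class SCommand(NamedTuple):
    """Hypothetical result of splitting a sed s-command under the proposed rules."""
    regex: str
    replacement: str
    flags: str


def _split_field(text: str, start: int, delim: str, is_regex: bool) -> tuple[str, int]:
    """Scan `text` from `start` up to the next unescaped `delim`.

    Returns the field contents and the index just past the closing delimiter.
    The proposed rules are applied in precedence order:
      1) '\\' + delimiter -> the delimiter character, with the '\\' dropped
      2) RE only, and only if rule 1 did not apply (delimiter is not 'n'):
         '\\' + 'n' -> a newline character
      3) any other '\\' + char pair is passed through unchanged, to be handled
         by the normal RE/replacement rules (this also covers an escaped '\\',
         so '\\\\' never escapes a following delimiter)
    Assumption: brackets are not tracked, i.e. rules 1 and 2 also apply
    inside bracket expressions.
    """
    out = []
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == delim:                  # rule 1: escaped delimiter
                out.append(nxt)
            elif is_regex and nxt == "n":     # rule 2: \n means newline in the RE
                out.append("\n")
            else:                             # rule 3: leave the escape for the RE/replacement rules
                out.append(ch + nxt)
            i += 2
            continue
        if ch == delim:                       # unescaped delimiter ends the field
            return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated s-command field")


def parse_s_command(cmd: str) -> SCommand:
    """Split a hypothetical 's<d>RE<d>replacement<d>flags' string."""
    if len(cmd) < 2 or cmd[0] != "s":
        raise ValueError("not an s-command")
    delim = cmd[1]
    if delim in ("\\", "\n"):
        raise ValueError("backslash and newline cannot be used as delimiters")
    regex, pos = _split_field(cmd, 2, delim, is_regex=True)
    replacement, pos = _split_field(cmd, pos, delim, is_regex=False)
    return SCommand(regex, replacement, cmd[pos:])
```

With this sketch, r"s.\..x." yields the RE "." (the unescaped, special dot), r"sn\nnxn" yields the RE "n" because rule (1) wins over rule (2), and r"s/[\n]/x/" yields an RE made of "[", a newline character and "]". A real sed would additionally have to validate the flags and handle the other commands and addresses, which this sketch deliberately ignores.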
