A NOTE has been added to this issue. 
====================================================================== 
https://austingroupbugs.net/view.php?id=1556 
====================================================================== 
Reported By:                calestyo
Assigned To:                
====================================================================== 
Project:                    Issue 8 drafts
Issue ID:                   1556
Category:                   Shell and Utilities
Type:                       Clarification Requested
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       Christoph Anton Mitterer 
Organization:                
User Reference:              
Section:                    Utilities, sed / 9.3.5 RE Bracket Expression 
Page Number:                - 
Line Number:                - 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2022-01-18 01:07 UTC
Last Modified:              2022-01-18 21:17 UTC
====================================================================== 
Summary:                    clarify meaning of \n used in a bracket expression
in a sed context address or s-command
====================================================================== 

---------------------------------------------------------------------- 
 (0005626) calestyo (reporter) - 2022-01-18 21:17
 https://austingroupbugs.net/view.php?id=1556#c5626 
---------------------------------------------------------------------- 
That:
»Any <backslash> used to alter the default meaning of a subsequent
character shall be discarded from the RE or the replacement before
evaluating the RE or using the replacement.«
(page 3137, line 10622, Draft 2.1)

Hmm that's quite "hidden" between paragraphs only dealing with the
replacement.

But IMO it merely says that the escaping \ is "removed" (after it has done
its job)... which - for a change - I think was in fact already clear.

One could, however, indeed conclude from that that [\n] is in fact a
bracket expression containing a newline, because the \ needs to be
discarded BEFORE evaluating the RE/replacement.
But even then... things are so scattered over many different places... and
quite ambiguously written...

And that would e.g. mean that GNU sed does it just right withOUT
POSIXLY_CORRECT (and just wrong WITH).
And it still wouldn't explain, whether \n in a sed command is newline or
the delimiter n, if the delimiter was n.
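
To make the two readings of [\n] concrete, here is a minimal sketch. It
uses Python's re module purely as a stand-in (it is NOT a POSIX BRE
engine), and the names reading_a, reading_b and matches are made up for
this illustration:

```python
import re

# Reading A: the backslash is discarded before the RE is evaluated,
# so the bracket expression contains a single newline character.
reading_a = re.compile("[\n]")

# Reading B: everything inside a bracket expression is literal,
# so the bracket expression contains a backslash and the letter 'n'.
reading_b = re.compile(r"[\\n]")


def matches(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None
```

Under reading A only a newline matches; under reading B a backslash or an
'n' matches, but a newline does not.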



What I would have kinda wanted is a clear algorithm like the following
(just a hypothetical one):

»
When REs and/or replacements are used in context addresses respectively the
s-command, the following applies:
The string is parsed from left to right, with the rules for REs and the
specific rules for their use within delimiters being applied at the same
time according to the following precedence:
1) A '\' (that is itself not escaped with '\') followed by a delimiter
character, causes (the '\' to be removed and) the delimiter character not
to be interpreted as delimiter but as normal part of the RE respectively
the replacement.
This also means, that if the delimiter is a RE/replacement special
character, that it will have the special meaning with respect to the RE
respectively the replacement and that it won't be possible to get its
literal meaning (with that delimiter). For example 's.\..x.' is the same as
's/./x/' and 's/\./x/' cannot be obtained.
It further means that if the delimiter is a character that would get its
RE/replacement special meaning only when preceded by a '\' (that is itself
not escaped with '\'), this character always retains its literal meaning
with respect to the RE/replacement (and its special meaning cannot be
gained with that delimiter). For example 's(\((x(' is the same as
's/(/x/' and 's/\(/x/' cannot be obtained.

[Depending on how it should work:]
This is also the case when inside a RE bracket expression.


2) In the RE (but not the replacement), when the character 'n' is preceded
by '\' (that is itself not escaped with '\') AND when rule (1) didn't apply
(that is: when the delimiter is not 'n'), it shall be interpreted as a
newline character.

[So in this example, \n being a delimiter would win over \n being a
newline]

[Depending on how it should work:]
This is also the case when inside a RE bracket expression.


3) If neither (1) nor (2) applied, the rules for RE (see chapter...)
respectively the replacement shall apply.
«
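
The precedence rules above can be sketched in code. This is only an
illustration of the hypothetical algorithm, not of what any sed actually
does. The function name split_s_command is made up, and the sketch assumes
the variant in which rules (1) and (2) also apply inside a bracket
expression:

```python
def split_s_command(cmd):
    """Split an s-command into (RE, replacement, flags) following the
    hypothetical precedence rules (1)-(3). Illustration only."""
    if len(cmd) < 2 or cmd[0] != "s":
        raise ValueError("not an s-command")
    delim = cmd[1]
    if delim in ("\\", "\n"):
        raise ValueError("backslash and newline cannot be delimiters")

    fields = []    # finished fields: first the RE, then the replacement
    current = []   # pieces of the field being collected
    i = 2
    while i < len(cmd) and len(fields) < 2:
        ch = cmd[i]
        if ch == "\\" and i + 1 < len(cmd):
            nxt = cmd[i + 1]
            if nxt == delim:
                # Rule (1): drop the backslash; the delimiter character
                # becomes a normal part of the RE/replacement.
                current.append(nxt)
            elif nxt == "n" and len(fields) == 0:
                # Rule (2): RE only, and only reached when delim != 'n'.
                current.append("\n")
            else:
                # Rule (3): keep the pair for the ordinary RE or
                # replacement rules to interpret.
                current.append(ch + nxt)
            i += 2
        elif ch == delim:
            # An unescaped delimiter ends the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1

    if len(fields) < 2:
        raise ValueError("unterminated s-command")
    return fields[0], fields[1], cmd[i:]
```

With this sketch, split_s_command(r's.\..x.') gives the RE '.' and the
replacement 'x', and split_s_command(r'sn\nnxn') gives the RE 'n', because
rule (1) is checked before rule (2).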

- Placing (3) as 3rd would also already make clear (in that example) that
s/[\n]/x/ would in fact be a bracket expression with a newline in it,
because the newline rule (2) comes before (3) (which "contains" the rule
that everything in a BE is literal).

- Whereas placing (2) after (1) would make clear that sn\nnxn is
effectively s/n/x/ and not s/\n/x/ .

- And (1) would make clear what "literal" means when a delimiter is
escaped by \ ... here, that it retains its special meaning when it would
have one, respectively wouldn't gain a special meaning when it wouldn't
have one.
E.g.  s.\..x. would be s/./x/ (and not s/\./x/) ... and s(\(x\)(x( would be
s/(x\)/x/ and not s/\(x\)/x/ .


Of course one could also define all that differently (e.g. that [\n] would
NOT be a newline in a BE)... but the above is IMO what a proper definition
would look like, without having multiple pieces of text that could be part
of the definition scattered over n places, where one needs to guess about
the meant context (as in: "escape sequence" means by its context that it
cannot be escaped itself).


And even if some of it couldn't be clearly specified (because of already
incompatible major implementations), the text should clearly say which
behaviour is undefined.

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2022-01-18 01:07 calestyo       New Issue                                    
2022-01-18 01:07 calestyo       Name                      => Christoph Anton
Mitterer
2022-01-18 01:07 calestyo       Section                   => Utilities, sed /
9.3.5 RE Bracket Expression
2022-01-18 01:07 calestyo       Page Number               => -               
2022-01-18 01:07 calestyo       Line Number               => -               
2022-01-18 09:41 geoffclare     Note Added: 0005621                          
2022-01-18 13:26 calestyo       Note Added: 0005622                          
2022-01-18 16:30 kre            Note Added: 0005623                          
2022-01-18 17:12 calestyo       Note Added: 0005624                          
2022-01-18 18:41 shware_systems Note Added: 0005625                          
2022-01-18 21:17 calestyo       Note Added: 0005626                          
======================================================================

