The following issue has been SUBMITTED. 
====================================================================== 
https://www.austingroupbugs.net/view.php?id=1561 
====================================================================== 
Reported By:                calestyo
Assigned To:                
====================================================================== 
Project:                    Issue 8 drafts
Issue ID:                   1561
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Christoph Anton Mitterer 
Organization:                
User Reference:              
Section:                    various 
Page Number:                N/A 
Line Number:                N/A 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2022-02-01 00:10 UTC
Last Modified:              2022-02-01 00:10 UTC
====================================================================== 
Summary:                    clarify what kind of data shell variables need to be
able to hold
Description: 
In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33722&limit=100&offset=0&sid=

I raised the question of which data shell variables are required to be
able to hold.

In the replies that followed, it became clear that there is some ambiguity
with respect to that question:


In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33723&limit=100&offset=0&sid=
Geoff Clare brought up that:
»but POSIX clearly requires that a variable can be
assigned any value obtained from a command substitution that does not
include a NUL byte, and specifies utilities that can be used to
generate arbitrary byte values, therefore a variable can contain any
sequence of bytes that does not include a NUL byte.«

Which AFAIU means that shell variables are expected to hold any bytes
except NUL, and only the use of these shell variables in certain other
constructs (e.g. ${#var}) interprets them as characters according to the
current locale.
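
To make the above concrete, here is a minimal sketch (not part of the
mailing list discussion) that stores a byte sequence in a shell variable
through a command substitution and reads it back. It is written in Python
only so that the raw bytes can be inspected; the helper name
roundtrip_via_command_substitution and the default shell "sh" are
assumptions for illustration, and the result depends on the shell tested.

```python
import subprocess


def roundtrip_via_command_substitution(payload: bytes, shell: str = "sh") -> bytes:
    """Assign `payload` to a shell variable via $(...) and return what the shell prints back.

    Hypothetical helper for illustration; `shell` is whatever sh-compatible
    shell should be examined.
    """
    if b"\0" in payload:
        raise ValueError("NUL bytes are excluded, as in the discussion")
    # Encode every byte as a \0ddd octal escape (the form accepted by
    # printf's %b conversion) so the script text itself stays pure ASCII.
    escapes = "".join("\\0%03o" % b for b in payload)
    # Note: command substitution strips trailing <newline>s, so payloads
    # ending in a newline will not round-trip by design.
    script = f"v=$(printf '%b' '{escapes}'); printf '%s' \"$v\""
    result = subprocess.run([shell, "-c", script], capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for sample in (b"abc", b"\xc3\xa4", b"\xff\xfe"):
        print(sample, "->", roundtrip_via_command_substitution(sample))
```

A shell that stores bytes unchanged returns the payload as is, whereas a
shell that drops bytes not forming a valid character in the current locale
would return something different for a payload such as b"\xff\xfe".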


It was pointed out that e.g. yash discards from shell variables any bytes
that do not form a validly encoded character:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33724&limit=100&offset=0&sid=


In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33725&limit=100&offset=0&sid=
Chet Ramey pointed out that shell variables are initialised from
environment variables, whose values may themselves contain anything except
NUL, as long as the portion before the "=" is a valid Name (in the sense of
POSIX).
And in the later:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33731&limit=100&offset=0&sid=
that:
»applications can obviously put whatever they want into the value of an
environment variable in envp and call execve.«
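
As an illustration of that point, the following sketch (an assumption-laden
example, not taken from the thread) starts child processes with an
environment value containing bytes that are not valid UTF-8. The names
child_sees, shell_sees and DEMO_VAR are hypothetical. The first helper
shows what a non-shell child receives; the second shows what a given shell
makes of the same value, which is exactly the behaviour in question.

```python
import os
import subprocess
import sys


def child_sees(value: bytes, name: bytes = b"DEMO_VAR") -> bytes:
    """Return the raw bytes a (non-shell) child process finds in its environment."""
    env = dict(os.environb)  # POSIX-only: the environment as bytes
    env[name] = value
    # The child writes the raw bytes of the variable to its standard output.
    code = f"import os, sys; sys.stdout.buffer.write(os.environb[{name!r}])"
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, check=True)
    return result.stdout


def shell_sees(value: bytes, name: bytes = b"DEMO_VAR", shell: str = "sh") -> bytes:
    """Return what `shell` prints when expanding the variable it imported."""
    env = dict(os.environb)
    env[name] = value
    script = "printf '%s' \"$" + name.decode("ascii") + "\""
    result = subprocess.run([shell, "-c", script], env=env,
                            capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    sample = b"\xff\xfeabc"
    print("non-shell child:", child_sees(sample))
    print("shell expansion:", shell_sees(sample))
```

Comparing the two results for the same value shows whether the shell under
test imported the value byte for byte or altered it.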


In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33730&limit=100&offset=0&sid=
Harald van Dijk countered that:
»That is not what POSIX says. It says "The value of an environment
variable is a string of characters" (8.1 Environment Variable  Definition),
and "character" is defined as "a sequence of one or more bytes representing
a single graphic symbol or control code" (3 Definitions), with a note that
says it corresponds to what C calls a multi-byte character. Environment
variables are not specified to allow arbitrary bytes.«


There was some further discussion on whether the definition of command
substitution implies that any bytes other than NUL must be storable in
shell variables.
One argument brought up was that the wording "<newline> character" is used
there; a counter-argument was that this clearly refers *only* to the
<newline> itself, which is by definition the same (byte) in every locale.
(for that particular part see also the proposed clarifications in
https://www.austingroupbugs.net/view.php?id=1560 ).



In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33736&limit=100&offset=0&sid=
I pointed out that, in addition to what Harald noted earlier, 8.1
Environment Variable Definition says:
»These strings have the form name=value; names shall not contain the
character '='. For values to be portable across systems conforming to
POSIX.1-2017, the value shall be composed of characters from the
portable character set (except NUL and as indicated below).«

but a bit further down it says, in contradiction to that:
»The values that the environment variables may be assigned are not
restricted except that they are considered to end with a null byte and
the total space used to store the environment and the arguments to the
process is limited to {ARG_MAX} bytes.«


And in:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33737&limit=100&offset=0&sid=
I brought up:
»3.368 Standard Output
"An output stream usually intended to be used for primary data output."

And:
3.370 Stream
"Appearing in lowercase, a stream is a file access object that allows
access to an ordered sequence of characters, as described by the ISO C
standard. Such objects can be created by the fdopen(), fmemopen(), fopen(),
open_memstream(), or popen() functions, and are associated with a file
descriptor. A stream provides the additional services of user-selectable
buffering and formatted input and output; see also STREAM."


This however links to Standard I/O Streams (
file:///usr/share/doc/susv4/susv4-2018/functions/V2_chap02.html#tag_15_05
)
which very well names byte output modes (fputc and so on).«
Desired Action: 
1) All of the above should be clarified, i.e. which values shell variables
hold (bytes vs. characters?) and which of them are not merely *allowed* but
*must* be supported by any compliant shell (any byte except NUL)?

Ideally there would be one central place where this is clearly defined (and
not just indirectly), e.g. in 2.5 Parameters and Variables.

Probably at least the following places are also affected and need some work
(see above):
- 2.6.3 Command Substitution
- perhaps (but rather not):
  3.267 Parameter
  3.440 Variable
- 8. Environment Variables (there are at least two places here, which are
contradictory)


2) In combination with (1) above, it should also be clarified in 8.
Environment Variables whether implementations MUST initialise shell
variables from the environment (where the portion before the '=' is a Name)
with values "as is" (i.e. with exactly the bytes that were found in char
**environ), or whether an implementation would be allowed to transform
them (this idea was brought up on help-bash within some discussion) or e.g.
skip variables that contain an invalid character encoding.


3) Command substitution refers to standard output (but presumably in the
sense of it being binary, with NUL causing undefined behaviour). Standard
output is defined in 3.368 Standard Output to be a stream, and a stream is
in turn defined in 3.370 Stream as working on characters (while e.g. the
definitions of fdopen() or fputc() allow for binary data).

Something therefore probably needs to be resolved in at least 3.370
Stream.


4) In  2.5.3 Shell Variables  and/or  8.1 Environment Variable Definition
it should be clarified what happens to assignments in char **environ whose
portion before the first '=' is not a valid 3.235 Name, i.e.:
- is it unspecified
- do they have to be ignored
- may an implementation transform the name somehow (e.g. replace all
invalid chars with '_')
- anything else
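
How a particular shell currently handles this can be observed with a small
sketch like the one below (illustrative only; the helper survives_shell and
the sample names are assumptions). It places an entry in the environment,
lets the shell start the env utility, and checks whether the entry is still
present in the environment that env inherited from the shell.

```python
import os
import subprocess


def survives_shell(name: bytes, value: bytes = b"x", shell: str = "sh") -> bool:
    """Report whether the entry name=value is passed on by `shell` to its children."""
    env = dict(os.environb)  # POSIX-only: the environment as bytes
    env[name] = value
    # The shell runs env, which prints the environment it received.
    result = subprocess.run([shell, "-c", "env"], env=env,
                            capture_output=True, check=True)
    return name + b"=" + value in result.stdout.split(b"\n")


if __name__ == "__main__":
    for candidate in (b"GOOD_NAME", b"bad-name", b"1starts_with_digit"):
        print(candidate, "passed on:", survives_shell(candidate))
```

This only shows whether an entry is passed on unchanged; it does not show
whether the shell made it available as a variable or renamed it.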

Thanks,
Chris
====================================================================== 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2022-02-01 00:10 calestyo       New Issue                                    
2022-02-01 00:10 calestyo       Name                      => Christoph Anton
Mitterer
2022-02-01 00:10 calestyo       Section                   => various         
2022-02-01 00:10 calestyo       Page Number               => N/A             
2022-02-01 00:10 calestyo       Line Number               => N/A             
======================================================================

