The following issue has been SUBMITTED.
======================================================================
https://www.austingroupbugs.net/view.php?id=1561
======================================================================
Reported By:                calestyo
Assigned To:
======================================================================
Project:                    Issue 8 drafts
Issue ID:                   1561
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Christoph Anton Mitterer
Organization:
User Reference:
Section:                    various
Page Number:                N/A
Line Number:                N/A
Final Accepted Text:
======================================================================
Date Submitted:             2022-02-01 00:10 UTC
Last Modified:              2022-02-01 00:10 UTC
======================================================================
Summary:                    clarify what kind of data shell variables need to be able to hold
Description:
In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33722&limit=100&offset=0&sid=
I raised the question of which data shell variables are required to be able to hold. The replies that followed made clear that there is some ambiguity on this point:

In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33723&limit=100&offset=0&sid=
Geoff Clare brought up that:
»but POSIX clearly requires that a variable can be assigned any value obtained from a command substitution that does not include a NUL byte, and specifies utilities that can be used to generate arbitrary byte values, therefore a variable can contain any sequence of bytes that does not include a NUL byte.«

Which AFAIU means that shell variables are expected to hold any bytes except NUL, and that only the use of these variables in certain other constructs (e.g. ${#var}) interprets them as characters according to the current locale.

It was also brought up that e.g. yash discards any bytes from shell variables that don't form a validly encoded character:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33724&limit=100&offset=0&sid=

In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33725&limit=100&offset=0&sid=
Chet Ramey brought up that shell variables are initialised from environment variables, whose values may themselves contain anything except NUL, as long as everything before the "=" is a valid Name (in the sense of POSIX).
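
As a rough illustration of the command-substitution argument quoted above (this sketch is not taken from the thread, and the variable names var, nbytes and dump are made up for it), one can assign a value containing a byte that is not a valid character in a UTF-8 locale and then check which bytes the shell hands back. It only assumes the POSIX printf, wc, od and tr utilities:

```shell
# Hypothetical variable names (var, nbytes, dump); not taken from the thread.
# printf with an octal escape emits 'a', the single byte 0xE4, then 'b'.
# A lone 0xE4 byte is not a valid character in a UTF-8 locale.
var=$(printf 'a\344b')

# wc -c counts bytes, independent of the locale's character encoding.
nbytes=$(printf '%s' "$var" | wc -c | tr -d ' ')

# od -An -tx1 prints each byte as two hex digits; strip the separators.
dump=$(printf '%s' "$var" | od -An -tx1 | tr -d ' \n')

printf 'bytes: %s\n' "$nbytes"
printf 'hex:   %s\n' "$dump"

# ${#var} counts characters according to the current locale, so its result
# for this value may differ between locales and shells.
printf 'length: %s\n' "${#var}"
```

In a shell that keeps the value byte for byte, one would expect 3 bytes and the hex dump 61e462; a shell that drops invalidly encoded bytes, as reported for yash above, might print something else.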
In a later message:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33731&limit=100&offset=0&sid=
Chet Ramey added that:
»applications can obviously put whatever they want into the value of an environment variable in envp and call execve.«

In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33730&limit=100&offset=0&sid=
Harald van Dijk countered that:
»That is not what POSIX says. It says "The value of an environment variable is a string of characters" (8.1 Environment Variable Definition), and "character" is defined as "a sequence of one or more bytes representing a single graphic symbol or control code" (3 Definitions), with a note that says it corresponds to what C calls a multi-byte character. Environment variables are not specified to allow arbitrary bytes.«

There was some further discussion on whether the definition of command substitution implies that any bytes other than NUL must be storable in shell variables. One argument was that the wording "<newline> character" is used there; the counter-argument was that this clearly refers *only* to the <newline> itself, which is by definition the same (byte) in every locale (for that particular part see also the proposed clarifications in https://www.austingroupbugs.net/view.php?id=1560 ).

In:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33736&limit=100&offset=0&sid=
I brought up that, in addition to what Harald pointed out earlier, 8.1 Environment Variable Definition says:
»These strings have the form name=value; names shall not contain the character '='.
For values to be portable across systems conforming to POSIX.1-2017, the value shall be composed of characters from the portable character set (except NUL and as indicated below).«
but a bit further down it says, in contradiction to this:
»The values that the environment variables may be assigned are not restricted except that they are considered to end with a null byte and the total space used to store the environment and the arguments to the process is limited to {ARG_MAX} bytes.«

And in:
https://collaboration.opengroup.org/austin/plato/protected/mailarch.php?soph=N&action=show&archive=austin-group-l&num=33737&limit=100&offset=0&sid=
I brought up:
»3.368 Standard Output
"An output stream usually intended to be used for primary data output."
And:
3.370 Stream
"Appearing in lowercase, a stream is a file access object that allows access to an ordered sequence of characters, as described by the ISO C standard. Such objects can be created by the fdopen(), fmemopen(), fopen(), open_memstream(), or popen() functions, and are associated with a file descriptor. A stream provides the additional services of user-selectable buffering and formatted input and output; see also STREAM."
This however links to Standard I/O Streams ( file:///usr/share/doc/susv4/susv4-2018/functions/V2_chap02.html#tag_15_05 ) which very well names byte output modes (fputc and so on).«

Desired Action:
1) All of the above should be clarified, i.e. which values shell variables hold (bytes vs. characters?) and which of them are *not only allowed* but *must* be supported by any compliant shell (any byte except NUL?).

Ideally there would be one central place where this is clearly defined (and not just indirectly), e.g. in 2.5 Parameters and Variables.

Probably at least the following places are also affected and need some work (see above):
- 2.6.3 Command Substitution
- perhaps (but rather not):
  3.267 Parameter
  3.440 Variable
- 8. Environment Variables (there are at least two places here which contradict each other)

2) In combination with (1) above, it should also be clarified in 8. Environment Variables whether implementations MUST initialise shell variables from the environment (where the portion before the '=' is a Name) with values "as is" (i.e. with exactly the bytes that were found in char **environ), or whether an implementation would be allowed to transform them (this idea was brought up on help-bash within some discussion) or e.g. to skip variables that contain an invalid character encoding.

3) Command substitution refers to standard output (presumably in the sense of it being binary, with NUL causing undefined behaviour). Standard output is defined in 3.368 Standard Output to be a stream, and a stream is in turn defined in 3.370 Stream as working on characters (while e.g. the definitions of fdopen() or fputc() allow for binary). Something therefore probably needs to be resolved in at least 3.370 Stream.

4) In 2.5.3 Shell Variables and/or 8.1 Environment Variable Definition it should be clarified what happens to assignments in char **environ whose portion before the first '=' is not a valid 3.235 Name, i.e.:
- is it unspecified
- do they have to be ignored
- may an implementation transform the name somehow (e.g. replace all invalid chars with '_')
- anything else

Thanks,
Chris
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2022-02-01 00:10 calestyo       New Issue
2022-02-01 00:10 calestyo       Name                      => Christoph Anton Mitterer
2022-02-01 00:10 calestyo       Section                   => various
2022-02-01 00:10 calestyo       Page Number               => N/A
2022-02-01 00:10 calestyo       Line Number               => N/A
======================================================================
