    Date:        Thu, 28 Feb 2019 12:04:25 +0000
    From:        Geoff Clare <g...@opengroup.org>
    Message-ID:  <20190228120425.GA10849@lt2.masqnet>

To take this in something of a reverse order...

  | As far as I can see this agrees with the examples in XRAT C.2.5.2.

It was intended to - nothing there was supposed to be even slightly controversial, and I was very surprised that bosh managed to get it all wrong (everything else gets the same results).

The point was just that in

    ${var=word}

"word" is expanded in a context where there is no field splitting (quotes are irrelevant to that). This is just the same as in

    var=word

The latter is quite clear in the standard (XCU 2.9.1 point 4):

    4. Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.

"field splitting" is not mentioned. And isn't done (nothing is new here, this is as it has always been).

On the other hand, in 2.6.2 ...

    ${parameter:=[word]}
        Assign Default Values. If parameter is unset or null, the expansion of word (or an empty string if word is omitted) shall be assigned to parameter. [...]

I had not noticed at the time (but have now) that this is actually specified in 2.6.2 ... but it occurs at the bottom of the page before the descriptions of the ${var<op>word} expansions, and I did not think to look backwards, or I would have just quoted this:

    each case that a value of word is needed (based on the state of parameter, as described below), word shall be subjected to tilde expansion, parameter expansion, command substitution, and arithmetic expansion. [...]

No mention of field splitting there (but field splitting, in an appropriate context, does happen to the results of the expansion in which the expansion of word was used). That is, the context matters.

Generally when the standard just uses "the expansion" or "word expansions" it means "all of them", eg: back in 2.9.1, this time, point 2

    2. The words that are not variable assignments or redirections shall be expanded. [...]

Where that (and it is used in other places that way) means "apply all of the expansions" - which does include field splitting.

As written, those words from 2.6.2 might be read to imply the same (and I think I may have seen implementations, once, where it was) and that field splitting should be performed. If that happens, how one would assign the result to parameter is completely unspecified, however: there is nothing (at least in shells that don't implement arrays) which explains in any way how to assign multiple words (fields) to a variable. That's because it simply isn't done.

So, while it is not strictly necessary, it might be an idea to add something to "the expansion of word" (in all of the cases) to be more like "the expansion of word as indicated earlier in this section" or something - just to avoid others not noticing what the previous page said.

All of this stuff (in my previous message) was just to indicate that the two bullet points in the description of $@ can be deleted as they add nothing (or could be made to add nothing) useful to the definition. As currently stated, the 2nd one allows only ${param-word} and ${param+word} (explicitly) and makes all others unspecified (if $@ is used in word), so I needed to show that those two are the only two where the word appears in a context where field splitting happens. Having done that, all we need is to say that $@ is unspecified when in a context where field splitting does not happen.

(Field splitting does eventually happen to the "word" in ${var+word} (or '-' instead of '+') if it is used (otherwise it isn't expanded, and whether an expansion would be specified or not is irrelevant), whereas it never does in the other cases. In ${var=word} it is the expansion of var, after a value has been assigned to it, that is field split, not the expansion of word.)

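For anyone who wants to check the '=' versus '+' distinction in their own shell, here is a minimal sketch. The variable names are purely illustrative, and the counts in the comments are what I would expect from a shell that follows the text quoted above; a shell that differs (as bosh apparently does) will show different numbers.

```shell
# Sketch only: variable names are illustrative, not from the standard.
unset v
IFS=:
set -- ${v=a:b}        # word is assigned unsplit; the expansion of v is then split
assign_value=$v        # expected: a:b
assign_count=$#        # expected: 2 fields, "a" and "b"
unset IFS              # back to default field splitting

v=set
set -- ${v+a b}        # word itself is used, and is field split
plus_count=$#          # expected: 2
set -- ${v+"a b"}      # the quoted part of word is not split
plus_quoted_count=$#   # expected: 1
```

The value stored in v is the interesting part: it still contains the ':' even though the fields produced by the expansion were split on it.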
The only one where there's even a possibility of doubt about that is the '=' operator case - hence I gave the (undisputed really) example to show that field splitting does not happen to the expansion of word in ${param=word}

  | I don't like this rearrangement because it is a completely different
  | structure from the one for '*'.

The text for $* would need some changes as well. Certainly I agree that having them aligned is a good thing, as they are so similar, and it makes it much easier to spot what the differences between the two are if they are written using similar language, and in a similar order.

But I don't think the differences would end up being quite as big as you imagine - though the way the different parts would be shown would differ from the way it is done now (see more below). I do agree however that the way I actually explained what I imagine the result would be, made that hard to see.

In particular, I'd start with explaining the cases where $@ is unspecified before continuing to the "how to do it" (when the result is specified, of course) where the $@ and $* texts can be more aligned, rather than, as now, starting the $@ text in the same way the $* text starts, and then adding all those "except when" clauses throughout $@ - which is what makes it hard to read (IMO).

$* is always specified (can be used anywhere) so there's no need to include anything similar in that one, rather it just has two different methods of being expanded, whereas $@ has (really) only one.

  | It also has numerous problems, which I started to write up but decided [...]

That's entirely possible. Since I didn't write up (even for me) what the actual final text would be, I didn't get a chance to make sure it was all complete and correct - and we should probably wait before getting bogged down in those kinds of details until it is determined whether or not the general approach is one worth pursuing.

You may have missed it, since it was way down near the end, but my message did indicate that I knew this was all something that needed to get buy-in to be accepted (altering things the way I suggested). It did include ...

    If the text is not altered (something like the above, or with a similar intent) then we at least need to define what this mention of "embedded within" actually means, [...]

That is, I was not assuming that the end result would necessarily be the way I suggested. I am still not, though I would prefer it if that happened, as I believe the result will be better.

Now, back to that:

  | That may be how it works in your mind, but it's not what the standard
  | says.

No, I knew that. A part of my message that you didn't quote in your reply was:

    This would be helped if we weren't slightly mixed up about what it means to be a "context where field splitting will be performed" - we already got rid of the assumption that when IFS='' we no longer have such a context, we also need to get rid of the assumption that when a word is quoted we are not in a context where field splitting will be performed.

Maybe that wasn't well written, but what I meant was that we should be changing the way we do things - simplifying it - throughout the standard (I think it will benefit several places - more below - and not just the $@ description.)

Note this is all intended to be editorial (perhaps except the "${0+$@}" case) - just change the way that we specify what is already specified, or at least not to change what we are intending to specify, just make it easier for readers to ascertain what that is.

  | The description of field splitting explicitly says it is only
  | performed for expansions that are not in double quotes ("the shell shall
  | scan the results of expansions and substitutions that did not occur in
  | double-quotes for field splitting and multiple fields can result").

Yes, it does, and that is just fine - and in fact, that is exactly what I am relying upon.

The same section also says

    2. If the value of IFS is null, no field splitting shall be performed.

but we do not make $@ expansions unspecified if IFS is null (more on that below.) The way that is done (the language required) is much of what makes the $@ description hard to read.

[ Aside: Note that in all of this, I am taking the quoted text from TC2 (vers 181) even though I know that in some cases there have been some later interpretations which will alter the text for TC3 when it appears - for right now I'm just too lazy to go find all the relevant bits (it would be **really** nice if there was a version available that was kept up to date as each interpretation is approved - I think I've said that before!). Except perhaps in one case (mentioned below) I don't think that any of the changes already agreed alter anything that is significant to any of the points I am making (eg: I quote the text from case patterns, which does not have quote removal included, whereas we have an interpretation where that's been added - though hopefully it will go away again ... but quote removal is irrelevant to the point, so this difference doesn't matter). ]

The point I was making is that we should clearly separate, in our minds, as well as in the text, the difference between

    in a context where field splitting is performed

from

    field splitting is performed

"A context where field splitting is performed" is where the text just says "the word expansions" (or similar), as in the quote above from 2.9.1 point 2 (all the words in a command), or where it explicitly states that field splitting is performed, if there are any such occurrences.

There are other examples, such as in the definition of "for" (XCU 2.9.4.2):

    First, the list of words following in shall be expanded to generate a list of items.

The "shall be expanded" means that (each word) is subject to all of the default expansions (specified in the prelude of 2.6) - this is a context where field splitting is performed, and means that both

    for var in $@

and

    for var in "$@"

are uses of $@ where the result is specified (which is a good thing, as 'in "$@"' is explicitly the default when no "in" is given).

On the other hand, there are places in the spec where things are different, as the quote above from 2.9.1 point 4 (variable assignment expansions), but also in case statements (2.9.4.3) which sets out:

    [...] that is matched by the string resulting from the tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal of the given word.

No mention of field splitting there. And:

    [...] each pattern that labels a compound-list shall be subjected to tilde expansion, parameter expansion, command substitution, and arithmetic expansion, and [...]

Again, no mention of field splitting.

In none of these cases does it matter whether or not anything is quoted (nor what the value of IFS might happen to be) - the expansions do not occur in a context where field splitting occurs, so if we use $@ in any of them, we get unspecified results.

That is, none of these

    VAR=$@
    VAR="$@"
    case $@ in .... esac
    case "$@" in .... esac
    case word in $@) ... ;; esac
    case word in "$@") ...;; esac

has any specified result.

The same applies in 2.7 (redirection operators) ...

    For the other redirection operators [other than << and <<-], the word that follows the redirection operator shall be subjected to tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal. [...]

The immediately following elided text discusses pathname expansion, and what that might mean (not relevant here). But there is no field splitting.

    > $@

and

    > "$@"

are always unspecified.
Note however that

    >$*

is not unspecified, and has a perfectly well defined meaning (quite unlikely to be useful, but defined anyway.)

It is this distinction that I would like to have the words "in a context where field splitting is performed" mean. Not for it to mean "when field splitting is performed" (or "will be performed") or "field splitting scans the text" or "field splitting splits the field", any of which can easily be stated, if required, in exactly those words (or similar ones).

Of course, we could also just coin (and define) a new phrase to mean what I intend here - though I think that would actually require more changes, rather than fewer, and would probably end up with a result in which the words "in a context where field splitting occurs" (or does not occur) would all vanish, and be replaced by the new phrase. While that would (in the long run) work just as well, and if we were to find a particularly apt phrase to define and use, perhaps even better, I am not sure that it would be worth the effort.

And yes, I am aware this is going to be a change, and that there will be a few consequential changes needed in other places. I'm willing to look through the text (and even to try and find existing approved, or accepted, unapplied interpretations) to find anything that will be affected - but I am not about to do that unless we get some general agreement that doing what I would like to do is a good idea.

Now for a little more convincing that it will help...

First, note in the prelude to 2.6, the order of the word expansions, number 2...

    2. Field splitting (see Section 2.6.5) shall be performed on the portions of the fields generated by step 1, unless IFS is null.

This is where I think there's an interpretation already approved that deletes the final "unless IFS is null", as that's not needed (or useful).
If that had not already been done, we would need to do it, as otherwise

    IFS=''; echo "$@"

would result in unspecified behaviour - as IFS being null would make it a context where field splitting is not performed, and consequently, "$@" does not have a specified expansion. But I suspect that argument (or one like it) was already made, as I do recall that "unless IFS is null" has gone already.

But why I quoted this here is not for that, it is for what is not there. Note it does not say:

    2. Field splitting (see Section 2.6.5) shall be performed on the unquoted portions of the fields generated by step 1...

There is nothing about quoted or not quoted there at all. Field splitting is performed in either case. This is a context where field splitting is performed.

Then when we get into 2.6.5 we see the text that you quoted, and I retained, above, from 2.6.5, which says that field splitting does not scan quoted text.

All this is good - it is exactly as it ought to be. The context (for normal word expansions) is one where field splitting is performed, but when quoted, text is not actually split. Surely the distinction is not hard to grasp?

Further, this is actually important, as in an expansion like

    ${0+"a b" "c d"}

we end up with text, which both is, and is not, subject to field splitting, all in the expansion of a single word. The quoted strings do not get field split, but the unquoted space between them does (assuming IFS includes a space of course.) So from this word we get two fields. Not 1, not 4. [And what is more, *every* shell I tested agrees with that.]

On the other hand, in

    X=${0+"a b" "c d"}

what gets assigned to X is "a b c d" as that is a context where field splitting does not happen, the unquoted space is simply an unquoted space. [Again, everyone agrees.]

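The two cases just described can be checked with something like the following sketch. It assumes the default IFS (hence the unset), and the result variable names are made up for illustration.

```shell
# Sketch only: assumes default IFS; result variable names are illustrative.
unset IFS

# Ordinary word context: field splitting is performed, but only the
# unquoted space between the two quoted strings is actually split.
set -- ${0+"a b" "c d"}
word_count=$#          # expected: 2
first_field=$1         # expected: a b
second_field=$2        # expected: c d

# Assignment context: no field splitting, the unquoted space is just a space.
X=${0+"a b" "c d"}     # expected value of X: a b c d
```

The same word gives two fields in one context and a single string in the other, and the quoting is identical in both.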
What happens to be quoted, or not quoted, is irrelevant to determining whether we are in a context where field splitting happens - though it certainly affects the results (if we are in a context where field splitting happens). Personally I consider this to be a very important distinction.

Second (to refer to something from above again), it gets rid of the possible interpretation that these words from 2.6.5 might otherwise imply...

    2. If the value of IFS is null, no field splitting shall be performed.

If that is interpreted as meaning that when IFS is null, we have a context where field splitting is not performed, then we need to look at the wording in the current definition of $@ (2.5.2)

    · Field splitting as described in Section 2.6.5 would be performed if the expansion were not within double-quotes

If IFS=NULL means "field splitting would not be performed" (as 2.6.5 says it does) then this is false when IFS is NULL, as field splitting would not be performed (regardless of double quotes). I know this bullet point goes on to say:

    (regardless of whether field splitting would have any effect; for example, if IFS is null).

But "have no effect" is not what 2.6.5 says, having no effect would be for example

    set -- abc
    IFS=:
    echo "$@"

in that, field splitting "would have no effect", but would still be performed, were the "$@" not in double quotes (exactly what the quote from the bullet point from 2.6.5 says) so that is a specified case. But when we instead have IFS='' then (according to 2.6.5) field splitting is not performed, not just has no effect.

I know the intent of the "for example, if IFS is null" is intended to make it clear that "$@" is still specified when IFS='' but I am not sure that it is technically sufficient to achieve that effect, whatever the obvious intent was.

On the other hand, if we were to adopt my definition of what "in a context where field splitting is performed" means, then all this kind of problem simply goes away.
We just say "Expanding @ is unspecified if not in a context where field splitting is performed" (which is what we actually want, in every case, as far as I can tell ... that is, we do not request field splitting anywhere where we require only a single field, and we do request it everywhere that we can handle as many fields as happen to be produced - which is identical to the cases when "$@" (or unquoted $@) has a sensible meaning).

Now I know that this does mean that the description of $* needs to be changed from

    When the expansion occurs in a context where field splitting will be performed, any empty fields may be discarded and each of the non-empty fields shall be further split as described in Section 2.6.5. When the expansion occurs in a context where field splitting will not be performed, the initial fields shall be joined to form a [...]

into something like (see later for an alternative)

    When the expansion occurs in a context where field splitting will be performed, and is not enclosed within double quotes, any empty fields ....

    When the expansion occurs in a context where field splitting will not be performed, or when the '*' is included within double quotes, the initial fields shall be joined ...

That is one of the consequential changes mentioned above ... though I would not do it that way, instead I would prefix the entire $* section with:

    When the expansion occurs in a context where field splitting does not occur, it will be treated, solely for the purpose of generating the value, as if it had been enclosed within double quotes.

and then continue as originally specified, except using "when the expansion occurs within double quotes" rather than "when the expansion occurs in a context where field splitting will not be performed" which would (kind of) align it with the similar wording I propose for the lead in to the description of the @ special param

    When the expansion occurs in a context where field splitting is not performed, the result is unspecified.
    Otherwise ...

and then continue with the description of how it is done, which just like the $* expansion, varies depending upon whether or not double quotes are present (though $@ varies a little less, it just has special, or unspecified, cases.)

Note that doing that also allows us to get rid of the following from 2.6.2:

    If a parameter expansion occurs inside double-quotes:

    · Pathname expansion shall not be performed on the results of the expansion.

    · Field splitting shall not be performed on the results of the expansion.

None of that is needed. I'm not even sure that the first bullet point makes any sense at all: pathname expansion (unlike field splitting) is not performed upon "the results of the expansion", rather it is performed upon the field that results, whether it came from an expansion, wholly or partially, or not.

That is, if we have

    var=.c

and then do

    echo *"${var}"

we do pathname expansion on *".c" (and since neither '.' nor 'c' is special to pathname expansion, that is the same as *.c ... all files that end in ".c")

Exactly the same happens if we had instead had

    var=.?

(or if you prefer, var='.?' just for extra clarity, there is no difference). The *"${var}" expands to all files with names that end (literally) with ".?". We have (in both cases) performed pathname expansion using the results of an expansion which occurred within double quotes. What the quotes did, was not to prevent pathname expansion from happening, but to prevent the '?' being interpreted as "match any character" but instead be "match a question mark".

The definition of pathname expansion needs to be able to handle this already, and does, as it also has to cope with cases where there is no parameter expansion, like

    echo *'.?'

or

    echo *.\?

so that first bullet point is, I believe, nonsense, and certainly not needed.

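The pathname expansion cases above can be tried in a scratch directory, roughly as in the sketch below. The directory and file names are invented for illustration, and mktemp -d is assumed to be available (it is common, but not something the standard requires).

```shell
# Sketch only: the scratch directory and file names are made up for illustration.
unset IFS                                  # default field splitting
dir=$(mktemp -d) || exit 1                 # assumes mktemp -d exists on this system
: > "$dir/a.c"; : > "$dir/b.?"; : > "$dir/c.x"

var=.c
set -- "$dir"/*"${var}"                    # pattern is *".c"
c_count=$#; c_name=${1##*/}                # expected: 1 match, a.c

var='.?'
set -- "$dir"/*"${var}"                    # quoted '?' matches only a literal '?'
q_count=$#; q_name=${1##*/}                # expected: 1 match, b.?

set -- "$dir"/*'.?'                        # same pattern with no parameter expansion
lit_count=$#; lit_name=${1##*/}            # expected: 1 match, b.?

rm -rf "$dir"
```

In all three cases pathname expansion does happen (the '*' is expanded); the quoting only changes what the '?' is allowed to match.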
The second is simply not needed, as whether we perform field splitting or not, 2.6.5 (in the section you quoted above) says quite clearly that field splitting only scans the results of expansions that did not occur in double quotes ... we do not need to state it again in the case of variable expansions -- we do not for arithmetic expansions, eg, it is clear that in

    IFS=5; printf %s\\n $(( 108 + 46 ))

printf writes out two lines, "1" and "4", (or should, old pdksh does not) whereas:

    IFS=5; printf %s\\n "$(( 108 + 46 ))"

writes a single line, "154" (which bosh gets wrong, after getting the harder one correct!) - yet the description of arithmetic expansion does not, and does not need to, say:

    If an arithmetic expansion occurs inside double quotes:
    . Field splitting shall not be performed on the results of the expansion

Nor does command substitution. Nor does it need to. 2.6.5 already covers all of that.

Next, the current wording, and its interpretation, for @ (from 2.5.2) says ...

    . When the expansion occurs in a context where field splitting will be performed,

which, if that is intended to mean "not within double quotes" is, in a sense, requiring the shell to predict the future. True, it is a prediction that can be accurately made, as we are not depending upon some random event to decide what "will" happen (later), but simply specifying something that depends upon what future events will occur seems like a poor method to me.

If it were changed as I wrote above:

    When the expansion occurs in a context where field splitting is not performed, the result is unspecified. Otherwise ...

then we have none of that, we know what context we're in - eg: we know if we're expanding the word in a case statement, or one of the words in a for statement, when we do the expansion. We know what context we are currently in, and there is none of this "will be performed" stuff to deal with.

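The IFS=5 arithmetic examples earlier in this message can be checked without looking at printf output by counting fields instead. This is only a sketch; the result variable names are illustrative, and the expected values are those of a shell that follows 2.6.5 as quoted.

```shell
# Sketch only: result variable names are illustrative.
IFS=5
set -- $(( 108 + 46 ))       # unquoted: 154 is split at the '5'
unquoted_count=$#            # expected: 2
unquoted_first=$1            # expected: 1
unquoted_second=$2           # expected: 4

set -- "$(( 108 + 46 ))"     # quoted: field splitting does not scan the result
quoted_count=$#              # expected: 1
quoted_value=$1              # expected: 154
unset IFS                    # back to default field splitting
```

Both set commands are in a context where field splitting is performed; only the quoting decides whether the result is actually split.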
  | In any case, I think it is worth keeping the clarification in the
  | descriptions of '@' and '*' that field splitting is applied to each of
  | the positional-parameter-derived initial fields individually.

Fine, I have no problem with that. I am not sure that it makes a lot of difference, as I cannot even imagine how anything different could possibly happen (we have created multiple fields - in the case where it matters). Field splitting never joins things together: it can divide fields into multiple (that's the splitting) and in some cases can make fields vanish completely, but it never joins anything together. But I have no difficulty reinforcing the point.

(Note that in my previous message, and even more in this one, I have not attempted to give the complete text - there is stuff that is there now which does not need alterations, those parts I just skipped.)

  | I think the way forward is for you to look at the description of '*'
  | and decide if you think it needs to change. Once we have text for '*'
  | that we are both happy with, we can think about how to describe '@' in
  | a way that maintains the correspondence between the two descriptions.

Actually, I think that's backwards. $@ is the more complex case. We should (IMO) get that one correct first, so it is easy to follow, and gets all the corner cases (including some which are to be explicitly unspecified) correct, and then modify the description of $* (which is really much simpler) to match, so the differences and similarities with $@ are easy to detect.

For now, I am going to wait to see if you (or anyone) agrees with my approach ... if not, then we need to find some other way to make it clear that "${0+$@}" is one of the cases where, when $# = 0, it is unspecified whether we get "" or nothing.

If we do get some agreement, then I will open an issue (or do we call it submitting a defect report, or whatever ...
fill in the mantis form anyway) and in that, attempt to actually list the changes that will be needed to make this happen.

kre