Date:        Thu, 28 Feb 2019 12:04:25 +0000
    From:        Geoff Clare <g...@opengroup.org>
    Message-ID:  <20190228120425.GA10849@lt2.masqnet>

To take this in something of a reverse order...

  | As far as I can see this agrees with the examples in XRAT C.2.5.2.

It was intended to - nothing there was supposed to be even slightly
controversial, and I was very surprised that bosh managed to get it
all wrong (everything else gets the same results).

The point was just that in

        ${var=word}

"word" is expanded in a context where there is no field splitting
(quotes are irrelevant to that).   This is just the same as in

        var=word

The latter is quite clear in the standard (XCU 2.9.1 point 4):

        4. Each variable assignment shall be expanded for tilde expansion,
           parameter expansion, command substitution, arithmetic expansion,
           and quote removal prior to assigning the value.

"field splitting" is not mentioned.   And isn't done (nothing is new here,
this is as it has always been).
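
A minimal sketch of that point, for anyone who wants to try it.  The
variable names (src, plain, dflt) are illustrative only and appear nowhere
in the standard.  The value of word reaches the variable unsplit in both
forms, and only the result of the whole ${dflt=...} expansion is split
afterwards:

```shell
IFS=' '                 # make sure a space is a field separator
src='a   b'             # three spaces between the letters

plain=$src              # plain assignment: no field splitting
unset dflt
set -- ${dflt=$src}     # word is assigned unsplit; the expansion of dflt
                        # is then field split in the usual way
nfields=$#              # 2 fields result from the split

printf '[%s] [%s] %s\n' "$plain" "$dflt" "$nfields"
unset IFS               # back to default splitting behaviour
```

Both variables should hold the three-space string, while the command
line that contained the expansion sees two fields.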

On the other hand, in 2.6.2 ...

        ${parameter:=[word]}    Assign Default Values. If parameter is unset
                                or null, the expansion of word (or an empty
                                string if word is omitted) shall be assigned
                                to parameter. [...]

I had not noticed at the time (but have now) that this is actually
specified in 2.6.2 ... but it occurs at the bottom of the page before
the descriptions of the ${var<op>word} expansions, and I did not
think to look backwards, or I would have just quoted this:

        each case that a value of word is needed (based on the state
        of parameter, as described below), word shall be subjected to
        tilde expansion, parameter expansion, command substitution, and
        arithmetic expansion. [...]

No mention of field splitting there (but field splitting, in an
appropriate context, does happen to the results of the expansion in
which the expansion of word was used).   That is, the context matters.

Generally when the standard just uses "the expansion" or "word expansions"
it means "all of them", eg: back in 2.9.1, this time, point 2

        2. The words that are not variable assignments or redirections
           shall be expanded. [...]

Where that (and it is used in other places that way) means "apply all of the
expansions" - which does include field splitting.

As written, those words from 2.6.2 might be read to imply the same (and
I think I may have seen implementations, once, where it was) and that
field splitting should be performed.    If that happens, however, how one
would assign the result to parameter is completely unspecified; there
is nothing (at least in shells that don't implement arrays) which explains
in any way how to assign multiple words (fields) to a variable.   That's
because it simply isn't done.

So, while it is not strictly necessary, it might be an idea to add
something to "the expansion of word" (in all of the cases) to make it
more like "the expansion of word as indicated earlier in this section"
or something - just to avoid others not noticing what the previous
page said.


All of this stuff (in my previous message) was just to indicate that the two
bullet points in the description of $@ can be deleted as they add nothing
(or could be made to add nothing) useful to the definition.

As currently stated, the 2nd one allows only ${param-word} and ${param+word}
(explicitly) and makes all others unspecified (if $@ is used in word),
so I needed to show that those two are the only two where the word appears
in a context where field splitting happens.   Having done that all we need
is to say that $@ is unspecified when in a context where field splitting
does not happen.   (Field splitting does eventually happen to the "word"
in ${var+word} (or with '-' instead of '+') if it is used (otherwise it isn't
expanded, and whether an expansion would be specified or not is irrelevant),
whereas it never does in the other cases: in ${var=word} it is the expansion
of var, after a value has been assigned to it, that is field split, not the
expansion of word.)

The only one where there's even a possibility of doubt about that is the '='
operator case - hence I gave the (undisputed really) example to show that
field splitting does not happen to the expansion of word in ${param=word}



  | I don't like this rearrangement because it is a completely different
  | structure from the one for '*'.

The text for $* would need some changes as well.    Certainly I agree that
having them aligned is a good thing, as they are so similar, and it makes
it much easier to spot what the differences between the two are if they are
written using similar language, and in a similar order.

But I don't think the differences would end up being quite as big as you
imagine - though the way the different parts would be shown would differ from
the way it is done now (see more below).   I do agree however that the way I
actually explained what I imagine the result would be, made that hard to see.

In particular, I'd start with explaining the cases where $@ is unspecified
before continuing to the "how to do it" (when the result is specified, of 
course) where the $@ and $* texts can be more aligned, rather than, as now,
starting the $@ text in the same way the $* text starts, and then adding all
those "except when" clauses throughout $@ - which is what makes it hard to
read (IMO).   $* is always specified (can be used anywhere) so there's no
need to include anything similar in that one, rather it just has two
different methods of being expanded, whereas $@ has (really) only one.



  | It also has numerous problems, which I started to write up but decided
[...]

That's entirely possible.   Since I didn't write up (even for me) what
the actual final text would be, I didn't get a chance to make sure it was
all complete and correct - and we should probably wait before getting
bogged down in those kinds of details until it is determined whether or
not the general approach is one worth pursuing.

You may have missed it, since it was way down near the end, but my
message did indicate that I knew this was all something that needed to
get buy-in to be accepted (altering things the way I suggested), but
my message did include ...

        If the text is not altered (something like the above,
        or with a similar intent) then we at least need to define
        what this mention of "embedded within" actually means, [...]

That is, I was not assuming that the end result would necessarily be
the way I suggested.   I am still not, though I would prefer it if that
happened, as I believe the result will be better.


Now, back to that:

  | That may be how it works in your mind, but it's not what the standard
  | says.

No, I knew that.   A part of my message that you didn't quote in your
reply was:

        This would be helped if we weren't slightly mixed up about what
        it means to be a "context where field splitting will be performed"
        - we already got rid of the assumption that when IFS='' we no longer
        have such a context, we also need to get rid of the assumption that
        when a word is quoted we are not in a context where field splitting
        will be performed.

Maybe that wasn't well written, but what I meant was that we should be
changing the way we do things - simplifying it - throughout the standard
(I think it will benefit several places - more below - and not just the $@
description.)

Note this is all intended to be editorial (perhaps except the "${0+$@}"
case) - just change the way that we specify what is already specified, or
at least not to change what we are intending to specify, just make it
easier for readers to ascertain what that is.

  | The description of field splitting explicitly says it is only
  | performed for expansions that are not in double quotes ("the shell shall
  | scan the results of expansions and substitutions that did not occur in
  | double-quotes for field splitting and multiple fields can result").

Yes, it does, and that is just fine - and in fact, that is exactly what
I am relying upon.

The same section also says

        2. If the value of IFS is null, no field splitting shall be performed.

but we do not make $@ expansions unspecified if IFS is null (more on
that below.)   The way that is done (the language required) is much of
what makes the $@ description hard to read.


     [  Aside: Note that in all of this, I am taking the quoted text
        from TC2 (vers 181) even though I know that in some cases there
        have been some later interpretations which will alter the text
        for TC3 when it appears - for right now I'm just too lazy to go
        find all the relevant bits (it would be **really** nice if
        there was a version available that was kept up to date as each
        interpretation is approved - I think I've said that before!)

        Except perhaps in one case (mentioned below) I don't think
        that any of the changes already agreed alter anything that is
        significant to any of the points I am making (eg: I quote the
        text from case patterns, which does not have quote removal included,
        whereas we have an interpretation where that's been added - though
        hopefully it will go away again ... but quote removal is irrelevant
        to the point, so this difference doesn't matter).
     ]


The point I was making is that we should clearly separate, in our minds,
as well as in the text, the difference between

        in a context where field splitting is performed
from
        field splitting is performed


"A context where field splitting is performed" is where the text just
says "the word expansions" (or similar), as in the quote above from
2.9.1 point 2 (all the words in a command), or where it explicitly states
that field splitting is performed, if there are any such occurrences.

There are other examples, such as in the definition of "for" (XCU 2.9.4.2):

        First, the list of words following in shall be expanded to
        generate a list of items.

The "shall be expanded" means that (each word) is subject to all of
the default expansions (specified in the prelude of 2.6) - this is
a context where field splitting is performed, and means that both

        for var in $@
and
        for var in "$@"

are uses of $@ where the result is specified (which is a good thing,
as 'in "$@"' is explicitly the default when no "in" is given).
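
As a small sketch of those two forms (the counter names are illustrative
only), both loops have a specified result, and the form with no "in"
behaves like the quoted one:

```shell
IFS=' '
set -- 'a b' c          # two positional parameters

quoted=0
for var in "$@"; do quoted=$((quoted + 1)); done        # one item per parameter

unquoted=0
for var in $@; do unquoted=$((unquoted + 1)); done      # 'a b' is split further

implicit=0
for var
do
    implicit=$((implicit + 1))                          # same as: in "$@"
done

printf '%s %s %s\n' "$quoted" "$unquoted" "$implicit"
unset IFS
```

With a space in IFS the counts should be 2, 3 and 2 respectively.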

On the other hand, there are places in the spec where things are
different, as the quote above from 2.9.1 point 4 (variable assignment
expansions), but also in case statements (2.9.4.3) which sets out:

        [...] that is matched by the string resulting from the tilde
        expansion, parameter expansion, command substitution, arithmetic
        expansion, and quote removal of the given word.

No mention of field splitting there.    And:

        [...] each pattern that labels a compound-list shall be subjected
        to tilde expansion, parameter expansion, command substitution, and
        arithmetic expansion, and [...]

Again, no mention of field splitting.   In none of these cases does it
matter whether or not anything is quoted (nor what the value of IFS might
happen to be) - the expansions do not occur in a context where field
splitting occurs, so if we use $@ in any of them, we get unspecified results.

That is, none of these
        VAR=$@
        VAR="$@"
        case $@ in .... esac
        case "$@" in .... esac
        case word in $@) ... ;; esac
        case word in "$@") ...;; esac
has any specified result.
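
That case statements are not a context where field splitting occurs can
be observed with an ordinary variable.  This sketch deliberately avoids
$@, since its result there is unspecified, and the names w, pat and the
two result variables are illustrative only:

```shell
IFS=' '
w='a   b'               # three spaces

case $w in              # the word is not field split, quoted or not
    'a   b') word_result=unsplit ;;
    *)       word_result=other ;;
esac

pat='a   b'
case 'a   b' in
    $pat) pattern_result=unsplit ;;     # nor is the expanded pattern
    *)    pattern_result=other ;;
esac

printf '%s %s\n' "$word_result" "$pattern_result"
unset IFS
```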

The same applies in 2.7 (redirection operators) ...

        For the other redirection operators [other than << and <<-],
        the word that follows the redirection operator shall be subjected
        to tilde expansion, parameter expansion, command substitution,
        arithmetic expansion, and quote removal. [...]

The immediately following elided text discusses pathname expansion, and
what that might mean (not relevant here).   But there is no field splitting.

        > $@
and
        > "$@"

are always unspecified.   Note however that >$* is not unspecified, and
has a perfectly well defined meaning (quite unlikely to be useful, but
defined anyway.)

It is this distinction that I would like to have the words "in a context
where field splitting is performed" mean.  Not for it to mean "when field
splitting is performed" (or "will be performed") or "field splitting scans
the text" or "field splitting splits the field", any of which can easily be
stated, if required, in exactly those words (or similar ones).


Of course, we could also just coin (and define) a new phrase to mean what
I intend here - though I think that would actually require more changes,
rather than fewer, and would probably end up with a result in which the
words "in a context where field splitting occurs" (or does not occur) would
all vanish, and be replaced by the new phrase.  While that would (in the long
run) work just as well, and if we were to find a particularly apt phrase
to define and use, perhaps even better, I am not sure that it would be
worth the effort.


And yes, I am aware this is going to be a change, and that there will be
a few consequential changes needed in other places.   I'm willing to look
through the text (and even to try and find existing approved, or accepted,
unapplied interpretations) to find anything that will be affected - but I
am not about to do that unless we get some general agreement that doing
what I would like to do is a good idea.


Now for a little more convincing that it will help...

First, note in the prelude to 2.6, the order of the word expansions,
number 2...

        2. Field splitting (see Section 2.6.5) shall be performed on
           the portions of the fields generated by step 1, unless IFS is null.

This is where I think there's an interpretation already approved that
deletes the final "unless IFS is null", as that's not needed (or useful).
If that had not already been done, we would need to do it, as otherwise
IFS=''; echo "$@" would result in unspecified behaviour - as IFS being
null would make it a context where field splitting is not performed,
and consequently, "$@" does not have a specified expansion.  But I suspect
that argument (or one like it) was already made, as I do recall that
"unless IFS is null" has gone already.

But why I quoted this here is not for that, it is for what is not there.

Note it does not say:

        2. Field splitting (see Section 2.6.5) shall be performed on
           the unquoted portions of the fields generated by step 1...

There is nothing about quoted or not quoted there at all.   Field splitting
is performed in either case.   This is a context where field splitting
is performed.

Then when we get into 2.6.5 we see the text that you quoted, and I retained,
above, from 2.6.5, which says that field splitting does not scan quoted text.

All this is good - it is exactly as it ought to be.   The context (for normal
word expansions) is one where field splitting is performed, but when quoted,
text is not actually split.   Surely the distinction is not hard to grasp?

Further, this is actually important, as in an expansion like

        ${0+"a b" "c d"}

we end up with text, which both is, and is not, subject to field splitting,
all in the expansion of a single word.   The quoted strings do not get
field split, but the unquoted space between them does (assuming IFS includes
a space of course.)   So from this word we get two fields.   Not 1, not 4.
[And what is more, *every* shell I tested agrees with that.]

On the other hand, in
        X=${0+"a b" "c d"}
what gets assigned to X is
        "a b c d"
as that is a context where field splitting does not happen, the unquoted
space is simply an unquoted space.   [Again, everyone agrees.]
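
The two cases can be put side by side in a short sketch.  Here is_set is
a hypothetical stand-in for $0 (any set variable will do), and the other
names are illustrative only:

```shell
IFS=' '
is_set=1                        # stand-in for $0, which is always set

set -- ${is_set+"a b" "c d"}    # context where field splitting is performed
nfields=$#
first=$1
second=$2

X=${is_set+"a b" "c d"}         # assignment: no field splitting

printf '%s [%s] [%s] [%s]\n' "$nfields" "$first" "$second" "$X"
unset IFS
```

The first form should give two fields, "a b" and "c d", and the second
should leave the single string "a b c d" in X.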

What happens to be quoted, or not quoted, is irrelevant to determining
whether we are in a context where field splitting happens - though it
certainly affects the results (if we are in a context where field splitting
happens).

Personally I consider this to be a very important distinction.

Second (to refer to something from above again), it gets rid of the
possible interpretation that these words from 2.6.5 might otherwise
imply...

        2. If the value of IFS is null, no field splitting shall be performed.

If that is interpreted as meaning that when IFS is null, we have a
context where field splitting is not performed, then we need to look
at the wording in the current definition of $@ (2.5.2)

        · Field splitting as described in Section 2.6.5 would be performed
          if the expansion were not within double-quotes

If IFS=NULL means "field splitting would not be performed" (as 2.6.5
says it does) then this is false when IFS is NULL, as field splitting would
not be performed (regardless of double quotes).

I know this bullet point goes on to say:

         (regardless of whether field splitting would have any effect;
         for example, if IFS is null).

But "have no effect" is not what 2.6.5 says, having no effect would be
for example

                set -- abc
                IFS=:
                echo "$@"

in that, field splitting "would have no effect", but would still be
performed, were the "$@" not in double quotes (exactly what the quote
from the bullet point from 2.6.5 says) so that is a specified case.

But when we instead have
                IFS=''
then (according to 2.6.5) field splitting is not performed, not just
has no effect.

I know the intent of the "for example, if IFS is null" is intended to
make it clear that "$@" is still specified when IFS='' but I am not sure
that it is technically sufficient to achieve that effect, whatever the
obvious intent was.
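
For reference, what shells actually do in the two situations can be
checked with a sketch like this one, where count() and the variable names
are illustrative helpers only.  In both cases "$@" produces one field per
positional parameter:

```shell
count() { n=$#; }       # records how many arguments it was given

set -- 'a b' c
IFS=''                  # null IFS: 2.6.5 says no field splitting is performed
count "$@"
null_ifs=$n

set -- abc
IFS=:                   # splitting is possible, but would have no effect here
count "$@"
colon_ifs=$n

unset IFS
printf '%s %s\n' "$null_ifs" "$colon_ifs"
```

This only shows the behaviour; it says nothing about whether the current
wording technically requires it, which is the question raised above.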

On the other hand, if we were to adopt my definition of what "in a context
where field splitting is performed" means, then all this kind of problem
simply goes away.

We just say "Expanding @ is unspecified if not in a context where field
splitting is performed" (which is what we actually want, in every case, as
far as I can tell ... that is, we do not request field splitting anywhere
where we require only a single field, and we do request it everywhere that
we can handle as many fields as happen to be produced - which is identical
to the cases when "$@" (or unquoted $@) has a sensible meaning).

Now I know that this does mean that the description of $* needs to be
changed from

        When the expansion occurs in a context where field splitting will
        be performed, any empty fields may be discarded and each of the
        non-empty fields shall be further split as described in Section 2.6.5.
        When the expansion occurs in a context where field splitting will
        not be performed, the initial fields shall be joined to form a [...]

into something like (see later for an alternative)

        When the expansion occurs in a context where field splitting will
        be performed, and is not enclosed within double quotes, any
        empty fields ....   When the expansion occurs in a context where field
        splitting will not be performed, or when the '*' is included within
        double quotes, the initial fields shall be joined ...

That is one of the consequential changes mentioned above ... though I would
not do it that way, instead I would prefix the entire $* section with:

        When the expansion occurs in a context where field splitting does
        not occur, it will be treated, solely for the purpose of generating
        the value, as if it had been enclosed within double quotes.

and then continue as originally specified, except using "when the expansion
occurs within double quotes" rather than "when the expansion occurs in a
context where field splitting will not be performed"

which would (kind of) align it with the similar wording I propose for the
lead in to the description of the @ special param

        When the expansion occurs in a context where field splitting is
        not performed, the result is unspecified.   Otherwise ...

and then continue with the description of how it is done, which just like
the $* expansion, varies depending upon whether or not double quotes are
present (though $@ varies a little less, it just has special, or unspecified,
cases.)
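
As a sketch of the joining behaviour that the double-quoted form of $*
already has in contexts without field splitting (the names joined and
matched are illustrative only):

```shell
set -- a b c
IFS=:                   # first character of IFS is the join separator

joined="$*"             # assignment: one field, parameters joined with ':'

case "$*" in            # case word: likewise a single joined string
    a:b:c) matched=yes ;;
    *)     matched=no ;;
esac

unset IFS
printf '%s %s\n' "$joined" "$matched"
```

The proposed prefix sentence would make the unquoted $* in such contexts
produce its value the same way; the sketch only exercises the quoted form.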


Note that doing that also allows us to get rid of the following
from 2.6.2:

If a parameter expansion occurs inside double-quotes:
    · Pathname expansion shall not be performed on the results of the expansion.
    · Field splitting shall not be performed on the results of the expansion.

None of that is needed, I'm not even sure that the first bullet point
makes any sense at all, pathname expansion (unlike field splitting) is
not performed upon "the results of the expansion", rather it is performed
upon the field that results, whether it came from an expansion, wholly or
partially, or not.

That is, if we have

        var=.c

and then do

        echo *"${var}"

we do pathname expansion on *".c" (and since neither '.' nor 'c' is
special to pathname expansion, that is the same as *.c ... all files
that end in ".c")

Exactly the same happens if we had instead had

        var=.?    (or if you prefer, var='.?' just for extra clarity, there
                   is no difference).

The *"${var}" expands to all files with names that end (literally) with ".?",
we have (in both cases) performed pathname expansion using the results
of an expansion which occurred within double quotes.   What the quotes
did, was not to prevent pathname expansion from happening, but to prevent the
'?' being interpreted as "match any character" but instead be "match a
question mark".

The definition of pathname expansion needs to be able to handle this
already, and does, as it also has to cope with cases where there is no
parameter expansion, like

        echo *'.?'
or
        echo *.\?

so that first bullet point is, I believe, nonsense, and certainly not
needed.
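
A sketch of this, run in a scratch directory.  It assumes that mktemp -d
is available, and the file names and variable names are illustrative only:

```shell
dir=$(mktemp -d) || exit 1
: > "$dir/a.c"
: > "$dir/b.x"
: > "$dir/b.?"                  # a file whose name ends in a literal '?'

var='.?'

set -- "$dir"/*"$var"           # quoted: '?' only matches a question mark
quoted_count=$#
quoted_match=${1#"$dir"/}

set -- "$dir"/*$var             # unquoted: '?' matches any one character
unquoted_count=$#

rm -rf "$dir"
printf '%s %s %s\n' "$quoted_count" "$quoted_match" "$unquoted_count"
```

Pathname expansion happens in both cases.  The quoted form should find
only "b.?", while the unquoted form should find all three files.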

The second is simply not needed, as whether we perform field splitting
or not, 2.6.5 (in the section you quoted above) says quite clearly that
field splitting only scans the results of expansions that did not occur
in double quotes ... we do not need to state it again in the case of
variable expansions -- we do not for arithmetic expansions, eg, it is
clear that in
        IFS=5; printf %s\\n $(( 108 + 46 ))
printf writes out two lines, "1" and "4", (or should, old pdksh does not)
whereas:
        IFS=5; printf %s\\n "$(( 108 + 46 ))"
writes a single line, "154" (which bosh gets wrong, after getting the
harder one correct!) - yet the description of arithmetic expansion
does not, and does not need to, say:

        If an arithmetic expansion occurs inside double quotes:
         . Field splitting shall not be performed on the results of
           the expansion

Nor does command substitution.  Nor does it need to.   2.6.5 already covers
all of that.
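
The command substitution equivalent, as a short sketch with illustrative
variable names:

```shell
IFS=:

set -- $(printf 'a:b')          # unquoted: the result is scanned and split
unquoted=$#

set -- "$(printf 'a:b')"        # quoted: the result is not scanned
quoted=$#

unset IFS
printf '%s %s\n' "$unquoted" "$quoted"
```

This should report 2 fields for the unquoted form and 1 for the quoted
one, with nothing beyond 2.6.5 needed to explain the difference.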


Next, the current wording, and its interpretation, for @ (from 2.5.2) says ...

        . When the expansion occurs in a context where field
          splitting will be performed,

which, if that is intended to mean "not within double quotes" is,
in a sense, requiring the shell to predict the future.   True, it
is a prediction that can be accurately made, as we are not depending
upon some random event to decide what "will" happen (later), but
simply specifying something that depends upon what future events will
occur seems like a poor method to me.

If it were changed as I wrote above:

        When the expansion occurs in a context where field splitting is
        not performed, the result is unspecified.   Otherwise ...

then we have none of that, we know what context we're in - eg: we know if
we're expanding the word in a case statement, or one of the words in a
for statement, when we do the expansion.  We know what context we are
currently in, and there is none of this "will be performed" stuff to deal
with.

  | In any case, I think it is worth keeping the clarification in the
  | descriptions of '@' and '*' that field splitting is applied to each of
  | the positional-parameter-derived initial fields individually.

Fine, I have no problem with that.    I am not sure that it makes a lot
of difference, as I cannot even imagine how anything different could
possibly happen (we have created multiple fields, in the case where it
matters).   Field splitting never joins things together: it can divide
fields into multiple (that's the splitting) and in some cases can make
fields vanish completely, but it never joins anything together.   But I
have no difficulty reinforcing the point.   (Note that in my previous
message, and even more in this one, I have not attempted to give the
complete text - there is stuff that is there now which does not need
alterations, those parts I just skipped.)


  | I think the way forward is for you to look at the description of '*'
  | and decide if you think it needs to change.  Once we have text for '*'
  | that we are both happy with, we can think about how to describe '@' in
  | a way that maintains the correspondence between the two descriptions.

Actually, I think that's backwards.   $@ is the more complex case.  We should
(IMO) get that one correct first, so it is easy to follow, and gets all the
corner cases (including some which are to be explicitly unspecified) correct,
and then modify the description of $* (which is really much simpler) to
match, so the differences and similarities with $@ are easy to detect.

For now, I am going to wait to see if you (or anyone) agrees with my
approach ... if not, then we need to find some other way to make it clear
that "${0+$@}" is one of the cases where, when $# = 0, it is unspecified
whether we get "" or nothing.

If we do get some agreement, then I will open an issue (or do we call it
submitting a defect report, or whatever ... fill in the mantis form anyway)
and in that, attempt to actually list the changes that will be needed to
make this happen.

kre



