Re: IFS field splitting doesn't conform with POSIX

Robert Elz Sat, 01 Apr 2023 19:41:12 -0700

    Date:        Sat, 1 Apr 2023 18:49:56 -0600
    From:        Felipe Contreras <felipe.contre...@gmail.com>
    Message-ID:  <camp44s1nv0+4r34_+4zyocvg+81subm_-nr0pphi1b52vzh...@mail.gmail.com>


  | Fortunately kre did listen.

Not really.    I agree that what POSIX currently says is not correct,
which is why the defect report got filed (you may have noticed that there
was no new wording proposed there - and still isn't - which is because
this is very hard to get correct, other than possibly by simply giving
the code that should be executed (no, that won't happen)).

But the others are correct: POSIX (in general) standardises what shells
actually do.   You can see this if you read a few pages; all kinds of
things lead to unspecified (or worse, undefined) behaviour.   That's
because different implementations do different things in those cases.
It is not because some specific behaviour could not be required, nor even
because requiring it would not be better all around.   Implementations
simply don't do the same thing in those cases, and so users cannot rely
upon anything particular happening (sometimes behaviour is unspecified,
but only between a limited number of choices).

The standard has two purposes - one is to allow application writers
(users) to work out what they can expect to work, and what they should not
do if they expect code to be portable.   The other is so implementors
of new implementations (of the shell, or anything else included)
know what to implement (and where they can do things differently).

You're right, when the standard uses "shall" it is being prescriptive,
and implementations must do that if they want to claim to conform.
But the standard only does that when the existing (at least major, and
intending to conform) implementations, at the time the standard is
written, actually do what is proposed to be required by a "shall".

There are odd occasions (such as the read errors in scripts) where something
that (almost all) implementations do is so obviously the wrong thing to
do, that the standard requires implementations to change, but if you
looked at that issue, and I believe you did, that was only done after
checking with implementors to see if they were willing to make the
change.

In this case, the standard will certainly end up saying that IFS
characters (both white space and others - there are differences in
how they work, but not in this regard) terminate fields, and
if there is nothing after the final IFS character (or characters,
in the case of IFS whitespace), then there is no additional field,
and if there is something there, then that makes an additional field,
even if there is no IFS terminator following it.   That's because that's
what all (or essentially all) shells do, and always have done (for
almost 45 years now).

That is, if we have "IFS=," then both a,b,c and a,b,c,
produce 3 fields "a" "b" and "c".
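
A minimal sketch of that check, for anyone who wants to try it in their
own shell.   count_fields is a hypothetical helper written for this
illustration (it is not part of any shell or of POSIX); it sets IFS in a
subshell, lets an unquoted expansion be field split, and prints how many
fields resulted.

```shell
#!/bin/sh
# count_fields IFS-VALUE STRING
# Hypothetical helper: prints the number of fields STRING splits into
# when IFS is set to IFS-VALUE.  Runs in a subshell so the caller's
# IFS, positional parameters and options are left untouched.
count_fields() {
    (
        IFS=$1
        set -f          # no pathname expansion on the split words
        set -- $2       # unquoted on purpose: this is the field splitting
        echo "$#"
    )
}

count_fields , 'a,b,c'      # no terminator after the final field
count_fields , 'a,b,c,'     # trailing comma terminates "c", adds nothing
```

With the behaviour described above, both calls print 3.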

On the other hand, the standard is likely to say that whether
characters other than space/tab/newline which are white space according
to the definition of that term in the standard, can be IFS white
space, is unspecified - because shell implementations are split
about that (about 60/40 for "no" - even though the standard currently
seems to say "yes").   That is unless shell implementors can be persuaded
to change their implementations, which in this case is probably unlikely
(as no-one can be sure that there aren't scripts around which rely
upon their current behaviour - no-one wants to break backward compat).
The effect will probably be that using any white space char in IFS, other
than the blessed 3, will make a script non-portable (might work with one
shell, and not another).
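
A rough probe for that difference, assuming vertical tab as the example
"other" white space character (my choice for illustration, any similar
character would do).   probe_vt is a hypothetical name.   If the shell
treats the character as IFS white space, the doubled delimiter collapses
and two fields result; if it treats it as an ordinary IFS character, an
empty field appears between the two delimiters and three fields result.
Which number you get depends on the shell, which is the point.

```shell
#!/bin/sh
# probe_vt: hypothetical helper.  Prints 2 if this shell treats a
# vertical tab in IFS as IFS white space, 3 if it treats it as an
# ordinary (non white space) IFS character.
probe_vt() {
    (
        vt=$(printf '\v')       # a single vertical tab character
        s="a${vt}${vt}b"        # two adjacent delimiters between a and b
        IFS=$vt
        set -f
        set -- $s               # unquoted on purpose: field splitting
        echo "$#"
    )
}

probe_vt
```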

kre
