    Date:        Sat, 14 Oct 2023 14:46:12 +1000
    From:        Martin D Kealey <mar...@kurahaupo.gen.nz>
    Message-ID:  <CAN_U6MX60UN+wfNpHU1pzQRzCaHgt_c+N=a2+3nnknhm5a+...@mail.gmail.com>


  | Back when I used the Bourne Shell we didn't have `local`, so we used to
  | write `var= func` to make sure that `func` couldn't mess with *our* `var`.

If you were using the original Bourne shell you couldn't have done that,
as it had no functions either.   The changes made to it beyond that were
very often badly designed or just plain broken, and in POSIX (because of
the way it worked in some shells) that behaviour was prohibited,
     VAR=foo func
was required to set VAR=foo in the shell environment (unless func altered it).

That requirement has only relatively recently been changed - changed because
it violated another POSIX requirement, that being that (ignoring execution
speed, etc) it should not be possible to tell the difference between a utility
implemented as a function and one implemented as a file system command
(that is, if the function sets out to implement the same thing as the external
utility - which of course precludes it from making any changes to the shell
environment).

Even now it is unspecified what happens:

This is from XCU 2.9.1.2 (in the latest available Issue 8 draft, but I think
it is the same text in the current issued standard (Issue 7 + TCs))

  * If the command name is a function that is not a standard utility
    implemented as a function, variable assignments shall affect the current
    execution environment during the execution of the function.

[Aside: "standard utilities implemented as functions" are required to
act identically to the external utility, but they aren't the issue here]

    It is unspecified:

      -- Whether or not the variable assignments persist after the completion
         of the function

[ie: your trick is not guaranteed to work]

      -- Whether or not the variables gain the export attribute during the
         execution of the function

[ie: such a variable isn't even guaranteed to be exported]

      -- Whether or not export attributes gained as a result of the
         variable assignments persist after the completion of the function
         (if variable assignments persist after the completion of the function)

[ie: it is possible that a variable that wasn't exported before being
used as "VAR=val func" might now be exported]

A good implementation will revert the value if the func doesn't alter it,
and will put it in the environment during the lifetime of the function, but
none of that is guaranteed.   At least now that is permitted, rather than
prohibited, which it used to be.
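
A minimal probe of the above, assuming some "sh" is on PATH to act as the
child process; the names probe, seen_inside and exported_inside are made up
for illustration.   Only the first line of output is fixed by the text quoted
above, the other two are exactly the parts that are unspecified, so they can
differ from one shell to the next.

```shell
# Record what the function sees, and what a child process started from
# inside the function sees in its environment.
probe() {
    seen_inside=$VAR
    exported_inside=$(sh -c 'echo "${VAR-unset}"')
}

unset VAR
VAR=init          # set, but not exported
VAR=val probe

echo "inside func:        VAR=$seen_inside"       # required to be "val"
echo "child env in func:  VAR=$exported_inside"   # unspecified (export attribute)
echo "after func returns: VAR=$VAR"               # unspecified (persistence)
```

A shell that does what I'd call the right thing reports "val" for the child,
and "init" after the function returns.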

There is nothing here anywhere that permits an implementation to make
an assignment to a variable within a function fail to persist
when the function terminates - regardless of whether the variable was
named in a var-assign that precedes the command name (the command being
a function).    Of course if some non standard feature is used (like for
example, "local") then all bets are off, and whatever happens depends upon
what the shell defines to happen.

However in the case of a special built-in utility (which "." is) then
the requirements are much stricter:

  * If the command name is a special built-in utility, variable assignments
    shall affect the current execution environment before the utility is
    executed and remain in effect when the command completes; if an assigned
    variable is further modified by the utility, the modifications made by
    the utility shall persist. Unless the set -a option is on (see set),
    it is unspecified:

      -- Whether or not the variables gain the export attribute during the
         execution of the special built-in utility

      -- Whether or not export attributes gained as a result of the variable
         assignments persist after the completion of the special built-in
         utility

That is, in the case of "VAR=val . script" (which is what the OP was doing;
there were no functions involved) POSIX actually requires that VAR=val be
done before the utility is invoked (and never undone) and that if the script
modifies VAR, that modification remain after it has completed.

(It is just unspecified whether anything gets exported by this, and if it
does, whether that attribute remains after the script ends).

Note that some of that text is new in Issue 8, to deal with making it
clear what happens if a script does "X=Y unset X" where previously it
might have seemed permitted for X to remain set after that command (to Y
or whatever value it had before) - now it is (will be) clear that is not
permitted, and X must be unset after that command completes.   Similarly
"X=foo export X" must result in X being exported with value "foo" when
that command completes.   "." is no different (conceptually) than those.
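
A rough way to see what a given shell does with these, assuming a writable
temp directory and some "sh" on PATH; the scratch file name and the variable
seen_in_script are invented for the example.   Only the first line of output
is the same everywhere.   The standard says the other two should be "changed"
and "foo", but a shell that isn't running in its POSIX mode (bash by default,
for example) may not agree.

```shell
# Scratch file standing in for "script" (hypothetical name).
tmp=${TMPDIR:-/tmp}/dot_probe.$$
cat > "$tmp" <<'EOF'
seen_in_script=$VAR
VAR=changed
EOF

unset VAR X
VAR=val . "$tmp"      # "." is a special built-in
X=foo export X        # so is "export"

echo "inside script: VAR=$seen_in_script"
echo "after '.':     VAR=${VAR-unset}"
echo "after export:  X=${X-unset}, child sees X=$(sh -c 'echo "${X-unset}"')"
rm -f "$tmp"
```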


  | Given that "put in the environment" actually means "create a shell variable
  | and mark it as exported",

That's an implementation detail, it doesn't require that at all for
external utilities, all that is required is that there be an entry in
the environment when the utility is invoked.   Not creating a sh variable
makes it much simpler to implement not changing the attributes or value of
an existing variable of the same name.

How that is simulated in the case of a function or a '.' command is entirely
up to the implementation, but it should (in a good implementation) be done
in a way that is effectively the same as what happens for an external
utility, even if the standard doesn't require it (just because so many
implementations are defective.)

  | it's difficult to see how "only put into the
  | environment but don't make it a local variable" could work without making
  | the semantics even more contorted and confusing.

The semantics are trivial - "unset VAR; VAR=init; VAR=val func" causes VAR
to have the value "val" when func starts, and be exported during func.
If func doesn't alter the value of VAR, then it reverts to "init" as its value
when func ends, and if func did not explicitly do "export VAR" its exported
status reverts to what it was before as well, which here means not exported,
that was the point of the "unset VAR" being there.   If func does change VAR's
value, whatever it is changed to persists when func terminates (just as it
would if the invocation had been "export VAR=val; func") and if func
explicitly does "export VAR" then the export attribute persists on VAR 
(whatever value it ends up having) as well.

POSIX doesn't require all of that, but that's what an implementation should
really be setting out to achieve.

And since "local" has been in bash for many decades now, I don't
really think anyone needs to be concerned with anything using the
implementation quirk that "FOO= func" happens to make FOO act just
like a local variable in func any more.   "local FOO" (or one of the
alternate bash commands that achieves the same effect) is a much
better thing to use (and doesn't result in FOO being potentially
exported in cases when there was no need for that).

  | It seems to me that what's needed is a new model for variables,

It would be nice if any two implementations could actually agree on a
model for variables - that's been the sticking point that has prevented
standardisation of "local" (since all shells have that now - it just operates
differently depending upon how they see variables working).

I have a very simple model.  All sh has are global vars.   That's exactly
as Bourne designed it (and ideal for an interactive command interpreter, if
less so for a programming language - but sh is first and foremost a command
interpreter).   The "local" command just saves the value and attributes of
the variables named, and arranges for those to be restored when the function
ceases to be active.   Meanwhile, everything carries on as if all that exists
are global vars.
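
As a sketch only (this is not how any particular shell implements it), that
model can be imitated by hand using nothing but global variables.   The names
demo, saved_set, saved_val and result are invented for the example, and it
saves only the value, not the attributes, which a real "local" would also
need to restore.

```shell
demo() {
    # "local VAR", by hand: remember whether VAR was set, and its value.
    saved_set=${VAR+yes}
    saved_val=${VAR-}

    VAR=inside            # from here on VAR is just the (one) global VAR
    result=$VAR

    # Function is about to cease being active: put things back.
    if [ "$saved_set" = yes ]; then
        VAR=$saved_val
    else
        unset VAR
    fi
}

VAR=outer
demo
echo "result=$result VAR=$VAR"
```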

Models that attempt to make truly scoped vars get very messy when they
try to explain how things like:

        func()
        {
                local IFS=:
                read a b c < file
        }

are supposed to work (but making that work is essential).   read (which is not
defined within the scope of func) uses IFS to split the line read from file
amongst the variables.  Which IFS?   (In my model this is simple, there is
only one IFS.)
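
The same function made runnable, assuming a shell that has "local" (it is
not in POSIX, but bash, dash and most others provide it) and a writable temp
directory; the scratch file name and old_ifs are made up for the example.

```shell
# Scratch input file (hypothetical name) holding one colon separated line.
tmp=${TMPDIR:-/tmp}/ifs_demo.$$
printf 'x:y:z\n' > "$tmp"

func()
{
        local IFS=:
        read a b c < "$tmp"     # read splits using the IFS set just above
}

old_ifs=$IFS
func
echo "a=$a b=$b c=$c"
[ "$IFS" = "$old_ifs" ] && echo "IFS is back to what it was"
rm -f "$tmp"
```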

Making things be "dynamically scoped" (which is not really that much different
than my model in operation) helps with that, but leads to things like:

  | where the entire scope chain can be inspected and modified where necessary,

which is where things start getting exceedingly messy and complicated.

Far more complicated than a command interpreter needs - if you're looking
for a language in which to write good programs, pick something different,
sh's scripting ability is intended to allow the user to save command
sequences they use a lot, and not have to re-enter them all the time, with
enough scope for variability in what happens to depend upon the args.

Attempting to force it to be a general purpose programming language,
suitable to use for everything is what leads to giant messes (perl used
to be a nice combination sh/sed/awk language initially, until people
forced it to be able to do everything - now it is a cess pool).

kre

ps: I am not including the @gmail.com addr on any of these messages, as
gmail bounces all mail from me (so it would be pointless anyway) - and
my general belief is that anyone who uses gmail doesn't really deserve to
receive any e-mail at all.


