Date: Sat, 14 Oct 2023 14:46:12 +1000
From: Martin D Kealey <mar...@kurahaupo.gen.nz>
Message-ID: <CAN_U6MX60UN+wfNpHU1pzQRzCaHgt_c+N=a2+3nnknhm5a+...@mail.gmail.com>

| Back when I used the Bourne Shell we didn't have `local`, so we used to
| write `var= func` to make sure that `func` couldn't mess with *our* `var`.

If you were using the original Bourne shell you couldn't have done that,
as it had no functions either.

The changes made to it beyond that were very often badly designed or just
plain broken, and in POSIX (because of the way it worked in some shells)
that behaviour was prohibited: VAR=foo func was required to set VAR=foo
in the shell environment (unless func altered it).

That requirement has only relatively recently been changed - changed
because it violated another POSIX requirement, that being that (ignoring
execution speed, etc) it should not be possible to tell the difference
between a utility implemented as a function and one implemented as a
file system command (that is, if the function sets out to implement the
same thing as the external utility - which of course precludes it from
making any changes to the shell environment).

Even now it is unspecified what happens.

This is from XCU 2.9.1.2 (in the latest available Issue 8 draft, but I
think it is the same text in the current issued standard (Issue 7 + TCs)):

  * If the command name is a function that is not a standard utility
    implemented as a function, variable assignments shall affect the
    current execution environment during the execution of the function.

[Aside: "standard utilities implemented as functions" are required to
act identically to the external utility, but they aren't the issue here]

    It is unspecified:

    -- Whether or not the variable assignments persist after the
       completion of the function

[ie: your trick is not guaranteed to work]

    -- Whether or not the variables gain the export attribute during
       the execution of the function

[ie: such a variable isn't even guaranteed to be exported]

    -- Whether or not export attributes gained as a result of the
       variable assignments persist after the completion of the
       function (if variable assignments persist after the completion
       of the function)

[ie: it is possible that a variable that wasn't exported before being
used as "VAR=val func" might now be exported]

A good implementation will revert the value if the func doesn't alter it,
and will put it in the environment during the lifetime of the function,
but none of that is guaranteed. At least now that is permitted, rather
than prohibited, which it used to be.

Nothing here permits an implementation to make an assignment to a variable
within a function fail to persist when the function terminates - regardless
of whether the variable was named in a var-assign that precedes the command
name (the command being a function). Of course if some non-standard
feature is used (like, for example, "local") then all bets are off, and
whatever happens depends upon what the shell defines to happen.

However, in the case of a special built-in utility (which "." is) the
requirements are much stricter:

  * If the command name is a special built-in utility, variable
    assignments shall affect the current execution environment before
    the utility is executed and remain in effect when the command
    completes; if an assigned variable is further modified by the
    utility, the modifications made by the utility shall persist.
    Unless the set -a option is on (see set), it is unspecified:

    -- Whether or not the variables gain the export attribute during
       the execution of the special built-in utility

    -- Whether or not export attributes gained as a result of the
       variable assignments persist after the completion of the
       special built-in utility

That is, in the case of "VAR=val . script" (which is what the OP was
doing; there were no functions involved) POSIX actually requires that
VAR=val be done before the utility is invoked (and never undone) and that
if the script modifies VAR, that modification remain after it has
completed. (It is just unspecified whether anything gets exported by
this, and if it does, whether that attribute remains after the script
ends.)

Note that some of that text is new in Issue 8, to make it clear what
happens if a script does "X=Y unset X", where previously it might have
seemed permitted for X to remain set after that command (to Y or whatever
value it had before) - now it is (will be) clear that is not permitted,
and X must be unset after that command completes. Similarly "X=foo
export X" must result in X being exported with value "foo" when that
command completes.

"." is no different (conceptually) than those.

| Given that "put in the environment" actually means "create a shell variable
| and mark it as exported",

That's an implementation detail; the standard doesn't require that at all
for external utilities. All that is required is that there be an entry in
the environment when the utility is invoked. Not creating a sh variable
makes it much simpler to implement not changing the attributes or value
of an existing variable of the same name.

How that is simulated in the case of a function or a '.' command is
entirely up to the implementation, but it should (in a good
implementation) be done in a way that is effectively the same as what
happens for an external utility, even if the standard doesn't require it
(just because so many implementations are defective).
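
For anyone who wants to see what their own shell does with the special
built-in cases, here is a minimal sketch. It is not from the standard:
probe_dot and probe_unset are names made up for illustration, and the
temp file location is an assumption. It only reports what it observes,
since the answer depends upon the shell (bash outside its POSIX mode,
for one, does not keep the assignment).

```shell
# Hypothetical probes: report how the running shell treats an assignment
# that precedes a special built-in. The function names and the temp file
# location are illustrative only.

probe_dot() {
	# Build a one-line script that records the value of VAR it sees.
	probe_file=${TMPDIR:-/tmp}/probe_dot.$$
	printf '%s\n' 'SEEN=$VAR' > "$probe_file"

	unset VAR SEEN
	VAR=val . "$probe_file"		# the "VAR=val . script" case
	rm -f "$probe_file"

	# POSIX requires VAR to still be "val" here; a shell that restores
	# the previous state instead will report <unset>.
	printf 'inside: %s, after: %s\n' "$SEEN" "${VAR-<unset>}"
}

probe_unset() {
	X=before
	X=Y unset X
	# Issue 8 makes it clear that X must be unset at this point.
	printf 'after X=Y unset X: %s\n' "${X-<unset>}"
}

probe_dot
probe_unset
```

Run it under each shell of interest (sh, dash, bash, bash --posix, ...)
and compare the "after" values with what the text quoted from the
standard requires.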

| it's difficult to see how "only put into the
| environment but don't make it a local variable" could work without making
| the semantics even more contorted and confusing.

The semantics are trivial - "unset VAR; VAR=init; VAR=val func" causes
VAR to have the value "val" when func starts, and be exported during func.

If func doesn't alter the value of VAR, then it reverts to "init" as its
value when func ends, and if func did not explicitly do "export VAR" its
exported status reverts to what it was before as well, which here means
not exported; that was the point of the "unset VAR" being there.

If func does change VAR's value, whatever it is changed to persists when
func terminates (just as it would if the invocation had been
"export VAR=val; func") and if func explicitly does "export VAR" then the
export attribute persists on VAR (whatever value it ends up having) as
well.

POSIX doesn't require all of that, but that's what an implementation
should really be setting out to achieve.

And since "local" has been in bash for many decades now, I don't really
think anyone needs to be concerned with anything using the implementation
quirk that "FOO= func" happens to make FOO act just like a local variable
in func any more. "local FOO" (or one of the alternate bash commands that
achieves the same effect) is a much better thing to use (and doesn't
result in FOO being potentially exported in cases when there was no need
for that).

| It seems to me that what's needed is a new model for variables,

It would be nice if any two implementations could actually agree on a
model for variables - that's been the sticking point that has prevented
standardisation of "local" (since all shells have that now - it just
operates differently depending upon how they see variables working).

I have a very simple model. All sh has are global vars.
That's exactly as Bourne designed it (and ideal for an interactive command
interpreter, if less so for a programming language - but sh is first and
foremost a command interpreter).

The "local" command just saves the value and attributes of the variables
named, and arranges for those to be restored when the function ceases to
be active. Meanwhile, everything carries on as if all that exists are
global vars.

Models that attempt to make truly scoped vars get very messy when they
try to explain how things like:

	func() {
		local IFS=:
		read a b c < file
	}

are supposed to work (but making that work is essential). read (which is
not defined within the scope of func) uses IFS to split the line read
from file amongst the variables. Which IFS? (In my model this is simple,
there is only one IFS.)

Making things be "dynamically scoped" (which is not really that much
different than my model in operation) helps with that, but leads to
things like:

| where the entire scope chain can be inspected and modified where necessary,

which is where things start getting exceedingly messy and complicated.
Far more complicated than a command interpreter needs - if you're looking
for a language in which to write good programs, pick something different.
sh's scripting ability is intended to allow the user to save command
sequences they use a lot, and not have to re-enter them all the time,
with enough scope for variability in what happens to depend upon the args.

Attempting to force it to be a general purpose programming language,
suitable to use for everything, is what leads to giant messes (perl used
to be a nice combination sh/sed/awk language initially, until people
forced it to be able to do everything - now it is a cesspool).

kre

ps: I am not including the @gmail.com addr on any of these messages, as
gmail bounces all mail from me (so it would be pointless anyway) - and my
general belief is that anyone who uses gmail doesn't really deserve to
receive any e-mail at all.