Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Martin D Kealey
On Sun, 15 Oct 2023, 02:03 Robert Elz wrote:

> Date:Sat, 14 Oct 2023 14:46:12 +1000
> From:Martin D Kealey 
> Message-ID:   a2+3nnknhm5a+...@mail.gmail.com>
>
>
>   | Back when I used the Bourne Shell we didn't have `local`, so we used to
>   | write `var= func` to make sure that `func` couldn't mess with *our* `var`.
>
> If you were using the original Bourne shell you couldn't have done that,
> as it had no functions either.


Fair point, it was just whatever was /bin/sh on Ultrix at the time. I was a
uni student so I don't even know what version of Ultrix we were using.

I take your point that the Shell (and especially Bash) has grown
Frankenfeatures way beyond a mere command interpreter, in ways that are
fundamentally irreconcilable.

But I don't think sticking to our guns about "let's go back to simple" is
the best way forward.

The one thing to be said for the Shell is that it's universal. If we kill
it, what will take its place? I already have to install Bash, Awk, Perl,
Python, and Node just to have a running system. How many more will be
needed after Bash finally dies?

If the Shell is left out in no man's land, with a shortfall in features so
it can't be a "real" programming language, but at the same time with the
crazy complexity for users to learn, we pretty much doom it to extinction.

If the Shell is truly a moribund legacy language, we should stop changing
it. No new features. No "bug fixes". No new safety guards.

Or we design a new language that feels more like a regular programming
language even if its syntax is weird. In my opinion it should have:
proper per-package feature selection;
proper lexically scoped variables & functions;
opt-in rather than opt-out globbing & word splitting;
opt-in rather than opt-out file descriptor inheritance;
strongly typed variables, with string/number/array/compound/filehandle
values;
distinguishable binary (octet-stream) and text (Unicode/utf-8), with
support for null bytes in strings, and a Cstring attribute to prohibit
assignments that include null bytes (because execve is so central to
everything);
support for AF_LOCAL sockets as bidirectional pipes;
exceptions separate from exit-status, with ability to enrol some but not
all commands for the set-e treatment.

Yes that's far too much work for one person; I do not expect Chet to do all
this, I expect there to be a governance team.


Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Martin D Kealey
On Sun, 15 Oct 2023, 03:15 Ti Strga wrote:

> On Fri, Oct 13, 2023 at 5:59 PM Grisha Levit wrote:
> > IMHO you'd be better off just putting a `{` line at the start and `}`
> > line at the end of your scripts
>
> The big weakness of the "{}" approach is that if a writer forgets to do
> that, there's no way to detect it until a script is modified and the
> running one crashes.  But in the case of cloning, we can add such explicit
> test-and-detection for "did you forget to trigger the cloning" in the few
> scripts that really, really need it.
>

I think I would attack this from an entirely different angle: what about
simply modifying Bash so that it slurps in the entire file upon opening it?

You could even hide that inside an LD_PRELOAD module so you don't have to
recompile Bash, and so that it's inherited automatically.

-Martin


Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Martin D Kealey
On Sun, 15 Oct 2023, 03:05 Greg Wooledge wrote:

> On Sat, Oct 14, 2023 at 12:55:21PM -0400, Ti Strga wrote:
> > it's just the "[[ -v foo ]]" tests to see where along the cloning
> > process we are.
>
> *Shudder*
>

Likewise, b.

> If the *real* goal is to overwrite a running script with a new version of
> itself, and then re-exec it, then the correct solution is to wrap the
> script in a single compound command so that it gets read and parsed up
> front, before beginning execution of the main loop.  Either wrap the whole
> thing in "{" ... "}" as Grisha suggested, or wrap the whole thing in a
> "main()" function and then call main "$@".
>

Agreed. Either way, don't forget to put "exit;" just before the closing
"}". Or write « exec main "$@" ».

(For good measure I would also make sure it's a valid posix text file with
a terminal newline, so that cat rubbish >> script can't break it.)

Personally I don't much care for the main "$@" style as it makes an extra
copy of argv for no particularly good reason, and Shell Is Not C™; but it's
better than allowing the script to blow up with parse errors after it's
started running.
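
A minimal sketch of the brace-plus-exit wrapper, assuming bash is on the PATH.
The script body, its messages, and the self-appending step are invented for
illustration; the append stands in for someone modifying the file while it
is still running.

```shell
#!/usr/bin/env bash
# Sketch only: the script body and messages below are made up for illustration.
script=$(mktemp)

# The whole body sits inside one { ... } group, so bash has to read and parse
# all of it before running any of it. The "exit" stops bash from reading any
# further input from the file once the group has finished.
cat >"$script" <<'EOF'
{
    echo "first"
    # Simulate someone appending to the file while it is running.
    echo 'echo "appended rubbish"' >>"$0"
    echo "second"
    exit
}
EOF

output=$(bash "$script")
rm -f "$script"
printf '%s\n' "$output"
```

Because of the "exit", the appended line is never read. Dropping the "exit"
would let bash carry on reading past the closing brace and run whatever had
been added there.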

-Martin



Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Ti Strga
On Fri, Oct 13, 2023 at 5:59 PM Grisha Levit wrote:
> On Fri, Oct 13, 2023, 10:03 Ti Strga wrote:
>>
>> [*] Alternatively, there's the trick about putting the entire script
>> contents inside a compound statement to force the parser to read it all,
>> but that just makes the script harder for a human to read.  Copy-and-exec
>> makes the top-level scripts cleaner IMHO.
>
> IMHO you'd be better off just putting a `{` line at the start and `}` line at 
> the end of your scripts,

Enh, that clutters up the calling scripts, and unlike setting a
variable at the top (the "OUTSIDE" in the example, with a real name in
the real code), it's not immediately clear to future coworkers why
we're doing it and what effect it has.  Semi-self-documenting
variables that can be easily grepped for are always better than
apparently arbitrary isolated curly braces.  Having to play tricks
with the parser to avoid something tangentially related to parsing is
not my style, but I appreciate that others may feel differently.

The big weakness of the "{}" approach is that if a writer forgets to
do that, there's no way to detect it until a script is modified and
the running one crashes.  But in the case of cloning, we can add such
explicit test-and-detection for "did you forget to trigger the
cloning" in the few scripts that really, really need it.


> and avoid a whole host of other potential problems. (Do you make a separate 
> holding directory for each run of the outer script? If so, what happens if 
> someone starts another copy after making changes? If not, how do you clean it 
> up? Etc.)

Already taken care of.  Honestly, this part of the functionality is
pretty solid, I just didn't put it in the example.  :-)  Yes, we use
different holding copies, it's not a hardcoded "COPY_OF_SCRIPT" in the
real script.  Several simultaneous copies are fine.  We clean things
up with a combination of chained EXIT traps in the scripts, and some
systemd-tmpfiles work for the parts that aren't scripts.



Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Greg Wooledge
On Sat, Oct 14, 2023 at 12:55:21PM -0400, Ti Strga wrote:
> it's just the "[[ -v
> foo ]]" tests to see where along the cloning process we are.

*Shudder*

I foresee so much more pain in your future.  Seriously, this is going
to blow up in your face at some point.  -v peeks into some incredibly
dark and spooky corners of the shell, and will expose *precisely* how
your assumptions about the shell differ from those of the bash author.
Also, it's been historically buggy.

I'm inclined to agree with Grisha Levit.  This whole thing looks like
a massively out-of-control X-Y problem.  If the *real* goal is to
overwrite a running script with a new version of itself, and then
re-exec it, then the correct solution is to wrap the script in a single
compound command so that it gets read and parsed up front, before
beginning execution of the main loop.  Either wrap the whole thing in
"{" ... "}" as Grisha suggested, or wrap the whole thing in a "main()"
function and then call main "$@".  That way, you can overwrite the file
without sabotaging running instances of the script.
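
The main() variant might look like the sketch below. The function "greet"
and its messages are hypothetical stand-ins for a script's real work; only
the overall shape (define everything, then make one call on the last line)
is the point.

```shell
#!/usr/bin/env bash
# Sketch only: "greet" and its messages are hypothetical stand-ins for real work.

greet() {
    printf 'hello, %s\n' "$1"
}

main() {
    # "$@" here is main's own copy of the script's arguments.
    local name
    for name in "$@"; do
        greet "$name"
    done
}

# Up to this point the shell has only defined functions. This last line is
# the only top-level command that does any work.
main "$@"
```

Since the shell goes back to reading the file after main returns, writing the
last line as « main "$@"; exit » keeps it from picking up anything that was
appended in the meantime.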



Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Ti Strga
On Fri, Oct 13, 2023 at 5:35 PM Chet Ramey wrote:
> This is what happens. First, you have to remember that variables supplied
> as temporary assignments to builtins like eval and source persist for the
> entire life of that builtin's execution, and appear in the environment of
> child processes those builtins create (this is what the man page text
> "added to the environment of the executed command" means for a builtin).

Yep, that part I'm extremely familiar with...


> these temporary variables can shadow global variables.

...but I was not aware of that part in this context!  That's what I was missing.


> 6. inner.sh calls unset, which unsets the temporary variable (clone) and
> `unshadows' the global variable (clone2)

And that makes it very clear what's going on.  Thank you for that walkthrough.
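
The shadowing and unshadowing described in that walkthrough can be reproduced
with a sketch like the following. It assumes bash in its default (non-POSIX)
mode; the variable name "stage", its values, and the two temporary files are
hypothetical and are not the names used in the original example.

```shell
#!/usr/bin/env bash
# Sketch only: "stage", its values, and both temp files are hypothetical.
inner=$(mktemp)
outer=$(mktemp)

# The sourced file unsets the variable, then reports what is still visible.
cat >"$inner" <<'EOF'
unset stage
printf '%s\n' "${stage-<unset>}"
EOF

# The outer script has a global "stage" and sources the inner file with a
# temporary assignment of the same name, which shadows the global.
cat >"$outer" <<'EOF'
stage=global-value
stage=temporary-value source "$1"
EOF

# Run under bash explicitly so the result does not depend on the calling shell.
seen=$(bash "$outer" "$inner")
rm -f "$inner" "$outer"
printf 'inner saw: %s\n' "$seen"
```

If the walkthrough above holds, the unset removes only the temporary copy, so
the inner file reports the global value rather than "<unset>".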


> There is code in bash to make unsetting a function's local copy of a
> dynamically-scoped variable that shadows a global variable remain `unset'
> instead of unshadowing the global, but I've never done that for source or
> eval. It's not clear that would help in this case, either -- it depends
> on what the rest of the code does and expects.

I could see "helpful or not" going either way, honestly.  In this
specific case, there isn't really any "rest of the code" that's
relevant to the variables being shadowed, etc, it's just the "[[ -v
foo ]]" tests to see where along the cloning process we are.  The only
part I didn't include in the example was the code that does such tests
to see if it's inside the cloned copy, and if it is, arranges to
delete the temporary copy on exit.  (The arbitrary top-level script
might also be doing on-exit actions, so there's this whole thing of
registering a chain of functions to be called by the single permitted
EXIT trap.  All of that is working, and is independent of the optional
cloning, so I didn't want to litter up the example with distractions.)

Activating the code you mention for source/eval might have helped for
my particular use case, in that it would have saved me some debugging
time that I would otherwise have spent... I dunno, probably drinking
more coffee.  But it would likely have introduced confusion for all
the other users who were accustomed to the shadowing behavior, and
caused them to spend even more time debugging and writing bug report
emails.  I agree probably not helpful to have that code for
source/eval.  :-)

We'll either be writing a solid comment in the code explaining why
that particular 'unset' has to be where it is, or we'll change to
testing the values of those trigger variables instead of just "is it
set or not" and using varying values to track where in the process it
is ("clone" vs "clone2" in your example).
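
The second option, testing values rather than set-ness, could be sketched
roughly as below. CLONE_STAGE is a hypothetical variable name, and the
meanings attached to "clone" and "clone2" are assumptions made for the sake
of the example, not the behaviour of the real scripts.

```shell
#!/usr/bin/env bash
# Sketch only: CLONE_STAGE and the meaning of each value are hypothetical.
describe_stage() {
    # Compare the value instead of asking "is it set?", so a stale or
    # shadowed copy with the wrong value is still told apart.
    case "${CLONE_STAGE-}" in
        "")      echo "original" ;;
        clone)   echo "copy-requested" ;;
        clone2)  echo "inside-copy" ;;
        *)       echo "unexpected: $CLONE_STAGE" ;;
    esac
}

describe_stage
```

With this shape, an 'unset' that merely uncovers an older value shows up as a
different stage (or as "unexpected") instead of silently passing a "-v" test.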

Thank you again!
-Ti



Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Robert Elz
Date:Sat, 14 Oct 2023 14:46:12 +1000
From:Martin D Kealey 
Message-ID:  



  | Back when I used the Bourne Shell we didn't have `local`, so we used to
  | write `var= func` to make sure that `func` couldn't mess with *our* `var`.

If you were using the original Bourne shell you couldn't have done that,
as it had no functions either.   The changes made to it beyond that were
very often badly designed or just plain broken, and in POSIX (because of
the way it worked in some shells) that behaviour was prohibited,
 VAR=foo func
was required to set VAR=foo in the shell environment (unless func altered it).

That requirement has only relatively recently been changed - changed because
it violated another POSIX requirement, that being that (ignoring execution
speed, etc) it should not be possible to tell the difference between a utility
implemented as a function and one implemented as a file system command
(that is, if the function sets out to implement the same thing as the external
utility - which of course precludes it from making any changes to the shell
environment).

Even now it is unspecified what happens:

This is from XCU 2.9.1.2 (in the latest available Issue 8 draft, but I think
it is the same text in the current issued standard (Issue 7 + TCs))

  * If the command name is a function that is not a standard utility
implemented as a function, variable assignments shall affect the current
execution environment during the execution of the function.

[Aside: "standard utilities implemented as functions" are required to
act identically to the external utility, but they aren't the issue here]

It is unspecified:

  -- Whether or not the variable assignments persist after the completion
 of the function

[ie: your trick is not guaranteed to work]

  -- Whether or not the variables gain the export attribute during the
 execution of the function

[ie: such a variable isn't even guaranteed to be exported]

  -- Whether or not export attributes gained as a result of the
 variable assignments persist after the completion of the function
 (if variable assignments persist after the completion of the function)

[ie: it is possible that a variable that wasn't exported before being
used as "VAR=val func" might now be exported]

A good implementation will revert the value if the func doesn't alter it,
and will put it in the environment during the lifetime of the function, but
none of that is guaranteed.   At least now that is permitted, rather than
prohibited, which it used to be.
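
A small probe along these lines shows what a particular shell does. The names
VAR, func, during and after are arbitrary. Only the "during" result is pinned
down by the text quoted above; the "after" result is one of the unspecified
cases and may differ between shells and modes.

```shell
# Sketch only: all names here are arbitrary.
VAR=before
during=

func() {
    during=$VAR    # record what the function itself sees
}

VAR=foo func
after=$VAR         # unspecified: may be "before" or "foo"

printf 'during the call: %s\n' "$during"
printf 'after the call:  %s\n' "$after"
```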

There is nothing here anywhere that permits an implementation to make
an assignment to a variable within a function fail to persist
when the function terminates - regardless of whether the variable was
named in a var-assign that precedes the command name (the command being
a function).   Of course if some non-standard feature is used (like for
example, "local") then all bets are off, and whatever happens depends upon
what the shell defines to happen.

However in the case of a special built-in utility (which "." is) then
the requirements are much stricter:

  * If the command name is a special built-in utility, variable assignments
shall affect the current execution environment before the utility is
executed and remain in effect when the command completes; if an assigned
variable is further modified by the utility, the modifications made by
the utility shall persist. Unless the set -a option is on (see set),
it is unspecified:

  -- Whether or not the variables gain the export attribute during the
 execution of the special built-in utility

  -- Whether or not export attributes gained as a result of the variable
 assignments persist after the completion of the special built-in
 utility

That is, in the case of "VAR=val . script" (which is what the OP was doing,
there were no functions involved) POSIX actually requires that VAR=val be
done before the utility is invoked (and never undone) and that if the script
modifies VAR, that modification remain after it has completed.

(It is just unspecified whether anything gets exported by this, and if it
does, whether that attribute remains after the script ends).

Note that some of that text is new in Issue 8, to deal with making it
clear what happens if a script does "X=Y unset X" where previously it
might have seemed permitted for X to remain set after that command (to Y
or whatever value it had before) - now it is (will be) clear that is not
permitted, and X must be unset after that command completes.   Similarly
"X=foo export X" must result in X being exported with value "foo" when
that command completes.   "." is no different (conceptually) than those.


  | Given that "put in the environment" actually means "create a shell variable
  | and mark it as exported",

That's an implementation detail, it doesn't require that at all for
external utilities,