Hi Chet, thanks for taking the time to review this :D
[My apologies again upfront for another lengthy (comprehensive?) email.]
On Wed, 3 Apr 2024, Chet Ramey wrote:
> On 4/2/24 12:22 PM, Carl Edquist wrote:
>> the forked coproc has to close its fds to/from _all_ other existing
>> coprocs (as there can be several).
>
> And there is the issue. Without multi-coproc support, the shell only
> keeps track of one coproc at a time, so there's only one set of pipe
> file descriptors to close.
Right, exactly. The example with the default build (showing the essential
case that causes deadlock) was to highlight that your multi-coproc support
code apparently does indeed correctly track and close all these fds, and
thus prevents the deadlock issue.
On Wed, 3 Apr 2024, Chet Ramey wrote:
> It's straightforward: the coproc process terminates, the shell reaps it,
> marks it as dead, notifies the user that the process exited, and reaps
> it before printing the next prompt. I don't observe any different
> behavior between the default and when compiled for multiple coprocs.
>
> It depends on when the process terminates as to whether you get a prompt
> back and need to run an additional command before reaping the coproc
> (macOS, RHEL), which gives you the opportunity to run the `read'
> command:
Ah, my mistake then - thanks for explaining. I must have been thrown off
by the timing when running it with and without an intervening interactive
prompt before the read command. When run interactively, an extra 'Enter'
(or not) before the read command changes the behavior.
So in that case, this issue (that the shell closes its read-end of the
pipe from a reaped coproc, potentially before being able to read the final
output) was already there and is not specific to the multi-coproc code.
But in any case, it seems like this is a race then? That is, whether the
child process terminates before or after the prompt in question.
> $ coproc WC { wc; }
> [1] 48057
> $ exec {WC[1]}>&-
> $ read -u ${WC[0]} X
> [1]+  Done                    coproc WC { wc; }
> bash: DEBUG warning: cpl_reap: deleting 48057
> $ echo $X
> 0 0 0
>
> (I put in a trace statement to show exactly when the coproc gets reaped and
> deallocated.)
Thanks! (for taking the time to play with this)
Though apparently it's still a race here. If you diagram the shell and
coproc (child) processes, I think you'll see that your DEBUG statement can
also happen _before_ the read command, which would then fail. You can
contrive this by adding a small sleep (eg, 0.1s) at the end of
execute_builtin_or_function (in execute_cmd.c), just before it returns.
Eg:
diff --git a/execute_cmd.c b/execute_cmd.c
index ed1063e..c72f322 100644
--- a/execute_cmd.c
+++ b/execute_cmd.c
@@ -5535,6 +5535,7 @@ execute_builtin_or_function (words, builtin, var, redirects,
discard_unwind_frame ("saved_fifos");
#endif
+ usleep(100000);
return (result);
}
If I do this, I consistently see "read: X: invalid file descriptor
specification" running the above 4-line "coproc WC" example in a script,
demonstrating that there is no guarantee that the read command will start
before the WC coproc is reaped and {WC[0]} is closed, even though it's the
next statement after 'exec {WC[1]}>&-'.
But (as I'll try to show) you can trip up on this race even without
slowing down bash itself artificially.
> I can't reproduce your results with non-interactive shells, either, with
> job control enabled or disabled.
That's fair; let's try it with a script:
$ cat cope.sh
#!/bin/bash
coproc WC { wc; }
jobs
exec {WC[1]}>&-
[[ $1 ]] && sleep "$1"
jobs
read -u ${WC[0]} X
echo $X
Run without sleep, the wc output is seen:
$ ./cope.sh
[1]+ Running coproc WC { wc; } &
[1]+ Running coproc WC { wc; } &
0 0 0
Run with a brief sleep after closing the write end, and it breaks:
$ ./cope.sh .1
[1]+ Running coproc WC { wc; } &
[1]+ Done coproc WC { wc; }
./cope.sh: line 8: read: X: invalid file descriptor specification
And, if I run with "0" for a sleep time, it intermittently behaves like
either of the above. Racy!
>> This is a bug. The shell should not automatically close its read pipe
>> to a coprocess that has terminated -- it should stay open to read the
>> final output, and the user should be responsible for closing the read
>> end explicitly.
>
> How long should the shell defer deallocating the coproc after the
> process terminates?
I only offer my opinion here, but it strikes me that it definitely should
_not_ be based on an amount of _time_. That's inherently racy.
In the above example, there is only a single line to read; but the general
case there may be many 'records' sitting in the pipe waiting to be
processed, and processing each record may take an arbitrary amount of
time. (Consider a coproc containing a sort command, for example, that
produces all its output lines at once after it sees EOF, and then
terminates.)
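To make the sort case concrete, here is a self-contained sketch (the names
and the trailing sleep are mine, not from the thread; the sleep only keeps
the child alive while the pipe is drained, precisely to dodge the
deallocation race this email is about):

```shell
#!/bin/bash
# A sort coproc emits ALL of its output only after it sees EOF on stdin,
# so many records can be sitting in the pipe after sort itself is done.
coproc SORT { sort; sleep 1; }   # sleep keeps the child alive during reads

printf '%s\n' banana apple cherry >&"${SORT[1]}"
exec {SORT[1]}>&-                # send EOF; sort now writes everything

out=()
while IFS= read -r line <&"${SORT[0]}"; do
  out+=("$line")
done
printf '%s\n' "${out[@]}"
```

Without the artificial sleep, whether those reads succeed depends on
whether the shell reaps the coproc (and closes SORT[0]) first.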
Zack illustrated basically the same point with his example:
exec {fd}< <( some command )
while IFS='' read -r line <&"${fd}"; do
# do stuff
done
exec {fd}<&-
A process-substitution open to the shell like this is effectively a
one-ended coproc (though not in the jobs list), and it behaves reliably
here because the user can count on {fd} to remain open even after the
child process terminates.
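For instance, here is a runnable instance of Zack's pattern, with `seq 5`
standing in for "some command" (my substitution):

```shell
#!/bin/bash
# Process substitution behaves like a one-ended coproc: the shell does NOT
# close {fd} behind your back when the child exits.
exec {fd}< <(seq 5)

lines=()
while IFS='' read -r line <&"${fd}"; do
  lines+=("$line")      # "do stuff"
done

# {fd} stayed open and readable even though the child may have exited long
# before the loop finished; the user closes it when done:
exec {fd}<&-
echo "read ${#lines[@]} lines"
```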
So, the user can determine when the coproc fds are no longer needed,
whether that's when EOF is hit trying to read from the coproc, or whatever
other condition.
Personally I like the idea of 'closing' a coproc explicitly, but if it's a
bother to add options to the coproc keyword, then I would say just let the
user be responsible for closing the fds. Once the coproc has terminated
_and_ the coproc's fds are closed, then the coproc can be deallocated.
Apparently there is already some detection in there for when the coproc
fds get closed, as the {NAME[@]} fd array members get set to -1
automatically when you do, eg, 'exec {NAME[0]}<&-'. So perhaps this
won't be a radical change.
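A quick way to check that observation (variable names mine; the sleep again
just keeps the child alive past the close so the coproc isn't reaped first):

```shell
#!/bin/bash
# After the user closes a coproc fd with exec, bash rewrites the
# corresponding array member to -1, leaving the other member intact.
coproc CAT { cat; sleep 1; }
rfd=${CAT[0]} wfd=${CAT[1]}
exec {CAT[0]}<&-
echo "was: $rfd $wfd  now: ${CAT[0]} ${CAT[1]}"
```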
Alternatively (or, additionally), you could interpret 'unset NAME' for a
coproc to mean "deallocate the coproc." That is, close the {NAME[@]} fds,
unset the NAME variable, and remove any coproc bookkeeping for NAME.
(Though if the coproc child process hasn't terminated on its own yet,
still it shouldn't be killed, and perhaps it should remain in the jobs
list as a background process until it's done.)
...
[And if you _really_ don't want to defer deallocating a coproc after it
terminates, I suppose you can go ahead and deallocate it in terms of
removing it from the jobs list and dropping any bookkeeping for it - as
long as you leave the fds and fd variables intact for the user.
It's a little dicey, but in theory it should not lead to deadlock, even if
copies of these (now untracked) reaped-coproc pipe fds end up in other
coprocs. Why not? Because (1) this coproc is dead and thus won't be
trying to read anymore from its (now closed) end, and (2) reads from the
shell's end will not block since there are no copies of the coproc's write
end open anymore.
Still, it'd be cleaner to defer deallocation, to avoid these stray (albeit
harmless) copies of fds making their way into new coprocs.]
> What should it do to make sure that the variables don't hang around with
> invalid file descriptors?
First, just to be clear, the fds to/from the coproc pipes are not invalid
when the coproc terminates (you can still read from them); they are only
invalid after they are closed.
The surprising bit is when they become invalid unexpectedly (from the
point of view of the user) because the shell closes them automatically,
at the somewhat arbitrary timing when the coproc is reaped.
Second, why is it a problem if the variables keep their (invalid) fds
after closing them, if the user is the one that closed them anyway?
Isn't this how it works with the auto-assigned fd redirections?
Eg:
$ exec {d}<.
$ echo $d
10
$ exec {d}<&-
$ echo $d
10
But, as noted, bash apparently already ensures that the variables don't
hang around with invalid file descriptors, as once you close them the
corresponding variable gets updated to "-1".
> Or should the user be responsible for unsetting the array variable too?
> (That's never been a requirement, obviously.)
On the one hand, bash is already messing with the coproc array variables
(setting the values to -1 when the user closes the fds), so it's not
really a stretch in my mind for bash to unset the whole variable when the
coproc is deallocated. On the other hand, as mentioned above, bash leaves
automatically allocated fd variables intact after the user explicitly
closes them.
So I guess either way seems reasonable.
If the user has explicitly closed both fd ends for a coproc, it should not
be a surprise to the user either way - whether the variable gets unset
automatically, or whether it remains with (-1 -1).
Since you are already unsetting the variable when the coproc is
deallocated though, I'd say it's fine to keep doing that -- just don't
deallocate the coproc before the user has closed both fds.
>> It also invites trouble if the shell variable that holds the fds gets
>> removed unexpectedly when the coprocess terminates. (Suddenly the
>> variable expands to an empty string.) It seems to me that the proper
>> time to clear the coproc variable (if at all) is after the user has
>> explicitly closed both of the fds.
>
> That requires adding more plumbing than I want to,
Your project your call :D
> especially since the user can always save the file descriptors of
> interest into another variable if they want to use them after the coproc
> terminates.
*Except* that it's inherently a race condition whether the original
variables will still be intact to save them.
Even if you attempt to save them immediately:
coproc X { exit; }
X_BACKUP=( ${X[@]} )
it's not guaranteed that X_BACKUP=(...) will run before coproc X has been
deallocated, and the X variable cleared.
No doubt this hasn't escaped you, but in any case you can see it for
yourself if you introduce a small delay in execute_coproc, in the parent,
just after the call to make_child:
diff --git a/execute_cmd.c b/execute_cmd.c
index ed1063e..5949e3e 100644
--- a/execute_cmd.c
+++ b/execute_cmd.c
@@ -2440,6 +2440,8 @@ execute_coproc (command, pipe_in, pipe_out, fds_to_close)
exit (estat);
}
+ else
+ usleep(100000);
close (rpipe[1]);
close (wpipe[0]);
When I try this, X_BACKUP is consistently empty. Though if you add, say,
a call to "sleep 0.2" to coproc X, then X_BACKUP consistently gets a copy
of X's fds in time.
I am sorry if this sounds contrived, but I hope it demonstrates that
closing the fds and unsetting the variable for a coproc automatically when
it terminates is fundamentally flawed, because it depends on the arbitrary
race timing between the two processes.
>> *Or* else add an option to the coproc keyword to explicitly close the
>> coproc - which will close both fds and clear the variable.
>
> Not going to add any more options to reserved words; that does more
> violence to the grammar than I want.
Not sure how you'd feel about using 'unset' on the coproc variable
instead. (Though as discussed, I think the coproc terminated + fds
manually closed condition is also sufficient.)
.............
Anyway, as far as I'm concerned there's nothing urgent about all this, but
(along with the multi-coproc support that you implemented), avoiding the
current automatic deallocation behavior would seem to go a long way toward
making coproc a correct and generally useful feature.
Thanks for your time!
Carl
PS Zack you're welcome :)