Re: [contrib]: setpgrp + killpg builtins

2015-02-03 Thread Chet Ramey
On 1/31/15 8:13 PM, Jason Vas Dias wrote:
> Dear bash developers -
> 
> It is very difficult to overcome the problems caused by
> the scenario described within this email without something like the
> enclosed "setpgrp" and "killpg" bash loadable builtins.

I haven't looked at the setpgrp builtin yet, but the kill builtin
already allows you to kill process groups by providing a `pid'
argument that's less than -1.  It's done this since at least bash-3.0.
This just reflects how kill(2) treats its pid argument.
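
A minimal sketch of the negative-pid form, not taken from the thread. The
`sleep` pipeline is only a placeholder job, and `set -m` is assumed here so
that the background job gets a process group of its own:

```shell
#!/usr/bin/env bash
# Illustrative sketch: signal every process in a job's process group.
set -m                       # job control: each background job gets its own process group
sleep 300 | sleep 300 &      # placeholder two-process job
pgid=$(jobs -p %1)           # PID of the job's process group leader, i.e. its PGID

# "--" ends option parsing, so the negative number is read as a
# process-group target and not as a signal option.
kill -TERM -- "-$pgid"

wait %1
status=$?
echo "job exit status: $status"   # 128 + signal number when the job died from the signal
```

Both `sleep` processes receive the signal, because kill(2) delivers it to
every member of process group `pgid`.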

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: [contrib]: setpgrp + killpg builtins

2015-02-03 Thread Greg Wooledge
On Sun, Feb 01, 2015 at 01:13:06AM +, Jason Vas Dias wrote:
> 1. An "invoker.sh" process runs a "job.sh" bash script in a separate
>    process which runs a long-running (or non-terminating!)
>    'Simple Command' (not a shell "Job") (call it "nterm.sh").
> 
> 2. After a while, the originator decides that the job has timed-out,
>    and kills its process (the instance of bash running job.sh), and
>    then exits.
> 
> 3. The "long-command" nterm.sh process is left still running as an orphan,
>    and would become a zombie if it tries to exit.

If nterm.sh's parent has already exited, then nterm.sh gets "adopted"
by init.  When nterm.sh exits, init will wait() for it to harvest and
discard the exit status, so it won't become a zombie for any significant
length of time (it'll only be a zombie for however long it takes init
to wait()).
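
The adoption can be observed with a small sketch like the one below. It is
only an illustration: the `sleep` stands in for nterm.sh, `ps -o ppid=` is
assumed to be available, and on systems that use a "child subreaper" the new
parent may be that process rather than PID 1.

```shell
#!/usr/bin/env bash
# Illustrative sketch: watch an orphaned process get a new parent.
# The inner shell starts a background sleep, prints "<own pid> <sleep pid>",
# and exits immediately, which orphans the sleep.
pids=$(bash -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!"')
read -r old_parent orphan <<<"$pids"

# Command substitution returns only after the inner shell has exited, so the
# orphan has already been re-parented by the time ps runs.
new_parent=$(ps -o ppid= -p "$orphan" | tr -d ' ')
echo "orphan $orphan: parent was $old_parent, is now $new_parent"

kill "$orphan"    # clean up the placeholder process
```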

That said, the real issue here is your step 2.  If someone kills the shell
that's managing a long-running job, but doesn't kill the long-running
job itself, then you as the shell script developer have the opportunity
to catch the signal and pass it along to the child process.

In theory, that's great.  In practice, it only works if you launch the
long-running child as a background job and then block yourself with a
shell builtin (such as wait or read), so that you can catch the signal
immediately, rather than whenever the long-running child finishes.
But that's the nature of shell script development.  You just have to
know these limitations and work around them in your script.

Simplistic example:

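#!/usr/bin/env bash
# Job control: myjob gets its own process group, and kill %1 signals
# that whole group.
set -m
myjob() { some | long-running | pipeline | here; }
# On SIGTERM, pass the signal on to the job, then exit.
trap 'kill %1; exit' TERM
myjob &
# wait is a builtin, so the trap runs as soon as the signal arrives.
wait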
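
The same pattern can be wrapped in a function so that existing simple
commands need only a prefix. This is a sketch under assumptions, not a
tested recipe: `run_killable` and `killable_pid` are hypothetical names, the
exit status 143 is simply 128 + SIGTERM, and the demo at the bottom uses a
subshell to play the part of the script being killed.

```shell
#!/usr/bin/env bash
# Illustrative sketch. run_killable and killable_pid are hypothetical names.
run_killable() {
    "$@" &                   # run the command asynchronously...
    killable_pid=$!
    # On SIGTERM: forward the signal, reap the child, then exit with the
    # conventional 128 + 15 status.
    trap 'kill -TERM "$killable_pid" 2>/dev/null; wait "$killable_pid" 2>/dev/null; exit 143' TERM
    wait "$killable_pid"     # ...but block in a builtin, so the trap can run at once
    local status=$?
    trap - TERM
    return "$status"
}

# Demo: a subshell plays the role of the script that gets killed.
pidfile=$(mktemp)
( run_killable sh -c 'echo "$$" > "$1"; exec sleep 300' sh "$pidfile" ) &
job=$!
sleep 1                      # give the child time to start
kill -TERM "$job"
wait "$job"
job_status=$?
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then child_alive=yes; else child_alive=no; fi
rm -f "$pidfile"
echo "job status: $job_status, child still alive: $child_alive"
```

Two caveats: when job control is not active, a command started with `&`
reads its standard input from /dev/null and ignores SIGINT, so the wrapper
is not a drop-in replacement for commands that depend on either.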



[contrib]: setpgrp + killpg builtins

2015-01-31 Thread Jason Vas Dias
Dear bash developers -

It is very difficult to overcome the problems caused by
the scenario described within this email without something like the
enclosed "setpgrp" and "killpg" bash loadable builtins.

Without them, or options to change signal handling for simple
commands,  it is too easy to create orphan processes,
and too difficult to find a workaround to prevent
orphan processes being created, as in the following
scenario:

1. An "invoker.sh" process runs a "job.sh" bash script in a separate
   process which runs a long-running (or non-terminating!)
   'Simple Command' (not a shell "Job") (call it "nterm.sh").

2. After a while, the originator decides that the job has timed-out,
   and kills its process (the instance of bash running job.sh), and
   then exits.

3. The "long-command" nterm.sh process is left still running as an orphan,
   and would become a zombie if it tries to exit.

I tested this with the latest bash-4.3.33 and with bash-4.2.

The problem is that most shell scripts use only simple commands, not
background jobs. Changing a large number of scripts to use
asynchronous background jobs for every simple command that might
not terminate (because of an NFS hang, for example) is not
an option. Simple commands run in their
own process groups in interactive mode, or in that of the parent
in non-interactive mode, and are not killed when
their parent job.sh exits, because the parent has no
background pid to wait for and so cannot wait for them.
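
The process-group behaviour can be checked with a sketch like the following.
It is an illustration only: `pgid_of` is a hypothetical helper, `ps -o pgid=`
is assumed to be available, and background `sleep` commands are used instead
of foreground simple commands so that the script can inspect them while they
run.

```shell
#!/usr/bin/env bash
# Illustrative sketch: compare process group IDs with and without job control.
pgid_of() { ps -o pgid= -p "$1" | tr -d ' '; }   # hypothetical helper

script_pgid=$(pgid_of "$$")

# Job control off (the default in scripts): a child stays in the script's group.
sleep 30 &
plain_pid=$!
plain_pgid=$(pgid_of "$plain_pid")

# Job control on: a background job becomes the leader of a new group.
set -m
sleep 30 &
job_pid=$!
job_pgid=$(pgid_of "$job_pid")

echo "script pgid=$script_pgid plain child pgid=$plain_pgid job pid=$job_pid pgid=$job_pgid"
kill "$plain_pid" "$job_pid"    # clean up the placeholder processes
```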

This is demonstrated by the attached shell scripts in the
 "nterm_demo.tar" file (nterm_demo/*) :
 invoker.sh: forks off "job.sh", waits for it to timeout, and kills it
 job.sh: runs "nterm.sh" as a simple command
 nterm.sh  : a non-terminating process
 killpg.c  : killpg built-in

To demonstrate:

$ tar -xpf nterm_demo.tar
$ cd nterm_demo
$ BASH_BUILD_DIR=...  BASH_SOURCE_DIR=... make

Example output is :
gcc  -fPIC -O3 -g -I. -I/home/jvasdias/src/3P/bash
-I/home/jvasdias/src/3P/bash/lib -I/home/jvasdias/src/3P/bash/builtins
-I/home/jvasdias/src/3P/bash/include
-I/home/jvasdias/src/3P/bash-4.30-ubuntu
-I/home/jvasdias/src/3P/bash-4.30-ubuntu/lib
-I/home/jvasdias/src/3P/bash-4.30-ubuntu/builtins  -c -o setpgid.o
setpgid.c
gcc  -shared -Wl,-soname,$@  setpgid.o   -o setpgid
gcc  -fPIC -O3 -g -I. -I/home/jvasdias/src/3P/bash
-I/home/jvasdias/src/3P/bash/lib -I/home/jvasdias/src/3P/bash/builtins
-I/home/jvasdias/src/3P/bash/include
-I/home/jvasdias/src/3P/bash-4.30-ubuntu
-I/home/jvasdias/src/3P/bash-4.30-ubuntu/lib
-I/home/jvasdias/src/3P/bash-4.30-ubuntu/builtins  -c -o killpg.o
killpg.c
...
gcc  -shared -Wl,-soname,$@  killpg.o   -o killpg
bash -c ./invoker.sh 0<&- 2>&1 | tee
./invoker.sh: hB : 11524
JOB: 11528
./job.sh: 11528: pgid : 11510
./job.sh: 11528: pgid now : 11528
./nterm.sh: hB: 11535 : pgid: 11528
non-terminating command 11535 (11528) still running.
./invoker.sh: timeout - killing job: 11528
Terminated
./nterm.sh: 11535: exits 143
./job.sh: 11528: exits 143
./invoker.sh: 11524: exiting.
$

To demonstrate the problem, make the built-ins not be found:
$ make show_the_bug
unset BASH_LOADABLES_DIR; ./invoker.sh  0<&- 2>&1 | tee
./invoker.sh: hB : 11670
Demonstrating the bug. Please kill the nterm.sh process manually.
JOB: 11672
./job.sh: 11672: pgid : 11668
job.sh will be killed, but nterm.sh will not.
./nterm.sh: hB: 11676 : pgid: 11668
non-terminating command 11676 (11668) still running.
./invoker.sh: timeout - killing job: 11672
non-terminating command 11676 (11668) still running.
non-terminating command 11676 (11668) still running.
^Cmake: *** [show_the_bug] Interrupt

Fortunately, make carefully cleans up and kills 11676 silently.
If one types at the command line or in a shell script:
 $ BASH_LOADABLES_DIR='' ./invoker.sh
then it is really hard to kill the resulting nterm.sh process -
one has to use kill -9 $nterm_pid .

So, please give scripts some means of saying
"if I am killed, kill my current simple command",
even in interactive mode, with some new shopt option,
or provide something like the killpg / setpgid built-ins attached.

Thanks & Regards,
Jason


nterm_demo.tar
Description: Unix tar archive