    Date:        Mon, 16 Mar 2020 13:38:58 +0100
    From:        Joerg Schilling <joerg.schill...@fokus.fraunhofer.de>
    Message-ID:  <5e6f7362.u+rw3m3sirjpta0s%joerg.schill...@fokus.fraunhofer.de>

  | Do you like to talk about what happens when pid numbers are reused?

Yes.

  | This may be a negative side-effect of PID randomization that could reuse
  | pid numbers much earlier than without...

It makes no difference to the issue - it may alter the probability of it
occurring.

  | > Does anyone know of a shell that correctly handles this now?
  |
  | I guess there is no behavior that could be called "correct",
  | since the behavior is not caused by the shell but by the kernel.

No, it is the shell causing this one, not the kernel - no kernel (or at
least, no kernel I'm aware of, from 5th edition (mid 1970's) to now)
reassigns a pid that belongs to a process that still exists - there are
never duplicates.

However the shell keeps jobs in its jobs table until a script does a wait
command - the shell is keeping the pid "alive" longer than it remains alive
in the kernel (and so protected from reuse). That's the problem.

This is a common shell implementation technique, and the problem is
incredibly hard to test without a custom kernel to force the issue (very
few available pids), so I haven't attempted to discover which shells, if
any, have any mitigation for this (or even avoid it completely). That's
why I asked.

  | Before talking about your ideas, it would be important to define what you
  | intend.

Correctness. Making sure that when a script evaluates $! it gets a handle
on a job that it can (hours, or days, or weeks later) reference to wait for
(or kill) the job, and know it is referencing the correct job. Always.

  | Solaris defines PID_MAX to 999999. What value are you using?

Irrelevant. Modern processors can run through that many processes in
almost no time - the issue is that pids are reused, eventually, in all
systems.

  | In fact, there are platforms (AIX IIRC) that implement waitid()
  | flags only with waitid(), OK, not that I care a lot about AIX
  | but why do you care about outdated interfaces anyway?

I was anticipating a comment like that - but this is irrelevant. If we use
the WNOWAIT method, and can somehow make it work, it is up to the
implementation to make that function on the system it is to be installed
upon. If that means using waitid() then that's what it would have to do.

And from your other message...

  | Then you would need to rewrite the shell to behave like ksh93 and install a
  | SIGCLD handler. I am not sure about possible side effects....

Depending what the handler does (what action the shell takes when it
receives a SIGCHLD - please spell it correctly, the list of signal names,
including SIGCHLD, is in XBD, see page 334 - regardless of whether that
happens in the signal handler, or in some code run later that is triggered
by the signal handler) that can only ever make the problem worse (or be
neutral), though I suppose in conjunction with never doing
waitid(P_ALL, ...) or the equivalent using one of the other wait*()
interfaces, there might be a method there which could work (keep the
zombie in the kernel, yet be aware that it is a zombie, and why it exited).

Harald's suggestion (and my comment about it) "works" (as much as it does)
as it would avoid the shell using pids (which may have been reused by the
kernel) to refer to shell jobs, and instead use job designators (%1 %% etc)
which are totally under control of the shell, and so can be made safe.

Pity that "ps -p %%" doesn't work though...

kre
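
To illustrate the WNOWAIT idea mentioned above (learn that a child has
exited, and why, while leaving the zombie in the kernel so its pid cannot
be reused), here is a minimal sketch. It is not taken from any shell's
source: the helper names spawn, peek_status and reap are hypothetical, and
it assumes a Unix-like system on which Python exposes os.waitid (Linux
does).

```python
import os


def spawn(exit_code):
    """Fork a child that exits at once with exit_code; return its pid."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit without running parent cleanup
    return pid


def peek_status(pid):
    """Block until pid has exited, but leave the zombie in place.

    WNOWAIT asks the kernel to report the status without reaping, so the
    pid stays allocated and cannot be handed to a new process.
    """
    info = os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    return info.si_status


def reap(pid):
    """Collect the child for real; only now may the kernel reuse the pid."""
    info = os.waitid(os.P_PID, pid, os.WEXITED)
    return info.si_status


if __name__ == "__main__":
    pid = spawn(7)
    print("peeked:", peek_status(pid))
    os.kill(pid, 0)  # signal 0 only checks existence; the zombie is still there
    print("reaped:", reap(pid))
```

A shell built along these lines would presumably call something like
peek_status when it notices a child has exited, and something like reap
only when the script finally runs wait (or the job is otherwise discarded).
The cost is one process table slot held for every job that has not yet
been waited for.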