On Wed, Oct 31, 2018 at 7:25 AM, David Laight <david.lai...@aculab.com> wrote: > From: Daniel Colascione >> Sent: 31 October 2018 12:56 >> On Wed, Oct 31, 2018 at 12:27 PM, David Laight <david.lai...@aculab.com> >> wrote: >> > From: Daniel Colascione >> >> Sent: 29 October 2018 17:53 >> >> >> >> This patch adds a new file under /proc/pid, /proc/pid/exithand. >> >> Attempting to read from an exithand file will block until the >> >> corresponding process exits, at which point the read will successfully >> >> complete with EOF. The file descriptor supports both blocking >> >> operations and poll(2). It's intended to be a minimal interface for >> >> allowing a program to wait for the exit of a process that is not one >> >> of its children. >> > >> > Why do you need an extra file? >> >> Because no current file suffices. > > That doesn't stop you making something work on any/all of the existing files. > >> > It ought to be possible to use poll() to wait for POLLERR having set >> > 'events' to zero on any of the nodes in /proc/pid - or even on >> > the directory itself. >> >> That doesn't actually work today. And waiting on a directory with >> POLLERR would be very weird, since directories in general don't do >> things like blocking reads or poll support. A separate file with >> self-contained, well-defined semantics is cleaner. > > Device drivers will (well ought to) return POLLERR when a device > is removed. > Making procfs behave the same way wouldn't be too stupid. > >> > Indeed, to avoid killing the wrong process you need to have opened >> > some node of /proc/pid/* (maybe cmdline) before sending the kill >> > signal. >> >> The kernel really needs better documentation of the semantics of >> procfs file descriptors. You're not the only person to think, >> mistakenly, that keeping a reference to a /proc/$PID/something FD >> reserves $PID and prevents it being used for another process. Procfs >> FDs do no such thing. kill(2) is unsafe whether or not >> /proc/pid/cmdline or any other /proc file is open. > > Interesting. > Linux 'fixed' the problem of pid reuse in the kernel by adding (IIRC) > 'struct pid' that reference counts the pid stopping reuse.
This is incorrect if you mean numeric pids. See the end of these comments in include/linux/pid.h . A pid value can be reused, it just works Ok because it causes a new struct pid allocation. That doesn't mean there isn't a numeric reuse. There's also no where in pid_alloc() where we prevent the numeric reuse AFAICT. /* * What is struct pid? * * A struct pid is the kernel's internal notion of a process identifier. * It refers to individual tasks, process groups, and sessions. While * there are processes attached to it the struct pid lives in a hash * table, so it and then the processes that it refers to can be found * quickly from the numeric pid value. The attached processes may be * quickly accessed by following pointers from struct pid. * * Storing pid_t values in the kernel and referring to them later has a * problem. The process originally with that pid may have exited and the * pid allocator wrapped, and another process could have come along * and been assigned that pid. * * Referring to user space processes by holding a reference to struct * task_struct has a problem. When the user space process exits * the now useless task_struct is still kept. A task_struct plus a * stack consumes around 10K of low kernel memory. More precisely * this is THREAD_SIZE + sizeof(struct task_struct). By comparison * a struct pid is about 64 bytes. * * Holding a reference to struct pid solves both of these problems. * It is small so holding a reference does not consume a lot of * resources, and since a new struct pid is allocated when the numeric pid * value is reused (when pids wrap around) we don't mistakenly refer to new * processes. */