On Tue, Oct 30, 2018 at 12:12 PM Daniel Colascione <dan...@google.com> wrote: > > On Tue, Oct 30, 2018 at 11:04 AM, Christian Brauner > <christian.brau...@canonical.com> wrote: > > On Tue, Oct 30, 2018 at 11:48 AM Daniel Colascione <dan...@google.com> > > wrote: > >> > >> On Tue, Oct 30, 2018 at 10:40 AM, Christian Brauner > >> <christian.brau...@canonical.com> wrote: > >> > On Tue, Oct 30, 2018 at 11:39:11AM +0100, Christian Brauner wrote: > >> >> On Tue, Oct 30, 2018 at 08:50:22AM +0000, Daniel Colascione wrote: > >> >> > On Tue, Oct 30, 2018 at 3:21 AM, Joel Fernandes <joe...@google.com> > >> >> > wrote: > >> >> > > On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione > >> >> > > <dan...@google.com> wrote: > >> >> > >> > >> >> > >> Add a simple proc-based kill interface. To use /proc/pid/kill, just > >> >> > >> write the signal number in base-10 ASCII to the kill file of the > >> >> > >> process to be killed: for example, 'echo 9 > /proc/$$/kill'. > >> >> > >> > >> >> > >> Semantically, /proc/pid/kill works like kill(2), except that the > >> >> > >> process ID comes from the proc filesystem context instead of from > >> >> > >> an > >> >> > >> explicit system call parameter. This way, it's possible to avoid > >> >> > >> races > >> >> > >> between inspecting some aspect of a process and that process's PID > >> >> > >> being reused for some other process. > >> >> > >> > >> >> > >> With /proc/pid/kill, it's possible to write a proper race-free and > >> >> > >> safe pkill(1). An approximation follows. A real program might use > >> >> > >> openat(2), having opened a process's /proc/pid directory > >> >> > >> explicitly, > >> >> > >> with the directory file descriptor serving as a sort of "process > >> >> > >> handle". > >> >> > > > >> >> > > How long does the 'inspection' procedure take? If its a short > >> >> > > duration, then is PID reuse really an issue, I mean the PIDs are not > >> >> > > reused until wrap around and the only reason this can be a problem > >> >> > > is > >> >> > > if you have the wrap around while the 'inspecting some aspect' > >> >> > > procedure takes really long. > >> >> > > >> >> > It's a race. Would you make similar statements about a similar fix for > >> >> > a race condition involving a mutex and a double-free just because the > >> >> > race didn't crash most of the time? The issue I'm trying to fix here > >> >> > is the same problem, one level higher up in the abstraction hierarchy. > >> >> > > >> >> > > Also the proc fs is typically not the right place for this. Some > >> >> > > entries in proc are writeable, but those are for changing values of > >> >> > > kernel data structures. The title of man proc(5) is "proc - process > >> >> > > information pseudo-filesystem". So its "information" right? > >> >> > > >> >> > Why should userspace care whether a particular operation is "changing > >> >> > [a] value[] of [a] kernel data structure" or something else? That > >> >> > something in /proc is a struct field is an implementation detail. It's > >> >> > the interface semantics that matters, and whether a particular > >> >> > operation is achieved by changing a struct field or by making a > >> >> > function call is irrelevant to userspace. Proc is a filesystem about > >> >> > processes. Why shouldn't you be able to send a signal to a process via > >> >> > proc? It's an operation involving processes. > >> >> > > >> >> > It's already possible to do things *to* processes via proc, e.g., > >> >> > adjust OOM killer scores. Proc filesystem file descriptors are > >> >> > userspace references to kernel-side struct pid instances, and as such, > >> >> > make good process handles. There are already "verb" files in procfs, > >> >> > such as /proc/sys/vm/drop_caches and /proc/sysrq-trigger. Why not add > >> >> > a kill "verb", especially if it closes a race that can't be closed > >> >> > some other way? > >> >> > > >> >> > You could implement this interface as a system call that took a procfs > >> >> > directory file descriptor, but relative to this proposal, it would be > >> >> > all downside. Such a thing would act just the same way as > >> >> > /pric/pid/kill, and wouldn't be usable from the shell or from programs > >> >> > that didn't want to use syscall(2). (Since glibc isn't adding new > >> >> > system call wrappers.) AFAIK, the only downside of having a "kill" > >> >> > file is the need for a string-to-integer conversion, but compared to > >> >> > process killing, integer parsing is insignificant. > >> >> > > >> >> > > IMO without a really good reason for this, it could really be a hard > >> >> > > sell but the RFC was worth it anyway to discuss it ;-) > >> >> > > >> >> > The traditional unix process API is down there at level -10 of Rusty > >> >> > Russel's old bad API scale: "It's impossible to get right". The races > >> >> > in the current API are unavoidable. That most programs don't hit these > >> >> > races most of the time doesn't mean that the race isn't present. > >> >> > > >> >> > We've moved to a model where we identify other system resources, like > >> >> > DRM fences, locks, sockets, and everything else via file descriptors. > >> >> > This change is a step toward using procfs file descriptors to work > >> >> > with processes, which makes the system more regular and easier to > >> >> > reason about. A clean API that's possible to use correctly is a > >> >> > worthwhile project. > >> >> > >> >> So I have been disucssing a new process API With David Howells, Kees > >> >> Cook and a few others and I am working on an RFC/proposal for this. It > >> >> is partially inspired by the new mount API. So I would like to block > >> >> this patch until then. I would like to get this right very much and > >> > >> It's good to hear that others are thinking about this problem. > >> > >> >> I > >> >> don't think this is the way to go. > > > > Because we want this to be generic and things like getting handles on > > processes via /proc is just a part of that > > The word "generic" is like the word "secure": it's hard to tell what
Ha, trolling hard but true. :) > it means in isolation. :-) Over what domain do we need to be generic? > Procfs file descriptors already work on processes generally, and they > allow for race-free access to anything that's reachable via a procfs > pid directory. In what way would an alternate approach be even more > generic? > > >> Why not? > >> > >> Does your proposed API allow for a race-free pkill, with arbitrary > >> selection criteria? This capability is a good litmus test for fixing > >> the long-standing Unix process API issues. > > > > You'd have a handle on the process with an fd so yes, it would be. > > Thanks. That's good to hear. > > Any idea on the timetable for this proposal? I'm open to lots of > alternative technical approaches, but I don't want this capability to > languish for a long time. Latest end of year likely sooner depending on the feedback I'm getting during LPC.