On Thu, Apr 11, 2019 at 10:33:32AM -0700, Daniel Colascione wrote:
> On Thu, Apr 11, 2019 at 10:09 AM Suren Baghdasaryan <sur...@google.com> wrote:
> > On Thu, Apr 11, 2019 at 8:33 AM Matthew Wilcox <wi...@infradead.org> wrote:
> > >
> > > On Wed, Apr 10, 2019 at 06:43:53PM -0700, Suren Baghdasaryan wrote:
> > > > Add new SS_EXPEDITE flag to be used when sending SIGKILL via
> > > > pidfd_send_signal() syscall to allow expedited memory reclaim of the
> > > > victim process. The usage of this flag is currently limited to SIGKILL
> > > > signal and only to privileged users.
> > >
> > > What is the downside of doing expedited memory reclaim? ie why not do it
> > > every time a process is going to die?
> >
> > I think with an implementation that does not use/abuse oom-reaper
> > thread this could be done for any kill. As I mentioned oom-reaper is a
> > limited resource which has access to memory reserves and should not be
> > abused in the way I do in this reference implementation.
> > While there might be downsides that I don't know of, I'm not sure it's
> > required to hurry every kill's memory reclaim. I think there are cases
> > when resource deallocation is critical, for example when we kill to
> > relieve resource shortage and there are kills when reclaim speed is
> > not essential. It would be great if we can identify urgent cases
> > without userspace hints, so I'm open to suggestions that do not
> > involve additional flags.
>
> I was imagining a PI-ish approach where we'd reap in case an RT
> process was waiting on the death of some other process. I'd still
> prefer the API I proposed in the other message because it gets the
> kernel out of the business of deciding what the right signal is. I'm a
> huge believer in "mechanism, not policy".

It's not a question of the kernel deciding what the right signal is. The kernel knows whether a signal is fatal to a particular process or not. The question is whether the killing process should do the work of reaping the dying process's resources sometimes, always or never. Currently, that is never (the process reaps its own resources); Suren is suggesting sometimes, and I'm asking "Why not always?"