On Fri, 3 Jun 2016, Michal Hocko wrote:

> From: Michal Hocko <mho...@suse.com>
>
> Currently oom_kill_process skips both the oom reaper and SIG_KILL if a
> process sharing the same mm is unkillable via OOM_ADJUST_MIN. After "mm,
> oom_adj: make sure processes sharing mm have same view of oom_score_adj"
> all such processes are sharing the same value so we shouldn't see such a
> task at all (oom_badness would rule them out).
>
> We can still encounter oom disabled vforked task which has to be killed
> as well if we want to have other tasks sharing the mm reapable
> because it can access the memory before doing exec. Killing such a task
> should be acceptable because it is highly unlikely it has done anything
> useful because it cannot modify any memory before it calls exec. An
> alternative would be to keep the task alive and skip the oom reaper and
> risk all the weird corner cases where the OOM killer cannot make forward
> progress because the oom victim hung somewhere on the way to exit.
>
> There is a potential race where we kill the oom disabled task which is
> highly unlikely but possible. It would happen if __set_oom_adj raced
> with select_bad_process and then it is OK to consider the old value or
> with fork when it should be acceptable as well.
> Let's add a little note to the log so that people would tell us that
> this really happens in the real life and it matters.
>

We cannot kill oom disabled processes at all, little race or otherwise. We'd rather panic the system than oom kill these processes, and that's the semantic that the user is basing their decision on. We cannot suddenly start allowing them to be SIGKILL'd.