On Mon, May 30, 2016 at 11:52:12AM +0200, Michal Hocko wrote:
> On Mon 30-05-16 09:13:57, Michal Hocko wrote:
> > On Fri 27-05-16 19:48:30, Vladimir Davydov wrote:
> > > On Thu, May 26, 2016 at 02:40:13PM +0200, Michal Hocko wrote:
> > [...]
> > > > @@ -839,6 +841,13 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > > >  	for_each_process(p) {
> > > >  		if (!process_shares_mm(p, mm))
> > > >  			continue;
> > > > +		/*
> > > > +		 * vforked tasks are ignored because they will drop the mm soon
> > > > +		 * hopefully and even if not they will not mind being oom
> > > > +		 * reaped because they cannot touch any memory.
> > >
> > > They shouldn't modify memory, but they still can touch it AFAIK.
> >
> > You are right. This means that the vforked child might see zero pages.
> > Let me think whether this is acceptable or not.
>
> OK, I was thinking about it some more and I think you have a good point
> here. I can see two options here:
> - keep vforked task alive and skip the oom reaper. If the victim exits
>   normally and the oom wouldn't get resolved the vforked task will be
>   selected in the next round because the victim would clean up
>   vfork_done state in wait_for_vfork_done. We are still risking that
>   the victim gets stuck though
> - kill vforked task and so it would be reapable.

IMHO it all depends on what we're trying to achieve. If we want per-task
oom, which could make some sense since a task can consume a lot of memory
via e.g. pipe buffers, we would go with option #1. However, it's rather
difficult to find out how much kmem a task consumes w/o using kmemcg, so
IMHO the per-mm approach makes more sense in general. In this case I think
we should kill both the vforked task and its parent if their mm was
selected, provided their oom_score_adj allows that.

>
> The latter sounds more robust to me because we invoke the oom_reaper and
> the side effect shouldn't be really a problem because the vforked task
> couldn't have done a lot of useful work anyway. So I will drop this
> patch and update "mm, oom: fortify task_will_free_mem" to skip the
> vfork check as well.
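
To make the per-mm option concrete, here is a rough userspace sketch of
the policy discussed above: once an mm is selected, every task sharing it
is killed, vforked or not, unless its oom_score_adj forbids it. This is
not kernel code. struct toy_task, its fields, and toy_kill_mm_users() are
hypothetical names made up for illustration; only the OOM_SCORE_ADJ_MIN
value (-1000) is taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value the kernel uses to mean "never OOM-kill this task". */
#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical, heavily simplified stand-in for struct task_struct. */
struct toy_task {
	int pid;
	int mm_id;         /* tasks with the same mm_id share an address space */
	int oom_score_adj;
	bool in_vfork;     /* vforked child still borrowing its parent's mm */
	bool killed;
};

/*
 * Per-mm policy sketch: mark every user of victim_mm as killed unless it
 * is exempt via oom_score_adj. Returns true when no user of the mm
 * survived, i.e. when reaping the mm could not be observed by a live task.
 */
static bool toy_kill_mm_users(struct toy_task *tasks, size_t n, int victim_mm)
{
	bool can_reap = true;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct toy_task *t = &tasks[i];

		if (t->mm_id != victim_mm)
			continue;

		if (t->oom_score_adj == OOM_SCORE_ADJ_MIN) {
			/* An exempt task keeps using the mm, so do not reap it. */
			can_reap = false;
			continue;
		}

		/* in_vfork is deliberately not checked: vforked tasks die too. */
		t->killed = true;
	}

	return can_reap;
}
```

Calling toy_kill_mm_users() on an array holding a parent and its vforked
child with the same mm_id marks both as killed and returns true. If
either of them has oom_score_adj set to OOM_SCORE_ADJ_MIN, that task
survives and the function returns false, so the mm would be left alone.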