On Fri 05-02-16 00:08:25, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > > +	/*
> > > > +	 * Clear TIF_MEMDIE because the task shouldn't be sitting on a
> > > > +	 * reasonably reclaimable memory anymore. OOM killer can continue
> > > > +	 * by selecting other victim if unmapping hasn't led to any
> > > > +	 * improvements. This also means that selecting this task doesn't
> > > > +	 * make any sense.
> > > > +	 */
> > > > +	tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
> > > > +	exit_oom_victim(tsk);
> > >
> > > I noticed that updating only one thread group's oom_score_adj disables
> > > further wake_oom_reaper() calls due to rough-grained can_oom_reap check at
> > >
> > >   p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN
> > >
> > > in oom_kill_process(). I think we need to either update all thread groups'
> > > oom_score_adj using the reaped mm equally or use more fine-grained
> > > can_oom_reap check which ignores OOM_SCORE_ADJ_MIN if all threads in that
> > > thread group are dying or exiting.
> >
> > I do not understand. Why would you want to reap the mm again when
> > this has been done already? The mm is shared, right?
>
> The mm is shared between previous victim and next victim, but these victims
> are in different thread groups. The OOM killer selects next victim whose mm
> was already reaped due to sharing previous victim's memory.

OK, now I got your point. From your previous email it sounded like you
were talking about oom_reaper and its invocation, which was confusing.

> We don't want the OOM killer to select such next victim.

Yes, selecting such a task doesn't make much sense. It has been killed,
so it has fatal_signal_pending. If it wanted to allocate it would get
TIF_MEMDIE already, and its address space has been reaped, so there is
nothing left to free. This CLONE_VM without CLONE_SIGHAND is a really
crazy combo; it is just causing trouble all over and I am not convinced
it is actually that helpful </rant>.

> Maybe set MMF_OOM_REAP_DONE on
> the previous victim's mm and check it instead of TIF_MEMDIE when selecting
> a victim? That will also avoid problems caused by clearing TIF_MEMDIE?

Hmm, it doesn't seem we are under MMF_ available bits pressure right now,
so using the flag sounds like the easiest way to go. Then we do not even
have to play with OOM_SCORE_ADJ_MIN, which might be updated from
userspace after the oom reaper has done that.

Care to send a patch?

Thanks!
-- 
Michal Hocko
SUSE Labs
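
For illustration only, here is a rough user-space sketch of the idea
discussed above: mark the mm once it has been reaped and have victim
selection skip every task that uses that mm, whatever thread group it
belongs to. This is not kernel code and not the eventual patch. The
toy_* structures and helpers are hypothetical, and the bit value chosen
for MMF_OOM_REAP_DONE is an assumption; only the flag name comes from
the thread.

```c
/* User-space toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>

/* Flag name is from the thread; the bit position is an assumption. */
#define MMF_OOM_REAP_DONE (1UL << 0)

struct toy_mm {
	unsigned long flags;
	unsigned long rss_pages;	/* pages currently charged to this mm */
};

struct toy_task {
	int tgid;			/* thread group id */
	struct toy_mm *mm;		/* may be shared across thread groups (CLONE_VM) */
};

/*
 * Model the reaper finishing its work: whatever could be unmapped is gone,
 * 'remaining' pages could not be reclaimed, and the mm is marked as done.
 */
static void toy_reap_mm(struct toy_mm *mm, unsigned long remaining)
{
	assert(mm != NULL);
	mm->rss_pages = remaining;
	mm->flags |= MMF_OOM_REAP_DONE;
}

/*
 * Pick the task with the largest footprint, skipping any task whose mm has
 * already been reaped. The check is per mm, so it covers all thread groups
 * sharing that mm without touching oom_score_adj.
 */
static struct toy_task *toy_select_victim(struct toy_task *tasks, size_t n)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		struct toy_mm *mm = tasks[i].mm;

		if (!mm || (mm->flags & MMF_OOM_REAP_DONE))
			continue;
		if (!best || mm->rss_pages > best->mm->rss_pages)
			best = &tasks[i];
	}
	return best;
}
```

In this model, two tasks in different thread groups that share one
reaped mm are both skipped, even if the mm still holds more pages than
any other candidate, so the selection moves on to a task that can
actually free something.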