On Tue 09-10-18 22:51:00, Tetsuo Handa wrote:
> On 2018/10/09 22:26, Michal Hocko wrote:
> > On Tue 09-10-18 22:14:24, Tetsuo Handa wrote:
> >> On 2018/10/09 21:58, Michal Hocko wrote:
> >>> On Tue 09-10-18 21:52:12, Tetsuo Handa wrote:
> >>>> On 2018/10/09 20:10, Michal Hocko wrote:
> >>>>> On Tue 09-10-18 19:00:44, Tetsuo Handa wrote:
> >>>>>>> 2) add OOM_SCORE_ADJ_MIN and do not kill tasks sharing mm and do not
> >>>>>>> reap the mm in the rare case of the race.
> >>>>>>
> >>>>>> That is no problem. The mistake we made in 4.6 was that we updated 
> >>>>>> oom_score_adj
> >>>>>> to -1000 (and allowed unprivileged users to OOM-lockup the system).
> >>>>>
> >>>>> I do not follow.
> >>>>>
> >>>>
> >>>> http://tomoyo.osdn.jp/cgi-bin/lxr/source/mm/oom_kill.c?v=linux-4.6.7#L493
> >>>
> >>> Ahh, so you are not referring to the current upstream code. Do you see
> >>> any specific problem with the current one (well, except for the possible
> >>> race which I have tried to evaluate).
> >>>
> >>
> >> Yes. "task_will_free_mem(current) in out_of_memory() returns false due to 
> >> MMF_OOM_SKIP
> >> being already set" is a problem for clone(CLONE_VM without 
> >> CLONE_THREAD/CLONE_SIGHAND)
> >> with the current code.
> > 
> > a) I fail to see how that is related to your previous post and b) could
> > you be more specific. Is there any other scenario from the two described
> > in my earlier email?
> > 
> 
> I do not follow. Just reverting commit 44a70adec910d692 and commit 
> 97fd49c2355ffded
> is sufficient for closing the copy_process() versus __set_oom_adj() race.

Please go back and see why this has been done in the first place.

> We went too far towards complete "struct mm_struct" based OOM handling. But 
> stepping
> back to "struct signal_struct" based OOM handling solves Yong-Taek's 
> for_each_process()
> latency problem and your copy_process() versus __set_oom_adj() race problem 
> and my
> task_will_free_mem(current) race problem.

And again, I have put an evaluation of the race and try to see what is
the effect. Then you have started to fire hard to follow notes and it is
not clear whether the analysis/conclusions is wrong/incomplete.

So an we get back to that analysis and stick to the topic please?

-- 
Michal Hocko
SUSE Labs

Reply via email to