On Fri, Jun 23, 2017 at 06:52:20AM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Roman Gushchin wrote:
> > > On Thu, Jun 22, 2017 at 09:40:28AM +0900, Tetsuo Handa wrote:
> > > > Roman Gushchin wrote:
> > > > > --- a/mm/oom_kill.c
> > > > > +++ b/mm/oom_kill.c
> > > > > @@ -992,6 +992,13 @@ bool out_of_memory(struct oom_control *oc)
> > > > > if (oom_killer_disabled)
> > > > > return false;
> > > > >
> > > > > + /*
> > > > > + * If there are oom victims in flight, we don't need to select
> > > > > + * a new victim.
> > > > > + */
> > > > > + if (atomic_read(&oom_victims) > 0)
> > > > > + return true;
> > > > > +
> > > > > if (!is_memcg_oom(oc)) {
> > > > > blocking_notifier_call_chain(&oom_notify_list, 0,
> > > > > &freed);
> > > > > if (freed > 0)
> > > >
> > > > The OOM reaper is not available for CONFIG_MMU=n kernels, and timeout
> > > > based
> > > > giveup is not permitted, but a multithreaded process might be selected
> > > > as
> > > > an OOM victim. Not setting TIF_MEMDIE to all threads sharing an OOM
> > > > victim's
> > > > mm increases possibility of preventing some OOM victim thread from
> > > > terminating
> > > > (e.g. one of them cannot leave __alloc_pages_slowpath() with mmap_sem
> > > > held for
> > > > write due to waiting for the TIF_MEMDIE thread to call
> > > > exit_oom_victim() when
> > > > the TIF_MEMDIE thread is waiting for the thread with mmap_sem held for
> > > > write).
> > >
> > > I agree, that CONFIG_MMU=n is a special case, and the proposed approach
> > > can't
> > > be used directly. But can you, please, why do you find the first chunk
> > > wrong?
> >
> > Since you are checking oom_victims before checking
> > task_will_free_mem(current),
> > only one thread can get TIF_MEMDIE. This is where a multithreaded OOM
> > victim without
> > the OOM reaper can get stuck forever.
>
> Oops, I misinterpreted. This is where a multithreaded OOM victim with or
> without
> the OOM reaper can get stuck forever. Think about a process with two threads
> is
> selected by the OOM killer and only one of these two threads can get
> TIF_MEMDIE.
>
> Thread-1 Thread-2 The OOM killer
> The OOM reaper
>
> Calls down_write(¤t->mm->mmap_sem).
> Enters __alloc_pages_slowpath().
> Enters __alloc_pages_slowpath().
> Takes oom_lock.
> Calls out_of_memory().
> Selects Thread-1 as an
> OOM victim.
> Gets SIGKILL. Gets SIGKILL.
> Gets TIF_MEMDIE.
> Releases oom_lock.
> Leaves __alloc_pages_slowpath() because Thread-1 has TIF_MEMDIE.
>
> Takes oom_lock.
>
> Will do nothing because down_read_trylock() fails.
>
> Releases oom_lock.
>
> Gives up and sets MMF_OOM_SKIP after one second.
> Takes oom_lock.
> Calls out_of_memory().
> Will not check MMF_OOM_SKIP because Thread-1 still
> has TIF_MEMDIE. // <= get stuck waiting for Thread-1.
> Releases oom_lock.
> Will not leave __alloc_pages_slowpath() because
> Thread-2 does not have TIF_MEMDIE.
> Will not call up_write(¤t->mm->mmap_sem).
> Reaches do_exit().
> Calls down_read(¤t->mm->mmap_sem) in exit_mm() in do_exit(). // <=
> get stuck waiting for Thread-2.
> Will not call up_read(¤t->mm->mmap_sem) in exit_mm() in do_exit().
> Will not clear TIF_MEMDIE in exit_oom_victim() in exit_mm() in do_exit().
That's interesting... Does it mean, that we have to give an access to the
reserves
to all threads to guarantee the forward progress?
What do you think about Michal's approach? He posted a link in the thread.
Thank you!
Roman