Re: [PATCH] mm, oom: simplify alloc_pages_before_oomkill handling

2017-12-01 Thread Tetsuo Handa
Michal Hocko wrote:
> Recently added alloc_pages_before_oomkill gained a new caller with this
> patchset, and I think it has grown to deserve a simpler code flow.
> What do you think about this on top of the series?

I'm planning to post the patch below in order to mitigate an OOM lockup
problem caused by scheduling priority. But I'm OK with your patch, because
it will not conflict with the patch below.

--
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b2746a7..ef6e951 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3332,13 +3332,14 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	*did_some_progress = 0;
 
 	/*
-	 * Acquire the oom lock.  If that fails, somebody else is
-	 * making progress for us.
+	 * Acquire the oom lock. If that fails, give enough CPU time to the
+	 * owner of the oom lock in order to help reclaim memory.
 	 */
-	if (!mutex_trylock(&oom_lock)) {
-		*did_some_progress = 1;
+	while (!mutex_trylock(&oom_lock)) {
+		page = alloc_pages_before_oomkill(oc);
+		if (page)
+			return page;
 		schedule_timeout_uninterruptible(1);
-		return NULL;
 	}
 
 	/* Coredumps can quickly deplete all memory reserves */
--
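To see the control-flow change in isolation, here is a minimal userspace
analogue of the loop (a hypothetical sketch using pthreads; the two helpers
are stubbed out, and sched_yield() stands in for
schedule_timeout_uninterruptible(1)). The point is that a contended trylock
no longer means instant failure: the caller keeps retrying a cheap fallback
allocation while yielding the CPU to the lock owner.

#include <pthread.h>
#include <sched.h>
#include <stddef.h>

static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stubs standing in for the kernel helpers (illustrative only). */
static void *alloc_pages_before_oomkill(void) { return NULL; }
static void *oom_kill_and_alloc(void) { return (void *)1; }

static void *alloc_under_oom(void)
{
	void *page;

	/*
	 * Old behaviour: give up as soon as the trylock fails.
	 * New behaviour: retry the cheap fallback allocation and yield,
	 * so that the oom_lock owner gets CPU time to free memory.
	 */
	while (pthread_mutex_trylock(&oom_lock) != 0) {
		page = alloc_pages_before_oomkill();
		if (page)
			return page;
		sched_yield();
	}
	page = oom_kill_and_alloc();
	pthread_mutex_unlock(&oom_lock);
	return page;
}

int main(void)
{
	return alloc_under_oom() ? 0 : 1;
}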


Re: [v8 0/4] cgroup-aware OOM killer

2017-10-02 Thread Tetsuo Handa
Shakeel Butt wrote:
> I think Tim has given a very clear explanation of why comparing A & D makes
> perfect sense. However, I think the above example, a single-user system
> where a user has designed and created the whole hierarchy and then
> attaches different jobs/applications to different nodes in this
> hierarchy, is also a valid scenario. One solution I can think of, to
> cater to both scenarios, is to introduce a notion of 'bypass oom', i.e.
> do not include a memcg in the oom comparison and instead include its
> children in the comparison.

I'm not following this thread closely because I don't use memcg.
But if there are multiple scenarios, what about offloading memcg OOM
handling to loadable kernel modules (the way many filesystems are called
through the VFS interface)? We could do trial and error more casually.
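To make the suggestion concrete, one hypothetical shape such an interface
could take is an ops table that a module registers, analogous to
register_filesystem(). None of these names exist upstream; this is purely a
sketch of the idea.

#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/memcontrol.h>

/* Hypothetical interface -- none of these symbols exist upstream. */
struct memcg_oom_policy {
	const char *name;
	/*
	 * Return true if this policy handled the OOM; false falls back
	 * to the built-in victim selection.
	 */
	bool (*handle_oom)(struct mem_cgroup *memcg, gfp_t gfp_mask);
	struct list_head list;
};

int register_memcg_oom_policy(struct memcg_oom_policy *policy);
void unregister_memcg_oom_policy(struct memcg_oom_policy *policy);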


Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection

2017-06-29 Thread Tetsuo Handa
Roman Gushchin wrote:
> On Fri, Jun 23, 2017 at 06:52:20AM +0900, Tetsuo Handa wrote:
> > Tetsuo Handa wrote:
> > Oops, I misinterpreted. This is where a multithreaded OOM victim with or
> > without the OOM reaper can get stuck forever. Suppose a process with two
> > threads is selected by the OOM killer and only one of these two threads
> > can get TIF_MEMDIE.
> > 
> >   Thread-1                 Thread-2                 The OOM killer           The OOM reaper
> > 
> >                            Calls down_write(&mm->mmap_sem).
> >   Enters __alloc_pages_slowpath().
> >                            Enters __alloc_pages_slowpath().
> >   Takes oom_lock.
> >   Calls out_of_memory().
> >                                                     Selects Thread-1 as an OOM victim.
> >   Gets SIGKILL.            Gets SIGKILL.
> >   Gets TIF_MEMDIE.
> >   Releases oom_lock.
> >   Leaves __alloc_pages_slowpath() because Thread-1 has TIF_MEMDIE.
> >                                                                              Takes oom_lock.
> >                                                                              Will do nothing because down_read_trylock() fails.
> >                                                                              Releases oom_lock.
> >                                                                              Gives up and sets MMF_OOM_SKIP after one second.
> >                            Takes oom_lock.
> >                            Calls out_of_memory().
> >                            Will not check MMF_OOM_SKIP because Thread-1 still has TIF_MEMDIE. // <= gets stuck waiting for Thread-1.
> >                            Releases oom_lock.
> >                            Will not leave __alloc_pages_slowpath() because Thread-2 does not have TIF_MEMDIE.
> >                            Will not call up_write(&mm->mmap_sem).
> >   Reaches do_exit().
> >   Calls down_read(&mm->mmap_sem) in exit_mm() in do_exit(). // <= gets stuck waiting for Thread-2.
> >   Will not call up_read(&mm->mmap_sem) in exit_mm() in do_exit().
> >   Will not clear TIF_MEMDIE in exit_oom_victim() in exit_mm() in do_exit().
> 
> That's interesting... Does it mean that we have to give access to the
> reserves to all threads to guarantee forward progress?

Yes, because we don't have a __GFP_KILLABLE flag.
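For illustration only, here is roughly what such a flag could mean in the
allocator slowpath (a purely hypothetical sketch; __GFP_KILLABLE does not
exist upstream and the bit value below is made up):

#include <linux/gfp.h>
#include <linux/sched/signal.h>

/* Hypothetical: neither the flag nor the bit below exists upstream. */
#define ___GFP_KILLABLE	0x4000000u
#define __GFP_KILLABLE	((__force gfp_t)___GFP_KILLABLE)

static inline bool killable_alloc_should_fail(gfp_t gfp_mask)
{
	/*
	 * A caller that passed __GFP_KILLABLE tolerates allocation
	 * failure once it has been killed, so every thread of an OOM
	 * victim could bail out of __alloc_pages_slowpath() without
	 * needing TIF_MEMDIE or access to the memory reserves.
	 */
	return (gfp_mask & __GFP_KILLABLE) && fatal_signal_pending(current);
}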

> 
> What do you think about Michal's approach? He posted a link in the thread.

Please read that thread.


Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection

2017-06-22 Thread Tetsuo Handa
Tetsuo Handa wrote:
> Roman Gushchin wrote:
> > On Thu, Jun 22, 2017 at 09:40:28AM +0900, Tetsuo Handa wrote:
> > > Roman Gushchin wrote:
> > > > --- a/mm/oom_kill.c
> > > > +++ b/mm/oom_kill.c
> > > > @@ -992,6 +992,13 @@ bool out_of_memory(struct oom_control *oc)
> > > >  	if (oom_killer_disabled)
> > > >  		return false;
> > > >  
> > > > +	/*
> > > > +	 * If there are oom victims in flight, we don't need to select
> > > > +	 * a new victim.
> > > > +	 */
> > > > +	if (atomic_read(&oom_victims) > 0)
> > > > +		return true;
> > > > +
> > > >  	if (!is_memcg_oom(oc)) {
> > > >  		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > > >  		if (freed > 0)
> > > 
> > > The OOM reaper is not available for CONFIG_MMU=n kernels, and timeout-based
> > > giveup is not permitted, but a multithreaded process might be selected as
> > > an OOM victim. Not setting TIF_MEMDIE for all threads sharing an OOM
> > > victim's mm increases the possibility of preventing some OOM victim thread
> > > from terminating (e.g. one of them cannot leave __alloc_pages_slowpath()
> > > with mmap_sem held for write because it is waiting for the TIF_MEMDIE
> > > thread to call exit_oom_victim(), while the TIF_MEMDIE thread is waiting
> > > for the thread holding mmap_sem for write).
> > 
> > I agree that CONFIG_MMU=n is a special case, and the proposed approach
> > can't be used directly. But can you please explain why you find the
> > first chunk wrong?
> 
> Since you are checking oom_victims before checking task_will_free_mem(current),
> only one thread can get TIF_MEMDIE. This is where a multithreaded OOM victim
> without the OOM reaper can get stuck forever.

Oops, I misinterpreted. This is where a multithreaded OOM victim with or
without the OOM reaper can get stuck forever. Suppose a process with two
threads is selected by the OOM killer and only one of these two threads can
get TIF_MEMDIE.

  Thread-1                 Thread-2                 The OOM killer           The OOM reaper

                           Calls down_write(&mm->mmap_sem).
  Enters __alloc_pages_slowpath().
                           Enters __alloc_pages_slowpath().
  Takes oom_lock.
  Calls out_of_memory().
                                                    Selects Thread-1 as an OOM victim.
  Gets SIGKILL.            Gets SIGKILL.
  Gets TIF_MEMDIE.
  Releases oom_lock.
  Leaves __alloc_pages_slowpath() because Thread-1 has TIF_MEMDIE.
                                                                             Takes oom_lock.
                                                                             Will do nothing because down_read_trylock() fails.
                                                                             Releases oom_lock.
                                                                             Gives up and sets MMF_OOM_SKIP after one second.
                           Takes oom_lock.
                           Calls out_of_memory().
                           Will not check MMF_OOM_SKIP because Thread-1 still has TIF_MEMDIE. // <= gets stuck waiting for Thread-1.
                           Releases oom_lock.
                           Will not leave __alloc_pages_slowpath() because Thread-2 does not have TIF_MEMDIE.
                           Will not call up_write(&mm->mmap_sem).
  Reaches do_exit().
  Calls down_read(&mm->mmap_sem) in exit_mm() in do_exit(). // <= gets stuck waiting for Thread-2.
  Will not call up_read(&mm->mmap_sem) in exit_mm() in do_exit().
  Will not clear TIF_MEMDIE in exit_oom_victim() in exit_mm() in do_exit().
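The dependency cycle in the table can be reproduced in miniature with a
userspace analogue (an illustrative sketch only, not kernel code; a pthread
rwlock stands in for mmap_sem and an atomic flag for TIF_MEMDIE -- the
program intentionally hangs):

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;
static atomic_bool tif_memdie = true;	/* Thread-1 "has TIF_MEMDIE" */

/* Thread-2: holds mmap_sem for write and waits for Thread-1 forever. */
static void *thread2(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&mmap_sem);
	while (atomic_load(&tif_memdie))
		sched_yield();
	pthread_rwlock_unlock(&mmap_sem);
	return NULL;
}

/* Thread-1: like exit_mm(), blocks on mmap_sem held by Thread-2. */
static void *thread1(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&mmap_sem);	/* never returns */
	atomic_store(&tif_memdie, false);	/* never reached */
	pthread_rwlock_unlock(&mmap_sem);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t2, NULL, thread2, NULL);
	sched_yield();	/* crude: let Thread-2 take the write lock first */
	pthread_create(&t1, NULL, thread1, NULL);
	puts("both threads are now stuck on each other (Ctrl-C to quit)");
	pthread_join(t1, NULL);
	return 0;
}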


Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection

2017-06-22 Thread Tetsuo Handa
Roman Gushchin wrote:
> On Thu, Jun 22, 2017 at 09:40:28AM +0900, Tetsuo Handa wrote:
> > Roman Gushchin wrote:
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -992,6 +992,13 @@ bool out_of_memory(struct oom_control *oc)
> > >  	if (oom_killer_disabled)
> > >  		return false;
> > >  
> > > +	/*
> > > +	 * If there are oom victims in flight, we don't need to select
> > > +	 * a new victim.
> > > +	 */
> > > +	if (atomic_read(&oom_victims) > 0)
> > > +		return true;
> > > +
> > >  	if (!is_memcg_oom(oc)) {
> > >  		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > >  		if (freed > 0)
> > 
> > The chunk above in this patch and the chunk below in patch 5 are wrong.
> > 
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -665,7 +672,13 @@ static void mark_oom_victim(struct task_struct *tsk)
> > >  	 * that TIF_MEMDIE tasks should be ignored.
> > >  	 */
> > >  	__thaw_task(tsk);
> > > -	atomic_inc(&oom_victims);
> > > +
> > > +	/*
> > > +	 * If there are no oom victims in flight,
> > > +	 * give the task access to the memory reserves.
> > > +	 */
> > > +	if (atomic_inc_return(&oom_victims) == 1)
> > > +		set_tsk_thread_flag(tsk, TIF_MEMDIE);
> > >  }
> > >  
> > >  /**
> > 
> > The OOM reaper is not available for CONFIG_MMU=n kernels, and timeout-based
> > giveup is not permitted, but a multithreaded process might be selected as
> > an OOM victim. Not setting TIF_MEMDIE for all threads sharing an OOM
> > victim's mm increases the possibility of preventing some OOM victim thread
> > from terminating (e.g. one of them cannot leave __alloc_pages_slowpath()
> > with mmap_sem held for write because it is waiting for the TIF_MEMDIE
> > thread to call exit_oom_victim(), while the TIF_MEMDIE thread is waiting
> > for the thread holding mmap_sem for write).
> 
> I agree that CONFIG_MMU=n is a special case, and the proposed approach can't
> be used directly. But can you please explain why you find the first chunk wrong?

Since you are checking oom_victims before checking task_will_free_mem(current),
only one thread can get TIF_MEMDIE. This is where a multithreaded OOM victim
without the OOM reaper can get stuck forever.

> Basically, it's exactly what we do now: we increment oom_victims for every
> oom victim, and every process decrements this counter during its exit path.
> If there is at least one existing victim, we will select it again, so it's
> just an optimization. Am I missing something? Why should we start a new
> victim selection if there are processes that will likely quit and release
> memory soon?

If you check oom_victims like
http://lkml.kernel.org/r/201706220053.v5m0rmou078...@www262.sakura.ne.jp does,
all threads in a multithreaded OOM victim will have a chance to get TIF_MEMDIE.
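Roughly, that approach amounts to something like the following (a hedged
sketch, not the patch actually posted at that link; locking and corner cases
differ):

#include <linux/sched/signal.h>

/*
 * Sketch: give every thread of the victim process access to the memory
 * reserves, so that no single thread becomes the only one able to make
 * progress while the others loop in __alloc_pages_slowpath().
 */
static void mark_oom_victim_threads(struct task_struct *victim)
{
	struct task_struct *t;

	rcu_read_lock();
	for_each_thread(victim, t)
		set_tsk_thread_flag(t, TIF_MEMDIE);
	rcu_read_unlock();
}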



By the way, the chunk below in patch 5 is also wrong:
you are not permitted to set TIF_MEMDIE if oom_killer_disabled == true.

--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -556,8 +556,18 @@ static void oom_reap_task(struct task_struct *tsk)
 	struct mm_struct *mm = tsk->signal->oom_mm;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+	while (attempts++ < MAX_OOM_REAP_RETRIES &&
+	       !__oom_reap_task_mm(tsk, mm)) {
+
+		/*
+		 * If the task has no access to the memory reserves,
+		 * grant it to help the task exit.
+		 */
+		if (!test_tsk_thread_flag(tsk, TIF_MEMDIE))
+			set_tsk_thread_flag(tsk, TIF_MEMDIE);
+
 		schedule_timeout_idle(HZ/10);
+	}
 
 	if (attempts <= MAX_OOM_REAP_RETRIES)
 		goto done;
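One shape a corrected hunk could take, assuming that checking the flag under
oom_lock is sufficient (the real synchronization against
oom_killer_disable() may need more care), would be:

	/*
	 * Sketch: never hand out TIF_MEMDIE while the OOM killer is
	 * disabled, because oom_killer_disable() must be able to wait
	 * for all existing victims to finish.
	 */
	mutex_lock(&oom_lock);
	if (!oom_killer_disabled && !test_tsk_thread_flag(tsk, TIF_MEMDIE))
		set_tsk_thread_flag(tsk, TIF_MEMDIE);
	mutex_unlock(&oom_lock);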