On 2019/01/26 4:47, Arkadiusz Miśkiewicz wrote: >> Can you please see whether the problem can be reproduced on the >> current linux-next? >> >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > I can reproduce on next (5.0.0-rc3-next-20190125), too: >
Please try this patch. Subject: [PATCH v2] memcg: killed threads should not invoke memcg OOM killer From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> To: Andrew Morton <a...@linux-foundation.org>, Johannes Weiner <han...@cmpxchg.org>, David Rientjes <rient...@google.com> Cc: Michal Hocko <mho...@kernel.org>, linux...@kvack.org, Kirill Tkhai <ktk...@virtuozzo.com>, Linus Torvalds <torva...@linux-foundation.org> Message-ID: <01370f70-e1f6-ebe4-b95e-0df21a0bc...@i-love.sakura.ne.jp> Date: Tue, 15 Jan 2019 19:17:27 +0900 If $N > $M, a single process with $N threads in a memcg group can easily kill all $M processes in that memcg group, for mem_cgroup_out_of_memory() does not check if current thread needs to invoke the memcg OOM killer. T1@P1 |T2...$N@P1|P2...$M |OOM reaper ----------+----------+----------+---------- # all sleeping try_charge() mem_cgroup_out_of_memory() mutex_lock(oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(oom_lock) out_of_memory() select_bad_process() oom_kill_process(P1) wake_oom_reaper() oom_reap_task() # ignores P1 mutex_unlock(oom_lock) out_of_memory() select_bad_process(P2...$M) # all killed by T2...$N@P1 wake_oom_reaper() oom_reap_task() # ignores P2...$M mutex_unlock(oom_lock) We don't need to invoke the memcg OOM killer if current thread was killed when waiting for oom_lock, for mem_cgroup_oom_synchronize(true) can count on try_charge() when mem_cgroup_oom_synchronize(true) can not make forward progress because try_charge() allows already killed/exiting threads to make forward progress, and memory_max_write() can bail out upon signals. At first Michal thought that fatal signal check is racy compared to tsk_is_oom_victim() check. But an experiment showed that trying to call mark_oom_victim() on all killed thread groups is more racy than fatal signal check due to task_will_free_mem(current) path in out_of_memory(). Therefore, this patch changes mem_cgroup_out_of_memory() to bail out upon should_force_charge() == T rather than upon fatal_signal_pending() == T, for should_force_charge() == T && signal_pending(current) == F at memory_max_write() can't happen because current thread won't call memory_max_write() after getting PF_EXITING. Signed-off-by: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> Acked-by: Michal Hocko <mho...@suse.com> Fixes: 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path") Fixes: 3100dab2aa09 ("mm: memcontrol: print proper OOM header when no eligible victim left") Cc: sta...@vger.kernel.org # 4.19+ --- mm/memcontrol.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index af7f18b..79a7d2a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -248,6 +248,12 @@ enum res_type { iter != NULL; \ iter = mem_cgroup_iter(NULL, iter, NULL)) +static inline bool should_force_charge(void) +{ + return tsk_is_oom_victim(current) || fatal_signal_pending(current) || + (current->flags & PF_EXITING); +} + /* Some nice accessors for the vmpressure. */ struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg) { @@ -1389,8 +1395,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, }; bool ret; - mutex_lock(&oom_lock); - ret = out_of_memory(&oc); + if (mutex_lock_killable(&oom_lock)) + return true; + /* + * A few threads which were not waiting at mutex_lock_killable() can + * fail to bail out. Therefore, check again after holding oom_lock. + */ + ret = should_force_charge() || out_of_memory(&oc); mutex_unlock(&oom_lock); return ret; } @@ -2209,9 +2220,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, * bypass the last charges so that they can exit quickly and * free their memory. */ - if (unlikely(tsk_is_oom_victim(current) || - fatal_signal_pending(current) || - current->flags & PF_EXITING)) + if (unlikely(should_force_charge())) goto force; /* -- 1.8.3.1